Coursera_Managing Big Data with MySQL_Note 3

Subqueries and Derived Tables

Subqueries

Subqueires, which are also sometimes called inner queires or nested queries, are queries that are embedded within the context of another query. The output of a subquery is incorporated into the queries that surround it.

Subqueires can be used in SELECT, WHERE, and FROM clauses.

The Main reasons to use subqueries are:

  • Sometimes they are the most logical way to retrieve the information you want
  • They can be used to isolate each logical part of a statement, which can be helpful for troubleshooting long and complicated queries
  • Sometimes they run faster than joins
Tips:

Subqueires must be enclosed in parentheses.

Queries that include subqueries always run the innermost subquery first, and then run subsequent queries sequentially in order from the innermost query to the outermost query.

Subqueries have a couple of rules that joins don’t:

  • ORDER BY phrase cannot be used in subqueries (although ORDER BY phrase can still be used in outer queires that contain subqueires).
  • Subqueires in SELECT or WHERE clauses that return more than one row must be used in combination with operators that are explicitly designed to handle multiple valuse, such as the IN operator. Otherwise, subqueries in SELECT or WHERE statement can output no more than 1 row.
  • EXISTS and NOT EXISTS

EXISTS and NOT EXISTS perform similar functions to IN and NOT IN, but EXISTS can only be used in subqueries.

What’s more, EXISTS/NOT EXISTS are logical statements. Rather than returning raw data, per se, EXISTS/NOT EXISTS statements return a value of TRUE or FALSE.

As a pratical consequence, EXISTS statements are often writen using an asterisk (*) after the SELECT clause rather than explicit column names.

1
2
3
4
5
SELECT DISTINCT u.user_guid AS uUserID
FROM users u
WHERE EXISTS (SELECT d.user_guid
FROM dogs d
WHERE u.user_guid =d.user_guid);

is the same if we wrote:

1
2
3
4
5
SELECT DISTINCT u.user_guid AS uUserID
FROM users u
WHERE EXISTS (SELECT *
FROM dogs d
WHERE u.user_guid =d.user_guid);

Essentially, both of these queries say give us all the distinct user_guids from the users table that have a value of TRUE in our EXISTS clause.

The results would be equivalent to an inner join with GROUP BY query.

Importtant points:

i.e:

1
2
3
4
5
6
7
SELECT DistinctUUsersID.user_guid AS uUserID, d.user_guid AS dUserID, count(*) AS numrows
FROM (SELECT DISTINCT u.user_guid
FROM users u) AS DistinctUUsersID
LEFT JOIN dogs d
ON DistinctUUsersID.user_guid=d.user_guid
GROUP BY DistinctUUsersID.user_guid
ORDER BY numrows DESC
  • First, we are required to give an alias to any derived table we create in subqueries within FROM statements. Otherwise there would be no way for the database to refer to the multiple columns within the temporary results we create.
  • Second, we need to use this alias every time we want to execute a function that uses the derived table. Remember that the results in which we are interested require a join between the dogs table and the temporary table, not the dogs table and the original users table with duplicates. That means we need to make sure we referenct the temporary table alias in the ON, GROUP BY, and SELECT clauses.
  • Third, relatedly, aliases used within subqueries can refer to tables outside of the subqueries. However, outer queries cannot refer to aliases created within subqueries unless those aliases are explictly part of subquery output.
  • Another thing to take note of is that when you use subqueries in FROM statements, the temporary table you create can have multiple columns in the output (unlike when you use subqueries in outside SELECT statements). But for that same reason, subqueries in FROM statements can be very computationally intensive. Therefore, it’s a good idea to use them sparingly, especially when you have very large data sets.

Useful Logical Functions

If Expressions

IF ([your conditions], [value outputted if conditions are met], [value outputted if conditions are NOT met])

Multiple IF expressions can be nested to result in more than two possible outputs. When we nest IF expressions, it is important to encase each IF expression — as well as the entire IF expression put together — in parentheses.

CASE Expressions

  • Make sure to include the word END at the end of the expression
  • CASE expressions do not require parentheses
  • ELSE expressions are optional
  • If an ELSE expression is omitted, NULL values will be outputed for all rows that do not meet any of the conditions stated explicitly in the expression
  • CASE expressions can be used anywhere in a SQL statememt, including in GROUP BY, HAVING and ORDER BY clauses or the SELECT column list.

CASE Expression syntax

slightly more compact one:

CASE Expression syntax

CASE expression could be used to:

  • rename or revise values in a column
  • standardize or combine several values int oone
  • Break values in a column up into multiple groups that meet specific criteria or that have specific ranges of values.

Order of operations within logical expressions

EVALUATION ORDER

  1. NOT
  2. AND
  3. OR