### Data Query Language

Query:

SQL code that describes the data you desire and the format in which you want it.

### Six SQL Keywords

SELECT the data I want FROM these database and tables WHERE these criteria are met GROUP (BY) this field HAVING this property ORDER (BY) this field or list;

• SELECT the data I want
• FROM these database and tables
• WHERE these criteria are met, and
• GROUP (BY) this field
• HAVING this property, then
• ORDER (BY) this field or list;

In most SQL interfaces, extra wide spaces are usually ingonred, unless they’re within a string of text and casing quotation marks

No data = NULL, rather than 0

#### Run Order

This diagram shows us that data are actually grouped(GROUP BY) before the SELECT expressions are applied.
That means that when a GROUP BY expression is included in an SQL query, there is no way to use a SELECT statement to summarize daa that cross multiple groups.

#### SQL

Every time we run a line of SQL code in Jupyter, we will need to preface the line with %sql.

Get to Know Our Data !

Before we start querying,

1. Conform how many tables each database has
2. Identify the fields contained in each table of the database.
• SHOW tables: To determine how many tables each database has

SHOW columns FROM databasename.tablename

An alternate way to learn the same information would be to use the DESCRIBE function. The syntax is: DESCRIBE tablename

### Operator

• IN operator
• LIKE operator: allows us to specify a pattern that the textual data we query has to match.

In this syntax, the percent sign indicates a wild card. Wild cards represent unlimited numbers of missing letters. This is how the placement of the percent sign would affect the results of the query:

• WHERE breed LIKE (“s%”) = the breed must start with “s”, but can have any number of letters after the “s”
• WHERE breed LIKE (“%s”) = the breed must end with “s”, but can have any number of letters before the “s”
• WHERE breed LIKE (“%s%”) = the breed must contain an “s” somewhere in its name, but can have any number of letters before or after the “s”

### Clause

• DISCTINCT clause

When the DISCTINCT clause is used with multiple columns in a SELECT statement, the combinaiton of all the columns together is used to determine the uniqueness of a row in a result set.

When we use the DISTINCT clause with the LIMIT clause in a statement, MySQL stops searching when it finds the number of unique rows specified in th LIMIT clause, not when it goes through the number of rows in the LIMIT clause.

• OFFSET

LIMIT … OFFSET…

• WHERE

The expression that follows a WHERE clause has to be applicable to each row of data in a column

• GROUP BY

It’s userful to remember that SQL output is always a table.

How could we construct a valid table that would have columns for aggregate counts and individual columns at the same time?

A RULE:

Every non-aggregated field that is listed in the SELECT list must be listed in the GROUP BY list.

Note:

If we include a GROUP BY clause in our query, but forget to aggregate a column included in our SELECT list, MySQL will output a random row from the un-aggregated column when the query is executed.

e.g:

>

This query would be executed in MySQL, but a random value in the d.dog_guid and d.breed_type would be outputted for each output row tha reports a count of a given breed number.

Difference between SELECT/GROUP BY and COUNT DISTINCT:

COUNT DISTINCT dose NOT count NULL values, while SELECT/GROUP BY clauses roll up NULL values into one group.

If we want to infer the number of disctinct entries from the results of a query using joins and GROUP BY clauses, remember to include an IS NOT NULL clause to ensure we ae not counting NULL valuse.

• HAVING

The expression that follows a HAVING clause has to be applicable or computable using a group of data.

### JOIN

If we did not incldue the WHERE clause to combine two or more tables, sql would request a Caetesian product from datasets, and we could be waiting for our query for days(and will probally get in trouble with our database administrator)

##### Distinguish Between theory and real life:

Database operations, like those that join tables, are based on set theory that assumes there are no duplicate row in your tables. In real life, duplicate rows get enttered all the time, and it can cause you to misinterpret your queries if you do no understand the consequences.

##### Tips:
• Avoid making assumptions about your data or your analyses. For example, rather than assume that all the values in a column are unique just because some documentation says they should be, check for yourself!
• Always look at example outputs of your queries before you strongly interpret aggregate calculations. Take extra care to do this when your queries require joins.
• When your queries require multiple layers of functions or joins, examine the output of each layer or join first before you combine them all together.
• Adopt a healthy skepticsm of all your data and results. If you see something you don’t expect, make sure you explore it before interpreting it strongly or incorporating it into other analyses.

### Function

Function What is Returns
AVG() Column average
COUNT() # of rows in a column
MAX() a column’s highest value
MIN() a column’s lowest value
SUM() sum of all a column’s values

### Three Question need to be thinking about

• What questions will provide actionable insight?
• Will the answers to my questions matter
• Are the answers to my questions correct

### Question

-[ ] 1. Difference

-[ ] 2. would be associated with ….

This query told us that user ‘ce7b75bc-7144-11e5-ba71-058fbc01cf0b’ would be associated with 913,138 rows in the output of the outer join we designed!