
Commit d17b3ac

committed Aug 5, 2019
2019 update
1 parent bce2f4c commit d17b3ac

15 files changed

+1971
-344
lines changed
 

‎.gitignore

+11-2
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,15 @@ python/.ipynb_checkpoints/*
66

77
.idea/
88

9-
handout.*
9+
handout*
10+
11+
PostgreSQL-Cheat-Sheet.pdf
12+
13+
datafiles/
14+
15+
db_setup.txt
16+
17+
*.tar
18+
19+
netids.txt
1020

11-
PostgreSQL-Cheat-Sheet.pdf

‎PITCHME.md

+2-49
@@ -106,33 +106,7 @@ Tables are linked together with *keys* (relational model of data)
106106

107107
---
108108

109-
# When to Use SQL/Relational Databases
110-
111-
---
112-
113-
![good reason](https://imgs.xkcd.com/comics/algorithms.png)
114-
115-
---
116-
117-
![bad reason](https://s-media-cache-ak0.pinimg.com/736x/8d/91/18/8d9118b4ffae7881453f34a645b66264--web-images-mauve.jpg)
118-
119-
---
120-
121-
## When to Consider
122-
<hr>
123-
124-
Current system is unmanageable
125-
126-
A lot of duplicated information
127-
128-
Different types of *related* observations
129-
130-
Access or analyze a subset of observations
131-
132-
---
133-
134-
## Other Factors
135-
<hr>
109+
## When are Databases Useful?
136110

137111
Multiple people need access to the same, current data
138112

@@ -142,16 +116,6 @@ Enforce constraints about data values, structure, and relationships
142116

143117
Automatically maintain attributes of data like "last modified," "created date"
144118

145-
---
146-
147-
## Maybe look elsewhere when
148-
<hr>
149-
150-
Entire dataset is generated together
151-
152-
Data is static and manageable size
153-
154-
Data that would be in a table is used as a complete unit
155119

156120
---
157121

@@ -171,21 +135,10 @@ Data that would be in a table is used as a complete unit
171135

172136
---
173137

174-
## Relational data conventions
175-
<hr>
176-
177-
**Primary keys** serve as ID to your primary data entities
178-
179-
**Foreign keys** serve as ID in one table that identifies a row in another table (e.g. children -> parents)
180-
181-
**Relationship tables** express many-to-many type relationships (e.g. Netflix account <-> movie data)
182-
183-
184-
---
185138

186139
## Entity-relationship diagram example
187140

188-
<img src="http://www.postgresqltutorial.com/wp-content/uploads/2013/05/PostgreSQL-Sample-Database.png"
141+
<img src="http://www.postgresqltutorial.com/wp-content/uploads/2018/03/dvd-rental-sample-database-diagram.png"
189142
width="500">
190143

191144

‎explore_db.pdf

89.1 KB
Binary file not shown.

‎presentation_assets/joins.png

71.4 KB

‎r/r_databases.Rmd

+1-1
@@ -269,7 +269,7 @@ R Markdown lets you execute SQL queries directly. You first set up a `DBI` conn
269269

270270
````r
271271
`r ''````{r}
272-
library(RPostgreSQL)
272+
library(RPostgres)
273273
con <- dbConnect(RPostgres::Postgres(), host="localhost", dbname="dvdrental")
274274
```
275275
````

‎r/r_databases.html

+293-39
Large diffs are not rendered by default.

‎readme.md

+40-27
@@ -1,14 +1,42 @@
11
# Databases
22

3-
This workshop uses PostgreSQL. Much of the material applies generically to SQL and other relational database systems, but some of it is specific to PostgreSQL.
3+
There are four databases workshops that cover parts of the material in this repo.
44

5-
This workshop uses the database discussed in, and follows much of the content of, the [PostgreSQL Tutorial](http://www.postgresqltutorial.com/).
5+
The workshops use PostgreSQL. Much of the material applies generically to SQL and other relational database systems, but some of it is specific to PostgreSQL.
66

7-
The workshop starts with the presentation below, then [SQL Part 1](sql/part1.md).
7+
Some of the workshops use the database discussed in, and follow much of the content of, the [PostgreSQL Tutorial](http://www.postgresqltutorial.com/), which makes use of the [Sakila database](https://www.jooq.org/sakila).
88

9-
## Presentation Materials
109

11-
[![GitPitch](https://gitpitch.com/assets/badge.svg)](https://gitpitch.com/nuitrcs/databases_workshop/)
10+
## Selecting and Joining Data
11+
12+
[Intro presentation](https://gitpitch.com/nuitrcs/databases_workshop/)
13+
14+
The workshop follows the files below, which have links to the exercises.
15+
16+
[Part 1](sql/part1.md)
17+
18+
[Part 2](sql/part2.md)
19+
20+
21+
22+
## Exploring Data
23+
24+
See [Exploring Data](sql/exploring.md) for materials and exercises.
25+
26+
This workshop uses data files from the [nycflights13 R package](https://github.com/hadley/nycflights13) and open data from the city of Evanston.
27+
28+
## Updating and Changing Data
29+
30+
Uses materials from [Part 4](sql/part4.md), which are in the process of being updated/expanded.
31+
32+
## Creating and Designing Databases
33+
34+
[Part 3](sql/part3.md) materials are relevant, but materials for this workshop are currently being updated/expanded.
35+
36+
## R and Python
37+
38+
The [R](/r) and [Python](/python) materials may also be of interest to those taking any of the above workshops.
39+
1240

1341
## Note
1442

@@ -22,21 +50,19 @@ See Section 1. Getting started with PostgreSQL, from the [PostgreSQL Tutorial](h
2250

2351
While in-person workshop participants do not need to install PostgreSQL, you will need a terminal program capable of creating an SSH connection to a remote server. On a Mac, the built-in Terminal program will work. On Windows, we suggest [PuTTY](http://www.putty.org/) if you don't already have another program installed.
2452

25-
Typing long commands in a terminal can be tedious. We also recommend you install [DataGrip](https://www.jetbrains.com/datagrip/) for working with databases. It has a free 30 day trial or you can apply for a JetBrains academic license for free.
26-
27-
This repository also includes materials for connecting to a database using Python or R. For Python, you will need to install the `psycopg2` package. For R, you will need the package `RPostgreSQL`.
53+
This repository also includes materials for connecting to a database using Python or R. For Python, you will need to install the `psycopg2` package. For R, you will need the package `RPostgres`.
2854

29-
## Resources
55+
# Resources
3056

31-
### Background
57+
## Background
3258

3359
[Basic Explanation of Relational Databases](http://www.bbc.co.uk/education/guides/ztsvb9q/revision/1): from the BBC, a quick explanation of relational databases
3460

35-
### Software
61+
## Software
3662

3763
[DataGrip Tutorial](https://www.youtube.com/watch?v=Xb9K8IAdZNg): video on how to use the DataGrip program; it even uses the same database we use in this workshop.
3864

39-
### Reference
65+
## Reference
4066

4167
[PostgreSQL cheat sheet](http://www.postgresqltutorial.com/wp-content/uploads/2018/03/PostgreSQL-Cheat-Sheet.pdf): a list of basic commands and patterns for statements
4268

@@ -46,11 +72,11 @@ This repository also includes materials for connecting to a database using Pytho
4672

4773
[psql commands cheat sheet](http://www.postgresonline.com/downloads/special_feature/postgresql83_psql_cheatsheet.pdf): describe commands, other slash commands
4874

49-
### Additional Exercises/Tutorials
75+
## Additional Exercises/Tutorials
5076

5177
_These resources use PostgreSQL or SQL generally._
5278

53-
[Mode SQL Tutorials](https://mode.com/sql-tutorial/)
79+
[Mode SQL Tutorials](https://mode.com/sql-tutorial/): good introduction, and you can try running queries on their platform
5480

5581
[PostgreSQL Exercises](https://pgexercises.com/): interactive, online exercises to practice SQL skills in a PostgreSQL environment.
5682

@@ -70,17 +96,4 @@ _These resources use PostgreSQL or SQL generally._
7096

7197
[SQL for Data Analysis](https://www.udacity.com/course/sql-for-data-analysis--ud198#) from Udacity - an online self-paced course
7298

73-
### SQLite
74-
75-
The workshop uses the PostgreSQL database system, but if you're working on your own projects, [SQLite](https://www.sqlite.org/) may be a good option. SQLite doesn't require running a server and it creates a database in a single, portable file locally on your computer.
76-
77-
[Intro to SQL with SQLite](https://github.com/tthibo/SQL-Tutorial)
78-
79-
[Software Carpentry Databases and SQL](http://swcarpentry.github.io/sql-novice-survey/): introductory workshop using SQLite
80-
81-
[Databases with Python and Pandas](https://www.dataquest.io/blog/python-pandas-databases/): from Data Quest. Examples of using a SQLite database with Python and pandas too.
82-
83-
84-
### Other
8599

86-
[Programming for Biologists](http://www.programmingforbiologists.org/exercises/): includes a section on databases; it uses MS Access instead of PostgreSQL or SQLite, but many of the concepts should be the same.

‎sql/explore.md

+947
Large diffs are not rendered by default.

‎sql/explore_answers.md

+366
@@ -0,0 +1,366 @@
1+
# Exercise Answers
2+
3+
Answers to some of the exercises in [explore.md](explore.md).
4+
5+
6+
## Exploring a database
7+
8+
### WITH
9+
10+
Select all of the flights made by the oldest plane in the data.
11+
12+
Hint: Get the year of the oldest plane with
13+
14+
```sql
15+
SELECT min(year)
16+
FROM planes;
17+
```
18+
19+
Then get the tailnum for this plane. Then get the flights. Use a WITH clause in your query.
20+
21+
```sql
22+
WITH minyear AS
23+
(SELECT min(year) AS year FROM planes),
24+
oldplane AS
25+
(SELECT tailnum FROM planes, minyear WHERE planes.year = minyear.year)
26+
SELECT *
27+
FROM flights, oldplane
28+
WHERE flights.tailnum = oldplane.tailnum;
29+
```
30+
31+
32+
### More exploring
33+
34+
Write a query to find the names of all businesses that had a violation of type `'(6) FOOD PROTECTION: Potentially hazardous food properly thawed.'`.
35+
36+
```sql
37+
SELECT name
38+
FROM business
39+
INNER JOIN violations
40+
ON business.license = violations.license
41+
WHERE violation = '(6) FOOD PROTECTION: Potentially hazardous food properly thawed.';
42+
```
43+
44+
Select the business name, violation type, and comment for only violations that were found on the most recent inspection of a business.
45+
46+
```sql
47+
SELECT name, violation, comments
48+
FROM violations
49+
INNER JOIN business
50+
ON business.license = violations.license
51+
WHERE date = last_inspection;
52+
```
53+
54+
Select violations for inspections where the inspection score was below 80. Select an appropriate/informative subset of the columns. Order the results in a meaningful order. Hint: you need to use both the business license and inspection date information to join the tables properly.
55+
56+
```sql
57+
SELECT name, inspections.date, score, violation
58+
FROM inspections
59+
INNER JOIN business
60+
ON inspections.license = business.license
61+
INNER JOIN violations
62+
ON violations.license = business.license AND violations.date = inspections.date
63+
WHERE score < 80
64+
ORDER BY name, date;
65+
```
66+
67+
Bonus: What is the most common violation type in inspections with total scores less than 80?
68+
69+
```sql
70+
SELECT violation, count(*)
71+
FROM inspections
72+
INNER JOIN business
73+
ON inspections.license = business.license
74+
INNER JOIN violations
75+
ON violations.license = business.license AND violations.date = inspections.date
76+
WHERE score < 80
77+
GROUP BY violation
78+
ORDER BY count DESC;
79+
```
80+
81+
82+
## Data Types
83+
84+
Select `time_hour` and `time_hour` cast as date from `weather`; limit to a few rows.
85+
86+
```sql
87+
SELECT time_hour, time_hour::date
88+
FROM weather
89+
LIMIT 20;
90+
```
91+
92+
## Missing Rows
93+
94+
Are there days for any airports where there are no weather observations at all?
95+
96+
Hint: one way to do this is to see what the normal number of days per airport is in the data (if there is a normal value), and then look for abnormal values.
97+
98+
Hint 2: Look at the subquery section above for an example query that might be useful.
99+
100+
101+
```sql
102+
WITH distinct_days AS
103+
(SELECT DISTINCT origin, year, month, day
104+
FROM weather)
105+
SELECT origin, count(*)
106+
FROM distinct_days
107+
GROUP BY origin
108+
ORDER BY count DESC;
109+
```
110+
111+
Bonus exercise: Which hour of the day is most likely to be missing a weather observation?
112+
113+
```sql
114+
SELECT hour, count(*)
115+
FROM weather
116+
GROUP BY hour
117+
ORDER BY count;
118+
```
119+
120+
Hour 0 (midnight) has the fewest observations in the data, so it is the most likely to be missing.
121+
122+
Bonus exercise 2: Are there any duplicate measurements (same airport and time) in the weather data?
123+
124+
```sql
125+
SELECT origin, year, month, day, hour, count(*)
126+
FROM weather
127+
GROUP BY origin, year, month, day, hour
128+
HAVING count(*) > 1;
129+
```
130+
131+
132+
133+
## LIKE
134+
135+
How many Starbucks are in the Evanston inspection data?
136+
137+
```sql
138+
SELECT count(*)
139+
FROM business
140+
WHERE name LIKE 'Starbucks%';
141+
```
142+
143+
How many distinct violation types have FOOD in the violation name?
144+
145+
```sql
146+
SELECT count(DISTINCT violation)
147+
FROM violations
148+
WHERE violation LIKE '%FOOD%';
149+
```
150+
151+
Which violation type is most likely to be a critical violation (see the comments)?
152+
153+
```sql
154+
SELECT violation, count(*)
155+
FROM violations
156+
WHERE comments LIKE '%CRITICAL VIOLATION%'
157+
GROUP BY violation
158+
ORDER BY count DESC;
159+
```
160+
161+
Find any violation entries that do not conform to the pattern of `(#) VIOLATION TITLE: violation description`
162+
163+
```sql
164+
SELECT DISTINCT violation
165+
FROM violations
166+
WHERE violation NOT LIKE '(%) %: %';
167+
```
168+
169+
Challenge: Get all addresses where the street name starts with A (and only those addresses).
170+
171+
172+
```sql
173+
SELECT address
174+
FROM business
175+
WHERE address LIKE '% A% AVE';
176+
```
177+
178+
Note that the solution above is specific to the values in the data. There are no "Streets" that start with A, only "Avenues". We'd have to change the query if the data were different.
179+
180+
181+
182+
183+
184+
185+
186+
187+
188+
189+
190+
## String Splitting
191+
192+
Get just the street name from the business address (the result won't be perfect at this stage -- just do what you can with a simple query). Group and count to see which street has the most food businesses.
193+
194+
Let's take this in steps:
195+
196+
```sql
197+
SELECT split_part(address, ' ', 2)
198+
FROM business;
199+
```
200+
201+
What are the unique values? (To check)
202+
203+
```sql
204+
SELECT DISTINCT split_part(address, ' ', 2)
205+
FROM business;
206+
```
207+
208+
Which are most common?
209+
210+
```sql
211+
SELECT split_part(address, ' ', 2) AS street, count(*)
212+
FROM business
213+
GROUP BY street
214+
ORDER BY count DESC;
215+
```
216+
217+
## Numerical Functions
218+
219+
How much total precipitation did each airport get by month? Hint: Use `sum()`, and you'll need to group by several columns. Order the results in a reasonable order.
220+
221+
```sql
222+
SELECT origin, month, sum(precip)
223+
FROM weather
224+
GROUP BY origin, month
225+
ORDER BY origin, month;
226+
```
227+
228+
You could include year above, but since everything is 2013, it's not necessary.
229+
230+
```sql
231+
SELECT month, round(avg(precip), 2)
232+
FROM (SELECT origin, month, sum(precip) AS precip
233+
FROM weather
234+
GROUP BY origin, month) AS monthly
235+
GROUP BY month
236+
ORDER BY month;
237+
```
238+
239+
## Numerical Distributions
240+
241+
Examine the distribution of temperature like we did with humidity. Then do it by airport (origin).
242+
243+
```sql
244+
SELECT trunc(temp, -1), count(*)
245+
FROM weather
246+
GROUP BY trunc(temp, -1)
247+
ORDER BY trunc(temp, -1);
248+
```
249+
250+
```sql
251+
SELECT origin, trunc(temp, -1), count(*)
252+
FROM weather
253+
GROUP BY origin, trunc(temp, -1)
254+
ORDER BY origin, trunc(temp, -1);
255+
```
256+
257+
258+
## Date Part Functions
259+
260+
Which day of the week had the most rain at JFK airport?
261+
262+
```sql
263+
SELECT date_part('dow', time_hour) AS dow, sum(precip)
264+
FROM weather
265+
WHERE origin='JFK'
266+
GROUP BY dow
267+
ORDER BY sum(precip) DESC;
268+
```
269+
270+
What week of the year in 2018 were the most food inspections done?
271+
272+
```sql
273+
SELECT date_part('week', date) AS week, count(*)
274+
FROM inspections
275+
WHERE date >= '2018-01-01' AND date <= '2018-12-31'
276+
GROUP BY week
277+
ORDER BY count DESC;
278+
```
279+
280+
Bonus (Hard, because we didn't learn everything you need): What days had no inspections? The `generate_series()` function can be used to create a series of dates.
281+
282+
Example:
283+
284+
```sql
285+
SELECT generate_series('2018-01-01', '2019-05-14', '1 day'::interval) AS days;
286+
```
287+
288+
The above are timestamps, but they can be compared to dates.
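
As a small check of that claim, here is a sketch using an arbitrary three-day range rather than the workshop data. Comparing each generated timestamp with a `date` value should be true only on the matching day:

```sql
-- Arbitrary three-day range, chosen only for illustration.
SELECT day,
       day = date '2018-01-02' AS matches_date
FROM generate_series('2018-01-01', '2018-01-03', '1 day'::interval) AS day
ORDER BY day;
-- matches_date is true only for the 2018-01-02 row
```

This is why the generated series can be joined or filtered against the `date` column in `inspections` without an explicit cast.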
289+
290+
Can you use the above to help you find dates with no inspections?
291+
292+
293+
There are a few options:
294+
295+
```sql
296+
SELECT days
297+
FROM generate_series('2018-01-01', '2019-05-14', '1 day'::interval) AS days
298+
LEFT JOIN inspections
299+
ON days = inspections.date
300+
WHERE inspections.date IS NULL;
301+
```
302+
303+
OR
304+
305+
```sql
306+
WITH days AS
307+
(SELECT *
308+
FROM generate_series('2018-01-01',
309+
'2019-05-14', '1 day'::interval) AS day)
310+
SELECT day
311+
FROM days
312+
WHERE day NOT IN (SELECT date FROM INSPECTIONS);
313+
```
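
One caveat with the `NOT IN` form, noted here as general SQL behavior rather than something observed in this data: if the subquery returns any `NULL`, `NOT IN` matches no rows at all. A `NOT EXISTS` variant avoids that. The sketch below uses made-up inline rows (`days`, `inspected`, and `inspection_date` are hypothetical names, not the workshop tables):

```sql
-- Hypothetical inline data: three days, one inspection date, and one NULL.
WITH days(day) AS (
    VALUES (date '2018-01-01'), (date '2018-01-02'), (date '2018-01-03')
),
inspected(inspection_date) AS (
    VALUES (date '2018-01-02'), (NULL)
)
SELECT day
FROM days
WHERE NOT EXISTS (SELECT 1
                  FROM inspected
                  WHERE inspected.inspection_date = days.day)
ORDER BY day;
-- returns 2018-01-01 and 2018-01-03; the NOT IN form would return no rows here
```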
314+
315+
What days of the week are days without inspections? Any that aren't on the weekend?
316+
317+
These use the WITH option, but you can do this as well with other versions of the query.
318+
319+
```sql
320+
WITH days AS (SELECT * FROM generate_series('2018-01-01', '2019-05-14', '1 day'::interval) AS day)
321+
SELECT day, date_part('dow', day)
322+
FROM days
323+
WHERE day NOT IN (SELECT date FROM INSPECTIONS);
324+
```
325+
326+
```sql
327+
WITH days AS (SELECT * FROM generate_series('2018-01-01', '2019-05-14', '1 day'::interval) AS day)
328+
SELECT day, date_part('dow', day)
329+
FROM days
330+
WHERE day NOT IN (SELECT date FROM INSPECTIONS)
331+
AND date_part('dow', day) NOT IN (0,6);
332+
```
333+
334+
335+
336+
## Lag
337+
338+
We don't just have to lag dates themselves -- we can also lag other columns by their date values. Let's get the changes in temperature from one hour to the next at JFK. Then find the biggest change.
339+
340+
Remember to limit to just rows for JFK, and order by time_hour overall. And you'll want the absolute value of the difference with function `abs`.
341+
342+
```sql
343+
WITH changes AS
344+
(SELECT time_hour, temp,
345+
lag(temp) OVER (ORDER BY time_hour) AS lagged,
346+
abs(temp-lag(temp) OVER (ORDER BY time_hour)) AS diff
347+
FROM weather
348+
WHERE origin='JFK'
349+
ORDER BY time_hour)
350+
SELECT time_hour, temp, lagged, diff
351+
FROM changes
352+
WHERE diff = (SELECT max(diff) FROM changes);
353+
```
354+
355+
This seems impossible, so let's take a look:
356+
357+
```sql
358+
SELECT *
359+
FROM weather
360+
WHERE origin='JFK'
361+
AND date_trunc('day', time_hour) = '2013-05-09';
362+
```
363+
364+
Looks like bad data entry.
365+
366+

‎sql/part1.md

+78-84
@@ -1,5 +1,23 @@
11
# SQL Part 1
22

3+
* [Connecting](#connecting)
4+
* [Select basics](#select)
5+
* [`LIMIT`](#limit)
6+
* [`OFFSET`](#offset)
7+
* [`WHERE`](#where)
8+
- [`BETWEEN`](#between)
9+
- [`IN`](#in)
10+
- [`LIKE`](#like)
11+
- [`IS NULL`](#is-null)
12+
* [`ORDER BY`](#order-by)
13+
* [`DISTINCT`](#distinct)
14+
* [Functions and Arithmetic](#functions-and-arithmetic)
15+
* [`GROUP BY`](#group-by)
16+
* [`HAVING`](#having)
17+
* [Dates](#dates)
18+
19+
20+
321
## Connecting
422

523
To connect to a database, you need the following information:
@@ -10,10 +28,9 @@ To connect to a database, you need the following information:
1028
* Password:
1129
* Port: 5432 (default for PostgreSQL)
1230

13-
You also need a client program to connect to the database. It's suggested that you start with `psql`, which is a command-line client. Output below is from `psql`.
14-
15-
In the workshop, you only have permission to read data from the dvdrental database, not add, update, or delete data, or modify the database. When we get to changing databases, you'll have access to a database you have permissions to modify.
31+
You also need a client program to connect to the database. Output below is from a command line client called `psql`.
1632

33+
In the introductory workshop, you only have permission to read data from the dvdrental database, not add, update, or delete data, or modify the database.
1734

1835
# Database Schema
1936

@@ -23,7 +40,9 @@ The schema (the set of tables, their columns and types, and the relationships be
2340

2441
We can also get information directly from the database itself. These commands are specific to each database system. The commands below are for PostgreSQL.
2542

26-
First, use `\d` to get a list of relations (tables, views, sequences) -- we'll talk about what views and sequences are later.
43+
## `psql` describe commands
44+
45+
With `psql`, you can use `\d` to get a list of relations (tables, views, sequences).
2746

2847
```sql
2948
dvdrental=# \d
@@ -127,15 +146,51 @@ There are also other `\d` describe functions, among them:
127146

128147
You can get a complete list of backslash commands with `\?`.
129148

149+
150+
## Other programs
151+
152+
Each database client will have its own way to show you information about the tables in the database.
153+
154+
## Useful Notes
155+
156+
A useful setting for psql is:
157+
158+
```sql
159+
\pset format wrapped
160+
```
161+
162+
Comments in SQL files:
163+
164+
```sql
165+
/* this is a comment; it can
166+
span multiple lines */
167+
168+
SELECT * FROM actor; /* this is also a comment */
169+
170+
-- this is a single line comment
171+
172+
SELECT * from actor; -- another single line comment
173+
```
174+
175+
176+
177+
130178
# `SELECT`
131179

132-
Select is the command we use most often in SQL. It let's us select data (specified rows and columns) from one or more tables. Columns are selected by name, rows are selected with conditional statements (values of a particular column meeting some criteria).
180+
Select is the command we use most often in SQL. It lets us select data (specified rows and columns) from one or more tables. Columns are selected by name; rows are selected with conditional statements (values of a particular column meeting some criteria).
133181

134182
The basic format of a `SELECT` command is
135183

136184
```sql
137-
SELECT column_1, column_2
138-
FROM table1;
185+
SELECT <columns>
186+
FROM <table>;
187+
```
188+
189+
For example:
190+
191+
```sql
192+
SELECT actor_id, first_name
193+
FROM actor;
139194
```
140195

141196
`SELECT` and `FROM` are reserved keywords. SQL is case-insensitive, but many times you'll see the key terms in all caps. Note that you use a semicolon `;` to end the statement. You can also split a SQL statement across multiple lines -- the space between the terms doesn't matter (a new line counts as space).
@@ -285,7 +340,7 @@ SELECT * FROM customer WHERE store_id=2;
285340

286341
(_Note: Going forward, output will only be included when there's something about it to discuss._)
287342

288-
You can combined conditions together with `AND` and `OR`:
343+
You can combine conditions together with `AND` and `OR`:
289344

290345
```sql
291346
SELECT * FROM customer WHERE store_id=2 AND customer_id=400;
@@ -346,61 +401,13 @@ SELECT * FROM film WHERE film_id IN (3,5,7,9);
346401

347402
Select rows from the actor table where the first name is Angela, Angelina, or Audrey using `IN`.
348403

349-
### `LIKE`
404+
---
350405

351-
`LIKE` lets you do pattern matching on strings. The only two pattern characters are `_` for a single character and `%` for any number of characters (including none). In some implementations of SQL, `LIKE` is case-insensitive. In PostgreSQL, it is case-sensitive; `ILIKE` is the PostgreSQL case-insensitive version.
406+
Break for exercises: [part1_exercises.md](part1_exercises.md) - sections Describe Commands and Select.
352407

353-
Get names of actors that start with A.
354-
355-
```sql
356-
SELECT * FROM actor WHERE first_name LIKE 'A%';
357-
```
408+
---
358409

359-
Note that the following will yield no results:
360410

361-
```sql
362-
SELECT * FROM actor WHERE first_name LIKE 'a%';
363-
```
364-
365-
But the following is ok:
366-
367-
```sql
368-
SELECT * FROM actor WHERE first_name ILIKE 'a%';
369-
```
370-
371-
Get 4 letter names starting with A:
372-
373-
```sql
374-
SELECT * FROM actor WHERE first_name LIKE 'A___';
375-
```
376-
377-
Get names that end with y:
378-
379-
```sql
380-
SELECT * FROM actor WHERE first_name LIKE '%y';
381-
```
382-
383-
Any names with a z in them?
384-
385-
```sql
386-
SELECT * FROM actor WHERE first_name ILIKE '%z%';
387-
```
388-
389-
```sql
390-
dvdrental=# SELECT * FROM actor WHERE first_name ILIKE '%z%';
391-
actor_id | first_name | last_name | last_update
392-
----------+------------+-----------+------------------------
393-
11 | Zero | Cage | 2013-05-26 14:47:57.62
394-
89 | Charlize | Dench | 2013-05-26 14:47:57.62
395-
121 | Liza | Bergman | 2013-05-26 14:47:57.62
396-
(3 rows)
397-
```
398-
399-
See that we have results where Z is the first letter (since % can match 0 characters) as well as results where there's a z in the middle of the name.
400-
401-
### Exercise
402-
403-
Select rows from the city table where city starts with a B.
404411

405412
### `IS NULL`
406413

@@ -427,6 +434,8 @@ SELECT * FROM address WHERE address2 = NULL;
427434
`NULL` values are omitted from the results of comparison tests.
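
A minimal sketch of this behavior, using a few made-up inline rows (the `sample` name and its values are assumptions for illustration, not the workshop's `address` table). With PostgreSQL's default settings, `= NULL` evaluates to unknown for every row, so only `IS NULL` finds the missing value:

```sql
-- Hypothetical inline data standing in for address.address2.
WITH sample(address_id, address2) AS (
    VALUES (1, 'Apt 4'), (2, NULL), (3, '')
)
SELECT count(*) FILTER (WHERE address2 = NULL)  AS matched_by_equals,
       count(*) FILTER (WHERE address2 IS NULL) AS matched_by_is_null
FROM sample;
-- matched_by_equals is 0; matched_by_is_null is 1
```

Note that the empty string in row 3 is not counted: in PostgreSQL an empty string is a value, not `NULL`.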
428435

429436

437+
438+
430439
## `ORDER BY`
431440

432441
We can determine the order that our results are shown in:
@@ -477,6 +486,11 @@ SELECT DISTINCT customer_id, staff_id FROM payment;
477486
SELECT DISTINCT amount FROM payment ORDER BY amount;
478487
```
479488

489+
---
490+
491+
Break for exercises: [part1_exercises.md](part1_exercises.md) - sections for Like, Distinct, and Order by exercises
492+
493+
---
480494

481495

482496
## Functions and Arithmetic
@@ -653,38 +667,18 @@ dvdrental=# select rental_date from rental where rental_date<'2005-05-25';
653667
(8 rows)
654668
```
655669

656-
This will get you everything before 2005-05-25 00:00:00. If you want just the date part of a datetime:
670+
This will get you everything before 2005-05-25 00:00:00.
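
To match a single day rather than everything before it, a sketch along the following lines may help. It uses three made-up timestamps (the `sample` rows are assumptions, not rows from `rental`) to show that comparing a timestamp to a bare date string only matches midnight, while casting to `date` matches the whole day:

```sql
-- Hypothetical timestamps standing in for rental.rental_date.
WITH sample(rental_date) AS (
    VALUES (timestamp '2005-05-24 00:00:00'),
           (timestamp '2005-05-24 22:53:30'),
           (timestamp '2005-05-25 00:00:09')
)
SELECT rental_date,
       rental_date = '2005-05-24'       AS equals_literal,  -- true only at midnight
       rental_date::date = '2005-05-24' AS same_day         -- true for both 05-24 rows
FROM sample
ORDER BY rental_date;
```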
657671

658-
```sql
659-
SELECT rental_date FROM rental
660-
WHERE date_trunc('day', rental_date) = '2005-05-24';
661-
```
672+
TODO: check date equality
662673

663-
or
674+
---
664675

665-
```sql
666-
SELECT rental_date FROM rental
667-
WHERE cast(rental_date as date) = '2005-05-24';
668-
```
676+
Break for exercises: [part1_exercises.md](part1_exercises.md) - remaining sections
669677

670-
or
678+
---
671679

672-
```sql
673-
SELECT rental_date FROM rental
674-
WHERE rental_date::date = '2005-05-24';
675-
```
676680

677-
# Comments
678681

679-
Comments in SQL files:
680682

681-
```sql
682-
/* this is a comment; it can
683-
span multiple lines */
684683

685-
SELECT * FROM actor; /* this is also a comment */
686684

687-
-- this is a single line comment
688-
689-
SELECT * from actor; -- another single line comment
690-
```

‎sql/part1_exercises.md

+11-8
@@ -23,31 +23,34 @@ Select the row from customer for customer named Jamie Rice.
2323

2424
Select amount and payment_date from payment where the amount paid was less than $1.
2525

26-
What are the different rental durations that the store allows?
2726

28-
## Exercise: Counting
2927

30-
How many films are rated NC-17? How many are rated PG or PG-13?
28+
## Exercise: Distinct
29+
30+
What are the different rental durations that the store allows?
31+
3132

3233

33-
Challenge: How many different customers have entries in the rental table? [Hint](http://www.w3resource.com/sql/aggregate-functions/count-with-distinct.php)
3434

3535
## Exercise: Order By
3636

3737
What are the IDs of the last 3 customers to return a rental?
3838

3939

40-
## Exercise: Like
4140

42-
Select film title that have "Dragon" in them.
41+
## Exercise: Counting
42+
43+
How many films are rated NC-17? How many are rated PG or PG-13?
44+
45+
46+
Challenge: How many different customers have entries in the rental table? [Hint](http://www.w3resource.com/sql/aggregate-functions/count-with-distinct.php)
47+
4348

44-
Challenge: only select titles that have just the word "Dragon" (not "Dragonfly") in them.
4549

4650
## Exercise: Group By
4751

4852
Does the average replacement cost of a film differ by rating?
4953

50-
Which store (`store_id`) has the most customers whose first name starts with M?
5154

5255
Challenge: Are there any customers with the same last name?
5356

‎sql/part1_exercises_with_answers.md

+25-50
@@ -42,7 +42,7 @@ Select the row from customer for customer named Jamie Rice.
4242

4343
Select amount and payment_date from payment where the amount paid was less than $1.
4444

45-
What are the different rental durations that the store allows?
45+
4646

4747
#### Solution
4848

@@ -75,43 +75,15 @@ WHERE amount < 1;
7575
```
7676

7777

78-
```sql
79-
SELECT DISTINCT rental_duration FROM film;
80-
```
81-
82-
83-
## Exercise: Counting
84-
85-
How many films are rated NC-17? How many are rated PG or PG-13?
86-
87-
88-
Challenge: How many different customers have entries in the rental table? [Hint](http://www.w3resource.com/sql/aggregate-functions/count-with-distinct.php)
89-
90-
#### Solution
91-
92-
```sql
93-
SELECT count(*) FROM film
94-
WHERE rating = 'NC-17';
95-
```
96-
97-
98-
```sql
99-
SELECT count(*) FROM film
100-
WHERE rating in ('PG', 'PG-13');
101-
```
78+
## Exercise: Distinct
10279

103-
or
80+
What are the different rental durations that the store allows?
10481

10582
```sql
106-
SELECT count(*) FROM film
107-
WHERE rating = 'PG' OR rating = 'PG-13';
83+
SELECT DISTINCT rental_duration FROM film;
10884
```
10985

110-
Challenge:
11186

112-
```sql
113-
SELECT COUNT(DISTINCT customer_id) FROM rental;
114-
```
11587

11688

11789

@@ -131,41 +103,49 @@ LIMIT 3;
131103

132104

133105

134-
## Exercise: Like
135106

136-
Select film title that have "Dragon" in them.
137107

138-
Challenge: only select titles that have just the word "Dragon" (not "Dragonfly") in them.
108+
## Exercise: Counting
109+
110+
How many films are rated NC-17? How many are rated PG or PG-13?
111+
112+
113+
Challenge: How many different customers have entries in the rental table? [Hint](http://www.w3resource.com/sql/aggregate-functions/count-with-distinct.php)
139114

140115
#### Solution
141116

142117
```sql
143-
SELECT title FROM film
144-
WHERE title like '%Dragon%';
118+
SELECT count(*) FROM film
119+
WHERE rating = 'NC-17';
145120
```
146121

147-
Challenge:
148122

149-
This solution only works because the titles are all two words, so we can just check the beginning or end of the title:
123+
```sql
124+
SELECT count(*) FROM film
125+
WHERE rating in ('PG', 'PG-13');
126+
```
127+
128+
or
150129

151130
```sql
152-
SELECT title FROM film
153-
WHERE title like '% Dragon'
154-
OR title like 'Dragon %';
131+
SELECT count(*) FROM film
132+
WHERE rating = 'PG' OR rating = 'PG-13';
155133
```
156134

157-
For a more general purpose solution, you would need to use [regular expressions](https://www.postgresql.org/docs/current/static/functions-matching.html) to match word boundaries, but we're not going to cover that here.
135+
Challenge:
158136

159137
```sql
160-
SELECT title FROM film WHERE title ~ '.*\mDragon\M.*';
138+
SELECT COUNT(DISTINCT customer_id) FROM rental;
161139
```
162140

163141

142+
143+
144+
164145
## Exercise: Group By
165146

166147
Does the average replacement cost of a film differ by rating?
167148

168-
Which store (`store_id`) has the most customers whose first name starts with M?
169149

170150
Challenge: Are there any customers with the same last name?
171151

@@ -176,11 +156,6 @@ SELECT rating, avg(replacement_cost) FROM film
176156
GROUP BY rating;
177157
```
178158

179-
```sql
180-
SELECT store_id, count(*) FROM customer
181-
WHERE first_name LIKE 'M%' GROUP BY store_id;
182-
```
183-
184159
Challenge:
185160

186161
```sql

‎sql/part2.md

+76-39
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,14 @@
22

33
This section first covers the topics of aliasing and subqueries, then we get to joining tables, which is the real power of relational databases.
44

5+
* [Aliasing](#aliasing)
6+
* [Subqueries](#subqueries)
7+
* [Joins](#joins)
8+
- [`INNER JOIN`](#inner-join)
9+
- [`LEFT JOIN`](#left-join)
10+
- [`FULL OUTER JOIN`](#full-outer-join)
11+
12+
513
# Aliasing
614

715
You can rename columns and tables in queries. This will mostly be useful when we're joining tables together, but it can also be useful when you're working with functions.
@@ -22,6 +30,8 @@ In the output above, the name of the column is the alias.
2230

2331
One important note is that _column_ aliases can't be used in where or having clauses:
2432

33+
TODO: check
34+
2535
```sql
2636
SELECT title, rating AS rate
2737
FROM film
@@ -58,7 +68,7 @@ WHERE rental_rate < (SELECT avg(rental_rate) FROM film)
5868
ORDER BY rental_rate DESC;
5969
```
6070

61-
The subquery is executed first, and the the result is used the broader query.
71+
The subquery is executed first, and then the result is used in the broader query.
6272

6373
We can also use subqueries with `IN`. Find customers with an address in `postal_code` 35200
6474

@@ -78,9 +88,11 @@ SELECT count(customer_id) FROM
7888
HAVING count(*) > 30) AS foo;
7989
```
8090

91+
`foo` is a common throwaway name that gets used -- you can pick any name you want for the alias though.
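
As a small sketch of that point, the same kind of query can use a descriptive alias instead of `foo`. The alias `frequent_renters` is made up for illustration, and the inner query is an assumption about what the example above selects (customers with more than 30 rentals):

```sql
-- Count the customers who have more than 30 rentals.
-- frequent_renters is an arbitrary alias; any valid name would work.
SELECT count(customer_id) FROM
    (SELECT customer_id
     FROM rental
     GROUP BY customer_id
     HAVING count(*) > 30) AS frequent_renters;
```

A descriptive alias costs nothing and makes longer queries easier to read back later.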
92+
8193
## Exercise
8294

83-
Find the title of movies that have the maximum replacement fee.
95+
Find the titles of movies that have the maximum replacement fee.
8496

8597

8698
# Joins
@@ -100,7 +112,8 @@ The first and most common type of join is called an inner join. You specify the
100112
Let's start with the example we just used above: customers with postal code 52137. To start with, how do we join the tables generally:
101113

102114
```sql
103-
SELECT * FROM customer INNER JOIN address
115+
SELECT * FROM customer
116+
INNER JOIN address
104117
ON customer.address_id = address.address_id;
105118
```
106119

@@ -109,12 +122,26 @@ This matches up the customer to the full address information.
109122
Then we can select a specific postal code if we want:
110123

111124
```sql
112-
SELECT * FROM customer INNER JOIN address
125+
SELECT * FROM customer
126+
INNER JOIN address
113127
ON customer.address_id = address.address_id
114128
WHERE postal_code='52137';
115129
```
116130

117-
Note that both tables have a column called `address_id`. We add the table name to the front of the column name when referencing them. You can do this anytime, but typically only do it when you're joining and there's ambiguity.
131+
Note that both tables have a column called `address_id`. We add the table name to the front of the column name when referencing them. You can do this anytime, but typically only do it when you're joining and there's ambiguity.
132+
133+
We can also group by, order by, and use other where clause conditions on the joined tables. For example, we can count the customers in each postal code.
134+
135+
```sql
136+
SELECT postal_code, count(*)
137+
FROM customer
138+
INNER JOIN address
139+
ON customer.address_id = address.address_id
140+
GROUP BY postal_code
141+
ORDER BY count;
142+
```
143+
144+
TODO: check above
118145

119146
### Alternative Syntax
120147

@@ -150,24 +177,26 @@ LIMIT 10;
150177

151178
Join the store table to the address table to add the address information to the store information.
152179

153-
Select film\_id, category\_id, and name from joining the film\_category and category tables, only where the category\_id is less than 10.
180+
154181

155182
### Table Names and Aliases
156183

157184
We can alias tables as well as columns. If a column name appears in both tables, then we have to specify the table name when selecting it.
158185

159186
```sql
160187
SELECT first_name, last_name, customer.address_id, postal_code
161-
FROM customer, address
162-
WHERE customer.address_id = address.address_id;
188+
FROM customer
189+
INNER JOIN address
190+
ON customer.address_id = address.address_id;
163191
```
164192

165193
If we don't put a table name in front of `address_id` we get an error:
166194

167195
```sql
168196
dvdrental=# SELECT first_name, last_name, address_id, postal_code
169-
dvdrental-# FROM customer, address
170-
dvdrental-# WHERE customer.address_id = address.address_id;
197+
dvdrental-# FROM customer
198+
dvdrental-# INNER JOIN address
199+
dvdrental-# ON customer.address_id = address.address_id;
171200
ERROR: column reference "address_id" is ambiguous
172201
LINE 1: SELECT first_name, last_name, address_id, postal_code
173202
^
@@ -177,13 +206,31 @@ To make the references easier, it's common to alias table names
177206

178207
```sql
179208
SELECT first_name, last_name, c.address_id, postal_code
180-
FROM customer AS c, address AS a
181-
WHERE c.address_id = a.address_id;
209+
FROM customer AS c
210+
INNER JOIN address AS a
211+
ON c.address_id = a.address_id;
212+
```
213+
214+
and we often drop the `AS`:
215+
216+
```sql
217+
SELECT first_name, last_name, c.address_id, postal_code
218+
FROM customer c
219+
INNER JOIN address a
220+
ON c.address_id = a.address_id;
182221
```
183222

184223
The _table_ aliases can be used in the where clause as well as the select part of the statement.
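
A minimal sketch of using the table aliases in a where clause, reusing the postal code from the earlier inner join example:

```sql
-- c and a are the table aliases defined in the FROM / JOIN lines;
-- they can qualify columns in both the select list and the where clause.
SELECT c.first_name, c.last_name, c.address_id, a.postal_code
FROM customer c
INNER JOIN address a
    ON c.address_id = a.address_id
WHERE a.postal_code = '52137';
```

Qualifying `postal_code` with `a.` is optional here because only the address table has that column, but it makes the source of each column explicit.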
185224

186-
The 'AS' can also be dropped in an alias. We'll do this below.
225+
---
226+
227+
Break for exercises: [part2_exercises.md](part2_exercises.md) - Subqueries, Inner Joins, and Joining and Grouping: Customer Spending
228+
229+
---
230+
231+
232+
233+
187234

188235
### More than 2 Tables
189236

@@ -192,7 +239,8 @@ We can join more than 2 tables together. Let's match the names of actors with t
192239

193240
```sql
194241
SELECT title, first_name, last_name
195-
FROM film f INNER JOIN film_actor fa ON f.film_id=fa.film_id
242+
FROM film f
243+
INNER JOIN film_actor fa ON f.film_id=fa.film_id
196244
INNER JOIN actor a ON fa.actor_id=a.actor_id;
197245
```
198246

@@ -211,21 +259,29 @@ Join store, address, and city tables to show the store\_id, address, and city na
211259

212260
## `LEFT JOIN`
213261

214-
With an inner join, we only get the results that are in both tables. But sometimes we want to know which rows in a table don't have a match in the other table. For this we can use a `LEFT JOIN` or `RIGHT JOIN` (depending on which table you want all of the results from).
262+
With an inner join, we only get the results that are in both tables. But there are other types of joins.
263+
264+
![](presentation_assets/joins.png)
265+
266+
267+
268+
If we want to know which rows in a table don't have a match in the other table, we use a `LEFT JOIN` or `RIGHT JOIN` (depending on which table you want all of the results from).
215269

216270
In the dvd database, there can be films that don't have an inventory record. We don't want these to be dropped from our results of joining the film and inventory tables. Start with the join.
217271

218272
```sql
219273
SELECT f.film_id, title, inventory_id, store_id
220-
FROM film f LEFT JOIN inventory i
274+
FROM film f
275+
LEFT JOIN inventory i
221276
ON f.film_id=i.film_id;
222277
```
223278

224279
Now find the rows where there isn't matching inventory:
225280

226281
```sql
227282
SELECT f.film_id, title, inventory_id, store_id
228-
FROM film f LEFT JOIN inventory i
283+
FROM film f
284+
LEFT JOIN inventory i
229285
ON f.film_id=i.film_id
230286
WHERE i.film_id IS NULL;
231287
```
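
For comparison, the same question can be sketched with a `RIGHT JOIN` by swapping the order of the tables. This should return the same rows as the `LEFT JOIN` version above, because in both cases film is the table we keep every row from:

```sql
-- film is now on the right side of the join, so RIGHT JOIN keeps all films.
-- Films with no inventory record have NULL in the inventory columns.
SELECT f.film_id, title, inventory_id, store_id
FROM inventory i
RIGHT JOIN film f
    ON f.film_id = i.film_id
WHERE i.film_id IS NULL;
```

In practice most people pick one direction (usually `LEFT JOIN`) and order the tables to match, which keeps queries easier to compare.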
@@ -242,29 +298,10 @@ A `FULL OUTER JOIN` is like doing a left and right join at the same time: you ge
242298
There aren't any tables with this type of relationship to each other in the dvdrental database, so we aren't going to do an example here. The syntax is the same as the other joins.
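
If you want to see the behavior anyway, here is a self-contained sketch that does not touch the dvdrental tables. The two inline tables, `left_side` and `right_side`, and their values are made up for illustration:

```sql
-- Two tiny hypothetical tables built inline with VALUES.
WITH left_side (id, left_val) AS (VALUES (1, 'a'), (2, 'b')),
     right_side (id, right_val) AS (VALUES (2, 'x'), (3, 'y'))
SELECT l.id AS left_id, left_val, r.id AS right_id, right_val
FROM left_side l
FULL OUTER JOIN right_side r
    ON l.id = r.id
ORDER BY coalesce(l.id, r.id);
-- id 1 appears only in left_side, so its right-hand columns are NULL.
-- id 2 matches in both tables.
-- id 3 appears only in right_side, so its left-hand columns are NULL.
```

Every row from both sides shows up once, with `NULL` filling in wherever there is no match.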
243299

244300

245-
# Views
246-
247-
A view is a virtual table that has the results of a query in it (a result set). You give it a name like you would a table, and you can use it like a table. It's useful when you have common views of the data that you need to access often. It's a way to save common queries, particularly ones that are long or complicated.
248301

249-
We can list views with
250302

251-
```sql
252-
\dv
253-
```
254-
255-
And then select from them like a table
256-
257-
```sql
258-
select * from actor_info limit 5;
259-
```
260-
261-
You can create views with `CREATE VIEW` and a select query:
262-
263-
```sql
264-
CREATE VIEW named_film_actor AS
265-
SELECT f.film_id, title, a.actor_id, first_name, last_name
266-
FROM film f, film_actor fa, actor a
267-
WHERE f.film_id=fa.film_id AND fa.actor_id=a.actor_id;
268-
```
303+
---
269304

305+
Break for exercises: [part2_exercises.md](part2_exercises.md) - Remaining Sections
270306

307+
---

‎sql/part2_exercises.md

+17-9
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
# SQL Part 2: Exercises
22
----
33

4-
There may be other ways to achieve the same result. Remember that SQL commands are not case sensitive (but data values are).
4+
There may be other ways to achieve the same result. Remember that SQL commands are not case sensitive, and neither are unquoted table and column names in PostgreSQL (but data values are).
55

66
All of these exercises use the `dvdrental` database.
77

@@ -15,28 +15,36 @@ What films are actors with ids 129 and 195 in together?
1515
Challenge: How many actors are in more films than actor id 47? Hint: this takes 2 subqueries (one nested in the other). Work inside out: 1) how many films is actor 47 in; 2) which actors are in more films than this? 3) Count those actors.
1616

1717

18+
## Exercise: Inner Joins
1819

19-
## Exercise: Joining Customers, Payments, and Staff
20+
Select `first_name`, `last_name`, `amount`, and `payment_date` by joining the customer and payment tables.
21+
22+
Select film\_id, category\_id, and name from joining the film\_category and category tables, only where the category\_id is less than 10.
23+
24+
25+
## Exercise: Joining and Grouping: Customer Spending
26+
27+
Get a list of the names of customers who have spent more than $150, along with their total spending.
28+
29+
Who is the customer with the highest average payment amount?
2030

21-
Join the customer and payment tables together with an inner join; select customer id, name, amount, and date and order by customer id. Then join the staff table to them as well to add the staff's name.
2231

2332
## Exercise: Joining for Better Addresses
2433

2534
Create a list of addresses that includes the name of the city instead of an ID number and the name of the country as well.
2635

36+
## Exercise: Joining Customers, Payments, and Staff
2737

28-
## Exercise: Joining and Grouping
38+
Join the customer and payment tables together with an inner join; select customer id, name, amount, and date and order by customer id. Then join the staff table to them as well to add the staff's name.
2939

30-
Repeating an exercise from Part 1, but adding in information from additional tables: Which film (_by title_) has the most actors? Which actor (_by name_) is in the most films?
3140

32-
Challenge: Which two actors have been in the most films together? Hint: You can join a table to itself by including it twice with different aliases. Hint 2: Try writing the query first to find the answer in terms of actor ids (not names); then for a super challenge (it takes a complicated query), rewrite it to get the actor names instead of the IDs. Hint 3: make sure not to count pairs twice (a in the movie with b and b in the movie with a) and avoid counting cases of an actor being in a movie with themselves.
41+
## Exercise: Joining and Grouping: Films and Actors
3342

43+
Repeating an exercise from Part 1, but adding in information from additional tables: Which film (_by title_) has the most actors? Which actor (_by name_) is in the most films?
3444

35-
## Exercise: Joining and Grouping 2
45+
Challenge: Which two actors have been in the most films together? Hint: You can join a table to itself by including it twice with different aliases. Hint 2: Try writing the query first to find the answer in terms of actor ids (not names); then for a super challenge (it takes a complicated query), rewrite it to get the actor names instead of the IDs. Hint 3: make sure not to count pairs twice (a in the movie with b and b in the movie with a) and avoid counting cases of an actor being in a movie with themselves.
3646

37-
Get a list of the names of customers who have spent more than $150, along with their total spending.
3847

39-
Who is the customer with the highest average payment amount?
4048

4149

4250

‎sql/part2_exercises_with_answers.md

+104-36
Original file line numberDiff line numberDiff line change
@@ -38,12 +38,76 @@ SELECT count(actor_id) FROM
3838
```
3939

4040

41+
## Exercise: Inner Joins
42+
43+
Select `first_name`, `last_name`, `amount`, and `payment_date` by joining the customer and payment tables.
44+
45+
46+
Select film\_id, category\_id, and name from joining the film\_category and category tables, only where the category\_id is less than 10.
47+
48+
49+
#### Solutions
50+
51+
52+
```sql
53+
SELECT first_name, last_name, amount, payment_date
54+
FROM customer c
55+
INNER JOIN payment p
56+
ON c.customer_id=p.customer_id;
57+
```
58+
59+
```sql
60+
SELECT film_id, c.category_id, name
61+
FROM film_category fc
62+
INNER JOIN category c
63+
ON fc.category_id = c.category_id
64+
WHERE c.category_id < 10;
65+
```
66+
67+
TODO: check above
68+
69+
70+
71+
## Exercise: Joining and Grouping: Customer Spending
72+
73+
Get a list of the names of customers who have spent more than $150, along with their total spending.
74+
75+
Who is the customer with the highest average payment amount?
76+
77+
78+
#### Solution
79+
80+
```sql
81+
SELECT first_name, last_name, sum(amount)
82+
FROM customer c
83+
INNER JOIN payment p
84+
ON c.customer_id=p.customer_id
85+
GROUP BY first_name, last_name
86+
HAVING sum(amount) > 150;
87+
```
88+
89+
```sql
90+
SELECT c.customer_id, first_name, last_name, avg(amount)
91+
FROM customer c
92+
INNER JOIN payment p
93+
ON c.customer_id=p.customer_id
94+
GROUP BY c.customer_id, first_name, last_name
95+
ORDER BY avg(amount) DESC
96+
LIMIT 1;
97+
```
98+
99+
100+
101+
102+
41103

42104
## Exercise: Joining Customers, Payments, and Staff
43105

106+
107+
44108
Join the customer and payment tables together with an inner join; select customer id, name, amount, and date and order by customer id. Then join the staff table to them as well to add the staff's name.
45109

46-
#### Solution
110+
#### Solutions
47111

48112
```sql
49113
SELECT
@@ -83,6 +147,16 @@ Create a list of addresses that includes the name of the city instead of an ID n
83147

84148
#### Solution
85149

150+
151+
```sql
152+
SELECT address, address2, district, postal_code, city, country
153+
FROM address
154+
INNER JOIN city ON address.city_id=city.city_id
155+
INNER JOIN country ON city.country_id = country.country_id;
156+
```
157+
158+
or
159+
86160
```sql
87161
SELECT address, address2, district, postal_code, city, country
88162
FROM address, city, country
@@ -93,7 +167,7 @@ AND city.country_id = country.country_id;
93167

94168

95169

96-
## Exercise: Joining and Grouping
170+
## Exercise: Joining and Grouping: Films and Actors
97171

98172
Repeating an exercise from Part 1, but adding in information from additional tables: Which film (_by title_) has the most actors? Which actor (_by name_) is in the most films?
99173

@@ -102,6 +176,27 @@ Challenge: Which two actors have been in the most films together? Hint: You can
102176

103177
#### Solution
104178

179+
180+
```sql
181+
SELECT title, count(actor_id)
182+
FROM film INNER JOIN film_actor
183+
ON film.film_id=film_actor.film_id
184+
GROUP BY title
185+
ORDER BY count(actor_id) DESC
186+
LIMIT 1;
187+
```
188+
189+
```sql
190+
SELECT first_name, last_name, count(film_id)
191+
FROM actor INNER JOIN film_actor
192+
ON actor.actor_id=film_actor.actor_id
193+
GROUP BY first_name, last_name
194+
ORDER BY count(film_id) DESC
195+
LIMIT 1;
196+
```
197+
198+
**Alternative Syntax:**
199+
105200
```sql
106201
SELECT title, count(actor_id)
107202
FROM film, film_actor
@@ -124,9 +219,9 @@ Challenge:
124219

125220
```sql
126221
SELECT a.actor_id, b.actor_id, count(*)
127-
FROM film_actor a, film_actor b -- join the table to itself
128-
WHERE a.film_id=b.film_id -- on the film id
129-
AND a.actor_id > b.actor_id -- avoid duplicates and matching to the same actor
222+
FROM film_actor a, film_actor b -- join the table to itself
223+
WHERE a.film_id=b.film_id -- on the film id
224+
AND a.actor_id > b.actor_id -- avoid duplicates and matching to the same actor
130225
GROUP BY a.actor_id, b.actor_id
131226
ORDER BY count(*) DESC
132227
LIMIT 1;
@@ -138,10 +233,10 @@ Super Challenge:
138233
SELECT c.first_name, c.last_name, d.first_name, d.last_name, fcount
139234
FROM
140235
(SELECT a.actor_id AS a1, b.actor_id AS a2, count(*) AS fcount
141-
FROM film_actor a, film_actor b -- join the table to itself
142-
WHERE a.film_id=b.film_id -- on the film id
143-
AND a.actor_id > b.actor_id -- avoid duplicates and matching to the same actor
144-
GROUP BY a.actor_id, b.actor_id) foo -- this is the query from above
236+
FROM film_actor a, film_actor b -- join the table to itself
237+
WHERE a.film_id=b.film_id -- on the film id
238+
AND a.actor_id > b.actor_id -- avoid duplicates and matching to the same actor
239+
GROUP BY a.actor_id, b.actor_id) foo -- this is the query from above
145240
INNER JOIN actor c ON c.actor_id=a1
146241
INNER JOIN actor d ON d.actor_id=a2
147242
ORDER BY fcount DESC LIMIT 1;
@@ -150,32 +245,5 @@ ORDER BY fcount DESC LIMIT 1;
150245
There are other ways to accomplish the above.
151246

152247

153-
## Exercise: Joining and Grouping 2
154-
155-
Get a list of the names of customers who have spent more than $150, along with their total spending.
156-
157-
Who is the customer with the highest average payment amount?
158-
159-
160-
#### Solution
161-
162-
```sql
163-
SELECT first_name, last_name, sum(amount)
164-
FROM customer c INNER JOIN payment p
165-
ON c.customer_id=p.customer_id
166-
GROUP BY first_name, last_name
167-
HAVING sum(amount) > 150;
168-
```
169-
170-
```sql
171-
SELECT c.customer_id, first_name, last_name, avg(amount)
172-
FROM customer c INNER JOIN payment p
173-
ON c.customer_id=p.customer_id
174-
GROUP BY c.customer_id, first_name, last_name
175-
ORDER BY avg(amount) DESC
176-
LIMIT 1;
177-
```
178-
179-
180248

181249
