Filtering Results
= equal
<> not equal
< less than
> greater than
<= less than or equal to
>= greater than or equal to
SELECT DISTINCT
Often your results will include many duplicate values. If you want to select all the unique values from a column, you can use the DISTINCT keyword. This might be useful if, for example, you're interested in knowing which languages are represented in the films table:
SELECT DISTINCT language
FROM films;
PRACTICE WITH COUNT
As you've seen, COUNT(*) tells you how many rows are in a table. However, if you want to count the number of non-missing values in a particular column, you can call COUNT on just that column.
For example, to count the number of birth dates present in the people table:
SELECT COUNT(birthdate)
FROM people;
It's also common to combine COUNT with DISTINCT to count the number of distinct values in a column.
For example, this query counts the number of distinct birth dates contained in the people
table:
SELECT COUNT(DISTINCT birthdate)
FROM people;
Count the number of unique countries in the films table.
SELECT COUNT(DISTINCT country)
FROM films;
WHERE ARE
SELECT title
FROM films
WHERE release_year > 1994
AND release_year < 2000;
-- OR
SELECT title
FROM films
WHERE release_year > 1994 AND < 2000;
OR
SELECT title, release_year
FROM films
WHERE language = 'Spanish' AND release_year < 2000;
Nice example: Get all details for Spanish language films released after 2000, but before 2010.
SELECT *
FROM films
WHERE language = 'Spanish'
AND release_year > 2000
AND release_year < 2010;
WHERE AND OR
What if you want to select rows based on multiple conditions where some but not all of the conditions need to be met? For this, SQL has the OR operator. You now know how to select rows that meet some but not all conditions by combining AND and OR.
For example, the following returns all films released in either 1994 or 2000:
SELECT title
FROM films
WHERE release_year = 1994
OR release_year = 2000;
When combining AND and OR, be sure to enclose the individual clauses in parentheses, like so:
SELECT title
FROM films
WHERE (release_year = 1994 OR release_year = 1995)
AND (certification = 'PG' OR certification = 'R');
For example, the following query selects all films that were released in 1994 or 1995 which had a rating of PG or R
SELECT title
FROM films
WHERE (release_year = 1994 OR release_year = 1995)
AND (certification = 'PG' OR certification = 'R');
Now you'll write a query to get the title and release year of films released in the 90s which were in French or Spanish and which took in more than $2M gross.
SELECT title, release_year
FROM films
WHERE (release_year >= 1990 AND release_year < 2000)
AND (language = 'French' OR language = 'Spanish')
AND gross > 2000000;
-- gross é receita bruta
BETWEEN
Checking for ranges like this is very common, so in SQL the BETWEEN keyword provides a useful shorthand for filtering values within a specified range. This query is equivalent to the one above:
SELECT title
FROM films
WHERE release_year
BETWEEN 1994 AND 2000;
ps. It's important to remember that BETWEEN is inclusive, meaning the beginning and end values are included in the results!
For example, suppose we have a table called kids. We can get the names of all kids between the ages of 2 and 12 from the United States:
SELECT name
FROM kids
WHERE age BETWEEN 2 AND 12
AND nationality = 'USA';
WHERE IN
As you've seen, WHERE is very useful for filtering results. However, if you want to filter based on many conditions, WHERE can get unwieldy. For example:
SELECT name
FROM kids
WHERE age = 2
OR age = 4
OR age = 6
OR age = 8
OR age = 10;
Enter the IN operator! The IN operator allows you to specify multiple values in a WHERE clause, making it easier and quicker to specify multiple OR conditions! Neat, right? So, the above example would become simply:
SELECT name
FROM kids
WHERE age IN (2, 4, 6, 8, 10);
Get the title and release year of all films released in 1990 or 2000 that were longer than two hours. Remember, duration is in minutes!
SELECT title, release_year
FROM films
WHERE release_year IN (1990, 2000)
AND duration > 120;
Get the title and language of all films which were in English, Spanish, or French.
SELECT title, language
FROM films
WHERE language IN ('English', 'Spanish', 'French');
INTRODUCTION TO NULL AND NOT NULL
In SQL, NULL represents a missing or unknown value. You can check for NULL values using the expression IS NULL
. For example, to count the number of missing birth dates in the people table:
SELECT COUNT(*)
FROM people
WHERE birthdate IS NULL;
Sometimes, you'll want to filter out missing values so you only get results which are not NULL. To do this, you can use the IS NOT NULL operator.
For example, this query gives the names of all people whose birth dates are not missing in the people table.
SELECT name
FROM people
WHERE birthdate IS NOT NULL;
LIKE AND NOT LIKE
As you've seen, the WHERE clause can be used to filter text data. However, so far you've only been able to filter by specifying the exact text you're interested in. In the real world, often you'll want to search for a pattern rather than a specific text string.
In SQL, the LIKE operator can be used in a WHERE clause to search for a pattern in a column. To accomplish this, you use something called a wildcard as a placeholder for some other values. There are two wildcards you can use with LIKE:
The %
wildcard will match zero, one, or many characters in text. For example, the following query matches companies like 'Data'
, 'DataC'
, 'DataCamp'
, 'DataMind'
, and so on:
SELECT name
FROM companies
WHERE name LIKE 'Data%';
The _
wildcard will match a single character. For example, the following query matches companies like 'DataCamp'
, 'DataComp'
, and so on:
SELECT name
FROM companies
WHERE name LIKE 'DataC_mp';
PS. You can also use the NOT LIKE
operator to find records that don't match the pattern you specify.
Get the names of all people whose names begin with 'B'. The pattern you need is 'B%'.
SELECT name
FROM people
WHERE name LIKE 'B%';
Get the names of people whose names have 'r' as the second letter. The pattern you need is '_r%'.
SELECT name
FROM people
WHERE name LIKE '_r%';
Get the names of people whose names don't start with A. The pattern you need is 'A%'.
SELECT name
FROM people
WHERE name NOT LIKE 'A%';
AGGREGATE FUNCTIONS
Often, you will want to perform some calculation on the data in a database. SQL provides a few functions, called aggregate functions, to help you out with this.
For example,
SELECT AVG(budget)
FROM films;
gives you the average
value from the budget column of the films table. Similarly, the MAX
function returns the highest budget:
SELECT MAX(budget)
FROM films;
The SUM
function returns the result of adding up the numeric values in a column:
SELECT SUM(budget)
FROM films;
PS. You can probably guess what the MIN function does! Now it's your turn to try out some SQL functions. ;)
Use the
SUM
function to get the total duration of all films.
SELECT SUM(duration)
FROM films;
Get the average duration of all films.
SELECT AVG(duration)
FROM films;
Get the duration of the shortest film.
SELECT MIN(duration)
FROM films;
Get the amount grossed by the worst performing film.
SELECT MIN(gross)
FROM films;
Combining aggregate functions with WHERE
ggregate functions can be combined with the WHERE clause to gain further insights from your data.
For example, to get the total budget of movies made in the year 2010 or later:
SELECT SUM(budget)
FROM films
WHERE release_year >= 2010;
Get the average amount grossed by all films whose titles start with the letter 'A'.
SELECT AVG(gross)
FROM films
WHERE title LIKE 'A%'
Get the amount grossed by the worst performing film in 1994.
SELECT MIN(gross)
FROM films
WHERE release_year = 1994
-- pior rendimento na bilheteria
Get the amount grossed by the best performing film between 2000 and 2012, inclusive.
SELECT MAX(gross)
FROM films
WHERE release_year BETWEEN 2000 AND 2012
N NOTE ON ARITHMETIC
In addition to using aggregate functions, you can perform basic arithmetic with symbols like +
, -
, *
, and /
.
So, for example, this gives a result of 12:
SELECT (4 * 3);
However, the following gives a result of 1:
SELECT (4 / 3);
What's going on here? SQL assumes that if you divide an integer by an integer, you want to get an integer back. So be careful when dividing!
If you want more precision when dividing, you can add decimal places to your numbers. For example,
SELECT (4.0 / 3.0) AS result;
gives you the result you would expect: 1.333
.
IT'S AS SIMPLE AS ALIASING
You may have noticed in the first exercise of this chapter that the column name of your result was just the name of the function you used. For example,
SELECT MAX(budget)
FROM films;
gives you a result with one column, named max. But what if you use two functions like this?
SELECT MAX(budget), MAX(duration)
FROM films;
OUTPUT
Well, then you'd have two columns named max, which isn't very useful!
To avoid situations like this, SQL allows you to do something called aliasing. Aliasing simply means you assign a temporary name to something. To alias, you use the AS
keyword, which you've already seen earlier in this course.
For example, in the above example we could use aliases to make the result clearer:
SELECT MAX(budget) AS max_budget,
MAX(duration) AS max_duration
FROM films;
OUPUT
Aliases are helpful for making results more readable!
Get the title and net profit (the amount a film grossed, minus its budget) for all films. Alias the net profit as net_profit.
SELECT title, (gross - budget) AS net_profit
FROM films;
-- nesse caso pode dar até net_profit negativo
Get the title and duration in hours for all films. The duration is in minutes, so you'll need to divide by 60.0 to get the duration in hours. Alias the duration in hours as duration_hours.
SELECT title, duration/60.0 AS duration_hours
FROM films;
OUTPUT
Get the average duration in hours for all films, aliased as avg_duration_hours.
SELECT AVG(duration)/60.0 AS avg_duration_hours
FROM films
OUTPUT
Get the percentage of people who are no longer alive. Alias the result as percentage_dead. Remember to use 100.0 and not 100!
-- get the count(deathdate) and multiply by 100.0
-- then divide by count(*)
SELECT COUNT(deathdate) * 100.0 / COUNT(*) AS percentage_dead
FROM people
Get the number of years between the newest film and oldest film. Alias the result as difference.
SELECT MAX(release_year) - MIN(release_year) AS difference
FROM films;
OUTPUT
Get the number of decades the films table covers. Alias the result as number_of_decades. The top half of your fraction should be enclosed in parentheses.
SELECT (MAX(release_year) - MIN(release_year)) / 10.0
AS number_of_decades
FROM films;
ORDER BY
In this chapter you'll learn how to sort and group your results to gain further insight. Let's go!
In SQL, the ORDER BY
keyword is used to sort results in ascending or descending order according to the values of one or more columns.
By default ORDER BY
will sort in ascending order. If you want to sort the results in descending order, you can use the DESC
keyword. For example,
SELECT title
FROM films
ORDER BY release_year DESC;
gives you the titles of films sorted by release year, from newest to oldest.
Get the names of people from the people table, sorted alphabetically.
SELECT name
FROM people
ORDER BY name;
Get the names of people, sorted by birth date.
SELECT name
FROM people
ORDER BY birthdate;
Get the birth date and name for every person, in order of when they were born.
SELECT birthdate, name
FROM people
ORDER BY birthdate;
Get the title of films released in 2000 or 2012, in the order they were released.
SELECT title
FROM films
WHERE (release_year = 2000 OR release_year = 2012)
ORDER BY release_year
Get all details for all films except those released in 2015 and order them by duration.
SELECT *
FROM films
WHERE release_year <> 2015
ORDER BY duration;
Get the title and gross earnings for movies which begin with the letter 'M' and order the results alphabetically.
SELECT title, gross
FROM films
WHERE title LIKE 'M%'
ORDER BY title;
Get the IMDB score and film ID for every film from the reviews table, sorted from highest to lowest score.
SELECT imdb_score, film_id
FROM reviews
ORDER BY imdb_score DESC;
SORTING MULTIPLE COLUMNS
ORDER BY can also be used to sort on multiple columns. It will sort by the first column specified, then sort by the next, then the next, and so on. For example,
SELECT birthdate, name
FROM people
ORDER BY birthdate, name;
sorts on birth dates first (oldest to newest) and then sorts on the names in alphabetical order. The order of columns is important!
Get the birth date and name of people in the people table, in order of when they were born and alphabetically by name.
SELECT birthdate, name
FROM people
ORDER BY birthdate, name;
GROUP BY
For example, you might want to count the number of male and female employees in your company. Here, what you want is to group all the males together and count them, and group all the females together and count them. In SQL, GROUP BY allows you to group a result by one or more columns, like so:
SELECT sex, count(*)
FROM employees
GROUP BY sex;
OUTPUT
Commonly, GROUP BY is used with aggregate functions like COUNT() or MAX(). Note that GROUP BY always goes after the FROM clause!
Note that you can combine GROUP BY
with ORDER BY
to group your results, calculate something about them, and then order your results. For example,
OUTPUT
because there are more females at our company than males. Note also that ORDER BY always goes after GROUP BY
. Let's try some exercises!
Get the release year and count of films released in each year.
SELECT release_year, count(*)
FROM films
GROUP BY release_year;
Get the release year and average duration of all films, grouped by release year.
SELECT release_year, AVG(duration)
FROM films
GROUP BY release_year;
OUTPUT
Get the release year and largest budget for all films, grouped by release year.
SELECT release_year, MAX(budget)
FROM films
GROUP BY release_year;
Get the IMDB score and count of film reviews grouped by IMDB score in the reviews table.
SELECT imdb_score, COUNT(*)
FROM reviews
GROUP BY imdb_score;
OUTPUT
PS. Make sure to always put the ORDER BY clause at the end of your query. You can't sort values that you haven't calculated yet!
Get the release year and lowest gross earnings per release year.
SELECT release_year, MIN(gross)
FROM films
GROUP BY release_year;
OUTPUT
Get the language and total gross amount films in each language made.
SELECT language, SUM(gross)
FROM films
GROUP BY language;
OUTPUT
Get the release year, country, and highest budget spent making a film for each year, for each country. Sort your results by release year and country.
SELECT release_year, country, MAX(budget)
FROM films
GROUP BY release_year, country
ORDER BY release_year, country;
OUTPUT
Get the country, release year, and lowest amount grossed per release year per country. Order your results by country and release year.
SELECT country, release_year, MIN(gross)
FROM films
GROUP BY release_year, country
ORDER BY country, release_year;
HAVING
PS. In SQL, aggregate functions can't be used in WHERE clauses. For example, the following query is invalid: PS. Quando voce quiser usar o where, mas queria que fosse o statement de AVG, MAX, MIN, COUNT por exemplo, terá que usar o HAVING
--DARÁ INVÁLIDO
SELECT release_year
FROM films
GROUP BY release_year
WHERE COUNT(title) > 10;
This means that if you want to filter based on the result of an aggregate function, you need another way! That's where the HAVING clause comes in. For example,
SELECT release_year
FROM films
GROUP BY release_year
HAVING COUNT(title) > 10;
shows only those years in which more than 10 films were released.
In how many different years were more than 200 movies released?
SELECT release_year
FROM films
GROUP BY release_year
HAVING COUNT(title) > 200;
Modify your query so that only years with an average budget of greater than $60 million are included.
SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
FROM films
WHERE release_year > 1990
GROUP BY release_year
HAVING AVG(budget) > 60000000;
OUTPUT
Finally, modify your query to order the results from highest average gross earnings to lowest.
SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
FROM films
WHERE release_year > 1990
GROUP BY release_year
HAVING AVG(budget) > 60000000
ORDER BY AVG(gross) DESC;
OUTPUT
EXTRA
Get the country, average budget, and average gross take of countries that have made more than 10 films. Order the result by country name, and limit the number of results displayed to 5. You should alias the averages as avg_budget and avg_gross respectively.
-- select country, average budget,
-- and average gross
SELECT country, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
-- from the films table
FROM films
-- group by country
GROUP BY country
-- where the country has more than 10 titles
HAVING COUNT(country) > 10
-- order by country
ORDER BY country
-- limit to only show 5 results
LIMIT 5;
INTRODUCTION TO INNER JOIN
SELECT *
FROM left_table
INNER JOIN right_table
ON left_table.id = right_table.id;
Real Example:
SELECT p1.country, p1.continent, prime_minister, president
FROM prime_ministers AS p1
INNER JOIN presidents AS p2
ON p1.country = p2.country;
OUTPUT
Throughout this course, you'll be working with the countries
database containing information about the most populous world cities as well as country-level economic data, population data, and geographic data. This countries
database also contains information on languages spoken in each country.
You can see the different tables in this database by clicking on the tabs on the bottom right below query.sql. Click through them to get a sense for the types of data that each table contains before you continue with the course! Take note of the fields that appear to be shared across the tables.
You'll start off with a SELECT statement and then build up to an inner join with the cities
and countries
tables. Let's get to it!
cities
table
countries
table
Instructions 1
- Inner join the
cities
table on the left to thecountries
table on the right, keeping all of the fields in both tables. - You should match the tables on the
country_code
field incities
and thecode
field incountries.
- Do not alias your tables here or in the next step. Using
cities
andcountries
is fine for now.
SELECT *
FROM cities
-- 1. Inner join to countries
INNER JOIN countries
-- 2. Match on the country codes
ON cities.country_code = countries.code;
OUTPUT
Instructions 2
-
Modify the SELECT statement to keep only the name of the city, the name of the country, and the name of the region the country resides in.
-
Recall from our Intro to SQL for Data Science course that you can alias fields using AS. Alias the name of the city AS city and the name of the country AS country.
-- 1. Select name fields (with alias) and region
SELECT cities.name AS city, countries.name AS country, region
FROM cities
INNER JOIN countries
ON cities.country_code = countries.code;
OUTPUT
GENERAL RULE INNER JOIN
SELECT c1.name AS city, c2.name AS country
FROM cities AS c1
INNER JOIN countries AS c2
ON c1.country_code = c2.code;
Notice that to select a field in your query that appears in multiple tables, you'll need to identify which table/table alias you're referring to by using a . in your SELECT statement. (É UM EXEMPLO COM A TABELA CITIES E COUNTRIES QUE TEM O MESMO NAME
PARA DIFERENTES PARTES, LOGO, VOCÊ COLOCA AS
PARA DIFERENCIAR OS 2)
You'll now explore a way to get data from both the countries
and economies
tables to examine the inflation rate for both 2010 and 2015.
And population
table
Instructions
-
Join the tables
countries
(left) andeconomies
(right) aliasingcountries AS c
andeconomies AS e
. -
Specify the field to match the tables
ON
. -
From this join,
SELECT
:
···c.code
, aliased as country_code
.··
···name
, year
, and inflation_rate
, not aliased.··
-- 3. Select fields with aliases
SELECT c.code AS country_code, name, year, inflation_rate
FROM countries AS c
-- 1. Join to economies (alias e)
INNER JOIN economies AS e
-- 2. Match on code
ON c.code = e.code;
OUTPUT
INSTRUCTIONS
-
Inner join countries (left) and populations (right) on the code and country_code fields respectively.
-
Alias countries AS c and populations AS p.
-
Select code, name, and region from countries and also select year and fertility_rate from populations (5 fields in total).
-- 4. Select fields
SELECT code, name, region, year, fertility_rate
-- 1. From countries (alias as c)
FROM countries AS c
-- 2. Join with populations (as p)
INNER JOIN populations as p
-- 3. Match on country code
ON c.code = p.country_code;
OUTPUT
INSTRUCTIONS
-
Add an additional inner join with economies to your previous query by joining on code.
-
Include the unemployment_rate column that became available through joining with economies.
-
Note that year appears in both populations and economies, so you have to explicitly use e.year instead of year as you did before.
-- 6. Select fields
SELECT c.code, name, region, e.year, fertility_rate, unemployment_rate
-- 1. From countries (alias as c)
FROM countries AS c
-- 2. Join to populations (as p)
INNER JOIN populations AS p
-- 3. Match on country code
ON c.code = p.country_code
-- 4. Join to economies (as e)
INNER JOIN economies AS e
-- 5. Match on country code
ON c.code = e.code;
OUTPUT
INSTRUCTIONS
-
Scroll down the query result and take a look at the results for Albania from your previous query. Does something seem off to you?
-
The trouble with doing your last join on c.code = e.code and not also including year is that e.g. the 2010 value for fertility_rate is also paired with the 2015 value for unemployment_rate.
-
Fix your previous query: in your last ON clause, use AND to add an additional joining condition. In addition to joining on code in c and e, also join on year in e and p.
-- 6. Select fields
SELECT c.code, name, region, e.year, fertility_rate, unemployment_rate
-- 1. From countries (alias as c)
FROM countries AS c
-- 2. Join to populations (as p)
INNER JOIN populations AS p
-- 3. Match on country code
ON c.code = p.country_code
-- 4. Join to economies (as e)
INNER JOIN economies AS e
-- 5. Match on country code and year
ON c.code = e.code AND e.year = p.year;
OUTPUT
INNER JOIN via USING
Usa USING quando os nomes das tabelas são iguais, como no exemplo
When joining tables with a common field name, e.g.
SELECT *
FROM countries
INNER JOIN economies
ON countries.code = economies.code
You can use USING as a shortcut:
SELECT *
FROM countries
INNER JOIN economies
USING(code)
You'll now explore how this can be done with the countries
and languages
tables.
INSTRUCTIONS
-
Inner join countries on the left and languages on the right with USING(code).
-
select the fields corresponding to:
1.country name AS country,
2.continent name,
3.language name AS language, and
4.whether or not the language is official.
- Remember to alias your tables using the first letter of their names.
-- 4. Select fields
SELECT c.name AS country, continent, l.name AS language, official
-- 1. From countries (alias as c)
FROM countries AS c
-- 2. Join to languages (as l)
INNER JOIN languages AS l
-- 3. Match using code
USING(code)
OUTPUT
SELF-JOIN
Self-joins are used to compare values in a field to other values of the same field from within the same table.
What if you wanted to create a new table showing countries that are in the same continent matched as pairs? Let's explore a chunk of INNER JOIN code using the prime_ministers table.
CASE WHEN and THEN
CASE is a way to do multiple if-then-else
statements in a simplified way in SQL.
You can now see the basic layout for creating a new field containing the groupings. How might we fill them in? After the first WHEN should specify that we want to check for indep_year being less than 1900. Next we want indep_year_group to contain 'between 1900 and 1930' in the next blank. Lastly any other record not matching these conditions will be assigned the value of 'after 1930' for indep_year_group.
Check out the completed query with completed blanks. Notice how the values of indep_year are grouped in indep_year_group. Also observe how continent relates to indep_year_group.
SELF-JOIN EXAMPLES
In this exercise, you'll use the populations
table to perform a self-join to calculate the percentage increase in population from 2010 to 2015 for each country code!
Since you'll be joining the populations table to itself, you can alias populations as p1
and also populations
as p2
. This is good practice whenever you are aliasing and your tables have the same first letter. Note that you are required to alias the tables with self-joins.
INSTRUCTION 1.1
Population Table
-
Join populations with itself ON country_code.
-
select the country_code from p1 and the size field from both p1 and p2. SQL won't allow same-named fields, so alias p1.size as size2010 and p2.size as size2015.
-- 4. Select fields with aliases
SELECT p1.country_code, p1.size AS size2010, p2.size AS size2015
-- 1. From populations (alias as p1)
FROM populations AS p1
-- 2. Join to itself (alias as p2)
INNER JOIN populations AS p2
-- 3. Match on country code
ON p1.country_code = p2.country_code
OUTPUT
INSTRUCTION 1.2
Notice from the result that for each country_code you have four entries laying out all combinations of 2010 and 2015.
- Extend the ON in your query to include only those records where the p1.year (2010) matches with p2.year - 5 (2015 - 5 = 2010). This will omit the three entries per country_code that you aren't interested in.
-- 5. Select fields with aliases
SELECT p1.country_code,
p1.size AS size2010,
p2.size AS size2015
-- 1. From populations (alias as p1)
FROM populations as p1
-- 2. Join to itself (alias as p2)
INNER JOIN populations as p2
-- 3. Match on country code
ON p1.country_code = p2.country_code
-- 4. and year (with calculation)
AND p1.year = p2.year - 5;
INSTRUCTION 1.3
SELECT p1.country_code,
p1.size AS size2010,
p2.size AS size2015,
-- 1. calculate growth_perc
((p2.size - p1.size)/p1.size * 100.0) AS growth_perc
-- 2. From populations (alias as p1)
FROM populations AS p1
-- 3. Join to itself (alias as p2)
INNER JOIN populations AS p2
-- 4. Match on country code
ON p1.country_code = p2.country_code
-- 5. and year (with calculation)
AND p1.year = p2.year - 5;
OUTPUT
CASE WHEN AND THEN EXAMPLES
Often it's useful to look at a numerical field not as raw data, but instead as being in different categories or groups.
You can use CASE with WHEN
, THEN
, ELSE
, and END
to define a new grouping field.
INSTRUCTIONS
Using the countries table, create a new field AS geosize_group that groups the countries into three groups:
- If
surface_area
is greater than 2 million,geosize_group
is'large'
. - If
surface_area
is greater than 350 thousand but not larger than 2 million,geosize_group
is'medium'
. - Otherwise, geosize_group is
'small'
.
SELECT name, continent, code, surface_area,
-- 1. First case
CASE WHEN surface_area > 2000000 THEN 'large'
-- 2. Second case
WHEN surface_area > 350000 THEN 'medium'
-- 3. Else clause + end
ELSE 'small' END
-- 4. Alias name
AS geosize_group
-- 5. From table
FROM countries;
OUTPUT
INNER CHALLENGE
The table you created with the added geosize_group
field has been loaded for you here with the name countries_plus
. Observe the use of (and the placement of) the INTO
command to create this countries_plus table:
SELECT name, continent, code, surface_area,
CASE WHEN surface_area > 2000000
THEN 'large'
WHEN surface_area > 350000
THEN 'medium'
ELSE 'small' END
AS geosize_group
INTO countries_plus
FROM countries;
You will now explore the relationship between the size of a country in terms of surface area and in terms of population using grouping fields created with CASE
.
INSTRUCTIONS 1.1
SELECT country_code, size,
-- 1. First case
CASE WHEN size > 50000000 THEN 'large'
-- 2. Second case
WHEN size > 1000000 THEN 'medium'
-- 3. Else clause + end
ELSE 'small' END
-- 4. Alias name (popsize_group)
AS popsize_group
-- 5. From table
FROM populations
-- 6. Focus on 2015
WHERE year = 2015;
OUTPUT
INSTRUCTION 1.2
-
Use
INTO
to save the result of the previous query aspop_plus
. You can see an example of this in the countries_plus code in the assignment text. Make sure to include a;
at the end of yourWHERE
clause! -
Then, include another query below your first query to display all the records in pop_plus using
SELECT * FROM pop_plus
; so that you generate results and this will displaypop_plus
in query result.
SELECT country_code, size,
CASE WHEN size > 50000000 THEN 'large'
WHEN size > 1000000 THEN 'medium'
ELSE 'small' END
AS popsize_group
-- 1. Into table
INTO pop_plus
FROM populations
WHERE year = 2015;
-- 2. Select all columns of pop_plus
SELECT * FROM pop_plus
output
INSTRUCTION 1.3
- Keep the first query intact that creates pop_plus using
INTO
. - Write a query to join
countries_plus AS c
on the left withpop_plus AS p
on the right matching on the country code fields. - Sort the data based on geosize_group, in ascending order so that large appears on top.
- Select the
name
,continent
,geosize_group
, andpopsize_group
fields.
**LEFT and RIGHT JOINs
Congratulations on completing Chapter 1 on INNER JOINs. Welcome to Chapter 2 on OUTER JOINs! You can remember outer joins as reaching OUT to another table while keeping all of the records of the original table. Inner joins keep only the records IN both tables. You'll begin this chapter by exploring (1) LEFT JOINs, (2) RIGHT JOINs, and (3) FULL JOINs, which are the three types of OUTER JOINs. Let's begin by exploring how a LEFT JOIN differs from an INNER JOIN via a diagram.
The RIGHT JOIN is much less common than the LEFT JOIN so we won't spend as much time on it here.
LEFT JOIN EXAMPLES
Now you'll explore the differences between performing an inner join and a left join using the cities and countries tables.
You'll begin by performing an inner join with the cities table on the left and the countries table on the right. Remember to alias the name of the city field as city and the name of the country field as country.
INSTRUCTIONS 1.1
- Fill in the code based on the instructions in the code comments to complete the inner join. Note how many records are in the result of the join in the query result tab.
-- Select the city name (with alias), the country code,
-- the country name (with alias), the region,
-- and the city proper population
SELECT c1.name AS country, code, c2.name AS city,
region, city_proper_pop
-- From left table (with alias)
FROM countries AS c1
-- Join to right table (with alias)
INNER JOIN cities AS c2
-- Match on country code
ON c1.code = c2.country_code
-- Order by descending country code
ORDER BY code DESC;
INSTRUCTION 1.2
- Change the code to perform a LEFT JOIN instead of an INNER JOIN. After executing this query, note how many records the query result contains.
SELECT c1.name AS city, code, c2.name AS country,
region, city_proper_pop
FROM cities AS c1
-- 1. Join right table (with alias)
LEFT JOIN countries AS c2
-- 2. Match on country code
ON c1.country_code = c2.code
-- 3. Order by descending country code
ORDER BY code DESC;
OUTPUT
INSTRUCTION
- Modify your code to calculate the average GDP per capita AS avg_gdp for each region in 2010.
- Select the region and avg_gdp fields.
-- Select fields
SELECT region, AVG(gdp_percapita) AS avg_gdp
-- From countries (alias as c)
FROM countries AS c
-- Left join with economies (alias as e)
LEFT JOIN economies AS e
-- Match on code field'''
ON c.code = e.code
-- Focus on 2010
WHERE year = 2010
-- Group by region
GROUP BY region;
INSTRUCTION 1.2
- Arrange this data on average GDP per capita for each region in 2010 from highest to lowest average GDP per capita.
-- Select fields
SELECT region, AVG(gdp_percapita) AS avg_gdp
-- From countries (alias as c)
FROM countries AS c
-- Left join with economies (alias as e)
LEFT JOIN economies AS e
-- Match on code fields
ON c.code = e.code
-- Focus on 2010
WHERE year = 2010
-- Group by region
GROUP BY region
-- Order by descending avg_gdp
ORDER BY avg_gdp DESC;
RIGHT JOINS
- INNER JOIN vs LEFT JOIN
Recall that an INNER JOIN keeps only the records that have matching key field values in both tables. A LEFT JOIN keeps all of the records in the left table while bringing in missing values for those key field values that don't appear in the right table.
- LEFT JOIN vs RIGHT JOIN
Now you can see the differences between a LEFT JOIN and a RIGHT JOIN. The id values of 2 and 3 in the left table do not match with the id values in the right table, so missing values are brought in for them in the LEFT JOIN. Likewise for the RIGHT JOIN, missing values are brought in for id values of 5 and 6.
- FULL JOIN diagram
Note the missing values here and that all six of the values of id are included in the table. You can also see from the SQL code to produce this FULL JOIN result that the general format aligns closely with the SQL syntax you've seen for both an INNER JOIN and a LEFT JOIN. You'll next explore an example from the leaders database.
FULL JOIN EXAMPLES
In this exercise, you'll examine how your results differ when using a full join versus using a left join versus using an inner join with the countries
and currencies
tables.
You will focus on the North American region
and also where the name
of the country is missing. Dig in to see what we mean!
Begin with a full join with countries on the left and currencies on the right. The fields of interest have been SELECTed for you throughout this exercise.
Then complete a similar left join and conclude with an inner join.
Countries table
Currencies table
INSTRUCTIONS 1.1
- Choose records in which
region
corresponds to North America or isNULL
.
SELECT name AS country, code, region, basic_unit
-- 3. From countries
FROM countries
-- 4. Join to currencies
FULL JOIN currencies
-- 5. Match on code
USING (code) -- É usado quando os nomes da coluna são iguais, como todos são code, então para facilitar uso USING code e não countries.code = currencies.code
-- 1. Where region is North America or null
WHERE region = 'North America' OR region IS null
-- 2. Order by region
ORDER BY region;
INSTRUCTIONS 1.2
- Repeat the same query as above but use a LEFT JOIN instead of a FULL JOIN. Note what has changed compared to the FULL JOIN result!
SELECT name AS country, code, region, basic_unit
-- 1. From countries
FROM countries
-- 2. Join to currencies
LEFT JOIN currencies
-- 3. Match on code
USING (code)
-- 4. Where region is North America or null
WHERE region = 'North America' OR region IS null
-- 5. Order by region
ORDER BY region;
INSTRUCTIONS 1.3
- Repeat the same query as above but use an
INNER JOIN
instead of aFULL JOIN
. Note what has changed compared to the FULL JOIN and LEFT JOIN results!
SELECT name AS country, code, region, basic_unit
FROM countries
-- 1. Join to currencies
INNER JOIN currencies
USING (code)
-- 2. Where region is North America or null
WHERE region = 'North America' OR region IS NULL
-- 3. Order by region
ORDER BY region;
Have you kept an eye out on the different numbers of records these queries returned? The FULL JOIN query returned 17 rows, the OUTER JOIN returned 4 rows, and the INNER JOIN only returned 3 rows. Do these results make sense to you?
FULL JOIN EXAMPLES (2)
You'll now investigate a similar exercise to the last one, but this time focused on using a table with more records on the left than the right. You'll work with the languages and countries tables.
countries table
languages table
Begin with a full join with languages on the left and countries on the right. Appropriate fields have been selected for you again here.
INSTRUCTION 1.1
- Choose records in which countries.name starts with the capital letter 'V' or is NULL.
- Arrange by countries.name in ascending order to more clearly see the results.
SELECT countries.name, code, languages.name AS language
-- 3. From languages
FROM languages
-- 4. Join to countries
FULL JOIN countries
-- 5. Match on code
USING (code)
-- 1. Where countries.name starts with V or is null
WHERE countries.name LIKE 'V%' OR countries.name IS NULL
-- 2. Order by ascending countries.name
ORDER BY countries.name;
INSTRUCTION 1.2
- Repeat the same query as above but use a left join instead of a full join. Note what has changed compared to the full join result!
SELECT countries.name, code, languages.name AS language
FROM languages
-- 1. Join to countries
LEFT JOIN countries
-- 2. Match using code
USING (code)
-- 3. Where countries.name starts with V or is null
WHERE countries.name LIKE 'V%' OR countries.name IS NULL
ORDER BY countries.name;
INSTRUCTIONS 1.3
- Repeat once more, but use an inner join instead of a left join. Note what has changed compared to the full join and left join results.
SELECT countries.name, code, languages.name AS language
FROM languages
-- 1. Join to countries
INNER JOIN countries
USING (code)
-- 2. Where countries.name starts with V or is null
WHERE countries.name LIKE 'V%' OR countries.name IS NULL
ORDER BY countries.name;
Well done. Again, make sure to compare the number of records the different types of joins return and try to verify whether the results make sense.
FULL JOIN (3)
You'll now explore using two consecutive full joins on the three tables you worked with in the previous two exercises.
INSTRUCTIONS
- Complete a full join with countries on the left and languages on the right.
- Next, full join this result with currencies on the right.
- Use LIKE to choose the Melanesia and Micronesia regions (Hint: 'M%esia').
- Select the fields corresponding to the country name AS country, region, language name AS language, and basic and fractional units of currency.
-- 7. Select fields (with aliases)
SELECT c1.name AS country, region, l.name AS language, basic_unit, frac_unit
-- 1. From countries (alias as c1)
FROM countries AS c1
-- 2. Join with languages (alias as l)
FULL JOIN languages AS l
-- 3. Match on code
USING (code)
-- 4. Join with currencies (alias as c2)
FULL JOIN currencies AS c2
-- 5. Match on code
USING (code)
-- 6. Where region like Melanesia and Micronesia
WHERE region LIKE 'M%esia';
CROSS JOIN
- CROSS JOIN diagram
In this diagram we have two tables named table1 and table2. Each table only has one field, both with the name of id. The result of the CROSS JOIN is all nine combinations of the id values of 1, 2, and 3 in table1 with the id values of A, B, and C for table2. Next you'll explore an example from the leaders database and look over the SQL syntax for a CROSS JOIN.
- Pairing prime ministers with presidents
Suppose that all prime ministers in North America and Oceania in the prime_ministers table are scheduled for individual meetings with all presidents in the presidents table. You can look at all of these combinations by using a CROSS JOIN. The syntax here remains similar to what you've seen earlier in the course. We use a WHERE clause to focus on only prime ministers in North America and Oceania in the prime_ministers table. The results of the query give us the pairings for the two prime ministers in North America and Oceania from the prime_ministers table with the seven presidents in the presidents table.
A table of two cities
This exercise looks to explore languages potentially and most frequently spoken in the cities of Hyderabad, India and Hyderabad, Pakistan.
You will begin with a cross join with cities AS c on the left and languages AS l on the right. Then you will modify the query using an inner join in the next tab.
INSTRUCTIONS 1.1
- Create the cross join as described above. (Recall that cross joins do not use ON or USING.)
- Make use of LIKE and Hyder% to choose Hyderabad in both countries.
- Select only the city name AS city and language name AS language.
-- 4. Select fields
SELECT c.name AS city, l.name AS language
-- 1. From cities (alias as c)
FROM cities AS c
-- 2. Join to languages (alias as l)
CROSS JOIN languages AS l
-- 3. Where c.name like Hyderabad
WHERE c.name LIKE 'Hyder%';
INSTRUCTIONS 1.2
- use an inner join instead of a cross join. Think about what the difference will be in the results for this inner join result and the one for the cross join.
-- 5. Select fields
SELECT c.name AS city, l.name AS language
-- 1. From cities (alias as c)
FROM cities AS c
-- 2. Join to languages (alias as l)
INNER JOIN languages AS l
-- 3. Match on country code
ON c.country_code = l.code
-- 4. Where c.name like Hyderabad
WHERE c.name LIKE 'Hyder%';
Good one! Can you see the difference between a CROSS JOIN and a INNER JOIN?
Outer challenge
Now that you're fully equipped to use outer joins, try a challenge problem to test your knowledge!
In terms of life expectancy for 2010, determine the names of the lowest five countries and their regions.
INSTRUCTIONS
- Select country name AS country, region, and life expectancy AS life_exp.
- Make sure to use LEFT JOIN, WHERE, ORDER BY, and LIMIT.
-- Select fields
SELECT c.name AS country, region,
life_expectancy AS life_exp
-- From countries (alias as c)
FROM countries AS c
-- Join to populations (alias as p)
LEFT JOIN populations AS p
-- Match on country code
ON c.code = p.country_code
-- Focus on 2010
WHERE year = 2010
-- Order by life_exp
ORDER BY life_exp
-- Limit to 5 records
LIMIT 5;
STATE OF THE UNION
By contrast (with the same two tables left_one and right_one), UNION ALL includes all duplicates in its result. So left_one and right_one both having four records yields eight records for the result of the UNION ALL. If it were the case that right_one had these same four values and also one more value of 1 for id, you'd see three entries for 1 in the resulting UNION ALL. Let's check out the SQL syntax using the leaders database for both UNION and UNION ALL, but first you'll see one more table in the leaders database.
that UNION and UNION ALL clauses do not do the lookup step that JOINs do. They simply stack records on top of each other from one table to the next.
UNION EXAMPLES
Economies table
Economies2015 table
Economies2010 table
INSTRUCTION
- Combine these two tables into one table containing all of the fields in economies2010. The economies table is also included for reference.
- Sort this resulting single table by country code and then by year, both in ascending order.
-- Select fields from 2010 table
SELECT *
-- From 2010 table
FROM economies2010
-- Set theory clause
UNION
-- Select fields from 2015 table
SELECT *
-- From 2015 table
FROM economies2015
-- Order by code and year
ORDER BY code, year;
UNION 2 EXAMPLE
INSTRUCTION
- Determine all (non-duplicated) country codes in either the cities or the currencies table. The result should be a table with only one field called country_code.
- Sort by country_code in alphabetical order.
-- Select field
SELECT DISTINCT country_code
-- From cities
FROM cities
-- Set theory clause
UNION
-- Select field
SELECT DISTINCT code
-- From currencies
FROM currencies
-- Order by country_code
order by country_code;
UNION ALL EXAMPLE
As you saw, duplicates were removed from the previous two exercises by using UNION
.
To include duplicates, you can use UNION ALL
.
INSTRUCTION
- Determine all combinations (include duplicates) of country code and year that exist in either the economies or the populations tables. Order by code then year.
- The result of the query should only have two columns/fields. Think about how many records this query should result in.
- You'll use code very similar to this in your next exercise after the video. Make note of this code after completing it.
-- Select fields
SELECT code, year
-- From economies
FROM economies
-- Set theory clause
UNION ALL
-- Select fields
SELECT country_code, year
-- From populations
FROM populations
-- Order by code, year
ORDER BY code, year;
INTERSECT DIAGRAM AND SQL CODE
Vale lembrar que intersect precisa ter o SELECT igual na coluna.
Next, let's think about what would happen if we tried to select two columns instead of one from our previous example. The code shown does just that. What will be the result of this query? Will this also give you the names of the countries that have both a prime minister and a president? Hmmm [PAUSE] The actual result is an empty table. Why is that? When INTERSECT looks at two columns it includes both columns in the search. So it didn't find any countries with prime ministers AND presidents having the same name. INTERSECT looks for RECORDS in common, not individual key fields like what a join does to match. This is an important distinction.
INTERSECT EXAMPLE
- Again, order by
code
and then byyear
, both in ascending order. - Note the number of records here (given at the bottom of query result) compared to the similar
UNION ALL
query result (814 records).
-- Select fields
SELECT code, year
-- From economies
FROM economies
-- Set theory clause
INTERSECT
-- Select fields
SELECT country_code, year
-- From populations
FROM populations
-- Order by code and year
ORDER BY code, year;
INTERSCT 2 EXAMPLE
As you think about major world cities and their corresponding country, you may ask which countries also have a city with the same name as their country name?
INSTRUCTIONS
Use INTERSECT
to answer this question with countries and cities!
-- Select fields
SELECT name, code
-- From countries
FROM countries
-- Set theory clause
INTERSECT
-- Select fields
SELECT name, country_code
-- From cities
FROM cities;
EXCEPTIONAL
- Monarchs that aren't prime ministers
You saw earlier that there are some monarchs that also act as the prime minister for their country. One way to determine those monarchs in the monarchs table that do not also hold the title of prime minister is to use the EXCEPT clause. [CLICK] This SQL query selects the monarch field from monarchs and then looks for common entries with the prime_ministers field, while also keeping track of the country for each leader. [CLICK] You can see in the resulting query that only the two European monarchs are not also prime ministers in the leaders database.
- EXCEPT diagram
This diagram gives the structure of EXCEPT clauses. Only the records that appear in the left table BUT DO NOT appear in the right table are included.
EXCEPT EXAMPLES
Get the names of cities in cities
which are not noted as capital cities in countries
as a single field result.
Note that there are some countries in the world that are not included in the countries
table, which will result in some cities not being labeled as capital cities when in fact they are.
INSTRUCTIONS
cities table
countries table
- Order the resulting field in ascending order.
- Can you spot the city/cities that are actually capital cities which this query misses?
-- Select field
SELECT name
-- From cities
FROM cities
-- Set theory clause
EXCEPT
-- Select field
SELECT capital
-- From countries
FROM countries
-- Order by result
ORDER BY name;
EXCEPT 2
Now you will complete the previous query in reverse!
Determine the names of capital cities that are not listed in the cities table.
INSTRUCTIONS
- Order by capital in ascending order.
- The cities table contains information about 236 of the world's most populous cities. The result of your query may surprise you in terms of the number of capital cities that DO NOT appear in this list!
-- Select field
SELECT capital
-- From countries
FROM countries
-- Set theory clause
EXCEPT
-- Select field
SELECT name
-- From cities
FROM cities
-- Order by ascending capital
ORDER BY capital;
You'll now close this chapter by returning to joins. The six joins you've worked with so far are all additive joins in that they add columns to the original "left" table. Can you name all six? (1. INNER JOIN, 2. self-join, 3. LEFT JOIN, 4. RIGHT JOIN, 5. FULL JOIN, and 6. CROSS JOIN).
SEMI-JOINS AND ANTI-JOINS
SEMI-JOIN EXAMPLE
You are now going to use the concept of a semi-join to identify languages spoken in the Middle East.
languages table
countries table
INSTRUCTIONS 1.1
Flash back to our Intro to SQL for Data Science course and begin by selecting all country codes in the Middle East as a single field result using SELECT, FROM, and WHERE.
-- Select code
SELECT code
-- From countries
FROM countries
-- Where region is Middle East
WHERE region = 'Middle East';
INSTRUCTIONS 1.2
- Comment out the answer to the previous tab by surrounding it in
/*
and*/
. You'll come back to it! - Below the commented code, select only unique languages by name appearing in the
languages
table. - Order the resulting single field table by
name
in ascending order.
/*SELECT code
FROM countries
WHERE region = 'Middle East';*/
-- Select field
SELECT DISTINCT(name)
-- From languages
FROM languages
-- Order by name
ORDER BY name;
INSTRUCTIONS 1.3
Now combine the previous two queries into one query:
- Add a WHERE IN statement to the SELECT DISTINCT query, and use the commented out query from the first instruction in there. That way, you can determine the unique languages spoken in the Middle East.
Carefully review this result and its code after completing it. It serves as a great example of subqueries, which are the focus of Chapter 4.
-- Select distinct fields
SELECT DISTINCT(name)
-- From languages
FROM languages
-- Where in statement
WHERE code IN
-- Subquery
(SELECT code
FROM countries
WHERE region = 'Middle East')
-- Order by name
ORDER BY name;
Diagnosing problems using anti-join
Another powerful join in SQL is the anti-join. It is particularly useful in identifying which records are causing an incorrect number of records to appear in join queries.
You will also see another example of a subquery here, as you saw in the first exercise on semi-joins. Your goal is to identify the currencies used in Oceanian countries!
INSTRUCTIONS
Begin by determining the number of countries in countries
that are listed in Oceania using SELECT, FROM, and WHERE.
-- Select statement
SELECT count(code)
-- From countries
FROM countries
-- Where continent is Oceania
WHERE continent = 'Oceania';
INSTRUCTIONS 1.2
- complete an inner join with
countries AS c1
on the left andcurrencies AS c2
on the right to get the different currencies used in the countries of Oceania. - Match
ON
thecode
field in the two tables. - Include the country
code
, country name, andbasic_unit AS currency
.
Observe query result and make note of how many different countries are listed here.
-- 5. Select fields (with aliases)
SELECT c1.code, name, basic_unit AS currency
-- 1. From countries (alias as c1)
FROM countries as c1
-- 2. Join with currencies (alias as c2)
INNER JOIN currencies c2
-- 3. Match on code
ON c1.code = c2.code
-- 4. Where continent is Oceania
WHERE continent = 'Oceania';
INSTRUCTION 1.3
Note that not all countries in Oceania were listed in the resulting inner join with currencies. Use an anti-join to determine which countries were not included!
- Use NOT IN and (SELECT code FROM currencies) as a subquery to get the country code and country name for the Oceanian countries that are not included in the currencies table.
-- 3. Select fields
SELECT code, name
-- 4. From Countries
FROM countries
-- 5. Where continent is Oceania
WHERE continent = 'Oceania'
-- 1. And code not in
AND code NOT IN
-- 2. Subquery
(SELECT code
FROM currencies);
SET THEORY CHALLENGE
Congratulations! You've now made your way to the challenge problem for this third chapter. Your task here will be to incorporate two of UNION/UNION ALL/INTERSECT/EXCEPT to solve a challenge involving three tables.
In addition, you will use a subquery as you have in the last two exercises! This will be great practice as you hop into subqueries more in Chapter 4!
INSTRUCTIONS
- dentify the country codes that are included in either economies or currencies but not in populations.
- Use that result to determine the names of cities in the countries that match the specification in the previous instruction.
-- Select the city name
SELECT name
-- Alias the table where city name resides
FROM cities AS c1
-- Choose only records matching the result of multiple set theory clauses
WHERE country_code IN
(
-- Select appropriate field from economies AS e
SELECT e.code
FROM economies AS e
-- Get all additional (unique) values of the field from currencies AS c2
UNION
SELECT c2.code
FROM currencies AS c2
-- Exclude those appearing in populations AS p
EXCEPT
SELECT p.country_code
FROM populations AS p
);
Subqueries inside WHERE and SELECT clauses
You'll now try to figure out which countries had high average life expectancies (at the country level) in 2015.
INSTRUCTIONS 1.1
populations table
- Begin by calculating the average life expectancy across all countries for 2015.
-- Select average life_expectancy
SELECT avg(life_expectancy)
-- From populations
FROM populations
-- Where year is 2015
WHERE year = 2015
INSTRUCTION 1.2
-- Select fields
SELECT *
-- From populations
FROM populations
-- Where life_expectancy is greater than
WHERE life_expectancy >
-- 1.15 * subquery
1.15 *
(SELECT avg(life_expectancy)
-- From populations
FROM populations
-- Where year is 2015
WHERE year = 2015 )
AND year = 2015;
Subquery inside where (2)
Use your knowledge of subqueries in WHERE
to get the urban area population for only capital cities.
INSTRUCTIONS
- Make use of the capital field in the countries table in your subquery.
- Select the city name, country code, and urban area population fields.
-- 2. Select fields
SELECT cities.name, country_code, urbanarea_pop
-- 3. From cities
FROM cities
-- 4. Where city name in the field of capital cities
WHERE name IN
-- 1. Subquery
(SELECT capital
FROM countries)
ORDER BY urbanarea_pop DESC;
Subquery inside select
In this exercise, you'll see how some queries can be written using either a join or a subquery.
You have seen previously how to use GROUP BY
with aggregate functions and an inner join to get summarized information from multiple tables.
The code given in query.sql selects the top nine countries in terms of number of cities appearing in the cities
table. Recall that this corresponds to the most populous cities in the world. Your task will be to convert the commented out code to get the same result as the code shown.
INSTRUCTIONS 1.1
Just Submit Answer here!
SELECT countries.name AS country, COUNT(*) AS cities_num
FROM cities
INNER JOIN countries
ON countries.code = cities.country_code
GROUP BY country
ORDER BY cities_num DESC, country
LIMIT 9;
/*
SELECT ___ AS ___,
(SELECT ___
FROM ___
WHERE countries.code = cities.country_code) AS cities_num
FROM ___
ORDER BY ___ ___, ___
LIMIT 9;
*/
INSTRUCTIONS 1.2
- Remove the comments around the second query and comment out the first query instead.
- Convert the GROUP BY code to use a subquery inside of SELECT, i.e. fill in the blanks to get a result that matches the one given using the GROUP BY code in the first query.
- Again, sort the result by cities_num descending and then by country ascending.
/*SELECT countries.name AS country, COUNT(*) AS cities_num
FROM cities
INNER JOIN countries
ON countries.code = cities.country_code
GROUP BY country
ORDER BY cities_num DESC, country
LIMIT 9;
*/
SELECT countries.name AS country,
(SELECT COUNT(*)
FROM cities
WHERE countries.code = cities.country_code) AS cities_num
FROM countries
ORDER BY cities_num DESC, country
LIMIT 9;
Subquery inside FROM clause
The last type of subquery you will work with is one inside of FROM
.
You will use this to determine the number of languages spoken for each country, identified by the country's local name! (Note this may be different than the name field and is stored in the local_name field.)
INSTRUCTIONS 1.1
- Begin by determining for each country code how many languages are listed in the languages table using SELECT, FROM, and GROUP BY.
- Alias the aggregated field as lang_num.
-- Select fields (with aliases)
SELECT code, COUNT(name) AS lang_num
-- From languages
FROM languages
-- Group by code
GROUP BY code;
INSTRUCTIONS 1.2
- Include the previous query (aliased as subquery) as a subquery in the FROM clause of a new query.
- Select the local name of the country from countries.
- Also, select lang_num from subquery.
- Make sure to use WHERE appropriately to match code in countries and in subquery.
- Sort by lang_num in descending order.
SELECT local_name, subquery.lang_num
FROM countries,
(SELECT code, COUNT(*) AS lang_num
FROM languages
GROUP BY code) AS subquery
WHERE countries.code = subquery.code
ORDER BY lang_num DESC;
ADVANCED SUBQUERY
You can also nest multiple subqueries to answer even more specific questions.
In this exercise, for each of the six continents listed in 2015, you'll identify which country had the maximum inflation rate (and how high it was) using multiple subqueries. The table result of your query in Task 3 should look something like the following, where anything between < > will be filled in with appropriate values:
Again, there are multiple ways to get to this solution using only joins, but the focus here is on showing you an introduction into advanced subqueries.
INSTRUCTIONS 1.1
- Now it's time to append the second part's query to the first part's query using AND and IN to obtain the name of the country, its continent, and the maximum inflation rate for each continent in 2015!
- For the sake of practice, change all joining conditions to use ON instead of USING (based upon the same column, code).
- Revisit the sample output in the assignment text at the beginning of the exercise to see how this matches up.
-- Select fields
SELECT name, continent, inflation_rate
-- From countries
FROM countries
-- Join to economies
INNER JOIN economies
-- Match on code
ON countries.code = economies.code
-- Where year is 2015
WHERE year = 2015;
INSTRUCTIONS 1.2
Select the maximum inflation rate in 2015 AS max_inf
grouped by continent using the previous step's query as a subquery in the FROM
clause.
Thus, in your subquery you should:
+ Create an inner join with countries
on the left and economies
on the right with USING
(without aliasing your tables or columns).
+ Retrieve the country name, continent, and inflation rate for 2015.
+ Alias the subquery as subquery
.
This will result in the six maximum inflation rates in 2015 for the six continents as one field table. Make sure to not include continent
in the outer SELECT
statement.
-- Select the maximum inflation rate as max_inf
SELECT MAX(inflation_rate) AS max_inf
-- Subquery using FROM (alias as subquery)
FROM ( SELECT name, continent, inflation_rate
FROM countries
INNER JOIN economies
USING(code)
WHERE year = 2015
) AS subquery
-- Group by continent
GROUP BY continent;
INSTRUCTIONS 1.3
- Now it's time to append your second query to your first query using AND and IN to obtain the name of the country, its continent, and the maximum inflation rate for each continent in 2015.
- For the sake of practice, change all joining conditions to use ON instead of USING.
-- Select fields
SELECT name, continent, inflation_rate
-- From countries
FROM countries
-- Join to economies
INNER JOIN economies
-- Match on code
ON countries.code = economies.code
-- Where year is 2015
WHERE year = 2015
-- And inflation rate in subquery (alias as subquery)
AND inflation_rate IN (
SELECT MAX(inflation_rate) AS max_inf
FROM (
SELECT name, continent, inflation_rate
FROM countries
INNER JOIN economies
ON economies.code = countries.code
WHERE year = 2015) AS subquery
-- Group by continent
GROUP BY continent);
INSTRUCTIONS 1.3
- Select unique country names. Also select the total investment and imports fields.
- Use a left join with countries on the left. (An inner join would also work, but please use a left join here.)
- Match on code in the two tables AND use a subquery inside of ON to choose the appropriate languages records.
- Order by country name ascending.
- Use table aliasing but not field aliasing in this exercise.
-- Select fields
SELECT DISTINCT c.name, total_investment, imports
-- From table (with alias)
FROM countries AS c
-- Join with table (with alias)
LEFT JOIN economies AS e
-- Match on code
ON (c.code = e.code
-- and code in Subquery
AND c.code IN (
SELECT l.code
FROM languages AS l
WHERE official = 'True'
) )
-- Where region and year are correct
WHERE region = 'Central America' AND year = 2015
-- Order by field
ORDER BY c.name;
Wow! Well done! This code works since each of the six maximum inflation rate values occur only once in the 2015 data. Think about whether this particular code involving subqueries would work in cases where there are ties for the maximum inflation rate values.
SUBQUERY CHALLENGE
Let's test your understanding of the subqueries with a challenge problem! Use a subquery to get 2015 economic data for countries that do not have
gov_form
of'Constitutional Monarchy'
or'Republic'
in theirgov_form
.
Here, gov_form
stands for the form of the government for each country. Review the different entries for gov_form
in the countries
table.
INSTRUCTIONS
- Select the country code, inflation rate, and unemployment rate.
- Order by inflation rate ascending.
- Do not use table aliasing in this exercise.
-- Select fields
SELECT code, inflation_rate, unemployment_rate
-- From economies
FROM economies
-- Where year is 2015 and code is not in
WHERE year = 2015 AND code NOT IN
-- Subquery
(SELECT code
FROM countries
WHERE (gov_form = 'Constitutional Monarchy' OR gov_form LIKE '%Republic%'))
-- Order by inflation rate
ORDER BY inflation_rate;
FINAL CHALLENGE
Welcome to the end of the course! The next three exercises will test your knowledge of the content covered in this course and apply many of the ideas you've seen to difficult problems. Good luck!
Read carefully over the instructions and solve them step-by-step, thinking about how the different clauses work together.
In this exercise, you'll need to get the country names and other 2015 data in the economies table and the countries table for Central American countries with an official language.
INSTRUCTIONS
- Select unique country names. Also select the total investment and imports fields.
- Use a left join with countries on the left. (An inner join would also work, but please use a left join here.)
- Match on code in the two tables AND use a subquery inside of ON to choose the appropriate languages records.
- Order by country name ascending.
- Use table aliasing but not field aliasing in this exercise.
economies table
countries table
languages table
-- Select fields
SELECT DISTINCT c.name, total_investment, imports
-- From table (with alias)
FROM countries AS c
-- Join with table (with alias)
LEFT JOIN economies AS e
-- Match on code
ON (c.code = e.code
-- and code in Subquery
AND c.code IN (
SELECT l.code
FROM languages AS l
WHERE official = 'True'
) )
-- Where region and year are correct
WHERE region = 'Central America' AND year = 2015
-- Order by field
ORDER BY c.name;
Final Challenge 2
Whoofta! That was challenging, huh?
Let's ease up a bit and calculate the average fertility rate for each region in 2015.
INSTRUCTIONS
- Include the name of region, its continent, and average fertility rate aliased as avg_fert_rate.
- Sort based on avg_fert_rate ascending.
- Remember that you'll need to GROUP BY all fields that aren't included in the aggregate function of SELECT.
-- Select fields
SELECT c.region, c.continent, AVG(fertility_rate) AS avg_fert_rate
-- From left table
FROM populations AS p
-- Join to right table
INNER JOIN countries AS c
-- Match on join condition
ON c.code = p.country_code
-- Where specific records matching some condition
WHERE year = 2015
-- Group appropriately
GROUP BY region, continent
-- Order appropriately
ORDER BY avg_fert_rate;
FINAL CHALLENGE 3
Welcome to the last challenge problem. By now you're a query warrior! Remember that these challenges are designed to take you to the limit to solidify your SQL knowledge! Take a deep breath and solve this step-by-step.
You are now tasked with determining the top 10 capital cities in Europe and the Americas in terms of a calculated percentage using city_proper_pop
and metroarea_pop
in cities
.
Do not use table aliasing in this exercise.
INSTRUCTIONS
- Select the city name, country code, city proper population, and metro area population.
- Calculate the percentage of metro area population composed of city proper population for each city in cities, aliased as city_perc.
- Focus only on capital cities in Europe and the Americas in a subquery.
- Make sure to exclude records with missing data on metro area population.
- Order the result by city_perc descending.
- Then determine the top 10 capital cities in Europe and the Americas in terms of this city_perc percentage.
-- Select fields
SELECT name, country_code, city_proper_pop, metroarea_pop,
-- Calculate city_perc
city_proper_pop / metroarea_pop * 100 AS city_perc
-- From appropriate table
FROM cities
-- Where
WHERE name IN
-- Subquery
(SELECT capital
FROM countries
WHERE (continent = 'Europe'
OR continent LIKE '%America'))
AND metroarea_pop IS NOT NULL
-- Order appropriately
ORDER BY city_perc DESC
-- Limit amount
LIMIT 10;