Skip to content

Latest commit

 

History

History
2509 lines (1805 loc) · 85.2 KB

fundamentalsql.md

File metadata and controls

2509 lines (1805 loc) · 85.2 KB

SQL

TIPS

Filtering Results

    = equal
    <> not equal
    < less than
    > greater than
    <= less than or equal to
    >= greater than or equal to

SELECT DISTINCT

Often your results will include many duplicate values. If you want to select all the unique values from a column, you can use the DISTINCT keyword. This might be useful if, for example, you're interested in knowing which languages are represented in the films table:

SELECT DISTINCT language
FROM films;

PRACTICE WITH COUNT

As you've seen, COUNT(*) tells you how many rows are in a table. However, if you want to count the number of non-missing values in a particular column, you can call COUNT on just that column.

For example, to count the number of birth dates present in the people table:

SELECT COUNT(birthdate)
FROM people;

It's also common to combine COUNT with DISTINCT to count the number of distinct values in a column.

For example, this query counts the number of distinct birth dates contained in the people table:

SELECT COUNT(DISTINCT birthdate)
FROM people;

Count the number of unique countries in the films table.

SELECT COUNT(DISTINCT country)
FROM films; 

WHERE ARE

SELECT title
FROM films
WHERE release_year > 1994
AND release_year < 2000;

-- OR

SELECT title
FROM films
WHERE release_year > 1994 AND < 2000; 

OR

SELECT title, release_year
FROM films
WHERE language = 'Spanish' AND release_year < 2000; 

Nice example: Get all details for Spanish language films released after 2000, but before 2010.

SELECT *
FROM films
WHERE language = 'Spanish' 
AND release_year > 2000 
AND release_year < 2010;

WHERE AND OR

What if you want to select rows based on multiple conditions where some but not all of the conditions need to be met? For this, SQL has the OR operator. You now know how to select rows that meet some but not all conditions by combining AND and OR.

For example, the following returns all films released in either 1994 or 2000:

SELECT title
FROM films
WHERE release_year = 1994
OR release_year = 2000;

When combining AND and OR, be sure to enclose the individual clauses in parentheses, like so:

SELECT title
FROM films
WHERE (release_year = 1994 OR release_year = 1995)
AND (certification = 'PG' OR certification = 'R');

For example, the following query selects all films that were released in 1994 or 1995 which had a rating of PG or R

SELECT title
FROM films
WHERE (release_year = 1994 OR release_year = 1995)
AND (certification = 'PG' OR certification = 'R');

Now you'll write a query to get the title and release year of films released in the 90s which were in French or Spanish and which took in more than $2M gross.

SELECT title, release_year
FROM films
WHERE (release_year >= 1990 AND release_year < 2000)
AND (language = 'French' OR language = 'Spanish')
AND gross > 2000000;
-- gross é receita bruta

BETWEEN

Checking for ranges like this is very common, so in SQL the BETWEEN keyword provides a useful shorthand for filtering values within a specified range. This query is equivalent to the one above:

SELECT title
FROM films
WHERE release_year
BETWEEN 1994 AND 2000;

ps. It's important to remember that BETWEEN is inclusive, meaning the beginning and end values are included in the results!

For example, suppose we have a table called kids. We can get the names of all kids between the ages of 2 and 12 from the United States:

SELECT name
FROM kids
WHERE age BETWEEN 2 AND 12
AND nationality = 'USA';

WHERE IN

As you've seen, WHERE is very useful for filtering results. However, if you want to filter based on many conditions, WHERE can get unwieldy. For example:

SELECT name
FROM kids
WHERE age = 2
OR age = 4
OR age = 6
OR age = 8
OR age = 10;

Enter the IN operator! The IN operator allows you to specify multiple values in a WHERE clause, making it easier and quicker to specify multiple OR conditions! Neat, right? So, the above example would become simply:

SELECT name
FROM kids
WHERE age IN (2, 4, 6, 8, 10);

Get the title and release year of all films released in 1990 or 2000 that were longer than two hours. Remember, duration is in minutes!

SELECT title, release_year
FROM films
WHERE release_year IN (1990, 2000)
AND duration > 120; 

Get the title and language of all films which were in English, Spanish, or French.

SELECT title, language 
FROM films
WHERE language IN ('English', 'Spanish', 'French');

INTRODUCTION TO NULL AND NOT NULL

In SQL, NULL represents a missing or unknown value. You can check for NULL values using the expression IS NULL. For example, to count the number of missing birth dates in the people table:

SELECT COUNT(*)
FROM people
WHERE birthdate IS NULL;

Sometimes, you'll want to filter out missing values so you only get results which are not NULL. To do this, you can use the IS NOT NULL operator.

For example, this query gives the names of all people whose birth dates are not missing in the people table.

SELECT name
FROM people
WHERE birthdate IS NOT NULL;

LIKE AND NOT LIKE

As you've seen, the WHERE clause can be used to filter text data. However, so far you've only been able to filter by specifying the exact text you're interested in. In the real world, often you'll want to search for a pattern rather than a specific text string.

In SQL, the LIKE operator can be used in a WHERE clause to search for a pattern in a column. To accomplish this, you use something called a wildcard as a placeholder for some other values. There are two wildcards you can use with LIKE:

The % wildcard will match zero, one, or many characters in text. For example, the following query matches companies like 'Data', 'DataC', 'DataCamp', 'DataMind', and so on:

SELECT name
FROM companies
WHERE name LIKE 'Data%';

The _ wildcard will match a single character. For example, the following query matches companies like 'DataCamp', 'DataComp', and so on:

SELECT name
FROM companies
WHERE name LIKE 'DataC_mp';

PS. You can also use the NOT LIKE operator to find records that don't match the pattern you specify.

Get the names of all people whose names begin with 'B'. The pattern you need is 'B%'.

SELECT name
FROM people 
WHERE name LIKE 'B%'; 

Get the names of people whose names have 'r' as the second letter. The pattern you need is '_r%'.

SELECT name
FROM people
WHERE name LIKE '_r%';

Get the names of people whose names don't start with A. The pattern you need is 'A%'.

SELECT name
FROM people 
WHERE name NOT LIKE 'A%'; 

AGGREGATE FUNCTIONS

Often, you will want to perform some calculation on the data in a database. SQL provides a few functions, called aggregate functions, to help you out with this.

For example,

SELECT AVG(budget)
FROM films;

gives you the average value from the budget column of the films table. Similarly, the MAX function returns the highest budget:

SELECT MAX(budget)
FROM films;

The SUM function returns the result of adding up the numeric values in a column:

SELECT SUM(budget)
FROM films;

PS. You can probably guess what the MIN function does! Now it's your turn to try out some SQL functions. ;)

Use the SUM function to get the total duration of all films.

SELECT SUM(duration)
FROM films; 

Get the average duration of all films.

SELECT AVG(duration)
FROM films;

Get the duration of the shortest film.

SELECT MIN(duration)
FROM films;

Get the amount grossed by the worst performing film.

SELECT MIN(gross)
FROM films;

Combining aggregate functions with WHERE

ggregate functions can be combined with the WHERE clause to gain further insights from your data.

For example, to get the total budget of movies made in the year 2010 or later:

SELECT SUM(budget)
FROM films
WHERE release_year >= 2010;

Get the average amount grossed by all films whose titles start with the letter 'A'.

SELECT AVG(gross)
FROM films
WHERE title LIKE 'A%'

Get the amount grossed by the worst performing film in 1994.

SELECT MIN(gross)
FROM films
WHERE release_year = 1994
-- pior rendimento na bilheteria

Get the amount grossed by the best performing film between 2000 and 2012, inclusive.

SELECT MAX(gross)
FROM films
WHERE release_year BETWEEN 2000 AND 2012

N NOTE ON ARITHMETIC

In addition to using aggregate functions, you can perform basic arithmetic with symbols like +, -, *, and /.

So, for example, this gives a result of 12:

SELECT (4 * 3);

However, the following gives a result of 1:

SELECT (4 / 3);

What's going on here? SQL assumes that if you divide an integer by an integer, you want to get an integer back. So be careful when dividing!

If you want more precision when dividing, you can add decimal places to your numbers. For example,

SELECT (4.0 / 3.0) AS result;

gives you the result you would expect: 1.333.

IT'S AS SIMPLE AS ALIASING

You may have noticed in the first exercise of this chapter that the column name of your result was just the name of the function you used. For example,

SELECT MAX(budget)
FROM films;

gives you a result with one column, named max. But what if you use two functions like this?

SELECT MAX(budget), MAX(duration)
FROM films;

OUTPUT

image

Well, then you'd have two columns named max, which isn't very useful!

To avoid situations like this, SQL allows you to do something called aliasing. Aliasing simply means you assign a temporary name to something. To alias, you use the AS keyword, which you've already seen earlier in this course.

For example, in the above example we could use aliases to make the result clearer:

SELECT MAX(budget) AS max_budget,
       MAX(duration) AS max_duration
FROM films;

OUPUT

image

Aliases are helpful for making results more readable!

Get the title and net profit (the amount a film grossed, minus its budget) for all films. Alias the net profit as net_profit.

SELECT title, (gross - budget) AS net_profit
FROM films; 

-- nesse caso pode dar até net_profit negativo

Get the title and duration in hours for all films. The duration is in minutes, so you'll need to divide by 60.0 to get the duration in hours. Alias the duration in hours as duration_hours.

SELECT title, duration/60.0 AS duration_hours
FROM films; 

OUTPUT

image

Get the average duration in hours for all films, aliased as avg_duration_hours.

SELECT AVG(duration)/60.0 AS avg_duration_hours
FROM films

OUTPUT

image

Get the percentage of people who are no longer alive. Alias the result as percentage_dead. Remember to use 100.0 and not 100!

-- get the count(deathdate) and multiply by 100.0
-- then divide by count(*)

SELECT COUNT(deathdate) * 100.0 / COUNT(*) AS percentage_dead
FROM people

Get the number of years between the newest film and oldest film. Alias the result as difference.

SELECT MAX(release_year) - MIN(release_year) AS difference
FROM films; 

OUTPUT

image

Get the number of decades the films table covers. Alias the result as number_of_decades. The top half of your fraction should be enclosed in parentheses.

SELECT (MAX(release_year) - MIN(release_year)) / 10.0
AS number_of_decades
FROM films;

ORDER BY

In this chapter you'll learn how to sort and group your results to gain further insight. Let's go!

In SQL, the ORDER BY keyword is used to sort results in ascending or descending order according to the values of one or more columns.

By default ORDER BY will sort in ascending order. If you want to sort the results in descending order, you can use the DESC keyword. For example,

SELECT title
FROM films
ORDER BY release_year DESC;

gives you the titles of films sorted by release year, from newest to oldest.

Get the names of people from the people table, sorted alphabetically.

SELECT name
FROM people
ORDER BY name; 

Get the names of people, sorted by birth date.

SELECT name
FROM people
ORDER BY birthdate; 

Get the birth date and name for every person, in order of when they were born.

SELECT birthdate, name
FROM people
ORDER BY birthdate;

Get the title of films released in 2000 or 2012, in the order they were released.

SELECT title
FROM films
WHERE (release_year = 2000 OR release_year = 2012)
ORDER BY release_year

Get all details for all films except those released in 2015 and order them by duration.

SELECT *
FROM films
WHERE release_year <> 2015
ORDER BY duration; 

Get the title and gross earnings for movies which begin with the letter 'M' and order the results alphabetically.

SELECT title, gross
FROM films
WHERE title LIKE 'M%'
ORDER BY title; 

Get the IMDB score and film ID for every film from the reviews table, sorted from highest to lowest score.

SELECT imdb_score, film_id
FROM reviews
ORDER BY imdb_score DESC;

SORTING MULTIPLE COLUMNS

ORDER BY can also be used to sort on multiple columns. It will sort by the first column specified, then sort by the next, then the next, and so on. For example,

SELECT birthdate, name
FROM people
ORDER BY birthdate, name;

sorts on birth dates first (oldest to newest) and then sorts on the names in alphabetical order. The order of columns is important!

Get the birth date and name of people in the people table, in order of when they were born and alphabetically by name.

SELECT birthdate, name
FROM people
ORDER BY birthdate, name;

GROUP BY

For example, you might want to count the number of male and female employees in your company. Here, what you want is to group all the males together and count them, and group all the females together and count them. In SQL, GROUP BY allows you to group a result by one or more columns, like so:

SELECT sex, count(*)
FROM employees
GROUP BY sex;

OUTPUT

image

Commonly, GROUP BY is used with aggregate functions like COUNT() or MAX(). Note that GROUP BY always goes after the FROM clause!

Note that you can combine GROUP BY with ORDER BY to group your results, calculate something about them, and then order your results. For example,

OUTPUT

image

because there are more females at our company than males. Note also that ORDER BY always goes after GROUP BY. Let's try some exercises!

Get the release year and count of films released in each year.

SELECT release_year, count(*)
FROM films
GROUP BY release_year; 

image

Get the release year and average duration of all films, grouped by release year.

SELECT release_year, AVG(duration)
FROM films
GROUP BY release_year;

OUTPUT

image

Get the release year and largest budget for all films, grouped by release year.

SELECT release_year, MAX(budget)
FROM films
GROUP BY release_year;

Get the IMDB score and count of film reviews grouped by IMDB score in the reviews table.

SELECT imdb_score, COUNT(*)
FROM reviews
GROUP BY imdb_score;

OUTPUT

image

PS. Make sure to always put the ORDER BY clause at the end of your query. You can't sort values that you haven't calculated yet!

Get the release year and lowest gross earnings per release year.

SELECT release_year, MIN(gross)
FROM films
GROUP BY release_year;

OUTPUT

image

Get the language and total gross amount films in each language made.

SELECT language, SUM(gross)
FROM films
GROUP BY language;

OUTPUT

image

Get the release year, country, and highest budget spent making a film for each year, for each country. Sort your results by release year and country.

SELECT release_year, country, MAX(budget)
FROM films
GROUP BY release_year, country
ORDER BY release_year, country; 

OUTPUT

image

Get the country, release year, and lowest amount grossed per release year per country. Order your results by country and release year.

SELECT country, release_year, MIN(gross)
FROM films
GROUP BY release_year, country
ORDER BY country, release_year;

image

HAVING

PS. In SQL, aggregate functions can't be used in WHERE clauses. For example, the following query is invalid: PS. Quando voce quiser usar o where, mas queria que fosse o statement de AVG, MAX, MIN, COUNT por exemplo, terá que usar o HAVING

--DARÁ INVÁLIDO
SELECT release_year
FROM films
GROUP BY release_year
WHERE COUNT(title) > 10;

This means that if you want to filter based on the result of an aggregate function, you need another way! That's where the HAVING clause comes in. For example,

SELECT release_year
FROM films
GROUP BY release_year
HAVING COUNT(title) > 10;

shows only those years in which more than 10 films were released.

In how many different years were more than 200 movies released?

SELECT release_year
FROM films
GROUP BY release_year
HAVING COUNT(title) > 200;

image

Modify your query so that only years with an average budget of greater than $60 million are included.

SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
FROM films
WHERE release_year > 1990 
GROUP BY release_year
HAVING AVG(budget) > 60000000;

OUTPUT

image

Finally, modify your query to order the results from highest average gross earnings to lowest.

SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
FROM films
WHERE release_year > 1990
GROUP BY release_year
HAVING AVG(budget) > 60000000
ORDER BY AVG(gross) DESC;

OUTPUT

image

EXTRA

Get the country, average budget, and average gross take of countries that have made more than 10 films. Order the result by country name, and limit the number of results displayed to 5. You should alias the averages as avg_budget and avg_gross respectively.

-- select country, average budget, 
--     and average gross
SELECT country, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
-- from the films table
FROM films
-- group by country 
GROUP BY country
-- where the country has more than 10 titles
HAVING COUNT(country) > 10
-- order by country
ORDER BY country
-- limit to only show 5 results
LIMIT 5; 

image

INTRODUCTION TO INNER JOIN

image

SELECT *
FROM left_table
INNER JOIN right_table
ON left_table.id = right_table.id;

Real Example:

SELECT p1.country, p1.continent, prime_minister, president
FROM prime_ministers AS p1
INNER JOIN presidents AS p2 
ON p1.country = p2.country;

OUTPUT

image

Throughout this course, you'll be working with the countries database containing information about the most populous world cities as well as country-level economic data, population data, and geographic data. This countries database also contains information on languages spoken in each country.

You can see the different tables in this database by clicking on the tabs on the bottom right below query.sql. Click through them to get a sense for the types of data that each table contains before you continue with the course! Take note of the fields that appear to be shared across the tables.

image

You'll start off with a SELECT statement and then build up to an inner join with the cities and countries tables. Let's get to it!

cities table

image

countries table

image

Instructions 1

  • Inner join the cities table on the left to the countries table on the right, keeping all of the fields in both tables.
  • You should match the tables on the country_code field in cities and the code field in countries.
  • Do not alias your tables here or in the next step. Using cities and countries is fine for now.
SELECT * 
FROM cities
  -- 1. Inner join to countries
  INNER JOIN countries
    -- 2. Match on the country codes
    ON cities.country_code = countries.code;

OUTPUT

image

Instructions 2

  • Modify the SELECT statement to keep only the name of the city, the name of the country, and the name of the region the country resides in.

  • Recall from our Intro to SQL for Data Science course that you can alias fields using AS. Alias the name of the city AS city and the name of the country AS country.

-- 1. Select name fields (with alias) and region 
SELECT cities.name AS city, countries.name AS country, region
FROM cities
  INNER JOIN countries
    ON cities.country_code = countries.code;

OUTPUT

image

GENERAL RULE INNER JOIN

SELECT c1.name AS city, c2.name AS country
FROM cities AS c1
INNER JOIN countries AS c2
ON c1.country_code = c2.code;

Notice that to select a field in your query that appears in multiple tables, you'll need to identify which table/table alias you're referring to by using a . in your SELECT statement. (É UM EXEMPLO COM A TABELA CITIES E COUNTRIES QUE TEM O MESMO NAMEPARA DIFERENTES PARTES, LOGO, VOCÊ COLOCA AS PARA DIFERENCIAR OS 2)

You'll now explore a way to get data from both the countries and economies tables to examine the inflation rate for both 2010 and 2015.

image

image

And population table

image

Instructions

  • Join the tables countries (left) and economies (right) aliasing countries AS c and economies AS e.

  • Specify the field to match the tables ON.

  • From this join, SELECT:

···c.code, aliased as country_code.·· ···name, year, and inflation_rate, not aliased.··

-- 3. Select fields with aliases
SELECT c.code AS country_code, name, year, inflation_rate
FROM countries AS c
  -- 1. Join to economies (alias e)
  INNER JOIN economies AS e
    -- 2. Match on code
    ON c.code = e.code;

OUTPUT

image

INSTRUCTIONS

  • Inner join countries (left) and populations (right) on the code and country_code fields respectively.

  • Alias countries AS c and populations AS p.

  • Select code, name, and region from countries and also select year and fertility_rate from populations (5 fields in total).

-- 4. Select fields
SELECT code, name, region, year, fertility_rate
  -- 1. From countries (alias as c)
  FROM countries AS c
  -- 2. Join with populations (as p)
  INNER JOIN populations as p
    -- 3. Match on country code
    ON c.code = p.country_code; 

OUTPUT

image

INSTRUCTIONS

  • Add an additional inner join with economies to your previous query by joining on code.

  • Include the unemployment_rate column that became available through joining with economies.

  • Note that year appears in both populations and economies, so you have to explicitly use e.year instead of year as you did before.

-- 6. Select fields
SELECT c.code, name, region, e.year, fertility_rate, unemployment_rate
  -- 1. From countries (alias as c)
  FROM countries AS c
  -- 2. Join to populations (as p)
  INNER JOIN populations AS p
    -- 3. Match on country code
    ON c.code = p.country_code
  -- 4. Join to economies (as e)
  INNER JOIN economies AS e
    -- 5. Match on country code
    ON c.code = e.code;

OUTPUT

image

INSTRUCTIONS

  • Scroll down the query result and take a look at the results for Albania from your previous query. Does something seem off to you?

  • The trouble with doing your last join on c.code = e.code and not also including year is that e.g. the 2010 value for fertility_rate is also paired with the 2015 value for unemployment_rate.

  • Fix your previous query: in your last ON clause, use AND to add an additional joining condition. In addition to joining on code in c and e, also join on year in e and p.

-- 6. Select fields
SELECT c.code, name, region, e.year, fertility_rate, unemployment_rate
  -- 1. From countries (alias as c)
  FROM countries AS c
  -- 2. Join to populations (as p)
  INNER JOIN populations AS p
    -- 3. Match on country code
    ON c.code = p.country_code
  -- 4. Join to economies (as e)
  INNER JOIN economies AS e
    -- 5. Match on country code and year
    ON c.code = e.code AND e.year = p.year; 

OUTPUT

image

INNER JOIN via USING

Usa USING quando os nomes das tabelas são iguais, como no exemplo

image

When joining tables with a common field name, e.g.

SELECT *
FROM countries
  INNER JOIN economies
    ON countries.code = economies.code
   

You can use USING as a shortcut:

SELECT *
FROM countries
  INNER JOIN economies
    USING(code)

You'll now explore how this can be done with the countries and languages tables.

INSTRUCTIONS

  • Inner join countries on the left and languages on the right with USING(code).

  • select the fields corresponding to:

1.country name AS country,

2.continent name,

3.language name AS language, and

4.whether or not the language is official.

  • Remember to alias your tables using the first letter of their names.
-- 4. Select fields
SELECT c.name AS country, continent, l.name AS language, official
  -- 1. From countries (alias as c)
  FROM countries AS c 
  -- 2. Join to languages (as l)
  INNER JOIN languages AS l
    -- 3. Match using code
    USING(code)

OUTPUT

image

SELF-JOIN

Self-joins are used to compare values in a field to other values of the same field from within the same table.

What if you wanted to create a new table showing countries that are in the same continent matched as pairs? Let's explore a chunk of INNER JOIN code using the prime_ministers table.

image

image

CASE WHEN and THEN

image

CASE is a way to do multiple if-then-else statements in a simplified way in SQL.

image

You can now see the basic layout for creating a new field containing the groupings. How might we fill them in? After the first WHEN should specify that we want to check for indep_year being less than 1900. Next we want indep_year_group to contain 'between 1900 and 1930' in the next blank. Lastly any other record not matching these conditions will be assigned the value of 'after 1930' for indep_year_group.

Check out the completed query with completed blanks. Notice how the values of indep_year are grouped in indep_year_group. Also observe how continent relates to indep_year_group.

image

SELF-JOIN EXAMPLES

In this exercise, you'll use the populations table to perform a self-join to calculate the percentage increase in population from 2010 to 2015 for each country code!

Since you'll be joining the populations table to itself, you can alias populations as p1 and also populations as p2. This is good practice whenever you are aliasing and your tables have the same first letter. Note that you are required to alias the tables with self-joins.

INSTRUCTION 1.1

Population Table

image

  • Join populations with itself ON country_code.

  • select the country_code from p1 and the size field from both p1 and p2. SQL won't allow same-named fields, so alias p1.size as size2010 and p2.size as size2015.

-- 4. Select fields with aliases
SELECT p1.country_code, p1.size AS size2010, p2.size AS size2015
-- 1. From populations (alias as p1)
FROM populations AS p1
  -- 2. Join to itself (alias as p2)
  INNER JOIN populations AS p2
    -- 3. Match on country code
    ON p1.country_code = p2.country_code

OUTPUT

image

INSTRUCTION 1.2

Notice from the result that for each country_code you have four entries laying out all combinations of 2010 and 2015.

  • Extend the ON in your query to include only those records where the p1.year (2010) matches with p2.year - 5 (2015 - 5 = 2010). This will omit the three entries per country_code that you aren't interested in.
-- 5. Select fields with aliases
SELECT p1.country_code,
       p1.size AS size2010,
       p2.size AS size2015
-- 1. From populations (alias as p1)
FROM populations as p1
  -- 2. Join to itself (alias as p2)
  INNER JOIN populations as p2
    -- 3. Match on country code
    ON p1.country_code = p2.country_code
        -- 4. and year (with calculation)
       AND p1.year = p2.year - 5; 

image

INSTRUCTION 1.3

image

SELECT p1.country_code,
       p1.size AS size2010, 
       p2.size AS size2015,
       -- 1. calculate growth_perc
       ((p2.size - p1.size)/p1.size * 100.0) AS growth_perc
-- 2. From populations (alias as p1)
FROM populations AS p1
  -- 3. Join to itself (alias as p2)
  INNER JOIN populations AS p2
    -- 4. Match on country code
    ON p1.country_code = p2.country_code
        -- 5. and year (with calculation)
        AND p1.year = p2.year - 5;

OUTPUT

image

CASE WHEN AND THEN EXAMPLES

Often it's useful to look at a numerical field not as raw data, but instead as being in different categories or groups.

You can use CASE with WHEN, THEN, ELSE, and END to define a new grouping field.

INSTRUCTIONS

Using the countries table, create a new field AS geosize_group that groups the countries into three groups:

  • If surface_area is greater than 2 million, geosize_group is 'large'.
  • If surface_area is greater than 350 thousand but not larger than 2 million, geosize_group is 'medium'.
  • Otherwise, geosize_group is 'small'.
SELECT name, continent, code, surface_area,
    -- 1. First case
    CASE WHEN surface_area > 2000000 THEN 'large'
        -- 2. Second case
        WHEN  surface_area > 350000 THEN 'medium'
        -- 3. Else clause + end
        ELSE 'small' END
        -- 4. Alias name
        AS geosize_group
-- 5. From table
FROM countries;

OUTPUT

image

INNER CHALLENGE

The table you created with the added geosize_group field has been loaded for you here with the name countries_plus. Observe the use of (and the placement of) the INTO command to create this countries_plus table:

SELECT name, continent, code, surface_area,
    CASE WHEN surface_area > 2000000
            THEN 'large'
       WHEN surface_area > 350000
            THEN 'medium'
       ELSE 'small' END
       AS geosize_group
INTO countries_plus
FROM countries;

You will now explore the relationship between the size of a country in terms of surface area and in terms of population using grouping fields created with CASE.

INSTRUCTIONS 1.1

image

SELECT country_code, size,
    -- 1. First case
    CASE WHEN size > 50000000 THEN 'large'
        -- 2. Second case
        WHEN size > 1000000 THEN 'medium'
        -- 3. Else clause + end
        ELSE 'small' END
        -- 4. Alias name (popsize_group)
        AS popsize_group
-- 5. From table
FROM populations
-- 6. Focus on 2015
WHERE year = 2015;

OUTPUT

image

INSTRUCTION 1.2

  • Use INTO to save the result of the previous query as pop_plus. You can see an example of this in the countries_plus code in the assignment text. Make sure to include a ; at the end of your WHERE clause!

  • Then, include another query below your first query to display all the records in pop_plus using SELECT * FROM pop_plus; so that you generate results and this will display pop_plus in query result.

SELECT country_code, size,
    CASE WHEN size > 50000000 THEN 'large'
        WHEN size > 1000000 THEN 'medium'
        ELSE 'small' END
        AS popsize_group
-- 1. Into table
INTO pop_plus
FROM populations
WHERE year = 2015;

-- 2. Select all columns of pop_plus
SELECT * FROM pop_plus

output

image

INSTRUCTION 1.3

  • Keep the first query intact that creates pop_plus using INTO.
  • Write a query to join countries_plus AS c on the left with pop_plus AS p on the right matching on the country code fields.
  • Sort the data based on geosize_group, in ascending order so that large appears on top.
  • Select the name, continent, geosize_group, and popsize_group fields.

CHAPTER 2 - OUTER JOINs

**LEFT and RIGHT JOINs

Congratulations on completing Chapter 1 on INNER JOINs. Welcome to Chapter 2 on OUTER JOINs! You can remember outer joins as reaching OUT to another table while keeping all of the records of the original table. Inner joins keep only the records IN both tables. You'll begin this chapter by exploring (1) LEFT JOINs, (2) RIGHT JOINs, and (3) FULL JOINs, which are the three types of OUTER JOINs. Let's begin by exploring how a LEFT JOIN differs from an INNER JOIN via a diagram.

image

image

image

The RIGHT JOIN is much less common than the LEFT JOIN so we won't spend as much time on it here.

image

LEFT JOIN EXAMPLES

Now you'll explore the differences between performing an inner join and a left join using the cities and countries tables.

You'll begin by performing an inner join with the cities table on the left and the countries table on the right. Remember to alias the name of the city field as city and the name of the country field as country.

INSTRUCTIONS 1.1

  • Fill in the code based on the instructions in the code comments to complete the inner join. Note how many records are in the result of the join in the query result tab.
-- Select the city name (with alias), the country code,
-- the country name (with alias), the region,
-- and the city proper population
SELECT c1.name AS country, code, c2.name AS city,
       region, city_proper_pop
-- From left table (with alias)
FROM countries AS c1
  -- Join to right table (with alias)
  INNER JOIN cities AS c2
    -- Match on country code
    ON c1.code = c2.country_code
-- Order by descending country code
ORDER BY code DESC;

image

INSTRUCTION 1.2

  • Change the code to perform a LEFT JOIN instead of an INNER JOIN. After executing this query, note how many records the query result contains.
SELECT c1.name AS city, code, c2.name AS country,
       region, city_proper_pop
FROM cities AS c1
  -- 1. Join right table (with alias)
  LEFT JOIN countries AS c2
    -- 2. Match on country code
    ON c1.country_code = c2.code
-- 3. Order by descending country code
ORDER BY code DESC;

OUTPUT

image

image

INSTRUCTION

  • Modify your code to calculate the average GDP per capita AS avg_gdp for each region in 2010.
  • Select the region and avg_gdp fields.
-- Select fields
SELECT region, AVG(gdp_percapita) AS avg_gdp
-- From countries (alias as c)
FROM countries AS c
  -- Left join with economies (alias as e)
  LEFT JOIN economies AS e
    -- Match on code field'''
    ON c.code = e.code
-- Focus on 2010
WHERE year = 2010
-- Group by region
GROUP BY region;

image

INSTRUCTION 1.2

  • Arrange this data on average GDP per capita for each region in 2010 from highest to lowest average GDP per capita.
-- Select fields
SELECT region, AVG(gdp_percapita) AS avg_gdp
-- From countries (alias as c)
FROM countries AS c
  -- Left join with economies (alias as e)
  LEFT JOIN economies AS e
    -- Match on code fields
    ON c.code = e.code
-- Focus on 2010
WHERE year = 2010
-- Group by region
GROUP BY region
-- Order by descending avg_gdp
ORDER BY avg_gdp DESC;

image

RIGHT JOINS

  1. INNER JOIN vs LEFT JOIN

Recall that an INNER JOIN keeps only the records that have matching key field values in both tables. A LEFT JOIN keeps all of the records in the left table while bringing in missing values for those key field values that don't appear in the right table.

image

  1. LEFT JOIN vs RIGHT JOIN

Now you can see the differences between a LEFT JOIN and a RIGHT JOIN. The id values of 2 and 3 in the left table do not match with the id values in the right table, so missing values are brought in for them in the LEFT JOIN. Likewise for the RIGHT JOIN, missing values are brought in for id values of 5 and 6.

image

  1. FULL JOIN diagram

Note the missing values here and that all six of the values of id are included in the table. You can also see from the SQL code to produce this FULL JOIN result that the general format aligns closely with the SQL syntax you've seen for both an INNER JOIN and a LEFT JOIN. You'll next explore an example from the leaders database.

image

image

image

image

FULL JOIN EXAMPLES

In this exercise, you'll examine how your results differ when using a full join versus using a left join versus using an inner join with the countries and currencies tables.

You will focus on the North American region and also where the name of the country is missing. Dig in to see what we mean!

Begin with a full join with countries on the left and currencies on the right. The fields of interest have been SELECTed for you throughout this exercise.

Then complete a similar left join and conclude with an inner join.

Countries table

image

Currencies table

image

INSTRUCTIONS 1.1

  • Choose records in which region corresponds to North America or is NULL.
SELECT name AS country, code, region, basic_unit
-- 3. From countries
FROM countries
  -- 4. Join to currencies
  FULL JOIN currencies
    -- 5. Match on code
    USING (code) -- É usado quando os nomes da coluna são iguais, como todos são code, então para facilitar uso USING code e não countries.code = currencies.code
-- 1. Where region is North America or null
WHERE region = 'North America' OR region IS null 
-- 2. Order by region
ORDER BY region;

INSTRUCTIONS 1.2

  • Repeat the same query as above but use a LEFT JOIN instead of a FULL JOIN. Note what has changed compared to the FULL JOIN result!
SELECT name AS country, code, region, basic_unit
-- 1. From countries
FROM countries
  -- 2. Join to currencies
  LEFT JOIN currencies
    -- 3. Match on code
    USING (code)
-- 4. Where region is North America or null
WHERE region = 'North America' OR region IS null
-- 5. Order by region
ORDER BY region;

image

INSTRUCTIONS 1.3

  • Repeat the same query as above but use an INNER JOIN instead of a FULL JOIN. Note what has changed compared to the FULL JOIN and LEFT JOIN results!
SELECT name AS country, code, region, basic_unit
FROM countries
  -- 1. Join to currencies
  INNER JOIN currencies
    USING (code)
-- 2. Where region is North America or null
WHERE region = 'North America' OR region IS NULL
-- 3. Order by region
ORDER BY region;

image

Have you kept an eye out on the different numbers of records these queries returned? The FULL JOIN query returned 17 rows, the OUTER JOIN returned 4 rows, and the INNER JOIN only returned 3 rows. Do these results make sense to you?

FULL JOIN EXAMPLES (2)

You'll now investigate a similar exercise to the last one, but this time focused on using a table with more records on the left than the right. You'll work with the languages and countries tables.

countries table

image

languages table

image

Begin with a full join with languages on the left and countries on the right. Appropriate fields have been selected for you again here.

INSTRUCTION 1.1

  • Choose records in which countries.name starts with the capital letter 'V' or is NULL.
  • Arrange by countries.name in ascending order to more clearly see the results.
SELECT countries.name, code, languages.name AS language
-- 3. From languages
FROM languages
  -- 4. Join to countries
   FULL JOIN countries
    -- 5. Match on code
    USING (code)
-- 1. Where countries.name starts with V or is null
WHERE countries.name LIKE 'V%' OR countries.name IS NULL
-- 2. Order by ascending countries.name
ORDER BY countries.name;

image

INSTRUCTION 1.2

  • Repeat the same query as above but use a left join instead of a full join. Note what has changed compared to the full join result!
SELECT countries.name, code, languages.name AS language
FROM languages
  -- 1. Join to countries
  LEFT JOIN countries
    -- 2. Match using code
    USING (code)
-- 3. Where countries.name starts with V or is null
WHERE countries.name LIKE 'V%' OR countries.name IS NULL
ORDER BY countries.name;

image

INSTRUCTIONS 1.3

  • Repeat once more, but use an inner join instead of a left join. Note what has changed compared to the full join and left join results.
SELECT countries.name, code, languages.name AS language
FROM languages
  -- 1. Join to countries
  INNER JOIN countries
    USING (code)
-- 2. Where countries.name starts with V or is null
WHERE countries.name LIKE 'V%' OR countries.name IS NULL
ORDER BY countries.name;

image

Well done. Again, make sure to compare the number of records the different types of joins return and try to verify whether the results make sense.

FULL JOIN (3)

You'll now explore using two consecutive full joins on the three tables you worked with in the previous two exercises.

INSTRUCTIONS

  • Complete a full join with countries on the left and languages on the right.
  • Next, full join this result with currencies on the right.
  • Use LIKE to choose the Melanesia and Micronesia regions (Hint: 'M%esia').
  • Select the fields corresponding to the country name AS country, region, language name AS language, and basic and fractional units of currency.
-- 7. Select fields (with aliases)
SELECT c1.name AS country, region, l.name AS language, basic_unit, frac_unit
-- 1. From countries (alias as c1)
FROM countries AS c1
  -- 2. Join with languages (alias as l)
  FULL JOIN languages AS l
    -- 3. Match on code
    USING (code)
  -- 4. Join with currencies (alias as c2)
  FULL JOIN currencies AS c2
    -- 5. Match on code
    USING (code)
-- 6. Where region like Melanesia and Micronesia
WHERE region LIKE 'M%esia';

image

CROSS JOIN

  1. CROSS JOIN diagram

In this diagram we have two tables named table1 and table2. Each table only has one field, both with the name of id. The result of the CROSS JOIN is all nine combinations of the id values of 1, 2, and 3 in table1 with the id values of A, B, and C for table2. Next you'll explore an example from the leaders database and look over the SQL syntax for a CROSS JOIN.

image

  1. Pairing prime ministers with presidents

Suppose that all prime ministers in North America and Oceania in the prime_ministers table are scheduled for individual meetings with all presidents in the presidents table. You can look at all of these combinations by using a CROSS JOIN. The syntax here remains similar to what you've seen earlier in the course. We use a WHERE clause to focus on only prime ministers in North America and Oceania in the prime_ministers table. The results of the query give us the pairings for the two prime ministers in North America and Oceania from the prime_ministers table with the seven presidents in the presidents table.

image

A table of two cities

This exercise looks to explore languages potentially and most frequently spoken in the cities of Hyderabad, India and Hyderabad, Pakistan.

You will begin with a cross join with cities AS c on the left and languages AS l on the right. Then you will modify the query using an inner join in the next tab.

INSTRUCTIONS 1.1

  • Create the cross join as described above. (Recall that cross joins do not use ON or USING.)
  • Make use of LIKE and Hyder% to choose Hyderabad in both countries.
  • Select only the city name AS city and language name AS language.
-- 4. Select fields
SELECT c.name AS city, l.name AS language
-- 1. From cities (alias as c)
FROM cities AS c        
  -- 2. Join to languages (alias as l)
  CROSS JOIN languages AS l 
-- 3. Where c.name like Hyderabad
WHERE c.name LIKE 'Hyder%';

image

INSTRUCTIONS 1.2

  • use an inner join instead of a cross join. Think about what the difference will be in the results for this inner join result and the one for the cross join.
-- 5. Select fields
SELECT c.name AS city, l.name AS language 
-- 1. From cities (alias as c)
FROM cities AS c     
  -- 2. Join to languages (alias as l)
  INNER JOIN languages AS l
    -- 3. Match on country code
    ON c.country_code = l.code
-- 4. Where c.name like Hyderabad
WHERE c.name LIKE 'Hyder%';

image

Good one! Can you see the difference between a CROSS JOIN and a INNER JOIN?

Outer challenge

Now that you're fully equipped to use outer joins, try a challenge problem to test your knowledge!

In terms of life expectancy for 2010, determine the names of the lowest five countries and their regions.

INSTRUCTIONS

  • Select country name AS country, region, and life expectancy AS life_exp.
  • Make sure to use LEFT JOIN, WHERE, ORDER BY, and LIMIT.
-- Select fields
SELECT c.name AS country, region,
life_expectancy AS life_exp
-- From countries (alias as c)
FROM countries AS c
  -- Join to populations (alias as p)
  LEFT JOIN populations AS p
    -- Match on country code
    ON c.code = p.country_code
-- Focus on 2010
WHERE year = 2010
-- Order by life_exp
ORDER BY life_exp
-- Limit to 5 records
LIMIT 5; 

image

STATE OF THE UNION

image

image

image

By contrast (with the same two tables left_one and right_one), UNION ALL includes all duplicates in its result. So left_one and right_one both having four records yields eight records for the result of the UNION ALL. If it were the case that right_one had these same four values and also one more value of 1 for id, you'd see three entries for 1 in the resulting UNION ALL. Let's check out the SQL syntax using the leaders database for both UNION and UNION ALL, but first you'll see one more table in the leaders database.

image

image

that UNION and UNION ALL clauses do not do the lookup step that JOINs do. They simply stack records on top of each other from one table to the next.

UNION EXAMPLES

Economies table

image

Economies2015 table

image

Economies2010 table

image

INSTRUCTION

  • Combine these two tables into one table containing all of the fields in economies2010. The economies table is also included for reference.
  • Sort this resulting single table by country code and then by year, both in ascending order.
-- Select fields from 2010 table
SELECT *
  -- From 2010 table
FROM economies2010
	-- Set theory clause
UNION
-- Select fields from 2015 table
SELECT *
  -- From 2015 table
FROM economies2015
-- Order by code and year
ORDER BY code, year;

image

UNION 2 EXAMPLE

INSTRUCTION

  • Determine all (non-duplicated) country codes in either the cities or the currencies table. The result should be a table with only one field called country_code.
  • Sort by country_code in alphabetical order.
-- Select field
SELECT DISTINCT country_code
  -- From cities
  FROM cities
	-- Set theory clause
	UNION
-- Select field
SELECT DISTINCT code
  -- From currencies
  FROM currencies
-- Order by country_code
order by country_code;

image

UNION ALL EXAMPLE

As you saw, duplicates were removed from the previous two exercises by using UNION.

To include duplicates, you can use UNION ALL.

INSTRUCTION

  • Determine all combinations (include duplicates) of country code and year that exist in either the economies or the populations tables. Order by code then year.
  • The result of the query should only have two columns/fields. Think about how many records this query should result in.
  • You'll use code very similar to this in your next exercise after the video. Make note of this code after completing it.
-- Select fields
SELECT code, year
  -- From economies
  FROM economies
	-- Set theory clause
	UNION ALL
-- Select fields
SELECT country_code, year
  -- From populations
  FROM populations
-- Order by code, year
ORDER BY code, year;

image

INTERSECT DIAGRAM AND SQL CODE

Vale lembrar que intersect precisa ter o SELECT igual na coluna.

image

image

Next, let's think about what would happen if we tried to select two columns instead of one from our previous example. The code shown does just that. What will be the result of this query? Will this also give you the names of the countries that have both a prime minister and a president? Hmmm [PAUSE] The actual result is an empty table. Why is that? When INTERSECT looks at two columns it includes both columns in the search. So it didn't find any countries with prime ministers AND presidents having the same name. INTERSECT looks for RECORDS in common, not individual key fields like what a join does to match. This is an important distinction.

image

INTERSECT EXAMPLE

  • Again, order by code and then by year, both in ascending order.
  • Note the number of records here (given at the bottom of query result) compared to the similar UNION ALL query result (814 records).
-- Select fields
SELECT code, year
  -- From economies
  FROM economies
	-- Set theory clause
	INTERSECT
-- Select fields
SELECT country_code, year
  -- From populations
  FROM populations
-- Order by code and year
ORDER BY code, year;

image

INTERSCT 2 EXAMPLE

As you think about major world cities and their corresponding country, you may ask which countries also have a city with the same name as their country name?

INSTRUCTIONS

Use INTERSECT to answer this question with countries and cities!

-- Select fields
SELECT name, code
  -- From countries
  FROM countries
	-- Set theory clause
	INTERSECT
-- Select fields
SELECT name, country_code
  -- From cities
FROM cities;

image

EXCEPTIONAL

image

image

  1. Monarchs that aren't prime ministers

You saw earlier that there are some monarchs that also act as the prime minister for their country. One way to determine those monarchs in the monarchs table that do not also hold the title of prime minister is to use the EXCEPT clause. [CLICK] This SQL query selects the monarch field from monarchs and then looks for common entries with the prime_ministers field, while also keeping track of the country for each leader. [CLICK] You can see in the resulting query that only the two European monarchs are not also prime ministers in the leaders database.

  1. EXCEPT diagram

image

This diagram gives the structure of EXCEPT clauses. Only the records that appear in the left table BUT DO NOT appear in the right table are included.

EXCEPT EXAMPLES

Get the names of cities in cities which are not noted as capital cities in countries as a single field result.

Note that there are some countries in the world that are not included in the countries table, which will result in some cities not being labeled as capital cities when in fact they are.

INSTRUCTIONS

cities table

image

countries table

image

  • Order the resulting field in ascending order.
  • Can you spot the city/cities that are actually capital cities which this query misses?
-- Select field
SELECT name
  -- From cities
  FROM cities
	-- Set theory clause
  EXCEPT
-- Select field
SELECT capital
  -- From countries
  FROM countries
-- Order by result
ORDER BY name;

image

EXCEPT 2

Now you will complete the previous query in reverse!

Determine the names of capital cities that are not listed in the cities table.

INSTRUCTIONS

  • Order by capital in ascending order.
  • The cities table contains information about 236 of the world's most populous cities. The result of your query may surprise you in terms of the number of capital cities that DO NOT appear in this list!
-- Select field
SELECT capital
  -- From countries
  FROM countries
	-- Set theory clause
	EXCEPT
-- Select field
SELECT name
  -- From cities
  FROM cities
-- Order by ascending capital
ORDER BY capital;

image

You'll now close this chapter by returning to joins. The six joins you've worked with so far are all additive joins in that they add columns to the original "left" table. Can you name all six? (1. INNER JOIN, 2. self-join, 3. LEFT JOIN, 4. RIGHT JOIN, 5. FULL JOIN, and 6. CROSS JOIN).

SEMI-JOINS AND ANTI-JOINS

SEMI-JOIN EXAMPLE

You are now going to use the concept of a semi-join to identify languages spoken in the Middle East.

languages table

image

countries table

image

INSTRUCTIONS 1.1

Flash back to our Intro to SQL for Data Science course and begin by selecting all country codes in the Middle East as a single field result using SELECT, FROM, and WHERE.

-- Select code
SELECT code
  -- From countries
  FROM countries
-- Where region is Middle East
WHERE region = 'Middle East';

INSTRUCTIONS 1.2

  • Comment out the answer to the previous tab by surrounding it in /* and */. You'll come back to it!
  • Below the commented code, select only unique languages by name appearing in the languages table.
  • Order the resulting single field table by name in ascending order.
/*SELECT code
  FROM countries
WHERE region = 'Middle East';*/

-- Select field
SELECT DISTINCT(name)
  -- From languages
  FROM languages
-- Order by name
ORDER BY name;

INSTRUCTIONS 1.3

Now combine the previous two queries into one query:

  • Add a WHERE IN statement to the SELECT DISTINCT query, and use the commented out query from the first instruction in there. That way, you can determine the unique languages spoken in the Middle East.

Carefully review this result and its code after completing it. It serves as a great example of subqueries, which are the focus of Chapter 4.

-- Select distinct fields
SELECT DISTINCT(name)
  -- From languages
  FROM languages
-- Where in statement
WHERE code IN
  -- Subquery
  (SELECT code
   FROM countries
   WHERE region = 'Middle East')
-- Order by name
ORDER BY name;

image

Diagnosing problems using anti-join

Another powerful join in SQL is the anti-join. It is particularly useful in identifying which records are causing an incorrect number of records to appear in join queries.

You will also see another example of a subquery here, as you saw in the first exercise on semi-joins. Your goal is to identify the currencies used in Oceanian countries!

INSTRUCTIONS

Begin by determining the number of countries in countries that are listed in Oceania using SELECT, FROM, and WHERE.

-- Select statement
SELECT count(code)
  -- From countries
  FROM countries
-- Where continent is Oceania
WHERE continent = 'Oceania';

image

INSTRUCTIONS 1.2

  • complete an inner join with countries AS c1 on the left and currencies AS c2 on the right to get the different currencies used in the countries of Oceania.
  • Match ON the code field in the two tables.
  • Include the country code, country name, and basic_unit AS currency.

Observe query result and make note of how many different countries are listed here.

-- 5. Select fields (with aliases)
SELECT c1.code, name, basic_unit AS currency
  -- 1. From countries (alias as c1)
  FROM countries as c1
  	-- 2. Join with currencies (alias as c2)
  	INNER JOIN currencies c2
    -- 3. Match on code
    ON c1.code = c2.code
-- 4. Where continent is Oceania
WHERE continent = 'Oceania';

image

INSTRUCTION 1.3

Note that not all countries in Oceania were listed in the resulting inner join with currencies. Use an anti-join to determine which countries were not included!

  • Use NOT IN and (SELECT code FROM currencies) as a subquery to get the country code and country name for the Oceanian countries that are not included in the currencies table.
-- 3. Select fields
SELECT code, name
  -- 4. From Countries
  FROM countries
  -- 5. Where continent is Oceania
  WHERE continent = 'Oceania'
  	-- 1. And code not in
  	AND code NOT IN
  	-- 2. Subquery
  	(SELECT code
  	 FROM currencies);

image

SET THEORY CHALLENGE

Congratulations! You've now made your way to the challenge problem for this third chapter. Your task here will be to incorporate two of UNION/UNION ALL/INTERSECT/EXCEPT to solve a challenge involving three tables.

In addition, you will use a subquery as you have in the last two exercises! This will be great practice as you hop into subqueries more in Chapter 4!

INSTRUCTIONS

  • dentify the country codes that are included in either economies or currencies but not in populations.
  • Use that result to determine the names of cities in the countries that match the specification in the previous instruction.
-- Select the city name
SELECT name
  -- Alias the table where city name resides
  FROM cities AS c1
  -- Choose only records matching the result of multiple set theory clauses
  WHERE country_code IN
(
    -- Select appropriate field from economies AS e
    SELECT e.code
    FROM economies AS e
    -- Get all additional (unique) values of the field from currencies AS c2  
    UNION
    SELECT c2.code
    FROM currencies AS c2
    -- Exclude those appearing in populations AS p
    EXCEPT
    SELECT p.country_code
    FROM populations AS p
);

image

Subqueries inside WHERE and SELECT clauses

You'll now try to figure out which countries had high average life expectancies (at the country level) in 2015.

INSTRUCTIONS 1.1

populations table

image

image

  • Begin by calculating the average life expectancy across all countries for 2015.
-- Select average life_expectancy
SELECT avg(life_expectancy)
  -- From populations
  FROM populations
-- Where year is 2015
WHERE year = 2015

image

INSTRUCTION 1.2

image

-- Select fields
SELECT *
  -- From populations
  FROM populations
-- Where life_expectancy is greater than
WHERE life_expectancy > 
  -- 1.15 * subquery
  1.15 *
  (SELECT avg(life_expectancy)
  -- From populations
  FROM populations
-- Where year is 2015
WHERE year = 2015 )
AND year = 2015;

image

Subquery inside where (2)

Use your knowledge of subqueries in WHERE to get the urban area population for only capital cities.

INSTRUCTIONS

  • Make use of the capital field in the countries table in your subquery.
  • Select the city name, country code, and urban area population fields.
-- 2. Select fields
SELECT cities.name, country_code, urbanarea_pop
  -- 3. From cities
  FROM cities
-- 4. Where city name in the field of capital cities
WHERE name IN
  -- 1. Subquery
  (SELECT capital
   FROM countries)
ORDER BY urbanarea_pop DESC;

image

Subquery inside select

In this exercise, you'll see how some queries can be written using either a join or a subquery.

You have seen previously how to use GROUP BY with aggregate functions and an inner join to get summarized information from multiple tables.

The code given in query.sql selects the top nine countries in terms of number of cities appearing in the cities table. Recall that this corresponds to the most populous cities in the world. Your task will be to convert the commented out code to get the same result as the code shown.

INSTRUCTIONS 1.1

Just Submit Answer here!

SELECT countries.name AS country, COUNT(*) AS cities_num
  FROM cities
    INNER JOIN countries
    ON countries.code = cities.country_code
GROUP BY country
ORDER BY cities_num DESC, country
LIMIT 9;

/* 
SELECT ___ AS ___,
  (SELECT ___
   FROM ___
   WHERE countries.code = cities.country_code) AS cities_num
FROM ___
ORDER BY ___ ___, ___
LIMIT 9;
*/

image

INSTRUCTIONS 1.2

  • Remove the comments around the second query and comment out the first query instead.
  • Convert the GROUP BY code to use a subquery inside of SELECT, i.e. fill in the blanks to get a result that matches the one given using the GROUP BY code in the first query.
  • Again, sort the result by cities_num descending and then by country ascending.
/*SELECT countries.name AS country, COUNT(*) AS cities_num
  FROM cities
    INNER JOIN countries
    ON countries.code = cities.country_code
GROUP BY country
ORDER BY cities_num DESC, country
LIMIT 9;
*/

SELECT  countries.name AS country,
  (SELECT COUNT(*)
   FROM cities
   WHERE countries.code = cities.country_code) AS cities_num
FROM countries
ORDER BY cities_num DESC, country
LIMIT 9;

image

Subquery inside FROM clause

The last type of subquery you will work with is one inside of FROM.

You will use this to determine the number of languages spoken for each country, identified by the country's local name! (Note this may be different than the name field and is stored in the local_name field.)

INSTRUCTIONS 1.1

  • Begin by determining for each country code how many languages are listed in the languages table using SELECT, FROM, and GROUP BY.
  • Alias the aggregated field as lang_num.
-- Select fields (with aliases)
SELECT code, COUNT(name) AS lang_num
  -- From languages
  FROM languages
-- Group by code
GROUP BY code;

image

INSTRUCTIONS 1.2

  • Include the previous query (aliased as subquery) as a subquery in the FROM clause of a new query.
  • Select the local name of the country from countries.
  • Also, select lang_num from subquery.
  • Make sure to use WHERE appropriately to match code in countries and in subquery.
  • Sort by lang_num in descending order.
SELECT local_name, subquery.lang_num
  FROM countries,
  	(SELECT code, COUNT(*) AS lang_num
  	 FROM languages
  	 GROUP BY code) AS subquery
  WHERE countries.code = subquery.code
ORDER BY lang_num DESC;

image

ADVANCED SUBQUERY

You can also nest multiple subqueries to answer even more specific questions.

In this exercise, for each of the six continents listed in 2015, you'll identify which country had the maximum inflation rate (and how high it was) using multiple subqueries. The table result of your query in Task 3 should look something like the following, where anything between < > will be filled in with appropriate values:

image

Again, there are multiple ways to get to this solution using only joins, but the focus here is on showing you an introduction into advanced subqueries.

INSTRUCTIONS 1.1

  • Now it's time to append the second part's query to the first part's query using AND and IN to obtain the name of the country, its continent, and the maximum inflation rate for each continent in 2015!
  • For the sake of practice, change all joining conditions to use ON instead of USING (based upon the same column, code).
  • Revisit the sample output in the assignment text at the beginning of the exercise to see how this matches up.
-- Select fields
SELECT name, continent, inflation_rate
  -- From countries
  FROM countries
  	-- Join to economies
  	INNER JOIN economies
    -- Match on code
    ON countries.code = economies.code
-- Where year is 2015
WHERE year = 2015;

image

INSTRUCTIONS 1.2

Select the maximum inflation rate in 2015 AS max_inf grouped by continent using the previous step's query as a subquery in the FROM clause.

Thus, in your subquery you should: + Create an inner join with countries on the left and economies on the right with USING (without aliasing your tables or columns). + Retrieve the country name, continent, and inflation rate for 2015. + Alias the subquery as subquery.

This will result in the six maximum inflation rates in 2015 for the six continents as one field table. Make sure to not include continent in the outer SELECT statement.

-- Select the maximum inflation rate as max_inf
SELECT MAX(inflation_rate) AS max_inf
  -- Subquery using FROM (alias as subquery)
  FROM ( SELECT name, continent, inflation_rate
      FROM countries
      INNER JOIN economies
      USING(code)
      WHERE year = 2015
      ) AS subquery
-- Group by continent
GROUP BY continent;

image

INSTRUCTIONS 1.3

  • Now it's time to append your second query to your first query using AND and IN to obtain the name of the country, its continent, and the maximum inflation rate for each continent in 2015.
  • For the sake of practice, change all joining conditions to use ON instead of USING.
-- Select fields
SELECT name, continent, inflation_rate
  -- From countries
  FROM countries
	-- Join to economies
	INNER JOIN economies
	-- Match on code
	ON countries.code = economies.code
  -- Where year is 2015
  WHERE year = 2015
    -- And inflation rate in subquery (alias as subquery)
    AND inflation_rate IN (
        SELECT MAX(inflation_rate) AS max_inf
        FROM (
             SELECT name, continent, inflation_rate
             FROM countries
             INNER JOIN economies
             ON economies.code = countries.code
             WHERE year = 2015) AS subquery
      -- Group by continent
        GROUP BY continent);

image

INSTRUCTIONS 1.3

  • Select unique country names. Also select the total investment and imports fields.
  • Use a left join with countries on the left. (An inner join would also work, but please use a left join here.)
  • Match on code in the two tables AND use a subquery inside of ON to choose the appropriate languages records.
  • Order by country name ascending.
  • Use table aliasing but not field aliasing in this exercise.
-- Select fields
SELECT DISTINCT c.name, total_investment, imports
  -- From table (with alias)
  FROM countries AS c
    -- Join with table (with alias)
    LEFT JOIN economies AS e
      -- Match on code
      ON (c.code = e.code
      -- and code in Subquery
        AND c.code IN (
          SELECT l.code
          FROM languages AS l
          WHERE official = 'True'
        ) )
  -- Where region and year are correct
  WHERE region = 'Central America'  AND year = 2015
-- Order by field
ORDER BY c.name;

image

Wow! Well done! This code works since each of the six maximum inflation rate values occur only once in the 2015 data. Think about whether this particular code involving subqueries would work in cases where there are ties for the maximum inflation rate values.

SUBQUERY CHALLENGE

Let's test your understanding of the subqueries with a challenge problem! Use a subquery to get 2015 economic data for countries that do not have

  • gov_form of 'Constitutional Monarchy' or
  • 'Republic' in their gov_form.

Here, gov_form stands for the form of the government for each country. Review the different entries for gov_form in the countries table.

INSTRUCTIONS

  • Select the country code, inflation rate, and unemployment rate.
  • Order by inflation rate ascending.
  • Do not use table aliasing in this exercise.
-- Select fields
SELECT code, inflation_rate, unemployment_rate
  -- From economies
  FROM economies
  -- Where year is 2015 and code is not in
  WHERE year = 2015 AND code NOT IN
  	-- Subquery
  	(SELECT code
  	 FROM countries
  	 WHERE (gov_form = 'Constitutional Monarchy' OR gov_form LIKE '%Republic%'))
-- Order by inflation rate
ORDER BY inflation_rate;

image

FINAL CHALLENGE

Welcome to the end of the course! The next three exercises will test your knowledge of the content covered in this course and apply many of the ideas you've seen to difficult problems. Good luck!

Read carefully over the instructions and solve them step-by-step, thinking about how the different clauses work together.

In this exercise, you'll need to get the country names and other 2015 data in the economies table and the countries table for Central American countries with an official language.

INSTRUCTIONS

  • Select unique country names. Also select the total investment and imports fields.
  • Use a left join with countries on the left. (An inner join would also work, but please use a left join here.)
  • Match on code in the two tables AND use a subquery inside of ON to choose the appropriate languages records.
  • Order by country name ascending.
  • Use table aliasing but not field aliasing in this exercise.

economies table

image

countries table

image

languages table

image

-- Select fields
SELECT DISTINCT c.name, total_investment, imports
  -- From table (with alias)
  FROM countries AS c
    -- Join with table (with alias)
    LEFT JOIN economies AS e
      -- Match on code
      ON (c.code = e.code
      -- and code in Subquery
        AND c.code IN (
          SELECT l.code
          FROM languages AS l
          WHERE official = 'True'
        ) )
  -- Where region and year are correct
  WHERE region = 'Central America'  AND year = 2015
-- Order by field
ORDER BY c.name;

image

Final Challenge 2

Whoofta! That was challenging, huh?

Let's ease up a bit and calculate the average fertility rate for each region in 2015.

INSTRUCTIONS

  • Include the name of region, its continent, and average fertility rate aliased as avg_fert_rate.
  • Sort based on avg_fert_rate ascending.
  • Remember that you'll need to GROUP BY all fields that aren't included in the aggregate function of SELECT.
-- Select fields
SELECT c.region, c.continent, AVG(fertility_rate) AS avg_fert_rate
  -- From left table
  FROM populations AS p
    -- Join to right table
    INNER JOIN  countries AS c
      -- Match on join condition
      ON c.code = p.country_code
  -- Where specific records matching some condition
  WHERE year = 2015
-- Group appropriately
GROUP BY region, continent
-- Order appropriately
ORDER BY avg_fert_rate;

image

FINAL CHALLENGE 3

Welcome to the last challenge problem. By now you're a query warrior! Remember that these challenges are designed to take you to the limit to solidify your SQL knowledge! Take a deep breath and solve this step-by-step.

You are now tasked with determining the top 10 capital cities in Europe and the Americas in terms of a calculated percentage using city_proper_pop and metroarea_pop in cities.

Do not use table aliasing in this exercise.

INSTRUCTIONS

  • Select the city name, country code, city proper population, and metro area population.
  • Calculate the percentage of metro area population composed of city proper population for each city in cities, aliased as city_perc.
  • Focus only on capital cities in Europe and the Americas in a subquery.
  • Make sure to exclude records with missing data on metro area population.
  • Order the result by city_perc descending.
  • Then determine the top 10 capital cities in Europe and the Americas in terms of this city_perc percentage.
-- Select fields
SELECT name, country_code, city_proper_pop, metroarea_pop,
	  -- Calculate city_perc
      city_proper_pop / metroarea_pop * 100 AS city_perc
  -- From appropriate table    
  FROM cities
  -- Where
  WHERE name IN
    -- Subquery
    (SELECT capital
     FROM countries
     WHERE (continent = 'Europe'
        OR continent LIKE '%America'))
       AND metroarea_pop IS NOT NULL
-- Order appropriately
ORDER BY city_perc DESC
-- Limit amount
LIMIT 10;

image