30 changes: 25 additions & 5 deletions 02_activities/assignments/DC_Cohort/Assignment2.md
@@ -14,9 +14,9 @@
* Open a private window in your browser. Copy and paste the link to your pull request into the address bar. Make sure you can see your pull request properly. This helps the technical facilitator and learning support staff review your submission easily.

Checklist:
- [ ] Create a branch called `assignment-two`.
- [ ] Ensure that the repository is public.
- [ ] Review [the PR description guidelines](https://github.com/UofT-DSI/onboarding/blob/main/onboarding_documents/submissions.md#guidelines-for-pull-request-descriptions) and adhere to them.
- [X] Create a branch called `assignment-two`.
- [X] Ensure that the repository is public.
- [X] Review [the PR description guidelines](https://github.com/UofT-DSI/onboarding/blob/main/onboarding_documents/submissions.md#guidelines-for-pull-request-descriptions) and adhere to them.
- [ ] Verify that the link is accessible in a private browser window.

If you encounter any difficulties or have questions, please don't hesitate to reach out to our team via our Slack. Our Technical Facilitators and Learning Support staff are here to help you navigate any challenges.
@@ -55,7 +55,27 @@ The store wants to keep customer addresses. Propose two architectures for the CU

```
Your answer...
```
Overwrite is Type 1: each change overwrites the old values. If the bookstore only ever needs the current address, the CUSTOMER_ADDRESS table will have the columns below:

(customer_id,
address_line1,
address_line2,
city,
state,
postal_code,
country)

Retaining changes is Type 2: the table keeps historical versions of each address. In that case the CUSTOMER_ADDRESS table will have the columns below:
(customer_id,
address_line1,
address_line2,
city,
state,
postal_code,
country,
effective_from,
effective_to,
effective_date,
changed_by)
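
The two architectures above could be sketched as table definitions. This is only an illustration in SQLite syntax: the table names `customer_address_type1` and `customer_address_type2`, the surrogate key `customer_address_id`, the column types, and the sample values are all assumptions, not part of the assignment. The sketch also leaves out `effective_date` on the assumption that `effective_from` already records when a version starts.

```sql
-- Type 1 (overwrite): one row per customer, updated in place.
CREATE TABLE customer_address_type1 (
    customer_id   INTEGER PRIMARY KEY,
    address_line1 TEXT NOT NULL,
    address_line2 TEXT,
    city          TEXT NOT NULL,
    state         TEXT,
    postal_code   TEXT,
    country       TEXT NOT NULL
);

-- Type 2 (retain history): one row per address version.
CREATE TABLE customer_address_type2 (
    customer_address_id INTEGER PRIMARY KEY,  -- hypothetical surrogate key
    customer_id    INTEGER NOT NULL,
    address_line1  TEXT NOT NULL,
    address_line2  TEXT,
    city           TEXT NOT NULL,
    state          TEXT,
    postal_code    TEXT,
    country        TEXT NOT NULL,
    effective_from TEXT NOT NULL,  -- ISO-8601 date the version became active
    effective_to   TEXT,           -- NULL marks the current address
    changed_by     TEXT
);

-- Recording a move under Type 2 (illustrative values only):
-- first close the current version, then insert the new one.
UPDATE customer_address_type2
SET effective_to = '2024-06-01'
WHERE customer_id = 1
  AND effective_to IS NULL;

INSERT INTO customer_address_type2
    (customer_id, address_line1, city, country, effective_from, changed_by)
VALUES
    (1, '22 New Street', 'Toronto', 'Canada', '2024-06-01', 'store_clerk');
```

Under Type 1 the same move would be a single `UPDATE` of the customer's row, which is simpler but discards the previous address.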

***

@@ -184,4 +204,4 @@ Consider, for example, concepts of labour, bias, LLM proliferation, moderating c

```
Your thoughts...
```
The article challenges the myth that “the model did it” when in fact underpaid people did: many of the systems we treat as automated are underpinned by human work. The story implies that automated systems make those workers invisible by shifting their labour behind the scenes, and that their labour conditions are unethical. From another point of view, there is a risk of human bias, because humans decided which data to include and which to exclude. As a result, people can inadvertently introduce their own biases when collecting data for neural nets or large language models. To remedy this, we need to value the human labour behind the process that gives us vast access to computers that can understand the dimensions of a hot dog, and to use these technologies responsibly, with awareness of the vulnerable people who make them possible.
230 changes: 230 additions & 0 deletions 02_activities/assignments/DC_Cohort/assignment_two.sql
@@ -0,0 +1,230 @@
-- Write SQL
-- COALESCE
-- Our favourite manager wants a detailed long list of products, but is afraid of tables! We tell them, no problem!
-- We can produce a list with all of the appropriate details.
-- Using the following syntax you create our super cool and not at all needy manager a list:
SELECT
product_name || ', ' || product_size || ' (' || product_qty_type || ')'
FROM product;
-- But wait! The product table has some bad data (a few NULL values).
-- Find the NULLs and then, using COALESCE, replace the NULL with a blank for the first column with nulls,
-- and 'unit' for the second column with nulls.

SELECT
product_name
|| ', '
|| COALESCE(product_size, '') -- first NULL -> blank
|| ' ('
|| COALESCE(product_qty_type, 'unit') -- second NULL -> 'unit'
|| ')' AS product_display
FROM product;

-- Windowed Functions
-- 1) build distinct visits per customer then number them
WITH visits AS (
SELECT DISTINCT customer_id, market_date
FROM customer_purchases
)
SELECT
customer_id,
market_date,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date) AS visit_number
FROM visits
ORDER BY customer_id, market_date;

-- 2) number visits in reverse so that each customer's most recent visit is 1
SELECT
customer_id,
market_date,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date DESC) AS rn_desc
FROM (
SELECT DISTINCT customer_id, market_date
FROM customer_purchases
) AS distinct_visits
ORDER BY customer_id, market_date DESC;

-- 3) keep only each customer's most recent visit
WITH numbered_visits AS (
SELECT
customer_id,
market_date,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date DESC) AS rn_desc
FROM (
SELECT DISTINCT customer_id, market_date
FROM customer_purchases
)
)
SELECT *
FROM numbered_visits
WHERE rn_desc = 1
ORDER BY customer_id;


-- Using a COUNT() window function, include a value along with each row of the customer_purchases
-- table that indicates how many different times that customer has purchased that product_id.

WITH distinct_customer_product_dates AS (
SELECT
customer_id,
product_id,
market_date
FROM customer_purchases
GROUP BY customer_id, product_id, market_date
),


counts AS (
SELECT
customer_id,
product_id,
COUNT(*) AS times_purchased_distinct_dates
FROM distinct_customer_product_dates
GROUP BY customer_id, product_id
)


SELECT
cp.*,
c.times_purchased_distinct_dates
FROM customer_purchases cp
LEFT JOIN counts c
ON cp.customer_id = c.customer_id
AND cp.product_id = c.product_id
ORDER BY cp.customer_id, cp.product_id, cp.market_date;
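
-- A possible window-function version of the query above, since the prompt asks
-- for COUNT() as a window function. This is a sketch, not the submitted answer.
-- Assumption: every purchase row counts once, so two purchases of the same
-- product on the same market_date count twice, whereas the CTE version above
-- counts distinct dates. The alias times_purchased is hypothetical.
SELECT
    cp.*,
    COUNT(*) OVER (
        PARTITION BY cp.customer_id, cp.product_id
    ) AS times_purchased
FROM customer_purchases cp
ORDER BY cp.customer_id, cp.product_id, cp.market_date;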


-- String manipulations
SELECT
product_name,
CASE
WHEN INSTR(product_name, '-') > 0 THEN
TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1))
ELSE
NULL
END AS description_after_hyphen
FROM product
WHERE INSTR(product_name, '-') > 0; -- optionally filter only rows that have a hyphen

-- Filter the query to show any product_size value that contains a number, using REGEXP.
SELECT *
FROM product
WHERE product_size REGEXP '[0-9]';
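
-- A possible alternative in case REGEXP is unavailable. In SQLite, REGEXP only
-- works when a regexp() function has been registered (some tools bundle one).
-- GLOB is built in and supports character classes, so this sketch should match
-- the same rows, assuming the database is SQLite.
SELECT *
FROM product
WHERE product_size GLOB '*[0-9]*';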

-- UNION
-- Using a UNION, write a query that displays the market dates with the highest and lowest total sales.

WITH totals AS (
SELECT
market_date,
SUM(quantity * cost_to_customer_per_qty) AS total_sales -- total sales = quantity x price per unit
FROM customer_purchases
GROUP BY market_date
),
ranked AS (
SELECT
market_date,
total_sales,
RANK() OVER (ORDER BY total_sales DESC) AS rank_desc, -- 1 = highest
RANK() OVER (ORDER BY total_sales ASC) AS rank_asc -- 1 = lowest
FROM totals
)
-- pick highest total_sales days (rank_desc = 1) union lowest (rank_asc = 1)
SELECT 'best_day' AS which, market_date, total_sales
FROM ranked
WHERE rank_desc = 1

UNION

SELECT 'worst_day' AS which, market_date, total_sales
FROM ranked
WHERE rank_asc = 1
ORDER BY which;


-- Section 3:

-- Cross Join

WITH vp AS (
SELECT
v.vendor_id,
v.vendor_name,
p.product_id,
p.product_name,
vi.original_price -- original_price is a vendor_inventory column, not a vendor column
FROM vendor_inventory vi
JOIN vendor v ON vi.vendor_id = v.vendor_id
JOIN product p ON vi.product_id = p.product_id
GROUP BY v.vendor_id, v.vendor_name, p.product_id, p.product_name, vi.original_price
),
cust AS (
SELECT customer_id FROM customer
)
-- Cross join vp with every customer and sum 5 * price for each cross row
SELECT
vp.vendor_name,
vp.product_name,
SUM(5 * vp.original_price) AS projected_revenue -- each cross-row contributes 5*price

Number 5 is abstract number

FROM vp
CROSS JOIN cust
GROUP BY vp.vendor_id, vp.product_id, vp.vendor_name, vp.product_name
ORDER BY vp.vendor_name, vp.product_name;

-- INSERT
-- Create a new table "product_units".
DROP TABLE IF EXISTS product_units;

CREATE TABLE product_units AS
SELECT
p.*,
CURRENT_TIMESTAMP AS snapshot_timestamp
FROM product p
WHERE product_qty_type = 'unit';


PRAGMA table_info(product_units);
SELECT * FROM product_units LIMIT 10;

-- Using INSERT, add a new row to the product_units table.
INSERT INTO product_units
SELECT
p.*,
CURRENT_TIMESTAMP
FROM product p
WHERE p.product_name = 'Apple Pie' -- change product name if needed
LIMIT 1;

-- DELETE
DELETE FROM product_units
WHERE product_name = 'Apple Pie'
AND snapshot_timestamp < (
SELECT MAX(snapshot_timestamp)
FROM product_units pu2
WHERE pu2.product_name = product_units.product_name
);

-- UPDATE
ALTER TABLE product_units
ADD current_quantity INT;


SELECT
product_id,
quantity,
market_date
FROM vendor_inventory
WHERE (product_id, market_date) IN (
SELECT product_id, MAX(market_date)
FROM vendor_inventory
GROUP BY product_id
);
UPDATE product_units
SET current_quantity = COALESCE(
(
SELECT vi.quantity
FROM vendor_inventory vi
WHERE vi.product_id = product_units.product_id
ORDER BY vi.market_date DESC
LIMIT 1
),
0
);
