Postclassassignment.txt
25/08/2023
Apply dynamic data masking on a table.
Create a new user in SQL and view the masked data in the SQL table.
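A minimal sketch of one way to do this from Python with pyodbc, assuming the AdventureWorksLT sample database (SalesLT.Customer) and a placeholder connection string; the masking functions are the built-in T-SQL dynamic data masking ones:

import pyodbc

# Placeholder connection string -- substitute your own server, database and login.
CONN_STR = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<your-server>.database.windows.net;DATABASE=<your-db>;"
    "UID=<admin-user>;PWD=<password>"
)

statements = [
    # Mask the e-mail and phone columns with built-in masking functions.
    "ALTER TABLE SalesLT.Customer ALTER COLUMN EmailAddress "
    "ADD MASKED WITH (FUNCTION = 'email()')",
    "ALTER TABLE SalesLT.Customer ALTER COLUMN Phone "
    "ADD MASKED WITH (FUNCTION = 'default()')",
    # A new user without the UNMASK permission sees only masked values.
    "CREATE USER MaskedReader WITHOUT LOGIN",
    "GRANT SELECT ON SalesLT.Customer TO MaskedReader",
]

conn = pyodbc.connect(CONN_STR, autocommit=True)
cur = conn.cursor()
for stmt in statements:
    cur.execute(stmt)

# Impersonate the new user and read the table to confirm the data is masked.
cur.execute("EXECUTE AS USER = 'MaskedReader'")
for row in cur.execute("SELECT TOP 5 FirstName, EmailAddress, Phone FROM SalesLT.Customer"):
    print(row)
cur.execute("REVERT")
conn.close()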
https://www.programiz.com/
https://regex101.com/
SQL:
Write a query for customer full name, address, phone, email, and company.
Write a query to show how many customers are in the same company.
Write a query to show how many customers are in the same country.
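Possible answers, kept as T-SQL strings (a sketch assuming the AdventureWorksLT SalesLT schema: Customer, CustomerAddress, Address); they can be executed with the same pyodbc connection pattern as above:

# Execute with e.g. pandas.read_sql(CUSTOMER_CONTACT_SQL, conn) or cur.execute(...).
CUSTOMER_CONTACT_SQL = """
SELECT c.FirstName + ' ' + ISNULL(c.MiddleName + ' ', '') + c.LastName AS FullName,
       a.AddressLine1, a.City, a.CountryRegion,
       c.Phone, c.EmailAddress, c.CompanyName
FROM SalesLT.Customer AS c
JOIN SalesLT.CustomerAddress AS ca ON ca.CustomerID = c.CustomerID
JOIN SalesLT.Address AS a ON a.AddressID = ca.AddressID
"""

CUSTOMERS_PER_COMPANY_SQL = """
SELECT CompanyName, COUNT(*) AS CustomerCount
FROM SalesLT.Customer
GROUP BY CompanyName
ORDER BY CustomerCount DESC
"""

CUSTOMERS_PER_COUNTRY_SQL = """
SELECT a.CountryRegion, COUNT(DISTINCT c.CustomerID) AS CustomerCount
FROM SalesLT.Customer AS c
JOIN SalesLT.CustomerAddress AS ca ON ca.CustomerID = c.CustomerID
JOIN SalesLT.Address AS a ON a.AddressID = ca.AddressID
GROUP BY a.CountryRegion
ORDER BY CustomerCount DESC
"""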
28/08/2023
SELECT * FROM [SalesLT].[SalesOrderHeader]
SELECT * FROM [SalesLT].[SalesOrderDetail]
SELECT * FROM [SalesLT].[Product]
Write a SQL query, by product ID and product name, to identify the difference between the due date and the ship date.
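One possible query, again as a T-SQL string (a sketch; in AdventureWorksLT the DueDate and ShipDate columns live on SalesOrderHeader, so the detail and product tables are joined in):

DUE_VS_SHIP_SQL = """
SELECT p.ProductID, p.Name AS ProductName,
       DATEDIFF(day, h.DueDate, h.ShipDate) AS DueToShipDays
FROM SalesLT.SalesOrderHeader AS h
JOIN SalesLT.SalesOrderDetail AS d ON d.SalesOrderID = h.SalesOrderID
JOIN SalesLT.Product AS p ON p.ProductID = d.ProductID
"""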
29/08/2023
ADF -> put a ZIP file into Blob Storage.
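In class this is an ADF copy activity; as a scripted alternative for getting the ZIP into Blob Storage, a small sketch with the azure-storage-blob SDK might look like this (connection string, container and blob path are placeholders):

from azure.storage.blob import BlobServiceClient

# Placeholder connection string and names -- adjust to your storage account.
service = BlobServiceClient.from_connection_string("<storage-connection-string>")
blob = service.get_blob_client(container="raw", blob="YourName/source.zip")

with open("source.zip", "rb") as data:        # local ZIP to upload
    blob.upload_blob(data, overwrite=True)    # lands in the 'raw' container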
31/08/2023
https://www.microsoft.com/en-us/download/details.aspx?id=39717
ZIP -> unzip (3 CSV files -> ADLS/Raw/YourName) -> 1 Parquet file (ADLS/refined/YourName)
ADLS (person.csv or CustomerSource.csv) -> ADF -> SQL table (dbo.person or dbo.CustomerSource)
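The unzip and copy steps are configured in ADF; the CSV-to-Parquet consolidation can also be sketched in PySpark from a notebook that already has access to ADLS (paths and storage account name are placeholders, and the three CSV files are assumed to share a schema):

# Read the extracted CSV files from the Raw zone, write one Parquet output to refined.
raw_path = "abfss://raw@<storageaccount>.dfs.core.windows.net/YourName/*.csv"
refined_path = "abfss://refined@<storageaccount>.dfs.core.windows.net/YourName/"

df = (spark.read
      .option("header", True)
      .option("inferSchema", True)
      .csv(raw_path))

df.coalesce(1).write.mode("overwrite").parquet(refined_path)   # single Parquet file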
SQL ->
1. Write a SQL query to display the top 10 sold products with product name, product ID, and quantity.
2. Replace product size S -> 30, M -> 32, L -> 34, XL -> 36, XXL -> 38 to display available products grouped by size (query sketches follow the reference selects below).
Reference:
SELECT * FROM [SalesLT].[Address]
SELECT * FROM [SalesLT].[Product]
SELECT * FROM [SalesLT].[SalesOrderDetail]
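Possible sketches for tasks 1 and 2 above, as T-SQL strings (AdventureWorksLT schema assumed; the numeric size mapping follows the replacement given in the task):

TOP_10_SOLD_SQL = """
SELECT TOP 10 p.ProductID, p.Name AS ProductName, SUM(d.OrderQty) AS TotalQuantity
FROM SalesLT.SalesOrderDetail AS d
JOIN SalesLT.Product AS p ON p.ProductID = d.ProductID
GROUP BY p.ProductID, p.Name
ORDER BY TotalQuantity DESC
"""

PRODUCTS_BY_NUMERIC_SIZE_SQL = """
SELECT CASE Size WHEN 'S'   THEN '30'
                 WHEN 'M'   THEN '32'
                 WHEN 'L'   THEN '34'
                 WHEN 'XL'  THEN '36'
                 WHEN 'XXL' THEN '38'
                 ELSE Size END AS NumericSize,
       COUNT(*) AS AvailableProducts
FROM SalesLT.Product
GROUP BY CASE Size WHEN 'S'   THEN '30'
                   WHEN 'M'   THEN '32'
                   WHEN 'L'   THEN '34'
                   WHEN 'XL'  THEN '36'
                   WHEN 'XXL' THEN '38'
                   ELSE Size END
"""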
Python:
Take a country as input and display the corresponding acronym (see the list at the link below).
https://travel.state.gov/content/travel/en/us-visas/visa-information-resources/fees/country-acronyms.html
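A minimal sketch; rather than hard-coding the codes, it assumes a small two-column CSV file (country, acronym) that you prepare from the linked travel.state.gov page, with the file name as a placeholder:

import csv

# Placeholder file prepared from the linked country-acronym list.
with open("country_acronyms.csv", newline="") as f:
    acronyms = {country.strip().upper(): code.strip()
                for country, code in csv.reader(f)}

name = input("Enter a country name: ").strip().upper()
print(acronyms.get(name, "No acronym found for that country"))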
05/09/2023
Transform the CustomerSource data with new columns for salesperson, timestamp, and gender.
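A PySpark sketch of one way to add the three columns, e.g. in the Databricks notebook used in the later sessions (the input path, the salesperson value, and the Title-based gender rule are assumptions; adjust to the actual CustomerSource schema):

from pyspark.sql import functions as F

# Placeholder path to the CustomerSource file in the Raw zone.
src_path = "abfss://raw@<storageaccount>.dfs.core.windows.net/YourName/CustomerSource.csv"

customers = spark.read.option("header", True).csv(src_path)

transformed = (customers
    .withColumn("salesperson", F.lit("unassigned"))      # placeholder value
    .withColumn("timestamp", F.current_timestamp())      # load timestamp
    .withColumn("Gender",                                 # assumed rule: derive from a Title column
                F.when(F.col("Title") == "Mr.", "M")
                 .when(F.col("Title").isin("Ms.", "Mrs."), "F")
                 .otherwise("Unknown")))

display(transformed)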
06/09/2023
Remove underscores, convert the column names to camel case, and split the datetime into separate date and time columns (see the sketch after the sample query).
df = spark.sql("SELECT * FROM `samples`.`nyctaxi`.`trips`")
display(df)
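A sketch of that clean-up on the sample trips table (column names come from the Databricks samples.nyctaxi.trips dataset; the camel-case helper is a small assumption about the naming wanted):

from pyspark.sql import functions as F

def to_camel_case(name):
    """tpep_pickup_datetime -> tpepPickupDatetime (underscores removed)."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)

trips = spark.sql("SELECT * FROM `samples`.`nyctaxi`.`trips`")

# Rename every column: drop underscores and use camel case.
for old in trips.columns:
    trips = trips.withColumnRenamed(old, to_camel_case(old))

# Split the pickup datetime into separate date and time columns.
trips = (trips
         .withColumn("pickupDate", F.to_date("tpepPickupDatetime"))
         .withColumn("pickupTime", F.date_format("tpepPickupDatetime", "HH:mm:ss")))

display(trips)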
07/09/2023
# Completed version of the exercise skeleton; the parquet path and the column
# name are placeholder assumptions for the course dataset.
parquetPath = "<path-to-parquet-files>"   # placeholder path

df = (spark                    # Our SparkSession & entry point
      .read                    # Our DataFrameReader
      .parquet(parquetPath)    # Read in the parquet files
      .select("article")       # Reduce the columns to just the one (name assumed from totalArticles)
      .distinct()              # Produce a unique set of values
     )
totalArticles = df.count()     # Count the distinct values
Timestamp -> InsertTime, datatype -> timestamp
Remove underscores from the phone column
firstname -> uppercase
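A PySpark sketch of those three changes, continuing from the transformed customer DataFrame built under 05/09 (column names are assumptions):

from pyspark.sql import functions as F

cleaned = (transformed
    # Rename timestamp -> InsertTime and cast it to a timestamp type.
    .withColumn("InsertTime", F.col("timestamp").cast("timestamp"))
    .drop("timestamp")
    # Strip underscores out of the phone column.
    .withColumn("phone", F.regexp_replace("phone", "_", ""))
    # Upper-case the first name.
    .withColumn("firstname", F.upper(F.col("firstname"))))

display(cleaned)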
11/09/2023
Load the data from the DataFrame into an ADLS Delta table under the refined container with the following columns -> fullname, gender, phone, emailaddress
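A closing sketch of the Delta write, continuing from the cleaned DataFrame above (the abfss path, the name columns used for fullname, and the emailaddress column are assumptions):

from pyspark.sql import functions as F

delta_path = "abfss://refined@<storageaccount>.dfs.core.windows.net/YourName/customer_delta"

final = (cleaned
    .withColumn("fullname", F.concat_ws(" ", "firstname", "lastname"))   # assumed source name columns
    .select("fullname", F.col("Gender").alias("gender"), "phone", "emailaddress"))

(final.write
    .format("delta")
    .mode("overwrite")
    .save(delta_path))   # Delta table in the refined container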