Postclassassignment.txt
25/08/2023
Apply dynamic data masking on a table.
Create a new user in SQL and view the masked data in the SQL table.
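A minimal sketch of one way to do this from Python with pyodbc, assuming the AdventureWorksLT sample database (SalesLT.Customer) and a placeholder connection string; the masking functions are the built-in T-SQL dynamic data masking ones:

import pyodbc

# Placeholder connection string -- substitute your own server, database and login.
CONN_STR = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<your-server>.database.windows.net;DATABASE=<your-db>;"
    "UID=<admin-user>;PWD=<password>"
)

statements = [
    # Mask the e-mail and phone columns with built-in masking functions.
    "ALTER TABLE SalesLT.Customer ALTER COLUMN EmailAddress "
    "ADD MASKED WITH (FUNCTION = 'email()')",
    "ALTER TABLE SalesLT.Customer ALTER COLUMN Phone "
    "ADD MASKED WITH (FUNCTION = 'default()')",
    # A new user without the UNMASK permission sees only masked values.
    "CREATE USER MaskedReader WITHOUT LOGIN",
    "GRANT SELECT ON SalesLT.Customer TO MaskedReader",
]

conn = pyodbc.connect(CONN_STR, autocommit=True)
cur = conn.cursor()
for stmt in statements:
    cur.execute(stmt)

# Impersonate the new user and read the table to confirm the data is masked.
cur.execute("EXECUTE AS USER = 'MaskedReader'")
for row in cur.execute("SELECT TOP 5 FirstName, EmailAddress, Phone FROM SalesLT.Customer"):
    print(row)
cur.execute("REVERT")
conn.close()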
https://www.programiz.com/
https://regex101.com/
SQL:
Write a query for customer full name, address, phone, email, and company.
Write a query to show how many customers are in the same company.
Write a query to show how many customers are in the same country.
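Possible answers, kept as T-SQL strings (a sketch assuming the AdventureWorksLT SalesLT schema: Customer, CustomerAddress, Address); they can be executed with the same pyodbc connection pattern as above:

# Execute with e.g. pandas.read_sql(CUSTOMER_CONTACT_SQL, conn) or cur.execute(...).
CUSTOMER_CONTACT_SQL = """
SELECT c.FirstName + ' ' + ISNULL(c.MiddleName + ' ', '') + c.LastName AS FullName,
       a.AddressLine1, a.City, a.CountryRegion,
       c.Phone, c.EmailAddress, c.CompanyName
FROM SalesLT.Customer AS c
JOIN SalesLT.CustomerAddress AS ca ON ca.CustomerID = c.CustomerID
JOIN SalesLT.Address AS a ON a.AddressID = ca.AddressID
"""

CUSTOMERS_PER_COMPANY_SQL = """
SELECT CompanyName, COUNT(*) AS CustomerCount
FROM SalesLT.Customer
GROUP BY CompanyName
ORDER BY CustomerCount DESC
"""

CUSTOMERS_PER_COUNTRY_SQL = """
SELECT a.CountryRegion, COUNT(DISTINCT c.CustomerID) AS CustomerCount
FROM SalesLT.Customer AS c
JOIN SalesLT.CustomerAddress AS ca ON ca.CustomerID = c.CustomerID
JOIN SalesLT.Address AS a ON a.AddressID = ca.AddressID
GROUP BY a.CountryRegion
ORDER BY CustomerCount DESC
"""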
28/08/2023
SELECT * FROM [SalesLT].[SalesOrderHeader]
SELECT * FROM [SalesLT].[SalesOrderDetail]
SELECT * FROM [SalesLT].[Product]
Write a SQL query, by product ID and product name, to identify the difference between the due date and the ship date.
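One possible query, again as a T-SQL string (a sketch; in AdventureWorksLT the DueDate and ShipDate columns live on SalesOrderHeader, so the detail and product tables are joined in):

DUE_VS_SHIP_SQL = """
SELECT p.ProductID, p.Name AS ProductName,
       DATEDIFF(day, h.DueDate, h.ShipDate) AS DueToShipDays
FROM SalesLT.SalesOrderHeader AS h
JOIN SalesLT.SalesOrderDetail AS d ON d.SalesOrderID = h.SalesOrderID
JOIN SalesLT.Product AS p ON p.ProductID = d.ProductID
"""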
29/08/2023
ADF -> put a ZIP file into Blob Storage.
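In class this is an ADF copy activity; as a scripted alternative for getting the ZIP into Blob Storage, a small sketch with the azure-storage-blob SDK might look like this (connection string, container and blob path are placeholders):

from azure.storage.blob import BlobServiceClient

# Placeholder connection string and names -- adjust to your storage account.
service = BlobServiceClient.from_connection_string("<storage-connection-string>")
blob = service.get_blob_client(container="raw", blob="YourName/source.zip")

with open("source.zip", "rb") as data:        # local ZIP to upload
    blob.upload_blob(data, overwrite=True)    # lands in the 'raw' container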
31/08/2023
https://www.microsoft.com/en-us/download/details.aspx?id=39717
ZIP -> unzip (3 CSV files -> ADLS/Raw/YourName) -> 1 Parquet file (ADLS/refined/YourName)
ADLS (person.csv or CustomerSource.csv) -> ADF -> SQL table (dbo.person or dbo.CustomerSource)
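The unzip and copy steps are configured in ADF; the CSV-to-Parquet consolidation can also be sketched in PySpark from a notebook that already has access to ADLS (paths and storage account name are placeholders, and the three CSV files are assumed to share a schema):

# Read the extracted CSV files from the Raw zone, write one Parquet output to refined.
raw_path = "abfss://raw@<storageaccount>.dfs.core.windows.net/YourName/*.csv"
refined_path = "abfss://refined@<storageaccount>.dfs.core.windows.net/YourName/"

df = (spark.read
      .option("header", True)
      .option("inferSchema", True)
      .csv(raw_path))

df.coalesce(1).write.mode("overwrite").parquet(refined_path)   # single Parquet file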
SQL ->
1. Write a SQL query to display the top 10 sold products with product name, product ID, and quantity.
2. Replace product size S -> 30, M -> 32, L -> 34, XL -> 36, XXL -> 38 to display available products grouped by size (query sketches follow the reference selects below).
Reference:
SELECT * FROM [SalesLT].[Address]
SELECT * FROM [SalesLT].[Product]
SELECT * FROM [SalesLT].[SalesOrderDetail]
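Possible sketches for tasks 1 and 2 above, as T-SQL strings (AdventureWorksLT schema assumed; the numeric size mapping follows the replacement given in the task):

TOP_10_SOLD_SQL = """
SELECT TOP 10 p.ProductID, p.Name AS ProductName, SUM(d.OrderQty) AS TotalQuantity
FROM SalesLT.SalesOrderDetail AS d
JOIN SalesLT.Product AS p ON p.ProductID = d.ProductID
GROUP BY p.ProductID, p.Name
ORDER BY TotalQuantity DESC
"""

PRODUCTS_BY_NUMERIC_SIZE_SQL = """
SELECT CASE Size WHEN 'S'   THEN '30'
                 WHEN 'M'   THEN '32'
                 WHEN 'L'   THEN '34'
                 WHEN 'XL'  THEN '36'
                 WHEN 'XXL' THEN '38'
                 ELSE Size END AS NumericSize,
       COUNT(*) AS AvailableProducts
FROM SalesLT.Product
GROUP BY CASE Size WHEN 'S'   THEN '30'
                   WHEN 'M'   THEN '32'
                   WHEN 'L'   THEN '34'
                   WHEN 'XL'  THEN '36'
                   WHEN 'XXL' THEN '38'
                   ELSE Size END
"""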
Python:
Take a country as input and display the corresponding acronym (see the list at the link below).
https://travel.state.gov/content/travel/en/us-visas/visa-information-resources/fees/country-acronyms.html
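A minimal sketch; rather than hard-coding the codes, it assumes a small two-column CSV file (country, acronym) that you prepare from the linked travel.state.gov page, with the file name as a placeholder:

import csv

# Placeholder file prepared from the linked country-acronym list.
with open("country_acronyms.csv", newline="") as f:
    acronyms = {country.strip().upper(): code.strip()
                for country, code in csv.reader(f)}

name = input("Enter a country name: ").strip().upper()
print(acronyms.get(name, "No acronym found for that country"))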
05/09/2023
Transform the CustomerSource data with new columns for salesperson, timestamp, and gender.
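A PySpark sketch of one way to add the three columns, e.g. in the Databricks notebook used in the later sessions (the input path, the salesperson value, and the Title-based gender rule are assumptions; adjust to the actual CustomerSource schema):

from pyspark.sql import functions as F

# Placeholder path to the CustomerSource file in the Raw zone.
src_path = "abfss://raw@<storageaccount>.dfs.core.windows.net/YourName/CustomerSource.csv"

customers = spark.read.option("header", True).csv(src_path)

transformed = (customers
    .withColumn("salesperson", F.lit("unassigned"))      # placeholder value
    .withColumn("timestamp", F.current_timestamp())      # load timestamp
    .withColumn("Gender",                                 # assumed rule: derive from a Title column
                F.when(F.col("Title") == "Mr.", "M")
                 .when(F.col("Title").isin("Ms.", "Mrs."), "F")
                 .otherwise("Unknown")))

display(transformed)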
06/09/2023
Remove underscores, convert the column names to camel case, and split the datetime into separate date and time columns (see the sketch after the sample query).
df = spark.sql("SELECT * FROM `samples`.`nyctaxi`.`trips`")
display(df)
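A sketch of that clean-up on the sample trips table (column names come from the Databricks samples.nyctaxi.trips dataset; the camel-case helper is a small assumption about the naming wanted):

from pyspark.sql import functions as F

def to_camel_case(name):
    """tpep_pickup_datetime -> tpepPickupDatetime (underscores removed)."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)

trips = spark.sql("SELECT * FROM `samples`.`nyctaxi`.`trips`")

# Rename every column: drop underscores and use camel case.
for old in trips.columns:
    trips = trips.withColumnRenamed(old, to_camel_case(old))

# Split the pickup datetime into separate date and time columns.
trips = (trips
         .withColumn("pickupDate", F.to_date("tpepPickupDatetime"))
         .withColumn("pickupTime", F.date_format("tpepPickupDatetime", "HH:mm:ss")))

display(trips)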
07/09/2023
# Completed version of the exercise skeleton; the parquet path and the column
# name are placeholder assumptions for the course dataset.
parquetPath = "<path-to-parquet-files>"   # placeholder path

df = (spark                    # Our SparkSession & entry point
      .read                    # Our DataFrameReader
      .parquet(parquetPath)    # Read in the parquet files
      .select("article")       # Reduce the columns to just the one (name assumed from totalArticles)
      .distinct()              # Produce a unique set of values
     )
totalArticles = df.count()     # Count the distinct values
Timestamp -> InsertTime, datatype -> timestamp
Remove underscores from the phone column
firstname -> uppercase
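A PySpark sketch of those three changes, continuing from the transformed customer DataFrame built under 05/09 (column names are assumptions):

from pyspark.sql import functions as F

cleaned = (transformed
    # Rename timestamp -> InsertTime and cast it to a timestamp type.
    .withColumn("InsertTime", F.col("timestamp").cast("timestamp"))
    .drop("timestamp")
    # Strip underscores out of the phone column.
    .withColumn("phone", F.regexp_replace("phone", "_", ""))
    # Upper-case the first name.
    .withColumn("firstname", F.upper(F.col("firstname"))))

display(cleaned)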
11/09/2023
Load the data from the DataFrame into an ADLS Delta table under the refined container with the following columns -> fullname, gender, phone, emailaddress
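A closing sketch of the Delta write, continuing from the cleaned DataFrame above (the abfss path, the name columns used for fullname, and the emailaddress column are assumptions):

from pyspark.sql import functions as F

delta_path = "abfss://refined@<storageaccount>.dfs.core.windows.net/YourName/customer_delta"

final = (cleaned
    .withColumn("fullname", F.concat_ws(" ", "firstname", "lastname"))   # assumed source name columns
    .select("fullname", F.col("Gender").alias("gender"), "phone", "emailaddress"))

(final.write
    .format("delta")
    .mode("overwrite")
    .save(delta_path))   # Delta table in the refined container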