
Jualns/Training-PySpark


PySpark & SQL Interview Problem Solutions

This repository contains solutions to interview problems focused on PySpark and SQL. The problems come from various online platforms, and I wrote each solution to practice and strengthen my data processing, querying, and analytics skills with PySpark and SQL.

Repository Structure

  • Problems: Detailed problem descriptions, with examples and explanations that give context.
  • Datasets: The tables used to solve the problems, covering a range of scenarios for practice and exploration.
  • Solutions: Python scripts that solve the problems with SQL queries covering joins, subqueries, aggregations, and data manipulation. The solutions also include variants implemented with PySpark, so the same logic can be compared across both approaches.

Skills Developed

  • PySpark: Working with distributed data processing, DataFrame operations, RDD transformations, and Spark SQL.
  • SQL: Advanced SQL queries, including joins, window functions, CTEs (Common Table Expressions), subqueries, and aggregations.
  • Problem Solving: Developing solutions to complex data-related problems often encountered in technical interviews.

Contributing

Feel free to fork this repository, make improvements, or add new problems and solutions.
