This repository collects my solutions to PySpark and SQL interview problems sourced from various online platforms. I wrote each solution to practice and strengthen my data processing, querying, and analytics skills in PySpark and SQL. The repository is organized as follows:
- Problems: Detailed descriptions of each problem, with examples and explanations for context.
- Datasets: The tables used to solve the problems, covering a range of scenarios for practice and exploration.
- Solutions: Python scripts that solve the problems with SQL. Each file contains queries for challenges such as joins, subqueries, aggregations, and data manipulation. The solutions also include alternative implementations in PySpark.
The problems cover the following areas:
- PySpark: Distributed data processing, DataFrame operations, RDD transformations, and Spark SQL.
- SQL: Advanced queries using joins, window functions, common table expressions (CTEs), subqueries, and aggregations.
- Problem Solving: Approaches to the complex data problems that often come up in technical interviews.
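As an illustration of the window-function and CTE patterns listed above, here is a minimal sketch of a classic interview problem: find the highest-paid employee(s) in each department. The `employees` table, its data, and the `top_earners` function are hypothetical and not taken from this repository. To keep the example runnable without a Spark installation, it uses Python's built-in `sqlite3` module as a stand-in for Spark SQL; the query itself is standard SQL.

```python
import sqlite3

# Hypothetical sample data: (name, department, salary)
EMPLOYEES = [
    ("Alice", "Engineering", 120000),
    ("Bob", "Engineering", 110000),
    ("Carol", "Sales", 90000),
    ("Dan", "Sales", 95000),
    ("Eve", "Sales", 95000),
]

# CTE + window function: rank employees by salary within each department,
# then keep the top-ranked row(s). DENSE_RANK gives tied salaries the same
# rank, so ties for first place are all returned.
TOP_EARNERS_SQL = """
WITH ranked AS (
    SELECT
        name,
        department,
        salary,
        DENSE_RANK() OVER (
            PARTITION BY department
            ORDER BY salary DESC
        ) AS salary_rank
    FROM employees
)
SELECT department, name, salary
FROM ranked
WHERE salary_rank = 1
ORDER BY department, name
"""


def top_earners(rows):
    """Return (department, name, salary) for the highest-paid employee(s) per department."""
    # In-memory database stands in for a Spark temp view (assumption for this sketch).
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)"
        )
        conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
        return conn.execute(TOP_EARNERS_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_earners(EMPLOYEES):
        print(row)
```

In a PySpark version, the same query could be run with `spark.sql(...)` after registering the DataFrame via `createOrReplaceTempView("employees")`. The DataFrame API equivalent would apply `F.dense_rank()` over `Window.partitionBy("department").orderBy(F.desc("salary"))` and then filter on the rank column.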
Feel free to fork this repository, make improvements, or add new problems and solutions.