update guide to remove grammar errors

c0d33ngr · c0d33ngr · commit d229df0a5ed8 · 2024-09-26T01:46:30.000+01:00
Signed-off-by: Jeffrey <jeffwhewhetu@gmail.com>
diff --git a/guides/20240920_guide_building_a_duckdb_playground_with_daytona.md b/guides/20240920_guide_building_a_duckdb_playground_with_daytona.md
@@ -1,60 +1,60 @@
 ---
 title: "Building DuckDB Playground Environment in Daytona Workspace."
-description: "Set up a DuckDB environment in Daytona Workspace and master some data tasks including cleaning, reformating and splitting CSV file, with this step-by-step guide."
+description: "Set up a DuckDB environment in Daytona Workspace and master some data tasks including cleaning, reformatting, and splitting a CSV file, with this step-by-step guide."
 date: 2024-09-20
 author: "Jeffrey Whewhetu"
-tags: ["DuckDB", "OLAP", "daytona", "Python"]
+tags: ["DuckDB", "OLAP", "Daytona", "Python"]
 ---
 
 # Building DuckDB Playground Environment in Daytona Workspace
 
 # Introduction
-This is a comprehensive hands-on guide in using [DuckDB](20240922_definition_duckdb.md) database to perform a real world data project in a containerized [workspace](20240819_definition_daytona workspace.md) using Daytona. You'll follow me along from setup to actually working with DuckDB cli and even with [Python](20240820_defintion_python.md) via it's Client API. So it's a long ride and you can get a coffee closed by.
+This is a comprehensive hands-on guide to using the [DuckDB](20240922_definition_duckdb.md) database for a real-world data project in a containerized [workspace](20240819_definition_daytona workspace.md) with Daytona. You'll follow along from setup to working with the DuckDB CLI and with [Python](20240820_defintion_python.md) via its client API. It's a long ride, so keep a coffee nearby.
 
-In this comprehensive guide, you will learn how to prepare personal loan marketing campaign data for importation into a DuckDB database and do some analysis on the dataset. Your tasks will include collecting and reviewing the data, cleaning and structuring it according to a specification, handling errors and inconsistencies, transforming and splitting it into multiple CSV files. The CSV file you'll work on is called `bank_marketing.csv`, download from GitHub [here](https://github.com/c0d33ngr/playground-duckdb/blob/main/bank_marketing.csv)
+In this guide, you will learn how to prepare personal loan marketing campaign data for import into a DuckDB database and analyze the dataset. Your tasks will include collecting and reviewing the data, cleaning and structuring it according to a specification, handling errors and inconsistencies, and transforming and splitting it into multiple CSV files. The CSV file you'll work on is called `bank_marketing.csv`; download it from GitHub [here](https://github.com/c0d33ngr/playground-duckdb/blob/main/bank_marketing.csv).
 
 # TL;DR
 
 - What you need to follow along with the guide.
-- What's DuckDB and Why use it
+- What's DuckDB and Why Use It
 - Set up a Daytona Workspace with DuckDB [environment](20240819_definition_development environment.md)
 - Hands-on practice using DuckDB as a CLI Tool
 - Hands-on practice using DuckDB client API with [Python](20240820_defintion_python.md)
 - Conclusion
 
 # Prerequisites
 
-To follow along with hands-on guide about DuckDB Playground in Daytona, you'll need to have the following;
+To follow along with this hands-on guide to the DuckDB Playground in Daytona, you'll need the following:
 
 - An [IDE](20240819_definition_integrated development environment _ide_.md)(It could be VS Code, or JetBrains) or just a terminal.
-- [Docker](20240819_definition_docker.md) installation on your PC or Mac. Click here for more info
-- Daytona installation on your PC or Mac. Click here for more info
-- A GitHub account to create a [repository](20240819_definition_repository.md). Link here to create one, if you don’t have
-- Basic knowledge of [Git](20240819_definition_git.md) and GitHub
+- [Docker](20240819_definition_docker.md) installation on your PC or Mac. Click here for more info.
+- Daytona installation on your PC or Mac. Click here for more info.
+- A GitHub account to create a [repository](20240819_definition_repository.md). Link here to create one, if you don’t have one.
+- Basic knowledge of [Git](20240819_definition_git.md) and GitHub.
 
-# What's DuckDB and Why use it
+# What's DuckDB and Why Use It
 
 ## DuckDB
 
-[DuckDB](20240922_definition_duckdb.md) is a fast in-process data analytical database with support of feature-rich SQL dialect complemented with deep integrations into client APIs. It's designed to provide high performance on complex queries against large databases in embedded configuration, such as combining tables with hundreds of columns and billions of rows. It's specialized for [online analytical processing (OLAP)](20240922_definition_online_analytical_processing_olap.md) workloads
+[DuckDB](20240922_definition_duckdb.md) is a fast in-process analytical database with a feature-rich SQL dialect and deep integrations into client APIs. It's designed to deliver high performance on complex queries against large databases in an embedded configuration, such as combining tables with hundreds of columns and billions of rows. It's specialized for [online analytical processing (OLAP)](20240922_definition_online_analytical_processing_olap.md) workloads.
 
 ## Features of it
 
-DuckDB has lots of features that make it stand out among other databases which focus on [OLAP](20240922_definition_online_analytical_processing_olap.md). Some of the features are:
+DuckDB has many features that make it stand out among other databases focusing on [OLAP](20240922_definition_online_analytical_processing_olap.md). Some of the features are:
 
-- **Simple:** It's very simple to install and perform embedded in-process operation.
+- **Simple:** It's straightforward to install and runs embedded, in-process, inside the host application.
 - **Portable:** Since it has no external dependencies, it's extremely portable and can be compiled for all major operating systems and CPU architectures.
-- **Feature-Rich:** DuckDB has some interesting features such as extensive support for SQL complex queries, integrations to languages like [Python](20240820_defintion_python.md), R and Java and data can be stored as persistent, single-file databases.
-- **Speed:** it's faster as it uses columnar-vectorized query execution engine which improves performance to run [OLAP](20240922_definition_online_analytical_processing_olap.md) workloads.
+- **Feature-Rich:** DuckDB offers extensive support for complex SQL queries, integrations with languages like [Python](20240820_defintion_python.md), R, and Java, and the ability to store data as persistent, single-file databases.
+- **Speed:** It's fast because it uses a columnar-vectorized query execution engine, which improves performance on [OLAP](20240922_definition_online_analytical_processing_olap.md) workloads.
 - **Free:** Lastly, it's a free [open source](20240819_definition_open source.md) database system which anyone can use because of its permissive MIT License.
 
 # Setting up Daytona Workspace for DuckDB Playground
 
-Alright that's enough reading, now let us get started to writing codes. To do so you will need to set up a DuckDB [environment](20240819_definition_development environment.md) in a [Daytona workspace](20240819_definition_daytona workspace.md). Let’s begin.
+Alright, that's enough reading; let's start writing code. To do so, you will need to set up a DuckDB [environment](20240819_definition_development environment.md) in a [Daytona workspace](20240819_definition_daytona workspace.md). Let’s begin.
 
 ## Step 1: Create a GitHub Repository
 
-First head to GitHub website and create a [repository](20240819_definition_repository.md) with the name of your choice. For my repository name, I’ll use `playground-duckdb`. The full URL path to the repository is `https://github.com/c0d33ngr/playground-duckdb`
+First, head to the GitHub website and create a [repository](20240819_definition_repository.md) with a name of your choice. I’ll name my repository `playground-duckdb`. The full URL of the repository is `https://github.com/c0d33ngr/playground-duckdb`.
 
 ## Step 2: Clone the repository using Git
 
@@ -64,19 +64,19 @@ In my case, it’s `git clone https://github.com/c0d33ngr/playground-duckdb`
 
 ## Step 3: Prepare your `devcontainer.json` file and dataset in CSV format
 
-Run the command to move into your cloned repository but don’t forget to replace `playground-duckdb` with your own repository name you created if yours isn’t the same with mine.
+Run the command below to move into your cloned repository, but don’t forget to replace `playground-duckdb` with the name of the repository you created if it isn’t the same as mine.
 
 ```bash
 cd playground-duckdb
 ```
 
-Download the bank campaign dataset you are going to perform data tasks on which is in CSV format, from GitHub repo [here](https://github.com/c0d33ngr/playground-duckdb/blob/main/bank_marketing.csv).
+Download the bank campaign dataset, the CSV file you'll perform the data tasks on, from the GitHub repo [here](https://github.com/c0d33ngr/playground-duckdb/blob/main/bank_marketing.csv).
 
 Note: It has to be in the directory of your clone repository. In my case, it's inside `playground-duckdb`. 
 
-Now, lets proceed to the next step.
+Now, let us proceed to the next step.
 
-Create a hidden directory named `.devcontainer` where our `devcontainer.json` file will be. Let’s do so and move into it
+Create a hidden directory named `.devcontainer` where our `devcontainer.json` file will be. Let’s do so and move into it.
 
 Run the command to do so
 
@@ -92,7 +92,7 @@ I use `nano` to create my `.devcontainer.json` file using this command.
 nano devcontainer.json
 ```
 
-Paste this code into your `devcontainer.json` file
+Paste this code into your `devcontainer.json` file.
 
 ```yaml
 {
@@ -109,81 +109,81 @@ Paste this code into your `devcontainer.json` file
 The `devcontainer.json` content contains configurations to start your DuckDB environment in a [Daytona workspace](20240819_definition_daytona workspace.md).
 
 - `name`: This sets the name of the development container environment to `DuckDB Playground`.
-- `image`: This uses a base Ubuntu image from Microsoft image repository.
-- `features`: This configuration add DuckDB installation and Python setups in the Daytona workspace
-- `postCreateComand`: This install the Python packages needed for this guide into the workspace.
+- `image`: This uses a base Ubuntu image from the Microsoft image repository.
+- `features`: This adds the DuckDB installation and Python setup to the Daytona workspace.
+- `postCreateCommand`: This installs the Python packages needed for this guide into the workspace.
 
-After created and saved the `devcontainer.json` file, move up back to the root directory of your clone [repository](20240819_definition_repository.md). For me, I run the command below
+After creating and saving the `devcontainer.json` file, move back up to the root directory of your cloned [repository](20240819_definition_repository.md). In my case, I run the command below.
 
 ```bash
 cd ../..
 ```
 
 ## Step 4: Commit and Push Changes to GitHub
 
-Run this commands to push your changes to GitHub
+Run these commands to push your changes to GitHub.
 
 ```bash
 git add .
 git commit -m “add devcontainer.json file”
 git push
 ```
 
-Now, you have successfully push our updated repository that contains our configuration file (`devcontainer.json`) for our DuckDB environment
+Now, you have successfully pushed your updated repository, which contains the configuration file (`devcontainer.json`) for the DuckDB environment.
 
 ## Step 5: Verify Daytona Installation
 
-Run this command to check `daytona` is properly installed in your PC or Mac
+Run this command to check that `daytona` is properly installed on your PC or Mac.
 
 ```bash
 daytona –-version
 ```
 
-You should see your version of `daytona` installed
+You should see your version of `daytona` installed.
 
 ## Step 6: Create a Daytona Workspace with DuckDB Playground Environment in it
 
-Let’s start daytona server by running the command
+Let’s start the Daytona server by running the command below.
 
 ```bash
 daytona serve
 ```
 
-You should see logs like my screenshot
+You should see logs like my screenshot.
 
 Open a new tab in your terminal, for Linux its `Shift + Ctrl + T`
 
-Run the command below in a new tab of your terminal and follow the prompt instruction. It would ask you for a [workspace](20240819_definition_daytona workspace.md) name to use, just choose the default.
+Run the command below in the new terminal tab and follow the prompts. It will ask you for a [workspace](20240819_definition_daytona workspace.md) name to use; choose the default.
 
 Replace `USERNAME` and `REPOSITORY-NAME` with your username for GitHub and the repository name you created earlier.
 
 ```bash
 daytona create https://github.com/USERNAME/REPOSITORY-NAME
 ```
 
-In my case, it's this
+In my case, it's this.
 
 ```bash
 daytona create https://github.com/c0d33ngr/playground-duckdb
 ```
 
-After you successfully ran the above command you should see screenshot like mine showing your Daytona workspace that contains the DuckDB environment is running
+After the above command runs successfully, you should see output like my screenshot, showing that your Daytona workspace containing the DuckDB environment is running.
 
 You can now run this command to open the DuckDB [environment](20240819_definition_development environment.md) in your default [IDE](20240819_definition_integrated development environment _ide_.md) you choose when installing Daytona (Replace `WORKSPACE-NAME` with the name you used when creating the workspace above, in my case it's `playground-duckdb`).
 
 ```bash
 daytona code WORKSPACE-NAME
 ```
 
-That’s it. Daytona will create a DuckDB playground environment for you and open it in your default IDE you set.
+That’s it. Daytona will create a DuckDB playground environment for you and open it in the default IDE you set.
 
 # Using DuckDB as a Command Line Interface (CLI) Tool
 
-In this section, you'll learn how to work with [DuckDB](20240922_definition_duckdb.md) by creating a database from a CSV file, examining its structure, retrieving distinct values, and exporting data to separate CSV files for client, campaign, and economics data. Finally, you'll verify the exported data, gaining hands-on experience with DuckDB's querying and data manipulation capabilities. Lets get started
+In this section, you'll learn how to work with [DuckDB](20240922_definition_duckdb.md) by creating a database from a CSV file, examining its structure, retrieving distinct values, and exporting data to separate CSV files for client, campaign, and economics data. Finally, you'll verify the exported data, gaining hands-on experience with DuckDB's querying and data manipulation capabilities. Let us get started.
 
 ## Step 1: Enter DuckDB Interactive Shell
 
-By now, you should be in your default IDE set up using `daytona`. In your IDE terminal, type the command below to enter into DuckDB database shell in interactive mode where you'll run some SQL based queries that conformed to DuckDB database.
+By now, you should be in your default IDE, opened using `daytona`. In your IDE terminal, type the command below to enter the DuckDB shell in interactive mode, where you'll run SQL queries written in DuckDB's dialect.
 
 ```sql
 duckdb
@@ -231,18 +231,18 @@ COPY (
 ) TO 'client.csv' (DELIMITER ',', HEADER TRUE);
 ```
 
-## Step 5: Retrieve List of Distinct Records in `day` Column
+## Step 5: Retrieve the List of Distinct Records in `day` Column
 
-Run the following SQL query to retrieve a list of distinct days from the bank_marketing table. The results would be useful in the preparation of the SQL query for step 7. We need to know the unique records in the `day` column.
+Run the following SQL query to retrieve a list of distinct days from the `bank_marketing` table. The results will be useful in preparing the SQL query for step 7, because we need to know the unique values in the `day` column.
 
 ```sql
 SELECT DISTINCT day
 FROM 'bank_marketing.csv';
 ```
 
-## Step 6: Retrieve List of Distinct Records in `month` Column
+## Step 6: Retrieve the List of Distinct Records in `month` Column
 
-Run the following SQL query to retrieve list of distinct months from the `bank_marketing` table. The results are also needed for the creation of a new column called `last_contact_date` later in step 7.
+Run the following SQL query to retrieve the list of distinct months from the `bank_marketing` table. The results are also needed for the creation of a new column called `last_contact_date` later in step 7.
 
 ```sql
 SELECT DISTINCT month
@@ -283,15 +283,15 @@ COPY (
             WHEN LOWER(month) = 'oct' THEN 10
             WHEN LOWER(month) = 'nov' THEN 11
             WHEN LOWER(month) = 'dec' THEN 12
-            ELSE NULL  -- default value if month is unknown
+            ELSE NULL  -- default value if the month is unknown
           END,
           CAST(day AS BIGINT)
       ) AS last_contact_date
   FROM bank_marketing
 ) TO 'campaign.csv' (DELIMITER ',', HEADER TRUE);
 ```
 
-## Step 8: Export Economical Data to CSV
+## Step 8: Export Economic Data to CSV
 
 Run the following SQL query to export economics data to a CSV file named `economics.csv`
 
@@ -307,7 +307,7 @@ COPY (
 
 ## Step 9: Read Data from Exported CSV files
 
-Run the following SQL queries to read data from the `client.csv`, `campaign.csv` and `economics.csv` files.
+Run the following SQL queries to read data from the `client.csv`, `campaign.csv`, and `economics.csv` files.
 
 ```sql
 SELECT *
@@ -324,11 +324,11 @@ SELECT *
 FROM 'economics.csv';
 ```
 
-Now, our three CSV files are prepared for some analysis using DuckDB Client API via Python. Let head to the next section for the analysis.
+Now, our three CSV files are ready for analysis using the DuckDB client API via Python. Let's head to the next section for the analysis.
 
 # Using DuckDB with Python through its Client API
 
-In this section, you'll learn how to analyze and visualize data using [DuckDB](20240922_definition_duckdb.md) and [Matplotlib](20240922_definition_matplotlib.md). You'll calculate the campaign success rate, create a bar chart to compare average client age by education level, and generate a scatter plot to explore the relationship between contact duration and campaign outcome. We'll use the cleaned and transformed CSV files spilt from our `bank_marketing.csv` in this section. 
+In this section, you'll learn how to analyze and visualize data using [DuckDB](20240922_definition_duckdb.md) and [Matplotlib](20240922_definition_matplotlib.md). You'll calculate the campaign success rate, create a bar chart to compare average client age by education level, and generate a scatter plot to explore the relationship between contact duration and campaign outcome. We'll use the cleaned and transformed CSV files split from our `bank_marketing.csv` in this section.
 
 ## Step 1: Analysis of Customer Campaign Success Rate
 
@@ -385,7 +385,7 @@ plt.tight_layout()
 plt.show()
 ```
 
-Run the file in your IDE terminal using `python3 client_age_by_education.py` and you should see visualization.
+Run the file in your IDE terminal using `python3 client_age_by_education.py` and you should see the visualization.
 
 ## Step 3: Analysis and Visualization of Contact Duration and Campaign Outcome through Correlation
 
@@ -421,10 +421,10 @@ That's it. You have done lots of data tasks using [DuckDB](20240922_definition_d
 
 # Conclusion
 
-In this comprehensive guide, you have explored the capabilities of using DuckDB in a Daytona Workspace with no stress through hands-on example.
+In this comprehensive guide, you have explored DuckDB's capabilities in a Daytona workspace through hands-on examples.
 Throughout this guide, you have gained practical experience in:
-- Creating and managing database with DuckDB in memory
-- Perform SQL queries for data cleaning, transformation and splitting
+- Creating and managing a database with DuckDB in memory
+- Performing SQL queries for data cleaning, transformation, and splitting
 - Integration of DuckDB using its Client API with Python for data analysis.
 
 # References