Amazon_Vine_Analysis

Built with PySpark, Google Colab, pgAdmin, AWS RDS, and S3.

Project overview

The Amazon Vine program is a service that allows manufacturers and publishers to receive reviews for their products. Companies like SellBy pay a small fee to Amazon and provide products to Amazon Vine members, who are then required to publish a review.

In this project, we had access to approximately 50 datasets. Each one contains reviews for a specific product category, from apparel to wireless products. We picked one of these datasets (Home Entertainment) and used PySpark to perform the ETL process: extract the dataset, transform the data, connect to an AWS RDS instance, and load the transformed data into a PostgreSQL database managed through pgAdmin. We then used PySpark to determine whether reviews from Vine members in the dataset are biased toward favorable ratings.
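The ETL steps described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names follow the public Amazon reviews TSV layout, while the table layouts, the `run_etl` and `jdbc_url` helpers, the driver version, the port, and the RDS endpoint, database name, and credentials passed in are all assumptions made for the example.

```python
# Dataset URL taken from the Resources section of this README.
DATA_URL = (
    "https://s3.amazonaws.com/amazon-reviews-pds/tsv/"
    "amazon_reviews_us_Home_Entertainment_v1_00.tsv.gz"
)


def jdbc_url(host: str, port: int, database: str) -> str:
    """Build a PostgreSQL JDBC URL (host and database are placeholders)."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


def run_etl(host: str, database: str, user: str, password: str) -> None:
    """Hypothetical end-to-end run: extract, transform, load."""
    # Imported lazily so jdbc_url() can be used without Spark installed.
    from pyspark import SparkFiles
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, count, to_date

    # Assumption: the PostgreSQL JDBC driver is pulled from Maven;
    # the version shown is only an example.
    spark = (
        SparkSession.builder.appName("AmazonVineETL")
        .config("spark.jars.packages", "org.postgresql:postgresql:42.2.16")
        .getOrCreate()
    )

    # Extract: download the gzipped TSV and read it into a DataFrame.
    spark.sparkContext.addFile(DATA_URL)
    df = spark.read.csv(
        SparkFiles.get(DATA_URL.rsplit("/", 1)[-1]),
        sep="\t",
        header=True,
        inferSchema=True,
    )

    # Transform: one DataFrame per target table (assumed layouts).
    tables = {
        "customers_table": df.groupBy("customer_id").agg(
            count("customer_id").alias("customer_count")
        ),
        "products_table": df.select("product_id", "product_title").dropDuplicates(),
        "review_id_table": df.select(
            "review_id",
            "customer_id",
            "product_id",
            "product_parent",
            to_date(col("review_date")).alias("review_date"),
        ),
        "vine_table": df.select(
            "review_id",
            "star_rating",
            "helpful_votes",
            "total_votes",
            "vine",
            "verified_purchase",
        ),
    }

    # Load: append each DataFrame to the matching table on the RDS instance.
    props = {"user": user, "password": password, "driver": "org.postgresql.Driver"}
    url = jdbc_url(host, 5432, database)
    for name, table in tables.items():
        table.write.jdbc(url=url, table=name, mode="append", properties=props)
```

Calling `run_etl(...)` with a real endpoint and credentials would populate the four tables listed under Results, provided they already exist with compatible columns (for example, created from `challenge_schema.sql`).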

Resources

Data source: https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_Home_Entertainment_v1_00.tsv.gz

Results

The customers_table DataFrame

The products_table DataFrame

The review_id_table DataFrame

The vine_table DataFrame

DataFrames into pgAdmin

customers_table

products_table

review_id_table

vine_table

vine_table analysis

There were 261 Vine reviews and 24,040 non-Vine reviews.

There were 11,005 five-star reviews in total: 106 from Vine reviews and 10,899 from non-Vine reviews.

40.61% of Vine reviews were five stars, compared with 45.34% of non-Vine reviews.
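The percentages above follow directly from the counts. The short check below reproduces them in plain Python; the helper name `five_star_percentage` is introduced here for illustration only and is not taken from the notebooks.

```python
def five_star_percentage(five_star: int, total: int) -> float:
    """Share of five-star reviews, as a percentage rounded to two decimals."""
    return round(100 * five_star / total, 2)


# Counts reported in this section.
vine_pct = five_star_percentage(106, 261)
non_vine_pct = five_star_percentage(10899, 24040)

print(f"Vine: {vine_pct}%  non-Vine: {non_vine_pct}%")
```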

Summary: Determine Bias of Vine Reviews.

The two percentages are too close to conclude whether the Vine program is biased toward favorable reviews. If anything, Vine reviews were five stars slightly less often (40.61%) than non-Vine reviews (45.34%). The Vine sample (261 reviews) is far smaller than the non-Vine sample (24,040 reviews), although it is still large enough to be worth comparing. As a next step, we could repeat the comparison on verified purchases only and see whether that reveals any positivity bias.
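One way to put a number on "not enough margin" is a two-proportion z-test on the five-star counts. This test is not part of the original analysis; it is offered here only as a sketch, and it assumes the reviews are independent and that the normal approximation is reasonable for samples of this size.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) under H0: both groups share one five-star rate."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common rate under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


# Five-star counts and totals: Vine first, then non-Vine.
z, p_value = two_proportion_z_test(106, 261, 10899, 24040)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

With these counts the z statistic stays inside the conventional ±1.96 band, so the gap between the two groups is not significant at the 5% level. That is consistent with the summary: the data neither confirm nor rule out a bias.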

