This repository contains an end-to-end Amazon Product Review Analysis project, demonstrating how to build and deploy LLM-powered sentiment analysis using both BERT (from Hugging Face Transformers) and a custom LSTM. The project also includes steps to design an ad ranking system leveraging sentiment scores, and guidance for AWS cloud deployment using Lambda or SageMaker.
- Project Overview
- Dataset & Preprocessing
- Modeling Approaches
- Ad Ranking System
- Deployment
- Visualizations
- Repository Structure
- How to Run
- Future Work
- License
## Project Overview

**Goal**: Develop a comprehensive pipeline that performs:
- Sentiment Analysis of Amazon product reviews (negative, neutral, positive), or direct star rating regression (1–5).
- Ad Ranking System using sentiment scores to improve user engagement and click-through rates (CTR).
- Cloud Deployment for scalable, low-latency inference on AWS (Lambda or SageMaker).
This project was inspired by real-world scenarios where companies like Amazon, Apple, Netflix, etc., use machine learning to optimize user experience and product recommendations.
## Dataset & Preprocessing

- **Dataset**: The project uses a `.jsonl.gz` file of Amazon product reviews (e.g., `Movies_and_TV.jsonl.gz`), which contains fields like:
  - `rating` (1–5 star rating)
  - `text` (the review content)
  - Other optional metadata (e.g., `title`, `timestamp`, `asin`, etc.)
- **Preprocessing** (sketched in code after this list):
  - **Parsing**: We read the GZ file line by line using Python's `gzip` and `json`.
  - **Cleaning**: Convert text to lowercase, remove punctuation, etc. (especially for LSTM tokenization).
  - **Labeling**:
    - **3-Class Classification**: 0 = Negative (rating ≤ 2), 1 = Neutral (rating = 3), 2 = Positive (rating ≥ 4).
    - **Regression**: Predict the exact star rating from 1 to 5.
- **Train-Test Split**: We split the dataset into 80% training and 20% test (or other ratios) using `sklearn.model_selection.train_test_split`, with stratification by sentiment if doing classification.
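A minimal end-to-end sketch of these preprocessing steps (the `rating` and `text` field names follow the dataset description above; the 80/20 stratified split uses scikit-learn):

```python
import gzip
import json

from sklearn.model_selection import train_test_split

def label_from_rating(rating: float) -> int:
    """Map a 1-5 star rating to the 3-class sentiment scheme above."""
    if rating <= 2:
        return 0  # negative
    if rating == 3:
        return 1  # neutral
    return 2      # positive

# Parse the gzipped JSON-lines file and build text/label lists.
texts, labels = [], []
with gzip.open("Movies_and_TV.jsonl.gz", "rt", encoding="utf-8") as f:
    for line in f:
        review = json.loads(line)
        texts.append(review["text"].lower())
        labels.append(label_from_rating(review["rating"]))

# 80/20 split, stratified on the sentiment labels so both splits
# keep the same class balance.
train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)
```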
## Modeling Approaches

### BERT (Hugging Face Transformers)

- **Library**: Hugging Face Transformers (the `transformers` package).
- **Model**: `BertForSequenceClassification` for 3-class classification, or for regression (`num_labels=1` with `problem_type="regression"`).
- **Tokenization**: `BertTokenizer` (`bert-base-uncased` by default).
- **Training**: Fine-tuned with the `AdamW` optimizer at a typical learning rate of `2e-5` for a few epochs.
- **Saving**: `model.save_pretrained()` + `tokenizer.save_pretrained()`.
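A condensed sketch of one fine-tuning step under these settings (a real run would batch the full training set and loop over epochs; `train_texts`/`train_labels` come from the preprocessing sketch above):

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# 3-class head; for regression use num_labels=1, problem_type="regression".
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Tokenize a small batch and run one training step.
batch = tokenizer(train_texts[:8], padding=True, truncation=True,
                  max_length=256, return_tensors="pt")
outputs = model(**batch, labels=torch.tensor(train_labels[:8]))
outputs.loss.backward()  # cross-entropy loss computed internally for num_labels=3
optimizer.step()
optimizer.zero_grad()

model.save_pretrained("bert_3class_model")
tokenizer.save_pretrained("bert_3class_tokenizer")
```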
### Custom LSTM (PyTorch)

- **Architecture**: An embedding layer -> LSTM -> linear layer for output (see the sketch after this list).
- **Preprocessing**: Build a vocabulary (`word_to_idx`), convert each review to a sequence of word indices, and handle padding.
- **Training**: Use `CrossEntropyLoss` for 3-class classification or `MSELoss` for rating regression.
- **Saving**: Use `torch.save(model.state_dict(), "lstm.pth")`.
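A minimal sketch of that architecture (the class name and hyperparameters are illustrative assumptions, not taken from the repository):

```python
import torch
import torch.nn as nn

class SentimentLSTM(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 128,
                 hidden_dim: int = 256, num_classes: int = 3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)  # num_classes=1 for regression

    def forward(self, x):                     # x: (batch, seq_len) word indices
        embedded = self.embedding(x)          # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)  # hidden: (1, batch, hidden_dim)
        return self.fc(hidden[-1])            # class logits (or a predicted rating)

model = SentimentLSTM(vocab_size=20_000)
torch.save(model.state_dict(), "lstm.pth")
```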
### Classification vs. Regression

- **3-Class**:
  - Negative, Neutral, Positive.
  - Helpful if you want discrete categories for your ad ranking logic.
- **Regression**:
  - Predict the star rating from 1.0 to 5.0 (continuous).
  - Offers finer granularity; products can be ranked by exact predicted rating.
## Ad Ranking System

- **Predict Sentiment or Rating**: For each product review, use the trained model to generate a sentiment score or predicted star rating.
- **Aggregate Scores**: For each product (identified by `asin`), compute the average predicted sentiment/rating across all of its reviews.
- **Ranking** (see the sketch after this list):
  - Sort products in descending order of average sentiment/rating.
  - (Optional) Combine with other signals such as historical CTR or user context.
- **CTR Improvement**: Hypothetically, CTR would be measured via an A/B test comparing the new ranking method to a baseline; in many real-world scenarios, better product targeting of this kind is credited with improvements on the order of 15%.
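A toy sketch of the aggregation and ranking steps (the DataFrame layout and column names are assumptions for illustration):

```python
import pandas as pd

# One row per review: the product id and the model's predicted score.
reviews = pd.DataFrame({
    "asin": ["B001", "B001", "B002", "B003"],
    "predicted_rating": [4.5, 3.8, 2.1, 4.9],
})

# Average predicted rating per product, ranked best-first.
ranking = (
    reviews.groupby("asin")["predicted_rating"]
    .mean()
    .sort_values(ascending=False)
)
print(ranking.head())
```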
## Deployment

### SageMaker

- **Save Model**:

  ```python
  model.save_pretrained('bert_3class_model')
  tokenizer.save_pretrained('bert_3class_tokenizer')
  ```
- **Upload** the artifacts to S3 using the AWS CLI or `boto3`.
- **Create a SageMaker Endpoint**:
  - Define a `PyTorchModel` or `HuggingFaceModel` referencing your S3 artifacts.
  - Provide an `inference.py` script that loads the model and handles prediction.
  - Deploy to an instance type such as `ml.m5.large`.
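A hedged sketch of that endpoint creation using the `sagemaker` SDK (the S3 path, IAM role, and container versions are assumptions to adapt to your account):

```python
from sagemaker.huggingface import HuggingFaceModel

hf_model = HuggingFaceModel(
    model_data="s3://my-bucket/bert_3class_model.tar.gz",  # assumed S3 location
    role="arn:aws:iam::123456789012:role/SageMakerRole",   # assumed IAM role
    transformers_version="4.26",  # pick versions matching an available container
    pytorch_version="1.13",
    py_version="py39",
)

predictor = hf_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
)

print(predictor.predict({"inputs": "This movie was fantastic!"}))
```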
### AWS Lambda

- Package a smaller model (e.g., DistilBERT or a minimal LSTM) into a ZIP under 250 MB (unzipped).
- Upload to AWS Lambda, potentially using a Lambda Layer for large dependencies (like `torch`).
- API Gateway can trigger the function for on-demand sentiment predictions; a skeletal handler follows this list.
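A skeletal handler for such a function, assuming an API Gateway proxy integration and a hypothetical `predict_sentiment` helper wrapping the packaged model:

```python
import json

def predict_sentiment(text: str) -> str:
    """Hypothetical helper; real code would run the packaged model here."""
    return "positive"  # placeholder

def lambda_handler(event, context):
    # With API Gateway proxy integration, the request body is a JSON string.
    body = json.loads(event.get("body") or "{}")
    text = body.get("text", "")
    return {
        "statusCode": 200,
        "body": json.dumps({"sentiment": predict_sentiment(text)}),
    }
```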