Books Scraping and Analysis Project

Overview

This project scrapes book data from the website books.toscrape.com and analyzes the collected data. It consists of two main parts: data scraping and data analysis.

Data Scraping

The data scraping part of the project uses the Python libraries requests and BeautifulSoup to extract book data from the website. The scraped data includes each book's title, price, availability, and description. The data is stored in a JSON file named books.json.
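As a rough illustration of this step, the sketch below parses a small HTML fragment with BeautifulSoup and writes the records to JSON. It is not the project's actual script: the function names (fetch_html, parse_books, save_books), the sample markup, the CSS selectors, and the "(N available)" text format are all assumptions made for the example, and the description field is left out for brevity.

```python
import json
import re

from bs4 import BeautifulSoup

# Hypothetical, simplified fragment in the style of a book listing;
# the markup of the real pages may differ.
SAMPLE_HTML = """
<article class="product_pod">
  <h3><a href="book_1.html" title="Example Book One">Example Book ...</a></h3>
  <p class="price_color">£51.77</p>
  <p class="instock availability">In stock (22 available)</p>
</article>
<article class="product_pod">
  <h3><a href="book_2.html" title="Example Book Two">Example Book ...</a></h3>
  <p class="price_color">£13.99</p>
  <p class="instock availability">In stock (3 available)</p>
</article>
"""


def fetch_html(url, timeout=10):
    """Download a page; requests is imported here so parsing also works offline."""
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def parse_books(html):
    """Extract title, price, and availability from each book entry."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for article in soup.select("article.product_pod"):
        price_text = article.select_one("p.price_color").get_text(strip=True)
        stock_text = article.select_one("p.availability").get_text(strip=True)
        stock_match = re.search(r"(\d+) available", stock_text)
        books.append({
            "title": article.h3.a["title"],
            # Keep only the numeric part of the price, dropping the currency sign.
            "price": float(re.search(r"\d+(?:\.\d+)?", price_text).group()),
            # None marks entries where no stock count could be found.
            "availability": int(stock_match.group(1)) if stock_match else None,
        })
    return books


def save_books(books, path="books.json"):
    """Write the scraped records to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(books, f, ensure_ascii=False, indent=2)


books = parse_books(SAMPLE_HTML)
print(books)
```

In a real run, the string passed to parse_books would come from fetch_html rather than from the inline sample, and save_books(books) would produce books.json.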

Data Analysis

The data analysis part of the project uses the Python libraries pandas, numpy, matplotlib, and seaborn to analyze the collected data. The analysis includes:

Overview of the data
Handling missing values
Price analysis (average, median, highest, and lowest prices)
Availability analysis (average and median availability)
Category analysis (number of books by category)
Price distribution by category
Correlation analysis between price and availability
Top 5 most expensive and cheapest books
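The analysis steps listed above could look roughly like the following pandas sketch. The records, the column names (title, category, price, availability), and the choice to fill missing availability with the median are assumptions for illustration; the project's analysis script may handle these differently.

```python
import pandas as pd

# Hypothetical records; the real input would be loaded from books.json.
records = [
    {"title": "A", "category": "Fiction", "price": 10.0, "availability": 5},
    {"title": "B", "category": "Fiction", "price": 20.0, "availability": 3},
    {"title": "C", "category": "Poetry", "price": 30.0, "availability": None},
    {"title": "D", "category": "Poetry", "price": 40.0, "availability": 1},
]
df = pd.DataFrame(records)

# Overview of the data: shape and count of missing values per column.
print(df.shape)
print(df.isna().sum())

# Handling missing values: one possible choice is to fill with the median.
df["availability"] = df["availability"].fillna(df["availability"].median())

# Price and availability analysis.
price_summary = df["price"].agg(["mean", "median", "max", "min"])
availability_summary = df["availability"].agg(["mean", "median"])

# Category analysis and price distribution by category.
books_per_category = df["category"].value_counts()
price_by_category = df.groupby("category")["price"].describe()

# Pearson correlation between price and availability.
correlation = df["price"].corr(df["availability"])

# Top 5 most expensive and cheapest books (fewer if the data set is smaller).
most_expensive = df.nlargest(5, "price")
cheapest = df.nsmallest(5, "price")

print(price_summary)
print(books_per_category)
print(round(correlation, 3))
```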

Requirements

To run the project, you need to install the following Python libraries:

requests beautifulsoup4 pandas numpy matplotlib seaborn scipy

You can install the libraries using pip: pip install -r requirements.txt
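To confirm that the installation worked, a small check like the one below can help. It is only a sketch: the helper missing_modules is hypothetical, and it looks up import names, which differ from the pip package name in one case here (beautifulsoup4 is imported as bs4).

```python
from importlib.util import find_spec

# Import names for the required libraries (beautifulsoup4 is imported as bs4).
REQUIRED_MODULES = [
    "requests", "bs4", "pandas", "numpy", "matplotlib", "seaborn", "scipy",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```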

Usage

Clone the repository: git clone https://github.com/your-username/books-scraping-analysis.git
Navigate to the project directory: cd books-scraping-analysis
Run the data scraping script: python src/scraping.py
Run the data analysis script: python src/analysis.py

Results

The results of the analysis are stored in the data/analysis directory. The results include:

cleaned_books.csv: cleaned data in CSV format
price_distribution.png: histogram of price distribution
books_by_category.png: bar chart of number of books by category
price_by_category.png: box plot of price distribution by category
price_vs_availability.png: scatter plot of price against availability
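The sketch below shows one way output files with these names could be produced. It uses pandas and matplotlib only (seaborn is left out to keep the example short), the data and column names are hypothetical, and the demo writes to a temporary directory rather than data/analysis.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical cleaned data; the column names are assumptions.
df = pd.DataFrame({
    "category": ["Fiction", "Fiction", "Poetry", "Poetry"],
    "price": [10.0, 20.0, 30.0, 40.0],
    "availability": [5, 3, 3, 1],
})


def save_results(df, out_dir):
    """Write the cleaned CSV and the four charts; return the file names."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    df.to_csv(out_dir / "cleaned_books.csv", index=False)

    # Histogram of price distribution.
    fig, ax = plt.subplots()
    ax.hist(df["price"], bins=10)
    ax.set(xlabel="Price", ylabel="Number of books")
    fig.savefig(out_dir / "price_distribution.png")
    plt.close(fig)

    # Bar chart of number of books by category.
    fig, ax = plt.subplots()
    df["category"].value_counts().plot.bar(ax=ax)
    ax.set_ylabel("Number of books")
    fig.savefig(out_dir / "books_by_category.png")
    plt.close(fig)

    # Box plot of price distribution by category.
    fig, ax = plt.subplots()
    df.boxplot(column="price", by="category", ax=ax)
    fig.savefig(out_dir / "price_by_category.png")
    plt.close(fig)

    # Scatter plot of price against availability.
    fig, ax = plt.subplots()
    ax.scatter(df["price"], df["availability"])
    ax.set(xlabel="Price", ylabel="Availability")
    fig.savefig(out_dir / "price_vs_availability.png")
    plt.close(fig)

    return sorted(p.name for p in out_dir.iterdir())


written = save_results(df, Path(tempfile.mkdtemp()) / "analysis")
print(written)
```

Passing "data/analysis" as out_dir would match the directory named above.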

License

This project is licensed under the MIT License. See the LICENSE file for details.
Jim-by/scrape_analysis_books