Web Scraping and NLP for IPCC Climate Reports

This project combines web scraping and Natural Language Processing (NLP) to extract and analyze climate reports published by the Intergovernmental Panel on Climate Change (IPCC). The notebook downloads the PDF reports, parses their content, and applies NLP techniques to derive insights from the text.

Project Overview

This repository focuses on:

- Web Scraping: Automated extraction and downloading of multiple PDF reports from the IPCC website.
- Data Processing: Conversion and management of large PDF files, including validation and file existence checks.
- Natural Language Processing (NLP): Applying NLP to analyze the textual content of the climate reports, enabling data-driven insights.
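The file existence and validation checks mentioned above could look roughly like the minimal sketch below. It assumes the standard library's `urllib.request` for downloading (the notebook may use another HTTP client, such as `requests`), and the function names `download_pdf` and `is_valid_pdf` are hypothetical, not taken from the notebook.

```python
import os
import urllib.request

PDF_MAGIC = b"%PDF-"  # PDF files normally begin with this header


def is_valid_pdf(path: str) -> bool:
    """Return True if the file exists and starts with the PDF header bytes."""
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as fh:
        return fh.read(len(PDF_MAGIC)) == PDF_MAGIC


def download_pdf(url: str, dest_dir: str) -> str:
    """Download url into dest_dir unless a valid copy is already present.

    Returns the local file path. Raises ValueError if the downloaded
    file does not look like a PDF.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # Derive a file name from the URL path, ignoring any query string.
    filename = os.path.basename(url.split("?", 1)[0]) or "report.pdf"
    path = os.path.join(dest_dir, filename)

    # File existence check: skip the (potentially large) download if a valid copy exists.
    if is_valid_pdf(path):
        return path

    urllib.request.urlretrieve(url, path)

    # Validation: an HTML error page saved under a .pdf name fails this check.
    if not is_valid_pdf(path):
        os.remove(path)
        raise ValueError(f"{url} did not return a PDF")
    return path
```

Checking the header bytes rather than only the file extension catches the common case where a server returns an HTML error page instead of the report.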

Steps and tools

1. Data Extraction:

Web scraping with Beautiful Soup is used to extract data from the IPCC website.
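The link-collection part of this step can be sketched as follows. With Beautiful Soup the equivalent is typically iterating over `soup.find_all("a", href=True)`; this sketch uses the standard library's `html.parser` instead so it runs without third-party packages. The README does not describe the IPCC page structure, so filtering on a `.pdf` suffix is an assumption, and `PdfLinkCollector` and `find_pdf_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkCollector(HTMLParser):
    """Collect absolute URLs of <a href="..."> links that point to PDF files."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep only links whose path ends in .pdf (query string ignored).
        if href and href.split("?", 1)[0].lower().endswith(".pdf"):
            absolute = urljoin(self.base_url, href)  # resolve relative links
            if absolute not in self.links:  # de-duplicate, preserving order
                self.links.append(absolute)


def find_pdf_links(html: str, base_url: str) -> list[str]:
    """Return the unique PDF link URLs found in an HTML page."""
    parser = PdfLinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

The resulting URLs can then be passed one by one to whatever download routine the notebook uses.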

2. Data Transformation:
- PDF files containing climate change reports are converted to text format.
- Text data is cleaned and pre-processed for NLP analysis.
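The README does not name the PDF-to-text library, so the sketch below starts from raw text that has already been extracted and shows typical clean-up steps. The specific rules and the name `clean_report_text` are assumptions for illustration, not the notebook's code.

```python
import re


def clean_report_text(raw: str) -> str:
    """Normalize text extracted from a PDF before NLP analysis."""
    text = raw.replace("\u00ad", "")                 # drop soft hyphens
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)     # re-join words hyphenated across lines
    text = re.sub(r"(?m)^[ \t]*\d+[ \t]*$", "", text)  # blank out lines holding only a page number
    text = re.sub(r"[ \t]+", " ", text)              # collapse runs of spaces and tabs
    text = re.sub(r"\n{3,}", "\n\n", text)           # keep at most one blank line between paragraphs
    return text.strip()
```

One trade-off: re-joining hyphenated line breaks also merges genuine compounds that happen to break at the hyphen (for example "long-term"), so a dictionary check may be worth adding for a real corpus.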

3. NLP Analysis:

spaCy's NLP tools were used to analyze the text data, including:

a. Keyword extraction: Identifying key terms and concepts related to climate change.

b. Topic modeling: Identifying the main topics discussed in the reports.
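Frequency-based keyword extraction, the simplest form of the first technique above, can be sketched as below. The project uses spaCy, where the same filtering would usually rely on token attributes such as `token.is_stop`, `token.is_alpha`, and `token.lemma_`; this stand-in uses only the standard library and a deliberately tiny, hypothetical stop-word list so it runs without downloading a spaCy model.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; spaCy ships a much larger one.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "has",
    "have", "in", "is", "it", "of", "on", "or", "that", "the", "to", "was",
    "were", "will", "with",
}


def top_keywords(text: str, n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent non-stop-word terms with their counts."""
    # Letters-only tokens (hyphenated compounds kept); digits are dropped,
    # which is a simplification compared with a real tokenizer.
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(n)
```

Lemmatizing first (so "warming" and "warmed" count together) is where spaCy adds value over this plain counting approach.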

4. Information Visualization:

The results of the NLP analysis are visualized as word clouds using WordCloud. This allows users to explore and understand the key findings of the climate change reports.

5. Input and Output Program

The program lets users enter a specific keyword or phrase.
It then retrieves every paragraph in the reports that contains that keyword or phrase, so users can quickly find information related to their interests.
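A minimal sketch of this search, assuming the report text has already been cleaned and that paragraphs are separated by blank lines; `find_paragraphs` is a hypothetical name, not the notebook's function.

```python
import re


def find_paragraphs(text: str, query: str) -> list[str]:
    """Return every paragraph of text containing query as a whole word or phrase."""
    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    needle = query.strip()
    if not needle:
        return []
    # Case-insensitive match that is not embedded in a longer word,
    # so "sea" does not match "research".
    pattern = re.compile(rf"(?<!\w){re.escape(needle)}(?!\w)", re.IGNORECASE)
    return [p for p in paragraphs if pattern.search(p)]
```

In an interactive notebook this might be called as `find_paragraphs(report_text, input("Keyword or phrase: "))`, where `report_text` is a placeholder for the cleaned text. Replacing the regex with a plain `needle.lower() in p.lower()` check would also return partial-word matches, if that is preferred.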
