Summarization App

Overview

The Summarization App is a web-based tool that provides text summarization for both books and YouTube videos. The app can:

Summarize chapters from books hosted on Project Gutenberg (UTF-8 plain text files only).
Generate summaries of YouTube videos from the video’s existing transcript or, when none is available, from a Whisper transcription of its audio.

The project combines pretrained machine learning models, BART for text summarization and Whisper for audio transcription, behind a web interface built with Streamlit.

Features

1. Book Summarization:

Fetches book content directly from Project Gutenberg (UTF-8 format only).
Automatically extracts chapters from the book.
Summarizes selected chapters using BART.
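Chapter detection can be approximated with a regular expression over heading lines. This is a minimal sketch that assumes chapters begin with lines such as "CHAPTER I" or "Chapter 12."; the function name split_chapters and the heading pattern are hypothetical and do not reflect the app's actual detection logic. Headings with no text after them (for example, entries in a table of contents) are skipped.

```python
import re
from typing import List, Tuple

# Hypothetical heading pattern: a line starting with "Chapter" (any case)
# followed by a Roman or Arabic numeral, e.g. "CHAPTER I" or "Chapter 12.".
CHAPTER_HEADING = re.compile(
    r"^[ \t]*chapter[ \t]+([ivxlcdm]+|\d+)\b.*$",
    re.IGNORECASE | re.MULTILINE,
)


def split_chapters(text: str) -> List[Tuple[str, str]]:
    """Split a plain-text book into (heading, body) pairs.

    Text before the first heading (front matter) is discarded, and headings
    whose body is empty (such as table-of-contents lines) are skipped.
    """
    matches = list(CHAPTER_HEADING.finditer(text))
    chapters = []
    for i, match in enumerate(matches):
        # The body runs from the end of this heading to the next heading.
        start = match.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[start:end].strip()
        if body:
            chapters.append((match.group(0).strip(), body))
    return chapters
```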

2. YouTube Video Summarization:

Fetches and summarizes transcripts of YouTube videos.
If no transcript is available, downloads and transcribes the audio using Whisper, then generates a summary.

Requirements

Python Version

Ensure you have Python 3.8 or above installed.

Dependencies

The app relies on several key Python libraries:

streamlit: For the web interface.
transformers: To use BART for text summarization.
torch: To handle model computations.
nltk: For sentence tokenization.
whisper: For audio transcription (published on PyPI as openai-whisper; it also requires the ffmpeg command-line tool).
yt-dlp: For downloading YouTube audio.
pytube: To fetch video metadata.
youtube_transcript_api: For fetching YouTube transcripts.
requests: For fetching book content.
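Some of these libraries are imported under a different name than the one used with pip (yt-dlp is imported as yt_dlp, and openai-whisper as whisper), so one quick way to confirm the environment is complete is to look up each import name. The script below is a hypothetical helper, not a file in this repository; its module list simply mirrors the dependencies above.

```python
import importlib.util
from typing import Iterable, List

# Import names for the dependencies listed above (not the pip package names).
REQUIRED_MODULES = [
    "streamlit",
    "transformers",
    "torch",
    "nltk",
    "whisper",
    "yt_dlp",
    "pytube",
    "youtube_transcript_api",
    "requests",
]


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level module names that cannot be found.

    find_spec locates a module without importing it, so heavy packages such
    as torch are not loaded by this check.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```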

Installation

1. Clone the Repository

git clone https://github.com/your-username/summarization-app.git
cd summarization-app

2. Set up the Virtual Environment

It’s recommended to use a virtual environment to manage dependencies:

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

3. Install Dependencies

Install the required Python packages:

pip install -r requirements.txt

4. Download NLTK Data

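The app uses NLTK’s punkt tokenizer for sentence tokenization. Download it with:

python -c "import nltk; nltk.download('punkt')"

Recent NLTK releases (3.9 and later) load the punkt_tab resource for sentence tokenization instead, so on those versions download it as well:

python -c "import nltk; nltk.download('punkt_tab')"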

5. Install yt-dlp

Ensure that yt-dlp is installed to handle YouTube video downloads. You can install it via pip:

pip install yt-dlp

Or download directly from the official yt-dlp repository.
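yt-dlp can also be driven from Python through its YoutubeDL class, which takes an options dictionary. This is a minimal sketch that assumes the app wants an MP3 file named after the video ID; the helper names audio_download_options and download_audio are hypothetical, and the audio conversion step needs ffmpeg on the PATH.

```python
import os
from typing import Any, Dict


def audio_download_options(output_dir: str, codec: str = "mp3") -> Dict[str, Any]:
    """Build yt-dlp options that fetch the best audio stream and convert it.

    The FFmpegExtractAudio post-processor requires ffmpeg to be installed.
    """
    return {
        "format": "bestaudio/best",
        # After conversion the file is named <output_dir>/<video_id>.<codec>.
        "outtmpl": os.path.join(output_dir, "%(id)s.%(ext)s"),
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": codec},
        ],
        "quiet": True,
    }


def download_audio(url: str, output_dir: str = ".") -> None:
    """Download the audio track of a YouTube video with yt-dlp."""
    import yt_dlp  # imported lazily so the options helper works without it

    with yt_dlp.YoutubeDL(audio_download_options(output_dir)) as ydl:
        ydl.download([url])
```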

Running the App

To start the Streamlit app, run:

streamlit run app.py

Streamlit serves the app locally (by default at http://localhost:8501) and opens it in your web browser.

Usage

Book Summarization

Supported Source: Only Project Gutenberg books in UTF-8 plain text format are supported.
Book URL Requirements:
- The URL must be on Project Gutenberg.
- It must point directly to the UTF-8 plain text file, not to the book’s landing page or to another format.

How to Get the Right Format from Project Gutenberg:

1. Go to Project Gutenberg (https://www.gutenberg.org).
2. Search for the book you wish to summarize.
3. On the book’s page, scroll down to the download options.
4. Click Plain Text UTF-8.
- Example: https://www.gutenberg.org/files/1342/1342-0.txt (Pride and Prejudice).
5. Copy the URL of the UTF-8 plain text file and paste it into the app.

The app cannot process other formats (e.g., HTML, PDF, or non-UTF-8 encodings).
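A minimal sketch of the checks these requirements imply, assuming the app downloads the file with requests and decodes it as UTF-8. The function names and the accepted URL endings (.txt and .txt.utf-8) are assumptions for illustration. Decoding with Python's utf-8-sig codec also drops a byte-order mark if the file begins with one, and non-UTF-8 input is rejected with a ValueError.

```python
from urllib.parse import urlparse


def is_gutenberg_text_url(url: str) -> bool:
    """Rough check that a URL points at a plain-text file on gutenberg.org.

    The accepted suffixes are an assumption, not the app's actual rule.
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    on_gutenberg = host == "gutenberg.org" or host.endswith(".gutenberg.org")
    return on_gutenberg and parsed.path.endswith((".txt", ".txt.utf-8"))


def decode_book(raw: bytes) -> str:
    """Decode downloaded bytes as UTF-8, dropping a leading byte-order mark."""
    try:
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError as exc:
        raise ValueError("Book is not valid UTF-8 plain text") from exc


def fetch_book(url: str) -> str:
    """Download a Project Gutenberg plain-text book and return its text."""
    import requests  # lazy import so the helpers above need no third-party code

    if not is_gutenberg_text_url(url):
        raise ValueError(f"Not a Project Gutenberg plain-text URL: {url}")
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return decode_book(response.content)
```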

YouTube Video Summarization

Supported Source: The app accepts YouTube video URLs.
Transcript: If the video has a transcript, the app fetches it automatically.
Transcription: If no transcript is available, the app downloads the video’s audio and transcribes it using Whisper.
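The transcript-first, audio-second behavior can be sketched as below. This assumes the video ID is taken from common YouTube URL forms and that the two steps are passed in as functions; in the app they would wrap youtube_transcript_api and Whisper. The sketch assumes the transcript fetcher returns None when no transcript exists, whereas youtube_transcript_api raises exceptions in that case, so a real wrapper would catch them. All names here are hypothetical.

```python
from typing import Callable, Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Pull the video ID out of common YouTube URL forms, or return None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host in ("youtu.be", "www.youtu.be"):
        return parsed.path.lstrip("/") or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        if parsed.path.startswith(("/shorts/", "/embed/")):
            return parsed.path.split("/")[2] or None
    return None


def get_video_text(
    video_id: str,
    fetch_transcript: Callable[[str], Optional[str]],
    transcribe_audio: Callable[[str], str],
) -> str:
    """Prefer the existing transcript; fall back to transcribing the audio."""
    transcript = fetch_transcript(video_id)
    if transcript:
        return transcript
    return transcribe_audio(video_id)
```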

App Interface

Main Page:
- Select either Book Summarization or YouTube Video Summarization.
Book Summarization:
- Enter the URL of a book from Project Gutenberg (in UTF-8 format).
- Fetch the book content and select a chapter.
- Summarize the selected chapter and download the summary.
YouTube Video Summarization:
- Enter a YouTube video URL.
- Fetch the video’s transcript (if available) or transcribe its audio using Whisper.
- Summarize the transcript and download the summary.

Models Used

1. BART:

Model: facebook/bart-large-cnn
Purpose: Summarization of text (used for both book chapters and YouTube transcripts).
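The facebook/bart-large-cnn checkpoint accepts at most 1,024 tokens of input, so a full chapter generally has to be split before summarization. The sketch below shows one common approach, assuming sentence-based chunking with NLTK and the Hugging Face transformers pipeline API; the 600-word budget and the generation lengths are illustrative values, not settings taken from the app.

```python
from typing import List


def chunk_sentences(sentences: List[str], max_words: int = 600) -> List[str]:
    """Group consecutive sentences into chunks of at most max_words words.

    Word count is only a rough proxy for BART's token count; a single
    sentence longer than max_words becomes its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarize(text: str) -> str:
    """Summarize long text chunk by chunk with facebook/bart-large-cnn."""
    # Lazy imports: these pull in large dependencies and download the model.
    from nltk.tokenize import sent_tokenize
    from transformers import pipeline

    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    parts = []
    for chunk in chunk_sentences(sent_tokenize(text)):
        # truncation=True guards against a chunk that still exceeds the limit.
        result = summarizer(
            chunk, max_length=150, min_length=30, do_sample=False, truncation=True
        )
        parts.append(result[0]["summary_text"])
    return " ".join(parts)
```

Summarizing each chunk independently and joining the results is the simplest design; the joined text can itself be summarized again if a shorter result is needed.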

2. Whisper:

Model: openai/whisper
Purpose: Transcription of YouTube audio (used when no transcript is available).
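With the openai-whisper package, transcription is a two-step call: load a model by name, then transcribe an audio file. This is a minimal sketch assuming the path of the downloaded audio is passed in; the function name transcribe_audio and the "base" default are assumptions, and larger checkpoints such as "medium" or "large" trade speed for accuracy.

```python
from typing import Optional


def transcribe_audio(
    audio_path: str, model_name: str = "base", language: Optional[str] = None
) -> str:
    """Transcribe an audio file with Whisper and return the plain text.

    language=None lets Whisper detect the spoken language automatically.
    """
    import whisper  # provided by the openai-whisper package; needs ffmpeg

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, language=language)
    return result["text"].strip()
```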

Folder Structure

The summaries and transcripts are saved in the following structure:

summaries/
  └── <Book or Video Title> - <Author or Video ID>/
      ├── Chapter_<n>.txt
      ├── transcript.txt
      └── summary.txt
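One way to produce this layout is sketched below, assuming titles may contain characters that are not allowed in file names (such as ":" or "?"). The helper names and the set of replaced characters are hypothetical; the app's own naming code is not shown in this README.

```python
import re
from pathlib import Path


def safe_name(text: str) -> str:
    """Replace characters that are invalid in file names on common systems."""
    cleaned = re.sub(r'[<>:"/\\|?*]+', "_", text).strip()
    return cleaned or "untitled"


def output_dir(title: str, author_or_id: str, root: str = "summaries") -> Path:
    """Create (if needed) and return <root>/<Title> - <Author or Video ID>/."""
    path = Path(root) / f"{safe_name(title)} - {safe_name(author_or_id)}"
    path.mkdir(parents=True, exist_ok=True)
    return path


def save_text(directory: Path, filename: str, text: str) -> Path:
    """Write text as UTF-8 (e.g. Chapter_3.txt or summary.txt) and return its path."""
    target = directory / filename
    target.write_text(text, encoding="utf-8")
    return target
```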

Future Improvements

Adding support for more book formats and sources.
Enhancing the transcription quality using Whisper's larger models.
Implementing a more robust chapter detection algorithm for books.

License

This project is licensed under the MIT License.
