EpubConsolidator

EpubConsolidator is a Python tool that converts .epub files into clean, consolidated text for use with large language models (LLMs). It strips HTML markup and metadata from each book and splits the text into segments, so that each part fits within the character limits LLMs typically impose.

Overview

The tool is ideal for users looking to extract pure textual content from books, making it easier to leverage LLMs for insights, research, or any form of textual analysis. EpubConsolidator ensures that users receive only the essential content, free from formatting distractions, allowing for more effective interaction with various LLM technologies.

Components

EpubConsolidator consists of two main components:

EpubExtractor: Extracts .xhtml and .html files based on the order defined in the .opf file contained within the .epub, ensuring the textual content retains its original narrative sequence.
EpubConsolidator: Cleans the extracted files by removing HTML tags and unnecessary sections like indexes or footnotes. The process respects the character limitations of LLMs by segmenting the text into parts.
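
The ordering step performed by EpubExtractor can be illustrated with a minimal sketch. This is not the code in epub_extractor.py; the function name spine_documents is hypothetical, and the sketch assumes a standard EPUB layout in which META-INF/container.xml points to the .opf package file and the spine lists manifest items in reading order.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Standard EPUB namespaces for the container and package documents.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def spine_documents(epub_file):
    """Return archive paths of .xhtml/.html documents in spine (reading) order.

    Hypothetical helper for illustration; accepts a path or a file-like object.
    """
    with zipfile.ZipFile(epub_file) as zf:
        # META-INF/container.xml names the .opf package document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).attrib["full-path"]
        package = ET.fromstring(zf.read(opf_path))

    # The manifest maps item ids to hrefs; the spine lists ids in reading order.
    manifest = {
        item.attrib["id"]: item.attrib["href"]
        for item in package.findall("opf:manifest/opf:item", NS)
    }
    base = posixpath.dirname(opf_path)  # hrefs are relative to the .opf file
    ordered = []
    for itemref in package.findall("opf:spine/opf:itemref", NS):
        href = manifest.get(itemref.attrib["idref"])
        if href and href.lower().endswith((".xhtml", ".html")):
            ordered.append(posixpath.normpath(posixpath.join(base, href)))
    return ordered
```

Reading the spine rather than sorting file names matters because chapter files inside an .epub are not guaranteed to be named in reading order.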

Usage

To use EpubConsolidator, place your .epub files in the epub folder and run the provided script (run.py). The script handles extraction and consolidation automatically, producing clean text segments ready for LLM processing.
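
As a rough illustration of what "consolidation into clean segments" involves, the sketch below strips markup with the standard library's html.parser and splits the result at whitespace boundaries. It is an assumption-laden example, not the logic in consolidate_epub.py: the names html_to_text and split_into_segments and the 10000-character default are hypothetical, and the removal of indexes or footnotes is not covered.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(markup):
    """Return the visible text of an HTML/XHTML string with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def split_into_segments(text, max_chars=10000):
    """Split text into chunks of at most max_chars (hypothetical default),
    breaking at the last space inside each window when one exists."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    segments = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            cut = text.rfind(" ", start, end)
            if cut > start:
                end = cut
        segments.append(text[start:end].strip())
        start = end
    return [s for s in segments if s]
```

Breaking at whitespace keeps words intact across segment boundaries; a word longer than max_chars is still cut mid-word so that no segment exceeds the limit.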

How to Run

Clone or download the repository.
Ensure your .epub files are in the epub folder.
Execute the script to start the extraction and consolidation process:

python run.py

Find the output in the 'books' directory, organized into subdirectories named after the original .epub files, with consolidated text files labeled as booksegment*.txt.
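
Once the script has run, the segments can be collected for further processing. The following sketch assumes the layout described above (books/<book name>/booksegment*.txt) and additionally assumes that the wildcard part of each file name is a number; the helper name list_segments is hypothetical and not part of the repository.

```python
import re
from pathlib import Path


def list_segments(books_dir="books"):
    """Map each book subdirectory name to its segment files in numeric order."""

    def segment_number(path):
        # Assumption: file names look like booksegment1.txt, booksegment2.txt, ...
        match = re.search(r"(\d+)", path.stem)
        return int(match.group(1)) if match else 0

    result = {}
    for book_dir in sorted(Path(books_dir).iterdir()):
        if book_dir.is_dir():
            files = sorted(book_dir.glob("booksegment*.txt"), key=segment_number)
            result[book_dir.name] = files
    return result
```

Sorting by the extracted number rather than by name keeps booksegment10.txt after booksegment2.txt.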
