PDF Database Upload by faizaan3424 · Pull Request #5 · adanomad/pdf-highlight-oa

faizaan3424 · 2024-10-01T02:06:02Z

Project Documentation

Overview

This project enhances an existing application by adding functionality to store entire PDFs in a SQLite database. The goal is efficient storage, retrieval, and management of PDFs alongside their associated highlights.

Approach

The implementation involves several key components:

Data Encoding: PDFs are converted into binary data and stored in a BLOB column.
Database Schema: Two new tables (pdfs, highlights) manage PDFs and their metadata.
API Endpoints: RESTful endpoints handle PDF uploads, retrieval, and deletion.
New Classes/Utilities: Utilities abstract database operations and PDF handling.
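
As a rough illustration of the data-encoding step, the sketch below checks that a byte array looks like a PDF before it would be bound to a BLOB parameter. The helper names (looksLikePdf, asciiBytes) are hypothetical and do not come from this PR. The only fact relied on is that PDF files typically begin with the bytes "%PDF-".

```typescript
// Hypothetical helpers for illustration; not code from this PR.
// "%PDF-" as raw byte values.
const PDF_MAGIC: number[] = [0x25, 0x50, 0x44, 0x46, 0x2d];

// Returns true when the buffer starts with the PDF header bytes.
function looksLikePdf(bytes: Uint8Array): boolean {
  if (bytes.length < PDF_MAGIC.length) return false;
  return PDF_MAGIC.every((expected, i) => bytes[i] === expected);
}

// Test helper: turns an ASCII string into bytes without any Node-specific API.
function asciiBytes(text: string): Uint8Array {
  const out = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    out[i] = text.charCodeAt(i) & 0xff;
  }
  return out;
}
```

A check like this could run before the insert so that obviously wrong uploads are rejected early. It is not a full validation of the file.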

Database Schema

New tables are introduced:

pdfs: Stores each PDF's metadata and binary content.
- pdfId, fileName, data (binary PDF data).
highlights: Stores highlights with:
- id, pdfId (foreign key), page number, coordinates, etc.
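
One possible shape for these tables is sketched below as SQL strings held in TypeScript constants. The column types, the NOT NULL constraints, and storing coordinates as a text column are all assumptions. Only the column names pdfId, fileName, data, and id come from the description above, and the fields covered by "etc." are left out.

```typescript
// Assumed DDL for illustration; the PR's actual column types may differ.
const CREATE_PDFS_TABLE = `
CREATE TABLE IF NOT EXISTS pdfs (
  pdfId    TEXT PRIMARY KEY,
  fileName TEXT NOT NULL,
  data     BLOB NOT NULL
)`;

// pageNumber and coordinates are assumed names for "page number" and "coordinates".
const CREATE_HIGHLIGHTS_TABLE = `
CREATE TABLE IF NOT EXISTS highlights (
  id          TEXT PRIMARY KEY,
  pdfId       TEXT NOT NULL REFERENCES pdfs(pdfId),
  pageNumber  INTEGER NOT NULL,
  coordinates TEXT NOT NULL
)`;

// Statements in the order a migration would need to run them.
const SCHEMA_STATEMENTS: string[] = [CREATE_PDFS_TABLE, CREATE_HIGHLIGHTS_TABLE];
```

SQLite only enforces the foreign key when PRAGMA foreign_keys = ON has been set on the connection. IF NOT EXISTS keeps the statements safe to re-run against a database that already has the tables.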

API Endpoints

POST /api/pdf/upload: Uploads a PDF to the database using PDFStorage.
DELETE /api/pdf/delete: Deletes a PDF by its pdfId.
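
The request handling behind the two endpoints might look roughly like the framework-free sketch below. Everything in it is an assumption made for illustration: the handler names, the response shape, the status codes, the in-memory Map standing in for the database, and the counter-based id scheme.

```typescript
// Hypothetical handler logic; the PR's real route code is not shown in this description.
interface UploadedPdf {
  fileName: string;
  data: Uint8Array;
}

interface ApiResult {
  status: number;
  body: { pdfId?: string; error?: string };
}

// Stand-in for the SQLite-backed storage.
const uploadedPdfs = new Map<string, UploadedPdf>();
let nextPdfNumber = 1;

// Logic for POST /api/pdf/upload: validate input, store the bytes, return the new id.
function handleUpload(fileName: string, data: Uint8Array): ApiResult {
  if (fileName.trim() === "" || data.length === 0) {
    return { status: 400, body: { error: "fileName and data are required" } };
  }
  const pdfId = `pdf-${nextPdfNumber++}`;
  uploadedPdfs.set(pdfId, { fileName, data });
  return { status: 201, body: { pdfId } };
}

// Logic for DELETE /api/pdf/delete: remove by pdfId, or report that it is missing.
function handleDelete(pdfId: string): ApiResult {
  if (!uploadedPdfs.delete(pdfId)) {
    return { status: 404, body: { error: `no PDF with id ${pdfId}` } };
  }
  return { status: 200, body: { pdfId } };
}
```

Keeping the logic in plain functions like these would let it be tested without starting a server.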

New Classes/Utilities

PDFStorage: A utility class for PDF operations:
- savePDF(), saveBulkPDFs(), getPdf(), deletePDF(), close().
sqliteUtils: Manages database migrations and highlights with new methods:
- savePdf(), getPdf(), deletePdf().
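
To make the PDFStorage surface concrete, here is a minimal in-memory stand-in that uses the method names listed above. The parameter and return types are assumptions, the methods are synchronous for brevity, and a Map replaces the SQLite calls that the real class presumably delegates to sqliteUtils.

```typescript
// Illustrative stand-in only; signatures are assumed, not taken from the PR.
interface PdfRecord {
  pdfId: string;
  fileName: string;
  data: Uint8Array;
}

class InMemoryPDFStorage {
  private rows = new Map<string, PdfRecord>();
  private closed = false;

  savePDF(record: PdfRecord): void {
    this.assertOpen();
    // Copy the bytes so later changes to the caller's buffer cannot alter the stored row.
    this.rows.set(record.pdfId, { ...record, data: record.data.slice() });
  }

  saveBulkPDFs(records: PdfRecord[]): void {
    for (const record of records) this.savePDF(record);
  }

  getPdf(pdfId: string): PdfRecord | undefined {
    this.assertOpen();
    return this.rows.get(pdfId);
  }

  // Returns true when a row was removed, false when the id was unknown.
  deletePDF(pdfId: string): boolean {
    this.assertOpen();
    return this.rows.delete(pdfId);
  }

  close(): void {
    this.closed = true;
    this.rows.clear();
  }

  private assertOpen(): void {
    if (this.closed) throw new Error("storage is closed");
  }
}
```

A database-backed version would most likely wrap saveBulkPDFs in a single transaction so that a failed insert does not leave a partial batch.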

Frontend Integration

App.tsx: Manages file uploads, PDF viewing, and highlights:
- File Upload: Converts PDFs to a searchable format using OCR.
- Highlight Management: Displays highlights for PDFs.
- Search Functionality: Allows keyword searches in the PDF.
- API Interaction: Communicates with the backend for PDF and highlight management.
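
The keyword search could work along the lines of the sketch below, which scans already-extracted page text and reports where each match starts. The function name, the KeywordHit shape, and the case-insensitive matching are assumptions. How App.tsx actually obtains page text and maps matches to highlight coordinates is not described here.

```typescript
// Hypothetical search helper; not code from this PR.
interface KeywordHit {
  pageNumber: number; // 1-based page index
  offset: number;     // character offset of the match in the lowercased page text
}

// Finds every non-overlapping, case-insensitive occurrence of keyword on each page.
function findKeyword(pages: string[], keyword: string): KeywordHit[] {
  const hits: KeywordHit[] = [];
  const needle = keyword.toLowerCase();
  if (needle === "") return hits;

  pages.forEach((text, index) => {
    const haystack = text.toLowerCase();
    let from = haystack.indexOf(needle);
    while (from !== -1) {
      hits.push({ pageNumber: index + 1, offset: from });
      from = haystack.indexOf(needle, from + needle.length);
    }
  });
  return hits;
}
```

Each hit would still need to be translated into page coordinates before it could be stored as a highlight.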

Challenges

Binary Data Handling: PDF bytes must be written and read back unchanged to avoid data corruption.
Database Schema Migration: Adding the new tables must not affect existing data.
Performance: Uploads and retrievals must not slow down the application.
OCR Integration: Adding OCR for searchable PDFs increased complexity.

Future Work

File Compression: Implement compression to save space.
Pagination and Lazy Loading: Optimize retrieval for better performance.
User Authentication: Securely associate PDFs with users.
Error Handling Enhancements: Improve feedback and error management.
Cloud Storage Integration: Scale with cloud services.

Conclusion

This project successfully extends the application's functionality by integrating PDF storage. New classes and utilities enhance the system, providing a foundation for future improvements.

sunapi386 and others added 2 commits September 25, 2024 18:07

update with highlight functionality

d2eb3a8

implemented functionality to save pdf into sqlite database

cd22cf4

sunapi386 force-pushed the main branch from d2eb3a8 to f257e67 on October 2, 2024 20:31

faizaan3424 wants to merge 2 commits into adanomad:main from faizaan3424:main
