PDF Database Upload #5

Open
faizaan3424 wants to merge 2 commits into adanomad:main from faizaan3424:main
@faizaan3424

Project Documentation

Overview

This project enhances an existing application by adding functionality to store entire PDFs in a SQLite database. The goal is efficient storage, retrieval, and management of PDFs alongside their associated highlights.

Approach

The implementation involves several key components:

  • Data Encoding: PDFs are stored as raw binary data in a SQLite BLOB column.
  • Database Schema: Two new tables (pdfs, highlights) store the PDFs and their highlights.
  • API Endpoints: RESTful endpoints handle PDF uploads, retrieval, and deletion.
  • New Classes/Utilities: Utilities abstract database operations and PDF handling.

Database Schema

New tables are introduced:

  • pdfs: Stores each PDF's metadata and binary content.
    • pdfId, fileName, data (binary PDF data).
  • highlights: Stores the highlights for each PDF.
    • id, pdfId (foreign key to pdfs), page number, coordinates, etc.
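
The two tables could be declared roughly as follows. This is a minimal sketch, not the PR's actual migration: the column names pdfId, fileName, data, and id come from the description above, while the SQL types, the pageNumber and x1/y1/x2/y2 columns, and the ON DELETE CASCADE rule are assumptions made for illustration.

```typescript
// Row shapes mirroring the two tables. Column names follow the description;
// the types and the four coordinate fields are assumptions.
interface PdfRow {
  pdfId: string;
  fileName: string;
  data: Uint8Array; // raw PDF bytes, stored as a BLOB
}

interface HighlightRow {
  id: string;
  pdfId: string; // references pdfs.pdfId
  pageNumber: number;
  x1: number;
  y1: number;
  x2: number;
  y2: number;
}

// Hypothetical DDL for the pdfs table.
const CREATE_PDFS_TABLE = `
CREATE TABLE IF NOT EXISTS pdfs (
  pdfId    TEXT PRIMARY KEY,
  fileName TEXT NOT NULL,
  data     BLOB NOT NULL
);`;

// Hypothetical DDL for the highlights table. The cascade rule is an assumption.
const CREATE_HIGHLIGHTS_TABLE = `
CREATE TABLE IF NOT EXISTS highlights (
  id         TEXT PRIMARY KEY,
  pdfId      TEXT NOT NULL REFERENCES pdfs(pdfId) ON DELETE CASCADE,
  pageNumber INTEGER NOT NULL,
  x1 REAL NOT NULL,
  y1 REAL NOT NULL,
  x2 REAL NOT NULL,
  y2 REAL NOT NULL
);`;

// SQLite enforces foreign keys only when this pragma is enabled on the connection.
const ENABLE_FOREIGN_KEYS = "PRAGMA foreign_keys = ON;";

// Example rows showing how a highlight points back at its PDF.
const samplePdf: PdfRow = {
  pdfId: "doc-1",
  fileName: "report.pdf",
  data: new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d]),
};

const sampleHighlight: HighlightRow = {
  id: "hl-1",
  pdfId: samplePdf.pdfId,
  pageNumber: 1,
  x1: 10, y1: 20, x2: 110, y2: 40,
};
```

Keeping the highlight's pdfId as a foreign key lets the database reject highlights that point at a PDF that no longer exists, provided foreign key enforcement is switched on.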

API Endpoints

  • POST /api/pdf/upload: Uploads PDF to the database using PDFStorage.
  • DELETE /api/pdf/delete: Deletes PDF by pdfId.
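
The behaviour of the two endpoints can be sketched as a framework-free request handler. Everything apart from the two routes is hypothetical: the PdfStore interface, the ApiRequest and ApiResponse shapes, the handlePdfRequest function, and the chosen status codes are illustrative assumptions, and the real handlers in this PR may be structured differently.

```typescript
// Minimal storage contract the handler needs (hypothetical; the real
// PDFStorage class may expose a different interface).
interface PdfStore {
  save(pdfId: string, fileName: string, data: Uint8Array): void;
  remove(pdfId: string): boolean;
}

// Simplified request and response shapes, assumed for this sketch.
interface ApiRequest {
  method: string;
  path: string;
  pdfId?: string;
  fileName?: string;
  data?: Uint8Array;
}

interface ApiResponse {
  status: number;
  message: string;
}

function handlePdfRequest(req: ApiRequest, store: PdfStore): ApiResponse {
  if (req.path === "/api/pdf/upload") {
    if (req.method !== "POST") {
      return { status: 405, message: "Method not allowed" };
    }
    // Reject uploads that are missing an identifier, a name, or any bytes.
    if (!req.pdfId || !req.fileName || !req.data || req.data.length === 0) {
      return { status: 400, message: "pdfId, fileName and data are required" };
    }
    store.save(req.pdfId, req.fileName, req.data);
    return { status: 201, message: `Stored ${req.fileName}` };
  }

  if (req.path === "/api/pdf/delete") {
    if (req.method !== "DELETE") {
      return { status: 405, message: "Method not allowed" };
    }
    if (!req.pdfId) {
      return { status: 400, message: "pdfId is required" };
    }
    // Report 404 when there was nothing to delete.
    return store.remove(req.pdfId)
      ? { status: 200, message: "Deleted" }
      : { status: 404, message: "PDF not found" };
  }

  return { status: 404, message: "Unknown route" };
}
```

Separating the routing logic from the storage contract keeps the handler testable with an in-memory store before it is wired to SQLite.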

New Classes/Utilities

  • PDFStorage: A utility class for PDF operations:
    • savePDF(), saveBulkPDFs(), getPdf(), deletePDF(), close().
  • sqliteUtils: Manages database migrations and highlights with new methods:
    • savePdf(), getPdf(), deletePdf().
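
To show how the PDFStorage method names fit together, here is an in-memory stand-in. It is a sketch under stated assumptions: only the method names come from the description above. The InMemoryPDFStorage class, the StoredPdf shape, the synchronous signatures, and the use of a Map in place of SQLite are all hypothetical.

```typescript
// Shape of a stored PDF, assumed from the pdfs table columns.
interface StoredPdf {
  pdfId: string;
  fileName: string;
  data: Uint8Array;
}

// In-memory stand-in for PDFStorage. A Map replaces the SQLite table so the
// example runs without a database; the signatures are assumptions.
class InMemoryPDFStorage {
  private pdfs = new Map<string, StoredPdf>();
  private closed = false;

  private assertOpen(): void {
    if (this.closed) throw new Error("storage is closed");
  }

  savePDF(pdf: StoredPdf): void {
    this.assertOpen();
    // Copy the bytes so later mutation by the caller cannot alter the stored PDF.
    this.pdfs.set(pdf.pdfId, { ...pdf, data: pdf.data.slice() });
  }

  saveBulkPDFs(pdfs: StoredPdf[]): void {
    for (const pdf of pdfs) this.savePDF(pdf);
  }

  getPdf(pdfId: string): StoredPdf | undefined {
    this.assertOpen();
    const found = this.pdfs.get(pdfId);
    // Return a copy so callers cannot mutate the stored bytes.
    return found ? { ...found, data: found.data.slice() } : undefined;
  }

  deletePDF(pdfId: string): boolean {
    this.assertOpen();
    return this.pdfs.delete(pdfId);
  }

  close(): void {
    // Drop all entries and refuse further operations.
    this.closed = true;
    this.pdfs.clear();
  }
}

const storage = new InMemoryPDFStorage();
storage.savePDF({
  pdfId: "doc-1",
  fileName: "report.pdf",
  data: new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d]),
});
```

A SQLite-backed version would keep the same surface and replace the Map operations with INSERT, SELECT, and DELETE statements against the pdfs table.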

Frontend Integration

  • App.tsx: Manages file uploads, PDF viewing, and highlights:
    • File Upload: Converts PDFs to a searchable format using OCR.
    • Highlight Management: Displays highlights for PDFs.
    • Search Functionality: Allows keyword searches in the PDF.
    • API Interaction: Communicates with the backend for PDF and highlight management.

Challenges

  • Binary Data Handling: Careful management to avoid data corruption.
  • Database Schema Migration: Ensuring existing data is not affected.
  • Performance: Managing upload and retrieval without performance degradation.
  • OCR Integration: Adding OCR for searchable PDFs increased complexity.
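
One way to guard against the binary corruption mentioned above is to check the PDF header before storing and to compare bytes after a round trip. The sketch below is illustrative only: the helper names looksLikePdf, bytesEqual, toHex, and fromHex are hypothetical, and hex is used purely as a dependency-free text encoding, not because the PR uses it.

```typescript
// A well-formed PDF begins with the ASCII bytes "%PDF-".
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

function looksLikePdf(bytes: Uint8Array): boolean {
  if (bytes.length < PDF_MAGIC.length) return false;
  return PDF_MAGIC.every((b, i) => bytes[i] === b);
}

// Byte-for-byte comparison, used to confirm a round trip was lossless.
function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Encode bytes as lowercase hex, two characters per byte.
function toHex(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += bytes[i].toString(16).padStart(2, "0");
  }
  return out;
}

// Decode a hex string back into bytes, rejecting malformed input.
function fromHex(hex: string): Uint8Array {
  if (hex.length % 2 !== 0 || !/^[0-9a-fA-F]*$/.test(hex)) {
    throw new Error("invalid hex string");
  }
  const out = new Uint8Array(hex.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = Number.parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return out;
}

// Round trip: encode, decode, and confirm nothing changed.
const originalBytes = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x37, 0x00, 0xff]);
const roundTripped = fromHex(toHex(originalBytes));
const intact = looksLikePdf(roundTripped) && bytesEqual(originalBytes, roundTripped);
```

The same comparison applies to any transport or storage step: read the bytes back and compare them with what was written, rather than trusting that a string conversion preserved them.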

Future Work

  • File Compression: Implement compression to save space.
  • Pagination and Lazy Loading: Optimize retrieval for better performance.
  • User Authentication: Securely associate PDFs with users.
  • Error Handling Enhancements: Improve feedback and error management.
  • Cloud Storage Integration: Scale with cloud services.

Conclusion

This project successfully extends the application's functionality by integrating PDF storage. New classes and utilities enhance the system, providing a foundation for future improvements.
