
Storage and upload of pdf into SQLite database + Multi-pdf search#6

Open
quanttrinh wants to merge 8 commits into adanomad:main from quanttrinh:main


@quanttrinh

Overview

This pull request implements three features:

  • Storage and upload of pdf into SQLite database.
  • Multi-pdf search.
  • Extra feature: the ability to disable or enable OCR and multi-pdf search via environment variables (NEXT_PUBLIC_DEBUG_OCRPDF_ENABLED and NEXT_PUBLIC_DEBUG_MULTISEARCH_ENABLED).
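
A minimal sketch of how such flags might be interpreted, assuming a feature is on only when its variable holds the string "true" (as in the Setup section). The helper name isFlagEnabled is hypothetical and is not taken from this PR.

```typescript
// Hypothetical helper: treats an environment variable value as a boolean flag.
// Only the string "true" (case-insensitive, surrounding whitespace ignored)
// enables the feature; a missing variable or any other value disables it.
function isFlagEnabled(value: string | undefined): boolean {
  return value?.trim().toLowerCase() === "true";
}

// Intended usage inside the app (assumed, not copied from the PR):
//   const ocrEnabled = isFlagEnabled(process.env.NEXT_PUBLIC_DEBUG_OCRPDF_ENABLED);
//   const multiSearchEnabled = isFlagEnabled(process.env.NEXT_PUBLIC_DEBUG_MULTISEARCH_ENABLED);
```

Next.js inlines NEXT_PUBLIC_ variables at build time, so each one has to be referenced by its full static name, as in the usage comment, and not looked up through a computed key.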

Setup

Clone the repo:

git clone https://github.com/QuanTrinhCA/pdf-highlight-oa.git

Go to root directory:

cd pdf-highlight-oa

Setup environment variables:

echo 'NEXT_PUBLIC_DEBUG_OCRPDF_ENABLED="true"       # Enable ocr
NEXT_PUBLIC_DEBUG_MULTISEARCH_ENABLED="true"  # Enable multi-search' >> .env.local

Install dependencies and start dev server:

pnpm install
pnpm run dev

Approach

The general philosophy is that the changes should be self-contained and should affect existing features and code as little as possible.

Storage and upload of pdf into SQLite database

  • Created the interface for accessing the SQLite database in the same style as the existing highlightStorage.ts, for consistency.
  • Rewrote sqliteUtils.ts, creating a parent SQLiteDatabase class that contains the common functions.
  • Created PdfSQLiteDatabase and HighlightsSQLiteDatabase classes, which extend SQLiteDatabase, to keep the new pdf db feature separate from the existing highlight db feature. Each class accesses a separate db file, leaving the existing highlight db unmodified.
  • Added a pdf-file-to-base64 converter function in pdfUtils.ts.
  • Created API endpoints (get and update) for pdf upload.
  • Added upload functionality to handleFileUpload in App.tsx.
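
The base64 conversion step can be sketched roughly as follows. This is an illustration under assumptions, not the converter in pdfUtils.ts: the function names bytesToBase64 and base64ToBytes are hypothetical, and the sketch assumes the pdf has already been read into a Uint8Array (for example from the uploaded file's array buffer).

```typescript
// Hypothetical helper: encodes raw pdf bytes as a base64 string so that the
// file can be sent in a JSON request body and stored as text in SQLite.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  // Build a "binary string" with one character per byte, as btoa expects.
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Hypothetical inverse: decodes a stored base64 string back into pdf bytes.
function base64ToBytes(base64: string): Uint8Array {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}
```

Base64 text is about one third larger than the original bytes, which is a trade-off for being able to treat the file as a plain string.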

Multi-pdf search

  • Added an extra storedHightlights state and props to store the highlights from all pdfs.
  • Modified handleSearch to also fetch the stored pdfs from the API, search them, append the results to storedHightlights, and save them in the database.
  • Added a callback function in App.tsx and passed it to Sidebar.tsx to switch pdfs when the user clicks a highlight that belongs to a different pdf than the current one.
  • Added dynamic JSX to Sidebar.tsx to correctly display the highlights of all documents.
  • Modified the existing highlight-changing functions to account for the extra storedHightlights state, preventing the extra and existing highlight states from going out of sync.
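
The overall flow of a multi-pdf search can be sketched as below. This is a simplified illustration, not the handleSearch code in this PR: it assumes each stored pdf has already been reduced to plain text per page, and the names StoredPdfText, SearchHit, searchAllPdfs and groupHitsByPdf are all hypothetical.

```typescript
// Hypothetical shape: text already extracted from one stored pdf, one string per page.
interface StoredPdfText {
  pdfId: string;
  pages: string[];
}

// Hypothetical shape: one search match, tagged with the pdf it came from.
interface SearchHit {
  pdfId: string;
  pageNumber: number; // 1-based
  keyword: string;
}

// Searches every stored pdf for the keyword (case-insensitive) and records
// which pdf and page each match belongs to.
function searchAllPdfs(pdfs: StoredPdfText[], keyword: string): SearchHit[] {
  const needle = keyword.trim().toLowerCase();
  if (needle === "") return [];
  const hits: SearchHit[] = [];
  for (const pdf of pdfs) {
    pdf.pages.forEach((text, index) => {
      if (text.toLowerCase().indexOf(needle) !== -1) {
        hits.push({ pdfId: pdf.pdfId, pageNumber: index + 1, keyword });
      }
    });
  }
  return hits;
}

// Groups hits by pdf so that a sidebar could render one section per document.
function groupHitsByPdf(hits: SearchHit[]): Record<string, SearchHit[]> {
  const groups: Record<string, SearchHit[]> = {};
  for (const hit of hits) {
    const group = groups[hit.pdfId] ?? [];
    group.push(hit);
    groups[hit.pdfId] = group;
  }
  return groups;
}
```

Because every hit carries its pdfId, a click handler can compare it with the id of the pdf currently shown and switch documents only when they differ.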

Challenges

  • Understanding the code base and how everything works (how the sidebar gets data, pdf uploads, search, etc.)
  • Ensuring that existing code and functionality remain as unmodified as possible.
  • Preventing the desynchronization of the states storing highlights.
  • Dynamically displaying the highlights of all pdfs in the sidebar (how to bring data into the sidebar, formatting, etc.).
  • Preventing duplicate highlights (highlights for the same term). This is unsolved due to time constraints, but I have some ideas for how to address it.
  • Finishing everything before the deadline, since the notification for this take-home assignment went to my spam folder for some reason.
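
One possible way to approach the duplicate-highlight problem, offered only as a sketch and not necessarily the idea the author has in mind, is to derive a key from the fields that identify a highlight and keep the first highlight seen for each key. The HighlightLike shape and the choice of key fields are assumptions, not the highlight type used in this repository.

```typescript
// Hypothetical minimal shape of a highlight; the real type in the app may differ.
interface HighlightLike {
  pdfId: string;
  pageNumber: number;
  keyword: string;
  x1: number; // left edge of the highlighted region
  y1: number; // top edge of the highlighted region
}

// Keeps the first highlight for each (pdf, page, keyword, position) combination
// and drops later ones that describe the same match.
function dedupeHighlights<T extends HighlightLike>(highlights: T[]): T[] {
  const seen: { [key: string]: boolean } = {};
  const unique: T[] = [];
  for (const h of highlights) {
    const key = [h.pdfId, h.pageNumber, h.keyword.toLowerCase(), h.x1, h.y1].join("|");
    if (!seen[key]) {
      seen[key] = true;
      unique.push(h);
    }
  }
  return unique;
}
```

The same key could also back a uniqueness constraint in the database, so that repeated searches for one term do not insert the same rows again.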

Possible improvements

  • Better error handling and optimization.
  • Preventing duplicate highlights.
  • Proper sanitization of SQL queries.
  • Expansion of API.
  • Lazy/staggered loading for the sidebar.
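
For the SQL sanitization item, a common approach is to use parameterized queries, where values are passed separately from the SQL text instead of being concatenated into it. The sketch below only builds the statement and its parameters; the table name pdfs, its columns, and the function name buildInsertPdfQuery are assumptions and do not reflect this PR's actual schema.

```typescript
// A SQL statement with "?" placeholders plus the values to bind to them.
interface ParameterizedQuery {
  sql: string;
  params: (string | number)[];
}

// Hypothetical builder for storing a pdf. User-controlled values (such as the
// file name) never become part of the SQL text; they travel only in params.
function buildInsertPdfQuery(id: string, name: string, base64Data: string): ParameterizedQuery {
  return {
    sql: "INSERT OR REPLACE INTO pdfs (id, name, data) VALUES (?, ?, ?)",
    params: [id, name, base64Data],
  };
}
```

SQLite drivers generally accept the statement and the parameter list as separate arguments and bind the values themselves, so a file name containing quotes or SQL keywords is stored as plain data.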

