This project is a PDF viewer and vector search application that searches both the images and the text of PDFs, using HuggingFace Transformers to generate the embeddings.
- PDF document upload and display
- Page navigation (next, previous, jump to specific page)
- Zoom in/out functionality
- Document information display (total pages, current page)
- Indexing of all text (including text extracted with OCR) and all images
- Vector search across multiple PDFs
- Text highlighting for search matches
- Sidebar for search results and navigation
- Responsive design for various screen sizes
- Persistent storage of highlights using SQLite
- Next.js
- React
- TypeScript
- react-pdf library for PDF rendering
- Tailwind CSS for styling
- SQLite for local highlight storage
- HuggingFace for embedding data in preparation for vector search
- Clone the repository
- Install dependencies: `pnpm install`
- Run the development server: `pnpm run dev`
- Open http://localhost:3000 in your browser
- app/page.js: Main entry point of the application
- app/components/: React components for various parts of the application
- app/utils/: Utility functions for PDF processing and highlight storage
- app/styles/: CSS files for styling
- app/api/: API routes for handling highlight operations
- App.tsx: Core application component
- PdfViewer.tsx: Handles PDF rendering and navigation
- KeywordSearch.tsx: Manages keyword search functionality
- HighlightPopup.tsx: Displays information about highlighted text
- Sidebar.tsx: Shows search results and navigation options
- highlightStorage.ts: Manages highlight storage operations
- clipText.ts and clipImage.ts: Handle embedding data
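
To illustrate how embeddings can drive search across text and images, here is a minimal sketch of ranking by cosine similarity. It is not taken from the project: the `IndexedChunk` shape and the `cosineSimilarity` and `searchIndex` functions are hypothetical names, and the embeddings are assumed to be plain number arrays already produced by the embedding model.

```typescript
// Hypothetical shape of one indexed item; the project's real schema may differ.
interface IndexedChunk {
  pdfId: string;
  page: number;
  kind: "text" | "image";
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
// Returns 0 when either vector has zero length (norm) to avoid dividing by zero.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions must match");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every indexed chunk against the query embedding and keep the best topK.
function searchIndex(
  query: number[],
  index: IndexedChunk[],
  topK: number = 5
): { chunk: IndexedChunk; score: number }[] {
  return index
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

A brute-force scan like this is only reasonable for small collections; a larger index would likely need an approximate nearest-neighbour structure instead.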
- Highlight storage system backed by SQLite
- Semantic search across images and text in multiple PDFs
- API routes for creating, retrieving, updating, and deleting highlights
- User authentication and document permissions (currently disabled)
- Export/import as JSON functionality for highlights
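
The create, retrieve, update, delete, and JSON export/import operations listed above could look roughly like the sketch below. It is an assumption-laden illustration, not the project's code: the `Highlight` fields and the `HighlightStore` class are hypothetical, and an in-memory `Map` stands in for the SQLite database.

```typescript
// Hypothetical highlight record; the real fields in the project may differ.
interface Highlight {
  id: string;
  pdfId: string;
  page: number;
  text: string;
}

// In-memory stand-in for the SQLite-backed storage (assumption for illustration).
class HighlightStore {
  private items = new Map<string, Highlight>();

  create(h: Highlight): Highlight {
    if (this.items.has(h.id)) {
      throw new Error(`Highlight ${h.id} already exists`);
    }
    // Store a copy so later changes to the caller's object do not leak in.
    this.items.set(h.id, { ...h });
    return h;
  }

  get(id: string): Highlight | undefined {
    return this.items.get(id);
  }

  update(id: string, patch: Partial<Omit<Highlight, "id">>): Highlight {
    const current = this.items.get(id);
    if (!current) {
      throw new Error(`Highlight ${id} not found`);
    }
    const next: Highlight = { ...current, ...patch };
    this.items.set(id, next);
    return next;
  }

  delete(id: string): boolean {
    return this.items.delete(id);
  }

  // Serialize all highlights as a JSON array.
  exportJson(): string {
    return JSON.stringify(Array.from(this.items.values()));
  }

  // Load highlights from a JSON array, overwriting entries with the same id.
  // Returns the number of entries read.
  importJson(json: string): number {
    const parsed: unknown = JSON.parse(json);
    if (!Array.isArray(parsed)) {
      throw new Error("Expected a JSON array of highlights");
    }
    for (const h of parsed as Highlight[]) {
      this.items.set(h.id, h);
    }
    return parsed.length;
  }
}
```

In an actual deployment each method would map to an API route and a SQL statement, and imported JSON would need per-field validation before being stored.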
- Scroll the sidebar highlighted area into view across different PDFs.
- Implement annotation tools (e.g., freehand drawing, text notes)
- Add support for multiple document search
- Pre-process batch PDFs for quicker highlights
- Enhance mobile responsiveness for better small-screen experience
- Optimize performance for large PDF files
- Upload the PDF into the database.
- Next.js for the React framework
- react-pdf for PDF rendering capabilities
- Tailwind CSS for utility-first styling
- HuggingFace for the embedding model
- Tesseract OCR for OCR output