PDF Vector Search application

Project Overview

This project is a PDF viewer and vector search application. It searches both the text and the images of PDF documents using embeddings generated with HuggingFace Transformers.

Features

PDF document upload and display
Page navigation (next, previous, jump to specific page)
Zoom in/out functionality
Document information display (total pages, current page)
Indexing of text (including text recovered with OCR) and of all images
Vector search across multiple PDFs
Text highlighting for search matches
Sidebar for search results and navigation
Responsive design for various screen sizes
Persistent storage of highlights using SQLite
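
The vector search feature listed above can be illustrated with a minimal sketch. It assumes each indexed text chunk or image is stored alongside an embedding vector, and that results are ranked by cosine similarity; the README does not state which similarity measure the project uses, so that choice is an assumption. The names IndexedChunk, SearchHit, cosineSimilarity, and searchIndex are hypothetical and do not come from the repository.

```typescript
// Hypothetical shape of one indexed item: a text chunk or an image from a PDF page.
interface IndexedChunk {
  pdfName: string;
  page: number;
  kind: "text" | "image";
  embedding: number[];
}

interface SearchHit {
  chunk: IndexedChunk;
  score: number;
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embeddings must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every indexed chunk (from any PDF) against the query and keep the top k.
function searchIndex(query: number[], index: IndexedChunk[], k: number): SearchHit[] {
  return index
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Because every chunk carries its PDF name and page, a single ranked list can point into several documents at once. A brute-force scan like this is only reasonable for small indexes; a larger collection would likely need a dedicated vector index.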

Technologies Used

Next.js
React
TypeScript
react-pdf library for PDF rendering
Tailwind CSS for styling
SQLite for local highlight storage
HuggingFace Transformers for generating the embeddings used in vector search

Getting Started

Clone the repository
Install dependencies: pnpm install
Run the development server: pnpm run dev
Open http://localhost:3000 in your browser

Project Structure

app/page.js: Main entry point of the application
app/components/: React components for various parts of the application
app/utils/: Utility functions for PDF processing and highlight storage
app/styles/: CSS files for styling
app/api/: API routes for handling highlight operations

Key Components

App.tsx: Core application component
PdfViewer.tsx: Handles PDF rendering and navigation
KeywordSearch.tsx: Manages keyword search functionality
HighlightPopup.tsx: Displays information about highlighted text
Sidebar.tsx: Shows search results and navigation options
highlightStorage.ts: Manages highlight storage operations
clipText.ts and clipImage.ts: Generate embeddings for text and images
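
As a rough illustration of the create, retrieve, update, and delete operations that highlightStorage.ts manages, here is a minimal sketch. It uses an in-memory Map as a stand-in for SQLite so that it runs without a database. The Highlight fields and the InMemoryHighlightStore class are hypothetical; the actual schema and function names in the repository may differ.

```typescript
// Hypothetical highlight record; the real fields in highlightStorage.ts may differ.
interface Highlight {
  id: string;
  pdfId: string;
  page: number;
  text: string;
}

// In-memory stand-in for the SQLite-backed store, keyed by highlight id.
class InMemoryHighlightStore {
  private highlights = new Map<string, Highlight>();

  // Insert a new highlight; duplicate ids are rejected.
  create(highlight: Highlight): void {
    if (this.highlights.has(highlight.id)) {
      throw new Error(`Highlight ${highlight.id} already exists`);
    }
    this.highlights.set(highlight.id, { ...highlight });
  }

  // Return all highlights for one PDF, ordered by page number.
  listForPdf(pdfId: string): Highlight[] {
    return Array.from(this.highlights.values())
      .filter((h) => h.pdfId === pdfId)
      .sort((a, b) => a.page - b.page);
  }

  // Apply a partial update; returns false when the id is unknown.
  update(id: string, changes: Partial<Omit<Highlight, "id">>): boolean {
    const existing = this.highlights.get(id);
    if (!existing) return false;
    this.highlights.set(id, { ...existing, ...changes });
    return true;
  }

  // Returns true when a highlight was actually removed.
  delete(id: string): boolean {
    return this.highlights.delete(id);
  }
}
```

Keeping the store behind a small set of methods like these would let the API routes stay unchanged if the backing storage were swapped.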

Additional Features

Highlight storage system backed by SQLite
Semantic search across images and text in multiple PDFs
API routes for creating, retrieving, updating, and deleting highlights
User authentication and document permissions (currently disabled)
Export/import as JSON functionality for highlights
Selecting a highlight in the sidebar scrolls the highlighted area into view, including across different PDFs
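
The JSON export/import of highlights mentioned above could look roughly like the following sketch. The ExportedHighlight fields, the version wrapper, and the function names are assumptions made for illustration; the README does not describe the project's actual export format.

```typescript
// Hypothetical highlight record used only for this illustration.
interface ExportedHighlight {
  id: string;
  pdfId: string;
  page: number;
  text: string;
}

// Serialize highlights to a pretty-printed JSON string.
// The "version" field is an assumed convention for future format changes.
function exportHighlights(highlights: ExportedHighlight[]): string {
  return JSON.stringify({ version: 1, highlights }, null, 2);
}

// Type guard that checks every field of a candidate highlight.
function isExportedHighlight(value: unknown): value is ExportedHighlight {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "string" &&
    typeof v.pdfId === "string" &&
    typeof v.page === "number" &&
    typeof v.text === "string"
  );
}

// Parse and validate an export; throws on malformed input instead of
// silently importing partial data.
function importHighlights(json: string): ExportedHighlight[] {
  const parsed: unknown = JSON.parse(json);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Expected a JSON object");
  }
  const items = (parsed as Record<string, unknown>).highlights;
  if (!Array.isArray(items) || !items.every(isExportedHighlight)) {
    throw new Error("Invalid highlight export");
  }
  return items;
}
```

Validating on import matters here because an exported file may be edited by hand or come from a different version of the application.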

Future Improvements

Implement annotation tools (e.g., freehand drawing, text notes)
Add support for multiple document search
Batch pre-process PDFs for quicker highlighting
Enhance mobile responsiveness for better small-screen experience
Optimize performance for large PDF files
Store uploaded PDFs in the database

License

MIT License

Acknowledgements

Next.js for the React framework
react-pdf for PDF rendering capabilities
Tailwind CSS for utility-first styling
HuggingFace for the embedding model
Tesseract OCR for OCR output
