Instructions
Installation remains largely unchanged. This application requires both a Google Cloud API key and an OpenAI API key. Instructions for obtaining each can be found below. Afterwards, put both keys into your .env.local file under the variable names "GOOGLE_API_KEY=" and "OPEN_AI_KEY=".
Google API Key

https://console.cloud.google.com/apis/credentials
Navigate to this button
OpenAI API Key
https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key
Implementation
NOTE: This implementation is incomplete
Implemented Features:
Not Implemented Features:
First, Google OCR was implemented by converting the PDF pages into images and sending them to the Google API. The client sends the page images to the server, which makes the API call and returns the recognized text together with bounding boxes describing where each piece of text sits on the page. The text is then overlaid onto the original PDF at the bounding box positions. The most challenging part of this section was understanding how files are passed between the client, the server, and the Google API: each one has access to and accepts different data formats, so it was hard to make them line up.
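The overlay step depends on mapping each bounding box from image pixels to PDF page coordinates. The following is a minimal sketch of that mapping, not the code in this PR. It assumes each box arrives as a list of (x, y) pixel vertices with the origin at the top-left of the image, and that the PDF page uses points with the origin at the bottom-left (the default PDF user space). The function name `image_box_to_pdf_rect` and its parameters are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def image_box_to_pdf_rect(
    vertices: List[Point],
    image_size: Tuple[float, float],
    page_size: Tuple[float, float],
) -> Tuple[float, float, float, float]:
    """Convert an OCR bounding box in image pixels to a PDF rectangle.

    Assumptions (not taken from the PR): the image origin is top-left,
    the PDF origin is bottom-left, and the image covers the whole page.
    Returns (x, y, width, height) in PDF points, where (x, y) is the
    bottom-left corner of the rectangle.
    """
    img_w, img_h = image_size
    page_w, page_h = page_size

    # Scale factors from pixels to PDF points.
    scale_x = page_w / img_w
    scale_y = page_h / img_h

    # Axis-aligned extent of the (possibly slightly rotated) box.
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    left, right = min(xs), max(xs)
    top, bottom = min(ys), max(ys)

    x = left * scale_x
    width = (right - left) * scale_x
    height = (bottom - top) * scale_y
    # Flip the y axis: the bottom edge in image space becomes the
    # rectangle's base in PDF space.
    y = page_h - bottom * scale_y
    return (x, y, width, height)
```

Taking the axis-aligned extent keeps the sketch simple; text that is noticeably rotated would need the individual vertices rather than a rectangle.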
Second, text embeddings were created in a similar way, through a call to the OpenAI API. The embeddings are stored on a per-page basis in an SQLite database, and all access to the database goes through API calls. Unfortunately, the upload and search functionality is not implemented. As designed, an upload would create an embedding of the text on each page and store it in the database. Whenever a search is performed, the query would be embedded the same way, and the pages whose embeddings have the smallest Euclidean distance to the query embedding would be ranked first in the results.
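The planned upload and search flow can be sketched as follows. This is an illustration under stated assumptions rather than the code in this PR: the table name `page_embeddings`, its columns, and the helper functions are all hypothetical, embeddings are stored as JSON text, and the vectors are passed in directly instead of being fetched from the OpenAI API.

```python
import json
import math
import sqlite3
from typing import List, Tuple


def create_store() -> sqlite3.Connection:
    """Open an in-memory SQLite database with one embedding row per page."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page_embeddings ("
        " doc_id TEXT NOT NULL,"
        " page INTEGER NOT NULL,"
        " embedding TEXT NOT NULL,"  # JSON-encoded list of floats
        " PRIMARY KEY (doc_id, page))"
    )
    return conn


def store_page(
    conn: sqlite3.Connection, doc_id: str, page: int, embedding: List[float]
) -> None:
    """Upload step: save (or overwrite) the embedding for a single page."""
    conn.execute(
        "INSERT OR REPLACE INTO page_embeddings VALUES (?, ?, ?)",
        (doc_id, page, json.dumps(embedding)),
    )


def search(
    conn: sqlite3.Connection, query_embedding: List[float], limit: int = 5
) -> List[Tuple[str, int, float]]:
    """Search step: rank pages by Euclidean distance to the query embedding.

    Returns (doc_id, page, distance) tuples, closest first.
    """
    rows = conn.execute(
        "SELECT doc_id, page, embedding FROM page_embeddings"
    ).fetchall()
    scored = [
        (math.dist(query_embedding, json.loads(raw)), doc_id, page)
        for doc_id, page, raw in rows
    ]
    scored.sort()  # smallest distance first
    return [(doc_id, page, distance) for distance, doc_id, page in scored[:limit]]
```

A full scan like this is linear in the number of stored pages, which is likely fine for a small collection; a larger one would probably want a vector index instead.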
Overall, I tried to keep all processes consistent with the conventions of the existing repository. This includes formatting, adding where they make sense, and using APIs in similar cases.