Rag_from_scratch

This repository contains a vanilla implementation of Retrieval-Augmented Generation (RAG) for document search and question answering.

Project Structure

chunked_text.txt: Contains the text split into chunks for processing.
embeddings.csv: Stores the embeddings generated from the text chunks.
preprocessed_pdf.txt: Contains the preprocessed text extracted from the PDF.
get_embeddings.py: Script to generate embeddings for the text chunks.
search_and_answer.py: Script to perform search and question answering using the embeddings.
splitting.py: Script to split the preprocessed text into chunks.
requirements.txt: List of required packages for the project.
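
The files above suggest a three-stage flow: split the text, embed the chunks, then search the embeddings to answer a question. A minimal sketch of the splitting and search stages follows, using only the standard library. The function names, the character-window chunking, and the cosine-similarity ranking are assumptions for illustration; they are not taken from splitting.py or search_and_answer.py, which may work differently.

```python
import math


def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows.

    chunk_size and overlap are hypothetical defaults, not values from the repo.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_embedding, chunk_embeddings, k=3):
    """Return the indices of the k chunks most similar to the query, best first."""
    scored = [
        (cosine_similarity(query_embedding, emb), idx)
        for idx, emb in enumerate(chunk_embeddings)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]
```

In a full pipeline, the chunks returned by top_k would be pasted into the prompt sent to the language model, so the answer is grounded in the retrieved text.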

To set up and run this project, follow these steps:

Clone the repository:

git clone https://github.com/harrrshall/rag_from_scratch/
cd rag_from_scratch

Replace pdf_url in splitting.py (line 35) with the URL of your PDF, and add your OpenAI API key to search_and_answer.py.
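
If you would rather not write the key into the source file, one common alternative is to read it from an environment variable. The sketch below is an assumption, not what search_and_answer.py currently does: load_api_key is a hypothetical helper, and OPENAI_API_KEY is simply the conventional variable name.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Read the API key from the environment (hypothetical helper, not in the repo)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key
```

This keeps the key out of version control; you would set the variable in your shell and call load_api_key() wherever the script needs the key.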
