At AgentNovaX, our vision is to create a world where innovation, collaboration, technology, and sustainability work hand-in-hand to empower communities. We aim to provide tools that simplify tasks, increase productivity, and contribute to a better planet. Through creativity, inclusivity, and environmental consciousness, we strive to inspire a global movement toward shared success and continuous growth.
Retrieval-Augmented Generation (RAG) is a cutting-edge natural language processing (NLP) technique that combines retrieval-based and generation-based methods to significantly enhance the performance of language models for tasks like question answering, text generation, and semantic search. RAG integrates two key components: retrieving relevant information from a knowledge base or corpus and generating meaningful content based on that information.
- **Retrieval:** In this step, relevant information is retrieved from a large corpus of text or a database based on the input query. This is typically done by converting the input into an embedding and searching for the most relevant documents or pieces of text using a vector database.
- Example: For a query like "What is the capital of France?", the system retrieves relevant passages like "The capital of France is Paris."
- **Augmented Generation:** After retrieving the relevant documents or text, a generation-capable language model (such as GPT, T5, or BART) is used to produce a response based on the retrieved content. This is the "augmented generation" step, where the model synthesizes the retrieved data into a coherent, informative response.
- The input query (e.g., "What is the capital of France?") is converted into an embedding using a pre-trained model like BERT or MiniLM.
- The embedding is then used to search a vector database (e.g., PGVector, FAISS, Elasticsearch) for the most relevant documents.
- Once relevant documents are retrieved, a generation model (e.g., GPT-3, T5, or BART) takes the input query along with the retrieved documents to generate a complete, contextually relevant response.
- By combining retrieval and generation, RAG enhances the accuracy and relevance of responses. The generation model is augmented with dynamic, up-to-date information, allowing the model to provide more informed and accurate answers.
- **Enhanced Accuracy:** The retrieval mechanism lets the model access more accurate, context-specific information, improving response quality.
- **Dynamic Knowledge:** Instead of relying solely on fixed pre-trained knowledge, RAG can use real-time external information retrieved from documents or databases.
- **Open-Domain Performance:** RAG excels in open-domain tasks like question answering and semantic search, where the model must answer queries based on external information.
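The retrieval step described above reduces to a nearest-neighbor search over embeddings. As an illustrative sketch (not this project's actual code), here is a toy cosine-similarity retriever in Java, with 3-dimensional vectors standing in for real 384-dimensional all-MiniLM embeddings:

```java
import java.util.Map;

// Toy retrieval step: pick the stored document whose embedding is most
// similar (by cosine similarity) to the query embedding.
public class RagRetrievalSketch {

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    static String retrieve(double[] query, Map<String, double[]> corpus) {
        String best = null;
        double bestScore = -2;  // cosine similarity is always in [-1, 1]
        for (var entry : corpus.entrySet()) {
            double score = cosine(query, entry.getValue());
            if (score > bestScore) { bestScore = score; best = entry.getKey(); }
        }
        return best;
    }

    public static void main(String[] args) {
        // In the real pipeline these vectors would come from all-MiniLM via Ollama.
        Map<String, double[]> corpus = Map.of(
            "The capital of France is Paris.",   new double[]{0.9, 0.1, 0.0},
            "Berlin is the capital of Germany.", new double[]{0.1, 0.9, 0.0});
        double[] queryEmbedding = {0.8, 0.2, 0.1}; // "What is the capital of France?"
        System.out.println(retrieve(queryEmbedding, corpus));
    }
}
```

In the full pipeline, the winning passage is then prepended to the prompt handed to the generation model.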
RAG can be used across a variety of applications, including:
- **Question Answering (QA):** Provides more precise and relevant answers by retrieving information from a knowledge base or external documents.
- **Dialogue Systems:** Enhances conversational AI systems by retrieving related documents and generating more informed, context-aware responses.
- **Summarization:** Retrieves relevant content from long documents and generates concise summaries.
- **Input:** "What is the capital of France?"
- **Retrieval:** The system retrieves relevant documents, such as: "The capital of France is Paris."
- **Generation:** Using the retrieved data, the model generates a response: "The capital of France is Paris."
Retrieval-Augmented Generation (RAG) is a powerful technique that enhances the generation process by combining the strengths of both retrieval and generation. By leveraging real-time information retrieval, RAG delivers better performance on tasks like question answering, semantic search, and text generation, making it an invaluable tool for NLP applications. Although this setup uses a smaller model hosted locally, Ollama lets you serve larger models such as Llama 2 or Llama 3; running a larger model locally is not recommended without first checking its minimum system requirements. When a model cannot be hosted locally, you can instead call an existing hosted large model such as GPT and take advantage of its much larger parameter count.
AgentNovaX-SpringRAGAPI demonstrates the implementation of Retrieval-Augmented Generation (RAG) using a Spring Boot backend, locally hosted all-MiniLM model on Ollama, and PGVector as the vector database. This architecture allows the integration of retrieval and generation techniques to enhance the information retrieval process and provide more relevant results when interacting with language models.
Perfect for developers looking to host NLP models locally within their Spring Boot applications, with PostgreSQL as the backend vector data store, providing a complete RAG solution.
- Ollama hosts the all-MiniLM model, which generates dense vector embeddings from text queries.
- PGVector is used to store and retrieve vector embeddings efficiently from a PostgreSQL database.
- Spring Boot API interacts with Ollama to generate embeddings for queries, stores the vectors in PGVector, and performs semantic search with generated embeddings for a Retrieval-Augmented Generation (RAG) system.
- **Spring Boot Backend:**
- Fast and Scalable REST API: Build robust APIs to handle text queries and responses.
- Integrated with PGVector: Seamlessly interacts with the PGVector PostgreSQL extension to store and retrieve vector embeddings for text queries.
- Secure Authentication: Leverage JWT for secure, token-based authentication.
- Seamless Communication: Interfaces with Ollama to call the all-MiniLM model to generate embeddings.
- Containerized with Docker: Full Docker support for easy deployment and portability.
- **Ollama with all-MiniLM Model:**
- Local Hosting of all-MiniLM: Hosts the all-MiniLM model locally using Ollama for generating high-quality embeddings.
- Efficient and Lightweight: MiniLM (384-dimensional embeddings) offers great performance with low latency compared to larger models like BERT.
- Flexible and Extensible: Easily switch between different models or deploy new ones using Ollama.
- Real-Time Embeddings: Generate embeddings dynamically from text inputs for use in the RAG pipeline.
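The "Real-Time Embeddings" point can be sketched in plain Java. The request/response shape follows Ollama's `/api/embed` endpoint (a JSON body with `model` and `input`, returning an `embeddings` array); the class and method names here are illustrative, and the real application may route this call through Spring instead:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Plain-Java sketch of calling Ollama's /api/embed endpoint.
public class OllamaEmbedSketch {

    // Builds the JSON request body: {"model":"all-minilm","input":"..."}
    static String buildPayload(String model, String input) {
        String escaped = input.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"model\":\"" + model + "\",\"input\":\"" + escaped + "\"}";
    }

    // Sends the request (requires a running Ollama server); the JSON response
    // carries an "embeddings" array: one 384-dimensional vector per input
    // for all-minilm. Real code would parse it with Jackson or Gson.
    static String embedRaw(String baseUrl, String text) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/api/embed"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildPayload("all-minilm", text)))
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    public static void main(String[] args) {
        System.out.println(buildPayload("all-minilm", "What is the capital of France?"));
    }
}
```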
- **PGVector (Vector Database):**
- PostgreSQL Vector Extension: Leverages PGVector to store embeddings in PostgreSQL for efficient search and retrieval.
- Efficient Vector Search: Use vector similarity search to quickly find the most relevant documents from the database.
- Dimensionality Support: Supports 384-dimensional embeddings generated by all-MiniLM for fast and accurate retrieval.
- Scalable Storage: Easily manage and scale vector embeddings, enabling open-domain search and retrieval tasks.
- SQL Interface: Seamlessly integrates with PostgreSQL, making it easy to query, store, and manage embeddings directly using SQL commands.
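Because PGVector speaks SQL, a query embedding only needs to be rendered as a vector literal such as `[0.1,0.2,0.3]` before being bound into a similarity query (PGVector provides distance operators like `<->` and `<=>`). A small illustrative helper; the table and column names in the comment are hypothetical, not this project's schema:

```java
import java.util.StringJoiner;

// Renders a float[] as a PGVector literal such as "[0.1,0.2,0.3]", which can
// be bound into SQL along the lines of (illustrative names):
//   SELECT content FROM documents
//   ORDER BY embedding <=> CAST(? AS vector) LIMIT 5;
public class PgVectorLiteral {

    static String toVectorLiteral(float[] embedding) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (float v : embedding) {
            joiner.add(Float.toString(v));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(toVectorLiteral(new float[]{0.1f, 0.2f, 0.3f}));
    }
}
```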
- **Retrieval-Augmented Generation (RAG) Workflow:**
- Text Embedding Generation: Convert text queries into embeddings using the all-MiniLM model hosted locally via Ollama.
- Semantic Search: Perform semantic search on the vector database (PGVector) to find the most relevant documents or passages related to the query.
- Generation of Contextual Responses: Augment the text query with retrieved information to generate more accurate, informed, and contextually relevant responses using generation-based models.
- Real-Time Updates: New data added to the vector database becomes immediately available to retrieval, keeping responses accurate and up to date.
- **Dockerized Deployment:**
- Easy Setup: Dockerized containers for each service (Spring Boot, PostgreSQL, Ollama) ensure a quick and hassle-free setup.
- Portability: Run the entire application stack on any platform that supports Docker, making deployment and scaling easy.
- Multi-Container Support: Uses Docker Compose to orchestrate multiple containers for a fully functional local environment.
- **Scalability and Performance:**
- Optimized Vector Storage: PGVector provides fast retrieval of embeddings with minimal latency, supporting large-scale applications.
- Efficient Caching: Use caching mechanisms to speed up retrieval times for frequently queried embeddings.
- High Throughput: The system is designed to handle high loads of concurrent queries, ensuring responsiveness even with large datasets.
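The caching point above can be sketched as a small LRU cache keyed by query text, so repeated queries skip the round-trip to Ollama. This is an illustrative snippet, not the project's code; a production setup might use Spring's `@Cacheable` or Caffeine instead:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache for query embeddings: bounded in size, evicting the
// least-recently-accessed entry once the limit is exceeded.
public class EmbeddingCache extends LinkedHashMap<String, float[]> {
    private final int maxEntries;

    public EmbeddingCache(int maxEntries) {
        super(16, 0.75f, true);  // accessOrder=true gives LRU behaviour
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, float[]> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        EmbeddingCache cache = new EmbeddingCache(2);
        cache.put("q1", new float[]{0.1f});
        cache.put("q2", new float[]{0.2f});
        cache.get("q1");                     // touch q1 so q2 becomes eldest
        cache.put("q3", new float[]{0.3f});  // evicts q2
        System.out.println(cache.keySet()); // q1 and q3 remain
    }
}
```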
- Java (23)
- Spring Boot (3.4.1): Backend framework for building REST APIs.
- PostgreSQL (17)
- PGVector: A PostgreSQL extension for storing vector embeddings and performing vector searches.
- Ollama: Tool to serve models like all-MiniLM for generating embeddings.
- Docker: Containerization to simplify deployment and environment setup.
- Postman: For testing APIs.
- Windows:
- Download the installer from the official site.
- Follow the installation instructions.
- Set up a password for the postgres user during installation.
- Linux:
sudo apt update
sudo apt install postgresql postgresql-contrib
- macOS:
brew install postgresql
- Windows: Use pgAdmin or start the service from the services tab.
- Linux/macOS:
sudo service postgresql start
- Access the PostgreSQL shell:
psql -U postgres
- Create a database and tables using the script in schema.sql:
CREATE DATABASE "postgres" WITH OWNER = postgres ENCODING = 'UTF8';
- Exit the shell:
\q
Ensure your Spring Boot application is properly configured and can run in a Docker container. Update the configuration inside the Dockerfile and the docker-compose setup for Spring Boot.
Update the Postgres configuration inside the Dockerfile and the custom-postgresql setup for PostgreSQL with the vector extension.
Update the Ollama configuration inside the Dockerfile and the start-ollama setup to serve the all-minilm model locally. This step can be skipped if you use a hosted large-model API such as GPT for your specific use case instead.
git clone <repository_url>
cd <repository_name>
Update the application.properties or application.yml file:
spring.datasource.url=jdbc:postgresql://database:5432/postgres
spring.datasource.username=postgres
spring.datasource.password=postgres
spring.jpa.hibernate.ddl-auto=update
ollama.model=all-minilm
ollama.embeddingURL=http://ollama:11434/api/embed
Once all the configurations are set, follow these steps to run the project:
- Build the Spring Boot Application:
./mvnw clean package
or (Gradle)
./gradlew bootJar
- Start the Docker Containers:
docker-compose up --build
This will start:
- PostgreSQL with PGVector.
- Ollama to serve the all-MiniLM model.
- Spring Boot Application that connects to the other services.
- Test the API: You can now test your API using tools like Postman or curl. You can send text data to the Spring Boot API, which will generate an embedding using the all-MiniLM model hosted via Ollama, store the embedding in PostgreSQL (via PGVector), and retrieve the embeddings for RAG.
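For orientation, the docker-compose.yml for this stack might look roughly like the sketch below. The service names `database` and `ollama` match the hostnames used in application.properties, but the build contexts and other details are assumptions; the repository's own compose file is authoritative:

```yaml
# Illustrative sketch only - see the repository's docker-compose.yml for the real setup.
services:
  database:                      # hostname referenced by spring.datasource.url
    build: ./custom-postgresql   # PostgreSQL image with the PGVector extension
    environment:
      POSTGRES_PASSWORD: postgres
    ports:
      - "5432:5432"
  ollama:                        # hostname referenced by ollama.embeddingURL
    build: ./start-ollama        # serves the all-minilm model
    ports:
      - "11434:11434"
  app:
    build: .                     # the Spring Boot application
    depends_on:
      - database
      - ollama
    ports:
      - "8080:8080"
```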
This project comes with a Postman collection inside assets/postman to easily test the API endpoints. You can import agentnovax-api-rag-springboot-ollama-pgvector.postman_collection into your Postman application to start testing.
- Download the Postman Collection: Download agentnovax-api-rag-springboot-ollama-pgvector.postman_collection from the repository.
- Import into Postman: Open Postman, click the "Import" button at the top left, and select the downloaded collection file.
- Test API Endpoints: Once imported, you can start testing the RAG Index and RAG Query endpoints provided by the API.
This collection simplifies testing the API without needing to manually craft requests and responses.
We welcome contributions! Feel free to open a pull request or raise an issue for enhancements or bug fixes.
- Testing: Use tools like Postman or cURL to test API endpoints.
- Scaling: Consider containerization with Docker for consistent deployment across environments.
- Models: Refer to the documentation for detailed setup for other models that can be hosted locally.
- For more information, check out the Spring Boot, Ollama, PGVector, and Docker documentation.
Licensed under the Apache License, Version 2.0 - see the LICENSE file for details.
For any queries, feel free to reach out via [email protected].
Stay connected with AgentNovaX through our social media channels:
- X (Twitter)
- LinkedIn
- Instagram
- Facebook
- YouTube
NovaLeaf is an initiative focused on environmental sustainability, aiming to contribute to a greener planet. Through this initiative, AgentNovaX is committed to planting trees, fostering green projects, and encouraging eco-friendly practices among individuals and communities.
- Plant a Tree, Empower a Community: For every milestone achieved in our platform, a tree will be planted in a designated area.
- Green Nova Trees: These trees represent our growth and commitment to sustainability, and each one is named for the cause it supports.
- Join the Movement: Become part of the NovaLeaf family and help us plant the future, one tree at a time.
Please consider starring this repository to support the NovaLeaf initiative.
For more information, visit NovaLeaf.
DataFlux provides free tools for data conversion, JSON/YAML beautification, and validation to help developers and data enthusiasts streamline their workflow.
- Tools available: JSON/YAML Beautifiers and Validators, JSON/YAML conversion, Text Compare, JavaScript Validators, and more.
- Visit DataFlux to explore our tools and enhance your productivity.