This RAG application uses an agentic approach to combine web search, hallucination control and accuracy checks with RAG. It's easy to modify because its a simple Gradio app.
Note This app runs in NVIDIA AI Workbench. It's a free, lightweight developer platform that you can run on your own systems to get up and running with complex AI applications and workloads in a short amount of time.
You may want to fork this repository into your own account before proceeding. Otherwise you won't be able to save your local changes to GitHub because this NVIDIA owned repository is read-only.
Navigating the README: Application Overview | Get Started | Deep Dive | License
Other Resources: ⬇️ Download AI Workbench | 📖 User Guide |📂 Other Projects | 🚨 User Forum
- Need: internet access because the chat app uses Tavily for web-searches, as well as endpoints on build.nvidia.com
- Don't Need: Local GPU
- Nice to Have: Remote GPU system where you self-host an endpoint
-
You embed your documents (pdfs or webpages) to the vector database.
-
You configure each of the separate components for the pipeline. For each component you can:
- Select from a drop down of endpoints or use a self-hosted endpoint.
- Modify the prompt.
-
You submit your query.
-
An LLM evaluates its relevance to the index and then routes it to the DB or to search by Tavily.
-
Answers are checked for hallucination and relevance. "Failing"" answers are run through the process again.
The diagram below shows this agentic flow.
- Directly within the app you can:
- Change the prompts for the different components, e.g. the hallucination grader.
- Change the webpages and pdfs you want to use for the context in the RAG.
- Select different endpoints from build.nvidia.com for the inference components.
- Configure it to use self-hosted endpoints with NVIDIA Inference Microservices (NIMs) or Ollama.
- You can also modify the application code to:
- Add new endpoints and endpoint providers
- Change the Gradio interface or the application structure and logic.
Note Setting up self-hosted endpoints is relatively advanced because you will need to do it manually.
The quickest path is with the pre-configured build.nvidia.com endpoints.
-
Install AI Workbench.
-
Get an NVIDIA Developer Account and an API key.
- Go to build.nvidia.com and click
Login. - Create account, verify email.
- Make a Cloud Account.
- Click your initial >
API Keys. - Create and save your key.
- Go to build.nvidia.com and click
-
Get a Tavily account and an API key.
- Go to Tavily and create an account.
- Create an API key on the overview page.
-
Have some pdfs or web pages to put in the RAG.
-
NVIDIA Employees: Configure
INTERNAL_APIAPI key to use internal endpoints instead of public ones.
-
Open NVIDIA AI Workbench. Select a location to work in.
-
Use the repository URL to clone this project with AI Workbench and wait for it to build.
-
Add your NVIDIA API key and the Tavily API key when prompted.
-
Open the Chat from Workbench. It should automatically open in a new browser tab.
-
Upload your documents and change the Router prompt to focus on your uploaded documents.
-
Start chatting.
Note This assumes you've done the Get Started steps.
You can configure any or all pipeline components (Router, Generator, Retrieval, Hallucination Check, Answer Check) to use self-hosted endpoints independently. This means you can mix and match between hosted and self-hosted components based on your needs. The application includes built-in GPU compatibility checking to help you select appropriate models for your hardware configuration.
Prerequisites:
- NVIDIA GPU(s) with appropriate VRAM
- Ubuntu 22.04 or later with latest NVIDIA drivers
- Docker and NVIDIA Container Toolkit
To set up NIM endpoints for your components:
- Check the NIM documentation for detailed setup instructions
- For each component you want to self-host:
- Select "NIM Endpoints" in the component's configuration
- Choose your GPU type and count - the UI will automatically show only compatible models
- Enter your endpoint details (host, port)
- Components not set to self-hosted will continue using their configured cloud endpoints
The application will validate your GPU configuration for each component and prevent incompatible model selections. You can use different GPU configurations for different components based on their computational needs.
This NVIDIA AI Workbench example project is under the Apache 2.0 License
This project may utilize additional third-party open source software projects. Review the license terms of these open source projects before use. Third party components used as part of this project are subject to their separate legal notices or terms that accompany the components. You are responsible for confirming compliance with third-party component license terms and requirements.
| ❓ Have Questions? |
|---|
| Please direct any issues, fixes, suggestions, and discussion on this project to the DevZone Members Only Forum thread here |
