A production-ready AI helpdesk assistant with RAG (Retrieval-Augmented Generation), streaming responses, and source citations.
Built with Next.js 14, TypeScript, and Tailwind CSS. Features a clean chat interface, intelligent document retrieval, and support for multiple LLM providers.
- **RAG Pipeline**: BM25-inspired retrieval with relevance scoring
- **Streaming Chat**: Real-time token streaming for responsive UX
- **Source Citations**: Every answer includes clickable source references
- **Guardrails**: Refuses to answer out-of-scope questions
- **Modern UI**: Responsive chat interface built with Tailwind CSS
- **Admin Upload**: Web interface to upload new documentation files
- **Evaluation API**: Built-in testing for retrieval quality
- **Multi-Provider**: Support for OpenAI, Anthropic, Google Gemini, or a mock LLM
- **Auto-Indexing**: Automatic index rebuild on file upload
- **Security**: Input sanitization, XSS prevention, no secret logging
```shell
# Install dependencies
npm install

# Set up environment variables
cp .env.local.example .env.local

# Start development server
npm run dev
```

The app will be available at http://localhost:3000.
Create a `.env.local` file with the following:

```shell
# LLM Provider (mock, openai, anthropic, or gemini)
LLM_PROVIDER=mock

# OpenAI Configuration (if using the openai provider)
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-4o-mini

# Anthropic Configuration (if using the anthropic provider)
ANTHROPIC_API_KEY=your_anthropic_api_key_here
ANTHROPIC_MODEL=claude-3-haiku-20240307

# Google Gemini Configuration (if using the gemini provider)
GEMINI_API_KEY=your_gemini_api_key_here
GEMINI_MODEL=gemini-1.5-flash
```

Note: The mock provider works without any API keys and is perfect for testing.
- Navigate to http://localhost:3000
- Type your question in the input box
- Press Enter or click Send
- Watch the response stream in real-time
- Click on source citations to see which documents were used
Example Questions:
- "What are the pricing tiers?"
- "How do I get an API key?"
- "Can I get a refund after 20 days?"
- "What's included in the Pro plan?"
Upload new documentation files:
- Navigate to http://localhost:3000/admin
- Select one or more markdown (.md) files
- Click "Upload Files"
- The index will automatically rebuild
Test retrieval quality:

```shell
curl http://localhost:3000/api/eval
```

This runs a suite of test questions and reports:
- Which sources were retrieved for each question
- Relevance scores
- Pass/fail status
- Overall pass rate
```
helpdesk-ai/
├── app/
│   ├── api/
│   │   ├── chat/route.ts          # Streaming chat endpoint
│   │   ├── admin/upload/route.ts  # File upload endpoint
│   │   └── eval/route.ts          # Evaluation endpoint
│   ├── admin/page.tsx             # Admin upload interface
│   ├── layout.tsx                 # Root layout
│   ├── page.tsx                   # Main chat page
│   └── globals.css                # Global styles
├── components/
│   └── Chat.tsx                   # Chat UI component
├── lib/
│   ├── retriever.ts               # RAG retrieval logic
│   └── llm.ts                     # LLM provider abstraction
├── data/
│   ├── pricing.md                 # Knowledge base files
│   ├── refunds.md
│   └── getting-started.md
└── README.md
```
- Indexing: Markdown files are split into paragraphs and indexed on startup
- Retrieval: The user query is scored against all snippets using a BM25-inspired algorithm
- Context Building: Top-k relevant snippets are assembled into context
- Generation: LLM generates answer using only the provided context
- Citation: Sources are returned alongside the response
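The steps above can be sketched end to end. `Snippet`, `buildContext`, and the naive term-overlap `score` below are illustrative stand-ins, not the actual exports of `lib/retriever.ts`:

```typescript
// Illustrative sketch of the retrieve → context → cite steps above.
// Names are hypothetical; the real retriever uses BM25-inspired scoring.
type Snippet = { source: string; text: string };

// Naive term-overlap score, standing in for the real scoring function.
function score(query: string, text: string): number {
  const body = text.toLowerCase();
  return query
    .toLowerCase()
    .split(/\s+/)
    .filter((term) => term && body.includes(term)).length;
}

// Assemble the top-k matching snippets into a context string plus sources.
function buildContext(
  query: string,
  index: Snippet[],
  topK = 3
): { context: string; sources: string[] } {
  const ranked = index
    .map((s) => ({ ...s, score: score(query, s.text) }))
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
  return {
    context: ranked.map((s) => `[${s.source}]\n${s.text}`).join("\n\n"),
    sources: ranked.map((s) => s.source),
  };
}
```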
The retriever uses an enhanced BM25-inspired scoring function with precision filters:
- Term Frequency: How often query terms appear in the snippet
- Document Length Normalization: Adjusts for snippet length
- Phrase Matching Boost: Extra weight for exact phrase matches
- Relevance Threshold (2.0): Filters out weak matches below minimum score
- Score Gap Filter (90%): Only includes results within 90% of top score to prevent irrelevant citations
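The two precision filters can be expressed as a post-scoring stage. The 2.0 threshold and 90% gap come from the list above; the function name is hypothetical:

```typescript
// Post-scoring filter stage implementing the two precision filters
// described above; filterResults is an illustrative name.
type Scored = { source: string; score: number };

function filterResults(
  results: Scored[],
  minScore = 2.0, // relevance threshold
  gapRatio = 0.9  // keep only results within 90% of the top score
): Scored[] {
  const strong = [...results]
    .sort((a, b) => b.score - a.score)
    .filter((r) => r.score >= minScore);
  if (strong.length === 0) return [];
  const cutoff = strong[0].score * gapRatio;
  return strong.filter((r) => r.score >= cutoff);
}
```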
The system uses dual-mode prompting with strict guardrails:
When Context Exists:
- Answer ONLY from provided documentation
- Cite sources explicitly
- Never infer or make up information
- Explain if context doesn't fully answer the question
When No Context Found:
- Explicitly refuse to answer from general knowledge
- List available documentation topics (Pricing, Getting Started, Refunds)
- Ask what the user would like to know
- Never hallucinate or guess
This ensures both mock and real LLMs (OpenAI, Anthropic, etc.) refuse out-of-scope questions.
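One way to express this dual-mode prompting is a single prompt builder. The wording below is an illustrative paraphrase of the rules above, not the exact prompt shipped in `lib/llm.ts`:

```typescript
// Sketch of the dual-mode system prompt; buildSystemPrompt is a
// hypothetical name and the shipped prompt text may differ.
function buildSystemPrompt(context: string | null): string {
  if (context && context.trim().length > 0) {
    // Context mode: answer strictly from documentation, with citations.
    return [
      "Answer ONLY from the documentation below.",
      "Cite sources explicitly and never infer missing information.",
      "If the context does not fully answer the question, say so.",
      "",
      "--- DOCUMENTATION ---",
      context,
    ].join("\n");
  }
  // Refusal mode: no context was retrieved for this question.
  return [
    "No relevant documentation was found.",
    "Do NOT answer from general knowledge.",
    "List the available topics (Pricing, Getting Started, Refunds)",
    "and ask what the user would like to know.",
  ].join("\n");
}
```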
For this small knowledge base (3 docs, ~500 words each):
- BM25 Advantages: No API calls, instant indexing, deterministic, explainable scores
- Embeddings Trade-off: Would add API dependency and latency for minimal quality gain
- Scalability: For 100+ documents, embeddings would be recommended
Why streaming responses:

- Better UX: Users see responses immediately, not after 5-10 seconds
- Perceived Performance: The app feels faster even if total time is similar
- Engagement: Users stay engaged while waiting
Why server-side LLM calls:

- Security: API keys are never exposed to the client
- Control: Full control over retrieval and prompt construction
- Flexibility: Easy to swap providers or add caching
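A minimal sketch of such a server-side streaming route, in the App Router style of `app/api/chat/route.ts`. The hard-coded tokens are a placeholder, not the project's actual `lib/llm.ts` interface:

```typescript
// Minimal server-side streaming route sketch; a real provider would
// yield tokens as they arrive, and API keys never leave the server.
export async function POST(req: Request): Promise<Response> {
  const { messages } = await req.json(); // validated & retrieved against in the real route
  void messages;
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    start(controller) {
      // Placeholder token source standing in for the LLM stream.
      for (const token of ["Hello", ", ", "world"]) {
        controller.enqueue(encoder.encode(token));
      }
      controller.close();
    },
  });
  return new Response(stream, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```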
Test the sample prompts from the requirements:

```
# In-scope questions (should cite sources)
"What are the pricing tiers and what's included?"
"How do I get an API key to start?"
"Can I get a refund after 20 days?"

# Out-of-scope question (should refuse)
"Do you ship hardware devices?"
```

Run the evaluation suite:

```shell
curl http://localhost:3000/api/eval | jq
# Expected output: 5-6 out of 6 tests passing
```

To add unit tests for the retriever:

```shell
npm install --save-dev jest @types/jest ts-jest
```

Create `lib/retriever.test.ts`:
```typescript
import { retrieve, buildIndexFromDataDir } from './retriever';

describe('Retriever', () => {
  const index = buildIndexFromDataDir();

  test('retrieves pricing for pricing query', () => {
    const results = retrieve('pricing tiers', index);
    expect(results[0].source).toBe('pricing.md');
  });

  test('returns empty for irrelevant query', () => {
    const results = retrieve('xyz123abc', index);
    expect(results.length).toBe(0);
  });
});
```

Security measures implemented:

- Input sanitization (XSS prevention)
- Content-Type headers to prevent MIME sniffing
- API keys kept server-side only
- No logging of sensitive data
- File upload validation (markdown only)
- Request size limits (500-character input cap)
- Prompt injection prevention via context isolation
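A possible shape for the sanitization and size-limit items above (illustrative; the project's real implementation may differ):

```typescript
// Illustrative input sanitizer covering the 500-character limit and
// basic XSS hygiene listed above; not the project's exact code.
function sanitizeInput(raw: string, maxLen = 500): string {
  return raw
    .slice(0, maxLen)                 // enforce the request size limit
    .replace(/<[^>]*>/g, "")          // strip HTML tags to blunt XSS
    .replace(/[\u0000-\u001f]/g, " ") // drop control characters
    .trim();
}
```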
For production deployment, add:
- Rate limiting (e.g., 10 requests/minute per IP)
- Authentication for admin panel
- CORS configuration
- Request logging and monitoring
- Error tracking (Sentry, etc.)
- Index Build: ~5ms for 3 documents
- Retrieval: ~2ms per query
- First Token: ~200ms (mock), ~800ms (OpenAI)
- Full Response: 2-5 seconds, depending on length
- Cache frequently asked questions
- Pre-compute embeddings for hybrid search
- Add Redis for distributed caching
- Implement request deduplication
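The first item could start as a simple in-memory TTL cache (a sketch; names are hypothetical and not part of the codebase):

```typescript
// Hypothetical in-memory TTL cache for frequently asked questions.
const answerCache = new Map<string, { answer: string; expires: number }>();

function getCachedAnswer(
  question: string,
  compute: () => string, // runs retrieval + LLM on a cache miss
  ttlMs = 60_000
): string {
  const key = question.trim().toLowerCase(); // normalize near-duplicate questions
  const hit = answerCache.get(key);
  if (hit && hit.expires > Date.now()) return hit.answer; // cache hit
  const answer = compute();
  answerCache.set(key, { answer, expires: Date.now() + ttlMs });
  return answer;
}
```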
```shell
# Install Vercel CLI
npm i -g vercel

# Deploy
vercel

# Set environment variables in the Vercel dashboard
```

```dockerfile
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY . .
RUN npm run build
EXPOSE 3000
CMD ["npm", "start"]
```

Remember to set environment variables in your deployment platform:
- `LLM_PROVIDER`
- `OPENAI_API_KEY` (if using OpenAI)
- `ANTHROPIC_API_KEY` (if using Anthropic)
- `GEMINI_API_KEY` (if using Gemini)
Edit `lib/llm.ts` and add a new provider case:

```typescript
if (provider === 'your-provider') {
  // Implement streaming logic
}
```

Simply drop `.md` files into the `/data` directory and restart the server, or use the admin upload interface.
Edit `lib/retriever.ts` to adjust:
- Scoring algorithm parameters (k1, b)
- Number of results (topK)
- Relevance threshold
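For reference, those knobs might be grouped like this (illustrative names and defaults, not the actual contents of `lib/retriever.ts`; `k1` and `b` defaults are the conventional BM25 values):

```typescript
// Illustrative tuning knobs; names and defaults are assumptions,
// except the 2.0 threshold and 90% gap documented above.
const retrieverConfig = {
  k1: 1.2,                 // BM25 term-frequency saturation
  b: 0.75,                 // document-length normalization strength
  topK: 3,                 // maximum snippets returned per query
  relevanceThreshold: 2.0, // minimum score for inclusion
  scoreGapRatio: 0.9,      // keep results within 90% of the top score
};
```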
Stream chat responses with RAG.
Request:
```json
{
  "messages": [
    { "role": "user", "content": "What are the pricing tiers?" }
  ]
}
```

Response: A text stream with citations appended.
Upload markdown files to knowledge base.
Request: `multipart/form-data` with a `files` field

Response:

```json
{
  "success": true,
  "uploaded": 2,
  "message": "Successfully uploaded 2 file(s)"
}
```

Run the evaluation suite.
Response:

```json
{
  "summary": {
    "total": 6,
    "passed": 5,
    "passRate": "83.3%"
  },
  "results": [...]
}
```

Rebuild the search index.
Response:

```json
{
  "success": true,
  "snippets": 15
}
```