samuellachisa/helpdesk-ai
HelpDesk AI

A production-ready AI helpdesk assistant with RAG (Retrieval-Augmented Generation), streaming responses, and source citations.

Built with Next.js 14, TypeScript, and Tailwind CSS. Features a clean chat interface, intelligent document retrieval, and support for multiple LLM providers.


✨ Features

Core Features

  • 🎯 RAG Pipeline: BM25-inspired retrieval with relevance scoring
  • πŸ’¬ Streaming Chat: Real-time token streaming for responsive UX
  • πŸ“š Source Citations: Every answer includes clickable source references
  • πŸ›‘οΈ Guardrails: Refuses to answer out-of-scope questions
  • 🎨 Modern UI: Beautiful, responsive chat interface with Tailwind CSS

Advanced Features

  • πŸ“€ Admin Upload: Web interface to upload new documentation files
  • πŸ§ͺ Evaluation API: Built-in testing for retrieval quality
  • πŸ”Œ Multi-Provider: Support for OpenAI, Anthropic, Google Gemini, or mock LLM
  • ⚑ Auto-Indexing: Automatic index rebuild on file upload
  • πŸ”’ Security: Input sanitization, XSS prevention, no secret logging

πŸš€ Quick Start

Installation

# Install dependencies
npm install

# Set up environment variables
cp .env.local.example .env.local

# Start development server
npm run dev

The app will be available at http://localhost:3000.

Environment Variables

Create a .env.local file with the following:

# LLM Provider (mock, openai, anthropic, or gemini)
LLM_PROVIDER=mock

# OpenAI Configuration (if using openai provider)
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-4o-mini

# Anthropic Configuration (if using anthropic provider)
ANTHROPIC_API_KEY=your_anthropic_api_key_here
ANTHROPIC_MODEL=claude-3-haiku-20240307

# Google Gemini Configuration (if using gemini provider)
GEMINI_API_KEY=your_gemini_api_key_here
GEMINI_MODEL=gemini-1.5-flash

Note: The mock provider works without any API keys and is perfect for testing!
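As a sketch of how the provider switch might read the variable above, with the mock fallback described in the note (the function name and shape are illustrative assumptions, not the actual lib/llm.ts):

```typescript
// Hypothetical sketch: resolve LLM_PROVIDER with a safe fallback to mock.
// Names are illustrative; the real lib/llm.ts may differ.
type Provider = 'mock' | 'openai' | 'anthropic' | 'gemini';

function resolveProvider(env: Record<string, string | undefined> = {}): Provider {
  const p = (env.LLM_PROVIDER ?? 'mock').toLowerCase();
  if (p === 'openai' || p === 'anthropic' || p === 'gemini') return p;
  return 'mock'; // keyless mock provider, suitable for local testing
}
```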

πŸ“– Usage

Chat Interface

  1. Navigate to http://localhost:3000
  2. Type your question in the input box
  3. Press Enter or click Send
  4. Watch the response stream in real-time
  5. Click on source citations to see which documents were used

Example Questions:

  • "What are the pricing tiers?"
  • "How do I get an API key?"
  • "Can I get a refund after 20 days?"
  • "What's included in the Pro plan?"

Admin Panel

Upload new documentation files:

  1. Navigate to http://localhost:3000/admin
  2. Select one or more markdown (.md) files
  3. Click "Upload Files"
  4. The index will automatically rebuild

Evaluation

Test retrieval quality:

curl http://localhost:3000/api/eval

This runs a suite of test questions and reports:

  • Which sources were retrieved for each question
  • Relevance scores
  • Pass/fail status
  • Overall pass rate

πŸ—οΈ Architecture

Project Structure

helpdesk-ai/
β”œβ”€β”€ app/
β”‚   β”œβ”€β”€ api/
β”‚   β”‚   β”œβ”€β”€ chat/route.ts          # Streaming chat endpoint
β”‚   β”‚   β”œβ”€β”€ admin/upload/route.ts  # File upload endpoint
β”‚   β”‚   └── eval/route.ts          # Evaluation endpoint
β”‚   β”œβ”€β”€ admin/page.tsx             # Admin upload interface
β”‚   β”œβ”€β”€ layout.tsx                 # Root layout
β”‚   β”œβ”€β”€ page.tsx                   # Main chat page
β”‚   └── globals.css                # Global styles
β”œβ”€β”€ components/
β”‚   └── Chat.tsx                   # Chat UI component
β”œβ”€β”€ lib/
β”‚   β”œβ”€β”€ retriever.ts               # RAG retrieval logic
β”‚   └── llm.ts                     # LLM provider abstraction
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ pricing.md                 # Knowledge base files
β”‚   β”œβ”€β”€ refunds.md
β”‚   └── getting-started.md
└── README.md

RAG Pipeline

  1. Indexing: Markdown files are split into paragraphs and indexed on startup
  2. Retrieval: The user query is scored against all snippets using a BM25-inspired algorithm
  3. Context Building: Top-k relevant snippets are assembled into context
  4. Generation: LLM generates answer using only the provided context
  5. Citation: Sources are returned alongside the response
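The context-building and citation steps above can be sketched as follows (the Snippet shape and helper names are assumptions, not the real lib/retriever.ts API):

```typescript
// Illustrative sketch of pipeline steps 3 and 5; names are hypothetical.
interface Snippet { source: string; text: string; score: number }

function buildContext(snippets: Snippet[], topK = 3): string {
  // Step 3: assemble the top-k snippets, tagging each with its source file
  return snippets
    .slice(0, topK)
    .map(s => `[${s.source}]\n${s.text}`)
    .join('\n\n');
}

function citations(snippets: Snippet[], topK = 3): string[] {
  // Step 5: return the distinct sources alongside the generated answer
  return [...new Set(snippets.slice(0, topK).map(s => s.source))];
}
```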

Retrieval Algorithm

The retriever uses an enhanced BM25-inspired scoring function with precision filters:

  • Term Frequency: How often query terms appear in the snippet
  • Document Length Normalization: Adjusts for snippet length
  • Phrase Matching Boost: Extra weight for exact phrase matches
  • Relevance Threshold (2.0): Filters out weak matches below minimum score
  • Score Gap Filter (90%): Only includes results within 90% of top score to prevent irrelevant citations
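A minimal sketch of this scoring and filtering, assuming conventional BM25 parameters (k1, b) and the threshold and gap values listed above; the actual lib/retriever.ts may differ in detail:

```typescript
// Sketch of BM25-inspired scoring with a phrase boost (illustrative values).
function scoreSnippet(query: string, snippet: string, avgLen: number,
                      k1 = 1.5, b = 0.75): number {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  const words = snippet.toLowerCase().split(/\s+/);
  let score = 0;
  for (const t of terms) {
    const tf = words.filter(w => w === t).length; // term frequency
    // BM25-style saturation with document-length normalization
    score += (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * (words.length / avgLen)));
  }
  // Phrase-matching boost: extra weight for an exact phrase hit
  if (snippet.toLowerCase().includes(query.toLowerCase())) score *= 1.5;
  return score;
}

// Relevance threshold (2.0) plus score-gap filter (90% of the top score).
function filterResults<T extends { score: number }>(scored: T[],
    threshold = 2.0, gap = 0.9): T[] {
  const kept = scored.filter(s => s.score >= threshold)
                     .sort((a, b) => b.score - a.score);
  if (kept.length === 0) return kept;
  const top = kept[0].score;
  return kept.filter(s => s.score >= top * gap);
}
```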

Prompt Engineering

The system uses dual-mode prompting with strict guardrails:

When Context Exists:

  • Answer ONLY from provided documentation
  • Cite sources explicitly
  • Never infer or make up information
  • Explain if context doesn't fully answer the question

When No Context Found:

  • Explicitly refuse to answer from general knowledge
  • List available documentation topics (Pricing, Getting Started, Refunds)
  • Ask what the user would like to know
  • Never hallucinate or guess

This ensures both mock and real LLMs (OpenAI, Anthropic, etc.) refuse out-of-scope questions.
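The dual-mode prompt could be built along these lines (the wording and function name are illustrative, not the project's actual prompt):

```typescript
// Hypothetical sketch of the dual-mode system prompt described above.
function buildSystemPrompt(context: string, topics: string[]): string {
  if (context.trim().length > 0) {
    // Context mode: answer strictly from the retrieved documentation
    return [
      'Answer ONLY from the documentation below.',
      'Cite sources explicitly. Never infer or invent information.',
      'If the context does not fully answer the question, say so.',
      '--- CONTEXT ---',
      context,
    ].join('\n');
  }
  // No-context mode: refuse and redirect to known topics
  return [
    'No relevant documentation was found for this question.',
    'Do NOT answer from general knowledge.',
    `Available topics: ${topics.join(', ')}.`,
    'Ask the user what they would like to know about these topics.',
  ].join('\n');
}
```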

πŸ”§ Design Decisions

Why BM25 over Embeddings?

For this small knowledge base (3 docs, ~500 words each):

  • BM25 Advantages: No API calls, instant indexing, deterministic, explainable scores
  • Embeddings Trade-off: Would add API dependency and latency for minimal quality gain
  • Scalability: For 100+ documents, embeddings would be recommended

Why Streaming?

  • Better UX: Users see responses immediately, not after 5-10 seconds
  • Perceived Performance: Feels faster even if total time is similar
  • Engagement: Users stay engaged while waiting

Why Server-Side RAG?

  • Security: API keys never exposed to client
  • Control: Full control over retrieval and prompt construction
  • Flexibility: Easy to swap providers or add caching

πŸ§ͺ Testing

Manual Testing

Test the sample prompts from the requirements:

# In-scope questions (should cite sources)
"What are the pricing tiers and what's included?"
"How do I get an API key to start?"
"Can I get a refund after 20 days?"

# Out-of-scope question (should refuse)
"Do you ship hardware devices?"

Automated Evaluation

# Run evaluation suite
curl http://localhost:3000/api/eval | jq

# Expected output: 5-6 out of 6 tests passing

Unit Testing (Optional)

To add unit tests for the retriever:

npm install --save-dev jest @types/jest ts-jest

Create lib/retriever.test.ts:

import { retrieve, buildIndexFromDataDir } from './retriever';

describe('Retriever', () => {
  const index = buildIndexFromDataDir();

  test('retrieves pricing for pricing query', () => {
    const results = retrieve('pricing tiers', index);
    expect(results[0].source).toBe('pricing.md');
  });

  test('returns empty for irrelevant query', () => {
    const results = retrieve('xyz123abc', index);
    expect(results.length).toBe(0);
  });
});

πŸ”’ Security

Implemented Protections

  • βœ… Input sanitization (XSS prevention)
  • βœ… Content-Type headers to prevent MIME sniffing
  • βœ… API keys kept server-side only
  • βœ… No logging of sensitive data
  • βœ… File upload validation (markdown only)
  • βœ… Request size limits (500 char input)
  • βœ… Prompt injection prevention via context isolation
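The input checks above (500-character limit, basic sanitization) might look like this sketch; the exact rules in the project may differ:

```typescript
// Illustrative input guard: length limit plus markup stripping.
const MAX_INPUT_LENGTH = 500;

function sanitizeInput(raw: string): string | null {
  const trimmed = raw.trim();
  // Reject empty or oversized input (request size limit)
  if (trimmed.length === 0 || trimmed.length > MAX_INPUT_LENGTH) return null;
  // Strip angle brackets so user text cannot inject markup (XSS prevention)
  return trimmed.replace(/[<>]/g, '');
}
```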

Production Recommendations

For production deployment, add:

  • Rate limiting (e.g., 10 requests/minute per IP)
  • Authentication for admin panel
  • CORS configuration
  • Request logging and monitoring
  • Error tracking (Sentry, etc.)

πŸ“Š Performance

Benchmarks (Local Testing)

  • Index Build: ~5ms for 3 documents
  • Retrieval: ~2ms per query
  • First Token: 200ms (mock), 800ms (OpenAI)
  • Full Response: 2-5 seconds depending on length

Optimization Opportunities

  • Cache frequently asked questions
  • Pre-compute embeddings for hybrid search
  • Add Redis for distributed caching
  • Implement request deduplication

🚒 Deployment

Vercel (Recommended)

# Install Vercel CLI
npm i -g vercel

# Deploy
vercel

# Set environment variables in Vercel dashboard

Docker

FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
# Install all dependencies: the Next.js build needs devDependencies
# (TypeScript, Tailwind), so don't install production-only here
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies after the build to slim the image
RUN npm prune --omit=dev
EXPOSE 3000
CMD ["npm", "start"]

Environment Setup

Remember to set environment variables in your deployment platform:

  • LLM_PROVIDER
  • OPENAI_API_KEY (if using OpenAI)
  • ANTHROPIC_API_KEY (if using Anthropic)

πŸ› οΈ Extending

Adding New LLM Providers

Edit lib/llm.ts and add a new provider case:

if (provider === 'your-provider') {
  // Call your provider's SDK here and stream tokens back to the caller,
  // yielding each token as it arrives, mirroring the existing cases.
}

Adding New Data Sources

Simply drop .md files into the /data directory and restart the server, or use the admin upload interface.

Customizing Retrieval

Edit lib/retriever.ts to adjust:

  • Scoring algorithm parameters (k1, b)
  • Number of results (topK)
  • Relevance threshold

πŸ“ API Reference

POST /api/chat

Stream chat responses with RAG.

Request:

{
  "messages": [
    { "role": "user", "content": "What are the pricing tiers?" }
  ]
}

Response: Text stream with citations appended
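A client could consume this endpoint with a sketch like the following (error handling omitted; the request shape matches the example above):

```typescript
// Minimal sketch of a browser/Node 18+ client reading the streamed
// response from POST /api/chat chunk by chunk.
async function ask(question: string): Promise<string> {
  const res = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages: [{ role: 'user', content: question }] }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let text = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    text += decoder.decode(value, { stream: true }); // append each chunk
  }
  return text;
}
```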

POST /api/admin/upload

Upload markdown files to knowledge base.

Request: multipart/form-data with files field

Response:

{
  "success": true,
  "uploaded": 2,
  "message": "Successfully uploaded 2 file(s)"
}

GET /api/eval

Run evaluation suite.

Response:

{
  "summary": {
    "total": 6,
    "passed": 5,
    "passRate": "83.3%"
  },
  "results": [...]
}

PUT /api/chat

Rebuild search index.

Response:

{
  "success": true,
  "snippets": 15
}
