Benchmark tool that tests how learn.microsoft.com is cited by ChatGPT, Gemini, and Perplexity.
View live dashboard: GitHub Pages Dashboard
Run manually:
# Dry run (no API keys needed)
python run_benchmark.py --dry-run
# Real run (requires API keys in .env file)
python run_benchmark.py

git clone https://github.com/ps0394/ai-citation-tool.git
cd ai-citation-tool
pip install -r requirements.txt

Create a .env file:
OPENAI_API_KEY=sk-proj-your-openai-key
GEMINI_API_KEY=your-gemini-key
PERPLEXITY_API_KEY=your-perplexity-key

- Go to Settings → Secrets and variables → Actions
- Add repository secrets:
  - Name: OPENAI_API_KEY, Value: Your OpenAI API key
  - Name: GEMINI_API_KEY, Value: Your Gemini API key (optional)
  - Name: PERPLEXITY_API_KEY, Value: Your Perplexity API key (optional)
- Go to Settings → Pages
- Source: Deploy from a branch
- Branch: main
- Folder: /docs
- Runs automatically at 9:00 AM UTC daily
- Results stored in repository as results_YYYY-MM-DD.csv
- Updates dashboard automatically
- Go to Actions → AI Citation Benchmark → Run workflow
- Choose normal run or dry-run mode
- Displays citation analytics and trends
- Interactive charts powered by Chart.js
- Automatically loads latest results
- Fallback to sample data if no results exist
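
The dashboard itself runs in the browser, but the "latest results, else sample data" rule can be sketched in Python for local inspection. This is a minimal sketch, not the dashboard's actual code; the helper names and the sample file name `sample_data.csv` are hypothetical.

```python
import re
from typing import List, Optional

# Matches the results_YYYY-MM-DD.csv naming described in this README.
RESULT_NAME = re.compile(r"^results_\d{4}-\d{2}-\d{2}\.csv$")


def latest_results_file(names: List[str]) -> Optional[str]:
    """Return the newest results file name, or None if none match."""
    dated = [n for n in names if RESULT_NAME.match(n)]
    # ISO dates sort chronologically when compared as plain strings.
    return max(dated) if dated else None


def pick_data_source(names: List[str], sample: str = "sample_data.csv") -> str:
    """Prefer the latest results file; fall back to the (hypothetical) sample file."""
    return latest_results_file(names) or sample
```

To try it against a local checkout, pass in the file names from the repository root, for example `pick_data_source(os.listdir("."))`.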
Generates results_YYYY-MM-DD.csv with columns:
- date - Run date
- provider - ChatGPT/Gemini/Perplexity
- model - Model name used
- prompt_id - Question ID (001-010)
- prompt - Question text
- response - AI response text
- citation_url - Cited URL (if found)
- citation_domain - Domain of the cited URL (normalized to learn.microsoft.com where applicable)
Domain Normalization: docs.microsoft.com → learn.microsoft.com
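
The normalization rule above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code in run_benchmark.py: the function names are hypothetical, and rows are assumed to be dicts keyed by the CSV column names listed above.

```python
from collections import Counter
from urllib.parse import urlparse


def normalize_domain(url: str) -> str:
    """Return the lowercase host of a URL, mapping docs.microsoft.com to learn.microsoft.com."""
    # urlparse only fills in hostname when the URL has a scheme (https://...).
    host = (urlparse(url).hostname or "").lower()
    if host == "docs.microsoft.com":
        return "learn.microsoft.com"
    return host


def count_domains(rows) -> Counter:
    """Count normalized citation domains, skipping rows without a citation_url."""
    counts = Counter()
    for row in rows:
        url = (row.get("citation_url") or "").strip()
        if url:
            counts[normalize_domain(url)] += 1
    return counts
```

Rows read with `csv.DictReader` from a results file can be passed straight to `count_domains`.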
Symptoms: Benchmark completes but no OpenAI API calls appear in dashboard
Cause: GitHub Secrets not configured
Fix: Add OPENAI_API_KEY to repository secrets (see Setup step 3)
Symptoms: CSV shows empty citation columns
Cause: Using stub functions (API keys missing)
Fix: Verify API keys are configured in GitHub Secrets
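
To confirm which keys the process can actually see, a small standalone check like the one below may help. It is a sketch that only reads environment variables; it does not load the .env file itself, and treating OPENAI_API_KEY as required and the other two as optional is an assumption based on the secrets list above.

```python
import os

# Assumption: only the OpenAI key is required; the others are marked optional above.
REQUIRED = ("OPENAI_API_KEY",)
OPTIONAL = ("GEMINI_API_KEY", "PERPLEXITY_API_KEY")


def key_status(env) -> dict:
    """Map each expected key name to True if it is set to a non-blank value."""
    return {name: bool(env.get(name, "").strip()) for name in REQUIRED + OPTIONAL}


if __name__ == "__main__":
    for name, present in key_status(os.environ).items():
        kind = "required" if name in REQUIRED else "optional"
        # Report presence only; never print the key value.
        print(f"{name} ({kind}): {'set' if present else 'MISSING'}")
```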
Symptoms: Workflow fails with Python indentation errors
Cause: Git merge conflicts in code
Fix: Reset local branch: git reset --hard origin/main
Run with debug logging to see API call details:
python run_benchmark.py  # Debug output included automatically

File Structure:
- run_benchmark.py - Main benchmark script
- prompts.csv - 10 test questions (Azure/Microsoft focused)
- docs/index.html - Interactive dashboard
- .github/workflows/benchmark.yml - GitHub Actions automation
- results_*.csv - Historical benchmark results
Adding New Questions:
Edit prompts.csv with format: prompt_id,category,prompt
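
Before committing new questions, a quick format check can catch malformed rows. The sketch below assumes prompts.csv starts with a header row `prompt_id,category,prompt`, which this README does not confirm; `load_prompts` is a hypothetical helper, not part of run_benchmark.py.

```python
import csv
import io

# Assumption: the first line of prompts.csv is this header row.
EXPECTED_HEADER = ["prompt_id", "category", "prompt"]


def load_prompts(text: str) -> list:
    """Parse prompts CSV text, checking the header and that prompt_id values are unique."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = list(reader)
    ids = [row["prompt_id"] for row in rows]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate prompt_id found")
    return rows
```

Note that a question containing a comma has to be wrapped in double quotes, otherwise it is split across columns.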
Modifying Providers:
Update provider functions in run_benchmark.py (ChatGPT, Gemini, Perplexity)