An agentic AI chatbot that answers restaurant menu questions (including allergen-specific questions) using both structured (SQL) and unstructured (vector) data.
This project demonstrates:
- Agentic reasoning and tool orchestration
- SQL + Vector DB integration
Run the following commands in the root folder after cloning the repository:
$ cd src
$ export GOOGLE_API_KEY=<shared over email>
$ docker compose up --build -d
It may take some time for docker compose to finish. This starts the various components of the system, described in the Architecture section, as Docker containers.
$ docker ps --format "{{.Names}}"
src-restaurant-bot-1
src-mcp-weaviate-tool-1
src-mcp-mysql-tool-1
src-weaviate-1
src-db-1
src-inference-1
The application runs on port 9090 and can be accessed via the Google ADK UI at: http://localhost:9090/dev-ui/?app=restaurant_agent
Select restaurant_agent from the dropdown if it is not already selected.
To stop the containers:
$ docker compose down -v

Sample queries:
- Which dishes contain gluten?
- What vegetarian dishes are priced below ₹200?
- How is Paneer Tikka prepared?
- Which dishes may have cross-contamination risks?
- Is Paneer Tikka safe for someone with a nut allergy?
- Which dishes are unsafe for someone with a dairy allergy?
Here is a high-level architecture diagram with the execution flow:

The system consists of the following components:
- Understands user intent
- Chooses correct tool(s)
- Synthesizes final answer
See src/restaurant_agent/agent.py for agent code.
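The orchestration flow above (understand intent, pick a tool, synthesize an answer) can be sketched illustratively. This is not the actual ADK implementation in agent.py; the tool names and keyword-based routing heuristic below are assumptions, standing in for the LLM's own tool-selection decision:

```python
# Illustrative sketch of the agent's tool-orchestration loop.
# The real agent (src/restaurant_agent/agent.py) uses Google ADK and lets
# the LLM choose tools; the routing heuristic here is an assumption.

def route_query(question: str) -> str:
    """Pick a tool with a naive keyword heuristic (the real agent reasons
    over the schema and chef notes to decide)."""
    if any(w in question.lower() for w in ("prepared", "preparation", "cross-contamination")):
        return "weaviate_search"   # unstructured chef notes
    return "query_mysql"           # structured menu/allergen facts

def answer(question: str, tools: dict) -> str:
    tool_name = route_query(question)
    result = tools[tool_name](question)
    # Synthesis step: the real agent asks the LLM to compose the final answer.
    return f"[{tool_name}] {result}"

# Stub tools standing in for the MCP services.
tools = {
    "query_mysql": lambda q: "rows from MySQL",
    "weaviate_search": lambda q: "notes from Weaviate",
}

print(answer("How is Paneer Tikka prepared?", tools))
```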
Database: MySQL
Purpose: Canonical truth for menu data and allergens
Tables
Menu_Items(item_id, name, price, is_veg, spice_level, ingredients)
Allergens(allergen_id, name)
Menu_Allergens(item_id, allergen_id, notes)

Key Features
- get_schema exposed to agent for fetching schema
- query_mysql for fetching results for a query
- checks and allows only read-only queries
See src/mcp_tools/mysql_tool/mcp-service-sql.py for code.
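The read-only guard can be sketched roughly as follows. The actual check in mcp-service-sql.py may differ; the rules here (single statement, no write keywords, SELECT/SHOW/DESCRIBE only) are assumptions, and the example SQL shows one plausible query for "Which dishes contain gluten?" against the tables above:

```python
import re

# Sketch of a read-only guard for query_mysql. The actual validation in
# src/mcp_tools/mysql_tool/mcp-service-sql.py may differ; the allowed
# statement list below is an assumption.

FORBIDDEN = re.compile(r"\b(insert|update|delete|drop|alter|create|truncate|grant)\b", re.I)

def is_read_only(sql: str) -> bool:
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:              # reject multi-statement queries
        return False
    if FORBIDDEN.search(stripped):   # reject any write/DDL keyword
        return False
    return stripped.lower().startswith(("select", "show", "describe"))

# A query the agent might generate for "Which dishes contain gluten?"
query = """
SELECT mi.name
FROM Menu_Items mi
JOIN Menu_Allergens ma ON ma.item_id = mi.item_id
JOIN Allergens a ON a.allergen_id = ma.allergen_id
WHERE a.name = 'gluten'
"""
print(is_read_only(query))                    # True
print(is_read_only("DROP TABLE Allergens"))   # False
```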
Database: Weaviate
Purpose: Contains preparation notes and additional metadata about dishes
Schema
class: MenuItemNotes
properties:
- item_id (int): link to the SQL primary key
- name (text)
- notes (text): chef notes, cross-contamination warnings
- spice_level (text)
- is_veg (boolean)
- price (number)

Key Features
- Chef preparation notes may live in txt or pdf files, from which they can be extracted and loaded into Weaviate for searching
- Uses hybrid search with alpha=0.5.
- Embedding model: text2vec-transformers with "sentence-transformers/all-MiniLM-L6-v2"
- Used for checking cross-contamination
See src/mcp_tools/weaviate_tool/mcp-service-weaviate.py for code.
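Hybrid search with alpha=0.5 blends vector (semantic) and keyword (BM25) relevance in equal parts. A rough illustration of the alpha-weighted fusion is below; note that Weaviate's actual fusion normalizes scores within each result set first, and the scores used here are made-up examples:

```python
# Rough illustration of alpha-weighted hybrid score fusion. Weaviate's
# relative-score fusion normalizes each result set's scores before
# blending; the scores below are made-up examples.

def hybrid_score(vector_score: float, bm25_score: float, alpha: float = 0.5) -> float:
    """alpha=1.0 -> pure vector search; alpha=0.0 -> pure keyword search."""
    return alpha * vector_score + (1 - alpha) * bm25_score

# A chef note that matches semantically but not lexically still ranks:
print(hybrid_score(vector_score=0.9, bm25_score=0.1))             # 0.5
print(hybrid_score(vector_score=0.9, bm25_score=0.1, alpha=1.0))  # 0.9
```

With alpha=0.5, neither lexical keyword hits nor semantic similarity alone can dominate the ranking, which suits queries that mix dish names with free-form allergen phrasing.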
Here are some evaluations I tried. I am also preparing a presentation and can explain more in the demo.
- Ragas Library for aggregate statistics
I used the Ragas library to generate aggregate statistics for my agentic RAG system.
The script eval/helper_scripts/generate_ragas_dataset.py was used to generate the Ragas dataset in eval/eval_scripts/ragas_dataset.json.
Two sample metrics using two different models can be found in eval/metrics/ragas-metrics.txt
- Custom Script for Tool Usage Evaluation
- For tool evaluation, I have written a custom script:
eval/helper_scripts/gen_tool_dataset.py for recording the tools used in a sample run. It calculates tool precision and recall, as I didn't find ADK's tool trajectory metrics useful for my use case. Sample results are in eval/metrics/tool-metrics.txt.
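Tool precision and recall can be computed per query by comparing the set of tools the agent actually invoked against an expected (golden) set. A minimal sketch is below; the actual logic in gen_tool_dataset.py may compute these differently, and the tool names are examples:

```python
# Minimal sketch of per-query tool precision/recall. The actual script
# (eval/helper_scripts/gen_tool_dataset.py) may differ; tool names are
# illustrative.

def tool_precision_recall(expected: set, invoked: set) -> tuple:
    """precision = correct invocations / all invocations;
    recall = correct invocations / all expected tools."""
    if not invoked or not expected:
        return 0.0, 0.0
    hits = len(expected & invoked)
    return hits / len(invoked), hits / len(expected)

precision, recall = tool_precision_recall(
    expected={"query_mysql", "weaviate_search"},
    invoked={"query_mysql", "get_schema"},
)
print(precision, recall)  # 0.5 0.5
```

Unlike exact trajectory matching, set-based precision/recall gives partial credit when the agent calls the right tools in a different order or adds a harmless extra call.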
- Using ADK eval
- Go to the root directory and execute the following commands to run adk eval on the golden dataset (the Docker containers should be running, since it uses the MySQL and Weaviate containers):
$ cd src
$ pip install -r requirements.txt
$ adk eval restaurant_agent restaurant_agent/golden_data_set.evalset.json --config_file_path=restaurant_agent/test_config.json --print_detailed_results
Sample results are in eval/adk/adk_results.txt.
Please note that I faced an issue running ADK eval on a larger dataset of 12 queries (src/restaurant_agent/golden_data_set.evalset.json). For now, the results in the directory cover a dataset of 2 queries.
- Finer-grained Tool Evaluation: The system can be extended to evaluate tool arguments in addition to tool selection (e.g., validating the structure and semantics of generated SQL or vector-search parameters). While Google ADK provides built-in tool evaluation, tool_trajectory_avg_score uses strict exact-match scoring, which assigns a score of 0 even for minor, semantically equivalent mismatches. This made it less effective for evaluating practical agent behavior, motivating the need for more flexible, custom evaluation metrics along with rubric_based_tool_use_quality_v1 from ADK.
- Multi-Agent Architecture: The current design uses a single LLM agent to handle planning, decision-making, tool execution, and response synthesis. A potential improvement is to adopt a multi-agent architecture, where:
  - a planning agent decomposes user intent,
  - an execution agent handles tool selection and calls, and
  - a synthesis agent aggregates results and generates the final response.
  This would also allow the use of different LLMs (lightweight vs. heavyweight) for different responsibilities, improving efficiency, cost control, and scalability.
- Prompt Optimization: The system prompt was developed through iterative experimentation to enforce correct tool usage, reduce hallucinations, and prevent internal leakage. Once agent behavior is sufficiently stable, the prompt can be simplified or shortened through further experimentation, improving runtime efficiency while preserving reliability.
- Conversational Context Awareness: The current system processes each user query independently. As a future enhancement, maintaining and passing relevant chat history (conversation context) to the LLM could improve accuracy and user experience.