[Feature] SQL Retriever tool using Vanna #894
base: develop
Conversation
Implement SQL Retriever using Vanna AI and NVIDIA NIM for natural language to SQL conversion.

Features:
- Natural language to SQL query generation using Vanna AI
- Integration with NVIDIA NIM for LLM and embeddings
- Support for multiple database types:
  * SQLite (file-based)
  * PostgreSQL
  * Generic SQL via SQLAlchemy (MySQL, SQL Server, Oracle)
- VannaManager singleton for efficient instance management
- Thread-safe operations
- Configurable training data via YAML files
- ChromaDB vector store for training data persistence

Components:
- sql_retriever.py: Main retriever implementing NAT Retriever interface
- vanna_manager.py: Manages Vanna instances with singleton pattern
- vanna_util.py: NVIDIA NIM integration for Vanna
- register.py: NAT registration for SQL retriever
- README.md: Comprehensive documentation

Dependencies:
- Added vanna~=0.7.9 for SQL generation
- Added chromadb~=1.1.0 for vector storage
- Added pandas~=2.0 for DataFrame operations
- Added optional postgres dependency (psycopg2-binary)

Training Data Support:
- DDL statements for schema definition
- Documentation for context
- Question-SQL pairs for examples

This implementation is based on the starterpack SQL tool but adapted to follow NAT Retriever patterns and extended with additional database connector support.

Signed-off-by: Vineeth Kalluru <[email protected]>
- Fixed type hints to use 'str | None' instead of 'str = None'
- Added None checks for optional parameters (vector_store_path, etc.)
- Added type guards in _create_instance to ensure required config is set
- Added type: ignore comments for linter cache issues
- Fixed metadata dictionary type annotations to allow Any values

All red squiggly lines in sql_retriever.py should now be resolved.

Signed-off-by: Vineeth Kalluru <[email protected]>
Added explicit assertions after type guard checks in _create_instance to help the type checker understand that config values are guaranteed to be non-None at that point. Signed-off-by: Vineeth Kalluru <[email protected]>
Implement ElasticVectorStore class and ElasticNIMVanna to provide an alternative to ChromaDB for vector storage in SQL retriever.

Features:
- ElasticVectorStore: Full vector store implementation using Elasticsearch
  * Uses dense_vector field type for embeddings
  * kNN search for similarity matching
  * Support for DDL, documentation, and question-SQL pairs
  * Automatic index creation with proper mappings
  * Authentication support (API key or basic auth)
- ElasticNIMVanna: Combined class using Elasticsearch + NVIDIA NIM
  * Drop-in replacement for NIMVanna (ChromaDB version)
  * Compatible with existing Vanna training workflows

Vector Operations:
- add_ddl(): Store DDL statements with embeddings
- add_documentation(): Store documentation with embeddings
- add_question_sql(): Store question-SQL pairs with embeddings
- get_similar_question_sql(): Retrieve similar Q&A pairs via kNN
- get_related_ddl(): Find relevant DDL statements
- get_related_documentation(): Find relevant docs
- remove_training_data(): Delete entries by ID

Configuration:
- url: Elasticsearch connection URL
- index_name: ES index name (default: vanna_vectors)
- api_key: Optional API key authentication
- username/password: Optional basic authentication
- embedding_function: NVIDIA or custom embedding function

This provides users flexibility to choose between ChromaDB (local file-based) or Elasticsearch (scalable, distributed) for their vector storage needs.

Signed-off-by: Vineeth Kalluru <[email protected]>
Add comprehensive test script and documentation for testing SQL Retriever with MySQL database and Elasticsearch vector store.

Files Added:
- test_mysql_elastic.py: Standalone test script for MySQL + Elasticsearch
  * Tests ElasticNIMVanna with MySQL connection via SQLAlchemy
  * Comprehensive CLI arguments for configuration
  * Connection validation for both MySQL and Elasticsearch
  * Training data support with automatic initialization
  * Detailed output with query results and statistics
- test_training_data.yaml: Sample training data for e-commerce schema
  * DDL for customers, products, orders, order_items tables
  * Documentation about schema and query patterns
  * 7 example question-SQL pairs for common queries
- TEST_README.md: Complete testing guide
  * Setup instructions for MySQL and Elasticsearch
  * Usage examples with different configurations
  * Command-line options documentation
  * Troubleshooting guide
  * Expected output examples

Features:
- Connection validation before running tests
- Automatic training data initialization
- Summary statistics for numeric results
- Colored/formatted output for readability
- Comprehensive error handling and logging
- Docker setup instructions included

Usage: python test_mysql_elastic.py -q "Show top 10 products"

This provides a complete testing harness for validating the SQL Retriever with production-like configurations (MySQL + Elasticsearch).

Signed-off-by: Vineeth Kalluru <[email protected]>
- Implement ElasticVectorStore class for Vanna integration
- Add ElasticNIMVanna class combining Elasticsearch + NVIDIA NIM
- Support for storing and retrieving training data embeddings
- Vector similarity search for DDL, documentation, and Q&A pairs
- Thread-safe initialization and connection management
- Compatible with Elasticsearch 8.x

This extends the existing SQL Retriever to support Elasticsearch as an alternative vector storage backend alongside ChromaDB.

Signed-off-by: Vineeth Kalluru <[email protected]>
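For orientation, the Elasticsearch-backed storage described in these commits amounts to indexing each training item with a `dense_vector` field and querying it with kNN. The sketch below is illustrative only; it is not the PR's `ElasticVectorStore` code, and the index name, field names, and embedding dimension are assumptions:

```python
from elasticsearch import Elasticsearch

ES_URL = "http://localhost:9200"   # assumed local instance
INDEX = "vanna_vectors"            # default index name mentioned in the commit message
DIMS = 1024                        # must match the embedding model's output size (assumption)

es = Elasticsearch(ES_URL)

# Create the index with a dense_vector mapping if it does not exist yet.
if not es.indices.exists(index=INDEX):
    es.indices.create(
        index=INDEX,
        mappings={
            "properties": {
                "content": {"type": "text"},
                "doc_type": {"type": "keyword"},  # "ddl" | "documentation" | "question_sql"
                "embedding": {
                    "type": "dense_vector",
                    "dims": DIMS,
                    "index": True,
                    "similarity": "cosine",
                },
            }
        },
    )


def add_item(doc_id: str, content: str, doc_type: str, embedding: list[float]) -> None:
    """Store one training item and refresh so it is immediately searchable."""
    es.index(
        index=INDEX,
        id=doc_id,
        document={"content": content, "doc_type": doc_type, "embedding": embedding},
        refresh="wait_for",
    )


def similar(query_embedding: list[float], doc_type: str, k: int = 5) -> list[str]:
    """kNN search restricted to one document type (filter inside knn needs ES 8.2+)."""
    resp = es.search(
        index=INDEX,
        knn={
            "field": "embedding",
            "query_vector": query_embedding,
            "k": k,
            "num_candidates": 10 * k,
            "filter": {"term": {"doc_type": doc_type}},
        },
    )
    return [hit["_source"]["content"] for hit in resp.get("hits", {}).get("hits", [])]
```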
Walkthrough

Adds a new SQL Retriever feature with Vanna-based NL→SQL generation, vector-store setup (ChromaDB and optional Elasticsearch), database connectivity (SQLite/Postgres/SQLAlchemy), initialization/training utilities, and registration into the retriever registry. Updates dependencies to include chromadb, pandas, vanna, and optional psycopg2-binary.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
actor App
participant Registry as Retriever Registry
participant Factory as create_sql_retriever
participant SQLR as SQLRetriever
participant VM as VannaManager
participant V as Vanna Instance
participant DB as Database
Note over Registry: Import-time registration of SQL retriever
App->>Registry: load retriever("sql_retriever")
Registry->>Factory: create_sql_retriever(config, builder)
Factory->>VM: create_with_config(llm, embedder, vec_path, conn, db_type, training)
VM-->>Factory: VannaManager (per-config)
Factory->>SQLR: new SQLRetriever(..., VannaManager)
Factory-->>App: SQLRetriever
App->>SQLR: search(query, top_k, return_sql)
SQLR->>VM: get_instance()
VM->>V: ensure created (vector store, DB, training)
SQLR->>V: generate_sql_safe(query)
V-->>SQLR: SQL string
SQLR->>DB: execute SQL
DB-->>SQLR: rows (DataFrame)
SQLR-->>App: RetrieverOutput (documents + metadata)
opt Error
SQLR-->>App: RetrieverError
end
```

```mermaid
sequenceDiagram
autonumber
participant VM as VannaManager
participant V as Vanna
participant VS as Vector Store
participant DB as Database
VM->>VM: _needs_initialization()
alt Needs init
VM->>V: create with LLM+Embedding
V->>VS: initialize vector store
VM->>V: init_vanna(training data, DDL/docs)
else Already initialized
VM->>V: reuse instance
end
VM->>DB: connect (sqlite/postgres/sqlalchemy)
DB-->>VM: connected/ready
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested labels
Actionable comments posted: 7
🧹 Nitpick comments (23)
pyproject.toml (1)

`41-41`: Consider tightening pandas version constraint. The constraint `pandas~=2.0` is quite broad and permits any 2.x release. Given that pandas 2.3.2 is the current stable version (per retrieved learnings), and pandas has significant API evolution toward 3.0, consider whether `pandas~=2.2` or `pandas~=2.3` would be more appropriate to avoid unexpected breaking changes in future 2.x releases.

src/nat/retriever/sql_retriever/register.py (2)
`45-47`: Consider renaming `embedding_name` to `embedder_name` for consistency. The field is named `embedding_name`, but the parallel field is `llm_name` (not `llm_model_name`). For consistency, consider `embedder_name` to match the pattern of naming after the component type rather than the output artifact.

`106-107`: Document ValueError for missing LLM and embedder names. `builder.get_llm_config` throws `ValueError("LLM {llm_name} not found")` and `get_embedder_config` throws `ValueError("Tool {embedder_name} not found")` when the specified names aren't registered. Update the factory or builder docstring to document these exceptions.

src/nat/retriever/sql_retriever/sql_retriever.py (1)
`63-63`: Unused `**kwargs` parameter. The `**kwargs` parameter in `__init__` is unused. If it's reserved for future extensibility or compatibility with base class signatures, consider adding a docstring note. Otherwise, remove it. Based on learnings.
src/nat/retriever/sql_retriever/vanna_util.py (14)
`27-61`: Add type hints and a typed config for NIMCustomLLM. Annotate parameters/returns and consider a TypedDict/Pydantic model for config to avoid stringly-typed lookups and ARG/TRY lints.

Example diff (minimal typing):

```diff
-class NIMCustomLLM(VannaBase):
+class NIMCustomLLM(VannaBase):
@@
-    def __init__(self, config=None):
+    def __init__(self, config: dict | None = None) -> None:
@@
-        if "temperature" in config:
-            self.temperature = config["temperature"]
+        if "temperature" in config:
+            self.temperature = float(config["temperature"])
```

`77-113`: Clean exception logging; remove manual traceback and unused kwargs. Follow project guidance: use logger.error() when re-raising; no duplicate stack traces; drop unused kwargs.

```diff
-    def submit_prompt(self, prompt, **kwargs) -> str:
+    def submit_prompt(self, prompt: list[dict[str, str]], **kwargs) -> str:
@@
-        try:
+        try:
             response = self.client.invoke(prompt)
@@
-        except Exception as e:
-            logger.error(f"Error in submit_prompt: {e}")
-            logger.error(f"Error type: {type(e)}")
-            import traceback
-
-            logger.error(f"Full traceback: {traceback.format_exc()}")
-            raise
+        except Exception as e:
+            logger.error(f"submit_prompt failed: {e}")
+            raise
```

`166-195`: ES client: support auth/SSL/timeouts and exception chaining. Chain ImportError and surface optional SSL verify/timeout knobs; avoid broad warnings.

```diff
-        except ImportError:
-            raise ImportError(
+        except ImportError as e:
+            raise ImportError(
                 "elasticsearch package is required for ElasticVectorStore. "
                 "Install it with: pip install elasticsearch"
-            )
+            ) from e
@@
-        self.es_client = Elasticsearch(self.url, **client_kwargs)
+        # Consider: verify_certs/ca_certs when using https, and retries
+        self.es_client = Elasticsearch(self.url, **client_kwargs)
```

If you want, I can add per-call request timeouts (e.g., request_timeout=30) on search/index operations. Confirm the ES client version in this repo to finalize parameter names.
`201-226`: Index creation: ensure dims and immediate availability. After creating the index, first writes might not be searchable. Prefer refresh when indexing docs later.

No code change here; see indexing diffs below to add refresh="wait_for".

`248-283`: Use a collision-resistant hash for IDs and add refresh. md5 is flagged and truncates entropy if shortened later. Use sha256 and ensure indexed docs are searchable immediately.

```diff
-        import hashlib
+        import hashlib
@@
-        doc_id = hashlib.md5(ddl.encode()).hexdigest()
+        doc_id = hashlib.sha256(ddl.encode()).hexdigest()
@@
-        self.es_client.index(index=self.index_name, id=doc_id, document=doc)
+        self.es_client.index(index=self.index_name, id=doc_id, document=doc, refresh="wait_for")
```

`284-314`: Same hashing/refresh update for documentation.

```diff
-        import hashlib
+        import hashlib
@@
-        doc_id = hashlib.md5(documentation.encode()).hexdigest()
+        doc_id = hashlib.sha256(documentation.encode()).hexdigest()
@@
-        self.es_client.index(index=self.index_name, id=doc_id, document=doc)
+        self.es_client.index(index=self.index_name, id=doc_id, document=doc, refresh="wait_for")
```

`315-352`: Embed the combined question+SQL for better retrieval; update hashing/refresh. Embedding only the question may lose SQL context; index on combined text for kNN.

```diff
-        import hashlib
+        import hashlib
@@
-        doc_id = hashlib.md5(combined_text.encode()).hexdigest()
-        embedding = self._generate_embedding(question)
+        doc_id = hashlib.sha256(combined_text.encode()).hexdigest()
+        embedding = self._generate_embedding(combined_text)
@@
-        self.es_client.index(index=self.index_name, id=doc_id, document=doc)
+        self.es_client.index(index=self.index_name, id=doc_id, document=doc, refresh="wait_for")
```

`459-477`: Use logger.exception when swallowing errors. You return False on failure; capture the stack via logger.exception().

```diff
-        except Exception as e:
-            logger.error(f"Error removing training data {id}: {e}")
+        except Exception as e:
+            logger.exception(f"Error removing training data {id}: {e}")
             return False
```
`491-524`: Paginate training data retrieval. size=10000 risks truncation. Use scroll or search_after for large corpora.

I can add a generator using search_after and sort by _id if desired.
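For illustration, one way the suggested search_after pagination could look (the index name, a unique `doc_id` keyword field used as the sort key, and the page size are assumptions, not part of the PR):

```python
from collections.abc import Iterator


def iter_all_training_data(es, index_name: str, page_size: int = 500) -> Iterator[dict]:
    """Yield every stored training document using search_after pagination.

    Avoids the 10,000-hit ceiling of a single search request by paging on a
    unique, sortable field (assumed here to be a `doc_id` keyword field).
    """
    search_after = None
    while True:
        kwargs = {
            "index": index_name,
            "size": page_size,
            "sort": [{"doc_id": "asc"}],
            "query": {"match_all": {}},
        }
        if search_after is not None:
            kwargs["search_after"] = search_after
        resp = es.search(**kwargs)
        hits = resp.get("hits", {}).get("hits", [])
        if not hits:
            return
        for hit in hits:
            yield {"id": hit["_id"], **hit["_source"]}
        search_after = hits[-1]["sort"]
```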
`597-631`: Avoid shadowing built-in `input` and batch with embed_documents. Improve perf and readability; keep return shape consistent.

```diff
-    def __call__(self, input):
+    def __call__(self, texts: str | list[str]) -> list[list[float]]:
@@
-        if isinstance(input, str):
-            input_data = [input]
+        if isinstance(texts, str):
+            input_data = [texts]
         else:
-            input_data = input
+            input_data = texts
@@
-        embeddings = []
-        for i, text in enumerate(input_data):
-            logger.debug(f"Embedding text {i+1}/{len(input_data)}: {text[:50]}...")
-            embedding = self.embeddings.embed_query(text)
-            embeddings.append(embedding)
+        embeddings = self.embeddings.embed_documents(input_data)
```

`641-688`: Trim duplicate stack traces; keep error + re-raise. Remove manual traceback; the message is already logged.

```diff
-        except Exception as e:
-            logger.error(f"Error generating embeddings for query: {e}")
-            logger.error(f"Error type: {type(e)}")
-            logger.error(f"Query text: {query_text}")
-            import traceback
-
-            logger.error(f"Full traceback: {traceback.format_exc()}")
-            raise
+        except Exception as e:
+            logger.error(f"embed_query failed: {e}")
+            raise
```

`689-711`: Same logging clean-up for embed_documents.

```diff
-        except Exception as e:
-            logger.error(f"Error generating document embeddings: {e}")
-            logger.error(f"Error type: {type(e)}")
-            logger.error(f"Input documents count: {len(input)}")
-            import traceback
-
-            logger.error(f"Full traceback: {traceback.format_exc()}")
-            raise
+        except Exception as e:
+            logger.error(f"embed_documents failed for {len(input)} docs: {e}")
+            raise
```

`771-947`: Annotate return type and use logger.exception where errors are swallowed. Add -> None; change error logging when not re-raising (e.g., the doc chunk loop).

```diff
-def init_vanna(vn, training_data_path: str | None = None):
+def init_vanna(vn, training_data_path: str | None = None) -> None:
@@
-        except Exception as e:
-            logger.error(f"Error training documentation chunk {i}: {e}")
+        except Exception as e:
+            logger.exception(f"Error training documentation chunk {i}: {e}")
             # Continue with other chunks
```

`353-396`: Ensure ES kNN filter support (8.2.0+) and handle missing hits.
- ES 8.2.0+ introduced `filter` inside the top-level `knn` clause (Integrate filtering support for ANN #84734; see ES 8.2.0 release notes).
- Guard against missing hits to avoid KeyError:

```diff
-        for hit in response["hits"]["hits"]:
+        for hit in response.get("hits", {}).get("hits", []):
```

src/nat/retriever/sql_retriever/vanna_manager.py (5)
`42-44`: Annotate class vars to satisfy RUF012.

```diff
-from typing import Dict
+from typing import ClassVar, Dict
@@
-    _instances: Dict[str, "VannaManager"] = {}
-    _lock = threading.Lock()
+    _instances: ClassVar[Dict[str, "VannaManager"]] = {}
+    _lock: ClassVar[threading.Lock] = threading.Lock()
```

`285-317`: Remove unused imports; keep error semantics. psycopg2 imports are unused; drop them to avoid unnecessary dependency pressure here.

```diff
-        try:
-            import psycopg2
-            from psycopg2.pool import SimpleConnectionPool
-
-            logger.info("Connecting to PostgreSQL database...")
+        try:
+            logger.info("Connecting to PostgreSQL database...")
```

`331-340`: Add SQLAlchemy engine robustness. Enable pool_pre_ping to avoid stale connections.

```diff
-            engine = create_engine(connection_string)
+            engine = create_engine(connection_string, pool_pre_ping=True)
```

`443-493`: Add return type for factory method.

```diff
-    ):
+    ) -> "VannaManager":
```

`495-517`: Stronger hash and avoid collisions for config_key. md5 truncated to 12 chars risks collisions. Use sha256 and keep more bits.

```diff
-        config_str = f"{vanna_llm_config.model_name}_{vanna_embedder_config.model_name}_{vector_store_path}_{db_connection_string}_{db_type}"
-        return hashlib.md5(config_str.encode()).hexdigest()[:12]
+        config_str = f"{vanna_llm_config.model_name}|{vanna_embedder_config.model_name}|{vector_store_path}|{db_connection_string}|{db_type}"
+        return hashlib.sha256(config_str.encode()).hexdigest()[:24]
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
pyproject.toml (4 hunks)
src/nat/retriever/register.py (1 hunks)
src/nat/retriever/sql_retriever/__init__.py (1 hunks)
src/nat/retriever/sql_retriever/register.py (1 hunks)
src/nat/retriever/sql_retriever/sql_retriever.py (1 hunks)
src/nat/retriever/sql_retriever/vanna_manager.py (1 hunks)
src/nat/retriever/sql_retriever/vanna_util.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{py,yaml,yml}
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
**/*.{py,yaml,yml}
: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.
Files:
src/nat/retriever/register.py
src/nat/retriever/sql_retriever/vanna_manager.py
src/nat/retriever/sql_retriever/sql_retriever.py
src/nat/retriever/sql_retriever/vanna_util.py
src/nat/retriever/sql_retriever/__init__.py
src/nat/retriever/sql_retriever/register.py
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
**/*.py
: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).
**/*.py
: In code comments/identifiers use NAT abbreviations as specified: nat for API namespace/CLI, nvidia-nat for package name, NAT for env var prefixes; do not use these abbreviations in documentation
Follow PEP 20 and PEP 8; run yapf with column_limit=120; use 4-space indentation; end files with a single trailing newline
Run ruff check --fix as linter (not formatter) using pyproject.toml config; fix warnings unless explicitly ignored
Respect naming: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: use bare raise to re-raise; log with logger.error() when re-raising to avoid duplicate stack traces; use logger.exception() when catching without re-raising
Provide Google-style docstrings for every public module, class, function, and CLI command; first line concise and ending with a period; surround code entities with backticks
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile or mprof before optimizing; cache expensive computations with functools.lru_cache or external cache; leverage NumPy vectorized operations when beneficial
Files:
src/nat/retriever/register.py
src/nat/retriever/sql_retriever/vanna_manager.py
src/nat/retriever/sql_retriever/sql_retriever.py
src/nat/retriever/sql_retriever/vanna_util.py
src/nat/retriever/sql_retriever/__init__.py
src/nat/retriever/sql_retriever/register.py
src/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
All importable Python code must live under src/ (or packages//src/)
Files:
src/nat/retriever/register.py
src/nat/retriever/sql_retriever/vanna_manager.py
src/nat/retriever/sql_retriever/sql_retriever.py
src/nat/retriever/sql_retriever/vanna_util.py
src/nat/retriever/sql_retriever/__init__.py
src/nat/retriever/sql_retriever/register.py
src/nat/**/*
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Changes in src/nat should prioritize backward compatibility
Files:
src/nat/retriever/register.py
src/nat/retriever/sql_retriever/vanna_manager.py
src/nat/retriever/sql_retriever/sql_retriever.py
src/nat/retriever/sql_retriever/vanna_util.py
src/nat/retriever/sql_retriever/__init__.py
src/nat/retriever/sql_retriever/register.py
⚙️ CodeRabbit configuration file
This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.
Files:
src/nat/retriever/register.py
src/nat/retriever/sql_retriever/vanna_manager.py
src/nat/retriever/sql_retriever/sql_retriever.py
src/nat/retriever/sql_retriever/vanna_util.py
src/nat/retriever/sql_retriever/__init__.py
src/nat/retriever/sql_retriever/register.py
{src/**/*.py,packages/*/src/**/*.py}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
All public APIs must have Python 3.11+ type hints on parameters and return values; prefer typing/collections.abc abstractions; use typing.Annotated when useful
Files:
src/nat/retriever/register.py
src/nat/retriever/sql_retriever/vanna_manager.py
src/nat/retriever/sql_retriever/sql_retriever.py
src/nat/retriever/sql_retriever/vanna_util.py
src/nat/retriever/sql_retriever/__init__.py
src/nat/retriever/sql_retriever/register.py
**/*
⚙️ CodeRabbit configuration file
**/*
: # Code Review Instructions
- Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.- Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values.
Example:def my_function(param1: int, param2: str) -> bool: pass- For Python exception handling, ensure proper stack trace preservation:
- When re-raising exceptions: use bare
raise
statements to maintain the original stack trace,
and uselogger.error()
(notlogger.exception()
) to avoid duplicate stack trace output.- When catching and logging exceptions without re-raising: always use
logger.exception()
to capture the full stack trace information.Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any
words listed in the
ci/vale/styles/config/vocabularies/nat/reject.txt
file, words that might appear to be
spelling mistakes but are listed in theci/vale/styles/config/vocabularies/nat/accept.txt
file are OK.Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,
and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to date whenever a file is changed.
Files:
src/nat/retriever/register.py
pyproject.toml
src/nat/retriever/sql_retriever/vanna_manager.py
src/nat/retriever/sql_retriever/sql_retriever.py
src/nat/retriever/sql_retriever/vanna_util.py
src/nat/retriever/sql_retriever/__init__.py
src/nat/retriever/sql_retriever/register.py
🧠 Learnings (1)
📚 Learning: 2025-09-23T18:39:15.023Z
Learnt from: CR
PR: NVIDIA/NeMo-Agent-Toolkit#0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-09-23T18:39:15.023Z
Learning: Applies to packages/*/pyproject.toml : In packages, declare a dependency on nvidia-nat or packages starting with nvidia-nat-
Applied to files:
pyproject.toml
🧬 Code graph analysis (5)
src/nat/retriever/sql_retriever/vanna_manager.py (2)
- src/nat/retriever/sql_retriever/vanna_util.py (3): NIMVanna (115-120), NVIDIAEmbeddingFunction (557-710), init_vanna (771-947)
- src/nat/retriever/sql_retriever/sql_retriever.py (1): get_stats (231-244)

src/nat/retriever/sql_retriever/sql_retriever.py (1)
- src/nat/retriever/sql_retriever/vanna_manager.py (5): VannaManager (31-492), create_with_config (443-492), get_instance (131-199), generate_sql_safe (396-420), get_stats (431-440)

src/nat/retriever/sql_retriever/vanna_util.py (1)
- src/nat/retriever/sql_retriever/sql_retriever.py (1): search (106-187)

src/nat/retriever/sql_retriever/__init__.py (1)
- src/nat/retriever/sql_retriever/sql_retriever.py (1): SQLRetriever (33-244)

src/nat/retriever/sql_retriever/register.py (2)
- src/nat/builder/builder.py (1): Builder (68-290)
- src/nat/retriever/sql_retriever/sql_retriever.py (1): SQLRetriever (33-244)
🪛 Ruff (0.13.2)
src/nat/retriever/sql_retriever/vanna_manager.py
42-42: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
171-173: Avoid specifying long messages outside the exception class
(TRY003)
196-196: Do not catch blind exception: Exception
(BLE001)
214-216: Avoid specifying long messages outside the exception class
(TRY003)
255-258: Avoid specifying long messages outside the exception class
(TRY003)
274-274: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
309-312: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
315-315: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
343-346: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
349-349: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
391-391: Do not catch blind exception: Exception
(BLE001)
402-402: Avoid specifying long messages outside the exception class
(TRY003)
414-414: Abstract raise
to an inner function
(TRY301)
414-414: Avoid specifying long messages outside the exception class
(TRY003)
416-416: Consider moving this statement to an else
block
(TRY300)
419-419: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
516-516: Probable use of insecure hash functions in hashlib
: md5
(S324)
src/nat/retriever/sql_retriever/sql_retriever.py
63-63: Unused method argument: kwargs
(ARG002)
141-143: Abstract raise
to an inner function
(TRY301)
141-143: Avoid specifying long messages outside the exception class
(TRY003)
150-150: Abstract raise
to an inner function
(TRY301)
150-150: Avoid specifying long messages outside the exception class
(TRY003)
183-183: Consider moving this statement to an else
block
(TRY300)
187-187: Avoid specifying long messages outside the exception class
(TRY003)
src/nat/retriever/sql_retriever/vanna_util.py
34-34: Avoid specifying long messages outside the exception class
(TRY003)
44-44: Avoid specifying long messages outside the exception class
(TRY003)
47-47: Avoid specifying long messages outside the exception class
(TRY003)
77-77: Unused method argument: kwargs
(ARG002)
80-80: Create your own exception
(TRY002)
80-80: Avoid specifying long messages outside the exception class
(TRY003)
83-83: Create your own exception
(TRY002)
83-83: Avoid specifying long messages outside the exception class
(TRY003)
105-105: Consider moving this statement to an else
block
(TRY300)
107-107: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
108-108: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
111-111: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
144-144: Avoid specifying long messages outside the exception class
(TRY003)
155-155: Avoid specifying long messages outside the exception class
(TRY003)
171-174: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
171-174: Avoid specifying long messages outside the exception class
(TRY003)
192-192: Do not catch blind exception: Exception
(BLE001)
246-246: Avoid specifying long messages outside the exception class
(TRY003)
263-263: Probable use of insecure hash functions in hashlib
: md5
(S324)
298-298: Probable use of insecure hash functions in hashlib
: md5
(S324)
332-332: Probable use of insecure hash functions in hashlib
: md5
(S324)
459-459: Unused method argument: kwargs
(ARG002)
473-473: Consider moving this statement to an else
block
(TRY300)
474-474: Do not catch blind exception: Exception
(BLE001)
475-475: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
478-478: Unused method argument: kwargs
(ARG002)
491-491: Unused method argument: kwargs
(ARG002)
520-520: Consider moving this statement to an else
block
(TRY300)
521-521: Do not catch blind exception: Exception
(BLE001)
522-522: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
664-666: Avoid specifying long messages outside the exception class
(TRY003)
679-679: Consider moving this statement to an else
block
(TRY300)
681-681: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
682-682: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
683-683: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
686-686: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
702-702: Consider moving this statement to an else
block
(TRY300)
704-704: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
705-705: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
706-706: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
709-709: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
805-805: Do not catch blind exception: Exception
(BLE001)
815-815: Do not catch blind exception: Exception
(BLE001)
832-832: Do not catch blind exception: Exception
(BLE001)
833-833: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
895-895: Do not catch blind exception: Exception
(BLE001)
896-896: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
937-937: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
939-939: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
🔇 Additional comments (9)
src/nat/retriever/register.py (1)

`22-22`: LGTM! Registration import follows established pattern. The import correctly registers the SQL retriever provider using the toolkit's side-effect registration mechanism, consistent with existing retriever registrations.

src/nat/retriever/sql_retriever/__init__.py (1)

`1-21`: LGTM! Clean package initialization. The module correctly exports `SQLRetriever` as the public API, includes proper licensing headers, and follows Python packaging best practices.

src/nat/retriever/sql_retriever/register.py (1)

`31-74`: LGTM! Well-documented configuration class. The `SQLRetrieverConfig` class is comprehensive with clear field descriptions, supported database types, and connection string format examples. The Pydantic model provides a good validation foundation.

src/nat/retriever/sql_retriever/sql_retriever.py (4)

`185-187`: LGTM! Proper exception handling and chaining. The exception handling correctly logs with `exc_info=True`, wraps the original exception in `RetrieverError`, and chains it using `from e`. This preserves the full stack trace and provides clear error context.

`106-183`: LGTM! Well-structured search implementation. The `search` method properly:
- Validates parameters and handles defaults
- Generates SQL via the VannaManager
- Checks database connectivity before execution
- Handles empty results gracefully
- Applies result limits correctly
- Includes comprehensive logging at appropriate levels

`33-51`: LGTM! Comprehensive class documentation. The class docstring provides a clear description, supported database types, and a practical usage example.

`189-229`: LGTM! Clean DataFrame conversion logic. The `_dataframe_to_documents` method correctly converts a pandas DataFrame to the expected `RetrieverOutput` format, handles None from `to_json()`, and includes comprehensive metadata.

pyproject.toml (1)

`86-86`: All good: psycopg2-binary pinned to latest stable. Verified that `~=2.9` resolves to version 2.9.10 (latest release) and no security advisories were found. No changes needed.

src/nat/retriever/sql_retriever/vanna_manager.py (1)

`396-421`: Intentional logger.error + re-raise aligns with house style. Keep logger.error (not exception) when re-raising to avoid duplicate stack traces. If CI flags TRY400, we can suppress at file scope or justify in pyproject per the coding_guidelines.
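For reference, the two patterns the house style distinguishes look like this in plain standard-library logging (generic illustration, not code from this PR):

```python
import logging

logger = logging.getLogger(__name__)


def risky_operation() -> None:
    raise RuntimeError("boom")


def reraise_example() -> None:
    try:
        risky_operation()
    except Exception as e:
        # Re-raising: log the message only; the stack trace surfaces where the
        # exception is finally handled, so logger.exception() would duplicate it.
        logger.error(f"risky_operation failed: {e}")
        raise


def swallow_example() -> bool:
    try:
        risky_operation()
        return True
    except Exception:
        # Not re-raising: logger.exception() captures the full stack trace here.
        logger.exception("risky_operation failed; continuing")
        return False
```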
# Keep sorted!!!
"aioboto3>=11.0.0",
"authlib~=1.5",
"chromadb~=1.1.0",
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Verify chromadb version and precision.
The constraint `chromadb~=1.1.0` uses 3-digit precision, while project guidelines recommend 2 digits (e.g., `~=1.1`). Verify whether the patch-level pin is necessary or if `chromadb~=1.1` would suffice.
Run the following to check the latest chromadb version and any known security advisories:
🏁 Script executed:
#!/bin/bash
# Check PyPI for latest chromadb version
echo "=== Latest chromadb version ==="
curl -s https://pypi.org/pypi/chromadb/json | jq -r '.info.version'
# Check for security advisories
echo -e "\n=== Security advisories ==="
gh api graphql -f query='
{
securityVulnerabilities(first: 5, ecosystem: PIP, package: "chromadb") {
nodes {
advisory {
summary
severity
publishedAt
}
vulnerableVersionRange
firstPatchedVersion {
identifier
}
}
}
}'
Length of output: 116
Use 2-digit precision for chromadb pin
Change `chromadb~=1.1.0` to `chromadb~=1.1` in pyproject.toml to comply with the guideline of using two-digit precision for `~=` and allow future patch-level updates.
🤖 Prompt for AI Agents
In pyproject.toml around line 23, the dependency is pinned as "chromadb~=1.1.0"
which uses three-digit precision; change it to "chromadb~=1.1" so the ~=
operator uses two-digit precision and permits future patch-level updates (update
the single dependency entry at line 23 accordingly).
"rich~=13.9", | ||
"tabulate~=0.9", | ||
"uvicorn[standard]~=0.34", | ||
"vanna~=0.7.9", |
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Verify vanna version and consider 2-digit precision.
The constraint `vanna~=0.7.9` uses 3-digit precision. Per project guidelines, prefer 2-digit precision (e.g., `vanna~=0.7`) unless a specific patch is required. Additionally, verify this is the latest stable version.
🏁 Script executed:
#!/bin/bash
# Check PyPI for latest vanna version
echo "=== Latest vanna version ==="
curl -s https://pypi.org/pypi/vanna/json | jq -r '.info.version'
# Check for security advisories
echo -e "\n=== Security advisories ==="
gh api graphql -f query='
{
securityVulnerabilities(first: 5, ecosystem: PIP, package: "vanna") {
nodes {
advisory {
summary
severity
publishedAt
}
vulnerableVersionRange
firstPatchedVersion {
identifier
}
}
}
}'
Length of output: 696
Use 2-digit precision for vanna constraint
0.7.9 is the latest stable release and contains no unpatched vulnerabilities, so per project guidelines switch from `vanna~=0.7.9` to `vanna~=0.7`.
🤖 Prompt for AI Agents
In pyproject.toml around line 53, the dependency pin uses three-segment
precision "vanna~=0.7.9"; update it to two-segment precision per project
guidelines by changing the constraint to "vanna~=0.7" so it allows any
compatible 0.7.x release while maintaining the intended compatibility range.
```python
@register_retriever(config_type=SQLRetrieverConfig)
async def create_sql_retriever(config: SQLRetrieverConfig, builder: Builder):
```
Add return type hint.
The function signature is missing a return type hint. Per coding guidelines, all public APIs must have type hints on parameters and return values.
Apply this diff:
@register_retriever(config_type=SQLRetrieverConfig)
-async def create_sql_retriever(config: SQLRetrieverConfig, builder: Builder):
+async def create_sql_retriever(config: SQLRetrieverConfig, builder: Builder) -> SQLRetriever:
As per coding guidelines
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
@register_retriever(config_type=SQLRetrieverConfig)
async def create_sql_retriever(config: SQLRetrieverConfig, builder: Builder) -> SQLRetriever:
```
🤖 Prompt for AI Agents
In src/nat/retriever/sql_retriever/register.py around line 78, the async
function create_sql_retriever(config: SQLRetrieverConfig, builder: Builder) is
missing a return type annotation; update the signature to include the concrete
return type (e.g., -> SQLRetriever), import that type if necessary, and ensure
the annotation matches the actual object returned by the function.
```python
    training_data_path=training_data_path,  # type: ignore[arg-type]
    nvidia_api_key=nvidia_api_key,  # type: ignore[arg-type]
```
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Verify type: ignore suppressions are necessary.
The type checker is being suppressed for `training_data_path` and `nvidia_api_key`. Verify that `VannaManager.create_with_config` correctly accepts `Optional[str]` for these parameters, or update the signature if needed.
🏁 Script executed:
#!/bin/bash
# Check VannaManager.create_with_config signature
ast-grep --pattern $'class VannaManager {
$$$
@classmethod
def create_with_config($$$, training_data_path: $TRAINING_TYPE, $$$, nvidia_api_key: $KEY_TYPE, $$$) {
$$$
}
$$$
}'
# Also search for the method definition directly
rg -nP -A 15 'def create_with_config\(' src/nat/retriever/sql_retriever/vanna_manager.py
Length of output: 595
Remove unnecessary type ignores for Optional[str] parameters
`VannaManager.create_with_config` declares both `training_data_path: str | None` and `nvidia_api_key: str | None`, so you can drop the `# type: ignore[arg-type]` on lines 98–99 in `src/nat/retriever/sql_retriever/sql_retriever.py`.
🤖 Prompt for AI Agents
In src/nat/retriever/sql_retriever/sql_retriever.py around lines 98 to 99, the
arguments training_data_path and nvidia_api_key are declared Optional[str] in
VannaManager.create_with_config so the trailing "# type: ignore[arg-type]"
comments are unnecessary; remove those two type-ignore comments from the
function call so the parameters pass normal type checking.
logger.info(f"VannaManager: Creating instance for {self.config_key}") | ||
logger.debug(f"VannaManager: Vector store path: {self.vector_store_path}") | ||
logger.debug(f"VannaManager: Database connection: {self.db_connection_string}") | ||
logger.debug(f"VannaManager: Database type: {self.db_type}") | ||
logger.debug(f"VannaManager: Training data path: {self.training_data_path}") | ||
|
Do not log raw DB connection strings; redact credentials.
Current debug logs can leak usernames/passwords. Redact before logging.
- logger.debug(f"VannaManager: Vector store path: {self.vector_store_path}")
- logger.debug(f"VannaManager: Database connection: {self.db_connection_string}")
+ logger.debug(f"VannaManager: Vector store path: {self.vector_store_path}")
+ logger.debug(f"VannaManager: Database connection: { _redact_dsn(self.db_connection_string) }")
Add helper near the bottom of the module:
+from urllib.parse import urlparse, urlunparse
+
+def _redact_dsn(dsn: str) -> str:
+ try:
+ parsed = urlparse(dsn)
+ if parsed.username or parsed.password:
+ netloc = f"{parsed.hostname or ''}"
+ if parsed.port:
+ netloc += f":{parsed.port}"
+ parsed = parsed._replace(netloc=netloc)
+ return urlunparse(parsed)
+ except Exception:
+ return "<redacted>"
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
logger.info(f"VannaManager: Creating instance for {self.config_key}") | |
logger.debug(f"VannaManager: Vector store path: {self.vector_store_path}") | |
logger.debug(f"VannaManager: Database connection: {self.db_connection_string}") | |
logger.debug(f"VannaManager: Database type: {self.db_type}") | |
logger.debug(f"VannaManager: Training data path: {self.training_data_path}") | |
logger.info(f"VannaManager: Creating instance for {self.config_key}") | |
logger.debug(f"VannaManager: Vector store path: {self.vector_store_path}") | |
logger.debug(f"VannaManager: Database connection: { _redact_dsn(self.db_connection_string) }") | |
logger.debug(f"VannaManager: Database type: {self.db_type}") | |
logger.debug(f"VannaManager: Training data path: {self.training_data_path}") |
🤖 Prompt for AI Agents
In src/nat/retriever/sql_retriever/vanna_manager.py around lines 224 to 229, the
debug logs print raw DB connection strings which can leak credentials; implement
a helper (placed near the bottom of the module) that accepts a connection string
and returns a redacted version by masking username and password (or any
"user:pass@" segment and common credential query params) and update the logger
calls to use this helper instead of logging self.db_connection_string directly;
ensure the helper safely handles empty/None values and different URL formats and
use it for any other places that log the DB connection.
```python
# Create instance
vn_instance = NIMVanna(
    VectorConfig={
        "client": "persistent",
        "path": self.vector_store_path,
        "embedding_function": NVIDIAEmbeddingFunction(
            api_key=self.nvidia_api_key,
            model=self.vanna_embedder_config.model_name,
        ),
    },
    LLMConfig={
        "api_key": self.nvidia_api_key,
        "model": self.vanna_llm_config.model_name,
    },
)
```
Ensure NVIDIA API key is present before creating Vanna.
Fail fast with a clear message to avoid None passing to NVIDIAEmbeddingFunction.
- vn_instance = NIMVanna(
+ if not self.nvidia_api_key:
+ raise RuntimeError("VannaManager: NVIDIA API key is required (env NVIDIA_API_KEY or parameter).")
+ vn_instance = NIMVanna(
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
# Create instance
if not self.nvidia_api_key:
    raise RuntimeError(
        "VannaManager: NVIDIA API key is required (env NVIDIA_API_KEY or parameter)."
    )
vn_instance = NIMVanna(
    VectorConfig={
        "client": "persistent",
        "path": self.vector_store_path,
        "embedding_function": NVIDIAEmbeddingFunction(
            api_key=self.nvidia_api_key,
            model=self.vanna_embedder_config.model_name,
        ),
    },
    LLMConfig={
        "api_key": self.nvidia_api_key,
        "model": self.vanna_llm_config.model_name,
    },
)
```
🤖 Prompt for AI Agents
In src/nat/retriever/sql_retriever/vanna_manager.py around lines 230 to 244, the
code constructs NIMVanna passing self.nvidia_api_key into
NVIDIAEmbeddingFunction and LLMConfig without verifying the key; add an explicit
pre-check that self.nvidia_api_key is present (non-empty/None) and if not, raise
a ValueError or log an error and exit with a clear message like "NVIDIA API key
missing: cannot create Vanna instance", before attempting to instantiate
NIMVanna so None is never passed to NVIDIAEmbeddingFunction.
```python
def init_vanna(vn, training_data_path: str | None = None):
    """
    Initialize and train a Vanna instance for SQL generation using configurable training data.
    This function configures a Vanna SQL generation agent with training data loaded from a YAML file,
    making it scalable for different SQL data sources with different contexts.
    Args:
        vn: Vanna instance to be trained and configured
        training_data_path: Path to YAML file containing training data. If None, no training is applied.
    Returns:
        None: Modifies the Vanna instance in-place
    Example:
        >>> from vanna.chromadb import ChromaDB_VectorStore
        >>> vn = NIMCustomLLM(config) & ChromaDB_VectorStore()
        >>> vn.connect_to_sqlite("path/to/database.db")
        >>> init_vanna(vn, "path/to/training_data.yaml")
        >>> # Vanna is now ready to generate SQL queries
    """
    import os

    logger.info("=== Starting Vanna initialization ===")

    # Get and train DDL from sqlite_master (if connected to SQLite)
    if hasattr(vn, "run_sql") and vn.run_sql_is_set:
        logger.info("Loading DDL from database...")
        try:
            # Try SQLite-specific query first
            try:
                df_ddl = vn.run_sql(
                    "SELECT type, sql FROM sqlite_master WHERE sql is not null"
                )
            except Exception:
                # For non-SQLite databases, try standard information_schema
                try:
                    df_ddl = vn.run_sql(
                        """
                        SELECT table_name, column_name, data_type
                        FROM information_schema.columns
                        WHERE table_schema = 'public'
                        """
                    )
                except Exception as e:
                    logger.warning(f"Could not auto-load DDL from database: {e}")
                    df_ddl = None

            if df_ddl is not None and not df_ddl.empty:
                ddl_count = len(df_ddl)
                logger.info(f"Found {ddl_count} DDL statements in database")

                if "sql" in df_ddl.columns:
                    for i, ddl in enumerate(df_ddl["sql"].to_list(), 1):
                        if ddl:
                            logger.debug(f"Training DDL {i}/{ddl_count}: {ddl[:100]}...")
                            vn.train(ddl=ddl)

                logger.info(
                    f"Successfully trained {ddl_count} DDL statements from database"
                )
        except Exception as e:
```
Fix DDL training count and non‑SQLite path.
Only statements with a valid SQL column are trained; the success log miscounts.
- if df_ddl is not None and not df_ddl.empty:
- ddl_count = len(df_ddl)
+ if df_ddl is not None and not df_ddl.empty:
+ ddl_trained_from_db = 0
logger.info(f"Found {ddl_count} DDL statements in database")
@@
- for i, ddl in enumerate(df_ddl["sql"].to_list(), 1):
+ for i, ddl in enumerate(df_ddl["sql"].to_list(), 1):
if ddl:
logger.debug(f"Training DDL {i}/{ddl_count}: {ddl[:100]}...")
vn.train(ddl=ddl)
+ ddl_trained_from_db += 1
@@
- logger.info(
- f"Successfully trained {ddl_count} DDL statements from database"
- )
+ logger.info(f"Successfully trained {ddl_trained_from_db} DDL statements from database")
Optionally, convert information_schema rows into CREATE TABLE statements before training.
Committable suggestion skipped: line range outside the PR's diff.
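For illustration, the optional conversion mentioned above could look roughly like this; the grouping logic and helper name are hypothetical, and only the column names follow the `information_schema` query shown in the diff:

```python
import pandas as pd


def columns_to_create_statements(df: pd.DataFrame) -> list[str]:
    """Build one CREATE TABLE statement per table from information_schema rows.

    Expects the columns table_name, column_name, and data_type.
    """
    statements = []
    for table_name, group in df.groupby("table_name"):
        cols = ",\n  ".join(f"{row.column_name} {row.data_type}" for row in group.itertuples())
        statements.append(f"CREATE TABLE {table_name} (\n  {cols}\n);")
    return statements


# Usage, assuming df_ddl came from the information_schema query above:
# for ddl in columns_to_create_statements(df_ddl):
#     vn.train(ddl=ddl)
```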
🧰 Tools
🪛 Ruff (0.13.2)
805-805: Do not catch blind exception: Exception
(BLE001)
815-815: Do not catch blind exception: Exception
(BLE001)
832-832: Do not catch blind exception: Exception
(BLE001)
Some high-level intermediate feedback (I understand this is still in draft):
Description
This PR introduces a new SQL Retriever component to NeMo Agent Toolkit, enabling natural language to SQL query generation using Vanna AI framework with NVIDIA NIM integration.
The SQL Retriever tool converts natural language questions into executable SQL queries by leveraging:
Key Features
1. Multi-Database Support
2. Dual Vector Store Support
ChromaDB: Embedded, file-based vector storage (default)
Elasticsearch: Distributed, scalable vector storage
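The multi-database support above is driven by the configured connection string; a few illustrative formats (hypothetical hosts and credentials, standard SQLAlchemy/SQLite conventions rather than values taken from this PR):

```python
# SQLite: the SQLite connector typically takes a plain file path.
sqlite_conn = "./data/example.db"

# PostgreSQL (psycopg2 driver) and MySQL (PyMySQL driver) use SQLAlchemy URL syntax.
postgres_conn = "postgresql+psycopg2://app_user:app_password@db.example.internal:5432/sales"
mysql_conn = "mysql+pymysql://app_user:app_password@db.example.internal:3306/sales"
```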
Implementation Details
SQLRetriever Class
Implements the `Retriever` interface from the NAT framework:
- `search()`: Converts natural language to SQL and retrieves data
- Returns `RetrieverOutput` with documents

VannaManager (Singleton Pattern)
Vector Store Implementations
ChromaVectorStore (via Vanna):
ElasticVectorStore (new):
Example Usage
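The collapsed example is not reproduced here; as a rough sketch of how the retriever is exercised once built (the names follow the review's sequence diagrams, i.e. `search(query, top_k, return_sql)`, but the exact signatures and output attributes in the PR may differ):

```python
import asyncio
from typing import Any


async def ask(retriever: Any, question: str) -> Any:
    """Run one natural-language question through an already-built SQL retriever."""
    # search(query, top_k, return_sql) follows the review's sequence diagram;
    # the concrete signature in the PR may differ.
    return await retriever.search(question, top_k=10, return_sql=True)


# Usage, assuming `sql_retriever` was obtained from the NAT builder/registry:
# result = asyncio.run(ask(sql_retriever, "Show the top 10 products by revenue"))
# print(result)  # RetrieverOutput: result rows as documents plus metadata such as the generated SQL
```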
Training Data Format
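The collapsed format example is likewise not reproduced; below is a hypothetical illustration of the three training-data categories (DDL, documentation, question-SQL pairs); the exact YAML keys read by `init_vanna` are an assumption here:

```python
import yaml

EXAMPLE_TRAINING_YAML = """
ddl:
  - |
    CREATE TABLE orders (
      order_id INTEGER PRIMARY KEY,
      customer_id INTEGER,
      total_amount REAL,
      created_at TEXT
    )
documentation:
  - "Revenue means SUM(total_amount) over non-cancelled orders."
question_sql_pairs:
  - question: "What was total revenue last month?"
    sql: "SELECT SUM(total_amount) FROM orders WHERE created_at >= date('now','start of month','-1 month')"
"""

training = yaml.safe_load(EXAMPLE_TRAINING_YAML)
print(sorted(training.keys()))  # ['ddl', 'documentation', 'question_sql_pairs']
```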
Configuration
Added to `pyproject.toml`:

Dependencies

Required:
- `vanna~=0.7.9` - Vanna AI framework
- `chromadb~=1.1.0` - Default vector store
- `pandas~=2.0` - Data processing
- `elasticsearch~=8.14.0` - Optional Elasticsearch support

Optional:
- `psycopg2-binary~=2.9` - PostgreSQL support
- `pymysql` - MySQL support (via SQLAlchemy)

Breaking Changes
None - this is a new feature addition.
By Submitting this PR I confirm:
Summary by CodeRabbit
New Features
Chores