Skip to content

Conversation

@amotl
Copy link
Member

@amotl amotl commented May 10, 2025

About

The cratedb-about package now includes a knowledge outline file in YAML format, cratedb-outline.yaml, and also provides a compact Python API cratedb_about.CrateDbKnowledgeOutline to read and query it. 1 /cc @WalBeh

Details

As far as we can foresee it, it can populate the existing DOCUMENTATION_INDEX with a few more curated resources, which could possibly level up knowledge in question answering.

# Load knowledge outline from YAML file and read all items.
outline = CrateDbKnowledgeOutline.load()
DOCUMENTATION_INDEX = outline.find_items().to_dict()

Notes

Note

That's a stacked PR. GH-9 will need to go in first.

Note

This change has not been tested. Can you try it before or after merging, to be confident it works reasonably in practice?

Footnotes

  1. If you like the idea, cratedb-outline.yaml is the right spot to propose any changes going forward, serving as a single source of truth for a curated knowledge outline. Because we picked up that the description attribute is very important as it will guide LLMs to the right resources, any contributions to improve the status quo are very much welcome.

@amotl amotl changed the base branch from main to boilerplate May 10, 2025 08:32
@amotl amotl force-pushed the use-about-package branch from c6c07a4 to 952476f Compare May 10, 2025 08:33
@coderabbitai
Copy link

coderabbitai bot commented May 10, 2025

"""

Walkthrough

The changes transition the documentation index from a hardcoded list to dynamic loading from an external YAML source, update documentation tool descriptions and URL validation logic, add a new dependency for the knowledge outline, introduce an HTTP timeout setting, and adjust related tests to accommodate the new dynamic structure and broader URL support.

Changes

File(s) Change Summary
cratedb_mcp/knowledge.py Replaced the hardcoded documentation index with dynamic loading from an external YAML via CrateDbKnowledgeOutline, updating DOCUMENTATION_INDEX accordingly; added documentation_url_permitted function to restrict allowed documentation URLs.
cratedb_mcp/__main__.py Updated import statements; added HTTP timeout to HTTP requests; revised tool descriptions to use "title" instead of "name"; broadened allowed documentation fetch URLs using documentation_url_permitted; generalized docstrings.
cratedb_mcp/settings.py Added new constant HTTP_TIMEOUT to define default HTTP request timeout duration.
pyproject.toml Added cachetools<6 and cratedb-about==0.0.4 as new dependencies.
tests/test_knowledge.py Updated tests to check for a larger, dynamically loaded documentation index with new keys and expanded content.
tests/test_mcp.py Split and renamed tests for permitted documentation fetch URLs; updated forbidden URL test to expect new error message and URL validation logic.
.github/workflows/release-pypi.yml Added --verbose flag to the twine upload command for more detailed output during package upload.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant MainTool
    participant Knowledge
    participant ExternalYAML

    User->>MainTool: Request documentation index
    MainTool->>Knowledge: Load DOCUMENTATION_INDEX
    Knowledge->>ExternalYAML: Load knowledge outline YAML
    ExternalYAML-->>Knowledge: Return documentation metadata
    Knowledge-->>MainTool: Return DOCUMENTATION_INDEX
    MainTool-->>User: Deliver documentation index
Loading
sequenceDiagram
    participant User
    participant MainTool

    User->>MainTool: fetch_cratedb_docs(url)
    MainTool->>MainTool: Check if url is permitted by domain
    alt Allowed
        MainTool->>User: Return fetched documentation
    else Not allowed
        MainTool->>User: Raise ValueError ("Link is not permitted: ...")
    end
Loading

Suggested reviewers

  • surister

Poem

In burrows deep, the docs now grow,
No longer hardcoded, but YAML in tow!
URLs are broader, the tests now gleam,
CrateDB’s knowledge flows like a stream.
With every hop, our wisdom expands—
A rabbit’s delight in these digital lands!
🐇📚✨
"""


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 35727ca and 06980f5.

📒 Files selected for processing (6)
  • cratedb_mcp/__main__.py (2 hunks)
  • cratedb_mcp/knowledge.py (2 hunks)
  • cratedb_mcp/settings.py (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/test_knowledge.py (1 hunks)
  • tests/test_mcp.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • cratedb_mcp/settings.py
  • tests/test_knowledge.py
  • pyproject.toml
  • tests/test_mcp.py
  • cratedb_mcp/main.py
  • cratedb_mcp/knowledge.py
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
pyproject.toml (1)

22-24: Consider using a version tag instead of a commit hash

Using a specific commit hash provides reproducibility but lacks flexibility for future updates. Consider using a tag or a branch reference if you want to include future bug fixes and improvements without manual updates.

-  "cratedb-about @ git+https://github.com/crate/about.git@b259d33e68bfd",
+  "cratedb-about @ git+https://github.com/crate/[email protected]",

When possible, consider publishing cratedb-about to PyPI for easier installation and dependency management.

tests/test_mcp.py (1)

22-30: Consider mocking HTTP requests in tests

These tests make actual HTTP requests to external URLs, which could lead to flaky tests if those resources become unavailable.

Consider using mocks to test the URL validation logic without making actual HTTP requests:

@pytest.mark.parametrize(
    "url", [
        "https://raw.githubusercontent.com/crate/crate/refs/heads/5.10/docs/general/builtins/scalar-functions.rst",
        "https://cratedb.com/docs/crate/reference/en/latest/_sources/general/builtins/scalar-functions.rst.txt",
    ]
)
def test_fetch_docs_permitted(url, mocker):
    mock_response = mocker.Mock()
    mock_response.text = "some text with initcap"
    mocker.patch("httpx.get", return_value=mock_response)
    
    response = fetch_cratedb_docs(url)
    assert "initcap" in response
    httpx.get.assert_called_once_with(url)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5727a27 and 952476f.

📒 Files selected for processing (5)
  • cratedb_mcp/__main__.py (1 hunks)
  • cratedb_mcp/knowledge.py (2 hunks)
  • pyproject.toml (1 hunks)
  • tests/test_knowledge.py (1 hunks)
  • tests/test_mcp.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/test_mcp.py (1)
cratedb_mcp/__main__.py (1)
  • fetch_cratedb_docs (28-36)
🔇 Additional comments (5)
tests/test_mcp.py (1)

16-19: The error message assertion is correctly testing the new format

The test now properly verifies the updated error message format that includes the exact URL.

cratedb_mcp/__main__.py (1)

22-23: Description updated to reflect new field name

The tool description correctly uses "title" instead of "name" to align with the changes in the documentation index structure.

cratedb_mcp/knowledge.py (3)

89-107: Good documentation of the knowledge outline structure!

The docstring effectively explains the purpose and structure of the documentation sections, includes the canonical source, and provides clear examples. This will be valuable for future maintenance.


109-111:

Details

✅ Verification successful

Verify compatibility with dependent modules.

The AI summary indicates that this change impacts multiple files, particularly changing the key from "name" to "title". Ensure that all dependent code has been updated to use the new structure.


🏁 Script executed:

#!/bin/bash
# Search for code that might still be using the old "name" field
# instead of the new "title" field from the documentation index

# Look for references to DOCUMENTATION_INDEX and "name" field
rg --no-heading "DOCUMENTATION_INDEX.*name" --glob "*.py" || echo "No references to DOCUMENTATION_INDEX with name found"

# Look for functions that might use the documentation index
rg --no-heading "get_cratedb_documentation_index" --glob "*.py"

Length of output: 429


This will confirm whether any downstream code or tests still expect a "name" field.


🏁 Script executed:

#!/bin/bash
# Display the definition of get_cratedb_documentation_index in __main__.py
rg -n -C3 "def get_cratedb_documentation_index" --glob "cratedb_mcp/__main__.py"

# Show all references and context in __main__.py
rg -C5 "get_cratedb_documentation_index" --glob "cratedb_mcp/__main__.py"

# Show test usages in tests/test_mcp.py
rg -C3 "get_cratedb_documentation_index" --glob "tests/test_mcp.py"

# Check for any literal 'name' references in tests
rg -n "'name'" --glob "tests/*.py" || echo "No literal 'name' in tests"

Length of output: 2246


No residual “name” key usage found—migration to “title” is safe.
I verified references in both cratedb_mcp/__main__.py and tests/test_mcp.py and found no remaining uses of the old "name" field. All dependent modules and tests interact with DOCUMENTATION_INDEX via get_cratedb_documentation_index(), with no direct key lookups against "name".


109-111:

Details

❓ Verification inconclusive

Add error handling for outline loading process.

The current implementation doesn't handle potential failures when loading the knowledge outline. Since the PR description mentions this change hasn't been tested, robust error handling is critical.

-# Load knowledge outline from YAML file and read all items.
-outline = CrateDbKnowledgeOutline.load()
-DOCUMENTATION_INDEX = outline.find_items(as_dict=True)
+# Load knowledge outline from YAML file and read all items.
+try:
+    outline = CrateDbKnowledgeOutline.load()
+    DOCUMENTATION_INDEX = outline.find_items(as_dict=True)
+    # Validate that we have loaded at least some items
+    if not DOCUMENTATION_INDEX:
+        raise ValueError("No documentation items found in the knowledge outline")
+except Exception as e:
+    import logging
+    logging.error(f"Failed to load knowledge outline: {e}")
+    # Fallback to an empty list or minimal essential documentation
+    DOCUMENTATION_INDEX = []

Can you verify the functionality works by running the following script?


🏁 Script executed:

#!/bin/bash
# Verify the CrateDbKnowledgeOutline loading works correctly

# Check if the cratedb-about package is installed
pip show cratedb-about

# Create a simple test script to load and print the outline
cat > test_outline_loading.py << 'EOF'
from cratedb_about import CrateDbKnowledgeOutline
try:
    outline = CrateDbKnowledgeOutline.load()
    items = outline.find_items(as_dict=True)
    print(f"Successfully loaded {len(items)} documentation items")
    if items:
        print("\nFirst item:")
        for key, value in items[0].items():
            print(f"  {key}: {value}")
except Exception as e:
    print(f"Error loading outline: {e}")
EOF

# Run the test script
python test_outline_loading.py

Length of output: 323


Add error handling for knowledge outline loading

Our sandbox couldn’t import cratedb_about (ModuleNotFoundError), so we couldn’t verify that CrateDbKnowledgeOutline.load() succeeds. To make this more robust in your environment—where cratedb_about is installed—wrap the load/find process in a try/except and fall back safely:

File: cratedb_mcp/knowledge.py (around lines 109–111)

- # Load knowledge outline from YAML file and read all items.
- outline = CrateDbKnowledgeOutline.load()
- DOCUMENTATION_INDEX = outline.find_items(as_dict=True)
+ # Load knowledge outline from YAML file and read all items.
+ try:
+     outline = CrateDbKnowledgeOutline.load()
+     DOCUMENTATION_INDEX = outline.find_items(as_dict=True)
+     if not DOCUMENTATION_INDEX:
+         raise ValueError("No documentation items found in the knowledge outline")
+ except Exception as e:
+     import logging
+     logging.error(f"Failed to load knowledge outline: {e}")
+     # Fallback to an empty list or minimal essential documentation
+     DOCUMENTATION_INDEX = []

Please verify in your local setup that:

  • cratedb_about is installed and importable
  • CrateDbKnowledgeOutline.load() returns the expected items

@amotl amotl force-pushed the use-about-package branch from 87958f1 to d9b33ff Compare May 10, 2025 08:53
@amotl amotl requested review from kneth and surister May 10, 2025 09:25
@amotl amotl marked this pull request as ready for review May 10, 2025 09:26
@amotl amotl force-pushed the use-about-package branch from afb9e4d to b93c3a8 Compare May 10, 2025 12:51
@amotl amotl changed the title MCP documentation: Use resource index from cratedb-about MCP docs: Use resource index from cratedb-about May 10, 2025
@surister
Copy link
Member

Will test later

Base automatically changed from boilerplate to main May 10, 2025 17:39
@amotl amotl force-pushed the use-about-package branch from b93c3a8 to 910f59f Compare May 10, 2025 17:57
@amotl amotl force-pushed the use-about-package branch from 910f59f to b936ba2 Compare May 10, 2025 17:59
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
cratedb_mcp/knowledge.py (1)

3-3: 🛠️ Refactor suggestion

Add error handling for the imported dependency (unaddressed).

The code imports CrateDbKnowledgeOutline from cratedb-about but still doesn't include error handling if the package is missing. This was previously suggested but hasn't been implemented.

-from cratedb_about import CrateDbKnowledgeOutline
+try:
+    from cratedb_about import CrateDbKnowledgeOutline
+except ImportError:
+    raise ImportError(
+        "The 'cratedb-about' package is required. "
+        "Please install it using 'pip install cratedb-about'."
+    )
🧹 Nitpick comments (2)
cratedb_mcp/knowledge.py (1)

114-118: Enhance URL validation function with docstring and more specific patterns.

The function could benefit from a docstring explaining its purpose, and the URL validation could be more specific to prevent potential security issues.

 def documentation_url_permitted(url: str) -> bool:
+    """
+    Validate if a documentation URL is from a permitted domain.
+    
+    Only URLs from CrateDB domains and specific GitHub repositories are allowed.
+    
+    Args:
+        url: The URL to validate
+        
+    Returns:
+        bool: True if the URL is from a permitted domain, False otherwise
+    """
     return (
-            url.startswith("https://cratedb.com/") or
-            url.startswith("https://github.com/crate") or
-            url.startswith("https://raw.githubusercontent.com/crate"))
+            url.startswith("https://cratedb.com/") or
+            url.startswith("https://github.com/crate/") or
+            url.startswith("https://raw.githubusercontent.com/crate/"))
tests/test_knowledge.py (1)

5-10: Test is now more robust against data changes.

The test has been improved to check for specific titles rather than relying on specific indices, making it more resilient to changes in the source data. This addresses the previous review comment.

However, the test might be even more robust with additional assertions:

 def test_documentation_index():
     titles = [item["title"] for item in DOCUMENTATION_INDEX]
     assert len(titles) >= 50
+    # Verify that all entries have required fields
+    for item in DOCUMENTATION_INDEX:
+        assert "title" in item
+        assert "link" in item
+        assert "description" in item
     assert "CrateDB database" in titles
     assert "CrateDB features" in titles
     assert "CrateDB SQL reference: Scalar functions" in titles
     assert "Guide: CrateDB query optimization" in titles
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b93c3a8 and 18791b8.

📒 Files selected for processing (7)
  • .github/workflows/release-pypi.yml (1 hunks)
  • cratedb_mcp/__main__.py (3 hunks)
  • cratedb_mcp/knowledge.py (2 hunks)
  • cratedb_mcp/settings.py (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/test_knowledge.py (1 hunks)
  • tests/test_mcp.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • .github/workflows/release-pypi.yml
  • cratedb_mcp/settings.py
  • pyproject.toml
  • cratedb_mcp/main.py
  • tests/test_mcp.py
🔇 Additional comments (1)
cratedb_mcp/knowledge.py (1)

89-107: Great documentation on knowledge structure and importance.

The added docstring provides clear explanations about the documentation sections and highlights the importance of the description field for guiding the LLM. The examples also clearly illustrate the expected format.

Comment on lines 109 to 111
# Load knowledge outline from YAML file and read all items.
outline = CrateDbKnowledgeOutline.load()
DOCUMENTATION_INDEX = outline.find_items().to_dict()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add error handling for outline loading.

The code loads the knowledge outline without any error handling. If the outline fails to load (due to network issues, file corruption, etc.), the application would crash. Consider adding try-except blocks to handle potential errors gracefully.

-# Load knowledge outline from YAML file and read all items.
-outline = CrateDbKnowledgeOutline.load()
-DOCUMENTATION_INDEX = outline.find_items().to_dict()
+# Load knowledge outline from YAML file and read all items.
+try:
+    outline = CrateDbKnowledgeOutline.load()
+    DOCUMENTATION_INDEX = outline.find_items().to_dict()
+except Exception as e:
+    # Log the error or provide fallback
+    import logging
+    logging.error(f"Failed to load knowledge outline: {e}")
+    # Fallback to empty index or minimal set
+    DOCUMENTATION_INDEX = []
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Load knowledge outline from YAML file and read all items.
outline = CrateDbKnowledgeOutline.load()
DOCUMENTATION_INDEX = outline.find_items().to_dict()
# Load knowledge outline from YAML file and read all items.
try:
outline = CrateDbKnowledgeOutline.load()
DOCUMENTATION_INDEX = outline.find_items().to_dict()
except Exception as e:
# Log the error or provide fallback
import logging
logging.error(f"Failed to load knowledge outline: {e}")
# Fallback to empty index or minimal set
DOCUMENTATION_INDEX = []
🤖 Prompt for AI Agents
In cratedb_mcp/knowledge.py around lines 109 to 111, the code loads the
knowledge outline without error handling, which can cause crashes if loading
fails. Wrap the outline loading and item finding code in a try-except block to
catch exceptions, log an appropriate error message, and handle the failure
gracefully, such as by returning an empty dictionary or a default value for
DOCUMENTATION_INDEX.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
cratedb_mcp/knowledge.py (2)

3-3: 🛠️ Refactor suggestion

Add error handling for the imported dependency.

The import for CrateDbKnowledgeOutline from cratedb_about doesn't include error handling. If the package is missing, the application will crash at startup.

-from cratedb_about import CrateDbKnowledgeOutline
+try:
+    from cratedb_about import CrateDbKnowledgeOutline
+except ImportError:
+    raise ImportError(
+        "The 'cratedb-about' package is required. "
+        "Please install it using 'pip install cratedb-about'."
+    )

111-112: 🛠️ Refactor suggestion

Add error handling for outline loading.

The knowledge outline is loaded without any error handling. If loading fails (due to network issues, file corruption, etc.), the application would crash.

def __init__(self):
-    self.outline = CrateDbKnowledgeOutline.load()
+    try:
+        self.outline = CrateDbKnowledgeOutline.load()
+    except Exception as e:
+        import logging
+        logging.error(f"Failed to load knowledge outline: {e}")
+        # Provide a minimal fallback
+        self.outline = None

Also update the items method to handle the fallback:

@cachetools.cached(cache={})
def items(self):
+    if self.outline is None:
+        return []
    return self.outline.find_items().to_dict()
🧹 Nitpick comments (4)
cratedb_mcp/knowledge.py (2)

119-123: URL validation is improved but could be more robust.

The current implementation checks if URLs start with specific prefixes, which is better than the previous implementation but could still allow malicious URLs with carefully crafted subdomains.

Consider using urllib.parse.urlparse for more robust URL validation:

def documentation_url_permitted(url: str) -> bool:
-    return (
-            url.startswith("https://cratedb.com/") or
-            url.startswith("https://github.com/crate") or
-            url.startswith("https://raw.githubusercontent.com/crate"))
+    from urllib.parse import urlparse
+    parsed_url = urlparse(url)
+    
+    # Ensure HTTPS scheme
+    if parsed_url.scheme != "https":
+        return False
+        
+    # Check domain and path combinations
+    if parsed_url.netloc == "cratedb.com":
+        return True
+    elif parsed_url.netloc == "github.com" and parsed_url.path.startswith("/crate"):
+        return True
+    elif parsed_url.netloc == "raw.githubusercontent.com" and parsed_url.path.startswith("/crate"):
+        return True
+        
+    return False

114-116: Consider configuring the cache size or expiration.

The @cachetools.cached decorator is used with an empty dictionary as the cache, which has no size limit. This could potentially lead to memory issues if the outline contains a very large number of items or if it changes frequently.

-@cachetools.cached(cache={})
+@cachetools.cached(cache=cachetools.TTLCache(maxsize=1, ttl=3600))  # Cache for 1 hour

This ensures that the cache will be refreshed periodically, preventing stale data and limiting memory usage.

cratedb_mcp/__main__.py (1)

14-15: Add docstring to explain DocumentationIndex initialization.

The comment doesn't clearly explain what happens during documentation index loading or potential failure scenarios. Consider adding more details.

-# Load CrateDB documentation outline.
-documentation_index = DocumentationIndex()
+# Load CrateDB documentation outline from cratedb-about package.
+# This will fetch and parse the YAML file containing documentation metadata.
+# Note: If loading fails, the application will use a fallback empty index.
+documentation_index = DocumentationIndex()
tests/test_knowledge.py (1)

7-7: Consider future-proofing the minimum count assertion.

While increasing the minimum expected count from 35 to 50 reflects the current state, consider whether this might still be brittle if the external source changes significantly. If the primary concern is ensuring sufficient documentation, you might consider a more modest minimum threshold or supplement with additional checks for critical documentation sections.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 18791b8 and 35727ca.

📒 Files selected for processing (6)
  • cratedb_mcp/__main__.py (2 hunks)
  • cratedb_mcp/knowledge.py (2 hunks)
  • cratedb_mcp/settings.py (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/test_knowledge.py (1 hunks)
  • tests/test_mcp.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • cratedb_mcp/settings.py
  • tests/test_mcp.py
  • pyproject.toml
🧰 Additional context used
🧬 Code Graph Analysis (1)
cratedb_mcp/__main__.py (1)
cratedb_mcp/knowledge.py (4)
  • DocumentationIndex (89-116)
  • Queries (6-86)
  • documentation_url_permitted (119-123)
  • items (115-116)
🔇 Additional comments (2)
tests/test_knowledge.py (2)

1-1: Import statement properly updated.

The import now includes DocumentationIndex from cratedb_mcp.knowledge, which is necessary for the updated test function and aligns with the PR objective of transitioning to a dynamic documentation index.


5-11: Good implementation of resilient testing approach.

The test has been appropriately updated to use the new DocumentationIndex class rather than a static constant, aligning with the PR's objective of dynamically loading documentation from an external YAML source. The approach of checking for specific titles rather than indices makes the test resilient to changes in the underlying data structure.

This implementation successfully addresses the previous feedback about making tests more robust against changes in the source data.

"""Fetches a CrateDB documentation link."""
if not documentation_url_permitted(link):
raise ValueError(f'Link is not permitted: {link}')
return client.get(link, timeout=HTTP_TIMEOUT).text
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add error handling for HTTP requests.

The HTTP GET request doesn't include error handling for timeouts or connection failures, which could cause the application to crash if the documentation source is unavailable.

-    return client.get(link, timeout=HTTP_TIMEOUT).text
+    try:
+        response = client.get(link, timeout=HTTP_TIMEOUT)
+        response.raise_for_status()  # Raise an exception for 4XX/5XX responses
+        return response.text
+    except httpx.TimeoutException:
+        raise ValueError(f"Request timed out after {HTTP_TIMEOUT} seconds: {link}")
+    except httpx.HTTPStatusError as e:
+        raise ValueError(f"HTTP error {e.response.status_code} while fetching: {link}")
+    except httpx.RequestError as e:
+        raise ValueError(f"Request error while fetching: {link} - {str(e)}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
return client.get(link, timeout=HTTP_TIMEOUT).text
try:
response = client.get(link, timeout=HTTP_TIMEOUT)
response.raise_for_status() # Raise an exception for 4XX/5XX responses
return response.text
except httpx.TimeoutException:
raise ValueError(f"Request timed out after {HTTP_TIMEOUT} seconds: {link}")
except httpx.HTTPStatusError as e:
raise ValueError(f"HTTP error {e.response.status_code} while fetching: {link}")
except httpx.RequestError as e:
raise ValueError(f"Request error while fetching: {link} - {str(e)}")
🤖 Prompt for AI Agents
In cratedb_mcp/__main__.py at line 43, the HTTP GET request lacks error handling
for timeouts and connection failures. Wrap the client.get call in a try-except
block to catch exceptions like requests.Timeout and requests.ConnectionError,
then handle these exceptions gracefully by logging an error or returning a
fallback value to prevent the application from crashing.

' before answering questions. Has documentation title, description, and link.')
def get_cratedb_documentation_index():
return DOCUMENTATION_INDEX
return documentation_index.items()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add error handling when accessing documentation items.

If the outline couldn't be loaded or if there's an error during access, the application would crash. Consider adding error handling to gracefully handle potential issues.

def get_cratedb_documentation_index():
-    return documentation_index.items()
+    try:
+        return documentation_index.items()
+    except Exception as e:
+        import logging
+        logging.error(f"Failed to access documentation index: {e}")
+        # Return a minimal fallback
+        return []
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
return documentation_index.items()
def get_cratedb_documentation_index():
try:
return documentation_index.items()
except Exception as e:
import logging
logging.error(f"Failed to access documentation index: {e}")
# Return a minimal fallback
return []
🤖 Prompt for AI Agents
In cratedb_mcp/__main__.py at line 35, the return statement directly accesses
documentation_index.items() without error handling, which can cause the
application to crash if the outline fails to load or an error occurs. Wrap the
access to documentation_index.items() in a try-except block to catch potential
exceptions, and handle them gracefully by logging the error and returning an
empty list or a suitable fallback value.


def query_cratedb(query: str) -> list[dict]:
return httpx.post(f'{HTTP_URL}/_sql', json={'stmt': query}).json()
return httpx.post(f'{HTTP_URL}/_sql', json={'stmt': query}, timeout=HTTP_TIMEOUT).json()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add error handling for the database query.

Similar to the documentation fetching, the database query doesn't include error handling for timeouts or connection failures.

def query_cratedb(query: str) -> list[dict]:
-    return httpx.post(f'{HTTP_URL}/_sql', json={'stmt': query}, timeout=HTTP_TIMEOUT).json()
+    try:
+        response = httpx.post(f'{HTTP_URL}/_sql', json={'stmt': query}, timeout=HTTP_TIMEOUT)
+        response.raise_for_status()  # Raise an exception for 4XX/5XX responses
+        return response.json()
+    except httpx.TimeoutException:
+        raise ValueError(f"Database query timed out after {HTTP_TIMEOUT} seconds")
+    except httpx.HTTPStatusError as e:
+        raise ValueError(f"HTTP error {e.response.status_code} while querying database")
+    except httpx.RequestError as e:
+        raise ValueError(f"Request error while querying database: {str(e)}")
+    except ValueError as e:
+        raise ValueError(f"Invalid JSON response from database: {str(e)}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
return httpx.post(f'{HTTP_URL}/_sql', json={'stmt': query}, timeout=HTTP_TIMEOUT).json()
def query_cratedb(query: str) -> list[dict]:
try:
response = httpx.post(f'{HTTP_URL}/_sql', json={'stmt': query}, timeout=HTTP_TIMEOUT)
response.raise_for_status() # Raise an exception for 4XX/5XX responses
return response.json()
except httpx.TimeoutException:
raise ValueError(f"Database query timed out after {HTTP_TIMEOUT} seconds")
except httpx.HTTPStatusError as e:
raise ValueError(f"HTTP error {e.response.status_code} while querying database")
except httpx.RequestError as e:
raise ValueError(f"Request error while querying database: {str(e)}")
except ValueError as e:
raise ValueError(f"Invalid JSON response from database: {str(e)}")
🤖 Prompt for AI Agents
In cratedb_mcp/__main__.py at line 22, the database query using httpx.post lacks
error handling for timeouts and connection failures. Wrap the httpx.post call in
a try-except block to catch exceptions like httpx.TimeoutException and
httpx.RequestError, and handle them appropriately, such as logging the error and
returning a fallback response or raising a custom exception.

cratedb-about [1,2] includes a knowledge outline file in YAML format,
`cratedb-outline.yaml`, and also provides a Python API to read and
query it.

[1] https://pypi.org/project/cratedb-about
[2] https://github.com/crate/about
@amotl amotl force-pushed the use-about-package branch from 35727ca to 06980f5 Compare May 16, 2025 11:13
@amotl amotl merged commit 418af7a into main May 16, 2025
6 checks passed
@amotl amotl deleted the use-about-package branch May 16, 2025 11:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants