-
Notifications
You must be signed in to change notification settings - Fork 0
Prompt: Add instructions about working with CrateDB #52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughA new instructional system for CrateDB has been introduced, including a Markdown guide, corresponding Python module for loading and accessing instructions, updates to documentation files, and a test to verify content delivery. The changes provide structured guidance and principles for working with CrateDB, both for human users and as prompt material for language models. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant InstructionsModule as cratedb_about.instruction
participant Resource as cratedb-instructions.md
User->>InstructionsModule: Import GeneralInstructions
User->>InstructionsModule: Instantiate GeneralInstructions
User->>InstructionsModule: Call render()
InstructionsModule->>Resource: Load cratedb-instructions.md content
InstructionsModule-->>User: Return full instructions text
Estimated code review effort2 (~20 minutes) Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (3)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/cratedb_about/instruction/__init__.py (1)
9-19: Consider adding error handling for resource loading.While the current implementation is clean, consider adding error handling around the file reading operation to provide more informative error messages if the markdown file is missing or corrupted.
+import importlib.resources +from pathlib import Path + +try: + instructions_file = ( + importlib.resources.files("cratedb_about.instruction") / "cratedb-instructions.md" + ) + instructions_text = instructions_file.read_text() +except (FileNotFoundError, ImportError) as e: + raise RuntimeError(f"Failed to load CrateDB instructions: {e}") from e
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
CHANGES.md(1 hunks)README.md(2 hunks)src/cratedb_about/instruction/__init__.py(1 hunks)src/cratedb_about/instruction/cratedb-instructions.md(1 hunks)tests/test_instructions.py(1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: amotl
PR: crate/about#32
File: src/cratedb_about/outline/cratedb-outline.yaml:321-329
Timestamp: 2025-05-15T21:25:54.870Z
Learning: In the CrateDB outline YAML, content organization prioritizes thematic grouping (keeping related topics together) over content type grouping (separating tutorials from reference docs), as demonstrated by placing the multi-tenancy tutorial alongside user management and privileges documentation in the API section.
Learnt from: amotl
PR: crate/about#0
File: :0-0
Timestamp: 2025-04-16T14:20:35.508Z
Learning: When creating content for an `llms.txt` file (following the llmstxt.org specification), consistent and straightforward language takes precedence over stylistic variation since the primary audience is language models rather than human readers.
Learnt from: amotl
PR: crate/about#0
File: :0-0
Timestamp: 2025-04-16T14:20:35.508Z
Learning: When creating content for an `llms.txt` file (following the llmstxt.org specification), consistent and straightforward language takes precedence over stylistic variation since the primary audience is language models rather than human readers.
Learnt from: amotl
PR: crate/about#0
File: :0-0
Timestamp: 2025-04-16T14:16:33.171Z
Learning: When creating content for an `llms.txt` file (following the llmstxt.org specification), consistent and straightforward language takes precedence over stylistic variation since the primary audience is language models rather than human readers.
CHANGES.md (4)
Learnt from: amotl
PR: crate/about#0
File: :0-0
Timestamp: 2025-04-16T14:20:35.508Z
Learning: When creating content for an `llms.txt` file (following the llmstxt.org specification), consistent and straightforward language takes precedence over stylistic variation since the primary audience is language models rather than human readers.
Learnt from: amotl
PR: crate/about#0
File: :0-0
Timestamp: 2025-04-16T14:16:33.171Z
Learning: When creating content for an `llms.txt` file (following the llmstxt.org specification), consistent and straightforward language takes precedence over stylistic variation since the primary audience is language models rather than human readers.
Learnt from: amotl
PR: crate/about#0
File: :0-0
Timestamp: 2025-04-16T14:20:35.508Z
Learning: When creating content for an `llms.txt` file (following the llmstxt.org specification), consistent and straightforward language takes precedence over stylistic variation since the primary audience is language models rather than human readers.
Learnt from: amotl
PR: crate/about#32
File: src/cratedb_about/outline/cratedb-outline.yaml:321-329
Timestamp: 2025-05-15T21:25:54.870Z
Learning: In the CrateDB outline YAML, content organization prioritizes thematic grouping (keeping related topics together) over content type grouping (separating tutorials from reference docs), as demonstrated by placing the multi-tenancy tutorial alongside user management and privileges documentation in the API section.
README.md (6)
Learnt from: amotl
PR: crate/about#0
File: :0-0
Timestamp: 2025-04-16T14:20:35.508Z
Learning: When creating content for an `llms.txt` file (following the llmstxt.org specification), consistent and straightforward language takes precedence over stylistic variation since the primary audience is language models rather than human readers.
Learnt from: amotl
PR: crate/about#0
File: :0-0
Timestamp: 2025-04-16T14:16:33.171Z
Learning: When creating content for an `llms.txt` file (following the llmstxt.org specification), consistent and straightforward language takes precedence over stylistic variation since the primary audience is language models rather than human readers.
Learnt from: amotl
PR: crate/about#0
File: :0-0
Timestamp: 2025-04-16T14:20:35.508Z
Learning: When creating content for an `llms.txt` file (following the llmstxt.org specification), consistent and straightforward language takes precedence over stylistic variation since the primary audience is language models rather than human readers.
Learnt from: amotl
PR: crate/about#32
File: src/cratedb_about/outline/cratedb-outline.yaml:321-329
Timestamp: 2025-05-15T21:25:54.870Z
Learning: In the CrateDB outline YAML, content organization prioritizes thematic grouping (keeping related topics together) over content type grouping (separating tutorials from reference docs), as demonstrated by placing the multi-tenancy tutorial alongside user management and privileges documentation in the API section.
Learnt from: amotl
PR: crate/about#29
File: src/cratedb_about/query/model.py:76-76
Timestamp: 2025-05-15T11:27:22.793Z
Learning: In the cratedb_about package, the `Settings.llms_txt_payload` classproperty in `src/cratedb_about/query/model.py` handles loading context files from either a local filesystem path or an HTTP URL based on the `ABOUT_CONTEXT_URL` environment variable.
Learnt from: amotl
PR: crate/about#29
File: src/cratedb_about/query/model.py:75-76
Timestamp: 2025-05-15T11:27:34.535Z
Learning: For the `cratedb_about` repository, the maintainer prefers generic exception handling rather than catching specific exception types.
src/cratedb_about/instruction/__init__.py (2)
Learnt from: amotl
PR: crate/about#29
File: src/cratedb_about/query/model.py:76-76
Timestamp: 2025-05-15T11:27:22.793Z
Learning: In the cratedb_about package, the `Settings.llms_txt_payload` classproperty in `src/cratedb_about/query/model.py` handles loading context files from either a local filesystem path or an HTTP URL based on the `ABOUT_CONTEXT_URL` environment variable.
Learnt from: amotl
PR: crate/about#32
File: src/cratedb_about/outline/cratedb-outline.yaml:321-329
Timestamp: 2025-05-15T21:25:54.870Z
Learning: In the CrateDB outline YAML, content organization prioritizes thematic grouping (keeping related topics together) over content type grouping (separating tutorials from reference docs), as demonstrated by placing the multi-tenancy tutorial alongside user management and privileges documentation in the API section.
src/cratedb_about/instruction/cratedb-instructions.md (1)
Learnt from: amotl
PR: crate/about#32
File: src/cratedb_about/outline/cratedb-outline.yaml:321-329
Timestamp: 2025-05-15T21:25:54.870Z
Learning: In the CrateDB outline YAML, content organization prioritizes thematic grouping (keeping related topics together) over content type grouping (separating tutorials from reference docs), as demonstrated by placing the multi-tenancy tutorial alongside user management and privileges documentation in the API section.
🧬 Code Graph Analysis (1)
tests/test_instructions.py (1)
src/cratedb_about/instruction/__init__.py (2)
Instructions(9-19)full(18-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Python 3.9
🔇 Additional comments (9)
src/cratedb_about/instruction/__init__.py (1)
1-6: Well-structured resource loading implementation.The use of
importlib.resourcesis the correct modern approach for accessing package data files. Loading the content at module level is efficient since it's read once when imported.CHANGES.md (1)
4-5: Excellent changelog entry.The entry clearly describes the new feature and properly credits contributors. The formatting and style are consistent with existing entries.
README.md (2)
63-66: Well-integrated documentation update.The description clearly explains the dual purpose of the instructions file for both human users and LLM contexts. The content fits naturally into the existing documentation structure.
233-233: Proper reference link addition.The hyperlink reference is correctly added to the reference section, maintaining consistency with the existing documentation style.
tests/test_instructions.py (1)
4-8: Effective test implementation.The test properly verifies that the
Instructions.full()method returns content with the expected key sections. The assertions are meaningful and check for the core instructional components.src/cratedb_about/instruction/cratedb-instructions.md (4)
1-19: Excellent foundational content.The introduction and key concepts section provides a comprehensive overview of CrateDB's architecture and deployment options. The content is well-structured and technically accurate, effectively serving both human users and LLM contexts.
20-31: Clear operational guidelines.The impersonation and SQL rules sections provide specific, actionable guidance. The SQL rules particularly address common pitfalls and CrateDB-specific functions, which will be valuable for generating accurate queries.
32-44: Strong technical guidelines.The key guidelines section effectively differentiates CrateDB from similar systems (Elasticsearch, PostgreSQL) while emphasizing the importance of consulting CrateDB-specific documentation. This will help prevent common misconceptions.
45-70: Comprehensive writing standards.The core writing principles section establishes clear standards for language, organization, and user-centered approach. This aligns well with the retrieved learnings about consistent language for LLM contexts while maintaining readability for human users.
... to be used for LLM system prompts.
The general instructions provided here can be easily used in a generic way, for example by using Simon Willison's handsome Prerequisitesexport OPENAI_API_KEY=sk-XJZ7ofog5GpIT--INVALID--bkFJ0CJ5lyAKSefZKdV1Y3S0
uv pip install llmAsk something specific, but without any contextllm --model gpt-4.1 "How do I work with date intervals?"Ask the same, encouraging CrateDBinstructions="https://raw.githubusercontent.com/crate/about/c3421748df4aa65703a292fdfac7d81dc8f9df24/src/cratedb_about/instruction/cratedb-instructions.md"
llm --model gpt-4.1 --fragment $instructions "How do I work with date intervals?"You'd need to evaluate the outcome yourself, including its correctness. 🙏 Details
To work with date intervals in CrateDB, use the standard SQL How to Use Date Intervals in CrateDB1. Add or Subtract Intervals to TimestampsTo advance or rewind a timestamp value, use the SELECT
CURRENT_TIMESTAMP AS now,
CURRENT_TIMESTAMP - INTERVAL '7 days' AS seven_days_ago,
CURRENT_TIMESTAMP + INTERVAL '2 hours' AS in_two_hours
;Expected Outcome: 2. Filter Data Using Date IntervalsTo filter rows within a specific time range, compare a timestamp against an interval expression: SELECT *
FROM your_table
WHERE timestamp_column >= CURRENT_TIMESTAMP - INTERVAL '1 day'
;Expected Outcome: 3. Truncate Timestamps Using
|
|
In this way, you can enrich the LLM prompt using both the llms.txt file to enrich the context, and the new instructions file to enjoy the best of both worlds. # URL to llms.txt file (> 1 MB).
context=https://cdn.crate.io/about/v1/llms-full.txt
# URL to instructions file (~ 4 kB).
instructions="https://raw.githubusercontent.com/crate/about/instructions/src/cratedb_about/instruction/cratedb-instructions.md"export OPENAI_API_KEY=sk-XJZ7ofog5GpIT--INVALID--bkFJ0CJ5lyAKSefZKdV1Y3S0
uvx llm --model gpt-4.1 --fragment=$context --system-fragment=$instructions "How do I use pandas with CrateDB?"Note You will need to use gpt-4.1 or a similar model that accepts large context windows. The Note For improved LLM-based documentation inquiry, burning less tokens because of higher selectiveness, please use the CrateDB MCP Server. |
| - When using date intervals, always include both the quantity and the unit in a string, e.g. INTERVAL '7 days'. | ||
| - Don't use DATE_SUB, it does not exist in CrateDB. Use DATE_TRUNC instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two were an attempt to steer AWS Nova in the right direction, as it regularly came up with wrong syntax. But it didn't have much effect. It shouldn't be needed if we can make the LLM consume the documentation. Maybe those very explicit instructions aren't even needed anymore with the overall improved system prompt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Maybe Nova was not able to understand the instructions right? I think it is good to have them, every bit counts. Coding with LLMs in the summer of 2025 emphasizes it:
Provide large context
When your goal is to reason with an LLM about implementing or fixing some code, you need to provide extensive information to the LLM [...]
About other details...
It shouldn't be needed if we can make the LLM consume the documentation.
The problem is that we can't make the LLM consume all the documentation at once without incurring higher costs, and selective MCP docs serving also provides ambiguities. In the spirit of Python and others, explicit is always better than implicit. 1
Maybe those very explicit instructions aren't even needed anymore with the overall improved system prompt?
Right, those are the very explicit instructions that will resemble the system prompt. Let's curate them together to include all relevant hands-on details, compiled into a single handy document in Markdown format in the proposed layout, style and jargon, derived from other's work. You can imagine it as THE CrateDB cheatsheet you've always dreamed of. It looks like contemporary LLMs will understand it fluently:
When dealing with specific technologies that are not so widespread / obvious, it is often a good idea to also add the documentation in the context window. For example when writing tests for vector sets, a Redis data type so new that LLMs don’t yet know about, I add the README file in the context: with such trivial trick, the LLM can use vector sets at expert level immediately.
Footnotes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DATE_SUB is a MySQL function. I was just picturing what prompt we end up with if we add an explicit rule whenever the LLM hallucinates and randomly picks functions from other databases.
- To retrieve the latest value for a column, use CrateDB's
MAX_BYfunction.
In contrast, this reads like a valid instruction. It's agnostic of any other database, and it provides constructive advice.
- Don't use DATE_SUB, it does not exist in CrateDB. Use DATE_TRUNC instead.
That, however, is very particular for the eventuality that the LLM draws wrong conclusions, and ends up wanting to query CrateDB with DATE_SUB. To phrase it in a positive way, maybe this should rather be something like:
- To filter for a particular date range, apply
DATE_TRUNCon the timestamp column and use it in the query's WHERE clause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is absolutely way to go, giving advises in a concise, positive and proactive way. Thanks!
Because we also added the mintlify tech-writing instructions to the general instructions of the cratedb-about package, per "Core writing principles" section, it absolutely makes sense to add advises in the same writing style, always preferring active voice.
about/src/cratedb_about/instruction/cratedb-instructions.md
Lines 45 to 69 in c342174
| ## Core writing principles | |
| ### Language and style requirements | |
| - Use clear, direct language appropriate for technical audiences | |
| - Write in second person ("you") for instructions and procedures | |
| - Use active voice over passive voice | |
| - Employ present tense for current states, future tense for outcomes | |
| - Maintain consistent terminology throughout all documentation | |
| - Keep sentences concise while providing necessary context | |
| - Use parallel structure in lists, headings, and procedures | |
| ### Content organization standards | |
| - Lead with the most important information (inverted pyramid structure) | |
| - Use progressive disclosure: basic concepts before advanced ones | |
| - Break complex procedures into numbered steps | |
| - Include prerequisites and context before instructions | |
| - Provide expected outcomes for each major step | |
| - End sections with next steps or related information | |
| - Use descriptive, keyword-rich headings for excellent guidance | |
| ### User-centered approach | |
| - Focus on user goals and outcomes rather than system features | |
| - Anticipate common questions and address them proactively | |
| - Include troubleshooting for likely failure points | |
| - Provide multiple pathways when appropriate (beginner vs advanced), but offer an opinionated path for people to follow to avoid overwhelming with options |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it's on the other repository, this ticket collects our previous steps in this regard, and also includes pointers to excellent work of others in this area.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added 1eb7216 to improve this particular rule / instruction / convention. I've kept the advise to NOT use DATE_SUB, but relocated it after the positive advise what to do instead, like you proposed.
[...] very particular for the eventuality that the LLM draws wrong conclusions [...]
I think we should not beat around the bush and treat those "negative" cases equally well and proactively add advises to protect against the most likely corresponding pitfalls. If any hiccups have been observed already, it is clearly a signal to help out with a small rule -- every bit counts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added the gist of our discussion to the other ticket, so it will not get lost in a PR. Thanks again!
| - Get started with CrateDB Cloud at `https://console.cratedb.cloud`. | ||
| - CrateDB also provides an option to run it on your premises, optimally by using its Docker/OCI image `docker.io/crate`. Nightly images are available per `docker.io/crate/crate:nightly`. | ||
|
|
||
| ## Impersonation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds very much LLM-related. In the sense of separation of concerns, is this repository the right place for LLM instructions? I might not be fully aware of the exact scope of this repository, but it feels to me that this is rather something that should go into cratedb-mcp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it is absolutely LLM related. In this spirit, because the cratedb-about package provides elements for relevant procedures, it is the informational backbone for cratedb-mcp, see also what's inside.
The documentation subsystem of the cratedb-mcp package uses the Python API of
cratedb-aboutto serve and consider relevant documentation resources within its data flow procedures. It selects relevant resources mostly based on the value of the description attribute of the outline data model.
The ingredients of cratedb-about can be used in a standalone way with LLMs easily, with no MCP in plain sight.
About
Instructions about working with CrateDB have been missing dearly. They can be used by humans as a cheat sheet, or to improve prompts for LLMs and similar technologies. Thanks for contributing a few, @hammerhead and @WalBeh.
Details
I've stacked different kinds of guidelines on top of each other. Future iterations might remix and bring them into shape further. Please add any of your thoughts for improvements to the review comments.
What's inside
References
Backlog