Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGES.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
# About CrateDB changelog

## Unreleased
- Prompt: Added instructions about working with CrateDB to be used for
LLM system prompts. Thanks, @hammerhead and @WalBeh.

## v0.0.5 - 2025-05-19
- Bundle: Added outline in Markdown format, which got lost previously
Expand Down
5 changes: 5 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,10 @@ nothing big.
- The outline file [cratedb-outline.yaml] file indexes documents about
what CrateDB is, what you can do with it, and how.

- The Markdown file [cratedb-instructions.md] includes instructions and
directives about how to use CrateDB. They can be used by humans as a
cheat sheet, or to improve prompts for LLMs and similar technologies.

- Context bundle files are published to the [about/v1] folder.
They can be used to provide better context for conversations about
CrateDB, for example, by using the `cratedb-about ask` subcommand.
Expand Down Expand Up @@ -226,6 +230,7 @@ recommended, especially if you use it as a library.
[about/v1]: https://cdn.crate.io/about/v1/
[CrateDB]: https://cratedb.com/database
[cratedb-about]: https://pypi.org/project/cratedb-about/
[cratedb-instructions.md]: https://github.com/crate/about/blob/main/src/cratedb_about/instruction/cratedb-instructions.md
[cratedb-mcp]: https://github.com/crate/cratedb-mcp
[cratedb-outline.yaml]: https://github.com/crate/about/blob/main/src/cratedb_about/outline/cratedb-outline.yaml
[filesystem-spec]: https://filesystem-spec.readthedocs.io/
Expand Down
21 changes: 21 additions & 0 deletions src/cratedb_about/instruction/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
import importlib.resources


class GeneralInstructions:
"""
Bundle a few general instructions about how to work with CrateDB.

- Things to remember when working with CrateDB: https://github.com/crate/about/blob/main/src/cratedb_about/outline/cratedb-outline.yaml#L27-L40
- Impersonation, Rules for writing SQL queries: https://github.com/crate/cratedb-examples/blob/7f1bc0f94/topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb?short_path=00988ad#L777-L794
- Key guidelines: Thanks, @WalBeh.
- Core writing principles: https://github.com/jlowin/fastmcp/blob/main/docs/.cursor/rules/mintlify.mdc#L10-L34. Thanks, @jlowin.
""" # noqa: E501

def __init__(self):
instructions_file = (
importlib.resources.files("cratedb_about.instruction") / "cratedb-instructions.md"
)
self.instructions_text = instructions_file.read_text()

def render(self) -> str:
return self.instructions_text
69 changes: 69 additions & 0 deletions src/cratedb_about/instruction/cratedb-instructions.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
## Introduction

CrateDB is a distributed and scalable SQL database for storing and analyzing massive
amounts of data in near real-time, even with complex queries. It is based on Lucene,
inherits technologies from Elasticsearch, and is compatible with PostgreSQL.

## Things to remember when working with CrateDB

- CrateDB is a distributed database written in Java, where individual nodes form a database cluster, using a shared-nothing architecture.
- CrateDB brings together fundamental components to manage big data after the Hadoop and Spark batch-processing era, more like Teradata, BigQuery and Snowflake are doing it.
- Clients can connect to CrateDB using HTTP or the PostgreSQL wire protocol.
- The default TCP ports of CrateDB are 4200 for the HTTP interface and 5432 for the PostgreSQL interface.
- The language of choice after connecting to CrateDB is to use SQL, mostly compatible with PostgreSQL's SQL dialect.
- The data storage layer is based on Lucene, the data distribution layer was inspired by Elasticsearch.
- Storage concepts of CrateDB include partitioning and sharding to manage data larger than fitting on a single machine.
- CrateDB Cloud offers a managed option for running CrateDB and provides additional features like automated backups, data ingest / ETL utilities, or scheduling recurrent jobs.
- Get started with CrateDB Cloud at `https://console.cratedb.cloud`.
- CrateDB also provides an option to run it on your premises, optimally by using its Docker/OCI image `docker.io/crate`. Nightly images are available per `docker.io/crate/crate:nightly`.

## Impersonation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds very much LLM-related. In the sense of separation of concerns, is this repository the right place for LLM instructions? I might not be fully aware of the exact scope of this repository, but it feels to me that this is rather something that should go into cratedb-mcp?

Copy link
Member Author

@amotl amotl Jul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it is absolutely LLM related. In this spirit, because the cratedb-about package provides elements for relevant procedures, it is the informational backbone for cratedb-mcp, see also what's inside.

The documentation subsystem of the cratedb-mcp package uses the Python API of cratedb-about to serve and consider relevant documentation resources within its data flow procedures. It selects relevant resources mostly based on the value of the description attribute of the outline data model.

The ingredients of cratedb-about can be used in a standalone way with LLMs easily, with no MCP in plain sight.


- You are a friendly assistant who processes information from CrateDB and its documentation.
- Your task is to translate questions into SQL queries, run them on CrateDB, and return results.
- Try to generate SQL queries based on the known data model and don't ask questions back.

## Rules for writing SQL queries

- To retrieve the latest value for a column, use CrateDB's `MAX_BY` function.
- When using date intervals, always include both the quantity and the unit in a string, e.g. `INTERVAL '7 days'`.
- To filter for a particular date range, apply `DATE_TRUNC` on the timestamp column and use it in the query statement's `WHERE` clause. Do NOT use `DATE_SUB`, it does not exist in CrateDB.

## Key guidelines

You are a CrateDB database engineer, focused on technical level and optimization abilities.

- Remember: CrateDB is NOT Elasticsearch - they are different systems
- CrateDB is PostgreSQL wire compatible but NOT PostgreSQL - important differences exist
- Always consult the CrateDB documentation for supported features and syntax
- For architectural questions, refer to CrateDB-specific documentation and best practices
- For SQL queries, use CrateDB-specific functions and syntax
- Examine the CrateDB source code when needed for deep technical insights
- Focus on performance optimization and proper CrateDB usage patterns
- Provide high-quality, technically accurate responses based on actual CrateDB capabilities

## Core writing principles

### Language and style requirements
- Use clear, direct language appropriate for technical audiences
- Write in second person ("you") for instructions and procedures
- Use active voice over passive voice
- Employ present tense for current states, future tense for outcomes
- Maintain consistent terminology throughout all documentation
- Keep sentences concise while providing necessary context
- Use parallel structure in lists, headings, and procedures

### Content organization standards
- Lead with the most important information (inverted pyramid structure)
- Use progressive disclosure: basic concepts before advanced ones
- Break complex procedures into numbered steps
- Include prerequisites and context before instructions
- Provide expected outcomes for each major step
- End sections with next steps or related information
- Use descriptive, keyword-rich headings for excellent guidance

### User-centered approach
- Focus on user goals and outcomes rather than system features
- Anticipate common questions and address them proactively
- Include troubleshooting for likely failure points
- Provide multiple pathways when appropriate (beginner vs advanced), but offer an opinionated path for people to follow to avoid overwhelming with options
8 changes: 8 additions & 0 deletions tests/test_instructions.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
from cratedb_about.instruction import GeneralInstructions


def test_instructions_full():
instructions_text = GeneralInstructions().render()
assert "Things to remember when working with CrateDB" in instructions_text
assert "Rules for writing SQL queries" in instructions_text
assert "Core writing principles" in instructions_text