This index provides a list of notebooks for ingesting, querying, and operating Apache Druid.
Visit the index of notebooks by release for quick access to new and updated notebooks for each release of Apache Druid.
- Ingestion into Apache Druid.
- Query with both the interactive and MSQ APIs.
- Operations to manage and administer your cluster.
- Insider guides for specific use cases.
- More sample data sets.
There are also dedicated notebooks that dive deeper into the components used to create this learning environment.
Visit the 02-ingestion folder for notebooks focused on streaming and batch ingestion.
Title | Description | Docker Profile |
---|---|---|
Druid data types | Work through several examples of table schemas with different underlying data types, as well as methods for converting between them. | druid-jupyter |
Arrays | Ingest, create, and manipulate ARRAYs, and use the UNNEST operator. | druid-jupyter |
Spatial | Ingest spatial dimensions and use rectangular, circular, and polygon filters to query. | druid-jupyter |
Nested objects | Work through ingesting, querying, and transforming nested columns. | druid-jupyter |
NULL | Examples of how to treat incoming data to generate NULL values, and work with them using scalar functions, aggregations, and arrays. | druid-jupyter |
UPDATE, DELETE and UPSERT | Examples of how to apply changes to data in Druid, including updates, deletes and upsert logic. | druid-jupyter |
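As a flavor of what these notebooks cover, the sketch below uses Druid SQL's UNNEST operator to expand an ARRAY column into one row per element. The `events` table and its `tags` column are hypothetical, not part of the learning environment.

```sql
-- Hypothetical table "events" with an ARRAY<STRING> column "tags".
-- CROSS JOIN UNNEST emits one output row per array element.
SELECT
  "__time",
  t."tag"
FROM "events"
CROSS JOIN UNNEST("tags") AS t("tag")
WHERE t."tag" IS NOT NULL
```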
Title | Description | Docker Profile |
---|---|---|
Introduction to streaming ingestion | An introduction to streaming ingestion using Apache Kafka. | all-services |
Defining data schemas | Manual and automatic schema detection for incoming data streams. | all-services |
Transforming incoming rows | Examples of transforming data in real-time as it arrives. | all-services |
Filtering incoming rows | Work through examples of using filters on incoming data streams. | all-services |
Rollup | Apply a GROUP BY at ingestion time and emit SUM, MAX, MIN, and other aggregates, including Apache DataSketches. | all-services |
Streaming segment generation and care | Scale up a streaming ingestion, see the impact on segments, and try out a compaction job. | all-services |
Multi-topic Kafka ingestion | A walkthrough of automatic topic detection for streaming ingestion. | all-services |
Title | Description | Docker Profile |
---|---|---|
Introduction to batch ingestion | A walkthrough of SQL-based batch ingestion. | druid-jupyter |
Primary and secondary partitioning in batch ingestion | Use PARTITIONED BY and CLUSTERED BY to optimize query performance. | druid-jupyter |
Generating Apache DataSketches at ingestion time | Generate sketch objects to support approximate distinct count operations as part of ingestion. | druid-jupyter |
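For orientation, the sketch below shows the SQL-based batch ingestion shape these notebooks work through: an EXTERN source, primary time partitioning with PARTITIONED BY, and secondary partitioning with CLUSTERED BY. The target table name, source URI, and columns are placeholders, not part of the learning environment.

```sql
-- Placeholder names: "example_table" and the source URI are illustrative only.
REPLACE INTO "example_table" OVERWRITE ALL
SELECT
  TIME_PARSE("timestamp") AS "__time",
  "channel",
  "page"
FROM TABLE(
  EXTERN(
    '{"type":"http","uris":["https://example.com/data.json.gz"]}',
    '{"type":"json"}',
    '[{"name":"timestamp","type":"string"},{"name":"channel","type":"string"},{"name":"page","type":"string"}]'
  )
)
PARTITIONED BY DAY      -- primary (time) partitioning
CLUSTERED BY "channel"  -- secondary partitioning
```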
For tutorials on making effective use of SELECT statements in Apache Druid, see the notebooks in 03-query.
Title | Description | Docker Profile |
---|---|---|
Learn the basics of Druid SQL | An introduction to the unique aspects of Druid SQL. | druid-jupyter |
GROUP BY | Use GROUP BY in various forms to aggregate your data. | druid-jupyter |
COUNT DISTINCT | Work through approximate and accurate ways of counting unique occurrences of data. | druid-jupyter |
SQL API | See examples of getting results from the Druid SQL API directly. | druid-jupyter |
TopN approximation | Understand the approximation used for GROUP BY queries with ORDER BY and LIMIT. | druid-jupyter |
Analyzing data distributions | Use approximation to estimate quantiles, ranks, and histograms. | druid-jupyter |
UNION ALL | Work through using the two types of UNION ALL operation available in Druid. | druid-jupyter |
TABLE(APPEND) | Work through using the TABLE(APPEND) operation available in Druid to combine multiple tables for queries. | druid-jupyter |
Lookup tables | See how LOOKUP tables can be used to enrich and update data. | druid-jupyter |
Lookup tables - Kafka | Walk through how to set up a LOOKUP reading from an Apache Kafka topic. | druid-jupyter |
Time functions | Using scalar functions against time data to transform, filter, and aggregate at ingestion and query time. | druid-jupyter |
String functions | See how different string functions can be used at ingestion and query time. | druid-jupyter |
IPv4 functions | A short notebook on IPv4 functions in Druid SQL. | druid-jupyter |
CASE | Examples of using the two forms of CASE function available in Druid SQL. | druid-jupyter |
Window functions | Examples of RANK, LAG, LEAD, and other window functions. | druid-jupyter |
JOIN | A full review of all join strategies available in Druid with examples and performance comparisons. | druid-jupyter |
PIVOT and UNPIVOT | Use PIVOT to convert row values into columns. Use UNPIVOT to convert column values into rows. | druid-jupyter |
Asynchronous historical queries | Use asynchronous queries to access data without prefetch to historicals. | all-services |
Asynchronous real-time queries | Use asynchronous queries to combine real-time and historical data. | all-services |
Exporting data (experimental) | Walk through using INSERT INTO EXTERN to export query results. | druid-jupyter |
Retention load rules | Use load rules to prescribe how much data is cached on historicals, including when used with multiple tiers. | tiered-druid-jupyter |
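To illustrate the approximate-counting notebooks above, here is a hedged Druid SQL sketch comparing the default COUNT(DISTINCT), which Druid SQL plans as an approximation by default, with an explicit Apache DataSketches HLL aggregator. The table and column names are hypothetical.

```sql
-- Hypothetical table and columns; COUNT(DISTINCT) is approximate by default
-- in Druid SQL, and APPROX_COUNT_DISTINCT_DS_HLL uses an Apache DataSketches
-- HLL sketch explicitly.
SELECT
  "channel",
  COUNT(DISTINCT "user") AS users_default,
  APPROX_COUNT_DISTINCT_DS_HLL("user") AS users_hll
FROM "example_table"
GROUP BY 1
ORDER BY users_hll DESC
LIMIT 10
```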
The 05-operations folder contains notebooks related to ongoing administration and operation of the Apache Druid database.
Title | Description | Docker Profile |
---|---|---|
Apache Druid logging | Walk through configuration options for log files. | jupyter |
Streaming and SQL-based ingestion logs | A notebook focused on task logs. | jupyter |
Apache Druid metrics | An overview of metrics available from Apache Druid. | jupyter |
Compaction - partitioning | A walkthrough of compaction tasks being used to change the PARTITIONED BY and CLUSTERED BY of an existing table, especially important for streaming use cases. | druid-jupyter |
Compaction - data and schema | Examples of compaction jobs being used to remove dimensions, filter out data, and apply a new level of aggregation. | druid-jupyter |
In 01-introduction you'll find a library of insider guides created in partnership with Apache Druid community members around the world. Each one includes links to official documentation that you should read, and to notebooks that will give you knowledge of relevant functionality in Druid.
The 06-datasets folder contains guidance for ingesting other datasets into Druid. These can be useful when you want to try out some of Druid's features.
The 99-contributing folder contains notebooks that explain a little more about the learning environment and its components. You'll also find templates for submitting your own content.
Title | Description | Docker Profile |
---|---|---|
Druid Python API | Learn more about the Python wrapper used by the notebooks. | None |
Data generator - files | Use the data generator to create batch-ingestable files. | all-services |
Data generator - streams | Send data to Kafka directly from the data generator. | all-services |
Data generator profiles | Learn how to use different data generator simulation profiles. | all-services |
Boilerplate data generator ingestions | Example SQL and native ingestion specifications for grabbing data generator sample data. | all-services |