Notebook index

This index provides a list of notebooks for ingesting, querying, and operating Apache Druid.

See the index of notebooks by release for quick access to the notebooks that were added or updated for each release of Apache Druid.

There are also dedicated notebooks that go deeper into the components used to create this learning environment.

Ingestion

Visit the 02-ingestion folder for notebooks focused on using streaming and batch ingestion.

General

| Title | Description | Docker Profile |
| --- | --- | --- |
| Druid data types | Work through several examples of table schemas with different underlying data types, as well as methods for converting between them. | druid-jupyter |
| Arrays | Ingest, create, and manipulate ARRAYs, and use the UNNEST operator. | druid-jupyter |
| Spatial | Ingest spatial dimensions and use rectangular, circular, and polygon filters to query. | druid-jupyter |
| Nested objects | Work through ingesting, querying, and transforming nested columns. | druid-jupyter |
| NULL | Examples of how to treat incoming data to generate NULL values, and work with them using scalar functions, aggregations, and arrays. | druid-jupyter |
| UPDATE, DELETE and UPSERT | Examples of how to apply changes to data in Druid, including updates, deletes, and upsert logic. | druid-jupyter |

Streaming

| Title | Description | Docker Profile |
| --- | --- | --- |
| Introduction to streaming ingestion | An introduction to streaming ingestion using Apache Kafka. | all-services |
| Defining data schemas | Manual and automatic schema detection for incoming data streams. | all-services |
| Transforming incoming rows | Examples of transforming data in real time as it arrives. | all-services |
| Filtering incoming rows | Work through examples of using filters on incoming data streams. | all-services |
| Rollup | Apply a GROUP BY at ingestion time and emit SUM, MAX, MIN, and other aggregates, including Apache DataSketches. | all-services |
| Streaming segment generation and care | Scale up a streaming ingestion, see the impact on segments, and try out a compaction job. | all-services |
| Multi-topic Kafka ingestion | A walkthrough of automatic topic detection for streaming ingestion. | all-services |
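
The Rollup entry above describes applying a GROUP BY at ingestion time. As a rough picture of that idea, here is a minimal pure-Python sketch. It is not Druid code: the `rollup` helper, the hourly truncation, and the sample `events` rows are all hypothetical and exist only to illustrate the concept.

```python
from collections import defaultdict
from datetime import datetime


def rollup(rows, dimensions, metric):
    """Group rows by hour and by the given dimensions, then aggregate one metric.

    Hypothetical helper for illustration only; Druid performs rollup itself
    during ingestion.
    """
    groups = defaultdict(list)
    for row in rows:
        # Truncate the timestamp to the hour, so rows within the same hour
        # and with the same dimension values collapse into one group.
        ts = datetime.fromisoformat(row["time"]).replace(
            minute=0, second=0, microsecond=0
        )
        key = (ts.isoformat(),) + tuple(row[d] for d in dimensions)
        groups[key].append(row[metric])

    rolled = []
    for key, values in sorted(groups.items()):
        rolled.append({
            "time": key[0],
            **dict(zip(dimensions, key[1:])),
            "count": len(values),
            f"sum_{metric}": sum(values),
            f"min_{metric}": min(values),
            f"max_{metric}": max(values),
        })
    return rolled


# Hypothetical sample rows, loosely modeled on edit events.
events = [
    {"time": "2024-01-01T10:05:00", "channel": "#en", "added": 10},
    {"time": "2024-01-01T10:45:00", "channel": "#en", "added": 30},
    {"time": "2024-01-01T10:50:00", "channel": "#fr", "added": 5},
    {"time": "2024-01-01T11:01:00", "channel": "#en", "added": 7},
]

rolled = rollup(events, ["channel"], "added")
for row in rolled:
    print(row)
```

Four input rows collapse into three stored rows here, which is the storage saving that rollup trades against the loss of row-level detail.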

Batch

| Title | Description | Docker Profile |
| --- | --- | --- |
| Introduction to batch ingestion | Work through SQL-based batch ingestion. | druid-jupyter |
| Primary and secondary partitioning in batch ingestion | Use PARTITIONED BY and CLUSTERED BY to optimize query performance. | druid-jupyter |
| Generating Apache DataSketches at ingestion time | Generate sketch objects to support approximate distinct count operations as part of ingestion. | druid-jupyter |
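
To give a feel for where PARTITIONED BY and CLUSTERED BY sit in a SQL-based ingestion statement, here is a small sketch that only builds the statement text and the JSON body that would carry it. The `build_ingest_sql` helper, the table name, the source URI, and the column names are assumptions made for illustration; they are not taken from the notebooks.

```python
import json


def build_ingest_sql(table, source_uri):
    """Return a SQL-based ingestion statement as a string.

    Hypothetical helper: the column names and the JSON input format are
    assumptions, so adapt them to the data actually being ingested.
    """
    input_source = json.dumps({"type": "http", "uris": [source_uri]})
    input_format = json.dumps({"type": "json"})
    return f"""
REPLACE INTO "{table}" OVERWRITE ALL
SELECT
  TIME_PARSE("timestamp") AS "__time",
  "channel",
  "added"
FROM TABLE(
  EXTERN('{input_source}', '{input_format}')
) EXTEND ("timestamp" VARCHAR, "channel" VARCHAR, "added" BIGINT)
PARTITIONED BY DAY
CLUSTERED BY "channel"
""".strip()


# Placeholder table name and URI, not a real dataset.
sql = build_ingest_sql("example_table", "https://example.com/data.json.gz")

# SQL-based ingestion is submitted as a JSON body with a "query" field.
task_payload = json.dumps({"query": sql})
print(sql)
```

PARTITIONED BY sets the time-based (primary) partitioning and CLUSTERED BY sets the secondary partitioning within each time chunk, so the clustering column is usually one that queries filter on often.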

Query

For tutorials focused on effective use of all manner of SELECT statements in Apache Druid, see the notebooks in 03-query.

| Title | Description | Docker Profile |
| --- | --- | --- |
| Learn the basics of Druid SQL | An introduction to the unique aspects of Druid SQL. | druid-jupyter |
| GROUP BY | Use GROUP BY in various forms to aggregate your data. | druid-jupyter |
| COUNT DISTINCT | Work through approximate and accurate ways of counting unique occurrences of data. | druid-jupyter |
| SQL API | See examples of getting results from the Druid SQL API directly. | druid-jupyter |
| TopN approximation | Understand the approximation used for GROUP BY queries with ORDER BY and LIMIT. | druid-jupyter |
| Analyzing data distributions | Use approximation to estimate quantiles, ranks, and histograms. | druid-jupyter |
| UNION ALL | Work through using the two types of UNION ALL operation available in Druid. | druid-jupyter |
| TABLE(APPEND) | Work through using the TABLE(APPEND) operation available in Druid to combine multiple tables for queries. | druid-jupyter |
| Lookup tables | See how LOOKUP tables can be used to enrich and update data. | druid-jupyter |
| Lookup tables - Kafka | Walk through how to set up a LOOKUP reading from an Apache Kafka topic. | druid-jupyter |
| Time functions | Use scalar functions against time data to transform, filter, and aggregate at ingestion and query time. | druid-jupyter |
| String functions | See how different string functions can be used at ingestion and query time. | druid-jupyter |
| IPv4 functions | A short notebook on IPv4 functions in Druid SQL. | druid-jupyter |
| CASE | Examples of using the two forms of CASE function available in Druid SQL. | druid-jupyter |
| Window functions | Examples of RANK, LAG, LEAD, and other window functions. | druid-jupyter |
| JOIN | A full review of all join strategies available in Druid with examples and performance comparisons. | druid-jupyter |
| PIVOT and UNPIVOT | Use PIVOT to convert row values into columns. Use UNPIVOT to convert column values into rows. | druid-jupyter |
| Asynchronous historical queries | Use asynchronous queries to access data without prefetching it to historicals. | all-services |
| Asynchronous real-time queries | Use asynchronous queries to combine real-time and historical data. | all-services |
| Exporting data (experimental) | Walk through using INSERT INTO EXTERN to export query results. | druid-jupyter |
| Retention load rules | Use load rules to control how much data is cached on historicals, including when used with multiple tiers. | tiered-druid-jupyter |
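
For readers who want a preview of what the SQL API entry above involves, the sketch below builds (but does not send) an HTTP request for Druid's SQL endpoint, `/druid/v2/sql`, using only the Python standard library. The `build_sql_request` helper and the `http://localhost:8888` base URL are assumptions; the notebooks themselves may use a different host or a Python wrapper instead.

```python
import json
import urllib.request


def build_sql_request(sql, base_url="http://localhost:8888"):
    """Build a POST request for the Druid SQL API without sending it.

    Hypothetical helper: base_url is an assumed address, so point it at
    wherever your Druid router or broker is actually reachable.
    """
    payload = json.dumps({
        "query": sql,
        # Ask for each result row as a JSON object keyed by column name.
        "resultFormat": "object",
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/druid/v2/sql",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_sql_request("SELECT 1 AS one")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send the query, which requires a running Druid instance, so the sketch stops at constructing the request.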

Operations

The 05-operations folder contains notebooks related to the ongoing administration and operation of the Apache Druid database.

| Title | Description | Docker Profile |
| --- | --- | --- |
| Apache Druid logging | Walk through configuration options for log files. | jupyter |
| Streaming and SQL-based ingestion logs | A notebook focused on task logs. | jupyter |
| Apache Druid metrics | An overview of metrics available from Apache Druid. | jupyter |
| Compaction - partitioning | A walkthrough of using compaction tasks to change the PARTITIONED BY and CLUSTERED BY of an existing table, which is especially important for streaming use cases. | druid-jupyter |
| Compaction - data and schema | Examples of using compaction jobs to remove dimensions, filter out data, and apply a new level of aggregation. | druid-jupyter |

Insider guides

In 01-introduction you'll find a library of insider guides created in partnership with Apache Druid community members around the world. Each one includes links to official documentation that you should read, and to notebooks that will give you knowledge of relevant functionality in Druid.

Sample data

The 06-datasets folder contains guidance for ingesting other datasets into Druid. These are useful when you want to try out some of Druid's features.

Contributing

The 99-contributing folder contains notebooks that explain a little more about the learning environment and its components. You'll also find templates for submitting your own content.

| Title | Description | Docker Profile |
| --- | --- | --- |
| Druid Python API | Learn more about the Python wrapper used by the notebooks. | None |
| Data generator - files | Use the data generator to create batch-ingestible files. | all-services |
| Data generator - streams | Send data to Kafka directly from the data generator. | all-services |
| Data generator profiles | Learn how to use different data generator simulation profiles. | all-services |
| Boilerplate data generator ingestions | Example SQL and native ingestion specifications for grabbing data generator sample data. | all-services |