From 8fdb8c32b4a64f1507133d74024fa1dd585e9f42 Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Fri, 28 Mar 2025 13:16:56 +0100 Subject: [PATCH 01/51] Improve docs for other types --- documentation/reference/sql/datatypes.md | 229 +++++++++++++++++------ 1 file changed, 168 insertions(+), 61 deletions(-) diff --git a/documentation/reference/sql/datatypes.md b/documentation/reference/sql/datatypes.md index 3ed6fdf1..078cb384 100644 --- a/documentation/reference/sql/datatypes.md +++ b/documentation/reference/sql/datatypes.md @@ -6,78 +6,91 @@ description: Data types reference documentation. The type system is derived from Java types. -| Type Name | Storage bits | Nullable | Description | -|-------------------|-----------------|----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `boolean` | `1` | No | Boolean `true` or `false`. | -| `ipv4` | `32` | Yes | `0.0.0.1` to `255.255.255. 255` | -| `byte` | `8` | No | Signed integer `-128` to `127`. | -| `short` | `16` | No | Signed integer `-32768` to `32767`. | -| `char` | `16` | Yes | `unicode` character. | -| `int` | `32` | Yes | Signed integer `0x80000000` to `0x7fffffff`. | -| `float` | `32` | Yes | Single precision IEEE 754 floating point value. | -| `symbol` | `32` | Yes | Symbols are stored as 32-bit signed indexes from symbol table. Each index will have a corresponding `string` value. Translation from index to string value is done automatically when data is being written or read. Symbol table is stored separately from column. | -| `varchar` | `128 + utf8Len` | Yes | Length-prefixed sequence of UTF-8 encoded characters is stored using a 128-bit header and UTF-8 encoded data. Sequences shorter than 9 bytes are fully inlined within the header and do not occupy any additional data space. | -| `string` | `96+n*16` | Yes | Length-prefixed sequence of UTF-16 encoded characters whose length is stored as signed 32-bit integer with maximum value of `0x7fffffff`. | -| `long` | `64` | Yes | Signed integer `0x8000000000000000L` to `0x7fffffffffffffffL`. | -| `date` | `64` | Yes | Signed offset in **milliseconds** from [Unix Epoch](https://en.wikipedia.org/wiki/Unix_time). While the `date` data type is available, we highly recommend applying the `timestamp` data type in its place. The only material advantage of date is a wider time range; timestamp however is adequate in virtually all cases. Date supports fewer functions and uses milliseconds instead of microseconds. | -| `timestamp` | `64` | Yes | Signed offset in **microseconds** from [Unix Epoch](https://en.wikipedia.org/wiki/Unix_time). | -| `double` | `64` | Yes | Double precision IEEE 754 floating point value. | -| `uuid` | `128` | Yes | [UUID](https://en.wikipedia.org/wiki/Universally_unique_identifier) values. See also [the UUID type](#the-uuid-type). | -| `binary` | `64+n*8` | Yes | Length-prefixed sequence of bytes whose length is stored as signed 64-bit integer with maximum value of `0x7fffffffffffffffL`. | -| `long256` | `256` | Yes | Unsigned 256-bit integer. Does not support arbitrary arithmetic operations, but only equality checks. Suitable for storing hash code, such as crypto public addresses. 
| -| `geohash()` | `8`-`64` | Yes | Geohash with precision specified as a number followed by `b` for bits, `c` for chars. See [the geohashes documentation](/docs/concept/geohashes/) for details on use and storage. | -| `interval`[^1] | `128` | Yes | Pair of timestamps representating a time interval | - - -[^1]: non-persisted type, cannot be used in `CREATE` or `ALTER` statements. +| Type Name | Storage bits | Nullable | Description | +|-------------------|-----------------|----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `boolean` | `1` | No | Boolean `true` or `false`. | +| `ipv4` | `32` | Yes | `0.0.0.1` to `255.255.255. 255` | +| `byte` | `8` | No | Signed integer `-128` to `127`. | +| `short` | `16` | No | Signed integer `-32768` to `32767`. | +| `char` | `16` | Yes | `unicode` character. | +| `int` | `32` | Yes | Signed integer `0x80000000` to `0x7fffffff`. | +| `float` | `32` | Yes | Single precision IEEE 754 floating point value. | +| `symbol` | `32` | Yes | A symbol, stored as a 32-bit signed index into the symbol table. Each index corresponds to a `string` value. The index is transparently translated to the string value. Symbol table is stored separately from the column data. | +| `varchar` | `128 + utf8Len` | Yes | Length-prefixed sequence of UTF-8 encoded characters, stored using a 128-bit header and UTF-8 encoded data. Sequences shorter than 9 bytes are fully inlined within the header and do not occupy any additional data space. | +| `string` | `96+n*16` | Yes | Length-prefixed sequence of UTF-16 encoded characters whose length is stored as signed 32-bit integer with maximum value of `0x7fffffff`. | +| `long` | `64` | Yes | Signed integer `0x8000000000000000L` to `0x7fffffffffffffffL`. | +| `date`[^1] | `64` | Yes | Signed offset in **milliseconds** from [Unix Epoch](https://en.wikipedia.org/wiki/Unix_time). | +| `timestamp` | `64` | Yes | Signed offset in **microseconds** from [Unix Epoch](https://en.wikipedia.org/wiki/Unix_time). | +| `double` | `64` | Yes | Double precision IEEE 754 floating point value. | +| `uuid` | `128` | Yes | [UUID](https://en.wikipedia.org/wiki/Universally_unique_identifier) values. See also [the UUID type](#the-uuid-type). | +| `binary` | `64+n*8` | Yes | Length-prefixed sequence of bytes whose length is stored as signed 64-bit integer with maximum value of `0x7fffffffffffffffL`. | +| `long256` | `256` | Yes | Unsigned 256-bit integer. Does not support arithmetic operations, only equality checks. Suitable for storing a hash code, such as crypto public addresses. | +| `geohash()` | `8`-`64` | Yes | Geohash with precision specified as a number followed by `b` for bits, `c` for chars. See [the geohashes documentation](/docs/concept/geohashes/) for details on use and storage. | +| `interval`[^2] | `128` | Yes | Pair of timestamps representing a time interval. | + +[^1]: While the `date` type is available, we highly recommend using the +`timestamp` instead. The only material advantage of `date` is a wider time +range, but `timestamp` is adequate in virtually all cases. It has microsecond +resolution (vs. milliseconds for `date`), and is fully supported by all +date/time functions, while support for `date` is limited. + +[^2]: `interval` is not a persisted type. You can use it in expressions, but +can't have a database column of this type. 
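As a quick, hedged illustration of the table above, here is a sketch of a table
declaration that exercises several of these types (all table and column names
are invented for the example):

```questdb-sql
-- Illustrative only: one column for each of several types listed above.
CREATE TABLE sensor_readings (
  ts TIMESTAMP,     -- microsecond offset from the Unix Epoch
  device SYMBOL,    -- repeated string values, stored as a 32-bit index
  addr IPV4,        -- 0.0.0.1 to 255.255.255.255
  reading DOUBLE,   -- IEEE 754 double precision
  tx_hash LONG256,  -- e.g. a crypto address; equality checks only
  note VARCHAR      -- UTF-8 encoded text
) TIMESTAMP(ts) PARTITION BY DAY;
```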
## VARCHAR and STRING considerations QuestDB supports two types for storing strings: `VARCHAR` and `STRING`. -Most users should use `VARCHAR`. `VARCHAR` is encoded in UTF-8. In contrast, -STRING is stored in UTF-16, which is less efficient than UTF-8 but is retained -for compatibility with older versions of QuestDB. +Most users should use `VARCHAR`. It uses the UTF-8 encoding, whereas `STRING` +uses UTF-16, which is less space-efficient for strings containing mostly ASCII +characters. QuestDB keeps supporting it only to maintain backward compatibility. Additionally, `VARCHAR` includes several optimizations for fast access and storage. -## Variable-sized type limitations +## Limitations for variable-sized types -Maximum size of a single `VARCHAR` field size is 268 MB. A size of a `VARCHAR` -column in a single partition is limited by 218 TB or disk size, whichever is -smaller. +The maximum size of a single `VARCHAR` field is 268 MB, and the maximum total +size of a `VARCHAR` column in a single partition is 218 TB. -`BINARY` field size is limited either by 64-Bit signed int (8388608 peta bytes) -or disk size, whichever is smaller. +The maximum size of a `BINARY` field is defined by the limits of the 64-bit +signed integer (8,388,608 petabytes). -`STRING` field size is limited by either 32-bit signed int (1073741824 -characters) or disk size, whichever is smaller. +The maximum size of a `STRING` field is defined by the limits of the 32-bit +signed integer (1,073,741,824 characters). + +The maximum number of dimensions an array can have is 32. The hard limit on the +total number of elements in an array (lengths of all dimensions multiplied +together) is `2^31 - 1` divided by the byte size of array element. For a +`DOUBLE[]`, this is `2^28 - 1` or 268,435,455. The actual limit QuestDB will +enforce is configurable via `cairo.max.array.element.count`, with the default of +10,000,000. The length of each individual dimension has a limit of `2^28 - 1` or +268,435,455, regardless of element size. ## Type nullability -Nullable types use a specific value to mark `NULL` values: - -| Type Name | Null value | Description | -| ---------------- | -------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------- | -| `float` | `NaN` | As defined by IEEE 754 (`java.lang.Float.NaN`). | -| `double` | `NaN` | As defined by IEEE 754 (`java.lang.Double.NaN`). | -| `long256` | `0x8000000000000000800000000000000080000000000000008000000000000000` | The value equals four consecutive `long` null literals. | -| `long` | `0x8000000000000000L` | Minimum possible value a `long` can take -2^63. | -| `date` | `0x8000000000000000L` | Minimum possible value a `long` can take -2^63. | -| `timestamp` | `0x8000000000000000L` | Minimum possible value a `long` can take -2^63. | -| `int` | `0x80000000` | Minimum possible value an `int` can take, -2^31. | -| `uuid` | `80000000-0000-0000-8000-000000000000` | Both 64 highest bits and 64 lowest bits set to -2^63. | -| `char` | `0x0000` | 0. | -| `geohash(byte)` | `0xff` | Geohashes `from 1 up to included 7 bits`. | -| `geohash(short)` | `0xffff` | Geohashes `from 8 up to included 15 bits`. | -| `geohash(int)` | `0xffffffff` | Geohashes `from 16 up to included 31 bits`. | -| `geohash(long)` | `0xffffffffffffffff` | Geohashes `from 32 up to included 60 bits`. | -| `symbol` | `0x80000000` | Symbols are stored as `int` offsets in a lookup file. 
| -| `varchar` | `N/A` | Varchar columns have an explicit `NULL` marker in a header. | -| `string` | `0xffffffff` | Strings are length prefixed, the length is an `int` and `-1` marks it `NULL` (no further storage is used). | -| `binary` | `0xffffffffffffffff` | Binary columns are also length prefixed, the length is a `long` and `-1` marks it `NULL` (no further storage is used). | -| `ipv4` | `null` | IPv4 addresses are stored as `int`. | +Many nullable types reserve a value that marks them `NULL`: + +| Type Name | Null value | Description | +| ---------------- | -------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- | +| `float` | `NaN` | As defined by IEEE 754 (`java.lang.Float.NaN`). | +| `double` | `NaN` | As defined by IEEE 754 (`java.lang.Double.NaN`). | +| `long256` | `0x8000000000000000800000000000000080000000000000008000000000000000` | The value equals four consecutive `long` null literals. | +| `long` | `0x8000000000000000L` | Minimum possible value a `long` can take, -2^63. | +| `date` | `0x8000000000000000L` | Minimum possible value a `long` can take, -2^63. | +| `timestamp` | `0x8000000000000000L` | Minimum possible value a `long` can take, -2^63. | +| `int` | `0x80000000` | Minimum possible value an `int` can take, -2^31. | +| `uuid` | `80000000-0000-0000-8000-000000000000` | Both 64 highest bits and 64 lowest bits set to -2^63. | +| `char` | `0x0000` | The zero char (`NUL` in ASCII). | +| `geohash(byte)` | `0xff` | Valid for geohashes of 1 to 7 bits (inclusive). | +| `geohash(short)` | `0xffff` | Valid for geohashes of 8 to 15 bits (inclusive). | +| `geohash(int)` | `0xffffffff` | Valid for geohashes of 16 to 31 bits (inclusive). | +| `geohash(long)` | `0xffffffffffffffff` | Valid for geohashes of 32 to 60 bits (inclusive). | +| `symbol` | `0x80000000` | Symbol is stored as an `int` offset into a lookup file. The value `-1` marks it `NULL`. | +| `ipv4` | `128.0.0.0` (`0x80000000`) | IPv4 address is stored as `int` and uses the same `NULL` marker value. | +| `varchar` | `N/A` | Varchar column has an explicit `NULL` marker in the header. | +| `string` | `N/A` | String column is length-prefixed, the length is an `int` and `-1` marks it `NULL`. | +| `binary` | `N/A` | Binary column is length prefixed, the length is a `long` and `-1` marks it `NULL`. | To filter columns that contain, or don't contain, `NULL` values use a filter like: @@ -156,7 +169,101 @@ their own operators. For a full list, see ### Limitations -IPv4 column types cannot be created via InfluxDB Line Protocol as the protocol -lacks support for IPv4. As a result, the server cannot distinguish between -string and IPv4 data. However, InfluxDB Line Protocol can still insert string -data into a pre-existing column of type IPv4. +You cannot auto-create an IPv4 column using the InfluxDB Line Protocol, since it +doesn't support this type explicitly. The QuestDB server cannot distinguish +between string and IPv4 data. However, you can insert IPv4 data into a +pre-existing IPv4 column by sending IPs as strings. + +## N-Dimensional Array + +QuestDB supports the N-dimensional array type. Its design matches that of the +`NDArray` type in NumPy, which has become the de-facto standard for handling +N-dimensional data. In order to effectively use arrays in QuestDB, you should +understand the basic design principle behind it. 
+ +The physical layout of the N-dimensional array is a single memory block with +values arranged in the _row-major_ order, where the coordinates of the adjacent +elements differ in the rightmost coordinate first (much like the adjacent +numbers differ in the rightmost digit first: 41, 42, 43, etc.) + +Separately, there are two lists of integers that describe this block of values, +and give it its N-dimensional appearance: _shape_ and _strides_. Both have +length equal to the number of dimensions. + +- the numbers in _shape_ tell the length along each dimension -- the range of +values you can use as a coordinate for that dimension +- the numbers in _strides_ tell how far apart are adjacent elements along that +dimension + +Here's a visual example of a 3-dimensional array of type `DOUBLE[2][3][2]`: + +```text +dim 1: |. . . . . .|. . . . . .| -- stride = 6, len = 2 +dim 2: |. .|. .|. .|. .|. .|. .| -- stride = 2, len = 3 +dim 3: |.|.|.|.|.|.|.|.|.|.|.|.| -- stride = 1, len = 2 +``` + +The dots are the individual values (`DOUBLE` numbers in our case). Each row +shows the whole array, but with different subdivisions according to the +dimension. So, in `dim 1`, the row is divided into two slots, the sub-arrays at +coordinates 1 and 2 along that dimension, and the distance between the start of +slot 1 and slot 2 is equal to 6 (the stride for that dimension). In `dim 3`, +each slots contains an individual number, and the stride is 1. + +The legal values for the coordinate in `dim 3` are just 1 and 2, even though you +would be able to access any array element just by using a large-enough number. +This is how the flat array gets its 3-dimensional appearance: for each value, +there's a unique list of coordinates, `[i, j, k]`, that addresses it. + +The relevance of all this to you as the user is that QuestDB can perform all of +the following operations cheaply, by editing just the two small lists, `shape` +and `strides`, and doing nothing to the potentially huge block of memory holding +the array values: + +1. _Slice_: extract a 3-dimensional array that is just a part of the full one, + by constraining the range of legal coordinates at each dimension. Example: + `array[1:2, 2:4, 1:2]` will give us a view into the array with the shape + `DOUBLE[1, 2, 1]`, covering just the ranges of coordinates indicated in the + expression. + +2. _Take a sub-array_: constrain the coordinate at a given dimension to just one + choice, and then eliminate that dimension from the array. Example: + `array[2]` has the shape `DOUBLE[3, 2]` and consists of the second subarray + in the 1st dimension. + +3. _Flatten_: remove a dimension from the array, flattening it into the + next-finer dimension. Example: flattening `dim 2` gives us an array shape + `DOUBLE[2, 6]`. All elements are still available, but using just 2 + coordinates. + +4. _Transpose_: reverse the strides, changing the meaning of each coordinate. + Example: transposing our array changes the strides from `(6, 2, 1)` to + `(1, 2, 6)`. What we used to access with the 3rd coordinate, now we access + with the 1st coordinate. On a 2D array, this would have the effect of + swapping rows and columns (transposing a matrix). + +:::note + +QuestDB does not currently support the `flatten` operation. + +::: + +### The importance of the "vanilla" array shape + +QuestDB stores the _shape_ along with the array. However, it has no need to +store _strides_: they can be calculated from the shape. 
Strides become relevant +once you perform one of the mentioned array shape transformations. We say that +an array whose shape hasn't been transformed (that is, it matches the physical +arrangement of elements) is a _vanilla_ array, and this has consequences for +performance. A vanilla array can be processed by optimized bulk operations that +go over the entire block of memory, disregarding the shape and strides, whereas +for any other array we have to step through all the coordinates one by one and +calculate the position of each element. + +So, while performing a shape transformation is cheap on its own, whole-array +operations on transformed arrays, such as equality checks, adding/multiplying +two arrays, etc., are expected to be slower than on vanilla arrays. + +QuestDB always stores arrays in vanilla form. Even if you transform an array's +shape and then store the result to the database, it will be stored in vanilla +form. From 1f68487f9fd228cbe45652bd941d488a7ee0bda8 Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Fri, 28 Mar 2025 13:17:06 +0100 Subject: [PATCH 02/51] Add docs for ARRAY --- documentation/reference/function/finance.md | 103 ++++++++++++++++--- documentation/reference/operators/numeric.md | 5 + documentation/reference/sql/datatypes.md | 2 + 3 files changed, 95 insertions(+), 15 deletions(-) diff --git a/documentation/reference/function/finance.md b/documentation/reference/function/finance.md index f9389148..6bc42e53 100644 --- a/documentation/reference/function/finance.md +++ b/documentation/reference/function/finance.md @@ -10,12 +10,14 @@ This page describes functions specific to the financial services domain. Trade price calculation. -`l2price(target_quantity, quantity_1, price_1, quantity_2, price_2, ..., quantity_n, price_n)` +`l2price(target_size, size_array, price_array)` -Consider `quantity_1`, `price_1`, `quantity_2`, `price_2`, ..., `quantity_n`, +`l2price(target_size, size_1, price_1, size_2, price_2, ..., size_n, price_n)` + +Consider `size_1`, `price_1`, `size_2`, `price_2`, ..., `size_n`, `price_n` to be either side of an order book with `n` price levels. Then, the return value of the function is the average trade price of a market order -executed with the size of `target_quantity` against the book. +executed with the size of `target_size` against the book. Let's take the below order book as an example. @@ -45,6 +47,12 @@ $$ This average trade price is the output of the function when executed with the parameters taken from the above example: +```questdb-sql +select l2price(50, ARRAY[14.0, 16.0, 23.0, 12.0], ARRAY[14.50, 14.60, 14.80, 15.10]); +``` + +or + ```questdb-sql select l2price(50, 14, 14.50, 16, 14.60, 23, 14.80, 12, 15.10); ``` @@ -55,25 +63,89 @@ select l2price(50, 14, 14.50, 16, 14.60, 23, 14.80, 12, 15.10); ### Parameters -The function takes a `target quantity`, and a variable number of -`quantity`/`price` pairs. Each represents a price level of the order book. +There are two variants of the function, one accepting arrays of numbers, +and the other accepting individual numbers: + +The variant with arrays takes a `target size`, and a pair of arrays of type +`DOUBLE[]`: `size` and `price`. The arrays must match in length. Each +element of the array represents a price level of the order book. -Each parameter is expected to be a double, or convertible to double (float, -long, int, short, byte). 
+The variant with individual numbers takes a `target size`, and a variable +number of `size`/`price` pairs of type `DOUBLE`, or convertible to `DOUBLE` +(`FLOAT`, `LONG`, `INT`, `SHORT`, `BYTE`). -- `target_quantity`: The size of a hypothetical market order to be filled. -- `quantity*`: The number of instruments available at the corresponding price - levels. +- `target_size`: The size of a hypothetical market order to be filled. +- `size*`: The sizes of offers available at the corresponding price levels (can + be fractional). - `price*`: Price levels of the order book. ### Return value -The function returns with a `double`, representing the average trade price. +The function returns a `double`, representing the average trade price. -Returns null if the price is not calculable. For example, if the target quantity -cannot be filled, or there is incomplete data in the set (nulls). +It returns `NULL` if the price is not calculable. For example, if the target +size cannot be filled, or there is incomplete data in the set (nulls). -### Examples +### Examples - ARRAY + +Test data: + +```questdb-sql +CREATE TABLE order_book ( + ts TIMESTAMP, + bidSize DOUBLE[], bid DOUBLE[], + askSize DOUBLE[], ask DOUBLE[] +) TIMESTAMP(ts) PARTITION BY DAY; + +INSERT INTO order_book VALUES + ('2024-05-22T09:40:15.006000Z', + ARRAY[40.0, 47.0, 39.0], ARRAY[14.10, 14.00, 13.90], + ARRAY[54.0, 36.0, 23.0], ARRAY[14.50, 14.60, 14.80]), + ('2024-05-22T09:40:15.175000Z', + ARRAY[42.0, 45.0, 35.0], ARRAY[14.00, 13.90, 13.80], + ARRAY[16.0, 57.0, 30.0], ARRAY[14.30, 14.50, 14.60]), + ('2024-05-22T09:40:15.522000Z', + ARRAY[36.0, 38.0, 31.0], ARRAY[14.10, 14.00, 13.90], + ARRAY[30.0, 47.0, 34.0], ARRAY[14.40, 14.50, 14.60]); +``` + +Trading price of instrument when buying 100: + +```questdb-sql +SELECT ts, L2PRICE(100, askSize, ask) AS buy FROM order_book; +``` + +| ts | buy | +| --------------------------- | --------------- | +| 2024-05-22T09:40:15.006000Z | 14.565999999999 | +| 2024-05-22T09:40:15.175000Z | 14.495 | +| 2024-05-22T09:40:15.522000Z | 14.493 | + +Trading price of instrument when selling 100: + +```questdb-sql +SELECT ts, L2PRICE(100, bidSize, bid) AS sell FROM order_book; +``` + +| ts | sell | +| --------------------------- | ------ | +| 2024-05-22T09:40:15.006000Z | 14.027 | +| 2024-05-22T09:40:15.175000Z | 13.929 | +| 2024-05-22T09:40:15.522000Z | 14.01 | + +The spread for target quantity 100: + +```questdb-sql +SELECT ts, L2PRICE(100, askSize, ask) - L2PRICE(100, bidSize, bid) AS spread FROM order_book; +``` + +| ts | spread | +| --------------------------- | -------------- | +| 2024-05-22T09:40:15.006000Z | 0.538999999999 | +| 2024-05-22T09:40:15.175000Z | 0.565999999999 | +| 2024-05-22T09:40:15.522000Z | 0.483 | + +### Examples - scalar columns Test data: @@ -114,7 +186,7 @@ SELECT ts, L2PRICE(100, bidSize1, bid1, bidSize2, bid2, bidSize3, bid3) AS sell | 2024-05-22T09:40:15.175000Z | 13.929 | | 2024-05-22T09:40:15.522000Z | 14.01 | -The spread for target quantity 100: +The spread for target size of 100: ```questdb-sql SELECT ts, L2PRICE(100, askSize1, ask1, askSize2, ask2, askSize3, ask3) @@ -172,6 +244,7 @@ b_0 = \bar{y} - b_1 \bar{x} $$ Where: + - $\bar{y}$ is the mean of y values - $\bar{x}$ is the mean of x values - $b_1$ is the slope calculated by `regr_slope(y, x)` diff --git a/documentation/reference/operators/numeric.md b/documentation/reference/operators/numeric.md index 5112cf27..3c008464 100644 --- a/documentation/reference/operators/numeric.md +++ 
b/documentation/reference/operators/numeric.md @@ -4,6 +4,11 @@ sidebar_label: Numeric description: Numeric operators --- +These operations work for any numeric types. Also, addition and multiplication +work for N-dimensional arrays. The result will be an array where each element is +the result of applying the operation to the elements at the same coordinates in +the operand arrays. + ## `*` Multiply `*` is a binary operation to multiply two numbers together. diff --git a/documentation/reference/sql/datatypes.md b/documentation/reference/sql/datatypes.md index 078cb384..39ac277c 100644 --- a/documentation/reference/sql/datatypes.md +++ b/documentation/reference/sql/datatypes.md @@ -26,6 +26,7 @@ The type system is derived from Java types. | `binary` | `64+n*8` | Yes | Length-prefixed sequence of bytes whose length is stored as signed 64-bit integer with maximum value of `0x7fffffffffffffffL`. | | `long256` | `256` | Yes | Unsigned 256-bit integer. Does not support arithmetic operations, only equality checks. Suitable for storing a hash code, such as crypto public addresses. | | `geohash()` | `8`-`64` | Yes | Geohash with precision specified as a number followed by `b` for bits, `c` for chars. See [the geohashes documentation](/docs/concept/geohashes/) for details on use and storage. | +| `array` | See description | Yes | Header: 20 + 4 \* `nDims` bytes. Payload: dense array of values. Example: `DOUBLE[3][4]`: header 28 bytes, payload 3\*4\*8 = 96 bytes. | | `interval`[^2] | `128` | Yes | Pair of timestamps representing a time interval. | [^1]: While the `date` type is available, we highly recommend using the @@ -91,6 +92,7 @@ Many nullable types reserve a value that marks them `NULL`: | `varchar` | `N/A` | Varchar column has an explicit `NULL` marker in the header. | | `string` | `N/A` | String column is length-prefixed, the length is an `int` and `-1` marks it `NULL`. | | `binary` | `N/A` | Binary column is length prefixed, the length is a `long` and `-1` marks it `NULL`. | +| `array` | `N/A` | Array column marks a `NULL` value with a zero in the `size` field of the header. | To filter columns that contain, or don't contain, `NULL` values use a filter like: From 7f03a57b6d1cfdf5475e86a9b33f7b166e5e43ab Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Fri, 28 Mar 2025 13:29:27 +0100 Subject: [PATCH 03/51] Don't use footnotes --- documentation/reference/sql/datatypes.md | 23 +++++++++++------------ 1 file changed, 11 insertions(+), 12 deletions(-) diff --git a/documentation/reference/sql/datatypes.md b/documentation/reference/sql/datatypes.md index 39ac277c..45bbd6c7 100644 --- a/documentation/reference/sql/datatypes.md +++ b/documentation/reference/sql/datatypes.md @@ -19,24 +19,15 @@ The type system is derived from Java types. | `varchar` | `128 + utf8Len` | Yes | Length-prefixed sequence of UTF-8 encoded characters, stored using a 128-bit header and UTF-8 encoded data. Sequences shorter than 9 bytes are fully inlined within the header and do not occupy any additional data space. | | `string` | `96+n*16` | Yes | Length-prefixed sequence of UTF-16 encoded characters whose length is stored as signed 32-bit integer with maximum value of `0x7fffffff`. | | `long` | `64` | Yes | Signed integer `0x8000000000000000L` to `0x7fffffffffffffffL`. | -| `date`[^1] | `64` | Yes | Signed offset in **milliseconds** from [Unix Epoch](https://en.wikipedia.org/wiki/Unix_time). | +| `date` | `64` | Yes | Signed offset in **milliseconds** from [Unix Epoch](https://en.wikipedia.org/wiki/Unix_time). 
|
| `timestamp` | `64` | Yes | Signed offset in **microseconds** from [Unix Epoch](https://en.wikipedia.org/wiki/Unix_time). |
| `double` | `64` | Yes | Double precision IEEE 754 floating point value. |
| `uuid` | `128` | Yes | [UUID](https://en.wikipedia.org/wiki/Universally_unique_identifier) values. See also [the UUID type](#the-uuid-type). |
| `binary` | `64+n*8` | Yes | Length-prefixed sequence of bytes whose length is stored as a signed 64-bit integer with maximum value of `0x7fffffffffffffffL`. |
| `long256` | `256` | Yes | Unsigned 256-bit integer. Does not support arithmetic operations, only equality checks. Suitable for storing a hash code, such as crypto public addresses. |
| `geohash(<size>)` | `8`-`64` | Yes | Geohash with precision specified as a number followed by `b` for bits, `c` for chars. See [the geohashes documentation](/docs/concept/geohashes/) for details on use and storage. |
| `array` | See description | Yes | Header: 20 + 4 \* `nDims` bytes. Payload: dense array of values. Example: `DOUBLE[3][4]`: header 28 bytes, payload 3\*4\*8 = 96 bytes. |
| `interval` | `128` | Yes | Pair of timestamps representing a time interval. Not a persisted type: you can use it in expressions, but can't have a database column of this type. |

## VARCHAR and STRING considerations

QuestDB supports two types for storing strings: `VARCHAR` and `STRING`.

Most users should use `VARCHAR`. It uses the UTF-8 encoding, whereas `STRING`
uses UTF-16, which is less space-efficient for strings containing mostly ASCII
characters. QuestDB continues to support it only for backward compatibility.

Additionally, `VARCHAR` includes several optimizations for fast access and
storage.

## TIMESTAMP and DATE considerations

While the `date` type is available, we highly recommend using `timestamp`
instead. The only material advantage of `date` is a wider time range, but
`timestamp` is adequate in virtually all cases. It has microsecond resolution
(vs. milliseconds for `date`), and is fully supported by all date/time
functions, while support for `date` is limited.
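For instance, a time-series table would declare its time column as a
`timestamp` and use it as the designated timestamp. A minimal, hypothetical
sketch (the table and column names are invented):

```questdb-sql
-- Prefer TIMESTAMP over DATE for time columns.
CREATE TABLE trades (
  ts TIMESTAMP,   -- microsecond resolution, full date/time function support
  price DOUBLE
) TIMESTAMP(ts) PARTITION BY DAY;
```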
+ ## Limitations for variable-sized types The maximum size of a single `VARCHAR` field is 268 MB, and the maximum total From abf268ad9f114c3ae73ff8becf9aef880857750b Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Mon, 31 Mar 2025 15:45:33 +0200 Subject: [PATCH 04/51] Add IPv4 limitation to ILP Limitations --- documentation/guides/schema-design-essentials.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/documentation/guides/schema-design-essentials.md b/documentation/guides/schema-design-essentials.md index 734ffe2d..280c8dd4 100644 --- a/documentation/guides/schema-design-essentials.md +++ b/documentation/guides/schema-design-essentials.md @@ -87,6 +87,7 @@ CREATE TABLE marketing_campaign_metrics ( :::tip When using table prefixes for multi-tenancy: + - Use consistent naming conventions (e.g., always `_`) - Consider using uppercase for tenant identifiers to improve readability - Document your naming convention in your team's schema design guidelines @@ -112,9 +113,11 @@ The easiest way to create a schema is through the **[Web Console](/docs/web-cons When using the **[Influx Line Protocol](/docs/reference/api/ilp/overview/) (ILP)**, QuestDB automatically creates tables and columns based on incoming data. This is useful for users migrating from InfluxDB or using tools like **InfluxDB client libraries or Telegraf**, as they can send data directly to QuestDB without pre-defining schemas. However, this comes with limitations: -- QuestDB applies **default settings** to auto-created tables and columns (e.g., partitioning, symbol capacity, and data types). -- Users **cannot modify [partitioning](/docs/concept/partitions/) or [symbol capacity](/docs/concept/symbol/#usage-of-symbols) later**, so they should create tables explicitly beforehand. -- Auto-creation can be [disabled via configuration](/docs/configuration/#influxdb-line-protocol-ilp). +- QuestDB applies the **default settings** to auto-created tables and columns (e.g., partitioning, symbol capacity, and data types). +- You **cannot modify [partitioning](/docs/concept/partitions/) or [symbol capacity](/docs/concept/symbol/#usage-of-symbols) later**. +- You cannot auto-create the `IPv4` data type. Sending an IP address as a string will create a `VARCHAR` column. + +You can disable column auto-creation [via configuration](/docs/configuration/#influxdb-line-protocol-ilp). ## The designated timestamp and partitioning strategy @@ -216,7 +219,6 @@ For changes, the typical workaround is: 3. Drop the old column and rename the new one. 4. **If changes affect table-wide properties** (e.g., partitioning, timestamp column, or WAL settings), create a new table with the required properties, [insert data from the old table](/reference/sql/insert/#inserting-query-results), drop the old table, and rename the new table. 
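As an illustrative sketch of that workaround (the `trades` table, the `side`
column, and the `STRING`-to-`SYMBOL` conversion are all hypothetical):

```questdb-sql
-- 1. Add a new column with the desired type.
ALTER TABLE trades ADD COLUMN side_new SYMBOL;
-- 2. Copy the data over.
UPDATE trades SET side_new = side;
-- 3. Drop the old column and rename the new one.
ALTER TABLE trades DROP COLUMN side;
ALTER TABLE trades RENAME COLUMN side_new TO side;
```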
- ## Examples of schema translations from other databases ```questdb-sql title="Create sample table with deduplication/upsert for PostgreSQL From 0741ba6b5a4d168f6f5919a172f1c7bb950f112f Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Mon, 31 Mar 2025 15:45:43 +0200 Subject: [PATCH 05/51] Generally improve docs --- documentation/reference/operators/ipv4.md | 8 +++---- documentation/reference/sql/datatypes.md | 29 ++++++++++------------- 2 files changed, 16 insertions(+), 21 deletions(-) diff --git a/documentation/reference/operators/ipv4.md b/documentation/reference/operators/ipv4.md index 60f29fda..905ce5df 100644 --- a/documentation/reference/operators/ipv4.md +++ b/documentation/reference/operators/ipv4.md @@ -6,11 +6,9 @@ description: IPv4 operators This document outlines the IPv4 data type operators. -They are useful when dealing with IP addresses. - The IP addresses can be in the range of `0.0.0.1` - `255.255.255.255`. -The address: `0.0.0.0` is interpreted as null. +The address: `0.0.0.0` is interpreted as `NULL`. The following operators support `string` type arguments to permit the passing of netmasks: @@ -22,7 +20,7 @@ netmasks: - [rnd_ipv4(string, int)](/docs/reference/operators/ipv4/#random-address-range-generator---rnd_ipv4string-int) - [netmask()](/docs/reference/operators/ipv4/#return-netmask---netmask) -## `<` Lesser than +## `<` Less than Takes two IPv4 arguments. @@ -36,7 +34,7 @@ Use case: testing to see if one IP address is less than another. ipv4 '33.1.8.43' < ipv4 '200.6.38.9' -> T ``` -## `<=` Lesser than or equal +## `<=` Less than or equal Takes two IPv4 arguments. diff --git a/documentation/reference/sql/datatypes.md b/documentation/reference/sql/datatypes.md index 45bbd6c7..de2f7a89 100644 --- a/documentation/reference/sql/datatypes.md +++ b/documentation/reference/sql/datatypes.md @@ -9,16 +9,16 @@ The type system is derived from Java types. | Type Name | Storage bits | Nullable | Description | |-------------------|-----------------|----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | `boolean` | `1` | No | Boolean `true` or `false`. | -| `ipv4` | `32` | Yes | `0.0.0.1` to `255.255.255. 255` | -| `byte` | `8` | No | Signed integer `-128` to `127`. | -| `short` | `16` | No | Signed integer `-32768` to `32767`. | +| `ipv4` | `32` | Yes | `0.0.0.1` to `255.255.255.255` | +| `byte` | `8` | No | Signed integer, `-128` to `127`. | +| `short` | `16` | No | Signed integer, `-32,768` to `32,767`. | | `char` | `16` | Yes | `unicode` character. | -| `int` | `32` | Yes | Signed integer `0x80000000` to `0x7fffffff`. | +| `int` | `32` | Yes | Signed integer, `-2,147,483,648` to `2,147,483,647`. | | `float` | `32` | Yes | Single precision IEEE 754 floating point value. | | `symbol` | `32` | Yes | A symbol, stored as a 32-bit signed index into the symbol table. Each index corresponds to a `string` value. The index is transparently translated to the string value. Symbol table is stored separately from the column data. | | `varchar` | `128 + utf8Len` | Yes | Length-prefixed sequence of UTF-8 encoded characters, stored using a 128-bit header and UTF-8 encoded data. Sequences shorter than 9 bytes are fully inlined within the header and do not occupy any additional data space. 
| | `string` | `96+n*16` | Yes | Length-prefixed sequence of UTF-16 encoded characters whose length is stored as signed 32-bit integer with maximum value of `0x7fffffff`. | -| `long` | `64` | Yes | Signed integer `0x8000000000000000L` to `0x7fffffffffffffffL`. | +| `long` | `64` | Yes | Signed integer, `-9,223,372,036,854,775,808` to `9,223,372,036,854,775,807`. | | `date` | `64` | Yes | Signed offset in **milliseconds** from [Unix Epoch](https://en.wikipedia.org/wiki/Unix_time). | | `timestamp` | `64` | Yes | Signed offset in **microseconds** from [Unix Epoch](https://en.wikipedia.org/wiki/Unix_time). | | `double` | `64` | Yes | Double precision IEEE 754 floating point value. | @@ -87,7 +87,7 @@ Many nullable types reserve a value that marks them `NULL`: | `geohash(int)` | `0xffffffff` | Valid for geohashes of 16 to 31 bits (inclusive). | | `geohash(long)` | `0xffffffffffffffff` | Valid for geohashes of 32 to 60 bits (inclusive). | | `symbol` | `0x80000000` | Symbol is stored as an `int` offset into a lookup file. The value `-1` marks it `NULL`. | -| `ipv4` | `128.0.0.0` (`0x80000000`) | IPv4 address is stored as `int` and uses the same `NULL` marker value. | +| `ipv4` | `0.0.0.0` (`0x00000000`) | IPv4 address is stored as a 32-bit integer and the zero value represents `NULL`. | | `varchar` | `N/A` | Varchar column has an explicit `NULL` marker in the header. | | `string` | `N/A` | String column is length-prefixed, the length is an `int` and `-1` marks it `NULL`. | | `binary` | `N/A` | Binary column is length prefixed, the length is a `long` and `-1` marks it `NULL`. | @@ -147,25 +147,22 @@ send `UUIDs` as `strings` to be converted to UUIDs by the server. ## IPv4 -QuestDB supports the IPv4 data type. - -The data type adds validity checks and type-specific functions. - -They are - as one would imagine - very useful when dealing with IP addresses. +QuestDB supports the IPv4 data type. It has validity checks and some +IPv4-specific functions. IPv4 addresses exist within the range of `0.0.0.1` - `255.255.255.255`. -A full-zero address - `0.0.0.0` is interpreted as null. +An all-zero address - `0.0.0.0` - is interpreted as `NULL`. -Columns may be created with the IPv4 data type like so: +Create a column with the IPv4 data type like this: ```sql -- Creating a table named traffic with two ipv4 columns: src and dst. CREATE TABLE traffic (ts timestamp, src ipv4, dst ipv4) timestamp(ts) PARTITION BY DAY; ``` -IPv4 addresses also support a wide range of existing SQL functions and contain -their own operators. For a full list, see +IPv4 addresses support a wide range of existing SQL functions, and there are +some operators specifically for them. For a full list, see [IPv4 Operators](/docs/reference/operators/ipv4/). ### Limitations @@ -239,7 +236,7 @@ the array values: 4. _Transpose_: reverse the strides, changing the meaning of each coordinate. Example: transposing our array changes the strides from `(6, 2, 1)` to - `(1, 2, 6)`. What we used to access with the 3rd coordinate, now we access + `(1, 2, 6)`. What we used to access with the 3rd coordinate, we now access with the 1st coordinate. On a 2D array, this would have the effect of swapping rows and columns (transposing a matrix). 
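To make this concrete, here is a hedged sketch using a nested `ARRAY` literal
and the `transpose()` function (both are assumed to behave as described in the
array function reference):

```questdb-sql
-- A 2x3 matrix transposed into a 3x2 one; only shape and strides change.
SELECT transpose(ARRAY[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]);
-- expected result: [[1.0,4.0],[2.0,5.0],[3.0,6.0]]
```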
From 3dcf9650e70166f6d6ed52112131cf88fff44d5d Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Mon, 31 Mar 2025 16:48:20 +0200 Subject: [PATCH 06/51] Move array docs to Concepts --- documentation/concept/array.md | 202 +++++++++++++++++++++++ documentation/reference/sql/datatypes.md | 94 ----------- 2 files changed, 202 insertions(+), 94 deletions(-) create mode 100644 documentation/concept/array.md diff --git a/documentation/concept/array.md b/documentation/concept/array.md new file mode 100644 index 00000000..0071fbe0 --- /dev/null +++ b/documentation/concept/array.md @@ -0,0 +1,202 @@ +--- +title: N-Dim array +sidebar_label: N-Dimensional array +description: Explains the technical design and syntax to use N-dimensional arrays. +--- + +QuestDB supports the N-dimensional array type. Its design matches that of the +`NDArray` type in NumPy, which has become the de-facto standard for handling +N-dimensional data. In order to effectively use arrays in QuestDB, you should +understand the basic design principle behind it. + +The physical layout of the N-dimensional array is a single memory block with +values arranged in the _row-major_ order, where the coordinates of the adjacent +elements differ in the rightmost coordinate first (much like the adjacent +numbers differ in the rightmost digit first: 41, 42, 43, etc.) + +Separately, there are two lists of integers that describe this block of values, +and give it its N-dimensional appearance: _shape_ and _strides_. Both have +length equal to the number of dimensions. + +- the numbers in _shape_ tell the length along each dimension -- the range of +values you can use as a coordinate for that dimension +- the numbers in _strides_ tell how far apart are adjacent elements along that +dimension + +Here's a visual example of a 3-dimensional array of type `DOUBLE[2][3][2]`: + +```text +dim 1: |. . . . . .|. . . . . .| -- stride = 6, len = 2 +dim 2: |. .|. .|. .|. .|. .|. .| -- stride = 2, len = 3 +dim 3: |.|.|.|.|.|.|.|.|.|.|.|.| -- stride = 1, len = 2 +``` + +The dots are the individual values (`DOUBLE` numbers in our case). Each row +shows the whole array, but with different subdivisions according to the +dimension. So, in `dim 1`, the row is divided into two slots, the sub-arrays at +coordinates 1 and 2 along that dimension, and the distance between the start of +slot 1 and slot 2 is equal to 6 (the stride for that dimension). In `dim 3`, +each slots contains an individual number, and the stride is 1. + +The legal values for the coordinate in `dim 3` are just 1 and 2, even though you +would be able to access any array element just by using a large-enough number. +This is how the flat array gets its 3-dimensional appearance: for each value, +there's a unique list of coordinates, `[i, j, k]`, that addresses it. + +The relevance of all this to you as the user is that QuestDB can perform all of +the following operations cheaply, by editing just the two small lists, `shape` +and `strides`, and doing nothing to the potentially huge block of memory holding +the array values: + +1. _Slice_: extract a 3-dimensional array that is just a part of the full one, + by constraining the range of legal coordinates at each dimension. Example: + `array[1:2, 2:4, 1:2]` will give us a view into the array with the shape + `DOUBLE[1, 2, 1]`, covering just the ranges of coordinates indicated in the + expression. + +2. _Take a sub-array_: constrain the coordinate at a given dimension to just one + choice, and then eliminate that dimension from the array. 
Example:
   `array[2]` has the shape `DOUBLE[3, 2]` and consists of the second subarray
   in the 1st dimension.

3. _Flatten_: remove a dimension from the array, flattening it into the
   next-finer dimension. Example: flattening `dim 2` gives us an array shape
   `DOUBLE[2, 6]`. All elements are still available, but using just 2
   coordinates.

4. _Transpose_: reverse the strides, changing the meaning of each coordinate.
   Example: transposing our array changes the strides from `(6, 2, 1)` to
   `(1, 2, 6)`. What we used to access with the 3rd coordinate, we now access
   with the 1st coordinate. On a 2D array, this would have the effect of
   swapping rows and columns (transposing a matrix).

:::note

QuestDB does not currently support the `flatten` operation.

:::

## Importance of the "vanilla" array shape

QuestDB stores the _shape_ along with the array. However, it has no need to
store _strides_: they can be calculated from the shape. Strides become relevant
once you perform one of the mentioned array shape transformations. We say that
an array whose shape hasn't been transformed (that is, it matches the physical
arrangement of elements) is a _vanilla_ array, and this has consequences for
performance. A vanilla array can be processed by optimized bulk operations that
go over the entire block of memory, disregarding the shape and strides, whereas
for any other array we have to step through all the coordinates one by one and
calculate the position of each element.

So, while performing a shape transformation is cheap on its own, whole-array
operations on transformed arrays, such as equality checks, adding/multiplying
two arrays, etc., are expected to be slower than on vanilla arrays.

QuestDB always stores arrays in vanilla form. Even if you transform an array's
shape and then store the result to the database, it will be stored in vanilla
form.

## Array access syntax

We model our N-dimensional array access syntax on Python's `NDArray`, except
that we inherit 1-based indexing from SQL. This is the syntax:

```questdb-sql
arr[<dim1-selector>, <dim2-selector>, ...]
```

Each `dimN-selector` can be one of two forms:

- single integer
- range in the form `low:high`

### Single-integer array selector

Using single integers, you select individual array elements. An element of a
2D array is a 1D sub-array, and an element of a 1D array is an individual
scalar value, like a `DOUBLE`.

All the following examples use the 3D array named `arr`, of type
`DOUBLE[3][3][3]`.

**Example:** select a number.

```questdb-sql
arr[1, 3, 2]
```

This selects the `DOUBLE` number at the coordinates (1, 3, 2). Remember that the
coordinates are 1-based!

**Example:** select a 2D sub-array.

```questdb-sql
arr[1]
```

This selects the first 2D sub-array in `arr`.

**Example:** select a 1D sub-array.

```questdb-sql
arr[1, 3]
```

This selects the first 2D sub-array in `arr`, and then the 3rd 1D sub-array
within it. You can also write

```questdb-sql
arr[1][3]
```

Semantically, this is two operations, like this:

```questdb-sql
(arr[1]) [3]
```

However, all of these expressions perform the same.

### Range selector - slicing

A range of integers selects a slice of the array. The dimensionality of the
result remains the same, even if the range contains just one number.

**Example:** select a slice of `arr` by constraining the first dimension.
+ +```questdb-sql +arr[2:3] +``` + +This returns a `DOUBLE[1][3][3]`, containing just the second sub-array of `arr`. + +**Example:** select a slice of `arr` by constraining the first and second dimensions. + +```questdb-sql +arr[2:3, 3:4] +``` + +This returns a `DOUBLE[1][1][3]`. + +### Mixing selectors + +You can use both types of selectors within the same bracket expression. + +**Example:** select the first sub-array of `arr`, and slice it. + +```questdb-sql +arr[1, 2:4] +``` + +This returns a `DOUBLE[2][3]`. The top dimension is gone because the first +selector took out a sub-array and not a one-element slice. + +**Example:** select discontinuous elements from sub-arrays. + +```questdb-sql +arr[1:4, 3, 2] +``` + +This leaves the top dimension unconstrained, then takes the 3rd sub-array in +each of the top-level sub-arrays, and then selects just the 2nd element in each +of them. diff --git a/documentation/reference/sql/datatypes.md b/documentation/reference/sql/datatypes.md index de2f7a89..2953efbe 100644 --- a/documentation/reference/sql/datatypes.md +++ b/documentation/reference/sql/datatypes.md @@ -171,97 +171,3 @@ You cannot auto-create an IPv4 column using the InfluxDB Line Protocol, since it doesn't support this type explicitly. The QuestDB server cannot distinguish between string and IPv4 data. However, you can insert IPv4 data into a pre-existing IPv4 column by sending IPs as strings. - -## N-Dimensional Array - -QuestDB supports the N-dimensional array type. Its design matches that of the -`NDArray` type in NumPy, which has become the de-facto standard for handling -N-dimensional data. In order to effectively use arrays in QuestDB, you should -understand the basic design principle behind it. - -The physical layout of the N-dimensional array is a single memory block with -values arranged in the _row-major_ order, where the coordinates of the adjacent -elements differ in the rightmost coordinate first (much like the adjacent -numbers differ in the rightmost digit first: 41, 42, 43, etc.) - -Separately, there are two lists of integers that describe this block of values, -and give it its N-dimensional appearance: _shape_ and _strides_. Both have -length equal to the number of dimensions. - -- the numbers in _shape_ tell the length along each dimension -- the range of -values you can use as a coordinate for that dimension -- the numbers in _strides_ tell how far apart are adjacent elements along that -dimension - -Here's a visual example of a 3-dimensional array of type `DOUBLE[2][3][2]`: - -```text -dim 1: |. . . . . .|. . . . . .| -- stride = 6, len = 2 -dim 2: |. .|. .|. .|. .|. .|. .| -- stride = 2, len = 3 -dim 3: |.|.|.|.|.|.|.|.|.|.|.|.| -- stride = 1, len = 2 -``` - -The dots are the individual values (`DOUBLE` numbers in our case). Each row -shows the whole array, but with different subdivisions according to the -dimension. So, in `dim 1`, the row is divided into two slots, the sub-arrays at -coordinates 1 and 2 along that dimension, and the distance between the start of -slot 1 and slot 2 is equal to 6 (the stride for that dimension). In `dim 3`, -each slots contains an individual number, and the stride is 1. - -The legal values for the coordinate in `dim 3` are just 1 and 2, even though you -would be able to access any array element just by using a large-enough number. -This is how the flat array gets its 3-dimensional appearance: for each value, -there's a unique list of coordinates, `[i, j, k]`, that addresses it. 
- -The relevance of all this to you as the user is that QuestDB can perform all of -the following operations cheaply, by editing just the two small lists, `shape` -and `strides`, and doing nothing to the potentially huge block of memory holding -the array values: - -1. _Slice_: extract a 3-dimensional array that is just a part of the full one, - by constraining the range of legal coordinates at each dimension. Example: - `array[1:2, 2:4, 1:2]` will give us a view into the array with the shape - `DOUBLE[1, 2, 1]`, covering just the ranges of coordinates indicated in the - expression. - -2. _Take a sub-array_: constrain the coordinate at a given dimension to just one - choice, and then eliminate that dimension from the array. Example: - `array[2]` has the shape `DOUBLE[3, 2]` and consists of the second subarray - in the 1st dimension. - -3. _Flatten_: remove a dimension from the array, flattening it into the - next-finer dimension. Example: flattening `dim 2` gives us an array shape - `DOUBLE[2, 6]`. All elements are still available, but using just 2 - coordinates. - -4. _Transpose_: reverse the strides, changing the meaning of each coordinate. - Example: transposing our array changes the strides from `(6, 2, 1)` to - `(1, 2, 6)`. What we used to access with the 3rd coordinate, we now access - with the 1st coordinate. On a 2D array, this would have the effect of - swapping rows and columns (transposing a matrix). - -:::note - -QuestDB does not currently support the `flatten` operation. - -::: - -### The importance of the "vanilla" array shape - -QuestDB stores the _shape_ along with the array. However, it has no need to -store _strides_: they can be calculated from the shape. Strides become relevant -once you perform one of the mentioned array shape transformations. We say that -an array whose shape hasn't been transformed (that is, it matches the physical -arrangement of elements) is a _vanilla_ array, and this has consequences for -performance. A vanilla array can be processed by optimized bulk operations that -go over the entire block of memory, disregarding the shape and strides, whereas -for any other array we have to step through all the coordinates one by one and -calculate the position of each element. - -So, while performing a shape transformation is cheap on its own, whole-array -operations on transformed arrays, such as equality checks, adding/multiplying -two arrays, etc., are expected to be slower than on vanilla arrays. - -QuestDB always stores arrays in vanilla form. Even if you transform an array's -shape and then store the result to the database, it will be stored in vanilla -form. From 91f1a53ca8091d6c19c515f1d02eb71640bcd555 Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Wed, 9 Apr 2025 10:47:45 +0200 Subject: [PATCH 07/51] Remove outdated note --- documentation/concept/array.md | 6 ------ 1 file changed, 6 deletions(-) diff --git a/documentation/concept/array.md b/documentation/concept/array.md index 0071fbe0..de0b91f2 100644 --- a/documentation/concept/array.md +++ b/documentation/concept/array.md @@ -70,12 +70,6 @@ the array values: with the 1st coordinate. On a 2D array, this would have the effect of swapping rows and columns (transposing a matrix). -:::note - -QuestDB does not currently support the `flatten` operation. - -::: - ## Importance of the "vanilla" array shape QuestDB stores the _shape_ along with the array. 
However, it has no need to From 3b68314620f71218a0b34d9c5ac4b7b532e3de2a Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Tue, 13 May 2025 13:34:52 +0200 Subject: [PATCH 08/51] Document dim_length() and out-of-bounds access --- documentation/concept/array.md | 37 +++++++++++++++++++++++++++++----- 1 file changed, 32 insertions(+), 5 deletions(-) diff --git a/documentation/concept/array.md b/documentation/concept/array.md index de0b91f2..d1466fff 100644 --- a/documentation/concept/array.md +++ b/documentation/concept/array.md @@ -35,8 +35,8 @@ The dots are the individual values (`DOUBLE` numbers in our case). Each row shows the whole array, but with different subdivisions according to the dimension. So, in `dim 1`, the row is divided into two slots, the sub-arrays at coordinates 1 and 2 along that dimension, and the distance between the start of -slot 1 and slot 2 is equal to 6 (the stride for that dimension). In `dim 3`, -each slots contains an individual number, and the stride is 1. +slot 1 and slot 2 is 6 (the stride for that dimension). In `dim 3`, each slot +contains an individual number, and the stride is 1. The legal values for the coordinate in `dim 3` are just 1 and 2, even though you would be able to access any array element just by using a large-enough number. @@ -86,9 +86,9 @@ So, while performing a shape transformation is cheap on its own, whole-array operations on transformed arrays, such as equality checks, adding/multiplying two arrays, etc., are expected to be slower than on vanilla arrays. -QuestDB always stores arrays in vanilla form. Even if you transform an array's -shape and then store the result to the database, it will be stored in vanilla -form. +QuestDB always stores arrays in vanilla form. When you transform an array's +shape and then store the resulting array to the database, it will be stored in +vanilla form. ## Array access syntax @@ -130,6 +130,12 @@ arr[1] This selects the first 2D sub-array in `arr`. +:::note + +If you use an index larger than the array length, the result will be `NULL`. + +::: + **Example:** select a 1D sub-array. ```questdb-sql @@ -172,6 +178,14 @@ arr[2:3, 3:4] This returns a `DOUBLE[1][1][3]`. +:::note + +The slice's upper bound can be larger than the length of the array. The +result will be the same as if the upper bound was left out — the slice +will include all the elements up to the end along that dimension. + +::: + ### Mixing selectors You can use both types of selectors within the same bracket expression. @@ -194,3 +208,16 @@ arr[1:4, 3, 2] This leaves the top dimension unconstrained, then takes the 3rd sub-array in each of the top-level sub-arrays, and then selects just the 2nd element in each of them. + +## Find the length along a dimension + +Use the function `dim_length()` to get the length of the array along a specific +dimension. + +**Example:** get the length of `arr` along the 1st dimension. + +```questdb-sql +dim_length(arr, 1) +``` + +For an array of shape `DOUBLE[3][5]`, this will return `3`. 
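As a usage sketch (the table `tango` and its array column `arr` are
hypothetical):

```questdb-sql
SELECT dim_length(arr, 1) AS dim1_len, dim_length(arr, 2) AS dim2_len
FROM tango
LIMIT 1;
```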
From 3f7f7f7859e529643cd27f7a1b74935a42dbf23a Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Tue, 20 May 2025 16:24:59 +0200 Subject: [PATCH 09/51] New page for array functions --- documentation/sidebars.js | 110 +++++++++++++++++++------------------- 1 file changed, 56 insertions(+), 54 deletions(-) diff --git a/documentation/sidebars.js b/documentation/sidebars.js index b5c7cd9e..0a84dfd8 100644 --- a/documentation/sidebars.js +++ b/documentation/sidebars.js @@ -253,46 +253,46 @@ module.exports = { type: "category", label: "CREATE", items: [ - "reference/sql/create-table", - { - id: "reference/sql/acl/create-group", - type: "doc", - customProps: { tag: "Enterprise" }, - }, - "reference/sql/create-mat-view", - { - id: "reference/sql/acl/create-service-account", - type: "doc", - customProps: { tag: "Enterprise" }, - }, - { - id: "reference/sql/acl/create-user", - type: "doc", - customProps: { tag: "Enterprise" }, - } + "reference/sql/create-table", + { + id: "reference/sql/acl/create-group", + type: "doc", + customProps: { tag: "Enterprise" }, + }, + "reference/sql/create-mat-view", + { + id: "reference/sql/acl/create-service-account", + type: "doc", + customProps: { tag: "Enterprise" }, + }, + { + id: "reference/sql/acl/create-user", + type: "doc", + customProps: { tag: "Enterprise" }, + } ] }, { type: "category", label: "DROP", items: [ - "reference/sql/drop", - { - id: "reference/sql/acl/drop-group", - type: "doc", - customProps: { tag: "Enterprise" }, - }, - "reference/sql/drop-mat-view", - { - id: "reference/sql/acl/drop-service-account", - type: "doc", - customProps: { tag: "Enterprise" }, - }, - { - id: "reference/sql/acl/drop-user", - type: "doc", - customProps: { tag: "Enterprise" }, - }, + "reference/sql/drop", + { + id: "reference/sql/acl/drop-group", + type: "doc", + customProps: { tag: "Enterprise" }, + }, + "reference/sql/drop-mat-view", + { + id: "reference/sql/acl/drop-service-account", + type: "doc", + customProps: { tag: "Enterprise" }, + }, + { + id: "reference/sql/acl/drop-user", + type: "doc", + customProps: { tag: "Enterprise" }, + }, ] }, "reference/sql/explain", @@ -305,16 +305,16 @@ module.exports = { type: "category", label: "GRANT", items: [ - { - id: "reference/sql/acl/grant", - type: "doc", - customProps: { tag: "Enterprise" }, - }, - { - id: "reference/sql/acl/grant-assume-service-account", - type: "doc", - customProps: { tag: "Enterprise" }, - }, + { + id: "reference/sql/acl/grant", + type: "doc", + customProps: { tag: "Enterprise" }, + }, + { + id: "reference/sql/acl/grant-assume-service-account", + type: "doc", + customProps: { tag: "Enterprise" }, + }, ] }, "reference/sql/insert", @@ -330,16 +330,16 @@ module.exports = { type: "category", label: "REVOKE", items: [ - { - id: "reference/sql/acl/revoke", - type: "doc", - customProps: { tag: "Enterprise" }, - }, - { - id: "reference/sql/acl/revoke-assume-service-account", - type: "doc", - customProps: { tag: "Enterprise" }, - }, + { + id: "reference/sql/acl/revoke", + type: "doc", + customProps: { tag: "Enterprise" }, + }, + { + id: "reference/sql/acl/revoke-assume-service-account", + type: "doc", + customProps: { tag: "Enterprise" }, + }, ] }, { @@ -377,6 +377,7 @@ module.exports = { label: "Functions", items: [ "reference/function/aggregation", + "reference/function/array", "reference/function/binary", "reference/function/boolean", "reference/function/conditional", @@ -533,6 +534,7 @@ module.exports = { "concept/deduplication", "concept/designated-timestamp", "concept/geohashes", + "concept/array", 
"concept/indexes", "concept/interval-scan", "concept/jit-compiler", From 77ad0e937b1034cddebce166266604a8937099e2 Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Tue, 20 May 2025 16:25:05 +0200 Subject: [PATCH 10/51] Update sidebars --- documentation/reference/function/array.md | 60 +++++++++++++++++++++++ 1 file changed, 60 insertions(+) create mode 100644 documentation/reference/function/array.md diff --git a/documentation/reference/function/array.md b/documentation/reference/function/array.md new file mode 100644 index 00000000..8df9d986 --- /dev/null +++ b/documentation/reference/function/array.md @@ -0,0 +1,60 @@ +--- +title: Array functions +sidebar_label: Array +description: Array functions reference documentation. +--- + +This page documents functions for n-dimensional arrays. This isn't an exhaustive +list of all functions that may take an array parameter. For example, financial +functions are listed in [their own section](/docs/reference/finance), whether or +not they can take an array parameter. + +## dim_length + +`dim_length(array, dim)` returns the length of the n-dimensional array along +dimension `dim`. + +### Parameters + +- `array` — the array +- `dim` — the dimension (1-based) whose length to get + +## flatten + +`flatten(array, dim)` removes the dimension `dim` from the array, flattening it +into the next dimension. All elements are still available because the next +dimension's length gets multiplied by the removed dimension's length. + +Example: given an array of shape `DOUBLE[2][3][2]`, `flatten(array, 2)` gives us +an array of shape `DOUBLE[2, 6]`. The second dimension is gone, and the third +dimension's length increased from 2 to 3 \* 2 = 6. + +### Parameters + +- `array` — the array +- `dim` — the dimension (1-based) to flatten. Cannot be the last dimension. + +## matmul + +`matmul(left_matrix, right_matrix)` performs matrix multiplication. This is an +operation from linear algebra. + +A matrix is represented as a 2D array. We call the first matrix coordinate "row" +and the second one "column". + +`left_matrix`'s number of columns (its dimension 2) must be equal to +`right_matrix`'s number of rows (its dimension 1). + +The resulting matrix has the same number of rows as `left_matrix` and the same +number of columns as `right_matrix`. The value at every (row, column) position +in the result is equal to the sum of products of matching elements in the +corresponding row of `left_matrix` and column of `right_matrix`: + +`result[row, col] := sum_over_i(left_matrix[row, i] * right_matrix[i, col])` + +## transpose + +`transpose(array)` transposes an array, reversing the order of its coordinates. +Example: given `array: DOUBLE[2, 5]`, `result = transpose(array)` returns an +array of shape `DOUBLE[5, 2]`, such that `array[i, j] = result[j, i]` for any +valid `i` and `j` coordinates. From 1e4ab1e3745b077aaa8fbd403c5a08c175a8cee7 Mon Sep 17 00:00:00 2001 From: Jaromir Hamala Date: Wed, 21 May 2025 11:58:13 +0200 Subject: [PATCH 11/51] broken link fixed --- documentation/reference/function/array.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/documentation/reference/function/array.md b/documentation/reference/function/array.md index 8df9d986..30cc9bcb 100644 --- a/documentation/reference/function/array.md +++ b/documentation/reference/function/array.md @@ -6,7 +6,7 @@ description: Array functions reference documentation. This page documents functions for n-dimensional arrays. 
This isn't an exhaustive list of all functions that may take an array parameter. For example, financial -functions are listed in [their own section](/docs/reference/finance), whether or +functions are listed in [their own section](/docs/reference/function/finance/), whether or not they can take an array parameter. ## dim_length From cf1d39df825d010470cd991d10b2670dd6318c9a Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Wed, 21 May 2025 11:59:35 +0200 Subject: [PATCH 12/51] Proper SQL examples with results in Array Concept --- documentation/concept/array.md | 170 +++++++++++++++++++++++---------- 1 file changed, 117 insertions(+), 53 deletions(-) diff --git a/documentation/concept/array.md b/documentation/concept/array.md index d1466fff..0ca5f57a 100644 --- a/documentation/concept/array.md +++ b/documentation/concept/array.md @@ -1,6 +1,6 @@ --- -title: N-Dim array -sidebar_label: N-Dimensional array +title: N-Dimensional array +sidebar_label: N-Dim array description: Explains the technical design and syntax to use N-dimensional arrays. --- @@ -106,118 +106,182 @@ Each `dimN-selector` can be one of two forms: ### Single-integer array selector -Using single integers you select individual array elements. An element of a -2D array is a 1D sub-array, and an element of a 1D array is an individual -scalar value, like a `DOUBLE`. +Using single integers you select individual array elements. An element of a 2D +array is a 1D sub-array, and an element of a 1D array is an individual scalar +value, like a `DOUBLE`. If you use a coordinate larger than the array's given +dimension length, the result will be `NULL` for scalars, and an empty array for +sub-arrays. All the following examples use the 3D array named `arr`, of type -`DOUBLE[3][3][3]`. +`DOUBLE[3][3][3]`: -**Example:** select a number. +```questdb-sql +CREATE TABLE tango AS (SELECT ARRAY[ + [ [ 1, 2, 3], [ 4, 5, 6], [ 7, 8, 9] ], + [ [10, 11, 12], [13, 14, 15], [16, 17, 18] ], + [ [19, 20, 21], [22, 23, 24], [25, 26, 27] ] +] arr from long_sequence(1)); +``` + +**Example:** select a number from the array. ```questdb-sql -arr[1, 3, 2] +SELECT arr[1, 3, 2] elem FROM tango; ``` -This selects the `DOUBLE` number at the coordinates (1, 3, 2). Remember that the +| elem | +| ---- | +| 8.0 | + +This selected the `DOUBLE` number at the coordinates (1, 3, 2). Remember that the coordinates are 1-based! -**Example:** select a 2D sub-array. +**Example:** select an out-of-range element from the array. ```questdb-sql -arr[1] +SELECT arr[1, 3, 4] elem FROM tango; ``` -This selects the first 2D sub-array in `arr`. +| elem | +| ---- | +| NULL | -:::note +**Example:** select a 2D sub-array. -If you use an index larger than the array length, the result will be `NULL`. +```questdb-sql +SELECT arr[1] subarr FROM tango; +``` -::: +| subarr | +| ------------------------------------------- | +| {{1.0,2.0,3.0},{4.0,5.0,6.0},{7.0,8.0,9.0}} | -**Example:** select a 1D sub-array. +This selected the first 2D sub-array in `arr`. + +**Example:** select a sub-array that is out-of-range. ```questdb-sql -arr[1, 3] +SELECT arr[4] subarr FROM tango; ``` -This selects the first 2D-subarray in `arr`, and then the 3rd 1D-subarray in -it. You can also write +| subarr | +| ------ | +| {} | + +**Example:** select a 1D sub-array. 
```questdb-sql -arr[1][3] +SELECT arr[1, 3] subarr FROM tango; ``` -Semantically, this is two operations, like this: +| subarr | +| ------------- | +| {7.0,8.0,9.0} | -```questdb-sql -(arr[1]) [3] -``` +This selected the first 2D-subarray in `arr`, and then the 3rd 1D-subarray in +it. -However, the performance of all expressions is the same. +You can also write `arr[1][3]`. Semantically, this is two operations, like this: +`(arr[1]) [3]`. However, the performance of all three expressions is the same. ### Range selector - slicing -A range of integers selects a slice of the array. The dimensionality of the -result remains the same, even if the range contains just one number. +A range of integers selects a slice of the array. You can think of slicing as +leaving the array intact, but constraining the range of numbers you can use for +a coordinate. The lowest valid coordinate remains `1`, but it gets remapped to +the coordinate indicated by the lower bound of the slicing range. + +The dimensionality of the result remains the same, even if the range contains +just one number. The slice includes the lower bound, but excludes the upper +bound. If the upper bound of the range exceeds the array's length, the result +is the same as if the upper bound was left out — the result extends to the +end of the array along that dimension. **Example:** select a slice of `arr` by constraining the first dimension. ```questdb-sql -arr[2:3] +SELECT arr[2:3] slice FROM tango; ``` -This returns a `DOUBLE[1][3][3]`, containing just the second sub-array of `arr`. +| slice | +| ------------------------------------------------------ | +| {{{10.0,11.0,12.0},{13.0,14.0,15.0},{16.0,17.0,18.0}}} | -**Example:** select a slice of `arr` by constraining the first and second dimensions. +This returned a `DOUBLE[1][3][3]`, containing just the second sub-array of +`arr`. + +You can omit the upper bound, letting the slice extend to the end of the array. + +**Example:** select a slice of `arr` with a right-open range. ```questdb-sql -arr[2:3, 3:4] +SELECT arr[2:] slice FROM tango; ``` -This returns a `DOUBLE[1][1][3]`. +| slice | +| --------------------------------------------------------- | +| {{{10.0,11.0,12.0},{13.0,14.0,15.0},{16.0,17.0,18.0}}, | +| {{19.0,20.0,21.0},{22.0,23.0,24.0},{25.0,26.0,27.0}}} | + +This returns a `DOUBLE[2][3][3]` and contains everything except the first +sub-array along the first dimension. :::note -The slice's upper bound can be larger than the length of the array. The -result will be the same as if the upper bound was left out — the slice -will include all the elements up to the end along that dimension. +You cannot omit the lower bound. The expression `:3` would conflict with the +syntax for a variable placeholder in SQL. ::: -### Mixing selectors +**Example:** select a slice of `arr` by constraining the first and second dimensions. -You can use both types of selectors within the same bracket expression. +```questdb-sql +SELECT arr[2:3, 3:4] slice FROM tango; +``` -**Example:** select the first sub-array of `arr`, and slice it. +| slice | +| ---------------------- | +| {{{16.0,17.0,18.0}}} | + +**Example:** select a slice of `arr` with large upper bounds. ```questdb-sql -arr[1, 2:4] +SELECT arr[2:100, 3:100] slice FROM tango; ``` -This returns a `DOUBLE[2][3]`. The top dimension is gone because the first -selector took out a sub-array and not a one-element slice. 
+| slice | +| ----------------------------------------- | +| {{{16.0,17.0,18.0}},{{25.0,26.0,27.0}}} | -**Example:** select discontinuous elements from sub-arrays. +The result is the same same as if using `arr[2:, 3:]`. + +### Mixing selectors + +You can use both types of selectors within the same bracket expression. + +**Example:** select the first sub-array of `arr`, and slice it. ```questdb-sql -arr[1:4, 3, 2] +SELECT arr[1, 2:4] subarr FROM tango; ``` -This leaves the top dimension unconstrained, then takes the 3rd sub-array in -each of the top-level sub-arrays, and then selects just the 2nd element in each -of them. - -## Find the length along a dimension +| subarr | +| ------------------------------- | +| {{4.0,5.0,6.0},{7.0,8.0,9.0}} | -Use the function `dim_length()` to get the length of the array along a specific -dimension. +This returned a `DOUBLE[2][3]`. The top dimension is gone because the first +selector took out a sub-array and not a one-element slice. -**Example:** get the length of `arr` along the 1st dimension. +**Example:** select discontinuous elements from sub-arrays. ```questdb-sql -dim_length(arr, 1) +SELECT arr[1:, 3, 2] subarr FROM tango; ``` -For an array of shape `DOUBLE[3][5]`, this will return `3`. +| subarr | +| ----------------- | +| {8.0,17.0,26.0} | + +This left the top dimension unconstrained, then took the 3rd sub-array in +each of the top-level sub-arrays, and then selected just the 2nd element in each +of them. From 3d5600c30f7f8f50a97fde6266271c9ee9508cb2 Mon Sep 17 00:00:00 2001 From: Jaromir Hamala Date: Wed, 21 May 2025 13:41:51 +0200 Subject: [PATCH 13/51] execute many --- documentation/pgwire/python.md | 78 ++++++++++++++++++++++++++++++++++ 1 file changed, 78 insertions(+) diff --git a/documentation/pgwire/python.md b/documentation/pgwire/python.md index 719574f5..47e3690e 100644 --- a/documentation/pgwire/python.md +++ b/documentation/pgwire/python.md @@ -78,9 +78,18 @@ async def connect_to_questdb(): await conn.close() +// Set the timezone to UTC +os.environ['TZ'] = 'UTC' +time.tzset() asyncio.run(connect_to_questdb()) ``` +:::note +**Note**: The `asyncpg` client uses the system timezone by default. QuestDB always sends timestamp in UTC. +To set the timezone to UTC, you can set the `TZ` environment variable before running your script. +This is important for time-series data to ensure consistent timestamps. +::: + ### Querying Data asyncpg provides several methods for fetching data: @@ -131,6 +140,8 @@ async def query_with_asyncpg(): await conn.close() +os.environ['TZ'] = 'UTC' +time.tzset() asyncio.run(query_with_asyncpg()) ``` @@ -174,6 +185,8 @@ async def stream_with_cursor(): await conn.close() print(f"Finished processing {total_processed} total rows") +os.environ['TZ'] = 'UTC' +time.tzset() asyncio.run(stream_with_cursor()) ``` @@ -202,6 +215,8 @@ async def connection_pool_example(): await pool.close() +os.environ['TZ'] = 'UTC' +time.tzset() asyncio.run(connection_pool_example()) ``` @@ -243,9 +258,72 @@ async def parameterized_query(): await conn.close() +os.environ['TZ'] = 'UTC' +time.tzset() asyncio.run(parameterized_query()) ``` +### Batch Inserts with `executemany()` +While we recommend using the [InfluxDB Line Protocol (ILP)](/docs/ingestion-overview/) for ingestion, you can also use +the `executemany()` method to insert multiple rows in a single query. It is highly efficient for executing the same +parameterized statements multiple times with different sets of data. 
This method is significantly faster than executing
individual statements in a loop because it reduces network round-trips and allows for potential batching optimizations
by the database. In our testing, we found that `executemany()` is about 10x-100x faster than using a loop with `execute()`.

It's particularly useful for bulk data insertion, such as recording multiple trades as they occur. INSERT performance
grows with the batch size, so you should experiment with the batch size to find the optimal value for your use case. We
recommend starting with a batch size of 1000 and adjusting it based on your performance requirements.

```python
import asyncio
import os
import time

import asyncpg
from datetime import datetime, timedelta

async def execute_many_market_data_example():

    conn = await asyncpg.connect(
        host='127.0.0.1',
        port=8812,
        user='admin',
        password='quest',
        database='qdb'
    )

    # Note: volume is DOUBLE (not LONG) because the sample volumes
    # below are fractional, e.g. 0.5 or 2.1.
    await conn.execute("""
        CREATE TABLE IF NOT EXISTS trades (
            ts TIMESTAMP,
            symbol SYMBOL,
            price DOUBLE,
            volume DOUBLE,
            exchange SYMBOL
        ) timestamp(ts) PARTITION BY DAY;
    """)

    base_timestamp = datetime.now()

    trades_data = [
        ( (base_timestamp + timedelta(microseconds=10)), 'BTC-USD', 68500.50, 0.5, 'Coinbase'),
        ( (base_timestamp + timedelta(microseconds=20)), 'ETH-USD', 3800.20, 2.1, 'Kraken'),
        ( (base_timestamp + timedelta(microseconds=30)), 'BTC-USD', 68501.75, 0.25, 'Binance'),
        ( (base_timestamp + timedelta(microseconds=40)), 'SOL-USD', 170.80, 10.5, 'Coinbase'),
        ( (base_timestamp + timedelta(microseconds=50)), 'ETH-USD', 3799.90, 1.5, 'Binance'),
    ]

    await conn.executemany("""
        INSERT INTO trades (ts, symbol, price, volume, exchange)
        VALUES ($1, $2, $3, $4, $5)
    """, trades_data)
    print(f"Successfully inserted {len(trades_data)} trade records using executemany.")

    await conn.close()

os.environ['TZ'] = 'UTC'
time.tzset()
asyncio.run(execute_many_market_data_example())
```

### Binary Protocol

asyncpg uses the binary protocol by default, which improves performance by avoiding text encoding/decoding for data
From 589c2ac25efb2d7135311e883808bbedf32aa68a Mon Sep 17 00:00:00 2001
From: Jaromir Hamala
Date: Wed, 21 May 2025 14:10:55 +0200
Subject: [PATCH 14/51] inserting arrays with asyncpg

---
 documentation/pgwire/python.md | 103 +++++++++++++++++++++++++++++++++
 1 file changed, 103 insertions(+)

diff --git a/documentation/pgwire/python.md b/documentation/pgwire/python.md
index 47e3690e..5c4b55df 100644
--- a/documentation/pgwire/python.md
+++ b/documentation/pgwire/python.md
@@ -323,6 +323,109 @@ time.tzset()
 asyncio.run(execute_many_market_data_example())
 ```

### Inserting Arrays

QuestDB, via the PostgreSQL wire protocol, supports [array data types](/docs/concept/array/), including multidimensional
arrays. asyncpg makes working with these arrays straightforward by automatically converting Python lists (or lists of lists
for multidimensional arrays) into the appropriate PostgreSQL array format and vice-versa when fetching data.

:::caution asyncpg & QuestDB Arrays: Manual Setup Needed
Using array types with asyncpg against QuestDB requires manual type registration.

asyncpg's standard type introspection query is not yet supported by QuestDB. Therefore, you must manually register a "codec"
for your array types with the asyncpg connection, as shown in the preceding code example. This ensures correct array
handling and avoids errors.

This is a temporary workaround until a permanent solution is available in asyncpg or QuestDB.
+::: + +When you need to insert multiple rows containing array data, such as a series of order book snapshots, `executemany()` +offers a more performant way to do so compared to inserting row by row with execute(). + +```python title="Batch Inserting L3 Order Book Snapshots" +import asyncio +import os +import time + +import asyncpg +from datetime import datetime, timedelta + + +async def batch_insert_l3_order_book_arrays(): + conn = await asyncpg.connect( + host='127.0.0.1', + port=8812, + user='admin', + password='quest', + database='qdb' + ) + + # Workaround for asyncpg using introspection to determine array types. + # The introspection query uses a construct that is not supported by QuestDB. + # This is a temporary workaround before this PR or its equivalent is merged to asyncpg: + # https://github.com/MagicStack/asyncpg/pull/1260 + arrays = [{ + 'oid': 1022, + 'elemtype': 701, + 'kind': 'b', + 'name': '_float8', + 'elemtype_name': 'float8', + 'ns': 'pg_catalog', + 'elemdelim': ',', + 'depth': 0, + 'range_subtype': None, + 'attrtypoids': None, + 'basetype': None + }] + conn._protocol.get_settings().register_data_types(arrays) + + await conn.execute(""" + CREATE TABLE IF NOT EXISTS l3_order_book + ( + bid DOUBLE [][], + ask DOUBLE [][], + ts TIMESTAMP + ) TIMESTAMP(ts) PARTITION BY DAY WAL; + """) + + # Prepare a list of L3 order book snapshots for batch insertion + snapshots_to_insert = [] + base_timestamp = datetime.now() + + # First row + bids1 = [[68500.50, 0.5], [68500.00, 1.2], [68499.50, 0.3]] + asks1 = [[68501.00, 0.8], [68501.50, 0.4], [68502.00, 1.1]] + ts1 = (base_timestamp + timedelta(seconds=1)) + snapshots_to_insert.append((bids1, asks1, ts1)) + + # Second row + bids2 = [[68502.10, 0.3], [68501.80, 0.9], [68501.20, 1.5]] + asks2 = [[68502.50, 1.1], [68503.00, 0.6], [68503.50, 0.2]] + ts2 = (base_timestamp + timedelta(seconds=2)) + snapshots_to_insert.append((bids2, asks2, ts2)) + + # Third row + bids3 = [[68490.60, 2.5], [68489.00, 3.2]] + asks3 = [[68491.20, 1.8], [68492.80, 0.7]] + ts3 = (base_timestamp + timedelta(seconds=3)) + snapshots_to_insert.append((bids3, asks3, ts3)) + + print(f"Prepared {len(snapshots_to_insert)} snapshots for batch insertion.") + + # Insert the snapshots into the database in a single batch + await conn.executemany( + """ + INSERT INTO l3_order_book (bid, ask, ts) + VALUES ($1, $2, $3) + """, + snapshots_to_insert # List of tuples, each tuple is a row + ) + print(f"Successfully inserted {len(snapshots_to_insert)} L3 order book snapshots using executemany().") + +os.environ['TZ'] = 'UTC' +time.tzset() +asyncio.run(batch_insert_l3_order_book_arrays()) +``` ### Binary Protocol From edeef250f2d714547502d692f3cdf30319709c55 Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Wed, 21 May 2025 14:51:06 +0200 Subject: [PATCH 15/51] Fix parsing errors, improve --- documentation/concept/array.md | 81 +++++++++++++++++++++------------- 1 file changed, 51 insertions(+), 30 deletions(-) diff --git a/documentation/concept/array.md b/documentation/concept/array.md index 0ca5f57a..66859a03 100644 --- a/documentation/concept/array.md +++ b/documentation/concept/array.md @@ -123,7 +123,9 @@ CREATE TABLE tango AS (SELECT ARRAY[ ] arr from long_sequence(1)); ``` -**Example:** select a number from the array. +### Example + +select a number from the array. ```questdb-sql SELECT arr[1, 3, 2] elem FROM tango; @@ -136,7 +138,9 @@ SELECT arr[1, 3, 2] elem FROM tango; This selected the `DOUBLE` number at the coordinates (1, 3, 2). 
Remember that the coordinates are 1-based! -**Example:** select an out-of-range element from the array. +### Example + +select an out-of-range element from the array. ```questdb-sql SELECT arr[1, 3, 4] elem FROM tango; @@ -146,7 +150,9 @@ SELECT arr[1, 3, 4] elem FROM tango; | ---- | | NULL | -**Example:** select a 2D sub-array. +### Example + +select a 2D sub-array. ```questdb-sql SELECT arr[1] subarr FROM tango; @@ -154,11 +160,13 @@ SELECT arr[1] subarr FROM tango; | subarr | | ------------------------------------------- | -| {{1.0,2.0,3.0},{4.0,5.0,6.0},{7.0,8.0,9.0}} | +| [[1.0,2.0,3.0],[4.0,5.0,6.0],[7.0,8.0,9.0]] | This selected the first 2D sub-array in `arr`. -**Example:** select a sub-array that is out-of-range. +### Example + +select a sub-array that is out-of-range. ```questdb-sql SELECT arr[4] subarr FROM tango; @@ -166,9 +174,11 @@ SELECT arr[4] subarr FROM tango; | subarr | | ------ | -| {} | +| [] | -**Example:** select a 1D sub-array. +### Example + +select a 1D sub-array. ```questdb-sql SELECT arr[1, 3] subarr FROM tango; @@ -176,7 +186,7 @@ SELECT arr[1, 3] subarr FROM tango; | subarr | | ------------- | -| {7.0,8.0,9.0} | +| [7.0,8.0,9.0] | This selected the first 2D-subarray in `arr`, and then the 3rd 1D-subarray in it. @@ -193,11 +203,20 @@ the coordinate indicated by the lower bound of the slicing range. The dimensionality of the result remains the same, even if the range contains just one number. The slice includes the lower bound, but excludes the upper -bound. If the upper bound of the range exceeds the array's length, the result +bound. + +You can omit the upper bound, like this: `arr[2:]`. The slice will then extend +to the end of the array in the corresponding dimension. The lower bound is +mandatory, due to syntax conflict with variable placeholders such as `:a` or +`:2`. + +If the upper bound of the range exceeds the array's length, the result is the same as if the upper bound was left out — the result extends to the end of the array along that dimension. -**Example:** select a slice of `arr` by constraining the first dimension. +### Example + +Select a slice of `arr` by constraining the first dimension. ```questdb-sql SELECT arr[2:3] slice FROM tango; @@ -205,14 +224,14 @@ SELECT arr[2:3] slice FROM tango; | slice | | ------------------------------------------------------ | -| {{{10.0,11.0,12.0},{13.0,14.0,15.0},{16.0,17.0,18.0}}} | +| [[[10.0,11.0,12.0],[13.0,14.0,15.0],[16.0,17.0,18.0]]] | This returned a `DOUBLE[1][3][3]`, containing just the second sub-array of `arr`. -You can omit the upper bound, letting the slice extend to the end of the array. +### Example -**Example:** select a slice of `arr` with a right-open range. +Select a slice of `arr` with a right-open range. ```questdb-sql SELECT arr[2:] slice FROM tango; @@ -220,20 +239,14 @@ SELECT arr[2:] slice FROM tango; | slice | | --------------------------------------------------------- | -| {{{10.0,11.0,12.0},{13.0,14.0,15.0},{16.0,17.0,18.0}}, | -| {{19.0,20.0,21.0},{22.0,23.0,24.0},{25.0,26.0,27.0}}} | +| [[[10.0,11.0,12.0],[13.0,14.0,15.0],[16.0,17.0,18.0]],
[[19.0,20.0,21.0],[22.0,23.0,24.0],[25.0,26.0,27.0]]] | -This returns a `DOUBLE[2][3][3]` and contains everything except the first +This returned a `DOUBLE[2][3][3]` and contains everything except the first sub-array along the first dimension. -:::note - -You cannot omit the lower bound. The expression `:3` would conflict with the -syntax for a variable placeholder in SQL. +### Example -::: - -**Example:** select a slice of `arr` by constraining the first and second dimensions. +Select a slice of `arr` by constraining the first and second dimensions. ```questdb-sql SELECT arr[2:3, 3:4] slice FROM tango; @@ -241,9 +254,13 @@ SELECT arr[2:3, 3:4] slice FROM tango; | slice | | ---------------------- | -| {{{16.0,17.0,18.0}}} | +| [[[16.0,17.0,18.0]]] | + +Note that the returned array is still 3D. -**Example:** select a slice of `arr` with large upper bounds. +### Example + +Select a slice of `arr` with large upper bounds. ```questdb-sql SELECT arr[2:100, 3:100] slice FROM tango; @@ -251,7 +268,7 @@ SELECT arr[2:100, 3:100] slice FROM tango; | slice | | ----------------------------------------- | -| {{{16.0,17.0,18.0}},{{25.0,26.0,27.0}}} | +| [[[16.0,17.0,18.0]],[[25.0,26.0,27.0]]] | The result is the same same as if using `arr[2:, 3:]`. @@ -259,7 +276,9 @@ The result is the same same as if using `arr[2:, 3:]`. You can use both types of selectors within the same bracket expression. -**Example:** select the first sub-array of `arr`, and slice it. +### Example + +Select the first sub-array of `arr`, and slice it. ```questdb-sql SELECT arr[1, 2:4] subarr FROM tango; @@ -267,12 +286,14 @@ SELECT arr[1, 2:4] subarr FROM tango; | subarr | | ------------------------------- | -| {{4.0,5.0,6.0},{7.0,8.0,9.0}} | +| [[4.0,5.0,6.0],[7.0,8.0,9.0]] | This returned a `DOUBLE[2][3]`. The top dimension is gone because the first selector took out a sub-array and not a one-element slice. -**Example:** select discontinuous elements from sub-arrays. +### Example + +Select discontinuous elements from sub-arrays. ```questdb-sql SELECT arr[1:, 3, 2] subarr FROM tango; @@ -280,7 +301,7 @@ SELECT arr[1:, 3, 2] subarr FROM tango; | subarr | | ----------------- | -| {8.0,17.0,26.0} | +| [8.0,17.0,26.0] | This left the top dimension unconstrained, then took the 3rd sub-array in each of the top-level sub-arrays, and then selected just the 2nd element in each From bea799d1c33d07c2722d18129502de0e49c941a5 Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Wed, 21 May 2025 15:02:56 +0200 Subject: [PATCH 16/51] Improve examples in arry functions --- documentation/reference/function/array.md | 65 ++++++++++++++++++++--- 1 file changed, 58 insertions(+), 7 deletions(-) diff --git a/documentation/reference/function/array.md b/documentation/reference/function/array.md index 30cc9bcb..8ce0adba 100644 --- a/documentation/reference/function/array.md +++ b/documentation/reference/function/array.md @@ -19,21 +19,39 @@ dimension `dim`. - `array` — the array - `dim` — the dimension (1-based) whose length to get +### Example + +Get the length of `arr` along the 1st dimension. + +```questdb-sql +SELECT dim_length(ARRAY[42, 42], 1); +``` + +| dim_length | +| ------------ | +| 2 | + ## flatten `flatten(array, dim)` removes the dimension `dim` from the array, flattening it into the next dimension. All elements are still available because the next dimension's length gets multiplied by the removed dimension's length. -Example: given an array of shape `DOUBLE[2][3][2]`, `flatten(array, 2)` gives us -an array of shape `DOUBLE[2, 6]`. 
The second dimension is gone, and the third -dimension's length increased from 2 to 3 \* 2 = 6. - ### Parameters - `array` — the array - `dim` — the dimension (1-based) to flatten. Cannot be the last dimension. +### Example + +```questdb-sql +SELECT flatten(ARRAY[[1, 2], [3, 4]], 1); +``` + +| flatten | +| ------------------- | +| [1.0,2.0,3.0,4.0] | + ## matmul `matmul(left_matrix, right_matrix)` performs matrix multiplication. This is an @@ -52,9 +70,42 @@ corresponding row of `left_matrix` and column of `right_matrix`: `result[row, col] := sum_over_i(left_matrix[row, i] * right_matrix[i, col])` +### Example + +Multiply the matrices: + +```text +| 1 2 | | 2 3 | +| | x | | +| 3 4 | | 2 3 | +``` + +```questdb-sql +SELECT matmul(ARRAY[[1, 2], [3, 4]], ARRAY[[2, 3], [2, 3]]); +``` + +| matmul | +| ------------------------- | +| [[6.0,9.0],[14.0,21.0]] | + ## transpose `transpose(array)` transposes an array, reversing the order of its coordinates. -Example: given `array: DOUBLE[2, 5]`, `result = transpose(array)` returns an -array of shape `DOUBLE[5, 2]`, such that `array[i, j] = result[j, i]` for any -valid `i` and `j` coordinates. + +### Example + +Transpose the matrix: + +```text +| 1 2 | T | 1 3 | +| | = | | +| 3 4 | | 2 4 | +``` + +```questdb-sql +SELECT transpose(ARRAY[[1, 2], [3, 4]]); +``` + +| transpose | +| ----------------------- | +| [[1.0,3.0],[2.0,4.0]] | From 13acbbedd0248e2ed9bec81092046d9797bc8b8d Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Wed, 21 May 2025 15:22:14 +0200 Subject: [PATCH 17/51] Document protocol version config in ILP --- documentation/reference/api/ilp/overview.md | 27 +++++++++++++++------ 1 file changed, 20 insertions(+), 7 deletions(-) diff --git a/documentation/reference/api/ilp/overview.md b/documentation/reference/api/ilp/overview.md index 97054d12..534b9ab8 100644 --- a/documentation/reference/api/ilp/overview.md +++ b/documentation/reference/api/ilp/overview.md @@ -96,10 +96,8 @@ strings combine a set of key/value pairs. The standard configuration string pattern is: -``` - +```text schema::key1=value1;key2=value2;key3=value3; - ``` It is made up of the following parts: @@ -110,6 +108,12 @@ It is made up of the following parts: - **Key=Value**: Each key-value pair sets a specific parameter for the client - **Terminating semicolon**: A semicolon must follow the last key-value pair +Basic example: + +```text +http::addr=localhost:9000; +``` + ### Client parameters Below is a list of common parameters that ILP clients will accept. @@ -133,6 +137,15 @@ Exposing these values may expose your database to bad actors. - **schema**: Specifies the transport method, with support for: `http`, `https`, `tcp` & `tcps` - **addr**: The address and port of the QuestDB server, as in `localhost:9000`. +- **protocol_version**: QuestDB has evolved its protocol beyond the basic ILP. + HTTP client autodetects the highest protocol version it can use, but TCP + doesn't and defaults to 1. This is how you specify protocol version 2 (latest): + +```text +tcp::addr=localhost:9009;protocol_version=2 +``` + +You need protocol version 2 in order to ingest n-dimensional arrays. #### HTTP Parameters @@ -277,11 +290,11 @@ and demonstrate the configuration keys and core configuration options. Once a client has been selected and configured, resume from your language client documentation. -##### TCP token authentication setup +### TCP token authentication setup Create `d`, `x` & `y` tokens for client usage. 
-##### Prerequisites +#### Prerequisites - `jose`: C-language implementation of Javascript Object Signing and Encryption. Generates tokens. @@ -319,7 +332,7 @@ apt install jq -##### Server configuration +#### Server configuration Next, create an authentication file. @@ -348,7 +361,7 @@ Once created, reference it in the server [configuration](/docs/configuration/): line.tcp.auth.db.path=conf/auth.txt ``` -##### Client keys +#### Client keys For the server configuration above, the corresponding JSON Web Key must be stored on the clients' side. From bf01db49f07efba4f88c1e4c1e1f53d8040fa6f1 Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Wed, 21 May 2025 16:33:27 +0200 Subject: [PATCH 18/51] Improve rendering of examples --- documentation/concept/array.md | 44 ++++----------- documentation/reference/function/array.md | 65 +++++++++++++++++------ 2 files changed, 59 insertions(+), 50 deletions(-) diff --git a/documentation/concept/array.md b/documentation/concept/array.md index 66859a03..4d1f8362 100644 --- a/documentation/concept/array.md +++ b/documentation/concept/array.md @@ -123,9 +123,7 @@ CREATE TABLE tango AS (SELECT ARRAY[ ] arr from long_sequence(1)); ``` -### Example - -select a number from the array. +#### Example: select a number from the array ```questdb-sql SELECT arr[1, 3, 2] elem FROM tango; @@ -138,9 +136,7 @@ SELECT arr[1, 3, 2] elem FROM tango; This selected the `DOUBLE` number at the coordinates (1, 3, 2). Remember that the coordinates are 1-based! -### Example - -select an out-of-range element from the array. +#### Example: select an out-of-range element from the array ```questdb-sql SELECT arr[1, 3, 4] elem FROM tango; @@ -150,9 +146,7 @@ SELECT arr[1, 3, 4] elem FROM tango; | ---- | | NULL | -### Example - -select a 2D sub-array. +#### Example: select a 2D sub-array ```questdb-sql SELECT arr[1] subarr FROM tango; @@ -164,9 +158,7 @@ SELECT arr[1] subarr FROM tango; This selected the first 2D sub-array in `arr`. -### Example - -select a sub-array that is out-of-range. +#### Example: select a sub-array that is out-of-range ```questdb-sql SELECT arr[4] subarr FROM tango; @@ -176,9 +168,7 @@ SELECT arr[4] subarr FROM tango; | ------ | | [] | -### Example - -select a 1D sub-array. +#### Example: select a 1D sub-array ```questdb-sql SELECT arr[1, 3] subarr FROM tango; @@ -214,9 +204,7 @@ If the upper bound of the range exceeds the array's length, the result is the same as if the upper bound was left out — the result extends to the end of the array along that dimension. -### Example - -Select a slice of `arr` by constraining the first dimension. +#### Example: select a slice of `arr` by constraining the first dimension ```questdb-sql SELECT arr[2:3] slice FROM tango; @@ -229,9 +217,7 @@ SELECT arr[2:3] slice FROM tango; This returned a `DOUBLE[1][3][3]`, containing just the second sub-array of `arr`. -### Example - -Select a slice of `arr` with a right-open range. +#### Example: select a slice of `arr` with a right-open range ```questdb-sql SELECT arr[2:] slice FROM tango; @@ -244,9 +230,7 @@ SELECT arr[2:] slice FROM tango; This returned a `DOUBLE[2][3][3]` and contains everything except the first sub-array along the first dimension. -### Example - -Select a slice of `arr` by constraining the first and second dimensions. +#### Example: Select a slice of `arr` by constraining the first and second dimensions ```questdb-sql SELECT arr[2:3, 3:4] slice FROM tango; @@ -258,9 +242,7 @@ SELECT arr[2:3, 3:4] slice FROM tango; Note that the returned array is still 3D. 
-### Example - -Select a slice of `arr` with large upper bounds. +#### Example: select a slice of `arr` with large upper bounds ```questdb-sql SELECT arr[2:100, 3:100] slice FROM tango; @@ -276,9 +258,7 @@ The result is the same same as if using `arr[2:, 3:]`. You can use both types of selectors within the same bracket expression. -### Example - -Select the first sub-array of `arr`, and slice it. +#### Example: select the first sub-array of `arr`, and slice it ```questdb-sql SELECT arr[1, 2:4] subarr FROM tango; @@ -291,9 +271,7 @@ SELECT arr[1, 2:4] subarr FROM tango; This returned a `DOUBLE[2][3]`. The top dimension is gone because the first selector took out a sub-array and not a one-element slice. -### Example - -Select discontinuous elements from sub-arrays. +#### Example: select discontinuous elements from sub-arrays ```questdb-sql SELECT arr[1:, 3, 2] subarr FROM tango; diff --git a/documentation/reference/function/array.md b/documentation/reference/function/array.md index 8ce0adba..16662c43 100644 --- a/documentation/reference/function/array.md +++ b/documentation/reference/function/array.md @@ -14,14 +14,14 @@ not they can take an array parameter. `dim_length(array, dim)` returns the length of the n-dimensional array along dimension `dim`. -### Parameters +#### Parameters - `array` — the array - `dim` — the dimension (1-based) whose length to get -### Example +#### Example -Get the length of `arr` along the 1st dimension. +Get the length of the array along the 1st dimension. ```questdb-sql SELECT dim_length(ARRAY[42, 42], 1); @@ -37,12 +37,14 @@ SELECT dim_length(ARRAY[42, 42], 1); into the next dimension. All elements are still available because the next dimension's length gets multiplied by the removed dimension's length. -### Parameters +#### Parameters - `array` — the array - `dim` — the dimension (1-based) to flatten. Cannot be the last dimension. -### Example +#### Example + +Flatten a 2D array into a 1D array. ```questdb-sql SELECT flatten(ARRAY[[1, 2], [3, 4]], 1); @@ -70,15 +72,34 @@ corresponding row of `left_matrix` and column of `right_matrix`: `result[row, col] := sum_over_i(left_matrix[row, i] * right_matrix[i, col])` -### Example +#### Parameters + +- `left_matrix`: the left-hand matrix. Must be a 2D array +- `right_matrix`: the right-hand matrix. Must be a 2D array with as many rows as + there are columns in `left_matrix` + +#### Example Multiply the matrices: -```text -| 1 2 | | 2 3 | -| | x | | -| 3 4 | | 2 3 | -``` +$$ + +\begin{bmatrix} +1 & 2 \\ +3 & 4 +\end{bmatrix} +\times +\begin{bmatrix} +2 & 3 \\ +2 & 3 +\end{bmatrix} += +\begin{bmatrix} +6 & 9 \\ +14 & 21 +\end{bmatrix} + +$$ ```questdb-sql SELECT matmul(ARRAY[[1, 2], [3, 4]], ARRAY[[2, 3], [2, 3]]); @@ -91,16 +112,26 @@ SELECT matmul(ARRAY[[1, 2], [3, 4]], ARRAY[[2, 3], [2, 3]]); ## transpose `transpose(array)` transposes an array, reversing the order of its coordinates. +This is most often used on a matrix, swapping its rows and columns. -### Example +#### Example Transpose the matrix: -```text -| 1 2 | T | 1 3 | -| | = | | -| 3 4 | | 2 4 | -``` +$$ + + \begin{bmatrix} + 1 & 2 \\ + 3 & 4 + \end{bmatrix} +^T += +\begin{bmatrix} +1 & 3 \\ +2 & 4 +\end{bmatrix} + +$$ ```questdb-sql SELECT transpose(ARRAY[[1, 2], [3, 4]]); From 20ae7baa96e9782daeab8fd0a1c7a76c9a45ea20 Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Wed, 21 May 2025 16:35:28 +0200 Subject: [PATCH 19/51] Demote subheadings in Aggregate Functions ... 
so they don't show up in contents --- .../reference/function/aggregation.md | 138 +++++++++--------- 1 file changed, 69 insertions(+), 69 deletions(-) diff --git a/documentation/reference/function/aggregation.md b/documentation/reference/function/aggregation.md index 18b9ae80..26b4bf3d 100644 --- a/documentation/reference/function/aggregation.md +++ b/documentation/reference/function/aggregation.md @@ -23,7 +23,7 @@ datasets where an exact count is not required. Thus consider it the higher cardinality alternative to [`count_dinstinct`](/docs/reference/function/aggregation/#count_distinct). -### Parameters +#### Parameters - `column_name`: The name of the column for which to estimate the count of distinct values. @@ -33,11 +33,11 @@ cardinality alternative to accurate estimate, but consumes more memory. Defaults to 1 (lower accuracy, high efficiency). -### Return value +#### Return value Return value type is `long`. -### Examples +#### Examples _Please note that exact example values will vary as they are approximations derived from the HyperLogLog algorithm._ @@ -103,7 +103,7 @@ SELECT store_id, approx_count_distinct(transaction_id) FROM transactions GROUP B value for the given non-negative column and percentile using the [HdrHistogram](http://hdrhistogram.org/) algorithm. -### Arguments +#### Parameters - `value` is any numeric non-negative value. - `percentile` is a `double` value between 0.0 and 1.0, inclusive. @@ -119,11 +119,11 @@ value for the given non-negative column and percentile using the 1 second (or better) up to 1,000 seconds. At its maximum tracked value (1 hour), it would still maintain a resolution of 3.6 seconds (or better). -### Return value +#### Return value Return value type is `double`. -### Examples +#### Examples ```questdb-sql title="Approximate percentile" SELECT approx_percentile(latency, 0.99) FROM request_logs; @@ -139,16 +139,16 @@ SELECT approx_percentile(latency, 0.99) FROM request_logs; The function will throw an error if any negative values are encountered in the input. All input values must be non-negative. -### Arguments +#### Parameters - `value` is any non-negative numeric value. - `precision` (optional) is an `int` value between 0 and 5, inclusive. This is the number of significant decimal digits to which the histogram will maintain value resolution and separation. Higher precision leads to more accurate results with increased memory usage. Defaults to 1 (lower accuracy, high efficiency). -### Return value +#### Return value Return value type is `double`. -### Examples +#### Examples ```questdb-sql title="Calculate approximate median price by symbol" demo SELECT symbol, approx_median(price) FROM trades GROUP BY symbol; @@ -173,15 +173,15 @@ SELECT symbol, approx_median(price, 3) FROM trades GROUP BY symbol; `avg(value)` calculates simple average of values ignoring missing data (e.g `null` values). -### Arguments +#### Parameters - `value` is any numeric value. -### Return value +#### Return value Return value type is `double`. -### Examples +#### Examples ```questdb-sql title="Average transaction amount" SELECT avg(amount) FROM transactions; @@ -216,16 +216,16 @@ each set differs from the average of its set. This calculation is based on - If there's no clear pattern, the function will return a value close to 0. -### Arguments +#### Parameters - `arg0` is any numeric value representing the first variable - `arg1` is any numeric value representing the second variable -### Return value +#### Return value Return value type is `double`. 
-### Examples +#### Examples ```questdb-sql title="Correlation between price and quantity" SELECT corr(price, quantity) FROM transactions; @@ -251,7 +251,7 @@ SELECT payment_type, corr(price, quantity) FROM transactions GROUP BY payment_ty data. - `count(column_name)` - counts the number of non-null values in a given column. -### Arguments +#### Parameters - `count()` does not require arguments. - `count(column_name)` - supports the following data types: @@ -270,11 +270,11 @@ SELECT payment_type, corr(price, quantity) FROM transactions GROUP BY payment_ty - `string` - `symbol` -### Return value +#### Return value Return value type is `long`. -### Examples +#### Examples Count of rows in the `transactions` table: @@ -332,11 +332,11 @@ SELECT payment_type, count(amount) FROM transactions; `count_distinct(column_name)` - counts distinct non-`null` values in `varchar`, `symbol`, `long256`, `UUID`, `IPv4`, `long`, `int` or `string` columns. -### Return value +#### Return value Return value type is `long`. -### Examples +#### Examples - Count of distinct sides in the transactions table. Side column can either be `BUY` or `SELL` or `null`. @@ -379,16 +379,16 @@ measure of the overall trend. - The closer the result is to zero, the less relationship there is between the two sets of numbers. -### Arguments +#### Parameters - `arg0` is any numeric value representing the first variable - `arg1` is any numeric value representing the second variable. -### Return value +#### Return value Return value type is `double`. -### Examples +#### Examples ```questdb-sql title="Population covariance between price and quantity" SELECT covar_pop(price, quantity) FROM transactions; @@ -423,16 +423,16 @@ average in each set. - The closer the result is to zero, the less relationship there is between the two sets of numbers. -### Arguments +#### Parameters - `arg0` is any numeric value representing the first variable. - `arg1` is any numeric value representing the second variable. -### Return value +#### Return value Return value type is `double`. -### Examples +#### Examples ```questdb-sql title="Sample covariance between price and quantity" SELECT covar_samp(price, quantity) FROM transactions; @@ -467,11 +467,11 @@ last row is always the one with the highest (latest) timestamp. For a table without a designated timestamp column, `first` returns the first row and `last` returns the last inserted row, regardless of any timestamp column. -### Return value +#### Return value Return value type is the same as the type of the argument. -### Examples +#### Examples Given a table `sensors`, which has a designated timestamp column: @@ -544,11 +544,11 @@ row with the lowest timestamp (oldest). For a table without a designated timestamp column, `first_not_null` returns the first non-null row, regardless of any timestamp column. -### Return value +#### Return value Return value type is the same as the type of the argument. -### Examples +#### Examples Given a table `sensors`, which has a designated timestamp column: @@ -601,11 +601,11 @@ row with the highest timestamp (most recent). For a table without a designated timestamp column, `last_not_null` returns the last non-null row, regardless of any timestamp column. -### Return value +#### Return value Return value type is the same as the type of the argument. 
-### Examples +#### Examples Given a table `sensors`, which has a designated timestamp column: @@ -650,17 +650,17 @@ SELECT last_not_null(device_id) FROM sensors_unordered; `haversine_dist_deg(lat, lon, ts)` - calculates the traveled distance for a series of latitude and longitude points. -### Arguments +#### Parameters - `lat` is the latitude expressed as degrees in decimal format (`double`) - `lon` is the longitude expressed as degrees in decimal format (`double`) - `ts` is the `timestamp` for the data point -### Return value +#### Return value Return value type is `double`. -### Examples +#### Examples ```questdb-sql title="Calculate the aggregate traveled distance for each car_id" SELECT car_id, haversine_dist_deg(lat, lon, k) @@ -675,15 +675,15 @@ are added using the [Kahan compensated sum algorithm](https://en.wikipedia.org/wiki/Kahan_summation_algorithm). This is only beneficial for floating-point values such as `float` or `double`. -### Arguments +#### Parameters - `value` is any numeric value. -### Return value +#### Return value Return value type is the same as the type of the argument. -### Examples +#### Examples ```questdb-sql SELECT ksum(a) @@ -699,15 +699,15 @@ FROM (SELECT rnd_double() a FROM long_sequence(100)); `max(value)` - returns the highest value ignoring missing data (e.g `null` values). -### Arguments +#### Parameters - `value` is any numeric or string value -### Return value +#### Return value Return value type is the same as the type of the argument. -### Examples +#### Examples ```questdb-sql title="Highest transaction amount" SELECT max(amount) FROM transactions; @@ -732,15 +732,15 @@ SELECT payment_type, max(amount) FROM transactions; `min(value)` - returns the lowest value ignoring missing data (e.g `null` values). -### Arguments +#### Parameters - `value` is any numeric or string value -### Return value +#### Return value Return value type is the same as the type of the argument. -### Examples +#### Examples ```questdb-sql title="Lowest transaction amount" SELECT min(amount) FROM transactions; @@ -767,15 +767,15 @@ are added using the [Neumaier sum algorithm](https://en.wikipedia.org/wiki/Kahan_summation_algorithm#Further_enhancements). This is only beneficial for floating-point values such as `float` or `double`. -### Arguments +#### Parameters - `value` is any numeric value. -### Return value +#### Return value Return value type is `double`. -### Examples +#### Examples ```questdb-sql SELECT nsum(a) @@ -797,15 +797,15 @@ values are spread out over a wider range. `stddev` is an alias for `stddev_samp`. -### Arguments +#### Parameters - `value` is any numeric value. -### Return value +#### Return value Return value type is `double`. -### Examples +#### Examples ```questdb-sql SELECT stddev_samp(x) @@ -824,15 +824,15 @@ variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range. -### Arguments +#### Parameters - `value` is any numeric value. -### Return value +#### Return value Return value type is `double`. -### Examples +#### Examples ```questdb-sql SELECT stddev_pop(x) @@ -848,16 +848,16 @@ FROM (SELECT x FROM long_sequence(100)); `string_agg(value, delimiter)` - Concatenates the given string values into a single string with the delimiter used as a value separator. -### Arguments +#### Parameters - `value` is a `varchar` value. - `delimiter` is a `char` value. 
-### Return value +#### Return value Return value type is `varchar`. -### Examples +#### Examples ```questdb-sql SELECT string_agg(x::varchar, ',') @@ -881,17 +881,17 @@ values. - Does not support `ORDER BY`. -### Arguments +#### Parameters - `value`: A varchar or string column containing the values to be aggregated. - `delimiter`: A char value used to separate the distinct values in the concatenated string. -### Return value +#### Return value Return value type is `string`. -### Examples +#### Examples Suppose we want to find all the distinct sky cover types observed in the weather tablein our public demo: @@ -934,15 +934,15 @@ GROUP BY windDir; `sum(value)` - adds values ignoring missing data (e.g `null` values). -### Arguments +#### Parameters - `value` is any numeric value. -### Return value +#### Return value Return value type is the same as the type of the argument. -### Examples +#### Examples ```questdb-sql title="Sum all quantities in the transactions table" SELECT sum(quantity) FROM transactions; @@ -961,7 +961,7 @@ SELECT item, sum(quantity) FROM transactions; | apple | 53 | | orange | 47 | -### Overflow +#### Overflow `sum` does not perform overflow check. To avoid overflow, you can cast the argument to wider type. @@ -980,15 +980,15 @@ the values are spread out over a wider range. `variance()` is an alias for `var_samp`. -### Arguments +#### Parameters - `value` is any numeric value. -### Return value +#### Return value Return value type is `double`. -### Examples +#### Examples ```questdb-sql SELECT var_samp(x) @@ -1007,15 +1007,15 @@ set of values. A low variance indicates that the values tend to be very close to the mean, while a high variance indicates that the values are spread out over a wider range. -### Arguments +#### Parameters - `value` is any numeric value. -### Return value +#### Return value Return value type is `double`. -### Examples +#### Examples ```questdb-sql SELECT var_pop(x) From 98950c2cbb886609275c237d2f30888f962838be Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Wed, 21 May 2025 16:35:40 +0200 Subject: [PATCH 20/51] Touch up Finance page --- documentation/reference/function/finance.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/documentation/reference/function/finance.md b/documentation/reference/function/finance.md index 6bc42e53..3fc163a9 100644 --- a/documentation/reference/function/finance.md +++ b/documentation/reference/function/finance.md @@ -8,7 +8,7 @@ This page describes functions specific to the financial services domain. ## l2price -Trade price calculation. +Level-2 trade price calculation. `l2price(target_size, size_array, price_array)` From 5eb4f669f4abe5ef58f56c2781fdcc09f92b05ab Mon Sep 17 00:00:00 2001 From: Jaromir Hamala Date: Fri, 23 May 2025 09:47:45 +0200 Subject: [PATCH 21/51] better wording --- documentation/pgwire/python.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/documentation/pgwire/python.md b/documentation/pgwire/python.md index 5c4b55df..be729ab9 100644 --- a/documentation/pgwire/python.md +++ b/documentation/pgwire/python.md @@ -268,11 +268,11 @@ While we recommend using the [InfluxDB Line Protocol (ILP)](/docs/ingestion-over the `executemany()` method to insert multiple rows in a single query. It is highly efficient for executing the same parameterized statements multiple times with different sets of data. 
This method is significantly faster than executing individual statements in a loop because it reduces network round-trips and allows for potential batching optimizations -by the database. In our testing, we found that `executemany()` is about 10x-100x faster than using a loop with `execute()`. +by the database. In our testing, we found that `executemany()` can be 10x-100x faster than using a loop with `execute()`. -It's particularly useful for bulk data insertion, such as recording multiple trades as they occur. INSERT performance +It's particularly useful for bulk data insertion, such as recording multiple trades as they occur. `INSERT` performance grows with the batch size, so you should experiment with the batch size to find the optimal value for your use case. We -recommend starting with a batch size of 1000 and adjusting it based on your performance requirements. +recommend starting with a batch size of 1,000 and adjusting it based on further testing. ```python import asyncio From 263542b4ac09d7c43f0c3cf0de47f9fa8254818b Mon Sep 17 00:00:00 2001 From: Jaromir Hamala Date: Fri, 23 May 2025 09:48:35 +0200 Subject: [PATCH 22/51] better wording --- documentation/pgwire/python.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/documentation/pgwire/python.md b/documentation/pgwire/python.md index be729ab9..386e0633 100644 --- a/documentation/pgwire/python.md +++ b/documentation/pgwire/python.md @@ -333,7 +333,7 @@ for multidimensional arrays) into the appropriate PostgreSQL array format and vi Using array types with asyncpg against QuestDB requires manual type registration. asyncpg's standard type introspection query is not yet supported by QuestDB. Therefore, you must manually register a "codec" -for your array types with the asyncpg connection, as shown in the preceding code example. This ensures correct array +for your array types with the asyncpg connection, as shown in the code example below. This ensures correct array handling and avoids errors. This is a temporary workaround until a permanent solution is available in asyncpg or QuestDB. From 6f4abb2cf79552068f39b1009079492c28d3642b Mon Sep 17 00:00:00 2001 From: Jaromir Hamala Date: Fri, 23 May 2025 09:51:34 +0200 Subject: [PATCH 23/51] links to anchors --- documentation/pgwire/python.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/documentation/pgwire/python.md b/documentation/pgwire/python.md index 386e0633..1e60a2e4 100644 --- a/documentation/pgwire/python.md +++ b/documentation/pgwire/python.md @@ -7,9 +7,9 @@ description: QuestDB is tested with the following Python clients: -- [asyncpg](https://pypi.org/project/asyncpg/) -- [psycopg3](https://www.psycopg.org/psycopg3/docs/) -- [psycopg2](https://www.psycopg.org/docs/) +- [asyncpg](#asyncpg) +- [psycopg3](#psycopg3) +- [psycopg2](#psycopg2) Other Python clients that are compatible with the PostgreSQL wire protocol should also work with QuestDB, but we do not test them. If you find a client that From 00d8d6f49d51493044c9499b885cc3acefc05a9f Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Fri, 23 May 2025 12:41:55 +0200 Subject: [PATCH 24/51] Add ARRAY literal section in Concepts --- documentation/concept/array.md | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/documentation/concept/array.md b/documentation/concept/array.md index 4d1f8362..3aa970c7 100644 --- a/documentation/concept/array.md +++ b/documentation/concept/array.md @@ -90,6 +90,33 @@ QuestDB always stores arrays in vanilla form. 
When you transform an array's shape and then store the resulting array to the database, it will be stored in vanilla form. +## The ARRAY literal + +You can create an array from scalar values using the `ARRAY[...]` syntax, as +in this example: + +```questdb-sql +CREATE TABLE tango AS (SELECT ARRAY[ + [ [ 1, 2, 3], [ 4, 5, 6], [ 7, 8, 9] ], + [ [10, 11, 12], [13, 14, 15], [16, 17, 18] ], + [ [19, 20, 21], [22, 23, 24], [25, 26, 27] ] +] arr from long_sequence(1)); +``` + +Values can be any expressions that yield scalars, so you can construct the array +from existing column data. + +Values can also be arrays, creating a higher-dimensional array: + +```questdb-sql +CREATE TABLE tango AS (SELECT ARRAY[1, 2] arr, ARRAY[3, 4] brr FROM long_sequence(1)); +SELECT ARRAY[arr, brr] FROM tango; +``` + +| array_2d | +| --------------------- | +| [[1.0,2.0],[3.0,4.0]] | + ## Array access syntax We model our N-dimensional array access syntax on Python's `NDArray`, except that From 2041417fe1f57df9eb42d313c6fc8316c6ce82b4 Mon Sep 17 00:00:00 2001 From: Jaromir Hamala Date: Fri, 23 May 2025 15:22:18 +0200 Subject: [PATCH 25/51] inserting arrays with psycopg3 --- documentation/pgwire/python.md | 148 ++++++++++++++++++++++++++------- 1 file changed, 116 insertions(+), 32 deletions(-) diff --git a/documentation/pgwire/python.md b/documentation/pgwire/python.md index 1e60a2e4..31167219 100644 --- a/documentation/pgwire/python.md +++ b/documentation/pgwire/python.md @@ -657,6 +657,122 @@ async def async_psycopg3(): asyncio.run(async_psycopg3()) ``` +### Inserting Arrays + +QuestDB, via the PostgreSQL wire protocol, supports [array data types](/docs/concept/array/), including multidimensional +arrays. + +When you need to insert multiple rows containing array data, such as a series of order book snapshots, `executemany()` +offers a more performant way to do so compared to inserting row by row with execute(). + +:::tip +For data ingestion, we recommend using QuestDB's first-party clients with the [InfluxDB Line Protocol (ILP)](/docs/ingestion-overview/) +instead of PGWire. PGWire should primarily be used for querying data in QuestDB. If you cannot use ILP for some reason, +you should prefer [asyncpg](#inserting-arrays) over psycopg3 for performance reasons. We found that asyncpg is significantly faster than psycopg3 +when inserting batches of data including arrays. 
+::: + +```python title="Batch Inserting L3 Order Book Snapshots" +import asyncio +import os +import time +from datetime import datetime, timedelta + +import psycopg + +async def batch_insert_l3_order_book_arrays(): + conn_str = "host=127.0.0.1 port=8812 dbname=qdb user=admin password=quest" + + async with await psycopg.AsyncConnection.connect(conn_str) as conn: + async with conn.cursor() as cur: + await cur.execute(""" + CREATE TABLE IF NOT EXISTS l3_order_book + ( + bid DOUBLE [][], + ask DOUBLE [][], + ts TIMESTAMP + ) TIMESTAMP(ts) PARTITION BY DAY WAL; + """) + print("Table 'l3_order_book' is ready.") + + # Prepare a list of L3 order book snapshots for batch insertion + snapshots_to_insert = [] + base_timestamp = datetime.now() + + # First row + bids1 = [[68500.50, 0.5], [68500.00, 1.2], [68499.50, 0.3]] + asks1 = [[68501.00, 0.8], [68501.50, 0.4], [68502.00, 1.1]] + ts1 = (base_timestamp + timedelta(seconds=1)) + snapshots_to_insert.append((bids1, asks1, ts1)) + + # Second row + bids2 = [[68502.10, 0.3], [68501.80, 0.9], [68501.20, 1.5]] + asks2 = [[68502.50, 1.1], [68503.00, 0.6], [68503.50, 0.2]] + ts2 = (base_timestamp + timedelta(seconds=2)) + snapshots_to_insert.append((bids2, asks2, ts2)) + + # Third row + bids3 = [[68490.60, 2.5], [68489.00, 3.2]] + asks3 = [[68491.20, 1.8], [68492.80, 0.7]] + ts3 = (base_timestamp + timedelta(seconds=3)) + snapshots_to_insert.append((bids3, asks3, ts3)) + + print(f"Prepared {len(snapshots_to_insert)} snapshots for batch insertion.") + + # Insert the snapshots into the database in a single batch + await cur.executemany( + """ + INSERT INTO l3_order_book (bid, ask, ts) + VALUES (%b, %b, %b) + """, + snapshots_to_insert + ) + print(f"Successfully inserted {cur.rowcount} L3 order book snapshots using executemany().") + + +if __name__ == "__main__": + # Set timezone to UTC + os.environ['TZ'] = 'UTC' + if hasattr(time, 'tzset'): + time.tzset() + + asyncio.run(batch_insert_l3_order_book_arrays()) +``` + + +### Connection Pooling with psycopg2-pool + +For connection pooling with psycopg2, you can use external libraries like psycopg2-pool: + +```python +from psycopg2.pool import ThreadedConnectionPool + +pool = ThreadedConnectionPool( + minconn=5, + maxconn=20, + host='127.0.0.1', + port=8812, + user='admin', + password='quest', + dbname='qdb' +) + +conn = pool.getconn() + +try: + conn.autocommit = True + + with conn.cursor() as cur: + cur.execute("SELECT * FROM trades LIMIT 10") + rows = cur.fetchall() + print(f"Fetched {len(rows)} rows") +finally: + pool.putconn(conn) + +pool.closeall() +``` + + ### Connection Pooling psycopg3 provides connection pooling capabilities. This reduces the overhead of establishing new connections @@ -906,38 +1022,6 @@ finally: conn.close() ``` -### Connection Pooling with psycopg2-pool - -For connection pooling with psycopg2, you can use external libraries like psycopg2-pool: - -```python -from psycopg2.pool import ThreadedConnectionPool - -pool = ThreadedConnectionPool( - minconn=5, - maxconn=20, - host='127.0.0.1', - port=8812, - user='admin', - password='quest', - dbname='qdb' -) - -conn = pool.getconn() - -try: - conn.autocommit = True - - with conn.cursor() as cur: - cur.execute("SELECT * FROM trades LIMIT 10") - rows = cur.fetchall() - print(f"Fetched {len(rows)} rows") -finally: - pool.putconn(conn) - -pool.closeall() -``` - ### Integration with pandas psycopg2 integrates with pandas over SQLAlchemy, allowing you to read data directly into a DataFrame. 
This feature From 16431e2de07deb046990a2ec9fcef95b94fbb1e6 Mon Sep 17 00:00:00 2001 From: Jaromir Hamala Date: Fri, 23 May 2025 15:34:05 +0200 Subject: [PATCH 26/51] explain binary array transfers with psycopg3 --- documentation/pgwire/python.md | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/documentation/pgwire/python.md b/documentation/pgwire/python.md index 31167219..33580a8f 100644 --- a/documentation/pgwire/python.md +++ b/documentation/pgwire/python.md @@ -667,9 +667,10 @@ offers a more performant way to do so compared to inserting row by row with exec :::tip For data ingestion, we recommend using QuestDB's first-party clients with the [InfluxDB Line Protocol (ILP)](/docs/ingestion-overview/) -instead of PGWire. PGWire should primarily be used for querying data in QuestDB. If you cannot use ILP for some reason, -you should prefer [asyncpg](#inserting-arrays) over psycopg3 for performance reasons. We found that asyncpg is significantly faster than psycopg3 -when inserting batches of data including arrays. +instead of PGWire. PGWire should primarily be used for querying data in QuestDB. + +If you cannot use ILP for some reason, you should prefer [asyncpg](#inserting-arrays) over psycopg3 for performance +reasons. We found that asyncpg is significantly faster than psycopg3 when inserting batches of data including arrays. ::: ```python title="Batch Inserting L3 Order Book Snapshots" @@ -739,6 +740,11 @@ if __name__ == "__main__": asyncio.run(batch_insert_l3_order_book_arrays()) ``` +The example above uses `%b` as placeholder parameters instead of the usual `%s` placeholder. This is because +`%b` format forces the use of the binary protocol. By default, psycopg3 uses the text encoding for array +parameters. The binary protocol is more efficient for transferring large amounts of data, especially for +arrays. The binary protocol is also the default for asyncpg, which is why we recommend using asyncpg for +data ingestion over PGWire. ### Connection Pooling with psycopg2-pool From ee55757a35053925d5e18ed8061d28591f17d85a Mon Sep 17 00:00:00 2001 From: Jaromir Hamala Date: Mon, 26 May 2025 11:34:34 +0200 Subject: [PATCH 27/51] binary for all psycopg3 transfers --- documentation/pgwire/python.md | 155 +++++++++++++++++---------------- 1 file changed, 80 insertions(+), 75 deletions(-) diff --git a/documentation/pgwire/python.md b/documentation/pgwire/python.md index 33580a8f..edc728c7 100644 --- a/documentation/pgwire/python.md +++ b/documentation/pgwire/python.md @@ -465,6 +465,10 @@ found that the binary version of psycopg3 is significantly faster than the pure- ::: +We recommend always passing `binary=True` to the `cursor()` method to use the binary protocol for better performance. +This is especially important for large datasets and array types. 
+ + ### Basic Connection ```python @@ -479,7 +483,7 @@ conn = psycopg.connect( autocommit=True # Important for QuestDB ) -with conn.cursor() as cur: +with conn.cursor(binary=True) as cur: cur.execute("SELECT version()") version = cur.fetchone() print(f"Connected to QuestDB version: {version[0]}") @@ -500,27 +504,28 @@ import psycopg from datetime import datetime, timedelta with psycopg.connect( - host='127.0.0.1', - port=8812, - user='admin', - password='quest', - dbname='qdb', - autocommit=True + host='127.0.0.1', + port=8812, + user='admin', + password='quest', + dbname='qdb', + autocommit=True ) as conn: - with conn.cursor() as cur: + with conn.cursor(binary=True) as cur: end_time = datetime.now() start_time = end_time - timedelta(days=1) - + cur.execute(""" - SELECT * FROM trades - WHERE ts >= %s AND ts <= %s - ORDER BY ts DESC - LIMIT 10 - """, (start_time, end_time)) - + SELECT * + FROM trades + WHERE ts >= %s + AND ts <= %s + ORDER BY ts DESC LIMIT 10 + """, (start_time, end_time)) + rows = cur.fetchall() print(f"Fetched {len(rows)} rows") - + for row in rows: print(f"Timestamp: {row[0]}, Symbol: {row[1]}, Price: {row[2]}") ``` @@ -533,24 +538,24 @@ psycopg3 allows you to specify how rows are returned using row factories: import psycopg with psycopg.connect( - host='127.0.0.1', - port=8812, - user='admin', - password='quest', - dbname='qdb', - autocommit=True + host='127.0.0.1', + port=8812, + user='admin', + password='quest', + dbname='qdb', + autocommit=True ) as conn: - with conn.cursor(row_factory=psycopg.rows.dict_row) as cur: + with conn.cursor(row_factory=psycopg.rows.dict_row, binary=True) as cur: cur.execute("SELECT * FROM trades LIMIT 5") rows = cur.fetchall() - + for row in rows: print(f"Symbol: {row['symbol']}, Price: {row['price']}") - - with conn.cursor(row_factory=psycopg.rows.namedtuple_row) as cur: + + with conn.cursor(row_factory=psycopg.rows.namedtuple_row, binary=True) as cur: cur.execute("SELECT * FROM trades LIMIT 5") rows = cur.fetchall() - + for row in rows: print(f"Symbol: {row.symbol}, Price: {row.price}") ``` @@ -563,30 +568,30 @@ For large result sets, you can use server-side cursors: import psycopg with psycopg.connect( - host='127.0.0.1', - port=8812, - user='admin', - password='quest', - dbname='qdb', - autocommit=True + host='127.0.0.1', + port=8812, + user='admin', + password='quest', + dbname='qdb', + autocommit=True ) as conn: - with conn.cursor() as cur: + with conn.cursor(binary=True) as cur: # Execute a query that might return many rows cur.execute("SELECT * FROM trades") - + batch_size = 1000 total_processed = 0 - + while True: batch = cur.fetchmany(batch_size) - + if not batch: break - + total_processed += len(batch) if total_processed % 10000 == 0: print(f"Processed {total_processed} rows so far...") - + print(f"Finished processing {total_processed} total rows") ``` @@ -606,7 +611,7 @@ with psycopg.connect( dbname='qdb', autocommit=True ) as conn: - with conn.cursor() as cur: + with conn.cursor(binary=True) as cur: # Define query parameters end_time = datetime.now() start_time = end_time - timedelta(days=7) @@ -645,7 +650,7 @@ async def async_psycopg3(): dbname='qdb', autocommit=True ) as aconn: - async with aconn.cursor() as acur: + async with aconn.cursor(binary=True) as acur: await acur.execute("SELECT * FROM trades LIMIT 10") rows = await acur.fetchall() @@ -685,7 +690,7 @@ async def batch_insert_l3_order_book_arrays(): conn_str = "host=127.0.0.1 port=8812 dbname=qdb user=admin password=quest" async with await 
psycopg.AsyncConnection.connect(conn_str) as conn: - async with conn.cursor() as cur: + async with conn.cursor(binary=True) as cur: await cur.execute(""" CREATE TABLE IF NOT EXISTS l3_order_book ( @@ -746,38 +751,6 @@ parameters. The binary protocol is more efficient for transferring large amounts arrays. The binary protocol is also the default for asyncpg, which is why we recommend using asyncpg for data ingestion over PGWire. -### Connection Pooling with psycopg2-pool - -For connection pooling with psycopg2, you can use external libraries like psycopg2-pool: - -```python -from psycopg2.pool import ThreadedConnectionPool - -pool = ThreadedConnectionPool( - minconn=5, - maxconn=20, - host='127.0.0.1', - port=8812, - user='admin', - password='quest', - dbname='qdb' -) - -conn = pool.getconn() - -try: - conn.autocommit = True - - with conn.cursor() as cur: - cur.execute("SELECT * FROM trades LIMIT 10") - rows = cur.fetchall() - print(f"Fetched {len(rows)} rows") -finally: - pool.putconn(conn) - -pool.closeall() -``` - ### Connection Pooling @@ -805,7 +778,7 @@ pool = ConnectionPool( ) with pool.connection() as conn: - with conn.cursor() as cur: + with conn.cursor(binary=True) as cur: cur.execute("SELECT * FROM trades LIMIT 10") rows = cur.fetchall() print(f"Fetched {len(rows)} rows") @@ -1028,6 +1001,38 @@ finally: conn.close() ``` +### Connection Pooling with psycopg2-pool + +For connection pooling with psycopg2, you can use external libraries like psycopg2-pool: + +```python +from psycopg2.pool import ThreadedConnectionPool + +pool = ThreadedConnectionPool( + minconn=5, + maxconn=20, + host='127.0.0.1', + port=8812, + user='admin', + password='quest', + dbname='qdb' +) + +conn = pool.getconn() + +try: + conn.autocommit = True + + with conn.cursor() as cur: + cur.execute("SELECT * FROM trades LIMIT 10") + rows = cur.fetchall() + print(f"Fetched {len(rows)} rows") +finally: + pool.putconn(conn) + +pool.closeall() +``` + ### Integration with pandas psycopg2 integrates with pandas over SQLAlchemy, allowing you to read data directly into a DataFrame. This feature From 24183bbcdb8f6e6b605056298bf9f0aa77ac514a Mon Sep 17 00:00:00 2001 From: Jaromir Hamala Date: Mon, 26 May 2025 11:46:53 +0200 Subject: [PATCH 28/51] asyncpg and timezones explained --- documentation/pgwire/pgwire-intro.md | 3 + documentation/pgwire/python.md | 99 ++++++++++++++++++---------- 2 files changed, 69 insertions(+), 33 deletions(-) diff --git a/documentation/pgwire/pgwire-intro.md b/documentation/pgwire/pgwire-intro.md index 6760d807..0152c5e0 100644 --- a/documentation/pgwire/pgwire-intro.md +++ b/documentation/pgwire/pgwire-intro.md @@ -31,6 +31,9 @@ libraries interpreting these timestamps in their local timezone by default, pote data representation. Our language-specific guides provide detailed examples on how to configure your client to correctly interpret these timestamps as UTC. +We realize the current behavior is not ideal and we are actively working on improving it. In the meantime, we +recommend that you set the timezone in your client library to UTC to ensure consistent handling of timestamps. + ### PGWire vs. 
SQL Semantics While QuestDB supports the PGWire protocol for communication, its SQL dialect and feature diff --git a/documentation/pgwire/python.md b/documentation/pgwire/python.md index edc728c7..ab05f720 100644 --- a/documentation/pgwire/python.md +++ b/documentation/pgwire/python.md @@ -62,8 +62,12 @@ pip install asyncpg ```python import asyncio +import os +import time + import asyncpg + async def connect_to_questdb(): conn = await asyncpg.connect( host='127.0.0.1', @@ -72,15 +76,17 @@ async def connect_to_questdb(): password='quest', database='qdb' ) - + version = await conn.fetchval("SELECT version()") print(f"Connected to QuestDB version: {version}") - + await conn.close() -// Set the timezone to UTC +# Set the timezone to UTC os.environ['TZ'] = 'UTC' -time.tzset() +if hasattr(time, 'tzset'): + # tzset is only available on Unix-like systems + time.tzset() asyncio.run(connect_to_questdb()) ``` @@ -88,6 +94,9 @@ asyncio.run(connect_to_questdb()) **Note**: The `asyncpg` client uses the system timezone by default. QuestDB always sends timestamp in UTC. To set the timezone to UTC, you can set the `TZ` environment variable before running your script. This is important for time-series data to ensure consistent timestamps. + +See the [Timestamp Handling](/docs/pgwire/pgwire-intro#timestamp-handling) chapter for additional context on how +on how QuestDB handles timezones. ::: ### Querying Data @@ -103,6 +112,9 @@ asyncpg provides several methods for fetching data: import asyncio import asyncpg from datetime import datetime, timedelta +import os +import time + async def query_with_asyncpg(): conn = await asyncpg.connect( @@ -112,36 +124,39 @@ async def query_with_asyncpg(): password='quest', database='qdb' ) - + # Fetch multiple rows rows = await conn.fetch(""" - SELECT * FROM trades - WHERE ts >= $1 - ORDER BY ts DESC - LIMIT 10 - """, datetime.now() - timedelta(days=1)) - + SELECT * + FROM trades + WHERE ts >= $1 + ORDER BY ts DESC LIMIT 10 + """, datetime.now() - timedelta(days=1)) + print(f"Fetched {len(rows)} rows") for row in rows: print(f"Timestamp: {row['ts']}, Symbol: {row['symbol']}, Price: {row['price']}") - + # Fetch a single row single_row = await conn.fetchrow(""" - SELECT * FROM trades - LIMIT -1 - """) - + SELECT * + FROM trades LIMIT -1 + """) + if single_row: print(f"Latest trade: {single_row['symbol']} at {single_row['price']}") - + # Fetch a single value count = await conn.fetchval("SELECT count(*) FROM trades") print(f"Total trades: {count}") - + await conn.close() + +# Set the timezone to UTC os.environ['TZ'] = 'UTC' -time.tzset() +if hasattr(time, 'tzset'): + time.tzset() asyncio.run(query_with_asyncpg()) ``` @@ -152,6 +167,8 @@ For large result sets, you can use a cursor to fetch results in batches: ```python import asyncio import asyncpg +import os +import time async def stream_with_cursor(): conn = await asyncpg.connect( @@ -161,32 +178,36 @@ async def stream_with_cursor(): password='quest', database='qdb' ) - + async with conn.transaction(): # Execute a query that might return a large number of rows cursor = await conn.cursor(""" - SELECT * FROM trades - ORDER BY ts - """) - + SELECT * + FROM trades + ORDER BY ts + """) + batch_size = 100 total_processed = 0 - + while True: batch = await cursor.fetch(batch_size) - + # If no more rows, break the loop if not batch: break - + total_processed += len(batch) print(f"Processed {total_processed} rows so far...") - + await conn.close() print(f"Finished processing {total_processed} total rows") + +# Set the timezone to UTC 
os.environ['TZ'] = 'UTC' -time.tzset() +if hasattr(time, 'tzset'): + time.tzset() asyncio.run(stream_with_cursor()) ``` @@ -197,6 +218,8 @@ For applications that need to execute many queries, you can use connection pooli ```python import asyncio import asyncpg +import os +import time async def connection_pool_example(): pool = await asyncpg.create_pool( @@ -215,8 +238,10 @@ async def connection_pool_example(): await pool.close() +# Set the timezone to UTC os.environ['TZ'] = 'UTC' -time.tzset() +if hasattr(time, 'tzset'): + time.tzset() asyncio.run(connection_pool_example()) ``` @@ -228,6 +253,8 @@ asyncpg uses numbered parameters (`$1`, `$2`, etc.) for prepared statements: import asyncio import asyncpg from datetime import datetime, timedelta +import os +import time async def parameterized_query(): conn = await asyncpg.connect( @@ -258,8 +285,10 @@ async def parameterized_query(): await conn.close() +# Set the timezone to UTC os.environ['TZ'] = 'UTC' -time.tzset() +if hasattr(time, 'tzset'): + time.tzset() asyncio.run(parameterized_query()) ``` @@ -318,8 +347,10 @@ async def execute_many_market_data_example(): """, trades_data) print(f"Successfully inserted {len(trades_data)} trade records using executemany.") +# Set the timezone to UTC os.environ['TZ'] = 'UTC' -time.tzset() +if hasattr(time, 'tzset'): + time.tzset() asyncio.run(execute_many_market_data_example()) ``` @@ -422,8 +453,10 @@ async def batch_insert_l3_order_book_arrays(): ) print(f"Successfully inserted {len(snapshots_to_insert)} L3 order book snapshots using executemany().") +# Set the timezone to UTC os.environ['TZ'] = 'UTC' -time.tzset() +if hasattr(time, 'tzset'): + time.tzset() asyncio.run(batch_insert_l3_order_book_arrays()) ``` From 774ff8e86ed7307f9f1960c51b9eb8cca5de81a7 Mon Sep 17 00:00:00 2001 From: Jaromir Hamala Date: Mon, 26 May 2025 13:46:53 +0200 Subject: [PATCH 29/51] inserting arrays via pgwire --- documentation/pgwire/java.md | 96 ++++++++++++++++++++++++++++++++++++ 1 file changed, 96 insertions(+) diff --git a/documentation/pgwire/java.md b/documentation/pgwire/java.md index 9f261f5b..06686780 100644 --- a/documentation/pgwire/java.md +++ b/documentation/pgwire/java.md @@ -204,6 +204,102 @@ public class QuestDBParameterizedQuery { } ``` +### Inserting Arrays +QuestDB, via the PostgreSQL wire protocol, supports array data types, including multidimensional arrays. + +:::tip +Inserting large amounts of data using the JDBC driver can be inefficient. For high-throughput ingestion, consider using +QuestDB's [Java ILP client](/docs/clients/java_ilp/) or the [InfluxDB Line Protocol (ILP)](/docs/ingestion-overview/). +::: + +When you need to insert multiple rows containing array data, such as a series of order book snapshots, +JDBC Batch API offers a more performant way to do so compared to inserting row by row with `execute()`. +The optimal batch size can vary based on your specific use case, but a common practice is to batch +inserts of 100 to 1000 rows at a time. This reduces the number of round trips to the database and can significantly +improve performance, especially when dealing with large datasets. 
+ +```java +import java.sql.Connection; +import java.sql.DriverManager; +import java.sql.PreparedStatement; +import java.sql.SQLException; +import java.sql.Statement; +import java.sql.Timestamp; +import java.time.Instant; +import java.time.temporal.ChronoUnit; +import java.util.Arrays; +import java.util.Calendar; +import java.util.TimeZone; + +public class ArrayInsert { + + public static void main(String[] args) { + String url = "jdbc:postgresql://127.0.0.1:8812/qdb"; + String user = "admin"; + String password = "quest"; + + try (Connection conn = DriverManager.getConnection(url, user, password)) { + try (Statement stmt = conn.createStatement()) { + String createTableSQL = """ + CREATE TABLE IF NOT EXISTS l3_order_book + ( + bid DOUBLE PRECISION[][], + ask DOUBLE PRECISION[][], + ts TIMESTAMP + ) TIMESTAMP(ts) PARTITION BY DAY WAL; + """; + stmt.execute(createTableSQL); + System.out.println("Table 'l3_order_book' is ready."); + } + + java.util.Calendar utcCalendar = Calendar.getInstance(); + utcCalendar.setTimeZone(TimeZone.getTimeZone("UTC")); + + Instant baseTimestamp = Instant.now(); + String insertSQL = "INSERT INTO l3_order_book (bid, ask, ts) VALUES (?, ?, ?)"; + try (PreparedStatement pstmt = conn.prepareStatement(insertSQL)) { + // Add first row to batch + Double[][] bids1 = {{68500.50, 0.5}, {68500.00, 1.2}, {68499.50, 0.3}}; + Double[][] asks1 = {{68501.00, 0.8}, {68501.50, 0.4}, {68502.00, 1.1}}; + Timestamp ts1 = Timestamp.from(baseTimestamp.plus(1, ChronoUnit.SECONDS)); + pstmt.setObject(1, bids1); + pstmt.setObject(2, asks1); + pstmt.setTimestamp(3, ts1, utcCalendar); + pstmt.addBatch(); + + // Add second row to batch + Double[][] bids2 = {{68502.10, 0.3}, {68501.80, 0.9}, {68501.20, 1.5}}; + Double[][] asks2 = {{68502.50, 1.1}, {68503.00, 0.6}, {68503.50, 0.2}}; + Timestamp ts2 = Timestamp.from(baseTimestamp.plus(2, ChronoUnit.SECONDS)); + pstmt.setObject(1, bids2); + pstmt.setObject(2, asks2); + pstmt.setTimestamp(3, ts2, utcCalendar); + pstmt.addBatch(); + + // Add third row to batch + Double[][] bids3 = {{68490.60, 2.5}, {68489.00, 3.2}}; + Double[][] asks3 = {{68491.20, 1.8}, {68492.80, 0.7}}; + Timestamp ts3 = Timestamp.from(baseTimestamp.plus(3, ChronoUnit.SECONDS)); + pstmt.setObject(1, bids3); + pstmt.setObject(2, asks3); + pstmt.setTimestamp(3, ts3, utcCalendar); + pstmt.addBatch(); + + // Execute the batch + int[] updateCounts = pstmt.executeBatch(); + + int totalInserted = Arrays.stream(updateCounts).sum(); + System.out.printf("Successfully inserted %d L3 order book snapshots using batch insert.%n", totalInserted); + } + } catch (SQLException e) { + System.err.println("Database error occurred."); + e.printStackTrace(); + } + } +} +``` + + ### Connection Pooling with HikariCP Connection pooling is highly recommended for production applications to efficiently manage database connections: From a1d758419d40a360d65583136a30fbd5f4563ee3 Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Mon, 26 May 2025 14:12:54 +0200 Subject: [PATCH 30/51] Add note on limited Beta support --- documentation/concept/array.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/documentation/concept/array.md b/documentation/concept/array.md index 3aa970c7..e727bf36 100644 --- a/documentation/concept/array.md +++ b/documentation/concept/array.md @@ -4,6 +4,13 @@ sidebar_label: N-Dim array description: Explains the technical design and syntax to use N-dimensional arrays. --- +:::note + +N-dimensional arrays are still in Beta phase. The only supported element type is +`DOUBLE`. 
+ +::: + QuestDB supports the N-dimensional array type. Its design matches that of the `NDArray` type in NumPy, which has become the de-facto standard for handling N-dimensional data. In order to effectively use arrays in QuestDB, you should From 5cc187d6b412a1c96945420d50969648476af832 Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Mon, 26 May 2025 14:25:57 +0200 Subject: [PATCH 31/51] Auto-style on java_ilp.md --- documentation/clients/java_ilp.md | 22 +++++++++++++++------- 1 file changed, 15 insertions(+), 7 deletions(-) diff --git a/documentation/clients/java_ilp.md b/documentation/clients/java_ilp.md index 09ca12f1..fb005d17 100644 --- a/documentation/clients/java_ilp.md +++ b/documentation/clients/java_ilp.md @@ -39,21 +39,22 @@ The client provides the following benefits: :::info -This page focuses on our high-performance ingestion client, which is optimized for **writing** data to QuestDB. -For retrieving data, we recommend using a [PostgreSQL-compatible Java library](/docs/pgwire/java/) or our +This page focuses on our high-performance ingestion client, which is optimized +for **writing** data to QuestDB. For retrieving data, we recommend using a +[PostgreSQL-compatible Java library](/docs/pgwire/java/) or our [HTTP query endpoint](/docs/reference/sql/overview/#rest-http-api). ::: - ## Compatible JDKs -The client relies on some JDK internal libraries, which certain specialised JDK offerings may not support. +The client relies on some JDK internal libraries, which certain specialised JDK +offerings may not support. Here is a list of known incompatible JDKs: - Azul Zing 17 - - A fix is in progress. You can use Azul Zulu 17 in the meantime. + - A fix is in progress. You can use Azul Zulu 17 in the meantime. ## Quick start @@ -68,7 +69,7 @@ Add a QuestDB as a dependency in your project's build configuration file. renderText={(release) => ( {` - org.questdb + org.questdb questdb ${release.name} `} @@ -133,24 +134,30 @@ There are three ways to create a client instance: string. See [Configuration options](#configuration-options) for all available options. It allows sharing the same configuration across clients in different languages. + ```java try (Sender sender = Sender.fromConfig("http::addr=localhost:9000;auto_flush_rows=5000;retry_timeout=10000;")) { // ... } ``` + 2. **From an environment variable.** The `QDB_CLIENT_CONF` environment variable is used to set the configuration string. Moving configuration parameters to an environment variable allows you to avoid hard-coding sensitive information such as tokens and password in your code. + ```bash export QDB_CLIENT_CONF="http::addr=localhost:9000;auto_flush_rows=5000;retry_timeout=10000;" ``` + ```java try (Sender sender = Sender.fromEnv()) { // ... } ``` + 3. **Using the Java builder API.** This provides type-safe configuration. + ```java try (Sender sender = Sender.builder(Sender.Transport.HTTP) .address("localhost:9000") @@ -176,7 +183,7 @@ There are three ways to create a client instance: - `timestampColumn(CharSequence, Instant)`, or `timestampColumn(CharSequence, long, ChronoUnit)` -5. Use `at(Instant)` or `at(long timestamp, ChronoUnit unit) ` or `atNow()` to +5. Use `at(Instant)` or `at(long timestamp, ChronoUnit unit)` or `atNow()` to set a designated timestamp. 6. Optionally: You can use `flush()` to send locally buffered data into a server. @@ -300,6 +307,7 @@ There are two ways to assign a designated timestamp to a row: 2. 
Server-assigned timestamp: The server automatically assigns a timestamp to the row based on the server's wall-clock time. Example: + ```java sender.table("trades") .symbol("symbol", "ETH-USD") From 97a3fa4ea20868542c051dd3b3f2397a359455ea Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Mon, 26 May 2025 14:43:20 +0200 Subject: [PATCH 32/51] Document array ingestion for Java ILP client --- documentation/clients/java_ilp.md | 28 +++++++++++++++++++++++++++- 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/documentation/clients/java_ilp.md b/documentation/clients/java_ilp.md index fb005d17..27c92725 100644 --- a/documentation/clients/java_ilp.md +++ b/documentation/clients/java_ilp.md @@ -95,7 +95,7 @@ wall-clock time. The configuration for the client is specified using a configuration string. This string follows the format: -``` +```text ::=;=;...; ``` @@ -180,6 +180,7 @@ There are three ways to create a client instance: - `longColumn(CharSequence, long)` - `doubleColumn(CharSequence, double)` - `boolColumn(CharSequence, boolean)` + - `arrayColumn()` -- several variants, see below - `timestampColumn(CharSequence, Instant)`, or `timestampColumn(CharSequence, long, ChronoUnit)` @@ -190,6 +191,31 @@ There are three ways to create a client instance: 7. Go to the step no. 2 to start a new row. 8. Use `close()` to dispose the Sender after you no longer need it. +## Ingesting arrays + +To ingest a 1D or 2D array, simply construct a Java array of the appropriate +type (`double[]`, `double[][]`) and supply it to the `arrayColumn()` method. In +order to avoid GC overheads, you are highly encouraged create the array instance +once, and then populate it with the data of each row. + +For arrays of higher dimensionality, use the `DoubleArray` class. Here's a basic +example for a 3D array: + +```java +try (Sender sender = Sender.fromConfig("http::addr=localhost:9000;"); + DoubleArray ary = new DoubleArray(3, 3, 3); +) { + for (int i = 0; i < ROW_COUNT; i++) { + for (int value = 0; value < 3 * 3 * 3; value++) { + ary.append(value); + } + sender.table("tango") + .doubleArray("array", ary) + .at(getTimestamp(), ChronoUnit.MICROS); + } +} +``` + ## Flushing Client accumulates data into an internal buffer. Flushing the buffer sends the From 5b6bcdc31442db39435e397e670122f61e1b43ef Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Tue, 27 May 2025 11:36:00 +0200 Subject: [PATCH 33/51] Touch up explanation of "transpose" --- documentation/concept/array.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/documentation/concept/array.md b/documentation/concept/array.md index e727bf36..1d768be0 100644 --- a/documentation/concept/array.md +++ b/documentation/concept/array.md @@ -71,11 +71,11 @@ the array values: `DOUBLE[2, 6]`. All elements are still available, but using just 2 coordinates. -4. _Transpose_: reverse the strides, changing the meaning of each coordinate. - Example: transposing our array changes the strides from `(6, 2, 1)` to - `(1, 2, 6)`. What we used to access with the 3rd coordinate, we now access - with the 1st coordinate. On a 2D array, this would have the effect of - swapping rows and columns (transposing a matrix). +4. _Transpose_: reverse the shape and the strides, changing the meaning of each + coordinate. Example: transposing our array changes the strides from + `(6, 2, 1)` to `(1, 2, 6)`. What we used to access with the 3rd coordinate, + we now access with the 1st coordinate. 
On a 2D array, this would have the + effect of swapping rows and columns (transposing a matrix). ## Importance of the "vanilla" array shape From 0f0bd91c749a61894f6a60fc1dd83321d245d4d8 Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Tue, 27 May 2025 11:36:57 +0200 Subject: [PATCH 34/51] Touch up performance explanation --- documentation/concept/array.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/documentation/concept/array.md b/documentation/concept/array.md index 1d768be0..464c7a72 100644 --- a/documentation/concept/array.md +++ b/documentation/concept/array.md @@ -91,7 +91,7 @@ calculate the position of each element. So, while performing a shape transformation is cheap on its own, whole-array operations on transformed arrays, such as equality checks, adding/multiplying -two arrays, etc., are expected to be slower than on vanilla arrays. +two arrays, etc., are expected to be faster on vanilla arrays. QuestDB always stores arrays in vanilla form. When you transform an array's shape and then store the resulting array to the database, it will be stored in From f6f8cdcaed3a9084c3ac9e3f2a23e638ba1f5755 Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Tue, 27 May 2025 11:40:46 +0200 Subject: [PATCH 35/51] Rephrase explanation of storing the array --- documentation/concept/array.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/documentation/concept/array.md b/documentation/concept/array.md index 464c7a72..fad8bb61 100644 --- a/documentation/concept/array.md +++ b/documentation/concept/array.md @@ -93,9 +93,9 @@ So, while performing a shape transformation is cheap on its own, whole-array operations on transformed arrays, such as equality checks, adding/multiplying two arrays, etc., are expected to be faster on vanilla arrays. -QuestDB always stores arrays in vanilla form. When you transform an array's -shape and then store the resulting array to the database, it will be stored in -vanilla form. +QuestDB always stores arrays in vanilla form. If you transform an array's shape +and then store it to the database, QuestDB will physically rearrange the +elements, and store the new array in vanilla shape. ## The ARRAY literal From 454b3a441b3b708aa2cc5f852ba821e52d3ba776 Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Tue, 27 May 2025 12:39:20 +0200 Subject: [PATCH 36/51] Move sample array to higher level --- documentation/concept/array.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/documentation/concept/array.md b/documentation/concept/array.md index fad8bb61..0bbc094d 100644 --- a/documentation/concept/array.md +++ b/documentation/concept/array.md @@ -138,14 +138,6 @@ Each `dimN-selector` can be one of two forms: - single integer - range in the form `low:high` -### Single-integer array selector - -Using single integers you select individual array elements. An element of a 2D -array is a 1D sub-array, and an element of a 1D array is an individual scalar -value, like a `DOUBLE`. If you use a coordinate larger than the array's given -dimension length, the result will be `NULL` for scalars, and an empty array for -sub-arrays. - All the following examples use the 3D array named `arr`, of type `DOUBLE[3][3][3]`: @@ -157,6 +149,14 @@ CREATE TABLE tango AS (SELECT ARRAY[ ] arr from long_sequence(1)); ``` +### Single-integer array selector + +Using single integers you select individual array elements. 
An element of a 2D +array is a 1D sub-array, and an element of a 1D array is an individual scalar +value, like a `DOUBLE`. If you use a coordinate larger than the array's given +dimension length, the result will be `NULL` for scalars, and an empty array for +sub-arrays. + #### Example: select a number from the array ```questdb-sql From df4a9a4c05e4b6b69867b408bafa8aaaa51019a7 Mon Sep 17 00:00:00 2001 From: victor Date: Wed, 28 May 2025 18:11:10 +0800 Subject: [PATCH 37/51] document `ndarray` and `protocol_version` in java and rust client. --- documentation/clients/ingest-rust.md | 92 +++++++++++++++++++++++++++ documentation/clients/java_ilp.md | 22 +++++++ documentation/configuration-string.md | 16 +++++ 3 files changed, 130 insertions(+) diff --git a/documentation/clients/ingest-rust.md b/documentation/clients/ingest-rust.md index 84143d06..7b4712c5 100644 --- a/documentation/clients/ingest-rust.md +++ b/documentation/clients/ingest-rust.md @@ -150,6 +150,75 @@ fn main() -> Result<()> { Using the current timestamp hinder the ability to deduplicate rows which is [important for exactly-once processing](/docs/reference/api/ilp/overview/#exactly-once-delivery-vs-at-least-once-delivery). +## Ingesting arrays + +The `Sender::column_arr` interface supports efficient ingestion of N-dimensionals array data with various array types: + +- Built-in Rust Arrays (up to 3-dimensionals) +- Vectors and Slices (up to 3-dimensionals) +- [Ndarray](https://docs.rs/ndarray) Structures + +1. Recording 1D Array (Built-in) +```rust +use questdb::{Result, ingress::{Buffer, SenderBuilder}}; +fn main() -> Result<()> { + let mut sender = SenderBuilder::new("tcp::addr=localhost:9000")?.build()?; + let mut buffer = sender.new_buffer(); + + buffer + .table("x")? + .symbol("device_id", "sensor")? + .column_arr("measurements", [1.0f64, 2.0, 3.0, 4.0])?; + buffer.send()?; + Ok(()) +} +``` + +2. Recording 2D Vector +```rust +use questdb::{Result, ingress::{Buffer, SenderBuilder}}; + +fn main() -> Result<()> { + let mut sender = SenderBuilder::new("tcp::addr=localhost:9000")?.build()?; + let mut buffer = sender.new_buffer(); + + // 2D vector + let matrix_data = vec![ + vec![1.1, 2.2, 3.3], + vec![4.4, 5.5, 6.6] + ]; + + buffer + .table("matrix_data")? + .column_arr("values", &matrix_data)?; + + buffer.send()?; + Ok(()) +} +``` + +3. Recording a [Ndarray](https://docs.rs/ndarray): +```rust +use questdb::{Result, ingress::{Buffer, SenderBuilder}}; +use ndarray::arr3; + +fn main() -> Result<()> { + let mut sender = SenderBuilder::new("tcp::addr=localhost:9000")?.build()?; + let mut buffer = sender.new_buffer(); + let tensor = arr3(&[ + [[1.0], [2.0]], + [[3.0], [4.0]], + [[5.0], [6.0]] + ]); + buffer + .table("x")? + .column_arr("tensor_data", &tensor.view())?; // Efficient view ingestion + buffer.send()?; + Ok(()) +} + +``` + ## Configuration options The easiest way to configure the line sender is the configuration string. The @@ -224,6 +293,28 @@ You can inspect the sender's error state by calling `sender.must_close()`. For more details about the HTTP and TCP transports, please refer to the [ILP overview](/docs/reference/api/ilp/overview#transport-selection). +## Protocol Version +To enhance data ingestion performance, the client-server communication protocol is being upgraded from text-based to binary encoding. The transition can be managed through the sender's parameter `protocol_version`. 
+ +For HTTP implementations: +- Protocol version auto-negotiation occurs during handshake +- No manual configuration required in most scenarios +- Advanced use case: Set `protocol_version=2|1` to bypass initial protocol discovery for ultra-low latency requirements + +For TCP connections: +- Lacks automatic protocol detection capability +- Defaults to text-based format (protocol_version=1) +- Mandatory configuration: + Set `protocol_version=2` when: + a) Connecting to servers built after `8.4.0` + b) Requiring array data writes + +Here is a configuration string with `protocol_version=2` for `TCP`: + +``` +tcp::addr=localhost:9000;protocol_version=2; +``` + ## Crate features The QuestDB client crate supports some optional features, mostly related to @@ -245,6 +336,7 @@ These features are opt-in: certificates store. - `insecure-skip-verify`: Allows skipping server certificate validation in TLS (this compromises security). +- `ndarray`: Enables ingestion of arrays through the `column_arr()` interface using [ndarray](https://docs.rs/ndarray) crate. ## Next steps diff --git a/documentation/clients/java_ilp.md b/documentation/clients/java_ilp.md index 27c92725..d8d043d0 100644 --- a/documentation/clients/java_ilp.md +++ b/documentation/clients/java_ilp.md @@ -354,6 +354,28 @@ rows with older timestamps are ingested before rows with newer timestamps. ::: +## Protocol Version +To enhance data ingestion performance, the client-server communication protocol is being upgraded from text-based to binary encoding. The transition can be managed through the sender's parameter `protocol_version`. + +For HTTP implementations: +- Protocol version auto-negotiation occurs during handshake +- No manual configuration required in most scenarios +- Advanced use case: Set `protocol_version=2|1` to bypass initial protocol discovery for ultra-low latency requirements + +For TCP connections: +- Lacks automatic protocol detection capability +- Defaults to text-based format (protocol_version=1) +- Mandatory configuration: + Set `protocol_version=2` when: + a) Connecting to servers built after `8.4.0` + b) Requiring array-type data writes + +Here is a configuration string with `protocol_version=2` for `TCP`: + +``` +tcp::addr=localhost:9000;protocol_version=2; +``` + ## Configuration options Client can be configured either by using a configuration string as shown in the diff --git a/documentation/configuration-string.md b/documentation/configuration-string.md index 6df6820e..737efa3f 100644 --- a/documentation/configuration-string.md +++ b/documentation/configuration-string.md @@ -67,6 +67,22 @@ controls the auto-flushing behavior of the TCP transport. `request_timeout`. This is useful for large requests. You can set this value to `0` to disable this logic. +### Protocol Version +`protocol_version` : Specifies the version of Ingestion Line Protocol to use. +Valid options are: + +- `1` - Text-based format compatible with InfluxDB line protocol. +- `2` - Binary format with array and f64 support. +- `auto` (default) - Automatic version selection based on connection type. + +Behavior details: + +| Value | Behavior | +| ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `1` | - Plain text serialization
- Compatible with InfluxDB servers
- No array type support |
- Full support for arrays |
- **TCP/TCPS**: Defaults to version 1 for compatibility | + ### TLS encryption To enable TLS, select the `https` or `tcps` protocol. From 4a02f244d03c8aa302f9ff20b0b9d756eb0081ae Mon Sep 17 00:00:00 2001 From: victor Date: Wed, 28 May 2025 21:53:17 +0800 Subject: [PATCH 38/51] add c/c++/rust array example. --- documentation/clients/ingest-c-and-cpp.md | 172 ++++++++++++++++++++++ documentation/clients/ingest-python.md | 37 +++++ documentation/clients/ingest-rust.md | 6 +- documentation/clients/java_ilp.md | 6 +- 4 files changed, 215 insertions(+), 6 deletions(-) diff --git a/documentation/clients/ingest-c-and-cpp.md b/documentation/clients/ingest-c-and-cpp.md index 0a08eed5..8b47b321 100644 --- a/documentation/clients/ingest-c-and-cpp.md +++ b/documentation/clients/ingest-c-and-cpp.md @@ -188,6 +188,62 @@ using the original event timestamps when ingesting data into QuestDB. Using the current timestamp will hinder the ability to deduplicate rows which is [important for exactly-once processing](/docs/reference/api/ilp/overview/#exactly-once-delivery-vs-at-least-once-delivery). +### Array Insertion + +Currently, the C++ interface supports `std::array` for data, while the C interface offers a lower-level and more flexible option: + +```cpp +#include +#include +#include + +using namespace std::literals::string_view_literals; +using namespace questdb::ingress::literals; + +int main() +{ + try + { + auto sender = questdb::ingress::line_sender::from_conf( + "tcp::addr=127.0.0.1:9000;protocol_version=2;"); + const auto table_name = "cpp_market_orders_byte_strides"_tn; + const auto symbol_col = "symbol"_cn; + const auto book_col = "order_book"_cn; + size_t rank = 3; + std::vector shape{2, 3, 2}; + std::vector strides{48, 16, 8}; + std::array arr_data = { + 48123.5, + 2.4, + 48124.0, + 1.8, + 48124.5, + 0.9, + 48122.5, + 3.1, + 48122.0, + 2.7, + 48121.5, + 4.3}; + + questdb::ingress::line_sender_buffer buffer = sender.new_buffer(); + buffer.table(table_name) + .symbol(symbol_col, "BTC-USD"_utf8) + .column(book_col, 3, shape, strides, arr_data) + .at(questdb::ingress::timestamp_nanos::now()); + sender.flush(buffer); + return true; + } + catch (const questdb::ingress::line_sender_error& err) + { + std::cerr << "[ERROR] " << err.what() << std::endl; + return false; + } +} +``` + +If your strides match the element size, call `column(book_col, 3, shape, strides, arr_data)` (note the false generic parameter). + ## C :::note @@ -424,6 +480,99 @@ original event timestamps when ingesting data into QuestDB. Using the current timestamp hinder the ability to deduplicate rows which is [important for exactly-once processing](/docs/reference/api/ilp/overview/#exactly-once-delivery-vs-at-least-once-delivery). 
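
For example, to skip auto-detection and force the binary protocol, set the
version explicitly in the configuration string (the address below is a
placeholder for your server):

```
http::addr=localhost:9000;protocol_version=2;
```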
+### Array Insertion + +```c +int main() +{ + line_sender_error* err = NULL; + line_sender* sender = NULL; + line_sender_buffer* buffer = NULL; + char* conf_str = concat("tcp::addr=", host, ":", port, ";protocol_version=2;"); + if (!conf_str) + { + fprintf(stderr, "Could not concatenate configuration string.\n"); + return false; + } + + line_sender_utf8 conf_str_utf8 = {0, NULL}; + if (!line_sender_utf8_init( + &conf_str_utf8, strlen(conf_str), conf_str, &err)) + goto on_error; + + sender = line_sender_from_conf(conf_str_utf8, &err); + if (!sender) + goto on_error; + + free(conf_str); + conf_str = NULL; + + buffer = line_sender_buffer_new_for_sender(sender); + line_sender_buffer_reserve(buffer, 64 * 1024); + + line_sender_table_name table_name = QDB_TABLE_NAME_LITERAL("market_orders_byte_strides"); + line_sender_column_name symbol_col = QDB_COLUMN_NAME_LITERAL("symbol"); + line_sender_column_name book_col = QDB_COLUMN_NAME_LITERAL("order_book"); + + if (!line_sender_buffer_table(buffer, table_name, &err)) + goto on_error; + + line_sender_utf8 symbol_val = QDB_UTF8_LITERAL("BTC-USD"); + if (!line_sender_buffer_symbol(buffer, symbol_col, symbol_val, &err)) + goto on_error; + + size_t array_rank = 3; + uintptr_t array_shape[] = {2, 3, 2}; + intptr_t array_strides[] = {48, 16, 8}; + + double array_data[] = { + 48123.5, + 2.4, + 48124.0, + 1.8, + 48124.5, + 0.9, + 48122.5, + 3.1, + 48122.0, + 2.7, + 48121.5, + 4.3}; + + if (!line_sender_buffer_column_f64_arr_byte_strides( + buffer, + book_col, + array_rank, + array_shape, + array_strides, + (const uint8_t*)array_data, + sizeof(array_data), + &err)) + goto on_error; + + if (!line_sender_buffer_at_nanos(buffer, line_sender_now_nanos(), &err)) + goto on_error; + + if (!line_sender_flush(sender, buffer, &err)) + goto on_error; + + line_sender_close(sender); + return true; + +on_error:; + size_t err_len = 0; + const char* err_msg = line_sender_error_msg(err, &err_len); + fprintf(stderr, "Error: %.*s\n", (int)err_len, err_msg); + free(conf_str); + line_sender_error_free(err); + line_sender_buffer_free(buffer); + line_sender_close(sender); + return false; +} +``` + +If your strides match the element size, call `line_sender_buffer_column_f64_arr_elem_strides`. + ## Other Considerations for both C and C++ ### Configuration options @@ -473,6 +622,29 @@ demonstrated on the examples in this document. This call will return false if the flush wouldn't be data-transactional. +### Protocol version + +To enhance data ingestion performance, the client-server communication protocol is being upgraded from text-based to binary encoding. The transition can be managed through configuration `protocol_version`. 
+ +For HTTP protocol: +- Protocol version auto-negotiation occurs during handshake +- No manual configuration required in most scenarios +- Advanced use case: Set `protocol_version=2|1` to bypass initial protocol discovery for ultra-low latency requirements + +For TCP protocol: +- Lacks automatic protocol detection capability +- Defaults to text-based format (protocol_version=1) +- Mandatory configuration: + Set `protocol_version=2` when: + a) Connecting to servers built after `8.4.0` + b) Requiring array data writes + +Here is a configuration string with `protocol_version=2` for `TCP`: + +``` +tcp::addr=localhost:9000;protocol_version=2; +``` + ## Next Steps Please refer to the [ILP overview](/docs/reference/api/ilp/overview) for details diff --git a/documentation/clients/ingest-python.md b/documentation/clients/ingest-python.md index 242630d1..e5081777 100644 --- a/documentation/clients/ingest-python.md +++ b/documentation/clients/ingest-python.md @@ -233,6 +233,43 @@ with Sender.from_conf(conf) as sender: sender.dataframe(df, table_name='trades', at='timestamp') ``` +## Insert numpy.ndarray + +- Direct Array Insertion: + +```python +from questdb.ingress import Sender, TimestampNanos +import numpy as np + +arr1 = np.array([1.2345678901234567, 2.3456789012345678], dtype=np.float64) +arr2 = np.arange(6, dtype=np.float64).reshape(2, 3) +arr3 = base[:, ::2] + +conf = f'http::addr=localhost:9000;' +with Sender.from_conf(conf) as sender: + sender.row( + 'tango', + columns={'arr1': arr1, 'arr2': arr2, 'arr3': arr3}, + at=TimestampNanos.now()) + sender.flush() +``` + +- DataFrame Insertion + +```python +import pandas as pd +from questdb.ingress import Sender +import numpy as np + +df = pd.DataFrame({ + 'array': [np.array([1.0], np.float64), np.array([2.0], np.float64)] + 'timestamp': pd.to_datetime(['2022-03-08T18:03:57.609765Z', '2022-03-08T18:03:57.710419Z'])}) + +conf = f'http::addr=localhost:9000;' +with Sender.from_conf(conf) as sender: + sender.dataframe(df, table_name='tango', at='timestamp') +``` + ## Configuration options The minimal configuration string needs to have the protocol, host, and port, as diff --git a/documentation/clients/ingest-rust.md b/documentation/clients/ingest-rust.md index 7b4712c5..a9d244d6 100644 --- a/documentation/clients/ingest-rust.md +++ b/documentation/clients/ingest-rust.md @@ -294,14 +294,14 @@ For more details about the HTTP and TCP transports, please refer to the [ILP overview](/docs/reference/api/ilp/overview#transport-selection). ## Protocol Version -To enhance data ingestion performance, the client-server communication protocol is being upgraded from text-based to binary encoding. The transition can be managed through the sender's parameter `protocol_version`. +To enhance data ingestion performance, the client-server communication protocol is being upgraded from text-based to binary encoding. The transition can be managed through the sender's configuration `protocol_version`. 
-For HTTP implementations: +For HTTP protocol: - Protocol version auto-negotiation occurs during handshake - No manual configuration required in most scenarios - Advanced use case: Set `protocol_version=2|1` to bypass initial protocol discovery for ultra-low latency requirements -For TCP connections: +For TCP protocol: - Lacks automatic protocol detection capability - Defaults to text-based format (protocol_version=1) - Mandatory configuration: diff --git a/documentation/clients/java_ilp.md b/documentation/clients/java_ilp.md index d8d043d0..d60bbd27 100644 --- a/documentation/clients/java_ilp.md +++ b/documentation/clients/java_ilp.md @@ -355,14 +355,14 @@ rows with older timestamps are ingested before rows with newer timestamps. ::: ## Protocol Version -To enhance data ingestion performance, the client-server communication protocol is being upgraded from text-based to binary encoding. The transition can be managed through the sender's parameter `protocol_version`. +To enhance data ingestion performance, the client-server communication protocol is being upgraded from text-based to binary encoding. The transition can be managed through the sender's configuration `protocol_version`. -For HTTP implementations: +For HTTP protocol: - Protocol version auto-negotiation occurs during handshake - No manual configuration required in most scenarios - Advanced use case: Set `protocol_version=2|1` to bypass initial protocol discovery for ultra-low latency requirements -For TCP connections: +For TCP protocol: - Lacks automatic protocol detection capability - Defaults to text-based format (protocol_version=1) - Mandatory configuration: From f3d04026efd3bfea66ff4159282913341296d27c Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Wed, 28 May 2025 18:33:21 +0200 Subject: [PATCH 39/51] Improve ILP config page --- documentation/configuration-string.md | 98 ++++++++++++--------------- 1 file changed, 44 insertions(+), 54 deletions(-) diff --git a/documentation/configuration-string.md b/documentation/configuration-string.md index 737efa3f..07e584b0 100644 --- a/documentation/configuration-string.md +++ b/documentation/configuration-string.md @@ -5,83 +5,73 @@ description: Demonstrates available options, caveats, and more. --- -The QuestDB clients leverage a configuration string to pass common values. +You configure a QuestDB ingestion client with a configuration string. The syntax +is the same in all clients, and there are a number of common options. There are +also language-specific settings. -The presiding method will vary from client-to-client, but the string composition -is consistent. +This document provides a general overview and documents the common options. -Naturally, languages will each have their own approach, and these will be -covered in the clients' documentation. +## Configuration string breakdown -This document provides a general overview. +These are the common configuration options. -## Configuration string breakdown +### Protocol Version + +`protocol_version` — sets the line protocol version + +Valid options are: -When using the configuration string, the following options are available: +| Value | Behavior | +| ------ | ------------------------------------------------------------------------------------------------------------------- | +| `1` | - plain-text serialization
- compatible with InfluxDB servers
- no array type support |
- full support for arrays |
- **TCP/TCPS**: no negotiation, uses version 1 | ### HTTP transport authentication -- `username` : Username for HTTP basic authentication. -- `password` : Password for HTTP basic authentication. -- `token` : Bearer token for HTTP authentication. +- `username` — username for HTTP basic authentication +- `password` — password for HTTP basic authentication +- `token` — bearer token for HTTP authentication ### TCP transport authentication -- `username`: Username for TCP authentication. -- `token`: Token for TCP authentication. +- `username` — username for TCP authentication +- `token` — token for TCP authentication ### Auto-flushing -- `auto_flush` : Global switch for the auto-flushing behavior. Options are `on` - or `off`. Defaults to `on`. -- `auto_flush_rows` : The number of rows that will trigger a flush. This option - is supported for HTTP transport only. Defaults to 75,000. -- `auto_flush_interval` : The time in milliseconds that will trigger a flush. - Defaults to 1000. This option is support for HTTP transport only. +- `auto_flush` — global switch for the auto-flushing behavior. Options are `on` + or `off`. Defaults to `on` +- `auto_flush_rows` — number of rows that will trigger a flush. This option is + supported for HTTP transport only. Defaults to 75,000 +- `auto_flush_interval` — time in milliseconds that will trigger a flush. + Defaults to 1000. Used only for HTTP transport -The TCP transport for a client automatically flushes when its buffer is full. -The TCP transport utilizes a fixed-size buffer, and its maximum size is the same -as the initial size. Thus, the option `init_buf_size` (see below) effectively -controls the auto-flushing behavior of the TCP transport. +When using the TCP transport, the client automatically flushes when its buffer +is full. It uses a fixed-size buffer, whose size you can set with +`init_buf_size` (see below). ### Buffer -- `init_buf_size` : The initial size of the buffer in bytes. Default: 65536 - (64KiB) -- `max_buf_size` : The maximum size of the buffer in bytes. Default: 104857600 - (100MiB) This option is support for HTTP transport only. TCP transport uses a - fixed-size buffer and its maximum size is the same as the initial size. +- `init_buf_size` — initial size of the buffer in bytes. Default: 65536 + (64KiB). Also sets the fixed buffer size for TCP transport +- `max_buf_size` — maximum size of the buffer in bytes. Default: 104857600 + (100MiB). Used only for HTTP transport ### HTTP Transport -- `retry_timeout` : The time in milliseconds to continue retrying after a failed +- `retry_timeout` — time in milliseconds to continue retrying after a failed HTTP request. The interval between retries is an exponential backoff starting at 10ms and doubling after each failed attempt up to a maximum of 1 second. Default: 10000 (10 seconds) -- `request_timeout` : The time in milliseconds to wait for a response from the +- `request_timeout` — time in milliseconds to wait for a response from the server. This is in addition to the calculation derived from the `request_min_throughput` parameter. Default: 10000 (10 seconds) -- `request_min_throughput` : Minimum expected throughput in bytes per second for +- `request_min_throughput` — minimum expected throughput in bytes per second for HTTP requests. If the throughput is lower than this value, the connection will time out. This is used to calculate an additional timeout on top of `request_timeout`. This is useful for large requests. You can set this value - to `0` to disable this logic. 
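
`auto` is the default. To pin a specific version instead of negotiating it,
set it explicitly in the configuration string, as in this sketch (placeholder
address):

```
tcp::addr=localhost:9000;protocol_version=2;
```

This is the form to use with TCP endpoints that support the binary protocol,
since TCP offers no version negotiation.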
- -### Protocol Version -`protocol_version` : Specifies the version of Ingestion Line Protocol to use. -Valid options are: - -- `1` - Text-based format compatible with InfluxDB line protocol. -- `2` - Binary format with array and f64 support. -- `auto` (default) - Automatic version selection based on connection type. - -Behavior details: - -| Value | Behavior | -| ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `1` | - Plain text serialization
- Compatible with InfluxDB servers
No array type support | -| `2` | - Binary encoding for f64
- Full support for array | -| `auto` | - **HTTP/HTTPS**: Auto-detects server capability during handshake (supports version negotiation)
- **TCP/TCPS**: Defaults to version 1 for compatibility |
+  to `0` to disable this logic

### TLS encryption

@@ -89,14 +79,14 @@

To enable TLS, select the `https` or `tcps` protocol.

The following options are available:

-- `tls_roots` : Path to a Java keystore file containing trusted root
-  certificates. Defaults to the system default trust store.
-- `tls_roots_password` : Password for the keystore file. It's always required
-  when `tls_roots` is set.
-- `tls_verify` : Whether to verify the server's certificate. This should only be
+- `tls_roots` — path to a Java keystore file containing trusted root
+  certificates. Defaults to the system default trust store
+- `tls_roots_password` — password for the keystore file. It's always required
+  when `tls_roots` is set
+- `tls_verify` — whether to verify the server's certificate. This should only be
   used for testing as a last resort and never used in production as it makes
   the connection vulnerable to man-in-the-middle attacks. Options are `on` or
-  `unsafe_off`. Defaults to `on`.
+  `unsafe_off`. Defaults to `on`

## Other considerations

From c2e6b7932ebda213a9fad9e82837decf63009668 Mon Sep 17 00:00:00 2001
From: Marko Topolnik
Date: Wed, 28 May 2025 18:33:27 +0200
Subject: [PATCH 40/51] Improve Rust ILP page

---
 documentation/clients/ingest-rust.md | 91 +++++++++++++++-------------
 1 file changed, 48 insertions(+), 43 deletions(-)

diff --git a/documentation/clients/ingest-rust.md b/documentation/clients/ingest-rust.md
index a9d244d6..d494c7cf 100644
--- a/documentation/clients/ingest-rust.md
+++ b/documentation/clients/ingest-rust.md
@@ -10,33 +10,31 @@ import { ILPClientsTable } from "@theme/ILPClientsTable"
QuestDB offers a Rust client designed for high-performance data ingestion. These
are some of the highlights:

-- **Creates tables automatically**: no need to define your schema up-front.
+- **Creates tables automatically**: no need to define your schema up-front
- **Concurrent schema changes**: seamlessly handle multiple data streams that
  modify the table schema on the fly
-- **Optimized batching**: buffer the data and send many rows in one go.
+- **Optimized batching**: buffer the data and send many rows in one go
- **Health checks and feedback**: built-in health monitoring ensures the health
-  of your system.
+  of your system

:::info

-This page focuses on our high-performance ingestion client, which is optimized for **writing** data to QuestDB.
-For retrieving data, we recommend using a [PostgreSQL-compatible Rust library](/docs/pgwire/rust/) or our
+This page focuses on our high-performance ingestion client, which is optimized
+for **writing** data to QuestDB. For retrieving data, we recommend using a
+[PostgreSQL-compatible Rust library](/docs/pgwire/rust/) or our
[HTTP query endpoint](/docs/reference/sql/overview/#rest-http-api).

:::

-
-## Requirements
-
-- Requires Rust 1.40 or later.
-- Assumes your QuestDB server is already running. If you don't have a QuestDB
-  server yet, refer to [the general quick start](/docs/quick-start/).
+If you don't have a QuestDB server yet, follow the
+[Quick Start](/docs/quick-start/) section to set it up.

## Add the client crate to your project

-Add the QuestDB client to your project using the command line:
+The QuestDB client requires Rust 1.40 or later.
Add its crate to your project using +the command line: ```bash cargo add questdb-rs @@ -44,8 +42,7 @@ cargo add questdb-rs ## Authentication -This is how you'd set up the client to authenticate using the HTTP Basic -authentication: +This is how you authenticate using the HTTP Basic authentication: ```rust let mut sender = Sender::from_conf( @@ -66,9 +63,8 @@ Then you use it like this: let mut sender = Sender::from_env()?; ``` -When using QuestDB Enterprise, authentication can also be done via REST token. -Please check the [RBAC docs](/docs/operations/rbac/#authentication) for more -info. +When using QuestDB Enterprise, you can authenticate via a REST token. Please +check the [RBAC docs](/docs/operations/rbac/#authentication) for more info. ## Basic insert @@ -147,18 +143,25 @@ fn main() -> Result<()> { } ``` -Using the current timestamp hinder the ability to deduplicate rows which is +:::warning + +Avoid using `at_now()` instead of `at(some_timestamp)` because this removes the +ability to deduplicate rows, which is [important for exactly-once processing](/docs/reference/api/ilp/overview/#exactly-once-delivery-vs-at-least-once-delivery). +::: + ## Ingesting arrays -The `Sender::column_arr` interface supports efficient ingestion of N-dimensionals array data with various array types: +The `Sender::column_arr` method supports efficient ingestion of N-dimensional +arrays using several convenient types: + +- native Rust arrays and slices (up to 3-dimensional) +- native Rust vectors (up to 3-dimensional) +- arrays from the [ndarray](https://docs.rs/ndarray) crate -- Built-in Rust Arrays (up to 3-dimensionals) -- Vectors and Slices (up to 3-dimensionals) -- [Ndarray](https://docs.rs/ndarray) Structures +### 1. Record a 1D Array (Built-in) -1. Recording 1D Array (Built-in) ```rust use questdb::{Result, ingress::{Buffer, SenderBuilder}}; fn main() -> Result<()> { @@ -174,14 +177,15 @@ fn main() -> Result<()> { } ``` -2. Recording 2D Vector +### 2. Record a 2D Vector + ```rust use questdb::{Result, ingress::{Buffer, SenderBuilder}}; fn main() -> Result<()> { let mut sender = SenderBuilder::new("tcp::addr=localhost:9000")?.build()?; let mut buffer = sender.new_buffer(); - + // 2D vector let matrix_data = vec![ vec![1.1, 2.2, 3.3], @@ -197,7 +201,8 @@ fn main() -> Result<()> { } ``` -3. Recording a [Ndarray](https://docs.rs/ndarray): +### 3. Record an array from [ndarray](https://docs.rs/ndarray) + ```rust use questdb::{Result, ingress::{Buffer, SenderBuilder}}; use ndarray::arr3; @@ -206,8 +211,8 @@ fn main() -> Result<()> { let mut sender = SenderBuilder::new("tcp::addr=localhost:9000")?.build()?; let mut buffer = sender.new_buffer(); let tensor = arr3(&[ - [[1.0], [2.0]], - [[3.0], [4.0]], + [[1.0], [2.0]], + [[3.0], [4.0]], [[5.0], [6.0]] ]); buffer @@ -294,24 +299,23 @@ For more details about the HTTP and TCP transports, please refer to the [ILP overview](/docs/reference/api/ilp/overview#transport-selection). ## Protocol Version -To enhance data ingestion performance, the client-server communication protocol is being upgraded from text-based to binary encoding. The transition can be managed through the sender's configuration `protocol_version`. 
-For HTTP protocol: -- Protocol version auto-negotiation occurs during handshake -- No manual configuration required in most scenarios -- Advanced use case: Set `protocol_version=2|1` to bypass initial protocol discovery for ultra-low latency requirements +To enhance data ingestion performance, QuestDB introduced an upgrade to the +text-based InfluxDB Line Protocol which encodes arrays and f64 values in binary +form. Arrays are supported only in this upgraded protocol version. -For TCP protocol: -- Lacks automatic protocol detection capability -- Defaults to text-based format (protocol_version=1) -- Mandatory configuration: - Set `protocol_version=2` when: - a) Connecting to servers built after `8.4.0` - b) Requiring array data writes +You can select the protocol version with the `protocol_version` setting in the +configuration string. -Here is a configuration string with `protocol_version=2` for `TCP`: +HTTP transport automatically negotiates the protocol version by default. In order +to avoid the slight latency cost at connection time, you can explicitly configure +the protocol version by setting `protocol_version=2|1;`. -``` +TCP transport does not negotiate the protocol version and uses version 1 by +default. You must explicitly set `protocol_version=2;` in order to ingest +arrays, as in this example: + +```text tcp::addr=localhost:9000;protocol_version=2; ``` @@ -336,7 +340,8 @@ These features are opt-in: certificates store. - `insecure-skip-verify`: Allows skipping server certificate validation in TLS (this compromises security). -- `ndarray`: Enables ingestion of arrays through the `column_arr()` interface using [ndarray](https://docs.rs/ndarray) crate. +- `ndarray`: Enables ingestion of arrays from the + [ndarray](https://docs.rs/ndarray) crate. ## Next steps From 7a4659fcd1d1cbcb555bb860f29418dabfc10c14 Mon Sep 17 00:00:00 2001 From: victor Date: Thu, 29 May 2025 10:39:49 +0800 Subject: [PATCH 41/51] fix c/java protocol_version part. --- documentation/clients/ingest-c-and-cpp.md | 28 +++++++++++------------ documentation/clients/java_ilp.md | 27 +++++++++++----------- 2 files changed, 26 insertions(+), 29 deletions(-) diff --git a/documentation/clients/ingest-c-and-cpp.md b/documentation/clients/ingest-c-and-cpp.md index 8b47b321..0f64616b 100644 --- a/documentation/clients/ingest-c-and-cpp.md +++ b/documentation/clients/ingest-c-and-cpp.md @@ -622,26 +622,24 @@ demonstrated on the examples in this document. This call will return false if the flush wouldn't be data-transactional. -### Protocol version +### Protocol Version -To enhance data ingestion performance, the client-server communication protocol is being upgraded from text-based to binary encoding. The transition can be managed through configuration `protocol_version`. +To enhance data ingestion performance, QuestDB introduced an upgrade to the +text-based InfluxDB Line Protocol which encodes arrays and f64 values in binary +form. Arrays are supported only in this upgraded protocol version. -For HTTP protocol: -- Protocol version auto-negotiation occurs during handshake -- No manual configuration required in most scenarios -- Advanced use case: Set `protocol_version=2|1` to bypass initial protocol discovery for ultra-low latency requirements +You can select the protocol version with the `protocol_version` setting in the +configuration string. 
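For instance, this configuration string pins the upgraded version explicitly (a
minimal sketch using the usual localhost defaults from the examples on this
page; adjust the address to your setup):

```text
http::addr=localhost:9000;protocol_version=2;
```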
-For TCP protocol: -- Lacks automatic protocol detection capability -- Defaults to text-based format (protocol_version=1) -- Mandatory configuration: - Set `protocol_version=2` when: - a) Connecting to servers built after `8.4.0` - b) Requiring array data writes +HTTP transport automatically negotiates the protocol version by default. In order +to avoid the slight latency cost at connection time, you can explicitly configure +the protocol version by setting `protocol_version=2|1;`. -Here is a configuration string with `protocol_version=2` for `TCP`: +TCP transport does not negotiate the protocol version and uses version 1 by +default. You must explicitly set `protocol_version=2;` in order to ingest +arrays, as in this example: -``` +```text tcp::addr=localhost:9000;protocol_version=2; ``` diff --git a/documentation/clients/java_ilp.md b/documentation/clients/java_ilp.md index d60bbd27..64b471b3 100644 --- a/documentation/clients/java_ilp.md +++ b/documentation/clients/java_ilp.md @@ -355,24 +355,23 @@ rows with older timestamps are ingested before rows with newer timestamps. ::: ## Protocol Version -To enhance data ingestion performance, the client-server communication protocol is being upgraded from text-based to binary encoding. The transition can be managed through the sender's configuration `protocol_version`. -For HTTP protocol: -- Protocol version auto-negotiation occurs during handshake -- No manual configuration required in most scenarios -- Advanced use case: Set `protocol_version=2|1` to bypass initial protocol discovery for ultra-low latency requirements +To enhance data ingestion performance, QuestDB introduced an upgrade to the +text-based InfluxDB Line Protocol which encodes arrays and f64 values in binary +form. Arrays are supported only in this upgraded protocol version. -For TCP protocol: -- Lacks automatic protocol detection capability -- Defaults to text-based format (protocol_version=1) -- Mandatory configuration: - Set `protocol_version=2` when: - a) Connecting to servers built after `8.4.0` - b) Requiring array-type data writes +You can select the protocol version with the `protocol_version` setting in the +configuration string. -Here is a configuration string with `protocol_version=2` for `TCP`: +HTTP transport automatically negotiates the protocol version by default. In order +to avoid the slight latency cost at connection time, you can explicitly configure +the protocol version by setting `protocol_version=2|1;`. -``` +TCP transport does not negotiate the protocol version and uses version 1 by +default. You must explicitly set `protocol_version=2;` in order to ingest +arrays, as in this example: + +```text tcp::addr=localhost:9000;protocol_version=2; ``` From 0a9f407143f24114bfb30b79de1af51aa609c831 Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Mon, 2 Jun 2025 11:23:56 +0200 Subject: [PATCH 42/51] Improve Rust ILP page --- documentation/clients/ingest-rust.md | 28 +++++++++++++++++++--------- 1 file changed, 19 insertions(+), 9 deletions(-) diff --git a/documentation/clients/ingest-rust.md b/documentation/clients/ingest-rust.md index d494c7cf..8315b679 100644 --- a/documentation/clients/ingest-rust.md +++ b/documentation/clients/ingest-rust.md @@ -40,7 +40,7 @@ the command line: cargo add questdb-rs ``` -## Authentication +## Authenticate This is how you authenticate using the HTTP Basic authentication: @@ -66,9 +66,9 @@ let mut sender = Sender::from_env()?; When using QuestDB Enterprise, you can authenticate via a REST token. 
Please check the [RBAC docs](/docs/operations/rbac/#authentication) for more info. -## Basic insert +## Insert data -Basic insertion (no-auth): +This snippet connects to QuestDB and inserts one row of data: ```rust use questdb::{ @@ -145,13 +145,13 @@ fn main() -> Result<()> { :::warning -Avoid using `at_now()` instead of `at(some_timestamp)` because this removes the -ability to deduplicate rows, which is +Avoid using `at_now()` instead of `at(some_timestamp)`. This removes the ability +to deduplicate rows, which is [important for exactly-once processing](/docs/reference/api/ilp/overview/#exactly-once-delivery-vs-at-least-once-delivery). ::: -## Ingesting arrays +## Ingest arrays The `Sender::column_arr` method supports efficient ingestion of N-dimensional arrays using several convenient types: @@ -160,7 +160,17 @@ arrays using several convenient types: - native Rust vectors (up to 3-dimensional) - arrays from the [ndarray](https://docs.rs/ndarray) crate -### 1. Record a 1D Array (Built-in) +:::note + +You must use protocol version 2 to ingest arrays. HTTP transport will +automatically enable it as long as you're connecting to an up-to-date QuestDB +server (version 8.4.0 or later), but with TCP you must explicitly specify it in +the configuration string: `protocol_version=2;` See [below](#protocol-version) +for more details on protocol versions. + +::: + +### 1. Ingest a 1D Rust array ```rust use questdb::{Result, ingress::{Buffer, SenderBuilder}}; @@ -177,7 +187,7 @@ fn main() -> Result<()> { } ``` -### 2. Record a 2D Vector +### 2. Ingest a 2D Rust vector ```rust use questdb::{Result, ingress::{Buffer, SenderBuilder}}; @@ -201,7 +211,7 @@ fn main() -> Result<()> { } ``` -### 3. Record an array from [ndarray](https://docs.rs/ndarray) +### 3. Ingest an array from [ndarray](https://docs.rs/ndarray) ```rust use questdb::{Result, ingress::{Buffer, SenderBuilder}}; From 6841ff070d309886de59e5149af350d8ed54708e Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Mon, 2 Jun 2025 17:27:05 +0200 Subject: [PATCH 43/51] Improve Java ILP client page --- documentation/clients/java_ilp.md | 165 ++++++++++++++++-------------- 1 file changed, 87 insertions(+), 78 deletions(-) diff --git a/documentation/clients/java_ilp.md b/documentation/clients/java_ilp.md index 64b471b3..a7e19096 100644 --- a/documentation/clients/java_ilp.md +++ b/documentation/clients/java_ilp.md @@ -58,7 +58,7 @@ Here is a list of known incompatible JDKs: ## Quick start -Add a QuestDB as a dependency in your project's build configuration file. +Add QuestDB as a dependency in your project's build configuration file. )} /> -The code below creates an instance of a client configured to use HTTP transport -to connect to a QuestDB server running on localhost on port 9000. It then sends -two rows, each containing one symbol and two floating-point values. The client +The code below creates a client instance configured to use HTTP transport to +connect to a QuestDB server running on localhost, port 9000. It then sends two +rows, each containing one symbol and two floating-point values. The client requests the server to assign a timestamp to each row based on the server's wall-clock time. -The configuration for the client is specified using a configuration string. This -string follows the format: +Configure the client using a configuration string. 
It follows this general +format: ```text ::=;=;...; ``` -The valid transport protocols are: +[Transport protocol](/docs/reference/api/ilp/overview/#transport-selection) +can be one of these: -- `http`: ILP/HTTP -- `https`: ILP/HTTP with TLS encryption -- `tcp`: ILP/TCP -- `tcps`: ILP/TCP with TLS encryption +- `http` — ILP/HTTP +- `https` — ILP/HTTP with TLS encryption +- `tcp` — ILP/TCP +- `tcps` — ILP/TCP with TLS encryption -A [transport protocol](/docs/reference/api/ilp/overview/#transport-selection) -and the key `addr=host:port` are required. The key `addr` defines the hostname -and port of the QuestDB server. If the port is not specified, it defaults to -9000 for HTTP(s) transports and 9009 for TCP(s) transports. For a complete list -of options, refer to the [Configuration Options](#configuration-options) +The key `addr` sets the hostname and port of the QuestDB server. Port defaults +to 9000 for HTTP(s) transports and 9009 for TCP(s) transports. + +The minimum configuration includes the transport and the address. For a complete +list of options, refer to the [Configuration Options](#configuration-options) section. -## Example with TLS and Authentication enabled +## Authenticate and encrypt -This sample configures a client to use HTTP transport with TLS enabled for a +This sample configures the client to use HTTP transport with TLS enabled for a connection to a QuestDB server. It also instructs the client to authenticate using HTTP Basic Authentication. -When using QuestDB Enterprise, authentication can also be done via REST token. -Please check the [RBAC docs](/docs/operations/rbac/#authentication) for more -info. +When using QuestDB Enterprise, you can authenticate using a REST bearer token as +well. Please check the [RBAC docs](/docs/operations/rbac/#authentication) for +more info. -## Client instantiation +## Ways to create the client There are three ways to create a client instance: @@ -191,12 +192,12 @@ There are three ways to create a client instance: 7. Go to the step no. 2 to start a new row. 8. Use `close()` to dispose the Sender after you no longer need it. -## Ingesting arrays +## Ingest arrays To ingest a 1D or 2D array, simply construct a Java array of the appropriate type (`double[]`, `double[][]`) and supply it to the `arrayColumn()` method. In -order to avoid GC overheads, you are highly encouraged create the array instance -once, and then populate it with the data of each row. +order to avoid GC overheads, create the array instance once, and then populate +it with the data of each row. For arrays of higher dimensionality, use the `DoubleArray` class. Here's a basic example for a 3D array: @@ -216,19 +217,23 @@ try (Sender sender = Sender.fromConfig("http::addr=localhost:9000;"); } ``` -## Flushing +The `ary.append(value)` method allows you to populate the array in the row-major +order, without having to compute every coordinate individually. You can also use +`ary.set(value, coords...)` to set a value at specific coordinates. -Client accumulates data into an internal buffer. Flushing the buffer sends the -data to the server over the network and clears the buffer. +## Flush the buffer -Flushing can be done explicitly or automatically. +The client accumulates the data into an internal buffer and doesn't immediately +send it to the server. It can flush the buffer to the server either +automatically or on explicit request. -### Explicit flushing +### Flush explicitly -An explicit flush can be done by calling the `flush()` method. 
+You can configure the client to not use automatic flushing, and issue explicit +flush requests by calling `sender.flush()`: ```java - try (Sender sender = Sender.fromConfig("http::addr=localhost:9000;")) { + try (Sender sender = Sender.fromConfig("http::addr=localhost:9000;auto_flush=off")) { sender.table("trades") .symbol("symbol", "ETH-USD") .symbol("side", "sell") @@ -245,64 +250,67 @@ An explicit flush can be done by calling the `flush()` method. } ``` -### Automatic flushing +:::note -To avoid accumulating very large buffers, the client will - by default - flush -the buffer automatically. +Calling `sender.flush()` will flush the buffer even with auto-flushing enabled, +but this isn't a typical way to use the client. -HTTP auto-flushing is triggered when appending a row to the internal buffer and -the buffer either: +::: -- Reaches 75,000 rows -- Hasn't been flushed for 1 second. +### Flush automatically -Both parameters control batching and can be customized. Larger batches can -improve throughput, but can increase lag between data ingestion and visibility -in a target table. Smaller batches can reduce this lag, but can also reduce -throughput. +By default, the client automatically flushes the buffer according to a simple +policy. With HTTP, it will automatically flush at the time you append a new +row, if either of these has become true: -A configuration string example that auto-flushes every 10 rows or every 10 -seconds, whichever comes first: +- reached 75,000 rows +- hasn't been flushed for 1 second -`http::addr=localhost:9000;auto_flush_rows=10;auto_flush_interval=10000;` +Both parameters can be customized in order to achieve a good tradeoff between +throughput (large batches) and latency (small batches). -An example with auto-flushing disabled: +This configuration string will cause the client to auto-flush every 10 rows or +every 10 seconds, whichever comes first: -`http::addr=localhost:9000;auto_flush=off;` +`http::addr=localhost:9000;auto_flush_rows=10;auto_flush_interval=10000;` -TCP auto-flushing is triggered when appending a row to the internal sender -buffer and the buffer is full. +With TCP, the client flushes its internal buffer whenever it gets full. -Auto-flushing is also triggered when the client is being closed. Be aware that -retrying of failed requests is disabled when flushing on close. +The client will also flush automatically when it is being closed and there's +still some data in the buffer. However, **if the network operation fails at this +time, the client won't retry it.** Always explicitly flush the buffer before +closing the client. ## Error handling -The HTTP transport supports automatic retries for failed requests deemed -recoverable. Recoverable errors include network errors, some server errors, and -timeouts, while non-recoverable errors encompass invalid data, authentication -errors, and other client-side errors. +HTTP automatically retries failed, recoverable requests: network errors, some +server errors, and timeouts. Non-recoverable errors include invalid data, +authentication errors, and other client-side errors. -Retrying is particularly beneficial during network issues or when the server is -temporarily unavailable. The retrying behavior can be configured through the +Retrying is especially useful during transient network issues or when the server +goes offline for a short period. Configure the retrying behavior through the `retry_timeout` configuration option or via the builder API with -`retryTimeoutMillis(long timeoutMillis)`. 
The client continues to retry -recoverable errors until they either succeed or the specified timeout is -reached. Upon reaching the timeout, the client ceases retry attempts and throws -`LineSenderException`. +`retryTimeoutMillis(long timeoutMillis)`. The client continues to retry after +recoverable errors until it either succeeds or the specified timeout expires. If +it hits the timeout without success, the client throws a `LineSenderException`. + +The client won't retry requests while it's being closed and attempting to flush +the data left over in the buffer. + + The TCP transport has no mechanism to notify the client it encountered an + error; instead it just disconnects. When the client detects this, it throws a + `LineSenderException` and becomes unusable. -When utilizing the HTTP transport, the client can be reused after receiving an -error. Conversely, a client using TCP transport should be discarded after an -error, necessitating the creation of a new client. +## Recover after a client-side error -Retrying is disabled for failed requests when executing a flush upon closure. +With HTTP transport, the client always prepares a full row in RAM before trying +to send it. It also remains usable after an exception has occurred. This allows +you to cancel sending a row, for example due to a validation error, and go on +with the next row. -The TCP transport lacks support for error propagation from the server. In such -cases, the server merely closes the connection upon encountering an error, which -manifests as a `LineSenderException` on the client side. Consequently, the -client receives no additional error information from the server. This limitation -significantly contributes to the preference for HTTP transport over TCP -transport. +With TCP transport, you don't have this option. If you get an exception, you +can't continue with the same client instance, and don't have insight into which +rows were accepted by the server. ## Designated timestamp considerations @@ -311,7 +319,7 @@ important when ingesting data into QuestDB. There are two ways to assign a designated timestamp to a row: -1. User-assigned timestamp: The client assigns a specific timestamp to the row. +1. User-assigned timestamp: the client assigns a specific timestamp to the row. ```java java.time.Instant timestamp = Instant.now(); // or any other timestamp @@ -331,8 +339,9 @@ There are two ways to assign a designated timestamp to a row: high-throughput scenarios where instantiating an `Instant` object for each row is not feasible due to performance considerations. -2. Server-assigned timestamp: The server automatically assigns a timestamp to - the row based on the server's wall-clock time. Example: +2. Server-assigned timestamp: the server automatically assigns a timestamp to + the row based on the server's wall-clock time at the time of ingesting the + row. Example: ```java sender.table("trades") @@ -344,13 +353,13 @@ There are two ways to assign a designated timestamp to a row: ``` We recommended to use User-assigned timestamps when ingesting data into QuestDB. -Using Server-assigned hinder the ability to deduplicate rows which is +Server-assigned timestamps prevent the ability to deduplicate rows, which is [important for exactly-once processing](/docs/reference/api/ilp/overview/#exactly-once-delivery-vs-at-least-once-delivery). :::note -QuestDB works best when rows are ingested in chronological order. This means -rows with older timestamps are ingested before rows with newer timestamps. 
+QuestDB works best when you send data in chronological order (sorted by
+timestamp).

:::

From 85c7a33f9ddf8aa7a9d0fb44943adb3827a0311d Mon Sep 17 00:00:00 2001
From: Marko Topolnik
Date: Mon, 2 Jun 2025 17:29:09 +0200
Subject: [PATCH 44/51] Fix typo in C client page

---
 documentation/clients/ingest-c-and-cpp.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/documentation/clients/ingest-c-and-cpp.md b/documentation/clients/ingest-c-and-cpp.md
index 8b47b321..0f64616b 100644
--- a/documentation/clients/ingest-c-and-cpp.md
+++ b/documentation/clients/ingest-c-and-cpp.md
@@ -11,7 +11,7 @@ import { ILPClientsTable } from "@theme/ILPClientsTable"
QuestDB supports the C & C++ programming languages, providing a
high-performance ingestion client tailored for insert-only operations. This
integration ensures peak efficiency in time series data ingestion and analysis,
perfectly suited for
-systems for systems which require top performance and minimal latency.
+systems which require top performance and minimal latency.

Key features of the QuestDB C & C++ client include:

@@ -253,7 +253,7 @@

This sectioni s for the QuestDB C client.

Skip to the bottom of this page for information relating to both the C and C++
clients.

-::: 
+:::

From cc7118bc68cda94afbe6f39a7bf85215381e0f4e Mon Sep 17 00:00:00 2001
From: Marko Topolnik
Date: Tue, 3 Jun 2025 14:20:15 +0200
Subject: [PATCH 45/51] Improve C/C++ ILP docs

---
 documentation/clients/ingest-c-and-cpp.md | 81 ++++++++++++++---------
 1 file changed, 49 insertions(+), 32 deletions(-)

diff --git a/documentation/clients/ingest-c-and-cpp.md b/documentation/clients/ingest-c-and-cpp.md
index e8e07664..ed0d53fd 100644
--- a/documentation/clients/ingest-c-and-cpp.md
+++ b/documentation/clients/ingest-c-and-cpp.md
@@ -105,7 +105,7 @@ int main()
        .symbol("side","sell")
        .column("price", 2615.54)
        .column("amount", 0.00044)
-        .at(questdb::ingress::timestamp_nanos::now());
+        .at_now();

    // To insert more records, call `buffer.table(..)...` again.

@@ -120,10 +120,11 @@ These are the main steps it takes:
- Populate a `Buffer` with one or more rows of data
- Send the buffer using `sender.flush()`(`Sender::flush`)

-In this case, the designated timestamp will be the one at execution time.
+In this case, we call `at_now()`, letting the server assign the timestamp to the
+row.

-Let's see now an example with timestamps, custom timeout, basic auth, and error
-control.
+Let's see now an example with explicit timestamps, custom timeout, basic auth,
+and error control.

```cpp
#include <questdb/ingress/line_sender.hpp>
@@ -183,14 +184,20 @@ int main()
}
```

-As you can see, both events now are using the same timestamp. We recommended
-using the original event timestamps when ingesting data into QuestDB. Using the
-current timestamp will hinder the ability to deduplicate rows which is
+Now, both events use the same timestamp. We recommend using the event's
+original timestamp when ingesting data into QuestDB. Using ingestion-time
+timestamps precludes the ability to deduplicate rows, which is
 [important for exactly-once processing](/docs/reference/api/ilp/overview/#exactly-once-delivery-vs-at-least-once-delivery).

### Array Insertion

-Currently, the C++ interface supports `std::array` for data, while the C interface offers a lower-level and more flexible option:
+The sender uses an `std::array` to insert an array of any dimensionality. It
+contains the elements laid out flat in row-major order, while the separate
+vectors `shape` and `strides` describe its higher-dimensional structure.
Please +refer to the [Concepts section on n-dimensional arrays](/docs/concept/array), +where this is explained in more detail. + +In this example, we insert a 3D array of `double` values: ```cpp #include @@ -211,7 +218,7 @@ int main() const auto book_col = "order_book"_cn; size_t rank = 3; std::vector shape{2, 3, 2}; - std::vector strides{48, 16, 8}; + std::vector strides{6, 2, 1}; std::array arr_data = { 48123.5, 2.4, @@ -229,7 +236,7 @@ int main() questdb::ingress::line_sender_buffer buffer = sender.new_buffer(); buffer.table(table_name) .symbol(symbol_col, "BTC-USD"_utf8) - .column(book_col, 3, shape, strides, arr_data) + .column(book_col, 3, shape, strides, arr_data) .at(questdb::ingress::timestamp_nanos::now()); sender.flush(buffer); return true; @@ -242,7 +249,9 @@ int main() } ``` -If your strides match the element size, call `column(book_col, 3, shape, strides, arr_data)` (note the false generic parameter). +In the example, we provide the strides in terms of the number of elements. You +can also provide them in terms of bytes, by using `` for the template +argument, like this: `column(book_col, 3, shape, strides, arr_data)`. ## C @@ -337,7 +346,7 @@ int main() { if (!line_sender_buffer_symbol(buffer, QDB_COLUMN_NAME_LITERAL("side"), QDB_UTF8_LITERAL("sell"), &error)) goto error; if (!line_sender_buffer_column_f64(buffer, QDB_COLUMN_NAME_LITERAL("price"), 2615.54, &error)) goto error; if (!line_sender_buffer_column_f64(buffer, QDB_COLUMN_NAME_LITERAL("amount"), 0.00044, &error)) goto error; - if (!line_sender_buffer_at_nanos(buffer, line_sender_now_nanos(), &error)) goto error; + if (!line_sender_buffer_at_now(buffer, &error)) goto error; // Flush the buffer to QuestDB @@ -374,7 +383,8 @@ error: ``` -In this case, the designated timestamp will be the one at execution time. +In this case, we call `line_sender_buffer_at_now()`, letting the server assign +the timestamp to the row. Let's see now an example with timestamps, custom timeout, basic auth, error control, and transactional awareness. @@ -475,13 +485,22 @@ error: ``` -As you can see, both events use the same timestamp. We recommended using the -original event timestamps when ingesting data into QuestDB. Using the current -timestamp hinder the ability to deduplicate rows which is +Now, both events use the same timestamp. We recommend using the event's +original timestamp when ingesting data into QuestDB. Using ingestion-time +timestamps precludes the ability to deduplicate rows, which is [important for exactly-once processing](/docs/reference/api/ilp/overview/#exactly-once-delivery-vs-at-least-once-delivery). ### Array Insertion +The sender uses a plain 1-dimensional C array to insert an array of any +dimensionality. It contains the elements laid out flat in row-major order, while +the separate arrays `shape` and `strides` describe its higher-dimensional +structure. Please refer to the +[Concepts section on n-dimensional arrays](/docs/concept/array), where this is +explained in more detail. 
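As a quick sketch of the stride arithmetic for the 2x3x2 `double` array used in
the example below (variable names here are just for illustration, and we assume
8-byte doubles):

```c
#include <stdint.h>

// Shape of the array: 2 x 3 x 2, elements stored flat in row-major order.
uintptr_t shape[] = {2, 3, 2};

// Strides in elements: one step along a dimension skips everything stored
// under the faster-varying dimensions to its right.
intptr_t elem_strides[] = {3 * 2, 2, 1};            // {6, 2, 1}

// The same distances expressed in bytes (sizeof(double) == 8).
intptr_t byte_strides[] = {6 * 8, 2 * 8, 1 * 8};    // {48, 16, 8}
```

Both forms carry the same information, and the sender accepts either, as
described at the end of this section.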
+ +In this example, we insert a 3D array of `double` values: + ```c int main() { @@ -523,7 +542,7 @@ int main() size_t array_rank = 3; uintptr_t array_shape[] = {2, 3, 2}; - intptr_t array_strides[] = {48, 16, 8}; + intptr_t array_strides[] = {6, 2, 1}; double array_data[] = { 48123.5, @@ -539,7 +558,7 @@ int main() 48121.5, 4.3}; - if (!line_sender_buffer_column_f64_arr_byte_strides( + if (!line_sender_buffer_column_f64_arr_elem_strides( buffer, book_col, array_rank, @@ -571,7 +590,8 @@ on_error:; } ``` -If your strides match the element size, call `line_sender_buffer_column_f64_arr_elem_strides`. +If you want to provide strides in terms of bytes, call +`line_sender_buffer_column_f64_arr_byte_strides` instead. ## Other Considerations for both C and C++ @@ -599,34 +619,31 @@ won't get access to the data in the buffer until you explicitly call `sender.flush` or `line_sender_flush`. This may lead to a pitfall where you drop a buffer that still has some data in it, resulting in permanent data loss. -Unlike other official QuestDB clients, the Rust client does not supports -auto-flushing via configuration. - A common technique is to flush periodically on a timer and/or once the buffer exceeds a certain size. You can check the buffer's size by calling -`buffer.size()` or `line_sender_buffer_size(..)`. +`buffer.size()` or `line_sender_buffer_size(...)`. The default `flush()` method clears the buffer after sending its data. If you want to preserve its contents (for example, to send the same data to multiple -QuestDB instances), call `sender.flush_and_keep(&mut buffer)` instead. +QuestDB instances), call `sender.flush_and_keep(&buffer)` or +`line_sender_flush_and_keep(...)` instead. ### Transactional flush -As described in the +As described in [ILP overview](/docs/reference/api/ilp/overview#http-transaction-semantics), the HTTP transport has some support for transactions. To ensure in advance that a flush will not affect more than one table, call -`buffer.transactional()` or `line_sender_buffer_transactional(buffer)` as we -demonstrated on the examples in this document. - -This call will return false if the flush wouldn't be data-transactional. +`buffer.transactional()` or `line_sender_buffer_transactional(buffer)`, as shown +in the examples above. This call will return false if the flush wouldn't be +data-transactional. ### Protocol Version To enhance data ingestion performance, QuestDB introduced an upgrade to the -text-based InfluxDB Line Protocol which encodes arrays and f64 values in binary -form. Arrays are supported only in this upgraded protocol version. +text-based InfluxDB Line Protocol which encodes arrays and `double` values in +binary form. Arrays are supported only in this upgraded protocol version. You can select the protocol version with the `protocol_version` setting in the configuration string. @@ -649,7 +666,7 @@ Please refer to the [ILP overview](/docs/reference/api/ilp/overview) for details about transactions, error control, delivery guarantees, health check, or table and column auto-creation. -With data flowing into QuestDB, now it's time to for analysis. +With data flowing into QuestDB, now it's time for analysis. To learn _The Way_ of QuestDB SQL, see the [Query & SQL Overview](/docs/reference/sql/overview/). 
From 1195476e01b280937578cb4ec059ae91bbac4ace Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Tue, 3 Jun 2025 14:20:25 +0200 Subject: [PATCH 46/51] Improve Rust ILP docs --- documentation/clients/ingest-rust.md | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/documentation/clients/ingest-rust.md b/documentation/clients/ingest-rust.md index 8315b679..4681f30c 100644 --- a/documentation/clients/ingest-rust.md +++ b/documentation/clients/ingest-rust.md @@ -257,9 +257,6 @@ won't get access to the data in the buffer until you explicitly call `sender.flush(&mut buffer)` or a variant. This may lead to a pitfall where you drop a buffer that still has some data in it, resulting in permanent data loss. -Unlike other official QuestDB clients, the rust client does not supports -auto-flushing via configuration. - A common technique is to flush periodically on a timer and/or once the buffer exceeds a certain size. You can check the buffer's size by calling `buffer.len()`. @@ -270,7 +267,7 @@ QuestDB instances), call `sender.flush_and_keep(&mut buffer)` instead. ## Transactional flush -As described at the +As described in [ILP overview](/docs/reference/api/ilp/overview#http-transaction-semantics), the HTTP transport has some support for transactions. @@ -311,8 +308,8 @@ For more details about the HTTP and TCP transports, please refer to the ## Protocol Version To enhance data ingestion performance, QuestDB introduced an upgrade to the -text-based InfluxDB Line Protocol which encodes arrays and f64 values in binary -form. Arrays are supported only in this upgraded protocol version. +text-based InfluxDB Line Protocol which encodes arrays and `f64` values in +binary form. Arrays are supported only in this upgraded protocol version. You can select the protocol version with the `protocol_version` setting in the configuration string. @@ -362,7 +359,7 @@ and column auto-creation. Explore the full capabilities of the Rust client via the [Crate API page](https://docs.rs/questdb-rs/latest/questdb/). -With data flowing into QuestDB, now it's time to for analysis. +With data flowing into QuestDB, now it's time for analysis. To learn _The Way_ of QuestDB SQL, see the [Query & SQL Overview](/docs/reference/sql/overview/). From d937811493d35c3797e91a5417d71f708fb18d27 Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Tue, 3 Jun 2025 14:39:28 +0200 Subject: [PATCH 47/51] Use strides in bytes as primary example --- documentation/clients/ingest-c-and-cpp.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/documentation/clients/ingest-c-and-cpp.md b/documentation/clients/ingest-c-and-cpp.md index ed0d53fd..11da7051 100644 --- a/documentation/clients/ingest-c-and-cpp.md +++ b/documentation/clients/ingest-c-and-cpp.md @@ -218,7 +218,7 @@ int main() const auto book_col = "order_book"_cn; size_t rank = 3; std::vector shape{2, 3, 2}; - std::vector strides{6, 2, 1}; + std::vector strides{48, 16, 8}; std::array arr_data = { 48123.5, 2.4, @@ -236,7 +236,7 @@ int main() questdb::ingress::line_sender_buffer buffer = sender.new_buffer(); buffer.table(table_name) .symbol(symbol_col, "BTC-USD"_utf8) - .column(book_col, 3, shape, strides, arr_data) + .column(book_col, 3, shape, strides, arr_data) .at(questdb::ingress::timestamp_nanos::now()); sender.flush(buffer); return true; @@ -249,9 +249,9 @@ int main() } ``` -In the example, we provide the strides in terms of the number of elements. 
You -can also provide them in terms of bytes, by using `` for the template -argument, like this: `column(book_col, 3, shape, strides, arr_data)`. +In the example, we provide the strides in terms of bytes. You can also provide +them in terms of elements, by using `` for the template argument, like +this: `column(book_col, 3, shape, strides, arr_data)`. ## C @@ -542,7 +542,7 @@ int main() size_t array_rank = 3; uintptr_t array_shape[] = {2, 3, 2}; - intptr_t array_strides[] = {6, 2, 1}; + intptr_t array_strides[] = {48, 16, 8}; double array_data[] = { 48123.5, @@ -558,7 +558,7 @@ int main() 48121.5, 4.3}; - if (!line_sender_buffer_column_f64_arr_elem_strides( + if (!line_sender_buffer_column_f64_arr_byte_strides( buffer, book_col, array_rank, @@ -590,8 +590,8 @@ on_error:; } ``` -If you want to provide strides in terms of bytes, call -`line_sender_buffer_column_f64_arr_byte_strides` instead. +If you want to provide strides in terms of elements instead of bytes, call +`line_sender_buffer_column_f64_arr_elem_strides` instead. ## Other Considerations for both C and C++ From 47c7cd9c867429355b97179bb5daec5af400c0bd Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Tue, 3 Jun 2025 14:41:50 +0200 Subject: [PATCH 48/51] Small touchup --- documentation/clients/ingest-c-and-cpp.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/documentation/clients/ingest-c-and-cpp.md b/documentation/clients/ingest-c-and-cpp.md index 11da7051..14d7aa30 100644 --- a/documentation/clients/ingest-c-and-cpp.md +++ b/documentation/clients/ingest-c-and-cpp.md @@ -249,9 +249,9 @@ int main() } ``` -In the example, we provide the strides in terms of bytes. You can also provide -them in terms of elements, by using `` for the template argument, like -this: `column(book_col, 3, shape, strides, arr_data)`. +Here we provided the strides in terms of bytes. You can also provide them in +terms of elements, by using `` for the template argument, like this: +`column(book_col, 3, shape, strides, arr_data)`. ## C From cf294df2abd42934ec516b4e66ec30a91c2f7a28 Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Wed, 4 Jun 2025 10:30:18 +0200 Subject: [PATCH 49/51] Touch up C/C++ docs --- documentation/clients/ingest-c-and-cpp.md | 14 +++++--------- 1 file changed, 5 insertions(+), 9 deletions(-) diff --git a/documentation/clients/ingest-c-and-cpp.md b/documentation/clients/ingest-c-and-cpp.md index 14d7aa30..bb157fae 100644 --- a/documentation/clients/ingest-c-and-cpp.md +++ b/documentation/clients/ingest-c-and-cpp.md @@ -65,7 +65,6 @@ Here is an example of how to configure and use the client for data ingestion: auto sender = questdb::ingress::line_sender::from_conf( "http::addr=localhost:9000;"); - ``` You can also pass the connection configuration via the `QDB_CLIENT_CONF` @@ -250,14 +249,14 @@ int main() ``` Here we provided the strides in terms of bytes. You can also provide them in -terms of elements, by using `` for the template argument, like this: -`column(book_col, 3, shape, strides, arr_data)`. +terms of whole elements, by using `` for the template argument, like +this: `column(book_col, 3, shape, strides, arr_data)`. ## C :::note -This sectioni s for the QuestDB C client. +This section is for the QuestDB C client. Skip to the bottom of this page for information relating to both the C and C++ clients. @@ -304,7 +303,6 @@ Then you use it like this: #include ... 
line_sender *sender = line_sender_from_env(&error); - ``` ### Basic data insertion @@ -380,7 +378,6 @@ error: return 1; } } - ``` In this case, we call `line_sender_buffer_at_now()`, letting the server assign @@ -482,7 +479,6 @@ error: return 1; } } - ``` Now, both events use the same timestamp. We recommend using the event's @@ -590,8 +586,8 @@ on_error:; } ``` -If you want to provide strides in terms of elements instead of bytes, call -`line_sender_buffer_column_f64_arr_elem_strides` instead. +You can also specify strides in terms of whole elements instead of bytes, by +calling `line_sender_buffer_column_f64_arr_elem_strides` instead. ## Other Considerations for both C and C++ From 1a455d8562daa4d806c0ad5130a7c5b0ce11af41 Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Wed, 4 Jun 2025 11:11:46 +0200 Subject: [PATCH 50/51] Improve the .Net page --- documentation/clients/ingest-dotnet.md | 194 +++++++++++++------------ 1 file changed, 98 insertions(+), 96 deletions(-) diff --git a/documentation/clients/ingest-dotnet.md b/documentation/clients/ingest-dotnet.md index 287078a9..68429920 100644 --- a/documentation/clients/ingest-dotnet.md +++ b/documentation/clients/ingest-dotnet.md @@ -29,13 +29,13 @@ perform basic insert operations. :::info -This page focuses on our high-performance ingestion client, which is optimized for **writing** data to QuestDB. -For retrieving data, we recommend using a [PostgreSQL-compatible .NET library](/docs/pgwire/c-sharp/) or our +This page focuses on our high-performance ingestion client, which is optimized +for **writing** data to QuestDB. For retrieving data, we recommend using a +[PostgreSQL-compatible .NET library](/docs/pgwire/c-sharp/) or our [HTTP query endpoint](/docs/reference/sql/overview/#rest-http-api). ::: - ## Requirements - .NET 6.0 or higher is required. @@ -59,7 +59,7 @@ dotnet add package net-questdb-client `Sender` is single-threaded, and uses a single connection to the database. If you want to send in parallel, you can use multiple senders and standard async -tasking. +tasks. ::: @@ -127,8 +127,8 @@ await sender.Table("trades") await sender.SendAsync(); ``` -In this case, the designated timestamp will be the one at execution time. Let's -see now an example with timestamps, custom auto-flushing, basic auth, and error +In this case, we asked the server to assign the timestamp to each row. Let's see +now an example with timestamps, custom auto-flushing, basic auth, and error reporting. ```csharp @@ -140,7 +140,9 @@ class Program { static async Task Main(string[] args) { - using var sender = Sender.New("http::addr=localhost:9000;username=admin;password=quest;auto_flush_rows=100;auto_flush_interval=1000;"); + using var sender = Sender.New( + "http::addr=localhost:9000;username=admin;password=quest;auto_flush_rows=100;auto_flush_interval=1000;" + ); var now = DateTime.UtcNow; try @@ -171,73 +173,72 @@ class Program } ``` -As you can see, both events use the same timestamp. We recommended using the -original event timestamps when ingesting data into QuestDB. Using the current -timestamp hinder the ability to deduplicate rows which is +Now, both events use the same timestamp. We recommend using the event's +original timestamp when ingesting data into QuestDB. Using ingestion-time +timestamps precludes the ability to deduplicate rows, which is [important for exactly-once processing](/docs/reference/api/ilp/overview/#exactly-once-delivery-vs-at-least-once-delivery). 
-## Configuration +## Ways to create the client -Construct new Senders via the `Sender` factory. +There are three ways to create a client instance: -It is mandatory to provide the `addr` config, as this defines the transport -protocol and the server location. +1. **From a configuration string.** This is the most common way to create a + client instance. It describes the entire client configuration in a single + string. See [Configuration options](#configuration-options) for all available + options. It allows sharing the same configuration across clients in different + languages. -By default, the HTTP protocol uses `9000`, the same as the other HTTP endpoints. -Optionally, TCP uses `9009`. + ```csharp + using var sender = Sender.New("http::addr=localhost:9000;"); + ``` -### With a configuration string +2. **From an environment variable.** The `QDB_CLIENT_CONF` environment variable + is used to set the configuration string. Moving configuration parameters to + an environment variable allows you to avoid hard-coding sensitive information + such as tokens and password in your code. -It is recommended, where possible, to initialise the sender using a -[configuration string](/docs/configuration-string/). + If you want to initialise some properties programmatically after the initial + config string, you can use `Configure` and `Build`. -Configuration strings provide a convenient shorthand for defining client -properties, and are validated during construction of the `Sender`. + ```bash + export QDB_CLIENT_CONF="http::addr=localhost:9000;auto_flush_rows=5000;retry_timeout=10000;" + ``` -```csharp -using var sender = Sender.New("http::addr=localhost:9000;"); -``` + ```csharp + (Sender.Configure("http::addr=localhost:9000;") with { auto_flush = AutoFlushType.off }).Build() + ``` -If you want to initialise some properties programmatically after the initial -config string, you can use `Configure` and `Build`. +3. **From SenderOptions.** -```csharp -(Sender.Configure("http::addr=localhost:9000;") with { auto_flush = AutoFlushType.off }).Build() -``` - -### From options - -The sender API also supports construction from `SenderOptions`. - -```csharp -await using var sender = Sender.New(new SenderOptions()); -``` + ```csharp + await using var sender = Sender.New(new SenderOptions()); + ``` -You might use this when binding options from configuration: + This way you can bind options from configuration: -```json -{ - "QuestDB": { - "addr": "localhost:9000", - "tls_verify": "unsafe_off;" - } -} -``` + ```json + { + "QuestDB": { + "addr": "localhost:9000", + "tls_verify": "unsafe_off;" + } + } + ``` -```csharp -var options = new ConfigurationBuilder() - .AddJsonFile("config.json") - .Build() - .GetSection("QuestDB") - .Get(); -``` + ```csharp + var options = new ConfigurationBuilder() + .AddJsonFile("config.json") + .Build() + .GetSection("QuestDB") + .Get(); + ``` ## Preparing Data -Senders use an internal buffer to convert input values into an ILP-compatible -UTF-8 byte-string. +The Sender uses an internal buffer to convert input values into an +ILP-compatible UTF-8 byte-string. -This buffer can be controlled using the `init_buf_size` and `max_buf_size` +You can control buffer sizing with the `init_buf_size` and `max_buf_size` parameters. Here is how to build a buffer of rows ready to be sent to QuestDB. @@ -245,7 +246,7 @@ Here is how to build a buffer of rows ready to be sent to QuestDB. :::warning The senders are **not** thread safe, since they manage an internal buffer. 
If -you wish to send data in parallel, you should construct multiple senders and use +you wish to send data in parallel, construct multiple senders and use non-blocking I/O to submit to QuestDB. ::: @@ -281,25 +282,25 @@ The table name must always be called before other builder functions. ### Add symbols A [symbol](/docs/concept/symbol/) is a dictionary-encoded string, used to -efficiently store commonly repeated data. This is frequently used for -identifiers, and symbol columns can have -[secondary indexes](/docs/concept/indexes/) defined upon them. +efficiently store commonly repeated data. We recommend using this type for +identifiers, because you can create a +[secondary index](/docs/concept/indexes/) for a symbol column. -Symbols can be added using calls to `Symbol`, which expects a symbol column -name, and string value. +Add symbols by calling `Symbol()`, which expects a symbol column +name, and a string value. ```csharp sender.Symbol("foo", "bah"); ``` -All symbol columns must be defined before any other column definition. +You must specify all symbol columns first, before any other columns. ### Add other columns -A number of data types can be submitted to QuestDB via ILP, including string / +There are several data types you can send to QuestDB via ILP, including string / long / double / DateTime / DateTimeOffset. -These can be written using the `Column` functions. +Provide these by calling `Column()`. ```csharp sender.Column("baz", 102); @@ -307,18 +308,20 @@ sender.Column("baz", 102); ### Finish the row -A row is completed by defining the designated timestamp value: +Completed a row by specifying the designated timestamp: ```csharp sender.At(DateTime.UtcNow); ``` -Generation of the timestamp can be offloaded to the server, using `AtNow`. +You can also let the server assign the timestamp, by calling `AtNow()` instead. :::caution -Using a server generated timestamp via AtNow/AtNowAsync is not compatible with -QuestDB's deduplication feature, and should be avoided where possible. +We recommend using the event's original timestamp when ingesting data into +QuestDB. Using ingestion-time timestamps precludes the ability to deduplicate +rows, which is +[important for exactly-once processing](/docs/reference/api/ilp/overview/#exactly-once-delivery-vs-at-least-once-delivery). ::: @@ -329,16 +332,15 @@ database automatically, or manually. ### Auto-flushing -When the `At` functions are called, the auto-flushing parameters are checked to -see if it is appropriate to flush the buffer. If an auto-flush is triggered, -data will be sent to QuestDB. +When you call one of the `At` functions, the row is complete. The sender checks +the auto-flushing parameters to see if it should flush the buffer to the server. ```csharp sender.At(new DateTime(0,0,1)); ``` -To avoid blocking the calling thread, one can use the Async overloads of the -`At`. functions e.g `AtAsync`. +To avoid blocking the calling thread, use the Async overloads of the `At`, such +as `AtAsync`. ```csharp await sender.AtNowAsync(); @@ -352,54 +354,52 @@ using var sender = Sender.New("http::addr=localhost:9000;auto_flush=off;"); // o #### Flush by rows -Users can specify a threshold of rows to flush. This is effectively a submission -batch size by number of rows. +You can specify the number of rows that will trigger an auto-flush, creating a +batch insert operation of that size. 
```csharp using var sender = Sender.New("http::addr=localhost:9000;auto_flush=on;auto_flush_rows=5000;"); ``` -By default, HTTP senders will send after `75,000` rows, and TCP after `600` +By default, the HTTP sender auto-flushes after 75,000 rows, and TCP after 600 rows. :::tip `auto_flush_rows` and `auto_flush_interval` are both enabled by default. If you -wish to only auto-flush based on one of these properties, you can disable the -other using `off` or `-1`. +wish to only auto-flush based on one of these properties, disable the other +using `off` or `-1`. ::: #### Flush by interval -Specify a time interval between batches. This is the elapsed time from the last -flush, and is checked when the `At` functions are called. +You can specify the time interval between auto-flushes. The sender checks it +every time you call an `At` function. ```csharp using var sender = Sender.New("http::addr=localhost:9000;auto_flush=on;auto_flush_interval=5000;"); ``` -By default, `auto_flush_interval` is set to `1000` ms. +By default, `auto_flush_interval` is 1000 ms. #### Flush by bytes -Specify a buffer length after which to flush, effectively a batch size in UTF-8 -bytes. This should be set according to `init_buf_size` $\lt$ `auto_flush_bytes` -$\leq$ `max_buf_size`. +As an additional option, disabled by default, you can specify the batch size in +terms of bytes instead of rows. You should ensure that `init_buf_size` +$\lt$ `auto_flush_bytes` $\leq$ `max_buf_size`. -This can be useful if a user has variety in their row sizes and wants to limit -the request sizes. +This can be useful if you have large variation in row sizes and want to limit +the request sizes. By default, this is disabled, but set to `100 KiB`. ```csharp using var sender = Sender.New("http::addr=localhost:9000;auto_flush=on;auto_flush_bytes=65536;"); ``` -By default, this is disabled, but set to `100 KiB`. - ### Explicit flushing -Manually flush the buffer using `Send`. and `SendAsync`. This will send any -outstanding data to the QuestDB server. +You can also manually flush the buffer at any time by calling `Send` or +`SendAsync`. This will send any outstanding data to the QuestDB server. ```csharp using var sender = Sender.New("http::addr=localhost:9000;auto_flush=off;"); @@ -409,8 +409,10 @@ await sender.SendAsync(); // send non-blocking sender.Send(); // send synchronously ``` -It is recommended to always end your submission code with a manual flush. This -will ensure that all data has been sent before disposing of the Sender. +You should always perform an explicit flush before closing the sender. The HTTP +sender normally retries the requests in case of errors, but won't do that while +auto-flushing before closing. Flushing explicitly ensures that the client +applies the same effort to send all the remaining data. ## Transactions @@ -478,7 +480,8 @@ Add data to a transaction in the usual way, but without calling `Table` between rows. ```csharp -sender.Symbol("bah", "baz").Column("num", 123).At(DateTime.UtcNow); // adds a symbol, integer column, and ends with current timestamp +// add a symbol, integer column, and end with current timestamp +sender.Symbol("bah", "baz").Column("num", 123).At(DateTime.UtcNow); ``` ### Closing a transaction @@ -498,7 +501,7 @@ server. sender.Rollback(); ``` -## Misc. 
+## Misc ### Cancelling rows @@ -581,7 +584,6 @@ using var sender = Sender.New( "tcps::addr=localhost:9009;tls_verify=unsafe_off;username=admin;token=NgdiOWDoQNUP18WOnb1xkkEG5TzPYMda5SiUOvT1K0U=;"); // See: /docs/reference/api/ilp/authenticate - ``` ## Next Steps From ba8eb4b1f62df4f2e84b55ad3043b7f30f94cb2a Mon Sep 17 00:00:00 2001 From: Marko Topolnik Date: Wed, 4 Jun 2025 11:11:56 +0200 Subject: [PATCH 51/51] Touch up the Java page --- documentation/clients/java_ilp.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/documentation/clients/java_ilp.md b/documentation/clients/java_ilp.md index a7e19096..8d2ab353 100644 --- a/documentation/clients/java_ilp.md +++ b/documentation/clients/java_ilp.md @@ -86,9 +86,9 @@ Add QuestDB as a dependency in your project's build configuration file. The code below creates a client instance configured to use HTTP transport to connect to a QuestDB server running on localhost, port 9000. It then sends two -rows, each containing one symbol and two floating-point values. The client -requests the server to assign a timestamp to each row based on the server's -wall-clock time. +rows, each containing one symbol and two floating-point values. The client asks +the server to assign a timestamp to each row based on the server's wall-clock +time. @@ -108,7 +108,7 @@ can be one of these: - `tcps` — ILP/TCP with TLS encryption The key `addr` sets the hostname and port of the QuestDB server. Port defaults -to 9000 for HTTP(s) transports and 9009 for TCP(s) transports. +to 9000 for HTTP(S) and 9009 for TCP(S). The minimum configuration includes the transport and the address. For a complete list of options, refer to the [Configuration Options](#configuration-options) @@ -352,8 +352,9 @@ There are two ways to assign a designated timestamp to a row: .atNow(); ``` -We recommended to use User-assigned timestamps when ingesting data into QuestDB. -Server-assigned timestamps prevent the ability to deduplicate rows, which is +We recommend using the event's original timestamp when ingesting data into +QuestDB. Using ingestion-time timestamps precludes the ability to deduplicate +rows, which is [important for exactly-once processing](/docs/reference/api/ilp/overview/#exactly-once-delivery-vs-at-least-once-delivery). :::note