diff --git a/docs/user/ppl/cmd/ad.md b/docs/user/ppl/cmd/ad.md index 6d18396506..4fb4e78bd3 100644 --- a/docs/user/ppl/cmd/ad.md +++ b/docs/user/ppl/cmd/ad.md @@ -1,34 +1,38 @@ -# ad (deprecated by ml command) +# ad (deprecated by ml command) -## Description -The `ad` command applies Random Cut Forest (RCF) algorithm in the ml-commons plugin on the search result returned by a PPL command. Based on the input, the command uses two types of RCF algorithms: fixed-in-time RCF for processing time-series data, batch RCF for processing non-time-series data. -## Syntax +The `ad` command applies the Random Cut Forest (RCF) algorithm in the ml-commons plugin on the search results returned by a PPL command. Based on the input, the command uses two types of RCF algorithms: fixed-in-time RCF for processing time-series data and batch RCF for processing non-time-series data. -## Fixed In Time RCF For Time-series Data +## Syntax -ad [number_of_trees] [shingle_size] [sample_size] [output_after] [time_decay] [anomaly_rate] \ [date_format] [time_zone] [category_field] -* number_of_trees: optional. Number of trees in the forest. **Default:** 30. -* shingle_size: optional. A shingle is a consecutive sequence of the most recent records. **Default:** 8. -* sample_size: optional. The sample size used by stream samplers in this forest. **Default:** 256. -* output_after: optional. The number of points required by stream samplers before results are returned. **Default:** 32. -* time_decay: optional. The decay factor used by stream samplers in this forest. **Default:** 0.0001. -* anomaly_rate: optional. The anomaly rate. **Default:** 0.005. -* time_field: mandatory. Specifies the time field for RCF to use as time-series data. -* date_format: optional. Used for formatting time_field. **Default:** "yyyy-MM-dd HH:mm:ss". -* time_zone: optional. Used for setting time zone for time_field. **Default:** "UTC". -* category_field: optional. Specifies the category field used to group inputs.
Each category will be independently predicted. +The following sections describe the syntax for each RCF algorithm type. + +## Fixed in time RCF for time-series data + +`ad [number_of_trees] [shingle_size] [sample_size] [output_after] [time_decay] [anomaly_rate] <time_field> [date_format] [time_zone] [category_field]` +* `number_of_trees`: optional. Number of trees in the forest. **Default:** 30. +* `shingle_size`: optional. A shingle is a consecutive sequence of the most recent records. **Default:** 8. +* `sample_size`: optional. The sample size used by stream samplers in this forest. **Default:** 256. +* `output_after`: optional. The number of points required by stream samplers before results are returned. **Default:** 32. +* `time_decay`: optional. The decay factor used by stream samplers in this forest. **Default:** 0.0001. +* `anomaly_rate`: optional. The anomaly rate. **Default:** 0.005. +* `time_field`: mandatory. Specifies the time field for RCF to use as time-series data. +* `date_format`: optional. Used for formatting time_field. **Default:** "yyyy-MM-dd HH:mm:ss". +* `time_zone`: optional. Used for setting time zone for time_field. **Default:** "UTC". +* `category_field`: optional. Specifies the category field used to group inputs. Each category will be independently predicted. -## Batch RCF For Non-time-series Data -ad [number_of_trees] [sample_size] [output_after] [training_data_size] [anomaly_score_threshold] [category_field] -* number_of_trees: optional. Number of trees in the forest. **Default:** 30. -* sample_size: optional. Number of random samples given to each tree from the training data set. **Default:** 256. -* output_after: optional. The number of points required by stream samplers before results are returned. **Default:** 32. -* training_data_size: optional. **Default:** size of your training data set. -* anomaly_score_threshold: optional. The threshold of anomaly score. **Default:** 1.0. -* category_field: optional.
Specifies the category field used to group inputs. Each category will be independently predicted. +## Batch RCF for non-time-series data + +`ad [number_of_trees] [sample_size] [output_after] [training_data_size] [anomaly_score_threshold] [category_field]` +* `number_of_trees`: optional. Number of trees in the forest. **Default:** 30. +* `sample_size`: optional. Number of random samples given to each tree from the training dataset. **Default:** 256. +* `output_after`: optional. The number of points required by stream samplers before results are returned. **Default:** 32. +* `training_data_size`: optional. **Default:** size of your training dataset. +* `anomaly_score_threshold`: optional. The threshold of anomaly score. **Default:** 1.0. +* `category_field`: optional. Specifies the category field used to group inputs. Each category will be independently predicted. + ## Example 1: Detecting events in New York City from taxi ridership data with time-series data This example trains an RCF model and uses the model to detect anomalies in the time-series ridership data. @@ -51,6 +55,7 @@ fetched rows / total rows = 1/1 +---------+---------------------+-------+---------------+ ``` + ## Example 2: Detecting events in New York City from taxi ridership data with time-series data independently with each category This example trains an RCF model and uses the model to detect anomalies in the time-series ridership data with multiple category values. @@ -74,6 +79,7 @@ fetched rows / total rows = 2/2 +----------+---------+---------------------+-------+---------------+ ``` + ## Example 3: Detecting events in New York City from taxi ridership data with non-time-series data This example trains an RCF model and uses the model to detect anomalies in the non-time-series ridership data. 
@@ -96,6 +102,7 @@ fetched rows / total rows = 1/1 +---------+-------+-----------+ ``` + ## Example 4: Detecting events in New York City from taxi ridership data with non-time-series data independently with each category This example trains an RCF model and uses the model to detect anomalies in the non-time-series ridership data with multiple category values. @@ -119,6 +126,7 @@ fetched rows / total rows = 2/2 +----------+---------+-------+-----------+ ``` + ## Limitations The `ad` command can only work with `plugins.calcite.enabled=false`. \ No newline at end of file diff --git a/docs/user/ppl/cmd/addcoltotals.md b/docs/user/ppl/cmd/addcoltotals.md index bcc089859e..f807775eb5 100644 --- a/docs/user/ppl/cmd/addcoltotals.md +++ b/docs/user/ppl/cmd/addcoltotals.md @@ -1,11 +1,12 @@ -# AddColTotals +# addcoltotals -# Description -The `addcoltotals` command computes the sum of each column and add a summary event at the end to show the total of each column. This command works the same way `addtotals` command works with row=false and col=true option. This is useful for creating summary reports with subtotals or grand totals. The `addcoltotals` command only sums numeric fields (integers, floats, doubles). Non-numeric fields in the field list are ignored even if its specified in field-list or in the case of no field-list specified. +The `addcoltotals` command computes the sum of each column and adds a summary event at the end to show the total of each column. This command works the same way the `addtotals` command works with the row=false and col=true options. This is useful for creating summary reports with subtotals or grand totals. The `addcoltotals` command only sums numeric fields (integers, floats, doubles). Non-numeric fields are ignored even if they are specified in field-list or when no field-list is specified.
-# Syntax +## Syntax + +Use the following syntax: `addcoltotals [field-list] [label=<string>] [labelfield=<field>]` @@ -13,9 +14,9 @@ The `addcoltotals` command computes the sum of each column and add a summary eve - `labelfield=<field>`: Optional. Field name to place the label. If it specifies a non-existing field, adds the field and shows label at the summary event row at this field. - `label=<string>`: Optional. Custom text for the totals row labelfield\'s label. Default is \"Total\". -# Example 1: Basic Example +# Example 1: Basic example -The example shows placing the label in an existing field. +The following example PPL query shows how to use `addcoltotals` to place the label in an existing field. ```ppl source=accounts @@ -38,9 +39,9 @@ fetched rows / total rows = 4/4 +-----------+---------+ ``` -# Example 2: Adding column totals and adding a summary event with label specified. +# Example 2: Adding column totals and adding a summary event with label specified -The example shows adding totals after a stats command where final summary event label is \'Sum\' and row=true value was used by default when not specified. It also added new field specified by labelfield as it did not match existing field. +The following example PPL query shows how to use `addcoltotals` to add totals after a stats command where the final summary event label is \'Sum\' and the row=true value was used by default when not specified. It also adds a new field specified by labelfield because it does not match an existing field. ```ppl source=accounts @@ -63,7 +64,7 @@ fetched rows / total rows = 3/3 # Example 3: With all options -The example shows using addcoltotals with all options set. +The following example PPL query shows how to use `addcoltotals` with all options set.
```ppl source=accounts diff --git a/docs/user/ppl/cmd/addtotals.md b/docs/user/ppl/cmd/addtotals.md index 745b1ae750..bcf0733843 100644 --- a/docs/user/ppl/cmd/addtotals.md +++ b/docs/user/ppl/cmd/addtotals.md @@ -1,12 +1,13 @@ -# AddTotals +# addtotals -## Description -The `addtotals` command computes the sum of numeric fields and appends a row with the totals to the result. The command can also add row totals and add a field to store row totals. This is useful for creating summary reports with subtotals or grand totals. The `addtotals` command only sums numeric fields (integers, floats, doubles). Non-numeric fields in the field list are ignored even if it\'s specified in field-list or in the case of no field-list specified. +The `addtotals` command computes the sum of numeric fields and appends a row with the totals to the result. The command can also add row totals and add a field to store row totals. This is useful for creating summary reports with subtotals or grand totals. The `addtotals` command only sums numeric fields (integers, floats, doubles). Non-numeric fields are ignored even if they are specified in field-list or when no field-list is specified. ## Syntax +Use the following syntax: + `addtotals [field-list] [label=<string>] [labelfield=<field>] [row=<bool>] [col=<bool>] [fieldname=<field>]` - `field-list`: Optional. Comma-separated list of numeric fields to sum. If not specified, all numeric fields are summed. @@ -16,9 +17,9 @@ The `addtotals` command computes the sum of numeric fields and appends a row wit - `label=<string>`: Optional. Custom text for the totals row labelfield\'s label. Default is \"Total\". This is applicable when col=true. This does not have any effect when labelfield and fieldname parameter both have same value. - `fieldname=<field>`: Optional. Calculates total of each row and add a new field to store this total. This is applicable when row=true.
-## Example 1: Basic Example +## Example 1: Basic example -The example shows placing the label in an existing field. +The following example PPL query shows how to use `addtotals` to place the label in an existing field. ```ppl source=accounts @@ -41,9 +42,9 @@ fetched rows / total rows = 4/4 +-----------+---------+-------+ ``` -## Example 2: Adding column totals and adding a summary event with label specified. +## Example 2: Adding column totals and adding a summary event with label specified -The example shows adding totals after a stats command where final summary event label is \'Sum\'. It also added new field specified by labelfield as it did not match existing field. +The following example PPL query shows how to use `addtotals` to add totals after a stats command where the final summary event label is \'Sum\'. It also adds a new field specified by labelfield because it does not match an existing field. ```ppl source=accounts @@ -66,7 +67,7 @@ fetched rows / total rows = 5/5 +----------------+-----------+---------+-----+-------+ ``` -if row=true in above example, there will be conflict between column added for column totals and column added for row totals being same field \'Total\', in that case the output will have final event row label null instead of \'Sum\' because the column is number type and it cannot output String in number type column. +If row=true in the preceding example, the column added for column totals and the column added for row totals conflict because both use the same field \'Total\'. In that case, the final summary row label is null instead of \'Sum\' because the column is a number type and cannot contain a String value. ```ppl source=accounts @@ -91,7 +92,7 @@ fetched rows / total rows = 5/5 ## Example 3: With all options -The example shows using addtotals with all options set. +The following example PPL query shows how to use `addtotals` with all options set.
```ppl source=accounts diff --git a/docs/user/ppl/cmd/append.md b/docs/user/ppl/cmd/append.md index 6c765286c6..0a3a701738 100644 --- a/docs/user/ppl/cmd/append.md +++ b/docs/user/ppl/cmd/append.md @@ -1,21 +1,26 @@ -# append +# append -## Description -The `append` command appends the result of a sub-search and attaches it as additional rows to the bottom of the input search results (The main search). +The `append` command appends the result of a sub-search and attaches it as additional rows to the bottom of the input search results (the main search). + The command aligns columns with the same field names and types. For different column fields between the main search and sub-search, NULL values are filled in the respective rows. -## Syntax -append \ -* sub-search: mandatory. Executes PPL commands as a secondary search. +## Syntax + +Use the following syntax: + +`append <sub-search>` +* `sub-search`: mandatory. Executes PPL commands as a secondary search. + ## Limitations * **Schema Compatibility**: When fields with the same name exist between the main search and sub-search but have incompatible types, the query will fail with an error. To avoid type conflicts, ensure that fields with the same name have the same data type, or use different field names (e.g., by renaming with `eval` or using `fields` to select non-conflicting columns). -## Example 1: Append rows from a count aggregation to existing search result -This example appends rows from "count by gender" to "sum by gender, state". +## Example 1: Append rows from a count aggregation to existing search results + +The following example appends rows from "count by gender" to "sum by gender, state".
```ppl source=accounts | stats sum(age) by gender, state | sort -`sum(age)` | head 5 | append [ source=accounts | stats count(age) by gender ] @@ -37,9 +42,10 @@ fetched rows / total rows = 6/6 +----------+--------+-------+------------+ ``` -## Example 2: Append rows with merged column names -This example appends rows from "sum by gender" to "sum by gender, state" with merged column of same field name and type. +## Example 2: Append rows with merged column names + +The following example appends rows from "sum by gender" to "sum by gender, state" with merged column of same field name and type. ```ppl source=accounts | stats sum(age) as sum by gender, state | sort -sum | head 5 | append [ source=accounts | stats sum(age) as sum by gender ] diff --git a/docs/user/ppl/cmd/appendcol.md b/docs/user/ppl/cmd/appendcol.md index fb879c1b6f..c736b3a242 100644 --- a/docs/user/ppl/cmd/appendcol.md +++ b/docs/user/ppl/cmd/appendcol.md @@ -1,15 +1,18 @@ -# appendcol +# appendcol -## Description -The `appendcol` command appends the result of a sub-search and attaches it alongside with the input search results (The main search). -## Syntax +The `appendcol` command appends the result of a sub-search and attaches it alongside the input search results (the main search). -appendcol [override=\] \ +## Syntax + +Use the following syntax: + +`appendcol [override=<boolean>] <sub-search>` * override=<boolean>: optional. Boolean field to specify whether results from the main search should be overwritten in the case of column name conflict. **Default:** false. -* sub-search: mandatory. Executes PPL commands as a secondary search. The sub-search uses the same data specified in the source clause of the main search results as its input. +* `sub-search`: mandatory. Executes PPL commands as a secondary search. The sub-search uses the same data specified in the source clause of the main search results as its input.
-## Example 1: Append a count aggregation to existing search result + +## Example 1: Append a count aggregation to existing search results This example appends "count by gender" to "sum by gender, state". @@ -40,7 +43,8 @@ fetched rows / total rows = 10/10 +--------+-------+----------+------------+ ``` -## Example 2: Append a count aggregation to existing search result with override option + +## Example 2: Append a count aggregation to existing search results with override option This example appends "count by gender" to "sum by gender, state" with override option. @@ -71,9 +75,10 @@ fetched rows / total rows = 10/10 +--------+-------+----------+------------+ ``` + ## Example 3: Append multiple sub-search results -This example shows how to chain multiple appendcol commands to add columns from different sub-searches. +The following example PPL query shows how to use `appendcol` to chain multiple appendcol commands to add columns from different sub-searches. ```ppl source=employees @@ -101,9 +106,10 @@ fetched rows / total rows = 9/9 +------+-------------+-----+------------------+---------+ ``` + ## Example 4: Override case of column name conflict -This example demonstrates the override option when column names conflict between main search and sub-search. +The following example PPL query demonstrates how to use `appendcol` with the override option when column names conflict between main search and sub-search. ```ppl source=employees diff --git a/docs/user/ppl/cmd/appendpipe.md b/docs/user/ppl/cmd/appendpipe.md index f2dc71a2ab..659482edcf 100644 --- a/docs/user/ppl/cmd/appendpipe.md +++ b/docs/user/ppl/cmd/appendpipe.md @@ -1,15 +1,18 @@ -# appendpipe +# appendpipe -## Description -The `appendpipe` command appends the result of the subpipeline to the search results. Unlike a subsearch, the subpipeline is not run first.The subpipeline is run when the search reaches the appendpipe command. 
+The `appendpipe` command appends the result of the subpipeline to the search results. Unlike a subsearch, the subpipeline is not run first. The subpipeline is run when the search reaches the appendpipe command. The command aligns columns with the same field names and types. For different column fields between the main search and sub-search, NULL values are filled in the respective rows. -## Syntax -appendpipe [\] -* subpipeline: mandatory. A list of commands that are applied to the search results from the commands that occur in the search before the `appendpipe` command. +## Syntax + +Use the following syntax: + +`appendpipe [<subpipeline>]` +* `subpipeline`: mandatory. A list of commands that are applied to the search results from the commands that occur in the search before the `appendpipe` command. -## Example 1: Append rows from a total count to existing search result + +## Example 1: Append rows from a total count to existing search results This example appends rows from "total by gender" to "sum by gender, state" with merged column of same field name and type. @@ -37,6 +40,7 @@ fetched rows / total rows = 6/6 +------+--------+-------+-------+ ``` + ## Example 2: Append rows with merged column names This example appends rows from "count by gender" to "sum by gender, state". @@ -65,6 +69,7 @@ fetched rows / total rows = 6/6 +----------+--------+-------+ ``` + ## Limitations * **Schema Compatibility**: Same as command `append`, when fields with the same name exist between the main search and sub-search but have incompatible types, the query will fail with an error. To avoid type conflicts, ensure that fields with the same name have the same data type, or use different field names (e.g., by renaming with `eval` or using `fields` to select non-conflicting columns).
\ No newline at end of file diff --git a/docs/user/ppl/cmd/bin.md b/docs/user/ppl/cmd/bin.md index 7f8ef389bd..8ad8d72f01 100644 --- a/docs/user/ppl/cmd/bin.md +++ b/docs/user/ppl/cmd/bin.md @@ -1,13 +1,15 @@ -# bin +# bin -## Description The `bin` command groups numeric values into buckets of equal intervals, making it useful for creating histograms and analyzing data distribution. It takes a numeric or time-based field and generates a new field with values that represent the lower bound of each bucket. -## Syntax -bin \ [span=\] [minspan=\] [bins=\] [aligntime=(earliest \| latest \| \)] [start=\] [end=\] -* field: mandatory. The field to bin. Accepts numeric or time-based fields. -* span: optional. The interval size for each bin. Cannot be used with bins or minspan parameters. +## Syntax + +Use the following syntax: + +`bin <field> [span=<interval>] [minspan=<interval>] [bins=<count>] [aligntime=(earliest | latest | <time>)] [start=<num>] [end=<num>]` +* `field`: mandatory. The field to bin. Accepts numeric or time-based fields. +* `span`: optional. The interval size for each bin. Cannot be used with bins or minspan parameters. * Supports numeric (e.g., `1000`), logarithmic (e.g., `log10`, `2log10`), and time intervals * Available time units: * microsecond (us) * hour (h, hr, hrs, hour, hours) * day (d, day, days) * month (M, mon, month, months) -* minspan: optional. The minimum interval size for automatic span calculation. Cannot be used with span or bins parameters. -* bins: optional. The maximum number of equal-width bins to create. Cannot be used with span or minspan parameters. The bins parameter must be between 2 and 50000 (inclusive). +* `minspan`: optional. The minimum interval size for automatic span calculation. Cannot be used with span or bins parameters. +* `bins`: optional. The maximum number of equal-width bins to create. Cannot be used with span or minspan parameters. The bins parameter must be between 2 and 50000 (inclusive).
**Limitation**: The bins parameter on timestamp fields has the following requirements: 1. **Pushdown must be enabled**: Controlled by ``plugins.calcite.pushdown.enabled`` (enabled by default). When pushdown is disabled, use the ``span`` parameter instead (e.g., ``bin @timestamp span=5m``). 2. **Timestamp field must be used as an aggregation bucket**: The binned timestamp field must be used in a ``stats`` aggregation (e.g., ``source=events | bin @timestamp bins=3 | stats count() by @timestamp``). Using bins on timestamp fields outside of aggregation buckets is not supported. -* aligntime: optional. Align the bin times for time-based fields. Valid only for time-based discretization. Options: +* `aligntime`: optional. Align the bin times for time-based fields. Valid only for time-based discretization. Options: * earliest: Align bins to the earliest timestamp in the data * latest: Align bins to the latest timestamp in the data * <time>: Align bins to a specific epoch time value or time modifier expression -* start: optional. The starting value for binning range. **Default:** minimum field value. -* end: optional. The ending value for binning range. **Default:** maximum field value. +* `start`: optional. The starting value for binning range. **Default:** minimum field value. +* `end`: optional. The ending value for binning range. **Default:** maximum field value. **Parameter Behavior** When multiple parameters are specified, priority order is: span > minspan > bins > start/end > default.
@@ -41,6 +43,7 @@ When multiple parameters are specified, priority order is: span > minspan > bins * aligntime parameter only applies to time spans excluding days/months * start/end parameters expand the range (never shrink) and affect bin width calculation + ## Example 1: Basic numeric span ```ppl @@ -63,6 +66,7 @@ fetched rows / total rows = 3/3 +-------+----------------+ ``` + ## Example 2: Large numeric span ```ppl @@ -84,6 +88,7 @@ fetched rows / total rows = 2/2 +-------------+ ``` + ## Example 3: Logarithmic span (log10) ```ppl @@ -105,6 +110,7 @@ fetched rows / total rows = 2/2 +------------------+ ``` + ## Example 4: Logarithmic span with coefficient ```ppl @@ -127,6 +133,7 @@ fetched rows / total rows = 3/3 +------------------+ ``` + ## Example 5: Basic bins parameter ```ppl @@ -149,6 +156,7 @@ fetched rows / total rows = 3/3 +------------+ ``` + ## Example 6: Low bin count ```ppl @@ -169,6 +177,7 @@ fetched rows / total rows = 1/1 +-------+ ``` + ## Example 7: High bin count ```ppl @@ -191,6 +200,7 @@ fetched rows / total rows = 3/3 +-------+----------------+ ``` + ## Example 8: Basic minspan ```ppl @@ -213,6 +223,7 @@ fetched rows / total rows = 3/3 +-------+----------------+ ``` + ## Example 9: Large minspan ```ppl @@ -233,6 +244,7 @@ fetched rows / total rows = 1/1 +--------+ ``` + ## Example 10: Start and end range ```ppl @@ -253,6 +265,7 @@ fetched rows / total rows = 1/1 +-------+ ``` + ## Example 11: Large end range ```ppl @@ -273,6 +286,7 @@ fetched rows / total rows = 1/1 +----------+ ``` + ## Example 12: Span with start/end ```ppl @@ -296,6 +310,7 @@ fetched rows / total rows = 4/4 +-------+ ``` + ## Example 13: Hour span ```ppl @@ -318,6 +333,7 @@ fetched rows / total rows = 3/3 +---------------------+-------+ ``` + ## Example 14: Minute span ```ppl @@ -340,6 +356,7 @@ fetched rows / total rows = 3/3 +---------------------+-------+ ``` + ## Example 15: Second span ```ppl @@ -362,6 +379,7 @@ fetched rows / total rows = 3/3 
+---------------------+-------+ ``` + ## Example 16: Daily span ```ppl @@ -384,6 +402,7 @@ fetched rows / total rows = 3/3 +---------------------+-------+ ``` + ## Example 17: Aligntime with time modifier ```ppl @@ -406,6 +425,7 @@ fetched rows / total rows = 3/3 +---------------------+-------+ ``` + ## Example 18: Aligntime with epoch timestamp ```ppl @@ -428,6 +448,7 @@ fetched rows / total rows = 3/3 +---------------------+-------+ ``` + ## Example 19: Default behavior (no parameters) ```ppl @@ -450,6 +471,7 @@ fetched rows / total rows = 3/3 +-----------+----------------+ ``` + ## Example 20: Binning with string fields ```ppl diff --git a/docs/user/ppl/cmd/chart.md b/docs/user/ppl/cmd/chart.md index 829afdedb7..788a52e45b 100644 --- a/docs/user/ppl/cmd/chart.md +++ b/docs/user/ppl/cmd/chart.md @@ -1,34 +1,36 @@ -# chart +# chart -## Description The `chart` command transforms search results by applying a statistical aggregation function and optionally grouping the data by one or two fields. The results are suitable for visualization as a two-dimension chart when grouping by two fields, where unique values in the second group key can be pivoted to column names. -## Syntax -chart [limit=(top\|bottom) \] [useother=\] [usenull=\] [nullstr=\] [otherstr=\] \ [ by \ \ ] \| [over \ ] [ by \] -* limit: optional. Specifies the number of categories to display when using column split. Each unique value in the column split field represents a category. **Default:** top10. +## Syntax + +Use the following syntax: + +`chart [limit=(top|bottom) <int>] [useother=<bool>] [usenull=<bool>] [nullstr=<string>] [otherstr=<string>] <aggregation_function> [ by <row-split> <column-split> ] | [over <row-split> ] [ by <column-split>]` +* `limit`: optional. Specifies the number of categories to display when using column split. Each unique value in the column split field represents a category. **Default:** top10.
* Syntax: `limit=(top|bottom)<K>` or `limit=<K>` (defaults to top) * When `limit=K` is set, the top or bottom K categories from the column split field are retained; the remaining categories are grouped into an "OTHER" category if `useother` is not set to false. * Set limit to 0 to show all categories without any limit. * Use `limit=topK` or `limit=bottomK` to specify whether to retain the top or bottom K column categories. The ranking is based on the sum of aggregated values for each column category. For example, `chart limit=top3 count() by region, product` keeps the 3 products with the highest total counts across all regions. If not specified, top is used by default. * Only applies when column split is present (by 2 fields or over...by... coexists). -* useother: optional. Controls whether to create an "OTHER" category for categories beyond the limit. **Default:** true +* `useother`: optional. Controls whether to create an "OTHER" category for categories beyond the limit. **Default:** true * When set to false, only the top/bottom N categories (based on limit) are shown without an "OTHER" category. * When set to true, categories beyond the limit are grouped into an "OTHER" category. * Only applies when using column split and when there are more categories than the limit. -* usenull: optional. Controls whether to group events without a column split (i.e. whose column split is null) into a separate "NULL" category. **Default:** true +* `usenull`: optional. Controls whether to group events without a column split (i.e. whose column split is null) into a separate "NULL" category. **Default:** true * `usenull` only applies to column split. * Row split should always be non-null value. Documents with null values in row split will be ignored. * When `usenull=false`, events with a null column split are excluded from results. * When `usenull=true`, events with a null column split are grouped into a separate "NULL" category. -* nullstr: optional.
Specifies the category name for rows that do not contain the column split value. **Default:** "NULL" +* `nullstr`: optional. Specifies the category name for rows that do not contain the column split value. **Default:** "NULL" * Only applies when `usenull` is set to true. -* otherstr: optional. Specifies the category name for the "OTHER" category. **Default:** "OTHER" +* `otherstr`: optional. Specifies the category name for the "OTHER" category. **Default:** "OTHER" * Only applies when `useother` is set to true and there are values beyond the limit. -* aggregation_function: mandatory. The aggregation function to apply to the data. +* `aggregation_function`: mandatory. The aggregation function to apply to the data. * Currently, only a single aggregation function is supported. * Available functions: aggregation functions supported by the stats command. -* by: optional. Groups the results by either one field (row split) or two fields (row split and column split) +* `by`: optional. Groups the results by either one field (row split) or two fields (row split and column split) * `limit`, `useother`, and `usenull` apply to the column split * Results are returned as individual rows for each combination. * If not specified, the aggregation is performed across all documents. @@ -36,12 +38,14 @@ * `over <row-split> by <column-split>` groups the results by both fields. * Using `over` alone on one field is equivalent to `by <field>` + ## Notes * The fields generated by column splitting are converted to strings so that they are compatible with `nullstr` and `otherstr` and can be used as column names once pivoted. * Documents with null values in fields used by the aggregation function are excluded from aggregation. For example, in `chart avg(balance) over deptno, group`, documents where `balance` is null are excluded from the average calculation. * The aggregation metric appears as the last column in the result.
Result columns are ordered as: [row-split] [column-split] [aggregation-metrics]. + ## Example 1: Basic aggregation without grouping This example calculates the average balance across all accounts. @@ -62,6 +66,7 @@ fetched rows / total rows = 1/1 +--------------+ ``` + ## Example 2: Group by single field This example calculates the count of accounts grouped by gender. @@ -83,9 +88,10 @@ fetched rows / total rows = 2/2 +--------+---------+ ``` + ## Example 3: Using over and by for multiple field grouping -This example shows average balance grouped by both gender and age fields. Note that the age column in the result is converted to string type. +The following example PPL query shows how to use `chart` to calculate average balance grouped by both gender and age fields. Note that the age column in the result is converted to string type. ```ppl source=accounts @@ -106,6 +112,7 @@ fetched rows / total rows = 4/4 +--------+-----+--------------+ ``` + ## Example 4: Using basic limit functionality This example limits the results to show only the top 1 age group. Note that the age column in the result is converted to string type. @@ -128,9 +135,10 @@ fetched rows / total rows = 3/3 +--------+-------+---------+ ``` + ## Example 5: Using limit with other parameters -This example shows using limit with useother and custom otherstr parameters. +The following example PPL query shows how to use `chart` with limit, useother, and custom otherstr parameters. ```ppl source=accounts @@ -151,9 +159,10 @@ fetched rows / total rows = 4/4 +-------+--------------+---------+ ``` + ## Example 6: Using null parameters -This example shows using limit with usenull and custom nullstr parameters. +The following example PPL query shows how to use `chart` with limit, usenull, and custom nullstr parameters. 
```ppl source=accounts @@ -174,9 +183,10 @@ fetched rows / total rows = 4/4 +-----------+------------------------+---------+ ``` + ## Example 7: Using chart command with span -This example demonstrates using span for grouping age ranges. +The following example PPL query demonstrates how to use `chart` with span for grouping age ranges. ```ppl source=accounts @@ -195,6 +205,7 @@ fetched rows / total rows = 2/2 +-----+--------+--------------+ ``` + ## Limitations * Only a single aggregation function is supported per chart command. \ No newline at end of file diff --git a/docs/user/ppl/cmd/dedup.md b/docs/user/ppl/cmd/dedup.md index 59dfcf63dd..099ed679d9 100644 --- a/docs/user/ppl/cmd/dedup.md +++ b/docs/user/ppl/cmd/dedup.md @@ -1,19 +1,22 @@ -# dedup +# dedup -## Description The `dedup` command removes duplicate documents defined by specified fields from the search result. -## Syntax -dedup [int] \ [keepempty=\] [consecutive=\] -* int: optional. The `dedup` command retains multiple events for each combination when you specify \. The number for \ must be greater than 0. All other duplicates are removed from the results. **Default:** 1 -* keepempty: optional. If set to true, keep the document if the any field in the field-list has NULL value or field is MISSING. **Default:** false. -* consecutive: optional. If set to true, removes only events with duplicate combinations of values that are consecutive. **Default:** false. -* field-list: mandatory. The comma-delimited field list. At least one field is required. +## Syntax + +Use the following syntax: + +`dedup [int] [keepempty=] [consecutive=]` +* `int`: optional. The `dedup` command retains multiple events for each combination when you specify ``. The number for `` must be greater than 0. All other duplicates are removed from the results. **Default:** 1 +* `keepempty`: optional. If set to true, keep the document if any field in the field-list has NULL value or field is MISSING. **Default:** false. 
+* `consecutive`: optional. If set to true, removes only events with duplicate combinations of values that are consecutive. **Default:** false. +* `field-list`: mandatory. The comma-delimited field list. At least one field is required. + ## Example 1: Dedup by one field -This example shows deduplicating documents by gender field. +The following example PPL query shows how to use `dedup` to remove duplicate documents based on the `gender` field: ```ppl source=accounts @@ -34,9 +37,10 @@ fetched rows / total rows = 2/2 +----------------+--------+ ``` -## Example 2: Keep 2 duplicates documents -This example shows deduplicating documents by gender field while keeping 2 duplicates. +## Example 2: Keep two duplicate documents + +The following example PPL query shows how to use `dedup` to remove duplicate documents based on the `gender` field while keeping two duplicates: ```ppl source=accounts @@ -58,9 +62,10 @@ fetched rows / total rows = 3/3 +----------------+--------+ ``` -## Example 3: Keep or Ignore the empty field by default -This example shows deduplicating documents while keeping null values. +## Example 3: Keep or ignore empty fields by default + +The following example PPL query shows how to use `dedup` to remove duplicate documents while keeping documents with null values in the specified field: ```ppl source=accounts @@ -83,7 +88,7 @@ fetched rows / total rows = 4/4 +----------------+-----------------------+ ``` -This example shows deduplicating documents while ignoring null values. +The following example PPL query shows how to use `dedup` to remove duplicate documents while ignoring documents with empty values in the specified field: ```ppl source=accounts @@ -105,9 +110,10 @@ fetched rows / total rows = 3/3 +----------------+-----------------------+ ``` + ## Example 4: Dedup in consecutive document -This example shows deduplicating consecutive documents.
+The following example PPL query shows how to use `dedup` to remove duplicate consecutive documents: ```ppl source=accounts @@ -129,6 +135,7 @@ fetched rows / total rows = 3/3 +----------------+--------+ ``` + ## Limitations The `dedup` with `consecutive=true` command can only work with `plugins.calcite.enabled=false`. \ No newline at end of file diff --git a/docs/user/ppl/cmd/describe.md b/docs/user/ppl/cmd/describe.md index d6efffc9d5..129f14fc55 100644 --- a/docs/user/ppl/cmd/describe.md +++ b/docs/user/ppl/cmd/describe.md @@ -1,15 +1,18 @@ -# describe +# describe -## Description -Use the `describe` command to query metadata of the index. `describe` command can only be used as the first command in the PPL query. -## Syntax +The `describe` command queries the metadata of an index. The `describe` command can only be used as the first command in the PPL query. -describe [dataSource.][schema.]\ -* dataSource: optional. If dataSource is not provided, it resolves to opensearch dataSource. -* schema: optional. If schema is not provided, it resolves to default schema. -* tablename: mandatory. describe command must specify which tablename to query from. +## Syntax + +Use the following syntax: + +`describe [dataSource.][schema.]` +* `dataSource`: optional. If dataSource is not provided, it resolves to the OpenSearch dataSource. +* `schema`: optional. If schema is not provided, it resolves to the default schema. +* `tablename`: mandatory. The `describe` command must specify which table to query. + ## Example 1: Fetch all the metadata This example describes the accounts index.
@@ -39,6 +42,7 @@ fetched rows / total rows = 11/11 +----------------+-------------+------------+----------------+-----------+-----------+-------------+---------------+----------------+----------------+----------+---------+------------+---------------+------------------+-------------------+------------------+-------------+---------------+--------------+-------------+------------------+------------------+--------------------+ ``` + ## Example 2: Fetch metadata with condition and filter This example retrieves columns with type bigint in the accounts index. @@ -62,6 +66,7 @@ fetched rows / total rows = 3/3 +----------------+ ``` + ## Example 3: Fetch metadata for table in Prometheus datasource See [Fetch metadata for table in Prometheus datasource](../admin/datasources.md) for more context. \ No newline at end of file diff --git a/docs/user/ppl/cmd/eval.md b/docs/user/ppl/cmd/eval.md index d3300cd6b0..cb53299362 100644 --- a/docs/user/ppl/cmd/eval.md +++ b/docs/user/ppl/cmd/eval.md @@ -1,17 +1,20 @@ -# eval +# eval -## Description The `eval` command evaluates the expression and appends the result to the search result. -## Syntax -eval \=\ ["," \=\ ]... -* field: mandatory. If the field name does not exist, a new field is added. If the field name already exists, it will be overridden. +## Syntax + +Use the following syntax: + +`eval = ["," = ]...` +* `field`: mandatory. If the field name does not exist, a new field is added. If the field name already exists, it will be overridden. * expression: mandatory. Any expression supported by the system. + ## Example 1: Create a new field -This example shows creating a new field doubleAge for each document. The new doubleAge field is the result of multiplying age by 2. +The following example PPL query shows how to use `eval` to create a new field for each document. In this example, the new field is `doubleAge`. 
```ppl source=accounts @@ -33,9 +36,10 @@ fetched rows / total rows = 4/4 +-----+-----------+ ``` + ## Example 2: Override an existing field -This example shows overriding the existing age field by adding 1 to it. +The following example PPL query shows how to use `eval` to override an existing field. In this example, the existing field `age` is overridden by the `age` field plus 1. ```ppl source=accounts @@ -57,9 +61,10 @@ fetched rows / total rows = 4/4 +-----+ ``` + ## Example 3: Create a new field with field defined in eval -This example shows creating a new field ddAge using a field defined in the same eval command. The new field ddAge is the result of multiplying doubleAge by 2, where doubleAge is defined in the same eval command. +The following example PPL query shows how to use `eval` to create a new field based on the fields defined in the `eval` expression. In this example, the new field `ddAge` is the evaluation result of the `doubleAge` field multiplied by 2. `doubleAge` is defined in the `eval` command. ```ppl source=accounts @@ -81,9 +86,10 @@ fetched rows / total rows = 4/4 +-----+-----------+-------+ ``` + ## Example 4: String concatenation -This example shows using the + operator for string concatenation. You can concatenate string literals and field values. +The following example PPL query shows using the `+` operator for string concatenation. You can concatenate string literals and field values. ```ppl source=accounts @@ -105,9 +111,10 @@ fetched rows / total rows = 4/4 +-----------+---------------+ ``` + ## Example 5: Multiple string concatenation with type casting -This example shows multiple concatenations with type casting from numeric to string. +The following example PPL query shows multiple concatenations with type casting from numeric to string. 
```ppl source=accounts | eval full_info = 'Name: ' + firstname + ', Age: ' + CAST(age AS STRING) | fields firstname, age, full_info @@ -127,6 +134,7 @@ fetched rows / total rows = 4/4 +-----------+-----+------------------------+ ``` + ## Limitations -The `eval` command is not rewritten to OpenSearch DSL, it is only executed on the coordination node. \ No newline at end of file +The `eval` command is not rewritten to [query domain-specific language (DSL)](https://opensearch.org/docs/latest/query-dsl/index/). It is only run on the coordinating node. \ No newline at end of file diff --git a/docs/user/ppl/cmd/eventstats.md b/docs/user/ppl/cmd/eventstats.md index 1cb791d95e..72252fe244 100644 --- a/docs/user/ppl/cmd/eventstats.md +++ b/docs/user/ppl/cmd/eventstats.md @@ -1,10 +1,9 @@ -# eventstats +# eventstats -## Description The `eventstats` command enriches your event data with calculated summary statistics. It operates by analyzing specified fields within your events, computing various statistical measures, and then appending these results as new fields to each original event. Key aspects of `eventstats`: -1. It performs calculations across the entire result set or within defined groups. +1. It performs calculations across the entire search results or within defined groups. 2. The original events remain intact, with new fields added to contain the statistical results. 3. The command is particularly useful for comparative analysis, identifying outliers, or providing additional context to individual events. @@ -14,21 +13,24 @@ The `stats` and `eventstats` commands are both used for calculating statistics, * `stats`: Produces a summary table with only the calculated statistics. * `eventstats`: Adds the calculated statistics as new fields to the existing events, preserving the original data. * Event Retention - * `stats`: Reduces the result set to only the statistical summary, discarding individual events. 
+ * `stats`: Reduces the search results to only the statistical summary, discarding individual events. * `eventstats`: Retains all original events and adds new fields with the calculated statistics. * Use Cases * `stats`: Best for creating summary reports or dashboards. Often used as a final command to summarize results. * `eventstats`: Useful when you need to enrich events with statistical context for further analysis or filtering. It can be used mid-search to add statistics that can be used in subsequent commands. -## Syntax -eventstats [bucket_nullable=bool] \... [by-clause] -* function: mandatory. An aggregation function or window function. -* bucket_nullable: optional. Controls whether the eventstats command consider null buckets as a valid group in group-by aggregations. When set to `false`, it will not treat null group-by values as a distinct group during aggregation. **Default:** Determined by `plugins.ppl.syntax.legacy.preferred`. +## Syntax + +Use the following syntax: + +`eventstats [bucket_nullable=bool] ... [by-clause]` +* `function`: mandatory. An aggregation function or window function. +* `bucket_nullable`: optional. Controls whether the eventstats command considers null buckets as a valid group in group-by aggregations. When set to `false`, it will not treat null group-by values as a distinct group during aggregation. **Default:** Determined by `plugins.ppl.syntax.legacy.preferred`. * When `plugins.ppl.syntax.legacy.preferred=true`, `bucket_nullable` defaults to `true` * When `plugins.ppl.syntax.legacy.preferred=false`, `bucket_nullable` defaults to `false` -* by-clause: optional. Groups results by specified fields or expressions. Syntax: by [span-expression,] [field,]... **Default:** aggregation over the entire result set. -* span-expression: optional, at most one. Splits field into buckets by intervals. Syntax: span(field_expr, interval_expr). For example, `span(age, 10)` creates 10-year age buckets, `span(timestamp, 1h)` creates hourly buckets.
+* `by-clause`: optional. Groups results by specified fields or expressions. Syntax: by [span-expression,] [field,]... **Default:** aggregation over the entire search results. +* `span-expression`: optional, at most one. Splits field into buckets by intervals. Syntax: span(field_expr, interval_expr). For example, `span(age, 10)` creates 10-year age buckets, `span(timestamp, 1h)` creates hourly buckets. * Available time units: * millisecond (ms) * second (s) @@ -40,23 +42,24 @@ eventstats [bucket_nullable=bool] \... [by-clause] * quarter (q) * year (y) -## Aggregation Functions +## Aggregation functions The eventstats command supports the following aggregation functions: -* COUNT: Count of values -* SUM: Sum of numeric values -* AVG: Average of numeric values -* MAX: Maximum value -* MIN: Minimum value -* VAR_SAMP: Sample variance -* VAR_POP: Population variance -* STDDEV_SAMP: Sample standard deviation -* STDDEV_POP: Population standard deviation +* `COUNT`: Count of values +* `SUM`: Sum of numeric values +* `AVG`: Average of numeric values +* `MAX`: Maximum value +* `MIN`: Minimum value +* `VAR_SAMP`: Sample variance +* `VAR_POP`: Population variance +* `STDDEV_SAMP`: Sample standard deviation +* `STDDEV_POP`: Population standard deviation * DISTINCT_COUNT/DC: Distinct count of values -* EARLIEST: Earliest value by timestamp -* LATEST: Latest value by timestamp +* `EARLIEST`: Earliest value by timestamp +* `LATEST`: Latest value by timestamp For detailed documentation of each function, see [Aggregation Functions](../functions/aggregations.md). + ## Usage Eventstats @@ -70,9 +73,10 @@ source = table | eventstats dc(field) as distinct_count source = table | eventstats distinct_count(category) by region ``` + ## Example 1: Calculate the average, sum and count of a field by group -This example shows calculating the average age, sum of age, and count of events for all accounts grouped by gender. 
+The following example PPL query shows how to use `eventstats` to calculate the average age, sum of age, and count of events for all accounts grouped by gender. ```ppl source=accounts @@ -95,9 +99,10 @@ fetched rows / total rows = 4/4 +----------------+--------+-----+--------------------+----------+---------+ ``` + ## Example 2: Calculate the count by a gender and span -This example shows counting events by age intervals of 5 years, grouped by gender. +The following example PPL query shows how to use `eventstats` to count events by age intervals of 5 years, grouped by gender. ```ppl source=accounts @@ -120,6 +125,7 @@ fetched rows / total rows = 4/4 +----------------+--------+-----+-----+ ``` + ## Example 3: Null buckets handling ```ppl diff --git a/docs/user/ppl/cmd/expand.md b/docs/user/ppl/cmd/expand.md index 8fddbea7ad..7ccd6f820e 100644 --- a/docs/user/ppl/cmd/expand.md +++ b/docs/user/ppl/cmd/expand.md @@ -1,6 +1,5 @@ -# expand +# expand -## Description The `expand` command transforms a single document with a nested array field into multiple documents—each containing one element from the array. All other fields in the original document are duplicated across the resulting documents. Key aspects of `expand`: @@ -9,12 +8,16 @@ Key aspects of `expand`: * If an alias is provided, the expanded values appear under the alias instead of the original field name. * If the specified field is an empty array, the row is retained with the expanded field set to null. -## Syntax -expand \ [as alias] -* field: mandatory. The field to be expanded (exploded). Currently only nested arrays are supported. -* alias: optional. The name to use instead of the original field name. +## Syntax + +Use the following syntax: + +`expand [as alias]` +* `field`: mandatory. The field to be expanded (exploded). Currently only nested arrays are supported. +* `alias`: optional. The name to use instead of the original field name. 
+ + ## Example 1: Expand address field with an alias Given a dataset `migration` with the following data: @@ -45,6 +48,7 @@ fetched rows / total rows = 3/3 +-------+-----+-------------------------------------------------------------------------------------------+ ``` + ## Limitations * The `expand` command currently only supports nested arrays. Primitive fields storing arrays are not supported. E.g. a string field storing an array of strings cannot be expanded with the current implementation. \ No newline at end of file diff --git a/docs/user/ppl/cmd/explain.md b/docs/user/ppl/cmd/explain.md index fb60a3b120..f175cc0155 100644 --- a/docs/user/ppl/cmd/explain.md +++ b/docs/user/ppl/cmd/explain.md @@ -1,18 +1,21 @@ -# explain +# explain -## Description -The `explain` command explains the plan of query which is often used for query translation and troubleshooting. The `explain` command can only be used as the first command in the PPL query. -## Syntax +The `explain` command displays the execution plan of a query, which is often used for query translation and troubleshooting. The `explain` command can only be used as the first command in the PPL query. -explain queryStatement -* mode: optional. There are 4 explain modes: "simple", "standard", "cost", "extended". **Default:** standard. +## Syntax + +Use the following syntax: + +`explain [mode] queryStatement` +* `mode`: optional. There are four explain modes: "simple", "standard", "cost", "extended". **Default:** standard. * standard: The default mode. Display logical and physical plan with pushdown information (DSL). * simple: Display the logical plan tree without attributes. * cost: Display the standard information plus plan cost attributes. * extended: Display the standard information plus generated code. -* queryStatement: mandatory. A PPL query to explain. +* `queryStatement`: mandatory. A PPL query to explain.
+ ## Example 1: Explain a PPL query in v2 engine When Calcite is disabled (plugins.calcite.enabled=false), explaining a PPL query will get its physical plan of v2 engine and pushdown information. @@ -45,9 +48,10 @@ Explain: } ``` + ## Example 2: Explain a PPL query in v3 engine -When Calcite is enabled (plugins.calcite.enabled=true), explaining a PPL query will get its logical and physical plan of v3 engine and pushdown information. +When Calcite is enabled (`plugins.calcite.enabled=true`), explaining a PPL query will get its logical and physical plan of v3 engine and pushdown information. ```ppl explain source=state_country @@ -72,9 +76,10 @@ Explain } ``` + ## Example 3: Explain a PPL query with simple mode -When Calcite is enabled (plugins.calcite.enabled=true), you can explain a PPL query with the "simple" mode. +When Calcite is enabled (`plugins.calcite.enabled=true`), you can explain a PPL query with the "simple" mode. ```ppl explain simple source=state_country @@ -84,7 +89,7 @@ explain simple source=state_country Explain -``` +```json { "calcite": { "logical": """LogicalProject @@ -96,9 +101,10 @@ Explain } ``` + ## Example 4: Explain a PPL query with cost mode -When Calcite is enabled (plugins.calcite.enabled=true), you can explain a PPL query with the "cost" mode. +When Calcite is enabled (`plugins.calcite.enabled=true`), you can explain a PPL query with the "cost" mode. ```ppl explain cost source=state_country @@ -123,6 +129,7 @@ Explain } ``` + ## Example 5: Explain a PPL query with extended mode ```ppl diff --git a/docs/user/ppl/cmd/fields.md b/docs/user/ppl/cmd/fields.md index 507a8e6903..485d0fc4fb 100644 --- a/docs/user/ppl/cmd/fields.md +++ b/docs/user/ppl/cmd/fields.md @@ -1,17 +1,20 @@ -# fields +# fields -## Description -The `fields` command keeps or removes fields from the search result. -## Syntax +The `fields` command specifies the fields that should be included in or excluded from the search results. -fields [+\|-] \ -* +\|-: optional. 
If the plus (+) is used, only the fields specified in the field list will be kept. If the minus (-) is used, all the fields specified in the field list will be removed. **Default:** +. -* field-list: mandatory. Comma-delimited or space-delimited list of fields to keep or remove. Supports wildcard patterns. +## Syntax + +Use the following syntax: + +`fields [+|-] ` +* `+|-`: optional. If the plus (+) is used, only the fields specified in the field list will be included. If the minus (-) is used, all the fields specified in the field list will be excluded. **Default:** `+`. +* `field-list`: mandatory. Comma-delimited or space-delimited list of fields to keep or remove. Supports wildcard patterns. -## Example 1: Select specified fields from result -This example shows selecting account_number, firstname and lastname fields from search results. +## Example 1: Select specified fields from the search result + +The following example PPL query shows how to retrieve the `account_number`, `firstname`, and `lastname` fields from the search results: ```ppl source=accounts @@ -32,9 +35,10 @@ fetched rows / total rows = 4/4 +----------------+-----------+----------+ ``` -## Example 2: Remove specified fields from result -This example shows removing the account_number field from search results. +## Example 2: Remove specified fields from the search results + +The following example PPL query shows how to remove the `account_number` field from the search results: ```ppl source=accounts @@ -56,6 +60,7 @@ fetched rows / total rows = 4/4 +-----------+----------+ ``` + ## Example 3: Space-delimited field selection Fields can be specified using spaces instead of commas, providing a more concise syntax. @@ -80,6 +85,7 @@ fetched rows / total rows = 4/4 +-----------+----------+-----+ ``` + ## Example 4: Prefix wildcard pattern Select fields starting with a pattern using prefix wildcards. 
@@ -103,6 +109,7 @@ fetched rows / total rows = 4/4 +----------------+ ``` + ## Example 5: Suffix wildcard pattern Select fields ending with a pattern using suffix wildcards. @@ -126,9 +133,10 @@ fetched rows / total rows = 4/4 +-----------+----------+ ``` + ## Example 6: Contains wildcard pattern Select fields containing a pattern using contains wildcards. @@ -147,6 +155,7 @@ fetched rows / total rows = 1/1 +----------------+-----------+-----------------+---------+-------+-----+----------------------+----------+ ``` + ## Example 7: Mixed delimiter syntax Combine spaces and commas for flexible field specification. @@ -170,9 +179,10 @@ fetched rows / total rows = 4/4 +-----------+----------------+----------+ ``` + ## Example 8: Field deduplication Automatically prevents duplicate columns when wildcards expand to already specified fields. @@ -194,6 +204,7 @@ fetched rows / total rows = 4/4 ``` Note: Even though `firstname` is explicitly specified and would also match `*name`, it appears only once due to automatic deduplication. + ## Example 9: Full wildcard selection Select all available fields using `*` or `` `*` ``. This selects all fields defined in the index schema, including fields that may contain null values. @@ -216,6 +227,7 @@ fetched rows / total rows = 1/1 ``` Note: The `*` wildcard selects fields based on the index schema, not on data content. Fields with null values are included in the result set. Use backticks (`` `*` ``) if the plain `*` doesn't return all expected fields. + ## Example 10: Wildcard exclusion Remove fields using wildcard patterns with the minus (-) operator.
@@ -239,6 +251,7 @@ fetched rows / total rows = 4/4 +----------------+----------------------+---------+--------+--------+----------+-------+-----+-----------------------+ ``` -## See Also + +## See also - [table](table.md) - Alias command with identical functionality \ No newline at end of file diff --git a/docs/user/ppl/cmd/fillnull.md b/docs/user/ppl/cmd/fillnull.md index 40ed91e865..a710590137 100644 --- a/docs/user/ppl/cmd/fillnull.md +++ b/docs/user/ppl/cmd/fillnull.md @@ -1,24 +1,27 @@ -# fillnull +# fillnull -## Description -The `fillnull` command fills null values with the provided value in one or more fields in the search result. -## Syntax +The `fillnull` command fills null values with the provided value in one or more fields in the search results. -fillnull with \ [in \] -fillnull using \ = \ [, \ = \] -fillnull value=\ [\] -* replacement: mandatory. The value used to replace null values. -* field-list: optional. List of fields to apply the replacement to. It can be comma-delimited (with `with` or `using` syntax) or space-delimited (with `value=` syntax). **Default:** all fields. -* field: mandatory when using `using` syntax. Individual field name to assign a specific replacement value. +## Syntax + +Use one of the following syntax options: + +`fillnull with [in ]` +`fillnull using = [, = ]` +`fillnull value= []` +* `replacement`: mandatory. The value used to replace null values. +* `field-list`: optional. List of fields to apply the replacement to. It can be comma-delimited (with `with` or `using` syntax) or space-delimited (with `value=` syntax). **Default:** all fields. +* `field`: mandatory when using `using` syntax. Individual field name to assign a specific replacement value. 
* **Syntax variations** * `with in ` - Apply same value to specified fields * `using =, ...` - Apply different values to different fields * `value= []` - Alternative syntax with optional space-delimited field list + ## Example 1: Replace null values with a specified value on one field -This example shows replacing null values in the email field with '\'. +The following example PPL query shows how to use `fillnull` to replace null values in the email field with '\'. ```ppl source=accounts @@ -40,9 +43,10 @@ fetched rows / total rows = 4/4 +-----------------------+----------+ ``` + ## Example 2: Replace null values with a specified value on multiple fields -This example shows replacing null values in both email and employer fields with the same replacement value '\'. +The following example PPL query shows how to use `fillnull` to replace null values in both email and employer fields with the same replacement value '\'. ```ppl source=accounts @@ -64,9 +68,10 @@ fetched rows / total rows = 4/4 +-----------------------+-------------+ ``` + ## Example 3: Replace null values with a specified value on all fields -This example shows replacing null values in all fields when no field list is specified. +The following example PPL query shows how to use `fillnull` to replace null values in all fields when no field list is specified. ```ppl source=accounts @@ -88,9 +93,10 @@ fetched rows / total rows = 4/4 +-----------------------+-------------+ ``` + ## Example 4: Replace null values with multiple specified values on multiple fields -This example shows using different replacement values for different fields using the 'using' syntax. +The following example PPL query shows how to use `fillnull` with different replacement values for different fields using the 'using' syntax. 
```ppl source=accounts @@ -112,9 +118,10 @@ fetched rows / total rows = 4/4 +-----------------------+---------------+ ``` + ## Example 5: Replace null with specified value on specific fields (value= syntax) -This example shows using the alternative 'value=' syntax to replace null values in specific fields. +The following example PPL query shows how to use `fillnull` with the alternative 'value=' syntax to replace null values in specific fields. ```ppl source=accounts @@ -136,6 +143,7 @@ fetched rows / total rows = 4/4 +-----------------------+-------------+ ``` + ## Example 6: Replace null with specified value on all fields (value= syntax) When no field list is specified, the replacement applies to all fields in the result. @@ -160,9 +168,10 @@ fetched rows / total rows = 4/4 +-----------------------+-------------+ ``` + ## Limitations -* The `fillnull` command is not rewritten to OpenSearch DSL, it is only executed on the coordination node. +* The `fillnull` command is not rewritten to [query domain-specific language (DSL)](https://opensearch.org/docs/latest/query-dsl/index/). It is only run on the coordinating node. * When applying the same value to all fields without specifying field names, all fields must be the same type. For mixed types, use separate fillnull commands or explicitly specify fields. * The replacement value type must match ALL field types in the field list. When applying the same value to multiple fields, all fields must be the same type (all strings or all numeric). diff --git a/docs/user/ppl/cmd/flatten.md b/docs/user/ppl/cmd/flatten.md index ba4f9077dc..7dae6f0017 100644 --- a/docs/user/ppl/cmd/flatten.md +++ b/docs/user/ppl/cmd/flatten.md @@ -1,19 +1,23 @@ -# flatten +# flatten -## Description The `flatten` command flattens a struct or an object field into separate fields in a document. + The flattened fields will be ordered **lexicographically** by their original key names in the struct. 
For example, if the struct has keys `b`, `c` and `Z`, the flattened fields will be ordered as `Z`, `b`, `c`. Note that `flatten` should not be applied to arrays. Use the `expand` command to expand an array field into multiple rows instead. However, since an array can be stored in a non-array field in OpenSearch, when flattening a field storing a nested array, only the first element of the array will be flattened. -## Syntax -flatten \ [as (\)] -* field: mandatory. The field to be flattened. Only object and nested fields are supported. -* alias-list: optional. The names to use instead of the original key names. Names are separated by commas. It is advised to put the alias-list in parentheses if there is more than one alias. The length must match the number of keys in the struct field. The provided alias names **must** follow the lexicographical order of the corresponding original keys in the struct. +## Syntax + +Use the following syntax: + +`flatten [as ()]` +* `field`: mandatory. The field to be flattened. Only object and nested fields are supported. +* `alias-list`: optional. The names to use instead of the original key names. Names are separated by commas. It is advised to put the alias-list in parentheses if there is more than one alias. The length must match the number of keys in the struct field. The provided alias names **must** follow the lexicographical order of the corresponding original keys in the struct. -## Example: flatten an object field with aliases -This example shows flattening a message object field and using aliases to rename the flattened fields. +## Example: Flatten an object field with aliases + +The following example PPL query shows how to use `flatten` to flatten a message object field and use aliases to rename the flattened fields. 
Given the following index `my-index` ```text @@ -80,6 +84,7 @@ fetched rows / total rows = 2/2 +-----------------------------------------+--------+---------+-----+------+ ``` + ## Limitations * `flatten` command may not work as expected when its flattened fields are diff --git a/docs/user/ppl/cmd/grok.md b/docs/user/ppl/cmd/grok.md index c2636b5358..ed509135af 100644 --- a/docs/user/ppl/cmd/grok.md +++ b/docs/user/ppl/cmd/grok.md @@ -1,17 +1,20 @@ -# grok +# grok -## Description -The `grok` command parses a text field with a grok pattern and appends the results to the search result. -## Syntax +The `grok` command parses a text field with a grok pattern and appends the results to the search results. -grok \ \ -* field: mandatory. The field must be a text field. -* pattern: mandatory. The grok pattern used to extract new fields from the given text field. If a new field name already exists, it will replace the original field. +## Syntax + +Use the following syntax: + +`grok ` +* `field`: mandatory. The field must be a text field. +* `pattern`: mandatory. The grok pattern used to extract new fields from the given text field. If a new field name already exists, it will replace the original field. + ## Example 1: Create the new field -This example shows how to create new field `host` for each document. `host` will be the host name after `@` in `email` field. Parsing a null field will return an empty string. +The following example PPL query shows how to use `grok` to create new field `host` for each document. `host` will be the hostname after `@` in `email` field. Parsing a null field will return an empty string. ```ppl source=accounts @@ -33,9 +36,10 @@ fetched rows / total rows = 4/4 +-----------------------+------------+ ``` + ## Example 2: Override the existing field -This example shows how to override the existing `address` field with street number removed. 
+The following example PPL query shows how to use `grok` to override the existing `address` field with street number removed.

```ppl
source=accounts
@@ -57,9 +61,10 @@ fetched rows / total rows = 4/4
 +------------------+
 ```
+
 ## Example 3: Using grok to parse logs
 
-This example shows how to use grok to parse raw logs.
+The following example PPL query shows how to use `grok` to parse raw logs.

```ppl
source=apache
@@ -81,6 +86,7 @@ fetched rows / total rows = 4/4
 +-----------------------------------------------------------------------------------------------------------------------------+----------------------------+----------+-------+
 ```
+
 ## Limitations
 
 The grok command has the same limitations as the parse command, see [parse limitations](./parse.md#Limitations) for details.
\ No newline at end of file
diff --git a/docs/user/ppl/cmd/head.md
index 5565c90d78..76ff16f7f4 100644
--- a/docs/user/ppl/cmd/head.md
+++ b/docs/user/ppl/cmd/head.md
@@ -1,17 +1,20 @@
-# head
+# head
-## Description
-The `head` command returns the first N number of specified results after an optional offset in search order.
-## Syntax
+The `head` command returns the first N results, after an optional offset, in search order.
-head [\] [from \]
-* size: optional integer. Number of results to return. **Default:** 10
-* offset: optional integer after `from`. Number of results to skip. **Default:** 0
+## Syntax
+
+Use the following syntax:
+
+`head [<size>] [from <offset>]`
+* `size`: optional integer. The number of results to return. **Default:** 10
+* `offset`: optional integer after `from`. Number of results to skip. **Default:** 0
-## Example 1: Get first 10 results
-This example shows getting a maximum of 10 results from accounts index.
+## Example 1: Get the first 10 results
+
+The following example PPL query shows how to use `head` to return the first 10 search results:

```ppl
source=accounts
@@ -33,9 +36,10 @@ fetched rows / total rows = 4/4
 +-----------+-----+
 ```
+
 ## Example 2: Get first N results
 
-This example shows getting the first 3 results from accounts index.
+The following example PPL query shows how to use `head` to get a specified number of search results. In this example, N is equal to 3:

```ppl
source=accounts
@@ -56,9 +60,10 @@ fetched rows / total rows = 3/3
 +-----------+-----+
 ```
-## Example 3: Get first N results after offset M
-This example shows getting the first 3 results after offset 1 from accounts index.
+## Example 3: Get the first N results after offset M
+
+The following example PPL query shows how to get the first 3 results after offset 1 from the `accounts` index.

```ppl
source=accounts
@@ -79,6 +84,7 @@ fetched rows / total rows = 3/3
 +-----------+-----+
 ```
+
 ## Limitations
 
-The `head` command is not rewritten to OpenSearch DSL, it is only executed on the coordination node.
\ No newline at end of file
+The `head` command is not rewritten to [query domain-specific language (DSL)](https://opensearch.org/docs/latest/query-dsl/index/). It is only run on the coordinating node.
\ No newline at end of file
diff --git a/docs/user/ppl/cmd/index.md
new file mode 100644
index 0000000000..54cf1aac70
--- /dev/null
+++ b/docs/user/ppl/cmd/index.md
@@ -0,0 +1,2 @@
+# Commands
+PPL supports most common [SQL functions](https://docs.opensearch.org/latest/search-plugins/sql/functions/), including [relevance search](https://docs.opensearch.org/latest/search-plugins/sql/full-text/), but also introduces several more functions (called _commands_), which are available in PPL only.
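The size/offset semantics described for `head` can be modeled as a simple list slice. The following Python sketch is an illustrative analogy only (the `head` function here is hypothetical, not part of the PPL engine):

```python
def head(results, size=10, offset=0):
    # Mirror `head [<size>] [from <offset>]`: skip `offset` results,
    # then return at most `size` results in search order.
    return results[offset:offset + size]

rows = [1, 2, 3, 4, 5]                # stand-in for five search results
print(head(rows, size=3))             # -> [1, 2, 3]
print(head(rows, size=3, offset=1))   # -> [2, 3, 4]
```

Note that the offset is applied before the size limit, matching Example 3 above, where `head 3 from 1` skips one result and then returns three.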
diff --git a/docs/user/ppl/cmd/join.md
index 39d3f5a24d..95fb3bbaad 100644
--- a/docs/user/ppl/cmd/join.md
+++ b/docs/user/ppl/cmd/join.md
@@ -1,33 +1,37 @@
-# join
+# join
-## Description
 The `join` command combines two datasets together. The left side could be an index or results from a piped commands, the right side could be either an index or a subsearch.
-## Syntax
+## Syntax
-### Basic syntax:
+The `join` command supports basic and extended syntax options.
-[joinType] join [leftAlias] [rightAlias] (on \| where) \ \
-* joinType: optional. The type of join to perform. Options: `left`, `semi`, `anti`, and performance-sensitive types `right`, `full`, `cross`. **Default:** `inner`.
-* leftAlias: optional. The subsearch alias to use with the left join side, to avoid ambiguous naming. Pattern: `left = `
-* rightAlias: optional. The subsearch alias to use with the right join side, to avoid ambiguous naming. Pattern: `right = `
-* joinCriteria: mandatory. Any comparison expression. Must follow `on` or `where` keyword.
-* right-dataset: mandatory. Right dataset could be either an `index` or a `subsearch` with/without alias.
+### Basic syntax
+
+`[joinType] join [leftAlias] [rightAlias] (on | where) <joinCriteria> <right-dataset>`
+* `joinType`: optional. The type of join to perform. Options: `left`, `semi`, `anti`, and performance-sensitive types `right`, `full`, `cross`. **Default:** `inner`.
+* `leftAlias`: optional. The subsearch alias to use with the left join side, to avoid ambiguous naming. Pattern: `left = <leftAlias>`
+* `rightAlias`: optional. The subsearch alias to use with the right join side, to avoid ambiguous naming. Pattern: `right = <rightAlias>`
+* `joinCriteria`: mandatory. Any comparison expression. Must follow `on` or `where` keyword.
+* `right-dataset`: mandatory. Right dataset could be either an `index` or a `subsearch` with/without alias.
 
 ### Extended syntax:
 
-join [type=] [overwrite=] [max=n] (\ \| [leftAlias] [rightAlias] (on \| where) \) \
-* type: optional.
Join type using extended syntax. Options: `left`, `outer` (alias of `left`), `semi`, `anti`, and performance-sensitive types `right`, `full`, `cross`. **Default:** `inner`.
-* overwrite: optional boolean. Only works with `join-field-list`. Specifies whether duplicate-named fields from right-dataset should replace corresponding fields in the main search results. **Default:** `true`.
-* max: optional integer. Controls how many subsearch results could be joined against each row in main search. **Default:** 0 (unlimited).
-* join-field-list: optional. The fields used to build the join criteria. The join field list must exist on both sides. If not specified, all fields common to both sides will be used as join keys.
-* leftAlias: optional. Same as basic syntax when used with extended syntax.
-* rightAlias: optional. Same as basic syntax when used with extended syntax.
-* joinCriteria: mandatory. Same as basic syntax when used with extended syntax.
-* right-dataset: mandatory. Same as basic syntax.
+`join [type=<joinType>] [overwrite=<bool>] [max=n] (<join-field-list> | [leftAlias] [rightAlias] (on | where) <joinCriteria>) <right-dataset>`
+* `type`: optional. Join type using extended syntax. Options: `left`, `outer` (alias of `left`), `semi`, `anti`, and performance-sensitive types `right`, `full`, `cross`. **Default:** `inner`.
+* `overwrite`: optional boolean. Only works with `join-field-list`. Specifies whether duplicate-named fields from right-dataset should replace corresponding fields in the main search results. **Default:** `true`.
+* `max`: optional integer. Controls how many subsearch results could be joined against each row in main search. **Default:** 0 (unlimited).
+* `join-field-list`: optional. The fields used to build the join criteria. The join field list must exist on both sides. If not specified, all fields common to both sides will be used as join keys.
+* `leftAlias`: optional. Same as basic syntax when used with extended syntax.
+* `rightAlias`: optional. Same as basic syntax when used with extended syntax.
+* `joinCriteria`: mandatory. Same as basic syntax when used with extended syntax. +* `right-dataset`: mandatory. Same as basic syntax. -## Configuration + +## Configuration + +The following settings configure the `join` command behavior. ### plugins.ppl.join.subsearch_maxout @@ -56,6 +60,7 @@ curl -sS -H 'Content-Type: application/json' \ } ``` + ## Usage Basic join syntax: @@ -90,9 +95,10 @@ source = table1 | join type=inner max=1 a, b table2 | fields a, b, c source = table1 | join type=left overwrite=false max=0 a, b [source=table2 | rename d as b] | fields a, b, c ``` -## Example 1: Two indices join -This example shows joining two indices using the basic join syntax. +## Example 1: Two indexes join + +The following example PPL query shows how to use `join` to join two indexes using the basic join syntax. ```ppl source = state_country @@ -115,9 +121,10 @@ fetched rows / total rows = 5/5 +-------------+----------+-----------+ ``` + ## Example 2: Join with subsearch -This example shows joining with a subsearch using the basic join syntax. +The following example PPL query shows how to use `join` to join with a subsearch using the basic join syntax. ```ppl source = state_country as a @@ -143,9 +150,10 @@ fetched rows / total rows = 3/3 +-------------+----------+-----------+ ``` + ## Example 3: Join with field list -This example shows joining using the extended syntax with field list. +The following example PPL query shows how to use `join` with the extended syntax and field list. ```ppl source = state_country @@ -171,9 +179,10 @@ fetched rows / total rows = 3/3 +-------------+----------+---------+ ``` + ## Example 4: Join with options -This example shows joining using the extended syntax with additional options. +The following example PPL query shows how to use `join` with the extended syntax and additional options. 
```ppl
source = state_country
@@ -195,6 +204,7 @@ fetched rows / total rows = 4/4
 +-------------+----------+---------+
 ```
+
 ## Limitations
 
 For basic syntax, if fields in the left outputs and right outputs have the same name. Typically, in the join criteria
diff --git a/docs/user/ppl/cmd/kmeans.md
index 247902804d..6ccd4d5616 100644
--- a/docs/user/ppl/cmd/kmeans.md
+++ b/docs/user/ppl/cmd/kmeans.md
@@ -1,18 +1,21 @@
-# kmeans (deprecated by ml command)
+# kmeans (deprecated by ml command)
-## Description
-The `kmeans` command applies the kmeans algorithm in the ml-commons plugin on the search result returned by a PPL command.
-## Syntax
+The `kmeans` command applies the kmeans algorithm in the ml-commons plugin on the search results returned by a PPL command.
-kmeans \ \ \
-* centroids: optional. The number of clusters you want to group your data points into. **Default:** 2.
-* iterations: optional. Number of iterations. **Default:** 10.
-* distance_type: optional. The distance type can be COSINE, L1, or EUCLIDEAN. **Default:** EUCLIDEAN.
+## Syntax
+
+Use the following syntax:
+
+`kmeans <centroids> <iterations> <distance_type>`
+* `centroids`: optional. The number of clusters you want to group your data points into. **Default:** 2.
+* `iterations`: optional. Number of iterations. **Default:** 10.
+* `distance_type`: optional. The distance type can be COSINE, L1, or EUCLIDEAN. **Default:** EUCLIDEAN.
-## Example: Clustering of Iris Dataset
-This example shows how to classify three Iris species (Iris setosa, Iris virginica and Iris versicolor) based on the combination of four features measured from each sample: the length and the width of the sepals and petals.
+## Example: Clustering of iris dataset
+
+The following example PPL query shows how to use `kmeans` to classify three Iris species (Iris setosa, Iris virginica and Iris versicolor) based on the combination of four features measured from each sample: the length and the width of the sepals and petals.
```ppl source=iris_data @@ -32,6 +35,7 @@ Expected output: +--------------------+-------------------+--------------------+-------------------+-----------+ ``` + ## Limitations The `kmeans` command can only work with `plugins.calcite.enabled=false`. \ No newline at end of file diff --git a/docs/user/ppl/cmd/lookup.md b/docs/user/ppl/cmd/lookup.md index 03683cdc47..7e470dbc3c 100644 --- a/docs/user/ppl/cmd/lookup.md +++ b/docs/user/ppl/cmd/lookup.md @@ -1,18 +1,21 @@ -# lookup +# lookup -## Description The `lookup` command enriches your search data by adding or replacing data from a lookup index (dimension table). You can extend fields of an index with values from a dimension table, append or replace values when lookup condition is matched. As an alternative of join command, lookup command is more suitable for enriching the source data with a static dataset. -## Syntax -lookup \ (\ [as \])... [(replace \| append) (\ [as \])...] -* lookupIndex: mandatory. The name of lookup index (dimension table). -* lookupMappingField: mandatory. A mapping key in `lookupIndex`, analogy to a join key from right table. You can specify multiple `lookupMappingField` with comma-delimited. -* sourceMappingField: optional. A mapping key from source (left side), analogy to a join key from left side. If not specified, defaults to `lookupMappingField`. -* inputField: optional. A field in `lookupIndex` where matched values are applied to result output. You can specify multiple `inputField` with comma-delimited. If not specified, all fields except `lookupMappingField` from `lookupIndex` are applied to result output. -* outputField: optional. A field of output. You can specify zero or multiple `outputField`. If `outputField` has an existing field name in source query, its values will be replaced or appended by matched values from `inputField`. 
If the field specified in `outputField` is a new field, in replace strategy, an extended new field will be applied to the results, but fail in append strategy.
+## Syntax
+
+Use the following syntax:
+
+`lookup <lookupIndex> (<lookupMappingField> [as <sourceMappingField>])... [(replace | append) (<inputField> [as <outputField>])...]`
+* `lookupIndex`: mandatory. The name of lookup index (dimension table).
+* `lookupMappingField`: mandatory. A mapping key in `lookupIndex`, analogous to a join key from the right table. You can specify multiple comma-delimited `lookupMappingField` values.
+* `sourceMappingField`: optional. A mapping key from the source (left side), analogous to a join key from the left side. If not specified, defaults to `lookupMappingField`.
+* `inputField`: optional. A field in `lookupIndex` where matched values are applied to the result output. You can specify multiple comma-delimited `inputField` values. If not specified, all fields except `lookupMappingField` from `lookupIndex` are applied to the result output.
+* `outputField`: optional. A field of output. You can specify zero or multiple `outputField` values. If `outputField` has an existing field name in the source query, its values will be replaced or appended by matched values from `inputField`. If the field specified in `outputField` is a new field, the replace strategy adds the new field to the results, but the append strategy fails.
 * replace \| append: optional. The output strategies. If replace, matched values in `lookupIndex` field overwrite the values in result. If append, matched values in `lookupIndex` field only append to the missing values in result. **Default:** replace.
+
 ## Usage
 
 Lookup
@@ -27,9 +30,10 @@ source = table1 | lookup table2 id as cid, name append dept as department
 source = table1 | lookup table2 id as cid, name append dept as department, city as location
 ```
+
 ## Example 1: Replace strategy
 
-This example shows using the lookup command with the REPLACE strategy to overwrite existing values.
+The following example PPL query shows how to use `lookup` with the REPLACE strategy to overwrite existing values. ```bash ignore curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_ppl -d '{ @@ -126,9 +130,10 @@ Result set } ``` + ## Example 2: Append strategy -This example shows using the lookup command with the APPEND strategy to fill missing values only. +The following example PPL query shows how to use `lookup` with the APPEND strategy to fill missing values only. ```bash ignore curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_ppl -d '{ @@ -140,9 +145,10 @@ curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_ppl -d }' ``` + ## Example 3: No inputField specified -This example shows using the lookup command without specifying inputField, which applies all fields from the lookup index. +The following example PPL query shows how to use `lookup` without specifying inputField, which applies all fields from the lookup index. ```bash ignore curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_ppl -d '{ @@ -239,9 +245,10 @@ Result set } ``` + ## Example 4: OutputField as a new field -This example shows using the lookup command with outputField as a new field name. +The following example PPL query shows how to use `lookup` with outputField as a new field name. ```bash ignore curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_ppl -d '{ diff --git a/docs/user/ppl/cmd/ml.md b/docs/user/ppl/cmd/ml.md index 38098954bf..c26fd00012 100644 --- a/docs/user/ppl/cmd/ml.md +++ b/docs/user/ppl/cmd/ml.md @@ -1,41 +1,46 @@ -# ml +# ml -## Description -Use the `ml` command to train/predict/train and predict on any algorithm in the ml-commons plugin on the search result returned by a PPL command. -## Syntax +The `ml` command trains, predicts, or trains and predicts on any algorithm in the ml-commons plugin on the search results returned by a PPL command. 
-## AD - Fixed In Time RCF For Time-series Data:
+## Syntax
-ml action='train' algorithm='rcf' \ \ \ \ \ \ \ \ \
-* number_of_trees: optional integer. Number of trees in the forest. **Default:** 30.
-* shingle_size: optional integer. A shingle is a consecutive sequence of the most recent records. **Default:** 8.
-* sample_size: optional integer. The sample size used by stream samplers in this forest. **Default:** 256.
-* output_after: optional integer. The number of points required by stream samplers before results are returned. **Default:** 32.
-* time_decay: optional double. The decay factor used by stream samplers in this forest. **Default:** 0.0001.
-* anomaly_rate: optional double. The anomaly rate. **Default:** 0.005.
-* time_field: mandatory string. It specifies the time field for RCF to use as time-series data.
-* date_format: optional string. It's used for formatting time_field field. **Default:** "yyyy-MM-dd HH:mm:ss".
-* time_zone: optional string. It's used for setting time zone for time_field field. **Default:** UTC.
-* category_field: optional string. It specifies the category field used to group inputs. Each category will be independently predicted.
+The `ml` command supports different syntax options depending on the algorithm.
+
+## AD - Fixed in time RCF for time-series data
+
+`ml action='train' algorithm='rcf' <number_of_trees> <shingle_size> <sample_size> <output_after> <time_decay> <anomaly_rate> <time_field> <date_format> <time_zone> <category_field>`
+* `number_of_trees`: optional integer. Number of trees in the forest. **Default:** 30.
+* `shingle_size`: optional integer. A shingle is a consecutive sequence of the most recent records. **Default:** 8.
+* `sample_size`: optional integer. The sample size used by stream samplers in this forest. **Default:** 256.
+* `output_after`: optional integer. The number of points required by stream samplers before results are returned. **Default:** 32.
+* `time_decay`: optional double. The decay factor used by stream samplers in this forest. **Default:** 0.0001.
+* `anomaly_rate`: optional double. The anomaly rate. **Default:** 0.005.
+* `time_field`: mandatory string. It specifies the time field for RCF to use as time-series data.
+* `date_format`: optional string. It's used for formatting the `time_field` field. **Default:** "yyyy-MM-dd HH:mm:ss".
+* `time_zone`: optional string. It's used for setting the time zone for the `time_field` field. **Default:** UTC.
+* `category_field`: optional string. It specifies the category field used to group inputs. Each category will be independently predicted.
-## AD - Batch RCF for Non-time-series Data:
-ml action='train' algorithm='rcf' \ \ \ \ \
-* number_of_trees: optional integer. Number of trees in the forest. **Default:** 30.
-* sample_size: optional integer. Number of random samples given to each tree from the training data set. **Default:** 256.
-* output_after: optional integer. The number of points required by stream samplers before results are returned. **Default:** 32.
-* training_data_size: optional integer. **Default:** size of your training data set.
-* anomaly_score_threshold: optional double. The threshold of anomaly score. **Default:** 1.0.
-* category_field: optional string. It specifies the category field used to group inputs. Each category will be independently predicted.
+## AD - Batch RCF for non-time-series data
+
+`ml action='train' algorithm='rcf' <number_of_trees> <sample_size> <output_after> <training_data_size> <anomaly_score_threshold>`
+* `number_of_trees`: optional integer. Number of trees in the forest. **Default:** 30.
+* `sample_size`: optional integer. Number of random samples given to each tree from the training dataset. **Default:** 256.
+* `output_after`: optional integer. The number of points required by stream samplers before results are returned. **Default:** 32.
+* `training_data_size`: optional integer. **Default:** size of your training dataset.
+* `anomaly_score_threshold`: optional double. The threshold of anomaly score. **Default:** 1.0.
+* `category_field`: optional string. It specifies the category field used to group inputs. Each category will be independently predicted.
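To illustrate the `shingle_size` parameter described above: fixed-in-time RCF feeds the forest overlapping windows ("shingles") of the most recent points. The following Python sketch shows only the windowing idea, under the assumption that each window becomes one input vector; it is not the ml-commons implementation:

```python
def shingle(series, shingle_size=8):
    # Build overlapping windows of `shingle_size` consecutive points;
    # each window is one input vector for the forest. Series shorter
    # than the shingle size yield no vectors.
    if len(series) < shingle_size:
        return []
    return [series[i:i + shingle_size]
            for i in range(len(series) - shingle_size + 1)]

points = [10, 12, 11, 95, 12, 10]
print(shingle(points, shingle_size=4))
# -> [[10, 12, 11, 95], [12, 11, 95, 12], [11, 95, 12, 10]]
```

Because each shingle carries recent context, a single spike such as `95` appears in several consecutive vectors, which is what lets the forest score it against the surrounding pattern rather than in isolation.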
+
 ## KMEANS:
-ml action='train' algorithm='kmeans' \ \ \
-* centroids: optional integer. The number of clusters you want to group your data points into. **Default:** 2.
-* iterations: optional integer. Number of iterations. **Default:** 10.
-* distance_type: optional string. The distance type can be COSINE, L1, or EUCLIDEAN. **Default:** EUCLIDEAN.
+`ml action='train' algorithm='kmeans' <centroids> <iterations> <distance_type>`
+* `centroids`: optional integer. The number of clusters you want to group your data points into. **Default:** 2.
+* `iterations`: optional integer. Number of iterations. **Default:** 10.
+* `distance_type`: optional string. The distance type can be COSINE, L1, or EUCLIDEAN. **Default:** EUCLIDEAN.
+
 ## Example 1: Detecting events in New York City from taxi ridership data with time-series data
 
 This example trains an RCF model and uses the model to detect anomalies in the time-series ridership data.
@@ -58,6 +63,7 @@ fetched rows / total rows = 1/1
 +---------+---------------------+-------+---------------+
 ```
+
 ## Example 2: Detecting events in New York City from taxi ridership data with time-series data independently with each category
 
 This example trains an RCF model and uses the model to detect anomalies in the time-series ridership data with multiple category values.
@@ -81,6 +87,7 @@ fetched rows / total rows = 2/2
 +----------+---------+---------------------+-------+---------------+
 ```
+
 ## Example 3: Detecting events in New York City from taxi ridership data with non-time-series data
 
 This example trains an RCF model and uses the model to detect anomalies in the non-time-series ridership data.
@@ -103,6 +110,7 @@ fetched rows / total rows = 1/1
 +---------+-------+-----------+
 ```
+
 ## Example 4: Detecting events in New York City from taxi ridership data with non-time-series data independently with each category
 
 This example trains an RCF model and uses the model to detect anomalies in the non-time-series ridership data with multiple category values.
@@ -126,9 +134,10 @@ fetched rows / total rows = 2/2 +----------+---------+-------+-----------+ ``` -## Example 5: KMEANS - Clustering of Iris Dataset -This example shows how to use KMEANS to classify three Iris species (Iris setosa, Iris virginica and Iris versicolor) based on the combination of four features measured from each sample: the length and the width of the sepals and petals. +## Example 5: KMEANS - Clustering of iris dataset + +The following example PPL query shows how to use `ml` with KMEANS to classify three Iris species (Iris setosa, Iris virginica and Iris versicolor) based on the combination of four features measured from each sample: the length and the width of the sepals and petals. ```ppl source=iris_data @@ -148,6 +157,7 @@ Expected output: +--------------------+-------------------+--------------------+-------------------+-----------+ ``` + ## Limitations The `ml` command can only work with `plugins.calcite.enabled=false`. \ No newline at end of file diff --git a/docs/user/ppl/cmd/multisearch.md b/docs/user/ppl/cmd/multisearch.md index 0b6e8ae208..ae18687df3 100644 --- a/docs/user/ppl/cmd/multisearch.md +++ b/docs/user/ppl/cmd/multisearch.md @@ -1,8 +1,7 @@ -# multisearch +# multisearch -## Description -Use the `multisearch` command to run multiple search subsearches and merge their results together. The command allows you to combine data from different queries on the same or different sources, and optionally apply subsequent processing to the combined result set. +The `multisearch` command runs multiple search subsearches and merges their results together. The command allows you to combine data from different queries on the same or different sources, and optionally apply subsequent processing to the combined search results. Key aspects of `multisearch`: 1. Combines results from multiple search operations into a single result set. 2. Each subsearch can have different filtering criteria, data transformations, and field selections. 
@@ -12,17 +11,21 @@ Key aspects of `multisearch`:
 
 Use Cases:
 
 * **Comparative Analysis**: Compare metrics across different segments, regions, or time periods
-* **Success Rate Monitoring**: Calculate success rates by comparing successful vs. total operations
-* **Multi-source Data Combination**: Merge data from different indices or apply different filters to the same source
+* **Success Rate Monitoring**: Calculate success rates by comparing successful operations to total operations
+* **Multi-source Data Combination**: Merge data from different indexes or apply different filters to the same source
 * **A/B Testing Analysis**: Combine results from different test groups for comparison
 * **Time-series Data Merging**: Interleave events from multiple sources based on timestamps
-## Syntax
-multisearch \ \ \ ...
+## Syntax
+
+Use the following syntax:
+
+`multisearch <subsearch1> <subsearch2> ...`
 * subsearch1, subsearch2, ...: mandatory. At least two subsearches required. Each subsearch must be enclosed in square brackets and start with the `search` keyword. Format: `[search source=index | commands...]`. All PPL commands are supported within subsearches.
-* result-processing: optional. Commands applied to the merged results after the multisearch operation, such as `stats`, `sort`, `head`, etc.
+* `result-processing`: optional. Commands applied to the merged results after the multisearch operation, such as `stats`, `sort`, `head`, etc.
+
 ## Usage
 
 Basic multisearch
@@ -33,7 +36,8 @@ Basic multisearch
 | multisearch [search source=table | where status="success"] [search source=table | where status="error"]
 ```
-## Example 1: Basic Age Group Analysis
+
+## Example 1: Basic age group analysis
 
 This example combines young and adult customers into a single result set for further analysis.
@@ -62,7 +66,8 @@ fetched rows / total rows = 4/4
 +-----------+-----+-----------+
 ```
-## Example 2: Success Rate Pattern
+
+## Example 2: Success rate pattern
 
 This example combines high-balance and all valid accounts for comparison analysis.
@@ -91,7 +96,8 @@ fetched rows / total rows = 4/4
 +-----------+---------+--------------+
 ```
-## Example 3: Timestamp Interleaving
+
+## Example 3: Timestamp interleaving
 
 This example combines time-series data from multiple sources with automatic timestamp-based ordering.
@@ -118,9 +124,10 @@ fetched rows / total rows = 5/5
 +---------------------+----------+-------+---------------------+
 ```
-## Example 4: Type Compatibility - Missing Fields
-This example demonstrates how missing fields are handled with NULL insertion.
+## Example 4: Type compatibility - missing fields
+
+The following example PPL query demonstrates how missing fields are handled with NULL insertion.

```ppl
| multisearch [search source=accounts
@@ -146,6 +153,7 @@ fetched rows / total rows = 4/4
 +-----------+-----+------------+
 ```
+
 ## Limitations
 
 * **Minimum Subsearches**: At least two subsearches must be specified
diff --git a/docs/user/ppl/cmd/parse.md
index 8e151ad888..168f4cfb49 100644
--- a/docs/user/ppl/cmd/parse.md
+++ b/docs/user/ppl/cmd/parse.md
@@ -1,20 +1,23 @@
-# parse
+# parse
-## Description
-The `parse` command parses a text field with a regular expression and appends the result to the search result.
-## Syntax
+The `parse` command extracts information from a text field using a regular expression and adds it to the search result.
-parse \ \
-* field: mandatory. The field must be a text field.
-* pattern: mandatory. The regular expression pattern used to extract new fields from the given text field. If a new field name already exists, it will replace the original field.
+## Syntax
+
+Use the following syntax:
+
+`parse <field> <pattern>`
+* `field`: mandatory. The field must be a text field.
+* `pattern`: mandatory.
The regular expression pattern used to extract new fields from the given text field. If a new field name already exists, it will replace the original field.
-## Regular Expression
-The regular expression pattern is used to match the whole text field of each document with Java regex engine. Each named capture group in the expression will become a new `STRING` field.
+## Regular expression
+The regular expression pattern is used to match the whole text field of each document with the Java regex engine. Each named capture group in the expression will become a new `STRING` field.
+
 ## Example 1: Create a new field
-This example shows how to create a new field `host` for each document. `host` will be the host name after `@` in `email` field. Parsing a null field will return an empty string.
+The following example PPL query shows how to create a new field `host` for each document. `host` becomes the hostname after the @ symbol in the `email` field. Parsing a null field returns an empty string.

```ppl
source=accounts
@@ -36,9 +39,10 @@ fetched rows / total rows = 4/4
 +-----------------------+------------+
 ```
+
 ## Example 2: Override an existing field
 
-This example shows how to override the existing `address` field with street number removed.
+The following example PPL query shows how to override the existing `address` field while excluding the street number:

```ppl
source=accounts
@@ -60,9 +64,10 @@ fetched rows / total rows = 4/4
 +------------------+
 ```
+
 ## Example 3: Filter and sort by casted parsed field
 
-This example shows how to sort street numbers that are higher than 500 in `address` field.
+The following example PPL query shows how to sort street numbers that are higher than 500 in the `address` field.
```ppl
source=accounts
@@ -85,6 +90,7 @@ fetched rows / total rows = 3/3
 +--------------+----------------+
 ```
+
 ## Limitations
 
 There are a few limitations with parse command:
diff --git a/docs/user/ppl/cmd/patterns.md
index 7b9cb71889..de4db62ea4 100644
--- a/docs/user/ppl/cmd/patterns.md
+++ b/docs/user/ppl/cmd/patterns.md
@@ -1,30 +1,32 @@
-# patterns
+# patterns
-## Description
-The `patterns` command extracts log patterns from a text field and appends the results to the search result. Grouping logs by their patterns makes it easier to aggregate stats from large volumes of log data for analysis and troubleshooting.
+The `patterns` command extracts log patterns from a text field and appends the results to the search results. Grouping logs by their patterns makes it easier to aggregate stats from large volumes of log data for analysis and troubleshooting.
 
 `patterns` command allows users to select different log parsing algorithms to get high log pattern grouping accuracy. Two pattern methods are supported: `simple_pattern` and `brain`.
-`simple_pattern` algorithm is basically a regex parsing method vs `brain` algorithm is an automatic log grouping algorithm with high grouping accuracy and keeps semantic meaning.
+The `simple_pattern` algorithm is essentially a regex parsing method, whereas the `brain` algorithm is an automatic log grouping algorithm with high grouping accuracy that preserves semantic meaning.
 
 `patterns` command supports two modes: `label` and `aggregation`. `label` mode returns individual pattern labels. `aggregation` mode returns aggregated results on target field. Calcite engine by default labels the variables with '\<*\>' placeholder. If `show_numbered_token` option is turned on, Calcite engine's `label` mode not only labels pattern of text but also labels variable tokens in map. In `aggregation` mode, it will also output labeled pattern as well as variable tokens per pattern.
The variable placeholder is in the format of '\<token1\>', '\<token2\>', and so on, instead of '\<*\>'. -## Syntax +## Syntax -patterns \ [by byClause...] [method=simple_pattern \| brain] [mode=label \| aggregation] [max_sample_count=integer] [buffer_limit=integer] [show_numbered_token=boolean] [new_field=\] (algorithm parameters...) -* field: mandatory. The text field to analyze for patterns. -* byClause: optional. Fields or scalar functions used to group logs for labeling/aggregation. -* method: optional. Algorithm choice: `simple_pattern` or `brain`. **Default:** `simple_pattern`. -* mode: optional. Output mode: `label` or `aggregation`. **Default:** `label`. -* max_sample_count: optional. Max sample logs returned per pattern in aggregation mode. **Default:** 10. -* buffer_limit: optional. Safeguard parameter for `brain` algorithm to limit internal temporary buffer size (min: 50,000). **Default:** 100,000. -* show_numbered_token: optional. The flag to turn on numbered token output format. **Default:** false. -* new_field: optional. Alias of the output pattern field. **Default:** "patterns_field". +Use the following syntax: + +`patterns <field> [by byClause...] [method=simple_pattern | brain] [mode=label | aggregation] [max_sample_count=integer] [buffer_limit=integer] [show_numbered_token=boolean] [new_field=<new-field-name>] (algorithm parameters...)` +* `field`: mandatory. The text field to analyze for patterns. +* `byClause`: optional. Fields or scalar functions used to group logs for labeling/aggregation. +* `method`: optional. Algorithm choice: `simple_pattern` or `brain`. **Default:** `simple_pattern`. +* `mode`: optional. Output mode: `label` or `aggregation`. **Default:** `label`. +* `max_sample_count`: optional. Max sample logs returned per pattern in aggregation mode. **Default:** 10. +* `buffer_limit`: optional. Safeguard parameter for `brain` algorithm to limit internal temporary buffer size (min: 50,000). **Default:** 100,000. +* `show_numbered_token`: optional. 
The flag to turn on numbered token output format. **Default:** false. +* `new_field`: optional. Alias of the output pattern field. **Default:** "patterns_field". * algorithm parameters: optional. Algorithm-specific tuning: - * `simple_pattern`: Define regex via "pattern". + * `simple_pattern`: Define regex through "pattern". * `brain`: Adjust sensitivity with variable_count_threshold and frequency_threshold_percentage. * `variable_count_threshold`: optional integer. Words are split by space. Algorithm counts how many distinct words are at specific position in initial log groups. Adjusting this threshold can determine the sensitivity of constant words. **Default:** 5. * `frequency_threshold_percentage`: optional double. Brain's log pattern is selected based on longest word combination. This sets the lower bound of frequency to ignore low frequency words. **Default:** 0.3. + ## Change the default pattern method To override default pattern parameters, users can run following command @@ -42,9 +44,10 @@ To override default pattern parameters, users can run following command } ``` -## Simple Pattern Example 1: Create the new field -This example shows how to extract patterns in `email` for each document. Parsing a null field will return an empty string. +## Simple pattern example 1: Create the new field + +The following example PPL query shows how to use `patterns` to extract patterns in `email` for each document. Parsing a null field will return an empty string. ```ppl source=accounts @@ -66,9 +69,10 @@ fetched rows / total rows = 4/4 +-----------------------+----------------+ ``` -## Simple Pattern Example 2: Extract log patterns -This example shows how to extract patterns from a raw log field using the default patterns. +## Simple pattern example 2: Extract log patterns + +The following example PPL query shows how to use `patterns` to extract patterns from a raw log field using the default patterns. 
```ppl source=apache @@ -90,9 +94,10 @@ fetched rows / total rows = 4/4 +-----------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------+ ``` -## Simple Pattern Example 3: Extract log patterns with custom regex pattern -This example shows how to extract patterns from a raw log field using user defined patterns. +## Simple pattern example 3: Extract log patterns with custom regex pattern + +The following example PPL query shows how to use `patterns` to extract patterns from a raw log field using user defined patterns. ```ppl source=apache @@ -114,9 +119,10 @@ fetched rows / total rows = 4/4 +-----------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ``` -## Simple Pattern Example 4: Return log patterns aggregation result -This example shows how to get aggregated results from a raw log field. +## Simple pattern example 4: Return log patterns aggregation result + +The following example PPL query shows how to use `patterns` to get aggregated results from a raw log field. ```ppl source=apache @@ -138,9 +144,11 @@ fetched rows / total rows = 4/4 +---------------------------------------------------------------------------------------------------+---------------+-------------------------------------------------------------------------------------------------------------------------------+ ``` -## Simple Pattern Example 5: Return log patterns aggregation result with detected variable tokens -This example shows how to get aggregated results with detected variable tokens. 
+## Simple pattern example 5: Return log patterns aggregation result with detected variable tokens + +The following example PPL query shows how to use `patterns` to get aggregated results with detected variable tokens. + ## Configuration With option `show_numbered_token` enabled, the output can detect numbered variable tokens from the pattern field. @@ -163,9 +171,10 @@ fetched rows / total rows = 1/1 +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ``` + ## Brain Example 1: Extract log patterns -This example shows how to extract semantic meaningful log patterns from a raw log field using the brain algorithm. The default variable count threshold is 5. +The following example PPL query shows how to use `patterns` to extract semantically meaningful log patterns from a raw log field using the brain algorithm. The default variable count threshold is 5. ```ppl source=apache @@ -187,9 +196,10 @@ fetched rows / total rows = 4/4 +-----------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------+ ``` + ## Brain Example 2: Extract log patterns with custom parameters -This example shows how to extract semantic meaningful log patterns from a raw log field using custom parameters of the brain algorithm. 
+The following example PPL query shows how to use `patterns` to extract semantically meaningful log patterns from a raw log field using custom parameters of the brain algorithm. ```ppl source=apache @@ -211,9 +221,10 @@ fetched rows / total rows = 4/4 +-----------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------+ ``` + ## Brain Example 3: Return log patterns aggregation result -This example shows how to get aggregated results from a raw log field using the brain algorithm. +The following example PPL query shows how to use `patterns` to get aggregated results from a raw log field using the brain algorithm. ```ppl source=apache @@ -232,9 +243,10 @@ fetched rows / total rows = 1/1 +----------------------------------------------------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ``` + ## Brain Example 4: Return log patterns aggregation result with detected variable tokens -This example shows how to get aggregated results with detected variable tokens using the brain algorithm. +The following example PPL query shows how to use `patterns` to get aggregated results with detected variable tokens using the brain algorithm. With option `show_numbered_token` enabled, the output can detect numbered variable tokens from the pattern field. 
@@ -255,6 +267,7 @@ fetched rows / total rows = 1/1 +----------------------------------------------------------------------------------------------------------------------------------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ``` + ## Limitations - Patterns command is not pushed down to OpenSearch data node for now. It will only group log patterns on log messages returned to coordinator node. \ No newline at end of file diff --git a/docs/user/ppl/cmd/rare.md b/docs/user/ppl/cmd/rare.md index 6ee51c9f96..1302c99935 100644 --- a/docs/user/ppl/cmd/rare.md +++ b/docs/user/ppl/cmd/rare.md @@ -1,24 +1,28 @@ -# rare +# rare -## Description The `rare` command finds the least common tuple of values of all fields in the field list. + **Note**: A maximum of 10 results is returned for each distinct tuple of values of the group-by fields. -## Syntax -rare [rare-options] \ [by-clause] -* field-list: mandatory. Comma-delimited list of field names. -* by-clause: optional. One or more fields to group the results by. -* rare-options: optional. Options for the rare command. Supported syntax is [countfield=\] [showcount=\]. +## Syntax + +Use the following syntax: + +`rare [rare-options] <field-list> [by-clause]` +* `field-list`: mandatory. Comma-delimited list of field names. +* `by-clause`: optional. One or more fields to group the results by. +* `rare-options`: optional. Options for the rare command. Supported syntax is [countfield=\<field\>] [showcount=\<bool\>]. * showcount=\<bool\>: optional. 
Whether to create a field in the output that represents a count of the tuple of values. **Default:** `true`. * countfield=\<field\>: optional. The name of the field that contains the count. **Default:** `'count'`. * usenull=\<bool\>: optional. Whether to output the null value. **Default:** Determined by `plugins.ppl.syntax.legacy.preferred`: * When `plugins.ppl.syntax.legacy.preferred=true`, `usenull` defaults to `true` * When `plugins.ppl.syntax.legacy.preferred=false`, `usenull` defaults to `false` + ## Example 1: Find the least common values in a field -This example shows how to find the least common gender of all the accounts. +The following example PPL query shows how to use `rare` to find the least common gender of all the accounts. ```ppl source=accounts @@ -37,9 +41,10 @@ fetched rows / total rows = 2/2 +--------+ ``` + ## Example 2: Find the least common values organized by gender -This example shows how to find the least common age of all the accounts grouped by gender. +The following example PPL query shows how to use `rare` to find the least common age of all the accounts grouped by gender. ```ppl source=accounts @@ -60,9 +65,10 @@ fetched rows / total rows = 4/4 +--------+-----+ ``` + ## Example 3: Rare command -This example shows how to find the least common gender of all the accounts. +The following example PPL query shows how to use `rare` to find the least common gender of all the accounts. ```ppl source=accounts @@ -81,9 +87,10 @@ fetched rows / total rows = 2/2 +--------+-------+ ``` + ## Example 4: Specify the count field option -This example shows how to specify the count field. +The following example PPL query shows how to use `rare` to specify the count field. 
```ppl source=accounts @@ -102,6 +109,7 @@ fetched rows / total rows = 2/2 +--------+-----+ ``` + ## Example 5: Specify the usenull field option ```ppl @@ -141,6 +149,7 @@ fetched rows / total rows = 4/4 +-----------------------+-------+ ``` + ## Limitations -The `rare` command is not rewritten to OpenSearch DSL, it is only executed on the coordination node. \ No newline at end of file +The `rare` command is not rewritten to [query domain-specific language (DSL)](https://opensearch.org/docs/latest/query-dsl/index/). It is only run on the coordinating node. \ No newline at end of file diff --git a/docs/user/ppl/cmd/regex.md b/docs/user/ppl/cmd/regex.md index d108b635ab..7314788909 100644 --- a/docs/user/ppl/cmd/regex.md +++ b/docs/user/ppl/cmd/regex.md @@ -1,18 +1,21 @@ -# regex +# regex -## Description The `regex` command filters search results by matching field values against a regular expression pattern. Only documents where the specified field matches the pattern are included in the results. -## Syntax -regex \ = \ -regex \ != \ -* field: mandatory. The field name to match against. -* pattern: mandatory string. The regular expression pattern to match. Supports Java regex syntax including named groups, lookahead/lookbehind, and character classes. +## Syntax + +Use the following syntax: + +`regex <field> = <pattern>` +`regex <field> != <pattern>` +* `field`: mandatory. The field name to match against. +* `pattern`: mandatory string. The regular expression pattern to match. Supports Java regex syntax including named groups, lookahead/lookbehind, and character classes. 
* = : operator for positive matching (include matches) * != : operator for negative matching (exclude matches) -## Regular Expression Engine + +## Regular expression engine The regex command uses Java's built-in regular expression engine, which supports: * **Standard regex features**: Character classes, quantifiers, anchors @@ -21,9 +24,10 @@ The regex command uses Java's built-in regular expression engine, which supports * **Inline flags**: Case-insensitive `(?i)`, multiline `(?m)`, dotall `(?s)`, and other modes For complete documentation of Java regex patterns and available modes, see the [Java Pattern documentation](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). + ## Example 1: Basic pattern matching -This example shows how to filter documents where the `lastname` field matches names starting with uppercase letters. +The following example PPL query shows how to use `regex` to filter documents where the `lastname` field matches names starting with uppercase letters. ```ppl source=accounts @@ -45,9 +49,10 @@ fetched rows / total rows = 4/4 +----------------+-----------+----------+ ``` + ## Example 2: Negative matching -This example shows how to exclude documents where the `lastname` field ends with "son". +The following example PPL query shows how to use `regex` to exclude documents where the `lastname` field ends with "son". ```ppl source=accounts @@ -69,9 +74,10 @@ fetched rows / total rows = 4/4 +----------------+----------+ ``` + ## Example 3: Email domain matching -This example shows how to filter documents by email domain patterns. +The following example PPL query shows how to use `regex` to filter documents by email domain patterns. ```ppl source=accounts @@ -90,9 +96,10 @@ fetched rows / total rows = 1/1 +----------------+----------------------+ ``` + ## Example 4: Complex patterns with character classes -This example shows how to use complex regex patterns with character classes and quantifiers. 
+The following example PPL query shows how to use `regex` with complex patterns that include character classes and quantifiers. ```ppl source=accounts | regex address="\\d{3,4}\\s+[A-Z][a-z]+\\s+(Street|Lane|Court)" | fields account_number, address @@ -112,9 +119,10 @@ fetched rows / total rows = 4/4 +----------------+----------------------+ ``` + ## Example 5: Case-sensitive matching -This example demonstrates that regex matching is case-sensitive by default. +The following example PPL query demonstrates that regex matching is case-sensitive by default. ```ppl source=accounts @@ -149,6 +157,7 @@ fetched rows / total rows = 1/1 +----------------+-------+ ``` + ## Limitations * **Field specification required**: A field name must be specified in the regex command. Pattern-only syntax (e.g., `regex "pattern"`) is not currently supported diff --git a/docs/user/ppl/cmd/rename.md b/docs/user/ppl/cmd/rename.md index 346513f232..c222ca20b9 100644 --- a/docs/user/ppl/cmd/rename.md +++ b/docs/user/ppl/cmd/rename.md @@ -1,24 +1,28 @@ -# rename +# rename -## Description -The `rename` command renames one or more fields in the search result. -## Syntax +The `rename` command renames one or more fields in the search results. -rename \ AS \["," \ AS \]... -* source-field: mandatory. The name of the field you want to rename. Supports wildcard patterns using `*`. -* target-field: mandatory. The name you want to rename to. Must have same number of wildcards as the source. +## Syntax + +Use the following syntax: + +`rename <source-field> AS <target-field> ["," <source-field> AS <target-field>]...` +* `source-field`: mandatory. The name of the field you want to rename. Supports wildcard patterns using `*`. +* `target-field`: mandatory. The name you want to rename to. Must have the same number of wildcards as the source. + ## Behavior The rename command handles non-existent fields as follows: -* **Renaming a non-existent field to a non-existent field**: No change occurs to the result set. 
-* **Renaming a non-existent field to an existing field**: The existing target field is removed from the result set. +* **Renaming a non-existent field to a non-existent field**: No change occurs to the search results. +* **Renaming a non-existent field to an existing field**: The existing target field is removed from the search results. * **Renaming an existing field to an existing field**: The existing target field is removed and the source field is renamed to the target. + ## Example 1: Rename one field -This example shows how to rename one field. +The following example PPL query shows how to use `rename` to rename one field. ```ppl source=accounts @@ -40,9 +44,10 @@ fetched rows / total rows = 4/4 +----+ ``` + ## Example 2: Rename multiple fields -This example shows how to rename multiple fields. +The following example PPL query shows how to use `rename` to rename multiple fields. ```ppl source=accounts @@ -64,9 +69,10 @@ fetched rows / total rows = 4/4 +----+---------+ ``` + ## Example 3: Rename with wildcards -This example shows how to rename multiple fields using wildcard patterns. +The following example PPL query shows how to use `rename` to rename multiple fields using wildcard patterns. ```ppl source=accounts @@ -88,9 +94,10 @@ fetched rows / total rows = 4/4 +------------+-----------+ ``` + ## Example 4: Rename with multiple wildcard patterns -This example shows how to rename multiple fields using multiple wildcard patterns. +The following example PPL query shows how to use `rename` to rename multiple fields using multiple wildcard patterns. ```ppl source=accounts @@ -112,9 +119,10 @@ fetched rows / total rows = 4/4 +------------+-----------+---------------+ ``` + ## Example 5: Rename existing field to existing field -This example shows how to rename an existing field to an existing field. The target field gets removed and the source field is renamed to the target field. 
+The following example PPL query shows how to use `rename` to rename an existing field to an existing field. The target field gets removed and the source field is renamed to the target field. ```ppl source=accounts @@ -136,7 +144,8 @@ fetched rows / total rows = 4/4 +---------+ ``` + ## Limitations -The `rename` command is not rewritten to OpenSearch DSL, it is only executed on the coordination node. +The `rename` command is not rewritten to [query domain-specific language (DSL)](https://opensearch.org/docs/latest/query-dsl/index/). It is only run on the coordinating node. Literal asterisk (*) characters in field names cannot be replaced as asterisk is used for wildcard matching. \ No newline at end of file diff --git a/docs/user/ppl/cmd/replace.md b/docs/user/ppl/cmd/replace.md index 2333f46b3b..92d7c68ed0 100644 --- a/docs/user/ppl/cmd/replace.md +++ b/docs/user/ppl/cmd/replace.md @@ -1,18 +1,21 @@ -# replace +# replace -## Description -The `replace` replaces text in one or more fields in the search result. Supports literal string replacement and wildcard patterns using `*`. -## Syntax +The `replace` command replaces text in one or more fields in the search results. It supports literal string replacement and wildcard patterns using `*`. -replace '\' WITH '\' [, '\' WITH '\']... IN \[, \]... -* pattern: mandatory. The text pattern you want to replace. -* replacement: mandatory. The text you want to replace with. -* field-name: mandatory. One or more field names where the replacement should occur. +## Syntax + +Use the following syntax: + +`replace '<pattern>' WITH '<replacement>' [, '<pattern>' WITH '<replacement>']... IN <field-name>[, <field-name>]...` +* `pattern`: mandatory. The text pattern you want to replace. +* `replacement`: mandatory. The text you want to replace with. +* `field-name`: mandatory. One or more field names where the replacement should occur. + ## Example 1: Replace text in one field -This example shows replacing text in one field. 
+The following example PPL query shows how to use `replace` to replace text in one field. ```ppl source=accounts @@ -34,9 +37,10 @@ fetched rows / total rows = 4/4 +----------+ ``` + ## Example 2: Replace text in multiple fields -This example shows replacing text in multiple fields. +The following example PPL query shows how to use `replace` to replace text in multiple fields. ```ppl source=accounts @@ -58,9 +62,10 @@ fetched rows / total rows = 4/4 +----------+----------------------+ ``` + ## Example 3: Replace with other commands in a pipeline -This example shows using replace with other commands in a query pipeline. +The following example PPL query shows how to use `replace` with other commands in a query pipeline. ```ppl source=accounts @@ -82,9 +87,10 @@ fetched rows / total rows = 3/3 +----------+-----+ ``` + ## Example 4: Replace with multiple pattern/replacement pairs -This example shows using multiple pattern/replacement pairs in a single replace command. The replacements are applied sequentially. +The following example PPL query shows how to use `replace` with multiple pattern/replacement pairs in a single replace command. The replacements are applied sequentially. ```ppl source=accounts @@ -106,6 +112,7 @@ fetched rows / total rows = 4/4 +-----------+ ``` + ## Example 5: Pattern matching with LIKE and replace Since replace command only supports plain string literals, you can use LIKE command with replace for pattern matching needs. @@ -128,6 +135,7 @@ fetched rows / total rows = 1/1 +-----------------+-------+--------+-----+--------+ ``` + ## Example 6: Wildcard suffix match Replace values that end with a specific pattern. The wildcard `*` matches any prefix. @@ -152,6 +160,7 @@ fetched rows / total rows = 4/4 +----------+ ``` + ## Example 7: Wildcard prefix match Replace values that start with a specific pattern. The wildcard `*` matches any suffix. 
@@ -176,6 +185,7 @@ fetched rows / total rows = 4/4 +----------+ ``` + ## Example 8: Wildcard capture and substitution Use wildcards in both pattern and replacement to capture and reuse matched portions. The number of wildcards must match in pattern and replacement. @@ -200,6 +210,7 @@ fetched rows / total rows = 4/4 +----------------------+ ``` + ## Example 9: Multiple wildcards for pattern transformation Use multiple wildcards to transform patterns. Each wildcard in the replacement substitutes the corresponding captured value. @@ -224,6 +235,7 @@ fetched rows / total rows = 4/4 +----------------------+ ``` + ## Example 10: Wildcard with zero wildcards in replacement When replacement has zero wildcards, all matching values are replaced with the literal replacement string. @@ -248,6 +260,7 @@ fetched rows / total rows = 4/4 +----------+ ``` + ## Example 11: Matching literal asterisks Use `\*` to match literal asterisk characters (`\*` = literal asterisk, `\\` = literal backslash). @@ -273,6 +286,7 @@ fetched rows / total rows = 4/4 +------------+ ``` + ## Example 12: Wildcard with no replacement wildcards Use wildcards in pattern but none in replacement to create a fixed output. @@ -298,6 +312,7 @@ fetched rows / total rows = 4/4 +---------+ ``` + ## Example 13: Escaped asterisks with wildcards Combine escaped asterisks (literal) with wildcards for complex patterns. 
@@ -323,8 +338,9 @@ fetched rows / total rows = 4/4 +----------+ ``` + ## Limitations -* Wildcards: `*` matches zero or more characters (case-sensitive) +* Wildcards: `*` matches zero or more characters (case-sensitive) * Replacement wildcards must match pattern wildcard count, or be zero * Escape sequences: `\*` (literal asterisk), `\\` (literal backslash) \ No newline at end of file diff --git a/docs/user/ppl/cmd/reverse.md b/docs/user/ppl/cmd/reverse.md index f63a8f18e9..c967b826e7 100644 --- a/docs/user/ppl/cmd/reverse.md +++ b/docs/user/ppl/cmd/reverse.md @@ -1,19 +1,23 @@ -# reverse +# reverse -## Description The `reverse` command reverses the display order of search results. The same results are returned, but in reverse order. -## Syntax -reverse +## Syntax + +Use the following syntax: + +`reverse` * No parameters: The reverse command takes no arguments or options. + ## Note The `reverse` command processes the entire dataset. If applied directly to millions of records, it will consume significant memory resources on the coordinating node. Users should only apply the `reverse` command to smaller datasets, typically after aggregation operations. + ## Example 1: Basic reverse operation -This example shows reversing the order of all documents. +The following example PPL query shows how to use `reverse` to reverse the order of all documents. ```ppl source=accounts @@ -35,9 +39,10 @@ fetched rows / total rows = 4/4 +----------------+-----+ ``` + ## Example 2: Reverse with sort -This example shows reversing results after sorting by age in ascending order, effectively giving descending order. +The following example PPL query shows how to use `reverse` to reverse results after sorting by age in ascending order, effectively giving descending order. 
```ppl source=accounts @@ -60,9 +65,10 @@ fetched rows / total rows = 4/4 +----------------+-----+ ``` + ## Example 3: Reverse with head -This example shows using reverse with head to get the last 2 records from the original order. +The following example PPL query shows how to use `reverse` with head to get the last 2 records from the original order. ```ppl source=accounts @@ -83,9 +89,10 @@ fetched rows / total rows = 2/2 +----------------+-----+ ``` + ## Example 4: Double reverse -This example shows that applying reverse twice returns to the original order. +The following example PPL query demonstrates that applying reverse twice returns to the original order. ```ppl source=accounts @@ -108,9 +115,10 @@ fetched rows / total rows = 4/4 +----------------+-----+ ``` + ## Example 5: Reverse with complex pipeline -This example shows reverse working with filtering and field selection. +The following example PPL query shows how to use `reverse` with filtering and field selection. ```ppl source=accounts diff --git a/docs/user/ppl/cmd/rex.md b/docs/user/ppl/cmd/rex.md index 0f117373d8..42cb6de5bb 100644 --- a/docs/user/ppl/cmd/rex.md +++ b/docs/user/ppl/cmd/rex.md @@ -1,14 +1,16 @@ -# rex +# rex -## Description The `rex` command extracts fields from a raw text field using regular expression named capture groups. -## Syntax -rex [mode=\] field=\ \ [max_match=\] [offset_field=\] -* field: mandatory. The field must be a string field to extract data from. -* pattern: mandatory string. The regular expression pattern with named capture groups used to extract new fields. Pattern must contain at least one named capture group using `(?pattern)` syntax. -* mode: optional. Either `extract` or `sed`. **Default:** extract +## Syntax + +Use the following syntax: + +`rex [mode=<mode>] field=<field> <pattern> [max_match=<int>] [offset_field=<string>]` +* `field`: mandatory. The field must be a string field to extract data from. +* `pattern`: mandatory string. 
The regular expression pattern with named capture groups used to extract new fields. Pattern must contain at least one named capture group using `(?<name>pattern)` syntax. +* `mode`: optional. Either `extract` or `sed`. **Default:** extract * **extract mode** (default): Creates new fields from regular expression named capture groups. This is the standard field extraction behavior. * **sed mode**: Performs text substitution on the field using sed-style patterns * `s/pattern/replacement/` - Replace first occurrence @@ -16,12 +18,13 @@ rex [mode=\] field=\ \ [max_match=\] [offset_fiel * `s/pattern/replacement/n` - Replace only the nth occurrence (where n is a number) * `y/from_chars/to_chars/` - Character-by-character transliteration * Backreferences: `\1`, `\2`, etc. reference captured groups in replacement -* max_match: optional integer (default=1). Maximum number of matches to extract. If greater than 1, extracted fields become arrays. The value 0 means unlimited matches, but is automatically capped to the configured limit (default: 10, configurable via `plugins.ppl.rex.max_match.limit`). -* offset_field: optional string. Field name to store the character offset positions of matches. Only available in extract mode. +* `max_match`: optional integer (default=1). Maximum number of matches to extract. If greater than 1, extracted fields become arrays. The value 0 means unlimited matches, but is automatically capped to the configured limit (default: 10, configurable through `plugins.ppl.rex.max_match.limit`). +* `offset_field`: optional string. Field name to store the character offset positions of matches. Only available in extract mode. -## Example 1: Basic Field Extraction -This example shows extracting username and domain from email addresses using named capture groups. Both extracted fields are returned as string type. 
+## Example 1: Basic field extraction + +The following example PPL query shows how to use `rex` to extract username and domain from email addresses using named capture groups. Both extracted fields are returned as string type. ```ppl source=accounts @@ -42,9 +45,10 @@ fetched rows / total rows = 2/2 +-----------------------+------------+--------+ ``` -## Example 2: Handling Non-matching Patterns -This example shows the rex command returning all events, setting extracted fields to null for non-matching patterns. Extracted fields would be string type when matches are found. +## Example 2: Handling non-matching patterns + +The following example PPL query shows that the rex command returns all events, setting extracted fields to null for non-matching patterns. Extracted fields would be string type when matches are found. ```ppl source=accounts @@ -65,9 +69,10 @@ fetched rows / total rows = 2/2 +-----------------------+------+--------+ ``` -## Example 3: Multiple Matches with max_match -This example shows extracting multiple words from address field using max_match parameter. The extracted field is returned as an array type containing string elements. +## Example 3: Multiple matches with max_match + +The following example PPL query shows how to use `rex` to extract multiple words from the address field using the max_match parameter. The extracted field is returned as an array type containing string elements. ```ppl source=accounts @@ -89,9 +94,10 @@ fetched rows / total rows = 3/3 +--------------------+------------------+ ``` -## Example 4: Text Replacement with mode=sed -This example shows replacing email domains using sed mode for text substitution. The extracted field is returned as string type. +## Example 4: Text replacement with mode=sed + +The following example PPL query shows how to use `rex` to replace email domains using sed mode for text substitution. The extracted field is returned as string type. 
```ppl source=accounts @@ -112,9 +118,10 @@ fetched rows / total rows = 2/2 +------------------------+ ``` + ## Example 5: Using offset_field -This example shows tracking the character positions where matches occur. Extracted fields are string type, and the offset_field is also string type. +The following example PPL query shows how to use `rex` to track the character positions where matches occur. Extracted fields are string type, and the offset_field is also string type. ```ppl source=accounts @@ -135,9 +142,10 @@ fetched rows / total rows = 2/2 +-----------------------+------------+--------+---------------------------+ ``` -## Example 6: Complex Email Pattern -This example shows extracting comprehensive email components including top-level domain. All extracted fields are returned as string type. +## Example 6: Complex email pattern + +The following example PPL query shows how to use `rex` to extract comprehensive email components, including the top-level domain. All extracted fields are returned as string type. ```ppl source=accounts @@ -158,9 +166,10 @@ fetched rows / total rows = 2/2 +-----------------------+------------+--------+-----+ ``` -## Example 7: Chaining Multiple rex Commands -This example shows extracting initial letters from both first and last names. All extracted fields are returned as string type. +## Example 7: Chaining multiple rex commands + +The following example PPL query shows how to use `rex` to extract the initial letters from both first and last names. All extracted fields are returned as string type. ```ppl source=accounts @@ -183,9 +192,10 @@ fetched rows / total rows = 3/3 +-----------+----------+--------------+-------------+ ``` -## Example 8: Named Capture Group Limitations -This example demonstrates naming restrictions for capture groups. Group names cannot contain underscores due to Java regex limitations.
+## Example 8: Named capture group limitations + +The following example PPL query demonstrates naming restrictions for capture groups. Group names cannot contain underscores due to Java regex limitations. Invalid PPL query with underscores ```ppl @@ -222,9 +232,10 @@ fetched rows / total rows = 2/2 +-----------------------+------------+-------------+ ``` -## Example 9: Max Match Limit Protection -This example demonstrates the max_match limit protection mechanism. When max_match=0 (unlimited) is specified, the system automatically caps it to prevent memory exhaustion. +## Example 9: Max match limit protection + +The following example PPL query demonstrates the max_match limit protection mechanism. When max_match=0 (unlimited) is specified, the system automatically caps it to prevent memory exhaustion. PPL query with max_match=0 automatically capped to default limit of 10 ```ppl @@ -262,7 +273,8 @@ Expected output: Error: Query returned no data ``` -## Comparison with Related Commands + +## Comparison with related commands | Feature | rex | parse | | --- | --- | --- | @@ -274,6 +286,7 @@ Error: Query returned no data | Offset Tracking | Yes | No | | Special Characters in Group Names | No | No | + ## Limitations **Named Capture Group Naming:** @@ -288,4 +301,4 @@ Error: Query returned no data * The `max_match` parameter is subject to a configurable system limit to prevent memory exhaustion * When `max_match=0` (unlimited) is specified, it is automatically capped at the configured limit (default: 10) * User-specified values exceeding the configured limit will result in an error -* Users can adjust the limit via the `plugins.ppl.rex.max_match.limit` cluster setting. Setting this limit to a large value is not recommended as it can lead to excessive memory consumption, especially with patterns that match empty strings (e.g., `\d*`, `\w*`) \ No newline at end of file +* Users can adjust the limit through the `plugins.ppl.rex.max_match.limit` cluster setting. 
Setting this limit to a large value is not recommended as it can lead to excessive memory consumption, especially with patterns that match empty strings (e.g., `\d*`, `\w*`) \ No newline at end of file diff --git a/docs/user/ppl/cmd/search.md index f05f47aa19..0d43e19960 100644 --- a/docs/user/ppl/cmd/search.md +++ b/docs/user/ppl/cmd/search.md @@ -1,16 +1,19 @@ -# search +# search -## Description -The `search` command retrieves document from the index. The `search` command can only be used as the first command in the PPL query. -## Syntax +The `search` command retrieves documents from the index. The `search` command can only be used as the first command in the PPL query. -search source=[\:]\ [search-expression] -* search: search keyword, which could be ignored. -* index: mandatory. search command must specify which index to query from. The index name can be prefixed by "\:" for cross-cluster search. -* search-expression: optional. Search expression that gets converted to OpenSearch [query_string](https://docs.opensearch.org/latest/query-dsl/full-text/query-string/) function which uses [Lucene Query Syntax](https://lucene.apache.org/core/2_9_4/queryparsersyntax.html). +## Syntax + +Use the following syntax: + +`search source=[<cluster>:]<index> [search-expression]` +* `search`: the search keyword, which can be omitted. +* `index`: mandatory. The search command must specify which index to query. The index name can be prefixed by "<cluster>:" for cross-cluster search. +* `search-expression`: optional. A search expression that gets converted to the OpenSearch [query_string](https://docs.opensearch.org/latest/query-dsl/full-text/query-string/) function, which uses [Lucene Query Syntax](https://lucene.apache.org/core/2_9_4/queryparsersyntax.html).
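To make the optional cluster prefix concrete, here is a minimal sketch; the cluster alias `my_remote` is hypothetical and must match a remote cluster configured for cross-cluster search:

```ppl ignore
search source=my_remote:accounts age>30
```

Without the prefix, `search source=accounts age>30` queries the local cluster.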
-## Search Expression + +## Search expression The search expression syntax supports: * **Full text search**: `error` or `"error message"` - Searches the default field configured by the `index.query.default_field` setting (defaults to `*` which searches all fields) @@ -22,12 +25,12 @@ The search expression syntax supports: * **Wildcards**: `*` (zero or more characters), `?` (exactly one character) **Full Text Search**: Unlike other PPL commands, search supports both quoted and unquoted strings. Unquoted terms are limited to alphanumeric characters, hyphens, underscores, and wildcards. Any other characters require double quotes. -* Unquoted: `search error`, `search user-123`, `search log_*` -* Quoted: `search "error message"`, `search "user@example.com"` +* **Unquoted**: `search error`, `search user-123`, `search log_*` +* **Quoted**: `search "error message"`, `search "user@example.com"` **Field Values**: Follow the same quoting rules as search text. -* Unquoted: `status=active`, `code=ERR-401` -* Quoted: `email="user@example.com"`, `message="server error"` +* **Unquoted**: `status=active`, `code=ERR-401` +* **Quoted**: `email="user@example.com"`, `message="server error"` **Time Modifiers**: Filter search results by time range using the implicit `@timestamp` field. [Time modifiers support the same formats as the EARLIEST and LATEST condition functions.](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/functions/condition.md#earliest): 1. **Current time**: `now` or `now()` - the current time @@ -53,7 +56,8 @@ Read more details on time modifiers in the [PPL relative_timestamp documentation * **Column name conflicts**: If your data contains columns named "earliest" or "latest", use backticks to access them as regular fields (e.g., `` `earliest`="value"``) to avoid conflicts with time modifier syntax. * **Time snap syntax**: Time modifiers with chained time offsets must be wrapped in quotes (e.g., `latest='+1d@month-10h'`) for proper query parsing.
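As a sketch of the quoting rule for offsets combined with time snaps (the index and time range are illustrative):

```ppl ignore
search earliest='-7d@d' latest=now source=otellogs
| fields severityText
```

The expression `-7d@d` (seven days back, snapped to the start of the day) combines an offset with a snap, so it is wrapped in single quotes.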
-## Default Field Configuration + +## Default field configuration When you search without specifying a field, it searches the default field configured by the `index.query.default_field` index setting (defaults to `*` which searches all fields). You can check or modify the default field setting @@ -62,40 +66,43 @@ You can check or modify the default field setting { "index.query.default_field": "firstname,lastname,email" } -## Field Types and Search Behavior + +## Field types and search behavior **Text Fields**: Full-text search, phrase search * `search message="error occurred" source=logs` -* Limitations: Wildcards apply to terms after analysis, not entire field value. +* **Limitations**: Wildcards apply to terms after analysis, not the entire field value. **Keyword Fields**: Exact matching, wildcard patterns * `search status="ACTIVE" source=logs` -* Limitations: No text analysis, case-sensitive matching +* **Limitations**: No text analysis, case-sensitive matching **Numeric Fields**: Range queries, exact matching, IN operator * `search age>=18 AND balance<50000 source=accounts` -* Limitations: No wildcard or text search support +* **Limitations**: No wildcard or text search support **Date Fields**: Range queries, exact matching, IN operator * `search timestamp>="2024-01-01" source=logs` -* Limitations: Must use index mapping date format, no wildcards +* **Limitations**: Must use the index mapping date format, no wildcards **Boolean Fields**: true/false values only, exact matching, IN operator * `search active=true source=users` -* Limitations: No wildcards or range queries +* **Limitations**: No wildcards or range queries **IP Fields**: Exact matching, CIDR notation * `search client_ip="192.168.1.0/24" source=logs` -* Limitations: No wildcards for partial IP matching.
For wildcard search use multi field with keyword: `search ip_address.keyword='1*' source=logs` or WHERE clause: `source=logs | where cast(ip_address as string) like '1%'` +* **Limitations**: No wildcards for partial IP matching. For wildcard search, use a multi-field with a keyword sub-field: `search ip_address.keyword='1*' source=logs` or a WHERE clause: `source=logs | where cast(ip_address as string) like '1%'` **Field Type Performance Tips**: * Each field type has specific search capabilities and limitations. Using the wrong field type during ingestion impacts performance and accuracy * For wildcard searches on non-keyword fields: Add a keyword field copy for better performance. Example: If you need wildcards on a text field, create `message.keyword` alongside `message` -## Cross-Cluster Search + +## Cross-cluster search Cross-cluster search lets any node in a cluster execute search requests against other clusters. Refer to [Cross-Cluster Search](../admin/cross_cluster_search.md) for configuration. -## Example 1: Text Search + +## Example 1: Text search **Basic Text Search** (unquoted single term) @@ -174,7 +181,7 @@ fetched rows / total rows = 1/1 +--------------------------------------------------------------------------------------------------------------------+ ``` -### Mixed Phrase and Boolean +### Mixed phrase and boolean ```ppl search "User authentication" OR OAuth2 source=otellogs @@ -194,9 +201,12 @@ fetched rows / total rows = 1/1 +----------------------------------------------------------------------------------------------------------+ ``` -## Example 2: Boolean Logic and Operator Precedence -### Boolean Operators +## Example 2: Boolean logic and operator precedence + +The following examples demonstrate boolean operators and precedence.
+ +### Boolean operators ```ppl search severityText="ERROR" OR severityText="FATAL" source=otellogs @@ -256,8 +266,9 @@ fetched rows / total rows = 2/2 +--------------+----------------+ ``` -The above evaluates as `(severityText="ERROR" OR severityText="WARN") AND severityNumber>15` -## Example 3: NOT vs != Semantics +The preceding expression evaluates as `(severityText="ERROR" OR severityText="WARN") AND severityNumber>15`. + +## Example 3: NOT compared to != semantics **!= operator** (field must exist and not equal the value) @@ -298,9 +309,12 @@ fetched rows / total rows = 3/3 **Key difference**: `!=` excludes null values, `NOT` includes them. Dale Adams (account 18) has `employer=null`. He appears in `NOT employer="Quility"` but not in `employer!="Quility"`. + ## Example 4: Wildcards -### Wildcard Patterns +The following examples demonstrate wildcard pattern matching. + +### Wildcard patterns ```ppl search severityText=ERR* source=otellogs @@ -367,7 +381,8 @@ fetched rows / total rows = 3/3 +--------------+ ``` -## Example 5: Range Queries + +## Example 5: Range queries Use comparison operators (>, <, >=, <=) to filter numeric and date fields within specific ranges. Range queries are particularly useful for filtering by age, price, timestamps, or any numeric metrics. @@ -407,7 +422,8 @@ fetched rows / total rows = 1/1 +---------------------------------------------------------+ ``` -## Example 6: Field Search with Wildcards + +## Example 6: Field search with wildcards When searching in text or keyword fields, wildcards enable partial matching. This is particularly useful for finding records where you only know part of the value. Note that wildcards work best with keyword fields, while text fields may produce unexpected results due to tokenization.
**Partial Search in Keyword Fields** @@ -428,7 +444,7 @@ fetched rows / total rows = 1/1 +-----------+----------+ ``` -### Combining Wildcards with Field Comparisons +### Combining wildcards with field comparisons ```ppl search firstname=A* AND age>30 source=accounts @@ -452,7 +468,8 @@ fetched rows / total rows = 1/1 * **Performance**: Leading wildcards (e.g., `*@example.com`) are slower than trailing wildcards * **Case sensitivity**: Keyword field wildcards are case-sensitive unless normalized during indexing -## Example 7: IN Operator and Field Comparisons + +## Example 7: IN operator and field comparisons The IN operator efficiently checks if a field matches any value from a list. This is cleaner and more performant than chaining multiple OR conditions for the same field. **IN Operator** @@ -477,7 +494,7 @@ fetched rows / total rows = 3/3 +--------------+ ``` -### Field Comparison Examples +### Field comparison examples ```ppl search severityNumber=17 source=otellogs @@ -513,7 +530,8 @@ fetched rows / total rows = 1/1 +---------------------------------------------------------+ ``` -## Example 8: Complex Expressions + +## Example 8: Complex expressions Combine multiple conditions using boolean operators and parentheses to create sophisticated search queries. @@ -553,7 +571,8 @@ fetched rows / total rows = 1/1 +---------------------------------------------------------+ ``` -## Example 9: Time Modifiers + +## Example 9: Time modifiers Time modifiers filter search results by time range using the implicit `@timestamp` field. They support various time formats for precise temporal filtering. 
**Absolute Time Filtering** @@ -621,7 +640,7 @@ fetched rows / total rows = 2/2 +-------------------------------+--------------+ ``` -### Unix Timestamp Filtering +### Unix timestamp filtering ```ppl search earliest=1705314600 latest=1705314605 source=otellogs @@ -643,7 +662,8 @@ fetched rows / total rows = 5/5 +-------------------------------+--------------+ ``` -## Example 10: Special Characters and Escaping + +## Example 10: Special characters and escaping Understand when and how to escape special characters in your search queries. There are two categories of characters that need escaping: **Characters that must be escaped**: @@ -655,7 +675,7 @@ Understand when and how to escape special characters in your search queries. The * **Question mark (?)**: Use as-is for wildcard, escape as `\\?` to search for literal question mark -| Intent | PPL Syntax | Result | +| Intent | PPL syntax | Result | |--------|------------|--------| | Wildcard search | `field=user*` | Matches "user", "user123", "userABC" | | Literal "user*" | `field="user\\*"` | Matches only "user*" | @@ -721,7 +741,8 @@ fetched rows / total rows = 1/1 +--------------------------------------------------------------------------------------------------------------------------------------------------------+ ``` -## Example 11: Fetch All Data + +## Example 11: Fetch all data Retrieve all documents from an index by specifying only the source without any search conditions. This is useful for exploring small datasets or verifying data ingestion. diff --git a/docs/user/ppl/cmd/showdatasources.md index 10129873aa..53fb3e7bab 100644 --- a/docs/user/ppl/cmd/showdatasources.md +++ b/docs/user/ppl/cmd/showdatasources.md @@ -1,14 +1,17 @@ -# show datasources +# show datasources -## Description -Use the `show datasources` command to query datasources configured in the PPL engine.
-## Syntax +The `show datasources` command queries datasources configured in the PPL engine. The `show datasources` command can only be used as the first command in the PPL query. + +## Syntax + +Use the following syntax: + +`show datasources` -show datasources ## Example 1: Fetch all PROMETHEUS datasources -This example shows fetching all the datasources of type prometheus. +The following example PPL query shows how to use `show datasources` to fetch all the datasources of type Prometheus. PPL query for all PROMETHEUS DATASOURCES ```ppl @@ -27,6 +30,7 @@ fetched rows / total rows = 1/1 +-----------------+----------------+ ``` + ## Limitations The `show datasources` command can only work with `plugins.calcite.enabled=false`. \ No newline at end of file diff --git a/docs/user/ppl/cmd/sort.md index a6e5ba1c0e..88a8ab00b9 100644 --- a/docs/user/ppl/cmd/sort.md +++ b/docs/user/ppl/cmd/sort.md @@ -1,15 +1,17 @@ -# sort +# sort -## Description The `sort` command sorts all the search results by the specified fields. -## Syntax -sort [count] <[+\|-] sort-field \| sort-field [asc\|a\|desc\|d]>... -* count: optional. The number of results to return. Specifying a count of 0 or less than 0 returns all results. **Default:** 0. -* [+\|-]: optional. The plus [+] stands for ascending order and NULL/MISSING first and a minus [-] stands for descending order and NULL/MISSING last. **Default:** ascending order and NULL/MISSING first. -* [asc\|a\|desc\|d]: optional. asc/a stands for ascending order and NULL/MISSING first. desc/d stands for descending order and NULL/MISSING last. **Default:** ascending order and NULL/MISSING first. -* sort-field: mandatory. The field used to sort. Can use `auto(field)`, `str(field)`, `ip(field)`, or `num(field)` to specify how to interpret field values. +## Syntax + +Use the following syntax: + +`sort [count] <[+|-] sort-field | sort-field [asc|a|desc|d]>...` +* `count`: optional. The number of results to return.
Specifying a count of 0 or less than 0 returns all results. **Default:** 0. +* `[+|-]`: optional. The plus sign [+] stands for ascending order with NULL/MISSING first, and the minus sign [-] stands for descending order with NULL/MISSING last. **Default:** ascending order and NULL/MISSING first. +* `[asc|a|desc|d]`: optional. `asc`/`a` stands for ascending order and NULL/MISSING first. `desc`/`d` stands for descending order and NULL/MISSING last. **Default:** ascending order and NULL/MISSING first. +* `sort-field`: mandatory. The field used to sort. Can use `auto(field)`, `str(field)`, `ip(field)`, or `num(field)` to specify how to interpret field values. > **Note:** > You cannot mix +/- and asc/desc in the same sort command. Choose one approach for all fields in a single sort command. @@ -18,7 +20,7 @@ sort [count] <[+\|-] sort-field \| sort-field [asc\|a\|desc\|d]>... ## Example 1: Sort by one field -This example shows sorting all documents by age field in ascending order. +The following example PPL query shows how to use `sort` to sort all documents by the age field in ascending order. ```ppl source=accounts @@ -40,9 +42,10 @@ fetched rows / total rows = 4/4 +----------------+-----+ ``` + ## Example 2: Sort by one field and return all results -This example shows sorting all documents by age field in ascending order and returning all results. +The following example PPL query shows how to use `sort` to sort all documents by the age field in ascending order and return all results. ```ppl source=accounts @@ -64,9 +67,10 @@ fetched rows / total rows = 4/4 +----------------+-----+ ``` + ## Example 3: Sort by one field in descending order (using -) -This example shows sorting all documents by age field in descending order. +The following example PPL query shows how to use `sort` to sort all documents by the age field in descending order.
```ppl source=accounts @@ -88,9 +92,10 @@ fetched rows / total rows = 4/4 +----------------+-----+ ``` + ## Example 4: Sort by one field in descending order (using desc) -This example shows sorting all the document by the age field in descending order using the desc keyword. +The following example PPL query shows how to use `sort` to sort all documents by the age field in descending order using the desc keyword. ```ppl source=accounts @@ -112,9 +117,10 @@ fetched rows / total rows = 4/4 +----------------+-----+ ``` + ## Example 5: Sort by multiple fields (using +/-) -This example shows sorting all documents by gender field in ascending order and age field in descending order using +/- operators. +The following example PPL query shows how to use `sort` to sort all documents by the gender field in ascending order and the age field in descending order using +/- operators. ```ppl source=accounts @@ -136,9 +142,10 @@ fetched rows / total rows = 4/4 +----------------+--------+-----+ ``` + ## Example 6: Sort by multiple fields (using asc/desc) -This example shows sorting all the document by the gender field in ascending order and age field in descending order using asc/desc keywords. +The following example PPL query shows how to use `sort` to sort all documents by the gender field in ascending order and the age field in descending order using asc/desc keywords. ```ppl source=accounts @@ -160,9 +167,10 @@ fetched rows / total rows = 4/4 +----------------+--------+-----+ ``` + ## Example 7: Sort by a field that includes null values -This example shows sorting employer field by default option (ascending order and null first). The result shows that null value is in the first row. +The following example PPL query shows how to use `sort` to sort the employer field with the default option (ascending order and null first). The result shows that the null value is in the first row.
```ppl source=accounts @@ -184,9 +192,10 @@ fetched rows / total rows = 4/4 +----------+ ``` + ## Example 8: Specify the number of sorted documents to return -This example shows sorting all documents and returning 2 documents. +The following example PPL query shows how to use `sort` to sort all documents and return 2 documents. ```ppl source=accounts @@ -206,9 +215,10 @@ fetched rows / total rows = 2/2 +----------------+-----+ ``` + ## Example 9: Sort with desc modifier -This example shows sorting with the desc modifier to reverse sort order. +The following example PPL query shows how to use `sort` to sort with the desc modifier to reverse sort order. ```ppl source=accounts @@ -230,9 +240,10 @@ fetched rows / total rows = 4/4 +----------------+-----+ ``` + ## Example 10: Sort with specifying field type -This example shows sorting with str() to sort numeric values lexicographically. +The following example PPL query shows how to use `sort` with `str()` to sort numeric values lexicographically. ```ppl source=accounts diff --git a/docs/user/ppl/cmd/spath.md index c83afc3a31..e1d2f657fc 100644 --- a/docs/user/ppl/cmd/spath.md +++ b/docs/user/ppl/cmd/spath.md @@ -1,19 +1,23 @@ -# spath +# spath -## Description -The `spath` command allows extracting fields from structured text data. It currently allows selecting from JSON data with JSON paths. -## Syntax +The `spath` command extracts fields from structured text data. It currently allows selecting from JSON data with JSON paths. -spath input=\ [output=\] [path=]\ -* input: mandatory. The field to scan for JSON data. -* output: optional. The destination field that the data will be loaded to. **Default:** value of `path`. -* path: mandatory. The path of the data to load for the object. For more information on path syntax, see [json_extract](../functions/json.md#json_extract). +## Syntax + +Use the following syntax: + +`spath input=<field> [output=<field>] [path=]<path>` +* `input`: mandatory.
The field to scan for JSON data. +* `output`: optional. The destination field that the data will be loaded to. **Default:** value of `path`. +* `path`: mandatory. The path of the data to load for the object. For more information about path syntax, see [json_extract](../functions/json.md#json_extract). + ## Note The `spath` command currently does not support pushdown behavior for extraction. It will be slow on large datasets. It's generally better to index fields needed for filtering directly instead of using `spath` to filter nested fields. -## Example 1: Simple Field Extraction + +## Example 1: Simple field extraction The simplest spath is to extract a single field. This example extracts `n` from the `doc` field of type `text`. @@ -36,9 +40,10 @@ fetched rows / total rows = 3/3 +----------+---+ ``` -## Example 2: Lists & Nesting -This example demonstrates more JSON path uses, like traversing nested fields and extracting list elements. +## Example 2: Lists and nesting + +The following example PPL query demonstrates more JSON path uses, like traversing nested fields and extracting list elements. ```ppl source=structured @@ -61,9 +66,10 @@ fetched rows / total rows = 3/3 +------------------------------------------------------+---------------+--------------+--------+ ``` + ## Example 3: Sum of inner elements -This example shows extracting an inner field and doing statistics on it, using the docs from example 1. It also demonstrates that `spath` always returns strings for inner types. +The following example PPL query shows how to use `spath` to extract an inner field and do statistics on it, using the docs from example 1. It also demonstrates that `spath` always returns strings for inner types. ```ppl source=structured @@ -84,6 +90,7 @@ fetched rows / total rows = 1/1 +--------+ ``` + ## Example 4: Escaped paths `spath` can escape paths with strings to accept any path that `json_extract` does. This includes escaping complex field names as array components.
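A hedged sketch of an escaped path (the document shape and the literal field name `a.b` are illustrative; see the `json_extract` documentation for the exact path syntax):

```ppl ignore
source=structured
| spath input=doc output=extracted path="['a.b']"
```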
diff --git a/docs/user/ppl/cmd/stats.md index 000d910d97..63af220d43 100644 --- a/docs/user/ppl/cmd/stats.md +++ b/docs/user/ppl/cmd/stats.md @@ -1,17 +1,19 @@ -# stats +# stats -## Description -The `stats` command calculates the aggregation from the search result. -## Syntax +The `stats` command calculates the aggregation from the search results. -stats [bucket_nullable=bool] \... [by-clause] -* aggregation: mandatory. An aggregation function. -* bucket_nullable: optional. Controls whether the stats command includes null buckets in group-by aggregations. When set to `false`, the aggregation ignores records where the group-by field is null, resulting in faster performance by excluding null bucket. **Default:** Determined by `plugins.ppl.syntax.legacy.preferred`. +## Syntax + +Use the following syntax: + +`stats [bucket_nullable=bool] <aggregation>... [by-clause]` +* `aggregation`: mandatory. An aggregation function. +* `bucket_nullable`: optional. Controls whether the stats command includes null buckets in group-by aggregations. When set to `false`, the aggregation ignores records where the group-by field is null, resulting in faster performance by excluding the null bucket. **Default:** Determined by `plugins.ppl.syntax.legacy.preferred`. * When `plugins.ppl.syntax.legacy.preferred=true`, `bucket_nullable` defaults to `true` * When `plugins.ppl.syntax.legacy.preferred=false`, `bucket_nullable` defaults to `false` -* by-clause: optional. Groups results by specified fields or expressions. Syntax: by [span-expression,] [field,]... **Default:** If no by-clause is specified, the stats command returns only one row, which is the aggregation over the entire result set. -* span-expression: optional, at most one. Splits field into buckets by intervals. Syntax: span(field_expr, interval_expr). The unit of the interval expression is the natural unit by default. If the field is a date/time type field, the aggregation results always ignore null bucket.
For example, `span(age, 10)` creates 10-year age buckets, `span(timestamp, 1h)` creates hourly buckets. +* `by-clause`: optional. Groups results by specified fields or expressions. Syntax: by [span-expression,] [field,]... **Default:** If no by-clause is specified, the stats command returns only one row, which is the aggregation over the entire search result set. +* `span-expression`: optional, at most one. Splits a field into buckets by intervals. Syntax: span(field_expr, interval_expr). The unit of the interval expression is the natural unit by default. If the field is a date/time type field, the aggregation results always ignore the null bucket. For example, `span(age, 10)` creates 10-year age buckets, `span(timestamp, 1h)` creates hourly buckets. * Available time units * millisecond (ms) * second (s) @@ -23,34 +25,37 @@ stats [bucket_nullable=bool] \... [by-clause] * quarter (q) * year (y) -## Aggregation Functions + +## Aggregation functions The stats command supports the following aggregation functions: * `COUNT`/`C`: Count of values -* SUM: Sum of numeric values -* AVG: Average of numeric values -* MAX: Maximum value -* MIN: Minimum value -* VAR_SAMP: Sample variance -* VAR_POP: Population variance -* STDDEV_SAMP: Sample standard deviation -* STDDEV_POP: Population standard deviation -* DISTINCT_COUNT_APPROX: Approximate distinct count -* TAKE: List of original values +* `SUM`: Sum of numeric values +* `AVG`: Average of numeric values +* `MAX`: Maximum value +* `MIN`: Minimum value +* `VAR_SAMP`: Sample variance +* `VAR_POP`: Population variance +* `STDDEV_SAMP`: Sample standard deviation +* `STDDEV_POP`: Population standard deviation +* `DISTINCT_COUNT_APPROX`: Approximate distinct count +* `TAKE`: List of original values * `PERCENTILE`/`PERCENTILE_APPROX`: Percentile calculations * `PERC<x>`/`P<x>`: Percentile shortcut functions -* MEDIAN: 50th percentile -* EARLIEST: Earliest value by timestamp -* LATEST: Latest value by timestamp -* FIRST: First non-null value -* LAST: Last
non-null value -* LIST: Collect all values into array -* VALUES: Collect unique values into sorted array +* `MEDIAN`: 50th percentile +* `EARLIEST`: Earliest value by timestamp +* `LATEST`: Latest value by timestamp +* `FIRST`: First non-null value +* `LAST`: Last non-null value +* `LIST`: Collect all values into array +* `VALUES`: Collect unique values into sorted array For detailed documentation of each function, see [Aggregation Functions](../functions/aggregations.md). ## Limitations +The following limitations apply to the `stats` command. + ### Bucket aggregation results may be approximate in large datasets In OpenSearch, `doc_count` values for a terms bucket aggregation may be approximate. As a result, any aggregations (such as `sum` and `avg`) on the terms bucket aggregation may also be approximate. @@ -67,7 +72,7 @@ This query is pushed down to a terms bucket aggregation DSL query with `"order": ### Sorting by ascending doc_count may produce inaccurate results -Similar to above PPL query, the following query (find the rare 10 URLs) often produces inaccurate results. +Similar to the preceding PPL query, the following query (find the 10 rarest URLs) often produces inaccurate results. ```ppl ignore source=hits @@ -80,7 +85,7 @@ A term that is globally infrequent might not appear as infrequent on every indiv ## Example 1: Calculate the count of events -This example shows calculating the count of events in the accounts. +The following example PPL query shows how to use `stats` to calculate the count of events in the accounts index. ```ppl source=accounts @@ -98,9 +103,10 @@ fetched rows / total rows = 1/1 +---------+ ``` + ## Example 2: Calculate the average of a field -This example shows calculating the average age of all the accounts. +The following example PPL query shows how to use `stats` to calculate the average age of all the accounts.
```ppl
source=accounts
@@ -118,9 +124,10 @@ fetched rows / total rows = 1/1
+----------+
```

+
## Example 3: Calculate the average of a field by group

-This example shows calculating the average age of all the accounts group by gender.
+The following example PPL query shows how to use `stats` to calculate the average age of all the accounts, grouped by gender.

```ppl
source=accounts
@@ -139,9 +146,10 @@ fetched rows / total rows = 2/2
+--------------------+--------+
```

+
## Example 4: Calculate the average, sum and count of a field by group

-This example shows calculating the average age, sum age and count of events of all the accounts group by gender.
+The following example PPL query shows how to use `stats` to calculate the average age, the sum of ages, and the count of events of all the accounts, grouped by gender.

```ppl
source=accounts
@@ -160,6 +168,7 @@ fetched rows / total rows = 2/2
+--------------------+----------+---------+--------+
```

+
## Example 5: Calculate the maximum of a field

The example calculates the max age of all the accounts.
@@ -242,6 +254,7 @@ fetched rows / total rows = 2/2
+------------+----------+
```

+
## Example 9: Calculate the count by a gender and span

The example gets the count of age by the interval of 10 years and group by gender.
@@ -284,6 +297,7 @@ fetched rows / total rows = 3/3
+-----+----------+--------+
```

+
## Example 10: Calculate the count and get email list by a gender and span

The example gets the count of age by the interval of 10 years and group by gender, additionally for each row get a list of at most 5 emails.
@@ -306,9 +320,10 @@ fetched rows / total rows = 3/3
+-----+--------------------------------------------+----------+--------+
```

+
## Example 11: Calculate the percentile of a field

-This example shows calculating the percentile 90th age of all the accounts.
+The following example PPL query shows how to use `stats` to calculate the 90th percentile of the age of all the accounts.

```ppl
source=accounts
@@ -326,9 +341,10 @@ fetched rows / total rows = 1/1
+---------------------+
```

+
## Example 12: Calculate the percentile of a field by group

-This example shows calculating the percentile 90th age of all the accounts group by gender.
+The following example PPL query shows how to use `stats` to calculate the 90th percentile of the age of all the accounts, grouped by gender.

```ppl
source=accounts
@@ -347,6 +363,7 @@ fetched rows / total rows = 2/2
+---------------------+--------+
```

+
## Example 13: Calculate the percentile by a gender and span

The example gets the percentile 90th age by the interval of 10 years and group by gender.
@@ -368,9 +385,10 @@ fetched rows / total rows = 2/2
+-----+----------+--------+
```

+
## Example 14: Collect all values in a field using LIST

-The example shows how to collect all firstname values, preserving duplicates and order.
+The following example PPL query shows how to use `stats` to collect all firstname values, preserving duplicates and order.
```ppl source=accounts @@ -388,6 +406,7 @@ fetched rows / total rows = 1/1 +-----------------------------+ ``` + ## Example 15: Ignore null bucket ```ppl @@ -408,9 +427,10 @@ fetched rows / total rows = 3/3 +-----+-----------------------+ ``` + ## Example 16: Collect unique values in a field using VALUES -The example shows how to collect all unique firstname values, sorted lexicographically with duplicates removed. +The following example PPL query shows how to use `stats` to collect all unique firstname values, sorted lexicographically with duplicates removed. ```ppl source=accounts @@ -428,6 +448,7 @@ fetched rows / total rows = 1/1 +-----------------------------+ ``` + ## Example 17: Span on date/time field always ignore null bucket Index example data: @@ -495,9 +516,10 @@ fetched rows / total rows = 3/3 +-----+------------+--------+ ``` + ## Example 18: Calculate the count by the implicit @timestamp field -This example demonstrates that if you omit the field parameter in the span function, it will automatically use the implicit `@timestamp` field. +The following example PPL query demonstrates that if you omit the field parameter in the span function, it will automatically use the implicit `@timestamp` field. ```ppl ignore source=big5 diff --git a/docs/user/ppl/cmd/streamstats.md b/docs/user/ppl/cmd/streamstats.md index c7f79b2133..3538cceb22 100644 --- a/docs/user/ppl/cmd/streamstats.md +++ b/docs/user/ppl/cmd/streamstats.md @@ -1,8 +1,7 @@ -# streamstats +# streamstats -## Description -The `streamstats` command is used to calculate cumulative or rolling statistics as events are processed in order. Unlike `stats` or `eventstats` which operate on the entire dataset at once, it computes values incrementally on a per-event basis, often respecting the order of events in the search results. It allows you to generate running totals, moving averages, and other statistics that evolve with the stream of events. 
+The `streamstats` command calculates cumulative or rolling statistics as events are processed in order. Unlike `stats` or `eventstats`, which operate on the entire dataset at once, it computes values incrementally on a per-event basis, often respecting the order of events in the search results. It allows you to generate running totals, moving averages, and other statistics that evolve with the stream of events.

Key aspects of `streamstats`:
1. It computes statistics incrementally as each event is processed, making it suitable for time-series and sequence-based analysis.
2. Supports arguments such as window (for sliding window calculations) and current (to control whether the current event is included in the calculation).
@@ -28,20 +27,23 @@ All of these commands can be used to generate aggregations such as average, sum,
* `eventstats`: When aggregated statistics are needed alongside original event data.
* `streamstats`: When a running total or cumulative statistic is needed across event streams.

-## Syntax
-streamstats [bucket_nullable=bool] [current=\] [window=\] [global=\] [reset_before="("\")"] [reset_after="("\")"] \... [by-clause]
-* function: mandatory. A aggregation function or window function.
-* bucket_nullable: optional. Controls whether the streamstats command consider null buckets as a valid group in group-by aggregations. When set to `false`, it will not treat null group-by values as a distinct group during aggregation. **Default:** Determined by `plugins.ppl.syntax.legacy.preferred`.
+## Syntax
+
+Use the following syntax:
+
+`streamstats [bucket_nullable=bool] [current=<boolean>] [window=<window-size>] [global=<boolean>] [reset_before="("<eval-expression>")"] [reset_after="("<eval-expression>")"] <function>... [by-clause]`
+* `function`: mandatory. An aggregation function or window function.
+* `bucket_nullable`: optional. Controls whether the streamstats command considers null buckets as a valid group in group-by aggregations. When set to `false`, it will not treat null group-by values as a distinct group during aggregation.
**Default:** Determined by `plugins.ppl.syntax.legacy.preferred`. * When `plugins.ppl.syntax.legacy.preferred=true`, `bucket_nullable` defaults to `true` * When `plugins.ppl.syntax.legacy.preferred=false`, `bucket_nullable` defaults to `false` -* current: optional. If true, the search includes the given, or current, event in the summary calculations. If false, the search uses the field value from the previous event. Syntax: current=\. **Default:** true. -* window: optional. Specifies the number of events to use when computing the statistics. Syntax: window=\. **Default:** 0, which means that all previous and current events are used. -* global: optional. Used only when the window argument is set. Defines whether to use a single window, global=true, or to use separate windows based on the by clause. If global=false and window is set to a non-zero value, a separate window is used for each group of values of the field specified in the by clause. Syntax: global=\. **Default:** true. -* reset_before: optional. Before streamstats calculates for an event, reset_before resets all accumulated statistics when the eval-expression evaluates to true. If used with window, the window is also reset. Syntax: reset_before="("\")". **Default:** false. -* reset_after: optional. After streamstats calculations for an event, reset_after resets all accumulated statistics when the eval-expression evaluates to true. This expression can reference fields returned by streamstats. If used with window, the window is also reset. Syntax: reset_after="("\")". **Default:** false. -* by-clause: optional. The by clause could be the fields and expressions like scalar functions and aggregation functions. Besides, the span clause can be used to split specific field into buckets in the same interval, the stats then does the aggregation by these span buckets. Syntax: by [span-expression,] [field,]... 
**Default:** If no \ is specified, all events are processed as a single group and running statistics are computed across the entire event stream.
-* span-expression: optional, at most one. Splits field into buckets by intervals. Syntax: span(field_expr, interval_expr). For example, `span(age, 10)` creates 10-year age buckets, `span(timestamp, 1h)` creates hourly buckets.
+* `current`: optional. If true, the search includes the given, or current, event in the summary calculations. If false, the search uses the field value from the previous event. Syntax: `current=<boolean>`. **Default:** true.
+* `window`: optional. Specifies the number of events to use when computing the statistics. Syntax: `window=<window-size>`. **Default:** 0, which means that all previous and current events are used.
+* `global`: optional. Used only when the window argument is set. Defines whether to use a single window, global=true, or to use separate windows based on the by clause. If global=false and window is set to a non-zero value, a separate window is used for each group of values of the field specified in the by clause. Syntax: `global=<boolean>`. **Default:** true.
+* `reset_before`: optional. Before streamstats calculates for an event, reset_before resets all accumulated statistics when the eval-expression evaluates to true. If used with window, the window is also reset. Syntax: `reset_before="("<eval-expression>")"`. **Default:** false.
+* `reset_after`: optional. After streamstats calculations for an event, reset_after resets all accumulated statistics when the eval-expression evaluates to true. This expression can reference fields returned by streamstats. If used with window, the window is also reset. Syntax: `reset_after="("<eval-expression>")"`. **Default:** false.
+* `by-clause`: optional. The by clause can include fields and expressions, such as scalar functions and aggregation functions. The span clause can also be used to split a specific field into buckets of equal intervals, in which case the aggregation is performed per span bucket. Syntax: by [span-expression,] [field,]... **Default:** If no `by-clause` is specified, all events are processed as a single group and running statistics are computed across the entire event stream.
+* `span-expression`: optional, at most one. Splits field into buckets by intervals. Syntax: span(field_expr, interval_expr). For example, `span(age, 10)` creates 10-year age buckets, `span(timestamp, 1h)` creates hourly buckets.
* Available time units
* millisecond (ms)
* second (s)
@@ -53,28 +55,30 @@ streamstats [bucket_nullable=bool] [current=\] [window=\] [global=\
* quarter (q)
* year (y)

-## Aggregation Functions
+
+## Aggregation functions

The streamstats command supports the following aggregation functions:

-* COUNT: Count of values
-* SUM: Sum of numeric values
-* AVG: Average of numeric values
-* MAX: Maximum value
-* MIN: Minimum value
-* VAR_SAMP: Sample variance
-* VAR_POP: Population variance
-* STDDEV_SAMP: Sample standard deviation
-* STDDEV_POP: Population standard deviation
+* `COUNT`: Count of values
+* `SUM`: Sum of numeric values
+* `AVG`: Average of numeric values
+* `MAX`: Maximum value
+* `MIN`: Minimum value
+* `VAR_SAMP`: Sample variance
+* `VAR_POP`: Population variance
+* `STDDEV_SAMP`: Sample standard deviation
+* `STDDEV_POP`: Population standard deviation
* DISTINCT_COUNT/DC: Distinct count of values
-* EARLIEST: Earliest value by timestamp
-* LATEST: Latest value by timestamp
+* `EARLIEST`: Earliest value by timestamp
+* `LATEST`: Latest value by timestamp

For detailed documentation of each function, see [Aggregation Functions](../functions/aggregations.md).
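+As a sketch of how `window` and `current` interact (assuming a hypothetical table `t` whose field `a` holds the values 1, 2, 3, 4 in order):
+
+```ppl ignore
+source = t | streamstats window=2 sum(a)
+```
+
+With `window=2` and the default `current=true`, each running sum covers the current event and the one before it, yielding 1, 3, 5, 7. Adding `current=false` excludes the current event, so each row instead reflects only the preceding events within the window, and the first row has no prior events to aggregate.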
+ ## Usage Streamstats -``` +```ppl ignore source = table | streamstats avg(a) source = table | streamstats current = false avg(a) source = table | streamstats window = 5 sum(b) @@ -89,6 +93,7 @@ source = table | streamstats window=2 reset_before=a>31 avg(b) source = table | streamstats current=false reset_after=a>31 avg(b) by c ``` + ## Example 1: Calculate the running average, sum, and count of a field by group This example calculates the running average age, running sum of age, and running count of events for all the accounts, grouped by gender. @@ -112,6 +117,7 @@ fetched rows / total rows = 4/4 +----------------+-----------+----------------------+---------+--------+--------+----------+-------+-----+-----------------------+----------+--------------------+-------------+---------------+ ``` + ## Example 2: Running maximum age over a 2-row window This example calculates the running maximum age over a 2-row window, excluding the current event. @@ -139,28 +145,31 @@ fetched rows / total rows = 8/8 +-------+---------+------------+-------+------+-----+--------------+ ``` + ## Example 3: Use the global argument to calculate running statistics The global argument is only applicable when a window argument is set. It defines how the window is applied in relation to the grouping fields: * global=true: a global window is applied across all rows, but the calculations inside the window still respect the by groups. * global=false: the window itself is created per group, meaning each group gets its own independent window. -This example shows how to calculate the running average of age across accounts by country, using global argument. 
-original data
- +-------+---------+------------+-------+------+-----+
- | name | country | state | month | year | age |
-
- |-------+---------+------------+-------+------+-----+
- | Jake | USA | California | 4 | 2023 | 70 |
- | Hello | USA | New York | 4 | 2023 | 30 |
- | John | Canada | Ontario | 4 | 2023 | 25 |
- | Jane | Canada | Quebec | 4 | 2023 | 20 |
- | Jim | Canada | B.C | 4 | 2023 | 27 |
- | Peter | Canada | B.C | 4 | 2023 | 57 |
- | Rick | Canada | B.C | 4 | 2023 | 70 |
- | David | USA | Washington | 4 | 2023 | 40 |
-
- +-------+---------+------------+-------+------+-----+
+The following example PPL query shows how to use `streamstats` to calculate the running average of age across accounts by country, using the `global` argument.
+
+Original data:
+
+```text
++-------+---------+------------+-------+------+-----+
+| name  | country | state      | month | year | age |
+|-------+---------+------------+-------+------+-----+
+| Jake  | USA     | California | 4     | 2023 | 70  |
+| Hello | USA     | New York   | 4     | 2023 | 30  |
+| John  | Canada  | Ontario    | 4     | 2023 | 25  |
+| Jane  | Canada  | Quebec     | 4     | 2023 | 20  |
+| Jim   | Canada  | B.C        | 4     | 2023 | 27  |
+| Peter | Canada  | B.C        | 4     | 2023 | 57  |
+| Rick  | Canada  | B.C        | 4     | 2023 | 70  |
+| David | USA     | Washington | 4     | 2023 | 40  |
++-------+---------+------------+-------+------+-----+
+```
* global=true: The window slides across all rows globally (following their input order), but inside each window, aggregation is still computed by country. So we process the data stream row by row to build the sliding window with size 2. We can see that David and Rick are in a window.
* global=false: Each by group (country) forms its own independent stream and window (size 2). So David and Hello are in one window for USA. This time we get running_avg 35 for David, rather than 40 when global is set to true.
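+The two behaviors described above can be compared side by side (a sketch; `accounts2` is a hypothetical index holding the original data shown):
+
+```ppl ignore
+source = accounts2 | streamstats window=2 global=true avg(age) as running_avg by country
+source = accounts2 | streamstats window=2 global=false avg(age) as running_avg by country
+```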
@@ -210,6 +219,7 @@ fetched rows / total rows = 8/8
+-------+---------+------------+-------+------+-----+-------------+
```

+
## Example 4: Use the reset_before and reset_after arguments to reset statistics

This example calculates the running average of age across accounts by country, with resets applied.
@@ -237,6 +247,7 @@ fetched rows / total rows = 8/8
+-------+---------+------------+-------+------+-----+---------+
```

+
## Example 5: Null buckets handling

```ppl
diff --git a/docs/user/ppl/cmd/subquery.md b/docs/user/ppl/cmd/subquery.md
index aa33fbbb11..12abd6228e 100644
--- a/docs/user/ppl/cmd/subquery.md
+++ b/docs/user/ppl/cmd/subquery.md
@@ -1,49 +1,54 @@
-# subquery
+# subquery

-## Description
-The `subquery` command allows you to embed one PPL query inside another, enabling complex filtering and data retrieval operations. A subquery is a nested query that executes first and returns results that are used by the outer query for filtering, comparison, or joining operations.
+The `subquery` command embeds one PPL query inside another, enabling complex filtering and data retrieval operations. A subquery is a nested query that executes first and returns results that are used by the outer query for filtering, comparison, or joining operations.

Subqueries are useful for:
1. Filtering data based on results from another query
2. Checking for the existence of related data
3. Performing calculations that depend on aggregated values from other tables
4. Creating complex joins with dynamic conditions

-## Syntax
-subquery: [ source=... \| ... \| ... ]
+## Syntax
+
+Use the following syntax:
+
+`subquery: [ source=... | ... | ... ]`

Subqueries use the same syntax as regular PPL queries but must be enclosed in square brackets. There are four main types of subqueries:

**IN Subquery**
Tests whether a field value exists in the results of a subquery:

-```sql ignore
+```ppl ignore
where <field> [not] in [ source=... | ... | ...
]
```

**EXISTS Subquery**
Tests whether a subquery returns any results:

-```sql ignore
+```ppl ignore
where [not] exists [ source=... | ... | ... ]
```

**Scalar Subquery**
Returns a single value that can be used in comparisons or calculations:

-```sql ignore
+```ppl ignore
where <field> = [ source=... | ... | ... ]
```

**Relation Subquery**
Used in join operations to provide dynamic right-side data:

-```sql ignore
+```ppl ignore
| join ON condition [ source=... | ... | ... ]
```

-## Configuration
+
+## Configuration
+
+The following settings configure the `subquery` command behavior.

### plugins.ppl.subsearch.maxout

@@ -51,10 +56,15 @@ The size configures the maximum of rows to return from subsearch. The default va

Change the subsearch.maxout to unlimited:

-```bash ignore
-sh$ curl -sS -H 'Content-Type: application/json' \
-... -X PUT localhost:9200/_plugins/_query/settings \
-... -d '{"persistent" : {"plugins.ppl.subsearch.maxout" : "0"}}'
+```bash ignore
+curl -sS -H 'Content-Type: application/json' \
+-X PUT localhost:9200/_plugins/_query/settings \
+-d '{"persistent" : {"plugins.ppl.subsearch.maxout" : "0"}}'
+```
+
+Expected output:
+
+```json
{
  "acknowledged": true,
  "persistent": {
@@ -70,11 +80,12 @@ sh$ curl -sS -H 'Content-Type: application/json' \
  }
}
```

+
## Usage

InSubquery:

-```
+```ppl ignore
source = outer | where a in [ source = inner | fields b ]
source = outer | where (a) in [ source = inner | fields b ]
source = outer | where (a,b,c) in [ source = inner | fields d,e,f ]
@@ -89,7 +100,7 @@ source = table1 | inner join left = l right = r on l.a = r.a AND r.a in [ source

ExistsSubquery:

-```
+```ppl ignore
// Assumptions: `a`, `b` are fields of table outer, `c`, `d` are fields of table inner, `e`, `f` are fields of table nested
source = outer | where exists [ source = inner | where a = c ]
source = outer | where not exists [ source = inner | where a = c ]
@@ -107,7 +118,7 @@ source = outer | where exists [ source = inner ] | eval l = "nonEmpty" | fields
ScalarSubquery: -``` +```ppl ignore //Uncorrelated scalar subquery in Select source = outer | eval m = [ source = inner | stats max(c) ] | fields m, a source = outer | eval m = [ source = inner | stats max(c) ] + b | fields m, a @@ -134,11 +145,12 @@ source = table1 | join left = l right = r on condition [ source = table2 | where source = [ source = table1 | join left = l right = r [ source = table2 | where d > 10 | head 5 ] | stats count(a) by b ] as outer | head 1 ``` + ## Example 1: TPC-H q20 -This example shows a complex TPC-H query 20 implementation using nested subqueries. +The following example PPL query shows a complex TPC-H query 20 implementation using nested subqueries. -```bash ignore +```bash ignore curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_ppl -d '{ "query" : """ source = supplier @@ -167,9 +179,10 @@ curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_ppl -d }' ``` + ## Example 2: TPC-H q22 -This example shows a TPC-H query 22 implementation using EXISTS and scalar subqueries. +The following example PPL query shows a TPC-H query 22 implementation using EXISTS and scalar subqueries. ```bash ignore curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_ppl -d '{ @@ -193,5 +206,5 @@ curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_ppl -d | stats count() as numcust, sum(c_acctbal) as totacctbal by cntrycode | sort cntrycode """ - }' - ``` \ No newline at end of file +}' +``` diff --git a/docs/user/ppl/cmd/syntax.md b/docs/user/ppl/cmd/syntax.md index 32c5ebe89d..03800fec39 100644 --- a/docs/user/ppl/cmd/syntax.md +++ b/docs/user/ppl/cmd/syntax.md @@ -1,18 +1,72 @@ -# Syntax +# PPL syntax -## Command Order +Every PPL query starts with the `search` command. It specifies the index to search and retrieve documents from. 
-The PPL query starts with either the `search` command to reference a table to search from, or the `describe` command to reference a table to get its metadata. All the following command could be in any order. In the following example, `search` command refer the accounts index as the source, then using fields and where command to do the further processing.
+`PPL` supports exactly one `search` command per PPL query, and it is always the first command. The word `search` can be omitted.
+
+Subsequent commands can follow in any order.
+
+
+## Syntax
+
+```ppl ignore
+search source=<index> [boolean-expression]
+source=<index> [boolean-expression]
+```
-
+
+Field | Description | Required
+:--- | :--- |:---
+`index` | Specifies the index to query. | No
+`bool-expression` | Specifies an expression that evaluates to a Boolean value. | No
+
+
+### Required arguments
+
+Required arguments are shown in angle brackets `< >`.
+
+### Optional arguments
+
+Optional arguments are enclosed in square brackets `[ ]`.
+
+
+## Examples
+
+**Example 1: Search through accounts index**
+
+In the following example, the `search` command refers to an `accounts` index as the source and uses `fields` and `where` commands for the conditions:
+
+```ppl ignore
search source=accounts
| where age > 18
| fields firstname, lastname
```
-
-## Required arguments
-Required arguments are shown in angle brackets < >.
-## Optional arguments
+**Example 2: Get all documents**
+
+To get all documents from the `accounts` index, specify it as the `source`:
+
+```ppl ignore
+search source=accounts;
+```
+
+
+| account_number | firstname | address | balance | gender | city | employer | state | age | email | lastname |
+:--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :---
+| 1 | Amber | 880 Holmes Lane | 39225 | M | Brogan | Pyrami | IL | 32 | amberduke@pyrami.com | Duke
+| 6 | Hattie | 671 Bristol Street | 5686 | M | Dante | Netagy | TN | 36 | hattiebond@netagy.com | Bond
+| 13 | Nanette | 789 Madison Street | 32838 | F | Nogal | Quility | VA | 28 | null | Bates
+| 18 | Dale | 467 Hutchinson Court | 4180 | M | Orick | null | MD | 33 | daleadams@boink.com | Adams
+
+**Example 3: Get documents that match a condition**
+
+To get all documents from the `accounts` index that either have `account_number` equal to 1 or have `gender` as `F`, use the following query:
+
+```ppl ignore
+search source=accounts account_number=1 or gender="F";
+```
-Optional arguments are enclosed in square brackets [ ].
\ No newline at end of file
+| account_number | firstname | address | balance | gender | city | employer | state | age | email | lastname |
+:--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :---
+| 1 | Amber | 880 Holmes Lane | 39225 | M | Brogan | Pyrami | IL | 32 | amberduke@pyrami.com | Duke |
+| 13 | Nanette | 789 Madison Street | 32838 | F | Nogal | Quility | VA | 28 | null | Bates |
diff --git a/docs/user/ppl/cmd/table.md b/docs/user/ppl/cmd/table.md
index 176752ebfb..2b821b83eb 100644
--- a/docs/user/ppl/cmd/table.md
+++ b/docs/user/ppl/cmd/table.md
@@ -1,17 +1,20 @@
-# table
+# table

-## Description
-The `table` command is an alias for the [`fields`](fields.md) command and provides the same field selection capabilities. It allows you to keep or remove fields from the search result using enhanced syntax options.
-## Syntax

+The `table` command is an alias for the [`fields`](fields.md) command and provides the same field selection capabilities. It allows you to keep or remove fields from the search results using enhanced syntax options.

-table [+\|-] \
-* [+\|-]: optional. If the plus (+) is used, only the fields specified in the field list will be kept. If the minus (-) is used, all the fields specified in the field list will be removed. **Default:** +.
-* field-list: mandatory. Comma-delimited or space-delimited list of fields to keep or remove. Supports wildcard patterns.
+## Syntax
+
+Use the following syntax:
+
+`table [+|-] <field-list>`
+* `[+|-]`: optional. If the plus (+) is used, only the fields specified in the field list will be kept. If the minus (-) is used, all the fields specified in the field list will be removed. **Default:** +.
+* `field-list`: mandatory. Comma-delimited or space-delimited list of fields to keep or remove. Supports wildcard patterns.
+

## Example 1: Basic table command usage

-This example shows basic field selection using the table command.
+The following example PPL query shows basic field selection using the table command.

```ppl
source=accounts
@@ -32,6 +35,7 @@ fetched rows / total rows = 4/4
+-----------+----------+-----+
```

-## See Also
+
+## See also

- [fields](fields.md) - Alias command with identical functionality
\ No newline at end of file
diff --git a/docs/user/ppl/cmd/timechart.md b/docs/user/ppl/cmd/timechart.md
index da3831c7ae..54c3058ca7 100644
--- a/docs/user/ppl/cmd/timechart.md
+++ b/docs/user/ppl/cmd/timechart.md
@@ -1,13 +1,15 @@
-# timechart
+# timechart

-## Description
The `timechart` command creates a time-based aggregation of data. It groups data by time intervals and optionally by a field, then applies an aggregation function to each group. The results are returned in an unpivoted format with separate rows for each time-field combination.
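+As a hedged illustration of this unpivoted shape (hypothetical hosts `web-1` and `web-2`, with illustrative counts), a query such as `timechart span=1h count() by host` yields one row per time/host pair rather than one column per host:
+
+```text
++---------------------+-------+-------+
+| @timestamp          | host  | count |
+|---------------------+-------+-------|
+| 2024-01-01 00:00:00 | web-1 | 3     |
+| 2024-01-01 00:00:00 | web-2 | 5     |
++---------------------+-------+-------+
+```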
-## Syntax

-timechart [timefield=\] [span=\] [limit=\] [useother=\] \ [by \]
-* timefield: optional. Specifies the timestamp field to use for time interval grouping. **Default**: `@timestamp`.
-* span: optional. Specifies the time interval for grouping data. **Default:** 1m (1 minute).
+## Syntax
+
+Use the following syntax:
+
+`timechart [timefield=<field>] [span=<time-interval>] [limit=<int>] [useother=<bool>] <aggregation_function> [by <field>]`
+* `timefield`: optional. Specifies the timestamp field to use for time interval grouping. **Default**: `@timestamp`.
+* `span`: optional. Specifies the time interval for grouping data. **Default:** 1m (1 minute).
* Available time units:
* millisecond (ms)
* second (s)
@@ -18,56 +20,62 @@ timechart [timefield=\] [span=\] [limit=\]
* month (M, case sensitive)
* quarter (q)
* year (y)
-* limit: optional. Specifies the maximum number of distinct values to display when using the "by" clause. **Default:** 10.
+* `limit`: optional. Specifies the maximum number of distinct values to display when using the "by" clause. **Default:** 10.
* When there are more distinct values than the limit, the additional values are grouped into an "OTHER" category if useother is not set to false.
* The "most distinct" values are determined by calculating the sum of the aggregation values across all time intervals for each distinct field value. The top N values with the highest sums are displayed individually, while the rest are grouped into the "OTHER" category.
* Set to 0 to show all distinct values without any limit (when limit=0, useother is automatically set to false).
* The parameters can be specified in any order before the aggregation function.
* Only applies when using the "by" clause to group results.
-* useother: optional. Controls whether to create an "OTHER" category for values beyond the limit. **Default:** true.
+* `useother`: optional. Controls whether to create an "OTHER" category for values beyond the limit. **Default:** true.
* When set to false, only the top N values (based on limit) are shown without an "OTHER" column.
* When set to true, values beyond the limit are grouped into an "OTHER" category.
* Only applies when using the "by" clause and when there are more distinct values than the limit.
-* usenull: optional. Controls whether NULL values are placed into a separate category in the chart. **Default:** true.
+* `usenull`: optional. Controls whether NULL values are placed into a separate category in the chart. **Default:** true.
* When set to true, NULL values are grouped into a separate category with the label specified by nullstr.
* When set to false, NULL values are excluded from the results.
-* nullstr: optional. The display label used for NULL values when usenull is true. **Default:** "NULL".
+* `nullstr`: optional. The display label used for NULL values when usenull is true. **Default:** "NULL".
* Specifies the string representation for the NULL category in the chart output.
-* aggregation_function: mandatory. The aggregation function to apply to each time bucket.
+* `aggregation_function`: mandatory. The aggregation function to apply to each time bucket.
* Currently, only a single aggregation function is supported.
- * Available functions: All aggregation functions supported by the [stats](stats.md) command, as well as the timechart-specific aggregations listed below.
+ * Available functions: All aggregation functions supported by the [stats](stats.md) command, as well as the timechart-specific aggregations listed in the following sections.
-* by: optional. Groups the results by the specified field in addition to time intervals. If not specified, the aggregation is performed across all documents in each time interval.
+* `by`: optional. Groups the results by the specified field in addition to time intervals. If not specified, the aggregation is performed across all documents in each time interval.
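+Putting these parameters together (a sketch that assumes the `events` index and `host` field used in the later examples):
+
+```ppl ignore
+source=events | timechart span=1h limit=3 useother=false count() by host
+```
+
+This would return hourly counts for the top three hosts only, with no "OTHER" row.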
+
## PER_SECOND

Usage: per_second(field) calculates the per-second rate for a numeric field within each time bucket. The calculation formula is: `per_second(field) = sum(field) / span_in_seconds`, where `span_in_seconds` is the span interval in seconds.

Return type: DOUBLE

+
## PER_MINUTE

Usage: per_minute(field) calculates the per-minute rate for a numeric field within each time bucket. The calculation formula is: `per_minute(field) = sum(field) * 60 / span_in_seconds`, where `span_in_seconds` is the span interval in seconds.

Return type: DOUBLE

+
## PER_HOUR

Usage: per_hour(field) calculates the per-hour rate for a numeric field within each time bucket. The calculation formula is: `per_hour(field) = sum(field) * 3600 / span_in_seconds`, where `span_in_seconds` is the span interval in seconds.

Return type: DOUBLE

+
## PER_DAY

Usage: per_day(field) calculates the per-day rate for a numeric field within each time bucket. The calculation formula is: `per_day(field) = sum(field) * 86400 / span_in_seconds`, where `span_in_seconds` is the span interval in seconds.

Return type: DOUBLE

+
## Notes

* The `timechart` command requires a timestamp field in the data. By default, it uses the `@timestamp` field, but you can specify a different field using the `timefield` parameter.
* Results are returned in an unpivoted format with separate rows for each time-field combination that has data.
* Only combinations with actual data are included in the results - empty combinations are omitted rather than showing null or zero values.
* The "top N" values for the `limit` parameter are selected based on the sum of values across all time intervals for each distinct field value.
* When using the `limit` parameter, values beyond the limit are grouped into an "OTHER" category (unless `useother=false`).
* Examples 6 and 7 use different datasets: Example 6 uses the `events` dataset with fewer hosts for simplicity, while Example 7 uses the `events_many_hosts` dataset with 11 distinct hosts. * **Null values**: Documents with null values in the "by" field are treated as a separate category and appear as null in the results. + ## Example 1: Count events by hour This example counts events for each hour and groups them by host. @@ -89,6 +97,7 @@ fetched rows / total rows = 2/2 +---------------------+---------+---------+ ``` + ## Example 2: Count events by minute This example counts events for each minute and groups them by host. @@ -116,6 +125,7 @@ fetched rows / total rows = 8/8 +---------------------+---------+---------+ ``` + ## Example 3: Calculate average number of packets by minute This example calculates the average packets for each minute without grouping by any field. @@ -143,6 +153,7 @@ fetched rows / total rows = 8/8 +---------------------+--------------+ ``` + ## Example 4: Calculate average number of packets by every 20 minutes and status This example calculates the average number of packets for every 20 minutes and groups them by status. @@ -170,6 +181,7 @@ fetched rows / total rows = 8/8 +---------------------+------------+--------------+ ``` + ## Example 5: Count events by hour and category This example counts events for each second and groups them by category @@ -191,6 +203,7 @@ fetched rows / total rows = 2/2 +---------------------+----------+---------+ ``` + ## Example 6: Using the limit parameter with count() function When there are many distinct values in the "by" field, the timechart command will display the top values based on the limit parameter and group the rest into an "OTHER" category. 
@@ -219,6 +232,7 @@ fetched rows / total rows = 8/8 +---------------------+---------+---------+ ``` + ## Example 7: Using limit=0 with count() to show all values To display all distinct values without any limit, set limit=0: @@ -250,6 +264,7 @@ fetched rows / total rows = 11/11 ``` This shows all 11 hosts as separate rows without an "OTHER" category. + ## Example 8: Using useother=false with count() function Limit to top 10 hosts without OTHER category (useother=false): @@ -279,6 +294,7 @@ fetched rows / total rows = 10/10 +---------------------+--------+---------+ ``` + ## Example 9: Using limit with useother parameter and avg() function Limit to top 3 hosts with OTHER category (default useother=true): @@ -322,9 +338,10 @@ fetched rows / total rows = 3/3 +---------------------+--------+----------------+ ``` + ## Example 10: Handling null values in the "by" field -This example shows how null values in the "by" field are treated as a separate category. The dataset events_null has 1 entry that does not have a host field. +The following example PPL query shows how null values in the "by" field are treated as a separate category. The dataset events_null has 1 entry that does not have a host field. It is put into a separate "NULL" category because the defaults for `usenull` and `nullstr` are `true` and `"NULL"` respectively. ```ppl @@ -346,6 +363,7 @@ fetched rows / total rows = 4/4 +---------------------+--------+---------+ ``` + ## Example 11: Calculate packets per second rate This example calculates the per-second packet rate for network traffic data using the per_second() function. @@ -369,6 +387,7 @@ fetched rows / total rows = 4/4 +---------------------+---------+---------------------+ ``` + ## Limitations * Only a single aggregation function is supported per timechart command. 
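The four rate functions above share one pattern: sum the field within each bucket, then rescale by the bucket's span in seconds. The following Python sketch restates the documented formulas (illustrative only, not the engine implementation):

```python
# Illustrative sketch of the timechart rate functions, using the formulas
# documented above: each rate is the sum of the field over a bucket,
# rescaled from the bucket's span in seconds.

def per_second(values, span_in_seconds):
    # per_second(field) = sum(field) / span_in_seconds
    return sum(values) / span_in_seconds

def per_minute(values, span_in_seconds):
    # per_minute(field) = sum(field) * 60 / span_in_seconds
    return sum(values) * 60 / span_in_seconds

def per_hour(values, span_in_seconds):
    # per_hour(field) = sum(field) * 3600 / span_in_seconds
    return sum(values) * 3600 / span_in_seconds

def per_day(values, span_in_seconds):
    # per_day(field) = sum(field) * 86400 / span_in_seconds
    return sum(values) * 86400 / span_in_seconds

# Packet counts 120, 180, and 300 falling into one 1-minute bucket (span=60s):
bucket = [120, 180, 300]
print(per_second(bucket, 60))   # 10.0
print(per_minute(bucket, 60))   # 600.0
```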
diff --git a/docs/user/ppl/cmd/top.md b/docs/user/ppl/cmd/top.md index fa644f2a11..b5b4af0ca6 100644 --- a/docs/user/ppl/cmd/top.md +++ b/docs/user/ppl/cmd/top.md @@ -1,21 +1,24 @@ -# top +# top -## Description The `top` command finds the most common tuple of values of all fields in the field list. -## Syntax -top [N] [top-options] \ [by-clause] -* N: optional. number of results to return. **Default**: 10 -* top-options: optional. options for the top command. Supported syntax is [countfield=\] [showcount=\]. +## Syntax + +Use the following syntax: + +`top [N] [top-options] <field-list> [by-clause]` +* `N`: optional. number of results to return. **Default**: 10 +* `top-options`: optional. options for the top command. Supported syntax is `[countfield=<string>] [showcount=<bool>]`. * showcount=<bool>: optional. whether to create a field in output that represents a count of the tuple of values. **Default:** true. * countfield=<string>: optional. the name of the field that contains count. **Default:** 'count'. * usenull=<bool>: optional (since 3.4.0). whether to output the null value. **Default:** Determined by `plugins.ppl.syntax.legacy.preferred`. * When `plugins.ppl.syntax.legacy.preferred=true`, `usenull` defaults to `true` * When `plugins.ppl.syntax.legacy.preferred=false`, `usenull` defaults to `false` -* field-list: mandatory. comma-delimited list of field names. -* by-clause: optional. one or more fields to group the results by. +* `field-list`: mandatory. comma-delimited list of field names. +* `by-clause`: optional. one or more fields to group the results by. + ## Example 1: Find the most common values in a field This example finds the most common gender of all the accounts. @@ -37,6 +40,7 @@ fetched rows / total rows = 2/2 +--------+ ``` + ## Example 2: Limit results to top N values This example finds the most common gender and limits results to 1 value. 
@@ -57,6 +61,7 @@ fetched rows / total rows = 1/1 +--------+ ``` + ## Example 3: Find the most common values grouped by field This example finds the most common age of all the accounts grouped by gender. @@ -78,6 +83,7 @@ fetched rows / total rows = 2/2 +--------+-----+ ``` + ## Example 4: Top command with count field This example finds the most common gender of all the accounts and includes the count. @@ -99,6 +105,7 @@ fetched rows / total rows = 2/2 +--------+-------+ ``` + ## Example 5: Specify the count field option This example specifies a custom name for the count field. @@ -120,6 +127,7 @@ fetched rows / total rows = 2/2 +--------+-----+ ``` + ## Example 6: Specify the usenull field option ```ppl @@ -159,6 +167,7 @@ fetched rows / total rows = 4/4 +-----------------------+-------+ ``` + ## Limitations -The `top` command is not rewritten to OpenSearch DSL, it is only executed on the coordination node. \ No newline at end of file +The `top` command is not rewritten to [query domain-specific language (DSL)](https://opensearch.org/docs/latest/query-dsl/index/). It is only run on the coordinating node. \ No newline at end of file diff --git a/docs/user/ppl/cmd/trendline.md b/docs/user/ppl/cmd/trendline.md index ff4c2fcef3..2b98c02ba3 100644 --- a/docs/user/ppl/cmd/trendline.md +++ b/docs/user/ppl/cmd/trendline.md @@ -1,21 +1,24 @@ -# trendline +# trendline -## Description The `trendline` command calculates moving averages of fields. -## Syntax -trendline [sort <[+\|-] sort-field>] \[sma\|wma\](number-of-datapoints, field) [as \] [\[sma\|wma\](number-of-datapoints, field) [as \]]... -* [+\|-]: optional. The plus [+] stands for ascending order and NULL/MISSING first and a minus [-] stands for descending order and NULL/MISSING last. **Default:** ascending order and NULL/MISSING first. -* sort-field: mandatory when sorting is used. The field used to sort. -* sma\|wma: mandatory. 
Simple Moving Average (sma) applies equal weighting to all values, Weighted Moving Average (wma) applies greater weight to more recent values. -number-of-datapoints: mandatory. The number of datapoints to calculate the moving average (must be greater than zero). -* field: mandatory. The name of the field the moving average should be calculated for. -* alias: optional. The name of the resulting column containing the moving average. **Default:** field name with "_trendline". - -## Example 1: Calculate the simple moving average on one field. +## Syntax -This example shows how to calculate the simple moving average on one field. +Use the following syntax: + +`trendline [sort <[+|-] sort-field>] [sma|wma](number-of-datapoints, field) [as <alias>] [[sma|wma](number-of-datapoints, field) [as <alias>]]...` +* `[+|-]`: optional. The plus sign [+] specifies ascending order with NULL/MISSING values first, and the minus sign [-] specifies descending order with NULL/MISSING values last. **Default:** ascending order and NULL/MISSING first. +* `sort-field`: mandatory when sorting is used. The field used to sort. +* `sma|wma`: mandatory. Simple Moving Average (sma) applies equal weighting to all values, while Weighted Moving Average (wma) applies greater weight to more recent values. +* `number-of-datapoints`: mandatory. The number of datapoints to calculate the moving average (must be greater than zero). +* `field`: mandatory. The name of the field the moving average should be calculated for. +* `alias`: optional. The name of the resulting column containing the moving average. **Default:** field name with the "_trendline" suffix. + + +## Example 1: Calculate the simple moving average on one field + +The following example PPL query shows how to use `trendline` to calculate the simple moving average on one field. ```ppl source=accounts @@ -37,9 +40,10 @@ fetched rows / total rows = 4/4 +------+ ``` -## Example 2: Calculate the simple moving average on multiple fields. 
-This example shows how to calculate the simple moving average on multiple fields. +## Example 2: Calculate the simple moving average on multiple fields + +The following example PPL query shows how to use `trendline` to calculate the simple moving average on multiple fields. ```ppl source=accounts @@ -61,9 +65,10 @@ fetched rows / total rows = 4/4 +------+-----------+ ``` -## Example 3: Calculate the simple moving average on one field without specifying an alias. -This example shows how to calculate the simple moving average on one field. +## Example 3: Calculate the simple moving average on one field without specifying an alias + +The following example PPL query shows how to use `trendline` to calculate the simple moving average on one field. ```ppl source=accounts @@ -85,9 +90,10 @@ fetched rows / total rows = 4/4 +--------------------------+ ``` -## Example 4: Calculate the weighted moving average on one field. -This example shows how to calculate the weighted moving average on one field. +## Example 4: Calculate the weighted moving average on one field + +The following example PPL query shows how to use `trendline` to calculate the weighted moving average on one field. ```ppl source=accounts @@ -109,6 +115,7 @@ fetched rows / total rows = 4/4 +--------------------------+ ``` + ## Limitations The `trendline` command requires all values in the specified `field` to be non-null. Any rows with null values present in the calculation field will be automatically excluded from the command's output. \ No newline at end of file diff --git a/docs/user/ppl/cmd/where.md b/docs/user/ppl/cmd/where.md index 6d87ba4946..ac2e925492 100644 --- a/docs/user/ppl/cmd/where.md +++ b/docs/user/ppl/cmd/where.md @@ -1,16 +1,19 @@ -# where +# where -## Description -The `where` command filters the search result. The `where` command only returns the result when the bool-expression evaluates to true. -## Syntax +The `where` command filters the search results. 
The `where` command only returns the result when the bool-expression evaluates to true. -where \ -* bool-expression: optional. Any expression which could be evaluated to boolean value. +## Syntax + +Use the following syntax: + +`where <bool-expression>` +* `bool-expression`: optional. Any expression that can be evaluated to a Boolean value. -## Example 1: Filter result set with condition -This example shows fetching all the documents from the accounts index where account_number is 1 or gender is "F". +## Example 1: Filter search results with condition + +The following example PPL query shows how to use `where` to fetch all the documents from the accounts index where account_number is 1 or gender is "F". ```ppl source=accounts @@ -30,9 +33,10 @@ fetched rows / total rows = 2/2 +----------------+--------+ ``` -## Example 2: Basic Field Comparison -The example shows how to filter accounts with balance greater than 30000. +## Example 2: Basic field comparison + +The following example PPL query shows how to use `where` to filter accounts with balance greater than 30000. ```ppl source=accounts @@ -52,10 +56,11 @@ fetched rows / total rows = 2/2 +----------------+---------+ ``` -## Example 3: Pattern Matching with LIKE + +## Example 3: Pattern matching with LIKE Pattern Matching with Underscore (\_) -The example demonstrates using LIKE with underscore (\_) to match a single character. +The following example PPL query demonstrates using LIKE with underscore (\_) to match a single character. ```ppl source=accounts @@ -75,7 +80,7 @@ fetched rows / total rows = 1/1 ``` Pattern Matching with Percent (%) -The example demonstrates using LIKE with percent (%) to match multiple characters. +The following example PPL query demonstrates using LIKE with percent (%) to match multiple characters. 
```ppl source=accounts @@ -94,9 +99,10 @@ fetched rows / total rows = 1/1 +----------------+-------+ ``` -## Example 4: Multiple Conditions -The example shows how to combine multiple conditions using AND operator. +## Example 4: Multiple conditions + +The following example PPL query shows how to combine multiple conditions using the AND operator. ```ppl source=accounts @@ -117,9 +123,10 @@ fetched rows / total rows = 3/3 +----------------+-----+--------+ ``` -## Example 5: Using IN Operator -The example demonstrates using IN operator to match multiple values. +## Example 5: Using IN operator + +The following example PPL query demonstrates using the IN operator to match multiple values. ```ppl source=accounts @@ -139,9 +146,10 @@ fetched rows / total rows = 2/2 +----------------+-------+ ``` + ## Example 6: NULL Checks -The example shows how to filter records with NULL values. +The following example PPL query shows how to filter records with NULL values. ```ppl source=accounts @@ -160,9 +168,10 @@ fetched rows / total rows = 1/1 +----------------+----------+ ``` -## Example 7: Complex Conditions -The example demonstrates combining multiple conditions with parentheses and logical operators. +## Example 7: Complex conditions + +The following example PPL query demonstrates combining multiple conditions with parentheses and logical operators. ```ppl source=accounts @@ -181,9 +190,10 @@ fetched rows / total rows = 1/1 +----------------+---------+-----+--------+ ``` -## Example 8: NOT Conditions -The example shows how to use NOT operator to exclude matching records. +## Example 8: NOT conditions + +The following example PPL query shows how to use the NOT operator to exclude matching records. 
```ppl source=accounts diff --git a/docs/user/ppl/functions/aggregations.md b/docs/user/ppl/functions/aggregations.md index b2cabef985..c4a84e3114 100644 --- a/docs/user/ppl/functions/aggregations.md +++ b/docs/user/ppl/functions/aggregations.md @@ -24,7 +24,7 @@ The following table shows how NULL/MISSING values are handled by aggregation fun #### Description Usage: Returns a count of the number of expr in the rows retrieved. The `C()` function, `c`, and `count` can be used as abbreviations for `COUNT()`. To perform a filtered counting, wrap the condition to satisfy in an `eval` expression. -Example +### Example ```ppl source=accounts @@ -64,8 +64,8 @@ fetched rows / total rows = 1/1 #### Description -Usage: SUM(expr). Returns the sum of expr. -Example +Usage: `SUM(expr)`. Returns the sum of expr. +### Example ```ppl source=accounts @@ -88,8 +88,8 @@ fetched rows / total rows = 2/2 #### Description -Usage: AVG(expr). Returns the average value of expr. -Example +Usage: `AVG(expr)`. Returns the average value of expr. +### Example ```ppl source=accounts @@ -112,9 +112,9 @@ fetched rows / total rows = 2/2 #### Description -Usage: MAX(expr). Returns the maximum value of expr. +Usage: `MAX(expr)`. Returns the maximum value of expr. For non-numeric fields, values are sorted lexicographically. -Example +### Example ```ppl source=accounts @@ -154,9 +154,9 @@ fetched rows / total rows = 1/1 #### Description -Usage: MIN(expr). Returns the minimum value of expr. +Usage: `MIN(expr)`. Returns the minimum value of expr. For non-numeric fields, values are sorted lexicographically. -Example +### Example ```ppl source=accounts @@ -196,8 +196,8 @@ fetched rows / total rows = 1/1 #### Description -Usage: VAR_SAMP(expr). Returns the sample variance of expr. -Example +Usage: `VAR_SAMP(expr)`. Returns the sample variance of expr. +### Example ```ppl source=accounts @@ -219,8 +219,8 @@ fetched rows / total rows = 1/1 #### Description -Usage: VAR_POP(expr). 
Returns the population standard variance of expr. -Example +Usage: `VAR_POP(expr)`. Returns the population variance of expr. +### Example ```ppl source=accounts @@ -242,8 +242,8 @@ fetched rows / total rows = 1/1 #### Description -Usage: STDDEV_SAMP(expr). Return the sample standard deviation of expr. -Example +Usage: `STDDEV_SAMP(expr)`. Returns the sample standard deviation of expr. +### Example ```ppl source=accounts @@ -265,8 +265,8 @@ fetched rows / total rows = 1/1 #### Description -Usage: STDDEV_POP(expr). Return the population standard deviation of expr. -Example +Usage: `STDDEV_POP(expr)`. Returns the population standard deviation of expr. +### Example ```ppl source=accounts @@ -288,9 +288,9 @@ fetched rows / total rows = 1/1 #### Description -Usage: DISTINCT_COUNT(expr), DC(expr). Returns the approximate number of distinct values using the HyperLogLog++ algorithm. Both functions are equivalent. +Usage: `DISTINCT_COUNT(expr)`, `DC(expr)`. Returns the approximate number of distinct values using the HyperLogLog++ algorithm. Both functions are equivalent. For details on algorithm accuracy and precision control, see the [OpenSearch Cardinality Aggregation documentation](https://docs.opensearch.org/latest/aggregations/metric/cardinality/#controlling-precision). -Example +### Example ```ppl source=accounts @@ -313,8 +313,8 @@ fetched rows / total rows = 2/2 #### Description -Usage: DISTINCT_COUNT_APPROX(expr). Return the approximate distinct count value of the expr, using the hyperloglog++ algorithm. -Example +Usage: `DISTINCT_COUNT_APPROX(expr)`. Returns the approximate distinct count of expr using the HyperLogLog++ algorithm. +### Example ```ppl source=accounts @@ -336,11 +336,11 @@ fetched rows / total rows = 1/1 #### Description -Usage: EARLIEST(field [, time_field]). Return the earliest value of a field based on timestamp ordering. -* field: mandatory. The field to return the earliest value for. -* time_field: optional. 
The field to use for time-based ordering. Defaults to @timestamp if not specified. +Usage: `EARLIEST(field [, time_field])`. Returns the earliest value of a field based on timestamp ordering. +* `field`: mandatory. The field to return the earliest value for. +* `time_field`: optional. The field to use for time-based ordering. Defaults to @timestamp if not specified. -Example +### Example ```ppl source=events @@ -384,11 +384,11 @@ fetched rows / total rows = 2/2 #### Description -Usage: LATEST(field [, time_field]). Return the latest value of a field based on timestamp ordering. -* field: mandatory. The field to return the latest value for. -* time_field: optional. The field to use for time-based ordering. Defaults to @timestamp if not specified. +Usage: `LATEST(field [, time_field])`. Returns the latest value of a field based on timestamp ordering. +* `field`: mandatory. The field to return the latest value for. +* `time_field`: optional. The field to use for time-based ordering. Defaults to @timestamp if not specified. -Example +### Example ```ppl source=events @@ -432,11 +432,11 @@ fetched rows / total rows = 2/2 #### Description -Usage: TAKE(field [, size]). Return original values of a field. It does not guarantee on the order of values. -* field: mandatory. The field must be a text field. -* size: optional integer. The number of values should be returned. Default is 10. +Usage: `TAKE(field [, size])`. Returns the original values of a field. The order of values is not guaranteed. +* `field`: mandatory. The field must be a text field. +* `size`: optional integer. The number of values to return. Default is 10. -Example +### Example ```ppl source=accounts @@ -458,11 +458,11 @@ fetched rows / total rows = 1/1 #### Description -Usage: PERCENTILE(expr, percent) or PERCENTILE_APPROX(expr, percent). Return the approximate percentile value of expr at the specified percentage. -* percent: The number must be a constant between 0 and 100. 
+Usage: `PERCENTILE(expr, percent)` or `PERCENTILE_APPROX(expr, percent)`. Returns the approximate percentile value of expr at the specified percentage. +* `percent`: The number must be a constant between 0 and 100. Note: As of 3.1.0, the percentile implementation switched from AVLTreeDigest to MergingDigest. Ref [issue link](https://github.com/opensearch-project/OpenSearch/issues/18122). -Example +### Example ```ppl source=accounts @@ -525,8 +525,8 @@ fetched rows / total rows = 1/1 #### Description -Usage: MEDIAN(expr). Returns the median (50th percentile) value of `expr`. This is equivalent to `PERCENTILE(expr, 50)`. -Example +Usage: `MEDIAN(expr)`. Returns the median (50th percentile) value of `expr`. This is equivalent to `PERCENTILE(expr, 50)`. +### Example ```ppl source=accounts @@ -548,10 +548,10 @@ fetched rows / total rows = 1/1 #### Description -Usage: FIRST(field). Return the first non-null value of a field based on natural document order. Returns NULL if no records exist, or if all records have NULL values for the field. -* field: mandatory. The field to return the first value for. +Usage: `FIRST(field)`. Returns the first non-null value of a field based on natural document order. Returns NULL if no records exist, or if all records have NULL values for the field. +* `field`: mandatory. The field to return the first value for. -Example +### Example ```ppl source=accounts @@ -574,10 +574,10 @@ fetched rows / total rows = 2/2 #### Description -Usage: LAST(field). Return the last non-null value of a field based on natural document order. Returns NULL if no records exist, or if all records have NULL values for the field. -* field: mandatory. The field to return the last value for. +Usage: `LAST(field)`. Returns the last non-null value of a field based on natural document order. Returns NULL if no records exist, or if all records have NULL values for the field. +* `field`: mandatory. The field to return the last value for. 
-Example +### Example ```ppl source=accounts @@ -600,9 +600,9 @@ fetched rows / total rows = 2/2 #### Description -Usage: LIST(expr). Collects all values from the specified expression into an array. Values are converted to strings, nulls are filtered, and duplicates are preserved. +Usage: `LIST(expr)`. Collects all values from the specified expression into an array. Values are converted to strings, nulls are filtered, and duplicates are preserved. The function returns up to 100 values with no guaranteed ordering. -* expr: The field expression to collect values from. +* `expr`: The field expression to collect values from. * This aggregation function doesn't support Array, Struct, Object field types. Example with string fields @@ -627,7 +627,7 @@ fetched rows / total rows = 1/1 #### Description -Usage: VALUES(expr). Collects all unique values from the specified expression into a sorted array. Values are converted to strings, nulls are filtered, and duplicates are removed. +Usage: `VALUES(expr)`. Collects all unique values from the specified expression into a sorted array. Values are converted to strings, nulls are filtered, and duplicates are removed. The maximum number of unique values returned is controlled by the `plugins.ppl.values.max.limit` setting: * Default value is 0, which means unlimited values are returned * Can be configured to any positive integer to limit the number of unique values diff --git a/docs/user/ppl/functions/collection.md b/docs/user/ppl/functions/collection.md index c37f8390dd..ca9f7015c1 100644 --- a/docs/user/ppl/functions/collection.md +++ b/docs/user/ppl/functions/collection.md @@ -5,9 +5,9 @@ ### Description Usage: `array(value1, value2, value3...)` create an array with input values. Currently we don't allow mixture types. We will infer a least restricted type, for example `array(1, "demo")` -> ["1", "demo"] -Argument type: value1: ANY, value2: ANY, ... 
-Return type: ARRAY -Example +**Argument type:** `value1: ANY, value2: ANY, ...` +**Return type:** `ARRAY` +### Example ```ppl source=people @@ -50,9 +50,9 @@ fetched rows / total rows = 1/1 ``` ### Description Usage: `array_length(array)` returns the length of the input array. -Argument type: array:ARRAY -Return type: INTEGER -Example +**Argument type:** `array:ARRAY` +**Return type:** `INTEGER` +### Example ```ppl source=people @@ -78,9 +78,9 @@ fetched rows / total rows = 1/1 ### Description Usage: `forall(array, function)` checks whether all elements of the array satisfy the lambda function. The lambda function must return a Boolean and accepts a single input. -Argument type: array:ARRAY, function:LAMBDA -Return type: BOOLEAN -Example +**Argument type:** `array:ARRAY, function:LAMBDA` +**Return type:** `BOOLEAN` +### Example ```ppl source=people @@ -105,9 +105,9 @@ fetched rows / total rows = 1/1 ### Description Usage: `exists(array, function)` checks whether at least one element of the array satisfies the lambda function. The lambda function must return a Boolean and accepts a single input. -Argument type: array:ARRAY, function:LAMBDA -Return type: BOOLEAN -Example +**Argument type:** `array:ARRAY, function:LAMBDA` +**Return type:** `BOOLEAN` +### Example ```ppl source=people @@ -132,9 +132,9 @@ fetched rows / total rows = 1/1 ### Description Usage: `filter(array, function)` filters the elements of the array using the lambda function. The lambda function must return a Boolean and accepts a single input. -Argument type: array:ARRAY, function:LAMBDA -Return type: ARRAY -Example +**Argument type:** `array:ARRAY, function:LAMBDA` +**Return type:** `ARRAY` +### Example ```ppl source=people @@ -159,9 +159,9 @@ fetched rows / total rows = 1/1 ### Description Usage: `transform(array, function)` transforms the elements of the array one by one using the lambda function. The lambda function can accept one or two inputs. 
If the lambda accepts two arguments, the second one is the index of the element in the array. -Argument type: array:ARRAY, function:LAMBDA -Return type: ARRAY -Example +**Argument type:** `array:ARRAY, function:LAMBDA` +**Return type:** `ARRAY` +### Example ```ppl source=people @@ -204,9 +204,9 @@ fetched rows / total rows = 1/1 ### Description Usage: `reduce(array, acc_base, function [, reduce_function])` applies the lambda function to each element of the array, accumulating a result that starts from acc_base. The lambda function accepts two arguments: the accumulator and the array element. If a reduce_function is also provided, it is applied to the accumulator at the end. The reduce function accepts the accumulator as its single argument. -Argument type: array:ARRAY, acc_base:ANY, function:LAMBDA, reduce_function:LAMBDA -Return type: ANY -Example +**Argument type:** `array:ARRAY, acc_base:ANY, function:LAMBDA, reduce_function:LAMBDA` +**Return type:** `ANY` +### Example ```ppl source=people @@ -248,10 +248,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: mvjoin(array, delimiter) joins string array elements into a single string, separated by the specified delimiter. NULL elements are excluded from the output. Only string arrays are supported. -Argument type: array: ARRAY of STRING, delimiter: STRING -Return type: STRING -Example +Usage: `mvjoin(array, delimiter)` joins string array elements into a single string, separated by the specified delimiter. NULL elements are excluded from the output. Only string arrays are supported. +**Argument type:** `array: ARRAY of STRING, delimiter: STRING` +**Return type:** `STRING` +### Example ```ppl source=people @@ -294,10 +294,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: mvappend(value1, value2, value3...) appends all elements from arguments to create an array. Flattens array arguments and collects all individual elements. Always returns an array or null for consistent type behavior. -Argument type: value1: ANY, value2: ANY, ... 
-Return type: ARRAY -Example +Usage: `mvappend(value1, value2, value3...)` appends all elements from arguments to create an array. Flattens array arguments and collects all individual elements. Always returns an array or null for consistent type behavior. +**Argument type:** `value1: ANY, value2: ANY, ...` +**Return type:** `ARRAY` +### Example ```ppl source=people @@ -465,11 +465,11 @@ fetched rows / total rows = 1/1 ### Description -Usage: split(str, delimiter) splits the string values on the delimiter and returns the string values as a multivalue field (array). Use an empty string ("") to split the original string into one value per character. If the delimiter is not found, returns an array containing the original string. If the input string is empty, returns an empty array. +Usage: `split(str, delimiter)` splits the string values on the delimiter and returns the string values as a multivalue field (array). Use an empty string ("") to split the original string into one value per character. If the delimiter is not found, returns an array containing the original string. If the input string is empty, returns an empty array. -Argument type: str: STRING, delimiter: STRING +**Argument type:** `str: STRING, delimiter: STRING` -Return type: ARRAY of STRING +**Return type:** `ARRAY of STRING` ### Example @@ -567,10 +567,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: mvdedup(array) removes duplicate values from a multivalue array while preserving the order of first occurrence. NULL elements are filtered out. Returns an array with duplicates removed, or null if the input is null. -Argument type: array: ARRAY -Return type: ARRAY -Example +Usage: `mvdedup(array)` removes duplicate values from a multivalue array while preserving the order of first occurrence. NULL elements are filtered out. Returns an array with duplicates removed, or null if the input is null. 
+**Argument type:** `array: ARRAY` +**Return type:** `ARRAY` +### Example ```ppl source=people @@ -711,10 +711,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: mvindex(array, start, [end]) returns a subset of the multivalue array using the start and optional end index values. Indexes are 0-based (first element is at index 0). Supports negative indexing where -1 refers to the last element. When only start is provided, returns a single element. When both start and end are provided, returns an array of elements from start to end (inclusive). -Argument type: array: ARRAY, start: INTEGER, end: INTEGER (optional) -Return type: ANY (single element) or ARRAY (range) -Example +Usage: `mvindex(array, start, [end])` returns a subset of the multivalue array using the start and optional end index values. Indexes are 0-based (first element is at index 0). Supports negative indexing where -1 refers to the last element. When only start is provided, returns a single element. When both start and end are provided, returns an array of elements from start to end (inclusive). +**Argument type:** `array: ARRAY, start: INTEGER, end: INTEGER (optional)` +**Return type:** `ANY (single element) or ARRAY (range)` +### Example ```ppl source=people @@ -878,7 +878,7 @@ fetched rows / total rows = 1/1 ### Description -Usage: mvzip(mv_left, mv_right, [delim]) combines the values in two multivalue arrays by pairing corresponding elements and joining them into strings. The delimiter is used to specify a delimiting character to join the two values. This is similar to the Python zip command. +Usage: `mvzip(mv_left, mv_right, [delim])` combines the values in two multivalue arrays by pairing corresponding elements and joining them into strings. The delimiter is used to specify a delimiting character to join the two values. This is similar to the Python zip command. 
The values are stitched together combining the first value of mv_left with the first value of mv_right, then the second with the second, and so on. Each pair is concatenated into a string using the delimiter. The function stops at the length of the shorter array.

@@ -886,9 +886,9 @@ The delimiter is optional. When specified, it must be enclosed in quotation mark

Returns null if either input is null. Returns an empty array if either input array is empty.

-Argument type: mv_left: ARRAY, mv_right: ARRAY, delim: STRING (optional)
-Return type: ARRAY of STRING
-Example
+**Argument type:** `mv_left: ARRAY, mv_right: ARRAY, delim: STRING (optional)`
+**Return type:** `ARRAY of STRING`
+### Example

```ppl
source=people
diff --git a/docs/user/ppl/functions/condition.md b/docs/user/ppl/functions/condition.md
index 8d65680fcd..2998ebf3a7 100644
--- a/docs/user/ppl/functions/condition.md
+++ b/docs/user/ppl/functions/condition.md
@@ -1,18 +1,22 @@
 # Condition Functions

+Condition functions evaluate whether field values satisfy specific conditions, such as being null, empty, or matching a pattern. They are commonly used in `where` clauses and `eval` expressions to filter or transform query results.
+The following sections describe the condition PPL functions.
 ## ISNULL

 ### Description

-Usage: isnull(field) returns TRUE if field is NULL, FALSE otherwise.
+Usage: `isnull(field)` returns TRUE if field is NULL, FALSE otherwise.
+
 The `isnull()` function is commonly used:
 - In `eval` expressions to create conditional fields
 - With the `if()` function to provide default values
 - In `where` clauses to filter null records
-Argument type: all the supported data types.
-Return type: BOOLEAN
-Example
+**Argument type:** All supported data types.
+**Return type:** `BOOLEAN` + +### Example ```ppl source=accounts @@ -79,17 +83,19 @@ fetched rows / total rows = 1/1 ### Description -Usage: isnotnull(field) returns TRUE if field is NOT NULL, FALSE otherwise. +Usage: `isnotnull(field)` returns TRUE if field is NOT NULL, FALSE otherwise. The `isnotnull(field)` function is the opposite of `isnull(field)`. Instead of checking for null values, it checks a specific field and returns `true` if the field contains data, that is, it is not null. + The `isnotnull()` function is commonly used: - In `eval` expressions to create boolean flags - In `where` clauses to filter out null values - With the `if()` function for conditional logic - To validate data presence -Argument type: all the supported data types. -Return type: BOOLEAN -Synonyms: [ISPRESENT](#ispresent) -Example +**Argument type:** All supported data types. +**Return type:** `BOOLEAN` +**Synonyms:** [ISPRESENT](#ispresent) + +### Example ```ppl source=accounts @@ -178,10 +184,12 @@ fetched rows / total rows = 1/1 ### Description -Usage: ifnull(field1, field2) returns field2 if field1 is null. -Argument type: all the supported data types (NOTE : if two parameters have different types, you will fail semantic check). -Return type: any -Example +Usage: `ifnull(field1, field2)` returns field2 if field1 is null. + +**Argument type:** All supported data types (NOTE: if two parameters have different types, you will fail semantic check). +**Return type:** `any` + +### Example ```ppl source=accounts @@ -206,8 +214,8 @@ fetched rows / total rows = 4/4 ### Nested IFNULL Pattern For OpenSearch versions prior to 3.1, COALESCE-like functionality can be achieved using nested IFNULL statements. This pattern is particularly useful in observability use cases where field names may vary across different data sources. 
-Usage: ifnull(field1, ifnull(field2, ifnull(field3, default_value)))
-Example
+Usage: `ifnull(field1, ifnull(field2, ifnull(field3, default_value)))`
+### Example

```ppl
source=accounts
@@ -233,10 +241,12 @@ fetched rows / total rows = 4/4

 ### Description

-Usage: nullif(field1, field2) returns null if two parameters are same, otherwise returns field1.
-Argument type: all the supported data types (NOTE : if two parameters have different types, you will fail semantic check).
-Return type: any
-Example
+Usage: `nullif(field1, field2)` returns null if the two parameters are the same, otherwise returns field1.
+
+**Argument type:** All supported data types (NOTE: if two parameters have different types, you will fail semantic check).
+**Return type:** `any`
+
+### Example

```ppl
source=accounts
@@ -262,10 +272,12 @@ fetched rows / total rows = 4/4

 ### Description

-Usage: if(condition, expr1, expr2) returns expr1 if condition is true, otherwise returns expr2.
-Argument type: all the supported data types (NOTE : if expr1 and expr2 are different types, you will fail semantic check).
-Return type: any
-Example
+Usage: `if(condition, expr1, expr2)` returns expr1 if condition is true, otherwise returns expr2.
+
+**Argument type:** All supported data types (NOTE: if expr1 and expr2 are different types, you will fail semantic check).
+**Return type:** `any`
+
+### Example

```ppl
source=accounts
@@ -331,16 +343,18 @@ fetched rows / total rows = 4/4

 ### Description

-Usage: case(condition1, expr1, condition2, expr2, ... conditionN, exprN else default) returns expr1 if condition1 is true, or returns expr2 if condition2 is true, ... if no condition is true, then returns the value of ELSE clause. If the ELSE clause is not defined, returns NULL.
-Argument type: all the supported data types (NOTE : there is no comma before "else").
-Return type: any
+Usage: `case(condition1, expr1, condition2, expr2, ...
conditionN, exprN else default)` returns expr1 if condition1 is true, or returns expr2 if condition2 is true, ... if no condition is true, then returns the value of ELSE clause. If the ELSE clause is not defined, returns NULL. + +**Argument type:** All supported data types (NOTE: there is no comma before "else"). +**Return type:** `any` + ### Limitations When each condition is a field comparison with a numeric literal and each result expression is a string literal, the query will be optimized as [range aggregations](https://docs.opensearch.org/latest/aggregations/bucket/range) if pushdown optimization is enabled. However, this optimization has the following limitations: - Null values will not be grouped into any bucket of a range aggregation and will be ignored - The default ELSE clause will use the string literal `"null"` instead of actual NULL values -Example +### Example ```ppl source=accounts @@ -404,9 +418,10 @@ fetched rows / total rows = 2/2 ### Description -Usage: coalesce(field1, field2, ...) returns the first non-null, non-missing value in the argument list. -Argument type: all the supported data types. Supports mixed data types with automatic type coercion. -Return type: determined by the least restrictive common type among all arguments, with fallback to string if no common type can be determined +Usage: `coalesce(field1, field2, ...)` returns the first non-null, non-missing value in the argument list. + +**Argument type:** All supported data types. Supports mixed data types with automatic type coercion. +**Return type:** Determined by the least restrictive common type among all arguments, with fallback to string if no common type can be determined. 
Behavior: - Returns the first value that is not null and not missing (missing includes non-existent fields) - Empty strings ("") and whitespace strings (" ") are considered valid values @@ -424,7 +439,7 @@ Limitations: - Type coercion may result in unexpected string conversions for incompatible types - Performance may degrade with very large numbers of arguments -Example +### Example ```ppl source=accounts @@ -537,11 +552,13 @@ fetched rows / total rows = 4/4 ### Description -Usage: ispresent(field) returns true if the field exists. -Argument type: all the supported data types. -Return type: BOOLEAN -Synonyms: [ISNOTNULL](#isnotnull) -Example +Usage: `ispresent(field)` returns true if the field exists. + +**Argument type:** All supported data types. +**Return type:** `BOOLEAN` +**Synonyms:** [ISNOTNULL](#isnotnull) + +### Example ```ppl source=accounts @@ -566,10 +583,12 @@ fetched rows / total rows = 3/3 ### Description -Usage: isblank(field) returns true if the field is null, an empty string, or contains only white space. -Argument type: all the supported data types. -Return type: BOOLEAN -Example +Usage: `isblank(field)` returns true if the field is null, an empty string, or contains only white space. + +**Argument type:** All supported data types. +**Return type:** `BOOLEAN` + +### Example ```ppl source=accounts @@ -596,10 +615,12 @@ fetched rows / total rows = 4/4 ### Description -Usage: isempty(field) returns true if the field is null or is an empty string. -Argument type: all the supported data types. -Return type: BOOLEAN -Example +Usage: `isempty(field)` returns true if the field is null or is an empty string. + +**Argument type:** All supported data types. +**Return type:** `BOOLEAN` + +### Example ```ppl source=accounts @@ -626,7 +647,7 @@ fetched rows / total rows = 4/4 ### Description -Usage: earliest(relative_string, field) returns true if the value of field is after the timestamp derived from relative_string relative to the current time. 
Otherwise, returns false. +Usage: `earliest(relative_string, field)` returns true if the value of field is after the timestamp derived from relative_string relative to the current time. Otherwise, returns false. relative_string: The relative string can be one of the following formats: 1. `"now"` or `"now()"`: @@ -648,9 +669,11 @@ The relative string can be one of the following formats: - `-3M+1y@M` → `2026-02-01 00:00:00` Read more details [here](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/ppl-lang/functions/ppl-datetime.md#relative_timestamp) -Argument type: relative_string:STRING, field: TIMESTAMP -Return type: BOOLEAN -Example + +**Argument type:** `relative_string`: `STRING`, `field`: `TIMESTAMP` +**Return type:** `BOOLEAN` + +### Example ```ppl source=accounts @@ -692,10 +715,12 @@ fetched rows / total rows = 1/1 ### Description -Usage: latest(relative_string, field) returns true if the value of field is before the timestamp derived from relative_string relative to the current time. Otherwise, returns false. -Argument type: relative_string:STRING, field: TIMESTAMP -Return type: BOOLEAN -Example +Usage: `latest(relative_string, field)` returns true if the value of field is before the timestamp derived from relative_string relative to the current time. Otherwise, returns false. + +**Argument type:** `relative_string`: `STRING`, `field`: `TIMESTAMP` +**Return type:** `BOOLEAN` + +### Example ```ppl source=accounts @@ -737,11 +762,13 @@ fetched rows / total rows = 1/1 ### Description -Usage: regexp_match(string, pattern) returns true if the regular expression pattern finds a match against any substring of the string value, otherwise returns false. +Usage: `regexp_match(string, pattern)` returns true if the regular expression pattern finds a match against any substring of the string value, otherwise returns false. The function uses Java regular expression syntax for the pattern. 
-Argument type: STRING, STRING
-Return type: BOOLEAN
-Example
+
+**Argument type:** `STRING`, `STRING`
+**Return type:** `BOOLEAN`
+
+### Example

``` ppl ignore
source=logs | where regexp_match(message, 'ERROR|WARN|FATAL') | fields timestamp, message
diff --git a/docs/user/ppl/functions/conversion.md b/docs/user/ppl/functions/conversion.md
index 9e3b1d1ed7..99efe16103 100644
--- a/docs/user/ppl/functions/conversion.md
+++ b/docs/user/ppl/functions/conversion.md
@@ -4,7 +4,7 @@

 ### Description

-Usage: cast(expr as dateType) cast the expr to dataType. return the value of dataType. The following conversion rules are used:
+Usage: `cast(expr as dataType)` casts the expr to dataType and returns the value of dataType. The following conversion rules are used:

| Src/Target | STRING | NUMBER | BOOLEAN | TIMESTAMP | DATE | TIME | IP |
| --- | --- | --- | --- | --- | --- | --- | --- |
@@ -19,7 +19,8 @@ Usage: cast(expr as dateType) cast the expr to dataType. return the value of dat

Note1: the conversion follow the JDK specification.
Note2: IP will be converted to its canonical representation. Canonical representation for IPv6 is described in [RFC 5952](https://datatracker.ietf.org/doc/html/rfc5952).
-Cast to string example
+
+### Example: Cast to string

```ppl
source=people
@@ -38,7 +39,7 @@ fetched rows / total rows = 1/1
+-------+------+------------+
```

-Cast to number example
+### Example: Cast to number

```ppl
source=people
@@ -57,7 +58,7 @@ fetched rows / total rows = 1/1
+-------+---------+
```

-Cast to date example
+### Example: Cast to date

```ppl
source=people
@@ -76,7 +77,7 @@ fetched rows / total rows = 1/1
+------------+----------+---------------------+
```

-Cast function can be chained
+### Example: Cast function can be chained

```ppl
source=people
@@ -101,14 +102,14 @@
Implicit conversion is automatic casting. When a function does not have an exact
input types, the engine looks for another signature that can safely work with the values.
It picks the option that requires the least stretching of the original types, so you can mix literals and fields without adding `CAST` everywhere. + ### String to numeric When a string stands in for a number we simply parse the text: - The value must be something like `"3.14"` or `"42"`. Anything else causes the query to fail. -- If a string appears next to numeric arguments, it is treated as a `DOUBLE` so the numeric - - overload of the function can run. -Use string in arithmetic operator example +- If a string appears next to numeric arguments, it is treated as a `DOUBLE` so the numeric overload of the function can run. + +### Example: Use string in arithmetic operator ```ppl source=people @@ -127,7 +128,7 @@ fetched rows / total rows = 1/1 +--------+----------+------+-------+--------+ ``` -Use string in comparison operator example +### Example: Use string in comparison operator ```ppl source=people @@ -151,11 +152,17 @@ fetched rows / total rows = 1/1 ### Description The following usage options are available, depending on the parameter types and the number of parameters. -Usage with format type: tostring(ANY, [format]): Converts the value in first argument to provided format type string in second argument. If second argument is not provided, then it converts to default string representation. -Return type: string -Usage for boolean parameter without format type tostring(boolean): Converts the string to 'TRUE' or 'FALSE'. -Return type: string -You can use this function with the eval commands and as part of eval expressions. If first argument can be any valid type , second argument is optional and if provided , it needs to be format name to convert to where first argument contains only numbers. If first argument is boolean, then second argument is not used even if its provided. + +Usage with format type: `tostring(ANY, [format])`: Converts the value in first argument to provided format type string in second argument. 
If the second argument is not provided, the value is converted to its default string representation.
+
+**Return type:** `STRING`
+
+Usage for boolean parameter without format type: `tostring(boolean)` converts the boolean value to 'TRUE' or 'FALSE'.
+
+**Return type:** `STRING`
+
+You can use this function with the eval command and as part of eval expressions. The first argument can be any valid type. The second argument is optional; if provided, it must be a format name and applies only when the first argument contains only numbers. If the first argument is boolean, the second argument is ignored even if it is provided.
+
Format types:
1. "binary" Converts a number to a binary value.
2. "hex" Converts the number to a hexadecimal value.
@@ -164,9 +171,10 @@ Format types:
5. "duration_millis" Converts the value in milliseconds to the readable time format HH:MM:SS.
The format argument is optional and is only used when the value argument is a number. The tostring function supports the following formats.
-Basic examples:
+
+### Example: Convert number to binary string
+
You can use this function to convert a number to a string of its binary representation.
-Example

```ppl
source=accounts
@@ -186,8 +194,9 @@ fetched rows / total rows = 1/1
+-----------+------------------+---------+
```

+### Example: Convert number to hex string
+
You can use this function to convert a number to a string of its hex representation.
-Example

```ppl
source=accounts
@@ -207,8 +216,9 @@ fetched rows / total rows = 1/1
+-----------+-------------+---------+
```

-The following example formats the column totalSales to display values with commas.
-Example
+### Example: Format number with commas
+
+The following example formats the column totalSales to display values with commas.
```ppl
source=accounts
@@ -228,8 +238,9 @@ fetched rows / total rows = 1/1
+-----------+----------------+---------+
```

+### Example: Convert seconds to duration format
+
The following example converts number of seconds to HH:MM:SS format representing hours, minutes and seconds.
-Example

```ppl
source=accounts
@@ -249,8 +260,9 @@ fetched rows / total rows = 1/1
+-----------+----------+
```

-The following example for converts boolean parameter to string.
-Example
+### Example: Convert boolean to string
+
+The following example converts a boolean parameter to a string.
- -Following example converts a string in binary to the number -representation: - - os> source=people | eval int_value = tonumber('010101',2) | fields int_value | head 1 - fetched rows / total rows = 1/1 - +-----------+ - | int_value | - |-----------| - | 21.0 | - +-----------+ - -Following example converts a string in hex to the number representation: - - os> source=people | eval int_value = tonumber('FA34',16) | fields int_value | head 1 - fetched rows / total rows = 1/1 - +-----------+ - | int_value | - |-----------| - | 64052.0 | - +-----------+ - -Following example converts a string in decimal to the number -representation: - - os> source=people | eval int_value = tonumber('4598') | fields int_value | head 1 - fetched rows / total rows = 1/1 - +-----------+ - | int_value | - |-----------| - | 4598.0 | - +-----------+ - -Following example converts a string in decimal with fraction to the -number representation: - - os> source=people | eval double_value = tonumber('4598.678') | fields double_value | head 1 - fetched rows / total rows = 1/1 - +--------------+ - | double_value | - |--------------| - | 4598.678 | - +--------------+ +Usage: `tonumber(string, [base])` converts the value in first argument. +The second argument describes the base of first argument. If second argument is not provided, then it converts to base 10 number representation. + +**Return type:** `NUMBER` + +You can use this function with the eval commands and as part of eval expressions. Base values can be between 2 and 36. The maximum value supported for base 10 is +(2-2^-52)·2^1023 and minimum is -(2-2^-52)·2^1023. The maximum for other supported bases is 2^63-1 (or 7FFFFFFFFFFFFFFF) and minimum is -2^63 (or -7FFFFFFFFFFFFFFF). If the tonumber function cannot parse a field value to a number, the function returns NULL. You can use this function to convert a string representation of a binary number to return the corresponding number in base 10. 
+ +### Example: Convert binary string to number + +```ppl +source=people | eval int_value = tonumber('010101',2) | fields int_value | head 1 +``` + +Expected output: + +```text +fetched rows / total rows = 1/1 ++-----------+ +| int_value | +|-----------| +| 21.0 | ++-----------+ +``` + +### Example: Convert hex string to number + +```ppl +source=people | eval int_value = tonumber('FA34',16) | fields int_value | head 1 +``` + +Expected output: + +```text +fetched rows / total rows = 1/1 ++-----------+ +| int_value | +|-----------| +| 64052.0 | ++-----------+ +``` + +### Example: Convert decimal string to number + +```ppl +source=people | eval int_value = tonumber('4598') | fields int_value | head 1 +``` + +Expected output: + +```text +fetched rows / total rows = 1/1 ++-----------+ +| int_value | +|-----------| +| 4598.0 | ++-----------+ +``` + +### Example: Convert decimal string with fraction to number + +```ppl +source=people | eval double_value = tonumber('4598.678') | fields double_value | head 1 +``` + +Expected output: + +```text +fetched rows / total rows = 1/1 ++--------------+ +| double_value | +|--------------| +| 4598.678 | ++--------------+ +``` diff --git a/docs/user/ppl/functions/cryptographic.md b/docs/user/ppl/functions/cryptographic.md index 33853cfd64..1ea1ca50f5 100644 --- a/docs/user/ppl/functions/cryptographic.md +++ b/docs/user/ppl/functions/cryptographic.md @@ -6,9 +6,11 @@ Version: 3.1.0 Usage: `md5(str)` calculates the MD5 digest and returns the value as a 32-character hex string. -Argument type: STRING -Return type: STRING -Example + +**Argument type:** `STRING` +**Return type:** `STRING` + +### Example ```ppl source=people @@ -33,9 +35,11 @@ fetched rows / total rows = 1/1 Version: 3.1.0 Usage: `sha1(str)` returns the hex string result of SHA-1. 
-Argument type: STRING
-Return type: STRING
-Example
+
+**Argument type:** `STRING`
+**Return type:** `STRING`
+
+### Example

```ppl
source=people
@@ -61,9 +65,11 @@ fetched rows / total rows = 1/1

Version: 3.1.0

Usage: `sha2(str, numBits)` returns the hex string result of SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512). The numBits indicates the desired bit length of the result, which must have a value of 224, 256, 384, or 512.
-Argument type: STRING, INTEGER
-Return type: STRING
-Example
+
+**Argument type:** `STRING`, `INTEGER`
+**Return type:** `STRING`
+
+### Example

```ppl
source=people
@@ -98,4 +104,4 @@ fetched rows / total rows = 1/1
| 9b71d224bd62f3785d96d46ad3ea3d73319bfbc2890caadae2dff72519673ca72323c3d99ba5c11d7c7acc6e14b8c5da0c4663475c2e5c3adef46f73bcdec043 |
+----------------------------------------------------------------------------------------------------------------------------------+
```
-
\ No newline at end of file
+
diff --git a/docs/user/ppl/functions/datetime.md b/docs/user/ppl/functions/datetime.md
index 0cd474b546..ac3ef0738b 100644
--- a/docs/user/ppl/functions/datetime.md
+++ b/docs/user/ppl/functions/datetime.md
@@ -8,16 +8,16 @@

 ### Description

-Usage: adddate(date, INTERVAL expr unit) / adddate(date, days) adds the interval of second argument to date; adddate(date, days) adds the second argument as integer number of days to date.
+Usage: `adddate(date, INTERVAL expr unit)` adds the interval of the second argument to date; `adddate(date, days)` adds the second argument as an integer number of days to date.
-Argument type: DATE/TIMESTAMP/TIME, INTERVAL/LONG +**Argument type:** `DATE/TIMESTAMP/TIME, INTERVAL/LONG` Return type map: (DATE/TIMESTAMP/TIME, INTERVAL) -> TIMESTAMP (DATE, LONG) -> DATE (TIMESTAMP/TIME, LONG) -> TIMESTAMP Synonyms: [DATE_ADD](#date_add) when invoked with the INTERVAL form of the second argument. Antonyms: [SUBDATE](#subdate) -Example +### Example ```ppl source=people @@ -40,13 +40,13 @@ fetched rows / total rows = 1/1 ### Description -Usage: addtime(expr1, expr2) adds expr2 to expr1 and returns the result. If argument is TIME, today's date is used; if argument is DATE, time at midnight is used. -Argument type: DATE/TIMESTAMP/TIME, DATE/TIMESTAMP/TIME +Usage: `addtime(expr1, expr2)` adds expr2 to expr1 and returns the result. If argument is TIME, today's date is used; if argument is DATE, time at midnight is used. +**Argument type:** `DATE/TIMESTAMP/TIME, DATE/TIMESTAMP/TIME` Return type map: (DATE/TIMESTAMP, DATE/TIMESTAMP/TIME) -> TIMESTAMP (TIME, DATE/TIMESTAMP/TIME) -> TIME Antonyms: [SUBTIME](#subtime) -Example +### Example ```ppl source=people @@ -137,11 +137,11 @@ fetched rows / total rows = 1/1 ### Description -Usage: convert_tz(timestamp, from_timezone, to_timezone) constructs a local timestamp converted from the from_timezone to the to_timezone. CONVERT_TZ returns null when any of the three function arguments are invalid, i.e. timestamp is not in the format yyyy-MM-dd HH:mm:ss or the timeszone is not in (+/-)HH:mm. It also is invalid for invalid dates, such as February 30th and invalid timezones, which are ones outside of -13:59 and +14:00. -Argument type: TIMESTAMP/STRING, STRING, STRING -Return type: TIMESTAMP +Usage: `convert_tz(timestamp, from_timezone, to_timezone)` constructs a local timestamp converted from the from_timezone to the to_timezone. CONVERT_TZ returns null when any of the three function arguments are invalid, i.e. timestamp is not in the format yyyy-MM-dd HH:mm:ss or the timeszone is not in (+/-)HH:mm. 
It also is invalid for invalid dates, such as February 30th and invalid timezones, which are ones outside of -13:59 and +14:00. +**Argument type:** `TIMESTAMP/STRING, STRING, STRING` +**Return type:** `TIMESTAMP` Conversion from +00:00 timezone to +10:00 timezone. Returns the timestamp argument converted from +00:00 to +10:00 -Example +### Example ```ppl source=people @@ -161,7 +161,7 @@ fetched rows / total rows = 1/1 ``` The valid timezone range for convert_tz is (-13:59, +14:00) inclusive. Timezones outside of the range, such as +15:00 in this example will return null. -Example +### Example ```ppl source=people @@ -181,7 +181,7 @@ fetched rows / total rows = 1/1 ``` Conversion from a positive timezone to a negative timezone that goes over date line. -Example +### Example ```ppl source=people @@ -201,7 +201,7 @@ fetched rows / total rows = 1/1 ``` Valid dates are required in convert_tz, invalid dates such as April 31st (not a date in the Gregorian calendar) will result in null. -Example +### Example ```ppl source=people @@ -221,7 +221,7 @@ fetched rows / total rows = 1/1 ``` Valid dates are required in convert_tz, invalid dates such as February 30th (not a date in the Gregorian calendar) will result in null. -Example +### Example ```ppl source=people @@ -241,7 +241,7 @@ fetched rows / total rows = 1/1 ``` February 29th 2008 is a valid date because it is a leap year. -Example +### Example ```ppl source=people @@ -261,7 +261,7 @@ fetched rows / total rows = 1/1 ``` Valid dates are required in convert_tz, invalid dates such as February 29th 2007 (2007 is not a leap year) will result in null. -Example +### Example ```ppl source=people @@ -281,7 +281,7 @@ fetched rows / total rows = 1/1 ``` The valid timezone range for convert_tz is (-13:59, +14:00) inclusive. Timezones outside of the range, such as +14:01 in this example will return null. 
-Example +### Example ```ppl source=people @@ -301,7 +301,7 @@ fetched rows / total rows = 1/1 ``` The valid timezone range for convert_tz is (-13:59, +14:00) inclusive. Timezones outside of the range, such as +14:00 in this example will return a correctly converted date time object. -Example +### Example ```ppl source=people @@ -321,7 +321,7 @@ fetched rows / total rows = 1/1 ``` The valid timezone range for convert_tz is (-13:59, +14:00) inclusive. Timezones outside of the range, such as -14:00 will result in null -Example +### Example ```ppl source=people @@ -341,7 +341,7 @@ fetched rows / total rows = 1/1 ``` The valid timezone range for convert_tz is (-13:59, +14:00) inclusive. This timezone is within range so it is valid and will convert the time. -Example +### Example ```ppl source=people @@ -366,9 +366,9 @@ fetched rows / total rows = 1/1 Returns the current date as a value in 'YYYY-MM-DD' format. CURDATE() returns the current date in UTC at the time the statement is executed. -Return type: DATE +**Return type:** `DATE` Specification: CURDATE() -> DATE -Example +### Example ```ppl ignore source=people @@ -392,7 +392,7 @@ fetched rows / total rows = 1/1 ### Description `CURRENT_DATE()` is a synonym for [CURDATE()](#curdate). -Example +### Example ```ppl ignore source=people @@ -416,7 +416,7 @@ fetched rows / total rows = 1/1 ### Description `CURRENT_TIME()` is a synonym for [CURTIME()](#curtime). -Example +### Example ```ppl ignore source=people @@ -440,7 +440,7 @@ fetched rows / total rows = 1/1 ### Description `CURRENT_TIMESTAMP()` is a synonym for [NOW()](#now). -Example +### Example ```ppl ignore source=people @@ -465,9 +465,9 @@ fetched rows / total rows = 1/1 Returns the current time as a value in 'hh:mm:ss' format in the UTC time zone. CURTIME() returns the time at which the statement began to execute as [NOW()](#now) does. 
-Return type: TIME +**Return type:** `TIME` Specification: CURTIME() -> TIME -Example +### Example ```ppl ignore source=people @@ -490,10 +490,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: date(expr) constructs a date type with the input string expr as a date. If the argument is of date/timestamp, it extracts the date value part from the expression. -Argument type: STRING/DATE/TIMESTAMP -Return type: DATE -Example +Usage: `date(expr)` constructs a date type with the input string expr as a date. If the argument is of date/timestamp, it extracts the date value part from the expression. +**Argument type:** `STRING/DATE/TIMESTAMP` +**Return type:** `DATE` +### Example ```ppl source=people @@ -567,12 +567,12 @@ fetched rows / total rows = 1/1 ### Description -Usage: date_add(date, INTERVAL expr unit) adds the interval expr to date. If first argument is TIME, today's date is used; if first argument is DATE, time at midnight is used. -Argument type: DATE/TIMESTAMP/TIME, INTERVAL -Return type: TIMESTAMP +Usage: `date_add(date, INTERVAL expr unit)` adds the interval expr to date. If first argument is TIME, today's date is used; if first argument is DATE, time at midnight is used. +**Argument type:** `DATE/TIMESTAMP/TIME, INTERVAL` +**Return type:** `TIMESTAMP` Synonyms: [ADDDATE](#adddate) Antonyms: [DATE_SUB](#date_sub) -Example +### Example ```ppl source=people @@ -595,7 +595,7 @@ fetched rows / total rows = 1/1 ### Description -Usage: date_format(date, format) formats the date argument using the specifiers in the format argument. +Usage: `date_format(date, format)` formats the date argument using the specifiers in the format argument. If an argument of type TIME is provided, the local date is used. The following table describes the available specifier arguments. @@ -638,9 +638,9 @@ The following table describes the available specifier arguments. 
| x | x, for any smallcase/uppercase alphabet except [aydmshiHIMYDSEL] |

-Argument type: STRING/DATE/TIME/TIMESTAMP, STRING
-Return type: STRING
-Example
+**Argument type:** `STRING/DATE/TIME/TIMESTAMP, STRING`
+**Return type:** `STRING`
+### Example

```ppl
source=people
@@ -663,13 +663,13 @@ fetched rows / total rows = 1/1

 ### Description

-Usage: DATETIME(timestamp)/ DATETIME(date, to_timezone) Converts the datetime to a new timezone
-Argument type: timestamp/STRING
+Usage: `DATETIME(timestamp)` / `DATETIME(date, to_timezone)` converts the datetime to a new timezone.
+**Argument type:** `timestamp/STRING`
Return type map:
(TIMESTAMP, STRING) -> TIMESTAMP
(TIMESTAMP) -> TIMESTAMP
Converting timestamp with timezone to the second argument timezone.
-Example
+### Example

```ppl
source=people
@@ -689,7 +689,7 @@ fetched rows / total rows = 1/1
```

The valid timezone range for convert_tz is (-13:59, +14:00) inclusive. Timezones outside of the range will result in null.
-Example
+### Example

```ppl
source=people
@@ -712,12 +712,12 @@ fetched rows / total rows = 1/1

 ### Description

-Usage: date_sub(date, INTERVAL expr unit) subtracts the interval expr from date. If first argument is TIME, today's date is used; if first argument is DATE, time at midnight is used.
-Argument type: DATE/TIMESTAMP/TIME, INTERVAL
-Return type: TIMESTAMP
+Usage: `date_sub(date, INTERVAL expr unit)` subtracts the interval expr from date. If first argument is TIME, today's date is used; if first argument is DATE, time at midnight is used.
+**Argument type:** `DATE/TIMESTAMP/TIME, INTERVAL`
+**Return type:** `TIMESTAMP`
Synonyms: [SUBDATE](#subdate)
Antonyms: [DATE_ADD](#date_add)
-Example
+### Example

```ppl
source=people
@@ -739,9 +739,9 @@ fetched rows / total rows = 1/1

## DATEDIFF

Usage: Calculates the difference of date parts of given values. If the first argument is time, today's date is used.
-Argument type: DATE/TIMESTAMP/TIME, DATE/TIMESTAMP/TIME -Return type: LONG -Example +**Argument type:** `DATE/TIMESTAMP/TIME, DATE/TIMESTAMP/TIME` +**Return type:** `LONG` +### Example ```ppl source=people @@ -764,11 +764,11 @@ fetched rows / total rows = 1/1 ### Description -Usage: day(date) extracts the day of the month for date, in the range 1 to 31. -Argument type: STRING/DATE/TIMESTAMP -Return type: INTEGER +Usage: `day(date)` extracts the day of the month for date, in the range 1 to 31. +**Argument type:** `STRING/DATE/TIMESTAMP` +**Return type:** `INTEGER` Synonyms: [DAYOFMONTH](#dayofmonth), [DAY_OF_MONTH](#day_of_month) -Example +### Example ```ppl source=people @@ -791,10 +791,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: dayname(date) returns the name of the weekday for date, including Monday, Tuesday, Wednesday, Thursday, Friday, Saturday and Sunday. -Argument type: STRING/DATE/TIMESTAMP -Return type: STRING -Example +Usage: `dayname(date)` returns the name of the weekday for date, including Monday, Tuesday, Wednesday, Thursday, Friday, Saturday and Sunday. +**Argument type:** `STRING/DATE/TIMESTAMP` +**Return type:** `STRING` +### Example ```ppl source=people @@ -817,11 +817,11 @@ fetched rows / total rows = 1/1 ### Description -Usage: dayofmonth(date) extracts the day of the month for date, in the range 1 to 31. -Argument type: STRING/DATE/TIMESTAMP -Return type: INTEGER +Usage: `dayofmonth(date)` extracts the day of the month for date, in the range 1 to 31. +**Argument type:** `STRING/DATE/TIMESTAMP` +**Return type:** `INTEGER` Synonyms: [DAY](#day), [DAY_OF_MONTH](#day_of_month) -Example +### Example ```ppl source=people @@ -844,11 +844,11 @@ fetched rows / total rows = 1/1 ### Description -Usage: day_of_month(date) extracts the day of the month for date, in the range 1 to 31. -Argument type: STRING/DATE/TIMESTAMP -Return type: INTEGER +Usage: `day_of_month(date)` extracts the day of the month for date, in the range 1 to 31. 
+**Argument type:** `STRING/DATE/TIMESTAMP` +**Return type:** `INTEGER` Synonyms: [DAY](#day), [DAYOFMONTH](#dayofmonth) -Example +### Example ```ppl source=people @@ -871,11 +871,11 @@ fetched rows / total rows = 1/1 ### Description -Usage: dayofweek(date) returns the weekday index for date (1 = Sunday, 2 = Monday, ..., 7 = Saturday). -Argument type: STRING/DATE/TIMESTAMP -Return type: INTEGER +Usage: `dayofweek(date)` returns the weekday index for date (1 = Sunday, 2 = Monday, ..., 7 = Saturday). +**Argument type:** `STRING/DATE/TIMESTAMP` +**Return type:** `INTEGER` Synonyms: [DAY_OF_WEEK](#day_of_week) -Example +### Example ```ppl source=people @@ -898,11 +898,11 @@ fetched rows / total rows = 1/1 ### Description -Usage: day_of_week(date) returns the weekday index for date (1 = Sunday, 2 = Monday, ..., 7 = Saturday). -Argument type: STRING/DATE/TIMESTAMP -Return type: INTEGER +Usage: `day_of_week(date)` returns the weekday index for date (1 = Sunday, 2 = Monday, ..., 7 = Saturday). +**Argument type:** `STRING/DATE/TIMESTAMP` +**Return type:** `INTEGER` Synonyms: [DAYOFWEEK](#dayofweek) -Example +### Example ```ppl source=people @@ -926,10 +926,10 @@ fetched rows / total rows = 1/1 ### Description Usage: dayofyear(date) returns the day of the year for date, in the range 1 to 366. -Argument type: STRING/DATE/TIMESTAMP -Return type: INTEGER +**Argument type:** `STRING/DATE/TIMESTAMP` +**Return type:** `INTEGER` Synonyms: [DAY_OF_YEAR](#day_of_year) -Example +### Example ```ppl source=people @@ -953,10 +953,10 @@ fetched rows / total rows = 1/1 ### Description Usage: day_of_year(date) returns the day of the year for date, in the range 1 to 366. 
-Argument type: STRING/DATE/TIMESTAMP -Return type: INTEGER +**Argument type:** `STRING/DATE/TIMESTAMP` +**Return type:** `INTEGER` Synonyms: [DAYOFYEAR](#dayofyear) -Example +### Example ```ppl source=people @@ -979,9 +979,9 @@ fetched rows / total rows = 1/1 ### Description -Usage: extract(part FROM date) returns a LONG with digits in order according to the given 'part' arguments. +Usage: `extract(part FROM date)` returns a LONG with digits in order according to the given 'part' arguments. The specific format of the returned long is determined by the table below. -Argument type: PART, where PART is one of the following tokens in the table below. +**Argument type:** `PART`, where PART is one of the following tokens in the table below. The format specifiers found in this table are the same as those found in the [DATE_FORMAT](#date_format) function. The following table describes the mapping of a 'part' to a particular format. @@ -1009,8 +1009,8 @@ The following table describes the mapping of a 'part' to a particular format. | YEAR_MONTH | %V%m | -Return type: LONG -Example +**Return type:** `LONG` +### Example ```ppl source=people @@ -1033,10 +1033,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: from_days(N) returns the date value given the day number N. -Argument type: INTEGER/LONG -Return type: DATE -Example +Usage: `from_days(N)` returns the date value given the day number N. +**Argument type:** `INTEGER/LONG` +**Return type:** `DATE` +### Example ```ppl source=people @@ -1062,7 +1062,7 @@ fetched rows / total rows = 1/1 Usage: Returns a representation of the argument given as a timestamp or character string value. Perform reverse conversion for [UNIX_TIMESTAMP](#unix_timestamp) function. If second argument is provided, it is used to format the result in the same way as the format string used for the [DATE_FORMAT](#date_format) function.
If timestamp is outside of range 1970-01-01 00:00:00 - 3001-01-18 23:59:59.999999 (0 to 32536771199.999999 epoch time), function returns NULL. -Argument type: DOUBLE, STRING +**Argument type:** `DOUBLE, STRING` Return type map: DOUBLE -> TIMESTAMP DOUBLE, STRING -> STRING @@ -1107,7 +1107,7 @@ fetched rows / total rows = 1/1 ### Description Usage: Returns a string value containing string format specifiers based on the input arguments. -Argument type: TYPE, STRING, where TYPE must be one of the following tokens: [DATE, TIME, TIMESTAMP], and +**Argument type:** `TYPE, STRING`, where TYPE must be one of the following tokens: [DATE, TIME, TIMESTAMP], and STRING must be one of the following tokens: ["USA", "JIS", "ISO", "EUR", "INTERNAL"] (" can be replaced by '). Examples @@ -1132,11 +1132,11 @@ fetched rows / total rows = 1/1 ### Description -Usage: hour(time) extracts the hour value for time. Different from the time of day value, the time value has a large range and can be greater than 23, so the return value of hour(time) can be also greater than 23. -Argument type: STRING/TIME/TIMESTAMP -Return type: INTEGER +Usage: `hour(time)` extracts the hour value for time. Different from the time of day value, the time value has a large range and can be greater than 23, so the return value of hour(time) can be also greater than 23. +**Argument type:** `STRING/TIME/TIMESTAMP` +**Return type:** `INTEGER` Synonyms: [HOUR_OF_DAY](#hour_of_day) -Example +### Example ```ppl source=people @@ -1159,11 +1159,11 @@ fetched rows / total rows = 1/1 ### Description -Usage: hour_of_day(time) extracts the hour value for time. Different from the time of day value, the time value has a large range and can be greater than 23, so the return value of hour_of_day(time) can be also greater than 23. -Argument type: STRING/TIME/TIMESTAMP -Return type: INTEGER +Usage: `hour_of_day(time)` extracts the hour value for time.
Different from the time of day value, the time value has a large range and can be greater than 23, so the return value of hour_of_day(time) can be also greater than 23. +**Argument type:** `STRING/TIME/TIMESTAMP` +**Return type:** `INTEGER` Synonyms: [HOUR](#hour) -Example +### Example ```ppl source=people @@ -1185,9 +1185,9 @@ fetched rows / total rows = 1/1 ## LAST_DAY Usage: Returns the last day of the month as a DATE for a valid argument. -Argument type: DATE/STRING/TIMESTAMP/TIME -Return type: DATE -Example +**Argument type:** `DATE/STRING/TIMESTAMP/TIME` +**Return type:** `DATE` +### Example ```ppl source=people @@ -1211,7 +1211,7 @@ fetched rows / total rows = 1/1 ### Description -`LOCALTIMESTAMP()` are synonyms for [NOW()](#now). +`LOCALTIMESTAMP()` is a synonym for [NOW()](#now). -Example +### Example ```ppl ignore source=people @@ -1235,7 +1235,7 @@ fetched rows / total rows = 1/1 ### Description -`LOCALTIME()` are synonyms for [NOW()](#now). +`LOCALTIME()` is a synonym for [NOW()](#now). -Example +### Example ```ppl ignore source=people @@ -1269,9 +1269,9 @@ Limitations: Specifications: 1. MAKEDATE(DOUBLE, DOUBLE) -> DATE -Argument type: DOUBLE -Return type: DATE -Example +**Argument type:** `DOUBLE` +**Return type:** `DATE` +### Example ```ppl source=people @@ -1303,9 +1303,9 @@ Limitations: Specifications: 1. MAKETIME(DOUBLE, DOUBLE, DOUBLE) -> TIME -Argument type: DOUBLE -Return type: TIME -Example +**Argument type:** `DOUBLE` +**Return type:** `TIME` +### Example ```ppl source=people @@ -1328,10 +1328,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: microsecond(expr) returns the microseconds from the time or timestamp expression expr as a number in the range from 0 to 999999. -Argument type: STRING/TIME/TIMESTAMP -Return type: INTEGER -Example +Usage: `microsecond(expr)` returns the microseconds from the time or timestamp expression expr as a number in the range from 0 to 999999.
+**Argument type:** `STRING/TIME/TIMESTAMP` +**Return type:** `INTEGER` +### Example ```ppl source=people @@ -1354,11 +1354,11 @@ fetched rows / total rows = 1/1 ### Description -Usage: minute(time) returns the minute for time, in the range 0 to 59. -Argument type: STRING/TIME/TIMESTAMP -Return type: INTEGER +Usage: `minute(time)` returns the minute for time, in the range 0 to 59. +**Argument type:** `STRING/TIME/TIMESTAMP` +**Return type:** `INTEGER` Synonyms: [MINUTE_OF_HOUR](#minute_of_hour) -Example +### Example ```ppl source=people @@ -1381,10 +1381,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: minute(time) returns the amount of minutes in the day, in the range of 0 to 1439. -Argument type: STRING/TIME/TIMESTAMP -Return type: INTEGER -Example +Usage: `minute_of_day(time)` returns the number of minutes in the day, in the range 0 to 1439. +**Argument type:** `STRING/TIME/TIMESTAMP` +**Return type:** `INTEGER` +### Example ```ppl source=people @@ -1407,11 +1407,11 @@ fetched rows / total rows = 1/1 ### Description -Usage: minute(time) returns the minute for time, in the range 0 to 59. -Argument type: STRING/TIME/TIMESTAMP -Return type: INTEGER +Usage: `minute_of_hour(time)` returns the minute for time, in the range 0 to 59. +**Argument type:** `STRING/TIME/TIMESTAMP` +**Return type:** `INTEGER` Synonyms: [MINUTE](#minute) -Example +### Example ```ppl source=people @@ -1434,11 +1434,11 @@ fetched rows / total rows = 1/1 ### Description -Usage: month(date) returns the month for date, in the range 1 to 12 for January to December. -Argument type: STRING/DATE/TIMESTAMP -Return type: INTEGER +Usage: `month(date)` returns the month for date, in the range 1 to 12 for January to December.
+**Argument type:** `STRING/DATE/TIMESTAMP` +**Return type:** `INTEGER` Synonyms: [MONTH_OF_YEAR](#month_of_year) -Example +### Example ```ppl source=people @@ -1461,11 +1461,11 @@ fetched rows / total rows = 1/1 ### Description -Usage: month_of_year(date) returns the month for date, in the range 1 to 12 for January to December. -Argument type: STRING/DATE/TIMESTAMP -Return type: INTEGER +Usage: `month_of_year(date)` returns the month for date, in the range 1 to 12 for January to December. +**Argument type:** `STRING/DATE/TIMESTAMP` +**Return type:** `INTEGER` Synonyms: [MONTH](#month) -Example +### Example ```ppl source=people @@ -1488,10 +1488,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: monthname(date) returns the full name of the month for date. -Argument type: STRING/DATE/TIMESTAMP -Return type: STRING -Example +Usage: `monthname(date)` returns the full name of the month for date. +**Argument type:** `STRING/DATE/TIMESTAMP` +**Return type:** `STRING` +### Example ```ppl source=people @@ -1516,9 +1516,9 @@ fetched rows / total rows = 1/1 Returns the current date and time as a value in 'YYYY-MM-DD hh:mm:ss' format. The value is expressed in the UTC time zone. `NOW()` returns a constant time that indicates the time at which the statement began to execute. This differs from the behavior for [SYSDATE()](#sysdate), which returns the exact time at which it executes. -Return type: TIMESTAMP +**Return type:** `TIMESTAMP` Specification: NOW() -> TIMESTAMP -Example +### Example ```ppl ignore source=people @@ -1541,10 +1541,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: period_add(P, N) add N months to period P (in the format YYMM or YYYYMM). Returns a value in the format YYYYMM. -Argument type: INTEGER, INTEGER -Return type: INTEGER -Example +Usage: `period_add(P, N)` adds N months to period P (in the format YYMM or YYYYMM). Returns a value in the format YYYYMM.
+**Argument type:** `INTEGER, INTEGER` +**Return type:** `INTEGER` +### Example ```ppl source=people @@ -1567,10 +1567,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: period_diff(P1, P2) returns the number of months between periods P1 and P2 given in the format YYMM or YYYYMM. -Argument type: INTEGER, INTEGER -Return type: INTEGER -Example +Usage: `period_diff(P1, P2)` returns the number of months between periods P1 and P2 given in the format YYMM or YYYYMM. +**Argument type:** `INTEGER, INTEGER` +**Return type:** `INTEGER` +### Example ```ppl source=people @@ -1593,10 +1593,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: quarter(date) returns the quarter of the year for date, in the range 1 to 4. -Argument type: STRING/DATE/TIMESTAMP -Return type: INTEGER -Example +Usage: `quarter(date)` returns the quarter of the year for date, in the range 1 to 4. +**Argument type:** `STRING/DATE/TIMESTAMP` +**Return type:** `INTEGER` +### Example ```ppl source=people @@ -1619,13 +1619,13 @@ fetched rows / total rows = 1/1 ### Description -Usage: sec_to_time(number) returns the time in HH:mm:ssss[.nnnnnn] format. +Usage: `sec_to_time(number)` returns the time in HH:mm:ss[.nnnnnn] format. Note that the function returns a time between 00:00:00 and 23:59:59. If an input value is too large (greater than 86399), the function will wrap around and begin returning outputs starting from 00:00:00. If an input value is too small (less than 0), the function will wrap around and begin returning outputs counting down from 23:59:59. -Argument type: INTEGER, LONG, DOUBLE, FLOAT -Return type: TIME -Example +**Argument type:** `INTEGER, LONG, DOUBLE, FLOAT` +**Return type:** `TIME` +### Example ```ppl source=people @@ -1649,11 +1649,11 @@ fetched rows / total rows = 1/1 ### Description -Usage: second(time) returns the second for time, in the range 0 to 59.
-Argument type: STRING/TIME/TIMESTAMP -Return type: INTEGER +Usage: `second(time)` returns the second for time, in the range 0 to 59. +**Argument type:** `STRING/TIME/TIMESTAMP` +**Return type:** `INTEGER` Synonyms: [SECOND_OF_MINUTE](#second_of_minute) -Example +### Example ```ppl source=people @@ -1676,11 +1676,11 @@ fetched rows / total rows = 1/1 ### Description -Usage: second_of_minute(time) returns the second for time, in the range 0 to 59. -Argument type: STRING/TIME/TIMESTAMP -Return type: INTEGER +Usage: `second_of_minute(time)` returns the second for time, in the range 0 to 59. +**Argument type:** `STRING/TIME/TIMESTAMP` +**Return type:** `INTEGER` Synonyms: [SECOND](#second) -Example +### Example ```ppl source=people @@ -1704,7 +1704,7 @@ fetched rows / total rows = 1/1 **Version: 3.3.0** ### Description -Usage: strftime(time, format) takes a UNIX timestamp (in seconds) and renders it as a string using the format specified. For numeric inputs, the UNIX time must be in seconds. Values greater than 100000000000 are automatically treated as milliseconds and converted to seconds. +Usage: `strftime(time, format)` takes a UNIX timestamp (in seconds) and renders it as a string using the format specified. For numeric inputs, the UNIX time must be in seconds. Values greater than 100000000000 are automatically treated as milliseconds and converted to seconds. You can use time format variables with the strftime function. This function performs the reverse operation of [UNIX_TIMESTAMP](#unix_timestamp) and is similar to [FROM_UNIXTIME](#from_unixtime) but with POSIX-style format specifiers. - **Available only when Calcite engine is enabled** - All timestamps are interpreted as UTC timezone @@ -1712,8 +1712,8 @@ You can use time format variables with the strftime function. 
This function perf - String inputs are NOT supported - use `unix_timestamp()` to convert strings first - Functions that return date/time values (like `date()`, `now()`, `timestamp()`) are supported -Argument type: INTEGER/LONG/DOUBLE/TIMESTAMP, STRING -Return type: STRING +**Argument type:** `INTEGER/LONG/DOUBLE/TIMESTAMP, STRING` +**Return type:** `STRING` Format specifiers: The following table describes the available specifier arguments. @@ -1863,13 +1863,13 @@ fetched rows / total rows = 1/1 ### Description -Usage: str_to_date(string, string) is used to extract a TIMESTAMP from the first argument string using the formats specified in the second argument string. +Usage: `str_to_date(string, string)` is used to extract a TIMESTAMP from the first argument string using the formats specified in the second argument string. The input argument must have enough information to be parsed as a DATE, TIMESTAMP, or TIME. Acceptable string format specifiers are the same as those used in the [DATE_FORMAT](#date_format) function. It returns NULL when a statement cannot be parsed due to an invalid pair of arguments, and when 0 is provided for any DATE field. Otherwise, it will return a TIMESTAMP with the parsed values (as well as default values for any field that was not parsed). -Argument type: STRING, STRING -Return type: TIMESTAMP -Example +**Argument type:** `STRING, STRING` +**Return type:** `TIMESTAMP` +### Example ```ppl @@ -1897,16 +1897,16 @@ fetched rows / total rows = 1/1 ### Description -Usage: subdate(date, INTERVAL expr unit) / subdate(date, days) subtracts the interval expr from date; subdate(date, days) subtracts the second argument as integer number of days from date. +Usage: `subdate(date, INTERVAL expr unit)` / subdate(date, days) subtracts the interval expr from date; subdate(date, days) subtracts the second argument as integer number of days from date. If first argument is TIME, today's date is used; if first argument is DATE, time at midnight is used. 
-Argument type: DATE/TIMESTAMP/TIME, INTERVAL/LONG +**Argument type:** `DATE/TIMESTAMP/TIME, INTERVAL/LONG` Return type map: (DATE/TIMESTAMP/TIME, INTERVAL) -> TIMESTAMP (DATE, LONG) -> DATE (TIMESTAMP/TIME, LONG) -> TIMESTAMP Synonyms: [DATE_SUB](#date_sub) when invoked with the INTERVAL form of the second argument. Antonyms: [ADDDATE](#adddate) -Example +### Example ```ppl @@ -1934,13 +1934,13 @@ fetched rows / total rows = 1/1 ### Description -Usage: subtime(expr1, expr2) subtracts expr2 from expr1 and returns the result. If argument is TIME, today's date is used; if argument is DATE, time at midnight is used. -Argument type: DATE/TIMESTAMP/TIME, DATE/TIMESTAMP/TIME +Usage: `subtime(expr1, expr2)` subtracts expr2 from expr1 and returns the result. If argument is TIME, today's date is used; if argument is DATE, time at midnight is used. +**Argument type:** `DATE/TIMESTAMP/TIME, DATE/TIMESTAMP/TIME` Return type map: (DATE/TIMESTAMP, DATE/TIMESTAMP/TIME) -> TIMESTAMP (TIME, DATE/TIMESTAMP/TIME) -> TIME Antonyms: [ADDTIME](#addtime) -Example +### Example ```ppl @@ -2060,9 +2060,9 @@ Returns the current date and time as a value in 'YYYY-MM-DD hh:mm:ss[.nnnnnn]'. SYSDATE() returns the date and time at which it executes in UTC. This differs from the behavior for [NOW()](#now), which returns a constant time that indicates the time at which the statement began to execute. If an argument is given, it specifies a fractional seconds precision from 0 to 6, the return value includes a fractional seconds part of that many digits. Optional argument type: INTEGER -Return type: TIMESTAMP +**Return type:** `TIMESTAMP` Specification: SYSDATE([INTEGER]) -> TIMESTAMP -Example +### Example ```ppl ignore @@ -2090,10 +2090,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: time(expr) constructs a time type with the input string expr as a time. If the argument is of date/time/timestamp, it extracts the time value part from the expression. 
-Argument type: STRING/DATE/TIME/TIMESTAMP -Return type: TIME -Example +Usage: `time(expr)` constructs a time type with the input string expr as a time. If the argument is of date/time/timestamp, it extracts the time value part from the expression. +**Argument type:** `STRING/DATE/TIME/TIMESTAMP` +**Return type:** `TIME` +### Example ```ppl @@ -2187,7 +2187,7 @@ fetched rows / total rows = 1/1 ### Description -Usage: time_format(time, format) formats the time argument using the specifiers in the format argument. +Usage: `time_format(time, format)` formats the time argument using the specifiers in the format argument. This supports a subset of the time format specifiers available for the [date_format](#date_format) function. Using date format specifiers supported by [date_format](#date_format) will return 0 or null. Acceptable format specifiers are listed in the table below. @@ -2209,9 +2209,9 @@ The following table describes the available specifier arguments. | %T | Time, 24-hour (hh:mm:ss) | -Argument type: STRING/DATE/TIME/TIMESTAMP, STRING -Return type: STRING -Example +**Argument type:** `STRING/DATE/TIME/TIMESTAMP, STRING` +**Return type:** `STRING` +### Example ```ppl @@ -2239,10 +2239,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: time_to_sec(time) returns the time argument, converted to seconds. -Argument type: STRING/TIME/TIMESTAMP -Return type: LONG -Example +Usage: `time_to_sec(time)` returns the time argument, converted to seconds. +**Argument type:** `STRING/TIME/TIMESTAMP` +**Return type:** `LONG` +### Example ```ppl @@ -2271,9 +2271,9 @@ fetched rows / total rows = 1/1 ### Description Usage: returns the difference between two time expressions as a time. 
-Argument type: TIME, TIME -Return type: TIME -Example +**Argument type:** `TIME, TIME` +**Return type:** `TIME` +### Example ```ppl @@ -2301,13 +2301,13 @@ fetched rows / total rows = 1/1 ### Description -Usage: timestamp(expr) constructs a timestamp type with the input string `expr` as an timestamp. If the argument is not a string, it casts `expr` to timestamp type with default timezone UTC. If argument is a time, it applies today's date before cast. +Usage: `timestamp(expr)` constructs a timestamp type with the input string `expr` as a timestamp. If the argument is not a string, it casts `expr` to timestamp type with default timezone UTC. If argument is a time, it applies today's date before cast. With two arguments `timestamp(expr1, expr2)` adds the time expression `expr2` to the date or timestamp expression `expr1` and returns the result as a timestamp value. -Argument type: STRING/DATE/TIME/TIMESTAMP +**Argument type:** `STRING/DATE/TIME/TIMESTAMP` Return type map: (STRING/DATE/TIME/TIMESTAMP) -> TIMESTAMP (STRING/DATE/TIME/TIMESTAMP, STRING/DATE/TIME/TIMESTAMP) -> TIMESTAMP -Example +### Example ```ppl @@ -2338,7 +2338,7 @@ fetched rows / total rows = 1/1 Usage: Returns a TIMESTAMP value based on a passed in DATE/TIME/TIMESTAMP/STRING argument and an INTERVAL and INTEGER argument which determine the amount of time to be added. If the third argument is a STRING, it must be formatted as a valid TIMESTAMP. If only a TIME is provided, a TIMESTAMP is still returned with the DATE portion filled in using the current date. If the third argument is a DATE, it will be automatically converted to a TIMESTAMP.
-Argument type: INTERVAL, INTEGER, DATE/TIME/TIMESTAMP/STRING +**Argument type:** `INTERVAL, INTEGER, DATE/TIME/TIMESTAMP/STRING` INTERVAL must be one of the following tokens: [MICROSECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR] Examples @@ -2369,11 +2369,11 @@ fetched rows / total rows = 1/1 ### Description -Usage: TIMESTAMPDIFF(interval, start, end) returns the difference between the start and end date/times in interval units. +Usage: `TIMESTAMPDIFF(interval, start, end)` returns the difference between the start and end date/times in interval units. If a TIME is provided as an argument, it will be converted to a TIMESTAMP with the DATE portion filled in using the current date. Arguments will be automatically converted to a TIME/TIMESTAMP when appropriate. Any argument that is a STRING must be formatted as a valid TIMESTAMP. -Argument type: INTERVAL, DATE/TIME/TIMESTAMP/STRING, DATE/TIME/TIMESTAMP/STRING +**Argument type:** `INTERVAL, DATE/TIME/TIMESTAMP/STRING, DATE/TIME/TIMESTAMP/STRING` INTERVAL must be one of the following tokens: [MICROSECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR] Examples @@ -2404,10 +2404,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: to_days(date) returns the day number (the number of days since year 0) of the given date. Returns NULL if date is invalid. -Argument type: STRING/DATE/TIMESTAMP -Return type: LONG -Example +Usage: `to_days(date)` returns the day number (the number of days since year 0) of the given date. Returns NULL if date is invalid. +**Argument type:** `STRING/DATE/TIMESTAMP` +**Return type:** `LONG` +### Example ```ppl @@ -2435,11 +2435,11 @@ fetched rows / total rows = 1/1 ### Description -Usage: to_seconds(date) returns the number of seconds since the year 0 of the given value. Returns NULL if value is invalid. +Usage: `to_seconds(date)` returns the number of seconds since the year 0 of the given value. Returns NULL if value is invalid. 
An argument of a LONG type can be used. It must be formatted as YMMDD, YYMMDD, YYYMMDD or YYYYMMDD. Note that a LONG type argument cannot have leading 0s as it will be parsed using an octal numbering system. -Argument type: STRING/LONG/DATE/TIME/TIMESTAMP -Return type: LONG -Example +**Argument type:** `STRING/LONG/DATE/TIME/TIMESTAMP` +**Return type:** `LONG` +### Example ```ppl @@ -2472,9 +2472,9 @@ Usage: Converts given argument to Unix time (seconds since Epoch - very beginnin The date argument may be a DATE, or TIMESTAMP string, or a number in YYMMDD, YYMMDDhhmmss, YYYYMMDD, or YYYYMMDDhhmmss format. If the argument includes a time part, it may optionally include a fractional seconds part. If argument is in invalid format or outside of range 1970-01-01 00:00:00 - 3001-01-18 23:59:59.999999 (0 to 32536771199.999999 epoch time), function returns NULL. You can use [FROM_UNIXTIME](#from_unixtime) to do reverse conversion. -Argument type: \/DOUBLE/DATE/TIMESTAMP -Return type: DOUBLE -Example +**Argument type:** `DOUBLE/DATE/TIMESTAMP` +**Return type:** `DOUBLE` +### Example ```ppl @@ -2503,9 +2503,9 @@ fetched rows / total rows = 1/1 ### Description Returns the current UTC date as a value in 'YYYY-MM-DD'. -Return type: DATE +**Return type:** `DATE` Specification: UTC_DATE() -> DATE -Example +### Example ```ppl ignore @@ -2534,9 +2534,9 @@ fetched rows / total rows = 1/1 ### Description Returns the current UTC time as a value in 'hh:mm:ss'. -Return type: TIME +**Return type:** `TIME` Specification: UTC_TIME() -> TIME -Example +### Example ```ppl ignore @@ -2565,9 +2565,9 @@ fetched rows / total rows = 1/1 ### Description Returns the current UTC timestamp as a value in 'YYYY-MM-DD hh:mm:ss'. -Return type: TIMESTAMP +**Return type:** `TIMESTAMP` Specification: UTC_TIMESTAMP() -> TIMESTAMP -Example +### Example ```ppl ignore @@ -2595,7 +2595,7 @@ fetched rows / total rows = 1/1 ### Description -Usage: week(date[, mode]) returns the week number for date.
If the mode argument is omitted, the default mode 0 is used. +Usage: `week(date[, mode])` returns the week number for date. If the mode argument is omitted, the default mode 0 is used. The following table describes how the mode argument works. @@ -2611,10 +2611,10 @@ The following table describes how the mode argument works. | 7 | Monday | 1-53 | with a Monday in this year | -Argument type: DATE/TIMESTAMP/STRING -Return type: INTEGER +**Argument type:** `DATE/TIMESTAMP/STRING` +**Return type:** `INTEGER` Synonyms: [WEEK_OF_YEAR](#week_of_year) -Example +### Example ```ppl @@ -2642,11 +2642,11 @@ fetched rows / total rows = 1/1 ### Description -Usage: weekday(date) returns the weekday index for date (0 = Monday, 1 = Tuesday, ..., 6 = Sunday). +Usage: `weekday(date)` returns the weekday index for date (0 = Monday, 1 = Tuesday, ..., 6 = Sunday). It is similar to the [dayofweek](#dayofweek) function, but returns different indexes for each day. -Argument type: STRING/DATE/TIME/TIMESTAMP -Return type: INTEGER -Example +**Argument type:** `STRING/DATE/TIME/TIMESTAMP` +**Return type:** `INTEGER` +### Example ```ppl @@ -2675,7 +2675,7 @@ fetched rows / total rows = 1/1 ### Description -Usage: week_of_year(date[, mode]) returns the week number for date. If the mode argument is omitted, the default mode 0 is used. +Usage: `week_of_year(date[, mode])` returns the week number for date. If the mode argument is omitted, the default mode 0 is used. The following table describes how the mode argument works. @@ -2691,10 +2691,10 @@ The following table describes how the mode argument works. 
| 7 | Monday | 1-53 | with a Monday in this year | -Argument type: DATE/TIMESTAMP/STRING -Return type: INTEGER +**Argument type:** `DATE/TIMESTAMP/STRING` +**Return type:** `INTEGER` Synonyms: [WEEK](#week) -Example +### Example ```ppl @@ -2722,10 +2722,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: year(date) returns the year for date, in the range 1000 to 9999, or 0 for the “zero” date. -Argument type: STRING/DATE/TIMESTAMP -Return type: INTEGER -Example +Usage: `year(date)` returns the year for date, in the range 1000 to 9999, or 0 for the “zero” date. +**Argument type:** `STRING/DATE/TIMESTAMP` +**Return type:** `INTEGER` +### Example ```ppl @@ -2753,10 +2753,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: yearweek(date[, mode]) returns the year and week for date as an integer. It accepts and optional mode arguments aligned with those available for the [WEEK](#week) function. -Argument type: STRING/DATE/TIME/TIMESTAMP -Return type: INTEGER -Example +Usage: `yearweek(date[, mode])` returns the year and week for date as an integer. It accepts an optional mode argument aligned with those available for the [WEEK](#week) function. +**Argument type:** `STRING/DATE/TIME/TIMESTAMP` +**Return type:** `INTEGER` +### Example ```ppl diff --git a/docs/user/ppl/functions/ip.md b/docs/user/ppl/functions/ip.md index 673a0a8d25..c21816baea 100644 --- a/docs/user/ppl/functions/ip.md +++ b/docs/user/ppl/functions/ip.md @@ -5,9 +5,11 @@ ### Description Usage: `cidrmatch(ip, cidr)` checks if `ip` is within the specified `cidr` range. -Argument type: STRING/IP, STRING -Return type: BOOLEAN -Example + +**Argument type:** `STRING`/`IP`, `STRING` +**Return type:** `BOOLEAN` + +### Example ```ppl source=weblogs @@ -37,9 +39,11 @@ Note: ### Description Usage: `geoip(dataSourceName, ipAddress[, options])` to lookup location information from given IP addresses via OpenSearch GeoSpatial plugin API.
-Argument type: STRING, STRING/IP, STRING -Return type: OBJECT -Example: + +**Argument type:** `STRING`, `STRING`/`IP`, `STRING` +**Return type:** `OBJECT` + +### Example ```ppl ignore source=weblogs @@ -58,4 +62,4 @@ fetched rows / total rows = 1/1 Note: - `dataSourceName` must be an established dataSource on OpenSearch GeoSpatial plugin, detail of configuration can be found: https://opensearch.org/docs/latest/ingest-pipelines/processors/ip2geo/ - `ip` can be an IPv4 or an IPv6 address - - `options` is an optional String of comma separated fields to output: the selection of fields is subject to dataSourceProvider's schema. For example, the list of fields in the provided `geolite2-city` dataset includes: "country_iso_code", "country_name", "continent_name", "region_iso_code", "region_name", "city_name", "time_zone", "location" \ No newline at end of file + - `options` is an optional String of comma separated fields to output: the selection of fields is subject to dataSourceProvider's schema. For example, the list of fields in the provided `geolite2-city` dataset includes: "country_iso_code", "country_name", "continent_name", "region_iso_code", "region_name", "city_name", "time_zone", "location" diff --git a/docs/user/ppl/functions/json.md b/docs/user/ppl/functions/json.md index 8d0b29883a..e9bd8cf8ac 100644 --- a/docs/user/ppl/functions/json.md +++ b/docs/user/ppl/functions/json.md @@ -23,9 +23,9 @@ Notes: ### Description Usage: `json(value)` Evaluates whether a string can be parsed as a json-encoded string. Returns the value if valid, null otherwise.
-Argument type: STRING
-Return type: STRING
-Example
+**Argument type:** `STRING`
+**Return type:** `STRING`
+### Example

```ppl
source=json_test

@@ -53,10 +53,10 @@ fetched rows / total rows = 4/4

### Description
Version: 3.1.0
-Limitation: Only works when plugins.calcite.enabled=true
+Limitation: Only works when `plugins.calcite.enabled=true`
Usage: `json_valid(value)` Evaluates whether a string uses valid JSON syntax. Returns TRUE if valid, FALSE if invalid. NULL input returns NULL.
-Argument type: STRING
-Return type: BOOLEAN
+**Argument type:** `STRING`
+**Return type:** `BOOLEAN`
Example

```ppl
source=json_test

@@ -82,9 +82,9 @@ fetched rows / total rows = 1/1

### Description
Usage: `json_object(key1, value1, key2, value2...)` create a json object string with key value pairs. The key must be string.
-Argument type: key1: STRING, value1: ANY, key2: STRING, value2: ANY ...
-Return type: STRING
-Example
+**Argument type:** `key1: STRING, value1: ANY, key2: STRING, value2: ANY ...`
+**Return type:** `STRING`
+### Example

```ppl
source=json_test

@@ -109,9 +109,9 @@ fetched rows / total rows = 1/1

### Description
Usage: `json_array(element1, element2, ...)` create a json array string with elements.
-Argument type: element1: ANY, element2: ANY ...
-Return type: STRING
-Example
+**Argument type:** `element1: ANY, element2: ANY ...`
+**Return type:** `STRING`
+### Example

```ppl
source=json_test

@@ -136,9 +136,9 @@ fetched rows / total rows = 1/1

### Description
Usage: `json_array_length(value)` parse the string to json array and return size,, null is returned in case of any other valid JSON string, null or an invalid JSON.
-Argument type: value: A JSON STRING
-Return type: INTEGER
-Example
+**Argument type:** `value: A JSON STRING`
+**Return type:** `INTEGER`
+### Example

```ppl
source=json_test

@@ -181,9 +181,9 @@ fetched rows / total rows = 1/1

### Description
Usage: `json_extract(json_string, path1, path2, ...)` Extracts values using the specified JSON paths.
If only one path is provided, it returns a single value. If multiple paths are provided, it returns a JSON Array in the order of the paths. If one path cannot find value, return null as the result for this path. The path use "{}" to represent index for array, "{}" means "{*}". -Argument type: json_string: STRING, path1: STRING, path2: STRING ... -Return type: STRING -Example +**Argument type:** `json_string: STRING, path1: STRING, path2: STRING ...` +**Return type:** `STRING` +### Example ```ppl source=json_test @@ -226,9 +226,9 @@ fetched rows / total rows = 1/1 ### Description Usage: `json_delete(json_string, path1, path2, ...)` Delete values using the specified JSON paths. Return the json string after deleting. If one path cannot find value, do nothing. -Argument type: json_string: STRING, path1: STRING, path2: STRING ... -Return type: STRING -Example +**Argument type:** `json_string: STRING, path1: STRING, path2: STRING ...` +**Return type:** `STRING` +### Example ```ppl source=json_test @@ -289,9 +289,9 @@ fetched rows / total rows = 1/1 ### Description Usage: `json_set(json_string, path1, value1, path2, value2...)` Set values to corresponding paths using the specified JSON paths. If one path's parent node is not a json object, skip the path. Return the json string after setting. -Argument type: json_string: STRING, path1: STRING, value1: ANY, path2: STRING, value2: ANY ... -Return type: STRING -Example +**Argument type:** `json_string: STRING, path1: STRING, value1: ANY, path2: STRING, value2: ANY ...` +**Return type:** `STRING` +### Example ```ppl source=json_test @@ -334,9 +334,9 @@ fetched rows / total rows = 1/1 ### Description Usage: `json_append(json_string, path1, value1, path2, value2...)` Append values to corresponding paths using the specified JSON paths. If one path's target node is not an array, skip the path. Return the json string after setting. -Argument type: json_string: STRING, path1: STRING, value1: ANY, path2: STRING, value2: ANY ... 
-Return type: STRING -Example +**Argument type:** `json_string: STRING, path1: STRING, value1: ANY, path2: STRING, value2: ANY ...` +**Return type:** `STRING` +### Example ```ppl source=json_test @@ -397,9 +397,9 @@ fetched rows / total rows = 1/1 ### Description Usage: `json_extend(json_string, path1, value1, path2, value2...)` Extend values to corresponding paths using the specified JSON paths. If one path's target node is not an array, skip the path. The function will try to parse the value as an array. If it can be parsed, extend it to the target array. Otherwise, regard the value a single one. Return the json string after setting. -Argument type: json_string: STRING, path1: STRING, value1: ANY, path2: STRING, value2: ANY ... -Return type: STRING -Example +**Argument type:** `json_string: STRING, path1: STRING, value1: ANY, path2: STRING, value2: ANY ...` +**Return type:** `STRING` +### Example ```ppl source=json_test @@ -460,9 +460,9 @@ fetched rows / total rows = 1/1 ### Description Usage: `json_keys(json_string)` Return the key list of the Json object as a Json array. Otherwise, return null. -Argument type: json_string: A JSON STRING -Return type: STRING -Example +**Argument type:** `json_string: A JSON STRING` +**Return type:** `STRING` +### Example ```ppl source=json_test diff --git a/docs/user/ppl/functions/math.md b/docs/user/ppl/functions/math.md index 6b2fe319df..834e3523fd 100644 --- a/docs/user/ppl/functions/math.md +++ b/docs/user/ppl/functions/math.md @@ -4,10 +4,10 @@ ### Description -Usage: abs(x) calculates the abs x. -Argument type: INTEGER/LONG/FLOAT/DOUBLE -Return type: INTEGER/LONG/FLOAT/DOUBLE -Example +Usage: `abs(x)` calculates the abs x. +**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE` +**Return type:** `INTEGER/LONG/FLOAT/DOUBLE` +### Example ```ppl source=people @@ -30,11 +30,11 @@ fetched rows / total rows = 1/1 ### Description -Usage: add(x, y) calculates x plus y. 
-Argument type: INTEGER/LONG/FLOAT/DOUBLE, INTEGER/LONG/FLOAT/DOUBLE -Return type: Wider number between x and y +Usage: `add(x, y)` calculates x plus y. +**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE, INTEGER/LONG/FLOAT/DOUBLE` +**Return type:** `Wider number between x and y` Synonyms: Addition Symbol (+) -Example +### Example ```ppl source=people @@ -57,11 +57,11 @@ fetched rows / total rows = 1/1 ### Description -Usage: subtract(x, y) calculates x minus y. -Argument type: INTEGER/LONG/FLOAT/DOUBLE, INTEGER/LONG/FLOAT/DOUBLE -Return type: Wider number between x and y +Usage: `subtract(x, y)` calculates x minus y. +**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE, INTEGER/LONG/FLOAT/DOUBLE` +**Return type:** `Wider number between x and y` Synonyms: Subtraction Symbol (-) -Example +### Example ```ppl source=people @@ -84,11 +84,11 @@ fetched rows / total rows = 1/1 ### Description -Usage: multiply(x, y) calculates the multiplication of x and y. -Argument type: INTEGER/LONG/FLOAT/DOUBLE, INTEGER/LONG/FLOAT/DOUBLE -Return type: Wider number between x and y. If y equals to 0, then returns NULL. +Usage: `multiply(x, y)` calculates the multiplication of x and y. +**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE, INTEGER/LONG/FLOAT/DOUBLE` +**Return type:** `Wider number between x and y. If y equals to 0, then returns NULL.` Synonyms: Multiplication Symbol (\*) -Example +### Example ```ppl source=people @@ -111,11 +111,11 @@ fetched rows / total rows = 1/1 ### Description -Usage: divide(x, y) calculates x divided by y. -Argument type: INTEGER/LONG/FLOAT/DOUBLE, INTEGER/LONG/FLOAT/DOUBLE -Return type: Wider number between x and y +Usage: `divide(x, y)` calculates x divided by y. +**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE, INTEGER/LONG/FLOAT/DOUBLE` +**Return type:** `Wider number between x and y` Synonyms: Division Symbol (/) -Example +### Example ```ppl source=people @@ -138,11 +138,11 @@ fetched rows / total rows = 1/1 ### Description -Usage: sum(x, y, ...) 
calculates the sum of all provided arguments. This function accepts a variable number of arguments. +Usage: `sum(x, y, ...)` calculates the sum of all provided arguments. This function accepts a variable number of arguments. Note: This function is only available in the eval command context and is rewritten to arithmetic addition while query parsing. -Argument type: Variable number of INTEGER/LONG/FLOAT/DOUBLE arguments -Return type: Wider number type among all arguments -Example +**Argument type:** `Variable number of INTEGER/LONG/FLOAT/DOUBLE arguments` +**Return type:** `Wider number type among all arguments` +### Example ```ppl source=accounts @@ -188,11 +188,11 @@ fetched rows / total rows = 4/4 ### Description -Usage: avg(x, y, ...) calculates the average (arithmetic mean) of all provided arguments. This function accepts a variable number of arguments. +Usage: `avg(x, y, ...)` calculates the average (arithmetic mean) of all provided arguments. This function accepts a variable number of arguments. Note: This function is only available in the eval command context and is rewritten to arithmetic expression (sum / count) at query parsing time. -Argument type: Variable number of INTEGER/LONG/FLOAT/DOUBLE arguments -Return type: DOUBLE -Example +**Argument type:** `Variable number of INTEGER/LONG/FLOAT/DOUBLE arguments` +**Return type:** `DOUBLE` +### Example ```ppl source=accounts @@ -238,10 +238,10 @@ fetched rows / total rows = 4/4 ### Description -Usage: acos(x) calculates the arc cosine of x. Returns NULL if x is not in the range -1 to 1. -Argument type: INTEGER/LONG/FLOAT/DOUBLE -Return type: DOUBLE -Example +Usage: `acos(x)` calculates the arc cosine of x. Returns NULL if x is not in the range -1 to 1. +**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE` +**Return type:** `DOUBLE` +### Example ```ppl source=people @@ -264,10 +264,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: asin(x) calculate the arc sine of x. 
Returns NULL if x is not in the range -1 to 1. -Argument type: INTEGER/LONG/FLOAT/DOUBLE -Return type: DOUBLE -Example +Usage: `asin(x)` calculate the arc sine of x. Returns NULL if x is not in the range -1 to 1. +**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE` +**Return type:** `DOUBLE` +### Example ```ppl source=people @@ -290,10 +290,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: atan(x) calculates the arc tangent of x. atan(y, x) calculates the arc tangent of y / x, except that the signs of both arguments are used to determine the quadrant of the result. -Argument type: INTEGER/LONG/FLOAT/DOUBLE -Return type: DOUBLE -Example +Usage: `atan(x)` calculates the arc tangent of x. atan(y, x) calculates the arc tangent of y / x, except that the signs of both arguments are used to determine the quadrant of the result. +**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE` +**Return type:** `DOUBLE` +### Example ```ppl source=people @@ -317,9 +317,9 @@ fetched rows / total rows = 1/1 ### Description Usage: atan2(y, x) calculates the arc tangent of y / x, except that the signs of both arguments are used to determine the quadrant of the result. -Argument type: INTEGER/LONG/FLOAT/DOUBLE, INTEGER/LONG/FLOAT/DOUBLE -Return type: DOUBLE -Example +**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE, INTEGER/LONG/FLOAT/DOUBLE` +**Return type:** `DOUBLE` +### Example ```ppl source=people @@ -345,12 +345,12 @@ An alias for [CEILING](#ceiling) function. ### Description -Usage: CEILING(T) takes the ceiling of value T. +Usage: `CEILING(T)` takes the ceiling of value T. Note: [CEIL](#ceil) and CEILING functions have the same implementation & functionality Limitation: CEILING only works as expected when IEEE 754 double type displays decimal when stored. 
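The quadrant-resolving behavior described for `atan2(y, x)` matches the standard-library definition, which can be checked in Python:

```python
import math

# atan2(y, x) uses the signs of both arguments to pick the quadrant,
# whereas atan(y / x) collapses opposite quadrants onto each other.
print(math.atan2(1, 1))    # pi/4: first quadrant
print(math.atan2(-1, -1))  # -3*pi/4: third quadrant
print(math.atan(-1 / -1))  # pi/4 again: the quadrant information is lost
```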
-Argument type: INTEGER/LONG/FLOAT/DOUBLE
-Return type: same type with input
-Example
+**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE`
+**Return type:** `same type with input`
+### Example

```ppl
source=people

@@ -390,10 +390,10 @@ fetched rows / total rows = 1/1

### Description

-Usage: CONV(x, a, b) converts the number x from a base to b base.
-Argument type: x: STRING, a: INTEGER, b: INTEGER
-Return type: STRING
-Example
+Usage: `CONV(x, a, b)` converts the number x from base a to base b.
+**Argument type:** `x: STRING, a: INTEGER, b: INTEGER`
+**Return type:** `STRING`
+### Example

```ppl
source=people

@@ -416,10 +416,10 @@ fetched rows / total rows = 1/1

### Description

-Usage: cos(x) calculates the cosine of x, where x is given in radians.
-Argument type: INTEGER/LONG/FLOAT/DOUBLE
-Return type: DOUBLE
-Example
+Usage: `cos(x)` calculates the cosine of x, where x is given in radians.
+**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE`
+**Return type:** `DOUBLE`
+### Example

```ppl
source=people

@@ -442,10 +442,10 @@ fetched rows / total rows = 1/1

### Description

-Usage: cosh(x) calculates the hyperbolic cosine of x, defined as (((e^x) + (e^(-x))) / 2).
-Argument type: INTEGER/LONG/FLOAT/DOUBLE
-Return type: DOUBLE
-Example
+Usage: `cosh(x)` calculates the hyperbolic cosine of x, defined as (((e^x) + (e^(-x))) / 2).
+**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE`
+**Return type:** `DOUBLE`
+### Example

```ppl
source=people

@@ -468,10 +468,10 @@ fetched rows / total rows = 1/1

### Description

-Usage: cot(x) calculates the cotangent of x. Returns out-of-range error if x equals to 0.
-Argument type: INTEGER/LONG/FLOAT/DOUBLE
-Return type: DOUBLE
-Example
+Usage: `cot(x)` calculates the cotangent of x. Returns out-of-range error if x equals to 0.
+**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE`
+**Return type:** `DOUBLE`
+### Example

```ppl
source=people

@@ -495,9 +495,9 @@ fetched rows / total rows = 1/1

### Description
Usage: Calculates a cyclic redundancy check value and returns a 32-bit unsigned value.
-Argument type: STRING
-Return type: LONG
-Example
+**Argument type:** `STRING`
+**Return type:** `LONG`
+### Example

```ppl
source=people

@@ -520,10 +520,10 @@ fetched rows / total rows = 1/1

### Description

-Usage: degrees(x) converts x from radians to degrees.
-Argument type: INTEGER/LONG/FLOAT/DOUBLE
-Return type: DOUBLE
-Example
+Usage: `degrees(x)` converts x from radians to degrees.
+**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE`
+**Return type:** `DOUBLE`
+### Example

```ppl
source=people

@@ -546,9 +546,9 @@ fetched rows / total rows = 1/1

### Description

-Usage: E() returns the Euler's number
-Return type: DOUBLE
-Example
+Usage: `E()` returns Euler's number
+**Return type:** `DOUBLE`
+### Example

```ppl
source=people

@@ -571,10 +571,10 @@ fetched rows / total rows = 1/1

### Description

-Usage: exp(x) return e raised to the power of x.
-Argument type: INTEGER/LONG/FLOAT/DOUBLE
-Return type: DOUBLE
-Example
+Usage: `exp(x)` returns e raised to the power of x.
+**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE`
+**Return type:** `DOUBLE`
+### Example

```ppl
source=people

@@ -598,9 +598,9 @@ fetched rows / total rows = 1/1

### Description
Usage: expm1(NUMBER T) returns the exponential of T, minus 1.
-Argument type: INTEGER/LONG/FLOAT/DOUBLE
-Return type: DOUBLE
-Example
+**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE`
+**Return type:** `DOUBLE`
+### Example

```ppl
source=people

@@ -623,11 +623,11 @@ fetched rows / total rows = 1/1

### Description

-Usage: FLOOR(T) takes the floor of value T.
+Usage: `FLOOR(T)` takes the floor of value T.
Limitation: FLOOR only works as expected when IEEE 754 double type displays decimal when stored.
-Argument type: a: INTEGER/LONG/FLOAT/DOUBLE
-Return type: same type with input
-Example
+**Argument type:** `a: INTEGER/LONG/FLOAT/DOUBLE`
+**Return type:** `same type with input`
+### Example

```ppl
source=people

@@ -684,10 +684,10 @@ fetched rows / total rows = 1/1

### Description

-Usage: ln(x) return the the natural logarithm of x.
-Argument type: INTEGER/LONG/FLOAT/DOUBLE
-Return type: DOUBLE
-Example
+Usage: `ln(x)` returns the natural logarithm of x.
+**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE`
+**Return type:** `DOUBLE`
+### Example

```ppl
source=people

@@ -711,10 +711,10 @@ fetched rows / total rows = 1/1

### Description
Specifications:
-Usage: log(x) returns the natural logarithm of x that is the base e logarithm of the x. log(B, x) is equivalent to log(x)/log(B).
-Argument type: INTEGER/LONG/FLOAT/DOUBLE
-Return type: DOUBLE
-Example
+Usage: `log(x)` returns the natural logarithm of x, that is, the base e logarithm of x. log(B, x) is equivalent to log(x)/log(B).
+**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE`
+**Return type:** `DOUBLE`
+### Example

```ppl
source=people

@@ -739,9 +739,9 @@ fetched rows / total rows = 1/1

Specifications:
Usage: log2(x) is equivalent to log(x)/log(2).
-Argument type: INTEGER/LONG/FLOAT/DOUBLE
-Return type: DOUBLE
-Example
+**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE`
+**Return type:** `DOUBLE`
+### Example

```ppl
source=people

@@ -766,9 +766,9 @@ fetched rows / total rows = 1/1

Specifications:
Usage: log10(x) is equivalent to log(x)/log(10).
-Argument type: INTEGER/LONG/FLOAT/DOUBLE
-Return type: DOUBLE
-Example
+**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE`
+**Return type:** `DOUBLE`
+### Example

```ppl
source=people

@@ -791,10 +791,10 @@ fetched rows / total rows = 1/1

### Description

-Usage: MOD(n, m) calculates the remainder of the number n divided by m.
-Argument type: INTEGER/LONG/FLOAT/DOUBLE, INTEGER/LONG/FLOAT/DOUBLE
-Return type: Wider type between types of n and m if m is nonzero value.
If m equals to 0, then returns NULL. -Example +Usage: `MOD(n, m)` calculates the remainder of the number n divided by m. +**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE, INTEGER/LONG/FLOAT/DOUBLE` +**Return type:** `Wider type between types of n and m if m is nonzero value. If m equals to 0, then returns NULL.` +### Example ```ppl source=people @@ -817,10 +817,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: MODULUS(n, m) calculates the remainder of the number n divided by m. -Argument type: INTEGER/LONG/FLOAT/DOUBLE, INTEGER/LONG/FLOAT/DOUBLE -Return type: Wider type between types of n and m if m is nonzero value. If m equals to 0, then returns NULL. -Example +Usage: `MODULUS(n, m)` calculates the remainder of the number n divided by m. +**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE, INTEGER/LONG/FLOAT/DOUBLE` +**Return type:** `Wider type between types of n and m if m is nonzero value. If m equals to 0, then returns NULL.` +### Example ```ppl source=people @@ -843,9 +843,9 @@ fetched rows / total rows = 1/1 ### Description -Usage: PI() returns the constant pi -Return type: DOUBLE -Example +Usage: `PI()` returns the constant pi +**Return type:** `DOUBLE` +### Example ```ppl source=people @@ -868,11 +868,11 @@ fetched rows / total rows = 1/1 ### Description -Usage: POW(x, y) calculates the value of x raised to the power of y. Bad inputs return NULL result. -Argument type: INTEGER/LONG/FLOAT/DOUBLE, INTEGER/LONG/FLOAT/DOUBLE -Return type: DOUBLE +Usage: `POW(x, y)` calculates the value of x raised to the power of y. Bad inputs return NULL result. +**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE, INTEGER/LONG/FLOAT/DOUBLE` +**Return type:** `DOUBLE` Synonyms: [POWER](#power) -Example +### Example ```ppl source=people @@ -895,11 +895,11 @@ fetched rows / total rows = 1/1 ### Description -Usage: POWER(x, y) calculates the value of x raised to the power of y. Bad inputs return NULL result. 
-Argument type: INTEGER/LONG/FLOAT/DOUBLE, INTEGER/LONG/FLOAT/DOUBLE
-Return type: DOUBLE
+Usage: `POWER(x, y)` calculates the value of x raised to the power of y. Bad inputs return NULL result.
+**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE, INTEGER/LONG/FLOAT/DOUBLE`
+**Return type:** `DOUBLE`
Synonyms: [POW](#pow)
-Example
+### Example

```ppl
source=people

@@ -922,10 +922,10 @@ fetched rows / total rows = 1/1

### Description

-Usage: radians(x) converts x from degrees to radians.
-Argument type: INTEGER/LONG/FLOAT/DOUBLE
-Return type: DOUBLE
-Example
+Usage: `radians(x)` converts x from degrees to radians.
+**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE`
+**Return type:** `DOUBLE`
+### Example

```ppl
source=people

@@ -948,10 +948,10 @@ fetched rows / total rows = 1/1

### Description

-Usage: RAND()/RAND(N) returns a random floating-point value in the range 0 <= value < 1.0. If integer N is specified, the seed is initialized prior to execution. One implication of this behavior is with identical argument N, rand(N) returns the same value each time, and thus produces a repeatable sequence of column values.
-Argument type: INTEGER
-Return type: FLOAT
-Example
+Usage: `RAND()`/`RAND(N)` returns a random floating-point value in the range 0 <= value < 1.0. If integer N is specified, the seed is initialized prior to execution. One implication of this behavior is that with identical argument N, rand(N) returns the same value each time, and thus produces a repeatable sequence of column values.
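The repeatability described for `RAND(N)` is the usual seeded-PRNG behavior, sketched here with Python's `random` module (an analogy, not the plugin's generator):

```python
import random

def rand(seed=None) -> float:
    # Seed is initialized prior to execution; the value lies in [0, 1).
    return random.Random(seed).random()

print(rand(5) == rand(5))   # True: identical seed, identical value
print(0.0 <= rand() < 1.0)  # True
```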
+**Argument type:** `INTEGER` +**Return type:** `FLOAT` +### Example ```ppl source=people @@ -974,12 +974,12 @@ fetched rows / total rows = 1/1 ### Description -Usage: ROUND(x, d) rounds the argument x to d decimal places, d defaults to 0 if not specified -Argument type: INTEGER/LONG/FLOAT/DOUBLE +Usage: `ROUND(x, d)` rounds the argument x to d decimal places, d defaults to 0 if not specified +**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE` Return type map: (INTEGER/LONG [,INTEGER]) -> LONG (FLOAT/DOUBLE [,INTEGER]) -> LONG -Example +### Example ```ppl source=people @@ -1003,9 +1003,9 @@ fetched rows / total rows = 1/1 ### Description Usage: Returns the sign of the argument as -1, 0, or 1, depending on whether the number is negative, zero, or positive -Argument type: INTEGER/LONG/FLOAT/DOUBLE -Return type: same type with input -Example +**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE` +**Return type:** `same type with input` +### Example ```ppl source=people @@ -1029,10 +1029,10 @@ fetched rows / total rows = 1/1 ### Description Usage: Returns the sign of the argument as -1, 0, or 1, depending on whether the number is negative, zero, or positive -Argument type: INTEGER/LONG/FLOAT/DOUBLE -Return type: INTEGER +**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE` +**Return type:** `INTEGER` Synonyms: `SIGN` -Example +### Example ```ppl source=people @@ -1055,10 +1055,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: sin(x) calculates the sine of x, where x is given in radians. -Argument type: INTEGER/LONG/FLOAT/DOUBLE -Return type: DOUBLE -Example +Usage: `sin(x)` calculates the sine of x, where x is given in radians. +**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE` +**Return type:** `DOUBLE` +### Example ```ppl source=people @@ -1081,10 +1081,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: sinh(x) calculates the hyperbolic sine of x, defined as (((e^x) - (e^(-x))) / 2). 
-Argument type: INTEGER/LONG/FLOAT/DOUBLE -Return type: DOUBLE -Example +Usage: `sinh(x)` calculates the hyperbolic sine of x, defined as (((e^x) - (e^(-x))) / 2). +**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE` +**Return type:** `DOUBLE` +### Example ```ppl source=people @@ -1108,11 +1108,11 @@ fetched rows / total rows = 1/1 ### Description Usage: Calculates the square root of a non-negative number -Argument type: INTEGER/LONG/FLOAT/DOUBLE +**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE` Return type map: (Non-negative) INTEGER/LONG/FLOAT/DOUBLE -> DOUBLE (Negative) INTEGER/LONG/FLOAT/DOUBLE -> NULL -Example +### Example ```ppl source=people @@ -1136,10 +1136,10 @@ fetched rows / total rows = 1/1 ### Description Usage: Calculates the cube root of a number -Argument type: INTEGER/LONG/FLOAT/DOUBLE +**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE` Return type DOUBLE: INTEGER/LONG/FLOAT/DOUBLE -> DOUBLE -Example +### Example ```ppl ignore source=location @@ -1163,10 +1163,10 @@ fetched rows / total rows = 2/2 ### Description -Usage: rint(NUMBER T) returns T rounded to the closest whole integer number. -Argument type: INTEGER/LONG/FLOAT/DOUBLE -Return type: DOUBLE -Example +Usage: `rint(NUMBER T)` returns T rounded to the closest whole integer number. +**Argument type:** `INTEGER/LONG/FLOAT/DOUBLE` +**Return type:** `DOUBLE` +### Example ```ppl source=people diff --git a/docs/user/ppl/functions/statistical.md b/docs/user/ppl/functions/statistical.md index b109856691..7f87e11ca5 100644 --- a/docs/user/ppl/functions/statistical.md +++ b/docs/user/ppl/functions/statistical.md @@ -4,11 +4,14 @@ ### Description -Usage: max(x, y, ...) returns the maximum value from all provided arguments. Strings are treated as greater than numbers, so if provided both strings and numbers, it will return the maximum string value (lexicographically ordered) +Usage: `max(x, y, ...)` returns the maximum value from all provided arguments. 
Strings are treated as greater than numbers, so if provided both strings and numbers, it will return the maximum string value (lexicographically ordered). + Note: This function is only available in the eval command context. -Argument type: Variable number of INTEGER/LONG/FLOAT/DOUBLE/STRING arguments -Return type: Type of the selected argument -Example + +**Argument type:** Variable number of `INTEGER`/`LONG`/`FLOAT`/`DOUBLE`/`STRING` arguments +**Return type:** Type of the selected argument + +### Example ```ppl source=accounts @@ -74,11 +77,14 @@ fetched rows / total rows = 4/4 ### Description -Usage: min(x, y, ...) returns the minimum value from all provided arguments. Strings are treated as greater than numbers, so if provided both strings and numbers, it will return the minimum numeric value. +Usage: `min(x, y, ...)` returns the minimum value from all provided arguments. Strings are treated as greater than numbers, so if provided both strings and numbers, it will return the minimum numeric value. + Note: This function is only available in the eval command context. -Argument type: Variable number of INTEGER/LONG/FLOAT/DOUBLE/STRING arguments -Return type: Type of the selected argument -Example + +**Argument type:** Variable number of `INTEGER`/`LONG`/`FLOAT`/`DOUBLE`/`STRING` arguments +**Return type:** Type of the selected argument + +### Example ```ppl source=accounts @@ -139,4 +145,4 @@ fetched rows / total rows = 4/4 | 33 | Dale | 33 | +-----+-----------+--------+ ``` - \ No newline at end of file + diff --git a/docs/user/ppl/functions/string.md b/docs/user/ppl/functions/string.md index 04a3485c49..c1c64d21da 100644 --- a/docs/user/ppl/functions/string.md +++ b/docs/user/ppl/functions/string.md @@ -4,10 +4,10 @@ ### Description -Usage: CONCAT(str1, str2, ...., str_9) adds up to 9 strings together. -Argument type: STRING, STRING, ...., STRING -Return type: STRING -Example +Usage: `CONCAT(str1, str2, ...., str_9)` adds up to 9 strings together. 
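The mixed string/number comparison rule used by `max` and `min` (strings rank above all numbers; strings compare lexicographically) can be sketched in Python; `mixed_key` and the `ppl_`-prefixed helpers are illustrative names, not part of the plugin:

```python
def mixed_key(value):
    # Strings sort above all numbers; natural order within each group.
    if isinstance(value, str):
        return (1, value)
    return (0, value)

def ppl_max(*args):
    return max(args, key=mixed_key)

def ppl_min(*args):
    return min(args, key=mixed_key)

print(ppl_max(1, "a", 99))  # 'a': the maximum string, lexicographically
print(ppl_min(1, "a", 99))  # 1: the minimum numeric value
```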
+**Argument type:** `STRING, STRING, ...., STRING`
+**Return type:** `STRING`
+### Example

```ppl
source=people

@@ -30,10 +30,10 @@ fetched rows / total rows = 1/1

### Description

-Usage: CONCAT_WS(sep, str1, str2) returns str1 concatenated with str2 using sep as a separator between them.
-Argument type: STRING, STRING, STRING
-Return type: STRING
-Example
+Usage: `CONCAT_WS(sep, str1, str2)` returns str1 concatenated with str2 using sep as a separator between them.
+**Argument type:** `STRING, STRING, STRING`
+**Return type:** `STRING`
+### Example

```ppl
source=people

@@ -59,10 +59,10 @@ fetched rows / total rows = 1/1

Specifications:
1. LENGTH(STRING) -> INTEGER
-Usage: length(str) returns length of string measured in bytes.
-Argument type: STRING
-Return type: INTEGER
-Example
+Usage: `length(str)` returns the length of the string measured in bytes.
+**Argument type:** `STRING`
+**Return type:** `INTEGER`
+### Example

```ppl
source=people

@@ -85,7 +85,7 @@ fetched rows / total rows = 1/1

### Description

-Usage: like(string, PATTERN[, case_sensitive]) return true if the string match the PATTERN. `case_sensitive` is optional. When set to `true`, PATTERN is **case-sensitive**. **Default:** Determined by `plugins.ppl.syntax.legacy.preferred`.
+Usage: `like(string, PATTERN[, case_sensitive])` returns true if the string matches the PATTERN. `case_sensitive` is optional. When set to `true`, PATTERN is **case-sensitive**. **Default:** Determined by `plugins.ppl.syntax.legacy.preferred`.
* When `plugins.ppl.syntax.legacy.preferred=true`, `case_sensitive` defaults to `false` * When `plugins.ppl.syntax.legacy.preferred=false`, `case_sensitive` defaults to `true` @@ -93,9 +93,9 @@ There are two wildcards often used in conjunction with the LIKE operator: * `%` - The percent sign represents zero, one, or multiple characters * `_` - The underscore represents a single character -Argument type: STRING, STRING [, BOOLEAN] -Return type: INTEGER -Example +**Argument type:** `STRING, STRING [, BOOLEAN]` +**Return type:** `INTEGER` +### Example ```ppl source=people @@ -119,14 +119,14 @@ Limitation: The pushdown of the LIKE function to a DSL wildcard query is support ### Description -Usage: ilike(string, PATTERN) return true if the string match the PATTERN, PATTERN is **case-insensitive**. +Usage: `ilike(string, PATTERN)` return true if the string match the PATTERN, PATTERN is **case-insensitive**. There are two wildcards often used in conjunction with the ILIKE operator: * `%` - The percent sign represents zero, one, or multiple characters * `_` - The underscore represents a single character -Argument type: STRING, STRING -Return type: INTEGER -Example +**Argument type:** `STRING, STRING` +**Return type:** `INTEGER` +### Example ```ppl source=people @@ -150,10 +150,10 @@ Limitation: The pushdown of the ILIKE function to a DSL wildcard query is suppor ### Description -Usage: locate(substr, str[, start]) returns the position of the first occurrence of substring substr in string str, starting searching from position start. If start is not specified, it defaults to 1 (the beginning of the string). Returns 0 if substr is not found. If any argument is NULL, the function returns NULL. -Argument type: STRING, STRING[, INTEGER] -Return type: INTEGER -Example +Usage: `locate(substr, str[, start])` returns the position of the first occurrence of substring substr in string str, starting searching from position start. 
If start is not specified, it defaults to 1 (the beginning of the string). Returns 0 if substr is not found. If any argument is NULL, the function returns NULL. +**Argument type:** `STRING, STRING[, INTEGER]` +**Return type:** `INTEGER` +### Example ```ppl source=people @@ -176,10 +176,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: lower(string) converts the string to lowercase. -Argument type: STRING -Return type: STRING -Example +Usage: `lower(string)` converts the string to lowercase. +**Argument type:** `STRING` +**Return type:** `STRING` +### Example ```ppl source=people @@ -202,10 +202,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: ltrim(str) trims leading space characters from the string. -Argument type: STRING -Return type: STRING -Example +Usage: `ltrim(str)` trims leading space characters from the string. +**Argument type:** `STRING` +**Return type:** `STRING` +### Example ```ppl source=people @@ -229,10 +229,10 @@ fetched rows / total rows = 1/1 ### Description Usage: The syntax POSITION(substr IN str) returns the position of the first occurrence of substring substr in string str. Returns 0 if substr is not in str. Returns NULL if any argument is NULL. -Argument type: STRING, STRING +**Argument type:** `STRING, STRING` Return type INTEGER (STRING IN STRING) -> INTEGER -Example +### Example ```ppl source=people @@ -255,10 +255,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: replace(str, pattern, replacement) returns a string with all occurrences of the pattern replaced by the replacement string in str. If any argument is NULL, the function returns NULL. +Usage: `replace(str, pattern, replacement)` returns a string with all occurrences of the pattern replaced by the replacement string in str. If any argument is NULL, the function returns NULL. 
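The regex-based behavior of `replace` can be sketched with Python's `re` module (Python regex is similar to, though not identical with, the Java regex syntax the function uses; the helper name is illustrative):

```python
import re

def replace(s: str, pattern: str, replacement: str) -> str:
    # Every substring matching the regex pattern is replaced.
    return re.sub(pattern, replacement, s)

# Escaped dot matches the literal character:
print(replace("visit example.com", r"example\.com", "example.org"))
# Unescaped dot is a regex wildcard, so it also matches 'X':
print(replace("visit exampleXcom", "example.com", "gone"))
```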
**Regular Expression Support**: The pattern argument supports Java regex syntax, including: -Argument type: STRING, STRING (regex pattern), STRING (replacement) -Return type: STRING +**Argument type:** `STRING, STRING (regex pattern), STRING (replacement)` +**Return type:** `STRING` **Important - Regex Special Characters**: The pattern is interpreted as a regular expression. Characters like `.`, `*`, `+`, `[`, `]`, `(`, `)`, `{`, `}`, `^`, `$`, `|`, `?`, and `\` have special meaning in regex. To match them literally, escape with backslashes: * To match `example.com`: use `'example\\.com'` (escape the dots) * To match `value*`: use `'value\\*'` (escape the asterisk) @@ -368,10 +368,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: REVERSE(str) returns reversed string of the string supplied as an argument. -Argument type: STRING -Return type: STRING -Example +Usage: `REVERSE(str)` returns reversed string of the string supplied as an argument. +**Argument type:** `STRING` +**Return type:** `STRING` +### Example ```ppl source=people @@ -394,10 +394,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: right(str, len) returns the rightmost len characters from the string str, or NULL if any argument is NULL. -Argument type: STRING, INTEGER -Return type: STRING -Example +Usage: `right(str, len)` returns the rightmost len characters from the string str, or NULL if any argument is NULL. +**Argument type:** `STRING, INTEGER` +**Return type:** `STRING` +### Example ```ppl source=people @@ -420,10 +420,10 @@ fetched rows / total rows = 1/1 ### Description -Usage: rtrim(str) trims trailing space characters from the string. -Argument type: STRING -Return type: STRING -Example +Usage: `rtrim(str)` trims trailing space characters from the string. 
+**Argument type:** `STRING`
+**Return type:** `STRING`
+### Example

```ppl
source=people
@@ -446,11 +446,11 @@ fetched rows / total rows = 1/1

### Description

-Usage: substring(str, start) or substring(str, start, length) returns substring using start and length. With no length, entire string from start is returned.
-Argument type: STRING, INTEGER, INTEGER
-Return type: STRING
+Usage: `substring(str, start)` or `substring(str, start, length)` returns the substring specified by start and length. With no length, the entire string from start is returned.
+**Argument type:** `STRING, INTEGER, INTEGER`
+**Return type:** `STRING`
Synonyms: SUBSTR
-Example
+### Example

```ppl
source=people
@@ -474,8 +474,8 @@ fetched rows / total rows = 1/1

### Description

-Argument Type: STRING
-Return type: STRING
-Example
+**Argument type:** `STRING`
+**Return type:** `STRING`
+### Example

```ppl
source=people
@@ -498,10 +498,10 @@ fetched rows / total rows = 1/1

### Description

-Usage: upper(string) converts the string to uppercase.
-Argument type: STRING
-Return type: STRING
-Example
+Usage: `upper(string)` converts the string to uppercase.
+**Argument type:** `STRING`
+**Return type:** `STRING`
+### Example

```ppl
source=people
@@ -524,11 +524,11 @@ fetched rows / total rows = 1/1

### Description

-Usage: regexp_replace(str, pattern, replacement) replace all substrings of the string value that match pattern with replacement and returns modified string value.
-Argument type: STRING, STRING, STRING
-Return type: STRING
+Usage: `regexp_replace(str, pattern, replacement)` replaces all substrings of the string value that match the pattern with the replacement and returns the modified string value.
+**Argument type:** `STRING, STRING, STRING`
+**Return type:** `STRING`
Synonyms: [REPLACE](#replace)
-Example
+### Example

```ppl
source=people
diff --git a/docs/user/ppl/functions/system.md b/docs/user/ppl/functions/system.md
index 4eb2aeb811..4d394d2dd7 100644
--- a/docs/user/ppl/functions/system.md
+++ b/docs/user/ppl/functions/system.md
@@ -4,11 +4,12 @@

### Description

-Usage: typeof(expr) function returns name of the data type of the value that is passed to it. This can be helpful for troubleshooting or dynamically constructing SQL queries.
-Argument type: ANY
-Return type: STRING
+Usage: `typeof(expr)` returns the name of the data type of the value passed to it. This can be helpful for troubleshooting or dynamically constructing queries.

-Example
+**Argument type:** `ANY`
+**Return type:** `STRING`
+
+### Example

```ppl
source=people
@@ -26,4 +27,4 @@ fetched rows / total rows = 1/1
| DATE | INT | TIMESTAMP | STRUCT |
+--------------+-------------+---------------+----------------+
```
-
\ No newline at end of file
+
diff --git a/scripts/docs_exporter/README.md b/scripts/docs_exporter/README.md
new file mode 100644
index 0000000000..6de9d7dafa
--- /dev/null
+++ b/scripts/docs_exporter/README.md
@@ -0,0 +1,80 @@
+# PPL Documentation Exporter
+
+Exports PPL documentation to the OpenSearch documentation website. Auto-injects Jekyll front-matter, converts SQL CLI tables to markdown, fixes relative links, and adds copy buttons.
+
+## Directory Structure
+
+```
+sql/
+├── docs/user/ppl/          <-- SOURCE
+├── scripts/docs_exporter/  <-- THIS TOOL
+
+documentation-website/      <-- MUST BE SIBLING OF sql/
+└── _sql-and-ppl/ppl/       <-- DESTINATION
+```
+
+## SOP: Exporting to Documentation Website
+
+### 1. Clone documentation-website to the same root as the `sql` repo (first time only)
+
+```bash
+cd /path/to/sql/../
+git clone https://github.com/opensearch-project/documentation-website.git
+```
+
+### 2. Rebase documentation-website to latest
+
+```bash
+cd documentation-website
+git fetch origin
+git rebase origin/main
+```
+
+### 3. Run the export
+
+As of Dec 17 2025, the migration to auto-export for documentation-website is ongoing.
+Currently, only select directories (e.g. `docs/user/ppl/cmd`) are exported to documentation-website.
+
+#### How to export specific directories only:
+```bash
+# Export only cmd/
+./export_to_docs_website.py --only-dirs cmd
+
+# Export cmd/ and functions/
+./export_to_docs_website.py --only-dirs cmd,functions
+```
+
+#### How to export all directories:
+```bash
+cd sql/scripts/docs_exporter
+./export_to_docs_website.py
+```
+
+### 4. Review and commit changes
+
+```bash
+cd documentation-website
+git diff
+git add -A
+git commit -m "Update PPL documentation"
+```
+
+### 5. Open Pull Request in documentation-website repo
+Example: https://github.com/opensearch-project/documentation-website/pull/11688
+
+## Options
+
+| Option | Description |
+|--------|-------------|
+| `-y, --yes` | Auto-overwrite existing files without prompting |
+| `--only-dirs` | Comma-separated list of directories to export (e.g., `cmd`, `cmd,functions`) |
+
+## What the exporter does
+
+- Injects Jekyll front-matter (title, parent, nav_order, etc.)
+- Converts SQL CLI table output to markdown tables
+- Converts `docs.opensearch.org` links to Jekyll site variables
+- Fixes relative links to use `{{site.url}}{{site.baseurl}}`
+- Converts `ppl` code fences to `sql`
+- Adds copy buttons to code blocks
+- Rolls up third-level directories to avoid Jekyll rendering limitations
diff --git a/scripts/docs_exporter/export_to_docs_website.py b/scripts/docs_exporter/export_to_docs_website.py
index 0ba63aa5c5..90d7e34815 100755
--- a/scripts/docs_exporter/export_to_docs_website.py
+++ b/scripts/docs_exporter/export_to_docs_website.py
@@ -24,14 +24,15 @@
 import re
 import os
+import argparse
 from collections import defaultdict
 from pathlib import Path
 from typing import Optional

 # Base path for links in the documentation website
-DOCS_BASE_PATH = "sql-and-ppl/ppl-reference"
-DOCS_BASE_TITLE = "OpenSearch PPL Reference Manual"
+DOCS_PARENT_BASE_PATH = "sql-and-ppl/ppl"
+DOCS_PARENT_BASE_TITLE = "PPL"

 # Directory name to heading mappings (as they appear on website)
 DIR_NAMES_TO_HEADINGS_MAP = {
@@ -44,12 +45,60 @@
     "reference": "Reference",
 }

+# Custom redirect_from lists for specific files (relative path from source root)
+# Required for backward compatibility from old website links. Injected in Jekyll front-matter.
+CUSTOM_REDIRECTS = {
+    "cmd/index.md": [
+        "/search-plugins/sql/ppl/functions/",
+        "/observability-plugin/ppl/commands/",
+        "/search-plugins/ppl/commands/",
+        "/search-plugins/ppl/functions/",
+        "/sql-and-ppl/ppl/functions/",
+    ],
+}
+

 def get_heading_for_dir(dir_name: str) -> str:
     """Get heading for directory name, using mapped value or fallback to title-case."""
     return DIR_NAMES_TO_HEADINGS_MAP.get(dir_name, dir_name.replace("-", " ").title())


+def convert_sql_table_to_markdown(table_text: str) -> str:
+    """Convert SQL CLI table format to markdown table."""
+    lines = table_text.strip().split('\n')
+    result = []
+    header_done = False
+
+    for line in lines:
+        # Skip border lines (+---+---+), separator lines (|---+---|), and fetched rows line
+        if re.match(r'^\+[-+]+\+$', line.strip()) or re.match(r'^\|[-+|]+\|$', line.strip()):
+            continue
+        if re.match(r'^fetched rows\s*/\s*total rows\s*=', line.strip()):
+            continue
+        # Data/header row
+        if line.strip().startswith('|') and line.strip().endswith('|'):
+            cells = [c.strip() for c in line.strip().strip('|').split('|')]
+            result.append('| ' + ' | '.join(cells) + ' |')
+            if not header_done:
+                result.append('|' + '|'.join([' --- ' for _ in cells]) + '|')
+                header_done = True
+
+    return '\n'.join(result)
+
+
+def convert_tables_in_code_blocks(content: str) -> str:
+    """Find and convert SQL CLI tables in code blocks to markdown tables."""
+    def replace_table(match):
+        block_content = match.group(1)
+        # Check if this looks like a SQL CLI table
+        if re.search(r'^\+[-+]+\+$', block_content, re.MULTILINE):
+            return convert_sql_table_to_markdown(block_content)
+        return match.group(0)
+
+    # Match any code block containing tables (language specifier can have spaces like 'ppl ignore')
+    return re.sub(r'```[^\n]*\n(.*?)```', replace_table, content, flags=re.DOTALL)
+
+
 def extract_title(content: str) -> Optional[str]:
     """Extract title from first H1 heading or return None."""
     match = re.search(r'^#\s+(.+)$', content, re.MULTILINE)
@@ -62,25 +111,23 @@ def generate_frontmatter(
     grand_parent: Optional[str] = None,
     nav_order: int = 1,
     has_children: bool = False,
-    redirect_from: Optional[str] = None,
+    redirect_from: Optional[list] = None,
 ) -> str:
     """Generate Jekyll front-matter."""
-    def escape_yaml_string(s: str) -> str:
-        """Escape string for YAML double quotes."""
-        return s.replace('\\', '\\\\').replace('"', '\\"')
-
     fm = ["---", "layout: default"]
     if title:
-        fm.append(f'title: "{escape_yaml_string(title)}"')
+        fm.append(f"title: {title}")
     if parent:
-        fm.append(f'parent: "{escape_yaml_string(parent)}"')
+        fm.append(f"parent: {parent}")
     if grand_parent:
-        fm.append(f'grand_parent: "{escape_yaml_string(grand_parent)}"')
+        fm.append(f"grand_parent: {grand_parent}")
     fm.append(f"nav_order: {nav_order}")
     if has_children:
         fm.append("has_children: true")
     if redirect_from:
-        fm.append(f'redirect_from: ["{escape_yaml_string(redirect_from)}"]')
+        fm.append("redirect_from:")
+        for r in redirect_from:
+            fm.append(f"  - {r}")
     fm.append("---\n")
     return "\n".join(fm)
@@ -142,11 +189,14 @@ def fix_link(match, current_file_path=None):
     if resolved_path and not resolved_path.endswith((".html", ".htm")) and not anchor:
         resolved_path = resolved_path.rstrip("/") + "/"

-    return f"]({{{{site.url}}}}{{{{site.baseurl}}}}/{DOCS_BASE_PATH}/{resolved_path}{anchor})"
+    return f"]({{{{site.url}}}}{{{{site.baseurl}}}}/{DOCS_PARENT_BASE_PATH}/{resolved_path}{anchor})"


 def process_content(content: str, current_file_path=None) -> str:
     """Process markdown content with PPL->SQL conversion, copy buttons, and link fixes."""
+    # Convert SQL CLI tables in code blocks to markdown tables
+    content = convert_tables_in_code_blocks(content)
+
     # Convert PPL code fences to SQL
     content = re.sub(r'^```ppl\b.*$', '```sql', content, flags=re.MULTILINE)
@@ -163,10 +213,24 @@ def fix_link_with_context(match):
         r"\]\((?!https?://)(.*?)(\.md)?(#[^\)]*)?\)", fix_link_with_context, content
     )

+    # Convert docs.opensearch.org links to site variables
+    def fix_opensearch_link(match):
+        path = match.group(1)
+        return f"]({{{{site.url}}}}{{{{site.baseurl}}}}{path})"
+
+    content = re.sub(
+        r"\]\(https://docs\.opensearch\.org/[^/]+(.*?)\)", fix_opensearch_link, content
+    )
+
     return content


-def export_docs(source_dir: Path, target_dir: Path) -> None:
+def export_docs(
+    source_dir: Path,
+    target_dir: Path,
+    auto_yes: bool = False,
+    only_dirs: Optional[set] = None,
+) -> None:
     """Export PPL docs to documentation website."""
     if not source_dir.exists():
         print(f"Source directory {source_dir} not found")
@@ -174,14 +238,33 @@ def export_docs(source_dir: Path, target_dir: Path) -> None:
     # Check if target directory exists and has files
     if target_dir.exists() and any(target_dir.glob('**/*.md')):
-        response = input(f"Target directory {target_dir} contains files. Overwrite? (y/n): ")
-        if response.lower() != 'y':
-            print("Export cancelled")
-            return
+        if auto_yes:
+            print(
+                f"Target directory {target_dir} contains files. Auto-overwriting (--yes flag)."
+            )
+        else:
+            response = input(
+                f"Target directory {target_dir} contains files. Overwrite? (y/n): "
+            )
+            if response.lower() != "y":
+                print("Export cancelled")
+                return

     # Get all markdown files sorted alphabetically
     md_files = sorted(source_dir.glob("**/*.md"))

+    # Filter to only specified directories if provided
+    if only_dirs:
+        md_files = [
+            f for f in md_files if f.relative_to(source_dir).parts[0] in only_dirs
+        ]
+
     # Group files by directory for local nav_order
     files_by_dir = defaultdict(list)
@@ -204,9 +287,12 @@ def export_docs(source_dir: Path, target_dir: Path) -> None:
     for _, files in files_by_dir.items():
         for i, md_file in enumerate(files, 1):
             rel_path = md_file.relative_to(source_dir)
+            rel_path_str = str(rel_path)
+
+            # Check for custom redirects
+            redirect_from = CUSTOM_REDIRECTS.get(rel_path_str, None)

             # Roll up third-level files to second level to avoid rendering limitations
-            redirect_from = None
             if len(rel_path.parts) >= 3:
                 # Move from admin/connectors/file.md to admin/connectors_file.md
                 parent_dir = rel_path.parts[0]  # e.g., "admin"
@@ -216,8 +302,8 @@ def export_docs(source_dir: Path, target_dir: Path) -> None:
                 target_file = target_dir / parent_dir / new_filename

                 # Generate redirect_from for the original path
-                original_path = f"/{DOCS_BASE_PATH}/{rel_path.with_suffix('')}/"
-                redirect_from = original_path
+                original_path = f"/{DOCS_PARENT_BASE_PATH}/{rel_path.with_suffix('')}/"
+                redirect_from = (redirect_from or []) + [original_path]

                 print(
                     f"\033[93mWARNING: Rolling up {rel_path} to {parent_dir}/{new_filename} due to rendering limitations\033[0m"
                 )
@@ -236,27 +322,33 @@ def export_docs(source_dir: Path, target_dir: Path) -> None:
             elif len(rel_path.parts) == 2:
                 # Second level files (including rolled-up files)
                 parent = get_heading_for_dir(rel_path.parent.name)
-                grand_parent = DOCS_BASE_TITLE
+                grand_parent = DOCS_PARENT_BASE_TITLE
             else:
                 # This shouldn't happen after roll-up, but keeping for safety
                 parent = get_heading_for_dir(rel_path.parent.name)
                 grand_parent = get_heading_for_dir(rel_path.parts[-3])
-                grand_parent = DIR_NAMES_TO_HEADINGS_MAP.get(
-                    grand_parent_name, grand_parent_name.replace("-", " ").title()
-                )

-            # Check if this is the root index.md and has children
-            is_root_index = rel_path.name == "index.md" and rel_path.parent == Path(".")
-            has_children = (
-                is_root_index
-                or (md_file.parent / md_file.stem).is_dir()
-                and any((md_file.parent / md_file.stem).glob("*/*.md"))
-            )
+            # Check if this is an index.md (root or directory) - these have children
+            is_index = rel_path.name == "index.md"
+            has_children = is_index
+
+            # For directory index files, parent should be one level up
+            if is_index and rel_path.parent != Path("."):
+                parent = DOCS_PARENT_BASE_TITLE
+                grand_parent = None

-            title = (
-                extract_title(md_file.read_text(encoding="utf-8"))
-                or md_file.stem.replace("-", " ").title()
-            )
+            # Determine title - use directory name for index files, filename for cmd files
+            if is_index:
+                # For index files, use the directory heading as title
+                title = get_heading_for_dir(rel_path.parent.name) if rel_path.parent != Path(".") else DOCS_PARENT_BASE_TITLE
+            elif len(rel_path.parts) >= 2 and rel_path.parts[0] == "cmd":
+                # For command files, use the filename as ground truth (stems are already lowercase)
+                title = md_file.stem.replace("-", " ")
+            else:
+                title = (
+                    extract_title(md_file.read_text(encoding="utf-8"))
+                    or md_file.stem.replace("-", " ").title()
+                )

             frontmatter = generate_frontmatter(
                 title, parent, grand_parent, i, has_children, redirect_from
             )
@@ -278,6 +370,12 @@ def export_docs(source_dir: Path, target_dir: Path) -> None:
         # Skip third-level directories since files are rolled up
         if len(dir_path.parts) > 1:
             continue
+        # Skip directories not in only_dirs filter
+        if only_dirs and dir_path.parts[0] not in only_dirs:
+            continue
+        # Skip if source index.md exists (it will be exported with the other files)
+        if (source_dir / dir_path / "index.md").exists():
+            continue

         target_index = target_dir / dir_path / "index.md"
         title = get_heading_for_dir(dir_path.name)
@@ -285,7 +383,7 @@ def export_docs(source_dir: Path, target_dir: Path) -> None:
         # Determine parent for directory index based on depth
         if len(dir_path.parts) == 1:
             # Second-level directory (e.g., admin/) - parent is root title
-            parent = DOCS_BASE_TITLE
+            parent = DOCS_PARENT_BASE_TITLE
         else:
             # This shouldn't happen after filtering, but keeping for safety
             parent = get_heading_for_dir(dir_path.parent.name)
@@ -296,7 +394,26 @@ def export_docs(source_dir: Path, target_dir: Path) -> None:

 if __name__ == "__main__":
+    parser = argparse.ArgumentParser(
+        description="Export PPL docs to documentation website"
+    )
+    parser.add_argument(
+        "-y",
+        "--yes",
+        action="store_true",
+        help="Automatically overwrite existing files without prompting",
+    )
+    parser.add_argument(
+        "--only-dirs",
+        type=str,
+        help="Comma-separated list of directories to export (e.g., 'cmd' or 'cmd,functions')",
+    )
+    args = parser.parse_args()
+
     script_dir = Path(__file__).parent
     source_dir_ppl = script_dir / "../../docs/user/ppl"
-    target_dir_ppl = script_dir / f"../../../documentation-website/_{DOCS_BASE_PATH}"
-    export_docs(source_dir_ppl, target_dir_ppl)
+    target_dir_ppl = (
+        script_dir / f"../../../documentation-website/_{DOCS_PARENT_BASE_PATH}"
+    )
+    only_dirs = set(args.only_dirs.split(",")) if args.only_dirs else None
+    export_docs(source_dir_ppl, target_dir_ppl, args.yes, only_dirs)