5 changes: 5 additions & 0 deletions .gitignore
@@ -9,3 +9,8 @@ Gemfile.lock
.jekyll-cache
.project
vendor/bundle
node_modules
.vscode
.ruby-version
cdk*
.dev*
6 changes: 3 additions & 3 deletions _dashboards/management/S3-data-source.md
@@ -16,9 +16,9 @@ You can connect OpenSearch to your Amazon Simple Storage Service (Amazon S3) dat

Before connecting a data source, verify that the following requirements are met:

- You have access to Amazon S3 and the [AWS Glue Data Catalog](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/connectors/s3glue_connector.rst#id2).
- You have access to Amazon S3 and the [AWS Glue Data Catalog](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/connectors/s3glue_connector.md#id2).
- You have access to OpenSearch and OpenSearch Dashboards.
- You have an understanding of OpenSearch data source and connector concepts. See the [developer documentation](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/datasources.rst#introduction) for more information.
- You have an understanding of OpenSearch data source and connector concepts. See the [developer documentation](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/datasources.md#introduction) for more information.

## Connect your data source

@@ -46,5 +46,5 @@ This feature is currently under development, including the data integration func

- Learn about [querying your data in Data Explorer]({{site.url}}{{site.baseurl}}/dashboards/management/query-data-source/) through OpenSearch Dashboards.
- Learn about [optimizing the query performance of your external data sources]({{site.url}}{{site.baseurl}}/dashboards/management/accelerate-external-data/), such as Amazon S3, through Query Workbench.
- Learn about [Amazon S3 and AWS Glue Data Catalog](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/connectors/s3glue_connector.rst) and the APIS used with Amazon S3 data sources, including configuration settings and query examples.
- Learn about [Amazon S3 and AWS Glue Data Catalog](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/connectors/s3glue_connector.md) and the APIs used with Amazon S3 data sources, including configuration settings and query examples.
- Learn about [managing your indexes]({{site.url}}{{site.baseurl}}/dashboards/im-dashboards/index/) through OpenSearch Dashboards.
157 changes: 157 additions & 0 deletions _sql-and-ppl/ppl/commands/ad.md
@@ -0,0 +1,157 @@
---
layout: default
title: ad
parent: Commands
grand_parent: PPL
nav_order: 1
---

# ad (Deprecated)

The `ad` command is deprecated in favor of the [`ml` command]({{site.url}}{{site.baseurl}}/sql-and-ppl/ppl/commands/ml/).
{: .warning}

The `ad` command applies the Random Cut Forest (RCF) algorithm in the ML Commons plugin to the search results returned by a PPL command. The command provides two anomaly detection approaches:

- [Anomaly detection for time-series data](#anomaly-detection-for-time-series-data) using the fixed-in-time RCF algorithm
- [Anomaly detection for non-time-series data](#anomaly-detection-for-non-time-series-data) using the batch RCF algorithm

To use the `ad` command, `plugins.calcite.enabled` must be set to `false`.
{: .note}
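
For example, you can disable the setting through the cluster settings API. The following request is a minimal sketch that assumes the setting is dynamic and uses the standard `_cluster/settings` endpoint; use `persistent` instead of `transient` if the change should survive a restart:

```json
PUT _cluster/settings
{
  "transient": {
    "plugins.calcite.enabled": false
  }
}
```
{% include copy.html %}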

## Syntax

The `ad` command has two syntax variants, depending on the algorithm type.

### Anomaly detection for time-series data

Use this syntax to detect anomalies in time-series data. This method uses the fixed-in-time RCF algorithm, which is optimized for sequential data patterns.

The fixed-in-time RCF `ad` command has the following syntax:

```sql
ad [number_of_trees] [shingle_size] [sample_size] [output_after] [time_decay] [anomaly_rate] <time_field> [date_format] [time_zone] [category_field]
```

#### Parameters

The fixed-in-time RCF algorithm supports the following parameters.

| Parameter | Required/Optional | Description |
| --- | --- | --- |
| `time_field` | Required | The time field for RCF to use as time-series data. |
| `number_of_trees` | Optional | The number of trees in the forest. Default is `30`. |
| `shingle_size` | Optional | The number of records in a shingle. A shingle is a consecutive sequence of the most recent records. Default is `8`. |
| `sample_size` | Optional | The sample size used by the stream samplers in this forest. Default is `256`. |
| `output_after` | Optional | The number of points required by the stream samplers before results are returned. Default is `32`. |
| `time_decay` | Optional | The decay factor used by the stream samplers in this forest. Default is `0.0001`. |
| `anomaly_rate` | Optional | The anomaly rate. Default is `0.005`. |
| `date_format` | Optional | The format used for the `time_field` field. Default is `yyyy-MM-dd HH:mm:ss`. |
| `time_zone` | Optional | The time zone for the `time_field` field. Default is `UTC`. |
| `category_field` | Optional | The category field used to group input values. The predict operation is applied to each category independently. |
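
For example, the following query is an illustrative sketch that tunes several optional parameters, assuming they accept the same `key=value` assignment used for `time_field` in the examples that follow; the values themselves are arbitrary:

```sql
source=nyc_taxi
| fields value, timestamp
| AD number_of_trees=50 sample_size=512 time_decay=0.001 time_field='timestamp'
```
{% include copy.html %}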


### Anomaly detection for non-time-series data

Use this syntax to detect anomalies in data where the order doesn't matter. This method uses the batch RCF algorithm, which is optimized for independent data points.

The batch RCF `ad` command has the following syntax:

```sql
ad [number_of_trees] [sample_size] [output_after] [training_data_size] [anomaly_score_threshold] [category_field]
```

#### Parameters

The batch RCF algorithm supports the following parameters.

| Parameter | Required/Optional | Description |
| --- | --- | --- |
| `number_of_trees` | Optional | The number of trees in the forest. Default is `30`. |
| `sample_size` | Optional | The number of random samples provided to each tree from the training dataset. Default is `256`. |
| `output_after` | Optional | The number of points required by the stream samplers before results are returned. Default is `32`. |
| `training_data_size` | Optional | The size of the training dataset. Default is the full dataset size. |
| `anomaly_score_threshold` | Optional | The anomaly score threshold. Default is `1.0`. |
| `category_field` | Optional | The category field used to group input values. The predict operation is applied to each category independently. |
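
For example, the following query is an illustrative sketch that tunes the batch RCF parameters, again assuming the same `key=value` assignment used elsewhere on this page; the values themselves are arbitrary:

```sql
source=nyc_taxi
| fields value
| AD number_of_trees=50 training_data_size=1000 anomaly_score_threshold=0.8
```
{% include copy.html %}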


## Example 1: Detecting anomalies in New York City taxi ridership time-series data

The following examples use the `nyc_taxi` dataset, which contains New York City taxi ridership data with fields including `value` (the number of rides), `timestamp` (the time of measurement), and `category` (time period classifications such as `day` and `night`).

This example trains an RCF model and uses it to detect anomalies in time-series ridership data:

```sql
source=nyc_taxi
| fields value, timestamp
| AD time_field='timestamp'
| where value=10844.0
```
{% include copy.html %}

The query returns the following results:

| value | timestamp | score | anomaly_grade |
| --- | --- | --- | --- |
| 10844.0 | 2014-07-01 00:00:00 | 0.0 | 0.0 |


## Example 2: Detecting anomalies in New York City taxi ridership time-series data by category

This example trains an RCF model and uses it to detect anomalies in time-series ridership data across multiple category values:

```sql
source=nyc_taxi
| fields category, value, timestamp
| AD time_field='timestamp' category_field='category'
| where value=10844.0 or value=6526.0
```
{% include copy.html %}

The query returns the following results:

| category | value | timestamp | score | anomaly_grade |
| --- | --- | --- | --- | --- |
| night | 10844.0 | 2014-07-01 00:00:00 | 0.0 | 0.0 |
| day | 6526.0 | 2014-07-01 06:00:00 | 0.0 | 0.0 |


## Example 3: Detecting anomalies in New York City taxi ridership non-time-series data

This example trains an RCF model and uses it to detect anomalies in non-time-series ridership data:

```sql
source=nyc_taxi
| fields value
| AD
| where value=10844.0
```
{% include copy.html %}

The query returns the following results:

| value | score | anomalous |
| --- | --- | --- |
| 10844.0 | 0.0 | False |


## Example 4: Detecting anomalies in New York City taxi ridership non-time-series data by category

This example trains an RCF model and uses it to detect anomalies in non-time-series ridership data across multiple category values:

```sql
source=nyc_taxi
| fields category, value
| AD category_field='category'
| where value=10844.0 or value=6526.0
```
{% include copy.html %}

The query returns the following results:

| category | value | score | anomalous |
| --- | --- | --- | --- |
| night | 10844.0 | 0.0 | False |
| day | 6526.0 | 0.0 | False |


94 changes: 94 additions & 0 deletions _sql-and-ppl/ppl/commands/addcoltotals.md
@@ -0,0 +1,94 @@
---
layout: default
title: addcoltotals
parent: Commands
grand_parent: PPL
nav_order: 2
---

# addcoltotals

The `addcoltotals` command computes the sum of each column and adds a summary row showing the total for each column. This command is equivalent to using `addtotals` with `row=false` and `col=true`, making it useful for creating summary reports with column totals.

The command processes only numeric fields (integers, floats, and doubles). Non-numeric fields are ignored, even if they are explicitly specified in the field list.


## Syntax

The `addcoltotals` command has the following syntax:

```sql
addcoltotals [field-list] [label=<string>] [labelfield=<field>]
```
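
As noted above, `addcoltotals` is equivalent to `addtotals` with `row=false` and `col=true`. As a sketch that assumes the `addtotals` command accepts these options alongside a field list, the following two queries should produce the same summary row:

```sql
source=accounts
| fields firstname, balance
| addcoltotals balance
```
{% include copy.html %}

```sql
source=accounts
| fields firstname, balance
| addtotals row=false col=true balance
```
{% include copy.html %}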

## Parameters

The `addcoltotals` command supports the following parameters.

| Parameter | Required/Optional | Description |
| --- | --- | --- |
| `<field-list>` | Optional | A comma-separated list of numeric fields to total. By default, all numeric fields are totaled. |
| `labelfield` | Optional | The field in which to place the label. If the field does not exist, it is created, and the label appears in the summary row (last row) of the new field. |
| `label` | Optional | The text that appears in the summary row (last row) to identify the computed totals. When used with `labelfield`, this text is placed in the specified field in the summary row. Default is `Total`. |

## Example 1: Placing the label in an existing field

The following query places the label in an existing field:

```sql
source=accounts
| fields firstname, balance
| head 3
| addcoltotals labelfield='firstname'
```
{% include copy.html %}

The query returns the following results:

| firstname | balance |
| --- | --- |
| Amber | 39225 |
| Hattie | 5686 |
| Nanette | 32838 |
| Total | 77749 |

## Example 2: Adding column totals with a custom summary label

The following query adds column totals after a `stats` command, labeling the final summary row `Sum`. It also creates the new field specified by `labelfield` because that field does not exist in the data:

```sql
source=accounts
| stats count() by gender
| addcoltotals `count()` label='Sum' labelfield='Total'
```
{% include copy.html %}

The query returns the following results:

| count() | gender | Total |
| --- | --- | --- |
| 1 | F | null |
| 3 | M | null |
| 4 | null | Sum |

## Example 3: Using all options

The following query uses the `addcoltotals` command with all options set:

```sql
source=accounts
| where age > 30
| stats avg(balance) as avg_balance, count() as count by state
| head 3
| addcoltotals avg_balance, count label='Sum' labelfield='Column Total'
```
{% include copy.html %}

The query returns the following results:

| avg_balance | count | state | Column Total |
| --- | --- | --- | --- |
| 39225.0 | 1 | IL | null |
| 4180.0 | 1 | MD | null |
| 5686.0 | 1 | TN | null |
| 49091.0 | 3 | null | Sum |