diff --git a/docs/features/index.md b/docs/features/index.md
index 331436ea..8e7e0479 100644
--- a/docs/features/index.md
+++ b/docs/features/index.md
@@ -355,6 +355,7 @@
 - [PPL Rename Command](sql/ppl-rename-command.md)
 - [PPL Rex and Regex Commands](sql/ppl-rex-and-regex-commands.md)
 - [PPL Spath Command](sql/ppl-spath-command.md)
+- [PPL Timechart Command](sql/ppl-timechart-command.md)
 - [Security Lake Data Source](sql/security-lake-data-source.md)
 - [SQL Error Handling](sql/sql-error-handling.md)
 - [SQL Pagination](sql/sql-pagination.md)
diff --git a/docs/features/sql/ppl-timechart-command.md b/docs/features/sql/ppl-timechart-command.md
new file mode 100644
index 00000000..217ec345
--- /dev/null
+++ b/docs/features/sql/ppl-timechart-command.md
@@ -0,0 +1,174 @@
+# PPL Timechart Command
+
+## Summary
+
+The PPL `timechart` command aggregates data into time buckets to produce time-series results for visualization. It supports configurable span intervals (down to milliseconds), rate-based aggregation functions (`per_second`, `per_minute`, `per_hour`, `per_day`), a custom timestamp field, and grouping by a categorical field. Typical uses include performance monitoring, log analysis, and trend visualization in OpenSearch.
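As a rough illustration of the bucketing and rate normalization summarized above, the following Python sketch floors each event's timestamp to a fixed-length span and applies the calculation listed in the Rate-Based Aggregation Functions table (`sum(field) / span_in_seconds × unit_seconds`). It is a minimal sketch under stated assumptions, not the SQL plugin's implementation: timestamps are integer epoch milliseconds, buckets are aligned to the Unix epoch, only fixed-length spans are handled, and `per_unit`, `bucket_start_ms`, and `UNIT_SECONDS` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical table: seconds in each target unit of the per_* family.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def bucket_start_ms(epoch_ms: int, span_ms: int) -> int:
    """Floor an epoch-millisecond timestamp to the start of its span bucket (epoch-aligned)."""
    return epoch_ms - epoch_ms % span_ms


def per_unit(events, span_ms: int, unit: str) -> dict:
    """Sketch of per_<unit>(field) for a fixed span.

    events: iterable of (epoch_ms, value) pairs.
    Returns {bucket_start_ms: rate}, where
    rate = sum(values in bucket) / span_in_seconds * UNIT_SECONDS[unit].
    """
    sums = defaultdict(float)
    for epoch_ms, value in events:
        sums[bucket_start_ms(epoch_ms, span_ms)] += value
    # Same as total / (span_ms / 1000) * unit_seconds, rearranged to avoid a fractional span.
    return {
        start: total * UNIT_SECONDS[unit] * 1000 / span_ms
        for start, total in sorted(sums.items())
    }


# Two events in the first minute (sum 120) and one in the second minute (sum 30).
events = [(0, 60), (30_000, 60), (60_000, 30)]
print(per_unit(events, span_ms=60_000, unit="second"))  # {0: 2.0, 60000: 0.5}
```

With `unit="minute"` and a 1-minute span, the computed rate equals the bucket sum, which is a quick sanity check for the formula.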
+
+## Details
+
+### Architecture
+
+```mermaid
+graph TB
+    subgraph "PPL Query Processing"
+        A[PPL Query with timechart] --> B[Parser]
+        B --> C[Timechart AST Node]
+        C --> D[Query Rewriter]
+        D --> E{per_* function?}
+        E -->|Yes| F[Transform to sum + eval]
+        E -->|No| G[Standard aggregation]
+        F --> H[Chart Command Handler]
+        G --> H
+        H --> I[OpenSearch Aggregation Query]
+    end
+
+    subgraph "Span Processing"
+        J[Span Definition] --> K{Span Type}
+        K -->|Fixed| L[Static calculation]
+        K -->|Variable| M[Dynamic timestampdiff]
+        L --> N[Time Bucket Creation]
+        M --> N
+    end
+```
+
+### Data Flow
+
+```mermaid
+flowchart LR
+    A[Source Data] --> B[Time Bucketing]
+    B --> C[Aggregation]
+    C --> D{Has per_* function?}
+    D -->|Yes| E[Rate Normalization]
+    D -->|No| F[Direct Output]
+    E --> G[Result Set]
+    F --> G
+```
+
+### Components
+
+| Component | Description |
+|-----------|-------------|
+| `Timechart` AST | Represents the `timechart` command in the abstract syntax tree |
+| `Chart` Command Handler | Unified handler for both the `chart` and `timechart` commands |
+| `SpanUnit` | Enum of time span units (millisecond, second, minute, hour, day, week, month, quarter, year) |
+| `IntervalUnit` | Enum of interval units used in interval calculations; now includes millisecond |
+| `PlanUtils` | Utility that converts span units to interval units |
+
+### Configuration
+
+| Setting | Description | Default |
+|---------|-------------|---------|
+| `timefield` | Timestamp field used for time bucketing | `@timestamp` |
+| `span` | Time interval for bucketing (e.g., `1m`, `5m`, `1h`, `500ms`) | None (required) |
+
+### Syntax
+
+```
+timechart [timefield=<field>] span=<interval> <aggregation>... [by <field>]
+```
+
+#### Parameters
+
+| Parameter | Required | Description |
+|-----------|----------|-------------|
+| `timefield` | No | Custom timestamp field name (default: `@timestamp`) |
+| `span` | Yes | Time bucket interval |
+| `aggregation` | Yes | One or more aggregation functions |
+| `by` | No | Field to split results by |
+
+#### Supported Span Units
+
+| Unit | Abbreviation | Example |
+|------|--------------|---------|
+| Millisecond | `ms` | `span=500ms` |
+| Second | `s` | `span=30s` |
+| Minute | `m` | `span=5m` |
+| Hour | `h` | `span=1h` |
+| Day | `d` | `span=1d` |
+| Week | `w` | `span=1w` |
+| Month | `mon` | `span=1mon` |
+| Quarter | `q` | `span=1q` |
+| Year | `y` | `span=1y` |
+
+### Rate-Based Aggregation Functions
+
+| Function | Description | Calculation |
+|----------|-------------|-------------|
+| `per_second(field)` | Rate per second within each bucket | `sum(field) / span_in_seconds` |
+| `per_minute(field)` | Rate per minute within each bucket | `sum(field) / span_in_seconds × 60` |
+| `per_hour(field)` | Rate per hour within each bucket | `sum(field) / span_in_seconds × 3600` |
+| `per_day(field)` | Rate per day within each bucket | `sum(field) / span_in_seconds × 86400` |
+
+### Usage Examples
+
+#### Basic time-series aggregation
+
+```
+source=web_logs
+| timechart span=1h count() as requests
+```
+
+#### Rate calculation with per_second
+
+```
+source=network_logs
+| timechart span=5m per_second(bytes) as bytes_per_second
+```
+
+#### Multiple aggregations
+
+```
+source=metrics
+| timechart span=1m avg(cpu_usage), max(memory_usage), per_second(requests)
+```
+
+#### Custom timestamp field
+
+```
+source=custom_events
+| timechart timefield=event_timestamp span=1h count() by event_type
+```
+
+#### Millisecond precision for high-frequency data
+
+```
+source=trading_data
+| timechart span=100ms per_second(transactions)
+```
+
+#### Grouping by category
+
+```
+source=application_logs
+| timechart span=15m count() by log_level
+```
+
+## Limitations
+
+- `per_*` functions are supported only in the `timechart` command
+- Variable-length spans (month/quarter/year) compute each bucket's length dynamically, which may add slight performance overhead
+- The `timechart` command requires a timestamp field in the source data (`@timestamp` by default, or the field named by `timefield`)
+
+## Related PRs
+
+| Version | PR | Description |
+|---------|-----|-------------|
+| v3.4.0 | [#4464](https://github.com/opensearch-project/sql/pull/4464) | Add `per_second` function support |
+| v3.4.0 | [#4531](https://github.com/opensearch-project/sql/pull/4531) | Add `per_minute`, `per_hour`, `per_day` functions |
+| v3.4.0 | [#4672](https://github.com/opensearch-project/sql/pull/4672) | Support millisecond span |
+| v3.4.0 | [#4755](https://github.com/opensearch-project/sql/pull/4755) | Merge `timechart` and `chart` implementations |
+| v3.4.0 | [#4784](https://github.com/opensearch-project/sql/pull/4784) | Add `timefield` option |
+
+## References
+
+- [Issue #4350](https://github.com/opensearch-project/sql/issues/4350): PPL `per_*` aggregation function support
+- [Issue #4550](https://github.com/opensearch-project/sql/issues/4550): Millisecond span incorrectly converted to microseconds
+- [Issue #4576](https://github.com/opensearch-project/sql/issues/4576): Custom timestamp field feature request
+- [Issue #4581](https://github.com/opensearch-project/sql/issues/4581): Timechart bug fixes
+- [Issue #4582](https://github.com/opensearch-project/sql/issues/4582): Timechart bug fixes
+- [Issue #4632](https://github.com/opensearch-project/sql/issues/4632): Timechart bug fixes
+- [PPL Commands Documentation](https://docs.opensearch.org/3.0/search-plugins/sql/ppl/functions/)
+
+## Change History
+
+- **v3.4.0** (2026-01): Added `per_second`, `per_minute`, `per_hour`, `per_day` functions; millisecond span support; `timefield` option; merged `timechart` and `chart` implementations
diff --git a/docs/releases/v3.4.0/features/sql/ppl-timechart-functions.md b/docs/releases/v3.4.0/features/sql/ppl-timechart-functions.md
new file mode 100644
index 00000000..26b8ddc5
--- /dev/null
+++ b/docs/releases/v3.4.0/features/sql/ppl-timechart-functions.md
@@ -0,0 +1,162 @@
+# PPL Timechart Functions
+
+## Summary
+
+OpenSearch v3.4.0 enhances the PPL `timechart` command with new rate-based aggregation functions (`per_second`, `per_minute`, `per_hour`, `per_day`), millisecond span support, a `timefield` option for specifying a custom timestamp field, and a consolidated internal architecture that merges the `timechart` and `chart` implementations.
+
+## Details
+
+### What's New in v3.4.0
+
+#### Rate-Based Aggregation Functions
+
+The `per_*` functions calculate rate-based metrics by normalizing the aggregated value of each bucket to a specific time unit:
+
+| Function | Description | Calculation |
+|----------|-------------|-------------|
+| `per_second(field)` | Rate per second within each bucket | `sum(field) / span_in_seconds` |
+| `per_minute(field)` | Rate per minute within each bucket | `sum(field) / span_in_seconds × 60` |
+| `per_hour(field)` | Rate per hour within each bucket | `sum(field) / span_in_seconds × 3600` |
+| `per_day(field)` | Rate per day within each bucket | `sum(field) / span_in_seconds × 86400` |
+
+#### Implementation Approach
+
+The `per_*` functions use an **Eval Transformation** approach: at compile time, each `per_*` call is rewritten into a `sum` aggregation followed by an `eval` that converts the bucket sum into a rate:
+
+```
+-- Original query
+source=events | timechart span=5m per_second(packets)
+
+-- Rewritten internally to (5m = 300 seconds)
+source=events | timechart span=5m sum(packets) as `per_second(packets)`
+  | eval `per_second(packets)` = `per_second(packets)` / 300
+```
+
+For variable-length spans (month/quarter/year), the bucket length differs from bucket to bucket, so the implementation computes it dynamically for each bucket:
+
+```
+-- For span=2mon, uses timestampdiff for accurate calculation
+| eval `per_second(packets)` = `per_second(packets)` /
+    timestampdiff(SECOND, @timestamp, timestampadd(MONTH, 2, @timestamp))
+```
+
+#### Custom Timestamp Field (`timefield`)
+
+Users can now specify a custom timestamp field instead of the implicit `@timestamp`:
+
+```
+source=events | timechart timefield=start_at span=1hour count() by category
+```
+
+Previously, indexes whose timestamp field was not named `@timestamp` required workarounds such as renaming the field.
+
+#### Millisecond Span Support
+
+Fixes a bug where millisecond spans were incorrectly converted to microseconds. The `IntervalUnit` enum now includes `MILLISECOND`, so sub-second spans produce correctly sized time buckets:
+
+```
+source=logs | timechart span=500ms count()
+```
+
+#### Architecture Improvements
+
+The `timechart` and `chart` command implementations have been merged. Because `timechart` is semantically a subset of `chart` (with the row split fixed to the timestamp field), this consolidation:
+
+- Reduces code duplication
+- Fixes several existing bugs in `timechart`
+- Simplifies maintenance
+
+### Technical Changes
+
+```mermaid
+graph TB
+    subgraph "Query Processing"
+        A[PPL Query] --> B[Parser]
+        B --> C[Timechart AST]
+        C --> D{Has per_* function?}
+        D -->|Yes| E[Rewrite to sum + eval]
+        D -->|No| F[Standard aggregation]
+        E --> G[Execute Query]
+        F --> G
+    end
+
+    subgraph "Span Calculation"
+        H[Fixed Span] --> I[Static seconds calculation]
+        J[Variable Span] --> K[timestampdiff dynamic calculation]
+    end
+```
+
+#### New Configuration
+
+| Setting | Description | Default |
+|---------|-------------|---------|
+| `timefield` | Custom timestamp field name | `@timestamp` |
+
+### Usage Examples
+
+#### Basic per_second calculation
+
+```
+source=network_logs
+| timechart span=1m per_second(packets)
+```
+
+Result (a value of 2.0 in a 1-minute bucket means the `packets` values in that bucket sum to 120, since 120 / 60 = 2.0):
+```
+| @timestamp          | per_second(packets) |
+|---------------------|---------------------|
+| 2025-09-08 10:00:00 | 2.0                 |
+```
+
+#### Multiple rate functions
+
+```
+source=network_logs
+| timechart span=1m per_second(packets), per_minute(packets), per_hour(packets)
+```
+
+#### Custom timestamp field
+
+```
+source=ocsf_events
+| timechart timefield=event_time span=1h count() by category
+```
+
+#### Millisecond precision
+
+```
+source=high_frequency_logs
+| timechart span=100ms per_second(requests)
+```
+
+### Migration Notes
+
+- No breaking changes; existing `timechart` queries continue to work
+- Users with custom timestamp fields can now use `timefield` instead of the `rename` workaround
+- Millisecond spans now work correctly without manual adjustments
+
+## Limitations
+
+- `per_*` functions are supported only in the `timechart` command, because they depend on its implicit timestamp field
+- Variable-length spans (month/quarter/year) require a dynamic per-bucket calculation, which may add slight performance overhead
+
+## Related PRs
+
+| PR | Description |
+|----|-------------|
+| [#4464](https://github.com/opensearch-project/sql/pull/4464) | Add `per_second` function support for `timechart` command |
+| [#4531](https://github.com/opensearch-project/sql/pull/4531) | Add `per_minute`, `per_hour`, `per_day` function support |
+| [#4672](https://github.com/opensearch-project/sql/pull/4672) | Support millisecond span |
+| [#4755](https://github.com/opensearch-project/sql/pull/4755) | Merge the implementation of `timechart` and `chart` |
+| [#4784](https://github.com/opensearch-project/sql/pull/4784) | Specify timestamp field with `timefield` in timechart command |
+
+## References
+
+- [Issue #4350](https://github.com/opensearch-project/sql/issues/4350): PPL `per_*` aggregation function support
+- [Issue #4550](https://github.com/opensearch-project/sql/issues/4550): Span millisecond incorrectly converted to microsecond
+- [Issue #4576](https://github.com/opensearch-project/sql/issues/4576): timechart with option to specify timestamp column
+- [PPL Commands Documentation](https://docs.opensearch.org/3.0/search-plugins/sql/ppl/functions/)
+
+## Related Feature Report
+
+- [Full feature documentation](../../../features/sql/ppl-timechart-command.md)
diff --git a/docs/releases/v3.4.0/index.md b/docs/releases/v3.4.0/index.md
index bba3660e..f1113449 100644
--- a/docs/releases/v3.4.0/index.md
+++ b/docs/releases/v3.4.0/index.md
@@ -134,6 +134,7 @@
 
 ### SQL
 
+- [PPL Timechart Functions](features/sql/ppl-timechart-functions.md) - Rate-based aggregation functions (per_second, per_minute, per_hour, per_day), millisecond span support, custom timefield option, merged timechart/chart implementation
 - [PPL Query Optimization](features/sql/ppl-query-optimization.md) - 33 enhancements including sort pushdown, aggregation optimization, distinct count approx, case-to-range queries, fillnull command, YAML explain format
 - [SQL/PPL Bugfixes](features/sql/sql-ppl-bugfixes.md) - 48 bug fixes including memory exhaustion fix, race condition fix, rex nested capture groups, filter pushdown improvements, and CVE-2025-48924
 - [SQL CI/Tests](features/sql/sql-ci-tests.md) - CI/CD improvements including Gradle 9.2.0, JDK 25, BWC test splitting, query timeouts, and maven snapshots publishing
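The v3.4.0 release notes above describe two ways of normalizing `per_*` sums. Fixed spans use a static span length. Month, quarter, and year spans instead compute each bucket's length dynamically through the `timestampdiff(SECOND, @timestamp, timestampadd(MONTH, n, @timestamp))` rewrite. The Python sketch below illustrates that split. It is a minimal sketch, not the plugin's code, and it rests on several assumptions: datetimes are naive UTC, only the abbreviations from the Supported Span Units table are accepted, and calendar month addition clamps the day of month (for example, January 31 plus one month gives the last day of February). `span_seconds`, `add_months`, and `per_second` are hypothetical helpers.

```python
import calendar
import re
from datetime import datetime

# Fixed-length units in milliseconds (abbreviations from the span unit table).
FIXED_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000,
            "d": 86_400_000, "w": 604_800_000}
# Calendar units expressed as a number of months; their length varies per bucket.
CALENDAR_MONTHS = {"mon": 1, "q": 3, "y": 12}

# "ms" and "mon" are listed before "m" so the longer tokens are tried first.
SPAN_RE = re.compile(r"^(\d+)(ms|mon|s|m|h|d|w|q|y)$")


def add_months(t: datetime, months: int) -> datetime:
    """Add calendar months, clamping the day to the target month's length (assumption)."""
    index = t.month - 1 + months
    year, month = t.year + index // 12, index % 12 + 1
    day = min(t.day, calendar.monthrange(year, month)[1])
    return t.replace(year=year, month=month, day=day)


def span_seconds(span: str, bucket_start: datetime) -> float:
    """Length in seconds of the bucket that starts at bucket_start."""
    match = SPAN_RE.match(span)
    if match is None:
        raise ValueError(f"unsupported span: {span!r}")
    n, unit = int(match.group(1)), match.group(2)
    if unit in FIXED_MS:
        # Static calculation: every bucket has the same length.
        return n * FIXED_MS[unit] / 1000
    # Dynamic calculation, analogous to
    # timestampdiff(SECOND, t, timestampadd(MONTH, months, t)).
    end = add_months(bucket_start, n * CALENDAR_MONTHS[unit])
    return (end - bucket_start).total_seconds()


def per_second(bucket_sum: float, span: str, bucket_start: datetime) -> float:
    """Rate per second for one bucket, as in sum(field) / span_in_seconds."""
    return bucket_sum / span_seconds(span, bucket_start)


print(span_seconds("500ms", datetime(2025, 1, 1)))  # 0.5
print(span_seconds("2mon", datetime(2025, 1, 1)))   # 5097600.0 (59 days)
```

For `span=2mon` starting on 2025-01-01, the divisor covers January plus February 2025 (59 days). The same bucket sum therefore yields a different `per_second` value than a bucket of a different calendar length would.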