Commit a2f32fc

atovpeko, fabriziomello, and billy-the-fish authored
Clarify CAGG docs for real-time aggregation (#4041)
Co-authored-by: Fabrízio de Royes Mello <[email protected]> Co-authored-by: Iain Cox <[email protected]>
1 parent 3a3b093 commit a2f32fc

10 files changed: 43 additions, 48 deletions

_partials/_caggs-intro.md

+5-4
@@ -1,3 +1,5 @@
+import RealTimeAgg from 'versionContent/_partials/_real-time-aggregates.mdx';
+
 In modern applications, data usually grows very quickly. This means that aggregating
 it into useful summaries can become very slow. $CLOUD_LONG continuous aggregates make
 aggregating data lightning fast, accurate, and easy.
@@ -24,13 +26,12 @@ Because continuous aggregates are based on hypertables, you can query them in
 exactly the same way as your other tables, and enable [compression][compression]
 or [tiered storage][data-tiering] on them. You can even
 create
-[continuous aggregates on top of your continuous aggregates][hierarchical-caggs] - for an even more fine-tuned aggregation.
+[continuous aggregates on top of your continuous aggregates][hierarchical-caggs]for an even more fine-tuned aggregation.
 
-By default, querying continuous aggregates provides you with real-time data.
-Pre-aggregated data from the materialized view is combined with recent data that
-hasn't been aggregated yet. This gives you up-to-date results on every query.
+[Real-time aggregation][real-time-aggregation] enables you to combine pre-aggregated data from the materialized view with the most recent raw data. This gives you up-to-date results on every query. <RealTimeAgg />
 
 [compression]: /use-timescale/:currentVersion:/compression/about-compression
 [data-tiering]: /use-timescale/:currentVersion:/data-tiering/
 [hypercore]: /use-timescale/:currentVersion:/hypercore/
 [hierarchical-caggs]: /use-timescale/:currentVersion:/continuous-aggregates/hierarchical-continuous-aggregates/
+[real-time-aggregation]: /use-timescale/:currentVersion:/continuous-aggregates/real-time-aggregates/

_partials/_caggs-types.md

+3-3
@@ -1,5 +1,5 @@
 There are three main ways to make aggregation easier: materialized views,
-continuous aggregates, and real time aggregates.
+continuous aggregates, and real-time aggregates.
 
 [Materialized views][pg-materialized views] are a standard PostgreSQL function.
 They are used to cache the result of a complex query so that you can reuse it
@@ -14,9 +14,9 @@ intensive to maintain than materialized views. Continuous aggregates are based
 on hypertables, and you can query them in the same way as you do your other
 tables.
 
-[Real time aggregates][real-time-aggs] are a Timescale only feature. They are
+[Real-time aggregates][real-time-aggs] are a Timescale only feature. They are
 the same as continuous aggregates, but they add the most recent raw data to the
-previously aggregated data to provide accurate and up to date results, without
+previously aggregated data to provide accurate and up-to-date results, without
 needing to aggregate data as it is being written.
 
 [pg-materialized views]: https://www.postgresql.org/docs/current/rules-materializedviews.html

_partials/_real-time-aggregates.md

+1
@@ -0,0 +1 @@
+In $TIMESCALE_DB v2.13 and later, real-time aggregates are **DISABLED** by default. In earlier versions, real-time aggregates are **ENABLED** by default; when you create a continuous aggregate, queries to that view include the results from the most recent raw data.

api/continuous-aggregates/add_continuous_aggregate_policy.md

+4-5
@@ -36,11 +36,10 @@ depending on the type of the time column of the hypertable:
 `INTEGER` type.
 
 <Highlight type="important">
-While setting `end_offset` to `NULL` is possible, it is not recommended. By
-default, querying a continuous aggregate returns data between `end_offset` and
-the current time. There is no need to set `end_offset` to `NULL`. To learn more
-about how continuous aggregates use real-time aggregation, see the
-[real-time aggregation section](/use-timescale/:currentVersion:/continuous-aggregates/real-time-aggregates/).
+
+While setting `end_offset` to `NULL` is possible, it is not recommended. To include the data between `end_offset` and
+the current time in queries, enable [real-time aggregation](/use-timescale/:currentVersion:/continuous-aggregates/real-time-aggregates/).
+
 </Highlight>
 
 ## Optional arguments

api/continuous-aggregates/create_materialized_view.md

+1-1
@@ -52,7 +52,7 @@ For services running TimescaleDB v2.17.1 and greater, to dramatically decrease t
 of data written on a continuous aggregate in the presence of a small number of changes,
 reduce the i/o cost of refreshing a continuous aggregate, and generate fewer Write-Ahead
 Logs (WAL), set the`timescaledb.enable_merge_on_cagg_refresh`
-[configuration parameter][modify-parameters] to `TRUE`. This enables continuous aggregate
+configuration parameter to `TRUE`. This enables continuous aggregate
 refresh to use merge instead of deleting old materialized data and re-inserting.
 
 For more settings for continuous aggregates, see [timescaledb_information.continuous_aggregates][info-views].

use-timescale/continuous-aggregates/about-continuous-aggregates.md

+7-7
@@ -20,7 +20,7 @@ import CaggsTypes from "versionContent/_partials/_caggs-types.mdx";
 ## Continuous aggregates on continuous aggregates
 
 You can create a continuous aggregate on top of another continuous aggregate.
-This allows you to summarize data at different granularities. For example, you
+This allows you to summarize data at different granularity. For example, you
 might have a raw hypertable that contains second-by-second data. Create a
 continuous aggregate on the hypertable to calculate hourly data. To calculate
 daily data, create a continuous aggregate on top of your hourly continuous
@@ -31,7 +31,7 @@ For more information, see the documentation about
 
 ## Continuous aggregates with a `JOIN` clause
 
-Continuous aggregates supports the following JOIN features:
+Continuous aggregates support the following JOIN features:
 
 | Feature | TimescaleDB < 2.10.x | TimescaleDB <= 2.15.x | TimescaleDB >= 2.16.x|
 |-|-|-|-|
@@ -44,12 +44,12 @@ Continuous aggregates supports the following JOIN features:
 |Any join conditions|&#10060;|&#10060;|&#9989;|
 
 
-JOINS in TimescaleDB must that meet the following conditions:
+JOINS in TimescaleDB must meet the following conditions:
 
-* Only changes to the hypertable are tracked, they are updated in the
+* Only the changes to the hypertable are tracked, and they are updated in the
 continuous aggregate when it is refreshed. Changes to standard
 PostgreSQL table are not tracked.
-* You can use an `INNER`, `LEFT` and `LATERAL` joins, no other join type is supported.
+* You can use an `INNER`, `LEFT`, and `LATERAL` joins; no other join type is supported.
 * Joins on the materialized hypertable of a continuous aggregate are not supported.
 * Hierarchical continuous aggregates can be created on top of a continuous
 aggregate with a `JOIN` clause, but cannot themselves have a `JOIN` clauses.
@@ -79,7 +79,7 @@ CREATE TABLE conditions (
 SELECT create_hypertable('conditions', by_range('time'));
 ```
 
-See the following `JOIN` examples on Continuous Aggregates:
+See the following `JOIN` examples on continuous aggregates:
 
 - `INNER JOIN` on a single equality condition, using the `ON` clause:
 
@@ -186,7 +186,7 @@ and `AVG`, and non-parallelizable aggregates, such as `RANK`.
 In TimescaleDB&nbsp;2.10.0 and later, the `FROM` clause supports `JOINS`, with
 some restrictions. For more information, see the [`JOIN` support section][caggs-joins].
 
-In older versions of Timescale, continuous aggregates only support
+In older versions of TimescaleDB, continuous aggregates only support
 [aggregate functions that can be parallelized by PostgreSQL][postgres-parallel-agg].
 You can work around this by aggregating the other parts of your query in the
 continuous aggregate, then

use-timescale/continuous-aggregates/hierarchical-continuous-aggregates.md

+3-2
@@ -4,6 +4,8 @@ excerpt: Running advanced real-time analytic workloads? Create continuous aggreg
 keywords: [continuous aggregates, hierarchical, create]
 ---
 
+import RealTimeAgg from 'versionContent/_partials/_real-time-aggregates.mdx';
+
 # Hierarchical continuous aggregates
 
 The more data you have, the more likely you are to run a more sophisticated analysis on it. When a simple one-level aggregation is not enough, $CLOUD_LONG lets you create continuous aggregates on top of other continuous aggregates. This way, you summarize data at different levels of granularity, while still saving resources with precomputing.
@@ -28,8 +30,7 @@ For more information, see the instructions for
 
 ## Use real-time aggregation with hierarchical continuous aggregates
 
-In TimescaleDB v2.13 and later, real-time aggregates are *DISABLED* by default.
-In TimescaleDB v1.7 to v2.12, real-time aggregates are *ENABLED* by default.
+<RealTimeAgg />
 
 Real-time aggregates always return up-to-date data in response to queries. They accomplish this by
 joining the materialized data in the continuous aggregate with unmaterialized

use-timescale/continuous-aggregates/real-time-aggregates.md

+8-9
@@ -6,6 +6,7 @@ keywords: [continuous aggregates, real-time aggregates]
 ---
 
 import CaggsRealTimeHistoricalDataRefreshes from 'versionContent/_partials/_caggs-real-time-historical-data-refreshes.mdx';
+import RealTimeAgg from 'versionContent/_partials/_real-time-aggregates.mdx';
 
 # Real-time aggregates
 
@@ -14,14 +15,12 @@ Rapidly growing data means you need more control over what to aggregate and how
 By default, continuous aggregates do not include the most recent data chunk from the
 underlying hypertable. Real-time aggregates, however, use the aggregated data **and** add the
 most recent raw data to it. This provides accurate and up-to-date results, without
-needing to aggregate data as it is being written.
+needing to aggregate data as it is being written.
 
-In Timescale&nbsp;2.13 and later real-time aggregates are *DISABLED* by default. In Timescale versions 1.7 to 2.12, real-time aggregates are enabled by default; when you create a continuous
-aggregate view, queries to that view include the most recent data, even if
-it has not yet been aggregated.
+<RealTimeAgg />
 
 For more detail on the comparison between continuous and real-time aggregates,
-see our [real time aggregate blog post][blog-rtaggs].
+see our [real-time aggregate blog post][blog-rtaggs].
 
 ## Use real-time aggregates
 
@@ -30,16 +29,16 @@ You can enable and disable real-time aggregation by setting the
 
 <Procedure>
 
-1. For an existing table, at the `psql` prompt, disable real-time aggregation:
+1. Enable real-time aggregation for an existing continuous aggregate:
 
 ```sql
-ALTER MATERIALIZED VIEW table_name set (timescaledb.materialized_only = true);
+ALTER MATERIALIZED VIEW table_name set (timescaledb.materialized_only = false);
 ```
 
-1. Re-enable real-time aggregation:
+1. Disable real-time aggregation:
 
 ```sql
-ALTER MATERIALIZED VIEW table_name set (timescaledb.materialized_only = false);
+ALTER MATERIALIZED VIEW table_name set (timescaledb.materialized_only = true);
 ```
 
 </Procedure>

use-timescale/continuous-aggregates/refresh-policies.md

+10-16
@@ -16,7 +16,7 @@ you can also refresh it manually.
 Continuous aggregates require a policy for automatic refreshing. You can adjust
 this to suit different use cases. For example, you can have the continuous
 aggregate and the hypertable stay in sync, even when data is removed from the
-hypertable, or you could keep source data in the continuous aggregate even after
+hypertable. Alternatively, you could keep source data in the continuous aggregate even after
 it is removed from the hypertable.
 
 You can change the way your continuous aggregate is refreshed by calling
@@ -30,15 +30,15 @@ Among others, `add_continuous_aggregate_policy` takes the following arguments:
 * `schedule_interval`: the refresh interval in minutes or hours. Defaults to
 24 hours.
 
-If you set the `start_offset` or `end_offset` to `NULL`, the range is open-ended
-and extends to the beginning or end of time.
+Note the following:
 
-If you set `end_offset` within the current time bucket, this bucket is excluded. This is done for the following reasons:
+- If you set the `start_offset` or `end_offset` to `NULL`, the range is open-ended and extends to the beginning or end of time.
+- If you set `end_offset` within the current time bucket, this bucket is excluded from materialization. This is done for the following reasons:
 
-- The current bucket is incomplete and can't be refreshed.
-- The current bucket gets lots of writes in the time-stamp order and its aggregate becomes outdated very quickly. Excluding it improves performance.
+- The current bucket is incomplete and can't be refreshed.
+- The current bucket gets a lot of writes in the time-stamp order, and its aggregate becomes outdated very quickly. Excluding it improves performance.
 
-To include the current time bucket, enable [real-time aggregation][future-watermark]. In Timescale&nbsp;2.13 and later, it is disabled by default.
+To include the latest raw data in queries, enable [real-time aggregation][future-watermark].
 
 See the [API reference][api-reference] for the full list of required and optional arguments and use examples.
 
@@ -117,10 +117,10 @@ The `refresh` command takes three arguments:
 * The timestamp of the beginning of the refresh window
 * The timestamp of the end of the refresh window
 
-Only buckets that are wholly within the range specified are refreshed. For
+Only buckets that are wholly within the specified range are refreshed. For
 example, if you specify `2021-05-01', '2021-06-01` the only buckets that are
 refreshed are those up to but not including 2021-06-01. It is possible to
-specify NULL in a manual refresh to get an open-ended range, but we do not
+specify `NULL` in a manual refresh to get an open-ended range, but we do not
 recommend using it, because you could inadvertently materialize a large amount
 of data, slow down your performance, and have unintended consequences on other
 policies like data retention.
@@ -137,13 +137,7 @@ policies like data retention.
 
 </Procedure>
 
-Avoid refreshing time intervals that are likely to have a lot of writes. In
-general, this means you should never refresh the most recent time bucket.
-Because the of constant change in the underlying data, they are unlikely to
-produce accurate aggregates. Additionally, refreshing this data slows down the
-ingest rate of the hypertable due to write amplification. If you want to include
-the latest bucket in your queries,
-use [real-time aggregation][real-time-aggregates] instead.
+Follow the logic used by automated refresh policies and avoid refreshing time buckets that are likely to have a lot of writes. This means that you should generally not refresh the latest incomplete time bucket. To include the latest raw data in your queries, use [real-time aggregation][real-time-aggregates] instead.
 
 [cagg-drop-data]: /use-timescale/:currentVersion:/continuous-aggregates/drop-data
 [future-watermark]: /use-timescale/:currentVersion:/continuous-aggregates/troubleshooting/#continuous-aggregate-watermark-is-in-the-future

use-timescale/page-index/page-index.js

+1-1
@@ -128,7 +128,7 @@ module.exports = [
 excerpt: "Manage materialized hypertables in continuous aggregates",
 },
 {
-title: "Real time aggregates",
+title: "Real-time aggregates",
 href: "real-time-aggregates",
 excerpt: "Manage real time aggregates in continuous aggregates",
 },
