Clarify CAGG docs for real-time aggregation (#4041)

atovpeko · fabriziomello · billy-the-fish · web-flow · commit a2f32fcc10a9 · 2025-04-29T11:55:19.000+03:00
Co-authored-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Co-authored-by: Iain Cox <iain@timescale.com>
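
For reviewers who want the behavior this change documents in concrete terms, here is a minimal Python sketch of the idea, not TimescaleDB's implementation. It assumes a toy continuous aggregate that stores hourly `(sum, count)` pairs, and a watermark that marks where materialized data ends. The names `query_cagg`, `bucket_start`, and `BUCKET` are hypothetical. The `materialized_only` flag mirrors the `timescaledb.materialized_only` option touched in the diff.

```python
from collections import defaultdict
from datetime import datetime, timedelta

BUCKET = timedelta(hours=1)  # hypothetical bucket width for this sketch


def bucket_start(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hourly bucket."""
    return ts.replace(minute=0, second=0, microsecond=0)


def query_cagg(materialized, raw_rows, watermark, materialized_only):
    """Return {bucket_start: average} for a toy continuous aggregate.

    materialized: {bucket_start: (sum, count)} for buckets before the watermark.
    raw_rows: iterable of (timestamp, value) from the underlying table.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for bucket, (total, count) in materialized.items():
        totals[bucket][0] += total
        totals[bucket][1] += count
    if not materialized_only:
        # Real-time part: aggregate raw rows at or after the watermark on the fly.
        for ts, value in raw_rows:
            if ts >= watermark:
                b = bucket_start(ts)
                totals[b][0] += value
                totals[b][1] += 1
    return {b: s / c for b, (s, c) in sorted(totals.items())}


t0 = datetime(2025, 4, 29, 10)
materialized = {t0: (30.0, 2)}   # the 10:00 bucket is already materialized
watermark = t0 + BUCKET          # materialized data ends at 11:00
raw = [
    (t0 + timedelta(minutes=5), 10.0),
    (t0 + timedelta(minutes=35), 20.0),
    (t0 + timedelta(minutes=70), 40.0),  # 11:10, not yet materialized
]

materialized_view = query_cagg(materialized, raw, watermark, materialized_only=True)
real_time_view = query_cagg(materialized, raw, watermark, materialized_only=False)
```

With `materialized_only=True` the 11:00 bucket is missing from the result. With `materialized_only=False` it is computed from raw rows at query time.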
diff --git a/_partials/_caggs-intro.md b/_partials/_caggs-intro.md
@@ -1,3 +1,5 @@
+import RealTimeAgg from 'versionContent/_partials/_real-time-aggregates.mdx';
+
 In modern applications, data usually grows very quickly. This means that aggregating 
 it into useful summaries can become very slow. $CLOUD_LONG continuous aggregates make
 aggregating data lightning fast, accurate, and easy. 
@@ -24,13 +26,12 @@ Because continuous aggregates are based on hypertables, you can query them in
 exactly the same way as your other tables, and enable [compression][compression]
 or [tiered storage][data-tiering] on them. You can even
 create
-[continuous aggregates on top of your continuous aggregates][hierarchical-caggs] - for an even more fine-tuned aggregation. 
+[continuous aggregates on top of your continuous aggregates][hierarchical-caggs]—for an even more fine-tuned aggregation. 
 
-By default, querying continuous aggregates provides you with real-time data.
-Pre-aggregated data from the materialized view is combined with recent data that
-hasn't been aggregated yet. This gives you up-to-date results on every query.
+[Real-time aggregation][real-time-aggregation] enables you to combine pre-aggregated data from the materialized view with the most recent raw data. This gives you up-to-date results on every query. <RealTimeAgg />
 
 [compression]: /use-timescale/:currentVersion:/compression/about-compression
 [data-tiering]: /use-timescale/:currentVersion:/data-tiering/
 [hypercore]: /use-timescale/:currentVersion:/hypercore/
 [hierarchical-caggs]: /use-timescale/:currentVersion:/continuous-aggregates/hierarchical-continuous-aggregates/
+[real-time-aggregation]: /use-timescale/:currentVersion:/continuous-aggregates/real-time-aggregates/
diff --git a/_partials/_caggs-types.md b/_partials/_caggs-types.md
@@ -1,5 +1,5 @@
 There are three main ways to make aggregation easier: materialized views,
-continuous aggregates, and real time aggregates.
+continuous aggregates, and real-time aggregates.
 
 [Materialized views][pg-materialized views] are a standard PostgreSQL function.
 They are used to cache the result of a complex query so that you can reuse it
@@ -14,9 +14,9 @@ intensive to maintain than materialized views. Continuous aggregates are based
 on hypertables, and you can query them in the same way as you do your other
 tables.
 
-[Real time aggregates][real-time-aggs] are a Timescale only feature. They are
+[Real-time aggregates][real-time-aggs] are a Timescale-only feature. They are
 the same as continuous aggregates, but they add the most recent raw data to the
-previously aggregated data to provide accurate and up to date results, without
+previously aggregated data to provide accurate and up-to-date results, without
 needing to aggregate data as it is being written.
 
 [pg-materialized views]: https://www.postgresql.org/docs/current/rules-materializedviews.html
diff --git a/_partials/_real-time-aggregates.md b/_partials/_real-time-aggregates.md
@@ -0,0 +1 @@
+In $TIMESCALE_DB v2.13 and later, real-time aggregates are **DISABLED** by default. In earlier versions, real-time aggregates are **ENABLED** by default; when you create a continuous aggregate, queries to that view include the results from the most recent raw data.
diff --git a/api/continuous-aggregates/add_continuous_aggregate_policy.md b/api/continuous-aggregates/add_continuous_aggregate_policy.md
@@ -36,11 +36,10 @@ depending on the type of the time column of the hypertable:
     `INTEGER` type.
 
 <Highlight type="important">
-While setting `end_offset` to `NULL` is possible, it is not recommended. By
-default, querying a continuous aggregate returns data between `end_offset` and
-the current time. There is no need to set `end_offset` to `NULL`. To learn more
-about how continuous aggregates use real-time aggregation, see the
-[real-time aggregation section](/use-timescale/:currentVersion:/continuous-aggregates/real-time-aggregates/).
+
+While setting `end_offset` to `NULL` is possible, it is not recommended. To include the data between `end_offset` and
+the current time in queries, enable [real-time aggregation](/use-timescale/:currentVersion:/continuous-aggregates/real-time-aggregates/).
+
 </Highlight>
 
 ## Optional arguments
diff --git a/api/continuous-aggregates/create_materialized_view.md b/api/continuous-aggregates/create_materialized_view.md
@@ -52,7 +52,7 @@ For services running TimescaleDB v2.17.1 and greater, to dramatically decrease t
 of data written on a continuous aggregate in the presence of a small number of changes,
 reduce the i/o cost of refreshing a continuous aggregate, and generate fewer Write-Ahead
 Logs (WAL), set the`timescaledb.enable_merge_on_cagg_refresh`
-[configuration parameter][modify-parameters] to `TRUE`. This enables continuous aggregate
+configuration parameter to `TRUE`. This enables continuous aggregate
 refresh to use merge instead of deleting old materialized data and re-inserting.
 
 For more settings for continuous aggregates, see [timescaledb_information.continuous_aggregates][info-views].
diff --git a/use-timescale/continuous-aggregates/about-continuous-aggregates.md b/use-timescale/continuous-aggregates/about-continuous-aggregates.md
@@ -20,7 +20,7 @@ import CaggsTypes from "versionContent/_partials/_caggs-types.mdx";
 ## Continuous aggregates on continuous aggregates
 
 You can create a continuous aggregate on top of another continuous aggregate.
-This allows you to summarize data at different granularities. For example, you
+This allows you to summarize data at different levels of granularity. For example, you
 might have a raw hypertable that contains second-by-second data. Create a
 continuous aggregate on the hypertable to calculate hourly data. To calculate
 daily data, create a continuous aggregate on top of your hourly continuous
@@ -31,7 +31,7 @@ For more information, see the documentation about
 
 ## Continuous aggregates with a `JOIN` clause
 
-Continuous aggregates supports the following JOIN features: 
+Continuous aggregates support the following JOIN features: 
 
 | Feature | TimescaleDB < 2.10.x | TimescaleDB <= 2.15.x | TimescaleDB >= 2.16.x| 
 |-|-|-|-|
@@ -44,12 +44,12 @@ Continuous aggregates supports the following JOIN features:
 |Any join conditions|&#10060;|&#10060;|&#9989;|
 
 
-JOINS in TimescaleDB must that meet the following conditions:
+JOINS in TimescaleDB must meet the following conditions:
 
-*   Only changes to the hypertable are tracked, they are updated in the
+*   Only the changes to the hypertable are tracked, and they are updated in the
     continuous aggregate when it is refreshed. Changes to standard
     PostgreSQL table are not tracked.
-*   You can use an `INNER`, `LEFT` and `LATERAL` joins, no other join type is supported.
+*   You can use `INNER`, `LEFT`, and `LATERAL` joins; no other join type is supported.
 *   Joins on the materialized hypertable of a continuous aggregate are not supported.
 *   Hierarchical continuous aggregates can be created on top of a continuous
     aggregate with a `JOIN` clause, but cannot themselves have a `JOIN` clauses.
@@ -79,7 +79,7 @@ CREATE TABLE conditions (
 SELECT create_hypertable('conditions', by_range('time'));
 ```
 
-See the following `JOIN` examples on Continuous Aggregates:
+See the following `JOIN` examples on continuous aggregates:
 
 - `INNER JOIN` on a single equality condition, using the `ON` clause:
 
@@ -186,7 +186,7 @@ and `AVG`, and non-parallelizable aggregates, such as `RANK`.
 In TimescaleDB&nbsp;2.10.0 and later, the `FROM` clause supports `JOINS`, with
 some restrictions. For more information, see the [`JOIN` support section][caggs-joins].
 
-In older versions of Timescale, continuous aggregates only support
+In older versions of TimescaleDB, continuous aggregates only support
 [aggregate functions that can be parallelized by PostgreSQL][postgres-parallel-agg].
 You can work around this by aggregating the other parts of your query in the
 continuous aggregate, then
diff --git a/use-timescale/continuous-aggregates/hierarchical-continuous-aggregates.md b/use-timescale/continuous-aggregates/hierarchical-continuous-aggregates.md
@@ -4,6 +4,8 @@ excerpt: Running advanced real-time analytic workloads? Create continuous aggreg
 keywords: [continuous aggregates, hierarchical, create]
 ---
 
+import RealTimeAgg from 'versionContent/_partials/_real-time-aggregates.mdx';
+
 # Hierarchical continuous aggregates
 
 The more data you have, the more likely you are to run a more sophisticated analysis on it. When a simple one-level aggregation is not enough, $CLOUD_LONG lets you create continuous aggregates on top of other continuous aggregates. This way, you summarize data at different levels of granularity, while still saving resources with precomputing. 
@@ -28,8 +30,7 @@ For more information, see the instructions for
 
 ## Use real-time aggregation with hierarchical continuous aggregates
 
-In TimescaleDB v2.13 and later, real-time aggregates are *DISABLED* by default. 
-In TimescaleDB v1.7 to v2.12, real-time aggregates are *ENABLED* by default. 
+<RealTimeAgg />
 
 Real-time aggregates always return up-to-date data in response to queries. They accomplish this by
 joining the materialized data in the continuous aggregate with unmaterialized
diff --git a/use-timescale/continuous-aggregates/real-time-aggregates.md b/use-timescale/continuous-aggregates/real-time-aggregates.md
@@ -6,6 +6,7 @@ keywords: [continuous aggregates, real-time aggregates]
 ---
 
 import CaggsRealTimeHistoricalDataRefreshes from 'versionContent/_partials/_caggs-real-time-historical-data-refreshes.mdx';
+import RealTimeAgg from 'versionContent/_partials/_real-time-aggregates.mdx';
 
 # Real-time aggregates
 
@@ -14,14 +15,12 @@ Rapidly growing data means you need more control over what to aggregate and how
 By default, continuous aggregates do not include the most recent data chunk from the
 underlying hypertable. Real-time aggregates, however, use the aggregated data **and** add the
 most recent raw data to it. This provides accurate and up-to-date results, without
-needing to aggregate data as it is being written. 
+needing to aggregate data as it is being written.
 
-In Timescale&nbsp;2.13 and later real-time aggregates are *DISABLED* by default. In Timescale versions 1.7 to 2.12, real-time aggregates are enabled by default; when you create a continuous
-aggregate view, queries to that view include the most recent data, even if
-it has not yet been aggregated. 
+<RealTimeAgg />
 
 For more detail on the comparison between continuous and real-time aggregates,
-see our [real time aggregate blog post][blog-rtaggs].
+see our [real-time aggregate blog post][blog-rtaggs].
 
 ## Use real-time aggregates
 
@@ -30,16 +29,16 @@ You can enable and disable real-time aggregation by setting the
 
 <Procedure>
 
-1.  For an existing table, at the `psql` prompt, disable real-time aggregation:
+1.  Enable real-time aggregation for an existing continuous aggregate:
 
     ```sql
-    ALTER MATERIALIZED VIEW table_name set (timescaledb.materialized_only = true);
+    ALTER MATERIALIZED VIEW table_name set (timescaledb.materialized_only = false);
     ```
 
-1.  Re-enable real-time aggregation:
+1.  Disable real-time aggregation:
 
     ```sql
-    ALTER MATERIALIZED VIEW table_name set (timescaledb.materialized_only = false);
+    ALTER MATERIALIZED VIEW table_name set (timescaledb.materialized_only = true);
     ```
 
 </Procedure>
diff --git a/use-timescale/continuous-aggregates/refresh-policies.md b/use-timescale/continuous-aggregates/refresh-policies.md
@@ -16,7 +16,7 @@ you can also refresh it manually.
 Continuous aggregates require a policy for automatic refreshing. You can adjust
 this to suit different use cases. For example, you can have the continuous
 aggregate and the hypertable stay in sync, even when data is removed from the
-hypertable, or you could keep source data in the continuous aggregate even after
+hypertable. Alternatively, you could keep source data in the continuous aggregate even after
 it is removed from the hypertable.
 
 You can change the way your continuous aggregate is refreshed by calling 
@@ -30,15 +30,15 @@ Among others, `add_continuous_aggregate_policy` takes the following arguments:
 *   `schedule_interval`: the refresh interval in minutes or hours. Defaults to
     24 hours.
 
-If you set the `start_offset` or `end_offset` to `NULL`, the range is open-ended
-and extends to the beginning or end of time. 
+Note the following:
 
-If you set `end_offset` within the current time bucket, this bucket is excluded. This is done for the following reasons:
+- If you set the `start_offset` or `end_offset` to `NULL`, the range is open-ended and extends to the beginning or end of time. 
+- If you set `end_offset` within the current time bucket, this bucket is excluded from materialization. This is done for the following reasons:
 
-- The current bucket is incomplete and can't be refreshed. 
-- The current bucket gets lots of writes in the time-stamp order and its aggregate becomes outdated very quickly. Excluding it improves performance. 
+  - The current bucket is incomplete and can't be refreshed. 
+  - The current bucket gets a lot of writes in timestamp order, and its aggregate becomes outdated very quickly. Excluding it improves performance. 
 
-To include the current time bucket, enable [real-time aggregation][future-watermark]. In Timescale&nbsp;2.13 and later, it is disabled by default.
+  To include the latest raw data in queries, enable [real-time aggregation][future-watermark]. 
 
 See the [API reference][api-reference] for the full list of required and optional arguments and use examples.
 
@@ -117,10 +117,10 @@ The `refresh` command takes three arguments:
 *   The timestamp of the beginning of the refresh window
 *   The timestamp of the end of the refresh window
 
-Only buckets that are wholly within the range specified are refreshed. For
+Only buckets that are wholly within the specified range are refreshed. For
 example, if you specify `2021-05-01', '2021-06-01` the only buckets that are
 refreshed are those up to but not including 2021-06-01. It is possible to
-specify NULL in a manual refresh to get an open-ended range, but we do not
+specify `NULL` in a manual refresh to get an open-ended range, but we do not
 recommend using it, because you could inadvertently materialize a large amount
 of data, slow down your performance, and have unintended consequences on other
 policies like data retention.
@@ -137,13 +137,7 @@ policies like data retention.
 
 </Procedure>
 
-Avoid refreshing time intervals that are likely to have a lot of writes. In
-general, this means you should never refresh the most recent time bucket.
-Because the of constant change in the underlying data, they are unlikely to
-produce accurate aggregates. Additionally, refreshing this data slows down the
-ingest rate of the hypertable due to write amplification. If you want to include
-the latest bucket in your queries,
-use [real-time aggregation][real-time-aggregates] instead.
+Follow the logic used by automated refresh policies and avoid refreshing time buckets that are likely to have a lot of writes. This means that you should generally not refresh the latest incomplete time bucket. To include the latest raw data in your queries, use [real-time aggregation][real-time-aggregates] instead.
 
 [cagg-drop-data]: /use-timescale/:currentVersion:/continuous-aggregates/drop-data
 [future-watermark]: /use-timescale/:currentVersion:/continuous-aggregates/troubleshooting/#continuous-aggregate-watermark-is-in-the-future
diff --git a/use-timescale/page-index/page-index.js b/use-timescale/page-index/page-index.js
@@ -128,7 +128,7 @@ module.exports = [
             excerpt: "Manage materialized hypertables in continuous aggregates",
           },
           {
-            title: "Real time aggregates",
+            title: "Real-time aggregates",
             href: "real-time-aggregates",
             excerpt: "Manage real time aggregates in continuous aggregates",
           },
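
The `refresh-policies.md` hunk states that only buckets wholly within the specified refresh window are refreshed. The Python sketch below illustrates that rule under stated assumptions: the function name `buckets_in_window` and the alignment `origin` are hypothetical, and the code only models the selection rule, not how TimescaleDB performs a refresh.

```python
from datetime import datetime, timedelta


def buckets_in_window(start, end, width, origin=datetime(2000, 1, 1)):
    """List start times of buckets that lie wholly within [start, end).

    `origin` is an assumed alignment point for bucket boundaries.
    """
    # Round `start` up to the next bucket boundary (or keep it if aligned).
    offset = (start - origin) % width
    first = start if offset == timedelta(0) else start + (width - offset)
    result = []
    # Keep a bucket only if its end does not pass the end of the window.
    while first + width <= end:
        result.append(first)
        first += width
    return result


day = timedelta(days=1)

# Aligned window: every daily bucket in May 2021, 2021-06-01 excluded.
may = buckets_in_window(datetime(2021, 5, 1), datetime(2021, 6, 1), day)

# Unaligned window: the partial buckets at both edges are skipped.
partial = buckets_in_window(
    datetime(2021, 5, 1, 12), datetime(2021, 5, 3, 12), day
)
```

In the unaligned case only the 2021-05-02 bucket qualifies, because the buckets for 2021-05-01 and 2021-05-03 are only partly covered by the window.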
