[ENH]: Make all functions incremental #5893

tanujnay112 · 2025-11-21T09:50:22Z

Description of changes

Summarize the changes made by this PR.

This diff makes all functions (statistics, record_counter) incremental. On every run, each function reads its current state from the output collection and applies the incoming log data to produce updates to that collection. This also adds total_count as a statistic record.
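As a rough illustration of the incremental counter idea, here is a minimal sketch. `RunningCount` is a hypothetical type, not the PR's `CounterFunction`; only the method names `with_initial_value`, `observe_insert`, `observe_delete`, and `is_changed` echo names mentioned in this PR, and their real signatures may differ.

```rust
/// Hypothetical counter that starts from a previously stored value
/// and tracks whether this run changed it.
#[derive(Debug)]
pub struct RunningCount {
    initial: i64,
    value: i64,
}

impl RunningCount {
    /// Seed the counter with the count read from the existing output.
    pub fn with_initial_value(initial: i64) -> Self {
        Self { initial, value: initial }
    }

    /// A record was inserted in the incoming log.
    pub fn observe_insert(&mut self) {
        self.value += 1;
    }

    /// A record was deleted in the incoming log.
    pub fn observe_delete(&mut self) {
        self.value -= 1;
    }

    /// True only if the net effect of this run differs from the stored value,
    /// so an unchanged count does not need to be rewritten.
    pub fn is_changed(&self) -> bool {
        self.value != self.initial
    }

    pub fn value(&self) -> i64 {
        self.value
    }
}
```

In this sketch an insert followed by a delete nets out to no change, so nothing would need to be written for that run.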

This change also fixes a bug in which the AttachedFunctionOrchestrator did not create a RecordSegment reader.

Improvements & Bug fixes
- ...
New functionality
- ...

Test plan

How are these changes tested?

Tests pass locally with `pytest` for Python, `yarn test` for JS, `cargo test` for Rust

Migration plan

Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?

Observability plan

What is the plan to instrument and monitor this change?

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

github-actions · 2025-11-21T09:50:38Z

tanujnay112 · 2025-11-21T09:50:40Z

[ENH]: Make all functions incremental #5893 👈 (View in Graphite)
[ENH]: Execute task with no backfill or incremental #5867
[ENH]: Modified AttachFunction to do 2PC on a new is_ready column #5872: 1 other dependent PR (#5884)
[CHORE]: Remove next_run from attached_functions #5871
[CHORE]: Disable S3heap service and remove nonce-related logic #5866
[ENH]: Refactor compactor into three chained orchestrators #5831
main

This stack of pull requests is managed by Graphite.

propel-code-bot · 2025-11-21T09:53:04Z

Introduce incremental attached-function execution (statistics & record-counter)

This PR refactors the statistics and record-counter attached functions so they operate incrementally instead of writing a full refresh on every run. Existing output is loaded via RecordSegmentReader, deltas from new logs are merged, counts are incremented/decremented, and stale statistics are removed. The change ripples through the executor/operator/orchestrator stack, adds total_count summary statistics, and updates extensive test coverage.

Key Changes

• Redesigned trait StatisticsFunction (new methods observe_insert, observe_delete, is_changed, is_empty, as_any_mut)
• Extended CounterFunction with change-tracking and factory method with_initial_value
• Re-implemented StatisticsFunctionExecutor to load existing stats (load_existing_statistics), apply deltas, emit deletes for zero counts, and add summary key summary::s:total_count
• Added incremental logic to CountAttachedFunction including reading existing counts from output
• Updated ExecuteAttachedFunctionOperator to pass optional RecordSegmentReader for stateful executors and to support rebuild vs incremental modes
• Fixed AttachedFunctionOrchestrator bug: now creates output RecordSegmentReader, wires input segment reader, and propagates rebuild flag
• Removed test-only helper MaterializeLogsResult::from_logs_for_test; replaced tests with real materialize_logs flow
• Large test suite updates (+200 lines) covering inserts, deletes, updates, rebuild paths, and new summary statistics
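The merge step described above (load existing stats, apply deltas, emit deletes for zero counts) could be sketched roughly as follows. `OutputUpdate` and `apply_deltas` are hypothetical names for illustration only; the actual executor works on segment records rather than plain maps.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a change written to the output collection.
#[derive(Debug, PartialEq)]
pub enum OutputUpdate {
    /// Write (or overwrite) the count stored under this statistic ID.
    Upsert { id: String, count: i64 },
    /// Remove a stale statistic whose count dropped to zero.
    Delete { id: String },
}

/// Merge this run's per-statistic deltas into the previously stored counts
/// and return only the records that need to change.
pub fn apply_deltas(
    existing: &BTreeMap<String, i64>,
    deltas: &BTreeMap<String, i64>,
) -> Vec<OutputUpdate> {
    let mut updates = Vec::new();
    for (id, &delta) in deltas {
        if delta == 0 {
            continue; // Inserts and deletes cancelled out; nothing to write.
        }
        let previous = existing.get(id).copied();
        let count = previous.unwrap_or(0) + delta;
        if count > 0 {
            updates.push(OutputUpdate::Upsert { id: id.clone(), count });
        } else if previous.is_some() {
            // Only emit a delete if the statistic was actually stored before.
            updates.push(OutputUpdate::Delete { id: id.clone() });
        }
    }
    updates
}
```

A `BTreeMap` is used here only so that the output order is deterministic; the real code may use a different structure.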

Affected Areas

• rust/worker/src/execution/functions/statistics.rs
• rust/worker/src/execution/operators/execute_task.rs
• rust/worker/src/execution/orchestration/attached_function_orchestrator.rs
• rust/segment/src/types.rs (test helper removal)
• unit/integration tests

This summary was automatically generated by @propel-code-bot

propel-code-bot · 2025-11-21T09:58:57Z

rust/worker/src/execution/functions/statistics.rs

+            let key = match metadata.get("key") {
+                Some(MetadataValue::Str(k)) => k.clone(),
+                _ => continue,
+            };
+
+            let value_type = match metadata.get("type") {
+                Some(MetadataValue::Str(t)) => t.as_str(),
+                _ => continue,
+            };
+
+            let value_str = match metadata.get("value") {
+                Some(MetadataValue::Str(v)) => v.as_str(),
+                _ => continue,
+            };
+
+            let count = match metadata.get("count") {
+                Some(MetadataValue::Int(c)) => *c,
+                _ => continue,
+            };
+
+            // Reconstruct the StatisticsValue from type and value
+            let stats_value = match value_type {
+                "bool" => match value_str {
+                    "true" => StatisticsValue::Bool(true),
+                    "false" => StatisticsValue::Bool(false),
+                    _ => continue,
+                },
+                "int" => match value_str.parse::<i64>() {
+                    Ok(i) => StatisticsValue::Int(i),
+                    _ => continue,
+                },
+                "float" => match value_str.parse::<f64>() {
+                    Ok(f) => StatisticsValue::Float(f),
+                    _ => continue,
+                },
+                "str" => StatisticsValue::Str(value_str.to_string()),
+                "sparse" => match value_str.parse::<u32>() {
+                    Ok(index) => StatisticsValue::SparseVector(index),
+                    _ => continue,
+                },
+                _ => continue,
+            };


[BestPractice]

In `load_existing_statistics`, records read from the output segment that fail to parse are skipped with a silent `continue`. This can hide data corruption in the output segment and lead to incorrect statistics. Consider logging (e.g., `tracing::warn!`) whenever a record is skipped because of a parsing failure; this would improve observability into the health of the system.

File: rust/worker/src/execution/functions/statistics.rs Line: 261
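One way to act on this suggestion, sketched under assumptions: the parsing is pulled into a helper that returns a reason on failure, and the caller logs before skipping. `StatValue`, `SkipReason`, `parse_stat_value`, and `load_values` are hypothetical names, and `eprintln!` stands in for `tracing::warn!` so the sketch needs no external crates.

```rust
use std::fmt;

/// Hypothetical mirror of the value kinds reconstructed in the diff above.
#[derive(Debug, PartialEq)]
pub enum StatValue {
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(String),
    SparseVector(u32),
}

/// Why a stored record could not be turned back into a value.
#[derive(Debug, PartialEq)]
pub enum SkipReason {
    UnknownType(String),
    BadValue { value_type: String, value: String },
}

impl fmt::Display for SkipReason {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SkipReason::UnknownType(t) => write!(f, "unknown type {:?}", t),
            SkipReason::BadValue { value_type, value } => {
                write!(f, "value {:?} is not a valid {}", value, value_type)
            }
        }
    }
}

/// Parse a (type, value) pair, reporting the reason instead of skipping silently.
pub fn parse_stat_value(value_type: &str, value: &str) -> Result<StatValue, SkipReason> {
    let bad = || SkipReason::BadValue {
        value_type: value_type.to_string(),
        value: value.to_string(),
    };
    match value_type {
        "bool" => match value {
            "true" => Ok(StatValue::Bool(true)),
            "false" => Ok(StatValue::Bool(false)),
            _ => Err(bad()),
        },
        "int" => value.parse::<i64>().map(StatValue::Int).map_err(|_| bad()),
        "float" => value.parse::<f64>().map(StatValue::Float).map_err(|_| bad()),
        "str" => Ok(StatValue::Str(value.to_string())),
        "sparse" => value.parse::<u32>().map(StatValue::SparseVector).map_err(|_| bad()),
        other => Err(SkipReason::UnknownType(other.to_string())),
    }
}

/// Load every parseable row; log and count the ones that are skipped.
pub fn load_values(rows: &[(&str, &str)]) -> (Vec<StatValue>, usize) {
    let mut loaded = Vec::new();
    let mut skipped = 0;
    for &(value_type, value) in rows {
        match parse_stat_value(value_type, value) {
            Ok(v) => loaded.push(v),
            Err(reason) => {
                // Stand-in for tracing::warn! in the real worker.
                eprintln!("skipping statistics record: {}", reason);
                skipped += 1;
            }
        }
    }
    (loaded, skipped)
}
```

Returning the skipped count as well would let a caller emit a metric, though whether that is worthwhile here is a judgment call.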

propel-code-bot · 2025-11-21T09:58:58Z

rust/worker/src/execution/operators/execute_task.rs

+    async fn get_existing_count(output_reader: Option<&RecordSegmentReader<'_>>) -> i64 {
+        let Some(reader) = output_reader else {
+            return 0;
+        };
+
+        // Try to get the existing record with the function output ID
+        let offset_id = match reader
+            .get_offset_id_for_user_id(COUNT_FUNCTION_OUTPUT_ID)
+            .await
+        {
+            Ok(Some(offset_id)) => offset_id,
+            _ => return 0,
+        };
+
+        // Get the data record for this offset id
+        let data_record = match reader.get_data_for_offset_id(offset_id).await {
+            Ok(Some(data_record)) => data_record,
+            _ => return 0,
+        };
+
+        // Extract total_count from metadata
+        if let Some(metadata) = &data_record.metadata {
+            if let Some(chroma_types::MetadataValue::Int(count)) = metadata.get(COUNT_METADATA_KEY)
+            {
+                return *count;
+            }
+        }
+
+        0
+    }


[BestPractice]

The `get_existing_count` function currently swallows errors from the `RecordSegmentReader` and defaults to returning `0`. For instance, if `reader.get_offset_id_for_user_id` fails due to a transient I/O issue, the function returns `0` instead of propagating the error. This could cause the total count to be incorrectly reset.

It would be more robust to change the function signature to return a `Result<i64, Box<dyn ChromaError>>` and propagate any errors encountered during reading. The caller in `execute` can then handle the error appropriately using `?`.

File: rust/worker/src/execution/operators/execute_task.rs Line: 79
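A minimal sketch of the suggested shape, assuming a simplified synchronous reader. `CountReader`, `ReadError`, and `InMemoryReader` are hypothetical stand-ins introduced for illustration; the real code is async, uses `RecordSegmentReader`, and would return a `ChromaError`.

```rust
use std::collections::HashMap;

/// Hypothetical error type standing in for a boxed ChromaError.
#[derive(Debug, PartialEq)]
pub struct ReadError(pub String);

/// Hypothetical, synchronous stand-in for the two reader lookups used above.
pub trait CountReader {
    fn offset_id_for_user_id(&self, user_id: &str) -> Result<Option<u32>, ReadError>;
    fn count_for_offset_id(&self, offset_id: u32) -> Result<Option<i64>, ReadError>;
}

/// Missing reader or missing record means "no count yet" (0);
/// a failed read is propagated to the caller instead of being treated as 0.
pub fn get_existing_count<R: CountReader>(
    reader: Option<&R>,
    output_id: &str,
) -> Result<i64, ReadError> {
    let Some(reader) = reader else {
        return Ok(0);
    };
    let Some(offset_id) = reader.offset_id_for_user_id(output_id)? else {
        return Ok(0);
    };
    Ok(reader.count_for_offset_id(offset_id)?.unwrap_or(0))
}

/// Toy reader used only to exercise the sketch.
#[derive(Debug, Default)]
pub struct InMemoryReader {
    pub ids: HashMap<String, u32>,
    pub counts: HashMap<u32, i64>,
    /// When true, every lookup fails, simulating a transient I/O error.
    pub fail: bool,
}

impl CountReader for InMemoryReader {
    fn offset_id_for_user_id(&self, user_id: &str) -> Result<Option<u32>, ReadError> {
        if self.fail {
            return Err(ReadError("simulated I/O failure".to_string()));
        }
        Ok(self.ids.get(user_id).copied())
    }

    fn count_for_offset_id(&self, offset_id: u32) -> Result<Option<i64>, ReadError> {
        if self.fail {
            return Err(ReadError("simulated I/O failure".to_string()));
        }
        Ok(self.counts.get(&offset_id).copied())
    }
}
```

The point of the sketch is the distinction between "not found" (a legitimate zero) and "could not read" (an error), which the current version collapses into the same result.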

rescrv

I wonder what happens if there's a delete of something that doesn't exist. Is it taken care of by hydrateLogRecords or something else?

rescrv · 2025-11-22T19:33:06Z

rust/segment/src/types.rs

 struct MaterializedLogRecord {
    // False if the record exists only in the log, otherwise true.
-    offset_id_exists_in_segment: bool,
+    pub offset_id_exists_in_segment: bool,


Maybe make a getter method, unless you want this to change at any time.

oops, residue from when i was testing something
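For reference, the getter approach suggested above might look like the following sketch. The struct here is reduced to the single field from the diff, and the `new` constructor is hypothetical, added only so the example compiles on its own.

```rust
pub struct MaterializedLogRecord {
    // False if the record exists only in the log, otherwise true.
    offset_id_exists_in_segment: bool,
}

impl MaterializedLogRecord {
    /// Hypothetical constructor for this sketch only.
    pub fn new(offset_id_exists_in_segment: bool) -> Self {
        Self { offset_id_exists_in_segment }
    }

    /// Read-only access keeps the field private, so callers cannot mutate it.
    pub fn offset_id_exists_in_segment(&self) -> bool {
        self.offset_id_exists_in_segment
    }
}
```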

rescrv · 2025-11-22T19:34:54Z

rust/worker/src/execution/functions/statistics.rs

 pub trait StatisticsFunction: std::fmt::Debug + Send {
-    fn observe(&mut self, hydrated_record: &HydratedMaterializedLogRecord<'_, '_>);
+    fn observe_insert(&mut self, hydrated_record: &HydratedMaterializedLogRecord<'_, '_>);
+    fn observe_delete(&mut self, hydrated_record: &HydratedMaterializedLogRecord<'_, '_>);


It feels odd to pass in a variant. The old form recognized that HMLR would be variant. This has potential mismatch.

Discussed offline, leaving a TODO

rescrv · 2025-11-22T19:35:41Z

rust/worker/src/execution/functions/statistics.rs

                ),
            ),
+            (
+                "summary::s:total_count".to_string(),


What's the ::s for?

it's from the id naming scheme that you had made in this file: `{key}::{type_id}:{value}`

Oh. I didn't make that connection having not looked at that code recently. A comment would be nice, but is OK to do in follow-up.
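To make the `{key}::{type_id}:{value}` scheme concrete, here is a small sketch. `statistic_id` and `split_statistic_id` are hypothetical helpers, not functions from this file, and the splitting logic assumes the key itself contains no `::`.

```rust
/// Build a statistic record ID of the form `{key}::{type_id}:{value}`.
pub fn statistic_id(key: &str, type_id: &str, value: &str) -> String {
    format!("{}::{}:{}", key, type_id, value)
}

/// Split an ID back into (key, type_id, value).
/// Assumes the key contains no "::" and the type_id contains no ':'.
pub fn split_statistic_id(id: &str) -> Option<(&str, &str, &str)> {
    let (key, rest) = id.split_once("::")?;
    let (type_id, value) = rest.split_once(':')?;
    Some((key, type_id, value))
}
```

Under this reading, `summary::s:total_count` has key `summary`, type_id `s`, and value `total_count`.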

tanujnay112 · 2025-11-22T21:57:54Z

`materialize_logs`, which runs before this, takes care of that.

tanujnay112 mentioned this pull request Nov 21, 2025

[ENH]: Execute task with no backfill or incremental #5867

Merged

1 task

tanujnay112 marked this pull request as ready for review November 21, 2025 09:52

tanujnay112 force-pushed the make_functions_great_again branch 2 times, most recently from 09a23f0 to 2bfcc0b Compare November 21, 2025 09:54

tanujnay112 force-pushed the incremental_fn branch from 30a452d to c7698f4 Compare November 21, 2025 09:54

propel-code-bot bot reviewed Nov 21, 2025

propel-code-bot bot reviewed Nov 21, 2025

tanujnay112 force-pushed the incremental_fn branch from c7698f4 to 1780f2e Compare November 21, 2025 10:04

tanujnay112 changed the base branch from make_functions_great_again to graphite-base/5893 November 21, 2025 22:19

tanujnay112 force-pushed the graphite-base/5893 branch from 2bfcc0b to 3fec52a Compare November 22, 2025 11:51

tanujnay112 force-pushed the incremental_fn branch from 1780f2e to f02e24a Compare November 22, 2025 11:51

tanujnay112 changed the base branch from graphite-base/5893 to main November 22, 2025 11:51

tanujnay112 force-pushed the incremental_fn branch from f02e24a to 65d592d Compare November 22, 2025 11:52

tanujnay112 force-pushed the incremental_fn branch from 65d592d to fef8b53 Compare November 22, 2025 12:05

tanujnay112 requested a review from rescrv November 22, 2025 12:06

rescrv reviewed Nov 22, 2025

[ENH]: Make all functions incremental

0e07a04

tanujnay112 force-pushed the incremental_fn branch from fef8b53 to 0e07a04 Compare November 22, 2025 22:10

rescrv approved these changes Nov 23, 2025

tanujnay112 merged commit 577a312 into main Nov 23, 2025
62 checks passed

3 participants
