@tanujnay112 (Contributor) commented Nov 14, 2025

Description of changes

Summarize the changes made by this PR.

This change introduces an AttachedFunctionOrchestrator that runs the following chain of operators:

GetAttachedFunction -> GetCollectionAndSegments (for the output collection) -> ExecuteAttachedFunction -> MaterializeLogs
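A minimal sketch of how such a chain composes, assuming each stage's output feeds the next and any failure stops the rest. The stage functions, `ChainError`, and the simplified record types below are hypothetical stand-ins, not the operators' real signatures:

```rust
// Hypothetical, simplified stand-ins for the data each operator produces.
#[derive(Debug, Clone, PartialEq)]
pub struct AttachedFunction {
    pub name: String,
    pub output_collection_id: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct OutputCollection {
    pub id: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct MaterializedLog {
    pub key: String,
}

#[derive(Debug, PartialEq)]
pub enum ChainError {
    FunctionNotFound,
    CollectionNotFound,
}

// Stage 1: look up the attached function (an empty name means "not found" here).
fn get_attached_function(name: &str) -> Result<AttachedFunction, ChainError> {
    if name.is_empty() {
        return Err(ChainError::FunctionNotFound);
    }
    Ok(AttachedFunction { name: name.to_string(), output_collection_id: 7 })
}

// Stage 2: resolve the function's output collection (id 0 means "missing" here).
fn get_collection_and_segments(f: &AttachedFunction) -> Result<OutputCollection, ChainError> {
    if f.output_collection_id == 0 {
        return Err(ChainError::CollectionNotFound);
    }
    Ok(OutputCollection { id: f.output_collection_id })
}

// Stage 3: run the function over the input records.
fn execute_attached_function(f: &AttachedFunction, input: &[&str]) -> Vec<String> {
    input.iter().map(|r| format!("{}:{}", f.name, r)).collect()
}

// Stage 4: turn the function output into log records tagged with their target collection.
fn materialize_logs(output: &OutputCollection, records: Vec<String>) -> Vec<MaterializedLog> {
    records
        .into_iter()
        .map(|r| MaterializedLog { key: format!("{}/{}", output.id, r) })
        .collect()
}

/// Runs the stages in order; `?` stops the chain at the first failure.
pub fn run_attached_function_chain(
    name: &str,
    input: &[&str],
) -> Result<Vec<MaterializedLog>, ChainError> {
    let function = get_attached_function(name)?;
    let output = get_collection_and_segments(&function)?;
    let records = execute_attached_function(&function, input);
    Ok(materialize_logs(&output, records))
}
```

Using `?` means a missing function stops the chain before any output collection is looked up.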

It also changes the RegisterOrchestrator to spawn a FinishAttachedTask operator that flushes the compaction for the input and output collections and flushes the updated function data.
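One way to picture the register step's branching, assuming a hypothetical `FlushRequest` type: without a function context it flushes a single collection as before; with one, the input and output collections and the function's progress are flushed together. The struct fields shown are illustrative only:

```rust
// Hypothetical, simplified versions of the structs passed to the register step.
#[derive(Debug, Clone, PartialEq)]
pub struct CollectionRegisterInfo {
    pub collection_id: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct FunctionContext {
    pub function_id: u32,
    pub completion_offset: u64,
}

/// Hypothetical description of what the register step asks the sysdb to flush.
#[derive(Debug, PartialEq)]
pub enum FlushRequest {
    /// Plain compaction: flush one collection, as before.
    Collection(CollectionRegisterInfo),
    /// Function path: flush the input and output collections plus the
    /// updated function data in a single request.
    AttachedFunction {
        collections: Vec<CollectionRegisterInfo>,
        function: FunctionContext,
    },
}

/// Picks the flush shape; returns None if there is nothing to register.
pub fn plan_flush(
    infos: Vec<CollectionRegisterInfo>,
    function: Option<FunctionContext>,
) -> Option<FlushRequest> {
    if infos.is_empty() {
        return None;
    }
    match function {
        Some(function) => Some(FlushRequest::AttachedFunction { collections: infos, function }),
        None => infos.into_iter().next().map(FlushRequest::Collection),
    }
}
```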

The compact method in compact.rs has been edited to launch an AttachedFunctionOrchestrator in parallel on the results of the initial LogFetchOrchestrator. This orchestrator returns a chunk of MaterializedLog records that are applied via a second instance of ApplyLogsOrchestrator.

This runs in parallel with the normal compaction workflow, which simply runs an ApplyLogsOrchestrator on the results of the initial LogFetchOrchestrator.

Each of these two threads returns a CollectionRegisterInfo, and the function-related thread also returns a FunctionContext. The two threads are joined, and all of these structures are passed to the RegisterOrchestrator for completion.
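The fork/join shape described above can be sketched as follows. This uses `std::thread::scope` for simplicity; the worker's actual concurrency mechanism may differ, and `compact_input`, `run_function`, and the struct fields are hypothetical placeholders for the two orchestrator branches:

```rust
use std::thread;

// Hypothetical, simplified versions of the structs the two branches return.
#[derive(Debug, Clone, PartialEq)]
pub struct CollectionRegisterInfo {
    pub collection_id: u32,
    pub records_applied: usize,
}

#[derive(Debug, Clone, PartialEq)]
pub struct FunctionContext {
    pub function_name: String,
}

// Placeholder for the normal path: ApplyLogsOrchestrator on the input collection.
fn compact_input(collection_id: u32, logs: &[u64]) -> CollectionRegisterInfo {
    CollectionRegisterInfo { collection_id, records_applied: logs.len() }
}

// Placeholder for the function path: AttachedFunctionOrchestrator followed by
// a second apply step on the function's output collection.
fn run_function(output_id: u32, logs: &[u64]) -> (CollectionRegisterInfo, FunctionContext) {
    let info = CollectionRegisterInfo { collection_id: output_id, records_applied: logs.len() };
    (info, FunctionContext { function_name: "my_function".to_string() })
}

/// Runs both branches over the same fetched logs, joins them, and returns the
/// arguments that would be handed to the register step.
pub fn compact(
    input_id: u32,
    output_id: Option<u32>,
    logs: &[u64],
) -> (Vec<CollectionRegisterInfo>, Option<FunctionContext>) {
    thread::scope(|s| {
        let normal = s.spawn(|| compact_input(input_id, logs));
        let function = output_id.map(|out| s.spawn(move || run_function(out, logs)));

        let mut infos = vec![normal.join().expect("compaction branch panicked")];
        let mut function_context = None;
        if let Some(handle) = function {
            let (info, ctx) = handle.join().expect("function branch panicked");
            infos.push(info);
            function_context = Some(ctx);
        }
        (infos, function_context)
    })
}
```

Both branches borrow the same fetched logs, mirroring how both threads start from the output of the single initial LogFetchOrchestrator.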

CompactionContext.collection_info has been replaced by CompactionContext.input_collection_info and CompactionContext.output_collection_info to reflect the fact that "compactions" can pull data from one collection and compact it into another. ApplyLogsOrchestrator always applies the given logs to the collection specified by CompactionContext.output_collection_info.

Hence, we take care to set this field to the appropriate collection before calling run_apply_logs in each thread.
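A sketch of that rule, assuming a simplified `CompactionContext` that only carries the two collection infos. Each branch works on its own clone, so setting `output_collection_info` in one thread cannot affect the other; `run_apply_logs` here is a hypothetical stub that reports which collection it would write to:

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct CollectionInfo {
    pub collection_id: u32,
}

// Hypothetical, reduced context with only the two fields discussed above.
#[derive(Debug, Clone)]
pub struct CompactionContext {
    pub input_collection_info: CollectionInfo,
    pub output_collection_info: CollectionInfo,
}

// Stub: like ApplyLogsOrchestrator, it only ever targets the output collection.
fn run_apply_logs(ctx: &CompactionContext, logs: &[u64]) -> (u32, usize) {
    (ctx.output_collection_info.collection_id, logs.len())
}

/// Normal compaction: point the output at the input collection first.
pub fn apply_to_input(base: &CompactionContext, logs: &[u64]) -> (u32, usize) {
    let mut ctx = base.clone();
    ctx.output_collection_info = ctx.input_collection_info.clone();
    run_apply_logs(&ctx, logs)
}

/// Function path: point the output at the function's output collection first.
pub fn apply_to_function_output(
    base: &CompactionContext,
    output: CollectionInfo,
    logs: &[u64],
) -> (u32, usize) {
    let mut ctx = base.clone();
    ctx.output_collection_info = output;
    run_apply_logs(&ctx, logs)
}
```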


Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?

Observability plan

What is the plan to instrument and monitor this change?

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the _docs section?

@github-actions

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@tanujnay112 changed the base branch from refactor_compactor to graphite-base/5867 November 16, 2025 23:26
@tanujnay112 force-pushed the make_functions_great_again branch from 23911b3 to 41adcf1 November 16, 2025 23:26
@tanujnay112 changed the base branch from graphite-base/5867 to remove_next_run November 16, 2025 23:26
@tanujnay112 changed the base branch from remove_next_run to graphite-base/5867 November 17, 2025 00:47
@tanujnay112 force-pushed the make_functions_great_again branch from 41adcf1 to e9fd0f2 November 17, 2025 00:48
@tanujnay112 changed the base branch from graphite-base/5867 to attach_fn2 November 17, 2025 00:48
@tanujnay112 force-pushed the make_functions_great_again branch from e9fd0f2 to 64a563e November 17, 2025 03:59
@tanujnay112 force-pushed the make_functions_great_again branch from 64a563e to 9bd6743 November 17, 2025 04:28
@tanujnay112 force-pushed the attach_fn2 branch 2 times, most recently from cf18054 to 5b066ac November 17, 2025 04:31
@tanujnay112 force-pushed the make_functions_great_again branch 2 times, most recently from 3ea4c7a to 56d4987 November 17, 2025 05:01
@tanujnay112 force-pushed the attach_fn2 branch 2 times, most recently from 105f707 to d9e7a48 November 17, 2025 05:23
@tanujnay112 force-pushed the make_functions_great_again branch from 56d4987 to 243f3f1 November 17, 2025 05:23
@tanujnay112 force-pushed the make_functions_great_again branch 2 times, most recently from 8406711 to 205dbe0 November 17, 2025 06:41
@tanujnay112 force-pushed the make_functions_great_again branch from 967ebb5 to 3646f99 November 21, 2025 08:54
@tanujnay112 changed the base branch from attach_fn2 to graphite-base/5867 November 21, 2025 09:53
@tanujnay112 force-pushed the make_functions_great_again branch from 3646f99 to 09a23f0 November 21, 2025 09:53
@graphite-app bot changed the base branch from graphite-base/5867 to main November 21, 2025 09:54
@tanujnay112 force-pushed the make_functions_great_again branch from 09a23f0 to 2bfcc0b November 21, 2025 09:54
@HammadB (Collaborator) left a comment

Not hitting approve, as I don't have the bandwidth to review a diff of this size and others have already read it.


```diff
-// AttachFunction creates a new attached function in the database
+// AttachFunction creates an output collection and attached function in a single transaction
 func (s *Coordinator) AttachFunction(ctx context.Context, req *coordinatorpb.AttachFunctionRequest) (*coordinatorpb.AttachFunctionResponse, error) {
```
Collaborator

Do we handle soft deletes for:

  1. Attaching
  2. Compaction flush?

Contributor Author

Yes, compaction flushing uses the same logic as before, which fails the flush if is_deleted is true. Similarly, a flush on a function that doesn't exist or is soft-deleted aborts the transaction.

Attaching a function does not interact with soft-deleted functions, and soft-deleted functions are renamed with a "deleted" prefix, just as collections are.
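The behaviour described in this reply can be modelled roughly as below. The in-memory `FunctionTable`, its methods, and the exact `_deleted_` prefix format are assumptions for illustration; the real checks appear to live in the Go sysdb and run inside a database transaction:

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct FunctionRow {
    pub name: String,
    pub is_deleted: bool,
    pub completion_offset: u64,
}

#[derive(Debug, PartialEq)]
pub enum FlushError {
    NotFound,
    SoftDeleted,
}

/// Hypothetical in-memory stand-in for the attached-functions table.
#[derive(Default)]
pub struct FunctionTable {
    rows: HashMap<u32, FunctionRow>,
}

impl FunctionTable {
    /// Attaching only creates a new row; it never looks at soft-deleted rows.
    pub fn attach(&mut self, id: u32, name: &str) {
        let row = FunctionRow { name: name.to_string(), is_deleted: false, completion_offset: 0 };
        self.rows.insert(id, row);
    }

    /// Soft delete: mark the row and rename it with a "deleted" prefix
    /// (the exact prefix format here is an assumption).
    pub fn soft_delete(&mut self, id: u32) {
        if let Some(row) = self.rows.get_mut(&id) {
            row.is_deleted = true;
            row.name = format!("_deleted_{}", row.name);
        }
    }

    /// Flush: reject a missing or soft-deleted function before changing
    /// anything, the in-memory analogue of aborting the transaction.
    pub fn flush(&mut self, id: u32, new_offset: u64) -> Result<(), FlushError> {
        let row = self.rows.get_mut(&id).ok_or(FlushError::NotFound)?;
        if row.is_deleted {
            return Err(FlushError::SoftDeleted);
        }
        row.completion_offset = new_offset;
        Ok(())
    }

    pub fn get(&self, id: u32) -> Option<&FunctionRow> {
        self.rows.get(&id)
    }
}
```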

```go
for key, filePath := range flushSegmentCompaction.FilePaths {
	filePaths[key] = filePath.Paths
}
segmentCompactionInfo = append(segmentCompactionInfo, &model.FlushSegmentCompaction{
```
Collaborator

Do we properly handle empty compactions and the like here?

Contributor Author

Yes, it's handled the same way as normal compactions. That case is tested by compact.rs::test_compaction_with_empty_logs_from_inserts_and_deletes.

```diff
 };

-Ok(Chunk::new(Arc::new([log_record])))
+Ok(Chunk::new(std::sync::Arc::from(vec![output_record])))
```
Collaborator

Also please just import Arc
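For reference, importing `Arc` once keeps the call site short. In this sketch `Chunk` is a hypothetical stand-in with just enough API to compile; `Arc::from(Vec<T>)` produces an `Arc<[T]>` directly:

```rust
// Import Arc once at the top of the module instead of writing
// std::sync::Arc at each call site.
use std::sync::Arc;

/// Hypothetical stand-in for the worker's Chunk type, reduced to what this
/// example needs.
pub struct Chunk<T> {
    data: Arc<[T]>,
}

impl<T> Chunk<T> {
    pub fn new(data: Arc<[T]>) -> Self {
        Chunk { data }
    }

    pub fn len(&self) -> usize {
        self.data.len()
    }
}

/// Wraps one output record in a chunk; `Arc::from(Vec<T>)` yields `Arc<[T]>`.
pub fn single_record_chunk<T>(record: T) -> Chunk<T> {
    Chunk::new(Arc::from(vec![record]))
}
```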

@tanujnay112 force-pushed the make_functions_great_again branch from 2bfcc0b to cb021ee November 21, 2025 22:19
Comment on lines +136 to 138
```rust
    collection_register_infos: Vec<CollectionRegisterInfo>,
    function_context: Option<FunctionContext>,
) -> Self {
```
Contributor

[BestPractice]

Missing validation for collection_register_infos bounds: The function validates that the length is neither 0 nor greater than 2, but then immediately calls .first() without handling the empty case, which could theoretically be reached at line 207 if the validation were bypassed. While the validation should prevent this, defensive programming suggests:

```rust
let output_collection_register_info = self.collection_register_infos.first()
    .ok_or_else(|| RegisterOrchestratorError::InvariantViolation(
        "No collection register info found"
    ))?;
```

This is already handled correctly. However, the validation message at line 138 says "Invalid number of collection register infos" which is vague. Be more specific: "Expected 1 or 2 collection register infos, got {count}"

File: rust/worker/src/execution/orchestration/register_orchestrator.rs
Line: 138
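A sketch of the suggested check with the more specific message, assuming a hypothetical `RegisterError` enum in place of `RegisterOrchestratorError`, whose real variants and payload types are not shown here:

```rust
/// Hypothetical error type; the real RegisterOrchestratorError may differ.
#[derive(Debug, PartialEq)]
pub enum RegisterError {
    InvariantViolation(String),
}

/// Hypothetical, reduced register info.
#[derive(Debug, Clone, PartialEq)]
pub struct CollectionRegisterInfo {
    pub collection_id: u32,
}

/// Validates the count up front with a specific message, then still uses
/// `.first()` defensively instead of indexing.
pub fn output_register_info(
    infos: &[CollectionRegisterInfo],
) -> Result<&CollectionRegisterInfo, RegisterError> {
    if infos.is_empty() || infos.len() > 2 {
        return Err(RegisterError::InvariantViolation(format!(
            "Expected 1 or 2 collection register infos, got {}",
            infos.len()
        )));
    }
    infos.first().ok_or_else(|| {
        RegisterError::InvariantViolation("No collection register info found".to_string())
    })
}
```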

@tanujnay112 force-pushed the make_functions_great_again branch from cb021ee to af08c64 November 21, 2025 22:31
@tanujnay112 force-pushed the make_functions_great_again branch from af08c64 to cb99156 November 21, 2025 22:50
@Sicheng-Pan (Contributor) left a comment

LGTM

@tanujnay112 force-pushed the make_functions_great_again branch from cb99156 to 9b3a17d November 21, 2025 23:52
```go
	return nil, status.Errorf(codes.Internal, "attached function has invalid completion_offset: %d", attachedFunction.CompletionOffset)
}

if !attachedFunction.IsReady {
```
Contributor

[BestPractice]

The check if !attachedFunction.IsReady inside attachedFunctionToProto appears to be redundant. The callers of this function (GetAttachedFunctionByName, ListAttachedFunctions, GetAttachedFunctionByUuid) use data access methods (GetByName, GetByCollectionID, GetByID) that are already specified to return only "ready" attached functions.

This redundant check returns a generic codes.Internal error, which could be confusing for clients. If an unready function is passed here, it signifies an internal logic error. Consider removing this check to simplify the code and rely on the data access layer to enforce the "ready" contract. If the check is kept for defensiveness, a panic might be more appropriate to signal the internal invariant violation.

File: go/pkg/sysdb/coordinator/task.go
Line: 225
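For comparison, here is the pattern the comment recommends, sketched in Rust rather than Go with hypothetical types: the data-access function enforces the "ready only" contract, and the conversion treats an unready row as a programming error by panicking instead of returning a client-facing error:

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct AttachedFunctionRow {
    pub name: String,
    pub is_ready: bool,
}

#[derive(Debug, PartialEq)]
pub struct AttachedFunctionProto {
    pub name: String,
}

/// Data-access layer: the "only ready functions" contract is enforced here.
pub fn get_by_name<'a>(
    rows: &'a [AttachedFunctionRow],
    name: &str,
) -> Option<&'a AttachedFunctionRow> {
    rows.iter().find(|row| row.is_ready && row.name == name)
}

/// Conversion: an unready row reaching this point is an internal bug, so
/// fail loudly instead of returning an error the client would have to interpret.
pub fn to_proto(row: &AttachedFunctionRow) -> AttachedFunctionProto {
    assert!(row.is_ready, "invariant violated: attached function {} is not ready", row.name);
    AttachedFunctionProto { name: row.name.clone() }
}
```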

```rust
if let Some(function_context) = &self.function_context {
    vec![(
        wrap(
            FinishAttachedFunctionOperator::new(),
```
Contributor

[BestPractice]

There's a minor naming inconsistency. The PR description refers to a FinishAttachedTask operator, but the implementation is named FinishAttachedFunctionOperator. For consistency with the documentation and intent, consider renaming FinishAttachedFunctionOperator to FinishAttachedTaskOperator.

File: rust/worker/src/execution/orchestration/register_orchestrator.rs
Line: 184

@tanujnay112 force-pushed the make_functions_great_again branch from 9b3a17d to e1c52e2 November 22, 2025 01:34
@tanujnay112 force-pushed the make_functions_great_again branch from e1c52e2 to 8659f3d November 22, 2025 01:37
@tanujnay112 merged commit 3fec52a into main Nov 22, 2025
62 checks passed