Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache parameter queries and buckets #200

Merged
merged 30 commits into from
Mar 20, 2025
Merged

Conversation

rkistner
Copy link
Contributor

@rkistner rkistner commented Feb 12, 2025

Builds on #197.

For each checkpoint, this computes:

  1. A list of parameter "lookup" values that changed since the last one
  2. A list of buckets that had new operations since the last one

Then, for each connection, we check:

  1. Based on the changed lookup values, could any of the parameter queries have changed?
  2. If no parameter query has changed, we check if any of the buckets we're syncing has new operations. If there are any, we re-check the position and checksum for each of those buckets.
  3. If a parameter query has changed, we do a complete re-compute of all parameter queries for that connection, as well as re-checking each bucket. This can be optimized further in the future.

For both the parameter lookup values and changed buckets, we place a limit on the number of values we check between checkpoints, currently 1000. If the limit is exceeded, we re-check everything, rather than attempting to compute the incremental changes. This guards against excessive memory usage when there are bulk changes coming in.

Overall, the effect is that an incremental update only affecting specific users/connections, only requires computational overhead (on both in the API container and storage db) proportional to the number of connections affected, rather than proportional to the total number of connections. One remaining query to address is write checkpoints: This currently re-checks write checkpoints on for every connection on every change.

Copy link

changeset-bot bot commented Feb 12, 2025

🦋 Changeset detected

Latest commit: 80fbe0b

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 10 packages
Name Type
@powersync/service-module-postgres-storage Minor
@powersync/service-module-mongodb-storage Minor
@powersync/service-core-tests Minor
@powersync/service-core Minor
@powersync/service-sync-rules Minor
@powersync/service-module-mongodb Patch
@powersync/service-module-mysql Patch
@powersync/service-module-postgres Patch
@powersync/service-image Minor
test-client Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

Base automatically changed from optimize-bucket-lookups to main February 19, 2025 08:01
@rkistner rkistner force-pushed the optimize-bucket-lookups-2 branch from 8aea0ab to 3e257fc Compare February 25, 2025 15:11
@rkistner rkistner force-pushed the optimize-bucket-lookups-2 branch from 9784136 to c8e63be Compare March 5, 2025 11:41
@rkistner rkistner force-pushed the optimize-bucket-lookups-2 branch from a4673c8 to 6ebefa9 Compare March 11, 2025 12:27
@rkistner rkistner changed the title [WIP] Optimize incremental sync: Phase 2 - cache parameter queries [WIP] Cache parameter queries and buckets Mar 12, 2025
@rkistner rkistner requested a review from Copilot March 13, 2025 15:29

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements caching of parameter queries and buckets to reduce incremental sync overhead. Key changes include:

  • Adding a new migration to create a unique index on the bucket state.
  • Caching dynamic bucket descriptions and parameter lookups in the sync and storage layers.
  • Updating types and functions to use a dedicated ParameterLookup class instead of raw arrays.

Reviewed Changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated no comments.

Show a summary per file
File Description
modules/module-mongodb-storage/src/migrations/db/migrations/1741697235857-bucket-state-index.ts Adds a migration to create a unique index for bucket state updates.
packages/service-core/src/sync/BucketChecksumState.ts Adjusts checkpoint update logic with caching and refined bucket update detection.
packages/service-core/src/sync/util.ts Reimplements the hasIntersection utility to support set-based lookup comparisons.
modules/module-mongodb-storage/src/storage/implementation/MongoSyncBucketStorage.ts Updates parameter set queries and ensures proper cleanup of bucket state during clear operations.
packages/service-core/src/storage/bson.ts Refactors lookup serialization functions to work with the new ParameterLookup type.
packages/sync-rules/src/SqlParameterQuery.ts Transitions lookup generation to return ParameterLookup objects with normalized values.
Comments suppressed due to low confidence (5)

packages/service-core/src/sync/BucketChecksumState.ts:404

  • [nitpick] Consider implementing a set intersection for 'updatedBuckets' to improve performance and clarity instead of iterating over each bucket.
        // TODO: Do set intersection instead

modules/module-mongodb-storage/src/storage/implementation/MongoCompactor.ts:342

  • [nitpick] Ensure that 'lastOpId' is defined when logging the flush operation or provide a fallback value in the log message for clarity.
            logger.info(`Flushing CLEAR for ${numberOfOpsToClear} ops at ${lastOpId?.o}`);

packages/sync-rules/src/SqlParameterQuery.ts:300

  • [nitpick] Ensure that all consumers of 'getLookups' are updated to expect ParameterLookup objects rather than raw arrays; update function documentation if needed.
getLookups(parameters: RequestParameters): ParameterLookup[] {

packages/service-core/src/storage/bson.ts:35

  • [nitpick] Consider wrapping the parsed lookup in a ParameterLookup instance (e.g., return new ParameterLookup(parsed)) to ensure consistent type usage across the codebase.
  const parsed = bson.deserialize(lookup.buffer, BSON_DESERIALIZE_INTERNAL_OPTIONS).l as SqliteJsonValue[];

packages/sync-rules/src/types.ts:65

  • [nitpick] Consider using 'instanceof ParameterLookup' to verify the type of 'lookup' rather than checking if its 'values' property is an array.
  return Array.isArray((e as EvaluatedParameters).lookup?.values);
@rkistner rkistner requested a review from Copilot March 19, 2025 10:03
@rkistner rkistner requested a review from stevensJourney March 19, 2025 10:11
@rkistner rkistner changed the title [WIP] Cache parameter queries and buckets Cache parameter queries and buckets Mar 19, 2025
@rkistner rkistner marked this pull request as ready for review March 19, 2025 10:11
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR caches parameter queries and bucket metadata to reduce the computational overhead of incremental syncs. Key changes include:

  • Updating utility functions for handling parameter lookups with proper normalization.
  • Refactoring bucket checksum and parameter state logic to use a consistent INVALIDATE_ALL_BUCKETS symbol and new caching for dynamic buckets.
  • Adjusting storage implementations and tests in both MongoDB and Postgres modules to accommodate the new ParameterLookup type.

Reviewed Changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
packages/service-core/test/src/util.test.ts Updated tests to use ParameterLookup and added new helper functions for intersection checking (note misspellings in function names).
modules/module-mongodb-storage/src/migrations/db/migrations/1741697235857-bucket-state-index.ts New migration to create/destroy an index on bucket updates.
packages/service-core/src/sync/util.ts Updated utility functions with new type MapOrSet and minor refactoring in getIntersection/hasIntersection.
packages/service-core/src/sync/BucketChecksumState.ts Refactored checkpoint update logic to use INVALIDATE_ALL_BUCKETS and improved caching of dynamic bucket queries.
modules/module-mongodb-storage/src/storage/implementation/MongoSyncBucketStorage.ts Adjusted type signatures to work with ParameterLookup and improved logging and error checking.
packages/sync-rules/src/BucketParameterQuerier.ts, SqlParameterQuery.ts, SqlBucketDescriptor.ts Refactored to replace dynamicBucketDefinitions with parameterQueryLookups and use normalization in parameter lookups.
packages/service-core-tests/src/tests/register-data-storage-tests.ts Updated tests to call ParameterLookup.normalized instead of literal arrays for parameter lookups.
Other files Minor adjustments in bson serialization, compactor logging, checkpoint changes caching, and Postgres storage to support the new parameter lookup handling.

@rkistner rkistner force-pushed the optimize-bucket-lookups-2 branch from daa68b2 to 8f45ca5 Compare March 19, 2025 11:29
@rkistner rkistner merged commit bfece49 into main Mar 20, 2025
21 checks passed
@rkistner rkistner deleted the optimize-bucket-lookups-2 branch March 20, 2025 07:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants