-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache parameter queries and buckets #200
Conversation
🦋 Changeset detectedLatest commit: 80fbe0b The changes in this PR will be included in the next version bump. This PR includes changesets to release 10 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
8aea0ab
to
3e257fc
Compare
9784136
to
c8e63be
Compare
a4673c8
to
6ebefa9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements caching of parameter queries and buckets to reduce incremental sync overhead. Key changes include:
- Adding a new migration to create a unique index on the bucket state.
- Caching dynamic bucket descriptions and parameter lookups in the sync and storage layers.
- Updating types and functions to use a dedicated ParameterLookup class instead of raw arrays.
Reviewed Changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
modules/module-mongodb-storage/src/migrations/db/migrations/1741697235857-bucket-state-index.ts | Adds a migration to create a unique index for bucket state updates. |
packages/service-core/src/sync/BucketChecksumState.ts | Adjusts checkpoint update logic with caching and refined bucket update detection. |
packages/service-core/src/sync/util.ts | Reimplements the hasIntersection utility to support set-based lookup comparisons. |
modules/module-mongodb-storage/src/storage/implementation/MongoSyncBucketStorage.ts | Updates parameter set queries and ensures proper cleanup of bucket state during clear operations. |
packages/service-core/src/storage/bson.ts | Refactors lookup serialization functions to work with the new ParameterLookup type. |
packages/sync-rules/src/SqlParameterQuery.ts | Transitions lookup generation to return ParameterLookup objects with normalized values. |
Comments suppressed due to low confidence (5)
packages/service-core/src/sync/BucketChecksumState.ts:404
- [nitpick] Consider implementing a set intersection for 'updatedBuckets' to improve performance and clarity instead of iterating over each bucket.
// TODO: Do set intersection instead
modules/module-mongodb-storage/src/storage/implementation/MongoCompactor.ts:342
- [nitpick] Ensure that 'lastOpId' is defined when logging the flush operation or provide a fallback value in the log message for clarity.
logger.info(`Flushing CLEAR for ${numberOfOpsToClear} ops at ${lastOpId?.o}`);
packages/sync-rules/src/SqlParameterQuery.ts:300
- [nitpick] Ensure that all consumers of 'getLookups' are updated to expect ParameterLookup objects rather than raw arrays; update function documentation if needed.
getLookups(parameters: RequestParameters): ParameterLookup[] {
packages/service-core/src/storage/bson.ts:35
- [nitpick] Consider wrapping the parsed lookup in a ParameterLookup instance (e.g., return new ParameterLookup(parsed)) to ensure consistent type usage across the codebase.
const parsed = bson.deserialize(lookup.buffer, BSON_DESERIALIZE_INTERNAL_OPTIONS).l as SqliteJsonValue[];
packages/sync-rules/src/types.ts:65
- [nitpick] Consider using 'instanceof ParameterLookup' to verify the type of 'lookup' rather than checking if its 'values' property is an array.
return Array.isArray((e as EvaluatedParameters).lookup?.values);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR caches parameter queries and bucket metadata to reduce the computational overhead of incremental syncs. Key changes include:
- Updating utility functions for handling parameter lookups with proper normalization.
- Refactoring bucket checksum and parameter state logic to use a consistent INVALIDATE_ALL_BUCKETS symbol and new caching for dynamic buckets.
- Adjusting storage implementations and tests in both MongoDB and Postgres modules to accommodate the new ParameterLookup type.
Reviewed Changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
packages/service-core/test/src/util.test.ts | Updated tests to use ParameterLookup and added new helper functions for intersection checking (note misspellings in function names). |
modules/module-mongodb-storage/src/migrations/db/migrations/1741697235857-bucket-state-index.ts | New migration to create/destroy an index on bucket updates. |
packages/service-core/src/sync/util.ts | Updated utility functions with new type MapOrSet and minor refactoring in getIntersection/hasIntersection. |
packages/service-core/src/sync/BucketChecksumState.ts | Refactored checkpoint update logic to use INVALIDATE_ALL_BUCKETS and improved caching of dynamic bucket queries. |
modules/module-mongodb-storage/src/storage/implementation/MongoSyncBucketStorage.ts | Adjusted type signatures to work with ParameterLookup and improved logging and error checking. |
packages/sync-rules/src/BucketParameterQuerier.ts, SqlParameterQuery.ts, SqlBucketDescriptor.ts | Refactored to replace dynamicBucketDefinitions with parameterQueryLookups and use normalization in parameter lookups. |
packages/service-core-tests/src/tests/register-data-storage-tests.ts | Updated tests to call ParameterLookup.normalized instead of literal arrays for parameter lookups. |
Other files | Minor adjustments in bson serialization, compactor logging, checkpoint changes caching, and Postgres storage to support the new parameter lookup handling. |
daa68b2
to
8f45ca5
Compare
Builds on #197.
For each checkpoint, this computes:
Then, for each connection, we check:
as well as re-checking each bucket. This can be optimized further in the future.For both the parameter lookup values and changed buckets, we place a limit on the number of values we check between checkpoints, currently 1000. If the limit is exceeded, we re-check everything, rather than attempting to compute the incremental changes. This guards against excessive memory usage when there are bulk changes coming in.
Overall, the effect is that an incremental update only affecting specific users/connections, only requires computational overhead (on both in the API container and storage db) proportional to the number of connections affected, rather than proportional to the total number of connections. One remaining query to address is write checkpoints: This currently re-checks write checkpoints on for every connection on every change.