feat(data): add getBatchForStudyDeployments endpoint with filtering #517
base: develop
Conversation
Force-pushed d2b7a73 to eca28e5
@Whathecode I squashed the new fixes into the original one.
Resolved review thread on carp.data.core/src/commonMain/kotlin/dk/cachet/carp/data/application/CollectedDataSet.kt
I see you added this to the 2.0.0 milestone instead of 1.3. Any reason you expect this to be a breaking change, i.e., warranting a new major release?
Force-pushed eca28e5 to e96bce2
@Whathecode I have updated the code based on the last discussion.
Still an incomplete review, but I started looking at why you added ImmutableDataStreamBatch and ...Sequence. The PR description is missing some clarification regarding why you are adding these. Have a look at some of the questions I asked and see whether you can clarify things.
I also have the impression that these changes can easily be done as a separate commit (or even PR). You don't need them for your updates to DataStreamService, and as far as I can tell, the existing data structures would work just fine.
While looking at the changes, I noticed some incorrect code style whitespace. I added a commit which you can squash.
Resolved review thread on carp.data.core/src/commonMain/kotlin/dk/cachet/carp/data/application/DataStreamSequence.kt
Resolved review threads (7) on carp.data.core/src/commonMain/kotlin/dk/cachet/carp/data/application/DataStreamBatch.kt
Force-pushed abce685 to 7435edb
Add new DataStreamService.getBatchForStudyDeployments() endpoint to retrieve data for multiple deployments with optional filters for device roles, data types, and time ranges.
- Add getBatchForStudyDeployments to DataStreamService interface
- Implement time-range filtering with exclusive upper bound
- Adjust firstSequenceId when filtering removes measurements
- Add comprehensive tests for filtering and batch retrieval
- Update RPC examples for ParticipantGroupStatus changes
- Add documentation and test request snapshots
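The firstSequenceId adjustment mentioned in the commit message can be illustrated with a minimal sketch. SimpleSequence and clipStart below are hypothetical stand-ins, not the PR's actual DataStreamSequence; they only assume that measurements within a sequence have consecutive sequence IDs and sorted timestamps.

```kotlin
// Hypothetical stand-in for a data stream sequence: consecutive measurements
// starting at firstSequenceId, each with a (sorted) sensor timestamp.
data class SimpleSequence( val firstSequenceId: Long, val timestamps: List<Long> )

// Drop measurements before `from`. Since sequence IDs are consecutive,
// firstSequenceId advances by the number of dropped measurements.
fun clipStart( sequence: SimpleSequence, from: Long ): SimpleSequence?
{
    val dropped = sequence.timestamps.count { it < from }
    val remaining = sequence.timestamps.drop( dropped )
    if ( remaining.isEmpty() ) return null

    return SimpleSequence( sequence.firstSequenceId + dropped, remaining )
}

fun main()
{
    val sequence = SimpleSequence( firstSequenceId = 0, timestamps = listOf( 10L, 20L, 30L ) )

    // The first measurement (timestamp 10) falls before `from`,
    // so firstSequenceId shifts from 0 to 1.
    val clipped = clipStart( sequence, from = 15L )
    check( clipped == SimpleSequence( 1, listOf( 20L, 30L ) ) )
}
```

Without this adjustment, the clipped sequence would claim IDs it no longer contains, breaking the per-stream invariant the PR sets out to preserve.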
Force-pushed 7435edb to 30be1a8
@Whathecode & @yuanchen233 I have updated the PR. I think it is in a good state now and ready for review.
Whathecode left a comment
Only reviewed the DataStreamService.getBatchForStudyDeployments() contract for now.
import kotlinx.serialization.Required
import kotlinx.serialization.Serializable
I don't think it's configured, but wildcard is used for serialization imports across the codebase.
* @param deviceRoleNames Optional device role name filter (e.g., "phone"). If null or empty, all are included.
* @param dataTypes Optional data type filter. If null or empty, all are included.
"If null or empty, all are included."
That's counterintuitive. If empty, just filter out everything. A good API doesn't give two ways to do the same thing.
On why it matters: suppose a caller sets up a dynamic filter determining the set of device role names they are interested in, and it ends up being empty. Now the caller gets all data, rather than the no data they expected.
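The failure mode described above can be shown with a tiny sketch. filterLenient and filterStrict are hypothetical helpers, not code from the PR; they contrast the two possible semantics for an empty filter set.

```kotlin
// "Lenient": null OR empty means "include everything" (the semantics the PR documents).
fun filterLenient( roles: List<String>, filter: Set<String>? ): List<String> =
    if ( filter.isNullOrEmpty() ) roles
    else roles.filter { it in filter }

// "Strict": only null means "no filter"; an empty set matches nothing.
fun filterStrict( roles: List<String>, filter: Set<String>? ): List<String> =
    if ( filter == null ) roles
    else roles.filter { it in filter }

fun main()
{
    val roles = listOf( "phone", "watch" )

    // A dynamically built filter which happens to come out empty:
    val dynamicFilter = emptySet<String>()

    // Lenient semantics silently return ALL data...
    check( filterLenient( roles, dynamicFilter ) == roles )

    // ...whereas strict semantics return none, as the caller likely intended.
    check( filterStrict( roles, dynamicFilter ).isEmpty() )
}
```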
* The response is a canonical [DataStreamBatch]: for each [DataStreamId], sequences are
* ordered by start time and non-overlapping (contract preserved). No derived/secondary
* indexing is applied in this API; analytics-specific projections are out of scope here.
Drop this; all of this is implementation/design detail, not API documentation. The contract (API) of DataStreamBatch is already documented on DataStreamBatch.
* Time range semantics: if [from] or [to] are specified, sequences are clipped to the
* half-open interval [from, to) (inclusive start, exclusive end).
That only works if both from and to are specified. It looks like you can omit this and instead document the inclusive/exclusive nature in the corresponding from/to parameters. As is, this causes more confusion than it answers edge cases.
Instead, I'm more surprised about how Instant comes into the picture here. The data subsystem only has Longs for sensorStartTime and sensorEndTime. So ... what is happening here? How do I know what to pass?
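For illustration only, the half-open semantics under discussion, applied to plain sensor timestamps. clipToRange is a hypothetical helper; the PR's actual parameters are Instants, and how they map onto Long sensor times is exactly the open question above.

```kotlin
// Keep only timestamps in the half-open interval [from, to):
// `from` inclusive, `to` exclusive. Null bounds impose no limit.
fun clipToRange( timestamps: List<Long>, from: Long?, to: Long? ): List<Long> =
    timestamps.filter {
        ( from == null || it >= from ) && ( to == null || it < to )
    }

fun main()
{
    val timestamps = listOf( 10L, 20L, 30L )

    // `from` is inclusive, so 10 is kept; `to` is exclusive, so 30 is dropped.
    check( clipToRange( timestamps, from = 10L, to = 30L ) == listOf( 10L, 20L ) )

    // With both bounds omitted, everything is included.
    check( clipToRange( timestamps, from = null, to = null ) == timestamps )
}
```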
* Time range semantics: if [from] or [to] are specified, sequences are clipped to the
* half-open interval [from, to) (inclusive start, exclusive end).
*
* @param studyDeploymentIds Study deployments to query. Must not be empty.
"Must not be empty."
Why? That seems like an overly strict contract. You can easily return nothing when nothing is passed, which means less edge-case handling for callers who don't care about saving a roundtrip.
* @param to Optional absolute end time (exclusive). If null, no upper bound.
* @return A [DataStreamBatch] containing matching data sequences, preserving per-stream invariants.
*/
suspend fun getBatchForStudyDeployments(
I'm not certain about the naming of this. Maybe simply getData? But, it will depend a bit on what actually comes out. It still seems like some synchronization is bound to happen (which would need documentation!), given the from and to Instant parameters, in which case getSynchronizedData or similar could be more appropriate.
This PR adds a new getBatchForStudyDeployments() endpoint to DataStreamService that enables efficient retrieval of data from multiple study deployments with optional filtering capabilities.

Changes

New Endpoint
- getBatchForStudyDeployments(): retrieve data for multiple deployments in a single call
- Returns a DataStreamBatch with non-overlapping sequences per stream

Implementation Details
- Time-range filtering clips sequences to the half-open interval [from, to)
- Adjusts firstSequenceId when measurements are filtered from the beginning

Bug Fixes
- Update RPC examples for ParticipantGroupStatus constructor changes

Testing
- New tests: InMemoryDataStreamService, BatchRetrievalTest