feat(data): add getBatchForStudyDeployments endpoint with filtering #517

NGrech · 2025-08-29T09:21:51Z

This PR adds a new getBatchForStudyDeployments() endpoint to DataStreamService that enables efficient retrieval of data from multiple study deployments with optional filtering capabilities.

Changes

New Endpoint

getBatchForStudyDeployments(): Retrieve data for multiple deployments in a single call
- Filter by device role names
- Filter by data types
- Filter by time range (inclusive start, exclusive end)
- Returns aggregated DataStreamBatch with non-overlapping sequences per stream

Implementation Details

Time-range filtering using inclusive lower and exclusive upper bound ([from, to))
Adjusts firstSequenceId when measurements are filtered from the beginning
Maintains sequence ordering and non-overlap guarantees per data stream
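
To illustrate the clipping rule above, here is a minimal sketch of how firstSequenceId could shift when leading measurements fall outside [from, to). It is not the code in this PR: SimpleMeasurement, SimpleSequence, and clipToRange are hypothetical, simplified stand-ins for the real data stream types, and the sketch assumes timestamps within a sequence are non-decreasing so that the retained measurements stay contiguous.

```kotlin
// Hypothetical, simplified stand-ins for illustration; not the actual CARP types.
data class SimpleMeasurement( val timestamp: Long, val value: String )

data class SimpleSequence( val firstSequenceId: Long, val measurements: List<SimpleMeasurement> )

/**
 * Keep only measurements with `from <= timestamp < to`; a null bound is treated as unbounded.
 * Assumes timestamps are non-decreasing, so the retained measurements form one contiguous run.
 * Returns null when no measurement falls inside the range.
 */
fun clipToRange( sequence: SimpleSequence, from: Long?, to: Long? ): SimpleSequence?
{
    // Index of the first measurement at or after the inclusive lower bound.
    val start = sequence.measurements.indexOfFirst { from == null || it.timestamp >= from }
    if ( start == -1 ) return null

    // Take measurements until the exclusive upper bound is reached.
    val kept = sequence.measurements
        .drop( start )
        .takeWhile { to == null || it.timestamp < to }
    if ( kept.isEmpty() ) return null

    // Every measurement dropped from the front moves the first sequence ID forward by one.
    return SimpleSequence( sequence.firstSequenceId + start, kept )
}
```

For example, clipping a sequence with firstSequenceId 10 and timestamps 100, 200, 300, 400 to the range [200, 400) keeps the measurements at 200 and 300, and the resulting firstSequenceId is 11. Non-monotonic timestamps, which the tests in this PR cover, would need extra handling that this sketch leaves out.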

Bug Fixes

Fixed time filtering to properly adjust sequence IDs when clipping measurements
Updated RPC example requests to match ParticipantGroupStatus constructor changes

Testing

Added comprehensive test suite in InMemoryDataStreamServiceBatchRetrievalTest
Tests cover filtering, aggregation, edge cases, and non-monotonic timestamps

NGrech · 2025-09-01T12:14:32Z

@Whathecode I squashed the new fixes into the original one.
Note that the reason the generated test files were not in the original commit was that there is no mention of that requirement in the CONTRIBUTING.md.
I think we should fix this by changing:

You can also run detekt separately through gradle detekt

to gradle detektPasses, since that is the command run in the code analysis check when committing, and (at least on Windows) gradle detekt will build successfully even when there are issues that gradle detektPasses fails on.

carp.data.core/src/commonMain/kotlin/dk/cachet/carp/data/application/CollectedDataSet.kt

Whathecode · 2025-09-04T15:37:09Z

I see you added this to the 2.0.0 milestone instead of 1.3. Any reason you expect this to be a breaking change, i.e., warranting a new major release?

NGrech · 2025-09-17T11:12:46Z

@Whathecode I have updated the code based on the last discussion.

Whathecode

Still an incomplete review, but I started looking at why you added ImmutableDataStreamBatch and ...Sequence. The PR description does not explain why you are adding these. Have a look at some of the questions I asked and see whether you can clarify things.

I also have the impression that adding these changes can easily be done as a separate commit (and even PR). You don't need those for your updates to DataStreamService, and as far as I can tell, the existing data structures would work just fine.

While looking at the changes, I noticed some whitespace that does not follow the code style. I added a commit which you can squash.

carp.data.core/src/commonMain/kotlin/dk/cachet/carp/data/application/DataStreamSequence.kt

carp.data.core/src/commonMain/kotlin/dk/cachet/carp/data/application/DataStreamBatch.kt

Add new DataStreamService.getBatchForStudyDeployments() endpoint to retrieve data for multiple deployments with optional filters for device roles, data types, and time ranges.
- Add getBatchForStudyDeployments to DataStreamService interface
- Implement time-range filtering with exclusive upper bound
- Adjust firstSequenceId when filtering removes measurements
- Add comprehensive tests for filtering and batch retrieval
- Update RPC examples for ParticipantGroupStatus changes
- Add documentation and test request snapshots

NGrech · 2025-10-28T10:51:54Z

@Whathecode & @yuanchen233 I have updated the PR. I think it is in a good state now and ready for review.
The main thing to note since the last version is that I dropped the immutable class approach so as not to violate LSP, and added specific tests to ensure that the returned sequences are sequential and non-overlapping.
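
As a rough illustration of the property those tests check, here is a minimal sketch of an ordering and non-overlap check for the sequences of a single data stream. IdRange and isSequentialAndNonOverlapping are hypothetical names used only for this example, and the sketch assumes a sequence occupies the IDs from firstSequenceId up to, but not including, firstSequenceId plus its number of measurements.

```kotlin
// Hypothetical stand-in: a sequence covers the IDs [firstSequenceId, firstSequenceId + size).
data class IdRange( val firstSequenceId: Long, val size: Int )
{
    val endExclusive: Long get() = firstSequenceId + size
}

/**
 * True when each range starts at or after the end of the previous one,
 * i.e. the ranges are in ascending order and share no sequence IDs.
 * Gaps between ranges are allowed.
 */
fun isSequentialAndNonOverlapping( ranges: List<IdRange> ): Boolean =
    ranges.zipWithNext().all { ( previous, next ) -> previous.endExclusive <= next.firstSequenceId }
```

Ranges starting at 0 (size 3), 3 (size 2), and 10 (size 1) pass this check, whereas a range starting at 2 directly after one covering IDs 0 to 2 does not.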

Whathecode

Only reviewed the DataStreamService.getBatchForStudyDeployments() contract for now.

Whathecode · 2025-11-04T18:20:07Z