@PorridgeSwim PorridgeSwim commented Nov 26, 2025

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

This PR adds AvailableNow trigger support to SparkMicroBatchStream.java.
It also refactors latestOffset() by extracting a latestOffsetInternal() method. The internal method takes a DeltaSourceOffset and returns a DeltaSourceOffset, while latestOffset() takes an Offset and returns an Offset.
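
For illustration, the split described above can be sketched as follows. This is a minimal, self-contained sketch and not the code in this PR: `Offset` and `DeltaSourceOffset` here are simplified stand-ins for the real Spark and Delta types, `MicroBatchStreamSketch`, `tableEnd`, and `toDeltaSourceOffset` are hypothetical names, and the read-limit argument of the real `latestOffset` is omitted. The internal method returns `Optional<DeltaSourceOffset>`, following the signature quoted in the review thread.

```java
import java.util.Optional;

// Hypothetical stand-ins for the real Spark/Delta types, so the sketch compiles on its own.
interface Offset {}

final class DeltaSourceOffset implements Offset {
  final long version;
  final long index;

  DeltaSourceOffset(long version, long index) {
    this.version = version;
    this.index = index;
  }
}

class MicroBatchStreamSketch {
  // Hypothetical: the last (version, index) position currently available in the table.
  private final DeltaSourceOffset tableEnd;

  MicroBatchStreamSketch(DeltaSourceOffset tableEnd) {
    this.tableEnd = tableEnd;
  }

  // Outer method: converts the generic Offset once at the boundary, then delegates.
  Offset latestOffset(Offset startOffset) {
    DeltaSourceOffset deltaStart = toDeltaSourceOffset(startOffset);
    // Returns null when no data is available for this batch.
    return latestOffsetInternal(deltaStart).orElse(null);
  }

  // Internal method: operates purely on DeltaSourceOffset, with no repeated conversion.
  Optional<DeltaSourceOffset> latestOffsetInternal(DeltaSourceOffset start) {
    boolean hasNewData =
        tableEnd.version > start.version
            || (tableEnd.version == start.version && tableEnd.index > start.index);
    return hasNewData ? Optional.of(tableEnd) : Optional.empty();
  }

  private static DeltaSourceOffset toDeltaSourceOffset(Offset offset) {
    if (offset instanceof DeltaSourceOffset) {
      return (DeltaSourceOffset) offset;
    }
    throw new IllegalArgumentException("Unsupported offset type: " + offset.getClass());
  }
}
```

With this shape, other internal callers that already hold a DeltaSourceOffset can call the internal method directly and skip the conversion step.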

How was this patch tested?

Parameterized tests verifying parity between DSv1 (DeltaSource) and DSv2 (SparkMicroBatchStream).

Does this PR introduce any user-facing changes?

No

}

DeltaSourceOffset deltaStartOffset = DeltaSourceOffset.apply(tableId, startOffset);
protected Optional<DeltaSourceOffset> latestOffsetInternal(
Collaborator

Do we really need a new method here?

Author

A new method is better here because our logic operates on DeltaSourceOffset. We don't want to retrieve an Offset each time and then convert it to a DeltaSourceOffset.

return null;
}
// TODO(#5318): init trigger available now support
Optional<DeltaSourceOffset> deltaStartOffset =
Collaborator

Why do we need to wrap the start offset with Optional? I think it makes subsequent code less readable.

Author

Removed the Optional wrapper.


@ParameterizedTest
@MethodSource("availableNowParameters")
public void testAvailableNow(
Collaborator

AvailableNow determines a fixed end point, so let's test the behavior across multiple batches, similar to testLatestOffset_SequentialBatchAdvancement?

Author

Changed the test to use sequential batches.
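
For illustration, the fixed-end-point behavior discussed in this thread can be sketched with a toy model. This is a sketch under stated assumptions, not the PR's test or implementation: the table is modeled as a plain file count, an offset is a file index, and `AvailableNowSketch`, `appendFiles`, and `runToCompletion` are hypothetical names. The names `prepareForTriggerAvailableNow`, `initForTriggerAvailableNowIfNeeded`, and `lastOffsetForTriggerAvailableNow` are borrowed from the diff, but their bodies here are simplified guesses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

class AvailableNowSketch {
  // Hypothetical model: the table is a growing count of files; an offset is a file index.
  private long filesInTable;
  private final long maxFilesPerBatch;

  private boolean isTriggerAvailableNow = false;
  // Fixed end point, captured lazily on the first latestOffset call after prepare.
  private OptionalLong lastOffsetForTriggerAvailableNow = OptionalLong.empty();

  AvailableNowSketch(long filesInTable, long maxFilesPerBatch) {
    this.filesInTable = filesInTable;
    this.maxFilesPerBatch = maxFilesPerBatch;
  }

  void appendFiles(long count) {
    filesInTable += count;
  }

  void prepareForTriggerAvailableNow() {
    isTriggerAvailableNow = true;
  }

  private void initForTriggerAvailableNowIfNeeded() {
    if (isTriggerAvailableNow && !lastOffsetForTriggerAvailableNow.isPresent()) {
      lastOffsetForTriggerAvailableNow = OptionalLong.of(filesInTable);
    }
  }

  // Returns the end offset of the next batch, or empty when nothing is left to read.
  OptionalLong latestOffset(long startOffset) {
    initForTriggerAvailableNowIfNeeded();
    // Under AvailableNow the limit is the captured end point; otherwise it is the live table end.
    long limit = lastOffsetForTriggerAvailableNow.orElse(filesInTable);
    long end = Math.min(startOffset + maxFilesPerBatch, limit);
    return end > startOffset ? OptionalLong.of(end) : OptionalLong.empty();
  }

  // Drives sequential batches until no data remains and records each batch's end offset.
  // New files are appended after the first batch to simulate a concurrent writer.
  static List<Long> runToCompletion(AvailableNowSketch stream, long appendAfterFirstBatch) {
    List<Long> ends = new ArrayList<>();
    long start = 0L;
    OptionalLong end = stream.latestOffset(start);
    while (end.isPresent()) {
      ends.add(end.getAsLong());
      start = end.getAsLong();
      if (ends.size() == 1) {
        stream.appendFiles(appendAfterFirstBatch);
      }
      end = stream.latestOffset(start);
    }
    return ends;
  }
}
```

In this model, a stream that starts with 3 files and reads 1 file per batch stops after offset 3 once prepareForTriggerAvailableNow() has been called, even though 2 more files arrive mid-run. Without that call, the same run continues through offset 5. A single-batch test cannot distinguish the two cases, which is why sequential batches are needed.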


DeltaSourceOffset deltaStartOffset = DeltaSourceOffset.apply(tableId, startOffset);
initForTriggerAvailableNowIfNeeded(deltaStartOffset);
// endOffset is null: no data is available to read for this batch.
Collaborator

nit: This has been changed to // Return null when no data is available for this batch.

/* startVersion= */ 0L,
/* startIndex= */ BASE_INDEX,
ReadLimitConfig.maxFiles(1),
/* numIterations= */ 3,
Collaborator

Perhaps we should use a larger number of iterations for some of these.

Collaborator

@huan233usc huan233usc left a comment

LGTM with some nits

* Initializes the internal state for AvailableNow if this method is called for the first time
* after prepareForTriggerAvailableNow.
*/
protected void initForTriggerAvailableNowIfNeeded(DeltaSourceOffset startOffsetOpt) {
Collaborator

Can those methods and lastOffsetForTriggerAvailableNow be private?

* process up. We may run multiple micro batches, but the query will stop itself when it reaches
* this offset.
*/
protected Optional<DeltaSourceOffset> lastOffsetForTriggerAvailableNow = Optional.empty();
Collaborator

Let's move those variable definitions after L76.

3 participants