Ingest async implementation #430

Open: wants to merge 8 commits into base: master

Conversation

ohadbitt
Collaborator

Changed

  • [BREAKING] All synchronous queued and streaming ingestion APIs now delegate to their asynchronous counterparts
    internally and block for results.
  • [BREAKING] The streaming client no longer checks the blob size or whether the blob exists.

Added

  • The SDK now provides Reactor Core-based asynchronous APIs for all queued and streaming ingestion endpoints,
    enabling non-blocking operations.
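
To make the "delegate and block" change concrete, here is a minimal sketch of the pattern using only the JDK. It is not the SDK's code: `SyncOverAsyncSketch`, `ingest`, `ingestAsync`, and `SketchIngestionException` are hypothetical names, and `CompletableFuture.join()` stands in for the role that `Mono.block()` plays with Reactor Core. It also shows one way the exception surface can shift when a blocking call wraps an asynchronous one.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Hypothetical stand-in for an ingest client; not the SDK's real API.
final class SyncOverAsyncSketch {

    // Hypothetical exception type (unchecked here only to keep the sketch short).
    static final class SketchIngestionException extends RuntimeException {
        SketchIngestionException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Asynchronous variant: returns immediately; failures travel inside the future.
    static CompletableFuture<String> ingestAsync(String source) {
        return CompletableFuture.supplyAsync(() -> {
            if (source == null || source.isEmpty()) {
                throw new IllegalArgumentException("source is empty");
            }
            return "Queued:" + source;
        });
    }

    // Synchronous variant: delegates to the async one and blocks for the result.
    static String ingest(String source) {
        try {
            return ingestAsync(source).join();
        } catch (CompletionException e) {
            // join() wraps the original failure; unwrap it so callers see one exception type.
            throw new SketchIngestionException("ingestion failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        System.out.println(ingest("data.csv"));
        try {
            ingest("");
        } catch (SketchIngestionException e) {
            System.out.println("failed: " + e.getCause().getMessage());
        }
    }
}
```

In this sketch the argument check runs inside the asynchronous pipeline, so the caller of the blocking method sees a wrapped failure rather than an `IllegalArgumentException` thrown directly. Whether the SDK behaves the same way is not stated in this description.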

github-actions bot commented Apr 10, 2025

Test Results

342 tests  ±0   330 ✅  - 3   2m 56s ⏱️ -27s
 28 suites ±0     9 💤 ±0 
 28 files   ±0     3 ❌ +3 

For more details on these failures, see this check.

Results for commit d9d8760. ± Comparison against base commit fa305a3.

This pull request removes 24 and adds 24 tests. Note that renamed tests count towards both.
com.microsoft.azure.kusto.ingest.ManagedStreamingTest ‑ IngestFromStream_CsvStream
com.microsoft.azure.kusto.ingest.QueuedIngestClientTest ‑ ingestFromFile_FileDoesNotExist_IngestionClientException
com.microsoft.azure.kusto.ingest.QueuedIngestClientTest ‑ ingestFromResultSet_StreamIngest_IngestionClientException
com.microsoft.azure.kusto.ingest.QueuedIngestClientTest ‑ ingestFromResultSet_StreamIngest_IngestionServiceException
com.microsoft.azure.kusto.ingest.StreamingIngestClientTest ‑ ingestFromBlob_BlobSourceInfoWithBlankBlobPath_IllegalArgumentException
com.microsoft.azure.kusto.ingest.StreamingIngestClientTest ‑ ingestFromBlob_BlobSourceInfoWithNullBlobPath_IllegalArgumentException
com.microsoft.azure.kusto.ingest.StreamingIngestClientTest ‑ ingestFromBlob_IngestionPropertiesWithIllegalDatabaseOrTableNames_IllegalArgumentException(String, String)[1]
com.microsoft.azure.kusto.ingest.StreamingIngestClientTest ‑ ingestFromBlob_IngestionPropertiesWithIllegalDatabaseOrTableNames_IllegalArgumentException(String, String)[2]
com.microsoft.azure.kusto.ingest.StreamingIngestClientTest ‑ ingestFromBlob_IngestionPropertiesWithIllegalDatabaseOrTableNames_IllegalArgumentException(String, String)[3]
com.microsoft.azure.kusto.ingest.StreamingIngestClientTest ‑ ingestFromBlob_IngestionPropertiesWithIllegalDatabaseOrTableNames_IllegalArgumentException(String, String)[4]
…
com.microsoft.azure.kusto.ingest.ManagedStreamingTest ‑ ingestFromStream_CsvStream
com.microsoft.azure.kusto.ingest.QueuedIngestClientTest ‑ ingestFromFileAsync_FileDoesNotExist_IngestionClientException
com.microsoft.azure.kusto.ingest.QueuedIngestClientTest ‑ ingestFromResultSetAsync_StreamIngest_IngestionClientException
com.microsoft.azure.kusto.ingest.QueuedIngestClientTest ‑ ingestFromResultSetAsync_StreamIngest_IngestionServiceException
com.microsoft.azure.kusto.ingest.StreamingIngestClientTest ‑ ingestFromBlobAsync_BlobSourceInfoWithBlankBlobPath_IllegalArgumentException
com.microsoft.azure.kusto.ingest.StreamingIngestClientTest ‑ ingestFromBlobAsync_BlobSourceInfoWithNullBlobPath_IllegalArgumentException
com.microsoft.azure.kusto.ingest.StreamingIngestClientTest ‑ ingestFromBlobAsync_IngestionPropertiesWithIllegalDatabaseOrTableNames_IllegalArgumentException(String, String)[1]
com.microsoft.azure.kusto.ingest.StreamingIngestClientTest ‑ ingestFromBlobAsync_IngestionPropertiesWithIllegalDatabaseOrTableNames_IllegalArgumentException(String, String)[2]
com.microsoft.azure.kusto.ingest.StreamingIngestClientTest ‑ ingestFromBlobAsync_IngestionPropertiesWithIllegalDatabaseOrTableNames_IllegalArgumentException(String, String)[3]
com.microsoft.azure.kusto.ingest.StreamingIngestClientTest ‑ ingestFromBlobAsync_IngestionPropertiesWithIllegalDatabaseOrTableNames_IllegalArgumentException(String, String)[4]
…

♻️ This comment has been updated with latest results.

- [BREAKING] * Make ManagedStreamingQueuingPolicy internal, expose just a factor
* Dont allow users to pass raw data size, provide it only if we have it
- [BREAKING] Make ManagedStreamingQueuingPolicy internal, expose just a factor
- [BREAKING] Don't allow users to pass raw data size, provide it only if we have it

Contributor

Retroactively changing the 6.0.1 changelog? Maybe move this to another PR?


### Changed

- [BREAKING] All synchronous queued and streaming ingestion APIs now delegate to their asynchronous counterparts

Contributor

Should it have any effect for the user?

I think a big thing is the exception changes. IDK if we mention that anywhere.

* @return A configured {@link Retry} instance
*/
public Retry retry(@Nullable List<Class<? extends Throwable>> retriableErrorClasses) {
public Retry retry(@Nullable List<Class<? extends Throwable>> retriableErrorClasses,

Contributor

Nit: the only time we use the first param is in a test. Do we really need it?
Maybe overloads to make sure they can't be used at the same time?
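
A rough sketch of what the overload suggestion could look like, using only the JDK. The second parameter of the new `retry` signature is cut off in the excerpt above, so the `Predicate<Throwable>` overload here is purely a hypothetical placeholder, as are the names `RetryOverloadSketch` and `retryFilter`. The point is only that two single-argument overloads make it impossible to pass both options at once.

```java
import java.io.IOException;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; not the SDK's retry configuration class.
final class RetryOverloadSketch {

    // Overload 1: retry on any of the listed exception classes.
    static Predicate<Throwable> retryFilter(List<Class<? extends Throwable>> retriableErrorClasses) {
        return error -> retriableErrorClasses.stream().anyMatch(c -> c.isInstance(error));
    }

    // Overload 2 (assumed shape): retry according to a caller-supplied predicate.
    static Predicate<Throwable> retryFilter(Predicate<Throwable> isRetriable) {
        return isRetriable;
    }

    public static void main(String[] args) {
        List<Class<? extends Throwable>> classes = List.of(IOException.class);
        Predicate<Throwable> byClass = retryFilter(classes);
        Predicate<Throwable> byPredicate = retryFilter(e -> e instanceof IllegalStateException);

        System.out.println(byClass.test(new IOException("timeout")));
        System.out.println(byPredicate.test(new IOException("timeout")));
    }
}
```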

* Retrieves the detailed ingestion status of
* all data ingestion operations into Kusto associated with this com.microsoft.azure.kusto.ingest.IKustoIngestionResult instance.
*/
Mono<List<IngestionStatus>> getIngestionStatusCollection() throws URISyntaxException, TableServiceErrorException;

int getIngestionStatusesLength();

Contributor

Add Javadoc here (for consistency).

public List<IngestionStatus> getIngestionStatusCollection() {
return Collections.singletonList(this.ingestionStatus);
public Mono<List<IngestionStatus>> getIngestionStatusCollection() {
return Mono.defer(() -> {

Contributor

I think the defer may not be needed here.


### Changed

- [BREAKING] All synchronous queued and streaming ingestion APIs now delegate to their asynchronous counterparts

Contributor

Is there any public API that broke and is no longer available?


// If an error occurs, each time the retryWhen subscribes to executeStream create a new instance
// instead of using the same executeStream Mono for all retries
return executeStream(blobSourceInfo, ingestionProperties, blobAsyncClient, i.increment())

Contributor

i.increment() only happens once here; wouldn't it always be 1?
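
The concern can be illustrated with a small JDK-only sketch. It does not use Reactor, and `RetryCounterSketch`, `execute`, and `lastAttemptSeen` are hypothetical stand-ins for the excerpt's `executeStream` and retry wiring. The counter advances per attempt only if the increment sits inside the code that is re-run on each attempt. Whether that holds in the PR depends on the surrounding lambda or `Mono.defer`, which the excerpt does not show.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

// Hypothetical sketch of eager vs. per-attempt evaluation of a retry counter.
final class RetryCounterSketch {

    // Stand-in for executeStream(..., attempt): reports the attempt number it was given.
    static int execute(int attemptNumber) {
        return attemptNumber;
    }

    // Minimal retry loop: asks the supplier for a fresh attempt each time,
    // loosely mirroring how a retry re-subscribes to a deferred source.
    static int lastAttemptSeen(IntSupplier attempt, int attempts) {
        int last = 0;
        for (int n = 0; n < attempts; n++) {
            last = attempt.getAsInt();
        }
        return last;
    }

    // Counter incremented once, outside the per-attempt lambda.
    static int eager(int attempts) {
        AtomicInteger counter = new AtomicInteger();
        int fixed = counter.incrementAndGet();
        return lastAttemptSeen(() -> execute(fixed), attempts);
    }

    // Counter incremented inside the lambda, so once per attempt.
    static int deferred(int attempts) {
        AtomicInteger counter = new AtomicInteger();
        return lastAttemptSeen(() -> execute(counter.incrementAndGet()), attempts);
    }

    public static void main(String[] args) {
        System.out.println("eager:    " + eager(3));    // prints 1
        System.out.println("deferred: " + deferred(3)); // prints 3
    }
}
```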

log.error(msg, ex);
throw new IngestionClientException(msg, ex);
}
return Mono.fromCallable(() -> IngestionUtils.resultSetToStream(resultSetSourceInfo))// TODO: ?

Contributor

What about the TODO?

ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
new CsvRoutines().write(resultSetSourceInfo.getResultSet(), byteArrayOutputStream);
new CsvRoutines().write(resultSetSourceInfo.getResultSet(), byteArrayOutputStream); // TODO: CsvRoutines is not maintained from 2021. replace?

Contributor

Maybe with something async?

```

From File:
```java
OperationStatus status = streamingIngestClient.ingestFromFile(fileSourceInfo, ingestionProperties).getIngestionStatusCollection().get(0).status;
OperationStatus status = streamingIngestClient.ingestFromFile(fileSourceInfo, ingestionProperties).getIngestionStatusCollection().get(0).status; //TODO: this is async now? should we have a sync equivalent?

Contributor

We probably want to break as little as possible.

3 participants