Skip to content

Conversation

@TingDaoK
Copy link
Contributor

@TingDaoK TingDaoK commented Oct 3, 2025

Issue #, if available:

  • When the client is configured to allow using more memory, it makes more sense to default to larger range instead of a static 8MiB range. Based on the object characteristics stored in S3, memory constraints, and throughput target, we now dynamic calculate the range for getting the object.
  • Parse the etag response from s3, as it will be <hash>-<number of part>, so that we can have an estimated part size that S3 stores the object.
  • Keep the discovery phase to use the 8MiB as a default fallback.

Description of changes:

  • Fix a race condition in the cancel tests and introduced a compiling flag to allow the test to stub into the function for testing the specific case.

TODO:

  • Buffer Pool Refactor
    • Currently the default buffer pool implementation will not handle the large range allocation from the pool.
    • To avoid wasting too many memory for small objects, we cannot just increase the buffer pool chunk size
    • We will refactor the buffer pool to handle different size chunks as a quick follow up.
    • Buffer pool refactor #584
  • Integrate
    • Integrate the buffer pool refactor and the dynamic size change
    • Apply reasonable upper bound and alignment for the implementation
    • Tests
  • S3 Express Optimization
  • Fuzz tests

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov-commenter
Copy link

codecov-commenter commented Oct 3, 2025

Codecov Report

❌ Patch coverage is 89.94975% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.12%. Comparing base (6b38088) to head (5d7d268).

Files with missing lines Patch % Lines
source/s3_util.c 91.20% 8 Missing ⚠️
source/s3_auto_ranged_get.c 90.00% 3 Missing ⚠️
source/s3_client.c 85.71% 3 Missing ⚠️
source/s3_auto_ranged_put.c 91.30% 2 Missing ⚠️
source/s3_default_meta_request.c 0.00% 2 Missing ⚠️
source/s3_meta_request.c 86.66% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #575      +/-   ##
==========================================
+ Coverage   88.97%   89.12%   +0.14%     
==========================================
  Files          23       23              
  Lines        7201     7336     +135     
==========================================
+ Hits         6407     6538     +131     
- Misses        794      798       +4     
Files with missing lines Coverage Δ
source/s3_request.c 87.16% <100.00%> (+3.94%) ⬆️
source/s3_auto_ranged_put.c 91.98% <91.30%> (-0.25%) ⬇️
source/s3_default_meta_request.c 94.25% <0.00%> (-1.13%) ⬇️
source/s3_meta_request.c 91.04% <86.66%> (-0.09%) ⬇️
source/s3_auto_ranged_get.c 97.31% <90.00%> (-0.69%) ⬇️
source/s3_client.c 91.61% <85.71%> (-0.09%) ⬇️
source/s3_util.c 95.58% <91.20%> (-1.57%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.


include(CTest)
if (BUILD_TESTING)
add_definitions(-DAWS_C_S3_ENABLE_TEST_STUBS)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any concern adding this def here? Or it's better to keep as a separate one.

pro: Nothing will break.
con: Maybe people built with tests in prod and don't want to have this stub enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we enable this ourselves when we package java, python, etc...?
probably not a big concern in practice, it add a private api. we already expose a bunch of private apis used by tests only

Copy link
Contributor Author

@TingDaoK TingDaoK Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/awslabs/aws-crt-python/blob/main/crt/CMakeLists.txt#L26
https://github.com/awslabs/aws-crt-java/blob/main/pom.xml#L185
Yeah, we do set this to off when we build the bindings.
Also, we should not build the C tests when we package for the CRT bindings.

@TingDaoK TingDaoK marked this pull request as ready for review October 7, 2025 23:04
bool memory_allocated_from_pool;
} crt_info_metrics;

struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking out loud:
maybe we should unify this with

struct aws_s3_mpu_part_info {
.
we can probably do some of the same sanity checks in upload path as well


if (aws_http_headers_get(request->send_data.response_headers, g_etag_header_name, &etag_header_value)) {
if (!auto_ranged_get->initial_message_has_if_match_header) {
/* Store ETag if needed for If-Match header */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why only if needed? isnt it easier to just always store it (i.e. the old way)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the old way also only store the etag when initial_message_has_if_match_header is false.

AWS_PRECONDITION(response_headers);

struct aws_byte_cursor content_range_header_value;
/* Expected Format of header is: "bytes StartByte-EndByte/TotalObjectSize" */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does s3 always respect that or there are cases where it skips start and end (which is valid in http spec)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://httpwg.org/specs/rfc9110.html#field.content-range
I don't think the Spec supports skipping those for a success response. Not like Range to be sent for the request. The Content-range in the response will need to have both start and end for a range-resp

source/s3_util.c Outdated

int aws_s3_extract_parts_from_etag(struct aws_byte_cursor etag_header_value, uint32_t *out_num_parts) {

AWS_PRECONDITION(out_num_parts);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets do AWS_ERROR_PRECONDITION here and in other places

source/s3_util.c Outdated
while (aws_byte_cursor_next_split(&remaining_cursor, '-', &substr)) {
split_count++;
if (split_count == 2) {
/* The ETag should follow the pattern <hash>-<parts_count>, so the second part is the parts count. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while its should be the case in most cases, the spec we have from s3 for etag is -.
value in theory can contain dashes. we should find - from the back and try to parse from that until the end as a number

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to confirm this with S3, if the value contains -, find it from the back will still result in unexpected behaviors.

As far as I find, the etag value should be a hash value? So it should not contain -

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather to force the etag to be exactly have one or less - in it. So that if it's not the expected format, we will error out and fallback, instead of reading from the back try to parse the case where it's not formatted as expect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants