When putting an S3Object connecting to S3, the contentEncoding for that object is always "aws-chunked" #5769

Open
1 task
ngudbhav opened this issue Jan 3, 2025 · 5 comments
Assignees
Labels
bug This issue is a bug. p2 This is a standard priority issue

Comments

ngudbhav commented Jan 3, 2025

Describe the bug

s3.txt

This is a re-opened thread from #4746 (comment).

I have attached the packet details of the PUT Object call from the SDK to S3 (LocalStack).

The SDK always sends Content-Encoding as aws-chunked. This causes the result to fail to decompress. I have tried explicitly setting the Content-Length to a sufficiently high number, but in vain. This is only reproducible with LocalStack and not with real AWS.

Screenshot 2025-01-03 at 5 51 09 PM

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

Content-encoding should not always be aws-chunked.

Current Behavior

Content-encoding is always aws-chunked.

Reproduction Steps

I used the code below to upload the file:

S3AsyncClient buildS3Client() {
        S3CrtAsyncClientBuilder builder = S3AsyncClient.crtBuilder()
                .credentialsProvider(getAwsCredentialsProvider())
                .region(Region.of(region));
        Optional<String> s3Endpoint = getLocalStackEndpoint();
        s3Endpoint.ifPresent(s -> {
            builder.endpointOverride(URI.create("https://s3.localhost.localstack.cloud:4566"));
            builder.forcePathStyle(true);
            builder.minimumPartSizeInBytes((long) (8 * 1024 * 1024));
        });
        return builder.build();
}
s3Client = buildS3Client();
s3TransferManager = S3TransferManager.builder()
                .s3Client(s3Client)
                .build();

The above snippet initialises the S3 client. I added minimumPartSizeInBytes through trial and error.

putObjectRequest = PutObjectRequest.builder()
            .bucket(bucket)
            .key(key)
            .contentEncoding(GZIP_ENCODING)
            .contentType(contentType)
            .contentLength((long) (8 * 1024 * 1024))
            .tagging(tagging)
            .build();
uploadRequest = UploadRequest.builder()
            .putObjectRequest(putObjectRequest)
            .requestBody(AsyncRequestBody.fromBytes(bytes))
            .build();
s3TransferManager.upload(uploadRequest).completionFuture().join();

This is the code that actually performs the transfer.

Possible Solution

No response

Additional Information/Context

No response

AWS Java SDK version used

2.29.15

JDK version used

17.0.13

Operating System and version

Ubuntu 22.04.5 LTS, Linux 6.10.14-linuxkit, Inside Docker 27.4.0

@ngudbhav ngudbhav added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jan 3, 2025
@ngudbhav ngudbhav changed the title When putting an S3Object connecting to S3 via http, the contentEncoding for that object is always "aws-chunked" When putting an S3Object connecting to S3, the contentEncoding for that object is always "aws-chunked" Jan 3, 2025
@bhoradc bhoradc added investigating This issue is being investigated and/or work is in progress to resolve the issue. p1 This is a high priority issue potential-regression Marking this issue as a potential regression to be checked by team member and removed needs-triage This issue or PR still needs to be triaged. labels Jan 3, 2025
@bhoradc bhoradc self-assigned this Jan 3, 2025
bhoradc commented Jan 3, 2025

Hi @ngudbhav,

Thank you for reporting the issue. I tried to reproduce this scenario but found the behavior to be consistent between AWS S3 and LocalStack.

Both environments:

  • Accept the dual content-encoding (gzip,aws-chunked)
  • Successfully process the request
  • Return 200 status codes

Could you please go through the reproduction steps below and let me know of any deviation that might explain the behavior you reported?

pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.example</groupId>
    <artifactId>V2_ContentEncoding_5769</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <maven.compiler.source>11</maven.compiler.source>
        <maven.compiler.target>11</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <aws.sdk.version>2.29.15</aws.sdk.version>
    </properties>
    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>software.amazon.awssdk</groupId>
                <artifactId>bom</artifactId>
                <version>2.29.15</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
            <dependency>
                <groupId>org.apache.logging.log4j</groupId>
                <artifactId>log4j-bom</artifactId>
                <version>2.19.0</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>software.amazon.awssdk</groupId>
            <artifactId>s3</artifactId>
        </dependency>
        <dependency>
            <groupId>software.amazon.awssdk</groupId>
            <artifactId>s3-transfer-manager</artifactId>
        </dependency>
        <dependency>
            <groupId>software.amazon.awssdk.crt</groupId>
            <artifactId>aws-crt</artifactId>
            <version>0.33.7</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-api</artifactId>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-slf4j2-impl</artifactId>
        </dependency>
    </dependencies>
</project>

1. AWS S3 Behavior:

Code snippet
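public class Main {
    private static final String GZIP_ENCODING = "gzip";
    private static final String REGION = "us-east-1";

    public static void main(String[] args) {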

        Log.initLoggingToFile(Log.LogLevel.Trace, "/Users/***/IdeaProjects/V2_ContentEncoding_5769/log.txt");

        String bucket = "<<bucket_name>>";
        String key = "testing-file.txt";
        String contentType = "text/plain";

        String content = "Hello Java SDK!";
        byte[] bytes = content.getBytes(StandardCharsets.UTF_8);

        S3AsyncClient s3Client = buildS3Client();

        S3TransferManager s3TransferManager = S3TransferManager.builder()
                .s3Client(s3Client)
                .build();

        PutObjectRequest putObjectRequest = PutObjectRequest.builder()
                .bucket(bucket)
                .key(key)
                .contentEncoding(GZIP_ENCODING)
                .contentType(contentType)
                .contentLength((long) bytes.length)
                .build();

        UploadRequest uploadRequest = UploadRequest.builder()
                .putObjectRequest(putObjectRequest)
                .requestBody(AsyncRequestBody.fromBytes(bytes))
                .build();

        try {
            s3TransferManager.upload(uploadRequest).completionFuture().join();
            System.out.println("Upload completed successfully");
        } catch (Exception e) {
            System.err.println("Upload failed: " + e.getMessage());
            e.printStackTrace();
        } finally {
            s3TransferManager.close();
            s3Client.close();
        }
    }
    private static S3AsyncClient buildS3Client() {
        S3CrtAsyncClientBuilder builder = S3AsyncClient.crtBuilder()
                .credentialsProvider(DefaultCredentialsProvider.create())
                .region(Region.of(REGION))
                .minimumPartSizeInBytes((long) (8 * 1024 * 1024));
        return builder.build();
    }
}
CRT debug log
content-encoding:gzip,aws-chunked
content-length:56
content-type:text/plain
host:<<bucket_name>>.s3.amazonaws.com

2. LocalStack Behavior:

Code snippet
public class Main {
    private static final String GZIP_ENCODING = "gzip";
    private static final String REGION = "us-east-1";

    public static void main(String[] args) {

        Log.initLoggingToFile(Log.LogLevel.Trace, "/Users/bhoradc/IdeaProjects/V2_ContentEncoding_5769/log.txt");

        String bucket = "<<bucket_name>>";
        String key = "testing-file.txt";
        String contentType = "text/plain";

        String content = "Hello Java SDK!";
        byte[] bytes = content.getBytes(StandardCharsets.UTF_8);

        // S3AsyncClient s3Client = buildS3Client();
        S3AsyncClient s3Client = localstackbuildS3Client();

        S3TransferManager s3TransferManager = S3TransferManager.builder()
                .s3Client(s3Client)
                .build();

        s3Client.createBucket(CreateBucketRequest.builder()
                        .bucket(bucket)
                        .build())
                .join();
        System.out.println("Bucket created successfully: " + bucket);

        PutObjectRequest putObjectRequest = PutObjectRequest.builder()
                .bucket(bucket)
                .key(key)
                .contentEncoding(GZIP_ENCODING)
                .contentType(contentType)
                .contentLength((long) bytes.length)
                .build();

        UploadRequest uploadRequest = UploadRequest.builder()
                .putObjectRequest(putObjectRequest)
                .requestBody(AsyncRequestBody.fromBytes(bytes))
                .build();

        try {
            s3TransferManager.upload(uploadRequest).completionFuture().join();
            System.out.println("Upload completed successfully");
        } catch (Exception e) {
            System.err.println("Upload failed: " + e.getMessage());
            e.printStackTrace();
        } finally {
            s3TransferManager.close();
            s3Client.close();
        }
    }
    private static S3AsyncClient buildS3Client() {
        S3CrtAsyncClientBuilder builder = S3AsyncClient.crtBuilder()
                .credentialsProvider(DefaultCredentialsProvider.create())
                .region(Region.of(REGION))
                .minimumPartSizeInBytes((long) (8 * 1024 * 1024));
        return builder.build();
    }

    private static S3AsyncClient localstackbuildS3Client() {
        S3CrtAsyncClientBuilder builder = S3AsyncClient.crtBuilder()
                .credentialsProvider(StaticCredentialsProvider.create(
                        AwsBasicCredentials.create("test", "test")))
                .region(Region.of(REGION));
        Optional<String> s3Endpoint = getLocalStackEndpoint();
        // Apply the LocalStack overrides only when an endpoint is configured.
        s3Endpoint.ifPresent(endpoint -> {
            builder.endpointOverride(URI.create(endpoint));
            builder.forcePathStyle(true);
            builder.minimumPartSizeInBytes((long) (8 * 1024 * 1024));
        });
        return builder.build();
    }

    private static Optional<String> getLocalStackEndpoint() {
        return Optional.of("http://localhost:4566");
    }
}
CRT debug log
PUT
/<<bucket_name>>/testing-file.txt

amz-sdk-invocation-id:16b7fdce-8be2-0a1e-26af-fe4488c7e0b9
amz-sdk-request:attempt=1; max=1
content-encoding:gzip,aws-chunked
content-length:56
content-type:text/plain
host:localhost:4566
x-amz-content-sha256:STREAMING-UNSIGNED-PAYLOAD-TRAILER
x-amz-date:20250103T182315Z
x-amz-decoded-content-length:15
x-amz-trailer:x-amz-checksum-crc32
LocalStack version
~ % docker inspect localstack/localstack:3.7.2 | grep -i version
        "DockerVersion": "",
                "PYTHON_VERSION=3.11.9",
                "PYTHON_PIP_VERSION=24.0",
                "PYTHON_SETUPTOOLS_VERSION=65.5.1",
                "LOCALSTACK_BUILD_VERSION=3.7.2"

The only notable difference I see is in the networking setup: your environment uses localstack:4566 (Docker’s internal network), whereas I am running against localhost:4566. I don’t believe this difference should affect the content-encoding behavior.
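
The header values in the CRT debug log above can be checked by hand. The following is a minimal sketch, not SDK code, that frames the 15-byte test payload the way the unsigned aws-chunked trailer format appears to work: one data chunk, a zero-length final chunk, and an `x-amz-checksum-crc32` trailer. The class name `AwsChunkedFraming` and the single-chunk layout are assumptions made for illustration; the resulting length matches the logged `content-length:56` against `x-amz-decoded-content-length:15`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.CRC32;

// Hypothetical helper, for illustration only; assumes a non-empty payload sent as a single chunk.
public class AwsChunkedFraming {

    /** Frames the payload as one data chunk, a zero-length final chunk, and a CRC32 trailer. */
    static byte[] encode(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
        // The trailer value is the 4-byte big-endian CRC32, base64-encoded (8 characters).
        int v = (int) crc.getValue();
        byte[] crcBytes = {(byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v};
        String trailer = "x-amz-checksum-crc32:" + Base64.getEncoder().encodeToString(crcBytes);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(out, Integer.toHexString(payload.length) + "\r\n"); // chunk size in hex
        out.write(payload, 0, payload.length);                    // chunk data
        write(out, "\r\n0\r\n" + trailer + "\r\n\r\n");           // final chunk, trailer, blank line
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        out.write(b, 0, b.length);
    }

    public static void main(String[] args) {
        byte[] payload = "Hello Java SDK!".getBytes(StandardCharsets.UTF_8);
        byte[] framed = encode(payload);
        System.out.println("x-amz-decoded-content-length: " + payload.length); // 15
        System.out.println("content-length: " + framed.length);               // 56
    }
}
```

The 41 extra bytes are framing only: 3 for the size line, 2 after the data, 3 for the final chunk, 31 for the trailer line, and 2 for the closing blank line.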

Regards,
Chaitanya

@bhoradc bhoradc added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Jan 3, 2025
ngudbhav commented Jan 4, 2025

Hi @bhoradc
Thanks a lot for the quick reply.
Is there any way I can disable the aws-chunked content encoding?

I have tried various ways but downloading the file requires manual decompression.

As you can see in the screenshot, even Wireshark displays an error saying that decompression failed. I have tried a browser and the Go AWS client, but automatic decompression does not work in either.

However, if I explicitly write code to decompress the GZIP file, I get the expected contents back.

I am not sure whether the dual content-encoding header is the cause of this behaviour.
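
For reference, manual decompression of the kind described above can be sketched with the JDK alone. This is an illustrative round trip, not the code from the issue: the class name `ManualGunzip` is hypothetical, and it assumes the uploaded bytes were gzip-compressed before upload and come back unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, for illustration only.
public class ManualGunzip {

    /** Compresses the given bytes with gzip, as the uploader is assumed to do before the PUT. */
    static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Decompresses a gzip body by hand, which is what a client does when it cannot rely on Content-Encoding. */
    static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gz.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] stored = gzip("Hello Java SDK!".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(gunzip(stored), StandardCharsets.UTF_8));
    }
}
```

If this succeeds on the downloaded bytes, the stored object itself is intact gzip, which would point at the response headers rather than the body.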

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Jan 4, 2025
bhoradc commented Jan 7, 2025

Hi @ngudbhav,

Currently, I don’t see any support in the CRT builder for disabling chunked encoding through signer parameters or configuration settings, as there is for the standard S3 client builders.

However, I don't see this as a regression from #5043. The results I shared in my previous comment demonstrate the expected behavior for dual content-encoding with the Java SDK.

Regards,
Chaitanya

@bhoradc bhoradc added p2 This is a standard priority issue response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. and removed p1 This is a high priority issue potential-regression Marking this issue as a potential regression to be checked by team member labels Jan 7, 2025
ngudbhav commented Jan 8, 2025

Thanks a lot for your reply.

I understand that adding support to the CRT builder may not be in the pipeline. Is this something I can pick up? Our local development is blocked because the browser cannot decompress the JSON and CSV files served from LocalStack's S3.

Also, can you please help me understand why the clients cannot decompress the server response? Maybe this is something that can be fixed without adding the support.

Thank you
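
One plausible explanation, not confirmed in this thread, is the order in which HTTP content codings are undone. Codings in `Content-Encoding` are listed in the order they were applied, so a receiver has to undo them from last to first. The sketch below illustrates that rule for the `gzip,aws-chunked` value seen in the logs; the class name `ContentEncodingCheck` and the set of supported codings are assumptions, and real browsers and clients may behave differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper, for illustration only.
public class ContentEncodingCheck {

    // Assumption: codings a typical HTTP client can undo; actual clients differ.
    private static final Set<String> SUPPORTED = Set.of("gzip", "deflate", "br", "identity");

    /** Returns the codings in the order a receiver must undo them (last applied first). */
    static List<String> decodeOrder(String contentEncoding) {
        List<String> codings = new ArrayList<>();
        for (String token : contentEncoding.split(",")) {
            String t = token.trim().toLowerCase(Locale.ROOT);
            if (!t.isEmpty()) {
                codings.add(t);
            }
        }
        Collections.reverse(codings);
        return codings;
    }

    /** Returns the first coding the client would reach that it cannot undo, if any. */
    static Optional<String> firstUnsupported(String contentEncoding) {
        return decodeOrder(contentEncoding).stream()
                .filter(c -> !SUPPORTED.contains(c))
                .findFirst();
    }

    public static void main(String[] args) {
        System.out.println(decodeOrder("gzip,aws-chunked"));      // [aws-chunked, gzip]
        System.out.println(firstUnsupported("gzip,aws-chunked")); // Optional[aws-chunked]
        System.out.println(firstUnsupported("gzip"));             // Optional.empty
    }
}
```

Under this reading, a client that receives `gzip,aws-chunked` on a download meets a coding it does not know before it ever reaches the gzip layer, while the same body served with plain `gzip` would decode normally.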

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Jan 8, 2025
@DmitriyMusatkin

I don't think this is really a CRT issue.
aws-chunked is a fairly old S3 protocol for sending the payload in chunks and supporting trailing headers.
Clients use it to compute the checksum as data is streamed and send it in a trailing header.
The server should remove aws-chunked from Content-Encoding after interpreting the chunks,
but it looks like LocalStack has limited support for that and might not do it the same way S3 does.
The Java transfer manager calculates a checksum by default, so aws-chunked has been in use since launch.
It might be possible to disable checksums or provide a precomputed checksum for the payload to work around this.
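
The server-side handling described above can be sketched as follows. This is an illustration of the idea only, not the S3 or LocalStack implementation: the class `AwsChunkedDecoder` and its methods are hypothetical, it assumes the unsigned trailer framing seen in the debug log, and it ignores chunk signatures apart from skipping them.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical helper, for illustration only.
public class AwsChunkedDecoder {

    /** Removes the aws-chunked token from a Content-Encoding value; returns null if nothing remains. */
    static String stripAwsChunked(String contentEncoding) {
        List<String> kept = new ArrayList<>();
        for (String token : contentEncoding.split(",")) {
            String t = token.trim();
            if (!t.isEmpty() && !t.equalsIgnoreCase("aws-chunked")) {
                kept.add(t);
            }
        }
        return kept.isEmpty() ? null : String.join(",", kept);
    }

    /** Extracts the payload from an aws-chunked body and checks the CRC32 trailer when present. */
    static byte[] decode(byte[] body) {
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        int pos = 0;
        while (true) {
            int eol = indexOfCrlf(body, pos);
            String sizeLine = new String(body, pos, eol - pos, StandardCharsets.US_ASCII);
            // Signed variants append ";chunk-signature=..." to the size; skip any extension.
            int size = Integer.parseInt(sizeLine.split(";")[0].trim(), 16);
            pos = eol + 2;
            if (size == 0) {
                break; // zero-length chunk ends the data section
            }
            payload.write(body, pos, size);
            pos += size + 2; // skip the chunk data and its trailing CRLF
        }
        byte[] data = payload.toByteArray();
        // Lines up to the blank line are trailing headers.
        while (pos < body.length) {
            int eol = indexOfCrlf(body, pos);
            if (eol == pos) {
                break;
            }
            String line = new String(body, pos, eol - pos, StandardCharsets.US_ASCII);
            pos = eol + 2;
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("x-amz-checksum-crc32")) {
                byte[] expected = Base64.getDecoder().decode(line.substring(colon + 1).trim());
                if (!Arrays.equals(expected, crc32(data))) {
                    throw new IllegalArgumentException("CRC32 trailer mismatch");
                }
            }
        }
        return data;
    }

    private static int indexOfCrlf(byte[] b, int from) {
        for (int i = from; i + 1 < b.length; i++) {
            if (b[i] == '\r' && b[i + 1] == '\n') {
                return i;
            }
        }
        throw new IllegalArgumentException("missing CRLF");
    }

    /** Returns the CRC32 of the data as 4 big-endian bytes. */
    static byte[] crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        int v = (int) crc.getValue();
        return new byte[] {(byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v};
    }

    public static void main(String[] args) {
        byte[] data = "Hello Java SDK!".getBytes(StandardCharsets.UTF_8);
        String body = "f\r\nHello Java SDK!\r\n0\r\nx-amz-checksum-crc32:"
                + Base64.getEncoder().encodeToString(crc32(data)) + "\r\n\r\n";
        byte[] decoded = decode(body.getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(decoded, StandardCharsets.UTF_8)); // Hello Java SDK!
        System.out.println(stripAwsChunked("gzip,aws-chunked"));         // gzip
    }
}
```

A server that does both steps stores only the 15 payload bytes with `Content-Encoding: gzip`, so later downloads carry a coding that ordinary clients understand.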
