Skip to content

Conversation

@hevinhsu
Copy link
Contributor

@hevinhsu hevinhsu commented Jan 9, 2026

What changes were proposed in this pull request?

Add support for the Content-MD5 header to verify object integrity during object uploads.

Please describe your PR in detail:

  • Validate the Content-MD5 header using pre-commit hooks before the key is committed.
  • Add unit tests, integration tests, and end-to-end tests covering:
    • PUT object with Content-MD5
    • Multipart upload with Content-MD5
  • Add no-argument constructors to KeyDataStreamOutput to simplify unit test setup.

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-10633

How was this patch tested?

https://github.com/hevinhsu/ozone/actions/runs/20842334193

@ivandika3 ivandika3 added the s3 S3 Gateway label Jan 12, 2026
@ivandika3 ivandika3 self-requested a review January 12, 2026 01:42
@jojochuang
Copy link
Contributor

Note:
https://docs.aws.amazon.com/AmazonS3/latest/API/API_Object.html

The ETag may or may not be an MD5 digest of the object data.

@hevinhsu
Copy link
Contributor Author

Thanks @jojochuang for pointing this out.

I agree that the final ETag may not always be an MD5 (especially for multipart or encrypted objects). However, this patch focuses on in-transit integrity checking via the Content-MD5 header.

During the upload, S3G calculates the MD5 of the incoming data stream to verify it against the client-provided Content-MD5. This verification happens before the object is committed. The document you shared refers to the ETag's behavior after the commit, which is independent of the Content-MD5 validation logic at the ingestion stage. Therefore, the ETag's final format does not affect this check.

# Conflicts:
#	hadoop-ozone/integration-test-s3/src/test/java/org/apache/hadoop/ozone/s3/awssdk/v2/AbstractS3SDKV2Tests.java
#	hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/endpoint/ObjectEndpoint.java
Copy link
Contributor

@ivandika3 ivandika3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hevinhsu for the patch and the extensive tests. Overall LGTM. Just left a few comments.

Comment on lines +1078 to +1086
String eTag = DatatypeConverter.printHexBinary(digest).toLowerCase();
String clientContentMD5 = getHeaders().getHeaderString(S3Consts.CHECKSUM_HEADER);
if (clientContentMD5 != null) {
CheckedRunnable<IOException> checkContentMD5Hook = () -> {
S3Utils.validateContentMD5(clientContentMD5, eTag, key);
};
ozoneOutputStream.getKeyOutputStream().setPreCommits(Collections.singletonList(checkContentMD5Hook));
}
ozoneOutputStream.getMetadata().put(OzoneConsts.ETAG, eTag);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Since now MD5 hash is used for both ETag and content-md5 verification, let's rename variable eTag to md5Hash. This also applied to other places.

Let's also change getMessageDigestDistance and ETAG_PROVIDER to something like getMD5DigestInstance and MD5_PROVIDER.

assertEquals("\"37b51d194a7513e45b56f6524f2d51f2\"", getObjectResponse.eTag());
}

@Test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong omission?

@ivandika3
Copy link
Contributor

The ETag may or may not be an MD5 digest of the object data.

Yes, ETag can technically be any hash that based on the object data so it doesn't need to be MD5. Moreover, object uploaded using multipart upload ETag is a hash of all the ETag of all the parts with the "-<num_parts>" prefix, see S3MultipartUploadCompleteRequest#multipartUploadedKeyHash.

If we decide in the future to change ETag to not be md5 hash, then we need to setup a new MD5 MessageDigest. However, since ETag is currently already using MD5, we can piggy back that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

s3 S3 Gateway

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants