Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(core)!: implement write returns metadata #5562

Merged
merged 18 commits into from
Feb 18, 2025

Conversation

meteorgan
Copy link
Contributor

@meteorgan meteorgan commented Jan 19, 2025

Which issue does this PR close?

Part of #5557

Rationale for this change

What changes are included in this PR?

  • implement write returns metadata for Operator and BlockingOperator
  • enable write_has_content_length for all services and implement the logic for them
  • implement write returns metadata logic for service S3, fs, monoiofs, hdfs and webhdfs
  • other services return default Metadata after writing
  • skip return value for language bindings

Are there any user-facing changes?

Yes.

  • write, write_with in Operator returns Result<Metadata> instead of Return<()>
  • Writer.close() returns Return<Metadata> instead of Return<()>
  • write in BlockingOperator returns Return<Metadata> instead of Return<()>
  • FunctionWrite.call() used by BlockingOperator returns Return<Metadata> instead of Return<()>

@meteorgan
Copy link
Contributor Author

emmm, another issue related to ceph rados: the Etag returned by CompleteMultipartUpload doesn't follow the standard.
image

@Xuanwo
Copy link
Member

Xuanwo commented Jan 20, 2025

the Etag returned by CompleteMultipartUpload doesn't follow the standard.

Well, well, well...

Let's do our best to permit this since it's not harmful. Technically, we don't allow users to parse the ETag value; instead, they should store it and compare it with values from the same source. I'm guessing the issue lies in part of our code assuming that the ETag must contain "?

@meteorgan
Copy link
Contributor Author

the Etag returned by CompleteMultipartUpload doesn't follow the standard.

Well, well, well...

Let's do our best to permit this since it's not harmful. Technically, we don't allow users to parse the ETag value; instead, they should store it and compare it with values from the same source. I'm guessing the issue lies in part of our code assuming that the ETag must contain "?

No. The issue is that I compare the Etag from stat with the one from write in the tests to ensure we get the correct Etag if it's returned by write. However, the value returned from stat contain " while the value returned from write does not. they aren't consistent with each other, even within the same service !

@Xuanwo
Copy link
Member

Xuanwo commented Jan 20, 2025

No. The issue is that I compare the Etag from stat with the one from write in the tests to ensure we get the correct Etag if it's returned by write. However, the value returned from stat contain " while the value returned from write does not. they aren't consistent with each other, even within the same service !

That's really surprising. Let's disable the Ceph RADOS test and create an issue to track it. I believe such a bug isn't worth our workaround.

@meteorgan
Copy link
Contributor Author

That's really surprising. Let's disable the Ceph RADOS test and create an issue to track it. I believe such a bug isn't worth our workaround.

I've done some tests. it's very interesting that stat_with().if_match(etag) works fine regardless of whether the ETag includes the prefix and suffix " or not.

image

@meteorgan
Copy link
Contributor Author

I've done some tests. it's very interesting that stat_with().if_match(etag) works fine regardless of whether the ETag includes the prefix and suffix " or not.

The results are the same across Ceph RADOS, Minio and S3

@meteorgan meteorgan force-pushed the write_returns_metadata_for_s3 branch from 81aeb13 to ddc3de5 Compare January 20, 2025 13:47
@meteorgan
Copy link
Contributor Author

That's really surprising. Let's disable the Ceph RADOS test and create an issue to track it. I believe such a bug isn't worth our workaround.

I still created a workaround by adding prefix and suffix " for Etag in the test if they are not included. I think this approach is simpler, what do you think ?

@meteorgan meteorgan marked this pull request as ready for review January 20, 2025 14:03
@Xuanwo
Copy link
Member

Xuanwo commented Jan 20, 2025

Perhaps we should introduce write_has_content_type flags to indicate which metadata is available in the meta returned by the write operation. Checking the meta.content_type() may pass the test but is meaningless to users; it could also potentially conceal incorrect implementations that developers can simplely return Metadata::default() to pass all tests. And it makes it more difficult for us to determine the implementation status of various services.

@meteorgan
Copy link
Contributor Author

meteorgan commented Jan 21, 2025

Perhaps we should introduce write_has_content_type flags to indicate which metadata is available in the meta returned by the write operation.

Would write_returns_content_type be a better name?

Checking the meta.content_type() may pass the test but is meaningless to users; it could also potentially conceal incorrect implementations that developers can simplely return Metadata::default() to pass all tests.

if a service returns Metadata::default(), it indicates that it implements nothing. There's a chance that a developer intended to return some contents but forgot set the fields. Unfortunately, the tests can't catch this.

And it makes it more difficult for us to determine the implementation status of various services.

Make sense ! Making it explicit is better.

@meteorgan
Copy link
Contributor Author

No. The issue is that I compare the Etag from stat with the one from write in the tests to ensure we get the correct Etag if it's returned by write. However, the value returned from stat contain " while the value returned from write does not. they aren't consistent with each other, even within the same service !

Wow, they've already had a PR for this issue: ceph/ceph#60477

@meteorgan
Copy link
Contributor Author

Gently push. Hi, @Xuanwo I've provided some replies, do you have any suggestions ?

@Xuanwo
Copy link
Member

Xuanwo commented Jan 24, 2025

Would write_returns_content_type be a better name?

I prefer to align with exists stat_has_xxx and list_has_xxx.

Wow, they've already had a PR for this issue: ceph/ceph#60477

That's nice, maybe we can disable ceph test for now...

@meteorgan meteorgan force-pushed the write_returns_metadata_for_s3 branch from ddc3de5 to a1e0b0a Compare January 26, 2025 12:53
@meteorgan meteorgan changed the title feat(core)!: implement write returns metadata for s3 feat(core)!: implement write returns metadata Jan 26, 2025
@meteorgan
Copy link
Contributor Author

meteorgan commented Feb 12, 2025

Hi, @Xuanwo Would you mind taking some time to review this PR at your convenience? since it involves a lot of files, it may become difficult to merge if it takes too long.

@Xuanwo
Copy link
Member

Xuanwo commented Feb 12, 2025

Hi, @Xuanwo Would you mind taking some time to review this PR at your convenience? since it involves a lot of files, it may become difficult to merge if it takes too long.

Thanks a lot for your great work! I will review this PR after #5620 been addressed.

@meteorgan meteorgan force-pushed the write_returns_metadata_for_s3 branch from 1773dd4 to ffcecb4 Compare February 16, 2025 12:30
Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM! Only some small polishment changes to do.

@meteorgan meteorgan force-pushed the write_returns_metadata_for_s3 branch from dae34cb to 742c1e7 Compare February 17, 2025 11:40
Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @meteorgan for this great work!

Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @meteorgan for this great work!

@Xuanwo Xuanwo merged commit 7455995 into apache:main Feb 18, 2025
244 checks passed
@meteorgan meteorgan deleted the write_returns_metadata_for_s3 branch February 18, 2025 09:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants