Commit fe9469d

RFC-5871: Read Returns Metadata (#5871)

* RFC: Read Returns Metadata
* fix pr number and issue number
* fix cargo fmt
* add metadata() method to impl Reader
* roll back the changes made to the read operations
* polish
* store metadata in RpRead

1 parent 74afdf7 commit fe9469d

2 files changed

+116
-0
lines changed

@@ -0,0 +1,112 @@
- Proposal Name: `read_returns_metadata`
- Start Date: 2025-03-24
- RFC PR: [apache/opendal#5871](https://github.com/apache/opendal/pull/5871)
- Tracking Issue: [apache/opendal#5872](https://github.com/apache/opendal/issues/5872)

# Summary

Enhance read operations by returning metadata along with the data.

# Motivation

Currently, read operations (`read`, `read_with`, `reader`, `reader_with`) only return the data content. Users who need metadata
during reads (such as `Content-Type`, `ETag`, or `version_id`) must make an additional `stat()` call. This is inefficient and
can lead to race conditions if the file is modified between the read and stat operations.

Many storage services (like S3, GCS, Azure Blob) return metadata in their read responses. For example, S3's GetObject API returns
important metadata such as `ContentType`, `ETag`, `VersionId`, and `LastModified`. We should expose this information to users
directly during read operations.

# Guide-level explanation

For the `reader` API, we will introduce a new method, `metadata()`, that returns the metadata:

```rust
// Before
let data = op.reader("path/to/file").await?.read(..).await?;
let meta = op.stat("path/to/file").await?;
if let Some(etag) = meta.etag() {
    println!("ETag: {}", etag);
}

// After
let reader = op.reader("path/to/file").await?;
let meta = reader.metadata();
if let Some(etag) = meta.etag() {
    println!("ETag: {}", etag);
}
let data = reader.read(..).await?;
```

The new API will be provided alongside existing functionality, allowing users to continue using current `reader` methods without modification.

For backward compatibility and to minimize migration costs, we won't change the existing `read` API. Anyone who wants
to obtain metadata during reading can use the new reader operations instead.

# Reference-level explanation

## Changes to `Reader` API

`impl Reader` will be extended with a new method, `metadata()`, that returns the metadata.

```rust
impl Reader {
    // Existing methods...

    /// Returns the metadata obtained from the read operation.
    fn metadata(&self) -> &Metadata {
        // Body elided in this RFC.
        todo!()
    }
}
```

## Changes to struct `raw::RpRead`

The `raw::RpRead` struct will be modified to include a new field `metadata` that stores the metadata returned by the read operation.
Existing fields will be evaluated and potentially removed if they become redundant.

```rust
pub struct RpRead {
    // New field to store metadata
    metadata: Metadata,
}
```
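
To illustrate how the two changes above could fit together, here is a minimal, self-contained sketch. It does not use the real OpenDAL types: `Metadata` is a simplified stand-in with only two fields, and `RpRead::new`, `RpRead::into_metadata`, and `Reader::from_reply` are hypothetical names chosen for this example, not part of the proposal.

```rust
/// Simplified stand-in for OpenDAL's `Metadata` (assumption: only two fields are modelled).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Metadata {
    content_length: u64,
    etag: Option<String>,
}

impl Metadata {
    pub fn with_content_length(mut self, len: u64) -> Self {
        self.content_length = len;
        self
    }

    pub fn with_etag(mut self, etag: &str) -> Self {
        self.etag = Some(etag.to_string());
        self
    }

    pub fn content_length(&self) -> u64 {
        self.content_length
    }

    pub fn etag(&self) -> Option<&str> {
        self.etag.as_deref()
    }
}

/// Reply of a raw read operation, carrying the metadata the service returned.
#[derive(Debug, Clone, Default)]
pub struct RpRead {
    metadata: Metadata,
}

impl RpRead {
    /// Hypothetical constructor used only in this sketch.
    pub fn new(metadata: Metadata) -> Self {
        Self { metadata }
    }

    /// Hypothetical accessor that hands the metadata over to the caller.
    pub fn into_metadata(self) -> Metadata {
        self.metadata
    }
}

/// Heavily simplified reader: it keeps only the metadata from the reply.
pub struct Reader {
    metadata: Metadata,
}

impl Reader {
    /// Hypothetical constructor: builds a reader from the raw read reply.
    pub fn from_reply(rp: RpRead) -> Self {
        Self {
            metadata: rp.into_metadata(),
        }
    }

    /// The method proposed by this RFC: borrow the metadata without a `stat` call.
    pub fn metadata(&self) -> &Metadata {
        &self.metadata
    }
}
```

With these stand-ins, a caller would build `RpRead::new(Metadata::default().with_etag("\"abc123\""))`, pass it to `Reader::from_reply`, and read the ETag back through `reader.metadata().etag()`. Returning a borrowed `&Metadata` keeps the accessor cheap, since the metadata is stored once when the reader is created.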

## Implementation Details

For services that return metadata in their read responses:
- The metadata will be captured from the service response.
- All available fields (`content_type`, `etag`, `version_id`, `last_modified`, etc.) will be populated.

For services that don't return metadata in read responses:
- We'll make an additional `stat` call to fetch the metadata and populate the `metadata` field in `raw::RpRead`.

Special considerations:
- We should always return the total object size in the metadata, even if it's not part of the read response.
- For range reads, the metadata should reflect the full object's properties (like total size) rather than the range.
- For versioned objects, the metadata should include version information if available.
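
The range-read consideration above can be sketched for HTTP-based services. In a partial response, `Content-Length` describes only the returned range, while the `Content-Range` header (for example `bytes 0-99/1234`) carries the full object size after the slash. The helper names below are hypothetical and are not taken from the OpenDAL codebase.

```rust
/// Extracts the total object size from a `Content-Range` header value such as
/// `bytes 0-99/1234`. Returns `None` if the total is unknown (`*`) or the
/// value is malformed.
pub fn total_size_from_content_range(value: &str) -> Option<u64> {
    let rest = value.trim().strip_prefix("bytes ")?;
    let (_range, total) = rest.split_once('/')?;
    total.trim().parse::<u64>().ok()
}

/// Chooses the size to report in the metadata (hypothetical helper).
///
/// - Without `Content-Range`, the response covers the whole object, so
///   `Content-Length` is the object size.
/// - With `Content-Range`, `Content-Length` is only the range length, so the
///   total must come from the header. `None` signals that the caller needs
///   another source, such as a `stat` call.
pub fn object_size(content_length: Option<u64>, content_range: Option<&str>) -> Option<u64> {
    match content_range {
        Some(value) => total_size_from_content_range(value),
        None => content_length,
    }
}
```

For a response with `Content-Length: 100` and `Content-Range: bytes 0-99/1234`, `object_size(Some(100), Some("bytes 0-99/1234"))` yields `Some(1234)`, so the metadata reflects the full object rather than the range.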

# Drawbacks

- Additional memory overhead for storing metadata during reads
- Potential complexity in handling metadata for range reads

# Rationale and alternatives

- Maintains full backward compatibility with existing read operations
- Improves performance by avoiding additional stat calls
- Aligns with common storage service APIs (S3, GCS, Azure)

# Prior art

Similar patterns exist in other storage SDKs:

- The `object_store` crate returns metadata in `GetResult` after calling `get_opts`
- The AWS S3 SDK returns comprehensive metadata in `GetObjectOutput`
- The Azure Blob SDK returns properties and metadata in `DownloadResponse`

# Unresolved questions

None

# Future possibilities

- Once we return metadata during reader initialization, we can optimize `ReadContext::parse_into_range` by using the
  `content_length` from `metadata` directly, eliminating the need for an additional `stat` call.
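
As a rough illustration of that optimization, the sketch below resolves a user-supplied range into absolute offsets once the object size is known. It is not the actual `ReadContext::parse_into_range` implementation: `resolve_range` is a hypothetical function, and clamping out-of-bounds ranges to the object size is an assumption made for this example.

```rust
use std::ops::{Bound, Range, RangeBounds};

/// Resolves any range expression (`..`, `10..`, `10..20`, `10..=20`) into an
/// absolute half-open range, using a `content_length` that is already known
/// from the metadata instead of issuing a separate `stat` call.
pub fn resolve_range(range: impl RangeBounds<u64>, content_length: u64) -> Range<u64> {
    let start = match range.start_bound() {
        Bound::Included(&s) => s,
        Bound::Excluded(&s) => s.saturating_add(1),
        Bound::Unbounded => 0,
    };
    let end = match range.end_bound() {
        Bound::Included(&e) => e.saturating_add(1),
        Bound::Excluded(&e) => e,
        // An open-ended range is where the object size is required.
        Bound::Unbounded => content_length,
    };
    // Assumption of this sketch: clamp to the object size rather than error.
    let end = end.min(content_length);
    let start = start.min(end);
    start..end
}
```

Only the open-ended case (`10..` or `..`) actually depends on `content_length`; bounded ranges could be resolved without it, which is why having the size in the metadata removes the extra round trip only for those calls.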

core/src/docs/rfcs/mod.rs

+4
@@ -260,3 +260,7 @@ pub mod rfc_5495_list_with_deleted {}
 /// Write Returns Metadata
 #[doc = include_str!("5556_write_returns_metadata.md")]
 pub mod rfc_5556_write_returns_metadata {}
+
+/// Read Returns Metadata
+#[doc = include_str!("5871_read_returns_metadata.md")]
+pub mod rfc_5871_read_returns_metadata {}
