Use AWS C++ SDK for communicating with S3 #149

Draft
wants to merge 69 commits into
base: master
Commits
f03e2ff
Include the aws-cpp-sdk in the build
sjperkins Apr 2, 2024
a003a76
Remove unnecessarily introduced os-specific includes in aws-c-event-s…
sjperkins Apr 2, 2024
c0efd6e
@platforms//os:windows
sjperkins Apr 3, 2024
da85486
Remove system.BUILD.bazel
sjperkins Apr 3, 2024
98550fc
build_file formatting
sjperkins Apr 3, 2024
71fd676
Change - to _ in cc_library name field
sjperkins Apr 3, 2024
9672159
#define WIN32_LEAN_AND_MEAN in aws/core/SDKConfig.h
sjperkins Apr 3, 2024
6ffb5c9
Change - to _ in referenced locations
sjperkins Apr 3, 2024
de6b273
More @platforms//os:windows
sjperkins Apr 13, 2024
4c1fcfc
Merge branch 'master' into depend-on-aws-cpp-sdk-for-auth
sjperkins Apr 22, 2024
b80b85d
Merge branch 'master' into depend-on-aws-cpp-sdk-for-auth
sjperkins Apr 25, 2024
cfcf0d2
Convert patch_cmds to patch file in @com_github_aws_cpp_sdk
sjperkins Apr 26, 2024
5c877d9
Use write_file in com_github_aws_checksums
sjperkins Apr 26, 2024
84cda4a
Generate SDKConfig.h with a write_file
sjperkins Apr 26, 2024
a03e45e
Add s3_encryption
sjperkins Apr 26, 2024
278cb8f
Add s3 context
sjperkins Apr 26, 2024
354f700
header cleanup
sjperkins Apr 26, 2024
e2eac84
sanity check basic credential retrieval
sjperkins Apr 26, 2024
1e1bd30
Remove AuthSigner cruft from s3_context_test.cc
sjperkins Apr 26, 2024
2a08922
Add NewS3RequestBuilder
sjperkins Apr 26, 2024
0c00783
Update BUILD
sjperkins Apr 26, 2024
aac65f2
Merge branch 'master' into depend-on-aws-cpp-sdk-for-auth
sjperkins May 3, 2024
25a7209
Merge branch 'master' into depend-on-aws-cpp-sdk-for-auth
sjperkins May 3, 2024
33a995d
Merge branch 'master' into depend-on-aws-cpp-sdk-for-auth
sjperkins May 8, 2024
2908289
Adapt more AWS classes, improve SDK setup
sjperkins May 8, 2024
6d2a933
Merge branch 'master' into depend-on-aws-cpp-sdk-for-auth
sjperkins May 9, 2024
9d9ea8a
Move S3Client code in a separate kvstore/s3_sdk directory
sjperkins May 9, 2024
0e3a781
Move AWS SDK Adapter code into anonymous namespace
sjperkins May 9, 2024
fbcb788
fixups
sjperkins May 9, 2024
484d6b0
Add localstack test
sjperkins May 9, 2024
b843af4
Updates
sjperkins May 9, 2024
1408ed7
Merge branch 'master' into depend-on-aws-cpp-sdk-for-auth
sjperkins May 9, 2024
65644fe
Simplify
sjperkins May 10, 2024
83893e9
Add Async S3Client test cases
sjperkins May 10, 2024
edfb694
Merge branch 'master' into depend-on-aws-cpp-sdk-for-auth
sjperkins May 24, 2024
f94f55d
Remove unnecessary Cord construction
sjperkins May 27, 2024
c122d4a
Fix closing namespace typo
sjperkins May 27, 2024
5106902
Merge branch 'master' into depend-on-aws-cpp-sdk-for-auth
sjperkins May 28, 2024
47def67
Initial CordStreamBuf implementation
sjperkins May 30, 2024
ce7bdf9
Fix indenting
sjperkins May 30, 2024
6eb3c89
comment
sjperkins May 30, 2024
f885bcc
Header hygiene
sjperkins May 31, 2024
9edf7d7
Remove commented out test case
sjperkins May 31, 2024
d2d7f87
Improve seek from current position case
sjperkins May 31, 2024
171393c
Sharpen test cases
sjperkins May 31, 2024
259792d
Merge branch 'master' into depend-on-aws-cpp-sdk-for-auth
sjperkins May 31, 2024
430f947
seekoff improvements
sjperkins Jun 2, 2024
f849e35
Merge branch 'master' into depend-on-aws-cpp-sdk-for-auth
sjperkins Jun 6, 2024
24144d3
Prefer defining the CordStreamBuf get area in terms of a Cord chunk
sjperkins Jun 10, 2024
b47fe87
Merge branch 'master' into depend-on-aws-cpp-sdk-for-auth
sjperkins Jun 10, 2024
cafd5fa
typo
sjperkins Jun 10, 2024
f8ba211
Fix test cases
sjperkins Jun 10, 2024
4ac69a0
Merge branch 'master' into depend-on-aws-cpp-sdk-for-auth
sjperkins Jun 11, 2024
16321f4
Touch up CordStreamBuf and integrate into HttpRequest/Response workflow
sjperkins Jun 14, 2024
d728a79
Merge branch 'master' into depend-on-aws-cpp-sdk-for-auth
sjperkins Jun 14, 2024
f156b4d
xsgetn fixes
sjperkins Jun 19, 2024
bb2c423
Merge branch 'master' into depend-on-aws-cpp-sdk-for-auth
sjperkins Jun 19, 2024
85b44d9
MoveCord -> DetachCord, TakeCord -> AssignCord
sjperkins Jun 20, 2024
62397fe
Warn on copies of large Http Request/Response bodies
sjperkins Jun 20, 2024
381a62c
Comment grammar
sjperkins Jun 20, 2024
ad453e5
Update logging statements
sjperkins Jun 20, 2024
d851a89
Merge branch 'master' into depend-on-aws-cpp-sdk-for-auth
sjperkins Jun 21, 2024
fe5dd54
Merge branch 'master' into depend-on-aws-cpp-sdk-for-auth
sjperkins Jun 24, 2024
da0b1b6
workspace whitespace
sjperkins Jun 25, 2024
aee0ce7
Merge branch 'master' into depend-on-aws-cpp-sdk-for-auth
sjperkins Jun 25, 2024
1216c4a
Merge branch 'master' into depend-on-aws-cpp-sdk-for-auth
sjperkins Jun 26, 2024
16c76c4
Merge branch 'master' into depend-on-aws-cpp-sdk-for-auth
sjperkins Jun 28, 2024
6b69c72
Merge branch 'master' into depend-on-aws-cpp-sdk-for-auth
sjperkins Jul 2, 2024
3883f04
Update to more recent version of the AWS C, CRT and C++ SDK's
sjperkins Jul 3, 2024
1 change: 1 addition & 0 deletions tensorstore/kvstore/BUILD
@@ -14,6 +14,7 @@ DRIVER_DOCS = [
"neuroglancer_uint64_sharded",
"ocdbt",
"s3",
# "s3_sdk",
"tsgrpc",
"zarr3_sharding_indexed",
"zip",
28 changes: 28 additions & 0 deletions tensorstore/kvstore/s3/BUILD
@@ -222,6 +222,34 @@ tensorstore_cc_test(
],
)

tensorstore_cc_library(
name = "new_s3_request_builder",
srcs = [
"new_s3_request_builder.cc",
],
hdrs = [
"new_s3_request_builder.h"
],
deps = [
"//tensorstore/kvstore/s3_sdk:s3_context",
"//tensorstore/internal/http",
"@com_github_aws_cpp_sdk//:core",
"@com_google_absl//absl/strings",
"@com_google_absl//absl/strings:cord",
"@com_google_absl//absl/strings:str_format",
],
)

tensorstore_cc_test(
name = "new_s3_request_builder_test",
srcs = [
"new_s3_request_builder_test.cc",
],
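deps = [
":new_s3_request_builder",
"//tensorstore/kvstore/s3_sdk:s3_context",
"@com_google_absl//absl/log:absl_log",
"@com_google_absl//absl/strings:cord",
"@com_google_googletest//:gtest_main",
],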
)

tensorstore_cc_library(
name = "validate",
srcs = [
Empty file.
143 changes: 143 additions & 0 deletions tensorstore/kvstore/s3/new_s3_request_builder.h
@@ -0,0 +1,143 @@
#include <cassert>
#include <iostream>
#include <iterator>
#include <memory>
#include <streambuf>
#include <string>
#include <string_view>
#include <vector>

#include <aws/core/Aws.h>
#include <aws/core/auth/AWSAuthSigner.h>
#include <aws/core/http/standard/StandardHttpRequest.h>
#include <aws/core/http/URI.h>

#include "absl/strings/cord.h"
#include "absl/strings/str_format.h"

#include "tensorstore/internal/http/http_request.h"
#include "tensorstore/kvstore/s3_sdk/s3_context.h"

namespace tensorstore {
namespace internal_kvstore_s3 {

// Make an absl::Cord look like a streambuf
class CordStreambuf : public std::streambuf {
public:
CordStreambuf(const absl::Cord& cord) : cord_(cord), current_(cord_.char_begin()) {
setg(nullptr, nullptr, nullptr);
}

protected:
// Refill the get area of the buffer
int_type underflow() override {
if (current_ == cord_.char_end()) {
return traits_type::eof();
}

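// Expose a one-character get area. The end pointer is computed as p + 1
// rather than by dereferencing the next iterator position, which is
// undefined at the end of the Cord and not contiguous across chunks.
char* p = const_cast<char*>(&*current_);
setg(p, p, p + 1);
++current_;
return traits_type::to_int_type(*p);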
}

private:
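// Held by value (a cheap, reference-counted copy) so the buffer cannot
// outlive a temporary Cord passed to the constructor.
absl::Cord cord_;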
absl::Cord::CharIterator current_;
};

// Make an absl::Cord look like an iostream
class CordIOStream : public std::iostream {
public:
CordIOStream(const absl::Cord& cord) : std::iostream(&buffer_), buffer_(cord) {
rdbuf(&buffer_);
}

private:
CordStreambuf buffer_;
};

class AwsHttpRequestAdapter : public Aws::Http::Standard::StandardHttpRequest {
private:
static Aws::Http::HttpMethod FromStringMethod(std::string_view method) {
if(method == "GET") {
return Aws::Http::HttpMethod::HTTP_GET;
} else if (method == "PUT") {
return Aws::Http::HttpMethod::HTTP_PUT;
} else if (method == "HEAD") {
return Aws::Http::HttpMethod::HTTP_HEAD;
} else if (method == "DELETE") {
return Aws::Http::HttpMethod::HTTP_DELETE;
} else if (method == "POST") {
return Aws::Http::HttpMethod::HTTP_POST;
} else if (method == "PATCH") {
return Aws::Http::HttpMethod::HTTP_PATCH;
} else {
// NOTE: return an error
return Aws::Http::HttpMethod::HTTP_GET;
}
}

public:
AwsHttpRequestAdapter(std::string_view method, std::string endpoint_url) :
Aws::Http::Standard::StandardHttpRequest(Aws::Http::URI(Aws::String(endpoint_url)),
FromStringMethod(method)) {}
};

/// Similar interface to S3RequestBuilder,
/// but builds an AwsHttpRequestAdapter internally
class NewS3RequestBuilder {
public:
NewS3RequestBuilder(std::string_view method, std::string endpoint_url) :
request_(method, endpoint_url) {}

NewS3RequestBuilder & AddBody(const absl::Cord & body) {
// NOTE: eliminate allocation
auto cord_adapter = std::make_shared<CordIOStream>(body);
request_.AddContentBody(cord_adapter);
return *this;
}

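NewS3RequestBuilder & AddHeader(std::string_view header) {
auto delim_pos = header.find(':');
assert(delim_pos != std::string_view::npos);
// Skip the optional whitespace after ':' so that "foo: bar" yields "bar",
// not " bar" (BuildRequest re-adds ": " when formatting headers).
std::string_view value = header.substr(delim_pos + 1);
while (!value.empty() && (value.front() == ' ' || value.front() == '\t')) {
value.remove_prefix(1);
}
// NOTE: string copies
request_.SetHeaderValue(Aws::String(header.substr(0, delim_pos)),
Aws::String(value));
return *this;
}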

NewS3RequestBuilder & AddQueryParameter(std::string key, std::string value) {
// Note: string copies
request_.AddQueryStringParameter(key.c_str(), Aws::String(value));
return *this;
}

internal_http::HttpRequest BuildRequest(AwsContext ctx) {
auto signer = Aws::Client::AWSAuthV4Signer(ctx.cred_provider_, "s3", "us-east-1");
assert(!request_.HasAuthorization());
auto succeeded = signer.SignRequest(request_, true);
assert(succeeded);
assert(request_.HasAuthorization());
auto method = Aws::Http::HttpMethodMapper::GetNameForHttpMethod(request_.GetMethod());
auto aws_headers = request_.GetHeaders();

std::vector<std::string> headers;
headers.reserve(aws_headers.size());

for(auto & pair: aws_headers) {
headers.emplace_back(absl::StrFormat("%s: %s", pair.first, pair.second));
}

return internal_http::HttpRequest{
std::move(method),
std::string(request_.GetURIString(true)),
"",
headers};
}

public:
std::shared_ptr<Aws::IOStream> body_;
AwsHttpRequestAdapter request_;
};

} // namespace internal_kvstore_s3
} // namespace tensorstore
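
Reviewer sketch (not part of the diff): the commit list mentions defining the CordStreamBuf get area in terms of a Cord chunk rather than a single character. A minimal illustration of that idea follows. It assumes a plain `std::vector<std::string>` as a stand-in for the Cord's chunks so that it builds with only the standard library. `ChunkStreambuf` and `ReadAll` are hypothetical names, not identifiers from this PR or from Abseil.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <streambuf>
#include <string>
#include <utility>
#include <vector>

// Read-only streambuf over a list of contiguous chunks (a stand-in for the
// chunks of a rope-like container; this is an assumption for illustration).
class ChunkStreambuf : public std::streambuf {
 public:
  explicit ChunkStreambuf(std::vector<std::string> chunks)
      : chunks_(std::move(chunks)) {}

 protected:
  int_type underflow() override {
    // Unread bytes remain in the current get area.
    if (gptr() != nullptr && gptr() < egptr()) {
      return traits_type::to_int_type(*gptr());
    }
    // Otherwise make the next non-empty chunk the get area.
    while (next_ < chunks_.size()) {
      std::string& chunk = chunks_[next_++];
      if (chunk.empty()) continue;
      char* begin = chunk.data();
      setg(begin, begin, begin + chunk.size());
      return traits_type::to_int_type(*gptr());
    }
    return traits_type::eof();
  }

 private:
  std::vector<std::string> chunks_;
  std::size_t next_ = 0;  // Index of the next chunk to expose.
};

// Drains all chunks through a std::istream and returns the concatenation.
inline std::string ReadAll(std::vector<std::string> chunks) {
  ChunkStreambuf buf(std::move(chunks));
  std::istream in(&buf);
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}
```

With a whole chunk as the get area, `underflow()` runs once per chunk instead of once per character, and no pointer is ever formed across a chunk boundary. Seeking (`seekoff`/`seekpos`) and bulk reads (`xsgetn`) are left out of this sketch.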
24 changes: 24 additions & 0 deletions tensorstore/kvstore/s3/new_s3_request_builder_test.cc
@@ -0,0 +1,24 @@
#include <gtest/gtest.h>

#include "absl/log/absl_log.h"
#include "absl/strings/cord.h"
#include "tensorstore/kvstore/s3_sdk/s3_context.h"
#include "tensorstore/kvstore/s3/new_s3_request_builder.h"


using ::tensorstore::internal_kvstore_s3::NewS3RequestBuilder;

namespace {

TEST(NewS3RequestBuilderTest, Basic) {
auto ctx = tensorstore::internal_kvstore_s3::GetAwsContext();
auto builder = NewS3RequestBuilder("GET", "http://bucket")
.AddBody(absl::Cord{"foobar"})
.AddHeader("foo: bar")
.AddQueryParameter("qux", "baz");

auto req = builder.BuildRequest(*ctx);
EXPECT_TRUE(builder.request_.HasAuthorization());

ABSL_LOG(INFO) << req;
}

} // namespace
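
Reviewer sketch (not part of the diff): `FromStringMethod` carries a "return an error" note and currently falls back to GET for unknown method names, and `AddHeader` only asserts that a ':' is present. One possible shape for reporting bad input to the caller is sketched below, assuming `std::optional` as the error channel. `HttpMethod`, `ParseHttpMethod` and `SplitHeader` are hypothetical names used for illustration; the PR itself maps to `Aws::Http::HttpMethod`.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical local enum standing in for the SDK's method enum.
enum class HttpMethod { kGet, kPut, kHead, kDelete, kPost, kPatch };

// Returns std::nullopt for unrecognized names instead of defaulting to GET.
// Matching is case-sensitive, so "get" is rejected.
inline std::optional<HttpMethod> ParseHttpMethod(std::string_view method) {
  if (method == "GET") return HttpMethod::kGet;
  if (method == "PUT") return HttpMethod::kPut;
  if (method == "HEAD") return HttpMethod::kHead;
  if (method == "DELETE") return HttpMethod::kDelete;
  if (method == "POST") return HttpMethod::kPost;
  if (method == "PATCH") return HttpMethod::kPatch;
  return std::nullopt;
}

// Splits "name: value" at the first ':' and trims spaces and tabs around the
// value. Returns std::nullopt if there is no ':' or the name is empty.
inline std::optional<std::pair<std::string, std::string>> SplitHeader(
    std::string_view header) {
  const std::size_t pos = header.find(':');
  if (pos == std::string_view::npos || pos == 0) return std::nullopt;
  std::string_view value = header.substr(pos + 1);
  const std::size_t first = value.find_first_not_of(" \t");
  if (first == std::string_view::npos) {
    value = std::string_view();  // Value is empty or all whitespace.
  } else {
    const std::size_t last = value.find_last_not_of(" \t");
    value = value.substr(first, last - first + 1);
  }
  return std::make_pair(std::string(header.substr(0, pos)),
                        std::string(value));
}
```

Under this scheme the lowercase `"get"` originally passed by the test would surface as an error rather than silently becoming a GET request. Whether the builder should report that via `std::optional`, a status type, or a checked precondition is a design choice for the PR author.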
122 changes: 122 additions & 0 deletions tensorstore/kvstore/s3_sdk/BUILD
@@ -0,0 +1,122 @@
# Placeholder: load py_binary
load("//bazel:tensorstore.bzl", "tensorstore_cc_library", "tensorstore_cc_test")

package(default_visibility = ["//visibility:public"])

licenses(["notice"])

filegroup(
name = "doc_sources",
srcs = glob([
"*.rst",
"*.yml",
]),
)


tensorstore_cc_library(
name = "cord_streambuf",
srcs = ["cord_streambuf.cc"],
hdrs = ["cord_streambuf.h"],
deps = [
"@com_google_absl//absl/strings:cord",
]
)


tensorstore_cc_library(
name = "s3_context",
srcs = ["s3_context.cc"],
hdrs = ["s3_context.h"],
deps = [
":cord_streambuf",
"//tensorstore/util:executor",
"//tensorstore/internal/http",
"//tensorstore/internal/http:curl_transport",
"//tensorstore/internal/thread:thread_pool",
"@com_google_absl//absl/log:absl_log",
"@com_google_absl//absl/synchronization",
"@com_github_aws_cpp_sdk//:core",
]
)

tensorstore_cc_test(
name = "s3_context_test",
size = "small",
srcs = ["s3_context_test.cc"],
deps = [
":s3_context",
"@com_github_aws_cpp_sdk//:s3",
"@com_google_googletest//:gtest_main",
]
)

tensorstore_cc_test(
name = "cord_streambuf_test",
size = "small",
srcs = ["cord_streambuf_test.cc"],
deps = [
":cord_streambuf",
"@com_google_googletest//:gtest_main",
"@com_github_aws_cpp_sdk//:core",
]
)

py_binary(
name = "moto_server",
testonly = 1,
srcs = ["moto_server.py"],
tags = [
"manual",
"notap",
"skip-cmake",
],
deps = ["@pypa_moto//:moto"],
)

tensorstore_cc_test(
name = "localstack_test",
size = "small",
srcs = ["localstack_test.cc"],
args = [
"--localstack_binary=$(location :moto_server)",
"--binary_mode=moto",
],
data = [":moto_server"],
flaky = 1, # Spawning the test process can be flaky.
tags = [
"cpu:2",
"requires-net:loopback",
"skip-cmake",
],
deps = [
":s3_context",
"//tensorstore:context",
"//tensorstore:json_serialization_options_base",
"//tensorstore/internal:env",
"//tensorstore/internal:json_gtest",
"//tensorstore/internal/http",
"//tensorstore/internal/http:curl_transport",
"//tensorstore/internal/http:transport_test_utils",
"//tensorstore/internal/os:subprocess",
"//tensorstore/kvstore",
"//tensorstore/kvstore:batch_util",
"//tensorstore/kvstore:test_util",
"//tensorstore/util:future",
"//tensorstore/util:result",
"//tensorstore/util:status_testutil",
"@com_github_nlohmann_json//:json",
"@com_google_absl//absl/flags:flag",
"@com_google_absl//absl/log:absl_check",
"@com_google_absl//absl/log:absl_log",
"@com_google_absl//absl/status",
"@com_google_absl//absl/strings",
"@com_google_absl//absl/strings:cord",
"@com_google_absl//absl/strings:str_format",
"@com_google_absl//absl/time",
"@com_google_googletest//:gtest_main",
"@com_github_aws_cpp_sdk//:core",
"@com_github_aws_cpp_sdk//:s3",

],
)