Setup integration test with minio #81

Closed
xushiyan opened this issue Jul 19, 2024 · 9 comments · Fixed by #226

Comments

@xushiyan
Member

No description provided.

@xushiyan xushiyan added good first issue Good for newcomers test p1 labels Jul 19, 2024
@xushiyan xushiyan added this to the release-0.2.0 milestone Jul 19, 2024
@abyssnlp
Contributor

I'll take this one if no one's assigned to it yet.

@xushiyan
Member Author

@abyssnlp before you start, can you please elaborate on the design?

@abyssnlp
Contributor

Sure, so at a high level:

  • Spin up MinIO via testcontainers before running the integration tests
  • Set up MinIO object storage with s3, gs and create the test Hudi table
  • Run the marked integration tests

I'll add more details today after work. Please feel free to add things I should keep in mind while I work on this.

@xushiyan
Member Author

@abyssnlp high-level looks good. A heads-up about testing data: since hudi-rs does not yet support a Hudi writer, we are using fixed pre-generated tables as the testing tables (see https://github.com/apache/hudi-rs/tree/main/crates/tests/data/tables). Would like to see some detailed design around provisioning test tables through MinIO volumes.

@abyssnlp
Contributor

abyssnlp commented Jul 29, 2024

Sorry about the delay.
Thanks for pointing to that. So we can mount the existing tables under here into the container before running the tests.

Something like:

.with_mount(Mount::bind_mount(canonicalize(Path::new("tests/data"))?.into_os_string().into_string().unwrap(),
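
The host side of that bind mount can be sketched with the standard library alone. This is a minimal sketch, not code from hudi-rs: `host_mount_source` is a hypothetical helper name, and it assumes the tests run from a working directory where the relative path (for example `tests/data`) resolves. It returns an error instead of calling `unwrap()` when the path is missing or not valid UTF-8.

```rust
use std::fs::canonicalize;
use std::io;
use std::path::Path;

/// Resolve a (possibly relative) host directory to an absolute UTF-8 path
/// that can be passed as the source of a container bind mount.
/// NOTE: hypothetical helper for illustration, not part of hudi-rs.
fn host_mount_source(dir: &Path) -> io::Result<String> {
    // canonicalize() fails if the directory does not exist, which surfaces
    // a wrong working directory before any container is started.
    let absolute = canonicalize(dir)?;

    // OsString -> String fails on non-UTF-8 paths; report that as an
    // io::Error instead of panicking.
    absolute.into_os_string().into_string().map_err(|raw| {
        io::Error::new(
            io::ErrorKind::InvalidData,
            format!("mount path is not valid UTF-8: {:?}", raw),
        )
    })
}
```

The returned string would then be the first argument of the bind mount shown above; the container-side target path is a separate choice.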

However, one thing I found out about testcontainers in Rust is that it doesn't support reusing a container across multiple tests, so the integration tests would have to live, for example, in a single test function. More about it here. There are also workarounds. An alternative would be to use docker-compose to spin up MinIO before running the integration tests and spin it down afterwards.

So this is how I'm thinking about approaching it:

  • (optional) ./docker-compose.yaml for spinning up required containers for MinIO
  • crates/tests/src/common.rs - some utility code, e.g. to create an S3 bucket and upload the pre-generated tables
  • crates/tests/src/integration_test.rs - spin up MinIO using testcontainers and run the integration tests
  • Integration tests are marked as a separate feature (integration_test) so they can be run separately

I had some questions as well.

  • Does this approach make sense? Would love to hear what you think
  • Do we plan on also running integration tests for Azure and GCP?

@xushiyan
Member Author

@abyssnlp sounds good to make use of docker-compose; it will make it easier to evolve the tests, as we will probably need to add more components in the future. To answer the questions:

  • crates/tests is a crate to provide all kinds of common hudi test utilities, but we don't want to host actual tests in it.
  • we may want to install the hudi crates locally, to mimic how users install hudi from crates.io
  • we can organize the integration tests in a separate folder, not necessarily as a crate, since it's docker based, and we don't publish it either
  • we surely want to integ test for azure and gcp as well - just picked minio for easy start with s3.
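
With docker-compose starting MinIO outside the test process, the tests need some way to tell that the service is reachable before they touch it. A minimal sketch using only the standard library follows. `wait_for_port` is a hypothetical helper, not something in hudi-rs, and the address to poll (MinIO's default API port is 9000) is an assumption about the compose setup.

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Poll `addr` until a TCP connection succeeds or `timeout` elapses.
/// Returns true if the port accepted a connection in time.
/// NOTE: hypothetical helper for illustration, not part of hudi-rs.
fn wait_for_port(addr: SocketAddr, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        // Bound each attempt so a silently dropped connection cannot
        // stall the loop past the deadline by much.
        if TcpStream::connect_timeout(&addr, Duration::from_millis(200)).is_ok() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        // Short pause between attempts to avoid a busy loop.
        sleep(Duration::from_millis(100));
    }
}
```

A successful connection only shows that the port is open. It does not prove that buckets or test tables have been provisioned, so a real setup would likely still need a check at the storage level.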

@abyssnlp
Contributor

Thanks for sharing your thoughts on it.
Having them separate from the crates sounds good. I've started some initial work on a local branch and managed to get Minio up with the pre-generated tables.

I'm currently running into some issues trying to read the tables via hudi-rs.

I've tried using both environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and the local-MinIO-specific AWS_ENDPOINT and ALLOW_HTTP) and providing the config via HudiDataSource::new_with_options as a Vec<(&str, &str)>.
I can confirm the object store config works for hudi::storage::Storage, for example when reading the contents of .hoodie/hoodie.properties.

Might be some bad configuration on my end. I'll continue working on it this week and keep posting updates here.
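
For reference, the key/value form of that configuration could be assembled along these lines. This is a sketch under assumptions: `minio_options` is a hypothetical helper, the key names are the ones listed above, and whether hudi-rs accepts exactly these keys through its options is not confirmed in this thread. The endpoint and credentials are placeholders supplied by the caller.

```rust
/// Build storage options for a local MinIO instance as key/value pairs.
/// NOTE: hypothetical helper for illustration; the key names come from the
/// comment above and the values are whatever the caller passes in.
fn minio_options(
    endpoint: &str,
    access_key: &str,
    secret_key: &str,
) -> Vec<(&'static str, String)> {
    vec![
        ("AWS_ACCESS_KEY_ID", access_key.to_string()),
        ("AWS_SECRET_ACCESS_KEY", secret_key.to_string()),
        ("AWS_ENDPOINT", endpoint.to_string()),
        // Only allow plain HTTP when the endpoint itself is plain HTTP,
        // as is typical for a local test container.
        ("ALLOW_HTTP", endpoint.starts_with("http://").to_string()),
    ]
}
```

Called as `minio_options("http://localhost:9000", "minioadmin", "minioadmin")`, this yields four pairs that can be borrowed as `(&str, &str)` before being handed to an options-taking constructor.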

@xushiyan
Member Author

@abyssnlp any plan to put this up in a PR?

@abyssnlp
Contributor

@xushiyan Yes, I'll put it up in a PR soon (today or tomorrow).

@xushiyan xushiyan moved this to In Progress in hudi-rs roadmap Aug 20, 2024
@xushiyan xushiyan modified the milestones: release-0.2.0, release-0.3.0 Nov 22, 2024
@xushiyan xushiyan self-assigned this Dec 13, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in hudi-rs roadmap Dec 17, 2024
2 participants