Docs for multiple storage backend support #8758
♻️ PR Preview 8b2f30f has been successfully destroyed since this PR has been closed. 🤖 By surge-preview |
Looks GREAT! A few nitpicks but nothing blocking
|
||
This example setup configures lakeFS to manage data across two separate MinIO instances. | ||
|
||
```yaml |
indentation is wrong (2 spaces everywhere)
done
|
||
<div markdown="1" id="on-prem"> | ||
|
||
This example setup configures lakeFS to manage data across two separate MinIO instances. |
This example setup configures lakeFS to manage data across two separate MinIO instances. | |
This example setup configures lakeFS to manage data across two separate MinIO instances: |
done
|
||
<div markdown="2" id="multi-cloud"> | ||
|
||
This example setup configures lakeFS to manage data across two public cloud providers: AWS and Azure. |
This example setup configures lakeFS to manage data across two public cloud providers: AWS and Azure. | |
This example setup configures lakeFS to manage data across two public cloud providers: AWS and Azure: |
done
* If static credentials are provided, lakeFS will use them. Otherwise, it will fall back to the AWS credentials chain. | ||
This means that for setups with multiple storages of type `s3`, static credentials are required for all but one. | ||
|
||
### Upgrading from Single to Multi-Store |
### Upgrading from Single to Multi-Store | |
### Upgrading from a single storage backend to Multiple Storage backends |
done
The [Get Config](https://docs.lakefs.io/reference/api.html#/config/getConfig) API endpoint now returns a list of storage | ||
configurations. In multi-store setups, this is the recommended method to list connected storage backends and view their details. | ||
|
||
### Common Configuration Errors & Fixes |
### Common Configuration Errors & Fixes | |
### Troubleshooting |
* Amazon S3 | ||
* Local storage | ||
|
||
While this feature is designed to support any blockstore combination, testing for Azure and GCS in multi-store setups is |
I'd remove this line for now. It's not a limitation, it just wasn't tested.
done
Added some comments, Approving in order not to be a blocker
--- | ||
title: Multiple Storage Backends | ||
description: How to manage data across multiple storage systems with lakeFS | ||
parent: How-To |
Not sure this is the parent
I agree. I opened this task to move features to the right place in a separate PR.
{: .label .label-purple } | ||
|
||
{: .note} | ||
> Multi-storage backend support is only available for licensed [lakeFS Enterprise]({% link enterprise/index.md %}) customers.<br> |
Just a thought: Is this correct? IIUC it's not supported for them either
why not? it is available for licensed customers
With a multi-store setup, lakeFS can connect to and manage any combination of supported storage systems, including AWS S3, | ||
Azure Blob, Google Cloud Storage, other S3-compatible storage, and even local storages. |
Just a suggestion
With a multi-store setup, lakeFS can connect to and manage any combination of supported storage systems, including AWS S3, | |
Azure Blob, Google Cloud Storage, other S3-compatible storage, and even local storages. | |
With a multi-store setup, lakeFS can connect to and manage any combination of supported storage systems, including: | |
- AWS S3 | |
- Azure Blob | |
- Google Cloud Storage | |
- other S3-compatible storage (such as minIO) | |
- local storage |
done
* Use the new `blockstores` structure, **replacing** the existing `blockstore` configuration. Note that `blockstore` and `blockstores` | ||
configurations are mutually exclusive - lakeFS does not support both simultaneously. | ||
* Define all previously available [single-blockstore settings](../reference/configuration.md#blockstore) under their respective storage backends. | ||
* The `signing.secret_key` remains a required global setting. |
What do you mean by remains?
I changed wording, thx
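The upgrade rules quoted in this thread (`blockstore` and `blockstores` are mutually exclusive, and the signing secret key is required) could be checked mechanically before restarting the server. The following is a minimal sketch, assuming the server configuration has already been parsed into a Python dict and that the key sits at `blockstores.signing.secret_key`, as in the YAML fragment under review. `check_blockstores_config` is a hypothetical helper for illustration and is not part of lakeFS.

```python
def check_blockstores_config(config: dict) -> list[str]:
    """Return a list of problems found in a parsed server config (hypothetical helper)."""
    problems = []

    # The docs under review state that the two sections cannot be used together.
    if "blockstore" in config and "blockstores" in config:
        problems.append("`blockstore` and `blockstores` are mutually exclusive")

    if "blockstores" in config:
        blockstores = config.get("blockstores") or {}

        # Assumed location of the signing key: blockstores.signing.secret_key
        signing = blockstores.get("signing") or {}
        if not signing.get("secret_key"):
            problems.append("`blockstores.signing.secret_key` is required")

        # `blockstores.stores` is described as an array of storage backends.
        if not isinstance(blockstores.get("stores"), list):
            problems.append("`blockstores.stores` must be a list of storage backends")

    return problems
```

An empty return value means none of the three checks failed; it does not prove that lakeFS itself will accept the configuration.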
* Restart the server. | ||
|
||
{: .warning} | ||
> Repositories linked to a removed storage backend will result in unexpected behavior. Ensure all necessary cleanup is done before removal. |
lakeFS will fail to start, so the behavior is not unexpected
thx, changed it
Adding to Guy's comment - it will also make the repositories associated with the storage ID unusable
The [Get Config](https://docs.lakefs.io/reference/api.html#/config/getConfig) API endpoint now returns a list of storage | ||
configurations. In multi-store setups, this is the recommended method to list connected storage backends and view their details. |
Not sure if you want to be concrete here, but the API returns both a string and a list, and the user should use the relevant one
* Restart the server. | ||
|
||
{: .warning} | ||
> Repositories linked to a removed storage backend will result in unexpected behavior. Ensure all necessary cleanup is done before removal. |
I think we would like the same kind of warning for replacing IDs. It's mentioned above, but maybe we should emphasize it
done
### Unsupported storage backends | ||
|
||
Multi-storage backend support has been validated on: | ||
* Self-managed S3-compatible object storage (e.g., MinIO) |
I don't think "e.g." fits here, since it's only tested on MinIO
still in progress. | ||
|
||
{: .note} | ||
> **Note:** Other untested combinations may still work. You are encouraged to try them and share feedback. |
I would replace "encouraged to try them" with something like "if you are interested, please contact us".
No one can really try it
good call, done
Strongly agree
@@ -34,8 +34,11 @@ With lakeFS Enterprise you’ll receive access to the security package containin | |||
|
|||
## What additional functionality does lakeFS Enterprise provide? | |||
|
|||
1. [lakeFS Mount]({% link reference/mount.md %}) allows users to virtually mount a remote lakeFS repository onto a local directory. Once mounted, users can access the data as if it resides on their local filesystem, using any tool, library, or framework that reads from a local filesystem. | |||
1. [lakeFS Mount]({% link reference/mount.md %}) - allows users to virtually mount a remote lakeFS repository onto a local directory. Once mounted, users can access the data as if it resides on their local filesystem, using any tool, library, or framework that reads from a local filesystem. |
depends on how we communicate mount - we mount a branch
This is outside the scope of this PR, let's defer it for later?
Probably a different issue: in the docs we specify the additional field as "storage id", while in the UI it is labeled "storage"
for all organizational data assets, which is especially critical in AI/ML environments that rely on diverse datasets stored | ||
in multiple locations. | ||
|
||
With a multi-store setup, lakeFS can connect to and manage any combination of supported storage systems, including AWS S3, |
multi-storage backend
For a complete list of available options, refer to the [server configuration reference](../reference/configuration.md#blockstores). | ||
|
||
{: .note} | ||
> **Note:** If you're upgrading from a single-store lakeFS setup, refer to the [upgrade guidelines](#upgrading-from-single-to-multi-store) |
I think we need to use the same term "multi-storage" instead of store in all the doc
done
### Listing Connected Storage Backends | ||
|
||
The [Get Config](https://docs.lakefs.io/reference/api.html#/config/getConfig) API endpoint now returns a list of storage | ||
configurations. In multi-store setups, this is the recommended method to list connected storage backends and view their details. |
multi-storage
done
**Note**: The `--storage-id` flag is currently hidden in the CLI. | ||
* UI: Select a storage backend from the dropdown menu. | ||
 | ||
* High-level Python SDK: TODO |
Do we need to include the TODO here? Is it a limitation, or can we update the docs?
### Viewing Repository Details | ||
|
||
To check which storage backend is associated with a repository: | ||
* API – The [List Repositories](https://docs.lakefs.io/reference/api.html#/repositories/listRepositories) response includes the storage ID. |
The link needs to point to a Markdown file in the docs
docs/reference/configuration.md
Outdated
* `blockstores.stores[].s3.pre_signed_endpoint` `(string : )` - Custom endpoint for pre-signed URLs. | ||
* `blockstores.stores[].s3.disable_pre_signed` `(bool : false)` - Disable use of pre-signed URL. | ||
* `blockstores.stores[].s3.disable_pre_signed_ui` `(bool : true)` - Disable use of pre-signed URL in the UI. | ||
* `blockstores.stores[].s3.disable_pre_signed_multipart` `(bool : )` - Disable use of pre-signed multipart upload **experimental**, enabled on s3 block adapter with presign support. |
* `blockstores.stores[].s3.disable_pre_signed_multipart` `(bool : )` - Disable use of pre-signed multipart upload **experimental**, enabled on s3 block adapter with presign support. | |
* `blockstores.stores[].s3.disable_pre_signed_multipart` `(bool : )` - Disable use of pre-signed multipart upload **experimental**, enabled on S3 block adapter with presign support. |
* Access data across multiple storage backends using a single, consistent [URI format](../understand/model.md#lakefs-protocol-uris). | ||
|
||
3. **Centralized Access Control & Governance**: | ||
* Access permissions and policies can be centrally managed across all connected storage systems using lakeFS [RBAC](../security/rbac.md). |
Not sure we need to specify this, but we currently do not provide a mechanism to control access based on storage ID
Thanks. I think that we shouldn't specify this, at least not here where we are walking through the high level use cases.
For now, I'd rather not mention this
blockstores: | ||
signing: | ||
secret_key: "some-secret" | ||
stores: |
wrong level
This is for `stores` and everything below it.
done, thank you!
**Note**: The `--storage-id` flag is currently hidden in the CLI. | ||
* UI: Select a storage backend from the dropdown menu. | ||
 | ||
* High-level Python SDK: TODO |
@N-o-Z can you please assist in completing this part?
|
||
<div markdown="1" id="on-prem"> | ||
|
||
This example setup configures lakeFS to manage data across two separate MinIO instances. |
done
|
||
This example setup configures lakeFS to manage data across two separate MinIO instances. | ||
|
||
```yaml |
done
|
||
<div markdown="2" id="multi-cloud"> | ||
|
||
This example setup configures lakeFS to manage data across two public cloud providers: AWS and Azure. |
done
Very detailed and well phrased.
Some comments/suggestions
1. **Distributed Data Management**: | ||
* Eliminate data silos and enable seamless cross-cloud collaboration. | ||
* Maintain version control across different storage providers for consistency and reproducibility. | ||
* Ideal for AI/ML environments where datasets are distributed across multiple storage locations. |
Is this for SEO purposes?
I think removing the AI/ML makes this bullet more inclusive
Not for SEO, but to make clear who we think our target customers are.
I think that making this more inclusive is not necessarily a good thing here. I want people to easily understand what's in it for them, and that's one way to do it
To configure your lakeFS server to connect to multiple storage backends, define them under the `blockstores` | ||
section in your server configurations. The `blockstores.stores` field is an array of storage backends, each with its own configuration. |
To configure your lakeFS server to connect to multiple storage backends, define them under the `blockstores` | |
section in your server configurations. The `blockstores.stores` field is an array of storage backends, each with its own configuration. | |
To configure your lakeFS server to connect to multiple storage backends, define them under the `blockstores` section in your server configurations. | |
The `blockstores.stores` field is an array of storage backends, each with its own configuration. |
done
* S3 Authentication Handling: | ||
* All standard S3 authentication methods are supported. | ||
* If static credentials are provided, lakeFS will use them. Otherwise, it will fall back to the AWS credentials chain. | ||
This means that for setups with multiple storages of type `s3`, static credentials are required for all but one. |
Maybe we want to caveat that with the usage of profiles.
They are not necessarily defined as static, but they can still be used to define multiple AWS blockstores
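The rule discussed here (with several `s3` stores, all but one need their own credentials) can be illustrated with a small check over the `blockstores.stores` array. This is only a sketch under stated assumptions: the per-store keys `id`, `type`, `s3.credentials.access_key_id`, `s3.credentials.secret_access_key`, and `s3.profile` are assumed to mirror the single-blockstore settings, and treating a profile as an alternative to static credentials follows the reviewer's suggestion rather than confirmed lakeFS behavior. `stores_relying_on_default_chain` is a hypothetical name.

```python
def stores_relying_on_default_chain(stores: list[dict]) -> list[str]:
    """IDs of `s3` stores that set neither static credentials nor a profile."""
    fallback_ids = []
    for store in stores:
        if store.get("type") != "s3":
            continue  # only S3-type stores fall back to the AWS credentials chain
        s3 = store.get("s3") or {}
        creds = s3.get("credentials") or {}
        has_static = bool(creds.get("access_key_id") and creds.get("secret_access_key"))
        has_profile = bool(s3.get("profile"))  # assumption: a profile also disambiguates
        if not (has_static or has_profile):
            fallback_ids.append(store.get("id", "<missing id>"))
    return fallback_ids
```

If the returned list has more than one entry, those stores would all resolve credentials through the same default chain, which is the situation the documentation warns about.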
configurations are mutually exclusive - lakeFS does not support both simultaneously. | ||
* Define all previously available [single-blockstore settings](../reference/configuration.md#blockstore) under their respective storage backends. | ||
* The `signing.secret_key` remains a required global setting. | ||
* Set `backward_compatible: true` for the existing storage backend to ensure: |
I feel like we need to be much more explicit here about the fact that things will not work correctly without it and that it cannot be removed
good call, I added your points
### Importing Data into a Repository | ||
|
||
Importing data into a repository is supported when the following conditions are met: | ||
* The credentials used for the repository's backing blockstore allow access to the storage location. |
* The credentials used for the repository's backing blockstore allow access to the storage location. | |
* The credentials used for the repository's backing blockstore allow access (read, list) to the storage location. |
done
|
||
Importing data into a repository is supported when the following conditions are met: | ||
* The credentials used for the repository's backing blockstore allow access to the storage location. | ||
* The storage location is in the same region as the repository's backend. |
I don't believe that is true. I don't think we enforce region
docs/reference/configuration.md
Outdated
@@ -211,6 +211,70 @@ Configuration section when using `database.type="local"` | |||
* `blockstore.gs.server_side_encryption_customer_supplied` `(string : )` - Server side encryption with AES key in hex format, exclusive with key ID below | |||
* `blockstore.gs.server_side_encryption_kms_key_id` `(string : )` - Server side encryption KMS key ID, exclusive with above | |||
|
|||
### blockstores |
I think the best thing we could do would be to create separate documentation for enterprise configurations. However, if that's not possible, perhaps separate the configuration documentation clearly and have an "Enterprise" section where the enterprise configurations are detailed.
As an OSS reader, I wouldn't want to weave through enterprise configuration while trying to understand which configurations I need
I like the idea of separating Enterprise only configurations until we have separate docs - thanks!
Thanks for this great doc @talSofer !
Added some comments.
docs/enterprise/index.md
Outdated
2. [Transactional Mirroring]({% link howto/mirroring.md %}) - allows replicating lakeFS repositories into consistent read-only copies in remote locations. | ||
3. [Multiple Storage Backends]({% link howto/multiple-storage-backends.md %}) - allows managing data stored across multiple storage locations: on-prem, hybrid, or multi-cloud. | ||
|
||
<br> |
Nit: `<br>` is HTML, it shouldn't belong in md.
parent: How-To | ||
--- | ||
|
||
# Multi-Storage Backend Support |
# Multi-Storage Backend Support | |
# Multi-Storage Backend |
done
|
||
{% include toc.html %} | ||
|
||
## What is Multi-storage Backend Support? |
Here also - every feature can have `Support` in it.
I think we should simply use `Multi-storage Backend`.
I disagree. I changed the title, but the question "What is Multi-storage Backend?" has a different answer than the one I expect for "What is Multi-storage Backend Support?". The latter is about what it means in lakeFS
|
||
## What is Multi-storage Backend Support? | ||
|
||
lakeFS multi-storage backend support enables seamless data management across multiple storage systems — |
lakeFS multi-storage backend support enables seamless data management across multiple storage systems — | |
Using lakeFS multi-storage backend enables seamless data management across multiple storage systems — |
disagree here too
* Ideal for AI/ML environments where datasets are distributed across multiple storage locations. | ||
|
||
2. **Unified Data Access**: | ||
* Access data across multiple storage backends using a single, consistent [URI format](../understand/model.md#lakefs-protocol-uris). |
Nit: this has a different indentation than 1 and 3.
done
After setting up lakeFS Enterprise to connect with multiple storage backends, this section explains how to use these | ||
connected storages when working with lakeFS. | ||
|
||
With multiple storage backends configured, lakeFS repositories are now linked to a specific storage backend. Together with |
With multiple storage backends configured, lakeFS repositories are now linked to a specific storage backend. Together with | |
With multiple storage backends configured, lakeFS repositories are now linked to a specific storage. Together with |
done
|
||
In a multi-store setup, users must specify a storage ID when creating a repository. This can be done using the following methods: | ||
* API: Use the `storage_id` parameter in the [Create Repository endpoint](https://docs.lakefs.io/reference/api.html#/repositories/createRepository). | ||
* CLI: Use the `storage-id` flag with the [repo create](../reference/cli.md#lakectl-repo-create) command: |
--storage-id
maybe?
done
### Creating a Repository | ||
|
||
In a multi-store setup, users must specify a storage ID when creating a repository. This can be done using the following methods: | ||
* API: Use the `storage_id` parameter in the [Create Repository endpoint](https://docs.lakefs.io/reference/api.html#/repositories/createRepository). |
Suggestion - make these (API, CLI, WebUI, Py HL SDK) to headers:
#### API
Use the...
#### CLI
Use the...
...
done
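For readers following this thread, here is a minimal sketch of what the API method amounts to: a Create Repository request whose JSON body carries the `storage_id` parameter named in the doc. Only `storage_id` comes from the text above. The endpoint path `/api/v1/repositories`, the other body fields, the host name, the namespace, and the helper `build_create_repo_request` are assumptions for illustration and should be checked against the API reference. The request is built but not sent, and authentication is left out.

```python
import json
from urllib import request


def build_create_repo_request(base_url, name, storage_namespace, storage_id,
                              default_branch="main"):
    """Build (but do not send) a Create Repository request.

    Hypothetical helper: the path and the fields other than `storage_id`
    are assumptions, not taken from the doc under review.
    """
    payload = {
        "name": name,
        "storage_namespace": storage_namespace,
        "default_branch": default_branch,
        # The parameter the doc says is required in a multi-store setup.
        "storage_id": storage_id,
    }
    return request.Request(
        url=f"{base_url}/api/v1/repositories",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example values are placeholders; "minio-prod" mirrors the naming style
# of the example configuration in this PR.
req = build_create_repo_request(
    "https://lakefs.example.com",
    "example-repo",
    "s3://example-bucket/repos/example-repo",
    "minio-prod",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

The CLI route described in the doc would pass the same value through the `--storage-id` flag of `lakectl repo create`.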
|
||
## Limitations | ||
|
||
### Unsupported storage backends |
### Unsupported storage backends | |
### Supported storages |
agree, done
Thanks! Impressive path through all the docs...
{: .label .label-purple } | ||
|
||
{: .note} | ||
> Multi-storage backend support is only available for licensed [lakeFS Enterprise]({% link enterprise/index.md %}) customers.<br> |
Nit:
> Multi-storage backend support is only available for licensed [lakeFS Enterprise]({% link enterprise/index.md %}) customers.<br> | |
> Multi-storage backend support is only available to licensed [lakeFS Enterprise]({% link enterprise/index.md %}) customers. |
done
## What is Multi-storage Backend Support? | ||
|
||
lakeFS multi-storage backend support enables seamless data management across multiple storage systems — | ||
on-premises, across public clouds, or hybrid environments. This capability makes lakeFS a unified data management platform |
Cross-cloud support may be a trap. Obviously it will work, but most use cases will involve paying egress fees to at least one of the cloud providers. Do we want to go there with our customers?
Good point. This feature lays the groundwork for supporting cross-cloud setups in an egress-fee-aware way: cross-cloud data can now be managed from a single platform. Making it more cost-efficient is another (very reasonable) feature to build.
|
||
### Example Configurations | ||
|
||
<div class="tabs"> |
"tabs"
... this is some CSS thing that renders multiple selectable tabs.
access_key_id: "prod_access_key" | ||
secret_access_key: "prod_secret_key" | ||
- id: "minio-backup" | ||
description: "Backup MinIO storage for disaster recovery" |
We should document how disaster recovery works.
Can you please elaborate on that?
* Define all previously available [single-blockstore settings](../reference/configuration.md#blockstore) under their respective storage backends. | ||
* The `signing.secret_key` is a required setting global to all connected stores. | ||
* Set `backward_compatible: true` for the existing storage backend to ensure: | ||
* Existing repositories continue using the original storage backend. |
* Existing repositories continue using the original storage backend. | |
* Existing repositories continue to use the original storage backend. |
done
* The `signing.secret_key` is a required setting global to all connected stores. | ||
* Set `backward_compatible: true` for the existing storage backend to ensure: | ||
* Existing repositories continue using the original storage backend. | ||
* Newly created repositories default to this backend unless explicitly assigned a different one, to ensure a non-breaking upgrade process. |
We are re-opening the whole default vs. b/c debate here. I would much prefer not to say "upgrade" here, given that upgrades might not be possible when using external tools, or for customers whose users do not know their storage IDs.
@arielshaqed i'm not sure I understand your suggestion. can you please help me understand?
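To make the behavior under discussion concrete, here is a rough sketch of how the doc's wording could be read: an explicit storage ID wins, and when none is given, a new repository falls back to the store marked `backward_compatible: true`. This is an illustration of the sentence in the diff, not lakeFS's actual implementation. The `Store` class, the `resolve_storage_id` function, and the choice of `minio-prod` as the backward-compatible store are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Store:
    """Hypothetical stand-in for one configured storage backend."""
    id: str
    backward_compatible: bool = False


def resolve_storage_id(requested: Optional[str], stores: Sequence[Store]) -> str:
    """Pick the storage ID for a new repository (illustrative only)."""
    known = {s.id for s in stores}
    if requested:
        # An explicitly assigned storage ID must match a configured store.
        if requested not in known:
            raise ValueError(f"unknown storage ID: {requested!r}")
        return requested
    # No ID given: fall back to the backward-compatible store, if any.
    for s in stores:
        if s.backward_compatible:
            return s.id
    raise ValueError(
        "a storage ID is required: no backward-compatible store is configured"
    )


# Assumption: "minio-prod" is the pre-existing store flagged as
# backward compatible; the doc does not say which store carries the flag.
stores = [Store("minio-prod", backward_compatible=True), Store("minio-backup")]
print(resolve_storage_id(None, stores))
print(resolve_storage_id("minio-backup", stores))
```

Under this reading, callers that never learned about storage IDs keep working, which is the non-breaking property the doc sentence is after.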
|
||
### Creating a Repository | ||
|
||
In a multi-store setup, users must specify a storage ID when creating a repository. This can be done using the following methods: |
Typically we put equivalent methods (CLI / API / UI / ...) in tabs.
Do we want to document how to do this in HTTP REST (which many people seem to do), or in the Java SDK?
> Typically we put equivalent methods (CLI / API / UI / ...) in tabs.

Great idea, I changed it!

> Do we want to document how to do this in HTTP REST (which many people seem to do), or in the Java SDK?

I'm deferring that for later, if that's OK with you.
Great detailed doc 👍
Closes #8765
and https://github.com/treeverse/lakeFS-Enterprise/issues/57