Docs for multiple storage backend support #8758

Merged: 25 commits, Mar 6, 2025

Conversation

@talSofer (Contributor) commented Mar 5, 2025

@talSofer added the `docs` (Improvements or additions to documentation) and `exclude-changelog` (PR description should not be included in next release changelog) labels on Mar 5, 2025
github-actions bot commented Mar 5, 2025

♻️ PR Preview 8b2f30f has been successfully destroyed since this PR has been closed.

🤖 By surge-preview

@talSofer requested a review from ozkatz, March 5, 2025 09:20
github-actions bot commented Mar 5, 2025

E2E Test Results - DynamoDB Local - Local Block Adapter

14 passed

github-actions bot commented Mar 5, 2025

E2E Test Results - Quickstart

11 passed

@ozkatz (Collaborator) left a comment:

Looks GREAT! A few nitpicks but nothing blocking


This example setup configures lakeFS to manage data across two separate MinIO instances.

Collaborator:

indentation is wrong (2 spaces everywhere)

Contributor Author:

done
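For reference, a corrected two-MinIO `blockstores` example with uniform 2-space indentation might look like the sketch below. This is only an illustration of the structure discussed in this PR; the store IDs, endpoints, and credentials are hypothetical placeholders:

```yaml
blockstores:
  signing:
    secret_key: "some-secret"
  stores:
    - id: minio-east            # hypothetical store ID
      type: s3
      s3:
        endpoint: http://minio-east.example.local:9000   # placeholder endpoint
        force_path_style: true
        credentials:
          access_key_id: "minioadmin"        # placeholder
          secret_access_key: "minioadmin"    # placeholder
    - id: minio-west            # hypothetical store ID
      type: s3
      s3:
        endpoint: http://minio-west.example.local:9000   # placeholder endpoint
        force_path_style: true
        credentials:
          access_key_id: "minioadmin"        # placeholder
          secret_access_key: "minioadmin"    # placeholder
```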


<div markdown="1" id="on-prem">

This example setup configures lakeFS to manage data across two separate MinIO instances.
Collaborator:

Suggested change
This example setup configures lakeFS to manage data across two separate MinIO instances.
This example setup configures lakeFS to manage data across two separate MinIO instances:

Contributor Author:

done


<div markdown="2" id="multi-cloud">

This example setup configures lakeFS to manage data across two public cloud providers: AWS and Azure.
Collaborator:

Suggested change
This example setup configures lakeFS to manage data across two public cloud providers: AWS and Azure.
This example setup configures lakeFS to manage data across two public cloud providers: AWS and Azure:

Contributor Author:

done
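A minimal sketch of what the multi-cloud (AWS + Azure) example might look like, assuming the single-blockstore `s3` and `azure` settings carry over under each store entry; account names, regions, and keys are placeholders:

```yaml
blockstores:
  signing:
    secret_key: "some-secret"
  stores:
    - id: aws-store             # hypothetical store ID
      type: s3
      s3:
        region: us-east-1       # placeholder region
    - id: azure-store           # hypothetical store ID
      type: azure
      azure:
        storage_account: examplestorageacct   # placeholder account
        storage_access_key: "..."             # placeholder key
```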

* If static credentials are provided, lakeFS will use them. Otherwise, it will fall back to the AWS credentials chain.
This means that for setups with multiple storages of type `s3`, static credentials are required for all but one.

### Upgrading from Single to Multi-Store
Collaborator:

Suggested change
### Upgrading from Single to Multi-Store
### Upgrading from a single storage backend to Multiple Storage backends

Contributor Author:

done

The [Get Config](https://docs.lakefs.io/reference/api.html#/config/getConfig) API endpoint now returns a list of storage
configurations. In multi-store setups, this is the recommended method to list connected storage backends and view their details.

### Common Configuration Errors & Fixes
Collaborator:

Suggested change
### Common Configuration Errors & Fixes
### Troubleshooting

* Amazon S3
* Local storage

While this feature is designed to support any blockstore combination, testing for Azure and GCS in multi-store setups is
Collaborator:

I'd remove this line for now. It's not a limitation, it just wasn't tested.

Contributor Author:

done

@guy-har (Contributor) left a comment:

Added some comments. Approving in order not to be a blocker.

---
title: Multiple Storage Backends
description: How to manage data across multiple storage systems with lakeFS
parent: How-To
Contributor:

Not sure this is the parent

Contributor Author:

I agree. I opened a task to move features to the right place in a separate PR.

{: .label .label-purple }

{: .note}
> Multi-storage backend support is only available for licensed [lakeFS Enterprise]({% link enterprise/index.md %}) customers.<br>
Contributor:

Just a thought: Is this correct? IIUC it's not supported for them either

Contributor Author:

why not? it is available for licensed customers

Comment on lines 25 to 26
With a multi-store setup, lakeFS can connect to and manage any combination of supported storage systems, including AWS S3,
Azure Blob, Google Cloud Storage, other S3-compatible storage, and even local storages.
Contributor:

Just a suggestion

Suggested change
With a multi-store setup, lakeFS can connect to and manage any combination of supported storage systems, including AWS S3,
Azure Blob, Google Cloud Storage, other S3-compatible storage, and even local storages.
With a multi-store setup, lakeFS can connect to and manage any combination of supported storage systems, including:
- AWS S3
- Azure Blob
- Google Cloud Storage
- other S3-compatible storage (such as minIO)
- local storage

Contributor Author:

done

* Use the new `blockstores` structure, **replacing** the existing `blockstore` configuration. Note that `blockstore` and `blockstores`
configurations are mutually exclusive - lakeFS does not support both simultaneously.
* Define all previously available [single-blockstore settings](../reference/configuration.md#blockstore) under their respective storage backends.
* The `signing.secret_key` remains a required global setting.
Contributor:

What do you mean by remains?

Contributor Author:

I changed wording, thx

* Restart the server.

{: .warning}
> Repositories linked to a removed storage backend will result in unexpected behavior. Ensure all necessary cleanup is done before removal.
Contributor:

lakeFS will fail to start; it's not unexpected.

Contributor Author:

thx, changed it

Member:

Adding to Guy's comment - it will also make the repositories associated with the storage ID unusable

Comment on lines 184 to 185
The [Get Config](https://docs.lakefs.io/reference/api.html#/config/getConfig) API endpoint now returns a list of storage
configurations. In multi-store setups, this is the recommended method to list connected storage backends and view their details.
Contributor:

Not sure if you want to be concrete here, but the API returns both a string and a list and the user should use the relevant

* Restart the server.

{: .warning}
> Repositories linked to a removed storage backend will result in unexpected behavior. Ensure all necessary cleanup is done before removal.
Contributor:

I think we'd want this kind of warning for replacing IDs. It's mentioned above, but maybe we'd like to emphasize it.

Contributor Author:

done

### Unsupported storage backends

Multi-storage backend support has been validated on:
* Self-managed S3-compatible object storage (e.g., MinIO)
Contributor:

I don't think "e.g." fits here; it's only tested on MinIO.

still in progress.

{: .note}
> **Note:** Other untested combinations may still work. You are encouraged to try them and share feedback.
Contributor:

I would replace "encouraged to try them"; I'd prefer something like "if you are interested, please contact us".
No one can really try it.

Contributor Author:

good call, done

Member:

Strongly agree

@@ -34,8 +34,11 @@ With lakeFS Enterprise you’ll receive access to the security package containin

## What additional functionality does lakeFS Enterprise provide?

1. [lakeFS Mount]({% link reference/mount.md %}) allows users to virtually mount a remote lakeFS repository onto a local directory. Once mounted, users can access the data as if it resides on their local filesystem, using any tool, library, or framework that reads from a local filesystem.
1. [lakeFS Mount]({% link reference/mount.md %}) - allows users to virtually mount a remote lakeFS repository onto a local directory. Once mounted, users can access the data as if it resides on their local filesystem, using any tool, library, or framework that reads from a local filesystem.
Contributor:

depends on how we communicate mount - we mount a branch

Contributor Author:

This is outside the scope of this PR; let's defer it for later?

Contributor:

Probably a different issue - in the docs we specify the additional field as "storage id", while in the UI it is labeled "storage".

for all organizational data assets, which is especially critical in AI/ML environments that rely on diverse datasets stored
in multiple locations.

With a multi-store setup, lakeFS can connect to and manage any combination of supported storage systems, including AWS S3,
Contributor:

multi-storage backend

For a complete list of available options, refer to the [server configuration reference](../reference/configuration.md#blockstores).

{: .note}
> **Note:** If you're upgrading from a single-store lakeFS setup, refer to the [upgrade guidelines](#upgrading-from-single-to-multi-store)
Contributor:

I think we need to use the same term "multi-storage" instead of store in all the doc

Contributor Author:

done

### Listing Connected Storage Backends

The [Get Config](https://docs.lakefs.io/reference/api.html#/config/getConfig) API endpoint now returns a list of storage
configurations. In multi-store setups, this is the recommended method to list connected storage backends and view their details.
Contributor:

multi-storage

Contributor Author:

done

**Note**: The `--storage-id` flag is currently hidden in the CLI.
* UI: Select a storage backend from the dropdown menu.
![create repo with storage id](../assets/img/msb/msb_create_repo_ui.png)
* High-level Python SDK: TODO
Contributor:

do we need to include a TODO here? Is it a limitation, or can we update the docs?

### Viewing Repository Details

To check which storage backend is associated with a repository:
* API – The [List Repositories](https://docs.lakefs.io/reference/api.html#/repositories/listRepositories) response includes the storage ID.
Contributor:

the link needs to point to a markdown page within the docs

* `blockstores.stores[].s3.pre_signed_endpoint` `(string : )` - Custom endpoint for pre-signed URLs.
* `blockstores.stores[].s3.disable_pre_signed` `(bool : false)` - Disable use of pre-signed URL.
* `blockstores.stores[].s3.disable_pre_signed_ui` `(bool : true)` - Disable use of pre-signed URL in the UI.
* `blockstores.stores[].s3.disable_pre_signed_multipart` `(bool : )` - Disable use of pre-signed multipart upload **experimental**, enabled on s3 block adapter with presign support.
Contributor:

Suggested change
* `blockstores.stores[].s3.disable_pre_signed_multipart` `(bool : )` - Disable use of pre-signed multipart upload **experimental**, enabled on s3 block adapter with presign support.
* `blockstores.stores[].s3.disable_pre_signed_multipart` `(bool : )` - Disable use of pre-signed multipart upload **experimental**, enabled on S3 block adapter with presign support.

* Access data across multiple storage backends using a single, consistent [URI format](../understand/model.md#lakefs-protocol-uris).

3. **Centralized Access Control & Governance**:
* Access permissions and policies can be centrally managed across all connected storage systems using lakeFS [RBAC](../security/rbac.md).
Contributor:

not sure we need to specify this - but we currently do not provide a mechanism to control access based on storage ID

Contributor Author:

Thanks. I think we shouldn't specify this, at least not here where we're walking through the high-level use cases.
For now, I'd prefer not to mention it.

blockstores:
signing:
secret_key: "some-secret"
stores:
Contributor:

wrong level

Contributor:

This is for stores and everything below it.

Contributor Author:

done, thank you!

@nopcoder nopcoder dismissed their stale review March 5, 2025 17:22

not required to block on docs changes

**Note**: The `--storage-id` flag is currently hidden in the CLI.
* UI: Select a storage backend from the dropdown menu.
![create repo with storage id](../assets/img/msb/msb_create_repo_ui.png)
* High-level Python SDK: TODO
Contributor Author:

@N-o-Z can you please assist in completing this part?



@N-o-Z (Member) left a comment:

Very detailed and well phrased.
Some comments/suggestions

1. **Distributed Data Management**:
* Eliminate data silos and enable seamless cross-cloud collaboration.
* Maintain version control across different storage providers for consistency and reproducibility.
* Ideal for AI/ML environments where datasets are distributed across multiple storage locations.
Member:

Is this for SEO purposes?
I think removing the AI/ML makes this bullet more inclusive

Contributor Author:

Not for SEO, but to make it clear who we think our target customers are.
I think making this more inclusive is not necessarily a good thing here. I want people to easily understand what's in it for them, and that's one way to do it.

Comment on lines 47 to 48
To configure your lakeFS server to connect to multiple storage backends, define them under the `blockstores`
section in your server configurations. The `blockstores.stores` field is an array of storage backends, each with its own configuration.
Member:

Suggested change
To configure your lakeFS server to connect to multiple storage backends, define them under the `blockstores`
section in your server configurations. The `blockstores.stores` field is an array of storage backends, each with its own configuration.
To configure your lakeFS server to connect to multiple storage backends, define them under the `blockstores` section in your server configurations.
The `blockstores.stores` field is an array of storage backends, each with its own configuration.

Contributor Author:

done

* S3 Authentication Handling:
* All standard S3 authentication methods are supported.
* If static credentials are provided, lakeFS will use them. Otherwise, it will fall back to the AWS credentials chain.
This means that for setups with multiple storages of type `s3`, static credentials are required for all but one.
Member:

Maybe we want to caveat that with the usage of profiles - they're not necessarily defined as static credentials, but can still be used to define multiple AWS blockstores.
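A sketch of what the credentials caveat means in practice: with two `s3`-type stores, only one can rely on the AWS credentials chain; the other needs its own explicit credentials. Store IDs, regions, and keys below are placeholders:

```yaml
blockstores:
  signing:
    secret_key: "some-secret"
  stores:
    - id: aws-primary           # placeholder ID; resolves credentials via the AWS chain
      type: s3
      s3:
        region: us-east-1       # placeholder region
    - id: aws-secondary         # placeholder ID; needs static credentials
      type: s3
      s3:
        region: eu-west-1       # placeholder region
        credentials:
          access_key_id: "AKIA..."        # placeholder
          secret_access_key: "..."        # placeholder
```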

configurations are mutually exclusive - lakeFS does not support both simultaneously.
* Define all previously available [single-blockstore settings](../reference/configuration.md#blockstore) under their respective storage backends.
* The `signing.secret_key` remains a required global setting.
* Set `backward_compatible: true` for the existing storage backend to ensure:
Member:

I feel like we need to be much more explicit here about the fact that things will not work correctly without it and that it cannot be removed.

Contributor Author:

good call, I added your points
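The upgrade step being discussed can be sketched roughly as follows: the storage previously configured under the single `blockstore` key moves into `blockstores.stores` with `backward_compatible: true`, so repositories created before the migration keep working. The ID and region are placeholders:

```yaml
blockstores:
  signing:
    secret_key: "some-secret"
  stores:
    - id: legacy-s3             # placeholder ID for the storage previously under `blockstore`
      backward_compatible: true # required so existing repositories keep working
      type: s3
      s3:
        region: us-east-1       # placeholder region
```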

* Restart the server.

{: .warning}
> Repositories linked to a removed storage backend will result in unexpected behavior. Ensure all necessary cleanup is done before removal.
Member:

Adding to Guy's comment - it will also make the repositories associated with the storage ID unusable

### Importing Data into a Repository

Importing data into a repository is supported when the following conditions are met:
* The credentials used for the repository's backing blockstore allow access to the storage location.
Member:

Suggested change
* The credentials used for the repository's backing blockstore allow access to the storage location.
* The credentials used for the repository's backing blockstore allow access (read, list) to the storage location.

Contributor Author:

done


Importing data into a repository is supported when the following conditions are met:
* The credentials used for the repository's backing blockstore allow access to the storage location.
* The storage location is in the same region as the repository's backend.
Member:

I don't believe that is true. I don't think we enforce region

still in progress.

{: .note}
> **Note:** Other untested combinations may still work. You are encouraged to try them and share feedback.
Member:

Strongly agree

@@ -211,6 +211,70 @@ Configuration section when using `database.type="local"`
* `blockstore.gs.server_side_encryption_customer_supplied` `(string : )` - Server side encryption with AES key in hex format, exclusive with key ID below
* `blockstore.gs.server_side_encryption_kms_key_id` `(string : )` - Server side encryption KMS key ID, exclusive with above

### blockstores
Member:

I think the best thing we could do would be to create separate documentation for enterprise configurations. If that's not possible, perhaps separate the configuration documentation clearly and have an "Enterprise" section where the enterprise configurations are detailed.
I wouldn't want, as an OSS reader, to wade through enterprise configuration while trying to understand what configurations I need.

Contributor Author:

I like the idea of separating Enterprise only configurations until we have separate docs - thanks!

@itaigilo (Contributor) left a comment:

Thanks for this great doc @talSofer !

Added some comments.

2. [Transactional Mirroring]({% link howto/mirroring.md %}) - allows replicating lakeFS repositories into consistent read-only copies in remote locations.
3. [Multiple Storage Backends]({% link howto/multiple-storage-backends.md %}) - allows managing data stored across multiple storage locations: on-prem, hybrid, or multi-cloud.

<br>
Contributor:

Nit: <br> is html, shouldn't belong in md.

parent: How-To
---

# Multi-Storage Backend Support
Contributor:

Suggested change
# Multi-Storage Backend Support
# Multi-Storage Backend

Contributor Author:

done


{% include toc.html %}

## What is Multi-storage Backend Support?
Contributor:

Here also - every feature could have "Support" in its name.
I think we should simply use "Multi-Storage Backend".

Contributor Author:

I disagree. I changed the title, but the sentence "What is Multi-Storage Backend?" has a different answer than the one I expect for "What is Multi-Storage Backend Support?" - the latter relates to what it means in lakeFS.


## What is Multi-storage Backend Support?

lakeFS multi-storage backend support enables seamless data management across multiple storage systems —
Contributor:

Suggested change
lakeFS multi-storage backend support enables seamless data management across multiple storage systems —
Using lakeFS multi-storage backend enables seamless data management across multiple storage systems —

Contributor Author:

disagree here too

* Ideal for AI/ML environments where datasets are distributed across multiple storage locations.

2. **Unified Data Access**:
* Access data across multiple storage backends using a single, consistent [URI format](../understand/model.md#lakefs-protocol-uris).
Contributor:

Nit: this has a different indentation than 1 and 3.

Contributor Author:

done

After setting up lakeFS Enterprise to connect with multiple storage backends, this section explains how to use these
connected storages when working with lakeFS.

With multiple storage backends configured, lakeFS repositories are now linked to a specific storage backend. Together with
Contributor:

Suggested change
With multiple storage backends configured, lakeFS repositories are now linked to a specific storage backend. Together with
With multiple storage backends configured, lakeFS repositories are now linked to a specific storage. Together with

Contributor Author:

done


In a multi-store setup, users must specify a storage ID when creating a repository. This can be done using the following methods:
* API: Use the `storage_id` parameter in the [Create Repository endpoint](https://docs.lakefs.io/reference/api.html#/repositories/createRepository).
* CLI: Use the `storage-id` flag with the [repo create](../reference/cli.md#lakectl-repo-create) command:
Contributor:

--storage-id maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

### Creating a Repository

In a multi-store setup, users must specify a storage ID when creating a repository. This can be done using the following methods:
* API: Use the `storage_id` parameter in the [Create Repository endpoint](https://docs.lakefs.io/reference/api.html#/repositories/createRepository).
**Reviewer:** Suggestion: turn these (API, CLI, WebUI, Python HL SDK) into headers:

#### API

Use the...

#### CLI

Use the...

...

**Author:** done
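To make the methods discussed above concrete, here is a minimal sketch of a repository-creation request body that carries the `storage_id` field. The endpoint path follows the Create Repository API the doc links to; the host, repository name, bucket, and storage ID are all illustrative placeholders, not values from this PR.

```python
import json

def create_repository_body(name, storage_namespace, storage_id=None):
    """Build the JSON body for the Create Repository endpoint.

    In a multi-store setup, `storage_id` must match the `id` of one of the
    configured blockstores; on a single-store installation it can be omitted.
    """
    body = {"name": name, "storage_namespace": storage_namespace}
    if storage_id is not None:
        body["storage_id"] = storage_id
    return body

# Illustrative values only:
payload = create_repository_body(
    name="ml-datasets",
    storage_namespace="s3://prod-bucket/repos/ml-datasets",
    storage_id="minio-prod",  # must match an `id` from the blockstores config
)
print(json.dumps(payload, sort_keys=True))

# The actual call would then be something like:
#   requests.post(f"{endpoint}/api/v1/repositories",
#                 json=payload, auth=(access_key, secret_key))
```

The equivalent CLI invocation, using the flag spelling agreed on above, would be along the lines of `lakectl repo create lakefs://ml-datasets s3://prod-bucket/repos/ml-datasets --storage-id minio-prod`.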


## Limitations

### Unsupported storage backends
**Reviewer:** Suggested change: rename "Unsupported storage backends" to "Supported storages".

**Author:** agree, done

**@arielshaqed** left a comment: Thanks! Impressive path through all the docs...

{: .label .label-purple }

{: .note}
> Multi-storage backend support is only available for licensed [lakeFS Enterprise]({% link enterprise/index.md %}) customers.<br>
**Reviewer:** Nit, suggested change (use "to" rather than "for", and drop the trailing `<br>`):

> Multi-storage backend support is only available to licensed [lakeFS Enterprise]({% link enterprise/index.md %}) customers.

**Author:** done

## What is Multi-storage Backend Support?

lakeFS multi-storage backend support enables seamless data management across multiple storage systems —
on-premises, across public clouds, or hybrid environments. This capability makes lakeFS a unified data management platform
**Reviewer:** Cross-cloud support may be a trap. Obviously it will work, but most use cases will involve paying egress fees to at least one of the cloud providers. Do we want to go there with our customers?

**Author:** Good point. This feature sets the grounds for supporting cross-cloud in a way that is egress-fee aware: I can now manage cross-cloud data from a single platform. Making it more cost-efficient is another (very reasonable) feature to build.


### Example Configurations

<div class="tabs">
**Reviewer:** "tabs"... this is some CSS thing that renders multiple selectable tabs.

access_key_id: "prod_access_key"
secret_access_key: "prod_secret_key"
- id: "minio-backup"
description: "Backup MinIO storage for disaster recovery"
**Reviewer:** We should document how disaster recovery works.

**Author:** Can you please elaborate on that?
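For context, a hedged reconstruction of the two-MinIO example the snippet above is quoted from. The key layout (`blockstores`, `signing.secret_key`, per-store `id`/`description`) follows the multi-blockstore configuration discussed in this PR, but the endpoints, IDs, and credentials are placeholders:

```yaml
blockstores:
  signing:
    secret_key: "shared-signing-secret"   # required, global to all connected stores
  stores:
    - id: "minio-prod"
      description: "Primary MinIO storage"
      type: s3
      s3:
        endpoint: "https://minio-prod.example.com"    # placeholder endpoint
        force_path_style: true
        credentials:
          access_key_id: "prod_access_key"
          secret_access_key: "prod_secret_key"
    - id: "minio-backup"
      description: "Backup MinIO storage for disaster recovery"
      type: s3
      s3:
        endpoint: "https://minio-backup.example.com"  # placeholder endpoint
        force_path_style: true
        credentials:
          access_key_id: "backup_access_key"
          secret_access_key: "backup_secret_key"
```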

* Define all previously available [single-blockstore settings](../reference/configuration.md#blockstore) under their respective storage backends.
* The `signing.secret_key` is a required setting global to all connected stores.
* Set `backward_compatible: true` for the existing storage backend to ensure:
* Existing repositories continue using the original storage backend.
**Reviewer:** Suggested change: "Existing repositories continue using the original storage backend." → "Existing repositories continue to use the original storage backend."

**Author:** done

* The `signing.secret_key` is a required setting global to all connected stores.
* Set `backward_compatible: true` for the existing storage backend to ensure:
* Existing repositories continue using the original storage backend.
* Newly created repositories default to this backend unless explicitly assigned a different one, to ensure a non-breaking upgrade process.
**Reviewer:** We are re-opening the whole default vs. b/c debate here. I would much prefer not to say "upgrade" here, given that upgrades might not be possible when using external tools, or for customers whose users do not know their storage IDs.

**Author:** @arielshaqed, I'm not sure I understand your suggestion. Can you please help me understand?
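To make the backward-compatibility behavior discussed above concrete, a sketch of how the flag might appear in the configuration. The `backward_compatible` key is taken from the doc excerpt; the store IDs and per-store settings are illustrative assumptions, not values from this PR:

```yaml
blockstores:
  signing:
    secret_key: "shared-signing-secret"
  stores:
    - id: "legacy-s3"              # the pre-existing single blockstore
      backward_compatible: true    # existing repos keep using this store; new
                                   # repos default to it unless a storage ID
                                   # is given explicitly
      type: s3
      s3:
        region: "us-east-1"
    - id: "minio-new"
      type: s3
      s3:
        endpoint: "https://minio.example.com"
        force_path_style: true
```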


### Creating a Repository

In a multi-store setup, users must specify a storage ID when creating a repository. This can be done using the following methods:
**Reviewer:** Typically we put equivalent methods (CLI / API / UI / ...) in tabs.

Do we want to document how to do this in HTTP REST (which many people seem to do), or in the Java SDK?

**Author:**
> Typically we put equivalent methods (CLI / API / UI / ...) in tabs.

Great idea, I changed it!

> Do we want to document how to do this in HTTP REST (which many people seem to do), or in the Java SDK?

I'm deferring that for later, if that's ok with you.
@talSofer talSofer requested a review from itaigilo March 6, 2025 09:43
**@itaigilo** left a comment: Great detailed doc 👍

@talSofer talSofer merged commit c92f605 into master Mar 6, 2025
43 of 44 checks passed
@talSofer talSofer deleted the docs/multi-storage-backends branch March 6, 2025 11:56
Labels
docs Improvements or additions to documentation exclude-changelog PR description should not be included in next release changelog

Successfully merging this pull request may close these issues.

Docs for multiple storage backend
7 participants