
Thinking about disaster recovery #534

Open
bingkunyangvungle opened this issue Apr 8, 2024 · 9 comments

Comments

@bingkunyangvungle

What is currently missing?

Since all the data we need is already in S3 (both the log data and the metadata), it would be of great help if we could recover from a shutdown of the cluster by launching a new cluster that uses the same topic folder (prefix_randomstring) in S3. The new cluster could then consume from the oldest offset in the old topic folder. In this way, we could fully achieve disaster recovery.

How could this be improved?

Some suggestions:
1. Enable a topic in the new cluster to be created with a designated folder (with the random string) in S3;
2. Store new messages for the topic in that existing folder.

Is this a feature you would work on yourself?

I haven't taken a look at the actual code yet, but if someone can point me to the relevant part of the codebase, I'd be happy to start looking.

@ivanyu
Contributor

ivanyu commented Apr 9, 2024

Hi @bingkunyangvungle.
Such restoration is certainly possible. Do you see particular changes that need to be made to this RSM plugin?

@ivanyu
Contributor

ivanyu commented Apr 9, 2024

Kafka doesn't really support this out of the box, and quite a bit of work, some of it hacky, needs to be done on a fresh cluster to make it correctly pick up an "attached" remote storage. To start with, topic IDs must match between the remote storage and the cluster, but the user doesn't control the IDs of newly created topics.

@bingkunyangvungle
Author

bingkunyangvungle commented Apr 9, 2024

This is the part I thought might be tricky. One proposal I can think of is to keep an internal mapping between the Kafka-created topic ID <------> a plugin-managed topic ID, and have the plugin manage the remote storage folder using the plugin-managed topic ID.
The user could also provide/configure the plugin-managed topic ID when provisioning a topic. Of course, the plugin would also need to check whether the remote storage folder exists or not.

Just tossing some ideas out here to share.

@funky-eyes
Contributor

> Kafka doesn't really support this out of the box and quite a bit of work, sometimes hacky, needs to be done on a fresh cluster to make it correctly pick up an "attached" remote storage. Start from that topic IDs must match between the remote storage and the cluster, but the user doesn't control IDs of topics being created.

Perhaps we can ignore the topic ID in ObjectKeyFactory#mainPath, adding some configuration option to achieve this?
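
To make the idea concrete, something like the following, purely as a hypothetical sketch (the real ObjectKeyFactory#mainPath layout and any configuration flag name would have to match the plugin's actual implementation): gate the topic-ID portion of the object key behind a config option, so that a new cluster with a different topic ID still resolves to the same S3 prefix as the old one.

```java
// Hypothetical sketch only; not the plugin's actual code.
static String mainPath(final RemoteLogSegmentMetadata metadata, final boolean omitTopicId) {
    final TopicIdPartition tip = metadata.remoteLogSegmentId().topicIdPartition();
    final String topicDir = omitTopicId
        ? tip.topic()                                    // stable across clusters
        : tip.topic() + "-" + tip.topicId().toString();  // default: includes the topic ID
    return topicDir + "/" + tip.partition();
}
```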

@ivanyu
Contributor

ivanyu commented Jun 6, 2024

This probably also doesn't have much to do with the plugin. The plugin is a fairly passive component and performs only a limited number of operations at the broker's request. I think this should be a separate tool, and/or certain broker modifications would be needed to support restoration like this in the first place.

We're exploring this idea at Aiven, but for the reason mentioned above, I don't think the plugin will undergo much change in the course of this.

@jrmcclurg

jrmcclurg commented Feb 9, 2025

I recently started thinking about this type of disaster recovery after experimenting with deleting persistent disks in my Azure test cluster. I would like to at least come up with a script (or scripts) that can re-create topic data locally from specified topics in blob storage.

Has functionality like this already been implemented? If not, do you see any issues with the following general approach?

  1. Construct a RemoteLogSegmentMetadata from the .rsm-manifest file in remote storage.
  2. Use RemoteStorageManager.fetchLogSegment(...) to download the .log file, and similarly use fetchIndex(...) to get the .index file.
  3. Use the second parameter of fetchIndex(...) to also download the .timeindex and .snapshot files.

Edit: I tried it, and it seems to work. I disabled remote storage, and Kafka was able to read log messages using a log directory reconstructed as above. However, when I then turned remote storage back on, things went wrong: Kafka deleted the topic data from blob storage, which I don't completely understand. I'll do a bit more testing.
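
For reference, here's the rough sketch I'm working from for steps 1–3. It assumes the Aiven plugin's RemoteStorageManager class and storage configuration keys, and hard-codes metadata values that would in practice be parsed from the .rsm-manifest file and the object key in remote storage; the IDs, offsets, bucket name, and paths below are placeholders.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

import org.apache.kafka.common.TopicIdPartition;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.Uuid;
import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentId;
import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentMetadata;
import org.apache.kafka.server.log.remote.storage.RemoteStorageManager;
import org.apache.kafka.server.log.remote.storage.RemoteStorageManager.IndexType;

public class SegmentRestoreSketch {
    public static void main(String[] args) throws Exception {
        // 1. Rebuild the segment metadata. In a real script these values come from
        //    the .rsm-manifest and the object key in remote storage; placeholders here.
        TopicIdPartition tip = new TopicIdPartition(
                Uuid.randomUuid(),                       // replace with the original topic ID
                new TopicPartition("topic", 0));
        RemoteLogSegmentMetadata metadata = new RemoteLogSegmentMetadata(
                new RemoteLogSegmentId(tip, Uuid.randomUuid()),  // replace with the segment ID
                0L,                      // startOffset
                10L,                     // endOffset
                1700000000000L,          // maxTimestampMs
                0,                       // brokerId
                1700000000000L,          // eventTimestampMs
                1024,                    // segmentSizeInBytes
                Map.of(0, 0L, 1, 5L));   // segmentLeaderEpochs

        // 2. + 3. Use the plugin itself to pull the segment and its indexes back down.
        try (RemoteStorageManager rsm = new io.aiven.kafka.tieredstorage.RemoteStorageManager()) {
            rsm.configure(Map.of(        // assumed config keys; same backend as the old cluster
                    "storage.backend.class", "io.aiven.kafka.tieredstorage.storage.s3.S3Storage",
                    "storage.s3.bucket.name", "my-bucket",
                    "chunk.size", "4194304"));

            Path dir = Files.createDirectories(Path.of("/tmp/restore/topic-0"));
            try (InputStream in = rsm.fetchLogSegment(metadata, 0)) {
                Files.copy(in, dir.resolve("00000000000000000000.log"));
            }
            try (InputStream in = rsm.fetchIndex(metadata, IndexType.OFFSET)) {
                Files.copy(in, dir.resolve("00000000000000000000.index"));
            }
            try (InputStream in = rsm.fetchIndex(metadata, IndexType.TIMESTAMP)) {
                Files.copy(in, dir.resolve("00000000000000000000.timeindex"));
            }
            try (InputStream in = rsm.fetchIndex(metadata, IndexType.PRODUCER_SNAPSHOT)) {
                Files.copy(in, dir.resolve("00000000000000000000.snapshot"));
            }
        }
    }
}
```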

@giuseppelillo
Contributor

@jrmcclurg You also need to make sure that the correct offsets and leader epochs are restored when re-creating a topic from remote segments.
RemoteLogSegmentMetadata contains the information about all the leader epochs that a segment has seen, as well as the end offset of the segment.
Let's say you have a remote segment for a partition topic-0 with endOffset = 10 and segmentLeaderEpochs = {0 -> 0; 1 -> 5}.
You need to make sure that partition 0 of the newly created topic has a high watermark of 11, and that the leader epoch for that partition is at least 5. These parameters are configured inside the checkpoint files on disk (replication-offset-checkpoint, recovery-point-offset-checkpoint, leader-epoch-checkpoint).
If those files are not properly set up, Kafka will consider the segment invalid (for example, a newly created partition starts from leader epoch 0 and offset 0, so Kafka will see that the remote segment has a leader epoch of 5) and issue a deletion of the segment.
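
To make that concrete for the example above, here is a minimal sketch of what those checkpoint files could look like, assuming Kafka's version-0 checkpoint format, a topic named `topic`, and that the recovery point is set to the same offset as the high watermark. The `#` annotations are not part of the file format.

```
# <log.dir>/replication-offset-checkpoint  (and similarly recovery-point-offset-checkpoint)
0             # version
1             # number of entries
topic 0 11    # topic, partition, offset (end offset of the last remote segment + 1)

# <log.dir>/topic-0/leader-epoch-checkpoint
0             # version
2             # number of entries
0 0           # leader epoch 0 started at offset 0
1 5           # leader epoch 1 started at offset 5
```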

@jrmcclurg

@giuseppelillo Many thanks, this is extremely helpful! On a slightly related note, would I also need to reconstruct the __remote_log_metadata (and also __cluster_metadata) topics based on the remote segments, or is Kafka able to recover without these?

@giuseppelillo
Contributor

giuseppelillo commented Feb 12, 2025

@jrmcclurg Yes, you absolutely need to reconstruct the __remote_log_metadata topic.

I'm not sure about __cluster_metadata; I guess it depends on the order of operations. For example, let's say you have some remote segments stored in object storage for partition topic-0. If you want this topic to be present (and to contain all the data stored in object storage) as soon as you start your disaster-recovery Kafka cluster, then yes, you will also have to reconstruct __cluster_metadata so that the brokers immediately know about topic-0.
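
In case it helps, a hedged sketch of what re-publishing one segment's metadata into __remote_log_metadata could look like. It assumes the serde and partitioner classes used by Kafka's topic-based RemoteLogMetadataManager; the bootstrap address, the metadata partition count, and the RemoteLogSegmentMetadata object itself are placeholders, and a matching RemoteLogSegmentMetadataUpdate with state COPY_SEGMENT_FINISHED would also have to be published for each segment.

```java
import java.util.Map;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.server.log.remote.metadata.storage.RemoteLogMetadataTopicPartitioner;
import org.apache.kafka.server.log.remote.metadata.storage.serialization.RemoteLogMetadataSerde;
import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentMetadata;

public class MetadataRepublishSketch {
    static final String METADATA_TOPIC = "__remote_log_metadata";

    // Publishes one reconstructed segment-metadata record; numMetadataPartitions must
    // match remote.log.metadata.topic.num.partitions on the new cluster.
    public static void publish(RemoteLogSegmentMetadata metadata, int numMetadataPartitions) {
        RemoteLogMetadataSerde serde = new RemoteLogMetadataSerde();
        RemoteLogMetadataTopicPartitioner partitioner =
                new RemoteLogMetadataTopicPartitioner(numMetadataPartitions);
        Map<String, Object> producerConfig = Map.of("bootstrap.servers", "localhost:9092");

        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(
                producerConfig, new ByteArraySerializer(), new ByteArraySerializer())) {
            // The partitioner decides which __remote_log_metadata partition owns this
            // user partition, so the record lands where the RLMM will look for it.
            int partition = partitioner.metadataPartition(metadata.topicIdPartition());
            producer.send(new ProducerRecord<>(METADATA_TOPIC, partition, null,
                    serde.serialize(metadata)));
            // A RemoteLogSegmentMetadataUpdate with COPY_SEGMENT_FINISHED is also needed,
            // otherwise the segment is treated as an incomplete copy.
            producer.flush();
        }
    }
}
```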
