
Thinking about disaster recovery #534

Open
bingkunyangvungle opened this issue Apr 8, 2024 · 9 comments

Comments

@bingkunyangvungle

What is currently missing?

Since all the data we need is already in S3 (both the log data and the metadata), it would be of great help if we could recover from a shutdown of the cluster by launching a new cluster that uses the same topic folder (prefix_randomstring) in S3. The new cluster could then consume from the oldest offset in the old topic folder. In this way, we could fully achieve disaster recovery.

How could this be improved?

Some suggestions:
1. Enable a topic in the new cluster to be created with a designated folder (with the random string) in S3;
2. Store new messages for the topic in that existing folder.

Is this a feature you would work on yourself?

I haven't taken a look at the actual code yet, but if someone can point me to the relevant part of the codebase, I'd be happy to start looking.

@ivanyu
Contributor

ivanyu commented Apr 9, 2024

Hi @bingkunyangvungle.
Such restoration is certainly possible. Do you see particular changes that need to be made to this RSM plugin?

@ivanyu
Contributor

ivanyu commented Apr 9, 2024

Kafka doesn't really support this out of the box, and quite a bit of work, some of it hacky, needs to be done on a fresh cluster to make it correctly pick up an "attached" remote storage. To start with, topic IDs must match between the remote storage and the cluster, but the user doesn't control the IDs of newly created topics.

@bingkunyangvungle
Author

bingkunyangvungle commented Apr 9, 2024

This is the part I thought might be tricky. One proposal I can think of is to keep an internal mapping between the Kafka-created topic ID <------> a plugin-managed topic ID, and have the plugin manage the remote storage folder using the plugin-managed topic ID.
The user could also provide/configure the plugin-managed topic ID when provisioning a topic. Of course, the plugin would also need to check whether the remote storage folder exists or not.

Just tossing some ideas out here to share.

@funky-eyes
Contributor

> Kafka doesn't really support this out of the box and quite a bit of work, sometimes hacky, needs to be done on a fresh cluster to make it correctly pick up an "attached" remote storage. Start from that topic IDs must match between the remote storage and the cluster, but the user doesn't control IDs of topics being created.

Perhaps we can ignore the topic ID in ObjectKeyFactory#mainPath, adding some configuration option to achieve this?
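
To make the idea concrete, something like the following, purely as a hypothetical sketch (the real ObjectKeyFactory#mainPath layout and any configuration flag name would have to match the plugin's actual implementation): gate the topic-ID portion of the object key behind a config option, so that a new cluster with a different topic ID still resolves to the same S3 prefix as the old one.

```java
// Hypothetical sketch only; not the plugin's actual code.
static String mainPath(final RemoteLogSegmentMetadata metadata, final boolean omitTopicId) {
    final TopicIdPartition tip = metadata.remoteLogSegmentId().topicIdPartition();
    final String topicDir = omitTopicId
        ? tip.topic()                                    // stable across clusters
        : tip.topic() + "-" + tip.topicId().toString();  // default: includes the topic ID
    return topicDir + "/" + tip.partition();
}
```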

@ivanyu
Contributor

ivanyu commented Jun 6, 2024

This probably also doesn't have much to do with the plugin. The plugin is a fairly passive component and performs only a limited number of operations at the broker's request. I think this should be a separate tool, and/or certain broker modifications would be needed to support restoration like this in the first place.

We're exploring this idea at Aiven, but for the reason mentioned above, I don't think the plugin will undergo much change in the course of this.

@jrmcclurg

jrmcclurg commented Feb 9, 2025

I recently started thinking about this type of disaster recovery after experimenting with deleting persistent disks in my Azure test cluster. I would like to at least come up with a script (or scripts) that can re-create topic data locally from specified topics in blob storage.

Has functionality like this already been implemented? If not, do you see any issues with the following general approach?

  1. Construct a RemoteLogSegmentMetadata from the .rsm-manifest file in remote storage.
  2. Use RemoteStorageManager.fetchLogSegment(...) to download the .log file, and similarly use fetchIndex(...) to get the .index file.
  3. Use the second parameter of fetchIndex(...) to also download the .timeindex and .snapshot files.

Edit: I tried it, and it seems to work. I disabled remote storage, and Kafka was able to read log messages using a log directory reconstructed as above. However, when I then turned remote storage back on, things went wrong: Kafka deleted the topic data from blob storage, which I don't completely understand. I'll do a bit more testing.
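
For reference, here's the rough sketch I'm working from for steps 1–3. It assumes the Aiven plugin's RemoteStorageManager class and storage configuration keys, and hard-codes metadata values that would in practice be parsed from the .rsm-manifest file and the object key in remote storage; the IDs, offsets, bucket name, and paths below are placeholders.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

import org.apache.kafka.common.TopicIdPartition;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.Uuid;
import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentId;
import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentMetadata;
import org.apache.kafka.server.log.remote.storage.RemoteStorageManager;
import org.apache.kafka.server.log.remote.storage.RemoteStorageManager.IndexType;

public class SegmentRestoreSketch {
    public static void main(String[] args) throws Exception {
        // 1. Rebuild the segment metadata. In a real script these values come from
        //    the .rsm-manifest and the object key in remote storage; placeholders here.
        TopicIdPartition tip = new TopicIdPartition(
                Uuid.randomUuid(),                       // replace with the original topic ID
                new TopicPartition("topic", 0));
        RemoteLogSegmentMetadata metadata = new RemoteLogSegmentMetadata(
                new RemoteLogSegmentId(tip, Uuid.randomUuid()),  // replace with the segment ID
                0L,                      // startOffset
                10L,                     // endOffset
                1700000000000L,          // maxTimestampMs
                0,                       // brokerId
                1700000000000L,          // eventTimestampMs
                1024,                    // segmentSizeInBytes
                Map.of(0, 0L, 1, 5L));   // segmentLeaderEpochs

        // 2. + 3. Use the plugin itself to pull the segment and its indexes back down.
        try (RemoteStorageManager rsm = new io.aiven.kafka.tieredstorage.RemoteStorageManager()) {
            rsm.configure(Map.of(        // assumed config keys; same backend as the old cluster
                    "storage.backend.class", "io.aiven.kafka.tieredstorage.storage.s3.S3Storage",
                    "storage.s3.bucket.name", "my-bucket",
                    "chunk.size", "4194304"));

            Path dir = Files.createDirectories(Path.of("/tmp/restore/topic-0"));
            try (InputStream in = rsm.fetchLogSegment(metadata, 0)) {
                Files.copy(in, dir.resolve("00000000000000000000.log"));
            }
            try (InputStream in = rsm.fetchIndex(metadata, IndexType.OFFSET)) {
                Files.copy(in, dir.resolve("00000000000000000000.index"));
            }
            try (InputStream in = rsm.fetchIndex(metadata, IndexType.TIMESTAMP)) {
                Files.copy(in, dir.resolve("00000000000000000000.timeindex"));
            }
            try (InputStream in = rsm.fetchIndex(metadata, IndexType.PRODUCER_SNAPSHOT)) {
                Files.copy(in, dir.resolve("00000000000000000000.snapshot"));
            }
        }
    }
}
```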

@giuseppelillo
Contributor

@jrmcclurg You also need to make sure that the correct offsets and leader epochs are restored when re-creating a topic from remote segments.
RemoteLogSegmentMetadata contains the information about all the leader epochs that a segment has seen, as well as the end offset of the segment.
Let's say you have a remote segment for a partition topic-0 with endOffset = 10 and segmentLeaderEpochs = {0 -> 0; 1 -> 5}.
You need to make sure that partition 0 of the newly created topic has a high watermark of 11, and that the leader epoch for that partition is at least 5. These parameters are configured inside the checkpoint files on disk (replication-offset-checkpoint, recovery-point-offset-checkpoint, leader-epoch-checkpoint).
If those files are not properly set up, Kafka will consider the segment invalid (for example, a newly created partition starts from leader epoch 0 and offset 0, so Kafka will see that the remote segment has a leader epoch of 5) and issue a deletion of the segment.
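
To make that concrete for the example above, here is a minimal sketch of what those checkpoint files could look like, assuming Kafka's version-0 checkpoint format, a topic named `topic`, and that the recovery point is set to the same offset as the high watermark. The `#` annotations are not part of the file format.

```
# <log.dir>/replication-offset-checkpoint  (and similarly recovery-point-offset-checkpoint)
0             # version
1             # number of entries
topic 0 11    # topic, partition, offset (end offset of the last remote segment + 1)

# <log.dir>/topic-0/leader-epoch-checkpoint
0             # version
2             # number of entries
0 0           # leader epoch 0 started at offset 0
1 5           # leader epoch 1 started at offset 5
```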

@jrmcclurg

@giuseppelillo Many thanks, this is extremely helpful! On a slightly related note, would I also need to reconstruct the __remote_log_metadata (and also __cluster_metadata) topics based on the remote segments, or is Kafka able to recover without these?

@giuseppelillo
Contributor

giuseppelillo commented Feb 12, 2025

@jrmcclurg Yes, you absolutely need to reconstruct the __remote_log_metadata topic.

I'm not sure about __cluster_metadata; I guess it depends on the order of operations. For example, let's say you have some remote segments stored in object storage for partition topic-0. If you want this topic to be present (and to contain all the data stored in object storage) as soon as you start your disaster-recovery Kafka cluster, then yes, you will also have to reconstruct __cluster_metadata so that the brokers immediately know about topic-0.
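
In case it helps, a hedged sketch of what re-publishing one segment's metadata into __remote_log_metadata could look like. It assumes the serde and partitioner classes used by Kafka's topic-based RemoteLogMetadataManager; the bootstrap address, the metadata partition count, and the RemoteLogSegmentMetadata object itself are placeholders, and a matching RemoteLogSegmentMetadataUpdate with state COPY_SEGMENT_FINISHED would also have to be published for each segment.

```java
import java.util.Map;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.server.log.remote.metadata.storage.RemoteLogMetadataTopicPartitioner;
import org.apache.kafka.server.log.remote.metadata.storage.serialization.RemoteLogMetadataSerde;
import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentMetadata;

public class MetadataRepublishSketch {
    static final String METADATA_TOPIC = "__remote_log_metadata";

    // Publishes one reconstructed segment-metadata record; numMetadataPartitions must
    // match remote.log.metadata.topic.num.partitions on the new cluster.
    public static void publish(RemoteLogSegmentMetadata metadata, int numMetadataPartitions) {
        RemoteLogMetadataSerde serde = new RemoteLogMetadataSerde();
        RemoteLogMetadataTopicPartitioner partitioner =
                new RemoteLogMetadataTopicPartitioner(numMetadataPartitions);
        Map<String, Object> producerConfig = Map.of("bootstrap.servers", "localhost:9092");

        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(
                producerConfig, new ByteArraySerializer(), new ByteArraySerializer())) {
            // The partitioner decides which __remote_log_metadata partition owns this
            // user partition, so the record lands where the RLMM will look for it.
            int partition = partitioner.metadataPartition(metadata.topicIdPartition());
            producer.send(new ProducerRecord<>(METADATA_TOPIC, partition, null,
                    serde.serialize(metadata)));
            // A RemoteLogSegmentMetadataUpdate with COPY_SEGMENT_FINISHED is also needed,
            // otherwise the segment is treated as an incomplete copy.
            producer.flush();
        }
    }
}
```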
