Update backup_restore.md #321

Tejeev · 2025-01-31T19:35:46Z

Can't count how many times we've seen removing backup etcd dbs put a customer in a bad spot.

brandond · 2025-01-31T20:06:23Z

docs/datastore/backup_restore.md

```diff
-4. Remove the rke2 db directory on the other server nodes as follows:
+4. Move the rke2 db directory on the other server nodes as follows (you want to keep a copy to avoid ending up with only an old or corrupt backup to chose for):
```

Having the old DB dir around on the secondary servers doesn't really help with anything. If you run into problems, restoring a snapshot is a better resolution than moving an old db dir back into place.

mattmattox · 2025-02-07

The issue is that we currently run `rm -rf /var/lib/rancher/rke2/server/db/`, which deletes both the etcd data and the snapshots directory. This means we erase the live data along with its backups.

We've encountered cases where customers, not paying close attention, have accidentally executed this command on all three master/etcd nodes, leading to complete data loss.

This change ensures that snapshots are not deleted until the cluster has been fully restored, allowing customers to perform the cleanup on their own afterward.
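
For reference, a minimal sketch of the move-aside idea described above. `RKE2_SERVER_DIR` and `move_db_aside` are hypothetical names introduced here for illustration, and the timestamp suffix is an assumption rather than something this PR specifies; the PR itself moves the directory to a fixed `backups` path.

```shell
#!/bin/sh
# Hypothetical variable: defaults to the server directory used in this thread.
RKE2_SERVER_DIR="${RKE2_SERVER_DIR:-/var/lib/rancher/rke2/server}"

# Hypothetical helper: move the db directory aside instead of deleting it,
# so the snapshots inside it survive until the restore has been confirmed.
move_db_aside() {
    src="$RKE2_SERVER_DIR/db"
    # Timestamped destination (an assumption) so repeated runs do not collide.
    dest="$RKE2_SERVER_DIR/db.bak.$(date +%Y%m%d%H%M%S)"
    if [ ! -d "$src" ]; then
        echo "nothing to move: $src does not exist" >&2
        return 1
    fi
    # Print the destination so the operator knows what to clean up later.
    mv "$src" "$dest" && echo "$dest"
}
```

Calling `move_db_aside` on a secondary server would leave the old data, including snapshots, in the printed directory until someone removes it by hand.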

brandond · 2025-02-19

Hmm, then how about we leave this as-is and just delete the etcd directory?

mattmattox · 2025-05-08

@brandond @Tejeev

I agree with the proposed change to use `rm -rf /var/lib/rancher/rke2/server/db/etcd` instead of the broader directory removal.

The more targeted approach addresses the core issue while providing several important benefits:

- Removes only the etcd database files that need to be replaced during restoration
- Preserves the snapshots directory, preventing potential complete data loss
- Eliminates the risk we've seen with customers accidentally executing the broader command across all master/etcd nodes simultaneously
- Requires no additional cleanup steps later in the process
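
As an illustration of the targeted removal, here is a hedged sketch with a small guard around it. `DB_DIR` and `remove_etcd_only` are hypothetical names, not part of RKE2 or of this PR; the only command taken from the thread is the `rm -rf` on the `etcd` subdirectory.

```shell
#!/bin/sh
# Hypothetical variable: defaults to the db path discussed in this thread.
# Override it to rehearse against a scratch directory.
DB_DIR="${DB_DIR:-/var/lib/rancher/rke2/server/db}"

# Hypothetical helper: remove only the etcd data and leave everything else
# under the db directory (such as snapshots) untouched.
remove_etcd_only() {
    if [ ! -d "$DB_DIR/etcd" ]; then
        echo "no etcd directory under $DB_DIR; nothing removed" >&2
        return 1
    fi
    # ${DB_DIR:?} aborts if the variable is empty, so this can never
    # expand to "rm -rf /etcd".
    rm -rf "${DB_DIR:?}/etcd"
    # List what is left so the operator can confirm the snapshots survived.
    ls -A "$DB_DIR"
}
```

The guard is a design choice for the sketch only: it refuses to act when the expected layout is missing, which makes an accidental second run on the same node a no-op.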

brandond · 2025-02-19T22:39:07Z

docs/datastore/backup_restore.md

```diff
+mv /var/lib/rancher/rke2/server/db /var/lib/rancher/rke2/server/backups
+```
+Clean them out after this operation:
+```
+rm -rf /var/lib/rancher/rke2/server/backups
```

Suggested change

```diff
-mv /var/lib/rancher/rke2/server/db /var/lib/rancher/rke2/server/backups
-```
-Clean them out after this operation:
-```
-rm -rf /var/lib/rancher/rke2/server/backups
+rm -rf /var/lib/rancher/rke2/server/db/etcd
```

This should remove the etcd files but leave the snapshots, without requiring any additional cleanup later.
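
The difference between the broad and the targeted command can be rehearsed safely in a scratch directory. This is only a sketch: the layout (an `etcd` directory and a `snapshots` directory under `db`) follows the description in this thread, while the helper names and the snapshot file name are made up for illustration.

```shell
#!/bin/sh
# Build a throwaway copy of the layout described in this thread.
make_fake_db() {
    root="$(mktemp -d)"
    mkdir -p "$root/db/etcd/member" "$root/db/snapshots"
    : > "$root/db/snapshots/example-snapshot"   # made-up snapshot file name
    echo "$root"
}

# Count the snapshot files that remain under a fake root.
count_snapshots() {
    find "$1/db/snapshots" -type f 2>/dev/null | wc -l | tr -d ' '
}

# Broad removal, as the docs currently show, applied to the fake root.
broad="$(make_fake_db)"
rm -rf "$broad/db"
broad_left="$(count_snapshots "$broad")"

# Targeted removal, as suggested in this review.
targeted="$(make_fake_db)"
rm -rf "$targeted/db/etcd"
targeted_left="$(count_snapshots "$targeted")"

echo "after broad removal:    $broad_left snapshot(s) left"
echo "after targeted removal: $targeted_left snapshot(s) left"

# Clean up the scratch directories.
rm -rf "$broad" "$targeted"
```

In the scratch layout the broad removal leaves no snapshot files behind, while the targeted removal leaves the one example file in place.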

Tejeev · 2025-02-22

I think this is good, unless I'm missing anything; @mattmattox?

Update backup_restore.md

198edf0

Tejeev requested a review from a team as a code owner January 31, 2025 19:35

brandond requested changes Jan 31, 2025

View reviewed changes

Tejeev requested a review from brandond February 19, 2025 20:34

brandond requested changes Feb 19, 2025

View reviewed changes


Tejeev commented Jan 31, 2025

brandond Jan 31, 2025

mattmattox Feb 7, 2025

brandond Feb 19, 2025 (edited)

mattmattox May 8, 2025

brandond Feb 19, 2025

Tejeev Feb 22, 2025 (edited)


3 participants
