Backup stage Internals
Akira Kurogane edited this page Oct 21, 2019 · 1 revision
Relevant code: cmd/pbm/main.go SendCmd(), pbm/backup.go Run(), NodeSuits(), pbm/cmd.go ListenCmd()
- When pbm-agent processes are started they begin to listen/watch on admin.pbmCmd in the replicaset that holds the PBM control collections (= the configsvr replicaset if a cluster, otherwise the non-sharded replicaset itself).
- When pbm backup is executed, the pbm CLI inserts a document with {"cmd": "backup", "backup": { .... } } into admin.pbmCmd.
- All pbm-agent processes react to the appearance of the new command document in admin.pbmCmd.
  - The first step is to check whether the node is valid (as of v1.0 that means: in PRIMARY or SECONDARY status && replication lag < 21s).
  - The second step is to AcquireLock(), which means being the first to write into admin.pbmOp for the replicaset they're in.
- pbm-agent processes that didn't acquire the lock log "Backup has been scheduled on another replset node" and ??? go back to listening/watching admin.pbmCmd for the next command. ???
- The pbm-agent that took the lock runs through the pbm/backup.go Run() command.
  - Upserts the backup metadata document in admin.pbmBackup.
  - Runs the dump, then updates admin.pbmBackup to record that the dump is complete for that replicaset.
  - If it is a non-sharded replicaset there is no wait. If it is a shard replicaset it waits to see that the whole cluster dump is done. If it is a configsvr replicaset it checks the admin.pbmBackup document periodically (once per second) until it sees that all shards have StatusDumpDone (or the later step StatusDone). When all replicasets have finished the dump the parent-level "status" field is set to "done".
  - After the dump(s) are all done, Oplog slices are made by function ???
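
To make the node selection above easier to follow, here is a minimal sketch of the two steps every agent goes through: the validity check, then the race for the lock. It is not PBM code. The Node and LockTable types, the function names and the "not valid" / "running" messages are hypothetical, and the in-memory map only stands in for admin.pbmOp. It assumes "first writer wins" per replicaset, which is how the lock is described above; how PBM actually enforces that is not covered on this page.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// maxReplicationLag mirrors the "< 21s" threshold quoted above for v1.0.
const maxReplicationLag = 21 * time.Second

// Outcome messages. Only msgScheduledElsewhere is taken from the page;
// the other two are made up for this sketch.
const (
	msgNotValid           = "node is not valid for backup"
	msgScheduledElsewhere = "Backup has been scheduled on another replset node"
	msgRunning            = "lock acquired, running backup"
)

// Node is a hypothetical summary of one replicaset member (not a PBM type).
type Node struct {
	Name    string
	ReplSet string
	State   string        // e.g. "PRIMARY", "SECONDARY", "RECOVERING"
	Lag     time.Duration // replication lag behind the primary
}

// isValidForBackup applies the validity rule described above.
func isValidForBackup(n Node) bool {
	healthy := n.State == "PRIMARY" || n.State == "SECONDARY"
	return healthy && n.Lag < maxReplicationLag
}

// LockTable is an in-memory stand-in for admin.pbmOp:
// at most one lock holder per replicaset.
type LockTable struct {
	mu      sync.Mutex
	holders map[string]string // replicaset name -> node name
}

func NewLockTable() *LockTable {
	return &LockTable{holders: make(map[string]string)}
}

// Acquire succeeds only for the first caller from each replicaset.
func (t *LockTable) Acquire(n Node) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, taken := t.holders[n.ReplSet]; taken {
		return false
	}
	t.holders[n.ReplSet] = n.Name
	return true
}

// handleBackupCmd is what each agent does when it sees the backup command.
func handleBackupCmd(n Node, locks *LockTable) string {
	if !isValidForBackup(n) {
		return msgNotValid
	}
	if !locks.Acquire(n) {
		return msgScheduledElsewhere
	}
	return msgRunning
}

func main() {
	locks := NewLockTable()
	nodes := []Node{
		{Name: "rs0-a", ReplSet: "rs0", State: "PRIMARY"},
		{Name: "rs0-b", ReplSet: "rs0", State: "SECONDARY", Lag: 2 * time.Second},
		{Name: "rs0-c", ReplSet: "rs0", State: "SECONDARY", Lag: 30 * time.Second},
	}

	// All agents react to the same command concurrently.
	results := make([]string, len(nodes))
	var wg sync.WaitGroup
	for i, n := range nodes {
		wg.Add(1)
		go func(i int, n Node) {
			defer wg.Done()
			results[i] = handleBackupCmd(n, locks)
		}(i, n)
	}
	wg.Wait()

	for i, n := range nodes {
		fmt.Printf("%s: %s\n", n.Name, results[i])
	}
}
```

Which of the two healthy nodes wins varies from run to run, but exactly one of them ends up running the backup, and the lagging node never competes for the lock.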