Commit 6d11f50

STAC-23599: Restoring VictoriaMetrics (#5)
* STAC-23599: Restoring VictoriaMetrics
1 parent 06be20e commit 6d11f50

26 files changed: 1641 additions and 484 deletions

ARCHITECTURE.md

Lines changed: 54 additions & 14 deletions
@@ -20,7 +20,8 @@ stackstate-backup-cli/
 │ ├── root.go # Root command and global flags
 │ ├── version/ # Version information command
 │ ├── elasticsearch/ # Elasticsearch backup/restore commands
-│ └── stackgraph/ # Stackgraph backup/restore commands
+│ ├── stackgraph/ # Stackgraph backup/restore commands
+│ └── victoriametrics/ # VictoriaMetrics backup/restore commands
 
 ├── internal/ # Internal packages (Layers 0-3)
 │ ├── foundation/ # Layer 0: Core utilities
@@ -35,7 +36,8 @@ stackstate-backup-cli/
 │ │
 │ ├── orchestration/ # Layer 2: Workflows
 │ │ ├── portforward/ # Port-forwarding orchestration
-│ │ └── scale/ # Deployment scaling workflows
+│ │ ├── scale/ # Deployment/StatefulSet scaling workflows
+│ │ └── restore/ # Restore job orchestration
 │ │
 │ ├── app/ # Layer 3: Dependency Container
 │ │ └── app.go # Application context and dependency injection
@@ -62,7 +64,8 @@ stackstate-backup-cli/
 
 **Key Packages**:
 - `cmd/elasticsearch/`: Elasticsearch snapshot/restore commands (configure, list-snapshots, list-indices, restore-snapshot)
-- `cmd/stackgraph/`: Stackgraph backup/restore commands (list, restore)
+- `cmd/stackgraph/`: Stackgraph backup/restore commands (list, restore, check-and-finalize)
+- `cmd/victoriametrics/`: VictoriaMetrics backup/restore commands (list, restore, check-and-finalize)
 - `cmd/version/`: Version information
 
 **Dependency Rules**:
@@ -117,7 +120,8 @@ appCtx.Formatter
 
 **Key Packages**:
 - `portforward/`: Manages Kubernetes port-forwarding lifecycle
-- `scale/`: Deployment scaling workflows with detailed logging
+- `scale/`: Deployment and StatefulSet scaling workflows with detailed logging
+- `restore/`: Restore job orchestration (confirmation, job lifecycle, finalization, resource management)
 
 **Dependency Rules**:
 - ✅ Can import: `internal/foundation/*`, `internal/clients/*`
@@ -167,7 +171,7 @@ appCtx.Formatter
 
 ```
 1. User invokes CLI command
-└─> cmd/elasticsearch/restore-snapshot.go
+└─> cmd/victoriametrics/restore.go (or stackgraph/restore.go)
 
 2. Parse flags and validate input
 └─> Cobra command receives global flags
@@ -177,16 +181,17 @@ appCtx.Formatter
 ├─> internal/clients/k8s/ (K8s client)
 ├─> internal/foundation/config/ (Load from ConfigMap/Secret)
 ├─> internal/clients/s3/ (S3/Minio client)
-├─> internal/clients/elasticsearch/ (ES client)
 ├─> internal/foundation/logger/ (Logger)
 └─> internal/foundation/output/ (Formatter)
 
 4. Execute business logic with injected dependencies
 └─> runRestore(appCtx)
-├─> internal/orchestration/scale/ (Scale down)
-├─> internal/orchestration/portforward/ (Port-forward)
-├─> internal/clients/elasticsearch/ (Restore snapshot)
-└─> internal/orchestration/scale/ (Scale up)
+├─> internal/orchestration/restore/ (User confirmation)
+├─> internal/orchestration/scale/ (Scale down StatefulSets)
+├─> internal/orchestration/restore/ (Ensure resources: ConfigMaps, Secrets)
+├─> internal/clients/k8s/ (Create restore Job)
+├─> internal/orchestration/restore/ (Wait for completion & cleanup)
+└─> internal/orchestration/scale/ (Scale up StatefulSets)
 
 5. Format and display results
 └─> appCtx.Formatter.PrintTable() or PrintJSON()
@@ -262,15 +267,50 @@ defer close(pf.StopChan) // Automatic cleanup
 
 ### 5. Scale Down/Up Pattern
 
-Deployments are scaled down before restore operations and scaled up afterward:
+Deployments and StatefulSets are scaled down before restore operations and scaled up afterward:
 
 ```go
 // Example usage
-scaledDeployments, _ := scale.ScaleDown(k8sClient, namespace, selector, log)
-defer scale.ScaleUp(k8sClient, namespace, scaledDeployments, log)
+scaledResources, _ := scale.ScaleDown(k8sClient, namespace, selector, log)
+defer scale.ScaleUpFromAnnotations(k8sClient, namespace, selector, log)
 ```
 
-### 6. Structured Logging
+**Note**: Scaling now supports both Deployments and StatefulSets through a unified interface.
+
+### 6. Restore Orchestration Pattern
+
+Common restore operations are centralized in the `restore` orchestration layer:
+
+```go
+// User confirmation
+if !restore.PromptForConfirmation() {
+    return fmt.Errorf("operation cancelled")
+}
+
+// Wait for job completion and cleanup
+restore.PrintWaitingMessage(log, "service-name", jobName, namespace)
+err := restore.WaitAndCleanup(k8sClient, namespace, jobName, log, cleanupPVC)
+
+// Check and finalize background jobs
+err := restore.CheckAndFinalize(restore.CheckAndFinalizeParams{
+    K8sClient: k8sClient,
+    Namespace: namespace,
+    JobName: jobName,
+    ServiceName: "service-name",
+    ScaleSelector: config.ScaleDownLabelSelector,
+    CleanupPVC: true,
+    WaitForJob: false,
+    Log: log,
+})
+```
+
+**Benefits**:
+
+- Eliminates duplicate code between Stackgraph and VictoriaMetrics restore commands
+- Consistent user experience across services
+- Centralized job lifecycle management and cleanup
+
+### 7. Structured Logging
 
 All operations use structured logging with consistent levels:

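Note on the scale-down/up hunk above: the switch from `scale.ScaleUp(..., scaledDeployments, ...)` to `scale.ScaleUpFromAnnotations(..., selector, ...)` suggests that the original replica counts are now stored on the workloads themselves rather than held in memory. The following is only a minimal sketch of that idea, not the code from this commit. The `workload` type, the annotation key, and both function bodies are hypothetical stand-ins that avoid any Kubernetes client dependency.

```go
package main

import (
	"fmt"
	"strconv"
)

// replicasAnnotation is a hypothetical key; the real CLI may use a different one.
const replicasAnnotation = "example.com/pre-restore-replicas"

// workload is a simplified stand-in for a Deployment or StatefulSet.
type workload struct {
	Kind        string
	Name        string
	Replicas    int32
	Annotations map[string]string
}

// scaleDown records each workload's replica count in an annotation and then
// sets replicas to zero. An existing annotation is kept, so calling scaleDown
// twice does not overwrite the original count with 0.
func scaleDown(ws []*workload) {
	for _, w := range ws {
		if w.Annotations == nil {
			w.Annotations = map[string]string{}
		}
		if _, done := w.Annotations[replicasAnnotation]; !done {
			w.Annotations[replicasAnnotation] = strconv.Itoa(int(w.Replicas))
		}
		w.Replicas = 0
	}
}

// scaleUpFromAnnotations restores replicas from the annotation and removes it.
// Workloads without the annotation are left untouched.
func scaleUpFromAnnotations(ws []*workload) error {
	for _, w := range ws {
		raw, ok := w.Annotations[replicasAnnotation]
		if !ok {
			continue
		}
		n, err := strconv.ParseInt(raw, 10, 32)
		if err != nil {
			return fmt.Errorf("%s/%s: bad replica annotation %q: %w", w.Kind, w.Name, raw, err)
		}
		w.Replicas = int32(n)
		delete(w.Annotations, replicasAnnotation)
	}
	return nil
}

func main() {
	ws := []*workload{
		{Kind: "StatefulSet", Name: "victoria-metrics-0", Replicas: 1},
		{Kind: "Deployment", Name: "example-api", Replicas: 3},
	}
	scaleDown(ws)
	fmt.Println("after scale down:", ws[0].Replicas, ws[1].Replicas)
	if err := scaleUpFromAnnotations(ws); err != nil {
		fmt.Println("scale up failed:", err)
		return
	}
	fmt.Println("after scale up:", ws[0].Replicas, ws[1].Replicas)
}
```

A plausible reason for this design is that `check-and-finalize` runs as a separate CLI invocation, so it could not rely on an in-memory list returned by an earlier `ScaleDown` call. State stored on the cluster objects survives across invocations.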
README.md

Lines changed: 81 additions & 5 deletions
@@ -9,8 +9,9 @@ This CLI tool replaces the legacy Bash-based backup/restore scripts with a singl
 **Current Support:**
 - Elasticsearch snapshots and restores
 - Stackgraph backups and restores
+- VictoriaMetrics backups and restores
 
-**Planned:** VictoriaMetrics, ClickHouse, Configuration backups
+**Planned:** ClickHouse, Configuration backups
 
 ## Installation

@@ -112,11 +113,76 @@ sts-backup stackgraph restore --namespace <namespace> [--archive <name> | --late
 **Flags:**
 - `--archive` - Specific archive name to restore (e.g., sts-backup-20210216-0300.graph)
 - `--latest` - Restore from the most recent backup
-- `--force` - Force delete existing data during restore
 - `--background` - Run restore job in background without waiting for completion
+- `--yes, -y` - Skip confirmation prompt
 
 **Note**: Either `--archive` or `--latest` must be specified (mutually exclusive).
 
+#### check-and-finalize
+
+Check the status of a background Stackgraph restore job and clean up resources.
+
+```bash
+sts-backup stackgraph check-and-finalize --namespace <namespace> --job <job-name> [--wait]
+```
+
+**Flags:**
+
+- `--job, -j` - Stackgraph restore job name (required)
+- `--wait, -w` - Wait for job to complete before cleanup
+
+**Use Case**: This command is useful when a restore job was started with `--background` flag or was interrupted (
+Ctrl+C).
+
+### victoriametrics
+
+Manage VictoriaMetrics backups and restores.
+
+#### list
+
+List available VictoriaMetrics backups from S3/Minio.
+
+```bash
+sts-backup victoriametrics list --namespace <namespace>
+```
+
+**Note**: In HA mode, backups from both instances (victoria-metrics-0 and victoria-metrics-1) are listed. The restore
+command accepts either backup to restore both instances.
+
+#### restore
+
+Restore VictoriaMetrics from a backup archive. Automatically scales down affected StatefulSets before restore and scales
+them back up afterward.
+
+```bash
+sts-backup victoriametrics restore --namespace <namespace> [--archive <name> | --latest] [flags]
+```
+
+**Flags:**
+
+- `--archive` - Specific backup name to restore (e.g., sts-victoria-metrics-backup/victoria-metrics-0-20251030143500)
+- `--latest` - Restore from the most recent backup
+- `--background` - Run restore job in background without waiting for completion
+- `--yes, -y` - Skip confirmation prompt
+
+**Note**: Either `--archive` or `--latest` must be specified (mutually exclusive).
+
+#### check-and-finalize
+
+Check the status of a background VictoriaMetrics restore job and clean up resources.
+
+```bash
+sts-backup victoriametrics check-and-finalize --namespace <namespace> --job <job-name> [--wait]
+```
+
+**Flags:**
+
+- `--job, -j` - VictoriaMetrics restore job name (required)
+- `--wait, -w` - Wait for job to complete before cleanup
+
+**Use Case**: This command is useful when a restore job was started with `--background` flag or was interrupted (
+Ctrl+C).
+
 ## Configuration
 
 The CLI uses configuration from Kubernetes ConfigMaps and Secrets with the following precedence:
@@ -194,9 +260,14 @@ See [internal/foundation/config/testdata/validConfigMapConfig.yaml](internal/fou
 │ │ ├── list-indices.go # List indices
 │ │ ├── list-snapshots.go # List snapshots
 │ │ └── restore-snapshot.go # Restore snapshot
-│ └── stackgraph/ # Stackgraph subcommands
+│ ├── stackgraph/ # Stackgraph subcommands
+│ │ ├── list.go # List backups
+│ │ ├── restore.go # Restore backup
+│ │ └── check-and-finalize.go # Check and finalize restore job
+│ └── victoriametrics/ # VictoriaMetrics subcommands
 │ ├── list.go # List backups
-│ └── restore.go # Restore backup
+│ ├── restore.go # Restore backup
+│ └── check-and-finalize.go # Check and finalize restore job
 ├── internal/ # Internal packages (Layers 0-3)
 │ ├── foundation/ # Layer 0: Core utilities
 │ │ ├── config/ # Configuration management
@@ -208,7 +279,12 @@ See [internal/foundation/config/testdata/validConfigMapConfig.yaml](internal/fou
 │ │ └── s3/ # S3/Minio client
 │ ├── orchestration/ # Layer 2: Workflows
 │ │ ├── portforward/ # Port-forwarding lifecycle
-│ │ └── scale/ # Deployment scaling
+│ │ ├── scale/ # Deployment/StatefulSet scaling
+│ │ └── restore/ # Restore job orchestration
+│ │ ├── confirmation.go # User confirmation prompts
+│ │ ├── finalize.go # Job status check and cleanup
+│ │ ├── job.go # Job lifecycle management
+│ │ └── resources.go # Restore resource management
 │ ├── app/ # Layer 3: Dependency container
 │ │ └── app.go # Application context and DI
 │ └── scripts/ # Embedded bash scripts

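Note on the README changes above: both `restore` commands state that exactly one of `--archive` or `--latest` must be given. The sketch below shows one way that rule and the "most recent backup" choice could be expressed. It is an illustration under assumptions, not the command's actual implementation: `resolveArchive` and `timestampSuffix` are hypothetical names, and the ordering assumes backup names end in a fixed-width `YYYYMMDDhhmmss` timestamp after the last `-`, as in the README example. The second backup name in `main` is invented for the demonstration.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// timestampSuffix returns the text after the last '-' in a backup name.
// Assumption: that suffix is a fixed-width timestamp, so string comparison
// orders backups chronologically.
func timestampSuffix(name string) string {
	return name[strings.LastIndex(name, "-")+1:]
}

// resolveArchive enforces that exactly one of archive or latest is set and
// returns the backup name to restore. available is the list of known backups.
func resolveArchive(archive string, latest bool, available []string) (string, error) {
	switch {
	case archive != "" && latest:
		return "", errors.New("--archive and --latest are mutually exclusive")
	case archive == "" && !latest:
		return "", errors.New("either --archive or --latest must be specified")
	case archive != "":
		return archive, nil
	}
	if len(available) == 0 {
		return "", errors.New("no backups available")
	}
	// Pick the backup with the greatest timestamp suffix, regardless of which
	// instance (victoria-metrics-0 or victoria-metrics-1) produced it.
	best := available[0]
	for _, name := range available[1:] {
		if timestampSuffix(name) > timestampSuffix(best) {
			best = name
		}
	}
	return best, nil
}

func main() {
	backups := []string{
		"sts-victoria-metrics-backup/victoria-metrics-1-20251029143500", // hypothetical
		"sts-victoria-metrics-backup/victoria-metrics-0-20251030143500", // from the README example
	}
	name, err := resolveArchive("", true, backups)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("restoring from:", name)
}
```

Comparing only the suffix matters in HA mode, where names from the two instances differ before the timestamp and a plain sort of full names would group by instance instead of by time.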
cmd/elasticsearch/list_snapshots_test.go

Lines changed: 20 additions & 0 deletions
@@ -49,6 +49,26 @@ stackgraph:
 memory: "2Gi"
 pvc:
 size: "10Gi"
+victoriaMetrics:
+S3Locations:
+- bucket: vm-backup
+prefix: victoria-metrics-0
+- bucket: vm-backup
+prefix: victoria-metrics-1
+restore:
+haMode: "mirror"
+persistentVolumeClaimPrefix: "database-victoria-metrics-"
+scaleDownLabelSelector: "app=victoria-metrics"
+job:
+image: vm-backup:latest
+waitImage: wait:latest
+resources:
+limits:
+cpu: "1"
+memory: "2Gi"
+requests:
+cpu: "500m"
+memory: "1Gi"
 `
 
 // mockESClient is a simple mock for testing commands
cmd/root.go

Lines changed: 5 additions & 0 deletions
@@ -7,6 +7,7 @@ import (
 	"github.com/stackvista/stackstate-backup-cli/cmd/elasticsearch"
 	"github.com/stackvista/stackstate-backup-cli/cmd/stackgraph"
 	"github.com/stackvista/stackstate-backup-cli/cmd/version"
+	"github.com/stackvista/stackstate-backup-cli/cmd/victoriametrics"
 	"github.com/stackvista/stackstate-backup-cli/internal/foundation/config"
 )

@@ -39,6 +40,10 @@ func init() {
 	addBackupConfigFlags(stackgraphCmd)
 	rootCmd.AddCommand(stackgraphCmd)
 
+	victoriaMetricsCmd := victoriametrics.Cmd(flags)
+	addBackupConfigFlags(victoriaMetricsCmd)
+	rootCmd.AddCommand(victoriaMetricsCmd)
+
 	// Add commands that don't need backup config flags
 	rootCmd.AddCommand(version.Cmd())
 }
