
Commit dc03292

Merge pull request #195 from mijinummi/feat/188-automated-backup-recovery
Feat/188 automated backup recovery
2 parents 1595190 + c117014 commit dc03292

3 files changed: +149 -0 lines changed

disaster-recovery.md

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
# Disaster Recovery Runbook

## Purpose
This runbook provides step-by-step procedures for restoring database services after data loss, corruption, or infrastructure failure. It supports business continuity and helps the team meet the recovery objectives below.

---

## Recovery Objectives
- **RPO (Recovery Point Objective):** ≤ 1 hour (hourly backups plus WAL archiving).
- **RTO (Recovery Time Objective):** ≤ 2 hours for full restoration.
- **Retention:** 30 days of PITR (point-in-time recovery) coverage; weekly backups kept for 6 months.
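
The RPO above can be turned into a simple automated check. The following is a minimal sketch, not part of the backup service: `checkRpo` and `RPO_MS` are hypothetical names, and the caller is assumed to know the timestamp of the newest recoverable point (latest backup or archived WAL segment).

```typescript
// Hypothetical helper: names are illustrative, not part of the backup service.
const RPO_MS = 60 * 60 * 1000; // 1 hour, taken from the RPO stated above

interface RpoStatus {
  ageMs: number;      // how far behind "now" the newest recoverable point is
  withinRpo: boolean; // true when that gap does not exceed the RPO
}

/** Compare the newest recoverable point against the RPO. */
function checkRpo(lastRecoverablePoint: Date, now: Date = new Date()): RpoStatus {
  const ageMs = now.getTime() - lastRecoverablePoint.getTime();
  return { ageMs, withinRpo: ageMs <= RPO_MS };
}

// A backup taken 45 minutes ago is still inside the 1-hour objective.
const status = checkRpo(
  new Date('2024-01-01T10:00:00Z'),
  new Date('2024-01-01T10:45:00Z'),
);
console.log(status); // { ageMs: 2700000, withinRpo: true }
```

A check like this could feed the backup-failure alert, so that a silently stalled schedule is noticed before the objective is breached.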

---

## Recovery Scenarios
1. **Accidental Data Deletion**
   - Restore the latest backup taken before the deletion.
   - Replay WAL to a point just before the deletion.
2. **Database Corruption**
   - Provision a new DB instance.
   - Restore the last verified backup.
   - Replay WAL.
3. **Regional Outage**
   - Switch to the cross-region backup.
   - Provision a DB in the secondary region.
   - Restore the backup and replay WAL.
4. **Security Breach**
   - Isolate the compromised DB.
   - Restore a clean backup.
   - Rotate credentials and keys.

---

## Recovery Steps
1. **Identify Incident**
   - Monitor alerts (backup failures, DB errors).
   - Confirm the scope of the outage.
2. **Provision New Database**
   - Launch a new DB instance in the primary or secondary region.
   - Configure networking and security groups.
3. **Restore Backup**
   - Retrieve the latest encrypted backup from storage.
   - Decrypt it using the KMS key.
   - Import the backup into the new DB.
4. **Apply WAL (PITR)**
   - Replay WAL up to the desired timestamp.
   - Validate consistency.
5. **Verify Restoration**
   - Run automated integrity tests.
   - Validate application connectivity.
6. **Switch Traffic**
   - Update connection strings.
   - Point services to the restored DB.
7. **Post-Recovery Actions**
   - Document the incident.
   - Notify stakeholders.
   - Schedule a follow-up review.
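
For step 4, PostgreSQL 12 and later read the recovery target from ordinary configuration settings (`restore_command`, `recovery_target_time`, `recovery_target_action`) and enter recovery when an empty `recovery.signal` file exists in the data directory. The sketch below only generates those settings; `buildRecoveryConf` and the archive directory are hypothetical, and it assumes archived WAL segments are reachable on a local path. Note that WAL can only be replayed on top of a physical base backup (for example one taken with `pg_basebackup`); a logical `pg_dump` file cannot be rolled forward this way.

```typescript
interface PitrOptions {
  walArchiveDir: string; // hypothetical directory holding archived WAL segments
  targetTime: Date;      // recover up to this instant
}

/** Build the recovery settings that PostgreSQL 12+ reads from postgresql.conf. */
function buildRecoveryConf({ walArchiveDir, targetTime }: PitrOptions): string {
  return [
    // %f is the WAL file name PostgreSQL asks for, %p the path to copy it to.
    `restore_command = 'cp ${walArchiveDir}/%f %p'`,
    // Parsed by PostgreSQL as a timestamp with time zone.
    `recovery_target_time = '${targetTime.toISOString()}'`,
    // Leave recovery and accept writes once the target is reached.
    `recovery_target_action = 'promote'`,
  ].join('\n');
}

console.log(
  buildRecoveryConf({
    walArchiveDir: '/mnt/wal-archive', // assumption for illustration
    targetTime: new Date('2024-01-01T09:59:00Z'),
  }),
);
```

Choosing a target slightly before the incident (for example, one minute before an accidental deletion) keeps the unwanted change out of the restored database.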

---

## Monitoring & Alerts
- **Backup Failures:** Alert via Slack/email.
- **Restore Failures:** Escalate to the DBA team.
- **Retention Policy:** Auto-delete expired backups and log each deletion.
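
The retention rule can be expressed as a pure function that selects which backups are due for deletion. This is a sketch under stated assumptions: `BackupRecord` and `findExpired` are hypothetical, a `weekly` kind is assumed to exist alongside hourly and daily backups, and "6 months" is approximated as 183 days.

```typescript
type BackupKind = 'hourly' | 'daily' | 'weekly';

interface BackupRecord {
  name: string;
  kind: BackupKind;
  createdAt: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;
// Retention windows in days; 183 days approximates "6 months" (an assumption).
const RETENTION_DAYS: Record<BackupKind, number> = { hourly: 30, daily: 30, weekly: 183 };

/** Return the backups that are older than their retention window. */
function findExpired(backups: BackupRecord[], now: Date = new Date()): BackupRecord[] {
  return backups.filter(
    (b) => now.getTime() - b.createdAt.getTime() > RETENTION_DAYS[b.kind] * DAY_MS,
  );
}

const now = new Date('2024-07-01T00:00:00Z');
const expired = findExpired(
  [
    { name: 'hourly-old', kind: 'hourly', createdAt: new Date('2024-05-31T00:00:00Z') },
    { name: 'daily-new', kind: 'daily', createdAt: new Date('2024-06-21T00:00:00Z') },
    { name: 'weekly-new', kind: 'weekly', createdAt: new Date('2024-03-01T00:00:00Z') },
    { name: 'weekly-old', kind: 'weekly', createdAt: new Date('2023-12-01T00:00:00Z') },
  ],
  now,
);
console.log(expired.map((b) => b.name)); // [ 'hourly-old', 'weekly-old' ]
```

Keeping the selection separate from the deletion makes it straightforward to log each expired backup before removing it.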

---

## Testing Schedule
- **Monthly Restore Drill:** Restore a backup into the staging DB.
- **Quarterly Failover Drill:** Simulate a regional outage and restore from the cross-region backup.
- **Annual Full Audit:** Verify that PITR works across the full 30-day retention window.

---

## Roles & Responsibilities
- **DBA Team:** Execute recovery steps.
- **DevOps Team:** Provision infrastructure.
- **Security Team:** Handle breach scenarios.
- **Management:** Approve failover decisions.

---

## References
- Backup Service (`backend/src/backup/backup.service.ts`)
- Monitoring Dashboard
- Cloud Storage Policies

src/backup/backup.service.ts

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
import { Injectable, Logger } from '@nestjs/common';
import { Cron } from '@nestjs/schedule';
import { execFile } from 'child_process';
import { promises as fs } from 'fs';
import { promisify } from 'util';
import * as crypto from 'crypto';

const execFileAsync = promisify(execFile);

@Injectable()
export class BackupService {
  private readonly logger = new Logger(BackupService.name);

  @Cron('0 * * * *') // hourly, at minute 0
  async hourlyBackup() {
    await this.runBackup('hourly');
  }

  @Cron('0 0 * * *') // daily, at midnight
  async dailyBackup() {
    await this.runBackup('daily');
  }

  private async runBackup(type: 'hourly' | 'daily') {
    const timestamp = new Date().toISOString();
    const file = `backup-${type}-${timestamp}.sql`;

    try {
      // Await pg_dump so the returned promise settles only after the dump finishes.
      // execFile passes arguments directly, without going through a shell.
      await execFileAsync('pg_dump', ['-f', file, 'mydb']);

      const secret = process.env.BACKUP_KEY;
      if (!secret) throw new Error('BACKUP_KEY is not set');

      // Encrypt backup. crypto.createCipher is deprecated (DEP0106) and was removed
      // in Node.js 22, so derive a key and use createCipheriv with a random IV.
      const salt = crypto.randomBytes(16);
      const iv = crypto.randomBytes(16);
      const key = crypto.scryptSync(secret, salt, 32);
      const cipher = crypto.createCipheriv('aes-256-cbc', key, iv);
      const data = await fs.readFile(file);
      // Output layout: salt (16 bytes) | IV (16 bytes) | ciphertext.
      const encrypted = Buffer.concat([salt, iv, cipher.update(data), cipher.final()]);
      await fs.writeFile(`${file}.enc`, encrypted);

      this.logger.log(`Backup ${file} completed and encrypted`);
      // upload to cloud storage here
    } catch (err) {
      this.logger.error(`Backup failed: ${(err as Error).message}`);
      // trigger alert
    }
  }
}
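
The runbook's "Restore Backup" step needs a decryption routine that mirrors whatever the service writes. The following is a hedged sketch of such a round trip, not code from this commit: `encryptBackup` and `decryptBackup` are hypothetical names, and it assumes AES-256-CBC with a scrypt-derived key and a file layout of salt (16 bytes), IV (16 bytes), then ciphertext. It uses `crypto.createCipheriv`/`createDecipheriv` because `crypto.createCipher` is deprecated and no longer available in Node.js 22.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from 'crypto';

const SALT_LEN = 16;
const IV_LEN = 16;

/** Encrypt a backup; output layout is salt | IV | ciphertext (assumed format). */
function encryptBackup(plain: Buffer, secret: string): Buffer {
  const salt = randomBytes(SALT_LEN);
  const iv = randomBytes(IV_LEN);
  const key = scryptSync(secret, salt, 32); // 32 bytes = AES-256 key
  const cipher = createCipheriv('aes-256-cbc', key, iv);
  return Buffer.concat([salt, iv, cipher.update(plain), cipher.final()]);
}

/** Reverse encryptBackup: split off salt and IV, re-derive the key, decrypt. */
function decryptBackup(blob: Buffer, secret: string): Buffer {
  const salt = blob.subarray(0, SALT_LEN);
  const iv = blob.subarray(SALT_LEN, SALT_LEN + IV_LEN);
  const ciphertext = blob.subarray(SALT_LEN + IV_LEN);
  const key = scryptSync(secret, salt, 32);
  const decipher = createDecipheriv('aes-256-cbc', key, iv);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}

const original = Buffer.from('SELECT 1;');
const restored = decryptBackup(encryptBackup(original, 'example-secret'), 'example-secret');
console.log(restored.equals(original)); // true
```

CBC provides confidentiality only. If tamper detection matters for stored backups, an authenticated mode such as `aes-256-gcm` would be a reasonable alternative, at the cost of also storing the authentication tag.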

test/backup/backup.service.spec.ts

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
import { BackupService } from '../../src/backup/backup.service';

describe('BackupService', () => {
  let service: BackupService;

  beforeEach(() => {
    service = new BackupService();
  });

  it('should run hourly backup', async () => {
    const spy = jest.spyOn(service as any, 'runBackup').mockResolvedValue(true);
    await service.hourlyBackup();
    expect(spy).toHaveBeenCalledWith('hourly');
  });

  it('should run daily backup', async () => {
    const spy = jest.spyOn(service as any, 'runBackup').mockResolvedValue(true);
    await service.dailyBackup();
    expect(spy).toHaveBeenCalledWith('daily');
  });
});
