Automated backup and restore system for the PIFP SQLite indexer database.
The backup system provides automated, secure backups of the PIFP indexer database with:
- Daily automated backups via cron job
- Compression using gzip for efficient storage
- Cloud storage integration (AWS S3 or Google Cloud Storage)
- 30-day retention policy with automatic cleanup
- Point-in-time recovery capability
- Comprehensive logging and error handling
┌─────────────────┐
│ Indexer DB │
│ (SQLite) │
└────────┬────────┘
│
▼
┌─────────────────┐
│ backup.sh │
│ - Copy DB │
│ - Compress │
│ - Upload │
└────────┬────────┘
│
▼
┌─────────────────┐
│ S3 / GCS │
│ (Encrypted) │
└─────────────────┘
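The three steps in the diagram can be sketched as a small shell function. This is a simplified illustration, not the actual contents of backup.sh (which adds environment validation, logging, upload verification, and retention handling); the `aws s3 cp` line is commented out because it needs the AWS CLI and credentials configured per the sections below.

```shell
#!/usr/bin/env bash
# Simplified sketch of the backup flow; the real backup.sh adds
# validation, logging, upload verification, and retention cleanup.

# backup_db <db_path> <backup_dir>
# Snapshots the SQLite file, gzips it, and prints the backup filename.
backup_db() {
  local db_path="$1" backup_dir="$2"
  local name
  name="pifp_backup_$(date -u +%Y%m%d_%H%M%S).db"
  mkdir -p "$backup_dir"
  cp "$db_path" "$backup_dir/$name"   # 1. copy the database file
  gzip -f "$backup_dir/$name"         # 2. compress -> <name>.gz
  # 3. upload (requires AWS CLI + credentials; bucket per .env.backup):
  # aws s3 cp "$backup_dir/$name.gz" "s3://$BACKUP_BUCKET/backups/$name.gz"
  echo "$name.gz"
}
```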
cd scripts/
cp .env.backup.example .env.backup
# Edit .env.backup with your credentials

./backup.sh        # run a one-time backup
./setup_cron.sh    # install daily automated backups

Create a .env.backup file in the scripts/ directory:
# Database path
BACKUP_DB_PATH=/workspace/backend/indexer/pifp_events.db
# Storage provider: 's3' or 'gcs'
STORAGE_TYPE=s3
# AWS S3 configuration
BACKUP_BUCKET=pifp-database-backups
BACKUP_REGION=us-east-1
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
# Retention (days)
BACKUP_RETENTION_DAYS=30
# Logging
LOG_LEVEL=INFO
LOG_FILE=/var/log/pifp_backup.log

Example S3 configuration:

STORAGE_TYPE=s3
BACKUP_BUCKET=your-bucket-name
BACKUP_REGION=us-east-1
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

IAM Policy Requirements:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:DeleteObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::your-bucket-name",
"arn:aws:s3:::your-bucket-name/backups/*"
]
}
]
}

Example GCS configuration:

STORAGE_TYPE=gcs
BACKUP_BUCKET=your-bucket-name
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

Service Account Permissions:
- storage.objects.create
- storage.objects.delete
- storage.objects.get
- storage.objects.list
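One way to provision such a service account with the gcloud CLI is sketched below. The account name, project ID, bucket, and key path are placeholders; `roles/storage.objectAdmin` bundles the four object permissions listed above.

```shell
# Create a dedicated service account for backups (names are examples).
gcloud iam service-accounts create pifp-backup --display-name "PIFP backup"

# roles/storage.objectAdmin grants create/delete/get/list on objects.
gcloud storage buckets add-iam-policy-binding gs://your-bucket-name \
  --member "serviceAccount:pifp-backup@YOUR_PROJECT.iam.gserviceaccount.com" \
  --role "roles/storage.objectAdmin"

# Download a key for GOOGLE_APPLICATION_CREDENTIALS.
gcloud iam service-accounts keys create /path/to/service-account.json \
  --iam-account "pifp-backup@YOUR_PROJECT.iam.gserviceaccount.com"
```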
Run a one-time backup:
cd scripts/
source .env.backup # Load environment variables
./backup.sh

Expected Output:
[2025-01-28 02:00:00] [INFO] =========================================
[2025-01-28 02:00:00] [INFO] Starting PIFP Database Backup
[2025-01-28 02:00:00] [INFO] =========================================
[2025-01-28 02:00:01] [INFO] Validating environment configuration...
[2025-01-28 02:00:01] [INFO] Environment validation passed
[2025-01-28 02:00:02] [INFO] Creating backup directory: /tmp/pifp_backups
[2025-01-28 02:00:02] [INFO] Copying database from: /workspace/backend/indexer/pifp_events.db
[2025-01-28 02:00:03] [INFO] Database copied successfully. Size: 1048576 bytes
[2025-01-28 02:00:04] [INFO] Compressing backup file...
[2025-01-28 02:00:05] [INFO] Compression complete. Compressed size: 262144 bytes
[2025-01-28 02:00:06] [INFO] Uploading backup to S3 bucket: pifp-database-backups
[2025-01-28 02:00:15] [INFO] Successfully uploaded to S3: s3://pifp-database-backups/backups/pifp_backup_20250128_020000.db.gz
[2025-01-28 02:00:16] [INFO] Verifying backup upload...
[2025-01-28 02:00:17] [INFO] Upload verification successful
[2025-01-28 02:00:18] [INFO] Applying retention policy: keeping backups for 30 days
[2025-01-28 02:00:20] [INFO] Retention policy applied successfully
[2025-01-28 02:00:21] [INFO] Cleaning up local temporary files...
[2025-01-28 02:00:21] [INFO] Cleanup complete
[2025-01-28 02:00:21] [INFO] =========================================
[2025-01-28 02:00:21] [INFO] Backup completed successfully!
[2025-01-28 02:00:21] [INFO] Backup file: pifp_backup_20250128_020000.db.gz
[2025-01-28 02:00:21] [INFO] Location: s3://pifp-database-backups/backups/
[2025-01-28 02:00:21] [INFO] =========================================
cd scripts/
source .env.backup
./restore.sh                                    # restore the latest backup
./restore.sh pifp_backup_20250101_020000.db.gz  # restore a specific backup

Restore Process:
- Downloads backup from cloud storage
- Verifies backup integrity
- Stops indexer service (if running)
- Creates safety backup of current database
- Restores database from backup
- Verifies restored database integrity
- Restarts indexer service
Expected Output:
[2025-01-28 10:00:00] [INFO] =========================================
[2025-01-28 10:00:00] [INFO] Starting PIFP Database Restore
[2025-01-28 10:00:00] [INFO] =========================================
[2025-01-28 10:00:01] [INFO] Validating environment configuration...
[2025-01-28 10:00:01] [INFO] Environment validation passed
[2025-01-28 10:00:02] [INFO] Searching for latest backup in S3 bucket: pifp-database-backups
[2025-01-28 10:00:05] [INFO] Found latest backup: pifp_backup_20250128_020000.db.gz
[2025-01-28 10:00:06] [INFO] Downloading backup from S3: pifp_backup_20250128_020000.db.gz
[2025-01-28 10:00:15] [INFO] Download complete: /tmp/pifp_restore/pifp_backup_20250128_020000.db.gz
[2025-01-28 10:00:16] [INFO] Verifying backup integrity...
[2025-01-28 10:00:17] [INFO] Backup integrity verified. Size: 262144 bytes
[2025-01-28 10:00:18] [INFO] Decompressing backup file...
[2025-01-28 10:00:20] [INFO] Decompression complete. Database size: 1048576 bytes
[2025-01-28 10:00:21] [INFO] Stopping indexer service...
[2025-01-28 10:00:23] [INFO] Indexer process terminated (PID: 12345)
[2025-01-28 10:00:24] [INFO] Replacing database at: /workspace/backend/indexer/pifp_events.db
[2025-01-28 10:00:25] [INFO] Creating safety backup: /workspace/backend/indexer/pifp_events.db.pre_restore_20250128_100024
[2025-01-28 10:00:26] [INFO] Database replaced successfully
[2025-01-28 10:00:27] [INFO] Verifying restored database integrity...
[2025-01-28 10:00:28] [INFO] Database integrity verified. Tables: 4
[2025-01-28 10:00:29] [INFO] Starting indexer service...
[2025-01-28 10:00:30] [INFO] Please start the indexer manually if needed
[2025-01-28 10:00:31] [INFO] Cleaning up restore temporary files...
[2025-01-28 10:00:31] [INFO] Cleanup complete
[2025-01-28 10:00:31] [INFO] =========================================
[2025-01-28 10:00:31] [INFO] Restore completed successfully!
[2025-01-28 10:00:31] [INFO] =========================================
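The integrity checks in steps 2 and 6 boil down to gzip and sqlite3 invocations. The sketch below demonstrates them on a scratch database it creates itself; the real restore.sh runs equivalent checks against the downloaded backup and the restored pifp_events.db.

```shell
set -euo pipefail
workdir="$(mktemp -d)"

# Build a scratch database and a gzipped "backup" of it.
sqlite3 "$workdir/pifp_events.db" "CREATE TABLE events (id INTEGER);"
gzip -c "$workdir/pifp_events.db" > "$workdir/backup.db.gz"

gzip -t "$workdir/backup.db.gz"   # step 2: exits non-zero if the archive is corrupt
sqlite3 "$workdir/pifp_events.db" "PRAGMA integrity_check;"  # step 6: prints "ok"
sqlite3 "$workdir/pifp_events.db" \
  "SELECT count(*) FROM sqlite_master WHERE type='table';"   # table count
```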
Set up daily automated backups:
./setup_cron.sh

This will:
- Install a cron job to run the backup daily at 2:00 AM UTC
- Log output to /var/log/pifp_backup_cron.log
- Load environment from .env.backup
Verify Installation:
crontab -l | grep pifp

Expected Output:
# PIFP Database Backup - Daily automated backup
0 2 * * * /workspace/scripts/backup.sh >> /var/log/pifp_backup_cron.log 2>&1
Edit crontab:
crontab -e

Common Schedules:
- Every 6 hours: 0 */6 * * *
- Every hour: 0 * * * *
- Twice daily (midnight & noon): 0 0,12 * * *
- Weekly (Sunday 3 AM): 0 3 * * 0
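An idempotent crontab installation along these lines is one way to achieve what setup_cron.sh does (a hypothetical sketch, not the script's actual contents; the filter-then-append pattern avoids duplicate entries on repeated runs):

```shell
# Install the daily 2:00 AM backup entry unless it is already present.
CRON_LINE='0 2 * * * /workspace/scripts/backup.sh >> /var/log/pifp_backup_cron.log 2>&1'
( crontab -l 2>/dev/null | grep -vF "$CRON_LINE"; echo "$CRON_LINE" ) | crontab -
```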
Backups are automatically managed with a 30-day retention policy:
- Daily backups: One backup per day
- Retention period: 30 days
- Automatic deletion: Backups older than 30 days are deleted
- Cleanup timing: Applied during each backup run
Example Timeline:
Jan 1    Jan 5    Jan 10   Jan 15   Jan 20   Jan 25   Jan 28   Feb 1
|--------|--------|--------|--------|--------|--------|--------|
✓        ✓        ✓        ✓        ✓        ✓        ✓        ↓
                                                       Delete Jan 1 backup
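The retention check itself only needs the timestamp embedded in each backup filename. A sketch of that logic (assumes GNU date for the `-d` flag; the commented `aws` pipeline is illustrative, not the script's actual cleanup code):

```shell
set -euo pipefail
RETENTION_DAYS="${BACKUP_RETENTION_DAYS:-30}"
cutoff="$(date -u -d "$RETENTION_DAYS days ago" +%Y%m%d)"

# is_expired <backup_filename>: succeeds if the backup predates the cutoff.
is_expired() {
  local stamp="${1#pifp_backup_}"   # drop the filename prefix
  stamp="${stamp%%_*}"              # keep the YYYYMMDD portion
  [ "$stamp" -lt "$cutoff" ]
}

# Applying it against S3 (illustrative; requires AWS CLI + credentials):
# aws s3 ls "s3://$BACKUP_BUCKET/backups/" | awk '{print $4}' |
#   while read -r f; do
#     is_expired "$f" && aws s3 rm "s3://$BACKUP_BUCKET/backups/$f"
#   done
```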
To change retention period, set in .env.backup:
BACKUP_RETENTION_DAYS=60   # Keep 2 months

✅ DO:
- Store credentials in the .env.backup file
- Add .env.backup to .gitignore (already configured)
- Use IAM roles when possible (EC2, GCE)
- Rotate access keys regularly
- Use least-privilege permissions
❌ DON'T:
- Commit .env.backup to version control
- Hardcode credentials in scripts
- Share credentials via email/chat
- Use root/admin credentials
At Rest:
- S3: Server-side encryption (AES-256) enabled by default
- GCS: Automatic encryption at rest
In Transit:
- All uploads use HTTPS/TLS
- No unencrypted data transmission
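Server-side encryption can also be enforced explicitly as the bucket default via the AWS CLI (bucket name is a placeholder):

```shell
# Make AES-256 server-side encryption the default for new objects.
aws s3api put-bucket-encryption \
  --bucket your-bucket-name \
  --server-side-encryption-configuration \
  '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'

# Verify the setting:
aws s3api get-bucket-encryption --bucket your-bucket-name
```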
Bucket Policy Recommendations:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "DenyPublicAccess",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::your-bucket-name/backups/*",
"Condition": {
"Bool": {
"aws:SecureTransport": "false"
}
}
}
]
}

Backup Logs:
- Default location: /var/log/pifp_backup.log
- Cron logs: /var/log/pifp_backup_cron.log
View Recent Activity:
tail -f /var/log/pifp_backup.log

Search Logs:
grep "ERROR" /var/log/pifp_backup.log
grep "Successfully uploaded" /var/log/pifp_backup.log

Configure verbosity in .env.backup:
LOG_LEVEL=DEBUG # Maximum detail
LOG_LEVEL=INFO # Normal operation (default)
LOG_LEVEL=WARN # Warnings only
LOG_LEVEL=ERROR    # Errors only

For production environments, consider adding alerting:
Example: Email on Failure
# In setup_cron.sh or crontab
0 2 * * * /workspace/scripts/backup.sh || mail -s "Backup Failed" [email protected]

Example: Slack Notification
Add to backup.sh after error detection:
curl -X POST -H 'Content-type: application/json' \
--data '{"text":"PIFP Backup Failed!"}' \
https://hooks.slack.com/services/YOUR/WEBHOOK/URL

Cause: Indexer hasn't created the database yet
Solution:
# Run the indexer first
cd backend/indexer
cargo run
# Let it create the database, then stop it
# Now run backup

Cause: Indexer is actively writing to database
Solution:
- Stop indexer before backup
- Or wait for write operations to complete
- Consider using WAL mode for SQLite
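The WAL suggestion pairs well with SQLite's online `.backup` command: with WAL enabled, a consistent snapshot can be taken without stopping the indexer at all. The sketch below demonstrates this on a scratch database (illustrative; the real backup.sh uses a plain file copy as shown earlier):

```shell
set -euo pipefail
workdir="$(mktemp -d)"

# Enable WAL so readers and the backup don't block the writer.
sqlite3 "$workdir/pifp_events.db" \
  "PRAGMA journal_mode=WAL; CREATE TABLE events (id INTEGER);"

# Online, consistent snapshot -- safer than cp while writes may be in flight.
sqlite3 "$workdir/pifp_events.db" ".backup '$workdir/snapshot.db'"
sqlite3 "$workdir/snapshot.db" "PRAGMA integrity_check;"   # prints "ok"
```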
Solution:
# Install AWS CLI
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
# Verify installation
aws --version

Causes:
- Invalid credentials
- Bucket doesn't exist
- Network issues
- Permissions issue
Solution:
# Test credentials
aws sts get-caller-identity
# Test bucket access
aws s3 ls s3://your-bucket-name
# Check IAM permissions
aws iam list-attached-user-policies --user-name your-user

Causes:
- First backup hasn't run yet
- Wrong bucket name
- Wrong region
Solution:
# List all objects in bucket
aws s3 ls s3://your-bucket-name/backups/ --region your-region
# Verify bucket exists
aws s3api head-bucket --bucket your-bucket-name

Enable detailed debugging:
LOG_LEVEL=DEBUG ./backup.sh

This will show:
- Environment variable values
- Each step's detailed output
- API calls and responses
- File operation details
# 1. Create test database
mkdir -p /tmp/test_indexer
sqlite3 /tmp/test_indexer/pifp_events.db "CREATE TABLE events (id INTEGER); INSERT INTO events VALUES (1);"
# 2. Set environment
export BACKUP_DB_PATH=/tmp/test_indexer/pifp_events.db
export BACKUP_BUCKET=test-backup-bucket
export STORAGE_TYPE=s3
# 3. Run backup
./backup.sh
# 4. Verify
ls -lh /tmp/pifp_backups/

# 1. Backup current database
./backup.sh
# 2. Delete original database
rm /workspace/backend/indexer/pifp_events.db
# 3. Restore
./restore.sh
# 4. Verify restoration
sqlite3 /workspace/backend/indexer/pifp_events.db "SELECT COUNT(*) FROM events;"

Practice full recovery quarterly:
- Simulate complete database loss
- Restore from latest backup
- Verify data integrity
- Document recovery time
- Update procedures based on lessons learned
- Duration: Typically 1-5 minutes depending on database size
- CPU: Minimal (compression uses ~10-20% CPU)
- Memory: < 100MB
- Network: Upload bandwidth dependent
- Database: Brief read lock during copy (< 1 second)
- Schedule during low-traffic periods: 2:00 AM UTC recommended
- Monitor backup duration: Alert if > 10 minutes
- Test restore regularly: Ensure backups are valid
- Rotate credentials: Every 90 days
- Review logs weekly: Catch issues early
Assumptions:
- Database size: 1 GB
- Compressed size: 250 MB (75% compression)
- Daily backups
- 30-day retention
Monthly Storage:
250 MB × 30 backups = 7.5 GB
Cost: 7.5 GB × $0.023/GB = $0.17/month
API Calls:
- 3 PUT requests per backup (upload, verify, lifecycle)
- 90 PUT requests/month = negligible cost
Total Estimated Cost: < $0.25/month
Similar pricing structure:
- Standard storage: $0.020/GB/month
- Operations: Similar minimal cost
- Review backup logs for errors
- Verify backup file count matches expected (30)
- Test restore procedure
- Check storage costs
- Rotate AWS/GCS credentials
- Perform disaster recovery drill
- Review and update retention policy
- Document any infrastructure changes
- Review backup strategy effectiveness
- Evaluate new storage options
- Update documentation
- Train team members on procedures
For issues or questions:
- Check logs:
/var/log/pifp_backup.log - Review troubleshooting section above
- Enable DEBUG logging
- Contact project maintainers
Part of the PIFP project. See main repository LICENSE file.