Skip to content
Open
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
32 changes: 20 additions & 12 deletions bin/backup.sh
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,9 @@ set -e

setup.sh

max_pg_wait_count=120
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this configurable via env var and default to 120?

work_area=${PGDUMP_BACKUP_AREA:-/pg_dump}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the work area need to be configurable? Do you want to bind mount it from a different host volume, and are then unable to rm -rf the bind mounted directory?

Maybe we just rm /pg_dump/*.sql, instead? And then you can bind mount /pg_dump from anywhere, and we can also mkdir /pg_dump in Dockerfile and no longer need mkdir -p.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we're wanting this in a different volume, and force making this in / is a thing that only makes sense inside a container, and a pretty specific container at that. While this code is for a container, making it dependent on that is a pain point waiting to happen (as has just happened to us).

Making the path configurable lets you tune things without hand configuring the container build with a bind mount. Far better to have the script be somewhat generic by being able to tell it where to work.

One stong advantage of the mkdir is that we know we make the work and nobody else might be using it. Otherwise the rm -rf below, or your suggested rm *.sql are just asking to damage something else's work. With mkdir, we made it and we're free to unmake it.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I follow. Are you using this script outside of a container? From what I have seen, it is fairly common for Docker images to store data in a subdirectory of /.

As far as I can tell, there is no point in making the directory configurable unless you are wanting to use a bind mount, which must be hand configured by you in your compose file or docker run command, and needing to completely remove and recreate that directory which would require you nominate a subdirectory of the mount point.

I think we could just use /data/pg_dump as the directory and you could then use -v /some/host/dir:/data and we could continue to destroy and recreate the pg_dump subdirectory, without this extra configuration or documentation.

Fair point about the safety of retaining the mkdir and exit if it already exists. Though, if something ever goes wrong and a container exits (e.g. OOM killed) before having removed the directory, the container will not be able to restart resume backups without intervention?


for i in {1..5}; do
export HOSTNAME_VAR="HOSTNAME_$i"
export PGHOST_VAR="PGHOST_$i"
Expand All @@ -28,37 +31,42 @@ for i in {1..5}; do
echo "Dumping database cluster $i: $PGUSER@$PGHOST:$PGPORT"

# Wait for PostgreSQL to become available.
COUNT=0
count=0
until psql -l > /dev/null 2>&1; do
if [[ "$COUNT" == 0 ]]; then
if [[ "$count" == 0 ]]; then
echo "Waiting for PostgreSQL to become available..."
fi
(( COUNT += 1 ))
(( count += 1 ))
[ $count -lt $max_pg_wait_count ] || break
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why [] here instead of [[]]? Can we stick to [[]], or even if [[ ... ]]; then ... fi (as above) for consistency.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly because I pretty much never use [[ in my own scripts - its utility is generally tiny and it is less portable. I'll tweak this for consistency though.

sleep 1
done
if (( COUNT > 0 )); then
echo "Waited $COUNT seconds."
if (( count > 0 )); then
echo "Waited $count seconds."
psql -l > /dev/null 2>&1 || {
echo "PostgreSQL still not available, trying next backup."
continue
}
fi

mkdir -p "/pg_dump"
mkdir "$work_area" || exit 1
Comment on lines -43 to +54
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on the problem with -p? If we do drop it, we don't need || exit 1 because we have set -e already.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with -p is twofold:

  • it makes intermediate directories (that's its entire feature) - not an issue with a hardwired /pg_dump but definitely an issue with longer paths - with a longer path accidents in the longer path eg /tmpp/work_area do not get notice - badness just gets created in the filesystem.
  • if the directory already exists, it is not an error - this means that 2 instances of the script can both "make" the work area, both try to use in concurrently, and both try to blow it away. All those things can variously lead to corrupt/truncated backups or removed-during-processing backups. All bad. mkdir -p in a script is usually a bug magnet. By not having -p we are just asking to make a specific directory we want to be (a) new and (b) in a place where it is expected. -p discards these benefits.

Why might 2 scripts be running? If the backups take a very long time, something which creeps up on one for a fixed cronlike schedule. Forcing a mkdir is robust.

I like || exit to make the flow control explicit, -e regardless. I use both -e and -u in my own scripts, but still make an exit like the above explicit. It aids readability. Also, that's a fun side effect of subshells, where -e effectively gets turned off and needs reenabling. Explicit flow control reduces the opportunity for this kind of misadventure.


# Dump individual databases directly to restic repository.
DBLIST=$(psql -d postgres -q -t -c "SELECT datname FROM pg_database WHERE datname NOT IN ('postgres', 'rdsadmin', 'template0', 'template1')")
for dbname in $DBLIST; do
dblist=$(psql -d postgres -q -t -c "SELECT datname FROM pg_database WHERE datname NOT IN ('postgres', 'rdsadmin', 'template0', 'template1')")
for dbname in $dblist; do
echo "Dumping database '$dbname'"
pg_dump --file="/pg_dump/$dbname.sql" --no-owner --no-privileges --dbname="$dbname" || true # Ignore failures
pg_dump --file="$work_area/$dbname.sql" --no-owner --no-privileges --dbname="$dbname" || true # Ignore failures
done

# echo "Dumping global objects for '$PGHOST'"
# pg_dumpall --file="/pg_dump/!globals.sql" --globals-only
# pg_dumpall --file="$work_area/!globals.sql" --globals-only

echo "Sending database dumps to S3"
while ! restic backup --host "$HOSTNAME" "/pg_dump"; do
while ! restic backup --host "$HOSTNAME" "$work_area"; do
echo "Sleeping for 10 seconds before retry..."
sleep 10
done

echo 'Finished sending database dumps to S3'

rm -rf "/pg_dump"
rm -rf "$work_area"
done
4 changes: 3 additions & 1 deletion bin/setup.sh
Original file line number Diff line number Diff line change
Expand Up @@ -2,12 +2,14 @@

set -e

ok=1
for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY RESTIC_PASSWORD RESTIC_REPOSITORY; do
eval [[ -z \${$var+1} ]] && {
>&2 echo "ERROR: Missing required environment variable: $var"
exit 1
ok=
}
done
[ $ok ] || exit 1

if ! restic unlock; then
restic init
Expand Down