GitHub action for mirroring repository content to an S3 bucket #47

Merged · 1 commit · Jul 29, 2020
41 changes: 41 additions & 0 deletions .github/workflows/s3-mirror.yml
@@ -0,0 +1,41 @@
name: Mirror repository content to an S3 bucket

on:
  push:
    branches:
      - gh-pages

  schedule:
    # Pushes to `gh-pages` done from other actions cannot trigger this action so we also want it to run
    # on a schedule. Let's give the nightly job a 1h head-start and run every day at 1:00.
    - cron: '0 1 * * *'

env:
  S3_BUCKET: solc-bin
  S3_REGION: eu-central-1

jobs:
  push-to-s3:
    runs-on: ubuntu-latest
    timeout-minutes: 30

    steps:
      - name: Wait for other instances of this workflow to finish
        # It's not safe to run two S3 sync operations concurrently with different files
        uses: softprops/turnstyle@v1
        with:
          same-branch-only: no
Comment on lines +23 to +27
@cameel cameel (Member Author) commented on Jul 27, 2020

Unfortunately, there's no way to tell GitHub to run only one instance of an action at a time. As a workaround I used the turnstyle action.

Is it OK to use it? I looked at its code briefly and didn't see anything nefarious. The docs say that giving it GITHUB_TOKEN is required, but it seems to work just fine without it. So this PR should be completely safe (turnstyle can't access any secrets). It is a third-party action though, so if we ever update it without reviewing the new code and start putting secrets (GITHUB_TOKEN or S3 keys) in env variables, it could theoretically steal them and use them to modify files on GitHub or in S3.


      - name: Configure the S3 client
        run: |
          aws configure set default.region "$S3_REGION"
          aws configure set aws_access_key_id '${{ secrets.AWS_ACCESS_KEY_ID }}'
          aws configure set aws_secret_access_key '${{ secrets.AWS_SECRET_ACCESS_KEY }}'

      - uses: actions/checkout@v2
        with:
          fetch-depth: 0

      - name: Sync the S3 bucket
        run: |
          ./sync-s3.sh "$S3_BUCKET"
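
The same steps can also be reproduced by hand for local testing. A minimal sketch, assuming the AWS CLI is installed; the credential values and the repository URL below are placeholders to substitute:

    # Configure the AWS CLI with the target region and credentials (placeholders).
    aws configure set default.region eu-central-1
    aws configure set aws_access_key_id "<access-key-id>"
    aws configure set aws_secret_access_key "<secret-access-key>"

    # The script modifies the working copy, so always run it from a fresh, full clone.
    git clone <repository-url> fresh-clone
    cd fresh-clone
    ./sync-s3.sh solc-bin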
61 changes: 61 additions & 0 deletions sync-s3.sh
@@ -0,0 +1,61 @@
#!/usr/bin/env bash

#------------------------------------------------------------------------------
# Uploads the content of the local working copy to a storage bucket on
# Amazon S3. Removes any files that do not exist locally. Files in the root
# directory with names starting with a dot or an underscore are not uploaded.
#
# The script assumes that the AWS CLI tool is installed and already configured
# with credentials allowing it to modify the bucket.
#
# NOTE: There's no built-in mechanism for updating an S3 bucket in an atomic
# way; only individual file updates are atomic. This means that during the
# sync clients will see an intermediate state with some files missing or not
# yet updated. Since the binaries are never modified or removed from the repository
# under normal circumstances, updating the file lists last is enough to alleviate this.

# When running multiple instances of this script concurrently on different
# revisions, it's theoretically possible to end up with any combination of
# their files in the bucket, so this should be avoided.
#
# WARNING: The script destructively modifies the working copy. Always run it
# on a fresh clone!
#------------------------------------------------------------------------------

set -eo pipefail

die() { >&2 echo "ERROR: $@" && false; }

s3_bucket="$1"
(( $# == 1 )) || die "Expected exactly 1 parameter."

[[ $(git rev-parse --is-shallow-repository) == false ]] || die "This script requires access to full git history to be able to set file timestamps correctly."
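# A shallow clone can be converted into a full one with `git fetch --unshallow` before running this script.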

echo "===> Updating file modification timestamps to match commits"
# NOTE: `aws s3 sync` compares file timestamp and size to decide whether to upload it or not.
readarray -t files < <(git ls-files)
for file in "${files[@]}"; do
    full_time="$(git log --max-count 1 --pretty=format:%cd --date=iso -- "$file")"
    touch_timestamp="$(date --date="$full_time" +%Y%m%d%H%M.%S)"
    touch -m -t "$touch_timestamp" "$file"
done
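
# Illustrative sanity check (<some-file> is a placeholder): after the loop, the modification
# time reported by `stat` should match the commit date printed by `git log`:
#     stat --format='%y' <some-file>
#     git log --max-count 1 --pretty=format:%cd --date=iso -- <some-file>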

echo "===> Removing files that should not be uploaded to S3"
# NOTE: This ensures that they will be deleted from the bucket if they're already there.
# If we used `aws s3 sync --delete --exclude` instead, they would not get deleted.
find . -path './.*' -delete
find . -path './_*' -delete

echo "===> Adding compatibility symlinks for files containing plus signs in the name"
# NOTE: This is a quick'n'dirty workaround for Amazon S3 decoding the plus sign in paths
# as a space, even though this substitution is only supposed to happen in query strings.
# See https://forums.aws.amazon.com/thread.jspa?threadID=55746
find . \
    -regex "^\(.*/\)*[^/]*\+[^/]*$" \
    -exec bash -c 'ln --symbolic --no-target-directory "$(basename "{}")" "$(dirname "{}")/$(basename "{}" | tr "+" " ")"' \;
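
# For illustration (the file name below is hypothetical): given a binary such as
#     ./bin/soljson-v0.5.0+commit.1d4f565a.js
# the command above creates a sibling symlink whose name has the plus sign replaced by a space:
#     ./bin/soljson-v0.5.0 commit.1d4f565a.js -> soljson-v0.5.0+commit.1d4f565a.js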

echo "===> Syncing binaries with the S3 bucket"
aws s3 sync . "s3://${s3_bucket}" --delete --follow-symlinks --no-progress --exclude "*/list.*"

echo "===> Syncing file lists with the S3 bucket"
aws s3 sync . "s3://${s3_bucket}" --delete --follow-symlinks --no-progress --exclude "*" --include "*/list.*"
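
# NOTE: To preview what either sync would change without modifying the bucket, the same
# commands accept a `--dryrun` flag, e.g.:
#     aws s3 sync . "s3://${s3_bucket}" --delete --follow-symlinks --exclude "*/list.*" --dryrun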