1 change: 1 addition & 0 deletions .github/workflows/pull_request.yml
@@ -46,6 +46,7 @@ jobs:
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -r certificate_automation/requirements.txt
pip install -r blog_automation/requirements.txt
- name: Run pytest
run: |
72 changes: 72 additions & 0 deletions .github/workflows/run_blog_exporter.yml
@@ -0,0 +1,72 @@
name: Publish reviewed blogs

on:
  workflow_dispatch:
  schedule:
    - cron: '0 7 * * *' # daily at 07:00 UTC: publish any newly reviewed blogs

jobs:
  publish-blogs:
    if: github.repository == 'Women-Coding-Community/WomenCodingCommunity.github.io'
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v5

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Cache pip
        uses: actions/cache@v4
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-blog-${{ hashFiles('tools/blog_automation/requirements.txt') }}
          restore-keys: |
            ${{ runner.os }}-pip-blog-
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r tools/blog_automation/requirements.txt
      - name: Write service account key
        run: echo "$SERVICE_ACCOUNT_KEY" > tools/blog_automation/service_account_key.json
        env:
          SERVICE_ACCOUNT_KEY: ${{ secrets.BLOG_AUTOMATION_SERVICE_ACCOUNT }}

      - name: Export reviewed blogs
        run: |
          cd tools/blog_automation
          python publish_reviewed_blogs.py
      - name: Remove service account key
        if: always()
        run: rm -f tools/blog_automation/service_account_key.json

      - name: Create or Update Pull Request
        id: create-pr
        uses: peter-evans/create-pull-request@v7
        with:
          token: ${{ secrets.GHA_ACTIONS_ALLOW_TOKEN }}
          commit-message: "Automated import of reviewed blog posts"
          branch: "automation/import-blog"

Review comment (Contributor):

The branch name is fixed as `automation/import-blog`, which means all batch imports accumulate into the same PR. If a previous batch PR is still open when a new daily run triggers, reviewers might approve a larger bundle than expected.

Would it be worth using a date-stamped branch name (e.g. `automation/import-blog-2026-06-28`) so each run creates its own isolated PR, giving reviewers clearer control over what they're approving?
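
A minimal sketch of the naming scheme this comment proposes, written in Python because the export scripts are Python. `dated_branch` and `BASE_BRANCH` are hypothetical names that do not appear in the PR, and the workflow would still need a step that passes the result to the action's `branch` input.

```python
from datetime import date

# The fixed branch name currently used in the workflow under review.
BASE_BRANCH = "automation/import-blog"


def dated_branch(run_date: date, base: str = BASE_BRANCH) -> str:
    """Return a per-run branch name by appending the ISO date (YYYY-MM-DD)."""
    return f"{base}-{run_date.isoformat()}"


# Example: a run on 28 June 2026 would target its own branch.
example = dated_branch(date(2026, 6, 28))
```

The trade-off is that unmerged daily PRs would pile up instead of being updated in place. The `peter-evans/create-pull-request` documentation also describes a `branch-suffix` input that may give a similar result without extra code; check it before relying on either approach.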

          team-reviewers: "Women-Coding-Community/leaders"
          title: "Automated import of reviewed blog posts"
          body: |
            This PR was created automatically by a GitHub Action.

Review comment (Contributor):

The PR checklist includes "I have added a screenshot from the website after I tested it locally" but no screenshot appears in the PR body. Since this automation writes files and opens PRs, a sample of the generated post output (e.g. the front matter + first few lines of an exported `.html` file) would help reviewers validate the format before merging.

            It contains every blog marked `isReviewedandApproved` (and not yet
            `isPublished`) in the submissions spreadsheet:
            - new posts under `_posts/`
            - cover images under `assets/images/blog/`
            The spreadsheet's `isPublished` column has already been set to TRUE for

Review comment (Contributor):

The PR body already mentions that `isPublished` is set to TRUE before merging, which is good. One concern: if this PR is rejected or closed without merging, those rows stay marked as published in the sheet and won't be re-exported on the next run; they'd be silently lost.

Could we add a note here reminding reviewers that closing this PR without merging requires manually resetting `isPublished` to FALSE in the spreadsheet for the affected rows?

            these rows. Please review the rendered posts before merging.
          labels: |
            automation
          add-paths: |
            _posts/**
            assets/images/blog/**
56 changes: 51 additions & 5 deletions tools/blog_automation/README.md
@@ -17,10 +17,10 @@ To allow our scripts to access Google Drive and export documents, you need to cr
👉 **Note:** You need the **Project Editor** or **Owner** role on this project to create service accounts and keys.
If you’re the one who created the project, you already have these permissions.

-### 1. Enable the Drive API
+### 1. Enable the Drive and Sheets APIs
1. In the left menu, go to **APIs & Services → Library**.
-2. Search for **Google Drive API**.
-3. Click **Enable**.
+2. Search for **Google Drive API** and click **Enable**.
+3. Search for **Google Sheets API** and click **Enable** (needed to read the submissions spreadsheet).

### 2. Create a Service Account
1. In the left menu, go to **IAM & Admin → Service Accounts**.
@@ -47,6 +47,7 @@ If you’re the one who created the project, you already have these permissions.
4. Give it at least **Viewer** access.
5. Save changes.
- Now the service account can read/export files in that folder or doc.
6. Repeat the **Share** step for the **blog submissions spreadsheet** (the Google Form responses sheet), giving the service account **Editor** access. Editor (not just Viewer) is required because the pipeline writes `isPublished = TRUE` back to a row after exporting it.

---

@@ -75,8 +76,53 @@ Then the **Document ID** is:

Use this ID in your scripts when exporting the document.
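
As a rough illustration of where the ID sits in a document link, the sketch below pulls it out of a Google Docs URL, assuming the usual `https://docs.google.com/document/d/<ID>/edit` shape. `extract_doc_id` is a hypothetical helper, not part of `blog_exporter.py`, and the ID in the example is made up.

```python
import re
from typing import Optional

# In a Google Docs URL the ID is the path segment that follows "/document/d/".
DOC_ID_PATTERN = re.compile(r"/document/d/([A-Za-z0-9_-]+)")


def extract_doc_id(url: str) -> Optional[str]:
    """Return the document ID found in `url`, or None if the URL has no ID."""
    match = DOC_ID_PATTERN.search(url)
    return match.group(1) if match else None


# Made-up ID, for illustration only.
doc_id = extract_doc_id("https://docs.google.com/document/d/1AbC_dEf-123/edit")
```

The returned string is what the `--doc_id` argument expects.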

-## Run Automation
+## Export a single blog manually (for testing)

Review comment (Contributor):

The README now correctly references `blog_exporter.py` as the entry point for manual testing, but `doc_to_html_conversion.py` is still present in the repo (visible in the file tree). Is this file still needed, or should it be removed/deprecated to avoid confusion about which script to use?

1. Activate virtual environment: `source venv/bin/activate`
-2. Run the script: `python doc_to_html_conversion.py <DOC_ID>`
+2. Export one Google Doc into a post:
+   `python blog_exporter.py --doc_id <DOC_ID> --author_name "Jane Doe" --image_link "<DRIVE_IMAGE_LINK>"`

This is handy to check a Doc renders correctly. The full pipeline below reads all
of this metadata from the spreadsheet automatically.

## Tests

Run `pytest test_blog_exporter.py`

## CI/CD pipeline: publish a blog when you mark it reviewed

The Google Sheet is the **single source of truth** — there is no local CSV. The
GitHub Action [`.github/workflows/run_blog_exporter.yml`](../../.github/workflows/run_blog_exporter.yml)
turns a reviewed blog into a draft pull request automatically.

### How to publish a blog (the editor's workflow)
1. In the submissions spreadsheet (the **Form Responses 1** sheet), set the row's
**`isReviewedandApproved`** cell to **`TRUE`** once the draft is reviewed.
Leave **`isPublished`** blank/`FALSE`.
2. Within a day (or immediately via **Actions → Publish reviewed blogs → Run
workflow**) the action exports the blog, sets that row's **`isPublished`** to
`TRUE` in the sheet, and opens/updates a PR
(`Automated import of reviewed blog posts`) with the new post and cover image.
3. **Review the rendered post and merge.**

### What runs
`publish_reviewed_blogs.py` reads the sheet and exports every row where
`isReviewedandApproved` is `TRUE` and `isPublished` is not `TRUE`. Because the
`isPublished` flag is written straight back to the sheet, a blog is never exported
twice — and the existing backlog (already `isPublished = TRUE`) is left alone.
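
The selection rule in this paragraph can be sketched as a small filter over the sheet rows. This is an illustration under stated assumptions, not the code in `publish_reviewed_blogs.py`: it assumes each row arrives as a dict keyed by column header, that checkbox cells come back as `"TRUE"`/`"FALSE"` strings or booleans, and `is_true` and `rows_to_export` are hypothetical names.

```python
from typing import Dict, List

Row = Dict[str, object]


def is_true(cell: object) -> bool:
    """Treat a real True or the text 'TRUE' (any case, padded) as true; anything else is false."""
    if isinstance(cell, bool):
        return cell
    return str(cell).strip().lower() == "true"


def rows_to_export(rows: List[Row]) -> List[Row]:
    """Keep rows that are reviewed and approved but not yet marked as published."""
    return [
        row
        for row in rows
        if is_true(row.get("isReviewedandApproved", ""))
        and not is_true(row.get("isPublished", ""))
    ]


sheet_rows = [
    {"isReviewedandApproved": "TRUE", "isPublished": ""},       # exported
    {"isReviewedandApproved": "TRUE", "isPublished": "TRUE"},   # backlog, skipped
    {"isReviewedandApproved": "FALSE", "isPublished": ""},      # not reviewed, skipped
]
pending = rows_to_export(sheet_rows)
```

A blank `isPublished` cell counts as not published, which matches the editor's workflow of leaving it blank or `FALSE`.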

> The draft must be a **native Google Doc** (Drive can only export those to
> Markdown). If a submitter uploaded a `.docx`/`.pdf`, open it and do
> **File → Save as Google Docs** first, otherwise that row is skipped with an error.
### One-time repo setup
- **Service account needs Editor access to the spreadsheet** (see setup step 4) so
the pipeline can write back `isPublished`.
- **Secret `BLOG_AUTOMATION_SERVICE_ACCOUNT`** — paste the full contents of
`service_account_key.json` into a repository secret with this name
(Settings → Secrets and variables → Actions). The workflow writes it to disk at
runtime and deletes it afterwards; the key is never committed.
- **Secret `GHA_ACTIONS_ALLOW_TOKEN`** — already used by the other automations; it
lets the action open the pull request.
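
The write-then-delete handling of the key described in the `BLOG_AUTOMATION_SERVICE_ACCOUNT` bullet is done in the workflow with two shell steps, the second guarded by `if: always()`. For local runs, a comparable guarantee could be sketched in Python as below. `temporary_key_file` is a hypothetical helper that does not exist in the repo.

```python
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def temporary_key_file(key_json: str, path: Path) -> Iterator[Path]:
    """Write the key to `path`, yield the path, and delete the file even if the body raises."""
    path.write_text(key_json, encoding="utf-8")
    try:
        # Restrict the file to its owner; this has limited effect on Windows.
        os.chmod(path, 0o600)
        yield path
    finally:
        # Mirrors the workflow's `rm -f`: no error if the file is already gone.
        path.unlink(missing_ok=True)
```

Wrapping the export call in `with temporary_key_file(...)` means a failed export still removes the key from disk, which is the same property the `if: always()` step provides in CI.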


