Commit c03c54c

Make changes to check for broken links (#4)
1 parent 83767e7 commit c03c54c

6 files changed: +252 -145 lines changed

.github/workflows/check.yaml

+36
@@ -0,0 +1,36 @@
+name: Check links
+
+on:
+  # when someone makes a change directly to main branch
+  push:
+    branches:
+      - main
+  # when someone requests a change to main branch
+  pull_request:
+    branches:
+      - main
+
+  # run periodically
+  schedule:
+    - cron: "0 0 * * *"
+  # run manually
+  workflow_dispatch:
+
+jobs:
+  check:
+    runs-on: ubuntu-latest
+    steps:
+      - if: runner.debug == '1'
+        uses: mxschmitt/action-tmate@v3
+
+      - name: Get this repo's code
+        uses: actions/checkout@v4
+
+      - name: Set up Bun
+        uses: oven-sh/setup-bun@v1
+
+      - name: Install packages
+        run: bun install glob@v9 yaml@v2
+
+      - name: Run check script
+        run: bun ./check.js

.github/workflows/deploy.yaml

+10-5
@@ -15,25 +15,30 @@ env:
 
 jobs:
   encode:
-    name: Encode and deploy
     runs-on: ubuntu-latest
     steps:
+      - if: runner.debug == '1'
+        uses: mxschmitt/action-tmate@v3
+
       - name: Get this repo's code
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
         with:
           path: redirects-repo # save in separate sub-folder
 
       - name: Get website repo's code
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
         with:
           repository: ${{ github.repository_owner }}/${{ env.website_repo }} # assume same user/org
           path: website-repo # save in separate sub-folder
 
+      - name: Set up Bun
+        uses: oven-sh/setup-bun@v1
+
       - name: Install packages
-        run: npm install glob@v9 yaml@v2
+        run: bun install glob@v9 yaml@v2
 
       - name: Run encode script
-        run: node ./redirects-repo/encode.js
+        run: bun ./redirects-repo/encode.js
 
       - name: Commit result to website repo
         if: ${{ github.event_name == 'push' }}

README.md

+24-21
@@ -12,9 +12,12 @@ _Counterpart to the [redirects-website repo](../../../redirects-website)._
 
 1. Add/change/remove redirect entries in one or more [`.yaml` files in the top folder](../../blob/main/redirects.yaml).
    Note: the `from` field is **case-insensitive**.
-2. Commit the changes to the `main` branch, either directly or with a pull request (recommended so the automatic process can catch errors before the changes go live).
-3. Changes should take effect automatically within a minute or so.
+1. Commit the changes to the `main` branch, either directly or with a pull request (recommended so the automatic process can catch errors before the changes go live).
+1. Changes should take effect automatically within a minute or so.
    Verify that no errors occurred in the automatic process here: [![Encode and deploy](../../actions/workflows/deploy.yaml/badge.svg)](../../actions/workflows/deploy.yaml)
+1. Verify that none of your redirect links are reported broken in the automatic process here: [![Check links](../../actions/workflows/check.yaml/badge.svg)](../../actions/workflows/check.yaml).
+   Note that this is only a **rough check**.
+   There _may be false positives or false negatives_, as it simply checks the [status code](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes) of the link, which the third party may choose inappropriately.
 
 You can do this [directly on github.com](../../edit/main/redirects.yaml) (tip: press <kbd>.</kbd> right now), or locally with git.
 
@@ -96,19 +99,19 @@ After the one-time setup, **all you have to do is edit the `.yaml` files, and ev
 Adding/removing/changing a link goes like this:
 
 1. You change one or more of the `.yaml` files in the _redirects repo_.
-2. `deploy.yaml` tells [GitHub Actions](https://docs.github.com/en/actions/learn-github-actions/understanding-github-actions) that any time someone commits a change to the repo, it should automatically run the `encode.js` script.
-3. The `encode.js` script combines all of your `.yaml` files into one, and encodes it[^1].
-4. `deploy.yaml` then tells GitHub to take the result of the `encode.js` script and commit it to the `redirect.js` script in the _website repo_.
-5. In the _website repo_, GitHub Pages detects a change in the `redirect.js` script, and updates the website.
+1. `deploy.yaml` tells [GitHub Actions](https://docs.github.com/en/actions/learn-github-actions/understanding-github-actions) that any time someone commits a change to the repo, it should automatically run the `encode.js` script.
+1. The `encode.js` script combines all of your `.yaml` files into one, and encodes it[^1].
+1. `deploy.yaml` then tells GitHub to take the result of the `encode.js` script and commit it to the `redirect.js` script in the _website repo_.
+1. In the _website repo_, GitHub Pages detects a change in the `redirect.js` script, and updates the website.
 
 Then, a user visiting a link goes like this:
 
 1. They navigate to a link on the website, e.g. `/chatroom`.
-2. `chatroom.html` isn't a file in the _website repo_, and thus isn't a page on the website, so GitHub loads [`404.html`](https://en.wikipedia.org/wiki/HTTP_404) for the user instead (but preserves the `/chatroom` url).
+1. `chatroom.html` isn't a file in the _website repo_, and thus isn't a page on the website, so GitHub loads [`404.html`](https://en.wikipedia.org/wiki/HTTP_404) for the user instead (but preserves the `/chatroom` url).
    This file immediately runs some scripts:
-3. The analytics code snippet sends[^2] stats like url, IP, date, time, location, etc. off to Google Analytics or whoever.
-4. The `redirect.js` script decodes the redirect lists previously encoded from the _redirects repo_, finds the long url corresponding to "chatroom" (**case-insensitive**), and navigates there instead.
-5. They arrive at the intended destination, e.g. `zoom.us/j/12345abcdef`, with virtually no perceptible delay.
+1. The analytics code snippet sends[^2] stats like url, IP, date, time, location, etc. off to Google Analytics or whoever.
+1. The `redirect.js` script decodes the redirect lists previously encoded from the _redirects repo_, finds the long url corresponding to "chatroom" (**case-insensitive**), and navigates there instead.
+1. They arrive at the intended destination, e.g. `zoom.us/j/12345abcdef`, with virtually no perceptible delay.
 
 ## Setup
 
@@ -117,10 +120,10 @@ Then, a user visiting a link goes like this:
 1. [Use the _redirects repo_ (this repo) as a template](https://github.com/CU-DBMI/redirects/generate).
    **Do not fork**, because you cannot make forks private.
    _Name it `redirects` and make it private_.
-2. [Use the _website repo_ as a template](https://github.com/CU-DBMI/redirects-website/generate).
+1. [Use the _website repo_ as a template](https://github.com/CU-DBMI/redirects-website/generate).
    _Name it `redirects-website` and make it public_.
-3. [Enable GitHub Pages](https://docs.github.com/en/pages/getting-started-with-github-pages/configuring-a-publishing-source-for-your-github-pages-site) on your copied _website repo_ with the default settings.
-4. After a minute or so, GitHub should tell you that your site is now being hosted at `your-org.github.io/redirects-website`.
+1. [Enable GitHub Pages](https://docs.github.com/en/pages/getting-started-with-github-pages/configuring-a-publishing-source-for-your-github-pages-site) on your copied _website repo_ with the default settings.
+1. After a minute or so, GitHub should tell you that your site is now being hosted at `your-org.github.io/redirects-website`.
 
 If you ever need to pull in updates from these templates, [see the instructions here](https://stackoverflow.com/questions/56577184/github-pull-changes-from-a-template-repository).
 
@@ -129,8 +132,8 @@ If you ever need to pull in updates from these templates, [see the instructions
 To allow your _redirects repo_ to automatically write to your _website repo_, you need to "connect" them with a deploy key:
 
 1. [Generate an SSH key pair](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent#generating-a-new-ssh-key).
-2. In your _redirects repo_, [create a new repository actions secret](https://docs.github.com/en/actions/security-guides/encrypted-secrets#creating-encrypted-secrets-for-a-repository) named `DEPLOY_KEY`, and paste the private SSH key.
-3. In your _website repo_, [create a new deploy key](https://docs.github.com/en/developers/overview/managing-deploy-keys#setup-2) with write/push access named `DEPLOY_KEY`, and paste the public SSH key.
+1. In your _redirects repo_, [create a new repository actions secret](https://docs.github.com/en/actions/security-guides/encrypted-secrets#creating-encrypted-secrets-for-a-repository) named `DEPLOY_KEY`, and paste the private SSH key.
+1. In your _website repo_, [create a new deploy key](https://docs.github.com/en/developers/overview/managing-deploy-keys#setup-2) with write/push access named `DEPLOY_KEY`, and paste the public SSH key.
 
 ### Set up analytics
 
@@ -155,17 +158,17 @@ e.g. `your-domain.com/some-link`
 In summary:
 
 1. Purchase a domain name from a reputable service.
-2. Point your domain name provider to GitHub Pages using an `A` record.
+1. Point your domain name provider to GitHub Pages using an `A` record.
    This is slightly different for each company; they should have their own instructions on how to do it.
-3. Set the custom domain field in the "Pages" settings of your _website repo_ (automatically creates a `CNAME` file in the repo).
-4. After a minute or so, GitHub should tell you that your site is now being hosted at `your-domain.com`.
+1. Set the custom domain field in the "Pages" settings of your _website repo_ (automatically creates a `CNAME` file in the repo).
+1. After a minute or so, GitHub should tell you that your site is now being hosted at `your-domain.com`.
 
 #### GitHub user/org site
 
 e.g. `your-org.github.io/some-link`
 
 1. Name your _website repo_ `your-org.github.io` to match your GitHub user/organization name.
-2. In your _redirects repo_, change `redirects-website` in `deploy.yaml` to the same name.
+1. In your _redirects repo_, change `redirects-website` in `deploy.yaml` to the same name.
 
 [About GitHub user/org sites](https://docs.github.com/en/pages/getting-started-with-github-pages/about-github-pages#types-of-github-pages-sites).
 
@@ -188,8 +191,8 @@ In your _website repo_:
 If you already have a website being hosted with GitHub Pages that you want to incorporate this approach into:
 
 1. Skip templating the _website repo_.
-2. Instead, copy its [`redirect.js` script](https://github.com/CU-DBMI/redirects-website/blob/main/redirect.js) into the **top folder** of your existing website repo, and modify `baseurl` in it as appropriate.
-3. Include the script in your 404 page in the [same way it is done here](https://github.com/CU-DBMI/redirects-website/blob/main/404.html).
+1. Instead, copy its [`redirect.js` script](https://github.com/CU-DBMI/redirects-website/blob/main/redirect.js) into the **top folder** of your existing website repo, and modify `baseurl` in it as appropriate.
+1. Include the script in your 404 page in the [same way it is done here](https://github.com/CU-DBMI/redirects-website/blob/main/404.html).
    If an existing page and a redirect have same name/path, the redirect won't happen since the user won't get a [`404`](https://en.wikipedia.org/wiki/HTTP_404).
 
 If your existing website is built and hosted in a different way, this approach would require modification[^3] and might not be appropriate for you.
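
The case-insensitive lookup step that the README describes for `redirect.js` could look roughly like the sketch below. This is not the actual `redirect.js` from the website repo, which is not part of this commit; `findRedirect` is a hypothetical name, and the sketch assumes the decoded list holds `{ from, to }` entries whose `from` values were already lower-cased and stripped of leading slashes by the encode step.

```javascript
// Hypothetical sketch of the lookup described in the README, not the real redirect.js.
// Assumes `list` is the decoded redirect list with normalized "from" fields.
function findRedirect(list, pathname) {
  // normalize the visited path the same way "from" fields are normalized:
  // lower case, leading slashes removed
  const key = pathname.toLowerCase().replace(/^\/+/, "");
  const match = list.find((entry) => entry.from === key);
  // return the long url, or null when there is no matching redirect
  return match ? match.to : null;
}

// usage: a visit to /ChatRoom resolves to the "chatroom" entry
const exampleList = [{ from: "chatroom", to: "https://zoom.us/j/12345abcdef" }];
console.log(findRedirect(exampleList, "/ChatRoom"));
```

Returning `null` for an unknown path leaves it to the caller to decide what the 404 page should show when no redirect exists.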

check.js

+28
@@ -0,0 +1,28 @@
+import { addError, getList, onExit } from "./core";
+
+onExit();
+
+// check list of redirects for broken links
+async function checkList(list) {
+  return await Promise.all(
+    // for each redirect
+    list.map(async ({ to }) => {
+      try {
+        // do simple request to target url
+        const response = await fetch(to);
+        if (
+          // only fail on certain status codes that might indicate link is "broken"
+          // select as desired from https://en.wikipedia.org/wiki/List_of_HTTP_status_codes
+          [
+            400, 404, 405, 406, 408, 409, 410, 421, 500, 501, 502, 503, 504,
+          ].includes(response.status)
+        )
+          throw Error(response.status);
+      } catch (error) {
+        addError(`"to: ${to}" may be a broken link\n(${error})`);
+      }
+    })
+  );
+}
+
+await checkList(getList());
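
As a rough illustration of the per-link check in `check.js`, the same logic can be pulled into small functions that are testable without network access. The status list is copied from the diff above; `isLikelyBroken`, `checkLink`, and the injectable `fetchFn` parameter are hypothetical names introduced only for this sketch and are not part of the commit.

```javascript
// Status codes treated as "may be broken", copied from check.js in this commit.
const BROKEN_STATUSES = new Set([
  400, 404, 405, 406, 408, 409, 410, 421, 500, 501, 502, 503, 504,
]);

// Hypothetical helper: true when a status code is on the "broken" list.
function isLikelyBroken(status) {
  return BROKEN_STATUSES.has(status);
}

// Hypothetical helper: check one target url.
// Resolves to an error message, or null when the link looks fine.
// `fetchFn` defaults to the global fetch but can be swapped for a stub.
async function checkLink(to, fetchFn = fetch) {
  try {
    const response = await fetchFn(to);
    if (isLikelyBroken(response.status)) throw Error(response.status);
    return null;
  } catch (error) {
    // network failures and flagged status codes are reported the same way
    return `"to: ${to}" may be a broken link\n(${error})`;
  }
}
```

Codes such as 401, 403, and 429 are not on the list, so a page that merely requires login or rate-limits the runner is not reported; this is one source of the false negatives the README warns about.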

core.js

+124
@@ -0,0 +1,124 @@
+import { readFileSync } from "fs";
+import { resolve } from "path";
+import { globSync } from "glob";
+import { parse } from "yaml";
+
+// if running in github actions debug mode, do extra logging
+export const verbose = !!process.env.RUNNER_DEBUG;
+
+// get full list of redirects
+export function getList() {
+  // get yaml files that match glob pattern
+  const files = globSync("*.y?(a)ml", { cwd: __dirname });
+
+  log("Files", files.join(" "));
+
+  // start combined list of redirects
+  const list = [];
+
+  // keep track of duplicate entries
+  const duplicates = {};
+
+  // go through each yaml file
+  for (const file of files) {
+    // load file contents
+    const contents = readFileSync(resolve(__dirname, file), "utf8");
+
+    // try to parse as yaml
+    let data;
+    try {
+      data = parse(contents);
+    } catch (error) {
+      addError(`Couldn't parse ${file}. Make sure it is valid YAML.`);
+      continue;
+    }
+
+    // check if top level is list
+    if (!Array.isArray(data)) {
+      addError(`${file} is not a list`);
+      continue;
+    }
+
+    // go through each entry
+    for (let [index, entry] of Object.entries(data)) {
+      index = Number(index) + 1;
+      const trace = `${file} entry ${index}`;
+
+      // check if dict
+      if (typeof entry !== "object" || entry === null) {
+        addError(`${trace} is not a dict`);
+        continue;
+      }
+
+      // check "from" field
+      if (!(typeof entry.from === "string" && entry.from.trim())) {
+        addError(`${trace} "from" field invalid`);
+        continue;
+      }
+
+      // check "to" field
+      if (!(typeof entry.to === "string" && entry.to.trim()))
+        addError(`${trace} "to" field invalid`);
+
+      // normalize "from" field. lower case, remove leading slashes.
+      entry.from = entry.from.toLowerCase().replace(/^(\/+)/, "");
+
+      // add to combined list
+      list.push(entry);
+
+      // add to duplicate list. record source file and entry number for logging.
+      duplicates[entry.from] ??= [];
+      duplicates[entry.from].push({ ...entry, file, index });
+    }
+  }
+
+  // check that any redirects exist
+  if (!list.length) addError("No redirects");
+
+  if (verbose) log("Combined redirects list", list);
+
+  // trigger errors for duplicates
+  for (const [from, entries] of Object.entries(duplicates)) {
+    const count = entries.length;
+    if (count <= 1) continue;
+    const duplicates = entries
+      .map(({ file, index }) => `\n ${file} entry ${index}`)
+      .join("");
+    addError(`"from: ${from}" appears ${count} time(s): ${duplicates}`);
+  }
+
+  return list;
+}
+
+// collect (caught) errors to report at end
+const errors = [];
+
+// add error
+export function addError(error) {
+  errors.push(error);
+}
+
+// when script finished, report all errors together
+export function onExit() {
+  process.on("exit", () => {
+    if (errors.length) {
+      errors.forEach(logError);
+      logError(`${errors.length} error(s)`);
+      process.exit(1);
+    } else {
+      process.exitCode = 0;
+      log("No errors!");
+    }
+  });
+}
+
+// formatted normal log
+export function log(message, data) {
+  console.info("\x1b[1m\x1b[96m" + message + "\x1b[0m");
+  if (data) console.log(data);
+}
+
+// formatted error log
+export function logError(message) {
+  console.error("\x1b[1m\x1b[91m" + message + "\x1b[0m");
+}
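
The `from` normalization and duplicate detection in `getList()` can be sketched in isolation as below. The regular expression is the one used in `core.js`; `normalizeFrom` and `findDuplicates` are hypothetical helper names that do not exist in the commit, and the `Map`-based grouping is a stand-in for the plain object the commit uses.

```javascript
// Same normalization as core.js: lower case, remove leading slashes.
function normalizeFrom(from) {
  return from.toLowerCase().replace(/^(\/+)/, "");
}

// Hypothetical helper: group entries by normalized "from" and
// return only the groups that contain more than one entry.
function findDuplicates(entries) {
  const groups = new Map();
  for (const entry of entries) {
    const key = normalizeFrom(entry.from);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(entry);
  }
  // each result is a [normalizedFrom, entries] pair
  return [...groups].filter(([, group]) => group.length > 1);
}

// usage: "/Chat" and "chat" collide after normalization, "docs" does not
const duplicateGroups = findDuplicates([
  { from: "/Chat", to: "https://example.com/a" },
  { from: "chat", to: "https://example.com/b" },
  { from: "docs", to: "https://example.com/c" },
]);
console.log(duplicateGroups.map(([from, group]) => `${from}: ${group.length}`));
```

Because the comparison happens after normalization, entries that differ only in letter case or leading slashes count as duplicates, which matches the README's note that the `from` field is case-insensitive.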
