Merge pull request #6 from NieTiger/master
added option to exclude a list of prefixes.
ScholliYT authored Nov 19, 2020
2 parents 05e6641 + c636a33 commit b845620
Showing 3 changed files with 19 additions and 5 deletions.
7 changes: 6 additions & 1 deletion README.md

````diff
@@ -12,6 +12,10 @@ Based on this work: https://github.com/healeycodes/Broken-Link-Crawler
 
 **Required** The url of the website to check.
 
+### `exclude_url_prefix`
+
+**Optional** Comma separated list of URL prefixes to exclude. Some sites do not respond properly to bots, and you might want to exclude those known sites to prevent a failed build.
+
 ### `verbose`
 
 **Optional** Turn verbose mode on/off (default false).
@@ -31,6 +35,7 @@ Based on this work: https://github.com/healeycodes/Broken-Link-Crawler
   uses: ScholliYT/[email protected]
   with:
     website_url: 'https://github.com/ScholliYT/Broken-Links-Crawler-Action'
+    exclude_url_prefix: 'mailto:,https://www.linkedin.com,https://linkedin.com'
     verbose: 'true'
     max_retry_time: 30
     max_retries: 5
@@ -41,5 +46,5 @@ with:
 The easiest way to run this action locally is to use Docker. Just build a new image and pass the correct env. variables to it.
 ```
 docker build --tag broken-links-crawler-action:latest .
-docker run -e INPUT_WEBSITE_URL="https://github.com/ScholliYT/Broken-Links-Crawler-Action" -e INPUT_VERBOSE="true" -e INPUT_MAX_RETRY_TIME=30 -e INPUT_MAX_RETRIES=5 broken-links-crawler-action:latest
+docker run -e INPUT_WEBSITE_URL="https://github.com/ScholliYT/Broken-Links-Crawler-Action" -e INPUT_VERBOSE="true" -e INPUT_MAX_RETRY_TIME=30 -e INPUT_MAX_RETRIES=5 -e INPUT_EXCLUDE_URL_PREFIX="mailto:,https://www.linkedin.com,https://linkedin.com" broken-links-crawler-action:latest
 ```
````
10 changes: 7 additions & 3 deletions action.yml

```diff
@@ -7,17 +7,21 @@ inputs:
   website_url: # id of input
     description: 'Which website to check'
     required: true
+  exclude_url_prefix:
+    description: 'Comma separated list of URL prefixes to ignore'
+    required: false
+    default: 'mailto:'
   verbose:
     description: 'Turn verbose mode on/off'
-    require: false
+    required: false
     default: 'false'
   max_retry_time:
     description: 'Maximum time for request retries'
-    require: false
+    required: false
     default: 30
   max_retries:
     description: 'Maximum request retry count'
-    require: false
+    required: false
     default: 4
 runs:
   using: 'docker'
```
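For Docker container actions, GitHub exposes each input declared above to the container as an environment variable named `INPUT_` plus the uppercased input id, which is why deadseeker.py reads `INPUT_EXCLUDE_URL_PREFIX`. A minimal sketch of that naming rule (the helper function is illustrative, not part of the action):

```python
# Sketch of how GitHub Actions names input environment variables for
# container actions: the input id is uppercased and prefixed with INPUT_.
def input_env_name(input_id: str) -> str:
    # Spaces become underscores, letters are uppercased.
    return "INPUT_" + input_id.replace(" ", "_").upper()

print(input_env_name("exclude_url_prefix"))  # INPUT_EXCLUDE_URL_PREFIX
print(input_env_name("max_retry_time"))      # INPUT_MAX_RETRY_TIME
```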
7 changes: 6 additions & 1 deletion deadseeker.py

```diff
@@ -17,7 +17,7 @@
 import logging
 
 search_attrs = set(['href', 'src'])
-excluded_link_prefixes = set(['mailto:'])
+excluded_link_prefixes = set()
 agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'
 
 class LinkParser(HTMLParser):
@@ -104,8 +104,13 @@ def make_statuscode_request(self, req):
 # read env variables
 website_url = os.environ['INPUT_WEBSITE_URL']
 verbose = os.environ['INPUT_VERBOSE']
+exclude_prefix = os.environ['INPUT_EXCLUDE_URL_PREFIX']
 print("Checking website: " + str(website_url))
 print("Verbose mode on: " + str(verbose))
+if exclude_prefix and exclude_prefix != '':
+    for el in exclude_prefix.split(","):
+        excluded_link_prefixes.add(el.strip())
+    print(f"Excluding prefixes: {excluded_link_prefixes}")
 
 if verbose:
     logging.getLogger('backoff').addHandler(logging.StreamHandler())
```
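The new block above only fills `excluded_link_prefixes`; elsewhere the crawler skips any link that begins with one of those prefixes. A self-contained sketch of the parse-and-filter behaviour, assuming a plain `startswith` check (function names here are illustrative, not taken from deadseeker.py):

```python
excluded_link_prefixes = set()

def parse_exclude_prefixes(raw: str) -> set:
    # Mirrors the env-var parsing above: split on commas, strip whitespace,
    # and drop empty entries.
    return {p.strip() for p in raw.split(",") if p.strip()}

def is_excluded(link: str) -> bool:
    # A link is skipped if it starts with any excluded prefix.
    return any(link.startswith(prefix) for prefix in excluded_link_prefixes)

excluded_link_prefixes |= parse_exclude_prefixes("mailto:, https://linkedin.com")
print(is_excluded("mailto:[email protected]"))        # True
print(is_excluded("https://linkedin.com/in/x"))  # True
print(is_excluded("https://github.com"))         # False
```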
