A super simple Python utility to check for dead links in a website.
PyLich is available on PyPI and can be installed using pip:
```shell
pip install pylich
```

Simply provide the URL of the sitemap and pylich will crawl through links in the pages and check their status. pylich can be used as a command line tool or as a Python package.
```shell
pylich https://www.example.com/sitemap.xml
```

The command will exit with a status code of 1 if any dead links are found and 0 otherwise.
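Because the result is reported through the exit status, the command is easy to call from a script or CI job. The following is a minimal sketch of one way to do that from Python using only the standard library. The function name `site_has_no_dead_links` and its `command` and `extra_args` parameters are hypothetical helpers introduced for illustration and are not part of pylich.

```python
import subprocess
from typing import Sequence


def site_has_no_dead_links(
    sitemap_url: str,
    command: Sequence[str] = ("pylich",),  # assumption: pylich is on PATH
    extra_args: Sequence[str] = (),
) -> bool:
    """Run the link checker and return True when it exits with status 0."""
    # Build the argument list explicitly (no shell) so the URL is passed verbatim.
    argv = [*command, sitemap_url, *extra_args]
    completed = subprocess.run(argv, check=False)
    # Exit status 0 means no dead links were found; 1 means at least one was.
    # Any other nonzero status is also treated as "not clean" here.
    return completed.returncode == 0
```

A calling script could then stop a deployment when `site_has_no_dead_links("https://www.example.com/sitemap.xml")` returns `False`. Command line flags can be forwarded through `extra_args`, for example `extra_args=["-v"]`.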
| Flag | Arguments | Description |
|---|---|---|
| `-v` | N/A | Verbose mode. Print progress to the console as well as a summary of the dead links at the end. |
| `-i` | List of integer HTTP response codes | Ignore links with the specified HTTP response codes. |
| `--failed-is-dead` | N/A | Treat any failed HTTP request as a dead link, regardless of the HTTP status code. |
```shell
pylich https://www.example.com/sitemap.xml -v -i 404 500
```

PyLich can also be used as a Python package.
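```python
from pylich import LinkChecker

# Configure the checker with the same options as the CLI flags -v and -i.
checker = LinkChecker(
    "https://www.example.com/sitemap.xml",
    verbose=True,
    ignored_status_codes=[404, 500],
)

# Collect the URLs listed in the sitemap, then check the links.
urls = checker.get_sitemap_urls()
broken_links = checker.check_links(urls)

# Print the dead links that were found.
checker.print_dead_links()
```

Pull requests are welcome.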
Package and dependency management is done using Poetry. To install the dependencies and the package in development mode, run:
```shell
poetry install
```

To run the tests, run:

```shell
pytest
```

Pre-commit hooks are available to run code formatting and linting. To install the hooks, run:

```shell
pre-commit install
```