This scraper downloads devdocs.io documentation databases and puts them in ZIM files, a clean and user-friendly format for storing content for offline use.
There are three main ways to install and use devdocs2zim, listed from most to least recommended:
Install using a pre-built container
-
Download the image using docker:
docker pull ghcr.io/openzim/devdocs
Build your own container
-
Clone the repository locally:
git clone https://github.com/openzim/devdocs.git && cd devdocs
-
Build the image:
docker build -t ghcr.io/openzim/devdocs .
Run the software locally using Hatch
-
Clone the repository locally:
git clone https://github.com/openzim/devdocs.git && cd devdocs
-
Install Hatch:
pip3 install hatch
-
Start a Hatch shell to install the software and its dependencies in an isolated virtual environment:
hatch shell
-
Run the devdocs2zim command:
devdocs2zim --help
Warning
This project is still a work in progress and isn't ready for use yet; the commands below are examples only.
# Usage
docker run -v my_dir:/output ghcr.io/openzim/devdocs devdocs2zim [--all|--slug=SLUG|--first=N]
# Fetch all documents
docker run -v my_dir:/output ghcr.io/openzim/devdocs devdocs2zim --all
# Fetch all documents except Ansible
docker run -v my_dir:/output ghcr.io/openzim/devdocs devdocs2zim --all --skip-slug-regex "^ansible.*"
# Fetch Vue related documents
docker run -v my_dir:/output ghcr.io/openzim/devdocs devdocs2zim --slug vue~3 --slug vue_router~4
# Fetch the docs for the two most recent versions of each software
docker run -v my_dir:/output ghcr.io/openzim/devdocs devdocs2zim --first=2
One of the following flags is required:
- --all: Fetch all DevDocs resources, and produce one ZIM per resource.
- --slug SLUG: Fetch the provided DevDocs resource. Slugs are the first path entry in the DevDocs URL. For example, the slug for https://devdocs.io/gcc~12/ is gcc~12. Use --slug several times to add multiple.
- --first N: Fetch the first N items per slug, as shown in the DevDocs UI.
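As an illustration of the "first path entry" rule for slugs, here is a minimal Python sketch. The helper name slug_from_url is hypothetical and is not part of devdocs2zim; it only assumes that the slug is the first non-empty segment of the URL path.

```python
from urllib.parse import urlparse


def slug_from_url(url: str) -> str:
    """Return the first path segment of a DevDocs URL (hypothetical helper)."""
    path = urlparse(url).path
    # Drop the empty segments produced by leading and trailing slashes.
    segments = [part for part in path.split("/") if part]
    if not segments:
        raise ValueError(f"no slug found in {url!r}")
    return segments[0]


print(slug_from_url("https://devdocs.io/gcc~12/"))  # gcc~12
```

The resulting value is what would be passed to --slug.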
Optional Flags:
- --skip-slug-regex REGEX: Skips slugs matching the given regular expression.
- --output OUTPUT_FOLDER: Output folder for ZIMs. Default: /output
- --creator CREATOR: Name of content creator. Default: 'DevDocs'
- --publisher PUBLISHER: Custom publisher name. Default: 'openZIM'
- --name-format FORMAT: Custom name format for individual ZIMs. Default: 'devdocs_{slug_without_version}_{version}'
- --title-format FORMAT: Custom title format for individual ZIMs. Value will be truncated to 30 chars. Default: '{full_name} Documentation'
- --description-format FORMAT: Custom description format for individual ZIMs. Value will be truncated to 80 chars. Default: '{full_name} Documentation'
- --long-description-format FORMAT: Custom long description format for your ZIM. Value will be truncated to 4000 chars. Default: '{full_name} documentation by DevDocs'
- --tag TAG: Add tag to the ZIM. Use --tag several times to add multiple. Formatting is supported. Default: ['devdocs', '{slug_without_version}']
- --logo-format FORMAT: URL/path for the ZIM logo in PNG, JPG, or SVG format. Formatting placeholders are supported. If unset, a DevDocs logo will be used.
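To make the effect of --skip-slug-regex concrete, the following Python sketch filters a list of slugs with a regular expression. It is not the scraper's implementation: the helper filter_slugs is hypothetical, the sample slugs are illustrative, and it assumes the pattern is searched within each slug (with an anchored pattern such as ^ansible.* this is the same as matching from the start).

```python
import re
from typing import List, Optional


def filter_slugs(slugs: List[str], skip_regex: Optional[str] = None) -> List[str]:
    """Drop slugs matching skip_regex (hypothetical helper, not the scraper's code)."""
    if skip_regex is None:
        return list(slugs)
    pattern = re.compile(skip_regex)
    # Assumption: the pattern may match anywhere in the slug; anchors such
    # as ^ restrict it to the start.
    return [slug for slug in slugs if not pattern.search(slug)]


# Illustrative slugs; the two ansible entries are made up for this example.
sample = ["ansible", "ansible~2.11", "vue~3", "vue_router~4"]
kept = filter_slugs(sample, r"^ansible.*")
print(kept)  # ['vue~3', 'vue_router~4']
```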
Formatting Placeholders
The following formatting placeholders are supported:
- {name}: Human readable name of the resource, e.g. Python.
- {full_name}: Name with optional version for the resource, e.g. Python 3.12.
- {slug}: DevDocs slug for the resource, e.g. python~3.12.
- {clean_slug}: Slug with non alphanumeric/period characters replaced with -, e.g. python-3.12.
- {slug_without_version}: DevDocs slug for the resource without the version, e.g. python.
- {version}: Shortened version displayed in DevDocs, if any, e.g. 3.12.
- {release}: Specific release of the software the documentation is for, if any, e.g. 3.12.1.
- {attribution}: License and attribution information about the resource.
- {home_link}: Link to the project's home page, if any, e.g. https://python.org.
- {code_link}: Link to the project's source, if any, e.g. https://github.com/python/cpython.
- {period}: The current date in YYYY-MM format, e.g. 2024-02.
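The placeholders can be pictured as keys filled into a format string. The sketch below is only an approximation under stated assumptions: it uses Python's str.format, derives slug_without_version by splitting the slug on ~, and truncates by slicing. The helper build_placeholders is hypothetical, and the scraper may compute these values differently.

```python
import re
from datetime import date
from typing import Dict


def build_placeholders(name: str, slug: str, version: str, today: date) -> Dict[str, str]:
    """Build a subset of the documented placeholders (hypothetical helper)."""
    return {
        "name": name,
        "full_name": f"{name} {version}" if version else name,
        "slug": slug,
        # Replace every character that is not alphanumeric or a period with "-".
        "clean_slug": re.sub(r"[^A-Za-z0-9.]", "-", slug),
        # Assumption: the version suffix follows the first "~" in the slug.
        "slug_without_version": slug.split("~", 1)[0],
        "version": version,
        "period": today.strftime("%Y-%m"),
    }


values = build_placeholders("Python", "python~3.12", "3.12", date(2024, 2, 15))

# Default name format, and default title format truncated to 30 characters.
zim_name = "devdocs_{slug_without_version}_{version}".format(**values)
zim_title = "{full_name} Documentation".format(**values)[:30]

print(zim_name)              # devdocs_python_3.12
print(zim_title)             # Python 3.12 Documentation
print(values["clean_slug"])  # python-3.12
print(values["period"])      # 2024-02
```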
Use the commands below to set up the project once:
# Install hatch if it isn't installed already.
❯ pip install hatch
# Local install (in default env) / re-sync packages
❯ hatch run pip list
# Set-up pre-commit
❯ pre-commit install
The following commands can be used to build and test the scraper:
# Show scripts
❯ hatch env show
# linting, testing, coverage, checking
❯ hatch run lint:all
❯ hatch run lint:fixall
# run tests on all matrixed envs
❯ hatch run test:run
# run tests in a single matrixed env
❯ hatch env run -e test -i py=3.12 coverage
# run static type checks
❯ hatch env run check:all
# building packages
❯ hatch build
This project adheres to openZIM's Contribution Guidelines.
This project has implemented openZIM's Python bootstrap, conventions and policies v1.0.3.