Running repology

Dependencies

Needed for core:

Needed for fetching/parsing specific repository data:

Needed for web-application:

Optional, for doing HTML validation when running tests:

Optional, for checking schemas of configuration files:

Optional, for Python code linting:

Building

Though repology is mostly a Python project, it contains a C utility for reading the binary rpm format, which is used to parse the ALT Sisyphus repository. To build the utility, run make in the project root. You need librpm and pkg-config.

Usage

Preparing

Since repology rules live in a separate repository, you'll need to clone it first. The location may be arbitrary, but the default config points to the rules.d subdirectory, so cloning there is the simplest option.

git clone https://github.com/repology/repology-rules.git rules.d

Configuration

First, you may need to tune settings which are shared by all repology utilities, such as the directory for storing downloaded repository state or the DSN (the string which specifies how to connect to the PostgreSQL database). See repology.conf.default for default values. To override them, create repology.conf in the same directory, specify the path to an alternative config in the REPOLOGY_SETTINGS environment variable, or override individual settings via the command line.

By default, repology uses the ./_state state directory and connects to a database on localhost with repology as the database name, user and password.
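To make the DSN setting more concrete, here is a minimal sketch that assembles a connection string from the defaults listed above. It assumes the DSN is a libpq-style keyword/value string, as accepted by PostgreSQL client libraries; check repology.conf.default for the exact form repology expects. The helper name make_dsn and its parameters are hypothetical and not part of repology.

```python
def make_dsn(dbname="repology", user="repology", password="repology",
             host="localhost"):
    """Build a libpq-style keyword/value connection string.

    Hypothetical helper for illustration; the defaults mirror the ones
    described in this document.
    """
    params = {"dbname": dbname, "user": user, "password": password, "host": host}
    parts = []
    for key, value in params.items():
        text = str(value)
        # libpq needs single quotes around empty values and values with
        # spaces; backslashes and single quotes must be backslash-escaped.
        if text == "" or any(char in text for char in " '\\"):
            text = "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"
        parts.append(f"{key}={text}")
    return " ".join(parts)


print(make_dsn())
# prints: dbname=repology user=repology password=repology host=localhost
```

If you change the database name, user or password when creating the database, the same change has to be reflected in the DSN you put into repology.conf.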

Fetching/updating repository data

First, let's try to fetch some repository data and see if it works. No database is needed at this point.

./repology-update.py --fetch --parse
  • --fetch tells the utility to fetch raw repository data (download files, scrape websites, clone git repos) into the state directory. Note that it won't refetch (update) data unless --update is also specified.
  • --parse parses the downloaded raw data into an internal format, which is also saved into the state directory.

After the data is downloaded, you may inspect it with:

./repology-dump.py | less

The utility supports filtering and several modes of operation; see --help for the full list of options.

Creating the database

To run the repology webapp, you need a PostgreSQL database.

First, ensure that a PostgreSQL server is installed and running, then execute the following commands to create a database for repology:

psql --username postgres -c "CREATE DATABASE repology"
psql --username postgres -c "CREATE USER repology WITH PASSWORD 'repology'"
psql --username postgres -c "GRANT ALL ON DATABASE repology TO repology"
psql --username postgres --dbname repology -c "CREATE EXTENSION pg_trgm"
psql --username postgres --dbname repology -c "CREATE EXTENSION libversion"

Now you can create the database schema (tables, indexes, etc.) with:

./repology-update.py --initdb

and finally push parsed data into the database with:

./repology-update.py --database

Running the webapp

Repology is a Flask application, so as long as you've set up the database and configuration, you can simply run the application locally:

./repology-app.py

and point your browser to http://127.0.0.1:5000/ to view the site. This should be enough for personal usage, experiments and testing.

Alternatively, you may deploy the application in numerous ways, including mod_wsgi, uWSGI, FastCGI and plain CGI. See the Flask documentation on deployment for more info.

Keeping up to date

To keep repository data up to date, you'll need to periodically refetch it and update the database. Note that there's no need to recreate the database schema every time (unless it needs changing). Everything can be done with a single command:

./repology-update.py --fetch --update --parse --database

Link checker

A separate utility gathers and refreshes availability information for links (homepages and downloads) extracted by repology. After updating the database at least once, you may run

./repology-linkchecker.py

This will issue a HEAD request (and, if that fails, a GET request) for each link, and save the result (such as the HTTP status code and redirect information) in the database.
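The HEAD-then-GET fallback described above can be sketched as follows. This is an illustration only, not the actual code of repology-linkchecker.py: the function names, the shape of the result record and the definition of "fails" (here, any network error or 4xx/5xx response) are assumptions.

```python
import urllib.error
import urllib.request


def http_probe(url, method, timeout=30):
    """Send one request and return (status code, final URL after redirects)."""
    request = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.geturl()


def check_link(url, probe=http_probe):
    """Try HEAD first and fall back to GET; return a result record (a dict)."""
    error = None
    for method in ("HEAD", "GET"):
        try:
            status, final_url = probe(url, method)
        except OSError as exc:
            # URLError and HTTPError (raised by urllib for 4xx/5xx responses)
            # are OSError subclasses, and so are socket timeouts.
            error = exc
            continue
        return {
            "url": url,
            "method": method,
            "status": status,
            # Record the redirect target only if the URL actually changed.
            "redirect": final_url if final_url != url else None,
        }
    # Both requests failed; keep the HTTP code if the server sent one.
    status = error.code if isinstance(error, urllib.error.HTTPError) else None
    return {"url": url, "method": "GET", "status": status, "redirect": None}
```

The probe parameter exists so the fallback logic can be exercised without network access by passing in a stub. HEAD is tried first because it avoids transferring the response body, which matters when checking a large number of links.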

Note that a typical repology installation knows of hundreds of thousands of links, so this may take time. Consult --help for a list of additional options. A typical repology setup with regular updates would run something like

./repology-linkchecker.py --unchecked --jobs 10

after each update to handle newly discovered links, and

./repology-linkchecker.py --checked --age 7 --jobs 10

weekly to refresh information on already known links in the background.
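The schedule above can be sketched as a small planner that decides which link checker invocations to run after a given update. The function name, its parameters and the choice of Monday as the weekly refresh day are assumptions made for illustration; only the command-line flags come from this document. In practice the same effect is often achieved with two separate cron entries.

```python
import datetime


def linkchecker_commands(today, refresh_weekday=0, jobs=10, age_days=7,
                         script="./repology-linkchecker.py"):
    """Return the link checker command lines to run after an update.

    refresh_weekday uses datetime's numbering (0 is Monday).
    """
    # Newly discovered links are checked after every update.
    commands = [[script, "--unchecked", "--jobs", str(jobs)]]
    # Already known links are refreshed once a week.
    if today.weekday() == refresh_weekday:
        commands.append(
            [script, "--checked", "--age", str(age_days), "--jobs", str(jobs)]
        )
    return commands


# 2024-01-01 was a Monday, so both commands are planned.
for command in linkchecker_commands(datetime.date(2024, 1, 1)):
    print(" ".join(command))
```

Each returned list can be passed to subprocess.run(command, check=True) from the repology checkout directory.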