Extracts login events from a DHIS2 server, geolocates the source IPs, checks them against threat-intel blocklists, and produces a report that flags logins from outside the expected home country and accounts that look compromised.
It's aimed at national DHIS2 systems, where almost every login should come from inside one country. A flagged account is a lead, not a verdict — there are often innocent explanations, so treat the report as a starting point for investigation.
- Extract — SSH into the server, pull `AuthenticationSuccessEvent` entries from the Tomcat journal, and parse out the timestamp, username, and IP address.
- Geolocate — Look up each IP against the DB-IP free country database and the Starlink GeoIP database.
- Reputation check — Match each IP against free, offline threat-intel blocklists (AbuseIPDB high-confidence aggregate, FireHOL, blocklist.de, Emerging Threats) to flag logins from known-malicious IPs.
- Analyse — Flag accounts showing suspicious patterns and write a dated report.
Starlink IPs are identified separately and treated as home-country logins, since Starlink users physically in the home country can appear as foreign due to how satellite traffic is routed.
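The Starlink override can be sketched as below. This is a minimal illustration under stated assumptions, not the logic inside analyse.py: the function name `classify_ip`, the label strings, and the in-memory network list are hypothetical, and the sample range is a documentation block rather than a real Starlink CIDR.

```python
import ipaddress

def classify_ip(ip, geo_country, home_country, starlink_networks):
    """Return 'starlink', 'home', 'foreign', or 'unresolved' for one login IP.

    Hypothetical helper: geo_country is whatever the country database
    returned for this IP (None if the lookup failed).
    """
    addr = ipaddress.ip_address(ip)
    # Check Starlink first so its ranges override the country lookup.
    if any(addr in net for net in starlink_networks):
        return "starlink"
    if geo_country is None:
        return "unresolved"
    return "home" if geo_country == home_country else "foreign"

# Placeholder range (RFC 5737 documentation block), not a real Starlink CIDR.
starlink = [ipaddress.ip_network("203.0.113.0/24")]

print(classify_ip("203.0.113.9", "KE", "RW", starlink))
print(classify_ip("198.51.100.7", "KE", "RW", starlink))
print(classify_ip("198.51.100.7", "RW", "RW", starlink))
```

Checking the Starlink list before the country result is the design choice that matters here: a Starlink address that geolocates abroad is still counted with the home-country logins.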
extract_logins.sh is written for servers deployed with dhis2-server-tools or any similar LXD-based setup, where DHIS2 runs inside a named LXC container and Tomcat logs to the systemd journal.
Other deployment types will need a different extraction approach. For example:
- Docker — Tomcat logs go to the container log driver rather than journald. You would use `docker logs` or read the log file directly and adapt the parsing accordingly.
- Bare metal / standalone Tomcat — No LXC wrapper; `journalctl -u tomcat9` can be run directly on the host without the `lxc exec` step.
Regardless of how DHIS2 is deployed, the goal is the same: produce a logins.txt file in the format that analyse.py expects (see logins.txt format below), then run analyse.py against it.
- `ssh` access to the target server (key-based auth recommended)
- `python3` virtual environment: `python3 -m venv env && source env/bin/activate && pip3 install -r requirements.txt`
- `perl` (standard on most Linux systems)
export SERVER=user@hostname # SSH target
export INSTANCE=prod # LXC container name
export UNIT=tomcat9 # tomcat9 or tomcat10 (default: tomcat9)
export COUNTRY=RW # ISO country code for the home country

Add these to your ~/.bashrc or ~/.profile to persist them.
./download_geoip.sh

This downloads the current month's DB-IP country database (~8 MB) and the Starlink CIDR list. It is safe to re-run: it skips the DB-IP download if the file is already from the current month. Run it again at the start of each month to refresh.
./download_blocklists.sh

This fetches several free, no-API-key IP reputation feeds into blocklists/ and is run automatically by run_all.sh. It is safe to re-run: it skips the download if the lists are less than 12 hours old. The feeds are:
| Feed | Source | Notes |
|---|---|---|
| `abuseipdb_100` | borestad mirror | AbuseIPDB IPs at ~100% abuse confidence (no API key needed) |
| `firehol_level1` / `firehol_level3` | FireHOL | Curated low-false-positive firewall blocklists (includes Spamhaus DROP, DShield, Feodo) |
| `blocklist_de` | blocklist.de | Hosts reported for SSH/brute-force/login attacks |
| `et_compromised` | Emerging Threats | Known compromised / hostile hosts |
If the directory is missing or empty, analyse.py simply skips the malicious-IP check. Respect the source licenses (Spamhaus data requires attribution and prohibits commercial use).
A blocklist hit is enrichment, not proof. Shared NAT/CGNAT, Tor exit nodes, and cloud provider egress IPs can produce false positives — corroborate with the other flags before acting.
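To make the matching step concrete, here is a minimal sketch of checking a login IP against feed entries that may be single addresses or CIDR ranges. It is not the code in analyse.py: the helper names `parse_blocklist` and `is_listed` are hypothetical, the feed layout (one entry per line, `#` for comments) is an assumption, and the sample entries are documentation addresses.

```python
import ipaddress

def parse_blocklist(lines):
    """Split feed lines into exact IPs and CIDR networks, skipping comments."""
    exact, networks = set(), []
    for raw in lines:
        entry = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not entry:
            continue
        try:
            if "/" in entry:
                networks.append(ipaddress.ip_network(entry, strict=False))
            else:
                exact.add(ipaddress.ip_address(entry))
        except ValueError:
            continue  # ignore anything that is not an IP or a CIDR range
    return exact, networks

def is_listed(ip, exact, networks):
    """True if ip is an exact entry or falls inside any listed range."""
    addr = ipaddress.ip_address(ip)
    return addr in exact or any(addr in net for net in networks)

# Made-up feed content for illustration only.
feed = ["# sample feed", "192.0.2.10", "198.51.100.0/24", "", "not-an-ip"]
exact, networks = parse_blocklist(feed)

print(is_listed("192.0.2.10", exact, networks))
print(is_listed("198.51.100.77", exact, networks))
print(is_listed("203.0.113.5", exact, networks))
```

Keeping exact addresses in a set makes the common case a constant-time lookup, and only addresses that miss the set are tested against the ranges.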
- Run the pipeline for a time window: `./run_all.sh "7 days ago"`.
- Open the generated `report_*.txt` and read the high-suspicion accounts and known-malicious IP matches.
- Dig into anything that stands out: `./investigate.sh <username>`.
- Need more history? Widen the window and re-run, but the server's journal only keeps a limited backlog (often around two weeks), so that's the most you can recover after the fact. For ongoing coverage, run this on a schedule and keep the `logins_*.txt` extracts.
./run_all.sh "1 day ago"

This runs the whole thing end to end: refresh the databases and blocklists, extract logins, analyse. The time window is passed straight to journalctl -S, so anything journalctl accepts works:
./run_all.sh "6 hours ago"
./run_all.sh "2026-05-01"

Extract only (writes to logins.txt):

./extract_logins.sh "1 day ago"

Analyse an existing logins file:

python3 analyse.py logins.txt

Investigate a single account:

./investigate.sh <username>

For one account, this lists every IP it logged in from (with counts, owner/ASN and reverse DNS via ipinfo.io, and any blocklist match), then prints a chronological timeline that marks country changes. It's the quickest way to tell a compromised account from a false positive: a real user has a consistent local footprint, while a takeover shows a sudden switch to cloud or foreign IPs.
Needs internet for the ownership lookups. Blocklist matching here is exact-IP only — use analyse.py for CIDR/range matches.
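The idea of a timeline that marks country changes can be sketched in a few lines. This is an illustration only, not how investigate.sh is implemented: the function names, the output layout, and the marker text are assumptions, and the country codes attached to the sample IPs are made up.

```python
from datetime import datetime

def parse_ts(text):
    # Accept a comma or a dot before the milliseconds.
    return datetime.fromisoformat(text.replace(",", "."))

def timeline(events):
    """events: iterable of (timestamp_text, ip, country) for one account.

    Returns printable lines in chronological order, with a marker on every
    login whose country differs from the previous login's country.
    """
    lines = []
    previous = None
    for ts, ip, country in sorted(events, key=lambda e: parse_ts(e[0])):
        line = f"{ts}  {ip:<15}  {country}"
        if previous is not None and country != previous:
            line += f"  <-- country change ({previous} -> {country})"
        lines.append(line)
        previous = country
    return lines

events = [
    ("2026-05-22T14:41:11,667", "102.93.8.147", "RW"),
    ("2026-05-22T09:02:40,120", "102.93.8.147", "RW"),
    ("2026-05-22T15:03:59,004", "198.51.100.7", "DE"),
]
for line in timeline(events):
    print(line)
```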
Each run of analyse.py prints a report to stdout and saves a copy to a dated file (report_YYYYMMDD_HHMMSS.txt).
The report contains:
- Overview — total login events, home/foreign split, unresolved IPs, unique accounts, malicious-IP logins
- Starlink logins — IPs matched against the Starlink database (excluded from foreign counts)
- Foreign login countries — breakdown of logins by non-home country
- Known-malicious IP matches — login IPs found in the threat-intel feeds, with login count, affected user count, and the feed that flagged each
- Suspicious account tiers:
- Any foreign login — at least one login from outside the home country
- Impossible travel — logins from two different countries within 60 minutes
- No home-country logins — account has never logged in from the home country
- Majority foreign — more logins from outside the home country than inside
- Known-malicious IP — at least one login from an IP on a threat-intel blocklist
- High suspicion — accounts triggering two or more of the above
- High suspicion account detail — per-account breakdown of countries, top IPs, and impossible travel events
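The tiers above can be expressed as a small set of per-account checks. The sketch below is one possible reading of them, not the code in analyse.py: the function and flag names are hypothetical, the 60-minute boundary is treated as inclusive, and country values are assumed to be already resolved (with Starlink logins mapped to the home country). Because some tiers imply others (an account with no home logins is also majority foreign), how analyse.py counts overlapping tiers towards "two or more" may differ.

```python
from datetime import datetime, timedelta

def parse_ts(text):
    # Accept a comma or a dot before the milliseconds.
    return datetime.fromisoformat(text.replace(",", "."))

def impossible_travel(logins, window=timedelta(minutes=60)):
    """logins: list of (datetime, country). Returns consecutive login pairs
    that come from different countries no more than `window` apart."""
    ordered = sorted(logins)
    return [
        (t1, c1, t2, c2)
        for (t1, c1), (t2, c2) in zip(ordered, ordered[1:])
        if c1 != c2 and t2 - t1 <= window
    ]

def account_flags(logins, home, malicious_ips=frozenset()):
    """logins: list of (datetime, country, ip) for one account."""
    foreign = sum(1 for _, country, _ in logins if country != home)
    at_home = len(logins) - foreign
    flags = set()
    if foreign:
        flags.add("any_foreign")
    if impossible_travel([(t, c) for t, c, _ in logins]):
        flags.add("impossible_travel")
    if at_home == 0:
        flags.add("no_home")
    if foreign > at_home:
        flags.add("majority_foreign")
    if any(ip in malicious_ips for _, _, ip in logins):
        flags.add("malicious_ip")
    return flags

# Made-up logins for one account; the country codes are illustrative.
logins = [
    (parse_ts("2026-05-22T14:00:00,000"), "RW", "102.93.8.147"),
    (parse_ts("2026-05-22T14:30:00,000"), "DE", "198.51.100.7"),
    (parse_ts("2026-05-22T15:45:00,000"), "RW", "102.93.8.147"),
]
flags = account_flags(logins, home="RW")
print(sorted(flags), "high suspicion:", len(flags) >= 2)
```

Comparing only consecutive logins is enough for the travel check: if any two logins from different countries fall within the window, some adjacent pair between them must also differ in country and be at least as close in time.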
| File | Purpose |
|---|---|
| `run_all.sh` | Full pipeline entry point |
| `extract_logins.sh` | Extract logins from the remote server via SSH |
| `download_geoip.sh` | Download/refresh GeoIP databases |
| `download_blocklists.sh` | Download/refresh threat-intel IP blocklists |
| `analyse.py` | Geolocate IPs, reputation-check, and produce the report |
| `investigate.sh` | Deep-dive a single account's footprint + IP ownership |
| `dbip-country-lite.mmdb` | DB-IP country database (auto-downloaded) |
| `starlink-geoip.csv` | Starlink CIDR → country map (auto-downloaded) |
| `blocklists/` | Threat-intel feed files (auto-downloaded) |
| `logins*.txt` | Login extracts (gitignored — they contain usernames and IPs) |
| `report_*.txt` | Saved reports (one per run) |
analyse.py expects one login event per line with three space-separated fields:
2026-05-22T14:41:11,667 jsmith 102.93.8.147
| Field | Format | Example |
|---|---|---|
| Timestamp | ISO 8601, comma or dot as subsecond separator | 2026-05-22T14:41:11,667 |
| Username | No spaces | jsmith |
| IP address | IPv4 | 102.93.8.147 |
If you are adapting the extraction for a different deployment type, this is the format to target. Trailing punctuation on the IP field (e.g. a stray ;) is stripped automatically by analyse.py.
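When writing your own extractor, a small validator for this format can catch malformed lines before they reach the analysis. The sketch below is an assumption-laden illustration rather than the parser used by analyse.py: the function name `parse_login_line` and the exact set of trailing characters it strips are hypothetical.

```python
import ipaddress
from datetime import datetime

def parse_login_line(line):
    """Parse 'TIMESTAMP USERNAME IP' into (datetime, username, ip).

    Returns None if the line does not have three fields, or if the
    timestamp or IPv4 address cannot be parsed.
    """
    parts = line.split()
    if len(parts) != 3:
        return None
    ts_text, username, ip_text = parts
    ip_text = ip_text.rstrip(";,.:")  # drop stray trailing punctuation
    try:
        # A comma before the milliseconds is normalised to a dot.
        timestamp = datetime.fromisoformat(ts_text.replace(",", "."))
        ip = str(ipaddress.IPv4Address(ip_text))
    except ValueError:
        return None
    return timestamp, username, ip

print(parse_login_line("2026-05-22T14:41:11,667 jsmith 102.93.8.147;"))
print(parse_login_line("not a valid login line at all"))
```

Running every line of a freshly produced logins.txt through a check like this is a quick way to confirm that an adapted extraction script emits the expected three fields.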
The DB-IP lite database is reliable at country level (~95–99% globally). For the home country, well-known ISP ranges (e.g. MTN, RwandaTel) are correctly attributed. False negatives — home-country users appearing as foreign — are more likely than false positives, particularly for users on Starlink (handled separately) or traffic routed through regional hubs in neighbouring countries.
Cloud provider IPs (Google Cloud, AWS, Azure, etc.) appearing repeatedly in foreign logins, especially switching rapidly with home-country IPs, are a stronger signal than a one-off foreign login. An account that only ever appears from a cloud IP, or a real user whose normal local logins are suddenly interleaved with cloud ones, is worth a close look.
Know your own service accounts. A BI or integration tool (e.g. a Superset/DHIS2 connector) authenticates constantly from a fixed server IP, which is usually foreign — so it will show up as high-suspicion every run. That's an expected false positive, but it's worth confirming the IP belongs to your integration and hasn't changed.