bobjolliffe/dhis2-geoip-analysis

DHIS2 Login GeoIP Analysis

Extracts login events from a DHIS2 server, geolocates the source IPs, checks them against threat-intel blocklists, and produces a report that flags logins from outside the expected home country and accounts that look compromised.

It's aimed at national DHIS2 systems, where almost every login should come from inside one country. A flagged account is a lead, not a verdict — there are often innocent explanations, so treat the report as a starting point for investigation.

How it works

  1. Extract — SSH into the server, pull AuthenticationSuccessEvent entries from the Tomcat journal, and parse out the timestamp, username, and IP address.
  2. Geolocate — Look up each IP against the DB-IP free country database and the Starlink GeoIP database.
  3. Reputation check — Match each IP against free, offline threat-intel blocklists (AbuseIPDB high-confidence aggregate, FireHOL, blocklist.de, Emerging Threats) to flag logins from known-malicious IPs.
  4. Analyse — Flag accounts showing suspicious patterns and write a dated report.

Starlink IPs are identified separately and treated as home-country logins, since Starlink users physically in the home country can appear as foreign due to how satellite traffic is routed.
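The Starlink handling described above can be sketched roughly as follows. This is a minimal illustration, not the code in analyse.py: the function name `classify_origin`, its labels, and the idea of holding the Starlink ranges as a list of `ipaddress` networks are all assumptions made for the example.

```python
import ipaddress

def classify_origin(ip_text, geo_country, starlink_networks, home_country):
    """Return 'starlink', 'home', 'foreign', or 'unresolved' for one login IP.

    geo_country is whatever the country lookup returned (None if unknown).
    All names here are hypothetical, chosen for illustration only.
    """
    ip = ipaddress.ip_address(ip_text)
    # Check Starlink ranges first so they never end up in the foreign count.
    if any(ip in net for net in starlink_networks):
        return "starlink"
    if geo_country is None:
        return "unresolved"
    return "home" if geo_country == home_country else "foreign"

# 192.0.2.0/24 is a documentation range used as a stand-in for a real Starlink CIDR.
starlink_networks = [ipaddress.ip_network("192.0.2.0/24")]
print(classify_origin("192.0.2.10", "US", starlink_networks, "RW"))
```

A caller would then count the "starlink" label together with "home" when computing the home/foreign split, while still listing those logins separately in the report.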

Deployment compatibility

extract_logins.sh is written for servers deployed with dhis2-server-tools or any similar LXD-based setup, where DHIS2 runs inside a named LXC container and Tomcat logs to the systemd journal.

Other deployment types will need a different extraction approach. For example:

  • Docker — Tomcat logs go to the container log driver rather than journald. You would use docker logs or read the log file directly and adapt the parsing accordingly.
  • Bare metal / standalone Tomcat — No LXC wrapper; journalctl -u tomcat9 can be run directly on the host without the lxc exec step.

Regardless of how DHIS2 is deployed, the goal is the same: produce a logins.txt file in the format that analyse.py expects (see logins.txt format below), then run analyse.py against it.

Prerequisites

  • ssh access to the target server (key-based auth recommended)
  • python3 virtual environment: python3 -m venv env && source env/bin/activate && pip3 install -r requirements.txt
  • perl (standard on most Linux systems)

Setup

1. Set environment variables

export SERVER=user@hostname   # SSH target
export INSTANCE=prod          # LXC container name
export UNIT=tomcat9           # tomcat9 or tomcat10 (default: tomcat9)
export COUNTRY=RW             # ISO country code for the home country

Add these to your ~/.bashrc or ~/.profile to persist them.

2. Download the GeoIP databases

./download_geoip.sh

This downloads the current month's DB-IP country database (~8 MB) and the Starlink CIDR list. It is safe to re-run: it skips the DB-IP download if the local file is already from the current month. Run it again at the start of each month to refresh.

3. Download the threat-intel blocklists

./download_blocklists.sh

This fetches several free, no-API-key IP reputation feeds into blocklists/ and is run automatically by run_all.sh. It is safe to re-run — it skips the download if the lists are less than 12 hours old. The feeds are:

| Feed | Source | Notes |
| --- | --- | --- |
| abuseipdb_100 | borestad mirror | AbuseIPDB IPs at ~100% abuse confidence (no API key needed) |
| firehol_level1 / firehol_level3 | FireHOL | Curated low-false-positive firewall blocklists (includes Spamhaus DROP, DShield, Feodo) |
| blocklist_de | blocklist.de | Hosts reported for SSH/brute-force/login attacks |
| et_compromised | Emerging Threats | Known compromised / hostile hosts |

If the directory is missing or empty, analyse.py simply skips the malicious-IP check. Respect the source licenses (Spamhaus data requires attribution and prohibits commercial use).

A blocklist hit is enrichment, not proof. Shared NAT/CGNAT, Tor exit nodes, and cloud provider egress IPs can produce false positives — corroborate with the other flags before acting.
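To illustrate what matching a login IP against these feeds involves, here is a small sketch using only the standard library. It assumes each feed is a text file with one IP or CIDR per line and `#` comments, which is how such feeds are commonly published; the function names `load_blocklist` and `matching_feeds` are hypothetical and are not taken from analyse.py.

```python
import ipaddress

def load_blocklist(lines):
    """Parse feed lines into networks, skipping blanks, comments, and junk."""
    networks = []
    for line in lines:
        # Drop anything after '#', so full-line and trailing comments both work.
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        try:
            # A bare IP becomes a /32; strict=False tolerates set host bits.
            networks.append(ipaddress.ip_network(entry, strict=False))
        except ValueError:
            continue  # Ignore lines that are not an IP or CIDR.
    return networks

def matching_feeds(ip_text, feeds):
    """Return the sorted names of the feeds whose ranges contain ip_text."""
    ip = ipaddress.ip_address(ip_text)
    return sorted(name for name, nets in feeds.items()
                  if any(ip in net for net in nets))

# Documentation ranges stand in for real feed contents.
feeds = {
    "firehol_level1": load_blocklist(["# example", "203.0.113.0/24"]),
    "blocklist_de": load_blocklist(["198.51.100.7"]),
}
print(matching_feeds("203.0.113.9", feeds))
```

A linear scan like this is fine for a sketch; with large feeds and many login IPs, a prefix-indexed structure would be a more practical choice.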

Usage

Typical workflow

  1. Run the pipeline for a time window: ./run_all.sh "7 days ago".
  2. Open the generated report_*.txt and read the high-suspicion accounts and known-malicious IP matches.
  3. Dig into anything that stands out: ./investigate.sh <username>.
  4. Need more history? Widen the window and re-run — but the server's journal only keeps a limited backlog (often around two weeks), so that's the most you can recover after the fact. For ongoing coverage, run this on a schedule and keep the logins_*.txt extracts.

Full pipeline (recommended)

./run_all.sh "1 day ago"

This runs the whole thing end to end: refresh the databases and blocklists, extract logins, analyse. The time window is passed straight to journalctl -S, so anything journalctl accepts works:

./run_all.sh "6 hours ago"
./run_all.sh "2026-05-01"

Steps individually

Extract only (writes to logins.txt):

./extract_logins.sh "1 day ago"

Analyse an existing logins file:

python3 analyse.py logins.txt

Investigate a single account:

./investigate.sh <username>

For one account, this lists every IP it logged in from (with counts, owner/ASN and reverse DNS via ipinfo.io, and any blocklist match), then prints a chronological timeline that marks country changes. It's the quickest way to tell a compromised account from a false positive: a real user has a consistent local footprint, while a takeover shows a sudden switch to cloud or foreign IPs.

Needs internet for the ownership lookups. Blocklist matching here is exact-IP only — use analyse.py for CIDR/range matches.

Output

Each run of analyse.py prints a report to stdout and saves a copy to a dated file (report_YYYYMMDD_HHMMSS.txt).

The report contains:

  • Overview — total login events, home/foreign split, unresolved IPs, unique accounts, malicious-IP logins
  • Starlink logins — IPs matched against the Starlink database (excluded from foreign counts)
  • Foreign login countries — breakdown of logins by non-home country
  • Known-malicious IP matches — login IPs found in the threat-intel feeds, with login count, affected user count, and the feed that flagged each
  • Suspicious account tiers:
    • Any foreign login — at least one login from outside the home country
    • Impossible travel — logins from two different countries within 60 minutes
    • No home-country logins — account has never logged in from the home country
    • Majority foreign — more logins from outside the home country than inside
    • Known-malicious IP — at least one login from an IP on a threat-intel blocklist
    • High suspicion — accounts triggering two or more of the above
  • High suspicion account detail — per-account breakdown of countries, top IPs, and impossible travel events
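The tiers listed above can be expressed as a handful of checks per account. The sketch below is one possible reading of those definitions, not the logic in analyse.py: the function names, flag names, and the `(timestamp, country, ip)` tuple layout are assumptions, and it presumes Starlink logins have already been mapped to the home country.

```python
from datetime import datetime, timedelta

def impossible_travel(events, window=timedelta(minutes=60)):
    """events: (timestamp, country) pairs for one account.

    Comparing consecutive logins is enough: if any two logins from different
    countries fall within the window, some adjacent pair between them does too.
    """
    ordered = sorted(events, key=lambda e: e[0])
    return [(t1, c1, t2, c2)
            for (t1, c1), (t2, c2) in zip(ordered, ordered[1:])
            if c1 != c2 and t2 - t1 <= window]

def account_flags(events, home_country, malicious_ips=frozenset()):
    """events: (timestamp, country, ip) tuples; country is None if unresolved."""
    resolved = [(t, c) for t, c, _ in events if c is not None]
    home = sum(1 for _, c in resolved if c == home_country)
    foreign = len(resolved) - home
    flags = set()
    if foreign > 0:
        flags.add("any_foreign")
    if impossible_travel(resolved):
        flags.add("impossible_travel")
    if resolved and home == 0:
        flags.add("no_home")
    if foreign > home:
        flags.add("majority_foreign")
    if any(ip in malicious_ips for _, _, ip in events):
        flags.add("malicious_ip")
    if len(flags) >= 2:  # "two or more of the above"
        flags.add("high_suspicion")
    return flags

events = [
    (datetime(2026, 5, 22, 14, 0), "RW", "102.93.8.147"),
    (datetime(2026, 5, 22, 14, 30), "US", "203.0.113.9"),
]
print(sorted(account_flags(events, "RW")))
```

Note that the tiers overlap: read literally, an account with a single foreign login and nothing else already meets three of them. Whether analyse.py counts overlapping tiers this way is not stated here, so treat the threshold in this sketch as illustrative.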

Files

| File | Purpose |
| --- | --- |
| run_all.sh | Full pipeline entry point |
| extract_logins.sh | Extract logins from the remote server via SSH |
| download_geoip.sh | Download/refresh GeoIP databases |
| download_blocklists.sh | Download/refresh threat-intel IP blocklists |
| analyse.py | Geolocate IPs, reputation-check, and produce the report |
| investigate.sh | Deep-dive a single account's footprint + IP ownership |
| dbip-country-lite.mmdb | DB-IP country database (auto-downloaded) |
| starlink-geoip.csv | Starlink CIDR → country map (auto-downloaded) |
| blocklists/ | Threat-intel feed files (auto-downloaded) |
| logins*.txt | Login extracts (gitignored, since they contain usernames and IPs) |
| report_*.txt | Saved reports (one per run) |

logins.txt format

analyse.py expects one login event per line with three space-separated fields:

2026-05-22T14:41:11,667 jsmith 102.93.8.147

| Field | Format | Example |
| --- | --- | --- |
| Timestamp | ISO 8601, comma or dot as subsecond separator | 2026-05-22T14:41:11,667 |
| Username | No spaces | jsmith |
| IP address | IPv4 | 102.93.8.147 |

If you are adapting the extraction for a different deployment type, this is the format to target. Trailing punctuation on the IP field (e.g. a stray ;) is stripped automatically by analyse.py.
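If you are writing your own extractor, a small validator for this format can catch problems before analyse.py sees the file. The following is a sketch based only on the format described above; `parse_login_line` is a hypothetical helper, and the exact set of trailing characters that analyse.py strips is an assumption.

```python
from datetime import datetime
import ipaddress

def parse_login_line(line):
    """Parse one 'timestamp username ip' line; return None if it is malformed."""
    parts = line.split()
    if len(parts) != 3:
        return None
    raw_ts, username, raw_ip = parts
    try:
        # Accept both ',' and '.' as the subsecond separator.
        timestamp = datetime.strptime(raw_ts.replace(",", "."),
                                      "%Y-%m-%dT%H:%M:%S.%f")
    except ValueError:
        return None
    try:
        # Strip stray trailing punctuation (assumed set) before validating.
        ip = ipaddress.IPv4Address(raw_ip.rstrip(";,.:"))
    except ValueError:
        return None
    return timestamp, username, str(ip)

print(parse_login_line("2026-05-22T14:41:11,667 jsmith 102.93.8.147;"))
```

Returning None for bad lines rather than raising lets a caller count and report rejects without aborting on the first one.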

Notes on accuracy

The DB-IP lite database is reliable at country level (~95–99% globally). For the home country, well-known ISP ranges (e.g. MTN, RwandaTel) are correctly attributed. When it does get an IP wrong, a home-country user appearing as foreign is more likely than a foreign user appearing as local, particularly for users on Starlink (handled separately) or traffic routed through regional hubs in neighbouring countries.

Cloud provider IPs (Google Cloud, AWS, Azure, etc.) appearing repeatedly in foreign logins, especially switching rapidly with home-country IPs, are a stronger signal than a one-off foreign login. An account that only ever appears from a cloud IP, or a real user whose normal local logins are suddenly interleaved with cloud ones, is worth a close look.

Know your own service accounts. A BI or integration tool (e.g. a Superset/DHIS2 connector) authenticates constantly from a fixed server IP, which is usually foreign — so it will show up as high-suspicion every run. That's an expected false positive, but it's worth confirming the IP belongs to your integration and hasn't changed.
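One way to make that confirmation routine is to keep a small list of known service accounts and the IPs they are expected to use, and check each extract against it. This is only a sketch of the idea: `check_service_accounts`, the account name, and the IPs below are invented for illustration and are not part of this tool.

```python
def check_service_accounts(logins, expected_ips):
    """Report service-account logins from IPs outside their expected set.

    logins: iterable of (username, ip) pairs.
    expected_ips: {username: set of allowed IPs} for known service accounts.
    Returns {username: set of unexpected IPs}; other accounts are ignored.
    """
    unexpected = {}
    for username, ip in logins:
        allowed = expected_ips.get(username)
        if allowed is not None and ip not in allowed:
            unexpected.setdefault(username, set()).add(ip)
    return unexpected

# Hypothetical integration account pinned to one server IP.
expected = {"superset_sync": {"203.0.113.20"}}
logins = [
    ("superset_sync", "203.0.113.20"),
    ("superset_sync", "198.51.100.44"),  # Not the expected server.
    ("jsmith", "102.93.8.147"),
]
print(check_service_accounts(logins, expected))
```

An empty result means every listed service account logged in only from its expected IPs; anything else is worth checking against your integration's configuration.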
