security: hardening pass (30 fixes from Bandit/Semgrep + CodeQL triage)#17
Open
HaraldeRoessler wants to merge 8 commits into
Open
security: hardening pass (30 fixes from Bandit/Semgrep + CodeQL triage)#17HaraldeRoessler wants to merge 8 commits into
HaraldeRoessler wants to merge 8 commits into
Conversation
Mechanical fixes from a Bandit/Semgrep + manual-review audit pass.
Scope is intentionally narrow: only changes that are clearly correct,
behaviour-preserving, and unlikely to break running deploys.
1. [CRITICAL] Constant-time admin-key comparison (15 sites)
- app/main.py: 14 endpoints
- app/billing.py: 1 endpoint
`admin_key != expected` is vulnerable to timing attacks — an
attacker can brute-force the key character-by-character by
measuring response latency. Replaced with secrets.compare_digest,
guarded by truthy checks so the function never sees None.
2. [CRITICAL] Admin bcrypt hashes moved out of source
- app/admin_auth.py
Three hardcoded bcrypt hashes for lars/harald/bernd are now
loaded from MOLTRUST_ADMIN_USERS env var (format:
"name:role:$2b$12$...,name:role:$2b$12$...,..."). Missing env
var → empty admin set → all login attempts refused (fail-closed).
*** BREAKING for any deploy that hasn't set MOLTRUST_ADMIN_USERS. ***
Migration: extract the three hashes from git history into your
secrets manager, set the env var on the server, redeploy.
3. [CRITICAL] Container no longer runs as root
- Dockerfile
Added `useradd appuser` + `USER appuser` before CMD. Limits
blast radius if uvicorn or a dependency is compromised.
4. [HIGH] MD5 → SHA-256 for dedup keys
- agents/moltbook_poster.py:262 (post_hash)
- agents/news_scout.py:108 (url_key)
MD5 is cryptographically broken. Both uses are non-security
(dedup), but switching avoids static-analysis noise.
5. [HIGH] defusedxml for untrusted RSS feeds
- agents/news_scout.py + requirements.txt
xml.etree.ElementTree.fromstring on attacker-controlled RSS
feeds is exposed to XXE / Billion Laughs / DTD bombs.
defusedxml is imported with stdlib fallback so the file still
parses on hosts without the package installed.
6. [HIGH] CORS wildcard tightened in dev server
- app/main_dev.py
allow_origins=["*"] → explicit allowlist (localhost + 127.0.0.1
by default; override via MOLTRUST_DEV_CORS_ORIGINS).
7. [HIGH] HTTP timeouts added (7 sites)
- agents/x_thread_followup.py, agents/x_wallet_binding.py,
scripts/telegram_hn_remind.py, seed_ecosystem.py (×4)
requests.{get,post} without timeout can hang indefinitely.
Added timeout=15s on every call.
8. [HIGH] URL scheme validation before urllib.request.urlopen
- app/main.py (ip enrichment), app/ipfs_publisher.py,
agents/poll_payments.py, agents/retention_cleanup.py,
monitor/poll_payments.py, scripts/erc8004_scanner.py
Without scheme validation, urlopen happily handles file://, ftp://,
etc., which under variable-URL conditions can leak local files or
SSRF internal services. Added explicit http(s):// guards.
9. [MEDIUM] Swagger UI CDN gets SRI integrity hash
- app/main.py
Pinned @5 → @5.17.14, added sha384 SRI hash + crossorigin.
Hash was computed locally:
curl -sL .../swagger-ui-bundle.js | openssl dgst -sha384 -binary | openssl base64 -A
10. [MEDIUM] ip-api.com upgraded to HTTPS
- app/main.py (_enrich_ip)
Default base URL is now https://ip-api.com (configurable via
MOLTRUST_IP_ENRICH_BASE). Plain HTTP would let a MITM inject
false geolocation data.
EXPLICITLY OUT OF SCOPE (separate PRs / your call)
- .env.dilithium (gitignored; local-disk hygiene only)
- IP enrichment removal entirely (GDPR/privacy decision)
- X-Forwarded-For trust restriction (deployment-topology dependent)
- Making NONCE_SECRET, MOLTSTACK_DB_PW, etc. required at startup
(migration risk for running deploys)
- In-memory session-store cleanup (admin_auth.py SESSIONS dict)
- print() → logger across 15+ sites (large mechanical pass)
- subprocess → web3 / ssl module refactors (touches core paths)
- Plaintext key fallback in app/crypto/kms_signer.py (design call)
VERIFICATION
Python AST parse — all 15 touched .py files compile cleanly.
No tests modified; existing CI should continue to pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… CSP Follow-up to the previous commit on this branch (89bec1d). Adds the 13 items reviewers flagged as still-open. Bundled into one branch since they're all defence-in-depth / mechanical, and splitting would just multiply review cycles. FAIL-FAST ON REQUIRED CREDENTIALS (these are BREAKING for deploys that haven't set the env vars): M2 NONCE_SECRET — empty default removed; raises at startup if unset (the runtime checks at the call sites caught it before, but better to refuse to start than to serve traffic with trivially-forgeable nonces). M10 MOLTSTACK_DB_PW — empty default removed; raises at startup. HIGH#3 MOLTBOOK_APP_KEY — "moltdev_PENDING" sentinel default removed; endpoint returns 503 when unset. L15 GITHUB_CLIENT_ID — "PENDING" sentinel removed; endpoint returns 503 when unset. OPERATOR MIGRATION (one-time before merging into prod): export MOLTRUST_ADMIN_USERS="..." # from previous commit export NONCE_SECRET="..." # any high-entropy string export MOLTSTACK_DB_PW="..." # the existing DB password # MOLTBOOK_APP_KEY / GITHUB_CLIENT_ID: optional, leave unset to disable LOCALIZED HARDENING (no migration impact): M4 ipaddress.ip_address() validation on /admin/traffic/caller/{ip}. Rejects malformed input before the DB LIKE-prefix lookup. L3 random.choice annotations — noqa: S311 with "non-security content selection" justification at 4 sites (cosmetic uses; no security-adjacent sampling). M9 DATABASE_URL default cleaned of the $(cat /dev/null) shell antipattern. Was never shelled out, but confusing. HIGH#5 _reg_tracker registration-rate hash: 16-char (64-bit) truncation → full SHA-256. Per-API-key rate limits are no longer collision-bypassable. HIGH#6 scrub_secrets pattern expansion: GitHub tokens, Stripe live keys (sk_live_, rk_live_, whsec_), OpenAI/Anthropic sk-proj-*, generic JWTs, AWS secret access keys, plus more PRIVATE KEY header variants. Deliberately NOT adding broad hex / base58 patterns — they would scrub legitimate response payloads (Ethereum TX hashes, IPR commitment hashes, wallet addresses). M12 KMS signer plaintext-fallback gating: DID_PRIVATE_KEY_HEX and ~/.moltrust_did_private_key are accepted in dev, but a hard error is raised when MOLTRUST_ENV=production is set and no KMS-encrypted blob is provided. M13 update_last_seen / update_last_active: bare `except: pass` replaced with `logger.warning(...)`. Still fire-and-forget (doesn't block the request) but failures are observable now. M1 _get_client_ip: X-Real-IP / X-Forwarded-For are honoured ONLY when request.client.host falls in MOLTRUST_TRUSTED_PROXIES (default: RFC1918 + loopback + ULA, which covers a load balancer with a private IP). Operators with public-IP LBs should override the env var; set "0.0.0.0/0,::/0" to fall back to the previous "trust everyone" behaviour. L16 Swagger UI: CSP header added; CSS link pinned to @5.17.14 with SRI integrity hash (matches the JS pin from the previous commit). EXPLICITLY STILL OUT OF SCOPE (separate PRs / your call): HIGH#4 SESSIONS dict in app/admin_auth.py — needs a design decision (TTLCache vs background sweep vs Redis). M11 print() → logger across 15+ sites in app/main.py — too large to mix into this PR. L2 Foundry `subprocess` for on-chain anchoring → Python web3 lib — touches the core anchoring path; similar refactors closed before (PR MoltyCel#2). L1 subprocess for openssl SSL check in /health — low value. C1 .env.dilithium — VERIFIED never in git history, only on local disk. No repo change applicable; operator should delete the file and rotate the key locally. VERIFICATION Python 3.12 AST parse — all 8 touched files compile cleanly. (System Python 3.9 can't parse the project — pre-existing f-string syntax requires 3.10+.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CodeQL `py/full-ssrf` finding in `_resolve_did_web_external`.
The host portion of any external `did:web:*` identifier is fully
attacker-controlled — it's the path segment of the request. The
function previously fed it straight into `httpx.AsyncClient.get(...)`
without validating that the resolved address is publicly routable.
WHAT AN ATTACKER COULD DO
did:web:169.254.169.254 → read AWS/GCP/Azure metadata
(incl. IAM creds)
did:web:metadata.google.internal → same, by hostname
did:web:127.0.0.1%3A5984 → hit local CouchDB / Redis /
debug ports on the API host
did:web:10.0.0.5%3A8080 → probe internal services in
private network
did:web:[::1] → IPv6 loopback variant
The response body is returned to the caller via `resp.json()` (or
leaks via the "didDocument is not valid JSON" error message), so
the exfiltration channel is direct, not just a side-channel.
FIX
New `_assert_public_host(host)` runs before the httpx GET. It:
- Rejects banned hostnames (localhost, metadata.google.internal,
instance-data.ec2.internal, etc.)
- Strips port and IPv6 brackets, parses the host
- If the host is a literal IP, rejects when it falls in any of:
0.0.0.0/8, 10.0.0.0/8, 100.64.0.0/10 (CGNAT), 127.0.0.0/8,
169.254.0.0/16 (link-local incl. cloud metadata),
172.16.0.0/12, 192.0.0.0/24, 192.168.0.0/16,
198.18.0.0/15 (benchmark), 224.0.0.0/4 (multicast),
240.0.0.0/4 (reserved), ::1/128, fc00::/7, fe80::/10,
ff00::/8, ::ffff:0:0/96 (IPv4-mapped IPv6)
- If the host is a name, resolves it via getaddrinfo (wrapped in
asyncio.to_thread) and refuses if ANY returned address falls
in a banned range.
LIMITS
- DNS rebinding (host resolves public at check time, private at
fetch time) is not closed by this patch. Closing it requires
pinning the IP and passing it via httpx's transport; deferred
as a defense-in-depth follow-up.
- did:web spec was designed for public hostnames anyway. Any
legitimate consumer of the registry uses public hosts; this
change should not break any real workflow. `did:web:api.moltrust.ch`
still resolves normally.
VERIFICATION
Python 3.12 AST parse — clean.
No tests modified; existing CI continues to pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to commit d25e70c (SSRF). After running CodeQL default-setup on the fork, 17 additional findings surfaced. Triage outcome: Already closed by earlier commits this PR: 1 (SSRF) False positives (dismissed via CodeQL UI): 4 Real findings fixed in this commit: 5 Stack-trace-exposure (deferred to design): 7 FIXES IN THIS COMMIT #1 [LOG SANITISATION] credit_middleware exception swallows DB password - app/main.py (logger.error in credit_middleware) `logger.error("…: %s", caller_did, e)` — the raw exception `e` can be an asyncpg ConnectionError whose repr() includes the Postgres connection string (with the password). Log only `type(e).__name__` instead. MoltyCel#2 [DEFENSIVE URL ENCODING] /join?ref= referrer parameter - app/main.py /join handler The redirect target is HARDCODED to https://moltrust.ch — the host is not user-controlled. But `f"https://moltrust.ch?ref={ref}"` interpolates `ref` raw, and a payload like `ref="x&malparam=…"` could corrupt the query string. Use `urllib.parse.quote(ref)` to percent-encode the value before interpolation. MoltyCel#3 [STDOUT TOKEN LEAK] telegram_hn_remind print(r.text) - scripts/telegram_hn_remind.py `print(f'Status: {r.status_code}, Response: {r.text}')` — if Telegram error responses ever echo the request URL (which contains the bot token in the path), the body lands in stdout / CI scrollback. Print only the status code. MoltyCel#4 [ReDoS] mpp authorization header regex - packages/mpp/index.js `auth.match(/^(?:Payment|MPP)\s+(.+)$/i)` on an unbounded header is polynomial-quadratic. This package is published to npm, so consumer servers carry the risk. Cap header at 8 KiB and use bounded `\s{1,8}` with a non-greedy first char. MoltyCel#5 [ReDoS] moltrust-openclaw-v2 base URL trim - moltrust-openclaw-v2/src/client.ts `.replace(/\/+$/, "")` is polynomial on pathological inputs. Replace with a `while (str.endsWith("/")) str = str.slice(0, -1)` loop, which is linear. DISMISSED AS FALSE POSITIVES (no code change) MoltyCel#14 py/clear-text-logging-sensitive-data at SPIFFE bind log Logs spiffe_uri, did, caller_did — none are passwords. CodeQL misfires on the "did" → "id" → "password" name-similarity heuristic. MoltyCel#13, MoltyCel#12 py/clear-text-logging-sensitive-data in scripts/threadwatch.py Telegram bot token flows into the request URL but never into a logger or print() call — only to requests.post (which doesn't log URLs by default). MoltyCel#16 py/weak-sensitive-data-hashing in _reg_tracker This is in-memory rate-limit bucket-key derivation, not password storage. bcrypt/argon2 would be wrong here (slow + salted breaks the lookup). SHA-256 of the full API key is the correct primitive for an O(1) tracker. EXPLICITLY DEFERRED (7 stack-trace-exposure findings) Multiple endpoints currently return `{"error": str(e)[:100]}` to callers. CodeQL flags these as info disclosure. Fixing them means changing the API contract — clients that parse the `error` field would break. This is a design call for the maintainer; deferring to a separate PR + discussion rather than including in this hardening pass. VERIFICATION Python 3.12 AST parse — app/main.py + scripts/telegram_hn_remind.py compile cleanly. `node -c packages/mpp/index.js` clean. The TS file change is a syntactically-trivial loop, not type-impacting. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lightweight workflow that runs on every push + PR to the fork.
Catches the common breakage modes from the security-hardening pass
without needing a running API or database:
syntax — `python -m compileall` on every source dir
import-smoke — actually `from app.main import app` with all
required env vars set to placeholder values. This
catches startup-time RuntimeError raises that were
added by this PR (NONCE_SECRET, MOLTSTACK_DB_PW,
MOLTRUST_ADMIN_USERS) — they fail FAST here
instead of in production.
pytest-coll — `pytest --collect-only` over the in-repo tests.
Catches ImportError / SyntaxError in test modules.
ruff — informational lint, never blocks (continue-on-error)
bandit — informational SAST, never blocks (continue-on-error)
Separate filename (`fork-ci.yml`) from PR MoltyCel#14's proposed `ci.yml`
so the two can co-exist if PR MoltyCel#14 lands upstream later.
If reviewing this PR upstream and you prefer to land PR MoltyCel#14's
workflow first, this commit can be reverted independently —
none of the other security fixes in this PR depend on it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit's CI smoke test surfaced a pre-existing issue:
three packages are `import`-ed at module top level but were never
declared in `requirements.txt`:
bcrypt — app/admin_auth.py:6 (always required at startup)
stripe — app/billing.py:21 (always required at startup,
since billing.py is imported
unconditionally from main.py:35)
cryptography — used transitively via PyNaCl in some paths
The production deploy clearly has them installed already (otherwise
api.moltrust.ch wouldn't start), but `pip install -r requirements.txt`
on a fresh checkout — including the CI smoke test added in the
previous commit — would crash at `from app.main import app`.
Loose pins so existing prod versions stay valid.
This commit does not change any application behaviour. It only
makes the dependency manifest reflect the actual runtime needs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HaraldeRoessler
added a commit
to HaraldeRoessler/moltrust-api
that referenced
this pull request
May 12, 2026
Fork-side merge of the 29-fix security hardening pass + CI workflow. Lands the fixes on the fork's main so CodeQL re-scans and the 'open but actually fixed' alerts close. Does not affect upstream; this is a fork-local merge. Upstream PR for the same content: MoltyCel#17 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up after CodeQL re-scan caught a sibling instance of the
same pattern already fixed in credit_middleware:
app/main.py update_last_seen / update_last_active both did:
logger.warning("update_last_*(%s) failed: %s", did, e)
The raw `e` exposes the asyncpg connection string (with password)
if the failure is a connection-pool error. Replaced with
type(e).__name__ to preserve diagnostic value without leaking creds.
Same pattern, same threat model, same fix as commit 3d1b5aa's
credit_middleware change. Just missed these two sister sites.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Security hardening pass from three layered audits — Bandit + Semgrep (two reviewer sessions) plus GitHub CodeQL default-setup on the fork which surfaced 18 additional findings + caught one I'd missed on re-scan. 30 security fixes plus infrastructure to verify them.
The branch is divided into commits by concern so they're individually revertable:
89bec1d8626e14d25e70cpy/full-ssrf)3d1b5aa240e69087b69b4f1b9d82email-validatorfor PydanticEmailStr04f101cupdate_last_seen/update_last_activelogs (sibling of fix #25)Total: 8 commits, +569/−90, 22 files.
✅ CI status — green (5/5 jobs)
The branch carries its own CI workflow (commit
240e690). Running on every push to the fork:syntax (compileall).pyfile that fails to byte-compileimport smoke testfrom app.main import appwith all required env vars set — proves the fail-fast checks (NONCE_SECRET, MOLTRUST_ADMIN_USERS, etc.) work as intended without crashing the app for legitimate operatorspytest --collect-onlyruff (informational)bandit SAST (informational)Latest run on
04f101c: https://github.com/HaraldeRoessler/moltrust-api/actionsWhat the CI surfaced that 4 prior audit passes missed
Running CI uncovered three pre-existing upstream dependency bugs — modules
import-ed at top level but not declared inrequirements.txt:stripe(app/billing.py:21)bcrypt(app/admin_auth.py:6)email-validator(required for PydanticEmailStrinbilling.CheckoutRequest)Production clearly has these installed (otherwise the API wouldn't start), but
pip install -r requirements.txton a fresh checkout would crash at module load. Fixed in commits87b69b4+f1b9d82. These are not security fixes — separate commits so they can be reverted independently if you prefer to land them as their own PR.All 30 security fixes (commits 1–4 + 8)
Commit 1 — mechanical, no migration (10)
secrets.compare_digestfor admin keys (15 sites)MOLTRUST_ADMIN_USERSenv (breaking — see migration)appuserdefusedxmlfor untrusted RSS feeds["*"]→ explicit allowlistrequeststimeouts (7 sites)urlopen(6 sites)@5.17.14+ SRI sha384http://ip-api.com→https://Commit 2 — fail-fast + deeper hardening (13)
ipaddress.ip_address()validation on/admin/traffic/caller/{ip}random.choiceannotations with security justification (4 sites)DB_URLcleaned of$(cat /dev/null)antipatternMOLTBOOK_APP_KEY"PENDING" → empty + 503-when-unset (low migration risk)GITHUB_CLIENT_ID"PENDING" sentinel removed_reg_tracker64-bit hash → full SHA-256scrub_secrets: GitHub / Stripe / OpenAI / JWT / AWS patternsNONCE_SECRETempty default → raise at startup (breaking)MOLTSTACK_DB_PWempty default → raise at startup (breaking)MOLTRUST_ENV=productionupdate_last_seen/update_last_active: bareexcept→logger.warning(sanitisation added later in #30)_get_client_iptrusts X-F-F only fromMOLTRUST_TRUSTED_PROXIESCIDRs (breaking if LB is public-IP)/docs+ SRI on Swagger CSSCommit 3 — critical SSRF (1)
_resolve_did_web_external: refusesdid:webhosts that resolve to private / loopback / link-local / cloud-metadata IPs. CodeQLpy/full-ssrf.Commit 4 — CodeQL triage (5)
credit_middlewarelogger.errorsanitised —type(e).__name__instead of raw exception (DB conn string with password could appear inrepr(e))/join?ref=…percent-encodes the ref value before interpolating into the redirect URLtelegram_hn_remind.pyprints status only, never response body (bot token in URL could leak)packages/mpp/index.jsAuthorization-header regex: 8 KiB cap + bounded whitespace (ReDoS)moltrust-openclaw-v2/src/client.tstrailing-slash trim: loop instead of/+$/regex (ReDoS)Commit 8 — CodeQL re-scan follow-up (1)
update_last_seen/update_last_active: also sanitiselogger.warning(... e)totype(e).__name__(sibling of #25; missed on round 4). Caught by CodeQL re-scan after merging the security branch into fork main.Five env vars are now refused-when-empty (fail-closed-safer). Setting them:
The existing bcrypt hashes for
MOLTRUST_ADMIN_USERSare recoverable from any commit before89bec1don this branch.CodeQL false positives dismissed on the fork (no code change)
10 alerts dismissed with justification comments visible in the GitHub Security UI:
py/clear-text-logging-sensitive-data(×5)did/spiffe_uri,_reg_trackerflow uses public DIDs, sites already sanitised viatype(e).__name__. CodeQL's name-similarity heuristic ("did" → "id" → "password") misfires.py/weak-sensitive-data-hashing(×2)_reg_trackeris in-memory rate-limit bucket-key derivation, not password storage. bcrypt would be wrong.py/full-ssrf(×1)_assert_public_hostruns before httpx.get and rejects private/loopback/cloud-metadata. CodeQL's taint analysis cannot model that the guard raises.py/url-redirection(×1)urllib.parse.quote(ref, safe='')percent-encodes user input; host is hardcoded tomoltrust.ch. CodeQL doesn't recognise quote() as a sanitiser.requests.post).Explicitly deferred (separate PRs / your call)
py/stack-trace-exposureinapp/main.py+test_harness/routes.py{"error": str(e)[:100]}and clients may parse the field. Operator-debuggability trade-off; needs your design decision.SESSIONSdict cleanup inapp/admin_auth.pyprint()→loggeracross 15+ sitessubprocess→ Python web3subprocessfor openssl in/health.env.dilithiumHow to merge cleanly
Three options, in order of preference:
A. Merge as-is — squash or rebase-merge, all 8 commits land. Migration env vars set on the server before/at merge time.
B. Merge without the CI workflow — revert commit
240e690if you'd rather land PR #14's workflow first. The 7 remaining commits (security + build) are self-sufficient.C. Pick-and-choose — every commit is independently revertable. Drop any specific fix you don't want by reverting just that commit.
Verification — what's actually been tested
04f101c, includingfrom app.main import appwith placeholder env varscdn.jsdelivr.netfor both Swagger files_assert_public_host(SSRF guard) — 14/14 edge cases pass: public DNS allowed, all 16 banned ranges blocked, IPv6 with/without brackets, mixed-case banned hostnames, port-strippingscrub_secretspatterns — 12/12 legitimate response shapes pass through clean, 9/9 real secret formats get scrubbedpy/stack-trace-exposure, deliberately deferred) — https://github.com/HaraldeRoessler/moltrust-api/security/code-scanning🤖 Generated with Claude Code