feat: capture and log attacker payload metadata from MQTT sessions #18

Open
Uday9909 wants to merge 1 commit into honeynet:main from Uday9909:feat/mqtt-payload-capture

Conversation

@Uday9909

Summary

Adds real-time attacker payload capture to the MQTT tarpit, recording
what attackers send during sessions as a new Prometheus metric.
Follows the same pattern as PR #15 (Telnet input capture).

Changes

servers/mqtt_pit.c

  • Add sanitizeMetricToken() to replace non-printable bytes with '.'
    and truncate to 256 bytes, matching the Telnet sanitization approach
  • Add emitMqttPayloadMetric() to emit captures via sendMetric() IPC
  • Capture client_id, username, and protocol version from CONNECT packets
  • Capture topic and payload from PUBLISH packets
  • Capture topic filter from SUBSCRIBE packets
  • Add struct mqttClient* client parameter to readSubscribe() and
    readPublish() so they can emit captures with source IP and port
  • Populate client->port with ntohs(clientAddr.sin_port) in the accept() path
  • Remove password from existing credentials metric emission — old code
    sent username and password, new code sends username only
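The sanitization step described above could look roughly like the sketch below. It is not the code from this PR: the name sanitize_token_sketch, its signature, and the choice of 0x20..0x7e as the printable range are assumptions for illustration. Only the '.' replacement and the 256-byte limit come from the description.

```c
#include <stddef.h>

#define CAPTURE_MAX 256 /* truncation limit stated in the PR description */

/*
 * Hypothetical helper, not the PR's sanitizeMetricToken().
 * Copies at most CAPTURE_MAX bytes of src into dst, replacing every byte
 * outside printable ASCII (0x20..0x7e) with '.'. dst is always
 * NUL-terminated when dstSize > 0. Returns the number of bytes written,
 * not counting the terminating NUL.
 */
static size_t sanitize_token_sketch(const unsigned char *src, size_t srcLen,
                                    char *dst, size_t dstSize)
{
    if (dst == NULL || dstSize == 0)
        return 0;
    if (src == NULL) {
        dst[0] = '\0';
        return 0;
    }

    size_t n = srcLen;
    if (n > CAPTURE_MAX)
        n = CAPTURE_MAX;     /* enforce the 256-byte cap */
    if (n > dstSize - 1)
        n = dstSize - 1;     /* leave room for the NUL */

    for (size_t i = 0; i < n; i++) {
        unsigned char c = src[i];
        dst[i] = (c >= 0x20 && c <= 0x7e) ? (char)c : '.';
    }
    dst[n] = '\0';
    return n;
}
```

If the IPC line uses a delimiter such as a space or comma, that character would likely need replacing as well; the sketch does not handle that.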

shared/structs.h

  • Add uint16_t port to struct mqttClient for source port labeling

prometheus/main.go

  • Add mqttPayloadCaptured CounterVec with labels ip, port, protocol,
    packet_type, payload
  • Handle payloadCaptured command in handleMetric switch
  • Guard against malformed lines with a len(fields) < 7 check

New Metric

mqtt_pit_payload_captured with labels:

  • ip: attacker source IP
  • port: attacker source port
  • protocol: always MQTT
  • packet_type: connect, publish, or subscribe
  • payload: sanitized payload content truncated at 256 bytes

Why

The MQTT tarpit previously discarded all semantic data sent by
attackers beyond version and topic metadata. This change captures
client identifiers, publish payloads, and subscription patterns
in real time, giving researchers visibility into attacker tooling,
credential stuffing scripts, and topic reconnaissance behavior
without requiring log parsing.

Safety

  • Passwords are never captured. The password buffer is skipped
    entirely with offset += passwordLength and never passed to any
    metric or log function
  • Binary payloads are sanitized to printable ASCII before emission
  • All captured fields truncated at 256 bytes
  • NULL guards on client, packetType, and payload in
    emitMqttPayloadMetric before any processing
  • Malformed or truncated packets return early without crash
  • Metric emission is best-effort via existing sendMetric() — if
    the socket is unavailable the tarpit continues normally
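To illustrate the password-skipping and early-return points above, here is a minimal sketch of a CONNECT parser. It assumes the MQTT 3.1.1 layout (no MQTT 5 properties) and a buffer that starts at the variable header. The struct and function names are hypothetical and do not come from mqtt_pit.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: pointers reference the input buffer. */
struct connect_fields_sketch {
    const uint8_t *clientId;
    uint16_t clientIdLen;
    const uint8_t *username;
    uint16_t usernameLen;
    uint8_t protocolLevel;
};

/*
 * Read one field with a 2-byte big-endian length prefix at *offset.
 * Passing NULL for data/dataLen skips the field without exposing it.
 * Returns 0 on success, -1 if the buffer is too short.
 */
static int read_field(const uint8_t *buf, size_t len, size_t *offset,
                      const uint8_t **data, uint16_t *dataLen)
{
    if (len - *offset < 2)
        return -1;
    uint16_t n = (uint16_t)((buf[*offset] << 8) | buf[*offset + 1]);
    *offset += 2;
    if (len - *offset < n)
        return -1;
    if (data != NULL)
        *data = buf + *offset;
    if (dataLen != NULL)
        *dataLen = n;
    *offset += n;
    return 0;
}

static int parse_connect_sketch(const uint8_t *buf, size_t len,
                                struct connect_fields_sketch *out)
{
    size_t offset = 0;

    if (buf == NULL || out == NULL)
        return -1;
    out->clientId = NULL;
    out->clientIdLen = 0;
    out->username = NULL;
    out->usernameLen = 0;
    out->protocolLevel = 0;

    /* Protocol name, then level (1), connect flags (1), keep-alive (2). */
    if (read_field(buf, len, &offset, NULL, NULL) != 0)
        return -1;
    if (len - offset < 4)
        return -1;
    out->protocolLevel = buf[offset];
    uint8_t flags = buf[offset + 1];
    offset += 4;

    if (read_field(buf, len, &offset, &out->clientId, &out->clientIdLen) != 0)
        return -1;
    if (flags & 0x04) { /* will flag: skip will topic and will message */
        if (read_field(buf, len, &offset, NULL, NULL) != 0 ||
            read_field(buf, len, &offset, NULL, NULL) != 0)
            return -1;
    }
    if (flags & 0x80) { /* username flag */
        if (read_field(buf, len, &offset, &out->username, &out->usernameLen) != 0)
            return -1;
    }
    if (flags & 0x40) { /* password flag: advance past it, never store it */
        if (read_field(buf, len, &offset, NULL, NULL) != 0)
            return -1;
    }
    return 0;
}
```

The password bytes are bounds-checked and stepped over, but no pointer to them is ever handed to the caller, so nothing downstream can emit them by accident.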

Testing

Tested by connecting an MQTT client via mosquitto_pub and
mosquitto_sub and confirming mqtt_pit_payload_captured appears
in the /metrics endpoint output with correct labels.

Go build passes: cd prometheus && go build ./...

Add mqtt_pit_payload_captured Prometheus metric that records
attacker-supplied data from three MQTT packet types:

- CONNECT: client_id, username, protocol version (password never captured)
- PUBLISH: topic name and payload
- SUBSCRIBE: topic filter

Follows the same sendMetric() unixgram IPC pattern as PR honeynet#15.

Safety:
- Passwords skipped entirely, never passed to any metric
- Binary payloads sanitized to printable ASCII
- All fields truncated at 256 bytes
- Malformed packets handled without crash
- NULL guards on all capture functions

Also removes password from existing credentials metric emission.
Old code sent username and password. New code sends username only.

Adds uint16_t port to struct mqttClient for source port labeling.
@vinayaktyagi10

vinayaktyagi10 commented Mar 23, 2026

Nice work on the payload capture. One thing worth considering: using
arbitrary attacker input (payload, client_id, etc.) as Prometheus
labels creates unbounded cardinality, since each unique label value
generates a new time series. I ran into the same issue in PR #15 and
just pushed a fix removing the data label there.

It might be worth doing the same here: keep packet_type and ip as
labels, and log the raw payload content separately rather than as a
label dimension.

Also, dropping password from the credentials metric is a breaking
change to the existing label format. It's worth noting in the
description so anyone querying it knows to update their dashboards.
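One way the suggested split could look on the C side is sketched below: the metric line carries only ip and packet_type, and the payload goes to a separate log line. The function name and both line formats are hypothetical. Only the payloadCaptured command name appears in this PR, and the real sendMetric() wire format may differ.

```c
#include <stdio.h>

/*
 * Hypothetical formatter: writes a low-cardinality metric line and a
 * separate log line for the same capture. Expects payload to be
 * sanitized already. Returns 0 on success, -1 on NULL input or if
 * either buffer is too small.
 */
static int format_capture_sketch(const char *ip, const char *packetType,
                                 const char *payload,
                                 char *metricLine, size_t metricSize,
                                 char *logLine, size_t logSize)
{
    if (ip == NULL || packetType == NULL || payload == NULL ||
        metricLine == NULL || logLine == NULL)
        return -1;

    /* Metric line: bounded label set only, no attacker-controlled text. */
    int m = snprintf(metricLine, metricSize, "payloadCaptured %s %s",
                     ip, packetType);
    if (m < 0 || (size_t)m >= metricSize)
        return -1;

    /* Log line: the payload lives here, outside the metric labels. */
    int l = snprintf(logLine, logSize,
                     "mqtt capture ip=%s type=%s payload=\"%s\"",
                     ip, packetType, payload);
    if (l < 0 || (size_t)l >= logSize)
        return -1;

    return 0;
}
```

With this shape the counter grows per ip and packet_type pair, while the number of distinct payloads no longer affects the number of time series.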
