Hi! I've been reading through the telemetry pipeline as part of getting familiar with the codebase, and I noticed a parsing edge case that can crash the exporter.
Since the issue spans both the C tarpit and the Go exporter, I'm opening an issue first rather than jumping straight to a PR. It also seems like there's room to make the parsing logic more robust overall.
Description
mqtt_pit.c doesn't validate topicLength before copying the topic string. When a packet arrives with topicLength = 0, the C code sets topic[0] = '\0', and the snprintf that builds the metric message ends up emitting two consecutive spaces where the topic should be.
On the Go side, strings.Fields() treats the run of whitespace as a single separator, so the resulting slice has only 3 elements instead of the expected 4. The handler then tries to access fields[3], which panics:
runtime error: index out of range [3] with length 3
The panic kills listenForMetrics, and since there's no recover() wrapper it takes the whole exporter process down with it: metric ingestion stops for all tarpits and /metrics stops responding (see step 5 below). Unless an alert on the scrape target going down is in place, the failure is silent from Prometheus's side.
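To make the field-count issue concrete, here is a standalone Go snippet; the metric line format used below is my guess for illustration, not necessarily what mqtt_pit actually emits:

package main

import (
	"fmt"
	"strings"
)

func main() {
	// Hypothetical metric line as the C side would emit it when topicLength == 0:
	// the empty topic leaves two consecutive spaces.
	line := "mqtt_pit SUBSCRIBE  192.0.2.1"

	fields := strings.Fields(line)
	fmt.Println(len(fields), fields) // prints: 3 [mqtt_pit SUBSCRIBE 192.0.2.1]

	// fields[3] would panic here: index out of range [3] with length 3
}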
Steps to reproduce
Prerequisites: build the project with make all.
1. Start the exporter and MQTT tarpit
rm -f /tmp/tarpit_exporter.sock
GEO_DB=./prometheus/GeoLite2-Country.mmdb ./bin/prometheus_exporter &
./bin/mqtt_pit 11883 4096 5000 10000 50 4096 &
2. Verify metrics are alive
curl -s http://localhost:9101/metrics | grep "^mqtt_pit"
Expected: counters at 0.
3. Send a normal CONNECT followed by a SUBSCRIBE with topicLength=0
import socket, time

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(('127.0.0.1', 11883))

# CONNECT: protocol name "MQTT", protocol level 4, connect flags 0x00, keep-alive 60s
client_id = b'attacker'
vh = b'\x00\x04MQTT\x04\x00\x00\x3c'
payload = len(client_id).to_bytes(2, 'big') + client_id
remaining = vh + payload
sock.send(bytes([0x10, len(remaining)]) + remaining)
time.sleep(0.5)
sock.recv(1024)  # read the tarpit's response

# SUBSCRIBE: packet id 0x0001, topic filter length 0x0000 (empty), requested QoS 1
sock.send(bytes([0x82, 0x05, 0x00, 0x01, 0x00, 0x00, 0x01]))
time.sleep(1)
sock.close()
4. Observe the crash
curl -s http://localhost:9101/metrics | grep "^mqtt_pit"
The exporter log shows:
panic: runtime error: index out of range [3] with length 3
goroutine 25 [running]:
main.handleMetric(...)
prometheus/main.go:220
main.listenForMetrics(...)
prometheus/main.go:164
5. Verify the exporter process state
ps aux | grep prometheus_exporter | grep -v grep
curl -s http://localhost:9101/metrics | head -5
The prometheus_exporter process is dead: ps returns nothing, and curl gets no response (connection refused).
Impact
A single malformed packet kills all telemetry collection with no error indication.
Suggested fix
I'd be happy to submit a PR with the following changes (a rough sketch of the Go side is below):
- Add a len(fields) guard to the SUBSCRIBE/PUBLISH branches in handleMetric
- Wrap the handleMetric call with a deferred recover() so one bad message can't take down the whole pipeline
- Validate topicLength > 0 on the C side before building the message (zero-length topics aren't valid per the MQTT specification anyway)
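Roughly, the Go-side changes could look like this; the actual signature of handleMetric and the message layout in prometheus/main.go are assumptions on my part:

package main

import (
	"log"
	"strings"
)

// Sketch only: the real handleMetric lives in prometheus/main.go and its
// signature/field layout may differ.
func handleMetric(line string) {
	fields := strings.Fields(line)
	if len(fields) < 4 {
		log.Printf("dropping malformed metric line: %q", line)
		return
	}
	// ... existing SUBSCRIBE/PUBLISH handling using fields[0..3] ...
}

// safeHandleMetric shows the recover() idea: a panic while handling one
// message is logged instead of killing listenForMetrics (and the process).
func safeHandleMetric(line string) {
	defer func() {
		if r := recover(); r != nil {
			log.Printf("handleMetric panicked on %q: %v", line, r)
		}
	}()
	handleMetric(line)
}

func main() {
	safeHandleMetric("mqtt_pit SUBSCRIBE  192.0.2.1") // hypothetical malformed line
}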
Is this approach okay?