feat(ai): Redact base64 data URLs in image_url content blocks #5953
2 issues
find-bugs: Found 2 issues (2 medium)
Medium
AttributeError when image_url is a string instead of a dict - `sentry_sdk/ai/utils.py:600`
The `_is_image_type_with_blob_content` function assumes `image_url` is always a dict, but OpenAI's format also supports a string shorthand (e.g., `{"type": "image_url", "image_url": "data:image/jpeg;base64,..."}`). When `image_url` is a string, calling `.get("url", "")` on it will raise `AttributeError: 'str' object has no attribute 'get'`. This causes `redact_blob_message_parts` to crash when processing messages with the string format, potentially leaking base64 image content to Sentry span data.
Also found at:
sentry_sdk/ai/utils.py:659-660
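One possible shape for a detection check that tolerates both forms is sketched below. It is not the SDK's code: `extract_image_url` and `has_base64_data_url` are hypothetical helpers introduced only for illustration, and the data-URL test (a `data:` prefix plus a `;base64,` marker) is an assumption about what counts as blob content.

```python
from typing import Any


def extract_image_url(item: Any) -> str:
    """Return the URL carried by an image_url content block, or "" if none.

    Hypothetical helper: accepts both the dict form
    {"image_url": {"url": "..."}} and the string form {"image_url": "..."}.
    """
    if not isinstance(item, dict) or item.get("type") != "image_url":
        return ""
    image_url = item.get("image_url")
    if isinstance(image_url, str):
        # String shorthand: the value is the URL itself.
        return image_url
    if isinstance(image_url, dict):
        url = image_url.get("url", "")
        return url if isinstance(url, str) else ""
    # None or any other unexpected type: treat as "no URL" instead of raising.
    return ""


def has_base64_data_url(item: Any) -> bool:
    """True if the block's URL looks like a base64 data URL (assumed criterion)."""
    url = extract_image_url(item)
    return url.startswith("data:") and ";base64," in url
```

Checking the type before calling `.get` means an unexpected shape yields `False` rather than an exception, so one malformed block does not abort processing of the rest of the message.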
TypeError when redacting image_url that is a string instead of a dict - `sentry_sdk/ai/utils.py:684-685`
Line 685 assumes `item["image_url"]` is a dict when performing `item["image_url"]["url"] = BLOB_DATA_SUBSTITUTE`. If `image_url` is a string (which is valid per OpenAI's format), this will raise `TypeError: 'str' object does not support item assignment`. This is a separate issue from the detection bug since even if detection were fixed, the redaction would still fail.
Also found at:
sentry_sdk/ai/consts.py:4-6
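A redaction step that handles both forms could look roughly like the sketch below. The names `redact_image_url` and `_is_base64_data_url` are hypothetical, and `BLOB_PLACEHOLDER` is a stand-in whose value is made up here; the real `BLOB_DATA_SUBSTITUTE` constant lives in the SDK and is not reproduced.

```python
from typing import Any, Dict

# Placeholder value for illustration only; not the SDK's BLOB_DATA_SUBSTITUTE.
BLOB_PLACEHOLDER = "[blob substitute]"


def _is_base64_data_url(value: Any) -> bool:
    """Assumed criterion: a string with a data: prefix and a ;base64, marker."""
    return (
        isinstance(value, str)
        and value.startswith("data:")
        and ";base64," in value
    )


def redact_image_url(item: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the block with any base64 data URL replaced.

    Hypothetical helper covering both shapes of image_url.
    """
    image_url = item.get("image_url")
    if _is_base64_data_url(image_url):
        # String form: strings are immutable, so replace the whole value.
        return {**item, "image_url": BLOB_PLACEHOLDER}
    if isinstance(image_url, dict) and _is_base64_data_url(image_url.get("url")):
        # Dict form: replace only the nested "url" key, keep other keys.
        return {**item, "image_url": {**image_url, "url": BLOB_PLACEHOLDER}}
    # Nothing to redact: return the block unchanged.
    return item
```

This sketch builds new dicts instead of assigning in place, which is a design choice of the example and differs from the in-place assignment quoted in the finding; it avoids altering the caller's original message while still keeping the blob out of the recorded data.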
Duration: 2m 57s · Tokens: 1.8M in / 13.5k out · Cost: $3.01 (+extraction: $0.01, +merge: $0.00, +fix_gate: $0.01)
Annotations
Check warning on line 600 in sentry_sdk/ai/utils.py
sentry-warden / warden: find-bugs
AttributeError when image_url is a string instead of a dict
The `_is_image_type_with_blob_content` function assumes `image_url` is always a dict, but OpenAI's format also supports a string shorthand (e.g., `{"type": "image_url", "image_url": "data:image/jpeg;base64,..."}`). When `image_url` is a string, calling `.get("url", "")` on it will raise `AttributeError: 'str' object has no attribute 'get'`. This causes `redact_blob_message_parts` to crash when processing messages with the string format, potentially leaking base64 image content to Sentry span data.
Check warning on line 660 in sentry_sdk/ai/utils.py
sentry-warden / warden: find-bugs
[R44-S4Y] AttributeError when image_url is a string instead of a dict (additional location)
Check warning on line 685 in sentry_sdk/ai/utils.py
sentry-warden / warden: find-bugs
TypeError when redacting image_url that is a string instead of a dict
Line 685 assumes `item["image_url"]` is a dict when performing `item["image_url"]["url"] = BLOB_DATA_SUBSTITUTE`. If `image_url` is a string (which is valid per OpenAI's format), this will raise `TypeError: 'str' object does not support item assignment`. This is a separate issue from the detection bug since even if detection were fixed, the redaction would still fail.
Check warning on line 6 in sentry_sdk/ai/consts.py
sentry-warden / warden: find-bugs
[7XB-5A9] TypeError when redacting image_url that is a string instead of a dict (additional location)