Skip to content

Conversation

@mosa-riel
Copy link
Contributor

@mosa-riel mosa-riel commented Oct 29, 2025

Summary

This PR adds an image proxy feature that enables strict Content Security Policy (CSP) enforcement while maintaining email functionality, and provides privacy protection by proxying external images through the application server.

Closes #187

Motivation

Primary goal: CSP Compliance

Modern web applications need strict Content Security Policy headers to prevent XSS attacks. However, emails often contain external images that violate a strict img-src 'self' CSP policy.

By proxying all external images through the application server, all image requests originate from the same domain, enabling:

Content-Security-Policy: img-src 'self'

Secondary goal: Privacy Protection

External images in emails are commonly used as tracking pixels. The image proxy prevents exposure of user IP addresses and reading behavior to third-party servers.

Security

The image proxy implements defense-in-depth security measures to prevent Server-Side Request Forgery (SSRF) attacks:

SSRF Protection:

  • URL validation before making any HTTP request
  • Blocks all IP addresses (only domain names allowed)
  • DNS resolution validation to prevent DNS rebinding attacks
  • Blocks private IP ranges, loopback, link-local, multicast, and cloud metadata endpoints
  • Returns informative SVG placeholder for blocked images

Memory Protection:

  • Streams content in 8KB chunks to prevent memory exhaustion
  • Enforces configurable size limits (default 5MB)

Content Validation:

  • Validates content is actually an image using libmagic
  • Verifies Content-Type headers
  • Returns SVG placeholder for invalid content

Implementation

Backend:

  • New API endpoint: GET /api/v1.0/mailboxes/{mailbox_id}/image-proxy/?url={encoded_url}
  • SSRF protection utility in core/utils.py
  • HTML rewriting in MessageSerializer to proxy external images and handle CID attachments
  • Configurable via PROXY_EXTERNAL_IMAGES and PROXY_MAX_IMAGE_SIZE_MB environment variables
  • 30-day cache headers on blob downloads

Frontend:

  • Lazy loading on all images for improved page load performance
  • HTTP/2 enabled in nginx for parallel image requests

Dependencies:

  • Added beautifulsoup4 for HTML parsing
  • Uses python-magic for MIME type validation

URL rewriting

External images:

<!-- Before (violates CSP) -->
<img src="https://example.com/image.jpg">

<!-- After (CSP compliant) -->
<img src="/api/v1.0/mailboxes/123/image-proxy/?url=..." loading="lazy">

CID attachments:

<!-- Before -->
<img src="cid:[email protected]">

<!-- After -->
<img src="/api/v1.0/blob/msg_456_0/download/" loading="lazy">

Benefits

  • CSP Compliance: Enables strict img-src 'self' security policy
  • Privacy Protection: Prevents IP address leakage and tracking
  • SSRF Protection: Comprehensive validation prevents internal network access
  • Performance: Lazy loading + HTTP/2 + caching improve load times
  • Configurable: Enable per deployment with adjustable size limits

Configuration

# Enable image proxying (default: false)
PROXY_EXTERNAL_IMAGES=true

# Maximum image size in MB (default: 5)
PROXY_MAX_IMAGE_SIZE_MB=10

Summary by CodeRabbit

  • New Features

    • External images in messages can now be proxied through a secure endpoint with configurable size limits.
    • Images in message threads now load with lazy loading.
  • Improvements

    • Blob downloads now cache for 30 days in the browser.

mosa-riel and others added 6 commits October 27, 2025 15:53
Add image proxification feature to protect user privacy by proxying external images through the application server. This prevents tracking pixels from leaking user IP addresses and browsing behavior.

Changes:
- Add _proxy_images_in_html method to MessageSerializer to rewrite external image URLs
- Handle both external HTTP(S) URLs and CID (Content-ID) inline attachments
- Create ImageProxyViewSet endpoint at /api/mailboxes/{id}/image-proxy/
- Add per-domain configuration via _proxy_external_images custom attribute
- Add size limits via _proxy_max_image_size_mb custom attribute (default 5MB)
- Add beautifulsoup4 dependency for HTML parsing
- Images are proxied on-demand without caching (simple implementation)

Benefits:
- Protects user privacy (IP address not exposed to external servers)
- Breaks tracking pixels in emails
- Enables strict CSP policy (img-src 'self')
- Per-domain control via MailDomain.custom_attributes
Fix issues preventing the image proxy from working correctly:
- Change OpenApiParameter to OpenApiResponse in response schema definitions
- Use ViewSet.list() method instead of custom action for proper routing
- Add API version prefix to generated proxy URLs in serializer

The image proxy now correctly responds at:
/api/v1.0/mailboxes/{mailbox_id}/image-proxy/?url={encoded_url}
…-outside-toplevel

Move imports to top of file to fix pylint warnings:
- Move BeautifulSoup, quote, and settings imports to module level
- Remove import-outside-toplevel violations

Replace custom_attributes with environment variables:
- Add PROXY_EXTERNAL_IMAGES setting (default: False)
- Add PROXY_MAX_IMAGE_SIZE_MB setting (default: 5)
- Use settings instead of mailbox.domain.custom_attributes
- Simpler configuration via environment variables
Fixed the image proxy URL rewriting for inline images (CID references) to use the correct blob download endpoint format. The previous implementation was generating non-existent endpoint URLs.

Changes:
- Updated CID image URL from `/api/messages/{id}/attachments/{index}` to `/api/v1.0/blob/msg_{id}_{index}/download/`
- This matches the existing blob download endpoint that handles the `msg_*` format
- Ensures inline images in HTML emails display correctly when image proxying is enabled
…d caching

- Add lazy loading to all images (CID and external) for better initial page load
- Enable HTTP/2 in nginx for parallel image request handling
- Add Cache-Control headers (30 days) to blob downloads for browser caching
- Fix TypeScript typing for DomPurify hook parameter

These changes significantly improve performance for long emails with many images
by reducing initial load time and enabling efficient browser caching.
@coderabbitai
Copy link

coderabbitai bot commented Oct 29, 2025

Walkthrough

Adds server-side external-image proxying: HTML rewriting for image src, a new ImageProxyViewSet to fetch and serve images with SSRF and size checks, new settings and URL routing, caching headers for blobs, a URL-safety utility, a BeautifulSoup dependency, and small frontend IMG lazy-loading/type annotations.

Changes

Cohort / File(s) Summary
HTML rewriting & serializer
src/backend/core/api/serializers.py
Added _proxy_images_in_html() and changed HTML body rendering to rewrite cid: sources to blob URLs and external http(s) sources to the mailbox image-proxy endpoint when enabled and the mailbox is authenticated.
Image proxy endpoint
src/backend/core/api/viewsets/image_proxy.py
New ImageProxyViewSet with list() to proxy external images: mailbox/auth checks, PROXY_EXTERNAL_IMAGES guard, SSRF-safe URL validation, 10s fetch timeout, non-redirects, image content-type/size checks, streaming with enforced max size, MIME re-check, caching headers, and explicit error responses/logging.
URL routing
src/backend/core/urls.py
Registered nested router and route under mailboxes for the image-proxy endpoint (mailboxes/<uuid:mailbox_id>/...).
URL safety util
src/backend/core/utils.py
Added validate_url_safety(url: str) -> tuple[bool, str] to enforce scheme/hostname rules and reject private/loopback/link-local/multicast/reserved/cloud metadata IPs.
Configuration
src/backend/messages/settings.py
Added PROXY_EXTERNAL_IMAGES (bool, default False) and PROXY_MAX_IMAGE_SIZE_MB (positive int, default 5).
Caching headers
src/backend/core/api/viewsets/blob.py
Added Cache-Control: private, max-age=2592000 (30 days) to blob and inline-attachment download responses.
Frontend image handling
src/frontend/src/features/layouts/components/thread-view/components/thread-message/message-body.tsx
Added type annotation for DomPurify hook parameter and set loading="lazy" unconditionally on IMG elements; preserved conditional CID→blob URL transformation.
Dependency
src/backend/pyproject.toml
Added beautifulsoup4==4.12.3 for HTML parsing.
Infra (non-functional)
src/nginx/servers.conf.erb
Whitespace-only change (no functional routing or directive changes).

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client
    participant Frontend
    participant Serializer
    participant ImageProxyViewSet
    participant ExternalServer

    Client->>Frontend: Request email view
    Frontend->>Serializer: Fetch HTML body
    Serializer->>Serializer: Parse HTML (BeautifulSoup)
    Serializer->>Serializer: Rewrite IMG src -> proxy URL or blob URL
    Serializer->>Frontend: Return modified HTML
    Frontend->>Client: Render HTML (IMG loading="lazy")

    Note over Client,ImageProxyViewSet: When browser requests proxied image
    Client->>ImageProxyViewSet: GET /mailboxes/{id}/image-proxy?url=...
    ImageProxyViewSet->>ImageProxyViewSet: validate mailbox, feature flag, validate_url_safety
    ImageProxyViewSet->>ExternalServer: Fetch image (10s, no redirects)
    ExternalServer-->>ImageProxyViewSet: Stream image
    ImageProxyViewSet->>ImageProxyViewSet: enforce size, check content-type, magic bytes
    ImageProxyViewSet->>Client: 200 + image bytes + Cache-Control + X-Proxied-From
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Areas to focus review:
    • _proxy_images_in_html HTML parsing/rewrite correctness (cid mapping, URL quoting/encoding)
    • validate_url_safety DNS resolution and IP classification edge cases
    • ImageProxyViewSet streaming loop: size enforcement, timeout, and proper exception handling
    • Consistency of generated proxy URLs and frontend decoding/use

Possibly related PRs

Poem

🐰 I hopped through HTML, nibbling src and cid,
I stitched a proxy tunnel so faraway pics can hide,
I lazy-load my whiskers and guard each fetch with care,
Caches hum like burrows, images safely share. 📸✨

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues Check ✅ Passed The pull request successfully implements all requirements from linked issue #187. External image proxying is addressed through the new ImageProxyViewSet endpoint with SSRF validation and size enforcement, allowing external images to comply with img-src 'self' CSP rules. Internal CID image handling is implemented through the _proxy_images_in_html helper method which rewrites CID references to blob download URLs while rewriting external http(s) URLs to the proxy endpoint. Supporting infrastructure includes the SSRF validation utility (validate_url_safety), configuration options (PROXY_EXTERNAL_IMAGES and PROXY_MAX_IMAGE_SIZE_MB), frontend lazy loading enhancements, and cache headers for performance. All coding-related requirements from the linked issue are met by the provided changes.
Out of Scope Changes Check ✅ Passed The material changes in this pull request are aligned with the requirements in issue #187. The backend image proxy endpoint, HTML rewriting logic, SSRF validation, configuration options, frontend lazy loading, and cache headers on blob downloads are all directly related to implementing the image proxy feature. However, the nginx change in src/nginx/servers.conf.erb contains only a whitespace adjustment with no functional modifications, making it technically out of scope relative to the stated objectives. While this whitespace-only change is trivial and does not introduce problematic functionality, it appears unrelated to the image proxy feature implementation.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Title Check ✅ Passed The pull request title "Image proxy for CSP compliance and privacy protection" directly and accurately summarizes the main feature being added across the changeset. The core changes throughout multiple files (serializers, viewsets, settings, frontend, utils) all contribute to implementing an image proxy endpoint with external URL rewriting, SSRF protection, and caching to support CSP-compliant image loading and user privacy. The title is concise, specific, and clearly communicates the primary objective without vague language or misleading information.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f7ee9dc and ea63279.

⛔ Files ignored due to path filters (1)
  • src/backend/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • src/backend/core/api/serializers.py (2 hunks)
  • src/backend/core/api/viewsets/blob.py (2 hunks)
  • src/backend/core/api/viewsets/image_proxy.py (1 hunks)
  • src/backend/core/urls.py (3 hunks)
  • src/backend/messages/settings.py (1 hunks)
  • src/backend/pyproject.toml (1 hunks)
  • src/frontend/src/features/layouts/components/thread-view/components/thread-message/message-body.tsx (2 hunks)
  • src/nginx/servers.conf.erb (1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
src/backend/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/django-python.mdc)

src/backend/**/*.py: Follow Django/PEP 8 style with a 100-character line limit
Use descriptive, snake_case names for variables and functions
Use Django ORM for database access; avoid raw SQL unless necessary for performance
Use Django’s built-in user model and authentication framework
Prefer try-except blocks to handle exceptions in business logic and views
Log expected and unexpected actions with appropriate log levels
Capture and report exceptions to Sentry; use capture_exception() for custom errors
Do not log sensitive information (tokens, passwords, financial/health data, PII)

Files:

  • src/backend/core/api/serializers.py
  • src/backend/messages/settings.py
  • src/backend/core/urls.py
  • src/backend/core/api/viewsets/blob.py
  • src/backend/core/api/viewsets/image_proxy.py
src/backend/{templates/**/*.html,**/serializers.py}

📄 CodeRabbit inference engine (.cursor/rules/django-python.mdc)

Use Django templates for HTML and DRF serializers for JSON responses

Files:

  • src/backend/core/api/serializers.py
src/backend/**/{settings.py,middleware.py}

📄 CodeRabbit inference engine (.cursor/rules/django-python.mdc)

Use middleware judiciously for cross-cutting concerns (authentication, logging, caching)

Files:

  • src/backend/messages/settings.py
src/backend/**/settings.py

📄 CodeRabbit inference engine (.cursor/rules/django-python.mdc)

src/backend/**/settings.py: Leverage Django’s caching framework (e.g., Redis/Memcached) where appropriate
Use Django’s cache framework with a backend like Redis or Memcached to reduce DB load
Optimize static file handling using Django’s staticfiles pipeline (e.g., WhiteNoise)

Files:

  • src/backend/messages/settings.py
src/backend/**/urls.py

📄 CodeRabbit inference engine (.cursor/rules/django-python.mdc)

Define clear, RESTful URL patterns using Django’s URL dispatcher

Files:

  • src/backend/core/urls.py
🧠 Learnings (2)
📚 Learning: 2025-09-02T10:12:12.835Z
Learnt from: CR
PR: suitenumerique/messages#0
File: .cursor/rules/django-python.mdc:0-0
Timestamp: 2025-09-02T10:12:12.835Z
Learning: Applies to src/backend/**/urls.py : Define clear, RESTful URL patterns using Django’s URL dispatcher

Applied to files:

  • src/backend/core/urls.py
📚 Learning: 2025-09-02T10:12:12.835Z
Learnt from: CR
PR: suitenumerique/messages#0
File: .cursor/rules/django-python.mdc:0-0
Timestamp: 2025-09-02T10:12:12.835Z
Learning: Applies to src/backend/**/views.py : Use Django REST Framework viewsets for API endpoints

Applied to files:

  • src/backend/core/urls.py
  • src/backend/core/api/viewsets/image_proxy.py
🧬 Code graph analysis (3)
src/backend/core/api/serializers.py (2)
src/backend/core/models.py (1)
  • get_parsed_field (1273-1275)
src/backend/core/tests/api/test_messages_import_labels_en.py (2)
  • user (26-28)
  • mailbox (32-36)
src/backend/core/urls.py (1)
src/backend/core/api/viewsets/image_proxy.py (1)
  • ImageProxyViewSet (21-135)
src/backend/core/api/viewsets/image_proxy.py (1)
src/backend/core/api/permissions.py (1)
  • IsAuthenticated (18-25)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: test-back
  • GitHub Check: build-front
  • GitHub Check: lint-back
🔇 Additional comments (2)
src/frontend/src/features/layouts/components/thread-view/components/thread-message/message-body.tsx (2)

62-62: LGTM - Type safety improvement.

Adding the Element type annotation improves type safety and makes the hook parameter's type explicit.


73-89: LGTM - Well-structured lazy loading and CID transformation.

The restructured logic correctly applies lazy loading to all images while keeping CID transformation conditional. This approach:

  • Improves performance by enabling native browser lazy loading for all images (external proxied images and CID attachments)
  • Optimizes by skipping CID transformation when no CID attachments exist
  • Provides a fallback for CID handling when backend proxying is disabled
  • Maintains security by running transformations in the afterSanitizeAttributes hook (post-sanitization)

The CSP policy (line 18) correctly permits images from 'self', data:, and the API origin, which aligns with both proxied external images and blob download URLs for CID attachments.

@mosa-riel mosa-riel changed the title Image Proxy Feature for CSP Compliance and Privacy Protection Image proxy feature for CSP compliance and privacy protection Oct 29, 2025
@mosa-riel mosa-riel marked this pull request as draft October 29, 2025 20:59
Fixes critical vulnerabilities identified by CodeRabbit:
- Block IP addresses and validate DNS to prevent SSRF attacks
- Stream content in chunks to prevent memory exhaustion
- Validate fetched content is actually an image (defense-in-depth)
- Disable redirects to prevent SSRF bypass
- Remove nginx http2 directive (requires SSL)
- Return SVG placeholder for blocked images instead of errors

Fixes suitenumerique#187
@mosa-riel mosa-riel marked this pull request as ready for review October 30, 2025 00:18
@mosa-riel
Copy link
Contributor Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Oct 30, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (4)
src/backend/core/utils.py (4)

80-88: DNS resolution may block request threads.

The getaddrinfo call performs synchronous DNS resolution, which can take several seconds if the nameserver is slow or unresponsive. In a web application context, this blocks the worker thread handling the HTTP request.

Consider adding a timeout mechanism or performing DNS resolution in a separate thread pool, though the security trade-off may be acceptable for this use case.


91-91: Rename unused loop variable to indicate intent.

The family variable from the getaddrinfo tuple is intentionally unused. Following Python conventions, prefix it with an underscore to indicate this is intentional.

As per static analysis hints.

Apply this diff:

-        for family, _, _, _, sockaddr in addr_info:
+        for _family, _, _, _, sockaddr in addr_info:

125-126: Bare except is acceptable here but consider logging unexpected exceptions.

The bare except clause catches all exceptions including SystemExit and KeyboardInterrupt, which is generally discouraged. However, in this security-critical validation function, failing closed (rejecting the URL) is the correct behavior.

Consider logging unexpected exceptions for debugging while maintaining the fail-closed behavior:

-    except Exception:
-        return False, "Invalid URL"
+    except Exception as e:
+        logger.warning("Unexpected error validating URL safety: %s", e, exc_info=True)
+        return False, "Invalid URL"

24-78: Document DNS rebinding limitation in docstring; TOCTOU is an accepted trade-off in DNS-based validation.

The TOCTOU concern is valid and inherent to all DNS-based SSRF protections. Between validate_url_safety() and the actual HTTP request, DNS could change. This is a known limitation rather than a bug—connection-level protections are optional enhancements but not mandatory.

Update the docstring as suggested to document this limitation:

    Blocked addresses include:
    - Any direct IP address (e.g., http://192.168.1.1/)
    - Private IP ranges (RFC1918: 10.x.x.x, 172.16-31.x.x, 192.168.x.x)
    - Loopback addresses (127.x.x.x, ::1)
    - Link-local addresses (169.254.x.x, fe80::/10)
    - Multicast and reserved addresses
    - Cloud provider metadata endpoints (169.254.169.254, fd00:ec2::254)
+
+    Note: This function is vulnerable to DNS rebinding attacks (time-of-check
+    to time-of-use) where DNS resolution can change between validation and
+    actual request. This is an inherent limitation of DNS-based validation.

The existing implementation and usage in image_proxy.py are correct and properly protected.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ea63279 and 894d12a.

📒 Files selected for processing (3)
  • src/backend/core/api/viewsets/image_proxy.py (1 hunks)
  • src/backend/core/utils.py (2 hunks)
  • src/nginx/servers.conf.erb (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/nginx/servers.conf.erb
🧰 Additional context used
📓 Path-based instructions (1)
src/backend/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/django-python.mdc)

src/backend/**/*.py: Follow Django/PEP 8 style with a 100-character line limit
Use descriptive, snake_case names for variables and functions
Use Django ORM for database access; avoid raw SQL unless necessary for performance
Use Django’s built-in user model and authentication framework
Prefer try-except blocks to handle exceptions in business logic and views
Log expected and unexpected actions with appropriate log levels
Capture and report exceptions to Sentry; use capture_exception() for custom errors
Do not log sensitive information (tokens, passwords, financial/health data, PII)

Files:

  • src/backend/core/utils.py
  • src/backend/core/api/viewsets/image_proxy.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
PR: suitenumerique/messages#0
File: .cursor/rules/django-python.mdc:0-0
Timestamp: 2025-09-02T10:12:12.835Z
Learning: Applies to src/backend/**/settings.py : Optimize static file handling using Django’s staticfiles pipeline (e.g., WhiteNoise)
📚 Learning: 2025-09-02T10:12:12.835Z
Learnt from: CR
PR: suitenumerique/messages#0
File: .cursor/rules/django-python.mdc:0-0
Timestamp: 2025-09-02T10:12:12.835Z
Learning: Applies to src/backend/**/views.py : Use Django REST Framework viewsets for API endpoints

Applied to files:

  • src/backend/core/api/viewsets/image_proxy.py
🧬 Code graph analysis (1)
src/backend/core/api/viewsets/image_proxy.py (2)
src/backend/core/utils.py (1)
  • validate_url_safety (24-126)
src/backend/core/api/permissions.py (1)
  • IsAuthenticated (18-25)
🪛 Ruff (0.14.2)
src/backend/core/utils.py

91-91: Loop control variable family not used within loop body

Rename unused family to _family

(B007)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test-back
🔇 Additional comments (13)
src/backend/core/utils.py (2)

3-6: LGTM!

The imports are appropriate for implementing SSRF protection with URL parsing, DNS resolution, and IP address validation.


117-118: Good defense-in-depth for cloud metadata endpoints.

While 169.254.169.254 should already be caught by is_link_local and fd00:ec2::254 by is_private, the explicit check adds clarity and ensures these critical endpoints are always blocked even if the underlying library behavior changes.

src/backend/core/api/viewsets/image_proxy.py (11)

1-21: LGTM!

Imports are appropriate for an image proxy viewset with SSRF protection, content validation, and proper API documentation.


23-33: LGTM!

Using ViewSet (not ModelViewSet) is appropriate for this non-CRUD operation, and the custom IsAuthenticated permission correctly gates access.


65-77: LGTM!

The mailbox validation and access control are correctly implemented. The method name "list" follows DRF conventions for GET requests on collection endpoints in nested routes.


79-83: LGTM!

The feature flag check ensures the proxy endpoint is only available when explicitly enabled via settings.


85-91: LGTM!

Correctly extracts and URL-decodes the image URL parameter. Using unquote (not unquote_plus) is appropriate for URL decoding.


111-121: LGTM!

The HTTP request configuration is secure and well-designed:

  • Timeout prevents hanging requests
  • Streaming prevents memory exhaustion
  • allow_redirects=False blocks redirect-based SSRF bypasses
  • Custom User-Agent aids logging/debugging

Note: The TOCTOU issue flagged in validate_url_safety remains (DNS can change between validation and fetch), but this is an inherent limitation of DNS-based SSRF protection.


123-140: LGTM!

The content-type and Content-Length validation correctly addresses previous review concerns:

  • Safely handles malformed Content-Length headers
  • Uses Content-Length only as a hint (actual enforcement during streaming)
  • Content-type check provides initial filtering before download

The later python-magic validation (line 168) provides defense-in-depth against spoofed content-type headers.


142-165: LGTM! Excellent defense against memory exhaustion.

The chunked streaming with size enforcement correctly addresses previous security concerns:

  • Reads in small 8KB chunks with frequent size checks
  • Enforces limit independently of Content-Length header
  • Logs security events appropriately
  • Prevents memory exhaustion from malicious or broken servers

This is a textbook implementation of safe streaming.


167-183: LGTM! Strong defense-in-depth with magic number validation.

Using python-magic to validate actual file content (not just HTTP headers) is excellent defense-in-depth. This prevents serving malicious content disguised with a fake image/* Content-Type header.

Note: The SVG duplication was flagged separately.


185-192: LGTM!

The success response is well-configured:

  • 30-day caching reduces server load for static email images
  • X-Proxied-From header aids debugging and provides transparency
  • Uses validated content-type and content

194-198: LGTM!

Error handling correctly catches all requests exceptions and returns appropriate 502 status for upstream failures. Logging provides operational visibility.

Comment on lines +57 to +63
responses={
200: OpenApiResponse(description="Image content"),
400: OpenApiResponse(description="Invalid request"),
403: OpenApiResponse(description="Forbidden"),
413: OpenApiResponse(description="Image too large"),
502: OpenApiResponse(description="Failed to fetch external image"),
},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add 404 response to OpenAPI schema.

The implementation returns 404 when the mailbox is not found (lines 70-72), but this response code is not documented in the OpenAPI schema.

Apply this diff:

         responses={
+            404: OpenApiResponse(description="Mailbox not found"),
         },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
responses={
200: OpenApiResponse(description="Image content"),
400: OpenApiResponse(description="Invalid request"),
403: OpenApiResponse(description="Forbidden"),
413: OpenApiResponse(description="Image too large"),
502: OpenApiResponse(description="Failed to fetch external image"),
},
responses={
200: OpenApiResponse(description="Image content"),
400: OpenApiResponse(description="Invalid request"),
403: OpenApiResponse(description="Forbidden"),
404: OpenApiResponse(description="Mailbox not found"),
413: OpenApiResponse(description="Image too large"),
502: OpenApiResponse(description="Failed to fetch external image"),
},
🤖 Prompt for AI Agents
In src/backend/core/api/viewsets/image_proxy.py around lines 57 to 63, the
OpenAPI responses mapping is missing the 404 entry even though the handler
returns 404 when a mailbox is not found (lines 70-72); add a 404:
OpenApiResponse(description="Not found" or "Mailbox not found") entry to the
responses dict so the OpenAPI schema documents the 404 case.

Comment on lines +93 to +109
# SSRF protection: validate URL before making any request
is_safe, error_message = validate_url_safety(url)
if not is_safe:
logger.warning("Blocked unsafe URL: %s - %s", url, error_message)
# Return placeholder image instead of JSON error for better UX
svg_placeholder = """<svg xmlns="http://www.w3.org/2000/svg" width="400" height="100" viewBox="0 0 400 100">
<rect width="100%" height="100%" fill="#f8f9fa"/>
<text x="50%" y="50%" text-anchor="middle" dominant-baseline="middle"
font-family="system-ui, -apple-system, sans-serif" font-size="14" fill="#6c757d">
🚫 Image blocked for security reasons
</text>
</svg>"""
return HttpResponse(
svg_placeholder,
content_type="image/svg+xml",
status=403,
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Extract duplicate SVG placeholder to a module-level constant.

The SVG placeholder for blocked images appears here and again at lines 172-178. This violates DRY principles and makes updates error-prone.

Apply this diff:

 logger = logging.getLogger(__name__)
+
+# Placeholder SVG shown when images are blocked for security or validation reasons
+BLOCKED_IMAGE_SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="400" height="100" viewBox="0 0 400 100">
+  <rect width="100%" height="100%" fill="#f8f9fa"/>
+  <text x="50%" y="50%" text-anchor="middle" dominant-baseline="middle"
+        font-family="system-ui, -apple-system, sans-serif" font-size="14" fill="#6c757d">
+    🚫 Image blocked for security reasons
+  </text>
+</svg>"""


 class ImageProxyViewSet(ViewSet):

Then use it at line 98 and line 172:

-            svg_placeholder = """<svg xmlns="http://www.w3.org/2000/svg" width="400" height="100" viewBox="0 0 400 100">
-  <rect width="100%" height="100%" fill="#f8f9fa"/>
-  <text x="50%" y="50%" text-anchor="middle" dominant-baseline="middle"
-        font-family="system-ui, -apple-system, sans-serif" font-size="14" fill="#6c757d">
-    🚫 Image blocked for security reasons
-  </text>
-</svg>"""
             return HttpResponse(
-                svg_placeholder,
+                BLOCKED_IMAGE_SVG,
                 content_type="image/svg+xml",
                 status=403,
             )

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/backend/core/api/viewsets/image_proxy.py around lines 93-109 and again at
lines ~172-178, the SVG placeholder is duplicated; extract that SVG string into
a single module-level constant (e.g., BLOCKED_IMAGE_SVG) defined near the top of
the file, then replace the inline SVG literals at line 98 and line 172 with
references to that constant when constructing the HttpResponse (preserve
content_type="image/svg+xml" and status=403). Ensure the constant is a plain
triple-quoted string and update any imports or linter notes if necessary.

imghdr was removed in Python 3.13. Use python-magic (already a
dependency) instead for more robust MIME type detection.
@sylvinus
Copy link
Member

@mosa-riel awesome contribution, thanks!! A couple high-level notes:

  • I think we need to add some level of user consent to load external images. Even though the user IP can't leak because we proxy the request, many external images contain individualized tracking links that can report that an email was read.
  • I'm not sure we should parse the email content to rewrite the image srcs on the backend. We already have a parsing hook on the frontend (modified in this patch), it would be a good place to implement the rewrite instead? This could then be done dynamically based on user consent.
  • we could also parse attributes like image dimensions (to block trackers that are 1px for instance).
  • isn't image/svg+xml a vector for injecting scripts or other nasty stuff? It would probably be better to have a stricter mime type whitelist

@jbpenrath any other remarks?

Copy link
Contributor

@jbpenrath jbpenrath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job ❤️

Can you add unit tests for backend utils and new endpoint ?

@mosa-riel
Copy link
Contributor Author

Good comments, thank you for the insights. I created this so I have a working version for here, but I would be happy to integrate more of your comments.

I think we need to add some level of user consent to load external images. Even though the user IP can't leak because we proxy the request, many external images contain individualized tracking links that can report that an email was read.

That is true. I don't think that is hard to implement. Let me update the PR later.

I'm not sure we should parse the email content to rewrite the image srcs on the backend. We already have a parsing hook on the frontend (modified in this patch), it would be a good place to implement the rewrite instead? This could then be done dynamically based on user consent.

What parsing hook are you referring too? I don't really have a preference here, I thought it was nice to not give any external URL to the frontend at all. The consent could just trigger a new fetch from the backend with a tag to add the images.

we could also parse attributes like image dimensions (to block trackers that are 1px for instance).
This would also be nice to add same dimension placeholders to prevent mailformatting to be skewed ?

isn't image/svg+xml a vector for injecting scripts or other nasty stuff? It would probably be better to have a stricter mime type whitelist
Yes, I am not 100% aware of this but the reason it is in a separate file is so it is somewhat expandable.

I will take these remarks and update the PR.

@jbpenrath any other remarks?

@mosa-riel mosa-riel changed the title Image proxy feature for CSP compliance and privacy protection ✨(feature): Image proxy feature for CSP compliance and privacy protection Oct 30, 2025
@mosa-riel mosa-riel changed the title ✨(feature): Image proxy feature for CSP compliance and privacy protection ✨(feature): Image proxy for CSP compliance and privacy protection Oct 30, 2025
@sylvinus
Copy link
Member

sylvinus commented Nov 1, 2025

I'm not sure we should parse the email content to rewrite the image srcs on the backend. We already have a parsing hook on the frontend (modified in this patch), it would be a good place to implement the rewrite instead? This could then be done dynamically based on user consent.

What parsing hook are you referring too? I don't really have a preference here, I thought it was nice to not give any external URL to the frontend at all. The consent could just trigger a new fetch from the backend with a tag to add the images.

I meant this one: https://github.com/suitenumerique/messages/pull/398/files#diff-e00c6a9c9e319c3b53ce318c4e9ab5366c2bb3fee498726013fb5a07ee3af76fR74

The only advantage to parsing on the frontend I see (but I could be missing something!) would be to create opaque proxy URLs so that the endpoint doesn't accept any URL. But even that seems fragile as you could just send yourself an email with any image URL you want. So I think it's best to leave the email untouched from the backend and do the rewrite from the frontend in that hook.

we could also parse attributes like image dimensions (to block trackers that are 1px for instance). This would also be nice to add same dimension placeholders to prevent mailformatting to be skewed ?

Indeed. To be tested

isn't image/svg+xml a vector for injecting scripts or other nasty stuff? It would probably be better to have a stricter mime type whitelist Yes, I am not 100% aware of this but the reason it is in a separate file is so it is somewhat expandable.

I didn't mean for the placeholder image, I meant for embedded external images in the email with this check:
https://github.com/suitenumerique/messages/pull/398/files#diff-67d7e7c7316fdd8bc6edf4971ae9c3705616fa0c7a59eead181f5ae69aad11dcR124

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Images in emails

3 participants