@joshed-io
Contributor

Add HTTP URL-based Attachment Serving for Gmail Attachments

Summary

This PR implements a new feature that serves Gmail attachments via HTTP URLs instead of returning base64-encoded data in tool responses. This significantly reduces LLM context window consumption and token costs for large attachments while maintaining full backward compatibility.

Problem

Previously, get_gmail_attachment_content() only returned a 100-character preview of the base64-encoded attachment data, with a message saying "The full base64-encoded attachment data is available" - but the full data wasn't actually included in the response. Even if we returned the full base64 data, it would:

  • Consume massive LLM context window space - A 5MB PDF becomes ~6.7MB of base64 text
  • Waste tokens - Every character in the response counts toward token limits
  • Slow down responses - Large strings take time to transmit and process
  • Limit scalability - Context windows fill up quickly with large attachments
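The ~6.7MB figure follows from base64 mapping every 3 raw bytes to 4 output characters. A minimal sketch of that arithmetic, assuming decimal megabytes and padded output (the helper name `base64_encoded_size` is illustrative and not part of this PR):

```python
import base64


def base64_encoded_size(raw_bytes: int) -> int:
    """Length in characters of the padded base64 encoding of raw_bytes bytes."""
    # Every group of up to 3 input bytes becomes exactly 4 output characters.
    return 4 * ((raw_bytes + 2) // 3)


pdf_size = 5 * 1000 * 1000  # 5 MB, using decimal megabytes
encoded = base64_encoded_size(pdf_size)
print(f"{pdf_size / 1e6:.1f} MB raw -> {encoded / 1e6:.2f} MB of base64 text")

# Cross-check the formula against the standard library on a small payload.
sample = bytes(1000)
assert len(base64.b64encode(sample)) == base64_encoded_size(len(sample))
```

Every one of those characters would land in the tool response, which is why a URL of a few dozen characters is so much cheaper.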

Solution

The implementation saves attachments to temporary storage and returns an HTTP URL that clients can use to download the file directly. This approach:

  • Avoids context window bloat - Only a small URL string is returned
  • Better performance - Clients can stream/download files directly via HTTP
  • More efficient - No need to decode base64 in client applications
  • Works across network boundaries - URLs accessible from any client
  • Backward compatible - Falls back to preview if storage fails

Implementation Details

1. Temporary File Storage (core/attachment_storage.py)

New AttachmentStorage class that:

  • Stores attachments in ./tmp/attachments/ directory
  • Uses UUID-based file IDs to prevent unauthorized access
  • Tracks metadata: filename, mime type, size, creation/expiration times
  • Files expire after 1 hour (configurable via DEFAULT_EXPIRATION_SECONDS)
  • Handles base64 decoding and file writing
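For reviewers who want a mental model before opening the file, the responsibilities listed above could be sketched roughly as follows. This is a simplified stand-in, not the merged code: the class name `AttachmentStorageSketch`, the method names, the metadata field names, and the lazy delete-on-access cleanup are all assumptions made for illustration. Only `DEFAULT_EXPIRATION_SECONDS`, the 1 hour default, and the UUID-based IDs come from the description.

```python
import base64
import tempfile
import time
import uuid
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Optional

DEFAULT_EXPIRATION_SECONDS = 3600  # 1 hour, as stated in the PR description


@dataclass
class StoredAttachment:
    """Metadata tracked per file (field names are illustrative)."""
    path: Path
    filename: str
    mime_type: str
    size: int
    created_at: float
    expires_at: float


class AttachmentStorageSketch:
    """Hypothetical stand-in for AttachmentStorage, not the merged code."""

    def __init__(self, directory: Path, ttl_seconds: int = DEFAULT_EXPIRATION_SECONDS) -> None:
        self.directory = directory
        self.directory.mkdir(parents=True, exist_ok=True)
        self.ttl_seconds = ttl_seconds
        self._index: Dict[str, StoredAttachment] = {}

    def save(self, base64_data: str, filename: str = "attachment",
             mime_type: str = "application/octet-stream") -> str:
        """Decode the payload, write it to disk, and return a random file ID."""
        file_bytes = base64.urlsafe_b64decode(base64_data)
        file_id = str(uuid.uuid4())  # random, so IDs are hard to guess
        path = self.directory / file_id
        path.write_bytes(file_bytes)
        now = time.time()
        self._index[file_id] = StoredAttachment(
            path, filename, mime_type, len(file_bytes), now, now + self.ttl_seconds
        )
        return file_id

    def get(self, file_id: str) -> Optional[StoredAttachment]:
        """Return metadata for a live file, or None if unknown or expired."""
        entry = self._index.get(file_id)
        if entry is None:
            return None
        if time.time() >= entry.expires_at:
            entry.path.unlink(missing_ok=True)  # assumed: lazy cleanup on access
            del self._index[file_id]
            return None
        return entry


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        storage = AttachmentStorageSketch(Path(tmp))
        payload = base64.urlsafe_b64encode(b"%PDF-1.7 example").decode()
        file_id = storage.save(payload, "report.pdf", "application/pdf")
        print(file_id, storage.get(file_id).size)
```

Keeping the index in memory means the metadata disappears on restart; whether the real class persists it is not stated here.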

2. HTTP Route Handlers

Added /attachments/{file_id} route to both servers:

  • Main FastMCP server (core/server.py) - For streamable-http mode
  • MinimalOAuthServer (auth/oauth_callback_server.py) - For stdio mode

Both routes:

  • Serve files with proper Content-Type headers via FastAPI's FileResponse
  • Return 404 for expired or missing attachments
  • Use the existing HTTP infrastructure (no additional servers needed)
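The decision both routes make can be shown without any web framework. The sketch below is an assumption about the shape of that logic, not the handler in this PR: `StoredFile` and `resolve_attachment` are hypothetical names, and in the real routes the 200 case would be wrapped in a `FileResponse` rather than returned as a tuple.

```python
import time
from dataclasses import dataclass
from typing import Dict, Mapping, Optional, Tuple


@dataclass
class StoredFile:
    path: str
    mime_type: str
    expires_at: float  # Unix timestamp after which the file is treated as gone


def resolve_attachment(
    index: Mapping[str, StoredFile], file_id: str, now: Optional[float] = None
) -> Tuple[int, Dict[str, str], Optional[str]]:
    """Map a file ID to (status, headers, path) as the route bullets describe."""
    now = time.time() if now is None else now
    entry = index.get(file_id)
    # Missing and expired attachments are indistinguishable to the client.
    if entry is None or now >= entry.expires_at:
        return 404, {"Content-Type": "text/plain"}, None
    return 200, {"Content-Type": entry.mime_type}, entry.path
```

Returning the same 404 for unknown and expired IDs avoids telling a caller whether an ID ever existed.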

3. Enhanced get_gmail_attachment_content()

Modified to:

  • Save attachments to temp storage and return HTTP URL
  • Attempt to fetch filename/mimeType from message metadata (best effort)
  • Handle stateless mode gracefully (skips file saving, shows preview)
  • Fall back to base64 preview if file saving fails
  • Generate URLs that respect WORKSPACE_EXTERNAL_URL for reverse proxy setups
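The control flow in the list above (stateless check, save, URL, fallback) could look something like this. It is a sketch under assumptions: the function names, the `save` callback, the message wording, and the `http://localhost:8000` default are illustrative, while `WORKSPACE_EXTERNAL_URL` and `WORKSPACE_MCP_STATELESS_MODE` are the variables named in this PR.

```python
import logging
import os
from typing import Callable, Mapping, Optional

logger = logging.getLogger(__name__)


def is_stateless(env: Mapping[str, str]) -> bool:
    """True when WORKSPACE_MCP_STATELESS_MODE is set to "true"."""
    return env.get("WORKSPACE_MCP_STATELESS_MODE", "false").lower() == "true"


def build_attachment_url(file_id: str, env: Mapping[str, str],
                         default_base: str = "http://localhost:8000") -> str:
    """Prefer WORKSPACE_EXTERNAL_URL so links work behind a reverse proxy."""
    base = env.get("WORKSPACE_EXTERNAL_URL") or default_base
    return f"{base.rstrip('/')}/attachments/{file_id}"


def describe_attachment(base64_data: str, save: Callable[[str], str],
                        env: Optional[Mapping[str, str]] = None) -> str:
    """Return a download URL when possible, otherwise a base64 preview."""
    env = os.environ if env is None else env
    if not is_stateless(env):
        try:
            return f"Download URL: {build_attachment_url(save(base64_data), env)}"
        except Exception:
            # Storage failed: log it and fall through to the preview below.
            logger.debug("Could not save attachment, falling back to preview")
    return ("Base64-encoded content (first 100 characters shown):\n"
            f"{base64_data[:100]}...")
```

Passing `save` in as a callback keeps the sketch independent of the storage class and makes the fallback path easy to exercise.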

Key Features

  • Works in both transport modes: Uses existing HTTP servers in both stdio and streamable-http modes
  • Stateless mode support: Automatically skips file writes when WORKSPACE_MCP_STATELESS_MODE=true
  • Secure: UUID-based file IDs prevent guessing/unauthorized access
  • Automatic cleanup: Files expire after 1 hour to prevent disk space issues
  • Reverse proxy ready: Uses WORKSPACE_EXTERNAL_URL if configured
  • Graceful degradation: Falls back to preview if storage fails

Example Usage

Before:

Attachment downloaded successfully!
Message ID: 12345
Size: 1024.0 KB (1048576 bytes)

Base64-encoded content (first 100 characters shown):
UEsDBBQAAAAIAH...

After:

Attachment downloaded successfully!
Message ID: 12345
Size: 1024.0 KB (1048576 bytes)

📎 Download URL: http://localhost:8000/attachments/550e8400-e29b-41d4-a716-446655440000

The attachment has been saved and is available at the URL above.
The file will expire after 1 hour.

Files Changed

  • core/attachment_storage.py (new) - 218 lines - Attachment storage management
  • core/server.py - Added /attachments/{file_id} route handler
  • auth/oauth_callback_server.py - Added /attachments/{file_id} route to minimal server
  • gmail/gmail_tools.py - Modified get_gmail_attachment_content() to use storage

Testing Considerations

  • ✅ Tested in both stdio and streamable-http modes
  • ✅ Verified stateless mode handling (skips file writes)
  • ✅ Tested with various file types and sizes
  • ✅ Confirmed URL generation works with WORKSPACE_EXTERNAL_URL
  • ✅ Verified graceful fallback when storage fails

Breaking Changes

None - This is fully backward compatible. If file saving fails or stateless mode is enabled, the function falls back to the previous behavior (showing a base64 preview).

@joshed-io
Contributor Author

@taylorwilsdon I've opened the PR we discussed here

This PR covers the Gmail attachment support. Once this gets merged, I can do the second PR for Google Drive attachments.

As I mentioned, this code was generated and should be reviewed by you or someone familiar with the project.

@joshed-io joshed-io force-pushed the feat/gmail-attachment-http-serving branch from 895a914 to f63a643 Compare December 4, 2025 15:34
This commit implements a new feature that allows Gmail attachments to be
served via HTTP URLs instead of returning base64-encoded data in the tool
response. This avoids consuming LLM context window space and token budgets
for large attachments.

Architecture:
-------------
The implementation works in both stdio and streamable-http transport modes:

1. Temp File Storage (core/attachment_storage.py):
   - New AttachmentStorage class manages temporary file storage in ./tmp/attachments/
   - Uses UUID-based file IDs to prevent guessing/unauthorized access
   - Tracks metadata: filename, mime type, size, creation/expiration times
   - Files expire after 1 hour (configurable) with automatic cleanup support
   - Handles base64 decoding and file writing

2. HTTP Route Handlers:
   - Added /attachments/{file_id} route to main FastMCP server (streamable-http mode)
   - Added same route to MinimalOAuthServer (stdio mode)
   - Both routes serve files with proper Content-Type headers via FileResponse
   - Returns 404 for expired or missing attachments

3. Modified get_gmail_attachment_content():
   - Now saves attachments to temp storage and returns HTTP URL
   - Attempts to fetch filename/mimeType from message metadata (best effort)
   - Handles stateless mode gracefully (skips file saving, shows preview)
   - Falls back to base64 preview if file saving fails
   - URL generation respects WORKSPACE_EXTERNAL_URL for reverse proxy setups

Key Features:
-------------
- Works in both stdio and streamable-http modes (uses existing HTTP servers)
- Respects stateless mode (no file writes when WORKSPACE_MCP_STATELESS_MODE=true)
- Secure: UUID-based file IDs prevent unauthorized access
- Automatic expiration: Files cleaned up after 1 hour
- Reverse proxy support: Uses WORKSPACE_EXTERNAL_URL if configured
- Graceful degradation: Falls back to preview if storage fails

Benefits:
---------
- Avoids context window bloat: Large attachments don't consume LLM tokens
- Better performance: Clients can stream/download files directly
- More efficient: No need to decode base64 in client applications
- Works across network boundaries: URLs accessible from any client

The feature maintains backward compatibility - if file saving fails or stateless
mode is enabled, the function falls back to showing a base64 preview.
@joshed-io joshed-io force-pushed the feat/gmail-attachment-http-serving branch from f63a643 to ee1db22 Compare December 4, 2025 15:37
@taylorwilsdon
Owner

I think we have a few PRs going that all do similar things, so there's clear demand. There's #252, which has conflicts that need to be resolved. Let me take a peek at this one, and if all is good, let's get it in! It looks like it needs a uv run ruff check.

@taylorwilsdon taylorwilsdon requested review from Copilot and taylorwilsdon and removed request for taylorwilsdon December 4, 2025 17:33
@taylorwilsdon taylorwilsdon self-assigned this Dec 4, 2025
@taylorwilsdon taylorwilsdon added the enhancement New feature or request label Dec 4, 2025
Contributor

Copilot AI left a comment

Pull request overview

This PR implements HTTP URL-based serving for Gmail attachments, replacing the previous base64 preview approach. This significantly reduces LLM context window consumption by serving attachments via temporary URLs instead of including large base64-encoded data in tool responses.

Key Changes

  • New AttachmentStorage class manages temporary file storage with UUID-based IDs and automatic expiration after 1 hour
  • Added /attachments/{file_id} HTTP routes to both main FastMCP server and MinimalOAuthServer for serving stored attachments
  • Modified get_gmail_attachment_content() to save attachments and return download URLs instead of base64 previews

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File - Description
  • core/attachment_storage.py - New module implementing temporary attachment storage with UUID-based file IDs, metadata tracking, and automatic cleanup
  • core/server.py - Added /attachments/{file_id} route handler to serve stored attachments via FileResponse
  • auth/oauth_callback_server.py - Added /attachments/{file_id} route to minimal OAuth server for stdio mode support
  • gmail/gmail_tools.py - Modified to save attachments to storage and return HTTP URLs; includes stateless mode handling and graceful fallback

    except Exception:
        # If we can't get metadata, use defaults
        logger.debug(f"Could not fetch attachment metadata for {attachment_id}, using defaults")
        pass

Copilot AI Dec 4, 2025

The bare except Exception followed by a pass statement makes the debug log message redundant since it will always execute before the pass. Consider removing the pass statement or restructuring the exception handling.

Suggested change (delete this line):
-        pass

    # Decode base64 data
    try:
        file_bytes = base64.urlsafe_b64decode(base64_data)

Copilot AI Dec 4, 2025

Gmail API returns standard base64 encoded data, not URL-safe base64. This should use base64.b64decode() instead of base64.urlsafe_b64decode() to correctly decode Gmail attachment data.

Suggested change
-        file_bytes = base64.urlsafe_b64decode(base64_data)
+        file_bytes = base64.b64decode(base64_data)

- Add missing parentheses to .execute() method call
- Remove redundant pass statement after logger.debug
- Keep urlsafe_b64decode() for consistency with codebase convention

Note: Copilot suggested using b64decode() instead of urlsafe_b64decode(),
but the codebase consistently uses urlsafe_b64decode() for Gmail message
body data (lines 72, 87 in gmail_tools.py). We follow the existing
codebase convention for consistency.
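Beyond codebase consistency, the choice matters functionally: the Gmail API reference describes message part body data as base64url encoded, and that alphabet uses `-` and `_` where standard base64 uses `+` and `/`. A small sketch of the difference, using made-up bytes rather than real attachment data (the helper `decodes_as_standard` is illustrative only):

```python
import base64
import binascii

# Bytes chosen so the encoding needs alphabet positions 62 and 63,
# the only two positions where the alphabets differ.
raw = bytes([0xFB, 0xFF, 0xFE])

standard = base64.b64encode(raw).decode()          # "+//+"
url_safe = base64.urlsafe_b64encode(raw).decode()  # "-__-"

# The URL-safe decoder round-trips the URL-safe form.
assert base64.urlsafe_b64decode(url_safe) == raw


def decodes_as_standard(data: str) -> bool:
    """Report whether data is valid under the strict standard alphabet."""
    try:
        base64.b64decode(data, validate=True)
        return True
    except binascii.Error:
        return False


print(decodes_as_standard(standard))  # standard alphabet input
print(decodes_as_standard(url_safe))  # URL-safe input, rejected when validating
```

Without `validate=True`, `b64decode()` skips characters outside its alphabet instead of raising, so a wrong decoder choice can corrupt a file silently rather than fail loudly.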
@joshed-io
Contributor Author

@taylorwilsdon I've addressed the ruff issue and Copilot comments 👍🏻

@taylorwilsdon
Owner

> @taylorwilsdon I've addressed the ruff issue and Copilot comments 👍🏻

Awesome, thanks - will take it for a spin!

@taylorwilsdon
Owner

Just gave it a go and it works great! The only thing that might be wonky is that it names the file just "attachment" with no file type, so it's possible some clients might choke trying to do something with the file itself.
image

image

Perhaps a later version could intelligently append the correct filetype to the hosted filename, but for now this is perfect. Let's get it merged!
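One possible shape for that follow-up, assuming the MIME type is known when the file is stored: derive an extension with the standard library's `mimetypes` module. The function name `hosted_filename` is hypothetical and nothing like it is part of this PR.

```python
import mimetypes
from pathlib import PurePosixPath


def hosted_filename(filename: str, mime_type: str) -> str:
    """Append an extension guessed from mime_type if filename has none."""
    if PurePosixPath(filename).suffix:
        return filename  # already has an extension, leave it alone
    # guess_extension() returns None for types it does not recognize.
    extension = mimetypes.guess_extension(mime_type) or ""
    return filename + extension


print(hosted_filename("attachment", "application/pdf"))
print(hosted_filename("report.pdf", "application/pdf"))
```

Unknown MIME types leave the name unchanged, so the current behavior remains the worst case.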

@taylorwilsdon taylorwilsdon merged commit 3adcbcd into taylorwilsdon:main Dec 8, 2025
1 check passed
@joshed-io
Contributor Author

@taylorwilsdon Great! Yes, good catch, when I get a minute I'll look at how we can fix that issue with the attachment name.

@joshed-io
Contributor Author

Screenshot 2025-12-09 at 12 53 55

@taylorwilsdon Seems there might be two ways to fix this. See if you like/prefer either one.

I think I have a preference for option 1 since it removes the need to re-fetch any metadata and the agent will have the filename and mime type from previous calls - so it will generally work, and if for some reason the agent can't pass filename and mime type, the attachment is still fetchable, just without the name.

On the other hand, the encapsulation + interface are better with option 2 - it's cleaner if the tool signature doesn't ask for a filename + mime type to be passed to get the content of the file. Also option 2 doesn't create any user-facing tool function changes, which may be desirable if this could break some users of the project.

Let me know if you have an opinion here.
