-
Notifications
You must be signed in to change notification settings - Fork 18
Add python-magic for MIME type detection from file contents #3172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add python-magic for MIME type detection from file contents #3172
Conversation
- Add python-magic==0.4.27 dependency to pyproject.toml - Modify ensure_filename_with_extension to accept optional contents parameter - Use python-magic to detect MIME type when provided MIME type is application/octet-stream - Update call sites in upload.py to pass file contents - Add comprehensive tests with mocking for python-magic functionality - Handle cases where libmagic is not available with graceful fallback Co-Authored-By: [email protected] <[email protected]>
|
@codex review |
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| """Tests for file extension inference utilities.""" | ||
|
|
||
| import pytest | ||
| from io import BytesIO | ||
| from unittest.mock import patch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Place new extension tests under collected test paths
Pytest is configured via pyproject.toml with testpaths = ["tests"], and the default make test just runs pytest, so modules outside that directory are skipped. This new test module lives under src/vellum/utils/files/tests, meaning none of the python-magic coverage will execute in CI. Consider moving it under tests/ or broadening testpaths so these tests actually run.
Useful? React with 👍 / 👎.
siddseethepalli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable to me, we'll need to add the libmagic dep to the PWS Dockerfile for this to work right?
|
@noanflaherty what do you plan on doing with this? |
siddseethepalli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably ok, but if we just do this on the django side we can accomplish the same thing right?
| filename: Optional filename provided by the user | ||
| mime_type: The MIME type of the file (e.g., "application/pdf", "image/png"). This'll be used to infer the | ||
| extension if the filename lacks one. | ||
| contents: Optional file contents (bytes or file-like object) to use for MIME type detection via python-magic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[q] echoing Sidd's question, especially this will require Dockerfile changes. Do we need to use the inferred mime type on the sdk side before we call the upload api?
Implements python-magic for MIME type detection from file contents, similar to the pattern used in https://github.com/vellum-ai/vellum/pull/17154.
Summary
When uploading files with generic MIME types (specifically
application/octet-stream), the SDK now uses python-magic to detect the actual file type from the file contents. This enables proper file extension inference for files without explicit MIME type information.Key changes:
python-magic==0.4.27dependencyensure_filename_with_extension()to accept optionalcontentsparameter (bytes or IO)application/octet-streamupload.pyto pass file contents for detectionBehavior:
application/octet-streamAND contents are providedtext/html; charset=utf-8→text/html)Review & Testing Checklist for Human
Verify libmagic is installed in CI/production environments - python-magic is just a Python wrapper around the system
libmagiclibrary. Without it, the detection will silently fail and fall back to.binextension. Check that CI images and production environments have libmagic installed (e.g.,libmagic1on Debian/Ubuntu).Test with real files - The tests use mocking, so they don't verify actual python-magic integration. Test uploading files with
application/octet-streamMIME type (e.g., files without extensions) to verify detection works correctly in a real environment.Review scope limitation - The implementation only uses python-magic for
application/octet-streamMIME type. Consider whether this should be expanded to other generic MIME types or if this narrow scope is intentional.Test Plan
application/octet-streamMIME type (e.g., a PDF without extension).pdf).binNotes
contentsparameter continue to work unchanged