Fix potential path traversal and local file read vulnerabilities #197
LarFii merged 2 commits into HKUDS:main from
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: af0ecbe5e4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
I noticed that image paths extracted from document parsing results and retrieval context weren't being properly sanitized before file operations. This could allow a malicious document to trick the VLM into reading sensitive system files via indirect prompt injection or crafted parser output. Added directory boundary checks in the MinerU parser, path validation in the VLM query mixin, and symlink blocking in the image validation utility.
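The directory boundary check and symlink blocking described above can be sketched roughly as follows. This is not the code from the PR. The helper names (`is_within_directory`, `resolve_parser_image_path`, `is_safe_image_file`) are hypothetical, and the sketch assumes Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path
from typing import Optional


def is_within_directory(candidate: Path, base_dir: Path) -> bool:
    """True if `candidate`, after collapsing '..' and following symlinks, lies inside `base_dir`."""
    return candidate.resolve().is_relative_to(base_dir.resolve())


def resolve_parser_image_path(raw_path: str, output_dir: Path) -> Optional[Path]:
    """Resolve an image path taken from parser output JSON (hypothetical helper).

    Relative paths are interpreted relative to `output_dir`; absolute paths are
    taken as-is. In both cases the resolved path must stay inside `output_dir`.
    """
    candidate = Path(raw_path)
    if not candidate.is_absolute():
        candidate = output_dir / candidate
    if not is_within_directory(candidate, output_dir):
        # Rejects e.g. "../../etc/passwd", "/etc/passwd", or a symlink pointing outside.
        return None
    return candidate.resolve()


def is_safe_image_file(path: Path) -> bool:
    """Reject symbolic links outright, then require a regular file."""
    if path.is_symlink():
        return False
    return path.is_file()
```

Resolving before comparing is the important design choice: `Path.resolve()` collapses `..` segments and follows symlinks, so both `../../etc/passwd` and a symlink inside the output directory that points elsewhere fail the containment check. `Path.is_symlink()` only inspects the final path component, so the containment check is still what catches a symlinked parent directory.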
Branch updated from af0ecbe to fc6fca7.
Updated the logic to use

To use Codex here, create a Codex account and connect to GitHub.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fc6fca7934
There are some lint errors that need to be fixed by running:
@codex review

Codex Review: Didn't find any major issues. Delightful!
Description
Fixed multiple security vulnerabilities related to insecure path handling in both the document parsing phase and the multimodal query phase. These changes ensure that the system does not accidentally read or leak sensitive local files when processing untrusted document content or retrieval context.
Related Issues
None.
Changes Made
- `MinerUParser._read_output_files`: ensure that resolved image paths from MinerU's output JSON stay strictly within the intended output directory.
- `_process_image_paths_for_vlm`: validate that any image paths matched in the retrieval context (via "Image Path:" markers) reside within safe, predefined directories (CWD, working directory, or output directory).
- `validate_image_file`: explicitly block symbolic links, preventing symlink-based path traversal attacks.

Checklist
Additional Notes
I noticed the VLM enhanced query mode uses a regex to find image paths in the retrieved context. While powerful, this mechanism is susceptible to indirect prompt injection if a malicious document contains text like `Image Path: /etc/passwd`. These fixes add several layers of defense in depth to mitigate this and similar risks.
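A minimal sketch of the allow-list idea for "Image Path:" markers, assuming the marker is followed by the path on the same line. The regex, the function name `extract_safe_image_paths`, and the way the allowed roots are passed in are illustrative assumptions, not the repository's actual `_process_image_paths_for_vlm`.

```python
import re
from pathlib import Path
from typing import Iterable, List

# Assumed marker format: "Image Path: <path>" with the path running to end of line.
IMAGE_PATH_PATTERN = re.compile(r"Image Path:\s*([^\r\n]+)")


def extract_safe_image_paths(context: str, safe_roots: Iterable[Path]) -> List[Path]:
    """Return image paths found in retrieved context that resolve inside an allowed root."""
    roots = [Path(r).resolve() for r in safe_roots]
    safe: List[Path] = []
    for match in IMAGE_PATH_PATTERN.finditer(context):
        raw = match.group(1).strip()
        # resolve() collapses ".." and follows symlinks before the containment test;
        # relative paths are resolved against the current working directory.
        candidate = Path(raw).resolve()
        if any(candidate.is_relative_to(root) for root in roots):
            safe.append(candidate)
        # Anything else (e.g. an injected "/etc/passwd") is silently dropped.
    return safe
```

Relative paths are resolved against the current working directory here, which lines up with CWD being one of the allowed roots; the PR does not say whether the real implementation resolves them the same way.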