

Fix potential path traversal and local file read vulnerabilities #197

Merged
LarFii merged 2 commits into HKUDS:main from RinZ27:fix/path-traversal-vulnerabilities on Feb 20, 2026


@RinZ27 (Contributor) commented Feb 14, 2026

Description

Fixed multiple security vulnerabilities related to insecure path handling in both the document parsing phase and the multimodal query phase. These changes ensure that the system does not accidentally read or leak sensitive local files when processing untrusted document content or retrieval context.

Related Issues

None.

Changes Made

  • raganything/parser.py: Added a boundary check in MinerUParser._read_output_files to ensure that resolved image paths from MinerU's output JSON are strictly within the intended output directory.
  • raganything/query.py: Enhanced _process_image_paths_for_vlm to validate that any image paths matched in the retrieval context (via "Image Path:" markers) reside within safe, predefined directories (CWD, working directory, or output directory).
  • raganything/utils.py: Hardened validate_image_file to explicitly block symbolic links, preventing symlink-based path traversal attacks.
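The three defenses listed above share one idea: resolve a path, confirm it stays inside an allowed root, and refuse symlinks. The following is a minimal sketch of that idea, assuming Python 3.9+ (for `Path.is_relative_to`). The helper names `is_within_directory` and `safe_image_path` are hypothetical; this is not the code merged into `_read_output_files`, `_process_image_paths_for_vlm`, or `validate_image_file`.

```python
from pathlib import Path
from typing import Iterable, Optional


def is_within_directory(candidate: Path, base_dir: Path) -> bool:
    """Return True if `candidate`, once resolved, lies inside `base_dir`."""
    try:
        resolved = candidate.resolve()
        base = base_dir.resolve()
    except (OSError, RuntimeError):
        # Symlink loops or unreadable paths: fail closed.
        return False
    # Component-wise containment (Python 3.9+), not a string prefix test.
    return resolved.is_relative_to(base)


def safe_image_path(raw: str, allowed_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the resolved path if `raw` names a regular, non-symlink file
    inside one of `allowed_dirs`; otherwise return None."""
    path = Path(raw)
    # Check the final component for a symlink *before* resolving, because
    # resolve() follows links and would hide them from the later checks.
    if path.is_symlink():
        return None
    # Boundary check (the parser and query-side defenses).
    if not any(is_within_directory(path, Path(d)) for d in allowed_dirs):
        return None
    # Only accept existing regular files.
    if not path.is_file():
        return None
    return path.resolve()
```

For the parser case, a relative image path from MinerU's JSON would first be joined onto the output directory, e.g. `safe_image_path(str(output_dir / rel_path), [output_dir])`. If `rel_path` is absolute, pathlib's join discards `output_dir`, and the boundary check then rejects it whenever it points elsewhere. For the query case, `allowed_dirs` would hold the CWD, working directory, and output directory named in the PR.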

Checklist

  • Changes reviewed for security impact
  • Code follows project conventions
  • Path sanitization logic tested against boundary cases

Additional Notes

I noticed that the VLM-enhanced query mode uses a regex to find image paths in the retrieved context. While flexible, this mechanism is susceptible to indirect prompt injection: a malicious document containing text such as Image Path: /etc/passwd could cause that file to be read and passed to the VLM. These fixes add several layers of defense-in-depth to mitigate this and similar risks.
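To make the injection risk concrete, here is a minimal sketch of marker-based extraction followed by an allow-list filter, assuming Python 3.9+. The pattern `IMAGE_PATH_RE` and the helpers `extract_candidate_paths` and `filter_allowed_paths` are hypothetical; the regex actually used by the VLM query mode may differ.

```python
import re
from pathlib import Path
from typing import Iterable, List

# Hypothetical marker pattern: captures the rest of the line after
# "Image Path:". [ \t]* (not \s*) keeps the match from spilling onto the
# next line when the marker has no value.
IMAGE_PATH_RE = re.compile(r"Image Path:[ \t]*([^\r\n]+)")


def extract_candidate_paths(context: str) -> List[str]:
    """Collect every path that follows an "Image Path:" marker.

    Anything in retrieved text can match, including attacker-written
    markers, so these strings are untrusted."""
    return [match.strip() for match in IMAGE_PATH_RE.findall(context)]


def filter_allowed_paths(
    candidates: Iterable[str], allowed_dirs: Iterable[Path]
) -> List[Path]:
    """Keep only candidates that resolve inside one of `allowed_dirs`."""
    roots = [Path(d).resolve() for d in allowed_dirs]
    kept: List[Path] = []
    for raw in candidates:
        # resolve() collapses ".." and follows symlinks before the check.
        resolved = Path(raw).resolve()
        if any(resolved.is_relative_to(root) for root in roots):
            kept.append(resolved)
    return kept
```

A retrieved chunk containing `Image Path: /etc/passwd` still yields that string from the extractor, since the regex cannot tell genuine markers from injected ones. The filter drops it because `/etc` is not under any allowed root. That is why the validation has to sit between extraction and any file read.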

@LarFii (Collaborator) commented Feb 17, 2026

@codex review

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: af0ecbe5e4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

I noticed that image paths extracted from document parsing results and retrieval context weren't being properly sanitized before file operations. This could allow a malicious document to trick the VLM into reading sensitive system files via indirect prompt injection or crafted parser output.

Added directory boundary checks in the MinerU parser, path validation in the VLM query mixin, and symlink blocking in the image validation utility.
@RinZ27 force-pushed the fix/path-traversal-vulnerabilities branch from af0ecbe to fc6fca7 on February 18, 2026 02:57
@RinZ27 (Contributor, Author) commented Feb 18, 2026

Updated the logic to use Path.is_relative_to() instead of a string-based prefix check. This ensures that the resolved path is strictly contained within the base directory, even if other directories share a common name prefix.
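The difference between the two checks can be shown in a few lines. This is a minimal sketch assuming Python 3.9+; the paths and the helpers `naive_prefix_check` and `component_check` are hypothetical illustrations, not code from this PR.

```python
from pathlib import Path, PurePosixPath


def naive_prefix_check(candidate: str, base: str) -> bool:
    # Flawed: compares characters, so "/data/output_secrets" passes
    # a check against "/data/output".
    return candidate.startswith(base)


def component_check(candidate: Path, base: Path) -> bool:
    # Resolve first: is_relative_to() is purely lexical and would accept
    # an unresolved "base/../elsewhere".
    return candidate.resolve().is_relative_to(base.resolve())


if __name__ == "__main__":
    base = "/data/output"
    sibling = "/data/output_secrets/key.pem"
    print(naive_prefix_check(sibling, base))            # True (wrongly accepted)
    print(component_check(Path(sibling), Path(base)))   # False
    # Lexical pitfall: without resolve(), ".." slips through.
    print(PurePosixPath("/data/output/../etc/passwd").is_relative_to(base))  # True
```

Comparing whole path components closes the shared-prefix hole, and resolving beforehand closes the `..` hole; both are needed.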

@chatgpt-codex-connector

To use Codex here, create a Codex account and connect to GitHub.

@LarFii (Collaborator) commented Feb 18, 2026

@codex review

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fc6fca7934


@LarFii (Collaborator) commented Feb 20, 2026

There are some lint errors that need to be fixed by running:

pre-commit run --all-files

@LarFii (Collaborator) commented Feb 20, 2026

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Delightful!


@LarFii merged commit 16ce0f8 into HKUDS:main on Feb 20, 2026
1 check failed
@RinZ27 deleted the fix/path-traversal-vulnerabilities branch on February 21, 2026 04:33

Labels

None yet

Projects

None yet


2 participants