-
Notifications
You must be signed in to change notification settings - Fork 67
feat(backend): implement git clone local caching with user_id-based isolation #815
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…solation
Add git clone caching using --reference mode to reduce network transfer
and disk usage when cloning repositories multiple times.
Core features:
- Local cache using Git --reference mode (95% bandwidth savings)
- Automatic cache updates before use
- User isolation based on database user.id (not username)
- Permission handling with automatic token refresh
- Fallback to regular clone on cache failure
- Disabled by default, enabled via GIT_CACHE_ENABLED env var
Implementation:
- shared/utils/git_cache.py: Cache management with user_id validation
- get_cache_user_id(): Strict validation, requires valid user_id
- Cache structure: /git-cache/user_{user_id}/{domain}/{path}.git
- shared/utils/git_util.py: Integration with git clone workflow
- _clone_with_reference(): Clone using cached objects
- _clone_without_cache(): Regular clone fallback
- executor_manager/executors/docker/executor.py:
- Extract user_id from task.user.id
- Pass GIT_CACHE_USER_ID to containers for cache isolation
- executor_manager/config/config.py: Cache configuration
- executor_manager/executors/docker/constants.py: Cache constants
- docker-compose.yml: Add git_cache_data volume
Security:
- User isolation enforced via database user.id (immutable, unique)
- Strict validation: user_id must be positive integer
- Token isolation: each cache uses its owner's token
- Path traversal protection via integer validation
Testing:
- shared/tests/utils/test_git_cache.py: Comprehensive user_id tests
- shared/tests/utils/test_git_util.py: Cache integration tests
- Security tests for injection attacks
- Edge cases: large IDs, missing values, invalid types
Documentation:
- docs/git-cache-implementation.md: Complete implementation guide
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
…rity
Implements physical isolation at container level to prevent cross-user
cache access in the git clone local caching feature.
## Security Improvements
**Container-Level Isolation**
- Each executor container now mounts only its own /git-cache/user_{user_id} subdirectory
- Other users' cache directories are completely invisible within containers
- Prevents lateral movement even if container is compromised
**Path Validation**
- Added _validate_cache_path() function to prevent path traversal attacks
- Ensures all cache operations stay within user's allowed directory
- Blocks attempts like ../../../user_456/ malicious access
**Environment Variables**
- Added GIT_CACHE_USER_BASE_DIR for user-specific cache root
- executor_manager sets this to /git-cache/user_{user_id} for each container
## Code Changes
**executor_manager/executors/docker/executor.py**
- Modified _add_git_cache_mount() to mount user subdirectory only
- Mount point: git_cache_data:/git-cache/user_{user_id}
- Added user_id validation (must be positive integer)
**shared/utils/git_cache.py**
- Added get_user_cache_base_dir() function
- Added _validate_cache_path() security function
- Updated get_cache_repo_path() with path validation
- Updated ensure_cache_repo() with security checks
**shared/tests/utils/test_git_cache.py**
- Added TestGitCacheSecureIsolation test class (5 tests)
- Added TestGitCachePathValidation test class (4 tests)
- All setup_method() updated to include new env vars
**docs/git-cache-implementation.md**
- Updated to reflect container-level isolation
- Added security comparison (before/after)
- Added version history (v1.1.0)
**docs/git-cache-security-isolation-design.md**
- New comprehensive security design document
## Security Impact
| Aspect | Before | After |
|--------|--------|-------|
| Isolation | Logical only | Physical + Application |
| Cross-user access | Possible | Impossible |
| Data leakage risk | High | Very Low |
| Container visibility | All user dirs | Own user dir only |
## Testing
All new tests verify:
- User base directory resolution
- Path validation prevents traversal
- Cross-user access is blocked
- Cache paths stay within allowed boundaries
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
…_CACHE_DIR This commit simplifies the git cache implementation by removing the redundant GIT_CACHE_DIR environment variable and consolidating to use only GIT_CACHE_USER_BASE_DIR for user-level cache isolation. ## Changes ### Core Implementation - Remove DEFAULT_CACHE_DIR and ENV_CACHE_DIR constants - Remove get_cache_dir() function (no longer used) - Remove get_reference_path() function (unused) - Remove cache_dir parameter from get_cache_repo_path(url) - Update get_user_cache_base_dir() to require GIT_CACHE_USER_BASE_DIR ### Executor Manager - Remove GIT_CACHE_DIR from config.py - Remove passing GIT_CACHE_DIR to executor containers - Only pass GIT_CACHE_USER_BASE_DIR (accurate mount point) ### Environment Variables - executor_manager container: GIT_CACHE_ENABLED, GIT_CACHE_AUTO_UPDATE - executor container: GIT_CACHE_ENABLED, GIT_CACHE_USER_BASE_DIR, GIT_CACHE_USER_ID, GIT_CACHE_AUTO_UPDATE ### Tests - Remove test_get_cache_dir_default and test_get_cache_dir_custom - Remove test_get_user_cache_base_dir_fallback - Remove test_get_cache_repo_path_with_custom_cache_dir - Add test_get_user_cache_base_dir_raises_error_without_env_var - Remove test_cache_uses_user_id_not_git_username (misleading) - Clean up environment variable clearing in test setup ### Documentation - Update git-cache-implementation.md - Remove git-cache-security-isolation-design.md (outdated) - Update docker-compose.yml example ## Benefits - Clearer separation: GIT_CACHE_USER_BASE_DIR is the single source of truth - Eliminates confusion between GIT_CACHE_DIR and actual mount point - Simpler code with fewer variables to maintain - More explicit error messages when required env vars are missing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
…ment variable (#2) * Initial plan * fix(shared): add missing GIT_CACHE_USER_BASE_DIR env var to git_cache tests Co-authored-by: graindt <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: graindt <[email protected]>
📝 WalkthroughWalkthroughAdds an opt-in per-user Git object cache: config flags, Docker constants and executor mounts, per-user Docker volume manager, cache lifecycle (ensure/update) and cleanup manager with scheduler integration, cache-aware clone flow using --reference with fallbacks, comprehensive tests, and implementation documentation. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client
participant Executor as DockerExecutor
participant VolumeMgr as GitCacheVolumeMgr
participant Cache as GitCacheModule
participant GitRemote as Git Remote
Client->>Executor: submit task (repo_url, auth, user_id)
Executor->>Executor: extract user_id, check GIT_CACHE_ENABLED
alt Cache enabled
Executor->>VolumeMgr: ensure user volume exists (user_id)
VolumeMgr-->>Executor: volume name / mount ready
Executor->>Cache: get_cache_repo_path(repo_url)
Cache-->>Executor: cache_path
Executor->>Cache: ensure_cache_repo(cache_path, auth)
alt cache prepared
Executor->>GitRemote: git clone --reference cache_path (auth)
else cache prep failed
Executor->>GitRemote: fallback git clone (no reference)
end
GitRemote-->>Executor: clone result
else Cache disabled
Executor->>GitRemote: standard git clone
GitRemote-->>Executor: clone result
end
Executor->>Executor: setup hooks, run task
Executor-->>Client: task complete
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (15)
executor_manager/executors/docker/executor.py (3)
602-602: UseOptional[int]instead of implicitOptional.Per PEP 484,
int = Noneas a default doesn't make the type hint optional. UseOptional[int]for clarity.🔎 Proposed fix
- def _add_git_cache_mount(self, cmd: List[str], user_id: int = None) -> None: + def _add_git_cache_mount(self, cmd: List[str], user_id: Optional[int] = None) -> None:Also add
Optionalto the imports at line 18:-from typing import Any, Dict, List, Optional, Tuple +from typing import Any, Dict, List, Optional, Tuple(Already imported, so just update the type hint)
653-653: Remove extraneous f-string prefix.The string
f"GIT_CACHE_ENABLED=true"has no placeholders.🔎 Proposed fix
cmd.extend([ - "-e", f"GIT_CACHE_ENABLED=true", + "-e", "GIT_CACHE_ENABLED=true", "-e", f"GIT_CACHE_USER_BASE_DIR={user_cache_subdir}",
629-634: Useraise ... from eto preserve exception chain.When re-raising exceptions in the
exceptclause, useraise ... from eto maintain the exception chain for better debugging.🔎 Proposed fix
try: user_id_int = int(user_id) if user_id_int <= 0: - raise ValueError(f"Invalid user_id: {user_id}. Must be a positive integer.") + raise ValueError(f"Invalid user_id: {user_id}. Must be a positive integer.") from None except (ValueError, TypeError) as e: - raise ValueError(f"Invalid user_id for git cache: {e}") + raise ValueError(f"Invalid user_id for git cache: {e}") from eshared/tests/utils/test_git_cache.py (1)
1-16: LGTM!Test module structure is clean with proper imports and organization.
Minor note:
MagicMockis imported but appears unused in the visible test code. Consider removing if not needed elsewhere.shared/utils/git_util.py (3)
147-147: Remove extraneous f-string prefix.🔎 Proposed fix
- logger.info(f"Git cache is enabled, attempting clone with cache") + logger.info("Git cache is enabled, attempting clone with cache")
215-216: Remove unused variableresult.The
resultfromsubprocess.runis assigned but never used.🔎 Proposed fix
- result = subprocess.run( + subprocess.run( cmd, capture_output=True, text=True, check=True, timeout=600 )
237-239: Consider documenting whyusernameandtokenparameters are unused.These parameters are passed but not used in the function body. If they're kept for API consistency or future SSH fallback support, add a brief comment to clarify.
🔎 Proposed fix
def _clone_without_cache( - project_url, branch, project_path, auth_url, username, token + project_url, branch, project_path, auth_url, username, token # noqa: ARG001 ): """ Clone repository without using cache (regular clone). Args: project_url: Original repository URL branch: Branch to clone project_path: Local destination path auth_url: Authenticated URL - username: Git username (for SSH fallback) - token: Git token (for SSH fallback) + username: Git username (unused, kept for API consistency with potential SSH fallback) + token: Git token (unused, kept for API consistency with potential SSH fallback) Returns: Tuple (success, error_message) """docs/git-cache-implementation.md (3)
38-59: Add language specifier to code blocks.Several code blocks are missing language identifiers which helps with syntax highlighting.
🔎 Proposed fix
-``` +```text /git-cache/ (git_cache_data Volume) ├── user_123/Apply similar changes to code blocks at lines 63, 453, 468, and 485.
1-3: Consider placing documentation in language-specific directories.Per coding guidelines, detailed documentation for new features should be placed in
docs/en/anddocs/zh/directories. Since this documentation is in Chinese, consider moving it todocs/zh/git-cache-implementation.mdand creating an English version atdocs/en/git-cache-implementation.md.Based on coding guidelines.
719-727: Use heading instead of bold text for version section.Line 721 uses bold emphasis (
**初始实现**) where a proper heading would be more appropriate for document structure.🔎 Proposed fix
### v1.0.0 (2025-01-01) -**初始实现** +#### 初始实现 - Git `--reference` 模式缓存shared/utils/git_cache.py (4)
96-105: Preserve exception chain withraise ... from e.When re-raising exceptions, use
raise ... from eto maintain the full exception chain for better debugging.🔎 Proposed fix
try: user_id = int(user_id_str) if user_id <= 0: raise ValueError(f"Invalid user_id: {user_id}. Must be a positive integer.") return user_id except ValueError as e: raise ValueError( f"Invalid {ENV_CACHE_USER_ID} value: '{user_id_str}'. " f"Must be a valid integer. Error: {e}" - ) + ) from eBased on static analysis hints.
209-211: Update type hint to use explicit| Nonesyntax.The
branchparameter should use explicit optional type syntax per PEP 484 and coding guidelines.🔎 Proposed fix
def ensure_cache_repo( - cache_path: str, auth_url: str, branch: str = None + cache_path: str, auth_url: str, branch: str | None = None ) -> tuple[bool, str]:As per coding guidelines requiring type hints compliance.
274-288: Minor cleanup: unused variable and unnecessary f-string.The
resultvariable on Line 274 is unused (exception will be raised if command fails due tocheck=True), and Line 286 has an f-string without placeholders.🔎 Proposed fixes
- result = subprocess.run( + subprocess.run( cmd, capture_output=True, text=True, check=True, timeout=300 ) logger.info(f"Cache repository created: {cache_path}") return True, None except subprocess.CalledProcessError as e: error_msg = e.stderr if e.stderr else str(e) logger.error(f"Failed to create cache repository: {error_msg}") return False, error_msg except subprocess.TimeoutExpired: - error_msg = f"Timeout creating cache repository after 300 seconds" + error_msg = "Timeout creating cache repository after 300 seconds" logger.error(error_msg)Based on static analysis hints.
336-352: Minor cleanup: unused variable and unnecessary f-string.Similar to
ensure_cache_repo, Line 336 has an unusedresultvariable and Line 350 has an unnecessary f-string prefix.🔎 Proposed fixes
- result = subprocess.run( + subprocess.run( cmd, cwd=cache_path, capture_output=True, text=True, check=True, timeout=300 ) logger.info(f"Cache repository updated: {cache_path}") return True, None except subprocess.CalledProcessError as e: error_msg = e.stderr if e.stderr else str(e) logger.warning(f"Failed to update cache repository: {error_msg}") # Don't fail the operation if cache update fails # The cache can still be used as-is return True, None except subprocess.TimeoutExpired: - logger.warning(f"Timeout updating cache repository after 300 seconds") + logger.warning("Timeout updating cache repository after 300 seconds")Based on static analysis hints.
shared/tests/utils/test_git_util.py (1)
422-532: Good edge case coverage.The edge case tests properly handle timeout errors, user ID validation, and branch parameter variations (None and empty string). Environment variable cleanup in the
finallyblock is good practice.If you want to be pedantic about unused variables, you could prefix
errorwith_on lines 480, 505, and 526 where it's unpacked but not used:success, _ = clone_repo_with_token(...)However, this is a minor style issue in test code and not critical.
Based on static analysis hints.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
docker-compose.ymldocs/git-cache-implementation.mdexecutor_manager/config/config.pyexecutor_manager/executors/docker/constants.pyexecutor_manager/executors/docker/executor.pyshared/tests/utils/test_git_cache.pyshared/tests/utils/test_git_util.pyshared/utils/git_cache.pyshared/utils/git_util.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{py,ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
**/*.{py,ts,tsx,js,jsx}: All code comments MUST be written in English
File size MUST NOT exceed 1000 lines - split into multiple sub-modules if exceeded
Function length SHOULD NOT exceed 50 lines (preferred)
Files:
executor_manager/config/config.pyshared/utils/git_util.pyshared/tests/utils/test_git_cache.pyshared/utils/git_cache.pyexecutor_manager/executors/docker/constants.pyshared/tests/utils/test_git_util.pyexecutor_manager/executors/docker/executor.py
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Python code MUST follow PEP 8, use Black formatter with line length 88, and isort for imports
Python code MUST include type hints
Python functions and classes MUST have descriptive names and docstrings for public functions/classes
Python MUST extract magic numbers to named constants
Files:
executor_manager/config/config.pyshared/utils/git_util.pyshared/tests/utils/test_git_cache.pyshared/utils/git_cache.pyexecutor_manager/executors/docker/constants.pyshared/tests/utils/test_git_util.pyexecutor_manager/executors/docker/executor.py
docs/**/*.md
📄 CodeRabbit inference engine (AGENTS.md)
Detailed documentation for new features MUST be placed in
docs/en/anddocs/zh/directories with reference from AGENTS.md
Files:
docs/git-cache-implementation.md
🧬 Code graph analysis (3)
shared/utils/git_util.py (1)
shared/utils/git_cache.py (3)
ensure_cache_repo(209-291)get_cache_repo_path(139-206)is_cache_enabled(30-37)
shared/tests/utils/test_git_cache.py (1)
shared/utils/git_cache.py (6)
get_cache_repo_path(139-206)get_cache_user_id(75-105)get_user_cache_base_dir(40-62)is_auto_update_enabled(65-72)is_cache_enabled(30-37)_validate_cache_path(108-136)
shared/tests/utils/test_git_util.py (1)
shared/utils/git_util.py (4)
_clone_with_reference(185-234)_clone_without_cache(237-289)clone_repo_with_token(102-182)is_gerrit_url(80-99)
🪛 markdownlint-cli2 (0.18.1)
docs/git-cache-implementation.md
38-38: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
63-63: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
453-453: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
468-468: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
485-485: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
721-721: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🪛 Ruff (0.14.10)
shared/utils/git_util.py
147-147: f-string without any placeholders
Remove extraneous f prefix
(F541)
215-215: Local variable result is assigned to but never used
Remove assignment to unused variable result
(F841)
215-215: subprocess call: check for execution of untrusted input
(S603)
222-222: Consider moving this statement to an else block
(TRY300)
226-226: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
230-230: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
232-232: Do not catch blind exception: Exception
(BLE001)
233-233: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
238-238: Unused function argument: username
(ARG001)
238-238: Unused function argument: token
(ARG001)
268-268: subprocess call: check for execution of untrusted input
(S603)
277-277: Consider moving this statement to an else block
(TRY300)
281-281: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
285-285: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
shared/utils/git_cache.py
58-61: Avoid specifying long messages outside the exception class
(TRY003)
90-94: Avoid specifying long messages outside the exception class
(TRY003)
99-99: Abstract raise to an inner function
(TRY301)
99-99: Avoid specifying long messages outside the exception class
(TRY003)
100-100: Consider moving this statement to an else block
(TRY300)
102-105: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
102-105: Avoid specifying long messages outside the exception class
(TRY003)
130-133: Avoid specifying long messages outside the exception class
(TRY003)
202-202: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
210-210: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
231-231: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
242-242: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
274-274: Local variable result is assigned to but never used
Remove assignment to unused variable result
(F841)
274-274: subprocess call: check for execution of untrusted input
(S603)
283-283: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
286-286: f-string without any placeholders
Remove extraneous f prefix
(F541)
287-287: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
289-289: Do not catch blind exception: Exception
(BLE001)
290-290: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
295-295: Unused function argument: branch
(ARG001)
295-295: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
295-295: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
313-313: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
325-325: subprocess call: check for execution of untrusted input
(S603)
329-329: Do not catch blind exception: Exception
(BLE001)
336-336: Local variable result is assigned to but never used
Remove assignment to unused variable result
(F841)
336-336: subprocess call: check for execution of untrusted input
(S603)
341-341: Consider moving this statement to an else block
(TRY300)
350-350: f-string without any placeholders
Remove extraneous f prefix
(F541)
353-353: Do not catch blind exception: Exception
(BLE001)
shared/tests/utils/test_git_util.py
47-47: Probable insecure usage of temporary file or directory: "/tmp/test-repo"
(S108)
49-49: Possible hardcoded password assigned to: "token"
(S105)
81-81: Probable insecure usage of temporary file or directory: "/tmp/test-repo"
(S108)
83-83: Possible hardcoded password assigned to: "token"
(S105)
122-122: Probable insecure usage of temporary file or directory: "/tmp/test-repo"
(S108)
124-124: Possible hardcoded password assigned to: "token"
(S105)
147-147: Probable insecure usage of temporary file or directory: "/tmp/test-repo"
(S108)
149-149: Possible hardcoded password assigned to: "token"
(S105)
172-172: Probable insecure usage of temporary file or directory: "/tmp/test-repo"
(S108)
212-212: Probable insecure usage of temporary file or directory: "/tmp/test-repo"
(S108)
442-442: Probable insecure usage of temporary file or directory: "/tmp/test-repo"
(S108)
444-444: Possible hardcoded password assigned to: "token"
(S105)
473-473: Possible hardcoded password assigned to: "token"
(S105)
480-480: Unpacked variable error is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
481-481: Probable insecure usage of temporary file or directory: "/tmp/repo"
(S108)
501-501: Probable insecure usage of temporary file or directory: "/tmp/repo"
(S108)
503-503: Possible hardcoded password assigned to: "token"
(S105)
505-505: Unpacked variable error is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
522-522: Probable insecure usage of temporary file or directory: "/tmp/repo"
(S108)
524-524: Possible hardcoded password assigned to: "token"
(S105)
526-526: Unpacked variable error is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
executor_manager/executors/docker/executor.py
602-602: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
623-626: Avoid specifying long messages outside the exception class
(TRY003)
632-632: Abstract raise to an inner function
(TRY301)
632-632: Avoid specifying long messages outside the exception class
(TRY003)
634-634: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
634-634: Avoid specifying long messages outside the exception class
(TRY003)
653-653: f-string without any placeholders
Remove extraneous f prefix
(F541)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test wegent CLI Integration
- GitHub Check: E2E Tests (Shard 1/3)
- GitHub Check: E2E Tests (Shard 3/3)
- GitHub Check: E2E Tests (Shard 2/3)
- GitHub Check: Test Frontend
🔇 Additional comments (19)
executor_manager/executors/docker/constants.py (1)
32-36: LGTM!The Git cache constants are well-defined with clear comments. The volume name
wegent_git_cache_datacorrectly includes the project prefix to avoid conflicts with other Docker projects.executor_manager/config/config.py (1)
58-61: LGTM!The Git cache configuration variables have sensible defaults: cache is disabled by default (safe) and auto-update is enabled by default (ensures fresh data when cache is used).
docker-compose.yml (2)
149-159: LGTM!The Git cache configuration is properly set up as opt-in. The volume mount (
git_cache_data:/git-cache) is always present, which is correct since the executor_manager needs access to create per-user subdirectories when cache is enabled.Note: Docker Compose automatically prefixes volumes with the project name, so
git_cache_databecomeswegent_git_cache_data, correctly matchingGIT_CACHE_VOLUMEin constants.py.
172-173: LGTM!Volume declaration is properly added with the local driver.
executor_manager/executors/docker/executor.py (2)
34-36: LGTM!The Git cache constants are correctly imported for use in the mount logic.
150-160: LGTM!The
user_idextraction fromtask.user.idis correctly implemented and propagated through the task info dictionary.shared/tests/utils/test_git_cache.py (4)
19-107: LGTM! Comprehensive user_id validation tests.Excellent coverage of edge cases including valid values, missing values, invalid strings, zero, negative numbers, floats, empty strings, and whitespace.
109-251: LGTM! Good coverage of core functionality.Tests cover enabled/disabled states, case insensitivity, auto-update settings, and cache path generation for various URL formats (GitHub, GitLab, Gerrit, SSH, nested paths).
253-311: LGTM! Security-focused tests.Good coverage of user isolation and injection prevention. The injection test cases (SQL injection, command injection patterns) validate input sanitization.
449-513: LGTM! Path validation tests are well-designed.Tests for
_validate_cache_pathproperly verify:
- Valid paths within allowed directories
- Rejection of paths outside allowed directories
- Path traversal attack prevention (
../patterns)shared/utils/git_util.py (3)
10-14: LGTM!Cache utility imports are properly organized.
145-182: LGTM! Well-structured cache integration with proper fallback.The cache flow correctly:
- Checks if cache is enabled
- Prepares the cache repository
- Attempts reference-based clone
- Falls back to regular clone on any failure
185-234: LGTM! Reference clone implementation is correct.The
--referenceclone with proper timeout handling and fallback support is well-implemented.shared/utils/git_cache.py (4)
1-28: LGTM! Well-structured module.The module header, imports, and constants are clean and follow Python best practices. The docstring clearly describes the purpose and key security aspect (user_id-based isolation).
30-73: LGTM! Configuration getters are well-implemented.These functions properly retrieve and validate configuration from environment variables. The error message in
get_user_cache_base_dir()is helpful for debugging.
108-137: LGTM! Path traversal protection is correctly implemented.The validation logic properly prevents path traversal attacks by ensuring the cache path is within the allowed base directory using absolute path resolution and prefix checking.
139-207: LGTM! Cache path generation with proper validation.The function correctly handles both SSH and HTTPS Git URL formats, constructs a safe cache path, and validates it to prevent path traversal. The error handling appropriately logs and re-raises validation errors.
shared/tests/utils/test_git_util.py (2)
13-22: LGTM! Well-structured test class with proper setup.The test class follows pytest conventions and properly clears environment variables in
setup_methodto ensure test isolation.
23-236: Excellent test coverage for cache functionality!The test suite comprehensively covers:
- Cache-enabled happy path
- Fallback scenarios (cache prep failure, reference clone failure)
- Cache-disabled flow
- Command structure verification for both cache and non-cache paths
All tests use proper mocking and clear assertions.
| **关键安全特性**: | ||
| - ✅ **物理隔离**:每个容器只挂载自己的 `user_{user_id}` 子目录 | ||
| - ✅ **不可见**:容器内无法看到其他用户的 `user_*` 目录 | ||
| - ✅ **不可访问**:即使知道路径也无法访问(目录不存在) | ||
| - ✅ **双重防护**:容器层隔离 + 应用层路径验证 | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# First, find the documentation file and executor.py
find . -name "git-cache-implementation.md" -o -name "executor.py" | head -20Repository: wecode-ai/Wegent
Length of output: 193
🏁 Script executed:
# Also check the docs directory structure
ls -la docs/ 2>/dev/null | head -20Repository: wecode-ai/Wegent
Length of output: 616
🏁 Script executed:
# Search for Docker volume mount configuration in executor.py
rg -A 10 -B 5 "volume|docker.*mount" --type pyRepository: wecode-ai/Wegent
Length of output: 16055
🏁 Script executed:
# Read the documentation file to understand the security claims
cat -n docs/git-cache-implementation.md | sed -n '70,90p'Repository: wecode-ai/Wegent
Length of output: 619
🏁 Script executed:
# Check for references in AGENTS.md
grep -n "git-cache" docs/AGENTS.md 2>/dev/null || echo "No git-cache reference found in AGENTS.md"Repository: wecode-ai/Wegent
Length of output: 101
🏁 Script executed:
# Read more context around the isolation claims
cat -n docs/git-cache-implementation.md | sed -n '380,420p'Repository: wecode-ai/Wegent
Length of output: 1282
🏁 Script executed:
# Read the full _add_git_cache_mount method
cat -n executor_manager/executors/docker/executor.py | sed -n '200,270p'Repository: wecode-ai/Wegent
Length of output: 3328
🏁 Script executed:
# Check how the volume is created/initialized
rg -B 5 -A 10 "GIT_CACHE_VOLUME|git_cache_data" executor_manager/executors/docker/Repository: wecode-ai/Wegent
Length of output: 4032
🏁 Script executed:
# Check if there's any docker-compose file that defines volume structure
find . -name "docker-compose*" -o -name "compose*" | xargs grep -l "git_cache_data" 2>/dev/null | head -5Repository: wecode-ai/Wegent
Length of output: 80
🏁 Script executed:
# Search for where the git cache directory structure is created
rg -A 10 "user_\{user" executor_manager/ backend/Repository: wecode-ai/Wegent
Length of output: 4198
🏁 Script executed:
# Let me verify the exact mounting behavior by examining the full context
cat -n executor_manager/executors/docker/executor.py | sed -n '440,480p'Repository: wecode-ai/Wegent
Length of output: 1927
🏁 Script executed:
# Check if there's any initialization code that creates the user directories
rg -B 5 -A 15 "def _add_git_cache_mount" executor_manager/executors/docker/executor.pyRepository: wecode-ai/Wegent
Length of output: 1014
🏁 Script executed:
# Verify the docker-compose setup for git cache volume
cat docker-compose.yml | grep -A 10 "git_cache_data"Repository: wecode-ai/Wegent
Length of output: 368
🏁 Script executed:
# Check if there's any documentation or comments about the volume structure
rg -B 3 -A 10 "git_cache_data" . --type md --type txt --type py | head -80Repository: wecode-ai/Wegent
Length of output: 4317
🏁 Script executed:
# Look for where subdirectories might be created in the volume
rg -B 5 -A 10 "mkdir.*user_|makedirs.*user_" executor_manager/ backend/Repository: wecode-ai/Wegent
Length of output: 42
🏁 Script executed:
# Check if there's any initialization or setup script for the volume
find . -name "*.sh" -o -name "*.py" | xargs grep -l "git_cache_data\|user_.*mkdir" 2>/dev/null | head -10Repository: wecode-ai/Wegent
Length of output: 108
🏁 Script executed:
# Check the implementation to see how paths are actually used
rg -B 2 -A 8 "GIT_CACHE_USER_BASE_DIR" executor_manager/Repository: wecode-ai/Wegent
Length of output: 1504
🏁 Script executed:
# Verify the exact implementation and naming to confirm isolation understanding
cat -n executor_manager/executors/docker/executor.py | sed -n '330,390p'Repository: wecode-ai/Wegent
Length of output: 2788
🏁 Script executed:
# Verify how the volume structure is actually organized
rg -A 20 "user_123|user_456|挂载点为" docs/git-cache-implementation.md | head -40Repository: wecode-ai/Wegent
Length of output: 895
Move documentation to proper location per guidelines.
The documentation file should be placed in docs/en/git-cache-implementation.md and docs/zh/git-cache-implementation.md directories with a reference from docs/AGENTS.md, as specified in the coding guidelines for detailed documentation of new features.
🤖 Prompt for AI Agents
In docs/git-cache-implementation.md around lines 77 to 82, the file is in the
wrong location per project guidelines; move its content into two files
docs/en/git-cache-implementation.md and docs/zh/git-cache-implementation.md
(English and Chinese versions respectively), ensure the Chinese content stays in
the zh file and create or adapt an English translation for the en file, and then
add a reference/link to the new docs/en/git-cache-implementation.md and
docs/zh/git-cache-implementation.md from docs/AGENTS.md (update any
table-of-contents or section that lists feature docs) so the new docs are
discoverable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
shared/utils/git_cache.py (1)
294-296: Use explicit| Nonetype hint per PEP 484.The
auth_urlparameter still uses implicitOptional. This was partially addressed when removing thebranchparameter per the previous review, butauth_urlneeds the same fix.🔎 Proposed fix
def update_cache_repo( - cache_path: str, auth_url: str = None + cache_path: str, auth_url: str | None = None ) -> tuple[bool, str]:Based on static analysis hints and past review comments.
🧹 Nitpick comments (5)
shared/utils/git_cache.py (5)
101-105: Chain the original exception for better debugging.When re-raising an exception within an
exceptblock, usefrom eto preserve the exception chain. This helps with debugging by showing the original cause.🔎 Proposed fix
except ValueError as e: raise ValueError( f"Invalid {ENV_CACHE_USER_ID} value: '{user_id_str}'. " f"Must be a valid integer. Error: {e}" - ) + ) from e
182-186: Consider handling edge case when domain is empty.If a malformed URL is provided where both
parsed.netlocandparsed.hostnameareNoneor empty,domainwould be falsy. While_validate_cache_pathwould likely catch malformed paths downstream, explicit validation here would provide clearer error messages.🔎 Proposed fix
else: # Handle HTTPS/HTTP format: https://domain.com/path/repo.git parsed = urlparse(url) domain = parsed.netloc or parsed.hostname path = parsed.path + + if not domain: + raise ValueError(f"Could not extract domain from URL: {url}")
210-211: Use explicit| Nonetype hint per PEP 484.The
branchparameter uses implicitOptionalwhich is prohibited by PEP 484.🔎 Proposed fix
def ensure_cache_repo( - cache_path: str, auth_url: str, branch: str = None + cache_path: str, auth_url: str, branch: str | None = None ) -> tuple[bool, str]:Based on static analysis hints.
274-277: Remove unused variable and fix f-string.Minor cleanup:
resultis assigned but never used, and line 286 has an f-string without placeholders.🔎 Proposed fix
- result = subprocess.run( + subprocess.run( cmd, capture_output=True, text=True, check=True, timeout=300 )And for line 286:
- error_msg = f"Timeout creating cache repository after 300 seconds" + error_msg = "Timeout creating cache repository after 300 seconds"Based on static analysis hints.
335-340: Minor cleanup: unused variable and f-string without placeholders.🔎 Proposed fix
- result = subprocess.run( + subprocess.run( cmd, cwd=cache_path, capture_output=True, text=True, check=True, timeout=300 )And for line 349:
- logger.warning(f"Timeout updating cache repository after 300 seconds") + logger.warning("Timeout updating cache repository after 300 seconds")Based on static analysis hints.
Also applies to: 349-349
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
shared/utils/git_cache.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{py,ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
**/*.{py,ts,tsx,js,jsx}: All code comments MUST be written in English
File size MUST NOT exceed 1000 lines - split into multiple sub-modules if exceeded
Function length SHOULD NOT exceed 50 lines (preferred)
Files:
shared/utils/git_cache.py
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Python code MUST follow PEP 8, use Black formatter with line length 88, and isort for imports
Python code MUST include type hints
Python functions and classes MUST have descriptive names and docstrings for public functions/classes
Python MUST extract magic numbers to named constants
Files:
shared/utils/git_cache.py
🪛 Ruff (0.14.10)
shared/utils/git_cache.py
58-61: Avoid specifying long messages outside the exception class
(TRY003)
90-94: Avoid specifying long messages outside the exception class
(TRY003)
99-99: Abstract raise to an inner function
(TRY301)
99-99: Avoid specifying long messages outside the exception class
(TRY003)
100-100: Consider moving this statement to an else block
(TRY300)
102-105: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
102-105: Avoid specifying long messages outside the exception class
(TRY003)
130-133: Avoid specifying long messages outside the exception class
(TRY003)
202-202: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
210-210: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
231-231: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
242-242: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
274-274: Local variable result is assigned to but never used
Remove assignment to unused variable result
(F841)
274-274: subprocess call: check for execution of untrusted input
(S603)
283-283: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
286-286: f-string without any placeholders
Remove extraneous f prefix
(F541)
287-287: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
289-289: Do not catch blind exception: Exception
(BLE001)
290-290: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
295-295: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
312-312: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
324-324: subprocess call: check for execution of untrusted input
(S603)
328-328: Do not catch blind exception: Exception
(BLE001)
335-335: Local variable result is assigned to but never used
Remove assignment to unused variable result
(F841)
335-335: subprocess call: check for execution of untrusted input
(S603)
340-340: Consider moving this statement to an else block
(TRY300)
349-349: f-string without any placeholders
Remove extraneous f prefix
(F541)
352-352: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test wegent CLI Integration
- GitHub Check: Test Frontend
- GitHub Check: E2E Tests (Shard 1/3)
- GitHub Check: E2E Tests (Shard 2/3)
- GitHub Check: E2E Tests (Shard 3/3)
🔇 Additional comments (4)
shared/utils/git_cache.py (4)
30-73: LGTM!The configuration getter functions are clean, well-documented, and have sensible defaults.
108-137: LGTM!The path validation correctly prevents path traversal attacks by resolving absolute paths before checking containment.
342-355: Good resilience design for cache updates.The intentional decision to return
(True, None)on update failures ensures that cache update issues don't block the main clone operation. This is well-documented in the comments.
264-269:--branchworks correctly with--mirror; no compatibility issue.The
--mirrorand--branchflags are compatible and documented for use together. While--mirrorimplies--bare(no working tree),--branchcorrectly sets which ref HEAD points to in the mirrored repository. Git does not ignore--branch—it's a standard way to specify which branch should be HEAD in the mirror.Likely an incorrect or invalid review comment.
| cmd.extend([auth_url, cache_path]) | ||
|
|
||
| logger.info(f"Creating bare cache: {' '.join(cmd)}") | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Security: Avoid logging the command with credentials.
Line 272 logs the full command including auth_url, which may contain embedded credentials (e.g., https://[email protected]/...). This would expose sensitive tokens in logs.
🔎 Proposed fix
cmd.extend([auth_url, cache_path])
- logger.info(f"Creating bare cache: {' '.join(cmd)}")
+ # Avoid logging auth_url as it may contain credentials
+ logger.info(f"Creating bare cache at: {cache_path}")
result = subprocess.run(🤖 Prompt for AI Agents
In shared/utils/git_cache.py around lines 270 to 273, the logger.info currently
prints the full command including auth_url which may contain credentials; change
the logging to avoid exposing secrets by not logging auth_url directly — either
omit that element from the logged command or sanitize it first (use
urllib.parse.urlparse to remove username/password and reconstruct a safe URL or
replace credentials with a placeholder like ****), then log the sanitized or
truncated command (e.g., join cmd with the safe URL or log cmd without the URL
and separately log the remote host/path without credentials).
- Replace shared volume with per-user volumes (wegent_git_cache_user_{id})
- Add volume lifecycle management (create, delete, list, metadata)
- Implement automatic cleanup of inactive volumes (default 30 days)
- Simplify cache path from /git-cache/user_{id}/... to /git-cache/...
- Add 45 test cases covering volume management, cleanup, and security
- Update documentation to reflect physical isolation architecture
This upgrade provides true container-level isolation where each user's
cache is in a separate Docker volume, preventing cross-user access even
if the application layer is compromised. Trade-off: increased disk usage
for improved security, mitigated by automatic cleanup.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
docs/git-cache-implementation.md (1)
1-901: Move documentation to proper location per guidelines.This issue was previously identified and remains unresolved. The documentation should be placed in
docs/en/git-cache-implementation.mdanddocs/zh/git-cache-implementation.mdwith references fromdocs/AGENTS.md.Based on learnings from previous review and coding guidelines for documentation structure.
shared/utils/git_cache.py (1)
213-295: Security: Avoid logging the command with credentials.Line 276 logs the full command including
auth_url, which may contain embedded credentials (e.g.,https://[email protected]/...). This exposes sensitive tokens in logs.🔎 Proposed fix
cmd.extend([auth_url, cache_path]) - logger.info(f"Creating bare cache: {' '.join(cmd)}") + # Avoid logging auth_url as it may contain credentials + logger.info(f"Creating bare cache at: {cache_path}") result = subprocess.run(
🧹 Nitpick comments (9)
executor_manager/executors/docker/git_cache_volume_manager.py (4)
114-114: Remove unused variable assignment.The
resultvariable is assigned but never used. Sincecheck=Trueis specified, the subprocess will raiseCalledProcessErroron failure, making the result variable unnecessary.🔎 Proposed fix
- result = subprocess.run(cmd, capture_output=True, text=True, timeout=30, check=True) + subprocess.run(cmd, capture_output=True, text=True, timeout=30, check=True)
228-229: Remove unused variable assignment.Similar to line 114, the
resultvariable is assigned but never used.🔎 Proposed fix
- result = subprocess.run( - ["docker", "volume", "rm", volume_name], capture_output=True, text=True, timeout=30, check=True - ) + subprocess.run( + ["docker", "volume", "rm", volume_name], capture_output=True, text=True, timeout=30, check=True + )
119-128: Preferlogging.exceptionoverlogging.errorin exception handlers.Using
logger.exception()automatically includes the stack trace, which significantly aids debugging. This is a Python best practice for logging withinexceptblocks.🔎 Proposed fix
except subprocess.CalledProcessError as e: error_msg = e.stderr if e.stderr else str(e) - logger.error(f"Failed to create volume {volume_name}: {error_msg}") + logger.exception(f"Failed to create volume {volume_name}: {error_msg}") return False, error_msg except subprocess.TimeoutExpired: error_msg = "Timeout creating volume after 30 seconds" - logger.error(error_msg) + logger.exception(error_msg) return False, error_msg except Exception as e: - logger.error(f"Unexpected error creating volume: {e}") + logger.exception(f"Unexpected error creating volume: {e}") return False, str(e)
235-244: Preferlogging.exceptionoverlogging.errorin exception handlers.Same recommendation as the previous comment for better debugging with automatic stack traces.
🔎 Proposed fix
except subprocess.CalledProcessError as e: error_msg = e.stderr if e.stderr else str(e) - logger.error(f"Failed to delete volume {volume_name}: {error_msg}") + logger.exception(f"Failed to delete volume {volume_name}: {error_msg}") return False, error_msg except subprocess.TimeoutExpired: error_msg = "Timeout deleting volume after 30 seconds" - logger.error(error_msg) + logger.exception(error_msg) return False, error_msg except Exception as e: - logger.error(f"Unexpected error deleting volume: {e}") + logger.exception(f"Unexpected error deleting volume: {e}") return False, str(e)executor_manager/executors/docker/executor.py (1)
602-686: Physical isolation correctly implemented.The current implementation provides proper physical isolation by creating separate Docker volumes per user (
wegent_git_cache_user_{id}). Each user's volume is distinct at the storage layer, which addresses the concern raised in the past review comment.The approach is sound:
- User-specific volumes ensure physical isolation
- Consistent mount path (
/git-cache) simplifies container configuration- Volume creation with validation provides proper error handling
Minor style improvements:
Optional refactorings
Line 672 has an f-string without placeholders:
- cmd.extend([ - "-e", f"GIT_CACHE_ENABLED=true", + cmd.extend([ + "-e", "GIT_CACHE_ENABLED=true",For line 641, add exception chaining per PEP 3134:
except (ValueError, TypeError) as e: - raise ValueError(f"Invalid user_id for git cache: {e}") + raise ValueError(f"Invalid user_id for git cache: {e}") from eexecutor_manager/git_cache_cleanup.py (1)
202-217: Rename unused loop variable for clarity.The loop variable
user_idis not used within the loop body. Python convention is to use_or a prefixed name to indicate intentionally unused variables.🔎 Proposed fix
- for user_id, metadata in user_volumes.items(): + for _user_id, metadata in user_volumes.items(): volume_name = metadata["volume_name"]shared/tests/utils/test_git_cache_physical_isolation.py (1)
65-106: Security validation tests are comprehensive.The tests properly validate path traversal prevention and user ID validation across various invalid inputs. All assertions correctly catch expected errors.
Static analysis suggests using raw strings for regex patterns in
match=parameters. While the current code works correctly, you can optionally make the patterns explicit:🔎 Optional refinement
- with pytest.raises(ValueError, match="Security violation|outside allowed"): + with pytest.raises(ValueError, match=r"Security violation|outside allowed"): get_cache_repo_path(url)- with pytest.raises(ValueError, match="GIT_CACHE_USER_BASE_DIR.*required"): + with pytest.raises(ValueError, match=r"GIT_CACHE_USER_BASE_DIR.*required"): get_user_cache_base_dir()Based on static analysis hints.
docs/git-cache-implementation.md (1)
39-628: Optionally specify language identifiers for fenced code blocks.Multiple fenced code blocks lack language identifiers (lines 39, 59, 74, 561, 573, 598, 628), which improves syntax highlighting and readability. While this doesn't affect functionality, adding language identifiers is a good practice.
🔎 Example fix
-``` +```text Docker Volumes: ├── wegent_git_cache_user_123/Based on static analysis hints.
executor_manager/tests/test_git_cache_cleanup.py (1)
213-234: Remove unused mock parameter.The
mock_sizeparameter is defined but never used in this test since invalid date formats cause the volume to be skipped before size retrieval.🔎 Proposed fix
@patch("executor_manager.git_cache_cleanup.list_user_volumes") @patch("executor_manager.git_cache_cleanup.delete_volume") - @patch("executor_manager.git_cache_cleanup.get_volume_size") - def test_cleanup_handles_invalid_dates(self, mock_size, mock_delete, mock_list): + def test_cleanup_handles_invalid_dates(self, mock_delete, mock_list): """Test cleanup handles volumes with invalid date formats"""Based on static analysis hints.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
docker-compose.ymldocs/git-cache-implementation.mdexecutor_manager/executors/docker/executor.pyexecutor_manager/executors/docker/git_cache_volume_manager.pyexecutor_manager/git_cache_cleanup.pyexecutor_manager/scheduler/scheduler.pyexecutor_manager/tests/executors/docker/__init__.pyexecutor_manager/tests/executors/docker/test_git_cache_volume_manager.pyexecutor_manager/tests/test_git_cache_cleanup.pyshared/tests/utils/test_git_cache_physical_isolation.pyshared/utils/git_cache.py
✅ Files skipped from review due to trivial changes (1)
- executor_manager/tests/executors/docker/init.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{py,ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
**/*.{py,ts,tsx,js,jsx}: All code comments MUST be written in English
File size MUST NOT exceed 1000 lines - split into multiple sub-modules if exceeded
Function length SHOULD NOT exceed 50 lines (preferred)
Files:
executor_manager/executors/docker/git_cache_volume_manager.pyshared/utils/git_cache.pyexecutor_manager/executors/docker/executor.pyexecutor_manager/tests/test_git_cache_cleanup.pyexecutor_manager/git_cache_cleanup.pyexecutor_manager/scheduler/scheduler.pyexecutor_manager/tests/executors/docker/test_git_cache_volume_manager.pyshared/tests/utils/test_git_cache_physical_isolation.py
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Python code MUST follow PEP 8, use Black formatter with line length 88, and isort for imports
Python code MUST include type hints
Python functions and classes MUST have descriptive names and docstrings for public functions/classes
Python MUST extract magic numbers to named constants
Files:
executor_manager/executors/docker/git_cache_volume_manager.pyshared/utils/git_cache.pyexecutor_manager/executors/docker/executor.pyexecutor_manager/tests/test_git_cache_cleanup.pyexecutor_manager/git_cache_cleanup.pyexecutor_manager/scheduler/scheduler.pyexecutor_manager/tests/executors/docker/test_git_cache_volume_manager.pyshared/tests/utils/test_git_cache_physical_isolation.py
docs/**/*.md
📄 CodeRabbit inference engine (AGENTS.md)
Detailed documentation for new features MUST be placed in
docs/en/anddocs/zh/directories with reference from AGENTS.md
Files:
docs/git-cache-implementation.md
🧠 Learnings (4)
📚 Learning: 2025-12-31T03:47:12.173Z
Learnt from: CR
Repo: wecode-ai/Wegent PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-31T03:47:12.173Z
Learning: Applies to docs/**/*.md : Detailed documentation for new features MUST be placed in `docs/en/` and `docs/zh/` directories with reference from AGENTS.md
Applied to files:
docs/git-cache-implementation.md
📚 Learning: 2025-12-18T02:09:09.776Z
Learnt from: CR
Repo: wecode-ai/Wegent PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-18T02:09:09.776Z
Learning: Test all commands before documenting them in AGENTS.md
Applied to files:
docs/git-cache-implementation.md
📚 Learning: 2025-12-18T02:09:09.776Z
Learnt from: CR
Repo: wecode-ai/Wegent PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-18T02:09:09.776Z
Learning: AI agents MUST update relevant documentation immediately after completing significant code changes
Applied to files:
docs/git-cache-implementation.md
📚 Learning: 2025-12-18T02:09:09.776Z
Learnt from: CR
Repo: wecode-ai/Wegent PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-18T02:09:09.776Z
Learning: Applies to **/*.test.{py,ts,tsx}|**/*_test.py : Keep tests independent and isolated
Applied to files:
shared/tests/utils/test_git_cache_physical_isolation.py
🧬 Code graph analysis (7)
executor_manager/executors/docker/git_cache_volume_manager.py (1)
shared/logger.py (1)
setup_logger(75-159)
shared/utils/git_cache.py (1)
shared/logger.py (1)
setup_logger(75-159)
executor_manager/tests/test_git_cache_cleanup.py (1)
executor_manager/git_cache_cleanup.py (2)
cleanup_inactive_volumes(69-176)get_volume_stats(178-225)
executor_manager/git_cache_cleanup.py (1)
executor_manager/executors/docker/git_cache_volume_manager.py (3)
list_user_volumes(248-298)delete_volume(216-245)get_volume_size(301-333)
executor_manager/scheduler/scheduler.py (1)
executor_manager/git_cache_cleanup.py (2)
GitCacheCleanupManager(24-225)cleanup_inactive_volumes(69-176)
executor_manager/tests/executors/docker/test_git_cache_volume_manager.py (1)
executor_manager/executors/docker/git_cache_volume_manager.py (9)
get_user_volume_name(35-45)volume_exists(48-68)create_user_volume(71-129)get_volume_metadata(132-159)update_volume_last_used(162-213)delete_volume(216-245)list_user_volumes(248-298)get_volume_size(301-333)get_all_user_volume_names(336-357)
shared/tests/utils/test_git_cache_physical_isolation.py (2)
shared/utils/git_cache.py (3)
get_cache_repo_path(140-210)get_user_cache_base_dir(40-63)get_cache_user_id(76-106)executor_manager/executors/docker/git_cache_volume_manager.py (5)
get_user_volume_name(35-45)create_user_volume(71-129)delete_volume(216-245)volume_exists(48-68)get_volume_metadata(132-159)
🪛 markdownlint-cli2 (0.18.1)
docs/git-cache-implementation.md
39-39: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
59-59: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
74-74: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
561-561: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
573-573: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
598-598: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
626-626: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
627-627: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
628-628: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
758-758: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
770-770: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
888-888: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🪛 Ruff (0.14.10)
executor_manager/executors/docker/git_cache_volume_manager.py
59-59: subprocess call: check for execution of untrusted input
(S603)
60-60: Starting a process with a partial executable path
(S607)
65-65: Consider moving this statement to an else block
(TRY300)
66-66: Do not catch blind exception: Exception
(BLE001)
114-114: Local variable result is assigned to but never used
Remove assignment to unused variable result
(F841)
114-114: subprocess call: check for execution of untrusted input
(S603)
117-117: Consider moving this statement to an else block
(TRY300)
121-121: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
125-125: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
127-127: Do not catch blind exception: Exception
(BLE001)
128-128: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
143-143: subprocess call: check for execution of untrusted input
(S603)
144-144: Starting a process with a partial executable path
(S607)
155-155: Consider moving this statement to an else block
(TRY300)
157-157: Do not catch blind exception: Exception
(BLE001)
201-201: subprocess call: check for execution of untrusted input
(S603)
211-211: Do not catch blind exception: Exception
(BLE001)
228-228: Local variable result is assigned to but never used
Remove assignment to unused variable result
(F841)
228-228: subprocess call: check for execution of untrusted input
(S603)
229-229: Starting a process with a partial executable path
(S607)
233-233: Consider moving this statement to an else block
(TRY300)
237-237: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
241-241: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
243-243: Do not catch blind exception: Exception
(BLE001)
244-244: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
266-266: Starting a process with a partial executable path
(S607)
294-294: Consider moving this statement to an else block
(TRY300)
296-296: Do not catch blind exception: Exception
(BLE001)
297-297: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
325-325: subprocess call: check for execution of untrusted input
(S603)
331-331: Do not catch blind exception: Exception
(BLE001)
345-345: Starting a process with a partial executable path
(S607)
355-355: Do not catch blind exception: Exception
(BLE001)
356-356: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
shared/utils/git_cache.py
59-62: Avoid specifying long messages outside the exception class
(TRY003)
91-95: Avoid specifying long messages outside the exception class
(TRY003)
100-100: Abstract raise to an inner function
(TRY301)
100-100: Avoid specifying long messages outside the exception class
(TRY003)
101-101: Consider moving this statement to an else block
(TRY300)
103-106: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
103-106: Avoid specifying long messages outside the exception class
(TRY003)
131-134: Avoid specifying long messages outside the exception class
(TRY003)
206-206: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
214-214: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
235-235: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
246-246: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
278-278: Local variable result is assigned to but never used
Remove assignment to unused variable result
(F841)
278-278: subprocess call: check for execution of untrusted input
(S603)
287-287: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
290-290: f-string without any placeholders
Remove extraneous f prefix
(F541)
291-291: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
293-293: Do not catch blind exception: Exception
(BLE001)
294-294: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
299-299: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
316-316: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
328-328: subprocess call: check for execution of untrusted input
(S603)
332-332: Do not catch blind exception: Exception
(BLE001)
339-339: Local variable result is assigned to but never used
Remove assignment to unused variable result
(F841)
339-339: subprocess call: check for execution of untrusted input
(S603)
344-344: Consider moving this statement to an else block
(TRY300)
353-353: f-string without any placeholders
Remove extraneous f prefix
(F541)
356-356: Do not catch blind exception: Exception
(BLE001)
executor_manager/executors/docker/executor.py
602-602: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
630-633: Avoid specifying long messages outside the exception class
(TRY003)
639-639: Abstract raise to an inner function
(TRY301)
639-639: Avoid specifying long messages outside the exception class
(TRY003)
641-641: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
641-641: Avoid specifying long messages outside the exception class
(TRY003)
651-654: Avoid specifying long messages outside the exception class
(TRY003)
672-672: f-string without any placeholders
Remove extraneous f prefix
(F541)
executor_manager/tests/test_git_cache_cleanup.py
216-216: Unused method argument: mock_size
(ARG002)
executor_manager/git_cache_cleanup.py
202-202: Loop control variable user_id not used within loop body
Rename unused user_id to _user_id
(B007)
shared/tests/utils/test_git_cache_physical_isolation.py
70-70: Pattern passed to match= contains metacharacters but is neither escaped nor raw
(RUF043)
105-105: Pattern passed to match= contains metacharacters but is neither escaped nor raw
(RUF043)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: E2E Tests (Shard 3/3)
- GitHub Check: E2E Tests (Shard 2/3)
- GitHub Check: E2E Tests (Shard 1/3)
- GitHub Check: Test wegent CLI Integration
- GitHub Check: Test Frontend
🔇 Additional comments (29)
docker-compose.yml (1)
149-155: LGTM! Clear documentation for Git cache configuration.The commented examples provide clear guidance for users wanting to enable Git cache features. The opt-in approach is appropriate for new functionality.
executor_manager/scheduler/scheduler.py (1)
251-266: LGTM! Well-implemented conditional cleanup scheduling.The lazy import pattern correctly avoids import-time side effects, and the conditional scheduling based on environment configuration is appropriate. The daily 2 AM schedule and
replace_existing=Trueflag are sensible choices for a maintenance task.executor_manager/git_cache_cleanup.py (1)
69-176: LGTM! Robust cleanup implementation with good error handling.The cleanup logic is well-structured with:
- Proper UTC datetime handling for timestamp comparisons
- Protected user support
- Dry-run mode for safe testing
- Comprehensive error tracking and logging
- Graceful handling of missing/invalid timestamps
executor_manager/tests/executors/docker/test_git_cache_volume_manager.py (9)
1-22: LGTM!The imports and test class structure are clean and well-organized. All necessary functions from the volume manager are imported for comprehensive testing.
24-28: LGTM!Good coverage of volume name generation with various user ID values (small, medium, large).
30-41: LGTM!Volume existence checks are properly tested for both positive and negative cases.
43-83: LGTM!The volume creation tests cover all important scenarios: successful creation with proper labels, idempotent behavior when volume exists, and error handling. The label verification approach is appropriate.
85-106: LGTM!Metadata retrieval tests properly validate both successful retrieval and error handling for non-existent volumes.
108-127: LGTM!Update timestamp tests appropriately cover both successful updates and handling of non-existent volumes.
129-166: LGTM!Volume deletion and listing tests are comprehensive, including proper filtering of non-wegent volumes and metadata handling.
168-184: LGTM!Volume size tests properly validate parsing of
ducommand output and error handling.
186-205: LGTM!Tests for retrieving all user volume names properly validate filtering by prefix and handling of empty results.
shared/tests/utils/test_git_cache_physical_isolation.py (6)
1-30: LGTM!Test setup and teardown properly manage environment variables, ensuring test isolation as per coding guidelines.
32-64: LGTM!Security tests effectively validate that cache paths don't include user subdirectories, confirming the physical isolation design.
108-148: LGTM!Tests effectively validate complex repository paths and confirm that different users get identical path structures (while being stored in separate volumes), which is the intended design.
150-170: LGTM!Gerrit URL handling is tested, and the integration test class is properly structured with clear test cases for volume name generation.
171-231: LGTM!Docker integration tests are properly gated and include appropriate cleanup. The tests validate volume lifecycle and metadata labels effectively.
233-257: LGTM!Cleanup manager configuration test properly validates environment-driven settings and maintains test isolation by cleaning up environment variables.
executor_manager/tests/test_git_cache_cleanup.py (6)
1-21: LGTM!Test structure and initialization test are clean, properly validating configuration from environment variables.
23-62: LGTM!Cleanup tests effectively validate that only inactive volumes exceeding the threshold are cleaned, while active volumes are preserved.
64-94: LGTM!Protected users test correctly validates that protected user volumes are not deleted even when inactive, and verifies the delete operation is only called for non-protected users.
96-144: LGTM!Dry-run mode and error handling tests are comprehensive, validating that dry-run prevents actual deletion and errors are properly captured without crashing.
146-184: LGTM!Volume statistics and protected user parsing tests appropriately validate metrics collection and robust handling of invalid input.
186-212: LGTM!Test properly validates fallback to
created_atwhenlast_usedis missing, ensuring robust date handling.shared/utils/git_cache.py (5)
1-73: LGTM!Module structure and helper functions are well-organized with comprehensive docstrings and proper type hints.
76-106: LGTM!User ID validation is comprehensive with clear error messages for various invalid inputs. The function properly enforces positive integer requirements for security.
109-137: LGTM!Path validation function is security-critical and correctly implemented. The absolute path resolution and boundary checking prevent path traversal attacks.
140-210: LGTM!Cache path generation correctly handles multiple Git URL formats (SSH, HTTPS) and includes security validation. The implementation aligns with the physical isolation design.
298-359: LGTM!Cache update function properly handles remote URL updates and implements graceful fallback strategy where update failures don't block cache usage. The error handling is intentional and well-documented.
Replace Docker labels with touch files (.last_used, .metadata) for reliable volume metadata tracking. Docker's local driver doesn't support updating labels on existing volumes, causing the cleanup tracking to be broken. Changes: - Add touch file helper functions (_read_last_used_from_volume, _write_last_used_to_volume, etc.) - Rewrite update_volume_last_used() to use touch files instead of labels - Update create_user_volume() to initialize metadata files - Update list_user_volumes() to read from touch files with label fallback - Add automatic migration from labels to touch files for existing volumes - Add 12 new test cases for touch file operations Benefits: - Metadata can be updated even when volume is in use - Reliable timestamp tracking for cleanup functionality - Supports concurrent access from multiple containers - Backward compatible with existing label-based volumes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
docs/git-cache-implementation.md (1)
1-988: Move documentation to proper location per coding guidelines.Based on the coding guidelines and learnings, detailed documentation for new features must be placed in
docs/en/anddocs/zh/directories with a reference fromAGENTS.md. This documentation file is currently at the rootdocs/level.Please:
- Create
docs/en/git-cache-implementation.mdwith an English translation- Move the Chinese content to
docs/zh/git-cache-implementation.md- Add references to both files in
docs/AGENTS.mdAdditionally, consider addressing the markdown formatting issues flagged by static analysis:
- Add language specifiers to fenced code blocks (lines 39, 76, 91, 592, 604, 629, 659, 904)
- Fix unordered list indentation (lines 657-658)
- Add blank lines around tables (line 884)
Based on learnings: Detailed documentation for new features MUST be placed in
docs/en/anddocs/zh/directories with reference from AGENTS.md.
🧹 Nitpick comments (2)
executor_manager/executors/docker/git_cache_volume_manager.py (2)
76-144: Clean up unused variable assignment.Line 119 assigns the subprocess result to
resultbut never uses it. Consider either removing the assignment or using it for enhanced logging/debugging.🔎 Proposed fix
- result = subprocess.run(cmd, capture_output=True, text=True, timeout=30, check=True) + subprocess.run(cmd, capture_output=True, text=True, timeout=30, check=True) logger.info(f"Successfully created volume: {volume_name}")Or keep the assignment for potential debugging:
result = subprocess.run(cmd, capture_output=True, text=True, timeout=30, check=True) - logger.info(f"Successfully created volume: {volume_name}") + logger.info(f"Successfully created volume: {result.stdout.strip()}")
212-241: Clean up unused variable assignment.Line 224 assigns the subprocess result to
resultbut never uses it. Consider removing the assignment since the function only needs to check success via exception handling.🔎 Proposed fix
- result = subprocess.run( + subprocess.run( ["docker", "volume", "rm", volume_name], capture_output=True, text=True, timeout=30, check=True )
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
docs/git-cache-implementation.mdexecutor_manager/executors/docker/git_cache_volume_manager.pyexecutor_manager/tests/executors/docker/test_git_cache_volume_manager.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{py,ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
**/*.{py,ts,tsx,js,jsx}: All code comments MUST be written in English
File size MUST NOT exceed 1000 lines - split into multiple sub-modules if exceeded
Function length SHOULD NOT exceed 50 lines (preferred)
Files:
executor_manager/executors/docker/git_cache_volume_manager.pyexecutor_manager/tests/executors/docker/test_git_cache_volume_manager.py
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Python code MUST follow PEP 8, use Black formatter with line length 88, and isort for imports
Python code MUST include type hints
Python functions and classes MUST have descriptive names and docstrings for public functions/classes
Python MUST extract magic numbers to named constants
Files:
executor_manager/executors/docker/git_cache_volume_manager.pyexecutor_manager/tests/executors/docker/test_git_cache_volume_manager.py
docs/**/*.md
📄 CodeRabbit inference engine (AGENTS.md)
Detailed documentation for new features MUST be placed in
docs/en/anddocs/zh/directories with reference from AGENTS.md
Files:
docs/git-cache-implementation.md
🧠 Learnings (3)
📚 Learning: 2025-12-31T03:47:12.173Z
Learnt from: CR
Repo: wecode-ai/Wegent PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-31T03:47:12.173Z
Learning: Applies to docs/**/*.md : Detailed documentation for new features MUST be placed in `docs/en/` and `docs/zh/` directories with reference from AGENTS.md
Applied to files:
docs/git-cache-implementation.md
📚 Learning: 2025-12-18T02:09:09.776Z
Learnt from: CR
Repo: wecode-ai/Wegent PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-18T02:09:09.776Z
Learning: Test all commands before documenting them in AGENTS.md
Applied to files:
docs/git-cache-implementation.md
📚 Learning: 2025-12-18T02:09:09.776Z
Learnt from: CR
Repo: wecode-ai/Wegent PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-18T02:09:09.776Z
Learning: AI agents MUST update relevant documentation immediately after completing significant code changes
Applied to files:
docs/git-cache-implementation.md
🧬 Code graph analysis (2)
executor_manager/executors/docker/git_cache_volume_manager.py (1)
shared/logger.py (1)
setup_logger(75-159)
executor_manager/tests/executors/docker/test_git_cache_volume_manager.py (1)
executor_manager/executors/docker/git_cache_volume_manager.py (15)
get_user_volume_name(40-50)volume_exists(53-73)create_user_volume(76-144)get_volume_metadata(147-174)update_volume_last_used(177-209)delete_volume(212-241)list_user_volumes(244-296)get_volume_size(299-331)get_all_user_volume_names(334-355)_read_last_used_from_volume(358-396)_read_metadata_from_volume(399-434)_write_last_used_to_volume(437-474)_write_metadata_to_volume(477-522)_initialize_volume_metadata(525-553)_read_volume_files(556-585)
🪛 markdownlint-cli2 (0.18.1)
docs/git-cache-implementation.md
39-39: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
76-76: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
91-91: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
592-592: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
604-604: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
629-629: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
657-657: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
658-658: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
659-659: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
789-789: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
801-801: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
884-884: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
904-904: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
975-975: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🪛 Ruff (0.14.10)
executor_manager/executors/docker/git_cache_volume_manager.py
64-64: subprocess call: check for execution of untrusted input
(S603)
65-65: Starting a process with a partial executable path
(S607)
70-70: Consider moving this statement to an else block
(TRY300)
71-71: Do not catch blind exception: Exception
(BLE001)
119-119: Local variable result is assigned to but never used
Remove assignment to unused variable result
(F841)
119-119: subprocess call: check for execution of untrusted input
(S603)
132-132: Consider moving this statement to an else block
(TRY300)
136-136: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
140-140: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
142-142: Do not catch blind exception: Exception
(BLE001)
143-143: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
158-158: subprocess call: check for execution of untrusted input
(S603)
159-159: Starting a process with a partial executable path
(S607)
170-170: Consider moving this statement to an else block
(TRY300)
172-172: Do not catch blind exception: Exception
(BLE001)
205-205: Consider moving this statement to an else block
(TRY300)
207-207: Do not catch blind exception: Exception
(BLE001)
224-224: Local variable result is assigned to but never used
Remove assignment to unused variable result
(F841)
224-224: subprocess call: check for execution of untrusted input
(S603)
225-225: Starting a process with a partial executable path
(S607)
229-229: Consider moving this statement to an else block
(TRY300)
233-233: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
237-237: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
239-239: Do not catch blind exception: Exception
(BLE001)
240-240: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
265-265: Starting a process with a partial executable path
(S607)
292-292: Consider moving this statement to an else block
(TRY300)
294-294: Do not catch blind exception: Exception
(BLE001)
295-295: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
323-323: subprocess call: check for execution of untrusted input
(S603)
329-329: Do not catch blind exception: Exception
(BLE001)
343-343: Starting a process with a partial executable path
(S607)
353-353: Do not catch blind exception: Exception
(BLE001)
354-354: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
380-380: subprocess call: check for execution of untrusted input
(S603)
385-385: Consider moving this statement to an else block
(TRY300)
394-394: Do not catch blind exception: Exception
(BLE001)
421-421: subprocess call: check for execution of untrusted input
(S603)
423-423: Consider moving this statement to an else block
(TRY300)
432-432: Do not catch blind exception: Exception
(BLE001)
461-461: subprocess call: check for execution of untrusted input
(S603)
472-472: Do not catch blind exception: Exception
(BLE001)
509-509: subprocess call: check for execution of untrusted input
(S603)
520-520: Do not catch blind exception: Exception
(BLE001)
549-549: Consider moving this statement to an else block
(TRY300)
551-551: Do not catch blind exception: Exception
(BLE001)
552-552: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
603-603: subprocess call: check for execution of untrusted input
(S603)
604-604: Starting a process with a partial executable path
(S607)
625-625: Do not catch blind exception: Exception
(BLE001)
634-634: Consider moving this statement to an else block
(TRY300)
636-636: Do not catch blind exception: Exception
(BLE001)
executor_manager/tests/executors/docker/test_git_cache_volume_manager.py
319-319: Unused method argument: mock_write_meta
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: E2E Tests (Shard 3/3)
- GitHub Check: E2E Tests (Shard 1/3)
- GitHub Check: E2E Tests (Shard 2/3)
- GitHub Check: Test Frontend
- GitHub Check: Test wegent CLI Integration
🔇 Additional comments (7)
executor_manager/tests/executors/docker/test_git_cache_volume_manager.py (2)
27-215: Excellent test coverage for volume manager operations.The test class comprehensively covers all volume lifecycle operations including creation with metadata labels, existence checks, metadata retrieval, last-used updates via touch files, deletion, listing, and sizing. The tests correctly mock subprocess calls and verify command arguments.
217-366: Comprehensive touch file operation tests with good coverage.The test class thoroughly covers touch file operations including read/write of
.last_usedand.metadatafiles, initialization sequences, and fallback/migration scenarios. The tests validate both success and failure paths with appropriate error handling.Regarding the static analysis hint about unused
mock_write_metaat line 319: This is a false positive. The test correctly validates that_initialize_volume_metadatareturns False immediately when_write_last_used_to_volumefails, without ever calling_write_metadata_to_volume. The mock must be present due to the patch decorator ordering, but it's correctly unused in this early-failure test case.executor_manager/executors/docker/git_cache_volume_manager.py (5)
1-38: Well-structured module with clear documentation.The module header, imports, and constants are well-organized. The docstring clearly explains the purpose of user-isolated volumes with the naming pattern
wegent_git_cache_user_{id}. Constants are appropriately defined for touch file paths, mount points, and Docker labels.
177-209: Touch file approach correctly addresses Docker label immutability.The implementation correctly writes timestamps to a
.last_usedfile inside the volume rather than attempting to update Docker labels. This solves the critical issue from past review comments wheredocker volume createon an existing volume would succeed but silently fail to update labels.The touch file approach ensures:
- Timestamps can be reliably updated even when volumes are in use
- Metadata persists and can be read independently of Docker label limitations
- Cleanup functionality can accurately identify inactive volumes
299-355: Volume sizing and listing helpers are well-implemented.The
get_volume_sizefunction appropriately uses a temporary Alpine container with read-only mount to calculate disk usage, andget_all_user_volume_namescorrectly filters volumes by the configured prefix.
358-522: Touch file read/write operations are robust and well-designed.The implementation correctly uses temporary Alpine containers to access volume contents for reading and writing metadata files. Key strengths:
- ISO format validation for timestamps (line 384)
- Heredoc syntax for reliable JSON writing (line 506)
- Appropriate error handling and logging at debug/warning levels
- Read-only mounts where appropriate (lines 374, 415)
These operations form the foundation of the reliable metadata tracking system that replaces the problematic Docker label approach.
525-638: Excellent migration strategy for backward compatibility.The implementation provides a smooth migration path from Docker labels to touch files:
- Primary method (
_read_volume_files): Attempts to read metadata from touch files- Fallback (
_fallback_to_docker_inspect): Reads from Docker labels for volumes created before the touch file system- Auto-migration: When labels are found, automatically initializes touch files (line 624) for future use
This ensures existing volumes continue to work while gradually migrating to the more reliable touch file approach. The migration is transparent to users and happens automatically on volume access.
* Initial plan * fix: update pytest tests to fix failing executor_manager tests Co-authored-by: graindt <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: graindt <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (1)
executor_manager/tests/test_git_cache_cleanup.py (1)
216-237: Remove unused mock parameter for test cleanliness.The
mock_sizeparameter is unused because volumes with invalid dates are skipped before size checks occur. Remove the decorator to make the test more precise.🔎 Proposed fix
@patch("executor_manager.executors.docker.git_cache_volume_manager.list_user_volumes") @patch("executor_manager.executors.docker.git_cache_volume_manager.delete_volume") - @patch("executor_manager.executors.docker.git_cache_volume_manager.get_volume_size") - def test_cleanup_handles_invalid_dates(self, mock_size, mock_delete, mock_list): + def test_cleanup_handles_invalid_dates(self, mock_delete, mock_list): """Test cleanup handles volumes with invalid date formats"""
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
executor_manager/tests/executors/docker/test_git_cache_volume_manager.pyexecutor_manager/tests/test_git_cache_cleanup.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{py,ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
**/*.{py,ts,tsx,js,jsx}: All code comments MUST be written in English
File size MUST NOT exceed 1000 lines - split into multiple sub-modules if exceeded
Function length SHOULD NOT exceed 50 lines (preferred)
Files:
executor_manager/tests/executors/docker/test_git_cache_volume_manager.pyexecutor_manager/tests/test_git_cache_cleanup.py
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Python code MUST follow PEP 8, use Black formatter with line length 88, and isort for imports
Python code MUST include type hints
Python functions and classes MUST have descriptive names and docstrings for public functions/classes
Python MUST extract magic numbers to named constants
Files:
executor_manager/tests/executors/docker/test_git_cache_volume_manager.pyexecutor_manager/tests/test_git_cache_cleanup.py
🧬 Code graph analysis (2)
executor_manager/tests/executors/docker/test_git_cache_volume_manager.py (1)
executor_manager/executors/docker/git_cache_volume_manager.py (15)
get_user_volume_name(40-50)volume_exists(53-73)create_user_volume(76-144)get_volume_metadata(147-174)update_volume_last_used(177-209)delete_volume(212-241)list_user_volumes(244-296)get_volume_size(299-331)get_all_user_volume_names(334-355)_read_last_used_from_volume(358-396)_read_metadata_from_volume(399-434)_write_last_used_to_volume(437-474)_write_metadata_to_volume(477-522)_initialize_volume_metadata(525-553)_read_volume_files(556-585)
executor_manager/tests/test_git_cache_cleanup.py (1)
executor_manager/git_cache_cleanup.py (2)
cleanup_inactive_volumes(69-176)get_volume_stats(178-225)
🪛 Ruff (0.14.10)
executor_manager/tests/executors/docker/test_git_cache_volume_manager.py
320-320: Unused method argument: mock_write_meta
(ARG002)
executor_manager/tests/test_git_cache_cleanup.py
219-219: Unused method argument: mock_size
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: E2E Tests (Shard 2/3)
- GitHub Check: E2E Tests (Shard 1/3)
- GitHub Check: E2E Tests (Shard 3/3)
- GitHub Check: Test wegent CLI Integration
🔇 Additional comments (13)
executor_manager/tests/test_git_cache_cleanup.py (10)
1-12: LGTM!The imports are well-organized and follow PEP 8 conventions. The test class structure is clear with a descriptive docstring.
14-21: LGTM!The initialization test correctly validates that environment variables are properly loaded and parsed by the cleanup manager.
23-33: LGTM!The test properly validates that cleanup is skipped when disabled, returning empty results as expected.
35-62: LGTM!The test effectively validates the core cleanup logic, ensuring only volumes exceeding the inactive threshold are deleted while preserving active volumes.
64-94: LGTM!The test correctly validates that protected users' volumes are excluded from cleanup, even when they exceed the inactivity threshold.
96-119: LGTM!The dry run test correctly validates that volumes are identified and reported without actual deletion, which is the expected behavior for this mode.
121-144: LGTM!The error handling test correctly validates that deletion failures are gracefully handled and reported without crashing the cleanup process.
146-173: LGTM!The statistics test thoroughly validates volume metrics calculation, including proper handling of multiple volumes with different sizes and activity states.
175-183: LGTM!The test correctly validates the fail-safe behavior where the entire protected users list is rejected if any entry is invalid, preventing partial configuration errors.
185-214: LGTM!The test effectively validates the fallback mechanism from
last_usedtocreated_atfor date determination, ensuring volumes without complete metadata are handled appropriately.executor_manager/tests/executors/docker/test_git_cache_volume_manager.py (3)
5-24: LGTM!Imports are well-organized and include all necessary functions for comprehensive testing. Testing private functions (those prefixed with
_) provides good coverage of internal implementation details.
328-360: LGTM!These tests properly verify the volume file reading logic including:
- Successful reading of both metadata files
- Fallback behavior when
.last_usedis missing (usescreated_at)- Proper handling when
.metadatais missingThe mock data type (
user_idas int) is consistent with JSON parsing behavior.
92-158: Excellent test coverage and quality!The test suite comprehensively covers:
- ✅ Success and failure paths for all volume operations
- ✅ Edge cases (volume not found, empty results, invalid formats, timeouts)
- ✅ Proper error handling with
subprocess.CalledProcessError- ✅ Mock verification with
assert_called_once()and call count checks- ✅ Docker command validation (checking for expected flags and arguments)
- ✅ Metadata operations including fallback behaviors
The tests provide strong confidence in the git cache volume manager implementation.
Also applies to: 178-216, 221-316, 362-369
executor_manager/tests/executors/docker/test_git_cache_volume_manager.py
Outdated
Show resolved
Hide resolved
| @patch("executor_manager.executors.docker.git_cache_volume_manager._read_volume_files") | ||
| @patch("executor_manager.executors.docker.git_cache_volume_manager.subprocess.run") | ||
| def test_list_user_volumes(self, mock_run, mock_read_files): | ||
| """Test listing user volumes from touch files""" | ||
| mock_run.return_value = MagicMock(returncode=0, stdout="wegent_git_cache_user_123\nwegent_git_cache_user_456\nother_volume\n") | ||
| mock_read_files.side_effect = [ | ||
| {"user_id": "123", "created_at": "2025-01-03T10:00:00", "last_used": "2025-01-03T15:30:00"}, | ||
| {"user_id": "456", "created_at": "2025-01-02T10:00:00", "last_used": "2025-01-02T15:30:00"}, | ||
| None, | ||
| ] | ||
|
|
||
| volumes = list_user_volumes() | ||
|
|
||
| assert len(volumes) == 2 | ||
| assert 123 in volumes | ||
| assert 456 in volumes | ||
| assert volumes[123]["volume_name"] == "wegent_git_cache_user_123" | ||
| assert volumes[456]["volume_name"] == "wegent_git_cache_user_456" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
find . -type f -name "*.py" | grep -E "(git_cache_volume_manager|executor)" | head -20Repository: wecode-ai/Wegent
Length of output: 899
🏁 Script executed:
fd "git_cache_volume_manager" --type fRepository: wecode-ai/Wegent
Length of output: 194
🏁 Script executed:
# Look for the implementation file (not test)
rg "def list_user_volumes" --type py -A 5Repository: wecode-ai/Wegent
Length of output: 670
🏁 Script executed:
# Find _read_metadata_from_volume implementation
rg "def _read_metadata_from_volume" --type py -A 10Repository: wecode-ai/Wegent
Length of output: 1022
🏁 Script executed:
# Check the actual return type hints
rg "def list_user_volumes|def _read_metadata_from_volume" --type py -B 2 -A 15Repository: wecode-ai/Wegent
Length of output: 3206
🏁 Script executed:
cat executor_manager/executors/docker/git_cache_volume_manager.pyRepository: wecode-ai/Wegent
Length of output: 19109
Fix mock data type for user_id to match actual implementation.
The mock at lines 165-166 returns user_id as a string ("user_id": "123"), but _read_volume_files actually returns it as an integer. Since metadata is stored in JSON with user_id as an int and parsed via json.loads(), the mock should use "user_id": 123 (not a string). This ensures the test accurately reflects the implementation's actual behavior.
🤖 Prompt for AI Agents
In executor_manager/tests/executors/docker/test_git_cache_volume_manager.py
around lines 159 to 176, the test mocks return user_id values as strings but the
real _read_volume_files returns integers; update the mock_read_files.side_effect
entries so the user_id fields are integers (e.g., user_id: 123 and user_id: 456)
instead of strings to match actual behavior and ensure the test reflects the
implementation.
| @patch("executor_manager.executors.docker.git_cache_volume_manager._write_last_used_to_volume") | ||
| @patch("executor_manager.executors.docker.git_cache_volume_manager._write_metadata_to_volume") | ||
| def test_initialize_volume_metadata_failure(self, mock_write_meta, mock_write_last): | ||
| """Test initializing metadata when write fails""" | ||
| mock_write_last.return_value = False | ||
|
|
||
| result = _initialize_volume_metadata("wegent_git_cache_user_123", 123, "2025-01-03T10:00:00") | ||
|
|
||
| assert result is False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Address unused mock parameter and add missing test case.
The mock_write_meta parameter is unused because when _write_last_used_to_volume returns False, the function short-circuits. Consider:
-
Assert the short-circuit behavior by verifying
mock_write_metais not called:assert result is False mock_write_meta.assert_not_called()
-
Add a test case where the first write succeeds but the second fails:
def test_initialize_volume_metadata_second_write_failure(self, mock_write_meta, mock_write_last): """Test initializing metadata when second write fails""" mock_write_last.return_value = True mock_write_meta.return_value = False result = _initialize_volume_metadata("wegent_git_cache_user_123", 123, "2025-01-03T10:00:00") assert result is False mock_write_last.assert_called_once() mock_write_meta.assert_called_once()
Based on static analysis: Ruff detected unused parameter at line 320.
🧰 Tools
🪛 Ruff (0.14.10)
320-320: Unused method argument: mock_write_meta
(ARG002)
🤖 Prompt for AI Agents
In executor_manager/tests/executors/docker/test_git_cache_volume_manager.py
around lines 318 to 326, the test leaves mock_write_meta unused and misses
asserting short-circuit behavior; update the existing test to assert result is
False and that mock_write_meta.assert_not_called(), and add a new test that sets
mock_write_last to return True and mock_write_meta to return False, asserts the
function returns False, and verifies both mock_write_last.assert_called_once()
and mock_write_meta.assert_called_once().
Move TestPhysicalIsolationIntegration from shared/tests/utils/ to executor_manager/tests/ to fix circular dependency issue where shared module tests required executor_manager dependencies (httpx). Changes: - Remove TestPhysicalIsolationIntegration from shared/tests/utils/test_git_cache_physical_isolation.py - Add TestIntegration class to executor_manager/tests/executors/docker/test_git_cache_volume_manager.py - Add test_cleanup_manager_configuration to executor_manager/tests/test_git_cache_cleanup.py 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (5)
executor_manager/tests/executors/docker/test_git_cache_volume_manager.py (5)
30-216: Add type hints to all test methods.Per coding guidelines, Python code MUST include type hints. All test methods in this class should include return type annotations (
-> None).🔎 Example fix
- def test_get_user_volume_name(self): + def test_get_user_volume_name(self) -> None: """Test volume name generation""" assert get_user_volume_name(123) == "wegent_git_cache_user_123"Apply this pattern to all test methods in the class.
Based on coding guidelines: "Python code MUST include type hints"
165-166: Fix mock data type foruser_idto match implementation.The mock returns
user_idas strings ("123","456"), but_read_volume_filesreturnsuser_idas integers (parsed from JSON). Update the mock to use integers for accurate testing.🔎 Proposed fix
mock_read_files.side_effect = [ - {"user_id": "123", "created_at": "2025-01-03T10:00:00", "last_used": "2025-01-03T15:30:00"}, - {"user_id": "456", "created_at": "2025-01-02T10:00:00", "last_used": "2025-01-02T15:30:00"}, + {"user_id": 123, "created_at": "2025-01-03T10:00:00", "last_used": "2025-01-03T15:30:00"}, + {"user_id": 456, "created_at": "2025-01-02T10:00:00", "last_used": "2025-01-02T15:30:00"}, None, ]
222-250: Add type hints to integration test methods.Per coding guidelines, Python code MUST include type hints. Both integration test methods should include return type annotations (
-> None).🔎 Example fix
- def test_volume_creation_and_deletion(self): + def test_volume_creation_and_deletion(self) -> None: """Test actual volume creation and deletion"""Apply to both test methods in this class.
Based on coding guidelines: "Python code MUST include type hints"
274-421: Add type hints to all test methods in this class.Per coding guidelines, Python code MUST include type hints. All test methods should include return type annotations (
-> None).🔎 Example fix
- def test_read_last_used_from_volume_success(self, mock_run): + def test_read_last_used_from_volume_success(self, mock_run) -> None: """Test reading .last_used file from volume"""Apply to all test methods in this class.
Based on coding guidelines: "Python code MUST include type hints"
372-378: Address unused mock parameter and add missing test case.The
mock_write_metaparameter is unused because when_write_last_used_to_volumereturnsFalse, the function short-circuits before calling_write_metadata_to_volume.Consider:
- Assert the short-circuit behavior by verifying
mock_write_metais not called- Add a test case where the first write succeeds but the second fails
🔎 Proposed improvements
Update existing test to assert short-circuit:
def test_initialize_volume_metadata_failure(self, mock_write_meta, mock_write_last): """Test initializing metadata when write fails""" mock_write_last.return_value = False result = _initialize_volume_metadata("wegent_git_cache_user_123", 123, "2025-01-03T10:00:00") assert result is False + mock_write_meta.assert_not_called()Add new test for second write failure:
@patch("executor_manager.executors.docker.git_cache_volume_manager._write_last_used_to_volume") @patch("executor_manager.executors.docker.git_cache_volume_manager._write_metadata_to_volume") def test_initialize_volume_metadata_second_write_failure(self, mock_write_meta, mock_write_last) -> None: """Test initializing metadata when second write fails""" mock_write_last.return_value = True mock_write_meta.return_value = False result = _initialize_volume_metadata("wegent_git_cache_user_123", 123, "2025-01-03T10:00:00") assert result is False mock_write_last.assert_called_once() mock_write_meta.assert_called_once()Based on static analysis: Ruff detected unused parameter at line 372.
🧹 Nitpick comments (3)
shared/tests/utils/test_git_cache_physical_isolation.py (1)
70-70: Use raw strings for regex patterns in match parameters.The
matchparameter inpytest.raisesexpects a regex pattern. Lines 70 and 105 contain metacharacters that should be in raw strings to make the regex intent explicit and avoid potential escaping issues.🔎 Proposed fix
- with pytest.raises(ValueError, match="Security violation|outside allowed"): + with pytest.raises(ValueError, match=r"Security violation|outside allowed"): get_cache_repo_path(url)- with pytest.raises(ValueError, match="GIT_CACHE_USER_BASE_DIR.*required"): + with pytest.raises(ValueError, match=r"GIT_CACHE_USER_BASE_DIR.*required"): get_user_cache_base_dir()Also applies to: 105-105
executor_manager/tests/test_git_cache_cleanup.py (2)
219-238: Remove unusedmock_sizeparameter.The
mock_sizeparameter is mocked but never used in this test, as volumes with invalid dates are skipped before size calculations. The test logic is correct, but the unused parameter can be removed for clarity.🔎 Proposed fix
@patch("executor_manager.executors.docker.git_cache_volume_manager.list_user_volumes") @patch("executor_manager.executors.docker.git_cache_volume_manager.delete_volume") - @patch("executor_manager.executors.docker.git_cache_volume_manager.get_volume_size") - def test_cleanup_handles_invalid_dates(self, mock_size, mock_delete, mock_list): + def test_cleanup_handles_invalid_dates(self, mock_delete, mock_list): """Test cleanup handles volumes with invalid date formats"""
239-261: Prefer@patch.dictfor environment variable management.While the manual environment variable setup and cleanup works (after importing
os), using@patch.dictwould be more consistent with other tests in this file and would automatically handle cleanup even if assertions fail.🔎 Proposed refactor
+ @patch.dict("os.environ", { + "GIT_CACHE_CLEANUP_ENABLED": "true", + "GIT_CACHE_INACTIVE_DAYS": "60", + "GIT_CACHE_PROTECTED_USERS": "10,20,30", + "GIT_CACHE_CLEANUP_DRY_RUN": "true" + }) def test_cleanup_manager_configuration(self): """Test cleanup manager reads configuration correctly including dry_run""" - # Test with custom configuration - os.environ["GIT_CACHE_CLEANUP_ENABLED"] = "true" - os.environ["GIT_CACHE_INACTIVE_DAYS"] = "60" - os.environ["GIT_CACHE_PROTECTED_USERS"] = "10,20,30" - os.environ["GIT_CACHE_CLEANUP_DRY_RUN"] = "true" - manager = GitCacheCleanupManager() assert manager.enabled is True assert manager.inactive_days == 60 assert manager.protected_users == {10, 20, 30} assert manager.dry_run is True - - # Cleanup - for key in [ - "GIT_CACHE_CLEANUP_ENABLED", - "GIT_CACHE_INACTIVE_DAYS", - "GIT_CACHE_PROTECTED_USERS", - "GIT_CACHE_CLEANUP_DRY_RUN", - ]: - os.environ.pop(key, None)
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
executor_manager/tests/executors/docker/test_git_cache_volume_manager.pyexecutor_manager/tests/test_git_cache_cleanup.pyshared/tests/utils/test_git_cache_physical_isolation.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{py,ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
**/*.{py,ts,tsx,js,jsx}: All code comments MUST be written in English
File size MUST NOT exceed 1000 lines - split into multiple sub-modules if exceeded
Function length SHOULD NOT exceed 50 lines (preferred)
Files:
executor_manager/tests/test_git_cache_cleanup.pyexecutor_manager/tests/executors/docker/test_git_cache_volume_manager.pyshared/tests/utils/test_git_cache_physical_isolation.py
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Python code MUST follow PEP 8, use Black formatter with line length 88, and isort for imports
Python code MUST include type hints
Python functions and classes MUST have descriptive names and docstrings for public functions/classes
Python MUST extract magic numbers to named constants
Files:
executor_manager/tests/test_git_cache_cleanup.pyexecutor_manager/tests/executors/docker/test_git_cache_volume_manager.pyshared/tests/utils/test_git_cache_physical_isolation.py
🧠 Learnings (2)
📚 Learning: 2025-12-31T03:47:12.173Z
Learnt from: CR
Repo: wecode-ai/Wegent PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-31T03:47:12.173Z
Learning: Applies to **/*.py : Python code MUST include type hints
Applied to files:
executor_manager/tests/executors/docker/test_git_cache_volume_manager.py
📚 Learning: 2025-12-18T02:09:09.776Z
Learnt from: CR
Repo: wecode-ai/Wegent PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-18T02:09:09.776Z
Learning: Applies to **/*.test.{py,ts,tsx}|**/*_test.py : Keep tests independent and isolated
Applied to files:
shared/tests/utils/test_git_cache_physical_isolation.py
🧬 Code graph analysis (2)
executor_manager/tests/test_git_cache_cleanup.py (1)
executor_manager/git_cache_cleanup.py (3)
GitCacheCleanupManager(24-225)cleanup_inactive_volumes(69-176)get_volume_stats(178-225)
shared/tests/utils/test_git_cache_physical_isolation.py (1)
shared/utils/git_cache.py (3)
get_cache_repo_path(140-210)get_user_cache_base_dir(40-63)get_cache_user_id(76-106)
🪛 Ruff (0.14.10)
executor_manager/tests/test_git_cache_cleanup.py
219-219: Unused method argument: mock_size
(ARG002)
242-242: Undefined name os
(F821)
243-243: Undefined name os
(F821)
244-244: Undefined name os
(F821)
245-245: Undefined name os
(F821)
261-261: Undefined name os
(F821)
executor_manager/tests/executors/docker/test_git_cache_volume_manager.py
221-221: Undefined name os
(F821)
249-249: Undefined name os
(F821)
372-372: Unused method argument: mock_write_meta
(ARG002)
shared/tests/utils/test_git_cache_physical_isolation.py
70-70: Pattern passed to match= contains metacharacters but is neither escaped nor raw
(RUF043)
105-105: Pattern passed to match= contains metacharacters but is neither escaped nor raw
(RUF043)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: E2E Tests (Shard 2/3)
- GitHub Check: E2E Tests (Shard 1/3)
- GitHub Check: E2E Tests (Shard 3/3)
- GitHub Check: Test wegent CLI Integration
- GitHub Check: Test Frontend
🔇 Additional comments (10)
shared/tests/utils/test_git_cache_physical_isolation.py (1)
18-157: Security tests are well-implemented.The security test suite comprehensively covers the physical isolation design:
- Path generation without user subdirectories (volume-based isolation)
- Multiple URL formats (SSH, HTTPS, GitHub, GitLab, Gerrit)
- Security validations (path traversal, user ID requirements)
- Edge cases (missing extensions, complex paths, cross-user consistency)
Test isolation is properly maintained with setup/teardown methods, adhering to best practices.
Note: The AI summary mentions integration tests (volume name generation, docker tests, cleanup manager configuration) that are not present in this file. According to the PR objectives, integration tests were moved to
executor_manager/tests/to avoid circular dependencies—this file now contains only the security tests.executor_manager/tests/test_git_cache_cleanup.py (9)
14-22: LGTM!The initialization test correctly verifies that the cleanup manager reads and parses environment configuration including enabled state, inactive days threshold, and protected user set.
23-33: LGTM!The test correctly verifies that cleanup is a no-op when disabled, returning empty results.
35-63: LGTM!The test correctly verifies that only volumes inactive beyond the threshold (40 days > 30 days) are deleted, while recently used volumes (10 days) are preserved.
64-95: LGTM!The test correctly verifies that protected users' volumes are excluded from deletion and tracked in
protected_volumes, even when they meet the inactivity criteria.
96-120: LGTM!The test correctly verifies dry-run mode behavior: volumes are reported as deleted with freed space calculated, but
delete_volumeis never actually called.
121-145: LGTM!The test correctly verifies graceful error handling: when deletion fails, the volume is excluded from
deleted_volumesand the error message is captured in theerrorslist.
146-174: LGTM!The test correctly verifies volume statistics calculation including total count, total size (in both bytes and MB), and inactive volume count based on the inactivity threshold.
175-184: LGTM!The test correctly verifies that when the protected users list contains an invalid (non-integer) value, the entire list is rejected and an empty set is returned, consistent with the documented behavior in
_load_protected_users.
185-215: LGTM!The test correctly verifies the fallback behavior: when
last_usedis missing, the cleanup manager usescreated_atfor inactivity determination. Volumes with both fields empty are appropriately skipped.
executor_manager/tests/executors/docker/test_git_cache_volume_manager.py
Show resolved
Hide resolved
Add -> None return type annotations to all test methods in test_git_cache_volume_manager.py to comply with coding guidelines requiring type hints in Python code. Also add missing os import needed by TestIntegration class. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Add -> None return type annotations to all test methods in test_git_cache_cleanup.py to comply with coding guidelines requiring type hints in Python code. Also add missing os import needed for os.environ access. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (5)
executor_manager/tests/test_git_cache_cleanup.py (5)
97-121: Consider clarifying dry-run semantics.The test correctly verifies that
deleteis not called in dry-run mode (line 120), butdeleted_volumesstill contains entries (line 118). This implies "deleted_volumes" means "would be deleted" rather than "were deleted" in dry-run mode. While the behavior appears intentional, the field name could be misleading to consumers of the cleanup result.
176-185: Consider more graceful handling of invalid protected users.The test documents that any invalid value in
GIT_CACHE_PROTECTED_USERScauses the entire list to be rejected. A more resilient approach would skip only the invalid entries (e.g., parse "1,2,invalid,4" as {1,2,4}) and log a warning, rather than discarding all valid user IDs. This would prevent a single typo from removing protection for all users.
186-216: Add explicit assertion for volume with all dates missing.The test verifies that user 123 (with
created_atbut emptylast_used) is deleted correctly. However, user 456 (with both dates empty) is only implicitly verified through the count assertion. Consider adding an explicit assertion to document the expected behavior:assert "wegent_git_cache_user_456" not in result["deleted_volumes"]This clarifies that volumes with no valid date information are skipped rather than deleted.
217-239: Remove unused mock parameter.The
mock_sizeparameter is not used in this test. Consider removing it from both the@patchdecorator (line 219) and the method signature to reduce noise.🔎 Proposed fix
@patch("executor_manager.executors.docker.git_cache_volume_manager.list_user_volumes") @patch("executor_manager.executors.docker.git_cache_volume_manager.delete_volume") -@patch("executor_manager.executors.docker.git_cache_volume_manager.get_volume_size") -def test_cleanup_handles_invalid_dates(self, mock_size, mock_delete, mock_list) -> None: +def test_cleanup_handles_invalid_dates(self, mock_delete, mock_list) -> None: """Test cleanup handles volumes with invalid date formats"""
240-262: Use@patch.dictfor safer environment variable handling.The test directly manipulates
os.environand relies on manual cleanup (lines 256-262). If an assertion fails before reaching the cleanup block, the environment variables will remain set and may affect subsequent tests. Other tests in this file use@patch.dictwhich automatically restores environment state.🔎 Proposed fix
+@patch.dict("os.environ", { + "GIT_CACHE_CLEANUP_ENABLED": "true", + "GIT_CACHE_INACTIVE_DAYS": "60", + "GIT_CACHE_PROTECTED_USERS": "10,20,30", + "GIT_CACHE_CLEANUP_DRY_RUN": "true" +}) def test_cleanup_manager_configuration(self) -> None: """Test cleanup manager reads configuration correctly including dry_run""" - # Test with custom configuration - os.environ["GIT_CACHE_CLEANUP_ENABLED"] = "true" - os.environ["GIT_CACHE_INACTIVE_DAYS"] = "60" - os.environ["GIT_CACHE_PROTECTED_USERS"] = "10,20,30" - os.environ["GIT_CACHE_CLEANUP_DRY_RUN"] = "true" - manager = GitCacheCleanupManager() assert manager.enabled is True assert manager.inactive_days == 60 assert manager.protected_users == {10, 20, 30} assert manager.dry_run is True - - # Cleanup - for key in [ - "GIT_CACHE_CLEANUP_ENABLED", - "GIT_CACHE_INACTIVE_DAYS", - "GIT_CACHE_PROTECTED_USERS", - "GIT_CACHE_CLEANUP_DRY_RUN", - ]: - os.environ.pop(key, None)
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
executor_manager/tests/test_git_cache_cleanup.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{py,ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
**/*.{py,ts,tsx,js,jsx}: All code comments MUST be written in English
File size MUST NOT exceed 1000 lines - split into multiple sub-modules if exceeded
Function length SHOULD NOT exceed 50 lines (preferred)
Files:
executor_manager/tests/test_git_cache_cleanup.py
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Python code MUST follow PEP 8, use Black formatter with line length 88, and isort for imports
Python code MUST include type hints
Python functions and classes MUST have descriptive names and docstrings for public functions/classes
Python MUST extract magic numbers to named constants
Files:
executor_manager/tests/test_git_cache_cleanup.py
🪛 Ruff (0.14.10)
executor_manager/tests/test_git_cache_cleanup.py
220-220: Unused method argument: mock_size
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: E2E Tests (Shard 3/3)
- GitHub Check: E2E Tests (Shard 2/3)
- GitHub Check: E2E Tests (Shard 1/3)
- GitHub Check: Test Frontend
🔇 Additional comments (2)
executor_manager/tests/test_git_cache_cleanup.py (2)
5-5: Import issue resolved.The
osimport has been added, addressing the previous review comment. The module is now properly imported for use intest_cleanup_manager_configuration.
24-35: Well-structured test for disabled state.The test correctly verifies that when cleanup is disabled, all result fields return empty/zero values without attempting any operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
executor_manager/executors/docker/executor.py (2)
640-645: Use exception chaining for better error traceability.When re-raising an exception within an
exceptclause, useraise ... from errto preserve the original exception context. This provides better debugging information when errors occur.🔎 Proposed fix
# Validate user_id try: user_id_int = int(user_id) if user_id_int <= 0: raise ValueError(f"Invalid user_id: {user_id}. Must be a positive integer.") except (ValueError, TypeError) as e: - raise ValueError(f"Invalid user_id for git cache: {e}") + raise ValueError(f"Invalid user_id for git cache: {e}") from e
676-676: Remove unnecessary f-string prefix.Line 676 uses an f-string without any placeholders. Remove the
fprefix for clarity.🔎 Proposed fix
cmd.extend([ - "-e", f"GIT_CACHE_ENABLED=true", + "-e", "GIT_CACHE_ENABLED=true", "-e", f"GIT_CACHE_USER_BASE_DIR={user_cache_dir}", "-e", f"GIT_CACHE_AUTO_UPDATE={git_cache_auto_update}", "-e", f"GIT_CACHE_USER_ID={user_id_int}" # Pass real user ID for cache isolation ])
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
executor_manager/executors/docker/executor.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{py,ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
**/*.{py,ts,tsx,js,jsx}: All code comments MUST be written in English
File size MUST NOT exceed 1000 lines - split into multiple sub-modules if exceeded
Function length SHOULD NOT exceed 50 lines (preferred)
Files:
executor_manager/executors/docker/executor.py
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Python code MUST follow PEP 8, use Black formatter with line length 88, and isort for imports
Python code MUST include type hints
Python functions and classes MUST have descriptive names and docstrings for public functions/classes
Python MUST extract magic numbers to named constants
Files:
executor_manager/executors/docker/executor.py
🧬 Code graph analysis (1)
executor_manager/executors/docker/executor.py (1)
executor_manager/executors/docker/git_cache_volume_manager.py (3)
create_user_volume(76-144)get_user_volume_name(40-50)update_volume_last_used(177-209)
🪛 Ruff (0.14.10)
executor_manager/executors/docker/executor.py
606-606: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
634-637: Avoid specifying long messages outside the exception class
(TRY003)
643-643: Abstract raise to an inner function
(TRY301)
643-643: Avoid specifying long messages outside the exception class
(TRY003)
645-645: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
645-645: Avoid specifying long messages outside the exception class
(TRY003)
655-658: Avoid specifying long messages outside the exception class
(TRY003)
676-676: f-string without any placeholders
Remove extraneous f prefix
(F541)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: E2E Tests (Shard 2/3)
- GitHub Check: E2E Tests (Shard 3/3)
- GitHub Check: E2E Tests (Shard 1/3)
- GitHub Check: Test Frontend
- GitHub Check: Test wegent CLI Integration
🔇 Additional comments (4)
executor_manager/executors/docker/executor.py (4)
34-35: LGTM! Clean import of git cache constants.The imports are correctly structured and the constants are properly used throughout the git cache mounting logic.
150-150: LGTM! Proper user_id extraction and propagation.The user_id is correctly extracted from the task's user configuration and included in the task_info dictionary for downstream use in git cache isolation.
Also applies to: 157-157
500-501: LGTM! Correct validation task handling and user_id propagation.The code properly extracts user_id and conditionally mounts the git cache, correctly skipping it for validation tasks as intended by the PR objectives.
Also applies to: 564-570
606-690: Excellent implementation of per-user git cache isolation.The
_add_git_cache_mountmethod correctly implements physical user isolation using per-user Docker volumes. Key strengths:
- Strong validation: Ensures
user_idis present and valid before enabling cache- Fail-fast approach: Raises errors immediately if volume creation fails, preventing silent failures
- Physical isolation: Each user gets a dedicated Docker volume (
wegent_git_cache_user_{id}), ensuring complete data separation- Complete environment setup: All required environment variables are properly propagated to the container
The implementation correctly addresses the isolation requirements and integrates well with the validation task skip logic.
| if executor_workspace: | ||
| cmd.extend(["-v", f"{executor_workspace}:{WORKSPACE_MOUNT_PATH}"]) | ||
|
|
||
| def _add_git_cache_mount(self, cmd: List[str], user_id: int = None) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix type hint to use Optional[int] for PEP 484 compliance.
The function signature uses user_id: int = None, which violates PEP 484. Use Optional[int] to properly indicate that user_id can be None.
🔎 Proposed fix
- def _add_git_cache_mount(self, cmd: List[str], user_id: int = None) -> None:
+ def _add_git_cache_mount(self, cmd: List[str], user_id: Optional[int] = None) -> None:🧰 Tools
🪛 Ruff (0.14.10)
606-606: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
🤖 Prompt for AI Agents
In executor_manager/executors/docker/executor.py around line 606, the method
signature def _add_git_cache_mount(self, cmd: List[str], user_id: int = None) ->
None uses user_id: int = None which violates PEP 484; change the annotation to
Optional[int] and import Optional from typing if not already imported, i.e.,
update the parameter to user_id: Optional[int] = None and ensure typing import
is present.
This pull request introduces user-isolated git cache support for Docker executors, providing improved security and cache management by giving each user a dedicated Docker volume for their git cache. The changes include configuration options, Docker command modifications, and a new volume management module.
User-isolated git cache implementation:
git_cache_volume_manager.pythat manages Docker volumes for each user, including creation, deletion, metadata management, and usage tracking. Each user gets a volume namedwegent_git_cache_user_{id}for physical cache isolation.Configuration and constants:
GIT_CACHE_ENABLED,GIT_CACHE_AUTO_UPDATE) toconfig.py, and documented related environment variables indocker-compose.yml. [1] [2]constants.py, and ensured these are imported where needed. [1] [2]These changes enable secure, user-specific git cache storage for Docker-based task execution, with clear configuration and robust volume lifecycle management.This pull request introduces support for a user-isolated Git cache for the executor manager service, improving both performance and security when cloning repositories. It adds configuration options, Docker volume setup, secure mounting logic, and comprehensive tests for the cache feature and its edge cases.
Git cache feature and security improvements:
GIT_CACHE_ENABLED,GIT_CACHE_AUTO_UPDATE) to enable and configure the Git cache inexecutor_manager/config/config.pyand documented their usage indocker-compose.yml. [1] [2]executor_manager/executors/docker/constants.pyfor consistent usage and Docker volume mapping._add_git_cache_mountwithinexecutor_manager/executors/docker/executor.py, ensuring each container only accesses its own cache directory and validating user isolation.git_cache_datavolume for persistent cache storage. [1] [2]Task and command preparation changes:
user_idfrom task info and propagated it through Docker command preparation to support secure cache isolation. [1] [2] [3]Testing and edge case coverage:
shared/tests/utils/test_git_util.pyfor all cache scenarios, including cache enabled/disabled, fallback behavior, user isolation, branch handling, and error cases. [1] [2]Summary by CodeRabbit
New Features
Documentation
Tests
✏️ Tip: You can customize this high-level summary in your review settings.