feat(scm-multi-platform-detection): Extending functionality with capped reads #117745
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 55baf7e.
    if "match_content" in rule:
        pattern = rule["match_content"]
        path_filter = rule.get("path")
        ext_filter = rule.get("match_ext")
        result: set[str] = set()
        for full_path, content in content_by_path.items():
            basename = full_path.rsplit("/", 1)[-1]
            if path_filter and basename != path_filter:
                continue
            if ext_filter and not basename.endswith(ext_filter):
                continue
Bug: The co-location check fails when an `every` rule matches a directory at the repository root and a `some` rule matches content in a file nested inside that directory, causing framework detection to fail.
Severity: MEDIUM
Suggested Fix
The co-location logic should be adjusted to correctly handle cases where one scope is the root ("") and the other is a subdirectory. The check should succeed if the file's parent directory is within the directory matched by the every rule. This could involve treating the root scope as a valid parent for any subdirectory scope.
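One way to read the suggested fix is as an ancestor test between scopes rather than a set-membership test. The sketch below is illustrative only: `is_ancestor_scope` and `scopes_colocated` are hypothetical names that do not appear in the PR, and it assumes scopes are POSIX-style directory strings with `""` standing for the repository root.

```python
def is_ancestor_scope(ancestor: str, descendant: str) -> bool:
    """Return True if `ancestor` equals `descendant` or contains it.

    Assumption: "" denotes the repository root, which contains every scope.
    """
    if ancestor == "":
        return True
    # Compare on whole path segments so "services/api" does not
    # accidentally match "services/api-v2".
    return descendant == ancestor or descendant.startswith(ancestor + "/")


def scopes_colocated(every_scopes: set[str], some_scopes: set[str]) -> bool:
    """Return True if any `some` scope lies within any `every` scope."""
    return any(
        is_ancestor_scope(every_scope, some_scope)
        for every_scope in every_scopes
        for some_scope in some_scopes
    )


if __name__ == "__main__":
    # The case from the report: `every` scope is the root, `some` scope is "app".
    print(scopes_colocated({""}, {"app"}))
    # Sibling directories with a shared prefix are still rejected.
    print(scopes_colocated({"services/api"}, {"services/api-v2"}))
```

Treating the root as a parent of every scope is the loosest reading of the suggestion; in a monorepo it may admit matches that a stricter rule (for example, requiring the `some` file to sit inside the directory the `every` rule actually matched) would reject.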
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: src/sentry/integrations/github/multi_platform_detection.py#L360-L370
Potential issue: The framework detection logic incorrectly handles co-location checks
between directory-based (`every`) and content-based (`some`) rules. When an `every` rule
matches a directory at the repository root (e.g., `app/`), its scope is determined as
the root (`""`). If a `some` rule then matches content within a file inside that
directory (e.g., `app/build.gradle`), its scope is the subdirectory (`"app"`). The
subsequent check fails because the root scope `""` is not found within the
subdirectory's scope `{"app"}`. This leads to false negatives in framework detection,
particularly for Android projects where the `android` marker might only exist in a
nested `build.gradle` file.
Adds unit and integration tests for the previously untested parts of `multi_platform_detection.py`, and cleans up minor quality issues in the existing test suite.

New test coverage:

- `TestSelectActivePlatforms`: `MAX_LANGUAGES` cap, language grouping (TypeScript + JavaScript → one javascript slot), `IGNORED_LANGUAGES` skipped, byte-count descending ordering
- `TestPathIsIgnored`: segment-vs-substring correctness (e.g. `build.gradle` is not ignored even though `build/` is an ignored directory)
- `TestGetTree`: defensive parsing of non-dict responses, missing `tree` key, `truncated` flag propagation
- `TestDetectPlatformsMulti` (extended): existence-only Pass 1 high match with zero content reads, co-location false-positive prevention end-to-end, high confidence sorts before medium regardless of byte count

Test quality cleanup:
5ab1180 into abdk/multi-platform-detection-v8

Unified `_rule_parent_dirs` and `_framework_matches_scoped` to handle all rule types (existence, `match_content`, and `match_package`) in a single pass each. Both now accept `content_by_path` and `manifests_by_path`; passing empty dicts for these in the existence pass makes content/package rules silently return empty sets with no special-casing needed.

Added content reads to `detect_platforms_multi` as a second high-confidence pass. After existence matching, `_collect_needed_paths` identifies candidate files. The candidates are sorted shallowest-first so root manifests are always within the cap, and up to `MAX_CONTENT_READS` = 5 of them are fetched using the existing `_get_repo_file_content` and `_parse_package_manifest` helpers. Supersession runs after both passes so content-detected frameworks can supersede existence matches.

Added `TestCollectNeededPaths` and `TestDetectPlatformsMulti` covering the major branches: content-only detection, package-only detection, the shallow-first cap, content-driven supersession, and the zero-reads case.
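The shallow-first cap described above can be sketched roughly as follows. This is not the PR's implementation: `select_paths_to_read` is a hypothetical helper, depth is assumed to be the number of `/` separators in the path, and the alphabetical tie-break is an assumption added only to keep the result deterministic.

```python
MAX_CONTENT_READS = 5  # value stated in the PR description


def select_paths_to_read(
    candidates: set[str], cap: int = MAX_CONTENT_READS
) -> list[str]:
    """Pick at most `cap` candidate paths, preferring the shallowest ones."""
    # Sort by depth first, then by path (assumed tie-break) so root-level
    # manifests always land inside the cap.
    ordered = sorted(candidates, key=lambda path: (path.count("/"), path))
    return ordered[:cap]


if __name__ == "__main__":
    candidates = {
        "libs/x/y/z/pom.xml",
        "apps/web/package.json",
        "package.json",
        "a/b/c/build.gradle",
        "app/build.gradle",
        "services/api/requirements.txt",
        "requirements.txt",
    }
    for path in select_paths_to_read(candidates):
        print(path)
```

With seven candidates and a cap of five, the two deepest paths are the ones dropped, which matches the stated intent of never spending the read budget on nested files at the expense of root manifests.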