Add Greptile rule to flag missing databricks CI tag on test changes#15076
Conversation
Greptile SummaryThis PR adds a Greptile advisory rule (
Confidence Score: 5/5Config/documentation-only change with no user-facing or runtime impact; safe to merge. All three files are advisory configuration and documentation. The new rule is correctly scoped to No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[PR submitted] --> B{Diff touches db shim\nor databricks path?}
B -- Yes --> C[Databricks CI auto-triggers]
B -- No --> D{PR title has databricks tag?}
D -- Yes --> C
D -- No --> E{Diff touches integration_tests?}
E -- No --> F[Standard Linux pre-merge only]
E -- Yes --> G{DBR-divergent test?\nfilesystem or plan-string semantics?}
G -- No --> F
G -- Yes --> H[H9 advisory: recommend adding databricks tag]
H --> I[Contributor adds tag or confirms DB coverage]
I --> C
C --> J[Databricks pre-merge: DskipTests + Python integration tests]
Reviews (2): Last reviewed commit: "Add Greptile rule to flag missing [datab..." | Re-trigger Greptile |
…test changes Databricks pre-merge CI is conditional: per jenkins/Jenkinsfile-blossom.premerge it runs only when the PR title contains [databricks] or the diff touches a Databricks-shim path (sql-plugin/src/main/...db/ or a path containing "databricks"). The standard Linux pre-merge never runs Databricks. This leaves a gap. A change can be correct on vanilla Spark yet behave differently on the Databricks Spark fork without touching any auto-trigger path -- e.g. integration tests that rely on filesystem/path semantics (local vs DBFS/abfss, file:// scheme, os.walk/os.path) or that assert on optimizer plan strings (alias names and plan rendering differ on DBR). Such a test merges green because the only job that would have exercised it on Databricks was never triggered, then surfaces as a failure later on an unrelated PR that does carry [databricks] -- making an innocent PR look broken and costing triage time. To close the gap on the review side: - .greptile/config.json: add the "databricks-ci-tag" rule (scoped to integration_tests/**, severity medium) so Greptile recommends adding [databricks] when an integration-test change looks Databricks-divergent and the PR title lacks the tag. It explicitly does not flag changes already under a *db* shim path (auto-covered) or doc-only changes, to avoid noise. - .greptile/rules.md: split the vague [databricks] mention out of H7 into a focused H9 "Databricks coverage" checklist item. - AGENTS.md: document when [databricks] is needed (not just how), so both humans and Greptile (whose instructions reference AGENTS.md) share one source of truth. Scope is integration_tests/** only -- not the Scala unit-test dirs. The Databricks pre-merge builds with -DskipTests and runs only the Python integration tests (run_pyspark_from_build.sh); Scala unit tests never execute on Databricks, so the [databricks] tag cannot validate them and recommending it there would be misleading. Verified against the NVIDIA#15064 Databricks CI_PART1 console log: all shims built with -DskipTests, scalatest goal skipped (73x "Tests are skipped", zero ScalaTest/Surefire run summaries), followed only by run_pyspark_from_build.sh. The rule is advisory -- it nudges; it does not gate merges. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>
5ad0b07 to
ebe689d
Compare
|
build |
H9 in .greptile/rules.md listed only the *db* shim path as a Databricks-CI auto-trigger, but config.json and AGENTS.md also document paths containing `databricks`. Add that condition so the three files agree. Addresses review feedback from @res-life. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>
|
build |
|
build |
Description
Databricks pre-merge CI is conditional — it runs only when the PR title has
[databricks]or the diff touches a*db*shim path (perjenkins/Jenkinsfile-blossom.premerge). So a test change that's correct on vanilla Spark but diverges on the Databricks fork (filesystem/path semantics, or optimizer plan-string assertions) can merge green, then resurface as a failure on an unrelated PR that later carries[databricks].This adds a Greptile nudge to catch those cases at review time:
.greptile/config.json: newdatabricks-ci-tagrule, scoped tointegration_tests/**andtests/**(severitymedium), recommending[databricks]when a test change looks DBR-divergent and the title lacks it. Skips*db*-shim and doc-only changes to avoid noise..greptile/rules.md: focused H9 "Databricks coverage" checklist item.AGENTS.md: document when[databricks]is needed, not just how.Advisory only — it does not gate merges. No user-facing change.
Checklists
Documentation
Testing
(Please provide the names of the existing tests in the PR description.)
Performance