Add Greptile rule to flag missing databricks CI tag on test changes by amahussein · Pull Request #15076 · NVIDIA/cudf-spark

amahussein · 2026-06-12T21:05:02Z

Description

Databricks pre-merge CI is conditional — it runs only when the PR title has [databricks] or the diff touches a *db* shim path (per jenkins/Jenkinsfile-blossom.premerge). So a test change that's correct on vanilla Spark but diverges on the Databricks fork (filesystem/path semantics, or optimizer plan-string assertions) can merge green, then resurface as a failure on an unrelated PR that later carries [databricks].

This adds a Greptile nudge to catch those cases at review time:

.greptile/config.json: new databricks-ci-tag rule, scoped to integration_tests/** (severity medium), recommending [databricks] when a test change looks DBR-divergent and the title lacks it. Skips *db*-shim and doc-only changes to avoid noise.
.greptile/rules.md: focused H9 "Databricks coverage" checklist item.
AGENTS.md: document when [databricks] is needed, not just how.

Advisory only — it does not gate merges. No user-facing change.
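The trigger condition described at the top of this description can be sketched as a small predicate. This is only an illustration: the real logic lives in jenkins/Jenkinsfile-blossom.premerge and may match paths differently. The function name `databricks_ci_runs`, the regex, and the sample paths are assumptions made for the example.

```python
import re

# Hypothetical approximation of a "*db* shim path": any directory component
# whose name ends in "db". The actual Jenkinsfile pattern may differ.
_DB_SHIM_DIR = re.compile(r"(^|/)[^/]*db/")
_TITLE_TAG = "[databricks]"


def databricks_ci_runs(pr_title: str, changed_paths: list[str]) -> bool:
    """Return True if Databricks pre-merge would be triggered (sketch only)."""
    # Condition 1: the PR title carries the tag.
    if _TITLE_TAG in pr_title:
        return True
    # Condition 2: the diff touches a db-shim path or a path containing "databricks".
    return any(
        "databricks" in path.lower() or _DB_SHIM_DIR.search(path) is not None
        for path in changed_paths
    )
```

Under these assumptions, a change that only touches a file such as `integration_tests/src/main/python/parquet_test.py` with an untagged title returns `False`, which is the gap the new rule is meant to flag.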

Checklists

Documentation

Updated for new or modified user-facing features or behaviors
No user-facing change

Testing

Added or modified tests to cover new code paths
Covered by existing tests
(Please provide the names of the existing tests in the PR description.)
Not required

Performance

Tests ran and results are added in the PR description
Issue filed with a link in the PR description
Not required

greptile-apps · 2026-06-12T21:07:38Z

Greptile Summary

This PR adds a Greptile advisory rule (databricks-ci-tag) to nudge reviewers when an integration test change could behave differently on Databricks but the PR title lacks [databricks] — closing a gap where DBR-only failures slip through standard Linux pre-merge CI.

.greptile/config.json: New rule scoped to integration_tests/** covering filesystem/path semantics and optimizer/plan-string divergence categories; advisory only, does not gate merges.
.greptile/rules.md: Old H7 trimmed (DB mention moved out) and new H9 added with the Databricks coverage checklist item.
AGENTS.md: PR title tags bullet expanded with precise auto-trigger conditions and concrete examples (DBFS paths, abfss, os.walk, plan rendering) so contributors know when to add [databricks] manually.
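To make the two divergence categories concrete, here is a rough sketch of how added diff lines could be scanned for the kinds of markers named above. It is not how Greptile evaluates the rule (the rule is natural-language guidance); the `MARKERS` table, the `divergence_hints` helper, and the choice of `explain(` as a plan-string marker are assumptions for illustration.

```python
import re

# Hypothetical marker patterns, loosely based on the examples in this PR
# (DBFS paths, abfss, file:// scheme, os.walk/os.path, plan rendering).
MARKERS = {
    "filesystem/path": re.compile(r"os\.walk|os\.path|file://|dbfs:/|abfss://"),
    "plan-string": re.compile(r"explain\("),
}


def divergence_hints(diff_text: str) -> dict[str, list[str]]:
    """Map each category to the added diff lines that contain one of its markers."""
    hints: dict[str, list[str]] = {}
    for line in diff_text.splitlines():
        # Only added lines matter; skip the "+++ b/<file>" header line.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        added = line[1:]
        for category, pattern in MARKERS.items():
            if pattern.search(added):
                hints.setdefault(category, []).append(added.strip())
    return hints
```

A non-empty result would only be a hint that the change deserves a look on Databricks; a reviewer still has to judge whether the test really depends on those semantics.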

Confidence Score: 5/5

Config/documentation-only change with no user-facing or runtime impact; safe to merge.

All three files are advisory configuration and documentation. The new rule is correctly scoped to integration_tests/**, its inline rationale (Databricks CI only runs Python integration tests, never Scala unit tests) is accurate and consistent across all three changed files, and the change cannot break any existing functionality.

No files require special attention.

Important Files Changed

Filename	Overview
.greptile/config.json	Adds `databricks-ci-tag` rule (id, rule text, scope `integration_tests/**`, severity medium) to the Greptile rule set; advisory only, no enforcement.
.greptile/rules.md	Splits the old H7 (which bundled databricks mention) into a focused H7 (non-DB CI gaps) and a new H9 (Databricks coverage); text is consistent with config.json rule and AGENTS.md.
AGENTS.md	Expands the `PR title tags` bullet with precise conditions for when Databricks CI auto-runs versus needing a manual `[databricks]` tag, including concrete filesystem and plan-string examples.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[PR submitted] --> B{Diff touches db shim\nor databricks path?}
    B -- Yes --> C[Databricks CI auto-triggers]
    B -- No --> D{PR title has databricks tag?}
    D -- Yes --> C
    D -- No --> E{Diff touches integration_tests?}
    E -- No --> F[Standard Linux pre-merge only]
    E -- Yes --> G{DBR-divergent test?\nfilesystem or plan-string semantics?}
    G -- No --> F
    G -- Yes --> H[H9 advisory: recommend adding databricks tag]
    H --> I[Contributor adds tag or confirms DB coverage]
    I --> C
    C --> J[Databricks pre-merge: DskipTests + Python integration tests]
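The flowchart's decision order can also be read as a short function. The sketch below takes the four yes/no questions as plain booleans; the `Outcome` enum and `classify` name are invented for this example, and the advisory branch still ends in Databricks CI only once the contributor acts on the recommendation.

```python
from enum import Enum


class Outcome(Enum):
    DATABRICKS_CI = "Databricks pre-merge runs"
    STANDARD_ONLY = "Standard Linux pre-merge only"
    ADVISE_TAG = "H9 advisory: recommend adding [databricks]"


def classify(
    touches_db_path: bool,
    has_tag: bool,
    touches_integration_tests: bool,
    looks_divergent: bool,
) -> Outcome:
    """Mirror the flowchart: auto-trigger first, then the advisory check."""
    # Nodes B and D: either condition triggers Databricks CI directly.
    if touches_db_path or has_tag:
        return Outcome.DATABRICKS_CI
    # Nodes E and G: the advisory applies only to divergent integration-test changes.
    if touches_integration_tests and looks_divergent:
        return Outcome.ADVISE_TAG
    return Outcome.STANDARD_ONLY
```

For example, `classify(False, False, True, True)` yields `Outcome.ADVISE_TAG`, while the same change with the tag present yields `Outcome.DATABRICKS_CI`.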

Reviews (2): Last reviewed commit: "Add Greptile rule to flag missing [datab..."

…test changes

Databricks pre-merge CI is conditional: per jenkins/Jenkinsfile-blossom.premerge it runs only when the PR title contains [databricks] or the diff touches a Databricks-shim path (sql-plugin/src/main/...db/ or a path containing "databricks"). The standard Linux pre-merge never runs Databricks.

This leaves a gap. A change can be correct on vanilla Spark yet behave differently on the Databricks Spark fork without touching any auto-trigger path -- e.g. integration tests that rely on filesystem/path semantics (local vs DBFS/abfss, file:// scheme, os.walk/os.path) or that assert on optimizer plan strings (alias names and plan rendering differ on DBR). Such a test merges green because the only job that would have exercised it on Databricks was never triggered, then surfaces as a failure later on an unrelated PR that does carry [databricks] -- making an innocent PR look broken and costing triage time.

To close the gap on the review side:

- .greptile/config.json: add the "databricks-ci-tag" rule (scoped to integration_tests/**, severity medium) so Greptile recommends adding [databricks] when an integration-test change looks Databricks-divergent and the PR title lacks the tag. It explicitly does not flag changes already under a *db* shim path (auto-covered) or doc-only changes, to avoid noise.
- .greptile/rules.md: split the vague [databricks] mention out of H7 into a focused H9 "Databricks coverage" checklist item.
- AGENTS.md: document when [databricks] is needed (not just how), so both humans and Greptile (whose instructions reference AGENTS.md) share one source of truth.

Scope is integration_tests/** only -- not the Scala unit-test dirs. The Databricks pre-merge builds with -DskipTests and runs only the Python integration tests (run_pyspark_from_build.sh); Scala unit tests never execute on Databricks, so the [databricks] tag cannot validate them and recommending it there would be misleading.

Verified against the NVIDIA#15064 Databricks CI_PART1 console log: all shims built with -DskipTests, scalatest goal skipped (73x "Tests are skipped", zero ScalaTest/Surefire run summaries), followed only by run_pyspark_from_build.sh.

The rule is advisory -- it nudges; it does not gate merges.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>

amahussein · 2026-06-15T14:36:57Z

build

@res-life

H9 in .greptile/rules.md listed only the *db* shim path as a Databricks-CI auto-trigger, but config.json and AGENTS.md also document paths containing `databricks`. Add that condition so the three files agree.

Addresses review feedback from @res-life.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>

amahussein · 2026-06-16T14:12:15Z

build

nartal1

LGTM.

amahussein · 2026-06-22T21:52:14Z

build

amahussein requested a review from a team June 12, 2026 21:05

amahussein self-assigned this Jun 12, 2026

amahussein requested review from a team as code owners June 12, 2026 21:05

amahussein added the build Related to CI / CD or cleanly building label Jun 12, 2026

greptile-apps Bot reviewed Jun 12, 2026

Comment thread .greptile/config.json Outdated

amahussein force-pushed the rapids-greptile-db branch from 5ad0b07 to ebe689d Compare June 12, 2026 21:23

amahussein requested a review from wjxiz1992 June 15, 2026 14:15

res-life reviewed Jun 16, 2026

Comment thread .greptile/rules.md Outdated

amahussein requested review from res-life and removed request for a team June 22, 2026 14:58

nartal1 approved these changes Jun 22, 2026

amahussein changed the title ~~Add Greptile rule to flag missing databricks CI tag on test changes [skip ci]~~ Add Greptile rule to flag missing databricks CI tag on test changes Jun 22, 2026

Merge branch 'main' into rapids-greptile-db

5b8e58e

amahussein merged commit a17edd5 into NVIDIA:main Jun 23, 2026
47 checks passed

amahussein deleted the rapids-greptile-db branch June 23, 2026 02:56

amahussein merged 3 commits into NVIDIA:main from amahussein:rapids-greptile-db
