Add Greptile rule to flag missing databricks CI tag on test changes #15076

Merged
amahussein merged 3 commits into NVIDIA:main from amahussein:rapids-greptile-db on Jun 23, 2026

@amahussein

Collaborator

Description

Databricks pre-merge CI is conditional — it runs only when the PR title has [databricks] or the diff touches a *db* shim path (per jenkins/Jenkinsfile-blossom.premerge). So a test change that's correct on vanilla Spark but diverges on the Databricks fork (filesystem/path semantics, or optimizer plan-string assertions) can merge green, then resurface as a failure on an unrelated PR that later carries [databricks].

This adds a Greptile nudge to catch those cases at review time:

  • .greptile/config.json: new databricks-ci-tag rule, scoped to integration_tests/** (severity medium), recommending [databricks] when an integration-test change looks DBR-divergent and the title lacks it. Skips changes under a *db* shim path and doc-only changes to avoid noise.
  • .greptile/rules.md: focused H9 "Databricks coverage" checklist item.
  • AGENTS.md: document when [databricks] is needed, not just how.

Advisory only — it does not gate merges. No user-facing change.

Checklists

Documentation

  • Updated for new or modified user-facing features or behaviors
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
  • Covered by existing tests
    (Please provide the names of the existing tests in the PR description.)
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

@amahussein amahussein requested a review from a team June 12, 2026 21:05
@amahussein amahussein self-assigned this Jun 12, 2026
@amahussein amahussein requested review from a team as code owners June 12, 2026 21:05
@amahussein amahussein added the build label (Related to CI / CD or cleanly building) Jun 12, 2026
@greptile-apps

greptile-apps Bot commented Jun 12, 2026

Contributor

Greptile Summary

This PR adds a Greptile advisory rule (databricks-ci-tag) to nudge reviewers when an integration test change could behave differently on Databricks but the PR title lacks [databricks] — closing a gap where DBR-only failures slip through standard Linux pre-merge CI.

  • .greptile/config.json: New rule scoped to integration_tests/** covering filesystem/path semantics and optimizer/plan-string divergence categories; advisory only, does not gate merges.
  • .greptile/rules.md: Old H7 trimmed (DB mention moved out) and new H9 added with the Databricks coverage checklist item.
  • AGENTS.md: PR title tags bullet expanded with precise auto-trigger conditions and concrete examples (DBFS paths, abfss, os.walk, plan rendering) so contributors know when to add [databricks] manually.

Confidence Score: 5/5

Config/documentation-only change with no user-facing or runtime impact; safe to merge.

All three files are advisory configuration and documentation. The new rule is correctly scoped to integration_tests/**, its inline rationale (Databricks CI only runs Python integration tests, never Scala unit tests) is accurate and consistent across all three changed files, and the change cannot break any existing functionality.

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| .greptile/config.json | Adds databricks-ci-tag rule (id, rule text, scope integration_tests/**, severity medium) to the Greptile rule set; advisory only, no enforcement. |
| .greptile/rules.md | Splits the old H7 (which bundled databricks mention) into a focused H7 (non-DB CI gaps) and a new H9 (Databricks coverage); text is consistent with config.json rule and AGENTS.md. |
| AGENTS.md | Expands the PR title tags bullet with precise conditions for when Databricks CI auto-runs versus needing a manual [databricks] tag, including concrete filesystem and plan-string examples. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[PR submitted] --> B{Diff touches db shim\nor databricks path?}
    B -- Yes --> C[Databricks CI auto-triggers]
    B -- No --> D{PR title has databricks tag?}
    D -- Yes --> C
    D -- No --> E{Diff touches integration_tests?}
    E -- No --> F[Standard Linux pre-merge only]
    E -- Yes --> G{DBR-divergent test?\nfilesystem or plan-string semantics?}
    G -- No --> F
    G -- Yes --> H[H9 advisory: recommend adding databricks tag]
    H --> I[Contributor adds tag or confirms DB coverage]
    I --> C
    C --> J[Databricks pre-merge: DskipTests + Python integration tests]
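
The branches of the flowchart above can also be read as a small decision function. The following is only a sketch: the names `Outcome` and `review_outcome` are hypothetical, and the four inputs are assumed to have been determined elsewhere (by the CI trigger logic and by the reviewer or Greptile).

```python
from enum import Enum


class Outcome(Enum):
    DATABRICKS_CI = "Databricks pre-merge runs"
    STANDARD_ONLY = "standard Linux pre-merge only"
    RECOMMEND_TAG = "H9 advisory: recommend adding the databricks tag"


def review_outcome(touches_db_path: bool,
                   has_tag: bool,
                   touches_integration_tests: bool,
                   looks_divergent: bool) -> Outcome:
    """Mirror the flowchart: nodes B and D first, then E and G."""
    # B / D: either condition already triggers Databricks CI.
    if touches_db_path or has_tag:
        return Outcome.DATABRICKS_CI
    # E / G: only integration-test changes that look DBR-divergent get the nudge.
    if touches_integration_tests and looks_divergent:
        return Outcome.RECOMMEND_TAG
    # F: everything else stays on the standard pre-merge.
    return Outcome.STANDARD_ONLY
```

The advisory outcome does not run anything by itself; per the flowchart, Databricks CI only follows once the contributor adds the tag.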

Reviews (2): Last reviewed commit: "Add Greptile rule to flag missing [datab..."

Comment thread .greptile/config.json Outdated
…test changes

Databricks pre-merge CI is conditional: per jenkins/Jenkinsfile-blossom.premerge
it runs only when the PR title contains [databricks] or the diff touches a
Databricks-shim path (sql-plugin/src/main/...db/ or a path containing
"databricks"). The standard Linux pre-merge never runs Databricks.
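
As a rough illustration of the trigger condition described above (not the actual Jenkinsfile logic), the check can be sketched in Python. The function names are hypothetical, and the reading of a "db shim" directory as any directory under sql-plugin/src/main/ whose name ends in "db" is an assumption made for this sketch.

```python
def touches_databricks_path(path: str) -> bool:
    """Approximate the two path conditions named in the commit message."""
    if "databricks" in path:
        return True
    # Assumption: a db shim directory is any directory under
    # sql-plugin/src/main/ whose name ends in "db".
    prefix = "sql-plugin/src/main/"
    if path.startswith(prefix):
        directories = path[len(prefix):].split("/")[:-1]  # drop the file name
        return any(d.endswith("db") for d in directories)
    return False


def databricks_ci_would_run(title: str, changed_paths: list[str]) -> bool:
    """True if the title carries the tag or any changed path is Databricks-related."""
    if "[databricks]" in title:
        return True
    return any(touches_databricks_path(p) for p in changed_paths)
```

Under these assumptions, a PR that only edits a Python integration test and has no tag in its title returns False, which is the gap the rest of this message describes.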

This leaves a gap. A change can be correct on vanilla Spark yet behave
differently on the Databricks Spark fork without touching any auto-trigger
path -- e.g. integration tests that rely on filesystem/path semantics
(local vs DBFS/abfss, file:// scheme, os.walk/os.path) or that assert on
optimizer plan strings (alias names and plan rendering differ on DBR). Such a
test merges green because the only job that would have exercised it on
Databricks was never triggered, then surfaces as a failure later on an
unrelated PR that does carry [databricks] -- making an innocent PR look broken
and costing triage time.
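
To make the two divergence categories concrete, here is a minimal keyword scan over the added lines of a diff. It is only a sketch: Greptile applies the rule text with its own review model rather than with patterns like these, and the pattern list and the name `divergence_categories` are hypothetical.

```python
import re

# Hypothetical hint patterns, one entry per category named above.
DIVERGENCE_HINTS = {
    "filesystem/path semantics": re.compile(
        r"file://|dbfs:|abfss:|os\.walk|os\.path"),
    "plan-string assertions": re.compile(
        r"\.explain\(|queryExecution"),
}


def divergence_categories(added_lines: list[str]) -> list[str]:
    """Return the categories whose hint pattern appears in any added line."""
    found = []
    for category, pattern in DIVERGENCE_HINTS.items():
        if any(pattern.search(line) for line in added_lines):
            found.append(category)
    return found
```

An empty result only means no hint matched; it does not show that a test behaves the same on Databricks.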

To close the gap on the review side:

- .greptile/config.json: add the "databricks-ci-tag" rule (scoped to
  integration_tests/**, severity medium) so Greptile recommends adding
  [databricks] when an integration-test change looks Databricks-divergent and
  the PR title lacks the tag. It explicitly does not flag changes already under
  a *db* shim path (auto-covered) or doc-only changes, to avoid noise.
- .greptile/rules.md: split the vague [databricks] mention out of H7 into a
  focused H9 "Databricks coverage" checklist item.
- AGENTS.md: document when [databricks] is needed (not just how), so both
  humans and Greptile (whose instructions reference AGENTS.md) share one
  source of truth.
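
The scope named in the first bullet can be illustrated with a glob check. This sketch uses Python's `fnmatch`, whose `*` also matches `/`; Greptile's own glob semantics may differ, and the name `in_rule_scope` and the example paths in the usage note are illustrative only.

```python
from fnmatch import fnmatch

RULE_SCOPE = "integration_tests/**"


def in_rule_scope(path: str, scope: str = RULE_SCOPE) -> bool:
    """True if the changed path falls under the rule's scope glob."""
    # With fnmatch, "*" matches any characters including "/", so this
    # behaves like a prefix match on "integration_tests/".
    return fnmatch(path, scope)
```

With this reading, a file such as integration_tests/src/main/python/some_test.py is in scope, while a Scala suite under tests/src/test/scala/ is not.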

Scope is integration_tests/** only -- not the Scala unit-test dirs. The
Databricks pre-merge builds with -DskipTests and runs only the Python
integration tests (run_pyspark_from_build.sh); Scala unit tests never execute
on Databricks, so the [databricks] tag cannot validate them and recommending it
there would be misleading. Verified against the NVIDIA#15064 Databricks CI_PART1
console log: all shims built with -DskipTests, scalatest goal skipped (73x
"Tests are skipped", zero ScalaTest/Surefire run summaries), followed only by
run_pyspark_from_build.sh. The rule is advisory -- it nudges; it does not gate
merges.
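
The console-log check described above could be repeated with a short script along these lines. It is a sketch under assumptions: the helper names are hypothetical, "Run completed in" is taken as the marker of a ScalaTest run summary, and "Tests run:" as the marker of a Surefire summary.

```python
def summarize_log(log_text: str) -> dict:
    """Count the markers used to decide whether Scala tests executed."""
    return {
        # Notice printed when the test goal is skipped.
        "skipped_notices": log_text.count("Tests are skipped"),
        # Assumed markers of an actual test run.
        "scalatest_summaries": log_text.count("Run completed in"),
        "surefire_summaries": log_text.count("Tests run:"),
        # Whether the Python integration-test script appears in the log.
        "ran_pyspark_script": "run_pyspark_from_build.sh" in log_text,
    }


def scala_tests_were_skipped(log_text: str) -> bool:
    """True if skip notices are present and no run summary appears."""
    summary = summarize_log(log_text)
    return (summary["skipped_notices"] > 0
            and summary["scalatest_summaries"] == 0
            and summary["surefire_summaries"] == 0)
```

Reading a saved console log into a string and passing it to `summarize_log` gives the counts to compare against the figures quoted above.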

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>
@amahussein amahussein force-pushed the rapids-greptile-db branch from 5ad0b07 to ebe689d Compare June 12, 2026 21:23
@amahussein amahussein requested a review from wjxiz1992 June 15, 2026 14:15
@amahussein

Collaborator Author

build

Comment thread .greptile/rules.md Outdated
H9 in .greptile/rules.md listed only the *db* shim path as a Databricks-CI
auto-trigger, but config.json and AGENTS.md also document paths containing
`databricks`. Add that condition so the three files agree. Addresses review
feedback from @res-life.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>
@amahussein

Collaborator Author

build

@amahussein amahussein requested review from res-life and removed request for a team June 22, 2026 14:58

@nartal1 nartal1 left a comment

Collaborator

LGTM.

@amahussein amahussein changed the title Add Greptile rule to flag missing databricks CI tag on test changes [skip ci] Add Greptile rule to flag missing databricks CI tag on test changes Jun 22, 2026
@amahussein

Collaborator Author

build

@amahussein amahussein merged commit a17edd5 into NVIDIA:main Jun 23, 2026
47 checks passed
@amahussein amahussein deleted the rapids-greptile-db branch June 23, 2026 02:56
Labels

build Related to CI / CD or cleanly building

3 participants