
Conversation

@allisonport-db (Collaborator) commented Dec 2, 2025

Which Delta project/connector is this regarding?

  • [x] Spark
  • [ ] Standalone
  • [ ] Flink
  • [ ] Kernel
  • [ ] Other (fill in here)

Description

PART OF #5326

Contains the following changes:

  • Removes Spark 3.5 support
  • Adds explicit Spark 4.0 support
  • Removes a "master" build for now
  • Merges the shims for the 3.5 vs. 4.0 breaking changes into the src code (a sketch of the shim pattern is included at the end of this description)

In a future PR

  • we will add Spark 4.1.0-SNAPSHOT support (in preparation for the Spark 4.1 release)
  • we will add back a "master" build tracking Spark master
    (these will require adding new shims, but in different areas)
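
For anyone unfamiliar with the shim setup, here is a minimal sketch of the pattern this PR collapses. The names below are invented for illustration only; they are not the actual Delta shim files.

// Hypothetical sketch of the shim pattern being merged back into src; these
// names are illustrative only, not the real Delta shims.
object SparkVersionShims {
  // Previously this object had two bodies, one compiled per Spark profile
  // (a spark-3.5 source directory and a spark-4.0 one), each adapting to that
  // Spark API. With 3.5 dropped there is a single implementation left, so it
  // can move into the main src tree and callers stop going through a shim.
  def supportedSparkBinaryVersion: String = "4.0"
}

object ExampleCaller {
  // Shared code keeps calling one stable entry point regardless of version.
  def banner(): String =
    s"Built against Spark ${SparkVersionShims.supportedSparkBinaryVersion}"
}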

How was this patch tested?

Unit tests + ran integration tests locally (python, scala + pip)

Tracking open TODOs at #5326

@allisonport-db changed the title from "[Spark][Infra][WIP] Drop support for Spark 3.5 in master" to "[Spark][Infra] Drop support for Spark 3.5 in master" on Dec 3, 2025
@allisonport-db (Collaborator, Author)

In theory I could do the src code shims + test code shims separately if that would help. Let me know if that makes reviews easier (not sure if anyone wants to review the shim code changes closely, or if it's enough that tests pass and the code compiles).

@@ -1,59 +0,0 @@
name: "Delta Spark Master"
Collaborator

I suggest we update the PR title to say we're dropping support for Spark 3.5 and Spark master compilation?

Collaborator Author

I mean, we haven't actually been compiling with Spark master in a while (we're using a very stale snapshot). But I can make the title clearer.

Collaborator

Hm, sorry, I'm still confused. Here we are deleting our job to compile against Spark "master", right? (Perhaps it was a stale master.)

But does "Drop support for Spark 3.5 and formally pin to released Spark 4.0.1" reflect that?

That seems like an important highlight, and I want to make sure my understanding is correct.

Collaborator Author

I think calling it Spark master before was misleading; in fact, in the previous PR we renamed the Spark version spec to spark40Snapshot instead of master. Saying we are removing Spark master is misleading considering we were never actually compiling against Spark master. We will fix that in future PRs.

Collaborator Author

It would be more correct to say spark_master_test.yaml was incorrectly named this whole time.

Contributor

Should we remove this action completely? Eventually we will be adding back Spark master support; what should we do then, dig it out of history and add it back?
The alternative is to just make this a no-op in some way.

Collaborator Author

Yeah, for the Iceberg action I just made it a no-op.

I'm fine with just removing this one though, since I'm tracking this work directly and will add it back (we need to update the CI in a few ways for multi-version support anyway).

@allisonport-db changed the title from "[Spark][Infra] Drop support for Spark 3.5 in master" to "[Spark][Infra] Drop support for Spark 3.5 and formally pin to released Spark 4.0.1" on Dec 6, 2025

// Changes in 4.1.0
// TODO: change in type hierarchy due to removal of DeltaThrowableConditionShim
ProblemFilters.exclude[MissingTypesProblem]("io.delta.exceptions.*")
Collaborator Author

@reviewers this seems safe to me, considering no one should be catching DeltaThrowableConditionShim... but I would like additional opinions.
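
To make the exclusion above concrete, here is a rough before/after sketch of why deleting a marker trait trips MiMa's MissingTypesProblem. The class and trait names are invented for illustration; they are not the exact Delta hierarchy.

// Hypothetical illustration only; not the real io.delta.exceptions classes.

// Before this PR: a public exception type mixed in the shim marker trait.
trait ExampleThrowableConditionShim
class ExampleDeltaExceptionBefore(msg: String)
  extends RuntimeException(msg) with ExampleThrowableConditionShim

// After this PR: the shim trait is gone, so the set of parent types of the
// public exception classes shrinks. MiMa reports that as MissingTypesProblem,
// which is what the ProblemFilters.exclude above waives; callers should never
// have been matching on the shim trait, so the change is binary-safe in practice.
class ExampleDeltaExceptionAfter(msg: String) extends RuntimeException(msg)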

).configureUnidoc()

/*
TODO: readd delta-iceberg on Spark 4.0+
Collaborator Author

@lzlfred Hey Fred, we will be releasing it on both Spark 4.0 and Spark 4.1 in the next release; we will need to update this build to work for that.

Collaborator Author

also tracking the todo at #5326

).configureUnidoc()

/*
TODO: compilation broken for Spark 4.0
Collaborator Author (@allisonport-db, Dec 9, 2025)

tracking at #5326

@linzhou-db @littlegrasscao FYI, can you please look into fixing this once I merge this PR?

val lookupSparkVersion: PartialFunction[(Int, Int), String] = {
  // version 4.0.0-preview1
  case (major, minor) if major >= 4 => "4.0.0-preview1"
  // TODO: how to run integration tests for multiple Spark versions
Collaborator Author

tracking at #5326
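
As extra context for the TODO above, here is a small, hypothetical sketch of how a partial-function version lookup like this is typically consumed when launching the integration tests. The 4.0.1 mapping and the fallback error are assumptions for illustration, not the exact code in this repo.

// Hypothetical usage sketch; version strings and helper names are assumed.
val lookupSparkVersion: PartialFunction[(Int, Int), String] = {
  case (major, _) if major >= 4 => "4.0.1" // after this PR, a released 4.0.x
}

def sparkVersionFor(deltaMajor: Int, deltaMinor: Int): String =
  lookupSparkVersion.applyOrElse(
    (deltaMajor, deltaMinor),
    (v: (Int, Int)) => sys.error(s"No Spark version mapped for Delta ${v._1}.${v._2}"))

// e.g. sparkVersionFor(4, 0) returns "4.0.1"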

with open("python/README.md", "r", encoding="utf-8") as fh:
long_description = fh.read()

# TODO: once we support multiple Spark versions update this to be compatible with both
Collaborator Author

tracking at #5326

@raveeram-db (Collaborator) left a comment

build changes look good to me overall

uses: actions/setup-java@v3
with:
  distribution: "zulu"
  java-version: "11"
Contributor

General question @scottsand-db: why is the kernel unitycatalog a separate GitHub action from kernel?


// Try to write as same file and expect an error
-    intercept[FileAlreadyExistsException] {
+    val e = intercept[IOException] {
Contributor

Why is removing Spark 3.5 causing all these kernel code changes?

Collaborator Author (@allisonport-db, Dec 9, 2025)

I upgraded our Hadoop version to match Spark 4.0. More details on #5616 (comment).

The Kernel changes are only related to this.
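
For context, a minimal sketch of the kind of assertion change the Hadoop bump forces in the test shown above; the suite, helper, and messages are hypothetical stand-ins, not the actual Kernel test.

import java.io.IOException
import org.apache.hadoop.fs.FileAlreadyExistsException
import org.scalatest.funsuite.AnyFunSuite

// Hypothetical sketch: with the newer Hadoop, a duplicate write can surface as
// a broader IOException, so the test intercepts IOException and inspects the
// failure instead of expecting the narrower exception type directly.
class DuplicateWriteSketch extends AnyFunSuite {
  // Stand-in for the real Kernel write call under test.
  private def writeSameFileTwice(): Unit =
    throw new FileAlreadyExistsException("target file already exists")

  test("second write to the same path fails") {
    val e = intercept[IOException] { writeSameFileTwice() }
    assert(e.isInstanceOf[FileAlreadyExistsException] || e.getMessage.contains("already exists"))
  }
}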

@tdas (Contributor) commented Dec 9, 2025

Let's merge this quickly. The Java version is adding all sorts of complications in a different place.

@allisonport-db merged commit db826cf into delta-io:master on Dec 9, 2025
14 checks passed
TimothyW553 pushed a commit to TimothyW553/delta that referenced this pull request on Dec 10, 2025:
[Spark][Infra] Drop support for Spark 3.5 and formally pin to released Spark 4.0.1 (delta-io#5616)
