Merged

47 commits
3501613
Initial build changes
allisonport-db Dec 2, 2025
77d305b
remove 3.5 shims
allisonport-db Dec 2, 2025
83ebe6b
Remove LoggingShims and DeltaSqlParserShims
allisonport-db Dec 2, 2025
90e1e1f
Variant shim
allisonport-db Dec 2, 2025
2a0f940
DecimalPrecisionTypeCoercionShims
allisonport-db Dec 2, 2025
86fea31
LogKeyShims
allisonport-db Dec 2, 2025
20f72f3
Some more shims
allisonport-db Dec 2, 2025
77a9a42
TypeWideningShim
allisonport-db Dec 2, 2025
13ac618
MergeIntoMaterializeSourceShims
allisonport-db Dec 3, 2025
5abacfa
Throwable, TimeTravelSpec, LogicalRelation
allisonport-db Dec 3, 2025
a8c2804
DeltaInvariantCheckerExecShims
allisonport-db Dec 3, 2025
e3dd2b9
Misc
allisonport-db Dec 3, 2025
ec34fd8
Misc shims
allisonport-db Dec 3, 2025
8fd880f
Remove connectors
allisonport-db Dec 3, 2025
70effca
Update test_cross_spark_publish.py
allisonport-db Dec 3, 2025
11cec06
Revert "Remove connectors"
allisonport-db Dec 3, 2025
73deabf
Remove shim dir from build
allisonport-db Dec 3, 2025
c9dc7fa
add default
allisonport-db Dec 3, 2025
36b8472
drop iceberg for now
allisonport-db Dec 3, 2025
9e41fec
Misc job fixes
allisonport-db Dec 3, 2025
ee907b7
Fix kernel tests
allisonport-db Dec 3, 2025
73bea33
Fix mima for now
allisonport-db Dec 3, 2025
6aa8f58
Type widening shims
allisonport-db Dec 5, 2025
1c58630
SnapshotManagementSuiteShims
allisonport-db Dec 5, 2025
6bb2c54
DeltaGenerateSymlinkManifestSuiteShims
allisonport-db Dec 5, 2025
d1f8616
DeltaHistoryManagerSuiteShims
allisonport-db Dec 5, 2025
2374c06
DeltaInsertIntoTableSuiteShims
allisonport-db Dec 5, 2025
386925a
DeltaSuiteShims
allisonport-db Dec 5, 2025
ba34b06
DeltaVacuumSuiteShims
allisonport-db Dec 5, 2025
96f574f
DescribeDeltaHistorySuiteShims
allisonport-db Dec 5, 2025
da50359
MergeIntoMaterializeSourceShims
allisonport-db Dec 5, 2025
f8b0508
ImplicitDMLCastingSuiteShims
allisonport-db Dec 5, 2025
a0a4d27
MergeIntoMetricsShims
allisonport-db Dec 5, 2025
5ae7ee7
Structured logging tests
allisonport-db Dec 5, 2025
219c0a5
Variant suites
allisonport-db Dec 5, 2025
17319b5
Version test shims
allisonport-db Dec 5, 2025
3e3317b
Final shim fixes
allisonport-db Dec 5, 2025
603ab3f
log4j configs
allisonport-db Dec 5, 2025
4bc7d73
Merge remote-tracking branch 'delta-io/master' into update-build-1
allisonport-db Dec 6, 2025
a6fd612
Fix scalastyle
allisonport-db Dec 6, 2025
0de65e8
Fix sharing test compile
allisonport-db Dec 8, 2025
8d6d117
Merge remote-tracking branch 'delta-io/master' into update-build-1
allisonport-db Dec 8, 2025
4dbad62
Update setup.py
allisonport-db Dec 8, 2025
f8c5c3f
Skip iceberg test for now
allisonport-db Dec 8, 2025
264e629
Fixes
allisonport-db Dec 9, 2025
ae7363c
Update test_cross_spark_publish.py
allisonport-db Dec 9, 2025
49ff48c
Move comments in kernel_test
allisonport-db Dec 9, 2025
4 changes: 2 additions & 2 deletions .github/workflows/iceberg_test.yaml
@@ -1,5 +1,5 @@
name: "Delta Iceberg Latest"
on: [push, pull_request]
on: [] # [push, pull_request]
jobs:
test:
name: "DIL: Scala ${{ matrix.scala }}"
@@ -25,7 +25,7 @@ jobs:
uses: actions/setup-java@v3
with:
distribution: "zulu"
java-version: "11"
java-version: "17"
- name: Cache Scala, SBT
uses: actions/cache@v3
with:
6 changes: 4 additions & 2 deletions .github/workflows/kernel_test.yaml
@@ -37,11 +37,12 @@ jobs:
echo "Runner arch: ${{ runner.arch }}"
- name: Checkout code
uses: actions/checkout@v4
# Run unit tests with JDK 17. These unit tests depend on Spark, and Spark 4.0+ requires JDK 17.
- name: install java
uses: actions/setup-java@v4
with:
distribution: "zulu"
java-version: "11"
java-version: "17"
- name: Cache SBT and dependencies
id: cache-sbt
uses: actions/cache@v4
@@ -59,7 +60,7 @@ jobs:
else
echo "❌ Cache MISS - will download dependencies"
fi
- name: Run tests
- name: Run unit tests
run: |
python run-tests.py --group kernel --coverage --shard ${{ matrix.shard }}

@@ -68,6 +69,7 @@
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v3
# Run integration tests with JDK 11, as they have no Spark dependency
- name: install java
uses: actions/setup-java@v3
with:
2 changes: 1 addition & 1 deletion .github/workflows/kernel_unitycatalog_test.yaml
@@ -22,7 +22,7 @@ jobs:
uses: actions/setup-java@v3
with:
distribution: "zulu"
java-version: "11"
Review comment (Contributor): General question @scottsand-db, why is the kernel unitycatalog a separate GitHub action from kernel?

java-version: "17"
if: steps.git-diff.outputs.diff
- name: Run Unity tests with coverage
run: |
2 changes: 1 addition & 1 deletion .github/workflows/spark_examples_test.yaml
@@ -24,7 +24,7 @@ jobs:
uses: actions/setup-java@v3
with:
distribution: "zulu"
java-version: "11"
java-version: "17"
- name: Cache Scala, SBT
uses: actions/cache@v3
with:
59 changes: 0 additions & 59 deletions .github/workflows/spark_master_test.yaml

This file was deleted.

31 changes: 17 additions & 14 deletions .github/workflows/spark_python_test.yaml
@@ -25,7 +25,7 @@ jobs:
uses: actions/setup-java@v3
with:
distribution: "zulu"
java-version: "11"
java-version: "17"
- name: Cache Scala, SBT
uses: actions/cache@v3
with:
@@ -53,33 +53,36 @@ jobs:
export PATH="~/.pyenv/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
pyenv install 3.8.18
pyenv global system 3.8.18
pipenv --python 3.8 install
pyenv install 3.9
pyenv global system 3.9
pipenv --python 3.9 install
# Update the pip version to 24.0. By default `pyenv.run` installs the latest pip version
# available. From version 24.1, `pip` doesn't allow installing python packages
# with version string containing `-`. In Delta-Spark case, the pypi package generated has
# `-SNAPSHOT` in version (e.g. `3.3.0-SNAPSHOT`) as the version is picked up from
# the `version.sbt` file.
pipenv run pip install pip==24.0 setuptools==69.5.1 wheel==0.43.0
# Install PySpark without bundled Scala 2.12 JARs - read more in the future note below
pipenv run pip install pyspark==3.5.3 --no-deps
pipenv run pip install py4j==0.10.9.7
pipenv run pip install flake8==3.5.0 pypandoc==1.3.3
pipenv run pip install black==23.9.1
pipenv run pip install pyspark==4.0.1
pipenv run pip install flake8==3.9.0
pipenv run pip install black==23.12.1
pipenv run pip install importlib_metadata==3.10.0
# The mypy versions 0.982 and 1.8.0 have conflicting rules (cannot get style checks to
# pass for both versions on the same file) so we upgrade this to match Spark 4.0
pipenv run pip install mypy==1.8.0
pipenv run pip install mypy-protobuf==3.3.0
pipenv run pip install cryptography==37.0.4
pipenv run pip install twine==4.0.1
pipenv run pip install wheel==0.33.4
pipenv run pip install setuptools==41.1.0
pipenv run pip install pydocstyle==3.0.0
pipenv run pip install pandas==1.1.3
pipenv run pip install pyarrow==8.0.0
pipenv run pip install numpy==1.20.3
pipenv run pip install pandas==2.2.0
pipenv run pip install pyarrow==11.0.0
pipenv run pip install pypandoc==1.3.3
pipenv run pip install numpy==1.22.4
pipenv run pip install grpcio==1.67.0
pipenv run pip install grpcio-status==1.67.0
pipenv run pip install googleapis-common-protos==1.65.0
pipenv run pip install protobuf==5.29.1
pipenv run pip install googleapis-common-protos-stubs==2.2.0
pipenv run pip install grpc-stubs==1.24.11
if: steps.git-diff.outputs.diff
- name: Run Python tests
# when changing TEST_PARALLELISM_COUNT make sure to also change it in spark_master_test.yaml
29 changes: 18 additions & 11 deletions .github/workflows/spark_test.yaml
@@ -29,7 +29,7 @@ jobs:
uses: actions/setup-java@v3
with:
distribution: "zulu"
java-version: "11"
java-version: "17"
- name: Cache Scala, SBT
uses: actions/cache@v3
with:
@@ -57,29 +57,36 @@ jobs:
export PATH="~/.pyenv/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
pyenv install 3.8.18
pyenv global system 3.8.18
pipenv --python 3.8 install
pyenv install 3.9
pyenv global system 3.9
pipenv --python 3.9 install
# Update the pip version to 24.0. By default `pyenv.run` installs the latest pip version
# available. From version 24.1, `pip` doesn't allow installing python packages
# with version string containing `-`. In Delta-Spark case, the pypi package generated has
# `-SNAPSHOT` in version (e.g. `3.3.0-SNAPSHOT`) as the version is picked up from
# the `version.sbt` file.
pipenv run pip install pip==24.0 setuptools==69.5.1 wheel==0.43.0
pipenv run pip install pyspark==3.5.3
pipenv run pip install flake8==3.5.0 pypandoc==1.3.3
pipenv run pip install black==23.9.1
pipenv run pip install pyspark==4.0.1
pipenv run pip install flake8==3.9.0
pipenv run pip install black==23.12.1
pipenv run pip install importlib_metadata==3.10.0
pipenv run pip install mypy==0.982
pipenv run pip install mypy==1.8.0
pipenv run pip install mypy-protobuf==3.3.0
pipenv run pip install cryptography==37.0.4
pipenv run pip install twine==4.0.1
pipenv run pip install wheel==0.33.4
pipenv run pip install setuptools==41.1.0
pipenv run pip install pydocstyle==3.0.0
pipenv run pip install pandas==1.1.3
pipenv run pip install pyarrow==8.0.0
pipenv run pip install numpy==1.20.3
pipenv run pip install pandas==2.2.0
pipenv run pip install pyarrow==11.0.0
pipenv run pip install pypandoc==1.3.3
pipenv run pip install numpy==1.22.4
pipenv run pip install grpcio==1.67.0
pipenv run pip install grpcio-status==1.67.0
pipenv run pip install googleapis-common-protos==1.65.0
pipenv run pip install protobuf==5.29.1
pipenv run pip install googleapis-common-protos-stubs==2.2.0
pipenv run pip install grpc-stubs==1.24.11
if: steps.git-diff.outputs.diff
- name: Scala structured logging style check
run: |
2 changes: 1 addition & 1 deletion .github/workflows/unidoc.yaml
@@ -13,7 +13,7 @@
uses: actions/setup-java@v3
with:
distribution: "zulu"
java-version: "11"
java-version: "17"
- uses: actions/checkout@v3
- name: generate unidoc
run: build/sbt "++ ${{ matrix.scala }}" unidoc
33 changes: 13 additions & 20 deletions build.sbt
@@ -66,7 +66,7 @@ val sparkVersion = settingKey[String]("Spark version")

// Dependent library versions
val defaultSparkVersion = SparkVersionSpec.DEFAULT.fullVersion // Spark version to use for testing in non-delta-spark related modules
val hadoopVersion = "3.3.4"
val hadoopVersion = "3.4.0"
val scalaTestVersion = "3.2.15"
val scalaTestVersionForConnectors = "3.0.8"
val parquet4sVersion = "1.9.4"
@@ -257,7 +257,7 @@ lazy val connectClient = (project in file("spark-connect/client"))
// Create a symlink for the log4j properties
val confDir = distributionDir / "conf"
IO.createDirectory(confDir)
val log4jProps = (spark / Test / resourceDirectory).value / "log4j2_spark_master.properties"
val log4jProps = (spark / Test / resourceDirectory).value / "log4j2.properties"
val linkedLog4jProps = confDir / "log4j2.properties"
Files.createSymbolicLink(linkedLog4jProps.toPath, log4jProps.toPath)
}
@@ -705,6 +705,8 @@ lazy val contribs = (project in file("contribs"))
Compile / compile := ((Compile / compile) dependsOn createTargetClassesDir).value
).configureUnidoc()

/*
TODO: compilation broken for Spark 4.0
Review comment (allisonport-db, Dec 9, 2025): tracking at #5326. @linzhou-db @littlegrasscao FYI, can you please look into fixing this once I merge this PR?

lazy val sharing = (project in file("sharing"))
.dependsOn(spark % "compile->compile;test->test;provided->provided")
.disablePlugins(JavaFormatterPlugin, ScalafmtPlugin)
@@ -715,22 +717,6 @@ lazy val sharing = (project in file("sharing"))
releaseSettings,
CrossSparkVersions.sparkDependentSettings(sparkVersion),
Test / javaOptions ++= Seq("-ea"),
Compile / compile := runTaskOnlyOnSparkMaster(
task = Compile / compile,
taskName = "compile",
projectName = "delta-sharing-spark",
emptyValue = Analysis.empty.asInstanceOf[CompileAnalysis]
).value,
Test / test := runTaskOnlyOnSparkMaster(
task = Test / test,
taskName = "test",
projectName = "delta-sharing-spark",
emptyValue = ()).value,
publish := runTaskOnlyOnSparkMaster(
task = publish,
taskName = "publish",
projectName = "delta-sharing-spark",
emptyValue = ()).value,
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-sql" % sparkVersion.value % "provided",

@@ -747,6 +733,7 @@ lazy val sharing = (project in file("sharing"))
"org.apache.spark" %% "spark-hive" % sparkVersion.value % "test" classifier "tests",
)
).configureUnidoc()
*/

lazy val kernelApi = (project in file("kernel/kernel-api"))
.enablePlugins(ScalafmtPlugin)
@@ -898,7 +885,7 @@ lazy val kernelDefaults = (project in file("kernel/kernel-defaults"))
// such as warm runs, cold runs, defining benchmark parameter variables etc.
"org.openjdk.jmh" % "jmh-core" % "1.37" % "test",
"org.openjdk.jmh" % "jmh-generator-annprocess" % "1.37" % "test",
"io.delta" %% "delta-spark" % "3.3.2" % "test",
"io.delta" %% "delta-spark" % "4.0.0" % "test",

"org.apache.spark" %% "spark-hive" % defaultSparkVersion % "test" classifier "tests",
"org.apache.spark" %% "spark-sql" % defaultSparkVersion % "test" classifier "tests",
@@ -1010,6 +997,8 @@ lazy val storageS3DynamoDB = (project in file("storage-s3-dynamodb"))
)
).configureUnidoc()

/*
TODO: re-add delta-iceberg on Spark 4.0+
Review comment (Collaborator Author): @lzlfred Hey Fred, we will be releasing on both Spark 4.0 and Spark 4.1 next release; we will need to update this build to work for that.
Review comment (Collaborator Author): Also tracking the TODO at #5326.

val icebergSparkRuntimeArtifactName = {
val (expMaj, expMin, _) = getMajorMinorPatch(defaultSparkVersion)
s"iceberg-spark-runtime-$expMaj.$expMin"
Expand Down Expand Up @@ -1165,6 +1154,7 @@ lazy val icebergShaded = (project in file("icebergShaded"))
assembly / assemblyMergeStrategy := updateMergeStrategy((assembly / assemblyMergeStrategy).value),
assemblyPackageScala / assembleArtifact := false,
)
*/

lazy val hudi = (project in file("hudi"))
.dependsOn(spark % "compile->compile;test->test;provided->provided")
@@ -1265,14 +1255,16 @@ val createTargetClassesDir = taskKey[Unit]("create target classes dir")

// Don't use these groups for any other projects
lazy val sparkGroup = project
.aggregate(spark, sparkV1, sparkV1Filtered, sparkV2, contribs, storage, storageS3DynamoDB, sharing, hudi)
// TODO: add sharing back after fixing compilation
.aggregate(spark, sparkV1, sparkV1Filtered, sparkV2, contribs, storage, storageS3DynamoDB, hudi)
.settings(
// crossScalaVersions must be set to Nil on the aggregating project
crossScalaVersions := Nil,
publishArtifact := false,
publish / skip := false,
)

/*
lazy val icebergGroup = project
.aggregate(iceberg, testDeltaIcebergJar)
.settings(
Expand All @@ -1281,6 +1273,7 @@ lazy val icebergGroup = project
publishArtifact := false,
publish / skip := false,
)
*/

lazy val kernelGroup = project
.aggregate(kernelApi, kernelDefaults, kernelBenchmarks)
6 changes: 4 additions & 2 deletions examples/scala/build.sbt
@@ -42,8 +42,10 @@ def getMajorMinor(version: String): (Int, Int) = {
}
}
val lookupSparkVersion: PartialFunction[(Int, Int), String] = {
// version 4.0.0-preview1
case (major, minor) if major >= 4 => "4.0.0-preview1"
// TODO: how to run integration tests for multiple Spark versions
Review comment (Collaborator Author): tracking at #5326.

case (major, minor) if major >= 4 && minor >= 1 => "4.0.1"
// version 4.0.0
case (major, minor) if major >= 4 => "4.0.0"
// versions 3.3.x+
case (major, minor) if major >= 3 && minor >=3 => "3.5.3"
// versions 3.0.0 to 3.2.x
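For context, here is a minimal, self-contained sketch (not part of this PR) of how the updated mapping above resolves. It assumes, as the surrounding cases suggest, that the tuple is the Delta release's (major, minor) and the returned string is the Spark version the examples build against; the older 3.x branches are elided.

    // Sketch mirroring the cases shown in the hunk above.
    val lookupSparkVersion: PartialFunction[(Int, Int), String] = {
      case (major, minor) if major >= 4 && minor >= 1 => "4.0.1"
      case (major, _) if major >= 4 => "4.0.0"
      case (major, minor) if major >= 3 && minor >= 3 => "3.5.3"
    }

    assert(lookupSparkVersion((4, 1)) == "4.0.1") // e.g. a 4.1.x snapshot
    assert(lookupSparkVersion((4, 0)) == "4.0.0") // 4.0.x releases
    assert(lookupSparkVersion((3, 3)) == "3.5.3") // 3.3.x and later 3.x releases
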
@@ -91,6 +91,11 @@ public static void checkpoint(Engine engine, Clock clock, SnapshotImpl snapshot)
numberOfAddFiles = checkpointDataIter.getNumberOfAddActions();
} catch (FileAlreadyExistsException faee) {
throw new CheckpointAlreadyExistsException(version);
} catch (IOException io) {
Review thread:
Collaborator Author: Upgrading the Hadoop version changes this error class.
Collaborator: Hm, I wonder what the change was?
Collaborator Author: I'm not sure what changed in Hadoop, but instead of seeing a FileAlreadyExistsException we see an IOException whose cause is a FileAlreadyExistsException. We have this tested (at least one test fails without this fix).
Collaborator Author: They seem similar enough, so I didn't look further into it. Seems like a minor API difference.
Contributor: @scottsand-db can you approve this change?

if (io.getCause() instanceof FileAlreadyExistsException) {
throw new CheckpointAlreadyExistsException(version);
}
throw io;
}

final CheckpointMetaData checkpointMetaData =
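The thread above amounts to treating an IOException whose cause is a FileAlreadyExistsException the same as a directly thrown FileAlreadyExistsException. Below is a minimal Scala sketch of that pattern, using java.nio.file.FileAlreadyExistsException and a plain RuntimeException as stand-ins for the exception types in the kernel code (Hadoop's FileAlreadyExistsException also extends IOException, so the same case ordering applies); writeAttempt is a hypothetical placeholder for the checkpoint write.

    import java.io.IOException
    import java.nio.file.FileAlreadyExistsException

    // Both failure shapes map to the same "checkpoint already exists" outcome.
    // FileAlreadyExistsException extends IOException, so it is matched first.
    def writeCheckpointOnce(version: Long)(writeAttempt: => Unit): Unit = {
      def alreadyExists(): Nothing =
        // stand-in for CheckpointAlreadyExistsException
        throw new RuntimeException(s"Checkpoint $version already exists")
      try {
        writeAttempt
      } catch {
        case _: FileAlreadyExistsException => alreadyExists()
        case io: IOException if io.getCause.isInstanceOf[FileAlreadyExistsException] =>
          alreadyExists()
        // any other IOException fails the guard and propagates unchanged
      }
    }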