configure Sbt packaging
- one JAR without dependencies
- one assembly JAR that includes them
barend-xebia committed Jun 28, 2024
1 parent 0fe1e28 commit 8047880
Showing 6 changed files with 79 additions and 9 deletions.
32 changes: 30 additions & 2 deletions README.md
````diff
@@ -1,7 +1,35 @@
-# OpenTelemetry SparkListener
+# Spot: Spark-OpenTelemetry
 
 This package connects [Apache Spark™][sp-home] to [OpenTelemetry][ot-home].
 
+This creates a layer of indirection to allow reporting metrics from any Spark or PySpark job to [OpenTelemetry Collector][ot-col], or directly to any [supported backend][ot-export].
+
+## Status
+
+ℹ️ This project is in initial development. It's not ready for use.
+
+## Usage
+
+The recommended way to use Spot relies on [OpenTelemetry Autoconfigure][ot-auto] to obtain the OpenTelemetry configuration. You pass the `spot-complete-*.jar` to spark-submit to make Spot available to your job, and configure `spark.extraListeners` to enable it.
+
+```bash
+SCALA_VERSION=2.12 # This will be 2.12 or 2.13, whichever matches your Spark deployment.
+spark-submit \
+  --jars com.xebia.data.spot.spot-complete_${SCALA_VERSION}-x.y.z.jar \
+  --conf spark.extraListeners=com.xebia.data.spot.TelemetrySparkListener \
+  ...
+  com.example.MySparkJob
+```
+
+### Prerequisites
+
+Instrumenting for telemetry is useless until you publish the recorded data somewhere. This might be the native metrics suite of your chosen cloud provider, or a free or commercial third-party system such as Prometheus + Tempo + Grafana. You can have your instrumented Spark jobs publish directly to the backend, or route the traffic via OpenTelemetry Collector. Choosing the backend and routing architecture is outside the scope of this document.
+
+If you're using Spark on top of Kubernetes, you should install and configure the [OpenTelemetry Operator][ot-k8s-oper]. In any other deployment you should set the appropriate [environment variables for autoconfiguration][ot-auto-env].
+
+[ot-auto]: https://opentelemetry.io/docs/languages/java/instrumentation/#automatic-configuration
+[ot-auto-env]: https://opentelemetry.io/docs/languages/java/configuration/
+[ot-col]: https://opentelemetry.io/docs/collector/
+[ot-export]: https://opentelemetry.io/ecosystem/registry/?component=exporter
 [ot-home]: https://opentelemetry.io/
+[ot-k8s-oper]: https://opentelemetry.io/docs/kubernetes/operator/
 [sp-home]: https://spark.apache.org
````
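For the non-Kubernetes route, the standard OpenTelemetry Java autoconfiguration environment variables look roughly like this (a sketch; the service name and endpoint below are illustrative placeholders, not values from this repository):

```shell
# Illustrative values; point these at your own collector or backend.
export OTEL_SERVICE_NAME="my-spark-job"                           # sets the service.name resource attribute
export OTEL_TRACES_EXPORTER="otlp"                                # export traces over OTLP
export OTEL_EXPORTER_OTLP_ENDPOINT="http://otel-collector:4317"   # OTLP/gRPC endpoint of your collector
```

With these in the environment of the driver, the autoconfigure SDK picks them up without any code changes in the job.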
41 changes: 34 additions & 7 deletions build.sbt
```diff
@@ -1,17 +1,44 @@
 ThisBuild / organization := "com.xebia.data"
-ThisBuild / scalaVersion := "2.13.13"
-ThisBuild / crossScalaVersions := Seq("2.12.18", "2.13.13")
 ThisBuild / scmInfo := Some(ScmInfo(
   url("https://github.com/xebia/spot"),
   "https://github.com/xebia/spot.git",
   "git@github.com:xebia/spot.git"))
 
+ThisBuild / scalaVersion := "2.13.14"
+ThisBuild / crossScalaVersions := Seq("2.12.19", "2.13.14")
+
+import Dependencies._
+
 lazy val spot = project
   .in(file("./spot"))
+  .disablePlugins(AssemblyPlugin)
   .settings(
     name := "spot",
     libraryDependencies ++= Seq(
-      "org.apache.spark" %% "spark-core" % "3.5.1",
-      "io.opentelemetry" % "opentelemetry-api" % "1.37.0",
-      "io.opentelemetry" % "opentelemetry-sdk" % "1.37.0" % Runtime,
-      "io.opentelemetry" % "opentelemetry-sdk-extension-autoconfigure" % "1.34.0" % Optional,
+      `opentelemetry-api`,
+      `spark-core` % Provided
     ),
   )
+
+lazy val `spot-complete` = project
+  .in(file("./spot-complete"))
+  .dependsOn(spot)
+  .settings(
+    name := "spot-complete",
+    libraryDependencies ++= Seq(
+      `opentelemetry-sdk`,
+      `opentelemetry-sdk-autoconfigure`
+    ),
+    assembly / assemblyJarName := s"${name.value}_${scalaBinaryVersion.value}-${version.value}.jar",
+    assembly / assemblyOption ~= {
+      _.withIncludeScala(false)
+    }
+  )
+
+lazy val root = project
+  .in(file("."))
+  .aggregate(spot, `spot-complete`)
+  .disablePlugins(AssemblyPlugin)
+  .settings(
+    publish / skip := true,
+  )
```
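Given this build definition, producing both artifacts across the cross-build would presumably look like the following (standard sbt and sbt-assembly task names; a sketch, not taken from this commit):

```shell
# Plain spot JAR without dependencies, for each Scala version in crossScalaVersions:
sbt +spot/package

# Assembly JAR bundling the OpenTelemetry dependencies (Scala library excluded
# via withIncludeScala(false), since Spark provides it):
sbt +spot-complete/assembly
```

The `+` prefix runs the task for every version listed in `crossScalaVersions`, yielding the `_2.12` and `_2.13` artifacts.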
11 changes: 11 additions & 0 deletions project/Dependencies.scala
```scala
import sbt._

object Dependencies {
  private[this] val openTelemetryVersion = "1.39.0"
  private[this] val openTelemetryAutoConf = "1.38.0"

  val `opentelemetry-api` = "io.opentelemetry" % "opentelemetry-api" % openTelemetryVersion
  val `opentelemetry-sdk` = "io.opentelemetry" % "opentelemetry-sdk" % openTelemetryVersion
  val `opentelemetry-sdk-autoconfigure` = "io.opentelemetry" % "opentelemetry-sdk-extension-autoconfigure" % openTelemetryAutoConf
  val `spark-core` = "org.apache.spark" %% "spark-core" % "3.5.1"
}
```
2 changes: 2 additions & 0 deletions project/plugins.sbt
```scala
addSbtPlugin("com.github.sbt" % "sbt-dynver" % "5.0.1")
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.2.0")
```
1 change: 1 addition & 0 deletions spot-complete/README.md
The `spot-complete` sbt project packages Spot and its dependencies as a single JAR file.
1 change: 1 addition & 0 deletions spot/README.md
The `spot` sbt project packages the Spark-OpenTelemetry listener on its own.
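For orientation, a Spark listener that publishes telemetry has roughly the following shape. This is a hypothetical sketch: the class name `ExampleTelemetryListener` and its body are illustrative, and Spot's actual `TelemetrySparkListener` implementation is not part of this diff.

```scala
import io.opentelemetry.api.GlobalOpenTelemetry
import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd, SparkListenerApplicationStart}

// Hypothetical sketch of a telemetry listener; not Spot's real implementation.
class ExampleTelemetryListener extends SparkListener {
  private val tracer = GlobalOpenTelemetry.getTracer("com.example.spot-sketch")

  override def onApplicationStart(event: SparkListenerApplicationStart): Unit = {
    // A real listener would keep this span open and record job/stage events on it.
    val span = tracer.spanBuilder(event.appName).startSpan()
    span.end()
  }

  override def onApplicationEnd(event: SparkListenerApplicationEnd): Unit = ()
}
```

Spark instantiates any class named in `spark.extraListeners` reflectively on the driver, which is why the README enables Spot purely through configuration.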