Commit fe361a0 (1 parent: 06b9fa9)
Showing 5 changed files with 59 additions and 48 deletions.
@@ -3,25 +3,31 @@
 ![sparkMeasure CI](https://github.com/LucaCanali/sparkMeasure/workflows/sparkMeasure%20CI/badge.svg?branch=master&event=push)
 [![Maven Central](https://maven-badges.herokuapp.com/maven-central/ch.cern.sparkmeasure/spark-measure_2.11/badge.svg)](https://maven-badges.herokuapp.com/maven-central/ch.cern.sparkmeasure/spark-measure_2.11)
 
-### SparkMeasure is a tool for performance troubleshooting of Apache Spark workloads
-SparkMeasure simplifies the collection and analysis of Spark performance metrics.
-Use sparkMeasure for troubleshooting **interactive and batch** Spark workloads.
-Use it also to collect metrics for long-term retention or as part of a **CI/CD** pipeline.
+### SparkMeasure is a tool for performance troubleshooting of Apache Spark jobs
+SparkMeasure simplifies the collection and analysis of Spark performance metrics.
+Use sparkMeasure for troubleshooting **interactive and batch** Spark workloads.
+Use it also to collect metrics for long-term retention or as part of a **CI/CD** pipeline.
 SparkMeasure is also intended as a working example of how to use Spark Listeners for collecting Spark task metrics data.
-* Main author and contact: [email protected] + credits to [email protected] + thanks to PR contributors
-* Compatibility: Spark 2.1.x and higher.
-
-### Getting started with sparkMeasure, by example
-* How to use: deploy [sparkMeasure from Maven Central](https://mvnrepository.com/artifact/ch.cern.sparkmeasure/spark-measure)
-  - Spark 2.x built with scala 2_11:
-    - Scala: `bin/spark-shell --packages ch.cern.sparkmeasure:spark-measure_2.11:0.16`
-    - Python: `bin/pyspark --packages ch.cern.sparkmeasure:spark-measure_2.11:0.16`
-    - note: `pip install sparkmeasure` to get the Python wrapper API.
-  - Spark 3.0.x and 2.4.x built with scala 2_12:
-    - Scala: `bin/spark-shell --packages ch.cern.sparkmeasure:spark-measure_2.12:0.16`
-    - Python: `bin/pyspark --packages ch.cern.sparkmeasure:spark-measure_2.12:0.16`
-    - note: `pip install sparkmeasure` to get the Python wrapper API.
-  - Bleeding edge: build from master using sbt: `sbt +package` and use the jars instead of packages.
+* Main author and contact:
+  * [email protected] + credits to [email protected] + thanks to PR contributors
+* For Spark 2.x and 3.x
+  * Tested on Spark 2.4 and 3.0
+  * Spark 2.3 -> should also be OK
+  * Spark 2.1 and 2.2 -> use sparkMeasure version 0.16
+
+### Getting started with sparkMeasure
+* Note: sparkMeasure is available on [Maven Central](https://mvnrepository.com/artifact/ch.cern.sparkmeasure/spark-measure)
+* Spark 3.0.x and 2.4.x with scala 2.12:
+  - Scala: `bin/spark-shell --packages ch.cern.sparkmeasure:spark-measure_2.12:0.17`
+  - Python: `bin/pyspark --packages ch.cern.sparkmeasure:spark-measure_2.12:0.17`
+  - note: `pip install sparkmeasure` to get the Python wrapper API.
+* Spark 2.x with Scala 2.11:
+  - Scala: `bin/spark-shell --packages ch.cern.sparkmeasure:spark-measure_2.11:0.17`
+  - Python: `bin/pyspark --packages ch.cern.sparkmeasure:spark-measure_2.11:0.17`
+  - note: `pip install sparkmeasure` to get the Python wrapper API.
+* Bleeding edge: build sparkMeasure jar using sbt: `sbt +package` and use `--jars`
+  with the jar just built instead of using `--packages`.
+* Note: find the latest jars already built as artifacts in the [GitHub actions](https://github.com/LucaCanali/sparkMeasure/actions)
 
 - [<img src="https://upload.wikimedia.org/wikipedia/commons/6/63/Databricks_Logo.png" height="40"> Scala notebook on Databricks](https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2061385495597958/2729765977711377/442806354506758/latest.html)
 
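The "bleeding edge" bullet above builds sparkMeasure from source and attaches the jar with `--jars` instead of `--packages`. A minimal sketch of that workflow, assuming a clone of the repository and the default sbt output layout (the exact jar file name depends on the Scala version and on the version set in the build):
```
# Build sparkMeasure from master; sbt +package produces jars for each cross-built Scala version
git clone https://github.com/LucaCanali/sparkMeasure
cd sparkMeasure
sbt +package

# From the Spark home directory, attach the freshly built jar instead of using --packages
# (adjust the path and file name to what sbt actually produced, e.g. under target/scala-2.12)
bin/spark-shell --jars <path_to_sparkMeasure>/target/scala-2.12/spark-measure_2.12-0.17.jar
```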
@@ -33,10 +39,10 @@ SparkMeasure is also intended as a working example of how to use Spark Listeners
 
 - [<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/3/38/Jupyter_logo.svg/250px-Jupyter_logo.svg.png" height="50"> Local Python/Jupyter Notebook](examples/SparkMeasure_Jupyter_Python_getting_started.ipynb)
 
-- CLI: spark-shell and pyspark
+- CLI: spark-shell and PySpark
 ```
 # Scala CLI, Spark 3.0
-bin/spark-shell --packages ch.cern.sparkmeasure:spark-measure_2.12:0.16
+bin/spark-shell --packages ch.cern.sparkmeasure:spark-measure_2.12:0.17
 val stageMetrics = ch.cern.sparkmeasure.StageMetrics(spark)
 stageMetrics.runAndMeasure(spark.sql("select count(*) from range(1000) cross join range(1000) cross join range(1000)").show())
 
@@ -57,32 +63,43 @@ Spark Context default degree of parallelism = 8
 Aggregated Spark stage metrics:
 numStages => 3
 numTasks => 17
-elapsedTime => 14594 (15 s)
-stageDuration => 14498 (14 s)
-executorRunTime => 108563 (1.8 min)
-executorCpuTime => 106613 (1.8 min)
-executorDeserializeTime => 4149 (4 s)
-executorDeserializeCpuTime => 1025 (1 s)
-resultSerializationTime => 1 (1 ms)
-jvmGCTime => 64 (64 ms)
+elapsedTime => 13520 (14 s)
+stageDuration => 13411 (13 s)
+executorRunTime => 100020 (1.7 min)
+executorCpuTime => 98899 (1.6 min)
+executorDeserializeTime => 4358 (4 s)
+executorDeserializeCpuTime => 1887 (2 s)
+resultSerializationTime => 2 (2 ms)
+jvmGCTime => 56 (56 ms)
 shuffleFetchWaitTime => 0 (0 ms)
-shuffleWriteTime => 15 (15 ms)
+shuffleWriteTime => 11 (11 ms)
 resultSize => 19955 (19.0 KB)
 numUpdatedBlockStatuses => 0
 diskBytesSpilled => 0 (0 Bytes)
 memoryBytesSpilled => 0 (0 Bytes)
 peakExecutionMemory => 0
 recordsRead => 2000
 bytesRead => 0 (0 Bytes)
 recordsWritten => 0
 bytesWritten => 0 (0 Bytes)
-shuffleTotalBytesRead => 472 (472 Bytes)
 shuffleRecordsRead => 8
 shuffleTotalBlocksFetched => 8
 shuffleLocalBlocksFetched => 8
 shuffleRemoteBlocksFetched => 0
+shuffleTotalBytesRead => 472 (472 Bytes)
+shuffleLocalBytesRead => 472 (472 Bytes)
+shuffleRemoteBytesRead => 0 (0 Bytes)
+shuffleRemoteBytesReadToDisk => 0 (0 Bytes)
 shuffleBytesWritten => 472 (472 Bytes)
 shuffleRecordsWritten => 8
 ```
+- CLI: spark-shell, measure workload metrics aggregating from raw task metrics
+```
+# Scala CLI, Spark 3.0
+bin/spark-shell --packages ch.cern.sparkmeasure:spark-measure_2.12:0.17
+val taskMetrics = ch.cern.sparkmeasure.TaskMetrics(spark)
+taskMetrics.runAndMeasure(spark.sql("select count(*) from range(1000) cross join range(1000) cross join range(1000)").show())
+```
+
 ### One tool for different use cases, links to documentation and examples
 * **Interactive mode**: use sparkMeasure to collect and analyze Spark workload metrics real-time when
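The spark-shell examples above use `runAndMeasure`, which measures a single block of code. For instrumenting an arbitrary sequence of interactive commands, stage metrics collection can also be started and stopped explicitly; the sketch below assumes the `begin()`, `end()` and `printReport()` methods of `StageMetrics` described in the project documentation, and reuses one of the queries above as an example workload.
```
// Scala CLI: spark-shell started with the sparkMeasure package, as in the examples above
val stageMetrics = ch.cern.sparkmeasure.StageMetrics(spark)

// start collecting stage metrics
stageMetrics.begin()

// run the workload to be measured (any sequence of actions works here)
spark.sql("select count(*) from range(1000) cross join range(1000)").show()

// stop collecting and print the aggregated report
stageMetrics.end()
stageMetrics.printReport()
```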