
[Bug]: Issue with Running Hop Pipeline on Beam Spark Pipeline Engine #4752

Open

Raja10D opened this issue Jan 2, 2025 · 7 comments

@Raja10D commented Jan 2, 2025

Apache Hop version?

2.11.0

Java version?

17

Operating system

Linux

What happened?

I am encountering an issue while running a simple Hop pipeline with the Beam Spark pipeline engine. The pipeline executes without problems on the default local runner, but fails when I switch to the Beam Spark engine.

Below are the details of my setup:

Apache Beam Version: 2.61.0
Apache Spark Version: 3.5.1
Apache Hop Version: apache-hop-client-2.11.0
The error logs are as follows:

Error:
2025/01/02 15:30:53 - Hop - Pipeline opened.
2025/01/02 15:30:53 - Hop - Launching pipeline [hop_trans2]...
2025/01/02 15:30:53 - Hop - Started the pipeline execution.
java.lang.NoClassDefFoundError: org/apache/spark/metrics/source/Source
at java.base/java.lang.Class.getDeclaredMethods0(Native Method)
...
2025/01/02 15:30:53 - Hop - ERROR: hop_trans2: preparing pipeline execution failed
2025/01/02 15:30:53 - Hop - ERROR: org.apache.hop.core.exception.HopException:
2025/01/02 15:30:53 - Hop - Error preparing remote pipeline
2025/01/02 15:30:53 - Hop - Error converting Hop pipeline to Beam
2025/01/02 15:30:53 - Hop -
2025/01/02 15:30:53 - Hop - Caused by: java.lang.NoClassDefFoundError: org/apache/spark/metrics/source/Source
2025/01/02 15:30:53 - Hop - at java.base/java.lang.Class.getDeclaredMethods0(Native Method)
...
It appears that the issue is related to the org.apache.spark.metrics.source.Source class not being found.
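A quick way to confirm this is a classpath problem (a hedged sketch; the spark-core jar name and Scala suffix below are examples, adjust them to your install):

# verify the jar that normally ships the class actually contains it
jar tf $SPARK_HOME/jars/spark-core_2.12-3.5.1.jar | grep 'org/apache/spark/metrics/source/Source'
# when launching from Hop itself rather than via spark-submit, check whether any Spark jars are on Hop's classpath at all
find /home/decoders/Downloads/apache-hop-client-2.11.0/hop/lib -name 'spark-*.jar'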

Issue Priority

Priority: 3

Issue Component

Component: Hop Config

@hansva (Contributor) commented Jan 3, 2025

Have you tried the steps from our guide, and do you still get this error?

https://hop.apache.org/manual/latest/pipeline/beam/beam-samples-spark.html#_get_spark

It seems like a jar might be missing, but it could also be a mismatch between the Spark/Scala versions used by Hop and your cluster.
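A quick way to compare those versions (a hedged sketch; the directory layout matches the client install mentioned in this thread):

# Spark and Scala versions of the local client/cluster
spark-submit --version
# the Spark major version the Hop Beam runner targets is encoded in the jar name
ls /home/decoders/Downloads/apache-hop-client-2.11.0/hop/lib/beam/ | grep beam-runners-spark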

@Raja10D (Author) commented Jan 3, 2025 via email

@Raja10D (Author) commented Jan 3, 2025

decoders@decoders-Latitude-7480:~/Downloads/apache-hop-client-2.11.0/hop$ spark-submit \
  --class org.apache.hop.beam.run.MainBeam \
  --master local[4] \
  --jars /home/decoders/Downloads/apache-hop-client-2.11.0/hop/lib/beam/beam-runners-spark-3-2.61.0.jar \
  /home/decoders/Downloads/apache-hop-client-2.11.0/fatjar.jar \
  --runner=SparkRunner \
  --sparkMaster=local[4] \
  --tempLocation=/home/decoders/custom/temp \
  --pipeline=/home/decoders/Downloads/hop_trans2.hpl \
  --config=/home/decoders/Downloads/apache-hop-client-2.11.0/hop/config/hop-config.json
25/01/03 17:26:34 WARN Utils: Your hostname, decoders-Latitude-7480 resolves to a loopback address: 127.0.1.1; using 192.125.1.140 instead (on interface enp0s31f6)
25/01/03 17:26:34 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
25/01/03 17:26:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Initializing Hop
Error running Beam pipeline:
Error reading metadata from JSON

Error find metadata class for key null

The metadata plugin for key null could not be found in the plugin registry

org.apache.hop.core.exception.HopException:
Error reading metadata from JSON

Error find metadata class for key null

The metadata plugin for key null could not be found in the plugin registry

at org.apache.hop.core.metadata.SerializableMetadataProvider.<init>(SerializableMetadataProvider.java:143)
at org.apache.hop.beam.run.MainBeam.main(MainBeam.java:101)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:569)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1034)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:199)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:222)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1125)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1134)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Caused by: org.apache.hop.core.exception.HopException:
Error find metadata class for key null

The metadata plugin for key null could not be found in the plugin registry

at org.apache.hop.metadata.serializer.BaseMetadataProvider.getMetadataClassForKey(BaseMetadataProvider.java:69)
at org.apache.hop.core.metadata.SerializableMetadataProvider.<init>(SerializableMetadataProvider.java:122)
... 13 more

Caused by: org.apache.hop.core.exception.HopException:
The metadata plugin for key null could not be found in the plugin registry

at org.apache.hop.metadata.serializer.BaseMetadataProvider.getMetadataClassForKey(BaseMetadataProvider.java:61)
... 14 more

25/01/03 17:26:51 INFO ShutdownHookManager: Shutdown hook called
25/01/03 17:26:51 INFO ShutdownHookManager: Deleting directory /tmp/spark-f18e078a-acef-43c4-9b67-72880678ee49
decoders@decoders-Latitude-7480:~/Downloads/apache-hop-client-2.11.0/hop$

This is the issue we are facing; we are following the documentation you shared.
Is there any way to debug this issue?
Can you give suggestions for running on Spark?

@hansva (Contributor) commented Jan 3, 2025

seems to be working here...

Downloads/spark-3.5.4-bin-hadoop3/bin on ☁️  (eu-west-1) on ☁️  [email protected] took 16s
❯ ./spark-submit \
  --master spark://MacBook-Hans.local:7077 \
  --class org.apache.hop.beam.run.MainBeam \
  --driver-java-options '-DPROJECT_HOME=/Users/hans/git/hop/assemblies/client/target/hop/config/projects/samples' \
  /Users/hans/git/hop/assemblies/client/target/hop/fat-jar.jar \
 /Users/hans/git/hop/assemblies/client/target/hop/config/projects/samples/beam/pipelines/input-process-output.hpl \
  /Users/hans/git/hop/assemblies/client/target/hop/metadata.json \
  Spark
25/01/03 13:16:56 WARN Utils: Your hostname, MacBook-Hans.local resolves to a loopback address: 127.0.0.1; using 192.168.86.198 instead (on interface en0)
25/01/03 13:16:56 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
>>>>>> Initializing Hop
Hop configuration file not found, not serializing: /Users/hans/Downloads/spark-3.5.4-bin-hadoop3/bin/config/hop-config.json
Argument 1 : Pipeline filename (.hpl)   : /Users/hans/git/hop/assemblies/client/target/hop/config/projects/samples/beam/pipelines/input-process-output.hpl
Argument 2 : Environment state filename: (.json)  : /Users/hans/git/hop/assemblies/client/target/hop/metadata.json
Argument 3 : Pipeline run configuration : Spark
>>>>>> Loading pipeline metadata
>>>>>> Building Apache Beam Pipeline...
>>>>>> Pipeline executing starting...
2025/01/03 13:17:01 - General - Created Apache Beam pipeline with name 'input-process-output'
2025/01/03 13:17:01 - General - Handled transform (INPUT) : Customers
2025/01/03 13:17:01 - General - Handled generic transform (TRANSFORM) : Only CA, gets data from 1 previous transform(s), targets=0, infos=0
2025/01/03 13:17:01 - General - Handled generic transform (TRANSFORM) : Limit fields, re-order, gets data from 1 previous transform(s), targets=0, infos=0
2025/01/03 13:17:01 - General - Handled transform (OUTPUT) : input-process-output, gets data from Limit fields, re-order
2025/01/03 13:17:01 - General - Executing this pipeline using the Beam Pipeline Engine with run configuration 'Spark'
25/01/03 13:17:01 INFO SparkRunner: Executing pipeline using the SparkRunner.
25/01/03 13:17:01 INFO SparkContextFactory: Creating a brand new Spark Context.
25/01/03 13:17:01 INFO SparkContext: Running Spark version 3.5.4
25/01/03 13:17:01 INFO SparkContext: OS info Mac OS X, 15.2, aarch64
25/01/03 13:17:01 INFO SparkContext: Java version 17.0.11
25/01/03 13:17:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
25/01/03 13:17:02 INFO ResourceUtils: ==============================================================
25/01/03 13:17:02 INFO ResourceUtils: No custom resources configured for spark.driver.
25/01/03 13:17:02 INFO ResourceUtils: ==============================================================
25/01/03 13:17:02 INFO SparkContext: Submitted application: BeamSparkPipelineRunConfiguration
25/01/03 13:17:02 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
25/01/03 13:17:02 INFO ResourceProfile: Limiting resource is cpu
25/01/03 13:17:02 INFO ResourceProfileManager: Added ResourceProfile id: 0
25/01/03 13:17:02 INFO SecurityManager: Changing view acls to: hans
25/01/03 13:17:02 INFO SecurityManager: Changing modify acls to: hans
25/01/03 13:17:02 INFO SecurityManager: Changing view acls groups to:
25/01/03 13:17:02 INFO SecurityManager: Changing modify acls groups to:
25/01/03 13:17:02 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: hans; groups with view permissions: EMPTY; users with modify permissions: hans; groups with modify permissions: EMPTY
25/01/03 13:17:02 INFO Utils: Successfully started service 'sparkDriver' on port 55057.
25/01/03 13:17:02 INFO SparkEnv: Registering MapOutputTracker
25/01/03 13:17:02 INFO SparkEnv: Registering BlockManagerMaster
25/01/03 13:17:02 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
25/01/03 13:17:02 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
25/01/03 13:17:02 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/01/03 13:17:02 INFO DiskBlockManager: Created local directory at /private/var/folders/v0/hjm6kpgd0076ncdkpf8rg3dw0000gn/T/blockmgr-58d26b5c-d34a-4682-8fe7-b5beb2ab0873
25/01/03 13:17:02 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
25/01/03 13:17:02 INFO SparkEnv: Registering OutputCommitCoordinator
25/01/03 13:17:02 INFO JettyUtils: Start Jetty 0.0.0.0:4040 for SparkUI
25/01/03 13:17:02 INFO Utils: Successfully started service 'SparkUI' on port 4040.
25/01/03 13:17:02 INFO SparkContext: Added JAR /Users/hans/git/hop/assemblies/client/target/hop/fat-jar.jar at spark://192.168.86.198:55057/jars/fat-jar.jar with timestamp 1735906621944
25/01/03 13:17:02 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://MacBook-Hans.local:7077...
25/01/03 13:17:02 INFO TransportClientFactory: Successfully created connection to MacBook-Hans.local/127.0.0.1:7077 after 14 ms (0 ms spent in bootstraps)
25/01/03 13:17:02 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20250103131702-0002
25/01/03 13:17:02 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20250103131702-0002/0 on worker-20250103125214-192.168.86.198-54182 (192.168.86.198:54182) with 12 core(s)
25/01/03 13:17:02 INFO StandaloneSchedulerBackend: Granted executor ID app-20250103131702-0002/0 on hostPort 192.168.86.198:54182 with 12 core(s), 1024.0 MiB RAM
25/01/03 13:17:02 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 55059.
25/01/03 13:17:02 INFO NettyBlockTransferService: Server created on 192.168.86.198:55059
25/01/03 13:17:02 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
25/01/03 13:17:02 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.86.198, 55059, None)
25/01/03 13:17:02 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.86.198:55059 with 434.4 MiB RAM, BlockManagerId(driver, 192.168.86.198, 55059, None)
25/01/03 13:17:02 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.86.198, 55059, None)
25/01/03 13:17:02 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.86.198, 55059, None)
25/01/03 13:17:02 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20250103131702-0002/0 is now RUNNING
25/01/03 13:17:02 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
25/01/03 13:17:02 INFO SparkRunner$Evaluator: Entering directly-translatable composite transform: 'BeamOutputTransform/TextIO.Write/WriteFiles/GatherTempFileResults/Consolidate/Reshuffle'
25/01/03 13:17:02 INFO SparkRunner$Evaluator: Entering directly-translatable composite transform: 'BeamOutputTransform/TextIO.Write/WriteFiles/FinalizeTempFileBundles/Reshuffle.ViaRandomKey/Reshuffle'
25/01/03 13:17:02 INFO MetricsAccumulator: Instantiated metrics accumulator: MetricQueryResults()
25/01/03 13:17:02 INFO SparkRunner$Evaluator: Evaluating Read(CompressedSource)
25/01/03 13:17:02 INFO SparkRunner$Evaluator: Evaluating org.apache.hop.beam.core.fn.StringToHopFn@7aa06fac
25/01/03 13:17:02 INFO SparkRunner$Evaluator: Evaluating org.apache.hop.beam.core.transform.TransformFn@2ea82cef
25/01/03 13:17:02 INFO SparkRunner$Evaluator: Evaluating org.apache.hop.beam.core.transform.TransformFn@5cea34e6
25/01/03 13:17:02 INFO SparkRunner$Evaluator: Evaluating org.apache.hop.beam.core.fn.HopToStringFn@526fbb80
25/01/03 13:17:02 INFO SparkRunner$Evaluator: Evaluating Window.Assign
25/01/03 13:17:02 INFO SparkRunner$Evaluator: Evaluating org.apache.beam.sdk.io.WriteFiles$WriteUnshardedTempFilesFn@2162e4a
25/01/03 13:17:02 INFO SparkRunner$Evaluator: Evaluating GroupByKey
25/01/03 13:17:02 INFO FileBasedSource: Splitting filepattern /Users/hans/git/hop/assemblies/client/target/hop/config/projects/samples/beam/input/customers-noheader-1k.txt into bundles of size 50000 took 3 ms and produced 1 files and 2 bundles
25/01/03 13:17:02 INFO SparkRunner$Evaluator: Evaluating org.apache.beam.sdk.io.WriteFiles$WriteShardsIntoTempFilesFn@3199c2c1
25/01/03 13:17:02 INFO SparkRunner$Evaluator: Evaluating org.apache.beam.sdk.io.WriteFiles$WriteUnshardedBundlesToTempFiles$1@7ae736a8
25/01/03 13:17:02 INFO SparkRunner$Evaluator: Evaluating Flatten.PCollections
25/01/03 13:17:02 INFO SparkRunner$Evaluator: Evaluating org.apache.beam.sdk.transforms.Reshuffle$AssignShardFn@4623c0d3
25/01/03 13:17:02 INFO SparkRunner$Evaluator: Entering directly-translatable composite transform: 'BeamOutputTransform/TextIO.Write/WriteFiles/GatherTempFileResults/Consolidate/Reshuffle'
25/01/03 13:17:02 INFO SparkRunner$Evaluator: Evaluating Reshuffle
25/01/03 13:17:03 INFO SparkRunner$Evaluator: Evaluating org.apache.beam.sdk.transforms.MapElements$2@2182ebc7
25/01/03 13:17:03 INFO SparkRunner$Evaluator: Evaluating org.apache.beam.sdk.transforms.MapElements$2@447cdbaa
25/01/03 13:17:03 INFO SparkRunner$Evaluator: Evaluating View.CreatePCollectionView
25/01/03 13:17:03 INFO SparkContext: Starting job: collect at BoundedDataset.java:96
25/01/03 13:17:03 INFO DAGScheduler: Registering RDD 20 (mapToPair at GroupNonMergingWindowsFunctions.java:273) as input to shuffle 1
25/01/03 13:17:03 INFO DAGScheduler: Registering RDD 37 (repartition at GroupCombineFunctions.java:191) as input to shuffle 0
25/01/03 13:17:03 INFO DAGScheduler: Got job 0 (collect at BoundedDataset.java:96) with 4 output partitions
25/01/03 13:17:03 INFO DAGScheduler: Final stage: ResultStage 2 (collect at BoundedDataset.java:96)
25/01/03 13:17:03 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 1)
25/01/03 13:17:03 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 1)
25/01/03 13:17:03 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[20] at mapToPair at GroupNonMergingWindowsFunctions.java:273), which has no missing parents
25/01/03 13:17:03 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 215.7 KiB, free 434.2 MiB)
25/01/03 13:17:03 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 69.0 KiB, free 434.1 MiB)
25/01/03 13:17:03 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.86.198:55059 (size: 69.0 KiB, free: 434.3 MiB)
25/01/03 13:17:03 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1585
25/01/03 13:17:03 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[20] at mapToPair at GroupNonMergingWindowsFunctions.java:273) (first 15 tasks are for partitions Vector(0, 1))
25/01/03 13:17:03 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks resource profile 0
25/01/03 13:17:03 INFO StandaloneSchedulerBackend$StandaloneDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.86.198:55061) with ID 0,  ResourceProfileId 0
25/01/03 13:17:03 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.86.198:55063 with 434.4 MiB RAM, BlockManagerId(0, 192.168.86.198, 55063, None)
25/01/03 13:17:05 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (192.168.86.198, executor 0, partition 0, PROCESS_LOCAL, 10914 bytes)
25/01/03 13:17:05 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1) (192.168.86.198, executor 0, partition 1, PROCESS_LOCAL, 10914 bytes)
25/01/03 13:17:05 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.86.198:55063 (size: 69.0 KiB, free: 434.3 MiB)
25/01/03 13:17:10 INFO BlockManagerInfo: Added rdd_13_1 in memory on 192.168.86.198:55063 (size: 616.0 B, free: 434.3 MiB)
25/01/03 13:17:10 INFO BlockManagerInfo: Added rdd_13_0 in memory on 192.168.86.198:55063 (size: 616.0 B, free: 434.3 MiB)
25/01/03 13:17:10 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 5259 ms on 192.168.86.198 (executor 0) (1/2)
25/01/03 13:17:10 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 5267 ms on 192.168.86.198 (executor 0) (2/2)
25/01/03 13:17:10 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
25/01/03 13:17:10 INFO DAGScheduler: ShuffleMapStage 0 (mapToPair at GroupNonMergingWindowsFunctions.java:273) finished in 7,352 s
25/01/03 13:17:10 INFO DAGScheduler: looking for newly runnable stages
25/01/03 13:17:10 INFO DAGScheduler: running: Set()
25/01/03 13:17:10 INFO DAGScheduler: waiting: Set(ShuffleMapStage 1, ResultStage 2)
25/01/03 13:17:10 INFO DAGScheduler: failed: Set()
25/01/03 13:17:10 INFO DAGScheduler: Submitting ShuffleMapStage 1 (MapPartitionsRDD[37] at repartition at GroupCombineFunctions.java:191), which has no missing parents
25/01/03 13:17:10 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 225.4 KiB, free 433.9 MiB)
25/01/03 13:17:10 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 72.3 KiB, free 433.8 MiB)
25/01/03 13:17:10 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.86.198:55059 (size: 72.3 KiB, free: 434.3 MiB)
25/01/03 13:17:10 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1585
25/01/03 13:17:10 INFO DAGScheduler: Submitting 4 missing tasks from ShuffleMapStage 1 (MapPartitionsRDD[37] at repartition at GroupCombineFunctions.java:191) (first 15 tasks are for partitions Vector(0, 1, 2, 3))
25/01/03 13:17:10 INFO TaskSchedulerImpl: Adding task set 1.0 with 4 tasks resource profile 0
25/01/03 13:17:10 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2) (192.168.86.198, executor 0, partition 0, PROCESS_LOCAL, 11023 bytes)
25/01/03 13:17:10 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3) (192.168.86.198, executor 0, partition 1, PROCESS_LOCAL, 11023 bytes)
25/01/03 13:17:10 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 4) (192.168.86.198, executor 0, partition 2, PROCESS_LOCAL, 9091 bytes)
25/01/03 13:17:10 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 5) (192.168.86.198, executor 0, partition 3, PROCESS_LOCAL, 9091 bytes)
25/01/03 13:17:10 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.86.198:55063 (size: 72.3 KiB, free: 434.3 MiB)
25/01/03 13:17:10 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 1 to 192.168.86.198:55061
25/01/03 13:17:10 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 85 ms on 192.168.86.198 (executor 0) (1/4)
25/01/03 13:17:10 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 84 ms on 192.168.86.198 (executor 0) (2/4)
25/01/03 13:17:10 INFO BlockManagerInfo: Added rdd_24_1 in memory on 192.168.86.198:55063 (size: 16.0 B, free: 434.3 MiB)
25/01/03 13:17:10 INFO BlockManagerInfo: Added rdd_24_0 in memory on 192.168.86.198:55063 (size: 16.0 B, free: 434.3 MiB)
25/01/03 13:17:10 INFO TaskSetManager: Finished task 3.0 in stage 1.0 (TID 5) in 499 ms on 192.168.86.198 (executor 0) (3/4)
25/01/03 13:17:10 INFO TaskSetManager: Finished task 2.0 in stage 1.0 (TID 4) in 499 ms on 192.168.86.198 (executor 0) (4/4)
25/01/03 13:17:10 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
25/01/03 13:17:10 INFO DAGScheduler: ShuffleMapStage 1 (repartition at GroupCombineFunctions.java:191) finished in 0,514 s
25/01/03 13:17:10 INFO DAGScheduler: looking for newly runnable stages
25/01/03 13:17:10 INFO DAGScheduler: running: Set()
25/01/03 13:17:10 INFO DAGScheduler: waiting: Set(ResultStage 2)
25/01/03 13:17:10 INFO DAGScheduler: failed: Set()
25/01/03 13:17:10 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[48] at map at BoundedDataset.java:95), which has no missing parents
25/01/03 13:17:10 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 21.4 KiB, free 433.8 MiB)
25/01/03 13:17:10 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 9.0 KiB, free 433.8 MiB)
25/01/03 13:17:10 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.86.198:55059 (size: 9.0 KiB, free: 434.3 MiB)
25/01/03 13:17:10 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1585
25/01/03 13:17:10 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 2 (MapPartitionsRDD[48] at map at BoundedDataset.java:95) (first 15 tasks are for partitions Vector(0, 1, 2, 3))
25/01/03 13:17:10 INFO TaskSchedulerImpl: Adding task set 2.0 with 4 tasks resource profile 0
25/01/03 13:17:10 INFO TaskSetManager: Starting task 1.0 in stage 2.0 (TID 6) (192.168.86.198, executor 0, partition 1, NODE_LOCAL, 9269 bytes)
25/01/03 13:17:10 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 7) (192.168.86.198, executor 0, partition 0, PROCESS_LOCAL, 9269 bytes)
25/01/03 13:17:10 INFO TaskSetManager: Starting task 2.0 in stage 2.0 (TID 8) (192.168.86.198, executor 0, partition 2, PROCESS_LOCAL, 9269 bytes)
25/01/03 13:17:10 INFO TaskSetManager: Starting task 3.0 in stage 2.0 (TID 9) (192.168.86.198, executor 0, partition 3, PROCESS_LOCAL, 9269 bytes)
25/01/03 13:17:10 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.86.198:55063 (size: 9.0 KiB, free: 434.3 MiB)
25/01/03 13:17:11 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 192.168.86.198:55061
25/01/03 13:17:11 INFO BlockManagerInfo: Added rdd_44_0 in memory on 192.168.86.198:55063 (size: 16.0 B, free: 434.3 MiB)
25/01/03 13:17:11 INFO BlockManagerInfo: Added rdd_44_3 in memory on 192.168.86.198:55063 (size: 16.0 B, free: 434.3 MiB)
25/01/03 13:17:11 INFO BlockManagerInfo: Added rdd_44_2 in memory on 192.168.86.198:55063 (size: 16.0 B, free: 434.3 MiB)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 3.0 in stage 2.0 (TID 9) in 62 ms on 192.168.86.198 (executor 0) (1/4)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 2.0 in stage 2.0 (TID 8) in 62 ms on 192.168.86.198 (executor 0) (2/4)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 7) in 63 ms on 192.168.86.198 (executor 0) (3/4)
25/01/03 13:17:11 INFO BlockManagerInfo: Added rdd_44_1 in memory on 192.168.86.198:55063 (size: 808.0 B, free: 434.3 MiB)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 1.0 in stage 2.0 (TID 6) in 80 ms on 192.168.86.198 (executor 0) (4/4)
25/01/03 13:17:11 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
25/01/03 13:17:11 INFO DAGScheduler: ResultStage 2 (collect at BoundedDataset.java:96) finished in 0,087 s
25/01/03 13:17:11 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
25/01/03 13:17:11 INFO TaskSchedulerImpl: Killing all running tasks in stage 2: Stage finished
25/01/03 13:17:11 INFO DAGScheduler: Job 0 finished: collect at BoundedDataset.java:96, took 8,010176 s
25/01/03 13:17:11 INFO SparkRunner$Evaluator: Evaluating Impulse
25/01/03 13:17:11 INFO SparkRunner$Evaluator: Evaluating org.apache.beam.sdk.transforms.MapElements$2@685e8e17
25/01/03 13:17:11 INFO SparkRunner$Evaluator: Evaluating org.apache.beam.sdk.transforms.Reify$ReifyView$1@5d6e77a4
25/01/03 13:17:11 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 448.0 B, free 433.8 MiB)
25/01/03 13:17:11 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 330.0 B, free 433.8 MiB)
25/01/03 13:17:11 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.86.198:55059 (size: 330.0 B, free: 434.3 MiB)
25/01/03 13:17:11 INFO SparkContext: Created broadcast 3 from broadcast at SideInputBroadcast.java:62
25/01/03 13:17:11 INFO SparkRunner$Evaluator: Evaluating org.apache.beam.sdk.transforms.MapElements$2@72139933
25/01/03 13:17:11 INFO SparkRunner$Evaluator: Evaluating org.apache.beam.sdk.transforms.MapElements$2@4f0b02a3
25/01/03 13:17:11 INFO SparkRunner$Evaluator: Evaluating org.apache.beam.sdk.io.WriteFiles$FinalizeTempFileBundles$FinalizeFn@280d1d8d
25/01/03 13:17:11 INFO SparkRunner$Evaluator: Evaluating org.apache.beam.sdk.transforms.Reshuffle$AssignShardFn@5abfb698
25/01/03 13:17:11 INFO SparkRunner$Evaluator: Entering directly-translatable composite transform: 'BeamOutputTransform/TextIO.Write/WriteFiles/FinalizeTempFileBundles/Reshuffle.ViaRandomKey/Reshuffle'
25/01/03 13:17:11 INFO SparkRunner$Evaluator: Evaluating Reshuffle
25/01/03 13:17:11 INFO SparkRunner$Evaluator: Evaluating org.apache.beam.sdk.transforms.MapElements$2@64a0a1c6
25/01/03 13:17:11 INFO SparkContext: Starting job: foreach at BoundedDataset.java:127
25/01/03 13:17:11 INFO DAGScheduler: Got job 1 (foreach at BoundedDataset.java:127) with 2 output partitions
25/01/03 13:17:11 INFO DAGScheduler: Final stage: ResultStage 3 (foreach at BoundedDataset.java:127)
25/01/03 13:17:11 INFO DAGScheduler: Parents of final stage: List()
25/01/03 13:17:11 INFO DAGScheduler: Missing parents: List()
25/01/03 13:17:11 INFO DAGScheduler: Submitting ResultStage 3 (BeamOutputTransform/TextIO.Write/WriteFiles/WriteUnshardedBundlesToTempFiles/WriteUnshardedBundles.out2 MapPartitionsRDD[19] at values at TransformTranslator.java:472), which has no missing parents
25/01/03 13:17:11 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 213.7 KiB, free 433.6 MiB)
25/01/03 13:17:11 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 68.1 KiB, free 433.5 MiB)
25/01/03 13:17:11 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 192.168.86.198:55059 (size: 68.1 KiB, free: 434.2 MiB)
25/01/03 13:17:11 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1585
25/01/03 13:17:11 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 3 (BeamOutputTransform/TextIO.Write/WriteFiles/WriteUnshardedBundlesToTempFiles/WriteUnshardedBundles.out2 MapPartitionsRDD[19] at values at TransformTranslator.java:472) (first 15 tasks are for partitions Vector(0, 1))
25/01/03 13:17:11 INFO TaskSchedulerImpl: Adding task set 3.0 with 2 tasks resource profile 0
25/01/03 13:17:11 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 10) (192.168.86.198, executor 0, partition 0, PROCESS_LOCAL, 10925 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID 11) (192.168.86.198, executor 0, partition 1, PROCESS_LOCAL, 10925 bytes)
25/01/03 13:17:11 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 192.168.86.198:55063 (size: 68.1 KiB, free: 434.2 MiB)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID 11) in 27 ms on 192.168.86.198 (executor 0) (1/2)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 10) in 28 ms on 192.168.86.198 (executor 0) (2/2)
25/01/03 13:17:11 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
25/01/03 13:17:11 INFO DAGScheduler: ResultStage 3 (foreach at BoundedDataset.java:127) finished in 0,033 s
25/01/03 13:17:11 INFO DAGScheduler: Job 1 is finished. Cancelling potential speculative or zombie tasks for this job
25/01/03 13:17:11 INFO TaskSchedulerImpl: Killing all running tasks in stage 3: Stage finished
25/01/03 13:17:11 INFO DAGScheduler: Job 1 finished: foreach at BoundedDataset.java:127, took 0,035680 s
25/01/03 13:17:11 INFO SparkContext: Starting job: foreach at BoundedDataset.java:127
25/01/03 13:17:11 INFO DAGScheduler: Got job 2 (foreach at BoundedDataset.java:127) with 2 output partitions
25/01/03 13:17:11 INFO DAGScheduler: Final stage: ResultStage 5 (foreach at BoundedDataset.java:127)
25/01/03 13:17:11 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 4)
25/01/03 13:17:11 INFO DAGScheduler: Missing parents: List()
25/01/03 13:17:11 INFO DAGScheduler: Submitting ResultStage 5 (BeamOutputTransform/TextIO.Write/WriteFiles/WriteUnshardedBundlesToTempFiles/WriteUnwritten.out1 MapPartitionsRDD[28] at values at TransformTranslator.java:472), which has no missing parents
25/01/03 13:17:11 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 221.9 KiB, free 433.3 MiB)
25/01/03 13:17:11 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 71.3 KiB, free 433.2 MiB)
25/01/03 13:17:11 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on 192.168.86.198:55059 (size: 71.3 KiB, free: 434.1 MiB)
25/01/03 13:17:11 INFO SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1585
25/01/03 13:17:11 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 5 (BeamOutputTransform/TextIO.Write/WriteFiles/WriteUnshardedBundlesToTempFiles/WriteUnwritten.out1 MapPartitionsRDD[28] at values at TransformTranslator.java:472) (first 15 tasks are for partitions Vector(0, 1))
25/01/03 13:17:11 INFO TaskSchedulerImpl: Adding task set 5.0 with 2 tasks resource profile 0
25/01/03 13:17:11 INFO TaskSetManager: Starting task 0.0 in stage 5.0 (TID 12) (192.168.86.198, executor 0, partition 0, PROCESS_LOCAL, 8993 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 1.0 in stage 5.0 (TID 13) (192.168.86.198, executor 0, partition 1, PROCESS_LOCAL, 8993 bytes)
25/01/03 13:17:11 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on 192.168.86.198:55063 (size: 71.3 KiB, free: 434.1 MiB)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 1.0 in stage 5.0 (TID 13) in 18 ms on 192.168.86.198 (executor 0) (1/2)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 0.0 in stage 5.0 (TID 12) in 18 ms on 192.168.86.198 (executor 0) (2/2)
25/01/03 13:17:11 INFO TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool
25/01/03 13:17:11 INFO DAGScheduler: ResultStage 5 (foreach at BoundedDataset.java:127) finished in 0,024 s
25/01/03 13:17:11 INFO DAGScheduler: Job 2 is finished. Cancelling potential speculative or zombie tasks for this job
25/01/03 13:17:11 INFO TaskSchedulerImpl: Killing all running tasks in stage 5: Stage finished
25/01/03 13:17:11 INFO DAGScheduler: Job 2 finished: foreach at BoundedDataset.java:127, took 0,027143 s
25/01/03 13:17:11 INFO SparkContext: Starting job: foreach at BoundedDataset.java:127
25/01/03 13:17:11 INFO DAGScheduler: Got job 3 (foreach at BoundedDataset.java:127) with 4 output partitions
25/01/03 13:17:11 INFO DAGScheduler: Final stage: ResultStage 8 (foreach at BoundedDataset.java:127)
25/01/03 13:17:11 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 7)
25/01/03 13:17:11 INFO DAGScheduler: Missing parents: List()
25/01/03 13:17:11 INFO DAGScheduler: Submitting ResultStage 8 (BeamOutputTransform/TextIO.Write/WriteFiles/GatherTempFileResults/View.AsIterable/MapElements/Map/ParMultiDo(Anonymous).output MapPartitionsRDD[47] at values at TransformTranslator.java:472), which has no missing parents
25/01/03 13:17:11 INFO MemoryStore: Block broadcast_6 stored as values in memory (estimated size 22.5 KiB, free 433.2 MiB)
25/01/03 13:17:11 INFO MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 9.4 KiB, free 433.2 MiB)
25/01/03 13:17:11 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on 192.168.86.198:55059 (size: 9.4 KiB, free: 434.1 MiB)
25/01/03 13:17:11 INFO SparkContext: Created broadcast 6 from broadcast at DAGScheduler.scala:1585
25/01/03 13:17:11 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 8 (BeamOutputTransform/TextIO.Write/WriteFiles/GatherTempFileResults/View.AsIterable/MapElements/Map/ParMultiDo(Anonymous).output MapPartitionsRDD[47] at values at TransformTranslator.java:472) (first 15 tasks are for partitions Vector(0, 1, 2, 3))
25/01/03 13:17:11 INFO TaskSchedulerImpl: Adding task set 8.0 with 4 tasks resource profile 0
25/01/03 13:17:11 INFO TaskSetManager: Starting task 0.0 in stage 8.0 (TID 14) (192.168.86.198, executor 0, partition 0, PROCESS_LOCAL, 9269 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 1.0 in stage 8.0 (TID 15) (192.168.86.198, executor 0, partition 1, PROCESS_LOCAL, 9269 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 2.0 in stage 8.0 (TID 16) (192.168.86.198, executor 0, partition 2, PROCESS_LOCAL, 9269 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 3.0 in stage 8.0 (TID 17) (192.168.86.198, executor 0, partition 3, PROCESS_LOCAL, 9269 bytes)
25/01/03 13:17:11 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on 192.168.86.198:55063 (size: 9.4 KiB, free: 434.1 MiB)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 2.0 in stage 8.0 (TID 16) in 17 ms on 192.168.86.198 (executor 0) (1/4)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 0.0 in stage 8.0 (TID 14) in 19 ms on 192.168.86.198 (executor 0) (2/4)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 3.0 in stage 8.0 (TID 17) in 18 ms on 192.168.86.198 (executor 0) (3/4)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 1.0 in stage 8.0 (TID 15) in 18 ms on 192.168.86.198 (executor 0) (4/4)
25/01/03 13:17:11 INFO TaskSchedulerImpl: Removed TaskSet 8.0, whose tasks have all completed, from pool
25/01/03 13:17:11 INFO DAGScheduler: ResultStage 8 (foreach at BoundedDataset.java:127) finished in 0,023 s
25/01/03 13:17:11 INFO DAGScheduler: Job 3 is finished. Cancelling potential speculative or zombie tasks for this job
25/01/03 13:17:11 INFO TaskSchedulerImpl: Killing all running tasks in stage 8: Stage finished
25/01/03 13:17:11 INFO DAGScheduler: Job 3 finished: foreach at BoundedDataset.java:127, took 0,027300 s
25/01/03 13:17:11 INFO SparkContext: Starting job: foreach at BoundedDataset.java:127
25/01/03 13:17:11 INFO DAGScheduler: Registering RDD 70 (repartition at GroupCombineFunctions.java:191) as input to shuffle 2
25/01/03 13:17:11 INFO DAGScheduler: Got job 4 (foreach at BoundedDataset.java:127) with 12 output partitions
25/01/03 13:17:11 INFO DAGScheduler: Final stage: ResultStage 10 (foreach at BoundedDataset.java:127)
25/01/03 13:17:11 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 9)
25/01/03 13:17:11 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 9)
25/01/03 13:17:11 INFO DAGScheduler: Submitting ShuffleMapStage 9 (MapPartitionsRDD[70] at repartition at GroupCombineFunctions.java:191), which has no missing parents
25/01/03 13:17:11 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 32.5 KiB, free 433.2 MiB)
25/01/03 13:17:11 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 13.1 KiB, free 433.2 MiB)
25/01/03 13:17:11 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on 192.168.86.198:55059 (size: 13.1 KiB, free: 434.1 MiB)
25/01/03 13:17:11 INFO SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:1585
25/01/03 13:17:11 INFO DAGScheduler: Submitting 12 missing tasks from ShuffleMapStage 9 (MapPartitionsRDD[70] at repartition at GroupCombineFunctions.java:191) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11))
25/01/03 13:17:11 INFO TaskSchedulerImpl: Adding task set 9.0 with 12 tasks resource profile 0
25/01/03 13:17:11 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 18) (192.168.86.198, executor 0, partition 0, PROCESS_LOCAL, 9160 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 1.0 in stage 9.0 (TID 19) (192.168.86.198, executor 0, partition 1, PROCESS_LOCAL, 9160 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 2.0 in stage 9.0 (TID 20) (192.168.86.198, executor 0, partition 2, PROCESS_LOCAL, 9160 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 3.0 in stage 9.0 (TID 21) (192.168.86.198, executor 0, partition 3, PROCESS_LOCAL, 9160 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 4.0 in stage 9.0 (TID 22) (192.168.86.198, executor 0, partition 4, PROCESS_LOCAL, 9160 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 5.0 in stage 9.0 (TID 23) (192.168.86.198, executor 0, partition 5, PROCESS_LOCAL, 9160 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 6.0 in stage 9.0 (TID 24) (192.168.86.198, executor 0, partition 6, PROCESS_LOCAL, 9160 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 7.0 in stage 9.0 (TID 25) (192.168.86.198, executor 0, partition 7, PROCESS_LOCAL, 9160 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 8.0 in stage 9.0 (TID 26) (192.168.86.198, executor 0, partition 8, PROCESS_LOCAL, 9160 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 9.0 in stage 9.0 (TID 27) (192.168.86.198, executor 0, partition 9, PROCESS_LOCAL, 9160 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 10.0 in stage 9.0 (TID 28) (192.168.86.198, executor 0, partition 10, PROCESS_LOCAL, 9160 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 11.0 in stage 9.0 (TID 29) (192.168.86.198, executor 0, partition 11, PROCESS_LOCAL, 9171 bytes)
25/01/03 13:17:11 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on 192.168.86.198:55063 (size: 13.1 KiB, free: 434.1 MiB)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 3.0 in stage 9.0 (TID 21) in 52 ms on 192.168.86.198 (executor 0) (1/12)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 10.0 in stage 9.0 (TID 28) in 50 ms on 192.168.86.198 (executor 0) (2/12)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 5.0 in stage 9.0 (TID 23) in 53 ms on 192.168.86.198 (executor 0) (3/12)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 9.0 in stage 9.0 (TID 27) in 51 ms on 192.168.86.198 (executor 0) (4/12)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 2.0 in stage 9.0 (TID 20) in 53 ms on 192.168.86.198 (executor 0) (5/12)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 8.0 in stage 9.0 (TID 26) in 51 ms on 192.168.86.198 (executor 0) (6/12)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 1.0 in stage 9.0 (TID 19) in 54 ms on 192.168.86.198 (executor 0) (7/12)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 7.0 in stage 9.0 (TID 25) in 53 ms on 192.168.86.198 (executor 0) (8/12)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 6.0 in stage 9.0 (TID 24) in 55 ms on 192.168.86.198 (executor 0) (9/12)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 4.0 in stage 9.0 (TID 22) in 55 ms on 192.168.86.198 (executor 0) (10/12)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 0.0 in stage 9.0 (TID 18) in 57 ms on 192.168.86.198 (executor 0) (11/12)
25/01/03 13:17:11 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.86.198:55063 (size: 330.0 B, free: 434.1 MiB)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 11.0 in stage 9.0 (TID 29) in 107 ms on 192.168.86.198 (executor 0) (12/12)
25/01/03 13:17:11 INFO TaskSchedulerImpl: Removed TaskSet 9.0, whose tasks have all completed, from pool
25/01/03 13:17:11 INFO DAGScheduler: ShuffleMapStage 9 (repartition at GroupCombineFunctions.java:191) finished in 0,115 s
25/01/03 13:17:11 INFO DAGScheduler: looking for newly runnable stages
25/01/03 13:17:11 INFO DAGScheduler: running: Set()
25/01/03 13:17:11 INFO DAGScheduler: waiting: Set(ResultStage 10)
25/01/03 13:17:11 INFO DAGScheduler: failed: Set()
25/01/03 13:17:11 INFO DAGScheduler: Submitting ResultStage 10 (BeamOutputTransform/TextIO.Write/WriteFiles/FinalizeTempFileBundles/Reshuffle.ViaRandomKey/Values/Values/Map/ParMultiDo(Anonymous).output MapPartitionsRDD[77] at values at TransformTranslator.java:472), which has no missing parents
25/01/03 13:17:11 INFO MemoryStore: Block broadcast_8 stored as values in memory (estimated size 21.2 KiB, free 433.1 MiB)
25/01/03 13:17:11 INFO MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 8.9 KiB, free 433.1 MiB)
25/01/03 13:17:11 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on 192.168.86.198:55059 (size: 8.9 KiB, free: 434.1 MiB)
25/01/03 13:17:11 INFO SparkContext: Created broadcast 8 from broadcast at DAGScheduler.scala:1585
25/01/03 13:17:11 INFO DAGScheduler: Submitting 12 missing tasks from ResultStage 10 (BeamOutputTransform/TextIO.Write/WriteFiles/FinalizeTempFileBundles/Reshuffle.ViaRandomKey/Values/Values/Map/ParMultiDo(Anonymous).output MapPartitionsRDD[77] at values at TransformTranslator.java:472) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11))
25/01/03 13:17:11 INFO TaskSchedulerImpl: Adding task set 10.0 with 12 tasks resource profile 0
25/01/03 13:17:11 INFO TaskSetManager: Starting task 3.0 in stage 10.0 (TID 30) (192.168.86.198, executor 0, partition 3, NODE_LOCAL, 9269 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 4.0 in stage 10.0 (TID 31) (192.168.86.198, executor 0, partition 4, NODE_LOCAL, 9269 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 0.0 in stage 10.0 (TID 32) (192.168.86.198, executor 0, partition 0, PROCESS_LOCAL, 9269 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 1.0 in stage 10.0 (TID 33) (192.168.86.198, executor 0, partition 1, PROCESS_LOCAL, 9269 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 2.0 in stage 10.0 (TID 34) (192.168.86.198, executor 0, partition 2, PROCESS_LOCAL, 9269 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 5.0 in stage 10.0 (TID 35) (192.168.86.198, executor 0, partition 5, PROCESS_LOCAL, 9269 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 6.0 in stage 10.0 (TID 36) (192.168.86.198, executor 0, partition 6, PROCESS_LOCAL, 9269 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 7.0 in stage 10.0 (TID 37) (192.168.86.198, executor 0, partition 7, PROCESS_LOCAL, 9269 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 8.0 in stage 10.0 (TID 38) (192.168.86.198, executor 0, partition 8, PROCESS_LOCAL, 9269 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 9.0 in stage 10.0 (TID 39) (192.168.86.198, executor 0, partition 9, PROCESS_LOCAL, 9269 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 10.0 in stage 10.0 (TID 40) (192.168.86.198, executor 0, partition 10, PROCESS_LOCAL, 9269 bytes)
25/01/03 13:17:11 INFO TaskSetManager: Starting task 11.0 in stage 10.0 (TID 41) (192.168.86.198, executor 0, partition 11, PROCESS_LOCAL, 9269 bytes)
25/01/03 13:17:11 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on 192.168.86.198:55063 (size: 8.9 KiB, free: 434.1 MiB)
25/01/03 13:17:11 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 2 to 192.168.86.198:55061
25/01/03 13:17:11 INFO TaskSetManager: Finished task 6.0 in stage 10.0 (TID 36) in 21 ms on 192.168.86.198 (executor 0) (1/12)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 3.0 in stage 10.0 (TID 30) in 23 ms on 192.168.86.198 (executor 0) (2/12)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 5.0 in stage 10.0 (TID 35) in 22 ms on 192.168.86.198 (executor 0) (3/12)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 1.0 in stage 10.0 (TID 33) in 23 ms on 192.168.86.198 (executor 0) (4/12)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 10.0 in stage 10.0 (TID 40) in 21 ms on 192.168.86.198 (executor 0) (5/12)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 0.0 in stage 10.0 (TID 32) in 24 ms on 192.168.86.198 (executor 0) (6/12)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 11.0 in stage 10.0 (TID 41) in 22 ms on 192.168.86.198 (executor 0) (7/12)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 2.0 in stage 10.0 (TID 34) in 24 ms on 192.168.86.198 (executor 0) (8/12)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 9.0 in stage 10.0 (TID 39) in 24 ms on 192.168.86.198 (executor 0) (9/12)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 8.0 in stage 10.0 (TID 38) in 24 ms on 192.168.86.198 (executor 0) (10/12)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 7.0 in stage 10.0 (TID 37) in 24 ms on 192.168.86.198 (executor 0) (11/12)
25/01/03 13:17:11 INFO TaskSetManager: Finished task 4.0 in stage 10.0 (TID 31) in 25 ms on 192.168.86.198 (executor 0) (12/12)
25/01/03 13:17:11 INFO TaskSchedulerImpl: Removed TaskSet 10.0, whose tasks have all completed, from pool
25/01/03 13:17:11 INFO DAGScheduler: ResultStage 10 (foreach at BoundedDataset.java:127) finished in 0,030 s
25/01/03 13:17:11 INFO DAGScheduler: Job 4 is finished. Cancelling potential speculative or zombie tasks for this job
25/01/03 13:17:11 INFO TaskSchedulerImpl: Killing all running tasks in stage 10: Stage finished
25/01/03 13:17:11 INFO DAGScheduler: Job 4 finished: foreach at BoundedDataset.java:127, took 0,152777 s
25/01/03 13:17:11 INFO SparkRunner: Batch pipeline execution complete.
25/01/03 13:17:11 INFO SparkContext: SparkContext is stopping with exitCode 0.
25/01/03 13:17:11 INFO SparkUI: Stopped Spark web UI at http://192.168.86.198:4040
25/01/03 13:17:11 INFO StandaloneSchedulerBackend: Shutting down all executors
25/01/03 13:17:11 INFO StandaloneSchedulerBackend$StandaloneDriverEndpoint: Asking each executor to shut down
25/01/03 13:17:11 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
25/01/03 13:17:11 INFO MemoryStore: MemoryStore cleared
25/01/03 13:17:11 INFO BlockManager: BlockManager stopped
25/01/03 13:17:11 INFO BlockManagerMaster: BlockManagerMaster stopped
25/01/03 13:17:11 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
25/01/03 13:17:11 INFO SparkContext: Successfully stopped SparkContext
2025/01/03 13:17:11 - General - Beam pipeline execution has finished.
>>>>>> Execution finished...
25/01/03 13:17:11 INFO ShutdownHookManager: Shutdown hook called
25/01/03 13:17:11 INFO ShutdownHookManager: Deleting directory /private/var/folders/v0/hjm6kpgd0076ncdkpf8rg3dw0000gn/T/spark-5a592189-8383-44ae-8574-70c2aaabba19
25/01/03 13:17:11 INFO ShutdownHookManager: Deleting directory /private/var/folders/v0/hjm6kpgd0076ncdkpf8rg3dw0000gn/T/spark-611dd39c-8451-4aed-9c91-ea906a4e9e04

@hansva (Contributor) commented Jan 3, 2025

Make sure to also follow the steps here; your metadata.json is wrong. Those values are also passed as plain arguments to the fat-jar, so there is no need to add --pipeline and things like that.
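For comparison with the failing call above, a minimal sketch of the expected shape (paths are placeholders): MainBeam takes the pipeline file, the exported metadata JSON, and the run configuration name as plain positional arguments after the fat jar, exactly as echoed in the "Argument 1/2/3" lines of the working log above.

spark-submit \
  --master local[4] \
  --class org.apache.hop.beam.run.MainBeam \
  /path/to/fatjar.jar \
  /path/to/hop_trans2.hpl \
  /path/to/metadata.json \
  Spark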

@Raja10D (Author) commented Jan 3, 2025

decoders@decoders-Latitude-7480:~/Downloads/apache-hop-client-2.11.0/hop$ spark-submit \
  --master local[4] \
  --class org.apache.hop.beam.run.MainBeam \
  --driver-java-options '-DPROJECT_HOME=/home/decoders/Downloads/apache-hop-client-2.11.0/hop/config/projects/default' \
  /home/decoders/Downloads/apache-hop-client-2.11.0/fatjar.jar \
  --pipeline=/home/decoders/Downloads/apache-hop-client-2.11.0/hop/config/projects/default/beam/pipeline/hop_trans2.hpl \
  --config=/home/decoders/Downloads/apache-hop-client-2.11.0/hop/config/hop-config.json \
  --metadata=/home/decoders/Downloads/apache-hop-client-2.11.0/hop/config/projects/default/project-config.json \
  Spark
25/01/03 18:45:55 WARN Utils: Your hostname, decoders-Latitude-7480 resolves to a loopback address: 127.0.1.1; using 192.125.1.140 instead (on interface enp0s31f6)
25/01/03 18:45:55 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
25/01/03 18:46:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Initializing Hop
log4j:WARN No appenders could be found for logger (org.apache.commons.vfs2.impl.DefaultFileSystemManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Error running Beam pipeline:
Error reading metadata from JSON

Error find metadata class for key null

The metadata plugin for key null could not be found in the plugin registry

org.apache.hop.core.exception.HopException:
Error reading metadata from JSON

Error find metadata class for key null

The metadata plugin for key null could not be found in the plugin registry

at org.apache.hop.core.metadata.SerializableMetadataProvider.<init>(SerializableMetadataProvider.java:143)
at org.apache.hop.beam.run.MainBeam.main(MainBeam.java:101)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:569)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Caused by: org.apache.hop.core.exception.HopException:
Error find metadata class for key null

The metadata plugin for key null could not be found in the plugin registry

at org.apache.hop.metadata.serializer.BaseMetadataProvider.getMetadataClassForKey(BaseMetadataProvider.java:69)
at org.apache.hop.core.metadata.SerializableMetadataProvider.<init>(SerializableMetadataProvider.java:122)
... 13 more

Caused by: org.apache.hop.core.exception.HopException:
The metadata plugin for key null could not be found in the plugin registry

at org.apache.hop.metadata.serializer.BaseMetadataProvider.getMetadataClassForKey(BaseMetadataProvider.java:61)
... 14 more

This is the issue I'm facing.
My PC runs Ubuntu.

@hansva (Contributor) commented Jan 3, 2025

The metadata is wrong. You need to export it from the client; it is not the same as project-config.json, as explained in the second link I sent.
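As a rough sanity check (a hedged sketch; metadata.json here means the file exported from the Hop client, not project-config.json), you can pretty-print whatever file you pass as the metadata argument and eyeball its top-level structure:

python3 -m json.tool /path/to/metadata.json | head -20

If the top of the file looks like project settings rather than serialized metadata objects, MainBeam fails with exactly the "metadata plugin for key null" error shown above.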
