Skip to content

Graph generator txt output format error #171

@Arash-Afshar

Description

@Arash-Afshar

Spark-Bench version (version number, tag, or git commit hash)

spark-bench_2.3.0_0.4.0-RELEASE

Details of your cluster setup (Spark version, Standalone/Yarn/Local/Etc)

Spark 2.2.0, Yarn

Scala version on your cluster

Your exact configuration file (with system details anonymized for security)

spark-bench = {
spark-submit-config = [{
spark-args = {
master = "yarn"
executor-memory = 5G
num-executors = 5
}
workload-suites = [
{
descr = "Graph Gen"
benchmark-output = "console"
workloads = [
{
name = "graph-data-generator"
vertices = 1000
output = "hdfs:///one-thousand-vertex-graph.txt"
}
]
}
]
}]
}

Relevant stacktrace

18/04/30 22:21:00 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (**********:40656) with ID 1
18/04/30 22:21:00 INFO storage.BlockManagerMasterEndpoint: Registering block manager **********:40021 with 2.5 GB RAM, BlockManagerId(1, *********, 40021, None)
18/04/30 22:21:15 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
Exception in thread "main" java.lang.Exception: Unrecognized or unspecified save format. Please check the file extension or add a file format to your arguments: Some(hdfs:///one-thousand-vertex-graph.txt)
at com.ibm.sparktc.sparkbench.utils.SparkFuncs$.verifyFormatOrThrow(SparkFuncs.scala:92)
at com.ibm.sparktc.sparkbench.utils.SparkFuncs$.verifyOutput(SparkFuncs.scala:35)
at com.ibm.sparktc.sparkbench.workload.Workload$class.run(Workload.scala:49)
at com.ibm.sparktc.sparkbench.datageneration.GraphDataGen.run(GraphDataGen.scala:90)
at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$SuiteKickoff$$runSerially$1.apply(SuiteKickoff.scala:98)
at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$SuiteKickoff$$runSerially$1.apply(SuiteKickoff.scala:98)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$.com$ibm$sparktc$sparkbench$workload$SuiteKickoff$$runSerially(SuiteKickoff.scala:98)
at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$2.apply(SuiteKickoff.scala:72)
at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$2.apply(SuiteKickoff.scala:67)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.Range.foreach(Range.scala:160)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$.run(SuiteKickoff.scala:67)
at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$MultipleSuiteKickoff$$runSuitesSerially$1.apply(MultipleSuiteKickoff.scala:38)
at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$MultipleSuiteKickoff$$runSuitesSerially$1.apply(MultipleSuiteKickoff.scala:38)
at scala.collection.immutable.List.foreach(List.scala:381)
at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$.com$ibm$sparktc$sparkbench$workload$MultipleSuiteKickoff$$runSuitesSerially(MultipleSuiteKickoff.scala:38)
at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$$anonfun$run$1.apply(MultipleSuiteKickoff.scala:28)
at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$$anonfun$run$1.apply(MultipleSuiteKickoff.scala:25)
at scala.collection.immutable.List.foreach(List.scala:381)
at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$.run(MultipleSuiteKickoff.scala:25)
at com.ibm.sparktc.sparkbench.cli.CLIKickoff$.main(CLIKickoff.scala:30)
at com.ibm.sparktc.sparkbench.cli.CLIKickoff.main(CLIKickoff.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/04/30 22:21:15 INFO spark.SparkContext: Invoking stop() from shutdown hook

Description of your problem and any other relevant info

Despite using "hdfs:///one-thousand-vertex-graph.txt" as output, it complains about incorrect output format:

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions