
TPCH DataGen Not working #1157

Open
rajatma1993 opened this issue Dec 10, 2024 · 5 comments


@rajatma1993

Following the instructions, we are trying to generate the data for the TPC-H benchmark, but the command provided to run the DataGen throws the following error:

Command: build/sbt "test:runMain com.databricks.spark.sql.perf.tpch.GenTPCHData
-d .
-s 10
-f parquet"

Error:

Using /usr/lib/jvm/java-1.17.0-openjdk-arm64 as default JAVA_HOME.
Note, this will be overridden by -java-home if it is set.
Attempting to fetch sbt
Our attempt to download sbt locally to build/sbt-launch-0.13.18.jar failed. Please install sbt manually from http://www.scala-sbt.org/

Can you help us with this?

We got this command by following the data-generation steps in the databricks/spark-sql-perf repository.

@viirya
Member

viirya commented Dec 10, 2024

Have you tried to install sbt manually?

@rajatma1993
Author

rajatma1993 commented Dec 11, 2024

@viirya No, we did not install sbt manually; sbt was already available on the machine where we ran the DataGen.

Based on the workarounds we tried, we observed the following:
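Since sbt is already installed, one possible workaround is to call that binary directly instead of the `build/sbt` wrapper, which is the script that tries to download `build/sbt-launch-0.13.18.jar` in the log above. The minimal, stdlib-only Python sketch below shows how the shell tokenizes the original command: the quoted `test:runMain ...` task reaches sbt as a single argument, and only the launcher in front of it changes. The helper `with_system_sbt` is hypothetical, and whether a newer system sbt can load this project's 0.13-era build is not confirmed in this thread.

```python
from __future__ import annotations

import shlex
import shutil

# The DataGen command from the issue, written on one line.
ORIGINAL = ('build/sbt "test:runMain com.databricks.spark.sql.perf.tpch.GenTPCHData '
            '-d . -s 10 -f parquet"')


def with_system_sbt(command: str, sbt_path: str) -> list[str]:
    """Tokenize a shell command and swap the build/sbt wrapper for another sbt binary (hypothetical helper)."""
    argv = shlex.split(command)  # the double-quoted task stays ONE argument
    if argv and argv[0] == "build/sbt":
        argv[0] = sbt_path
    return argv


if __name__ == "__main__":
    sbt = shutil.which("sbt")
    if sbt is None:
        print("no sbt found on PATH")
    else:
        # Print instead of running, so the command can be reviewed and run from the repo root.
        print(shlex.join(with_system_sbt(ORIGINAL, sbt)))
```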

  1. sbt-launch-0.13.18.jar: this JAR is outdated (from 2018), and we are unable to download and use it. Can we use a JAR from another sbt version?
  2. Parquet data generation: we observed that each table has a single Parquet file. Is that expected, or can we have multiple Parquet part files for each table?
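On the second point: a plain Spark DataFrame write generally produces one part file per non-empty partition, so a single file per table most likely reflects how the data was partitioned at write time. To check what was actually produced, here is a stdlib-only Python sketch that counts `.parquet` files per table. It assumes one subdirectory per table under the output location, which is an assumption about the layout rather than something stated in this thread; `count_parquet_parts` is a hypothetical helper.

```python
from __future__ import annotations

import sys
from pathlib import Path


def count_parquet_parts(root: str) -> dict[str, int]:
    """Count .parquet files under each table directory in `root` (hypothetical helper)."""
    counts: dict[str, int] = {}
    for table_dir in sorted(Path(root).iterdir()):
        if table_dir.is_dir():
            # rglob also descends into partition subdirectories such as key=value/.
            counts[table_dir.name] = sum(1 for p in table_dir.rglob("*.parquet") if p.is_file())
    return counts


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: count_parts.py <output-location>")
    else:
        for table, n in count_parquet_parts(sys.argv[1]).items():
            print(f"{table}: {n} part file(s)")
```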

@andygrove
Member

Perhaps this is the issue? databricks/spark-sql-perf#217

@rajatma1993
Author

rajatma1993 commented Dec 12, 2024

Hi, I tried the options suggested above, but the issue is still the same.

I am using JDK 17. Could this be the reason? Is JDK 17 compatible with this DataGen or benchmark?

Also, do I need to place the sbt-launch-0.13.18.jar file in a specific location after following the steps mentioned in the issue above?
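To confirm which JDK is actually picked up (the log shows `/usr/lib/jvm/java-1.17.0-openjdk-arm64` as the default `JAVA_HOME`), here is a small Python sketch that reads the major version from `java -version`. Java 8 and earlier report versions in the legacy `1.x` form (e.g. `1.8.0_392`), while Java 9 and later put the major number first (e.g. `17.0.9`). Whether sbt 0.13.18 works on JDK 17 is not answered in this thread; the sketch only reports the version. `java_major_version` is a hypothetical helper.

```python
import re
import shutil
import subprocess


def java_major_version(version_output: str) -> int:
    """Return the major Java version (e.g. 8 or 17) from `java -version` text (hypothetical helper)."""
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        raise ValueError("no version string found")
    parts = re.findall(r"\d+", match.group(1))  # "1.8.0_392" -> ["1", "8", "0", "392"]
    major = int(parts[0])
    # Java 8 and earlier use the legacy "1.x" scheme, so the real major is the second number.
    if major == 1 and len(parts) > 1:
        major = int(parts[1])
    return major


if __name__ == "__main__":
    if shutil.which("java") is None:
        print("java not found on PATH")
    else:
        # `java -version` prints its banner to stderr, not stdout.
        banner = subprocess.run(["java", "-version"], capture_output=True, text=True).stderr
        try:
            print("Java major version:", java_major_version(banner))
        except ValueError:
            print("could not parse:", banner)
```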
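The error log names the path the wrapper tried to download to, `build/sbt-launch-0.13.18.jar`, relative to the repository root. A hedged sketch for checking whether a launcher JAR is present there and is a valid archive, assuming that path is where the wrapper looks (it is taken from the log, not from the script itself). Because a JAR is a ZIP file, `zipfile.is_zipfile` can catch a failed download that left an empty file or an HTML error page behind. `launcher_status` is a hypothetical helper.

```python
import sys
import zipfile
from pathlib import Path

# Path taken from the error log ("download sbt locally to build/sbt-launch-0.13.18.jar").
LAUNCHER = Path("build") / "sbt-launch-0.13.18.jar"


def launcher_status(repo_root: str) -> str:
    """Report whether the launcher JAR exists under repo_root and looks valid (hypothetical helper)."""
    jar = Path(repo_root) / LAUNCHER
    if not jar.is_file():
        return "missing"
    # A JAR is a ZIP archive; a failed download often leaves an empty file or an HTML page.
    if not zipfile.is_zipfile(jar):
        return "not a valid jar"
    return "ok"


if __name__ == "__main__":
    print(launcher_status(sys.argv[1] if len(sys.argv) > 1 else "."))
```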

@andygrove
Member

I don't use the Databricks repo that you are trying to use, so it is difficult to offer advice. It seems like it may no longer be maintained.

Perhaps you could try using the Python scripts provided at https://github.com/apache/datafusion-benchmarks/tree/main/tpch instead?


3 participants