Hi everyone,
I ran into a strange issue while trying your library. When saving the temp file to GCS, it called the Storage API with an odd address: http://google.api.address/null.
I stepped through the code in a debugger to find the cause but could not locate it. I did, however, fix the issue by accident.
I wanted to test creating a directory with the service account to see whether it was a permission problem. I couldn't import com.google.cloud.storage.StorageOptions, so I added google-cloud-storage to my dependencies, and that alone made the error go away...
Is there a way to make this error more explicit? Or is this a general problem with the Google libraries rather than with this library?
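For reference, the permission check mentioned above could look roughly like the sketch below. It is only an illustration of the idea, not code from my project: the bucket name "my-bucket" and the object name "permission-check/" are hypothetical, while "/path" and "project-id" are the same placeholders used in the Spark config further down. GCS has no real directories, so the sketch creates an empty object whose name ends in a slash.

```scala
import java.io.FileInputStream

import com.google.auth.oauth2.ServiceAccountCredentials
import com.google.cloud.storage.{BlobInfo, StorageOptions}

object GcsPermissionCheck extends App {
  // "/path" stands for the service account JSON key file (placeholder).
  val keyFile = new FileInputStream("/path")

  val storage =
    try {
      StorageOptions
        .newBuilder()
        .setCredentials(ServiceAccountCredentials.fromStream(keyFile))
        .setProjectId("project-id") // placeholder project id
        .build()
        .getService
    } finally keyFile.close()

  // Hypothetical bucket and object names; the trailing slash makes the
  // empty object show up as a "directory" in most GCS tooling.
  val marker = BlobInfo.newBuilder("my-bucket", "permission-check/").build()
  val created = storage.create(marker)

  println(s"Created gs://${created.getBucket}/${created.getName}")
}
```

If the service account lacks write access to the bucket, the create call should fail with a StorageException, which would point to permissions rather than to the connector.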
Here is the build.sbt to reproduce the error:

    libraryDependencies ++= Seq(
      "com.typesafe.scala-logging" %% "scala-logging" % "3.7.2",
      "org.apache.spark" %% "spark-core" % "2.2.0",
      "org.apache.spark" %% "spark-sql" % "2.2.0",
      "com.google.cloud" % "google-cloud-bigquery" % "0.32.0-beta",
      "com.google.cloud.bigdataoss" % "gcs-connector" % "1.6.2-hadoop2",
      "com.spotify" % "spark-bigquery_2.11" % "0.2.2",
      "org.apache.parquet" % "parquet-avro" % "1.9.0"
    )
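
For comparison, the accidental workaround amounts to one extra dependency. The fragment below is a sketch: the version is a placeholder, since the exact version I added is not recorded here, and I have not confirmed why it helps. My guess is that it changes which transitive Google client library versions sbt resolves.

```scala
// build.sbt fragment (sketch).
// "x.y.z" is a placeholder, not a known-good version.
val googleCloudStorageVersion = "x.y.z"

libraryDependencies += "com.google.cloud" % "google-cloud-storage" % googleCloudStorageVersion
```

Running `sbt evicted` with and without this line should show whether any Google client libraries are resolved to different versions.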
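And the code:

    import org.apache.spark.sql.SparkSession

    object Main extends App {
      implicit val spark: SparkSession = SparkSession
        .builder()
        .appName("Name")
        .master("local[*]")
        .config("google.cloud.auth.service.account.json.keyfile", "/path")
        .config("fs.gs.project.id", "project-id")
        .getOrCreate()

      // bqSqlContext (the spark-bigquery SQL context) and tableName are
      // defined elsewhere in my project and are omitted from this snippet.
      bqSqlContext.bigQuerySelect(s"SELECT * FROM ${tableName} LIMIT 10")
    }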
And here is the exception:
Exception in thread "main" com.google.api.client.http.HttpResponseException: 404 Not Found
Not Found
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1070)
at com.google.api.client.googleapis.batch.BatchRequest.execute(BatchRequest.java:241)
at com.google.cloud.hadoop.gcsio.BatchHelper.flushIfPossible(BatchHelper.java:118)
at com.google.cloud.hadoop.gcsio.BatchHelper.flush(BatchHelper.java:132)
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getItemInfos(GoogleCloudStorageImpl.java:1493)
at com.google.cloud.hadoop.gcsio.ForwardingGoogleCloudStorage.getItemInfos(ForwardingGoogleCloudStorage.java:221)
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.getFileInfos(GoogleCloudStorageFileSystem.java:1159)
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.mkdirs(GoogleCloudStorageFileSystem.java:530)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.mkdirs(GoogleHadoopFileSystemBase.java:1382)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1819)
at com.google.cloud.hadoop.io.bigquery.AbstractExportToCloudStorage.prepare(AbstractExportToCloudStorage.java:59)
at com.google.cloud.hadoop.io.bigquery.AbstractBigQueryInputFormat.getSplits(AbstractBigQueryInputFormat.java:123)
at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:125)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1333)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.take(RDD.scala:1327)
at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1368)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.first(RDD.scala:1367)
at com.spotify.spark.bigquery.BigQuerySQLContext.bigQueryTable(BigQuerySQLContext.scala:112)
at com.spotify.spark.bigquery.BigQuerySQLContext.bigQuerySelect(BigQuerySQLContext.scala:93)
at com.powerspace.bigquery.BigQueryExporter.read(BigQueryExporter.scala:24)
Cheers