Skip to content
This repository was archived by the owner on Dec 15, 2021. It is now read-only.
This repository was archived by the owner on Dec 15, 2021. It is now read-only.

404 Not found exception when creating gcs directories #56

@nicolasblaye

Description

@nicolasblaye

Hi everyone,

I encountered a weird issue while trying your library. When saving the temp file to gcs, it called the storage api with a weird address: http://google.api.address/null.

I tried debugging through the code to find what was causing the problem and I did not find it, however I solved the issue accidentally.

I wanted to test creating a directory with the security account to see if it was a permission problem, so I added google-cloud-storage in my dependencies because I couldn't import import com.google.cloud.storage.StorageOptions , and this solved the issue...

Is there a way to make this error more explicit? Is it a problem that is global to google and not this library?

Here is the build.sbt to reproduce the error

 "com.typesafe.scala-logging" %% "scala-logging" % "3.7.2",
  "org.apache.spark" %% "spark-core" % "2.2.0",
  "org.apache.spark" %% "spark-sql" % "2.2.0",
  "com.google.cloud" % "google-cloud-bigquery" % "0.32.0-beta",
  "com.google.cloud.bigdataoss" % "gcs-connector" % "1.6.2-hadoop2",
  "com.spotify" % "spark-bigquery_2.11" % "0.2.2",
  "org.apache.parquet" % "parquet-avro" % "1.9.0"

And the code

object Main extends App {
    implicit val spark = SparkSession
      .builder()
      .appName("Name")
      .master("local[*]")
      .config("google.cloud.auth.service.account.json.keyfile", "/path")
      .config("fs.gs.project.id", "project-id")
      .getOrCreate()
    bqSqlContext.bigQuerySelect(s"SELECT * FROM ${tableName} LIMIT 10")
}

And here is the exception:

Exception in thread "main" com.google.api.client.http.HttpResponseException: 404 Not Found
Not Found
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1070)
	at com.google.api.client.googleapis.batch.BatchRequest.execute(BatchRequest.java:241)
	at com.google.cloud.hadoop.gcsio.BatchHelper.flushIfPossible(BatchHelper.java:118)
	at com.google.cloud.hadoop.gcsio.BatchHelper.flush(BatchHelper.java:132)
	at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getItemInfos(GoogleCloudStorageImpl.java:1493)
	at com.google.cloud.hadoop.gcsio.ForwardingGoogleCloudStorage.getItemInfos(ForwardingGoogleCloudStorage.java:221)
	at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.getFileInfos(GoogleCloudStorageFileSystem.java:1159)
	at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.mkdirs(GoogleCloudStorageFileSystem.java:530)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.mkdirs(GoogleHadoopFileSystemBase.java:1382)
	at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1819)
	at com.google.cloud.hadoop.io.bigquery.AbstractExportToCloudStorage.prepare(AbstractExportToCloudStorage.java:59)
	at com.google.cloud.hadoop.io.bigquery.AbstractBigQueryInputFormat.getSplits(AbstractBigQueryInputFormat.java:123)
	at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:125)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
	at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1333)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
	at org.apache.spark.rdd.RDD.take(RDD.scala:1327)
	at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1368)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
	at org.apache.spark.rdd.RDD.first(RDD.scala:1367)
	at com.spotify.spark.bigquery.BigQuerySQLContext.bigQueryTable(BigQuerySQLContext.scala:112)
	at com.spotify.spark.bigquery.BigQuerySQLContext.bigQuerySelect(BigQuerySQLContext.scala:93)
	at com.powerspace.bigquery.BigQueryExporter.read(BigQueryExporter.scala:24)

Cheers

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions