The Apache Avro library failed to parse the header

Spark version: 2.2.0
Spotify/spark-bigquery version: 0.2.2

Hi,

I am trying to use the `saveAsBigQuery` table function to write a DataFrame whose schema has an array of structs as a field. However, I am getting the following error:

`The Apache Avro library failed to parse the header with the follwing error: Invalid namespace: .topic_scores`

The offending field is:
```
{
    "type": [
        {
            "items": [
                {
                    "namespace": ".topic_scores",
                    "type": "record",
                    "name": "topic_scores",
                    "fields": [
                        {
                            "type": "int",
                            "name": "index"
                        },
                        {
                            "type": "float",
                            "name": "score"
                        }
                    ]
                },
                "null"
            ],
            "type": "array"
        },
        "null"
    ],
    "name": "topic_scores"
}
```

You can see that the _namespace_ field begins with a dot, which leaves an empty first component in the dotted name and is what Avro rejects. My guess is that the issue stems from https://github.com/spotify/spark-bigquery/blob/master/src/main/scala/com/databricks/spark/avro/SchemaConverters.scala#L342-L346

I can't find a way to configure the `recordNamespace` value. According to the spark-avro documentation:

You can specify the record name and namespace like this:

```
import com.databricks.spark.avro._
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").getOrCreate()
val df = spark.read.avro("src/test/resources/episodes.avro")

val name = "AvroTest"
val namespace = "com.databricks.spark.avro"
val parameters = Map("recordName" -> name, "recordNamespace" -> namespace)

df.write.options(parameters).avro("/tmp/output")
```

I think this is the line that reads that option and falls back to an empty string when it is not provided: https://github.com/databricks/spark-avro/blob/branch-4.0/src/main/scala/com/databricks/spark/avro/DefaultSource.scala#L114
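
To illustrate the suspected mechanism, here is a minimal sketch in plain Scala with no Spark or Avro dependency. It assumes, based only on my reading of the linked lines, that the nested record's namespace is built by joining `recordNamespace` and the field name with a dot. `NamespaceSketch`, `nestedNamespace` and `isValidNamespace` are hypothetical names of my own, not part of either library, and `com.example` is just a placeholder namespace.

```scala
// Hypothetical sketch; this is not the library's actual code.
object NamespaceSketch {
  // Assumption: the nested record namespace is the parent namespace
  // and the field name joined by a dot.
  def nestedNamespace(recordNamespace: String, fieldName: String): String =
    s"$recordNamespace.$fieldName"

  // Avro names must match [A-Za-z_][A-Za-z0-9_]*; a namespace is a
  // dot-separated sequence of such names (the empty namespace is allowed).
  private val NamePattern = "[A-Za-z_][A-Za-z0-9_]*".r.pattern

  def isValidNamespace(ns: String): Boolean =
    ns.isEmpty || ns.split("\\.", -1).forall(part => NamePattern.matcher(part).matches())

  def main(args: Array[String]): Unit = {
    // With the empty-string default, the result starts with a dot.
    val withDefault = nestedNamespace("", "topic_scores")
    // With an explicit namespace, every component is a valid name.
    val withOption = nestedNamespace("com.example", "topic_scores")
    println(s"$withDefault valid: ${isValidNamespace(withDefault)}")
    println(s"$withOption valid: ${isValidNamespace(withOption)}")
  }
}
```

If that assumption holds, the empty default alone is enough to produce `.topic_scores`, and passing any non-empty `recordNamespace` would avoid the leading dot.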

These options are not exposed anywhere in the Spotify library. Has anyone seen this issue or found a workaround? Thanks!

The Apache Avro library failed to parse the header #57
