Protobuf (probably Avro too?) not compatible with Confluent Schema Registry #347

@xav-ie

Description

When serializing data with Protobuf, you can send data encoded by this library to Kafka just fine. However, when trying to create a stream from that data, the Schema Registry deserializer fails with a deserialization error: the payload is missing the message indexes.
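
For context, here is roughly the framing Confluent's Protobuf deserializer expects, per the wire-format docs linked further down (the byte values below are only illustrative):

```typescript
// Confluent wire format for a Protobuf record value:
//
//   byte 0       magic byte, always 0x00
//   bytes 1-4    schema ID, big-endian int32
//   next bytes   message indexes: a varint count followed by one varint per
//                index; the common case [0] collapses to a single 0x00 byte
//   remainder    the protobuf-encoded payload
//
// Illustrative frame for schema ID 1, first message in the .proto file:
const framed = Uint8Array.from([
  0x00,                   // magic byte
  0x00, 0x00, 0x00, 0x01, // schema ID 1
  0x00,                   // message indexes [0], shorthand form
  /* ...protobuf payload bytes... */
]);
```

It appears this library writes everything above except the message-indexes byte(s), which is why the Java deserializer blows up.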

Here is the error log, which I only got after trying to turn the topic into a stream. The main error is "Invalid message indexes":

```
ksqldb-server    | [2024-02-15 19:04:11,802] ERROR {"type":0,"deserializationError":{"target":"value","errorMessage":"Error deserializing message from topic: orders-topic","recordB64":null,"cause":["Failed to deserialize data for topic orders-topic to Protobuf: ","Error deserializing Protobuf message for id 1","Invalid message indexes: io.confluent.kafka.schemaregistry.protobuf.MessageIndexes@3fb03c91"],"topic":"orders-topic"},"recordProcessingError":null,"productionError":null,"serializationError":null,"kafkaStreamsThreadError":null} (processing.transient_ORDERS_PROTO_SIMPLE_4698783426306005545.KsqlTopic.Source.deserializer)
ksqldb-server    | [2024-02-15 19:04:11,802] WARN stream-thread [_confluent-ksql-default_transient_transient_ORDERS_PROTO_SIMPLE_4698783426306005545_1708023851703-32aade83-3678-44dc-b176-912cd78ee7d7-StreamThread-1] task [0_0] Skipping record due to deserialization error. topic=[orders-topic] partition=[0] offset=[50] (org.apache.kafka.streams.processor.internals.RecordDeserializer)
ksqldb-server    | org.apache.kafka.common.errors.SerializationException: Error deserializing message from topic: orders-topic
ksqldb-server    |      at io.confluent.ksql.serde.connect.KsqlConnectDeserializer.deserialize(KsqlConnectDeserializer.java:55)
ksqldb-server    |      at io.confluent.ksql.serde.tls.ThreadLocalDeserializer.deserialize(ThreadLocalDeserializer.java:37)
ksqldb-server    |      at io.confluent.ksql.serde.connect.ConnectFormat$StructToListDeserializer.deserialize(ConnectFormat.java:239)
ksqldb-server    |      at io.confluent.ksql.serde.connect.ConnectFormat$StructToListDeserializer.deserialize(ConnectFormat.java:218)
ksqldb-server    |      at io.confluent.ksql.serde.GenericDeserializer.deserialize(GenericDeserializer.java:59)
ksqldb-server    |      at io.confluent.ksql.logging.processing.LoggingDeserializer.tryDeserialize(LoggingDeserializer.java:61)
ksqldb-server    |      at io.confluent.ksql.logging.processing.LoggingDeserializer.deserialize(LoggingDeserializer.java:48)
ksqldb-server    |      at org.apache.kafka.common.serialization.Deserializer.deserialize(Deserializer.java:60)
ksqldb-server    |      at org.apache.kafka.streams.processor.internals.SourceNode.deserializeValue(SourceNode.java:58)
ksqldb-server    |      at org.apache.kafka.streams.processor.internals.RecordDeserializer.deserialize(RecordDeserializer.java:66)
ksqldb-server    |      at org.apache.kafka.streams.processor.internals.RecordQueue.updateHead(RecordQueue.java:204)
ksqldb-server    |      at org.apache.kafka.streams.processor.internals.RecordQueue.addRawRecords(RecordQueue.java:128)
ksqldb-server    |      at org.apache.kafka.streams.processor.internals.PartitionGroup.addRawRecords(PartitionGroup.java:304)
ksqldb-server    |      at org.apache.kafka.streams.processor.internals.StreamTask.addRecords(StreamTask.java:1002)
ksqldb-server    |      at org.apache.kafka.streams.processor.internals.TaskManager.addRecordsToTasks(TaskManager.java:1630)
ksqldb-server    |      at org.apache.kafka.streams.processor.internals.StreamThread.pollPhase(StreamThread.java:992)
ksqldb-server    |      at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:766)
ksqldb-server    |      at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:617)
ksqldb-server    |      at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:579)
ksqldb-server    | Caused by: org.apache.kafka.connect.errors.DataException: Failed to deserialize data for topic orders-topic to Protobuf:
ksqldb-server    |      at io.confluent.connect.protobuf.ProtobufConverter.toConnectData(ProtobufConverter.java:154)
ksqldb-server    |      at io.confluent.connect.protobuf.ProtobufConverter.toConnectData(ProtobufConverter.java:126)
ksqldb-server    |      at io.confluent.ksql.serde.connect.KsqlConnectDeserializer.deserialize(KsqlConnectDeserializer.java:49)
ksqldb-server    |      ... 18 more
ksqldb-server    | Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Protobuf message for id 1
ksqldb-server    |      at io.confluent.kafka.serializers.protobuf.AbstractKafkaProtobufDeserializer.deserialize(AbstractKafkaProtobufDeserializer.java:228)
ksqldb-server    |      at io.confluent.kafka.serializers.protobuf.AbstractKafkaProtobufDeserializer.deserializeWithSchemaAndVersion(AbstractKafkaProtobufDeserializer.java:292)
ksqldb-server    |      at io.confluent.connect.protobuf.ProtobufConverter$Deserializer.deserialize(ProtobufConverter.java:200)
ksqldb-server    |      at io.confluent.connect.protobuf.ProtobufConverter.toConnectData(ProtobufConverter.java:132)
ksqldb-server    |      ... 20 more
ksqldb-server    | Caused by: java.lang.IllegalArgumentException: Invalid message indexes: io.confluent.kafka.schemaregistry.protobuf.MessageIndexes@3fb03c91
ksqldb-server    |      at io.confluent.kafka.schemaregistry.protobuf.ProtobufSchema.toMessageName(ProtobufSchema.java:2202)
ksqldb-server    |      at io.confluent.kafka.serializers.protobuf.AbstractKafkaProtobufDeserializer.deserialize(AbstractKafkaProtobufDeserializer.java:140)
ksqldb-server    |      ... 23 more
```

The message indexes part of the binary is simply not included when serializing. I believe this is a genuine issue and am hoping this library plans to address it.

In kafkajs/confluent-schema-registry, there is a PR trying to do just that (kafkajs/confluent-schema-registry#258), but I don't know much about the mysterious message indexes yet. I am trying to learn more, and wanted to create this issue in the meantime. You can read more about message indexes here:
https://docs.confluent.io/cloud/current/sr/fundamentals/serdes-develop/index.html#wire-format
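
For anyone else digging into this, here is a minimal sketch of how the indexes could be written, based on my reading of the wire-format docs above (the helper names are my own, not this library's API):

```typescript
// Frame an already-protobuf-encoded payload the way Confluent's serializer
// does: magic byte, schema ID, message indexes, then the payload.

// Zigzag-encode a signed 32-bit integer and emit it as a varint; this is how
// the index count and each index are written.
function writeZigzagVarint(n: number): number[] {
  let v = ((n << 1) ^ (n >> 31)) >>> 0; // zigzag, then treat as unsigned
  const out: number[] = [];
  do {
    let byte = v & 0x7f;
    v >>>= 7;
    if (v !== 0) byte |= 0x80; // continuation bit
    out.push(byte);
  } while (v !== 0);
  return out;
}

function frame(schemaId: number, messageIndexes: number[], payload: Uint8Array): Uint8Array {
  const header: number[] = [0x00]; // magic byte
  header.push(
    (schemaId >>> 24) & 0xff, (schemaId >>> 16) & 0xff, // schema ID,
    (schemaId >>> 8) & 0xff, schemaId & 0xff,           // big-endian
  );
  if (messageIndexes.length === 1 && messageIndexes[0] === 0) {
    header.push(0x00); // [0] shorthand for the first message in the file
  } else {
    header.push(...writeZigzagVarint(messageIndexes.length));
    for (const idx of messageIndexes) header.push(...writeZigzagVarint(idx));
  }
  const framed = new Uint8Array(header.length + payload.length);
  framed.set(header, 0);
  framed.set(payload, header.length);
  return framed;
}
```

So for a message that is not the first in its file, e.g. indexes [1], the serializer would write 0x02 0x02 (zigzag varints for the count and the index) between the schema ID and the payload.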

I think it is funny that the table in those docs is rather misleading: it omits the message-indexes part entirely.

I also think it is funny that I am, for some reason, allowed to write messages into a topic using a schema ID without getting any errors; it is not until I try to deserialize the messages that the errors start... (Presumably the broker treats payloads as opaque bytes, so nothing validates the wire format until a consumer tries to decode it.)
