Skip to content

Python: JSONB column type issue with ADBC driver manager for Postgres #3284

@dabla

Description

@dabla

What happened?

Hello, I'm an Apache Airflow contributor working together with @zeroshade to implement an Apache Arrow provider for Airflow. I've been implementing an ADBCHook which allows Airflow to integrate with ADBC as this is how Airflow integrates with different databases like ODBC or JDBC. While testing ADBC in Airflow with a Postgres database I've came across an issue with JSONB columns in tables. If I ommit the JSONB column from the INSERT statement everything works, I don't have any issues with the native Postgres driver and thus native PostgresHook in Airflow.

When logging the schema of the table, I see the JSONB column is typed as a pyarrow string:

table_schema: index: int64
node_type: string
functional_key: string
id: string
name: string
valid_from_date: timestamp[us]
valid_to_date: timestamp[us]
micro_functions: string  # JSONB column type

This SQL statement to perform inserts are generated dynamically by Airflow depending on the targeted database (e.g. dialect) as this is done the same way with other integrations (e.g. ODBC/JDBC) as we also have to support UPSERT statements which isn't supported by the adbc_ingest method, but even while doing a pure INSERT instead of an UPSERT or by using the adbc_ingest method, the same error arises when a JSONB column is present in the table.

Stack Trace

ProgrammingError: INVALID_ARGUMENT: Failed to prepare query: ERROR:  column "micro_functions" is of type jsonb but expression is of type text
LINE 1: ...micro_functions) VALUES ($1,$2,$3,$4,$5,$6,$7,$8)...
                                                             ^
HINT:  You will need to rewrite or cast the expression.

Query was:INSERT INTO nodes.node (index, node_type, functional_key, id, name, valid_from_date, valid_to_date, micro_functions) VALUES ($1,$2,$3,$4,$5,$6,$7,$8) ON CONFLICT (id, valid_from_date) DO UPDATE SET index = excluded.index, node_type = excluded.node_type, functional_key = excluded.functional_key, name = excluded.name, valid_to_date = excluded.valid_to_date, micro_functions = excluded.micro_functions. SQLSTATE: 42804
File "/usr/local/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 867 in run

File "/usr/local/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 1159 in _execute_task

File "/usr/local/lib/python3.12/site-packages/airflow/sdk/bases/operator.py", line 397 in wrapper

File "/usr/local/airflow/includes/common/sql/sql.py", line 178 in execute

File "/usr/local/airflow/plugins/infrabel/hooks/adbc.py", line 385 in insert_rows

File "/usr/local/lib64/python3.12/site-packages/adbc_driver_manager/dbapi.py", line 771 in executemany

File "/usr/local/airflow/adbc_driver_manager/_lib.pyx", line 1674 in adbc_driver_manager._lib._blocking_call_impl

File "/usr/local/airflow/adbc_driver_manager/_lib.pyx", line 1667 in adbc_driver_manager._lib._blocking_call_impl

File "/usr/local/airflow/adbc_driver_manager/_lib.pyx", line 1391 in adbc_driver_manager._lib.AdbcStatement.execute_update

File "/usr/local/airflow/adbc_driver_manager/_lib.pyx", line 261 in adbc_driver_manager._lib.check_error

How can we reproduce the bug?

CREATE TABLE nodes.node (
	"index" int8 NULL,
	node_type text NULL,
	functional_key text NULL,
	id text NOT NULL,
	"name" text NULL,
	valid_from_date timestamp NOT NULL,
	valid_to_date timestamp NULL,
	micro_functions jsonb NULL,
	CONSTRAINT int_node_pkey PRIMARY KEY (id, valid_from_date)
);
CREATE INDEX node_valid_from_date_valid_to_date_idx ON nodes.node USING btree (valid_from_date, valid_to_date);
CREATE INDEX ix_nodes_node_index ON nodes.node USING btree (index);

Environment/Setup

Tested on Linux with latest pyarrow 21.0.0 and adbc-driver-postgresql 1.7.0.

Metadata

Metadata

Assignees

Labels

Type: bugSomething isn't working

Type

No type
No fields configured for issues without a type.

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions