What happened?
Hello, I'm an Apache Airflow contributor working together with @zeroshade to implement an Apache Arrow provider for Airflow. I've been implementing an ADBCHook which allows Airflow to integrate with ADBC as this is how Airflow integrates with different databases like ODBC or JDBC. While testing ADBC in Airflow with a Postgres database I've came across an issue with JSONB columns in tables. If I ommit the JSONB column from the INSERT statement everything works, I don't have any issues with the native Postgres driver and thus native PostgresHook in Airflow.
When logging the schema of the table, I see the JSONB column is typed as a pyarrow string:
table_schema: index: int64
node_type: string
functional_key: string
id: string
name: string
valid_from_date: timestamp[us]
valid_to_date: timestamp[us]
micro_functions: string # JSONB column type
This SQL statement to perform inserts are generated dynamically by Airflow depending on the targeted database (e.g. dialect) as this is done the same way with other integrations (e.g. ODBC/JDBC) as we also have to support UPSERT statements which isn't supported by the adbc_ingest method, but even while doing a pure INSERT instead of an UPSERT or by using the adbc_ingest method, the same error arises when a JSONB column is present in the table.
Stack Trace
ProgrammingError: INVALID_ARGUMENT: Failed to prepare query: ERROR: column "micro_functions" is of type jsonb but expression is of type text
LINE 1: ...micro_functions) VALUES ($1,$2,$3,$4,$5,$6,$7,$8)...
^
HINT: You will need to rewrite or cast the expression.
Query was:INSERT INTO nodes.node (index, node_type, functional_key, id, name, valid_from_date, valid_to_date, micro_functions) VALUES ($1,$2,$3,$4,$5,$6,$7,$8) ON CONFLICT (id, valid_from_date) DO UPDATE SET index = excluded.index, node_type = excluded.node_type, functional_key = excluded.functional_key, name = excluded.name, valid_to_date = excluded.valid_to_date, micro_functions = excluded.micro_functions. SQLSTATE: 42804
File "/usr/local/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 867 in run
File "/usr/local/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 1159 in _execute_task
File "/usr/local/lib/python3.12/site-packages/airflow/sdk/bases/operator.py", line 397 in wrapper
File "/usr/local/airflow/includes/common/sql/sql.py", line 178 in execute
File "/usr/local/airflow/plugins/infrabel/hooks/adbc.py", line 385 in insert_rows
File "/usr/local/lib64/python3.12/site-packages/adbc_driver_manager/dbapi.py", line 771 in executemany
File "/usr/local/airflow/adbc_driver_manager/_lib.pyx", line 1674 in adbc_driver_manager._lib._blocking_call_impl
File "/usr/local/airflow/adbc_driver_manager/_lib.pyx", line 1667 in adbc_driver_manager._lib._blocking_call_impl
File "/usr/local/airflow/adbc_driver_manager/_lib.pyx", line 1391 in adbc_driver_manager._lib.AdbcStatement.execute_update
File "/usr/local/airflow/adbc_driver_manager/_lib.pyx", line 261 in adbc_driver_manager._lib.check_error
How can we reproduce the bug?
CREATE TABLE nodes.node (
"index" int8 NULL,
node_type text NULL,
functional_key text NULL,
id text NOT NULL,
"name" text NULL,
valid_from_date timestamp NOT NULL,
valid_to_date timestamp NULL,
micro_functions jsonb NULL,
CONSTRAINT int_node_pkey PRIMARY KEY (id, valid_from_date)
);
CREATE INDEX node_valid_from_date_valid_to_date_idx ON nodes.node USING btree (valid_from_date, valid_to_date);
CREATE INDEX ix_nodes_node_index ON nodes.node USING btree (index);
Environment/Setup
Tested on Linux with latest pyarrow 21.0.0 and adbc-driver-postgresql 1.7.0.
What happened?
Hello, I'm an Apache Airflow contributor working together with @zeroshade to implement an Apache Arrow provider for Airflow. I've been implementing an ADBCHook which allows Airflow to integrate with ADBC as this is how Airflow integrates with different databases like ODBC or JDBC. While testing ADBC in Airflow with a Postgres database I've came across an issue with JSONB columns in tables. If I ommit the JSONB column from the INSERT statement everything works, I don't have any issues with the native Postgres driver and thus native PostgresHook in Airflow.
When logging the schema of the table, I see the JSONB column is typed as a pyarrow string:
This SQL statement to perform inserts are generated dynamically by Airflow depending on the targeted database (e.g. dialect) as this is done the same way with other integrations (e.g. ODBC/JDBC) as we also have to support UPSERT statements which isn't supported by the adbc_ingest method, but even while doing a pure INSERT instead of an UPSERT or by using the adbc_ingest method, the same error arises when a JSONB column is present in the table.
Stack Trace
How can we reproduce the bug?
Environment/Setup
Tested on Linux with latest pyarrow 21.0.0 and adbc-driver-postgresql 1.7.0.