diff --git a/docs/apache-airflow-providers-common-sql/dialects.rst b/docs/apache-airflow-providers-common-sql/dialects.rst
new file mode 100644
index 0000000000000..22feedc945413
--- /dev/null
+++ b/docs/apache-airflow-providers-common-sql/dialects.rst
@@ -0,0 +1,55 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements. See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership. The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License. You may obtain a copy of the License at
+
+ .. http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied. See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+SQL Dialects
+=============
+
+The :class:`~airflow.providers.common.sql.dialects.dialect.Dialect` offers an abstraction layer between the
+:class:`~airflow.providers.common.sql.hooks.sql.DbApiHook` implementation and the database. For some databases,
+multiple connection types are available, such as native, ODBC or JDBC. As the :class:`~airflow.providers.odbc.hooks.odbc.OdbcHook`
+and the :class:`~airflow.providers.jdbc.hooks.jdbc.JdbcHook` are generic hooks that allow you to interact with any
+database that has a driver for it, an abstraction layer was needed that allows us to run specialized queries
+depending on the database we connect to; that is why dialects were introduced.
+
+The default :class:`~airflow.providers.common.sql.dialects.dialect.Dialect` class has the following operations
+available, which use SQLAlchemy underneath but can be overloaded with specialized implementations
+per database:
+
+- ``placeholder`` specifies the database-specific placeholder used in prepared statements (default: ``%s``);
+- ``inspector`` returns the SQLAlchemy inspector, which allows us to retrieve database metadata;
+- ``extract_schema_from_table`` allows us to extract the schema name from a string;
+- ``get_column_names`` returns the column names for the given table and (optional) schema using the SQLAlchemy inspector;
+- ``get_primary_keys`` returns the primary keys for the given table and (optional) schema using the SQLAlchemy inspector;
+- ``get_target_fields`` returns the column names that aren't identity or auto-incremented columns; these will be used by the ``insert_rows`` method of the :class:`~airflow.providers.common.sql.hooks.sql.DbApiHook` if the ``target_fields`` parameter wasn't specified and the Airflow property ``core.dbapihook_resolve_target_fields`` is set to ``True`` (default: ``False``);
+- ``reserved_words`` returns the reserved SQL keywords for the target database using the SQLAlchemy inspector;
+- ``generate_insert_sql`` generates the insert SQL statement for the target database;
+- ``generate_replace_sql`` generates the upsert SQL statement for the target database.
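To make the interplay between ``placeholder`` and ``generate_insert_sql`` concrete, here is a minimal, self-contained sketch; the function signature is illustrative only and is not the provider's actual API:

```python
# Illustrative sketch: how a dialect could combine a table name, target fields
# and its placeholder into a prepared INSERT statement. The real Dialect class
# in apache-airflow-providers-common-sql differs in detail.
def generate_insert_sql(table, target_fields, placeholder="%s"):
    columns = ", ".join(target_fields)
    values = ", ".join([placeholder] * len(target_fields))
    return f"INSERT INTO {table} ({columns}) VALUES ({values})"

print(generate_insert_sql("hollywood.actors", ["name", "firstname"]))
# INSERT INTO hollywood.actors (name, firstname) VALUES (%s, %s)
```

A specialized dialect would only have to override ``placeholder`` (for example ``?`` for some ODBC drivers) to get correct prepared statements without touching the SQL generation itself.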
+
+At the moment there are only three dialects available:
+
+- ``default`` :class:`~airflow.providers.common.sql.dialects.dialect.Dialect` reuses the generic functionality that was already available in the :class:`~airflow.providers.common.sql.hooks.sql.DbApiHook`;
+- ``mssql`` :class:`~airflow.providers.microsoft.mssql.dialects.mssql.MsSqlDialect` specialized for Microsoft SQL Server;
+- ``postgresql`` :class:`~airflow.providers.postgres.dialects.postgres.PostgresDialect` specialized for PostgreSQL.
+
+The dialect to use is normally derived from the connection string, but this isn't always possible. In that case you
+can specify the dialect name explicitly through the extra options of the connection:
+
+.. code-block:: yaml
+
+    dialect_name: 'mssql'
+
+If no specific dialect is available for a database, the default one will be used; the same happens when a non-existent dialect name is specified.
diff --git a/docs/apache-airflow-providers-common-sql/index.rst b/docs/apache-airflow-providers-common-sql/index.rst
index b24ee78510c86..6659b1affbc1c 100644
--- a/docs/apache-airflow-providers-common-sql/index.rst
+++ b/docs/apache-airflow-providers-common-sql/index.rst
@@ -43,6 +43,7 @@
     Python API <_api/airflow/providers/common/sql/index>
     Supported Database Types
+    Dialects

 .. toctree::
     :hidden:
diff --git a/docs/apache-airflow-providers-microsoft-mssql/dialects.rst b/docs/apache-airflow-providers-microsoft-mssql/dialects.rst
new file mode 100644
index 0000000000000..22feedc945413
--- /dev/null
+++ b/docs/apache-airflow-providers-microsoft-mssql/dialects.rst
@@ -0,0 +1,55 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements. See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership. The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.
You may obtain a copy of the License at
+
+ .. http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied. See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+SQL Dialects
+=============
+
+The :class:`~airflow.providers.common.sql.dialects.dialect.Dialect` offers an abstraction layer between the
+:class:`~airflow.providers.common.sql.hooks.sql.DbApiHook` implementation and the database. For some databases,
+multiple connection types are available, such as native, ODBC or JDBC. As the :class:`~airflow.providers.odbc.hooks.odbc.OdbcHook`
+and the :class:`~airflow.providers.jdbc.hooks.jdbc.JdbcHook` are generic hooks that allow you to interact with any
+database that has a driver for it, an abstraction layer was needed that allows us to run specialized queries
+depending on the database we connect to; that is why dialects were introduced.
+
+The default :class:`~airflow.providers.common.sql.dialects.dialect.Dialect` class has the following operations
+available, which use SQLAlchemy underneath but can be overloaded with specialized implementations
+per database:
+
+- ``placeholder`` specifies the database-specific placeholder used in prepared statements (default: ``%s``);
+- ``inspector`` returns the SQLAlchemy inspector, which allows us to retrieve database metadata;
+- ``extract_schema_from_table`` allows us to extract the schema name from a string;
+- ``get_column_names`` returns the column names for the given table and (optional) schema using the SQLAlchemy inspector;
+- ``get_primary_keys`` returns the primary keys for the given table and (optional) schema using the SQLAlchemy inspector;
+- ``get_target_fields`` returns the column names that aren't identity or auto-incremented columns; these will be used by the ``insert_rows`` method of the :class:`~airflow.providers.common.sql.hooks.sql.DbApiHook` if the ``target_fields`` parameter wasn't specified and the Airflow property ``core.dbapihook_resolve_target_fields`` is set to ``True`` (default: ``False``);
+- ``reserved_words`` returns the reserved SQL keywords for the target database using the SQLAlchemy inspector;
+- ``generate_insert_sql`` generates the insert SQL statement for the target database;
+- ``generate_replace_sql`` generates the upsert SQL statement for the target database.
+
+At the moment there are only three dialects available:
+
+- ``default`` :class:`~airflow.providers.common.sql.dialects.dialect.Dialect` reuses the generic functionality that was already available in the :class:`~airflow.providers.common.sql.hooks.sql.DbApiHook`;
+- ``mssql`` :class:`~airflow.providers.microsoft.mssql.dialects.mssql.MsSqlDialect` specialized for Microsoft SQL Server;
+- ``postgresql`` :class:`~airflow.providers.postgres.dialects.postgres.PostgresDialect` specialized for PostgreSQL.
+
+The dialect to use is normally derived from the connection string, but this isn't always possible. In that case you
+can specify the dialect name explicitly through the extra options of the connection:
+
+.. code-block:: yaml
+
+    dialect_name: 'mssql'
+
+If no specific dialect is available for a database, the default one will be used; the same happens when a non-existent dialect name is specified.
diff --git a/docs/apache-airflow-providers-microsoft-mssql/index.rst b/docs/apache-airflow-providers-microsoft-mssql/index.rst
index 62089e865459d..b574f05bef995 100644
--- a/docs/apache-airflow-providers-microsoft-mssql/index.rst
+++ b/docs/apache-airflow-providers-microsoft-mssql/index.rst
@@ -43,6 +43,7 @@
     :caption: References

     Python API <_api/airflow/providers/microsoft/mssql/index>
+    Dialects
 .. toctree::
     :hidden:
diff --git a/docs/apache-airflow-providers-postgres/dialects.rst b/docs/apache-airflow-providers-postgres/dialects.rst
new file mode 100644
index 0000000000000..22feedc945413
--- /dev/null
+++ b/docs/apache-airflow-providers-postgres/dialects.rst
@@ -0,0 +1,55 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements. See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership. The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License. You may obtain a copy of the License at
+
+ .. http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied. See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+SQL Dialects
+=============
+
+The :class:`~airflow.providers.common.sql.dialects.dialect.Dialect` offers an abstraction layer between the
+:class:`~airflow.providers.common.sql.hooks.sql.DbApiHook` implementation and the database. For some databases,
+multiple connection types are available, such as native, ODBC or JDBC. As the :class:`~airflow.providers.odbc.hooks.odbc.OdbcHook`
+and the :class:`~airflow.providers.jdbc.hooks.jdbc.JdbcHook` are generic hooks that allow you to interact with any
+database that has a driver for it, an abstraction layer was needed that allows us to run specialized queries
+depending on the database we connect to; that is why dialects were introduced.
+
+The default :class:`~airflow.providers.common.sql.dialects.dialect.Dialect` class has the following operations
+available, which use SQLAlchemy underneath but can be overloaded with specialized implementations
+per database:
+
+- ``placeholder`` specifies the database-specific placeholder used in prepared statements (default: ``%s``);
+- ``inspector`` returns the SQLAlchemy inspector, which allows us to retrieve database metadata;
+- ``extract_schema_from_table`` allows us to extract the schema name from a string;
+- ``get_column_names`` returns the column names for the given table and (optional) schema using the SQLAlchemy inspector;
+- ``get_primary_keys`` returns the primary keys for the given table and (optional) schema using the SQLAlchemy inspector;
+- ``get_target_fields`` returns the column names that aren't identity or auto-incremented columns; these will be used by the ``insert_rows`` method of the :class:`~airflow.providers.common.sql.hooks.sql.DbApiHook` if the ``target_fields`` parameter wasn't specified and the Airflow property ``core.dbapihook_resolve_target_fields`` is set to ``True`` (default: ``False``);
+- ``reserved_words`` returns the reserved SQL keywords for the target database using the SQLAlchemy inspector;
+- ``generate_insert_sql`` generates the insert SQL statement for the target database;
+- ``generate_replace_sql`` generates the upsert SQL statement for the target database.
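As a concrete illustration of what ``extract_schema_from_table`` does, a minimal sketch could look like the following; this is illustrative only and is not the provider's actual implementation (which may, for example, also handle quoting):

```python
# Illustrative sketch: split a possibly schema-qualified table name such as
# "hollywood.actors" into its schema and table parts.
def extract_schema_from_table(table):
    schema, sep, name = table.partition(".")
    # When there is no dot, partition() leaves sep empty and the whole
    # string in the first slot, so the name is unqualified.
    return (schema, name) if sep else (None, table)

print(extract_schema_from_table("hollywood.actors"))  # ('hollywood', 'actors')
print(extract_schema_from_table("actors"))            # (None, 'actors')
```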
+
+At the moment there are only three dialects available:
+
+- ``default`` :class:`~airflow.providers.common.sql.dialects.dialect.Dialect` reuses the generic functionality that was already available in the :class:`~airflow.providers.common.sql.hooks.sql.DbApiHook`;
+- ``mssql`` :class:`~airflow.providers.microsoft.mssql.dialects.mssql.MsSqlDialect` specialized for Microsoft SQL Server;
+- ``postgresql`` :class:`~airflow.providers.postgres.dialects.postgres.PostgresDialect` specialized for PostgreSQL.
+
+The dialect to use is normally derived from the connection string, but this isn't always possible. In that case you
+can specify the dialect name explicitly through the extra options of the connection:
+
+.. code-block:: yaml
+
+    dialect_name: 'mssql'
+
+If no specific dialect is available for a database, the default one will be used; the same happens when a non-existent dialect name is specified.
diff --git a/docs/apache-airflow-providers-postgres/index.rst b/docs/apache-airflow-providers-postgres/index.rst
index 0979b194b1882..b4d698ceb72f1 100644
--- a/docs/apache-airflow-providers-postgres/index.rst
+++ b/docs/apache-airflow-providers-postgres/index.rst
@@ -43,6 +43,7 @@
     :caption: References

     Python API <_api/airflow/providers/postgres/index>
+    Dialects

 .. toctree::
     :hidden:
diff --git a/providers/src/airflow/providers/common/sql/doc/adr/0003-introduce-notion-of-dialects-in-dbapihook.md b/providers/src/airflow/providers/common/sql/doc/adr/0003-introduce-notion-of-dialects-in-dbapihook.md
new file mode 100644
index 0000000000000..09df3145e4ea2
--- /dev/null
+++ b/providers/src/airflow/providers/common/sql/doc/adr/0003-introduce-notion-of-dialects-in-dbapihook.md
@@ -0,0 +1,61 @@
+
+
+# 3.
Introduce notion of dialects in DbApiHook
+
+Date: 2025-01-07
+
+## Status
+
+Accepted
+
+## Context
+
+This ADR describes why we wanted to introduce dialects in the ``DbApiHook``: we experienced
+that the ``_insert_statement_format`` and ``_replace_statement_format`` string formatting properties used by the
+``insert_rows`` method of the ``DbApiHook`` were lacking in some cases, as the number of parameters passed to the
+string format is hard-coded and isn't always sufficient when using different databases through the
+generic JDBC and ODBC connection types.
+
+That's why we wanted a generic approach in which the code isn't tied to a specific database hook.
+
+For example, when using MSSQL through ODBC instead of the native ``MsSqlHook``, you won't have the merge into
+(e.g. replace) functionality for MSSQL, as that was only available in the native ``MsSqlHook``.
+
+That's where the notion of dialects comes into play, allowing us to benefit from the same functionality
+independently of which connection type we use (ODBC/JDBC, or native if available) for a specific
+database.
+
+
+## Decision
+
+We decided to introduce the notion of dialects, which allows us to implement database-specific functionality
+independently of the connection type (e.g. hook) used. That way, when using for example the ``insert_rows`` method on
+the ``DbApiHook`` with ODBC, JDBC or native connection types, it will always be possible to use the replace
+into (e.g. merge into) functionality, as it won't be tied to a specific hook implementation and thus the
+connection type.
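To illustrate why replace/upsert logic belongs in a dialect rather than in a single hard-coded format string, here is a hedged sketch of how two dialects could render the same replace intent differently; the function names and the exact SQL shapes are illustrative, not the providers' actual output:

```python
# Illustrative only: the same "replace" intent rendered per database dialect.
def postgres_replace_sql(table, fields, primary_keys, placeholder="%s"):
    # PostgreSQL upserts via INSERT ... ON CONFLICT ... DO UPDATE.
    cols = ", ".join(fields)
    vals = ", ".join([placeholder] * len(fields))
    updates = ", ".join(f"{f} = excluded.{f}" for f in fields if f not in primary_keys)
    return (f"INSERT INTO {table} ({cols}) VALUES ({vals}) "
            f"ON CONFLICT ({', '.join(primary_keys)}) DO UPDATE SET {updates}")

def mssql_replace_sql(table, fields, primary_keys, placeholder="%s"):
    # SQL Server has no ON CONFLICT; it needs a MERGE statement instead.
    on = " AND ".join(f"target.{k} = source.{k}" for k in primary_keys)
    updates = ", ".join(f"target.{f} = source.{f}" for f in fields if f not in primary_keys)
    cols = ", ".join(fields)
    src = ", ".join(f"source.{f}" for f in fields)
    vals = ", ".join([placeholder] * len(fields))
    return (f"MERGE INTO {table} AS target "
            f"USING (VALUES ({vals})) AS source ({cols}) ON {on} "
            f"WHEN MATCHED THEN UPDATE SET {updates} "
            f"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({src});")
```

A single ``_replace_statement_format`` string cannot express both shapes, which is exactly the gap the dialect abstraction closes: the hook calls one ``generate_replace_sql`` and the dialect chosen for the connection decides how to render it.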
+
+
+## Consequences
+
+The consequence of this decision is that, from now on, database-specific implementations should be done within the
+dialect for that database instead of within the specialized hook, unless the connection type is tied to the hook,
+meaning that only one connection type is possible and no ODBC/JDBC (and in the future maybe even ADBC,
+i.e. Arrow Database Connectivity) alternative is available.
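A consequence of decoupling dialects from hooks is that dialect lookup becomes a simple registry with a default fallback, matching the rule stated in the docs above (if no specific dialect is available, or an unknown name is given, the default one is used). A minimal sketch of that lookup rule, with illustrative names rather than the provider's actual API:

```python
# Illustrative registry: unknown or missing dialect names fall back to "default".
DIALECTS = {
    "default": "Dialect",
    "mssql": "MsSqlDialect",
    "postgresql": "PostgresDialect",
}

def resolve_dialect(dialect_name=None):
    # An empty or None name resolves to the default dialect, and so does
    # any name that has no registered specialization.
    return DIALECTS.get(dialect_name or "default", DIALECTS["default"])

print(resolve_dialect("mssql"))   # MsSqlDialect
print(resolve_dialect("oracle"))  # Dialect
print(resolve_dialect(None))      # Dialect
```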