Added ADR document describing why the notion of dialects was introduced in the common sql provider #45456

Open
wants to merge 21 commits into
base: main
21 commits
b9abc48
docs: Added a markdown ADR document describing why the notion of dial…
davidblain-infrabel Jan 7, 2025
28c22ba
Merge branch 'main' into feature/common-sql-dialects-docs
dabla Jan 7, 2025
f860651
Merge branch 'main' into feature/common-sql-dialects-docs
dabla Jan 8, 2025
564022c
Merge branch 'main' into feature/common-sql-dialects-docs
dabla Jan 8, 2025
84a77ab
Merge branch 'main' into feature/common-sql-dialects-docs
dabla Jan 9, 2025
19413bc
Merge branch 'main' into feature/common-sql-dialects-docs
dabla Jan 9, 2025
3b78ec3
docs: Added reference to the dialects in the Airflow common sql provi…
davidblain-infrabel Jan 10, 2025
f037896
Merge branch 'main' into feature/common-sql-dialects-docs
dabla Jan 10, 2025
e90e7f7
docs: Reformatted the ADR
davidblain-infrabel Jan 10, 2025
d0b7c10
Merge branch 'main' into feature/common-sql-dialects-docs
dabla Jan 10, 2025
71101b9
Merge branch 'main' into feature/common-sql-dialects-docs
dabla Jan 10, 2025
b5ee559
Merge branch 'main' into feature/common-sql-dialects-docs
dabla Jan 11, 2025
a54e54d
Merge branch 'main' into feature/common-sql-dialects-docs
dabla Jan 13, 2025
dd47990
Merge branch 'main' into feature/common-sql-dialects-docs
dabla Jan 14, 2025
5bbbf25
refactor: Added dialects reference in mssql and postgres provider
davidblain-infrabel Jan 14, 2025
b81fa84
Merge branch 'main' into feature/common-sql-dialects-docs
dabla Jan 14, 2025
c92ce25
Merge branch 'main' into feature/common-sql-dialects-docs
dabla Jan 15, 2025
b0bf1b3
Merge branch 'main' into feature/common-sql-dialects-docs
dabla Jan 17, 2025
d05558f
Merge branch 'main' into feature/common-sql-dialects-docs
dabla Jan 17, 2025
8ea6646
Merge branch 'main' into feature/common-sql-dialects-docs
dabla Jan 17, 2025
81f7358
Merge branch 'main' into feature/common-sql-dialects-docs
dabla Jan 17, 2025
55 changes: 55 additions & 0 deletions docs/apache-airflow-providers-common-sql/dialects.rst
@@ -0,0 +1,55 @@
.. Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements. See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership. The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License. You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied. See the License for the
   specific language governing permissions and limitations
   under the License.

SQL Dialects
=============

The :class:`~airflow.providers.common.sql.dialects.dialect.Dialect` class offers an abstraction layer between the
:class:`~airflow.providers.common.sql.hooks.sql.DbApiHook` implementation and the database. For some databases, multiple
connection types are available, such as native, ODBC and/or JDBC. The :class:`~airflow.providers.odbc.hooks.odbc.OdbcHook`
and the :class:`~airflow.providers.jdbc.hooks.jdbc.JdbcHook` are generic hooks that allow you to interact with any
database that has a driver for it, so an abstraction layer was needed to run specialized queries
depending on the database being connected to. That is why dialects were introduced.

The default :class:`~airflow.providers.common.sql.dialects.dialect.Dialect` class provides the following operations,
which use SQLAlchemy under the hood but can be overridden with specialized implementations
per database:

- ``placeholder`` specifies the database-specific placeholder used in prepared statements (default: ``%s``).
- ``inspector`` returns the SQLAlchemy inspector, which allows us to retrieve database metadata.
- ``extract_schema_from_table`` allows us to extract the schema name from a string.
- ``get_column_names`` returns the column names for the given table and (optional) schema using the SQLAlchemy inspector.
- ``get_primary_keys`` returns the primary keys for the given table and (optional) schema using the SQLAlchemy inspector.
- ``get_target_fields`` returns the names of the columns that aren't identity or auto-incremented columns. This is used by the ``insert_rows`` method of the :class:`~airflow.providers.common.sql.hooks.sql.DbApiHook` if the ``target_fields`` parameter wasn't specified and the Airflow configuration option ``core.dbapihook_resolve_target_fields`` is set to True (default: False).
- ``reserved_words`` returns the reserved words in SQL for the target database using the SQLAlchemy inspector.
- ``generate_insert_sql`` generates the insert SQL statement for the target database.
- ``generate_replace_sql`` generates the upsert SQL statement for the target database.
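
As an illustration of how a specialized dialect might override some of these operations, here is a minimal
standalone sketch. It does not use the provider's classes: ``SketchDialect`` and ``SketchUpsertDialect`` are
hypothetical names, the method signatures are simplified, and the ``ON CONFLICT`` statement is only an assumption
about what a database-specific upsert could look like.

.. code-block:: python

    from __future__ import annotations


    class SketchDialect:
        """Hypothetical stand-in for a default dialect (not the provider's class)."""

        placeholder = "%s"

        @staticmethod
        def extract_schema_from_table(table: str) -> tuple[str, str | None]:
            # "schema.table" -> ("table", "schema"); "table" -> ("table", None)
            parts = table.split(".")
            return (parts[1], parts[0]) if len(parts) == 2 else (table, None)

        def generate_insert_sql(self, table: str, values: tuple, target_fields: list[str]) -> str:
            columns = ", ".join(target_fields)
            placeholders = ", ".join([self.placeholder] * len(values))
            return f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"

        def generate_replace_sql(self, table: str, values: tuple, target_fields: list[str]) -> str:
            # Generic fallback: same shape as the insert, but with REPLACE INTO.
            insert = self.generate_insert_sql(table, values, target_fields)
            return insert.replace("INSERT", "REPLACE", 1)


    class SketchUpsertDialect(SketchDialect):
        """Hypothetical specialized dialect that overrides only the upsert statement."""

        def __init__(self, primary_keys: list[str]):
            self.primary_keys = primary_keys

        def generate_replace_sql(self, table: str, values: tuple, target_fields: list[str]) -> str:
            insert = self.generate_insert_sql(table, values, target_fields)
            keys = ", ".join(self.primary_keys)
            # Update every non-key column with the value from the rejected row.
            updates = ", ".join(
                f"{col} = excluded.{col}" for col in target_fields if col not in self.primary_keys
            )
            return f"{insert} ON CONFLICT ({keys}) DO UPDATE SET {updates}"


    print(SketchDialect().generate_replace_sql("users", (1, "Ann"), ["id", "name"]))
    print(SketchUpsertDialect(["id"]).generate_replace_sql("users", (1, "Ann"), ["id", "name"]))

The point of the sketch is that only ``generate_replace_sql`` changes between the two classes. Everything else is
inherited from the default behaviour.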

At the moment, only three dialects are available:

- ``default`` :class:`~airflow.providers.common.sql.dialects.dialect.Dialect` reuses the generic functionality that was already available in the :class:`~airflow.providers.common.sql.hooks.sql.DbApiHook`.
- ``mssql`` :class:`~airflow.providers.microsoft.mssql.dialects.mssql.MsSqlDialect` is specialized for Microsoft SQL Server.
- ``postgresql`` :class:`~airflow.providers.postgres.dialects.postgres.PostgresDialect` is specialized for PostgreSQL.

The dialect to use is derived from the connection string, which is not always possible. In that case, you can
specify the dialect name explicitly through the extra options of the connection:

.. code-block::

    dialect_name: 'mssql'
Comment on lines +42 to +53
Contributor

This is nice!
I think we'd better also have dialects.rst in the docs of mssql and postgres and reference the guide from here.
Some users may land directly in the relevant docs rather than in the common.sql doc.

Member

I will look at it shortly. Been kinda busy with refactors and stuff :)

Contributor

@dabla can you add dialects.rst also in mysql and postgres providers?

Contributor Author

> @dabla can you add dialects.rst also in mysql and postgres providers?

I've added the reference to dialects in both indexes.

Contributor

I don't think this can pass the doc build. I think you have to create a separate doc for each one, but if the doc build passes I am fine with it.


If no specific dialect is available for a database, the default one is used. The same applies when a non-existing dialect name is specified.
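
The resolution and fallback behaviour described in this section can be sketched roughly as follows. This is a
simplified standalone illustration, not the hook's actual code: ``resolve_dialect_name`` and
``AVAILABLE_DIALECTS`` are hypothetical names, and the order of precedence (derived name first, then the
``dialect_name`` extra) is an assumption based on the text above.

.. code-block:: python

    from __future__ import annotations

    # Dialect names listed in this document.
    AVAILABLE_DIALECTS = {"default", "mssql", "postgresql"}


    def resolve_dialect_name(derived: str | None, extra: dict) -> str:
        """Return the dialect name to use for a connection (hypothetical helper).

        ``derived`` is the name obtained from the connection string, or None
        when it could not be derived. ``extra`` holds the connection extras.
        """
        # Assumption: the derived name wins, the extra option is the fallback.
        candidate = derived or extra.get("dialect_name")
        # Missing or unknown names fall back to the default dialect.
        return candidate if candidate in AVAILABLE_DIALECTS else "default"


    print(resolve_dialect_name("postgresql", {}))
    print(resolve_dialect_name(None, {"dialect_name": "mssql"}))
    print(resolve_dialect_name(None, {"dialect_name": "does-not-exist"}))

The last call shows the fallback: an unknown name resolves to ``default`` rather than raising an error.
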
1 change: 1 addition & 0 deletions docs/apache-airflow-providers-common-sql/index.rst
@@ -43,6 +43,7 @@

Python API <_api/airflow/providers/common/sql/index>
Supported Database Types </supported-database-types>
Dialects <dialects>

.. toctree::
:hidden:
1 change: 1 addition & 0 deletions docs/apache-airflow-providers-microsoft-mssql/index.rst
@@ -43,6 +43,7 @@
:caption: References

Python API <_api/airflow/providers/microsoft/mssql/index>
Dialects <_api/airflow/providers/common/sql/dialects>

.. toctree::
:hidden:
1 change: 1 addition & 0 deletions docs/apache-airflow-providers-postgres/index.rst
@@ -43,6 +43,7 @@
:caption: References

Python API <_api/airflow/providers/postgres/index>
Dialects <_api/airflow/providers/common/sql/dialects>

.. toctree::
:hidden:
@@ -0,0 +1,61 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# 3. Introduce notion of dialects in DbApiHook

Date: 2025-01-07

## Status

Accepted

## Context

This ADR describes why we wanted to introduce dialects in the ``DbApiHook``. We experienced
that the ``_insert_statement_format`` and ``_replace_statement_format`` string formatting properties used by the
``insert_rows`` method in the ``DbApiHook`` were lacking in some cases: the number of parameters passed to the
string format is hard-coded and isn't always sufficient when using different databases through the
generic JDBC and ODBC connection types.

That's why we wanted a generic approach in which the code isn't tied to a specific database hook.

For example, when using MSSQL through ODBC instead of the native ``MsSqlHook``, you won't have the merge into
(i.e. replace) functionality, as that was only available in
the native ``MsSqlHook``.
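
To make the limitation concrete, here is a minimal sketch. The three-slot template and both helper functions
(`format_insert`, `format_merge`) are hypothetical and only mimic the idea of a fixed string format. They are
not the hook's actual code, and the `MERGE` text is an illustrative T-SQL-style statement.

```python
# Assumption: a fixed template with three slots (table, columns, placeholders),
# in the spirit of the string formats mentioned above.
INSERT_FORMAT = "INSERT INTO {} {} VALUES ({})"


def format_insert(table, target_fields, placeholder="?"):
    """Fill the fixed three-slot template."""
    columns = "({})".format(", ".join(target_fields))
    placeholders = ", ".join([placeholder] * len(target_fields))
    return INSERT_FORMAT.format(table, columns, placeholders)


def format_merge(table, target_fields, primary_keys, placeholder="?"):
    """Build a MERGE-style upsert, which also needs key and update clauses."""
    columns = ", ".join(target_fields)
    placeholders = ", ".join([placeholder] * len(target_fields))
    on_clause = " AND ".join(f"target.{key} = source.{key}" for key in primary_keys)
    updates = ", ".join(
        f"target.{col} = source.{col}" for col in target_fields if col not in primary_keys
    )
    inserts = ", ".join(f"source.{col}" for col in target_fields)
    return (
        f"MERGE INTO {table} AS target "
        f"USING (VALUES ({placeholders})) AS source ({columns}) "
        f"ON {on_clause} "
        f"WHEN MATCHED THEN UPDATE SET {updates} "
        f"WHEN NOT MATCHED THEN INSERT ({columns}) VALUES ({inserts});"
    )


print(format_insert("users", ["id", "name"]))
print(format_merge("users", ["id", "name"], ["id"]))
```

The merge statement depends on the primary keys and on a per-column update clause, neither of which fits into
the three slots of the fixed template.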

That's where the notion of dialects comes into play: it allows us to benefit from the same functionality
independently of which connection type you want to use (ODBC/JDBC or native if available) for a specific
database.


## Decision

We decided to introduce the notion of dialects, which allows us to implement database-specific functionality
independently of the connection type (i.e. hook) used. That way, when using for example the ``insert_rows`` method on
the ``DbApiHook`` with ODBC, JDBC or native connection types, it will always be possible to use the replace
into (i.e. merge into) functionality, as that is no longer tied to a specific hook implementation and thus to the
connection type.
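
The decision can be pictured with a small standalone sketch. All names here (`DefaultSketchDialect`,
`UpsertSketchDialect`, `SketchHook`) are hypothetical and the signatures are simplified. The sketch only
illustrates the delegation idea and is not the provider's implementation.

```python
from dataclasses import dataclass


class DefaultSketchDialect:
    """Hypothetical default dialect with a generic replace statement."""

    def generate_replace_sql(self, table, target_fields):
        cols = ", ".join(target_fields)
        marks = ", ".join(["%s"] * len(target_fields))
        return f"REPLACE INTO {table} ({cols}) VALUES ({marks})"


class UpsertSketchDialect(DefaultSketchDialect):
    """Hypothetical database-specific dialect; the first column is treated as the key."""

    def generate_replace_sql(self, table, target_fields):
        key, *rest = target_fields
        cols = ", ".join(target_fields)
        marks = ", ".join(["%s"] * len(target_fields))
        updates = ", ".join(f"{col} = excluded.{col}" for col in rest)
        return f"INSERT INTO {table} ({cols}) VALUES ({marks}) ON CONFLICT ({key}) DO UPDATE SET {updates}"


@dataclass
class SketchHook:
    """Hypothetical hook: owns the connection type, delegates SQL generation."""

    conn_type: str  # for example "odbc", "jdbc" or a native type
    dialect: DefaultSketchDialect

    def replace_statement(self, table, target_fields):
        # The hook does not format the SQL itself; it asks its dialect.
        return self.dialect.generate_replace_sql(table, target_fields)


dialect = UpsertSketchDialect()
statements = {
    conn_type: SketchHook(conn_type, dialect).replace_statement("users", ["id", "name"])
    for conn_type in ("odbc", "jdbc", "native")
}
print(statements["odbc"])
```

Because the statement comes from the dialect, all three connection types in the sketch end up with the same
upsert statement.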


## Consequences

The consequence of this decision is that from now on database-specific implementations should be done within the
dialect for that database instead of the specialized hook, unless the connection type is tied to the hook,
meaning that only one connection type is possible and no ODBC/JDBC (or, in the future, maybe even ADBC, i.e.
Apache Arrow) alternative is available.