Adding sql dialect to destination capabilities #2393

Merged
merged 6 commits into from
Mar 16, 2025


anuunchin
Contributor

@anuunchin anuunchin commented Mar 11, 2025

Description

Adds the sqlglot dialect to destination capabilities where possible, with special treatment for the sqlalchemy destination. See the wrap-up below 👇.

Additionally, this PR makes sqlglot a regular dependency (added under [tool.poetry.dependencies] and removed everywhere else; poetry lock --no-update was run to update the lock file).

Related Issues

Additional Context

SQLAlchemy

For the sqlalchemy destination, we set the dialect based on the backend_name of the connection string. As shown below, the backend_name and the sqlglot dialect name usually match, but sometimes they differ slightly (e.g. postgresql vs. postgres).

A "-" means sqlglot doesn't support a dialect compatible with the given destination (as far as my googling goes). The backend names for such cases are omitted for simplicity.

External dialects maintained by SQLAlchemy:

| Destination | Backend Name | SQLGlot Dialect |
|---|---|---|
| Actian Data Platform, Vector, Actian X, Ingres | - | - |
| Amazon Athena | awsathena | athena |
| Amazon Redshift (via psycopg2) | redshift | redshift |
| Apache Drill | drill | drill |
| Apache Druid | druid | druid |
| Apache Hive and Presto | presto, hive, trino | presto, hive, trino |
| Apache Solr | - | - |
| ClickHouse | clickhouse | clickhouse |
| CockroachDB | - | - |
| CrateDB | - | - |
| Databend | - | - |
| Databricks | databricks | databricks |
| EXASolution | - | - |
| Elasticsearch (readonly) | - | - |
| Firebird | - | - |
| Firebolt | - | - |
| Google BigQuery | bigquery | bigquery |
| Google Sheets | - | - |
| Greenplum | - | - |
| HyperSQL (hsqldb) | - | - |
| IBM DB2 and Informix | - | - |
| IBM Netezza Performance Server | - | - |
| Impala | - | - |
| Kinetica | - | - |
| Microsoft Access (via pyodbc) | - | - |
| Microsoft SQL Server (via python-tds) | - | - |
| Microsoft SQL Server (via turbodbc) | mssql | tsql |
| MonetDB | - | - |
| OpenGauss | - | - |
| Rockset | - | - |
| SAP ASE (fork of former Sybase dialect) | - | - |
| SAP Hana | - | - |
| SAP Sybase SQL Anywhere | - | - |
| Snowflake | snowflake | snowflake |
| Teradata Vantage | teradatasql | teradata |
| TiDB | - | - |
| YDB | - | - |
| YugabyteDB | - | - |

Internal dialects:

| Destination | Backend Name | SQLGlot Dialect |
|---|---|---|
| Microsoft SQL Server | mssql | tsql |
| MySQL / MariaDB | mysql | mysql |
| Oracle Database | oracle | oracle |
| PostgreSQL | postgresql | postgres |
| SQLite | sqlite | sqlite |
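The backend-name lookup described above can be sketched as a small mapping. This is a minimal illustration, not the code merged in this PR: the helper names `backend_name_from_url` and `sqlglot_dialect_for_backend` are hypothetical, the URL parsing is a simplified stand-in for SQLAlchemy's own `URL.get_backend_name()`, and the name lists are copied from the tables above (plus `mariadb`, which the MySQL branch of the capabilities code in this PR also maps to `mysql`).

```python
from typing import Optional

# Hypothetical: backend names whose sqlglot dialect name differs (from the tables above).
_BACKEND_TO_DIALECT = {
    "awsathena": "athena",
    "postgresql": "postgres",
    "mssql": "tsql",
    "teradatasql": "teradata",
    "mariadb": "mysql",  # MariaDB is handled by the mysql dialect
}

# Hypothetical: backend names that are already valid sqlglot dialect names.
_SAME_NAME = {
    "redshift", "drill", "druid", "presto", "hive", "trino",
    "clickhouse", "databricks", "bigquery", "snowflake",
    "mysql", "oracle", "sqlite",
}


def backend_name_from_url(url: str) -> str:
    """Return the backend part of a SQLAlchemy-style URL.

    Simplified stand-in for SQLAlchemy's URL.get_backend_name():
    'postgresql+psycopg2://user:pass@host/db' -> 'postgresql'.
    """
    scheme = url.split("://", 1)[0]
    return scheme.split("+", 1)[0].lower()


def sqlglot_dialect_for_backend(backend_name: str) -> Optional[str]:
    """Map a backend name to a sqlglot dialect, or None if none is compatible."""
    name = backend_name.lower()
    if name in _BACKEND_TO_DIALECT:
        return _BACKEND_TO_DIALECT[name]
    if name in _SAME_NAME:
        return name
    return None  # the "-" rows in the tables above


if __name__ == "__main__":
    for url in (
        "postgresql+psycopg2://loader:secret@localhost/dlt_data",
        "mssql+pyodbc://loader:secret@my_dsn",
        "firebird://loader:secret@localhost/db",
    ):
        backend = backend_name_from_url(url)
        print(backend, "->", sqlglot_dialect_for_backend(backend))
```

Keeping the renamed backends in an explicit override table keeps the common case (identical names) cheap and makes each divergence, such as postgresql vs. postgres, visible at a glance.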

All other destinations

| Destination | SQLGlot Dialect | Notes |
|---|---|---|
| athena | "athena" | |
| bigquery | "bigquery" | |
| clickhouse | "clickhouse" | |
| databricks | "databricks" | |
| duckdb | "duckdb" | |
| filesystem | "duckdb" | |
| mssql | "tsql" | |
| postgres | "postgres" | |
| redshift | "redshift" | |
| snowflake | "snowflake" | |
| synapse | "tsql" | previously noted as working |
| dremio | "presto" | previously noted as working |
| dummy | - | |
| motherduck | "duckdb" | previously noted as working |
| weaviate | "n/a" | |
| lancedb | "n/a" | |
| qdrant | "n/a" | |
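To show how a SQL consumer might read these values, here is a hypothetical sketch. `Capabilities`, `capabilities_for` and `require_dialect` are illustrative names, not dlt's API; only the dialect strings come from the table above, with the "-" and "n/a" rows represented as `None`.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical per-destination values, copied from the table above.
# None marks destinations without a compatible sqlglot dialect.
DESTINATION_DIALECTS: Dict[str, Optional[str]] = {
    "athena": "athena",
    "bigquery": "bigquery",
    "clickhouse": "clickhouse",
    "databricks": "databricks",
    "duckdb": "duckdb",
    "filesystem": "duckdb",
    "mssql": "tsql",
    "postgres": "postgres",
    "redshift": "redshift",
    "snowflake": "snowflake",
    "synapse": "tsql",
    "dremio": "presto",
    "motherduck": "duckdb",
    "dummy": None,
    "weaviate": None,
    "lancedb": None,
    "qdrant": None,
}


@dataclass
class Capabilities:
    """Hypothetical stand-in for a destination capabilities object."""

    sqlglot_dialect: Optional[str] = None


def capabilities_for(destination: str) -> Capabilities:
    """Build capabilities for a destination; unknown names get no dialect."""
    return Capabilities(sqlglot_dialect=DESTINATION_DIALECTS.get(destination))


def require_dialect(destination: str) -> str:
    """Return the dialect to generate SQL in, or fail with a clear error."""
    dialect = capabilities_for(destination).sqlglot_dialect
    if dialect is None:
        raise ValueError(f"destination {destination!r} has no sqlglot dialect")
    return dialect
```

Here `require_dialect("filesystem")` returns `"duckdb"`, while `require_dialect("qdrant")` raises `ValueError`. Reading the dialect from capabilities, instead of keeping a separate list of destinations known to work, is in the spirit of the review comment in this thread about IbisReadableRelation.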


netlify bot commented Mar 11, 2025

Deploy Preview for dlt-hub-docs canceled.

🔨 Latest commit: e97d5cf
🔍 Latest deploy log: https://app.netlify.com/sites/dlt-hub-docs/deploys/67d6ba5077164e0008134248

```python
if dialect.name == "mysql" or backend_name in ("mysql", "mariadb"):
    # correct max identifier length
    # dialect uses 255 (max length for aliases) instead of 64 (max length of identifiers)
    caps.max_identifier_length = 64
    caps.format_datetime_literal = _format_mysql_datetime_literal
    caps.sqlglot_dialect = "mysql"

elif backend_name in [
```
Collaborator

this is pretty cool!!!

Collaborator

@sh-rp sh-rp left a comment

It looks very good, thanks for doing the research into the types. What is still missing, though, is that IbisReadableRelation should use this information to determine which dialect to use.

@anuunchin anuunchin force-pushed the feat/2392-sqlglot-dialect branch from e9e1e01 to cfd82cc Compare March 12, 2025 15:40
@anuunchin anuunchin force-pushed the feat/2392-sqlglot-dialect branch from e427b57 to 268287a Compare March 12, 2025 15:59
@anuunchin
Contributor Author

The tables in the PR description are also updated: the dialects for synapse, dremio and motherduck were previously marked as working in ibis_relation.py.

@anuunchin anuunchin requested a review from sh-rp March 13, 2025 08:52
Collaborator

@sh-rp sh-rp left a comment

one additional idea :)

@anuunchin
Contributor Author

anuunchin commented Mar 13, 2025

The clickhouse and databricks dialects are supported by ibis.

@anuunchin anuunchin force-pushed the feat/2392-sqlglot-dialect branch from 4c7779f to a80921e Compare March 13, 2025 12:45
@anuunchin anuunchin requested review from sh-rp and rudolfix March 13, 2025 13:04
Collaborator

@rudolfix rudolfix left a comment

This is very good. Just use an older version of sqlglot.

Collaborator

@sh-rp sh-rp left a comment

Approved from my side. I am not sure which sqlglot version is the correct one to use here, so @rudolfix should approve this.

Collaborator

@rudolfix rudolfix left a comment

LGTM!

@rudolfix rudolfix merged commit 74689ac into devel Mar 16, 2025
85 of 87 checks passed
@rudolfix rudolfix deleted the feat/2392-sqlglot-dialect branch March 16, 2025 22:14
Development

Successfully merging this pull request may close these issues.

Add sql dialect to destination capabilities
3 participants