-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CONTRIB] Normalize schema and table names. #10763
base: develop
Are you sure you want to change the base?
Conversation
👷 Deploy request for niobium-lead-7998 pending review.Visit the deploys page to approve it
|
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for contributing these changes! I've left a few comments inline.
@@ -995,7 +995,9 @@ def test_connection(self) -> None: | |||
engine: sqlalchemy.Engine = datasource.get_engine() | |||
inspector: sqlalchemy.Inspector = sa.inspect(engine) | |||
|
|||
if self.schema_name and self.schema_name not in inspector.get_schema_names(): | |||
if self.schema_name and self.schema_name not in map( | |||
to_lower_if_not_quoted, inspector.get_schema_names() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also call to_lower_if_not_quoted
on the self.schema_name
used in the map lookup. Right now, this may fail for snowflake since I think their sqlalchemy connector changes names to uppercase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can add, but this only failed because to_lower_if_not_quoted was already applied to self.schema_name. Applying it again should be redundant.
@@ -87,6 +87,10 @@ def _spark( | |||
return spark_column_metadata | |||
|
|||
|
|||
def _get_normalized_table_name(table_name: str): | |||
return table_name.strip("[]").strip('"') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems MSSQL specific. I would rename this method to _get_mssql_normalized_table_name
or something similar and check the dialect before calling via execution_engine.dialect_name == GXSqlDialect.MSSQL
.
I am not familiar with MSSQL. Can square brackets or double quotes appear in table names if preceded by an escape character? That is, are there table names that will be broken with these strip
calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point. Square brackets are allowed in MSSQL but then need to be escaped when running queries against the table. A better solution would be to let GX add the square brackets to the db object names when needed. Currently, I have to add them when adding a data asset, e.g. asset = datasource.add_table_asset(name="ClientCodes", schema_name="CR", table_name="[247_CLIENT_CODES]").
@@ -106,7 +110,7 @@ def _get_sqlalchemy_column_metadata( | |||
|
|||
return get_sqlalchemy_column_metadata( | |||
execution_engine=execution_engine, | |||
table_selectable=table_selectable, # type: ignore[arg-type] | |||
table_selectable=_get_normalized_table_name(table_selectable), # type: ignore[arg-type] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment above about only calling this normalization for MSSQL execution engines.
https://support.greatexpectations.io/hc/en-us/requests/210
Background: I am building tests against SQL data assets (MSSQL). Schema names and tables are in all caps. Table names start with a number and need to be wrapped in square brackets.
Problem 1:
When adding a SQL data asset, the schema name is automatically changed to lowercase. Later, a connection test fails because the lowercase schema name is compared to the original schema name as defined in the database.
Solution: Apply the function “to_lower_if_not_quoted” to the list of schemas returned by the database.
Problem 2:
Since my table names start with a number, I need to wrap them in square brackets when adding data assets in Great Expectations. This causes a problem with the retrieval of column names which expects the actual table name without the brackets.
Solution: Strip square brackets and double quotes from table names before retrieving column metadata via SqlAlchemy.
invoke lint
(usesruff format
+ruff check
)For more information about contributing, visit our community resources.
After you submit your PR, keep the page open and monitor the statuses of the various checks made by our continuous integration process at the bottom of the page. Please fix any issues that come up and reach out on Slack if you need help. Thanks for contributing!