Fix DatabricksSqlHook sqlalchemy_url to include http_path from connection extra #69037

Draft
Subham-KRLX wants to merge 1 commit into apache:main from Subham-KRLX:fix/databricks-sqlalchemy-url-http-path

@Subham-KRLX

This PR fixes #69031 by ensuring that DatabricksSqlHook.sqlalchemy_url and get_uri() correctly include http_path when it is defined in the connection extra fields.

We centralized the resolution logic into a helper _resolve_http_path(), updated both get_conn() and sqlalchemy_url to use it, and added corresponding unit tests.

closes: #69031


Was generative AI tooling used to co-author this PR?
  • Yes: Claude Sonnet 4.5 (for PR description and code research)

@moomindani moomindani left a comment

Thanks — centralizing the resolution into _resolve_http_path() and reusing it from both get_conn() and sqlalchemy_url is the right direction, and narrowing AirflowException to ValueError matches the repo's current exception guidelines. The connection-extra case (the one #69031 reports) works and is covered by your tests.

However, there is a regression for hooks configured via sql_endpoint_name: sqlalchemy_url (and therefore get_uri()) now performs a live "list SQL warehouses" REST API call inside a property. DbApiHook.dialect_name calls make_url(self.get_uri()) (common.sql sql.py:363), so every insert_rows() — and any other dialect-dependent path — now triggers that network call and fails hard when the API is unreachable. Before this change those paths worked offline (the URL simply lacked http_path, which dialect_name does not need). The existing unit test test_insert_rows_hook_lineage fails on this branch with exactly that chain:

insert_rows → _generate_insert_sql → dialect → dialect_name
  → get_uri → sqlalchemy_url → _resolve_http_path → _get_sql_endpoint_by_name → REST call

Suggestion: keep the API lookup out of the property. For example, give the helper a flag — _resolve_http_path(allow_endpoint_lookup: bool = True) — where get_conn() uses the default, and sqlalchemy_url passes allow_endpoint_lookup=False, falling back to the previous behavior of omitting http_path when only sql_endpoint_name is configured and no cached value exists yet (raising there would reject a valid configuration). The connection-extra resolution — the actual #69031 fix — needs no API call and stays as-is.
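
A minimal sketch of the flag-based approach suggested above, using a stand-in class rather than the real DatabricksSqlHook. Only `_resolve_http_path`, `allow_endpoint_lookup`, `sqlalchemy_url` and `get_conn` are names from this thread; the class `SketchHook`, the `extra` dict, the injected `lookup_endpoint` callable, the toy URL format and the precedence order are assumptions made for illustration.

```python
from typing import Callable, Optional


class SketchHook:
    """Hypothetical stand-in for the hook; not the provider's actual code."""

    def __init__(
        self,
        http_path: Optional[str] = None,
        sql_endpoint_name: Optional[str] = None,
        extra: Optional[dict] = None,
        lookup_endpoint: Optional[Callable[[str], str]] = None,
    ) -> None:
        self._http_path = http_path
        self._sql_endpoint_name = sql_endpoint_name
        self._extra = extra or {}  # stands in for the connection extra
        self._lookup_endpoint = lookup_endpoint  # stands in for the REST call

    def _resolve_http_path(self, allow_endpoint_lookup: bool = True) -> Optional[str]:
        # 1. Explicit parameter or previously cached value (assumed precedence).
        if self._http_path:
            return self._http_path
        # 2. Connection extra: local, no network needed.
        if "http_path" in self._extra:
            self._http_path = self._extra["http_path"]
            return self._http_path
        # 3. Endpoint name: needs a remote lookup, so only when allowed.
        if self._sql_endpoint_name and allow_endpoint_lookup:
            if self._lookup_endpoint is None:
                raise ValueError("no endpoint lookup available")
            self._http_path = self._lookup_endpoint(self._sql_endpoint_name)
            return self._http_path
        return None

    @property
    def sqlalchemy_url(self) -> str:
        # Local sources only; http_path is omitted if it cannot be resolved offline.
        http_path = self._resolve_http_path(allow_endpoint_lookup=False)
        query = f"?http_path={http_path}" if http_path else ""
        return f"databricks://host{query}"  # toy URL, not the real format

    def get_conn(self) -> str:
        # The connection path may perform the lookup and caches the result.
        http_path = self._resolve_http_path()
        if not http_path:
            raise ValueError("http_path could not be resolved")
        return http_path
```

In this sketch, a hook configured only with `sql_endpoint_name` yields a URL without `http_path` until `get_conn()` has run once, after which the cached value is reused by the property.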

It would also be good to add a test where a sql_endpoint_name-configured hook calls get_uri() (or insert_rows()) offline — that would have caught this.
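
The shape of such an offline test might look like the sketch below. It uses a hypothetical stand-in class (`OfflineSketchHook`) and a toy URL so that it runs without Airflow installed; a real test would construct a DatabricksSqlHook and patch its `_get_sql_endpoint_by_name` method instead.

```python
from unittest import mock


class OfflineSketchHook:
    """Hypothetical stand-in; the real test would build a DatabricksSqlHook."""

    def __init__(self, sql_endpoint_name):
        self._sql_endpoint_name = sql_endpoint_name
        self._http_path = None  # nothing cached yet

    def _get_sql_endpoint_by_name(self, name):
        # Simulates the REST lookup failing because the API is unreachable.
        raise ConnectionError("REST API unreachable")

    def get_uri(self):
        # Local-only resolution: never calls _get_sql_endpoint_by_name.
        suffix = f"?http_path={self._http_path}" if self._http_path else ""
        return f"databricks://host{suffix}"


def test_get_uri_offline_with_endpoint_name():
    hook = OfflineSketchHook(sql_endpoint_name="my-warehouse")
    with mock.patch.object(OfflineSketchHook, "_get_sql_endpoint_by_name") as lookup:
        uri = hook.get_uri()
    # The property path must not touch the endpoint lookup at all.
    lookup.assert_not_called()
    assert "http_path" not in uri
```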


Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting


url_query = {
-    "http_path": self._http_path,
+    "http_path": self._resolve_http_path(),

This line makes sqlalchemy_url (a property) perform a REST API call when the hook is configured via sql_endpoint_name — see the review body for the failing chain through dialect_name/insert_rows. Please resolve only from local sources (explicit param, cached value, connection extra) here.


Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting

@belegdol

belegdol commented Jul 2, 2026

Suggestion: keep the API lookup out of the property. For example, give the helper a flag — _resolve_http_path(allow_endpoint_lookup: bool = True) — where get_conn() uses the default, and sqlalchemy_url passes allow_endpoint_lookup=False, falling back to the previous behavior of omitting http_path when only sql_endpoint_name is configured and no cached value exists yet (raising there would reject a valid configuration). The connection-extra resolution — the actual #69031 fix — needs no API call and stays as-is.

While it would fix #69031, I believe the error will still occur if the hook is configured with sql_endpoint_name instead of http_path, which is supported according to the documentation:

http_path (str | None) – Optional string specifying HTTP path of Databricks SQL Endpoint or cluster. If not specified, it should be either specified in the Databricks connection’s extra parameters, or sql_endpoint_name must be specified.

sql_endpoint_name (str | None) – Optional name of Databricks SQL Endpoint. If not specified, http_path must be provided as described above.

Would calling _resolve_http_path during __init__() be an acceptable solution to this?
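
To make the trade-off in this question concrete, here is a small sketch of eager resolution at construction time. `EagerSketchHook`, the injected `lookup_endpoint` callable, the `unreachable` helper and the toy URL are all hypothetical and only illustrate the idea; they are not the provider's code.

```python
class EagerSketchHook:
    """Hypothetical stand-in showing eager resolution in __init__."""

    def __init__(self, sql_endpoint_name, lookup_endpoint):
        self._sql_endpoint_name = sql_endpoint_name
        # Eager: the (simulated) REST lookup runs at construction time.
        self._http_path = lookup_endpoint(sql_endpoint_name)

    @property
    def sqlalchemy_url(self):
        # No lookup here; the value was fixed in __init__.
        return f"databricks://host?http_path={self._http_path}"


def unreachable(name):
    # Simulates the REST API being unavailable.
    raise ConnectionError("REST API unreachable")
```

With this shape the URL always carries `http_path` for a hook configured via `sql_endpoint_name`, but constructing the hook itself fails when the lookup cannot reach the API, so the network dependency moves from the property into `__init__()`.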

Linked issue (may be closed when this pull request is merged): DatabricksSqlHook.sqlalchemy_url lacks http_path if it is defined in a connection
