This driver supports running queries asynchronously using execute_async()
. This method can return control to the calling code almost immediately instead of blocking until the query completes. execute_async()
never returns a ResultSet
. Instead it returns a query handle that you can use to poll for the query status, cancel the query, or fetch the query results.
You can use this method to submit multiple queries in rapid succession and to pick up running query executions that were kicked-off from a separate thread or Connection
. This can be especially useful for recovering running queries within serverless functions.
Note: Asynchronous execution is not the same as asyncio in Python.
databricks-sql-connector>=3.1.0
To run a query asynchronously, use databricks.sql.client.Connection.execute_async()
. This method takes the same arguments as execute()
. To pick up an existing query run, use databricks.sql.client.Connection.get_async_execution()
. Both methods return an AsyncExecution
object, which lets you interact with a query by exposing these properties and methods:
Properties
AsyncExecution.status
is the last-known status of this query run, expressed as anAsyncExecutionStatus
enumeration value. For example,RUNNING
,FINISHED
, orCANCELED
. When you first callexecute_async()
the resulting status will usually beRUNNING
. Callingsync_status()
will refresh the value of this property. In most usages, you do not need to access.status
directly and can instead use.is_running
,.is_canceled
, and.is_finished
instead.AsyncExecution.query_id
is theUUID
for this query run. You can use this to look-up the query in the Databricks SQL query history.AsyncExecution.query_secret
is theUUID
secret for this query run. Both thequery_id
andsecret
are needed to fetch results for a running query.AsyncExecution.returned_as_direct_result
is a boolean that indicates whether this returned a direct result. See below for more information.
Methods
AsyncExecution.sync_status()
performs a network round-trip to synchronize.status
.AsyncExecution.get_results()
returns theResultSet
for this query run. This is the same return signature as a synchronous.execute()
call. Note that if you callget_results()
on a query that is still running, the code will block until the query finishes running and the result has been fetched.AsyncExecution.cancel()
performs a network round-trip to cancel this query run.AsyncExceution.serialize()
returns a string of thequery_id:query_secret
for this execution.
Note: You do not need to directly instantiate an AsyncExecution
object in your client code. Instead, use the execute_async
and get_async_execution
methods to run or pick up a running query.
This snippet mirrors the synchronous example in this repository's README.
import os
import time
from databricks import sql
host = os.getenv("DATABRICKS_HOST")
http_path = os.getenv("DATABRICKS_HTTP_PATH")
access_token = os.getenv("DATABRICKS_TOKEN")
with sql.connect(server_hostname=host, http_path=http_path, access_token=access_token) as connection:
query = connection.execute_async("SELECT :param `p`, * FROM RANGE(10" {"param": "foo"})
# Poll for the result every 5 seconds
while query.is_running:
time.sleep(5)
query.sync_status()
# this will raise a AsyncExecutionUnrecoverableResultException if the query was canceled
result = query.get_results().fetchall()
Both a query_id
and query_secret
are required to pick up a running query. This example runs a query from one Connection
and fetches its result from a different connection.
import os
import time
from databricks import sql
host = os.getenv("DATABRICKS_HOST")
http_path = os.getenv("DATABRICKS_HTTP_PATH")
access_token = os.getenv("DATABRICKS_TOKEN")
with sql.connect(server_hostname=host, http_path=http_path, access_token=access_token) as connection:
query = connection.execute_async("SELECT :param `p`, * FROM RANGE(1000000000)" {"param": "foo"})
# see note in this document about queries that return direct results
if query.returned_as_direct_result:
assert False, "Queries that return direct results cannot be picked up with get_async_execution()"
# The connection created above has now closed
with sql.connect(server_hostname=host, http_path=http_path, access_token=access_token) as connection:
query_id, query_secret = query_id_and_secret.split(":")
query = connection.get_async_execution(query_id, query_secret)
while query.is_running:
time.sleep(5)
query.sync_status()
result = query.get_results().fetchall()
To minimise network roundtrips for small queries, Databricks will eagerly return a query result if the query completes within five seconds and its results can be sent in a single response. This means that execute_async()
may take up to five seconds to return control back to your calling code. When this happens, AsyncExecution.returned_as_direct_result
will evaluate True
and the query result will have already been cached in this AsyncExecution
object. Calling get_results()
will not invoke a network round-trip because the query will not be available at the server.
Queries that execute in this fashion cannot be picked up with get_async_execution()
and their results are not persisted on the server to be fetched by a separate thread. Therefore, before calling .serialize()
to persist a query_id:query_secret
pair, you should check if AsyncExecution.returned_as_direct_result == True
first.