- Create a [Databricks Service Principal](https://docs.databricks.com/administration-guide/users-groups/service-principals.html#what-is-a-service-principal)
  - You can skip this step and use your own account to get things running quickly,
    but we strongly recommend creating a dedicated service principal for production use.

#### Authentication Options

You can authenticate with Databricks using either a Personal Access Token or Azure authentication:

**Option 1: Personal Access Token (PAT)**

- Generate a Databricks Personal Access Token by following these guides:
  - [Service Principals](https://docs.databricks.com/administration-guide/users-groups/service-principals.html#personal-access-tokens)
  - [Personal Access Tokens](https://docs.databricks.com/dev-tools/auth.html#databricks-personal-access-tokens)
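
With a PAT in hand, the authentication portion of a recipe can be sketched as follows. This is a minimal sketch: the `unity-catalog` source type and the workspace URL are placeholders to adapt, and only `workspace_url` and `token` are fields referenced by this guide.

```yaml
source:
  type: unity-catalog # assumed source type; match your starter recipe
  config:
    # Your workspace URL (placeholder below)
    workspace_url: https://my-workspace.cloud.databricks.com
    # The PAT generated above; prefer an environment variable or secret store
    token: ${DATABRICKS_TOKEN}
```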

**Option 2: Azure Authentication (for Azure Databricks)**

- Create an Azure Active Directory application:
  - Follow the [Azure AD app registration guide](https://docs.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app)
  - Note down the `client_id` (Application ID) and `tenant_id` (Directory ID), and create a `client_secret`
- Grant the Azure AD application access to your Databricks workspace:
  - Add the service principal to your Databricks workspace following [this guide](https://docs.databricks.com/administration-guide/users-groups/service-principals.html#add-a-service-principal-to-your-azure-databricks-account-using-the-account-console)
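
For Azure authentication, the same recipe section carries the Azure AD credentials instead of a token. The exact nesting under `azure_auth` is a sketch and the URL is a placeholder; verify the field names against the starter recipe below.

```yaml
source:
  type: unity-catalog # assumed source type; match your starter recipe
  config:
    workspace_url: https://adb-1234567890123456.7.azuredatabricks.net # placeholder
    azure_auth:
      client_id: ${AZURE_CLIENT_ID} # Application (client) ID from the app registration
      tenant_id: ${AZURE_TENANT_ID} # Directory (tenant) ID
      client_secret: ${AZURE_CLIENT_SECRET} # secret created above
```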

#### Provision your service account

- To ingest your workspace's metadata and lineage, your service principal must have all of the following:
  - One of: metastore admin role, ownership of, or `USE CATALOG` privilege on any catalogs you want to ingest
  - One of: metastore admin role, ownership of, or `USE SCHEMA` privilege on any schemas you want to ingest
  - Ownership of or `SELECT` privilege on any tables and views you want to ingest
  - [Ownership documentation](https://docs.databricks.com/data-governance/unity-catalog/manage-privileges/ownership.html)
  - [Privileges documentation](https://docs.databricks.com/data-governance/unity-catalog/manage-privileges/privileges.html)
- To ingest the legacy `hive_metastore` catalog (`include_hive_metastore`, enabled by default), your service principal must have all of the following:
  - `READ_METADATA` and `USAGE` privilege on the `hive_metastore` catalog
  - `READ_METADATA` and `USAGE` privilege on schemas you want to ingest
  - `READ_METADATA` and `USAGE` privilege on tables and views you want to ingest
  - [Hive Metastore Privileges documentation](https://docs.databricks.com/en/sql/language-manual/sql-ref-privileges-hms.html)
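
If you cannot obtain these legacy privileges, one option is to skip the legacy catalog entirely. A minimal sketch of the relevant config key (it defaults to `true`):

```yaml
source:
  type: unity-catalog # assumed source type; match your starter recipe
  config:
    include_hive_metastore: false # skip the legacy hive_metastore catalog
```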
- To ingest your workspace's notebooks and their lineage, your service principal must have `CAN_READ` privilege on the folders containing the notebooks you want to ingest: [guide](https://docs.databricks.com/en/security/auth-authz/access-control/workspace-acl.html#folder-permissions).
- To ingest usage statistics (`include_usage_statistics`, enabled by default), your service principal must have one of the following:
  - `CAN_MANAGE` permission on any SQL Warehouses you want to ingest: [guide](https://docs.databricks.com/security/auth-authz/access-control/sql-endpoint-acl.html).
  - When `usage_data_source` is set to `SYSTEM_TABLES` or `AUTO` (the default) and `warehouse_id` is configured: `SELECT` privilege on the `system.query.history` table, which improves performance with large query volumes and multi-workspace setups.
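
The usage-related options above map onto recipe fields roughly as in this sketch; the `warehouse_id` value is a placeholder to replace with your own:

```yaml
source:
  type: unity-catalog # assumed source type; match your starter recipe
  config:
    include_usage_statistics: true # default
    usage_data_source: AUTO # default; reads system.query.history when warehouse_id is set
    warehouse_id: "placeholder-warehouse-id"
```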
- To ingest `profiling` information with `method: ge`, your service principal needs `SELECT` privilege on all tables you want to profile.
- To ingest `profiling` information with `method: analyze` and `call_analyze: true` (enabled by default), your service principal must have ownership of or `MODIFY` privilege on any tables you want to profile.
  - Alternatively, you can run [ANALYZE TABLE](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-aux-analyze-table.html) yourself on any tables you want to profile, then set `call_analyze` to `false`.
    You will still need `SELECT` privilege on those tables to fetch the results.
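
If you run `ANALYZE TABLE` yourself, the profiling section of the recipe can be sketched as below; the field nesting is an assumption, so check it against the starter recipe:

```yaml
source:
  type: unity-catalog # assumed source type; match your starter recipe
  config:
    profiling:
      enabled: true
      method: analyze # or: ge
      call_analyze: false # you ran ANALYZE TABLE manually; default is true
```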
- Check the starter recipe below and replace `workspace_url` and either `token` (for PAT authentication) or the `azure_auth` credentials (for Azure authentication) with your information from the previous steps.