- The Databricks Lakehouse Platform includes tailored user interfaces for which personas?
- Data analysts, machine learning practitioners, data engineers (yes)
- Data architects, data analysts, and IT administrators
- Data analysts, data scientists, and database architects
- Platform administrators, data engineers, and data scientists (no)
- Which statement holistically describes a Lakehouse?
- It is the latest data management paradigm that combines elements from data warehouses and data lakes (yes)
- It is a unified analytics platform used for data management, processing, and analytics (no)
- It is an established data management tool that houses primarily unstructured data (no)
- It is a relatively new data management system that stores all data in its raw state (no)
- Which statement is true about commands run from the Databricks Data Science and Engineering Workspace?
- They are executed by clusters (yes)
- They are executed in the Databricks control plane (no)
- They are executed by endpoints (no)
- They are executed by your web browser (no)
- Which scenario would be best tackled using Databricks Machine Learning?
- Tracking and comparing the results of machine learning experiments (yes)
- Creating a dashboard that will alert business managers of important changes in daily sales revenue
- Setting up access controls to limit data visibility to a particular group within an organization
- Replacing data silos with a single home for structured, semi-structured, and unstructured data
- Which is true about Databricks?
- It provides organizations with a collaborative platform to work with big data, and is primarily aimed at data engineers, data scientists, data analysts, and machine learning practitioners (yes)
- It provides organizations with robust data warehouses and is primarily aimed at data engineers and data architects
- It provides organizations with out-of-the-box solutions for managing IT infrastructure and is primarily aimed at IT administrators
- It provides organizations with open data lakes and robust data warehouses, primarily aimed at administrators of big data infrastructure
- Delta Lake improves data performance through indexing. What does this refer to?
- Ensuring that data can only be accessed by users who should have access
- Ordering table data to maximize query efficiency (yes)
- Structuring queries to act as if they are performed in a single operation
- Enforcing data constraints to satisfy business requirements
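The "ordering table data" answer refers to Delta Lake's data skipping: each data file carries min/max statistics for its columns, and an ordered layout (e.g. via `OPTIMIZE ... ZORDER BY`) keeps those ranges narrow so whole files can be skipped at query time. A minimal Python sketch of the idea, with invented helper names (illustrative only, not Delta Lake's implementation):

```python
# Data-skipping sketch: per-file min/max statistics let a query skip any
# file whose value range cannot contain the predicate value. Ordered data
# produces narrow, non-overlapping ranges; unordered data does not.

def file_stats(files):
    """Compute (min, max) of the sort key for each 'file' (list of rows)."""
    return [(min(f), max(f)) for f in files]

def files_to_scan(stats, value):
    """Return indices of files whose [min, max] range may contain value."""
    return [i for i, (lo, hi) in enumerate(stats) if lo <= value <= hi]

# Ordered layout: each file holds a narrow, non-overlapping key range.
ordered = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# Unordered layout: every file spans nearly the whole key range.
unordered = [[1, 5, 9], [2, 6, 7], [3, 4, 8]]

print(files_to_scan(file_stats(ordered), 5))    # → [1]: two files skipped
print(files_to_scan(file_stats(unordered), 5))  # → [0, 1, 2]: full scan
```

The query for `value == 5` touches one file in the ordered layout but every file in the unordered one, which is why ordering maximizes query efficiency.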
- Where does Delta Lake fit into the Databricks Lakehouse Platform?
- It runs under the hood of the Databricks Lakehouse Platform to power the queries run on the platform
- It works in an organization's data warehouse to help migrate data into a data lake
- It sits on top of an organization's open data lake and provides structure to the many types of data stored within that data lake (yes)
- It works in concert with existing tools to bring auditing and sharing capabilities to data shared across organizations
- Where does Databricks Machine Learning fit into the Databricks Lakehouse Platform?
- It is one of the core services of the Lakehouse Platform, tailored towards data practitioners who need to query data and publish visual insights
- It is one of the core services of the Lakehouse Platform, tailored towards data practitioners who need to manage users and workspace governance
- It is one of the core services of the Lakehouse Platform, tailored towards data practitioners building data pipelines to make data available to everyone in an organization
- It is one of the core services of the Lakehouse Platform, tailored towards data practitioners building and managing machine learning models (yes)
- What is the access point to the Databricks Lakehouse Platform for machine learning practitioners?
- Databricks Data Science and Engineering Workspace
- Databricks Delta Lake
- Databricks Machine Learning (yes)
- Databricks SQL
- What are the primary services that comprise the Databricks Lakehouse Platform?
- Unity Catalog, Databricks Notebooks, Databricks Repos (no)
- Databricks SQL, Apache Spark, Delta Lake (no)
- Databricks SQL, Databricks Machine Learning, Databricks Data Science and Engineering Workspace (yes)
- Delta Lake, Apache Spark, Databricks Security & Governance (no)
- One of the key features delivered by the Databricks Lakehouse platform is ACID transactions. What describes ACID transactions?
- They ensure that data never falls into an inconsistent state because of an operation that only partially completes (yes)
- They identify rare events or observations in data which are statistically different from the rest of the observations
- They are data structures that organize data into a 2-dimensional table of rows and columns, much like a spreadsheet
- They supervise data transactions to ensure that data is not being misused
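Delta Lake delivers this guarantee through its transaction log; the same behavior can be demonstrated with any ACID store. A small sketch using Python's built-in sqlite3 module as a stand-in (not Delta Lake itself): a transaction that fails midway is rolled back, so the table never shows a partial result.

```python
# Atomicity sketch using sqlite3 (a stand-in ACID store): a transfer that
# fails midway is rolled back, so the table never lands in an
# inconsistent, partially-updated state.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INT)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # transaction: commits on success, rolls back on exception
        conn.execute(
            "UPDATE accounts SET balance = balance - 100 WHERE name = 'alice'"
        )
        # Simulated crash before the matching credit to 'bob' is written:
        raise RuntimeError("failure mid-transaction")
except RuntimeError:
    pass

# The debit was rolled back along with the rest of the transaction.
print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
# → [('alice', 100), ('bob', 0)]
```

Without the transaction, the crash would have left 100 debited from `alice` and never credited to `bob` — exactly the inconsistent state ACID transactions prevent.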
- One of the key features delivered by the Databricks Lakehouse platform is data schema enforcement. What describes data schema enforcement?
- It guarantees data freshness by limiting the data that lands in an organization's open data lake, based on whether or not it is GDPR compliant
- It ensures data quality by rejecting writes to a data table that do not match the way that data is structured and organized in that table (yes)
- It ensures data quality by rejecting data ingestion into an organization if a dataset's metadata doesn't contain adequate details, as defined by an organization
- It guarantees data freshness by eliminating duplicative data found within a database
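The correct answer (rejecting mismatched writes) can be sketched in a few lines of plain Python. This is illustrative only; Delta Lake enforces the schema at the storage layer, and the schema and function names below are made up:

```python
# Schema-enforcement sketch: writes whose columns or types don't match the
# declared table schema are rejected instead of silently corrupting the
# table. Illustrative stand-in, not Delta Lake's implementation.

SCHEMA = {"id": int, "name": str, "amount": float}

def write_row(table, row):
    if set(row) != set(SCHEMA):
        raise ValueError(f"columns {sorted(row)} do not match schema {sorted(SCHEMA)}")
    for col, expected in SCHEMA.items():
        if not isinstance(row[col], expected):
            raise TypeError(f"column '{col}' expects {expected.__name__}")
    table.append(row)

table = []
write_row(table, {"id": 1, "name": "widget", "amount": 9.99})  # accepted
try:
    write_row(table, {"id": "2", "name": "gadget", "amount": 1.5})  # id is a str
except TypeError as e:
    print("rejected:", e)
print(len(table))  # → 1: only the conforming row was written
```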
- What is true about data stored by Databricks clusters?
- It is stored in the organization's cloud account (yes)
- It requires customer-supplied keys and certificates in order to maximize security (no)
- It is always encrypted without having to worry about key and certificate management (no)
- It is not encrypted since it cannot be accessed by anyone on the outside (no)
- What is the access point to the Databricks Lakehouse platform for business analysts?
- Databricks Machine Learning (no)
- Databricks SQL (yes)
- Databricks Data Science and Engineering Workspace
- Databricks Delta Lake
- What is the access point to the Databricks Lakehouse platform for data engineers?
- Databricks SQL (no)
- Databricks Data Science and Engineering Workspace (yes)
- Databricks Machine Learning (no)
- Databricks Delta Lake (no)
- Which scenario would be best tackled using Data Science and Engineering Workspace?
- Using Databricks Notebooks to collaborate with team members in a variety of programming languages (yes)
- Tracking and comparing the results of machine learning experiments
- Creating a dashboard that will alert business managers of important changes in daily sales revenue
- Setting up access controls to limit data visibility to a particular group within an organization
- Delta Lake ensures data quality by enabling an organization to do what?
- Providing access to data to only those who should have access
- Applying constraints on the data to ensure that expectations will be met (yes)
- Structuring queries to act as if they are performed in a single operation
- Ordering table data
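In Delta Lake, such expectations are declared with CHECK constraints (`ALTER TABLE ... ADD CONSTRAINT ... CHECK (...)`). The same write-time enforcement can be sketched with SQLite's CHECK constraints as a stand-in (the table and column names here are invented):

```python
# Constraint sketch using SQLite CHECK constraints as a stand-in for
# Delta Lake's ALTER TABLE ... ADD CONSTRAINT: rows that violate a
# declared business rule are rejected at write time.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        quantity INTEGER CHECK (quantity > 0)  -- expectation on the data
    )
""")
conn.execute("INSERT INTO orders VALUES (1, 5)")       # meets the constraint
try:
    conn.execute("INSERT INTO orders VALUES (2, -3)")  # violates it
except sqlite3.IntegrityError as e:
    print("rejected:", e)
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # → 1
```

The bad row never reaches the table, so downstream consumers can rely on the expectation holding for every stored row.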
- Which scenario would be best tackled using Databricks SQL?
- Setting up access controls to limit data visibility to a particular group within an organization
- Replacing data silos with a single home for structured, semi-structured, and unstructured data (no)
- Tracking the results of machine learning experiments
- Creating a dashboard that will alert business managers of important changes in daily sales revenue (yes)
- One of the key features delivered by the Databricks Lakehouse platform is business intelligence (BI) tool support. What describes this, as it relates to Databricks?
- Tools access Databricks through a driver (yes)
- Data is downloaded from Databricks into an intermediate format, then accessed by the tool
- Tools access Databricks through a proxy
- Tools that have integrated Databricks support access Databricks directly (no)
- What does the Databricks Lakehouse Platform provide to data teams?
- One centralized user interface so that data practitioners work in the same environment
- A completely open-source environment that guarantees seamless integration with the existing tools being used by an organization
- A toolset of compatible, open-source technologies to streamline collaboration among team members, depending on their role (yes)
- An environment called a Lakehouse, which combines elements from data lakes and EDSS systems
- What is true about an organization's data when they use Databricks?
- Data that is stored while it's being processed by Databricks is always encrypted (yes)
- Data is always stored in their cloud account
- Data is always stored in the Databricks cloud account
- Organizations must encrypt data to maximize security
- Delta Lake ensures data governance through Unity Catalog. What does this refer to?
- Enforces access control lists on clusters, notebooks, tables, and views (no)
- Enforces data constraints to satisfy business requirements (no)
- Enforces access control lists on tables and views (yes)
- Enforces access control lists on clusters and notebooks (no)
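The idea behind the correct answer — access control lists on tables and views — can be sketched in a few lines. All names below are hypothetical; Unity Catalog actually expresses this with SQL `GRANT` statements on securable objects:

```python
# Access-control sketch: a table-level ACL maps each securable to the
# groups allowed to read it, similar in spirit to Unity Catalog GRANTs
# on tables and views. Table and group names are made up.

ACL = {"sales.orders": {"analysts", "finance"}}

def can_select(user_groups, table):
    """True if any of the user's groups is granted SELECT on the table."""
    return bool(ACL.get(table, set()) & set(user_groups))

print(can_select(["analysts"], "sales.orders"))   # → True
print(can_select(["marketing"], "sales.orders"))  # → False
```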
- The Databricks Lakehouse Platform is built on top of some of the world's most successful open-source data projects. Which of the following open source projects were originally created by Databricks and come as managed versions in the Databricks Lakehouse Platform?
- Delta Lake, MLflow, Koalas (yes)
- Delta Lake, Docker, Cloudera
- MLflow, Redash, GitLab
- Docker, Redash, Cloudera
- Apache Spark, Alchemist, MLflow (no)
- Apache Spark, Delta Lake, Redash (yes)
- Apache Spark, Apache Airflow, Alchemist
- Apache Spark, Apache Airflow, Delta Lake (no)
- What describes how the Databricks Lakehouse Platform functions within an organization, at a high level?
- A variety of data types land in an organization's open data lake. Delta Lake augments that data lake by providing structure and governance to that data. That data is then used by data practitioners for their data use cases. (yes)
- Streaming data lands in multiple data warehouses. It is filtered through Delta Lake, which determines which data warehouse the data should be kept in. That data is then used by data practitioners for their data use cases.
- Real-time data is captured in Delta Lake and ultimately lands in an organization’s data warehouse. That data is then used by data practitioners for their data use cases.
- Structured, semi-structured, and unstructured data are captured in Delta Lake. The data is then passed to a data lake, where it is given structure and munged for quality. That data is then used by data practitioners for their data use cases.
- The Lakehouse was created by combining the most useful elements of which data management strategies?
- Data lakes and data warehouses (yes)
- Data lakes and network databases
- Data warehouses and EDSS systems
- EDSS and OLAP systems
- Which statement is true about queries run from Databricks SQL?
- They automatically connect to business intelligence tools without the need for additional configuration. (no)
- They pass through Delta Lake to ensure that data being retrieved is relevant for use-cases being studied
- They connect directly to an organization’s Delta Lake without using drivers
- They are based on two concepts known as experiments and runs
- What does Databricks SQL allow data practitioners to do?
- Use SQL commands to perform ad-hoc and exploratory data analysis on an organization's data lake (yes)
- Build a centralized repository and registry for features, or input variables, for machine learning models
- Explore and develop features to include in machine learning model development
- View all queries and dashboards created within an organization
- One of the key features delivered by the Databricks Lakehouse platform is support for a wide variety of data. What types of data are supported by Databricks?
- Unstructured, semi-structured, structured (yes)
- Semi-structured and structured
- Unstructured and structured
- Unstructured and semi-structured
- One of the key features delivered by the Databricks Lakehouse platform is that data storage is decoupled from compute. What is a direct benefit of decoupling storage from compute?
- The cost of storage and compute are managed separately and can be scaled to accommodate more concurrent users or larger datasets independently (yes)
- Compute costs are decreased since less overhead is expended managing storage
- Accessing storage is more performant since it can be distributed more freely
- Storage costs are decreased since less overhead is expended storing information related to the compute environment
- What defines Delta Lake?
- It is a fine-grained, centralized security model for data lakes across clouds that enables users to share, audit, and manage structured and unstructured data
- It is an open protocol for secure real-time exchange of large datasets, which enables secure data sharing across products for the first time
- It is the storage format that data is put into that integrates with all data processing engines used within an organization (yes)
- It helps data engineering teams simplify ETL development and management with declarative pipeline development, automatic data testing, and deep visibility for monitoring and recovery
- What does Databricks Data Science and Engineering Workspace allow data practitioners to do?
- Use MLflow to explore and develop features to include in machine learning model development
- Use a specialized query interface to view all queries and dashboards created within an organization
- Integrate Databricks notebooks into a CI/CD workflow with Databricks Repos and Databricks Workflows (yes)
- Build a centralized repository and registry for features, or input variables, for machine learning models