
Exam AZ-305 topic 2 question 22 discussion

Actual exam question from Microsoft's AZ-305
Question #: 22
Topic #: 2

HOTSPOT

You are designing a data analytics solution that will use Azure Synapse and Azure Data Lake Storage Gen2.

You need to recommend Azure Synapse pools to meet the following requirements:

• Ingest data from Data Lake Storage into hash-distributed tables.
• Implement, query, and update data in Delta Lake.

What should you recommend for each requirement? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.
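
For context, ingesting into hash-distributed tables is a dedicated SQL pool feature. A minimal T-SQL sketch of what that looks like — the table, columns, and storage path are hypothetical, not from the question:

```sql
-- Dedicated SQL pool: define a hash-distributed fact table
CREATE TABLE dbo.FactSales
(
    SaleId    BIGINT NOT NULL,
    ProductId INT NOT NULL,
    Amount    DECIMAL(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(SaleId),    -- rows assigned to distributions by hashing SaleId
    CLUSTERED COLUMNSTORE INDEX     -- typical storage choice for large fact tables
);

-- Ingest Parquet files from Data Lake Storage Gen2 with COPY INTO
COPY INTO dbo.FactSales
FROM 'https://mydatalake.dfs.core.windows.net/raw/sales/*.parquet'
WITH (FILE_TYPE = 'PARQUET');
```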

Suggested Answer:

Comments

saiyandjinn
Highly Voted 1 year, 10 months ago
The second requirement is confusing, and I am not sure what the answer is. You can query Delta Lake with a serverless SQL pool, but you won't be able to update it - only Apache Spark pools support updates to Delta Lake files. Spark can also be used to query it, if I understand the docs correctly, so I think the answer to 2 is an Apache Spark pool on that basis.
upvoted 35 times
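
For reference, a serverless SQL pool can read Delta Lake via OPENROWSET, but it has no update path. A sketch assuming a hypothetical storage URL:

```sql
-- Serverless SQL pool: Delta Lake is queryable (read-only)
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/delta/sales/',
    FORMAT = 'DELTA'
) AS sales;
-- There is no corresponding UPDATE/DELETE/MERGE support in serverless SQL pools.
```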
Fidel_104
9 months ago
The question mentions 'Data Lake Storage', not Delta Lake - there is no explicit indication that the data is stored in a delta lake format. Therefore I don't think that the Spark pool is needed. Nevertheless, Delta Lake is indeed a very confusing name for what is essentially a data format ("optimized storage layer").
upvoted 2 times
Fidel_104
9 months ago
Ah, I take it back - Delta Lake is also mentioned later. Sorry for the confusion.
upvoted 1 times
WeepingMaplte
8 months, 2 weeks ago
Apache Spark pools in Azure Synapse enable data engineers to modify Delta Lake files. Taken from: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-delta-lake-format
upvoted 1 times
RandomNickname
1 year, 10 months ago
Agree. From what I can find, SQL pools can't update Delta Lake files; only Apache Spark can do that, assuming the article below is accurate: https://www.jamesserra.com/archive/2022/03/azure-synapse-and-delta-lake/#:~:text=Serverless%20SQL%20pools%20do%20not%20support%20updating%20delta,in%20Azure%20Synapse%20Analytics%20to%20update%20Delta%20Lake.
upvoted 2 times
Liveroso
Highly Voted 1 year, 10 months ago
The answer is correct. Azure Synapse Analytics (formerly SQL Data Warehouse) is a cloud-based analytics service that allows you to analyze large amounts of data using a combination of on-demand and provisioned resources. It offers several options for working with data:
- Dedicated SQL pool: best for big and complex tasks.
- Serverless Apache Spark pool: best for big data analysis and machine learning tasks using Spark SQL and Spark DataFrames.
- Serverless SQL pool: automatically adjusts the amount of resources you use based on your needs, and you only pay for what you use. Best for small to medium-sized tasks and tasks that change often.
upvoted 22 times
sawanti
1 year, 3 months ago
How can you spend so much time giving explained answers and still get them wrong? The first answer is correct; the second one is Apache Spark pool. Serverless SQL pool doesn't provide updates: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-delta-lake-format. Do you see any information about updates there? Updates are possible in Apache Spark: https://docs.delta.io/latest/delta-update.html. Btw, what does "Apache Spark is best for big data analysis and ML tasks" have in common with Delta Lake updates? Are you copying the answers from ChatGPT? I have worked with Databricks for 2 years, and Apache Spark is the right answer. Apache Spark can also be used for small scenarios, as it's not that expensive, and it is often used by data engineers, not just big data engineers.
upvoted 28 times
sawanti
1 year, 3 months ago
Last note - hash-distributed tables are used for VERY LARGE fact tables. Per the documentation (https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute): consider using a hash-distributed table when the table size on disk is more than 2 GB.
upvoted 10 times
SeMo0o0o0o
Most Recent 3 weeks, 1 day ago
WRONG.
1. A dedicated SQL pool
2. A serverless Apache Spark pool
upvoted 1 times
c_h_r_i_s_
1 month, 3 weeks ago
1. Ingest data from Data Lake Storage into hash-distributed tables: A dedicated SQL pool.
Explanation: Hash-distributed tables are a feature of dedicated SQL pools in Azure Synapse. They allow for efficient data distribution and parallel processing, which is ideal for large-scale data ingestion from Data Lake Storage.
2. Implement, query, and update data in Delta Lake: A serverless Apache Spark pool.
Explanation: Serverless Apache Spark pools in Azure Synapse support Delta Lake, providing full read and write capabilities. They allow you to implement, query, and update Delta Lake tables effectively.
upvoted 2 times
Teerawee
2 months, 3 weeks ago
1. Dedicated SQL pool
2. Serverless Apache Spark pool
upvoted 1 times
Len83
3 months, 3 weeks ago
This question appeared in the exam in August 2024. I gave this same answer for box 1 but answered Apache Spark pool for box 2. I scored 870.
upvoted 2 times
Gaz_
5 months, 1 week ago
From Copilot: To meet the requirement of ingesting data from Data Lake Storage into hash-distributed tables, you should recommend a dedicated SQL pool. This option is designed for large-scale, high-performance, secure analytics on Azure. For implementing, querying, and updating data in Delta Lake, you should recommend a serverless Apache Spark pool, which lets you run big data analytics and AI workloads with Apache Spark and is compatible with Delta Lake. These recommendations align with Azure's best practices for performance and scalability when working with Synapse and Data Lake Storage Gen2.
upvoted 1 times
23169fd
5 months, 2 weeks ago
Ingest data from Data Lake Storage into hash-distributed tables: A dedicated SQL pool.
Implement, query, and update data in Delta Lake: A serverless Apache Spark pool.
upvoted 2 times
23169fd
5 months, 2 weeks ago
Requirement 1: Ingest data from Data Lake Storage into hash-distributed tables.
A dedicated SQL pool: this pool is specifically designed for high-performance data warehousing. It allows for the ingestion of large datasets into hash-distributed tables, optimizing performance and scalability. Hash distribution is a key feature of dedicated SQL pools that enhances query performance for large datasets.
Requirement 2: Implement, query, and update data in Delta Lake.
A serverless Apache Spark pool: Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark. It is optimized for big data workloads and is best utilized with Apache Spark pools. The serverless Apache Spark pool in Azure Synapse provides a managed Spark environment, ideal for querying, updating, and managing large datasets in Delta Lake.
upvoted 2 times
Lazylinux
7 months ago
Box 1 is correct: a hash-distributed table distributes table rows across the Compute nodes by using a deterministic hash function to assign each row to one distribution. Since identical values always hash to the same distribution, SQL Analytics has built-in knowledge of the row locations. In a dedicated SQL pool, this knowledge is used to minimize data movement during queries, which improves query performance. https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute
Box 2 should be an Apache Spark pool: Apache Spark pools in Azure Synapse enable data engineers to modify Delta Lake files. https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-delta-lake-format
upvoted 1 times
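
To illustrate the Box 2 point, an Apache Spark pool can modify a Delta Lake table in place via Spark SQL. A sketch assuming a hypothetical abfss path and columns:

```sql
-- Apache Spark pool (Spark SQL): Delta Lake supports ACID updates in place
UPDATE delta.`abfss://data@mydatalake.dfs.core.windows.net/delta/sales`
SET Amount = 0
WHERE SaleId = 42;

-- Deletes (and merges) work the same way on Delta tables
DELETE FROM delta.`abfss://data@mydatalake.dfs.core.windows.net/delta/sales`
WHERE ProductId IS NULL;
```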
Chenn
7 months ago
Ingest data from Data Lake Storage into hash-distributed tables: for this requirement, I recommend a dedicated SQL pool in Azure Synapse. This service is designed for large-scale data processing and supports creating hash-distributed tables to optimize query performance.
Implement, query, and update data in Delta Lake: for this requirement, I recommend a serverless Apache Spark pool in Azure Synapse. This service provides capabilities for working with Delta Lake, handling big data processing tasks without the need to provision or manage clusters.
upvoted 1 times
RanOlfati
8 months, 2 weeks ago
Dedicated SQL pools: provide massive parallel processing (MPP) capabilities ideal for handling large volumes of data. They are optimized for complex queries over large datasets and are suitable for building enterprise-level, big data analytics solutions.
Spark pools: provide a fully managed Apache Spark environment in Azure Synapse. They are designed to handle big data processing, analytics, and machine learning tasks, and can process data in various formats from multiple sources, including Azure Data Lake Storage.
upvoted 2 times
ahmedkmj
8 months, 3 weeks ago
From ChatGPT: for implementing, querying, and updating data in Delta Lake, the most suitable option among the ones listed is a serverless Apache Spark pool. Here's why: Apache Spark is tightly integrated with Delta Lake, offering native support for reading from and writing to Delta tables. This integration ensures seamless compatibility and efficient data processing.
upvoted 1 times
Paul_white
1 year ago
OPTION 2: SERVERLESS APACHE SPARK POOL
upvoted 3 times
Exams_Prep_2021
1 year, 2 months ago
Got this on Sept. 29, 2023
upvoted 5 times
Forex19
1 year, 2 months ago
I had this question on 24 Sep 2023.
upvoted 5 times
salman_23_c4
1 year, 2 months ago
Serverless SQL pools don't support updating Delta Lake files. You can use serverless SQL pool to query the latest version of Delta Lake. Use Apache Spark pools in Synapse Analytics to update Delta Lake. https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/resources-self-help-sql-on-demand?tabs=x80070002#delta-lake
upvoted 4 times
calotta1
1 year, 3 months ago
From MSFT docs: Serverless SQL pools don't support updating Delta Lake files. You can use serverless SQL pool to query the latest version of Delta Lake. Use Apache Spark pools in Synapse Analytics to update Delta Lake.
upvoted 6 times