
Exam AZ-305 topic 2 question 22 discussion

Actual exam question from Microsoft's AZ-305
Question #: 22
Topic #: 2

HOTSPOT

You are designing a data analytics solution that will use Azure Synapse and Azure Data Lake Storage Gen2.

You need to recommend Azure Synapse pools to meet the following requirements:

• Ingest data from Data Lake Storage into hash-distributed tables.
• Implement, query, and update data in Delta Lake.

What should you recommend for each requirement? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.
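
For context, ingesting into hash-distributed tables is a dedicated SQL pool feature. A minimal T-SQL sketch of what that looks like — the table, columns, and storage path are hypothetical, not from the question:

```sql
-- Dedicated SQL pool: define a hash-distributed fact table
CREATE TABLE dbo.FactSales
(
    SaleId    BIGINT NOT NULL,
    ProductId INT NOT NULL,
    Amount    DECIMAL(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(SaleId),    -- rows assigned to distributions by hashing SaleId
    CLUSTERED COLUMNSTORE INDEX     -- typical storage choice for large fact tables
);

-- Ingest Parquet files from Data Lake Storage Gen2 with COPY INTO
COPY INTO dbo.FactSales
FROM 'https://mydatalake.dfs.core.windows.net/raw/sales/*.parquet'
WITH (FILE_TYPE = 'PARQUET');
```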

Suggested Answer:

Comments

saiyandjinn
Highly Voted 1 year, 10 months ago
The second requirement is confusing, and I am not sure what the answer is. You can query Delta Lake with a serverless SQL pool, but you won't be able to update it - only Apache Spark pools support updates to Delta Lake files. Spark can also be used to query it, if I understand the docs correctly, so I think the answer to 2 is an Apache Spark pool on that basis.
upvoted 35 times
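
For reference, a serverless SQL pool can read Delta Lake via OPENROWSET, but it has no update path. A sketch assuming a hypothetical storage URL:

```sql
-- Serverless SQL pool: Delta Lake is queryable (read-only)
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/delta/sales/',
    FORMAT = 'DELTA'
) AS sales;
-- There is no corresponding UPDATE/DELETE/MERGE support in serverless SQL pools.
```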
Fidel_104
9 months ago
The question mentions 'Data Lake Storage', not Delta Lake - there is no explicit indication that the data is stored in a delta lake format. Therefore I don't think that the Spark pool is needed. Nevertheless, Delta Lake is indeed a very confusing name for what is essentially a data format ("optimized storage layer").
upvoted 2 times
Fidel_104
9 months ago
Ah, I take it back - Delta Lake is also mentioned later. Sorry for the confusion.
upvoted 1 times
WeepingMaplte
8 months, 2 weeks ago
Apache Spark pools in Azure Synapse enable data engineers to modify Delta Lake files. Taken from: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-delta-lake-format
upvoted 1 times
RandomNickname
1 year, 10 months ago
Agree. From what I can find, SQL pools can't update Delta Lake files; only Apache Spark can do that, assuming the article below is accurate: https://www.jamesserra.com/archive/2022/03/azure-synapse-and-delta-lake/#:~:text=Serverless%20SQL%20pools%20do%20not%20support%20updating%20delta,in%20Azure%20Synapse%20Analytics%20to%20update%20Delta%20Lake.
upvoted 2 times
Liveroso
Highly Voted 1 year, 10 months ago
The answer is correct. Azure Synapse Analytics (formerly SQL Data Warehouse) is a cloud-based analytics service that allows you to analyze large amounts of data using a combination of on-demand and provisioned resources. It offers several options for working with data:
- Dedicated SQL pool: best for big and complex tasks.
- Serverless Apache Spark pool: best for big data analysis and machine learning tasks using Spark SQL and Spark DataFrames.
- Serverless SQL pool: automatically adjusts the amount of resources you use based on your needs, and you only pay for what you use. Best for small to medium-sized tasks and tasks that change often.
upvoted 22 times
sawanti
1 year, 3 months ago
How can you spend so much time giving explained answers and still get them wrong? The first answer is correct; the second one is Apache Spark pool. Serverless SQL pool doesn't provide updates: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-delta-lake-format. Do you see any information about updates there? Updates are possible in Apache Spark: https://docs.delta.io/latest/delta-update.html. Btw, what does "Apache Spark is best for big data analysis and ML tasks" have in common with Delta Lake updates? Are you copying the answers from ChatGPT? I have worked with Databricks for 2 years, and Apache Spark is the right answer. Apache Spark can also be used for small scenarios, as it's not that expensive, and it is often used by data engineers, not just big data engineers.
upvoted 28 times
sawanti
1 year, 3 months ago
Last note - hash-distributed tables are used for VERY LARGE fact tables. Per the documentation (https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute): consider using a hash-distributed table when the table size on disk is more than 2 GB.
upvoted 10 times
SeMo0o0o0o
Most Recent 3 weeks, 1 day ago
WRONG.
1. A dedicated SQL pool
2. A serverless Apache Spark pool
upvoted 1 times
c_h_r_i_s_
1 month, 3 weeks ago
1. Ingest data from Data Lake Storage into hash-distributed tables: A dedicated SQL pool.
Explanation: Hash-distributed tables are a feature of dedicated SQL pools in Azure Synapse. They allow for efficient data distribution and parallel processing, which is ideal for large-scale data ingestion from Data Lake Storage.
2. Implement, query, and update data in Delta Lake: A serverless Apache Spark pool.
Explanation: Serverless Apache Spark pools in Azure Synapse support Delta Lake, providing full read and write capabilities. They allow you to implement, query, and update Delta Lake tables effectively.
upvoted 2 times
Teerawee
2 months, 3 weeks ago
1. Dedicated SQL pool
2. Serverless Apache Spark pool
upvoted 1 times
Len83
3 months, 3 weeks ago
This question appeared in the exam in August 2024. I gave this same answer for box 1 but answered Apache Spark pool for box 2. I scored 870.
upvoted 2 times
Gaz_
5 months, 1 week ago
From Copilot: To meet the requirement of ingesting data from Data Lake Storage into hash-distributed tables, you should recommend a dedicated SQL pool. This option is designed for large-scale, high-performance, secure analytics on Azure. For implementing, querying, and updating data in Delta Lake, you should recommend a serverless Apache Spark pool, which lets you run big data analytics and AI workloads with Apache Spark and is compatible with Delta Lake. These recommendations align with Azure's best practices for performance and scalability when working with Synapse and Data Lake Storage Gen2.
upvoted 1 times
23169fd
5 months, 2 weeks ago
Ingest data from Data Lake Storage into hash-distributed tables: A dedicated SQL pool.
Implement, query, and update data in Delta Lake: A serverless Apache Spark pool.
upvoted 2 times
23169fd
5 months, 2 weeks ago
Requirement 1: Ingest data from Data Lake Storage into hash-distributed tables.
A dedicated SQL pool: this pool is specifically designed for high-performance data warehousing. It allows for the ingestion of large datasets into hash-distributed tables, optimizing performance and scalability. Hash distribution is a key feature of dedicated SQL pools that enhances query performance for large datasets.
Requirement 2: Implement, query, and update data in Delta Lake.
A serverless Apache Spark pool: Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark. It is optimized for big data workloads and is best utilized with Apache Spark pools. The serverless Apache Spark pool in Azure Synapse provides a managed Spark environment, ideal for querying, updating, and managing large datasets in Delta Lake.
upvoted 2 times
Lazylinux
7 months ago
Box 1 is correct: a hash-distributed table distributes table rows across the Compute nodes by using a deterministic hash function to assign each row to one distribution. Since identical values always hash to the same distribution, SQL Analytics has built-in knowledge of the row locations. In a dedicated SQL pool, this knowledge is used to minimize data movement during queries, which improves query performance. https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute
Box 2 should be an Apache Spark pool: Apache Spark pools in Azure Synapse enable data engineers to modify Delta Lake files. https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-delta-lake-format
upvoted 1 times
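
To illustrate the Box 2 point, an Apache Spark pool can modify a Delta Lake table in place via Spark SQL. A sketch assuming a hypothetical abfss path and columns:

```sql
-- Apache Spark pool (Spark SQL): Delta Lake supports ACID updates in place
UPDATE delta.`abfss://data@mydatalake.dfs.core.windows.net/delta/sales`
SET Amount = 0
WHERE SaleId = 42;

-- Deletes (and merges) work the same way on Delta tables
DELETE FROM delta.`abfss://data@mydatalake.dfs.core.windows.net/delta/sales`
WHERE ProductId IS NULL;
```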
Chenn
7 months ago
Ingest data from Data Lake Storage into hash-distributed tables: for this requirement, I recommend a dedicated SQL pool in Azure Synapse. This service is designed for large-scale data processing and supports creating hash-distributed tables to optimize query performance.
Implement, query, and update data in Delta Lake: for this requirement, I recommend a serverless Apache Spark pool in Azure Synapse. This service provides capabilities for working with Delta Lake, handling big data processing tasks without the need to provision or manage clusters.
upvoted 1 times
RanOlfati
8 months, 2 weeks ago
Dedicated SQL pools: provide massive parallel processing (MPP) capabilities ideal for handling large volumes of data. They are optimized for complex queries over large datasets and are suitable for building enterprise-level, big data analytics solutions.
Spark pools: provide a fully managed Apache Spark environment in Azure Synapse. They are designed to handle big data processing, analytics, and machine learning tasks, and can process data in various formats from multiple sources, including Azure Data Lake Storage.
upvoted 2 times
ahmedkmj
8 months, 3 weeks ago
From ChatGPT: for implementing, querying, and updating data in Delta Lake, the most suitable option among the ones listed is a serverless Apache Spark pool. Here's why: Apache Spark is tightly integrated with Delta Lake, offering native support for reading from and writing to Delta tables. This integration ensures seamless compatibility and efficient data processing.
upvoted 1 times
Paul_white
1 year ago
OPTION 2: SERVERLESS APACHE SPARK POOL
upvoted 3 times
Exams_Prep_2021
1 year, 2 months ago
Got this on Sept. 29, 2023
upvoted 5 times
Forex19
1 year, 2 months ago
I had this question on 24 Sep 2023.
upvoted 5 times
salman_23_c4
1 year, 2 months ago
Serverless SQL pools don't support updating Delta Lake files. You can use serverless SQL pool to query the latest version of Delta Lake. Use Apache Spark pools in Synapse Analytics to update Delta Lake. https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/resources-self-help-sql-on-demand?tabs=x80070002#delta-lake
upvoted 4 times
calotta1
1 year, 3 months ago
From MSFT docs: Serverless SQL pools don't support updating Delta Lake files. You can use serverless SQL pool to query the latest version of Delta Lake. Use Apache Spark pools in Synapse Analytics to update Delta Lake.
upvoted 6 times