
Exam DP-203 topic 1 question 40 discussion

Actual exam question from Microsoft's DP-203
Question #: 40
Topic #: 1

You are implementing a batch dataset in the Parquet format.
Data files will be produced by using Azure Data Factory and stored in Azure Data Lake Storage Gen2. The files will be consumed by an Azure Synapse Analytics serverless SQL pool.
You need to minimize storage costs for the solution.
What should you do?

  • A. Use Snappy compression for the files.
  • B. Use OPENROWSET to query the Parquet files.
  • C. Create an external table that contains a subset of columns from the Parquet files.
  • D. Store all data as strings in the Parquet files.
Suggested Answer: A

Comments

m2shines
Highly Voted 3 years, 2 months ago
Answer should be A, because this talks about minimizing storage costs, not querying costs
upvoted 75 times
Homer23
11 months, 2 weeks ago
I found this comparison of compression methods, which explained that A should not be the answer. https://www.linkedin.com/pulse/comparison-compression-methods-parquet-file-format-saurav-mohapatra/ "BROTLI: This is a relatively new codec which offers a very high compression ratio, but with lower compression and decompression speeds. This codec is useful when storage space is a major constraint. This technique also offers parallel processing that other methods don't."
upvoted 2 times
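For anyone who wants to check that trade-off themselves, here is a minimal sketch using pyarrow; the sample table and file names are made up, and exact ratios depend entirely on your data. Brotli typically produces smaller files than Snappy, at the cost of slower writes and reads.

```python
# Compare Parquet codec output sizes with pyarrow.
# The sample data is synthetic; ratios vary by dataset.
import os

import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq

rng = np.random.default_rng(0)
table = pa.table({
    "id": np.arange(1_000_000),
    "value": rng.normal(size=1_000_000),
    "category": rng.choice(["red", "green", "blue"], size=1_000_000),
})

for codec in ["none", "snappy", "brotli"]:
    path = f"sample_{codec}.parquet"
    pq.write_table(table, path, compression=codec)
    print(f"{codec:>6}: {os.path.getsize(path):,} bytes")
```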
assU2
3 years, 1 month ago
Isn't Snappy the default compressionCodec for Parquet in Azure? https://docs.microsoft.com/en-us/azure/data-factory/format-parquet
upvoted 24 times
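It is: the ADF Parquet format docs list snappy as the default compressionCodec, and pyarrow defaults to Snappy as well. If you want to confirm which codec an existing file actually uses, the Parquet metadata records it per column chunk. A small sketch, where "sample.parquet" is a placeholder for one of your files:

```python
import pyarrow.parquet as pq

# "sample.parquet" is a placeholder; point it at a real file
# downloaded from ADLS. Each column chunk records its own codec.
meta = pq.ParquetFile("sample.parquet").metadata
for i in range(meta.num_columns):
    col = meta.row_group(0).column(i)
    print(col.path_in_schema, "->", col.compression)
```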
jongert
1 year, 2 months ago
Very confused at first; after thinking about it and rereading, this is what I found: the question says we are implementing the batch dataset in Parquet format, so we should think about a situation where we write the file and explicitly specify Snappy compression as an argument. The phrasing is very confusing, I have to say, but if you argue from a 'query externally' perspective, then B and C would yield the same benefit. Therefore, A makes the most sense and connects best with the question.
upvoted 2 times
Aslam208
Highly Voted 3 years, 2 months ago
C is the correct answer, as an external table that contains a subset of columns from the Parquet files would be cost-effective.
upvoted 23 times
Massy
2 years, 9 months ago
In serverless SQL pool you don't create a copy of the data, so how could it be cost-effective?
upvoted 2 times
Bro111
2 years, 2 months ago
Don't forget that transaction costs are part of storage costs, so taking a subset of columns will lower transaction costs and consequently storage costs.
upvoted 1 times
RehanRajput
2 years, 9 months ago
This is not correct. 1. External tables are not saved in the database (this is why they're external). 2. You're assuming that serverless SQL pools have local storage. They don't: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-serverless-sql-pool
upvoted 5 times
Aditya0891
2 years, 8 months ago
Well, there is a possibility to create an external table and load only the required columns using OPENROWSET in serverless SQL pool to a different container in ADLS. Remember, serverless SQL pool does support CETAS with OPENROWSET, but dedicated pool doesn't support loading data using OPENROWSET. So basically the solution could be: load the required columns with CETAS using OPENROWSET to a different container, then delete the source data from the previous container after loading the filtered data.
upvoted 2 times
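The CETAS statement itself is T-SQL, but the storage effect described here, rewriting only the needed columns to a second location, can be illustrated locally with pyarrow. The paths and column names below are hypothetical:

```python
import os

import pyarrow.parquet as pq

# Read only the columns the consumers need and rewrite them,
# mimicking what CETAS with a column list does server-side.
# Paths and column names are placeholders.
subset = pq.read_table("source/all_columns.parquet", columns=["id", "value"])
pq.write_table(subset, "curated/subset.parquet", compression="snappy")

print("source:", os.path.getsize("source/all_columns.parquet"), "bytes")
print("subset:", os.path.getsize("curated/subset.parquet"), "bytes")
```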
Aditya0891
2 years, 8 months ago
Check this: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-cetas. Answer C is correct.
upvoted 4 times
IMadnan
Most Recent 1 week ago
Selected Answer: A
Parquet is a columnar storage format whose encodings already shrink the data; choosing Snappy as the compression codec reduces the storage footprint of the files in Azure Data Lake Storage Gen2 further. Snappy is a well-suited codec for analytical workloads, offering a good balance between compression ratio and decompression speed, which matters for the Azure Synapse Analytics serverless SQL pool to query the data efficiently. Options B, C, and D do not directly address minimizing storage costs for the Parquet files themselves: Option B is about query access, Option C is about query efficiency rather than storage, and Option D is counterproductive to storage cost minimization.
upvoted 1 times
moize
2 months, 3 weeks ago
Selected Answer: A
A. Use Snappy compression for the files. This approach reduces the size of the Parquet files, which minimizes storage costs in Azure Data Lake Storage Gen2 while remaining compatible with Azure Synapse Analytics.
upvoted 1 times
EmnCours
2 months, 3 weeks ago
Selected Answer: A
https://learn.microsoft.com/en-us/azure/data-factory/supported-file-formats-and-compression-codecs-legacy
upvoted 1 times
a85becd
5 months, 3 weeks ago
Selected Answer: A
Using Snappy compression (Option A) is specifically designed to reduce the size of Parquet files, thereby directly minimizing storage costs.
upvoted 1 times
Danweo
7 months, 3 weeks ago
Selected Answer: C
The question is confusing, but I believe it is C, because you can use CETAS to store this external table in Gen2 (this is the storage solution); from there you will query it using the serverless SQL pool.
upvoted 1 times
Dusica
9 months, 4 weeks ago
A, B and C are all acceptable; D is just stupid. But pay attention to "You need to minimize storage costs for the solution": that means Snappy Parquet compression. A is correct.
upvoted 2 times
dgerok
10 months, 1 week ago
Selected Answer: A
"Use Snappy compression for the files" is the only answer that is about minimizing the cost of storage. While one is using a serverless SQL pool, external tables are available, but they are only metadata...
upvoted 3 times
Elanche
10 months, 3 weeks ago
Using Snappy compression for the Parquet files helps minimize storage costs while still maintaining good compression efficiency. Snappy is a compression library that offers a good balance between compression ratio and processing speed. By compressing the data using Snappy, you can significantly reduce the amount of storage required for your dataset.

Option B, using OPENROWSET to query the Parquet files, doesn't directly impact storage costs. It's a method for querying data but doesn't address storage optimization.

Option C, creating an external table with a subset of columns, may help reduce query costs by minimizing the amount of data that needs to be processed during queries. However, it doesn't directly address storage costs.

Option D, storing all data as strings in the Parquet files, would likely increase storage costs rather than minimize them. Storing data as strings without appropriate compression would result in larger file sizes compared to using efficient compression algorithms like Snappy.
upvoted 6 times
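The point about option D is easy to check empirically: writing the same values once with a native type and once cast to strings (a synthetic sketch; file names are made up) typically shows the string version costing more bytes even after Snappy compression.

```python
import os

import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq

values = np.arange(1_000_000, dtype=np.int64)

# Same values, stored once as int64 and once as strings.
typed = pa.table({"v": values})
strings = pa.table({"v": values.astype(str)})

pq.write_table(typed, "typed.parquet", compression="snappy")
pq.write_table(strings, "strings.parquet", compression="snappy")

print("int64 :", os.path.getsize("typed.parquet"), "bytes")
print("string:", os.path.getsize("strings.parquet"), "bytes")
```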
ankeshpatel2112
10 months, 4 weeks ago
A. Use Snappy compression for the files.
upvoted 2 times
Zen9nez
12 months ago
The answer is C. Parquet has default SNAPPY compression, which cannot be overwritten, so why would I apply SNAPPY again?
upvoted 3 times
s_unsworth
12 months ago
Selected Answer: A
Further information is required for this question; there isn't enough to go on as to what is being asked. The initial question is about storage, which points to the Snappy compression answer. If it is asking about querying the data, then that should be clearly defined in the question. If someone were to create a user story for this (as a manager, I want to store data in the data lake at a reduced cost), you wouldn't be providing them with an external table; you would give them information on storage.
upvoted 2 times
Joanna0
1 year, 1 month ago
Selected Answer: A
Snappy compression can reduce the size of Parquet files by up to 70%. This can save you a significant amount of money on storage costs.
upvoted 1 times
[Removed]
1 year, 5 months ago
Selected Answer: A
Snappy
upvoted 2 times
kkk5566
1 year, 5 months ago
Selected Answer: A
using compression
upvoted 2 times
kkk5566
1 year, 6 months ago
To minimize storage costs for the solution, you should use Snappy compression for the files. Snappy is a fast and efficient data compression and decompression library that can be used to compress Parquet files. This will help reduce the size of the data files and minimize storage costs in Azure Data Lake Storage Gen2. So, the correct answer is A: use Snappy compression for the files.
upvoted 1 times
Community vote distribution: A (35%), C (25%), B (20%), Other