Exam DP-203 topic 1 question 5 discussion

Actual exam question from Microsoft's DP-203
Question #: 5
Topic #: 1

HOTSPOT -
You are planning the deployment of Azure Data Lake Storage Gen2.
You have the following two reports that will access the data lake:
✑ Report1: Reads three columns from a file that contains 50 columns.
✑ Report2: Queries a single record based on a timestamp.
You need to recommend in which format to store the data in the data lake to support the reports. The solution must minimize read times.
What should you recommend for each report? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.

Suggested Answer:
Report1: CSV -
CSV: The destination writes records as delimited data.

Report2: AVRO -
AVRO supports timestamps.
Not Parquet, TSV: Not options for Azure Data Lake Storage Gen2.
Reference:
https://streamsets.com/documentation/datacollector/latest/help/datacollector/UserGuide/Destinations/ADLS-G2-D.html

Comments

alain2
Highly Voted 3 years, 11 months ago
1: Parquet - column-oriented binary file format 2: AVRO - Row based format, and has logical type timestamp https://youtu.be/UrWthx8T3UY
upvoted 224 times
technoguy
1 month ago
This is correct. Since we only have to read three columns, it's best to use column-based storage. AVRO for Report2 is fine because, per the AVRO format definition, it already has a timestamp type.
upvoted 2 times
...
azurestudent1498
3 years ago
this is correct.
upvoted 3 times
...
XiltroX
2 years, 5 months ago
Thanks for the video share, this really helps. Cheers.
upvoted 2 times
...
terajuana
3 years, 10 months ago
the web is full of old information. timestamp support has been added to parquet
upvoted 9 times
vlad888
3 years, 10 months ago
OK, but in the 1st case we need only 3 of 50 columns, and Parquet is a columnar format. In the 2nd case, Avro, because it is ideal for reading a full row.
upvoted 25 times
...
...
...
Himlo24
Highly Voted 3 years, 11 months ago
Shouldn't the answer for Report 1 be Parquet? Because Parquet format is Columnar and should be best for reading a few columns only.
upvoted 33 times
...
ngabonzic
Most Recent 1 month, 1 week ago
1.Parquet 2.AVRO
upvoted 1 times
...
krishna1303
3 months, 1 week ago
Report 1: Parquet (column-based file format optimized for reading specific columns). Report 2: Avro (row-based file format).
upvoted 1 times
...
ff5037f
6 months ago
the answer should be Parquet for report 1
upvoted 1 times
...
marcin1212
7 months ago
The goal is: "The solution must minimize read times." I ran a small test on Databricks plus Data Lake, saving the same file of 9 million records as Parquet and as Avro. Parquet was ~150 MB, Avro ~700 MB. Reading Parquet was always 10 times faster than Avro. I checked:
- all data, or a small range of data with a condition
- all columns, or only one column
So I will select:
- Report1: Parquet
- Report2: Parquet
upvoted 3 times
dev2dev
3 years, 3 months ago
How can a faster read be the same as the number of reads?
upvoted 1 times
...
...
ragz_87
7 months ago
1. Parquet 2. Avro https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-best-practices "Consider using the Avro file format in cases where your I/O patterns are more write heavy, or the query patterns favor retrieving multiple rows of records in their entirety. Consider Parquet and ORC file formats when the I/O patterns are more read heavy or when the query patterns are focused on a subset of columns in the records."
upvoted 9 times
SebK
3 years, 1 month ago
Thank you.
upvoted 1 times
...
...
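To make the "subset of columns" point from the quoted guidance concrete, here is a minimal toy sketch in plain Python. It is not real Parquet or Avro code: the lists `rows` and `columns`, the function names, and the "values scanned" counter are all assumptions made up for illustration. It only models the idea that a row layout walks whole records while a column layout touches just the requested columns.

```python
# Toy model only: the same table held row-wise and column-wise.
NUM_COLUMNS = 50
NUM_ROWS = 1_000

# Row-oriented layout: one list per record (roughly the shape of Avro or CSV data).
rows = [[f"r{r}c{c}" for c in range(NUM_COLUMNS)] for r in range(NUM_ROWS)]

# Column-oriented layout: one list per column (roughly the shape of Parquet or ORC data).
columns = [list(col) for col in zip(*rows)]


def read_from_row_layout(rows, wanted):
    """Walk every record in full, keeping only the wanted fields."""
    scanned = 0
    result = []
    for record in rows:
        scanned += len(record)  # the whole record is visited
        result.append([record[c] for c in wanted])
    return result, scanned


def read_from_column_layout(columns, wanted):
    """Visit only the wanted columns, then stitch them back into rows."""
    scanned = 0
    picked = []
    for c in wanted:
        scanned += len(columns[c])  # only this column is visited
        picked.append(columns[c])
    result = [list(values) for values in zip(*picked)]
    return result, scanned


wanted = [0, 7, 42]  # Report1: three columns out of 50
row_result, row_scanned = read_from_row_layout(rows, wanted)
col_result, col_scanned = read_from_column_layout(columns, wanted)

print(row_scanned, col_scanned)
```

Both functions return the same data; the difference is only in how many values each layout has to visit. Real formats add compression and encoding on top, so actual read times depend on much more than this count.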
akk_1289
7 months ago
To minimize read times for the two reports, it is recommended to store the data in the data lake in the parquet format. Parquet is a columnar storage format that is optimized for querying large datasets. It stores data in a compact and efficient manner, allowing for fast querying and filtering of data. In this case, Report1 needs to read only three columns from a file that contains 50 columns. Since parquet stores data in a columnar format, the query can skip reading the unnecessary columns and only read the required ones, which can greatly improve the read performance. Report2 needs to query a single record based on a timestamp. Parquet also supports efficient filtering and querying based on specific values, such as timestamps, making it a good choice for this report as well. Other formats, such as avro, csv, and tsv, may not provide the same level of performance for these types of queries. Therefore, it is recommended to use parquet to store the data in the data lake.
upvoted 9 times
...
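The claim above that Parquet can filter efficiently on a timestamp rests largely on per-row-group min/max statistics, which let a reader skip row groups that cannot contain the value. Below is a rough sketch of that idea in plain Python. The `RowGroup` class, `build_row_groups`, and `find_by_timestamp` are hypothetical names invented here, not part of any Parquet library, and the sketch assumes the data is sorted by timestamp, which is when skipping helps most.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical stand-in for a row group with min/max statistics."""
    min_ts: int
    max_ts: int
    rows: list  # list of (timestamp, payload) tuples


def build_row_groups(records, group_size):
    """Split records into fixed-size groups and record each group's min/max timestamp."""
    groups = []
    for start in range(0, len(records), group_size):
        chunk = records[start:start + group_size]
        timestamps = [ts for ts, _ in chunk]
        groups.append(RowGroup(min(timestamps), max(timestamps), chunk))
    return groups


def find_by_timestamp(groups, target):
    """Return (payload, number of groups actually read) for the target timestamp."""
    groups_read = 0
    for group in groups:
        if not (group.min_ts <= target <= group.max_ts):
            continue  # skipped using the statistics alone
        groups_read += 1
        for ts, payload in group.rows:
            if ts == target:
                return payload, groups_read
    return None, groups_read


# Assumption: records are sorted by timestamp, so group ranges do not overlap.
records = [(ts, f"event-{ts}") for ts in range(1000)]
groups = build_row_groups(records, group_size=100)
payload, groups_read = find_by_timestamp(groups, 742)

print(payload, groups_read, len(groups))
```

If the timestamps were scattered randomly across groups, most min/max ranges would overlap the target and far fewer groups could be skipped, so this benefit is not guaranteed.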
lisa710
7 months ago
Report 1: Parquet. Columnar storage: Parquet stores data in columns, allowing efficient reading of only the required columns (3 out of 50).
Report 2: Avro. Fast single-record access: optimized row-based formats excel at quickly accessing individual records based on a specific condition, such as a timestamp.
I don't understand why incorrect answers are being provided, causing confusion.
upvoted 1 times
...
ypan
7 months ago
Recommendations:
Report1: Use Parquet. Reason: Parquet is a columnar storage format that is optimized for reading specific columns. Since Report1 needs to read only three columns out of 50, Parquet allows reading just those columns efficiently without scanning the entire file.
Report2: Use Avro. Reason: Avro is a row-based storage format, which is efficient for retrieving entire rows based on a specific condition, such as a timestamp. It allows quick access to individual records, making it suitable for Report2's requirement.
upvoted 3 times
...
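The row-based argument for Report2 can be sketched the same way. This is a toy model in plain Python, not Avro or Parquet code; the layouts, function names, and the "separate reads" counter are assumptions for illustration. It ignores how the record is located in the first place and only counts how many separate places must be touched to return one complete record.

```python
# Toy model only: the same table held row-wise and column-wise.
NUM_COLUMNS = 50
NUM_ROWS = 1_000

rows = [[r * NUM_COLUMNS + c for c in range(NUM_COLUMNS)] for r in range(NUM_ROWS)]
columns = [list(col) for col in zip(*rows)]


def fetch_from_row_layout(rows, index):
    """The record is stored contiguously, so one read returns all fields."""
    return list(rows[index]), 1


def fetch_from_column_layout(columns, index):
    """Each field lives in a different column, so every column is touched once."""
    record = [column[index] for column in columns]
    return record, len(columns)


row_record, row_reads = fetch_from_row_layout(rows, 742)
col_record, col_reads = fetch_from_column_layout(columns, 742)

print(row_reads, col_reads)
```

Whether this difference outweighs the benefits of columnar compression and statistics in practice depends on file sizes and the query engine, which is why some commenters in this thread still pick Parquet for both reports.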
ItsPayakan
7 months, 1 week ago
Why cant it be Parquet for both reports? Parquet supports timestamp too
upvoted 1 times
...
207680a
9 months ago
1- Parquet 2- Avro. Parquet for column selection (stored in columnar format) and Avro for row selection (stored in row format).
upvoted 1 times
...
Vaibhav251999
1 year ago
1. Parquet 2. Avro
upvoted 2 times
...
Alongi
1 year, 1 month ago
Parquet Avro
upvoted 1 times
...
mykel71
1 year, 2 months ago
1. Parquet - Column oriented data store 2. AVRO - supports timestamps
upvoted 1 times
...
sdg2844
1 year, 3 months ago
Agree: 1: Parquet - ideal for columnar format 2: AVRO - row-based with logical timestamp
upvoted 2 times
...
matiandal
1 year, 6 months ago
--> TLDR <-- AVRO PARQUET ORC Anal. Queries v v Write Ops (ETL ops) v Nested Data v ACID Properties v Sch.Flexibility v
upvoted 1 times
...