Exam DP-200 topic 1 question 19 discussion

Actual exam question from Microsoft's DP-200
Question #: 19
Topic #: 1

HOTSPOT -
You are developing a solution using a Lambda architecture on Microsoft Azure.
The data at rest layer must meet the following requirements:
Data storage:
✑ Serve as a repository for high volumes of large files in various formats.
✑ Implement optimized storage for big data analytics workloads.
✑ Ensure that data can be organized using a hierarchical structure.
Batch processing:
✑ Use a managed solution for in-memory computation processing.
✑ Natively support Scala, Python, and R programming languages.
✑ Provide the ability to resize and terminate the cluster automatically.
Analytical data store:
✑ Support parallel processing.
✑ Use columnar storage.
✑ Support SQL-based languages.
You need to identify the correct technologies to build the Lambda architecture.
Which technologies should you use? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:

Suggested Answer:
Data storage: Azure Data Lake Store
A key mechanism that allows Azure Data Lake Storage Gen2 to provide file system performance at object storage scale and prices is the addition of a hierarchical namespace. This allows the collection of objects/files within an account to be organized into a hierarchy of directories and nested subdirectories in the same way that the file system on your computer is organized. With the hierarchical namespace enabled, a storage account becomes capable of providing the scalability and cost-effectiveness of object storage, with file system semantics that are familiar to analytics engines and frameworks.
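The directory hierarchy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the account name `examplestore`, the container name `datalake`, and the file path are hypothetical, and the helper functions are not part of any Azure SDK. The only thing taken from Azure itself is the `abfss://<container>@<account>.dfs.core.windows.net/<path>` URI shape used to address Data Lake Storage Gen2.

```python
from pathlib import PurePosixPath
from typing import List


def abfss_uri(account: str, container: str, path: str) -> str:
    """Build an abfss:// URI for a path inside an ADLS Gen2 container."""
    clean = PurePosixPath("/") / path.lstrip("/")
    return f"abfss://{container}@{account}.dfs.core.windows.net{clean}"


def parent_directories(path: str) -> List[str]:
    """Return every ancestor directory of a path, outermost first."""
    clean = PurePosixPath("/") / path.lstrip("/")
    # PurePosixPath.parents runs innermost to outermost and ends at "/".
    parents = [str(p) for p in clean.parents if str(p) != "/"]
    return list(reversed(parents))


if __name__ == "__main__":
    # Hypothetical account, container, and file names.
    file_path = "raw/sales/2020/01/orders.parquet"
    print(abfss_uri("examplestore", "datalake", file_path))
    for directory in parent_directories(file_path):
        print(directory)
```

With a hierarchical namespace, each of the printed ancestors is a real directory rather than just a prefix shared by object names, which is what the "hierarchical structure" requirement is asking for.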
Batch processing: HDInsight Spark
Apache Spark is an open-source, parallel-processing framework that supports in-memory processing to boost the performance of big-data analysis applications.
HDInsight is a managed Hadoop service. Use it to deploy and manage Hadoop clusters in Azure. For batch processing, you can use Spark, Hive, Hive LLAP, or MapReduce.
Languages: R, Python, Java, Scala, SQL
Analytical data store: Azure Synapse Analytics
Azure Synapse Analytics is a cloud-based enterprise data warehouse (EDW) that uses massively parallel processing (MPP).
Azure Synapse Analytics stores data in relational tables with columnar storage.
Note: As of November 2019, Azure SQL Data Warehouse is now Azure Synapse Analytics.
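As a rough illustration of how the three analytical-store requirements (parallel processing, columnar storage, SQL) show up in practice, the sketch below generates a T-SQL `CREATE TABLE` statement of the kind used in a Synapse dedicated SQL pool, with a hash distribution and a clustered columnstore index. The table name, column names, and the helper `columnstore_table_ddl` are hypothetical and exist only for this example; the statement is printed, not executed against any service.

```python
from typing import Dict


def columnstore_table_ddl(table: str, columns: Dict[str, str], hash_column: str) -> str:
    """Return T-SQL for a hash-distributed, columnstore-backed table."""
    if hash_column not in columns:
        raise ValueError(f"{hash_column!r} is not one of the table columns")
    column_lines = ",\n".join(
        f"    {name} {sql_type}" for name, sql_type in columns.items()
    )
    return (
        f"CREATE TABLE {table}\n(\n{column_lines}\n)\n"
        "WITH\n(\n"
        f"    DISTRIBUTION = HASH({hash_column}),\n"  # rows spread across MPP distributions
        "    CLUSTERED COLUMNSTORE INDEX\n"           # columnar storage
        ");"
    )


if __name__ == "__main__":
    # Hypothetical fact table used only for illustration.
    ddl = columnstore_table_ddl(
        "dbo.FactSales",
        {"SaleId": "BIGINT NOT NULL", "CustomerId": "INT NOT NULL", "Amount": "DECIMAL(18, 2)"},
        hash_column="CustomerId",
    )
    print(ddl)
```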
References:
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-namespace
https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/batch-processing
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-overview-what-is

Comments

gallego82
Highly Voted 4 years ago
I think that in batch processing the answer should be Azure Databricks, given the link provided and its capabilities: Azure Databricks is an Apache Spark-based analytics platform. You can think of it as "Spark as a service." It's the easiest way to use Spark on the Azure platform. Languages: R, Python, Java, Scala, Spark SQL. Fast cluster start times, autotermination, autoscaling. Manages the Spark cluster for you. Built-in integration with Azure Blob Storage, Azure Data Lake Storage (ADLS), Azure Synapse, and other services. User authentication with Azure Active Directory. Web-based notebooks for collaboration and data exploration. Supports GPU-enabled clusters.
upvoted 38 times
...
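The "resize and terminate the cluster automatically" requirement discussed in this thread maps onto two settings in a Databricks cluster definition. The sketch below builds such a definition as a plain dictionary and prints it as JSON. The field names (`autoscale`, `min_workers`, `max_workers`, `autotermination_minutes`) follow the Databricks Clusters API as far as I know it, but treat them as an assumption and check the current API reference; the cluster name, runtime version, node type, and the helper `cluster_spec` are hypothetical placeholders, and nothing here calls a real service.

```python
import json
from typing import Any, Dict


def cluster_spec(
    name: str,
    spark_version: str,
    node_type_id: str,
    min_workers: int,
    max_workers: int,
    idle_minutes: int,
) -> Dict[str, Any]:
    """Build a cluster definition with autoscaling and auto-termination."""
    if min_workers < 0 or min_workers > max_workers:
        raise ValueError("require 0 <= min_workers <= max_workers")
    if idle_minutes <= 0:
        raise ValueError("idle_minutes must be positive to enable auto-termination")
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # Resize automatically between the two bounds.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Shut the cluster down after this many idle minutes.
        "autotermination_minutes": idle_minutes,
    }


if __name__ == "__main__":
    # All values below are placeholders chosen for illustration.
    spec = cluster_spec(
        name="batch-layer",
        spark_version="<runtime-version>",
        node_type_id="<node-type>",
        min_workers=2,
        max_workers=8,
        idle_minutes=30,
    )
    print(json.dumps(spec, indent=2))
```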
Pairon
Highly Voted 4 years ago
Agree with the comments above. Databricks lets you autoscale and autoterminate your cluster, and it also provides in-memory processing because of the underlying Spark engine.
upvoted 6 times
...
Palp
Most Recent 3 years, 10 months ago
Batch processing is Spark, as it provides in-memory operations.
upvoted 1 times
...
AZ20
3 years, 10 months ago
"terminate the cluster automatically" - I think this line makes Databricks the more suitable choice. The rest of the requirements suit both HDInsight and Databricks equally.
upvoted 3 times
...
MYR55
3 years, 11 months ago
ADLS
HDInsight Spark ( https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-autoscale-clusters ); the keyword here is in-memory processing
Azure Synapse Analytics
upvoted 1 times
eurekamike
3 years, 10 months ago
Databricks has in-memory processing too.
upvoted 1 times
...
...
maciejt
3 years, 11 months ago
Why not Cosmos DB for the analytical data store?
upvoted 1 times
...
NamishBansal
3 years, 11 months ago
For the third one, Synapse will work, but why won't Cosmos DB work?
upvoted 2 times
...
ssanka
4 years ago
I think the answer should be Cosmos DB for the third one. Azure Synapse doesn't support columnar storage, right?
upvoted 2 times
meswapnilspal
3 years, 12 months ago
It does: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-overview-what-is
upvoted 2 times
...
...
Wendy_DK
4 years ago
Batch processing should be Azure Databricks
upvoted 2 times
...
Manoel_Benicio
4 years ago
That's correct, so the answers would be: Azure Databricks, Azure Data Lake, and ASA (Azure Synapse).
upvoted 4 times
...
LG5
4 years ago
Batch processing should be Azure Databricks, right?
upvoted 5 times
eliabsbueno
4 years ago
Yes! HDInsight does not support autotermination natively
upvoted 2 times
...
...