exam questions

Exam DP-200 All Questions

View all questions & answers for the DP-200 exam

Exam DP-200 topic 2 question 16 discussion

Actual exam question from Microsoft's DP-200
Question #: 16
Topic #: 2
[All DP-200 Questions]

HOTSPOT -
A company is deploying a service-based data environment. You are developing a solution to process this data.
The solution must meet the following requirements:
✑ Use an Azure HDInsight cluster for data ingestion from a relational database in a different cloud service
✑ Use an Azure Data Lake Storage account to store processed data
✑ Allow users to download processed data
You need to recommend technologies for the solution.
Which technologies should you use? To answer, select the appropriate options in the answer area.
Hot Area:

Show Suggested Answer Hide Answer
Suggested Answer:
Box 1: Apache Sqoop -
Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
Azure HDInsight is a cloud distribution of the Hadoop components from the Hortonworks Data Platform (HDP).
Incorrect Answers:
DistCp (distributed copy) is a tool used for large inter/intra-cluster copying. It uses MapReduce to effect its distribution, error handling and recovery, and reporting.
It expands a list of files and directories into input to map tasks, each of which will copy a partition of the files specified in the source list. Its MapReduce pedigree has endowed it with some quirks in both its semantics and execution.
RevoScaleR is a collection of proprietary functions in Machine Learning Server used for practicing data science at scale. For data scientists, RevoScaleR gives you data-related functions for import, transformation and manipulation, summarization, visualization, and analysis.

Box 2: Apache Kafka -
Apache Kafka is a distributed streaming platform.
A streaming platform has three key capabilities:
Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system.
Store streams of records in a fault-tolerant durable way.
Process streams of records as they occur.
Kafka is generally used for two broad classes of applications:
Building real-time streaming data pipelines that reliably get data between systems or applications
Building real-time streaming applications that transform or react to the streams of data

Box 3: Ambari Hive View -
You can run Hive queries by using Apache Ambari Hive View. The Hive View allows you to author, optimize, and run Hive queries from your web browser.
References:
https://sqoop.apache.org/
https://kafka.apache.org/intro
https://docs.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-hive-ambari-view

Comments

Chosen Answer:
This is a voting comment (?). It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
unidigm
3 years, 11 months ago
Apache Sqoop, Apache Hive, Ambari Hive View
upvoted 4 times
...
Kratik
3 years, 11 months ago
For Process, I think it should be hive For Download, the answer seems correct. But instead of 'Ambari Hive View', I think it should be 'Apache Hive View'
upvoted 1 times
...
For Process it should be 'Hive' as it provide full storage mechanism
upvoted 1 times
...
tucho
4 years ago
can't be "hive" for process task?
upvoted 2 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted
A voting comment increases the vote count for the chosen answer by one.

Upvoting a comment with a selected answer will also increase the vote count towards that answer by one. So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.

SaveCancel
Loading ...
exam
Someone Bought Contributor Access for:
SY0-701
London, 1 minute ago