

Exam Professional Data Engineer topic 1 question 78 discussion

Actual exam question from Google's Professional Data Engineer
Question #: 78
Topic #: 1
[All Professional Data Engineer Questions]

You are responsible for writing your company's ETL pipelines to run on an Apache Hadoop cluster. The pipeline will require some checkpointing and splitting pipelines. Which method should you use to write the pipelines?

  • A. PigLatin using Pig
  • B. HiveQL using Hive
  • C. Java using MapReduce
  • D. Python using MapReduce
Suggested Answer: A

Comments

[Removed]
Highly Voted 4 years, 8 months ago
Answer: A. Description: Pig is a scripting language that can be used for checkpointing and splitting pipelines.
upvoted 23 times
...
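For anyone unfamiliar with Pig, the checkpointing and splitting the top comments mention look roughly like this in Pig Latin. This is a sketch only: the relation names, schema, and paths are made up for illustration.

```pig
-- Load raw events (hypothetical input path and schema).
raw = LOAD 'input/events' USING PigStorage('\t')
      AS (id:int, type:chararray, amount:double);

-- Splitting a pipeline: SPLIT routes one relation into several branches.
SPLIT raw INTO purchases IF type == 'purchase',
               refunds   IF type == 'refund';

-- Checkpointing: STORE materializes an intermediate relation to HDFS,
-- so a later run can resume from it instead of recomputing from raw input.
STORE refunds INTO 'checkpoints/refunds' USING PigStorage('\t');

-- Continue one branch of the pipeline.
totals = FOREACH (GROUP purchases BY id)
         GENERATE group AS id, SUM(purchases.amount) AS total;
STORE totals INTO 'output/totals';
```

The built-in SPLIT operator and the ability to STORE intermediate relations are what make Pig a natural fit for the "checkpointing and splitting pipelines" requirement in the question.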
[Removed]
Highly Voted 4 years, 8 months ago
Should be A
upvoted 15 times
...
SamuelTsch
Most Recent 1 month ago
Selected Answer: A
I would go with A. C and D are similar, so both are excluded. B is out because Hive is actually a data warehouse system. I don't use Apache Pig myself, but since B, C, and D are wrong, A should be correct.
upvoted 1 times
...
AnonymousPanda
1 year, 3 months ago
Selected Answer: A
A as others have said
upvoted 1 times
...
Oleksandr0501
1 year, 7 months ago
Selected Answer: C
upvoted 2 times
...
juliobs
1 year, 8 months ago
Selected Answer: A
Pig Latin is the correct answer; however, the last release was 6 years ago and it has lots of bugs.
upvoted 2 times
...
musumusu
1 year, 9 months ago
This answer depends on which language you are comfortable with. Hadoop is your framework, and MapReduce is its native programming model in Java, designed for scaling, parallel processing, restarting a pipeline from any checkpoint, and so on. So if you are comfortable with Java, you can customize your checkpointing at a low level in a better way. Otherwise, choose Pig, which is another programming concept that runs on top of Java, but then you need to learn that as well; if not, choose Python, since it can be deployed with Hadoop, and Hadoop has been making updates for Python clients regularly. Option C is the best one.
upvoted 7 times
...
samdhimal
1 year, 10 months ago
C. Java using MapReduce or D. Python using MapReduce Apache Hadoop is a distributed computing framework that allows you to process large datasets using the MapReduce programming model. There are several options for writing ETL pipelines to run on a Hadoop cluster, but the most common are using Java or Python with the MapReduce programming model.
upvoted 4 times
samdhimal
1 year, 10 months ago
A. PigLatin using Pig: Pig Latin is a high-level data-flow language used to create ETL pipelines. Pig is built on top of Hadoop and lets you write scripts in Pig Latin, a SQL-like language for processing data in Hadoop. Pig is a simpler option than MapReduce, but it lacks some capabilities, such as control over low-level data manipulation operations.

B. HiveQL using Hive: HiveQL is a SQL-like language for querying and managing large datasets stored in Hadoop's distributed file system. Hive is built on top of Hadoop and provides a SQL-like interface for querying data stored there. Hive is more suitable for querying and managing large datasets than for ETL pipelines.

Both Java and Python using MapReduce provide low-level control over data manipulation operations, and they allow you to write custom mapper and reducer functions to process data in a Hadoop cluster. The choice between Java and Python will depend on the development team's expertise and preference.
upvoted 3 times
cetanx
1 year, 6 months ago
It has to be C because, while Pig can be used to simplify the writing of complex data transformation tasks and can store intermediate results, it doesn't provide the detailed control over checkpointing and pipeline splitting that is typically implied by those terms. Also, while one can write MapReduce jobs in languages other than Java (like Python) using Hadoop Streaming or similar APIs, it may not be as efficient or as seamless as using Java, due to the JVM-native nature of Hadoop.
upvoted 2 times
...
...
...
Koushik25sep
2 years, 2 months ago
Selected Answer: A
Description: Pig is a scripting language that can be used for checkpointing and splitting pipelines.
upvoted 1 times
...
BigDataBB
2 years, 9 months ago
Why not D?
upvoted 1 times
...
rbeeraka
2 years, 10 months ago
Selected Answer: A
PigLatin supports checkpoints
upvoted 1 times
...
davidqianwen
2 years, 10 months ago
Selected Answer: A
Answer: A
upvoted 1 times
...
maddy5835
3 years, 1 month ago
Pig is just a scripting language; how can Pig be used to create pipelines? The answer should be C or D.
upvoted 3 times
...
sumanshu
3 years, 4 months ago
Vote for A
upvoted 1 times
...
kdiab
3 years, 9 months ago
Found this slide deck that argues in favor of answer A (Pig): https://poloclub.github.io/cx4242-2019fall-campus/slides/17-CSE6242-612-ScalingUp-hive.pdf
upvoted 2 times
...
IsaB
4 years, 2 months ago
Is this really a question that could appear in the Google Cloud Professional Data Engineer exam? What does it have to do with Google Cloud? I would use Dataproc, no?
upvoted 10 times
Pupina
4 years, 1 month ago
Did you take the exam? I am ready to do it this month
upvoted 1 times
...
MaxNRG
2 years, 11 months ago
Seems like a very old question :) Not sure it's still current.
upvoted 2 times
...
...
haroldbenites
4 years, 3 months ago
A is correct
upvoted 1 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other
