You are responsible for writing your company's ETL pipelines to run on an Apache Hadoop cluster. The pipeline will require some checkpointing and splitting pipelines. Which method should you use to write the pipelines?
A. PigLatin, using Pig
B. HiveQL, using Hive
C. Java, using MapReduce
D. Python, using MapReduce
I would go with A.
C and D are similar, so both can be excluded. As for B, Hive is really a data warehouse system. I don't use Apache Pig myself, but since B, C, and D are wrong, A should be correct.
This answer depends on which language you are comfortable with.
Hadoop is your framework, and MapReduce is its native programming model in Java, designed for scale, parallel processing, restarting a pipeline from any checkpoint, and so on. So if you are comfortable with Java, you can customize your checkpointing at a low level in a better way. Otherwise, choose Pig, which is another programming layer that runs on top of Java, but then you need to learn that as well. If not, choose Python, since it can be deployed with Hadoop; Hadoop has been making updates for Python clients regularly.
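For illustration, here is a minimal sketch of what that low-level checkpointing can look like with the Java MapReduce API: a two-stage pipeline whose first stage writes intermediate results to an HDFS directory, so a restarted run can skip straight to stage 2 when that checkpoint already exists. All names here (EtlDriver, CleanMapper, DedupeReducer, AggregateMapper, AggregateReducer, the paths) are hypothetical examples, not something from the question; the mapper and reducer are sketched further below.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EtlDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path(args[0]);
        Path checkpoint = new Path(args[1]); // intermediate HDFS dir doubles as a checkpoint
        Path output = new Path(args[2]);
        FileSystem fs = FileSystem.get(conf);

        // Stage 1 runs only if its checkpoint is missing, so a restarted
        // pipeline resumes from stage 2 instead of reprocessing raw input.
        if (!fs.exists(checkpoint)) {
            Job stage1 = Job.getInstance(conf, "etl-stage-1");
            stage1.setJarByClass(EtlDriver.class);
            stage1.setMapperClass(CleanMapper.class);    // hypothetical, see sketch below
            stage1.setReducerClass(DedupeReducer.class); // hypothetical, see sketch below
            stage1.setOutputKeyClass(Text.class);
            stage1.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(stage1, input);
            FileOutputFormat.setOutputPath(stage1, checkpoint);
            if (!stage1.waitForCompletion(true)) System.exit(1);
        }

        // Stage 2 reads from the checkpoint, not from the raw input.
        Job stage2 = Job.getInstance(conf, "etl-stage-2");
        stage2.setJarByClass(EtlDriver.class);
        stage2.setMapperClass(AggregateMapper.class);    // hypothetical second-stage classes
        stage2.setReducerClass(AggregateReducer.class);  // not shown here
        stage2.setOutputKeyClass(Text.class);
        stage2.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(stage2, checkpoint);
        FileOutputFormat.setOutputPath(stage2, output);
        System.exit(stage2.waitForCompletion(true) ? 0 : 1);
    }
}
```

The design point is simply that each stage's output directory is a durable restart point, and splitting the pipeline is a matter of fanning several jobs out from the same checkpoint directory; this is the kind of bookkeeping Pig handles for you and Java makes explicit.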
Option C is the best one.
C. Java using MapReduce or D. Python using MapReduce
Apache Hadoop is a distributed computing framework that allows you to process large datasets using the MapReduce programming model. There are several options for writing ETL pipelines to run on a Hadoop cluster, but the most common are using Java or Python with the MapReduce programming model.
A. PigLatin, using Pig: PigLatin is a high-level data-flow language used to create ETL pipelines. Pig is built on top of Hadoop and lets you write scripts in PigLatin, a SQL-like language for processing data in Hadoop. Pig is a simpler option than MapReduce, but it lacks some capabilities, such as control over low-level data manipulation operations.
B. HiveQL, using Hive: HiveQL is a SQL-like language for querying and managing large datasets stored in Hadoop's distributed file system. Hive is built on top of Hadoop and provides a SQL-like interface to the data stored there. Hive is better suited to querying and managing large datasets than to building ETL pipelines.
Both Java and Python with MapReduce provide low-level control over data manipulation operations, and both allow you to write custom mapper and reducer functions to process data in a Hadoop cluster. The choice between Java and Python will depend on the development team's expertise and preference.
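To make the "custom mapper and reducer functions" concrete, here is a hypothetical cleaning and dedupe stage sketched in Java; the field layout (comma-separated input, keyed on the first column) is an assumption for the example, not part of the question.

```java
// CleanMapper.java -- hypothetical ETL step: parse delimited lines, key by the first field
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CleanMapper extends Mapper<LongWritable, Text, Text, Text> {
    private final Text outKey = new Text();
    private final Text outValue = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
            throws IOException, InterruptedException {
        String[] fields = line.toString().split(",");
        if (fields.length >= 2) {             // silently drop malformed rows
            outKey.set(fields[0].trim());
            outValue.set(fields[1].trim());
            ctx.write(outKey, outValue);
        }
    }
}
```

```java
// DedupeReducer.java -- hypothetical ETL step: keep one value per key
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class DedupeReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context ctx)
            throws IOException, InterruptedException {
        // all values for the same key arrive together; emit only the first
        ctx.write(key, values.iterator().next());
    }
}
```

Compared with PigLatin or HiveQL, you pay for this control in boilerplate, which is why the choice in the original question turns on how much low-level checkpoint and split logic you actually need.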
It has to be C, because while Pig can be used to simplify the writing of complex data transformation tasks and can store intermediate results, it doesn't provide the detailed control over checkpointing and pipeline splitting that those terms typically imply.
Also, while you can write MapReduce jobs in languages other than Java (like Python) using Hadoop Streaming or similar APIs, it may not be as efficient or as seamless as using Java, given Hadoop's JVM-native nature.
Is this really a question that could appear on the Google Cloud Professional Data Engineer exam? What does it have to do with Google Cloud? I would use Dataproc, no?