Exam Professional Data Engineer topic 1 question 275 discussion

Actual exam question from Google's Professional Data Engineer
Question #: 275
Topic #: 1

You created an analytics environment on Google Cloud so that your data scientist team can explore data without impacting the on-premises Apache Hadoop solution. The data in the on-premises Hadoop Distributed File System (HDFS) cluster is in Optimized Row Columnar (ORC) formatted files with multiple columns of Hive partitioning. The data scientist team needs to be able to explore the data in a similar way as they used the on-premises HDFS cluster with SQL on the Hive query engine. You need to choose the most cost-effective storage and processing solution. What should you do?

  • A. Import the ORC files to Bigtable tables for the data scientist team.
  • B. Import the ORC files to BigQuery tables for the data scientist team.
  • C. Copy the ORC files on Cloud Storage, then deploy a Dataproc cluster for the data scientist team.
  • D. Copy the ORC files on Cloud Storage, then create external BigQuery tables for the data scientist team.
Suggested Answer: D
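
As a rough sketch of what option D involves, the snippet below assembles a BigQuery `CREATE EXTERNAL TABLE` statement over ORC files stored under a Hive-partitioned Cloud Storage prefix. The table ID and bucket path are hypothetical placeholders, and `build_external_table_ddl` is an illustrative helper, not part of any library; it only builds the SQL text and does not call BigQuery.

```python
def build_external_table_ddl(table_id: str, gcs_prefix: str) -> str:
    """Return BigQuery DDL for an external table over Hive-partitioned ORC files.

    Both arguments are hypothetical placeholders supplied by the caller.
    """
    prefix = gcs_prefix.rstrip("/")
    return (
        f"CREATE EXTERNAL TABLE `{table_id}`\n"
        # WITH PARTITION COLUMNS (no explicit list) asks BigQuery to detect
        # the partition keys from key=value path segments under the prefix.
        "WITH PARTITION COLUMNS\n"
        "OPTIONS (\n"
        "  format = 'ORC',\n"
        f"  uris = ['{prefix}/*'],\n"
        f"  hive_partition_uri_prefix = '{prefix}'\n"
        ");"
    )


# Example with made-up names; paste the printed statement into the BigQuery console.
print(build_external_table_ddl(
    "my-project.analytics.events",
    "gs://my-bucket/warehouse/events",
))
```

With a statement along these lines, the ORC files stay in Cloud Storage and the team queries them with SQL, without loading the data into BigQuery storage first.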

Comments

raaad
Highly Voted 10 months, 2 weeks ago
Selected Answer: D
- It leverages the strengths of BigQuery for SQL-based exploration while avoiding additional costs and complexity associated with data transformation or migration.
- The data remains in ORC format in Cloud Storage, and BigQuery's external tables feature allows direct querying of this data.
upvoted 7 times
nadavw
2 months, 3 weeks ago
There is a requirement to use a 'Hive query engine', and BQ uses only the Hive metastore and its own engine, so 'D' seems a better fit here.
upvoted 1 times
...
...
SamuelTsch
Most Recent 3 weeks, 2 days ago
Selected Answer: B
External tables always have limitations: reduced performance, no data preview, and no cost estimation. So why is option D correct?
upvoted 1 times
...
kaisarfarel
8 months, 1 week ago
I think C is the correct answer. The data scientists want to explore the data in a "similar way as they used the on-premises HDFS cluster with SQL on the Hive query engine". Dataproc can quickly create clusters that run Hadoop and Hive. CMIIW
upvoted 4 times
...
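
For comparison, a minimal sketch of the option C route: on a Dataproc cluster, Hive could expose the same ORC files through an external table whose location is a Cloud Storage path. The table name, column list, partition columns, and bucket path below are all hypothetical, and `build_hive_statements` is an illustrative helper that only assembles HiveQL text.

```python
def build_hive_statements(table, columns, partition_columns, location):
    """Return HiveQL statements for an external ORC table and partition discovery.

    columns and partition_columns are lists of (name, hive_type) pairs.
    Every name passed in is a hypothetical example.
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    parts = ", ".join(f"{name} {hive_type}" for name, hive_type in partition_columns)
    create = (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"PARTITIONED BY ({parts})\n"
        "STORED AS ORC\n"
        f"LOCATION '{location}'"
    )
    # MSCK REPAIR TABLE registers partition directories that already exist
    # under the table location with the Hive metastore.
    repair = f"MSCK REPAIR TABLE {table}"
    return [create, repair]


for statement in build_hive_statements(
    "events",
    [("user_id", "STRING"), ("amount", "DOUBLE")],
    [("dt", "STRING"), ("country", "STRING")],
    "gs://my-bucket/warehouse/events",
):
    print(statement + ";")
```

This keeps the Hive experience the team already knows, at the cost of running and paying for a cluster while it is up.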
hanoverquay
8 months, 1 week ago
Selected Answer: D
Option D
upvoted 1 times
...
0725f1f
8 months, 2 weeks ago
Selected Answer: C
The question also mentions partitioning.
upvoted 2 times
...
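
To make the "multiple columns of Hive partitioning" in the question concrete, here is a small sketch of how a Hive-style directory layout encodes partition values as `key=value` path segments. The object path is a made-up example and `parse_hive_partitions` is a hypothetical helper; query engines that understand this layout derive partition columns from the path in a similar spirit.

```python
def parse_hive_partitions(object_path: str) -> dict:
    """Extract key=value partition segments from a Hive-style object path."""
    partitions = {}
    # Ignore the last segment, which is the data file name.
    for segment in object_path.split("/")[:-1]:
        key, sep, value = segment.partition("=")
        if sep and key:
            partitions[key] = value
    return partitions


# Hypothetical path with two partition columns, dt and country.
path = "gs://my-bucket/warehouse/events/dt=2024-01-01/country=US/part-0000.orc"
print(parse_hive_partitions(path))
```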
JyoGCP
9 months ago
Selected Answer: D
Option D
upvoted 1 times
...
Matt_108
10 months, 2 weeks ago
Selected Answer: D
Option D: it leverages BigQuery for SQL-based exploration by querying the data directly in Cloud Storage.
upvoted 2 times
...
Smakyel79
10 months, 2 weeks ago
Selected Answer: D
This approach leverages BigQuery's powerful analytics capabilities without the overhead of data transformation or maintaining a separate cluster, while also allowing your team to use SQL for data exploration, similar to their experience with the on-premises Hadoop/Hive environment.
upvoted 3 times
...