Exam Professional Data Engineer topic 1 question 86 discussion

Actual exam question from Google's Professional Data Engineer
Question #: 86
Topic #: 1

You have an Apache Kafka cluster on-prem with topics containing web application logs. You need to replicate the data to Google Cloud for analysis in BigQuery and Cloud Storage. The preferred replication method is mirroring to avoid deployment of Kafka Connect plugins.
What should you do?

  • A. Deploy a Kafka cluster on GCE VM Instances. Configure your on-prem cluster to mirror your topics to the cluster running in GCE. Use a Dataproc cluster or Dataflow job to read from Kafka and write to GCS.
  • B. Deploy a Kafka cluster on GCE VM Instances with the Pub/Sub Kafka connector configured as a Sink connector. Use a Dataproc cluster or Dataflow job to read from Kafka and write to GCS.
  • C. Deploy the Pub/Sub Kafka connector to your on-prem Kafka cluster and configure Pub/Sub as a Source connector. Use a Dataflow job to read from Pub/Sub and write to GCS.
  • D. Deploy the Pub/Sub Kafka connector to your on-prem Kafka cluster and configure Pub/Sub as a Sink connector. Use a Dataflow job to read from Pub/Sub and write to GCS.
Suggested Answer: A
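
The mirroring step in option A can be sketched as a small config generator. This is only an illustration, assuming MirrorMaker 2 is the mirroring tool (the question does not name one); the cluster aliases, bootstrap addresses, and topic pattern below are hypothetical. The property names follow the Kafka geo-replication documentation. Note that MirrorMaker 2 runs on the Kafka Connect framework internally, although it ships with Kafka and does not require installing a third-party connector plugin such as the Pub/Sub connector.

```python
def mirror_maker2_properties(source_alias, source_bootstrap,
                             target_alias, target_bootstrap,
                             topics=".*"):
    """Build the text of a one-way MirrorMaker 2 properties file.

    All argument values are placeholders chosen by the caller;
    nothing here contacts a real cluster.
    """
    flow = f"{source_alias}->{target_alias}"
    reverse = f"{target_alias}->{source_alias}"
    lines = [
        # Aliases of the clusters taking part in replication
        f"clusters = {source_alias}, {target_alias}",
        # Connection info for each cluster
        f"{source_alias}.bootstrap.servers = {source_bootstrap}",
        f"{target_alias}.bootstrap.servers = {target_bootstrap}",
        # Enable on-prem -> cloud mirroring for the matching topics
        f"{flow}.enabled = true",
        f"{flow}.topics = {topics}",
        # Keep the reverse direction off (one-way mirroring)
        f"{reverse}.enabled = false",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical aliases, hosts, and topic pattern
    print(mirror_maker2_properties(
        "onprem", "kafka-onprem.example:9092",
        "gce", "kafka-gce.example:9092",
        topics="weblogs.*",
    ))
```

The generated text would be saved to a properties file and passed to the MirrorMaker 2 launcher that ships with Kafka. A Dataflow or Dataproc job would then read from the GCE cluster, as option A describes.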

Comments

Ganshank
Highly Voted 3 years, 6 months ago
A. https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330 The question specifically mentions mirroring and avoiding Kafka Connect plugins. D would be the more Google Cloud-native way of implementing the same thing, but the requirement is better met by A.
upvoted 34 times
...
[Removed]
Highly Voted 3 years, 7 months ago
Answer: A. Description: The question says to use mirroring and avoid Kafka Connect plugins.
upvoted 11 times
...
Qix
Most Recent 4 months, 3 weeks ago
The Pub/Sub Kafka connector requires Kafka Connect, as described here: https://cloud.google.com/pubsub/docs/connect_kafka Deployment of Kafka Connect is explicitly excluded by the requirements, so the only option available is A.
upvoted 4 times
...
samdhimal
8 months, 4 weeks ago
Option A: Deploy a Kafka cluster on GCE VM Instances. Configure your on-prem cluster to mirror your topics to the cluster running in GCE. Use a Dataproc cluster or Dataflow job to read from Kafka and write to GCS. This option involves setting up a separate Kafka cluster in Google Cloud, and then configuring the on-prem cluster to mirror the topics to this cluster. The data from the Google Cloud Kafka cluster can then be read using either a Dataproc cluster or a Dataflow job and written to Cloud Storage for analysis in BigQuery.
upvoted 3 times
samdhimal
8 months, 4 weeks ago
Option B: Deploy a Kafka cluster on GCE VM Instances with the Pub/Sub Kafka connector configured as a Sink connector. Use a Dataproc cluster or Dataflow job to read from Kafka and write to GCS. This option is similar to Option A, but involves using the Pub/Sub Kafka connector as a sink connector instead of mirroring the topics from the on-prem cluster. This option would result in the same duplication of data and additional resources required as Option A, making it less desirable.
upvoted 1 times
samdhimal
8 months, 3 weeks ago
Sorry, I messed up. The answer is probably A. My bad.
upvoted 1 times
...
...
samdhimal
8 months, 4 weeks ago
Option D: Deploy the Pub/Sub Kafka connector to your on-prem Kafka cluster and configure Pub/Sub as a Sink connector. Use a Dataflow job to read from Pub/Sub and write to GCS. This option involves deploying the Pub/Sub Kafka connector on the on-prem cluster, but configuring it as a sink connector. In this case, the data from the on-prem Kafka cluster would be sent directly to Pub/Sub, which would act as the final destination for the data. A Dataflow job would then be used to read the data from Pub/Sub and write it to Cloud Storage for analysis in BigQuery. This option would result in the data being stored in both the on-prem cluster and Pub/Sub, making it less desirable compared to option C, where the data is only stored in Pub/Sub as an intermediary between the on-prem cluster and Google Cloud.
upvoted 1 times
musumusu
8 months, 1 week ago
You are using ChatGPT replies. If you instruct ChatGPT that plugins must be avoided, as the question says, it will answer A.
upvoted 1 times
...
...
samdhimal
8 months, 4 weeks ago
Option C: Deploy the Pub/Sub Kafka connector to your on-prem Kafka cluster and configure Pub/Sub as a Source connector. Use a Dataflow job to read from Pub/Sub and write to GCS. This option involves deploying the Pub/Sub Kafka connector directly on the on-prem cluster, and configuring it as a source connector. The data from the on-prem Kafka cluster is then sent directly to Pub/Sub, which acts as an intermediary between the on-prem cluster and the data stored in Google Cloud. A Dataflow job is then used to read the data from Pub/Sub and write it to Cloud Storage for analysis in BigQuery. This option avoids the duplication of data and additional resources required by the other options, making it the preferred option.
upvoted 2 times
...
...
zellck
10 months, 4 weeks ago
Selected Answer: A
A is the answer.
upvoted 1 times
...
Afonya
1 year ago
Selected Answer: A
"The preferred replication method is mirroring to avoid deployment of Kafka Connect plugins."
upvoted 1 times
...
somnathmaddi
1 year ago
D is the right answer
upvoted 3 times
...
clouditis
1 year, 1 month ago
D is the right answer
upvoted 2 times
...
hendrixlives
1 year, 10 months ago
Selected Answer: A
"A" is the answer that complies with the requirements (specifically, "The preferred replication method is mirroring to avoid deployment of Kafka Connect plugins"). Indeed, one of the uses of what is called "Geo-Replication" (or Cross-Cluster Data Mirroring) in Kafka is precisely cloud migrations: https://kafka.apache.org/documentation/#georeplication However, I agree with Ganshank that the optimal "Google way" would be "D", installing the Pub/Sub Kafka connector to move the data from on-prem to GCP.
upvoted 6 times
...
gcp_k
2 years ago
Going with "D". Refer to: https://stackoverflow.com/questions/55277188/kafka-to-google-pub-sub-using-sink-connector
upvoted 3 times
baubaumiaomiao
1 year, 10 months ago
"avoid deployment of Kafka Connect plugins"
upvoted 1 times
...
...
sumanshu
2 years, 3 months ago
Vote for A
upvoted 1 times
...
daghayeghi
2 years, 7 months ago
Answer: A. Description: The question says to use mirroring to avoid Kafka Connect plugins.
upvoted 3 times
...
Allan222
2 years, 8 months ago
The correct answer is D.
upvoted 1 times
sumanshu
2 years, 3 months ago
As per the question: "avoid deployment of Kafka Connect plugins."
upvoted 1 times
...
...
vakati
3 years, 1 month ago
A. The best solution would be D, but given the restriction here to use mirroring and avoid connectors, A would be the natural choice.
upvoted 3 times
...
Tanmoyk
3 years, 1 month ago
D should be the correct answer. Configure Pub/Sub as a sink.
upvoted 4 times
...
haroldbenites
3 years, 2 months ago
C is correct. https://docs.confluent.io/current/connect/kafka-connect-gcp-pubsub/index.html
upvoted 2 times
haroldbenites
3 years, 2 months ago
Correct Answer: D. Why is this correct? You can connect Kafka to GCP by using a connector. The 'downstream' service (Pub/Sub) will use a sink connector.
upvoted 1 times
sumanshu
2 years, 3 months ago
The question says: avoid deployment of Kafka Connect plugins.
upvoted 2 times
...
...
...
clouditis
3 years, 2 months ago
It's D. Why would Google prefer Kafka in their own cert questions? :)
upvoted 3 times
Ral17
2 years, 1 month ago
Because the question says to avoid deployment of Kafka Connect plugins.
upvoted 3 times
...
...