Exam Certified Data Engineer Professional topic 1 question 37 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 37
Topic #: 1

A small company based in the United States has recently contracted a consulting firm in India to implement several new data engineering pipelines to power artificial intelligence applications. All the company's data is stored in regional cloud storage in the United States.
The workspace administrator at the company is uncertain about where the Databricks workspace used by the contractors should be deployed.
Assuming that all data governance considerations are accounted for, which statement accurately informs this decision?

  • A. Databricks runs HDFS on cloud volume storage; as such, cloud virtual machines must be deployed in the region where the data is stored.
  • B. Databricks workspaces do not rely on any regional infrastructure; as such, the decision should be made based upon what is most convenient for the workspace administrator.
  • C. Cross-region reads and writes can incur significant costs and latency; whenever possible, compute should be deployed in the same region the data is stored.
  • D. Databricks leverages user workstations as the driver during interactive development; as such, users should always use a workspace deployed in a region they are physically near.
  • E. Databricks notebooks send all executable code from the user’s browser to virtual machines over the open internet; whenever possible, choosing a workspace region near the end users is the most secure.
Suggested Answer: C


3 weeks, 2 days ago
Selected Answer: C
C is the correct answer.
upvoted 1 times
2 months ago
(C) The decision is about where the Databricks workspace used by the contractors should be deployed. The contractors are based in India, while all the company's data is stored in regional cloud storage in the United States. When choosing a region for a Databricks workspace, one important factor is proximity to the data sources and sinks. Cross-region reads and writes can incur significant costs and latency because of data transfer fees and limited network bandwidth. Therefore, whenever possible, compute should be deployed in the same region where the data is stored, to optimize performance and reduce costs.
upvoted 2 times
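To make the cost argument above concrete, here is a rough sketch of how the transfer charge grows once compute sits in a different region from the storage. The per-GB rate, the daily volume, and the region names are all illustrative assumptions, not quoted prices or details from the question; real rates depend on the cloud provider and the region pair, and treating same-region traffic as free is a simplification.

```python
# Rough sketch: estimate the data-transfer charge when compute reads from
# storage in another region. The rate below is a hypothetical placeholder.

HYPOTHETICAL_CROSS_REGION_RATE_USD_PER_GB = 0.02  # assumption, not a quoted price


def transfer_cost_usd(compute_region, storage_region, gb_per_day, days=30,
                      rate_per_gb=HYPOTHETICAL_CROSS_REGION_RATE_USD_PER_GB):
    """Return the estimated transfer charge for the period.

    Same-region traffic is treated as free here, which is a simplification.
    """
    if compute_region == storage_region:
        return 0.0
    return gb_per_day * days * rate_per_gb


if __name__ == "__main__":
    # Region names are illustrative only; the data lives in the US region.
    for compute in ("us-east-1", "ap-south-1"):
        cost = transfer_cost_usd(compute, "us-east-1", gb_per_day=500)
        print(f"compute in {compute}: ~${cost:,.2f} per 30 days")
```

The sketch only covers transfer fees. It does not model the added latency of every read and write crossing regions, which is the other half of option C.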
6 months, 2 weeks ago
Selected Answer: C
C is the answer.
upvoted 3 times
7 months ago
Selected Answer: C
An important part of data governance is usage cost, and, as a general data engineering practice, the egress costs of moving data between regions are always an important consideration. Having the workspace located in a different region from the contractors will cause them very little inconvenience, while saving greatly on those egress costs.
upvoted 2 times
7 months, 1 week ago
Selected Answer: B
Where the data engineering team develops pipelines is independent of where the data objects reside in cloud storage.
upvoted 1 times
2 months, 2 weeks ago
These pipelines will create clusters (machines) that reside in a different region than the data, and that will cause latency issues. So C should be the correct option.
upvoted 2 times
9 months, 2 weeks ago
C is correct.
upvoted 2 times
Community vote distribution
A (35%)
C (25%)
B (20%)
Most Voted