Exam Certified Data Engineer Professional topic 1 question 226 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 226
Topic #: 1

A Data Engineer wants to run unit tests using common Python testing frameworks on Python functions defined across several Databricks notebooks currently used in production.

How can the data engineer run unit tests against functions that work with data in production?

  • A. Define and import unit test functions from a separate Databricks notebook
  • B. Define and unit test functions using Files in Repos
  • C. Run unit tests against non-production data that closely mirrors production
  • D. Define unit tests and functions within the same notebook
Suggested Answer: B

Comments

arekm
1 month ago
Selected Answer: B
B - to test functions, we need to import them into our unit tests. This means that storing functions in notebooks is not a good idea. You store them separately - as "files" - and import them in notebooks the same way you import them in unit tests.
upvoted 1 times
Thameur01
1 month, 4 weeks ago
Selected Answer: B
Databricks Repos is the recommended way to organize and manage code, including functions and unit tests, in a scalable and maintainable way. By defining your functions and unit tests in Files in Repos, you can:
  • Modularize your code: functions can be organized into separate Python files or modules, making them reusable and easier to test.
  • Use standard testing frameworks: frameworks like pytest or unittest can be used to write and execute unit tests against these functions.
  • Integrate version control: Files in Repos can be version-controlled using Git, ensuring traceability and collaboration.
  • Test with production-like data: with proper safeguards, you can design unit tests against production-like data while maintaining modularity and separation from production pipelines.
upvoted 2 times
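A minimal sketch of the layout this comment describes, assuming a hypothetical helper module stored as a .py file in the repo (Files in Repos) so both notebooks and a pytest file can import it. The function name and logic below are illustrative, not taken from the exam:

```python
# transforms.py -- a plain .py file in the repo (Files in Repos),
# importable from notebooks and from test files alike.
# Hypothetical example function, not from the exam question.

def normalize_price(raw: str) -> float:
    """Strip whitespace, a leading '$', and thousands separators,
    then return the price as a float."""
    return float(raw.strip().lstrip("$").replace(",", ""))


# test_transforms.py -- run with `pytest` from the repo root.
# In a notebook you would write `from transforms import normalize_price`;
# here the function defined above is called directly.

def test_normalize_price_strips_symbols():
    assert normalize_price(" $1,299.99 ") == 1299.99


def test_normalize_price_plain_number():
    assert normalize_price("42") == 42.0
```

With this split, the same `transforms.py` is imported by production notebooks and by the test file, which is the point the comment makes about storing functions as files rather than inside notebooks.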
benni_ale
1 month, 4 weeks ago
Selected Answer: C
It is true that, for Python and R notebooks, Databricks recommends storing functions and their unit tests outside of notebooks, so given that the question is in a Python environment, some could argue B is the correct solution. Nevertheless, that is only advice, and Databricks states more generally that it is a best practice not to run unit tests against functions that work with data in production. This is especially important for functions that add, remove, or otherwise change data. To protect your production data from being compromised by your unit tests in unexpected ways, you should run unit tests against non-production data. So I would go for option C; alternatively, I would have gone for B. https://docs.databricks.com/en/notebooks/testing.html#write-unit-tests
upvoted 1 times
arekm
1 month ago
This does not really answer the question.
upvoted 1 times
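The non-production-data practice argued for above can be sketched as follows: instead of reading production tables, the test builds a small synthetic dataset that mirrors the production schema. The function and field names here are hypothetical:

```python
# Illustrative only: unit-test a data-touching function against
# synthetic rows built inside the test, never against production tables.

def drop_inactive(rows):
    """Hypothetical production helper: keep only records flagged active."""
    return [r for r in rows if r.get("active")]


def test_drop_inactive_on_synthetic_data():
    # Non-production data that mirrors the assumed production schema.
    sample = [
        {"id": 1, "active": True},
        {"id": 2, "active": False},
        {"id": 3, "active": True},
    ]
    result = drop_inactive(sample)
    assert [r["id"] for r in result] == [1, 3]
```

Because the sample data lives inside the test, running the suite can never add, remove, or change production records, which is the risk the comment highlights.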
Community vote distribution
A (35%)
C (25%)
B (20%)
Other