Exam Certified Data Engineer Professional topic 1 question 83 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 83
Topic #: 1

All records from an Apache Kafka producer are being ingested into a single Delta Lake table with the following schema:

key BINARY, value BINARY, topic STRING, partition LONG, offset LONG, timestamp LONG

There are 5 unique topics being ingested. Only the "registration" topic contains Personally Identifiable Information (PII). The company wishes to restrict access to PII. The company also wishes to retain records containing PII in this table for only 14 days after initial ingestion. However, it would like to retain non-PII records indefinitely.

Which of the following solutions meets the requirements?

  • A. All data should be deleted biweekly; Delta Lake's time travel functionality should be leveraged to maintain a history of non-PII information.
  • B. Data should be partitioned by the registration field, allowing ACLs and delete statements to be set for the PII directory.
  • C. Because the value field is stored as binary data, this information is not considered PII and no special precautions should be taken.
  • D. Separate object storage containers should be specified based on the partition field, allowing isolation at the storage level.
  • E. Data should be partitioned by the topic field, allowing ACLs and delete statements to leverage partition boundaries.
Suggested Answer: E

Comments

mouad_attaqi
Highly Voted 1 year ago
Selected Answer: E
I think answer E is correct: by default, partitioning by a column creates a separate folder for each subset of the data that shares a value of the partition column.
upvoted 12 times
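To make the folder-per-partition point concrete, here is a minimal sketch in plain Python of the Hive-style `column=value` directory naming that partitioned tables use by default. The table root `/delta/kafka_bronze` and the topic name `clicks` are hypothetical; only `registration` comes from the question. This only models the naming scheme and does not read or write a real Delta table.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical table root; a real Delta table would live in cloud object storage.
TABLE_ROOT = PurePosixPath("/delta/kafka_bronze")


def partition_dir(topic: str) -> PurePosixPath:
    """Return the Hive-style directory for one value of the partition column."""
    return TABLE_ROOT / f"topic={topic}"


def group_by_partition(records):
    """Group records by the directory their data files would be written under."""
    layout = defaultdict(list)
    for record in records:
        layout[partition_dir(record["topic"])].append(record)
    return dict(layout)


# "clicks" is an invented topic name used only for illustration.
records = [
    {"topic": "registration", "offset": 0},
    {"topic": "clicks", "offset": 0},
    {"topic": "registration", "offset": 1},
]

layout = group_by_partition(records)
for directory, rows in sorted(layout.items()):
    print(directory, len(rows))
```

Under this layout, every record from the "registration" topic ends up beneath a single `topic=registration` directory, which is the boundary that answer E relies on.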
benni_ale
Most Recent 1 month ago
Selected Answer: E
E E E E E
upvoted 1 times
ojudz08
9 months, 1 week ago
Selected Answer: D
I think it's best to isolate the storage, to avoid mistakenly deleting tables that share the same storage, so I go with D.
upvoted 1 times
spaceexplorer
10 months ago
Selected Answer: E
E is correct
upvoted 1 times
ervinshang
11 months ago
Selected Answer: E
E is correct
upvoted 2 times
aragorn_brego
1 year ago
Selected Answer: E
Partitioning data by the topic field would allow the data engineering team to apply access control lists (ACLs) to restrict access to the partition containing the "registration" topic, which holds PII. Furthermore, the team can set up automated deletion policies that specifically target the partition with PII data to delete records after 14 days, without affecting the data in other partitions. This approach meets both the privacy requirements for PII and the data retention goals for non-PII information.
upvoted 2 times
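The retention idea in the comment above could be sketched roughly as follows. This plain-Python snippet only builds the SQL text; on Databricks one would presumably pass the strings to `spark.sql(...)` from a scheduled job. The table name `kafka_bronze` is hypothetical, and the sketch assumes the `timestamp` column holds epoch milliseconds and is an acceptable proxy for ingestion time, neither of which the question states.

```python
from datetime import datetime, timedelta, timezone

TABLE = "kafka_bronze"          # hypothetical table name
PII_TOPIC = "registration"      # the only PII topic, per the question
RETENTION = timedelta(days=14)  # PII retention window, per the question

# Schema from the question, partitioned by topic (answer E).
# Column names are backtick-quoted because some of them resemble SQL keywords.
CREATE_TABLE_SQL = f"""
CREATE TABLE IF NOT EXISTS {TABLE} (
  `key` BINARY, `value` BINARY, `topic` STRING,
  `partition` LONG, `offset` LONG, `timestamp` LONG
) USING DELTA
PARTITIONED BY (`topic`)
""".strip()


def pii_delete_sql(now: datetime) -> str:
    """Build a DELETE that targets only PII rows older than the retention window.

    Assumption: `timestamp` is epoch milliseconds.
    """
    cutoff_ms = int((now - RETENTION).timestamp() * 1000)
    return (
        f"DELETE FROM {TABLE} "
        f"WHERE `topic` = '{PII_TOPIC}' AND `timestamp` < {cutoff_ms}"
    )


now = datetime(2024, 1, 15, tzinfo=timezone.utc)
print(CREATE_TABLE_SQL)
print(pii_delete_sql(now))
```

Because the predicate filters on the partition column, the delete only needs to touch files in the "registration" partition; rows from the other four topics are never matched. Note that a Delta `DELETE` removes rows from the current table version only, so the underlying data files stay reachable through time travel until they are vacuumed.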
Dileepvikram
1 year ago
I think answer is E
upvoted 3 times
[Removed]
1 year ago
Selected Answer: B
The solution that meets the requirements is: B. Data should be partitioned by the registration field, allowing ACLs and delete statements to be set for the PII directory. Partitioning the data by the registration field allows the directory containing PII records to be isolated and access restricted via ACLs. Additionally, the data retention requirements can be met by setting up a separate job or process to remove PII records that are 14 days old. For non-PII records, they can be retained indefinitely utilizing Delta Lake's time travel functionality.
upvoted 1 times
mouad_attaqi
1 year ago
There is no such thing as a registration field; "registration" is one of the distinct topics in the topic field.
upvoted 2 times
sturcu
1 year ago
You cannot restrict privileges with ACLs on a partition. The documentation states that the securable objects in the Hive metastore are databases, tables, views, and functions: https://docs.databricks.com/en/data-governance/table-acls/object-privileges.html#securable-objects
upvoted 1 times
sturcu
1 year, 1 month ago
Selected Answer: D
Correct
upvoted 1 times
sturcu
1 year ago
https://docs.databricks.com/en/data-governance/table-acls/object-privileges.html#securable-objects
upvoted 1 times