A Generative AI Engineer has written scalable PySpark code to ingest unstructured PDF documents and chunk them in preparation for storing in a Databricks Vector Search index. Currently, their DataFrame has two columns: the original filename as a string and an array of text chunks from that document.
What set of steps should the Generative AI Engineer perform to store the chunks in a ready-to-ingest manner for Databricks Vector Search?
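For context, here is a minimal PySpark sketch of the scenario described above, plus one commonly used preparation path: explode the array so each chunk becomes its own row, add an id column to serve as the primary key, and persist the result to a Delta table with Change Data Feed enabled (which a Delta Sync Vector Search index requires). The table and column names below are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Starting point as described: one row per document, with the filename and
# an array of text chunks extracted from that PDF.
chunked_df = spark.createDataFrame(
    [("report_q1.pdf", ["chunk one text...", "chunk two text..."])],
    ["filename", "chunks"],
)

# Explode so each chunk becomes its own row, then add a unique id to act as
# the primary key column the Vector Search index expects.
exploded_df = (
    chunked_df
    .select("filename", F.posexplode("chunks").alias("chunk_pos", "chunk_text"))
    .withColumn("chunk_id", F.monotonically_increasing_id())
)

# Persist as a Delta table; Change Data Feed must be enabled for a
# Delta Sync Vector Search index to track changes to the source table.
exploded_df.write.format("delta").mode("overwrite").saveAsTable("main.rag.pdf_chunks")
spark.sql(
    "ALTER TABLE main.rag.pdf_chunks "
    "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
)
```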