Exam Certified Data Engineer Associate topic 1 question 71 discussion

Actual exam question from Databricks's Certified Data Engineer Associate

Question #: 71
Topic #: 1


A data engineer has developed a data pipeline to ingest data from a JSON source using Auto Loader, but the engineer has not provided any type inference or schema hints in their pipeline. Upon reviewing the data, the data engineer has noticed that all of the columns in the target table are of the string type despite some of the fields only including float or boolean values.

Which of the following describes why Auto Loader inferred all of the columns to be of the string type?

A. There was a type mismatch between the specific schema and the inferred schema
B. JSON data is a text-based format
C. Auto Loader only works with string data
D. All of the fields had at least one null value
E. Auto Loader cannot infer the schema of ingested data


Suggested Answer: B

by meow_akk at Oct. 22, 2023, 5:19 a.m.

Comments


AndreFR

1 year ago

Selected Answer: B

From https://docs.databricks.com/en/ingestion/auto-loader/schema.html#how-does-auto-loader-schema-inference-work: "By default, Auto Loader schema inference seeks to avoid schema evolution issues due to type mismatches. For formats that don’t encode data types (JSON and CSV), Auto Loader infers all columns as strings (including nested fields in JSON files)."
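As a rough illustration of the type-mismatch concern in that quote, here is a plain-Python sketch (it does not use Auto Loader; the sample records, the field names `price` and `in_stock`, and the helper `observed_types` are all made up for the example). It shows that nothing stops the same JSON field from holding different types in different records, which is the situation that string-by-default inference sidesteps.

```python
import json

# Hypothetical sample records; the field names are assumptions for this sketch.
records = [
    '{"price": 9.99, "in_stock": true}',
    '{"price": "N/A", "in_stock": null}',
]


def observed_types(lines):
    """Collect the Python type names seen for each top-level field."""
    seen = {}
    for line in lines:
        for key, value in json.loads(line).items():
            seen.setdefault(key, set()).add(type(value).__name__)
    return seen


types = observed_types(records)
# "price" shows up as both a float and a str, so no single non-string
# column type fits every record without extra handling.
```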

upvoted 4 times


nedlo

1 year, 1 month ago

Selected Answer: B

It's B. "By default, Auto Loader schema inference seeks to avoid schema evolution issues due to type mismatches. For formats that don’t encode data types (JSON and CSV), Auto Loader infers all columns as strings (including nested fields in JSON files). For formats with typed schema (Parquet and Avro), Auto Loader samples a subset of files and merges the schemas of individual files." https://docs.databricks.com/en/ingestion/auto-loader/schema.html

upvoted 2 times


55f31c8

1 year, 1 month ago

Selected Answer: B

https://docs.databricks.com/en/ingestion/auto-loader/schema.html#how-does-auto-loader-schema-inference-work

upvoted 2 times


meow_akk

1 year, 2 months ago

The correct answer is B. JSON data is a text-based format.

JSON is a text-based format that carries no schema, so nothing in a file guarantees that a field has the same type in every record. By default, Auto Loader therefore infers every column of a JSON source as a string, including nested fields, to avoid schema evolution issues caused by type mismatches. https://docs.databricks.com/en/ingestion/auto-loader/schema.html

For example, a field whose value is always the JSON literal true is logically a boolean, but with default settings Auto Loader still infers its column as string.

To get typed columns, the data engineer can enable type inference with the cloudFiles.inferColumnTypes option, or supply schema hints with cloudFiles.schemaHints to set the types of specific columns. A complete schema can also be provided explicitly instead of relying on inference.
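A minimal sketch of how type inference and schema hints might be switched on for this pipeline. The option keys (`cloudFiles.format`, `cloudFiles.schemaLocation`, `cloudFiles.inferColumnTypes`, `cloudFiles.schemaHints`) are Auto Loader options; the helper names `auto_loader_options` and `read_json_stream`, the paths, and the column names in the hint string are hypothetical and only serve the example.

```python
def auto_loader_options(schema_location, infer_types=False, schema_hints=None):
    """Build Auto Loader reader options for a JSON source (hypothetical helper)."""
    options = {
        "cloudFiles.format": "json",
        "cloudFiles.schemaLocation": schema_location,
    }
    if infer_types:
        # Without this option, JSON columns are inferred as strings.
        options["cloudFiles.inferColumnTypes"] = "true"
    if schema_hints:
        # Comma-separated "name type" pairs that override the inferred types.
        options["cloudFiles.schemaHints"] = schema_hints
    return options


def read_json_stream(spark, source_path, options):
    """Start an Auto Loader stream; needs a Databricks SparkSession to run."""
    return spark.readStream.format("cloudFiles").options(**options).load(source_path)


# Default behaviour from the question: no inference option, no hints.
default_opts = auto_loader_options("/tmp/schemas/events")

# Typed behaviour: infer column types and pin two assumed columns explicitly.
typed_opts = auto_loader_options(
    "/tmp/schemas/events",
    infer_types=True,
    schema_hints="price DOUBLE, in_stock BOOLEAN",
)
```

On a Databricks cluster, something like `read_json_stream(spark, "/mnt/raw/events", typed_opts)` would then return a streaming DataFrame; the source path here is only a placeholder.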

upvoted 2 times
