Exam Certified Data Engineer Associate All Questions

View all questions & answers for the Certified Data Engineer Associate exam

Exam Certified Data Engineer Associate topic 1 question 12 discussion

Actual exam question from Databricks's Certified Data Engineer Associate

Question #: 12
Topic #: 1

[All Certified Data Engineer Associate Questions]

A data analyst has created a Delta table sales that is used by the entire data analysis team. They want help from the data engineering team to implement a series of tests to ensure the data is clean. However, the data engineering team uses Python for its tests rather than SQL.
Which of the following commands could the data engineering team use to access sales in PySpark?

A. SELECT * FROM sales
B. There is no way to share data between PySpark and SQL.
C. spark.sql("sales")D. spark.delta.table("sales")
E. spark.table("sales")

Show Suggested Answer

Suggested Answer: E 🗳️

by XiltroX at April 1, 2023, 4:03 p.m.

Comments

Submit Cancel

Atnafu

Highly Voted 1 year, 11 months ago

E The spark.table() function in PySpark allows you to access tables registered in the catalog, including Delta tables. By specifying the table name ("sales"), the data engineering team can read the Delta table and perform various operations on it using PySpark. Option A, SELECT * FROM sales, is a SQL syntax and cannot be directly used in PySpark. Option B, "There is no way to share data between PySpark and SQL," is incorrect. PySpark provides the capability to interact with data using both SQL and DataFrame/DataSet APIs. Option C, spark.sql("sales"), is a valid command to execute SQL queries on registered tables in PySpark. However, in this case, the "sales" argument alone is not a valid SQL query. Option D, spark.delta.table("sales"), is a specific method provided by Delta Lake to access Delta tables directly. While it can be used to access the "sales" table, it is not the most common approach in PySpark.

upvoted 20 times

...

melvin_muthu

Most Recent 2 months ago

Selected Answer: E

Spark.table reads the table as dataframe

upvoted 2 times

...

SoumyaHK

2 months, 1 week ago

Selected Answer: E

The response is E. I do not see any option D

upvoted 1 times

...

dhohigh

5 months, 1 week ago

Selected Answer: E

This answer is pure python and is a simple solution for the Question.

upvoted 1 times

...

9d4d68a

10 months, 1 week ago

To access the Delta table sales using PySpark, the data engineering team can use the following command: E. spark.table("sales") This command allows them to load the table into a PySpark DataFrame, which they can then use for their tests and data processing in Python. No, the command spark.delta.table("table name") does not exist in PySpark. To access a Delta table, you should use: spark.table("table name") Or, if you need to use Delta-specific functionality, you would typically use Delta's APIs or spark.read.format("delta").table("table name") to read the table into a DataFrame.

upvoted 1 times

...

80370eb

10 months, 3 weeks ago

Selected Answer: E

E. spark.table("sales") This command allows the team to access the table using PySpark, enabling them to implement their tests in Python.

upvoted 1 times

...

souldiv

11 months, 2 weeks ago

spark.table() . E is the correct one

upvoted 1 times

...

benni_ale

1 year, 2 months ago

Selected Answer: E

E is correct

upvoted 1 times

...

benni_ale

1 year, 2 months ago

Selected Answer: E

e is correct

upvoted 2 times

...

Itmma

1 year, 3 months ago

Selected Answer: E

E is correct

upvoted 1 times

...

SerGrey

1 year, 6 months ago

Selected Answer: E

Correct answer is E

upvoted 1 times

...

Garyn

1 year, 6 months ago

Selected Answer: E

E. spark.table("sales") The spark.table() function in PySpark allows access to a registered table within the SparkSession. In this case, "sales" is the name of the Delta table created by the data analyst, and the spark.table() function enables access to this table for performing data engineering tests using Python (PySpark).

upvoted 4 times

...

csd

1 year, 6 months ago

C is correct Answer

upvoted 1 times

...

awofalus

1 year, 7 months ago

Selected Answer: E

Correct is E

upvoted 1 times

...

KalavathiP

1 year, 9 months ago

Selected Answer: E

E is correct

upvoted 1 times

...

d_b47

1 year, 9 months ago

Selected Answer: E

delta is default.

upvoted 1 times

...

ThomasReps

2 years ago

Selected Answer: E

It's E. As stated by others, the default format is delta If you try to run D, you get an error, that there are no "delta"-command for spark: "AttributeError: 'SparkSession' object has no attribute 'delta'". If you want to explicit tell it should be delta, then you need an ".option(format='delta')" insted.

upvoted 2 times

...

Load full discussion...