
Exam Certified Data Engineer Associate topic 1 question 12 discussion

Actual exam question from Databricks' Certified Data Engineer Associate
Question #: 12
Topic #: 1

A data analyst has created a Delta table sales that is used by the entire data analysis team. They want help from the data engineering team to implement a series of tests to ensure the data is clean. However, the data engineering team uses Python for its tests rather than SQL.
Which of the following commands could the data engineering team use to access sales in PySpark?

  • A. SELECT * FROM sales
  • B. There is no way to share data between PySpark and SQL.
  • C. spark.sql("sales")
  • D. spark.delta.table("sales")
  • E. spark.table("sales")
Suggested Answer: E 🗳️
Community vote distribution: E (96%), Other (4%)
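
For reference, a minimal PySpark sketch of the suggested answer. It assumes an active Spark environment (for example, a Databricks notebook) in which the Delta table "sales" is registered in the catalog:

    from pyspark.sql import SparkSession

    # Minimal sketch of option E; assumes the "sales" Delta table exists in the catalog.
    spark = SparkSession.builder.getOrCreate()

    sales_df = spark.table("sales")   # returns a PySpark DataFrame over the Delta table
    sales_df.printSchema()
    print(sales_df.count())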

Comments

Atnafu
Highly Voted 1 year, 9 months ago
E. The spark.table() function in PySpark allows you to access tables registered in the catalog, including Delta tables. By specifying the table name ("sales"), the data engineering team can read the Delta table and perform various operations on it using PySpark.
Option A, SELECT * FROM sales, is SQL syntax and cannot be used directly in PySpark.
Option B, "There is no way to share data between PySpark and SQL," is incorrect; PySpark can work with data through both SQL and the DataFrame/Dataset APIs.
Option C: spark.sql() is a valid way to execute SQL queries on registered tables from PySpark, but the argument "sales" on its own is not a valid SQL query.
Option D, spark.delta.table("sales"), is not a valid PySpark API; the SparkSession object has no delta attribute, so it cannot be used to access the table.
upvoted 17 times
...
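To illustrate the distinction this comment draws, a short sketch (again assuming an active SparkSession and a registered sales table): spark.sql() needs a complete SQL statement, while spark.table() takes just the table name.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Option E: pass just the table name.
    df_e = spark.table("sales")

    # spark.sql() (the mechanism behind option C) needs a full SQL statement,
    # not the bare string "sales".
    df_c = spark.sql("SELECT * FROM sales")

    # Both return a DataFrame over the same Delta table.
    # spark.delta.table("sales") (option D) raises:
    # AttributeError: 'SparkSession' object has no attribute 'delta'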
SoumyaHK
Most Recent 3 days, 10 hours ago
Selected Answer: E
The answer is E. I do not see any option D in the list.
upvoted 1 times
...
dhohigh
3 months ago
Selected Answer: E
This answer is pure Python and is a simple solution to the question.
upvoted 1 times
...
9d4d68a
8 months ago
To access the Delta table sales using PySpark, the data engineering team can use the following command: E. spark.table("sales"). This command loads the table into a PySpark DataFrame, which they can then use for their tests and data processing in Python.
The command spark.delta.table("table name") does not exist in PySpark. To access a Delta table, use spark.table("table name"), or, if you need Delta-specific functionality, use Delta's APIs or spark.read.format("delta").table("table name") to read the table into a DataFrame.
upvoted 1 times
...
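As a hedged sketch of the alternatives mentioned above: spark.read.table works in stock PySpark, while the Delta-specific DeltaTable API assumes the delta-spark package (or a Databricks runtime) is available.

    from pyspark.sql import SparkSession
    from delta.tables import DeltaTable   # assumes delta-spark is installed

    spark = SparkSession.builder.getOrCreate()

    # DataFrameReader route, equivalent to spark.table("sales").
    sales_df = spark.read.table("sales")

    # Delta-specific API, useful for table operations such as history(), update(), or merge().
    sales_delta = DeltaTable.forName(spark, "sales")
    sales_delta.history().show()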
80370eb
8 months, 3 weeks ago
Selected Answer: E
E. spark.table("sales"). This command allows the team to access the table using PySpark, enabling them to implement their tests in Python.
upvoted 1 times
...
souldiv
9 months, 1 week ago
spark.table(). E is the correct one.
upvoted 1 times
...
benni_ale
1 year ago
Selected Answer: E
E is correct
upvoted 1 times
...
benni_ale
1 year ago
Selected Answer: E
E is correct
upvoted 2 times
...
Itmma
1 year, 1 month ago
Selected Answer: E
E is correct
upvoted 1 times
...
SerGrey
1 year, 3 months ago
Selected Answer: E
Correct answer is E
upvoted 1 times
...
Garyn
1 year, 3 months ago
Selected Answer: E
E. spark.table("sales")
The spark.table() function in PySpark allows access to a registered table within the SparkSession. In this case, "sales" is the name of the Delta table created by the data analyst, and the spark.table() function enables access to this table for performing data engineering tests using Python (PySpark).
upvoted 4 times
...
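Tying this back to the scenario in the question (data-cleanliness tests written in Python), a sketch of what such a test could look like; the columns order_id and price are hypothetical and not part of the question.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    sales_df = spark.table("sales")

    # Hypothetical data-quality checks on assumed columns.
    null_ids = sales_df.filter(F.col("order_id").isNull()).count()
    negative_prices = sales_df.filter(F.col("price") < 0).count()

    assert null_ids == 0, f"{null_ids} rows have a null order_id"
    assert negative_prices == 0, f"{negative_prices} rows have a negative price"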
csd
1 year, 4 months ago
C is the correct answer
upvoted 1 times
...
awofalus
1 year, 5 months ago
Selected Answer: E
The correct answer is E
upvoted 1 times
...
KalavathiP
1 year, 7 months ago
Selected Answer: E
E is correct
upvoted 1 times
...
d_b47
1 year, 7 months ago
Selected Answer: E
Delta is the default format.
upvoted 1 times
...
ThomasReps
1 year, 10 months ago
Selected Answer: E
It's E. As stated by others, the default format is Delta. If you try to run D, you get an error because Spark has no "delta" attribute on the session: "AttributeError: 'SparkSession' object has no attribute 'delta'". If you want to state explicitly that the format should be Delta, you need spark.read.format("delta") instead.
upvoted 2 times
...
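A small sketch of the explicit-format route the comment above is reaching for; the path /delta/sales is hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Reading by registered table name: the table's format (Delta) is resolved from the catalog.
    df_by_name = spark.table("sales")

    # Being explicit about the Delta format applies to path-based reads; the path is hypothetical.
    df_by_path = spark.read.format("delta").load("/delta/sales")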
Dwarakkrishna
1 year, 10 months ago
You access data in Delta tables by the table name or the table path, as shown in the following examples:
people_df = spark.read.table(table_name)
display(people_df)
upvoted 1 times
...