Exam Certified Data Engineer Associate topic 1 question 12 discussion

Actual exam question from Databricks' Certified Data Engineer Associate
Question #: 12
Topic #: 1

A data analyst has created a Delta table sales that is used by the entire data analysis team. They want help from the data engineering team to implement a series of tests to ensure the data is clean. However, the data engineering team uses Python for its tests rather than SQL.
Which of the following commands could the data engineering team use to access sales in PySpark?

  • A. SELECT * FROM sales
  • B. There is no way to share data between PySpark and SQL.
  • C. spark.sql("sales")
  • D. spark.delta.table("sales")
  • E. spark.table("sales")
Suggested Answer: E

Comments

Atnafu
Highly Voted 1 year, 4 months ago
E. The spark.table() function in PySpark gives access to tables registered in the catalog, including Delta tables. By passing the table name ("sales"), the data engineering team can read the Delta table as a DataFrame and run their tests on it in PySpark.
Option A, SELECT * FROM sales, is SQL syntax and cannot be used directly as a PySpark expression.
Option B, "There is no way to share data between PySpark and SQL," is incorrect; PySpark can work with the same data through both SQL and the DataFrame API.
Option C, spark.sql("sales"), calls a valid method for running SQL from PySpark, but "sales" by itself is not a valid SQL query.
Option D, spark.delta.table("sales"), is not an existing PySpark or Delta Lake API; a SparkSession has no delta attribute.
upvoted 10 times
...
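As a minimal sketch of the accepted answer (assuming a live SparkSession named spark and a hypothetical order_id column), the engineering team's Python tests could start like this:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # spark.table() looks the name up in the catalog and returns the Delta
    # table as a DataFrame, so the tests can stay in pure Python.
    sales_df = spark.table("sales")

    # Example data-quality check; the order_id column is hypothetical.
    assert sales_df.filter(sales_df["order_id"].isNull()).count() == 0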
9d4d68a
Most Recent 3 months ago
To access the Delta table sales in PySpark, the data engineering team can use E, spark.table("sales"). This loads the table into a PySpark DataFrame, which they can then use for their tests and data processing in Python. The command spark.delta.table("sales") does not exist in PySpark. To access a Delta table, use spark.table("sales"), or, if you need Delta-specific functionality, use Delta's own APIs or spark.read.format("delta").table("sales") to read the table into a DataFrame.
upvoted 1 times
...
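To illustrate the alternatives this comment mentions, here is a hedged sketch; the Delta-specific part assumes the delta-spark package (bundled on Databricks) is available:

    # Equivalent ways to load the registered table as a DataFrame:
    df1 = spark.table("sales")
    df2 = spark.read.table("sales")

    # Delta-specific functionality goes through the Delta Lake API instead:
    from delta.tables import DeltaTable

    dt = DeltaTable.forName(spark, "sales")
    dt.history().show()  # e.g., inspect the table's transaction history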
80370eb
3 months, 2 weeks ago
Selected Answer: E
E. spark.table("sales") This command allows the team to access the table using PySpark, enabling them to implement their tests in Python.
upvoted 1 times
...
souldiv
4 months, 1 week ago
spark.table(). E is the correct one.
upvoted 1 times
...
benni_ale
7 months ago
Selected Answer: E
E is correct
upvoted 1 times
...
benni_ale
7 months, 3 weeks ago
Selected Answer: E
E is correct
upvoted 2 times
...
Itmma
8 months, 1 week ago
Selected Answer: E
E is correct
upvoted 1 times
...
SerGrey
10 months, 3 weeks ago
Selected Answer: E
Correct answer is E
upvoted 1 times
...
Garyn
11 months ago
Selected Answer: E
E. spark.table("sales") The spark.table() function in PySpark allows access to a registered table within the SparkSession. In this case, "sales" is the name of the Delta table created by the data analyst, and the spark.table() function enables access to this table for performing data engineering tests using Python (PySpark).
upvoted 4 times
...
csd
11 months ago
C is the correct answer
upvoted 1 times
...
awofalus
1 year ago
Selected Answer: E
The correct answer is E
upvoted 1 times
...
KalavathiP
1 year, 2 months ago
Selected Answer: E
E is correct
upvoted 1 times
...
d_b47
1 year, 2 months ago
Selected Answer: E
Delta is the default format.
upvoted 1 times
...
ThomasReps
1 year, 5 months ago
Selected Answer: E
It's E. As stated by others, the default format is Delta. If you try to run D, you get an error because SparkSession has no "delta" attribute: "AttributeError: 'SparkSession' object has no attribute 'delta'". If you want to state explicitly that the format is Delta, use spark.read.format("delta") instead.
upvoted 2 times
...
Dwarakkrishna
1 year, 5 months ago
You access data in Delta tables by the table name or the table path, as shown in the following example:

    people_df = spark.read.table(table_name)
    display(people_df)
upvoted 1 times
...
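A short sketch of both access modes described above; the storage path is hypothetical:

    # By registered table name:
    people_df = spark.read.table("sales")
    people_df.show()

    # By storage path, stating the Delta format explicitly:
    people_df = spark.read.format("delta").load("/mnt/data/sales")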
prasioso
1 year, 6 months ago
I believe the answer is E: in Databricks the default table format is Delta, so spark.table should be enough. I have not seen a spark.delta.table function before.
upvoted 1 times
...
Tickxit
1 year, 6 months ago
Selected Answer: E
E: spark.table or spark.read.table
upvoted 2 times
...