Welcome to ExamTopics


Exam Certified Data Engineer Associate topic 1 question 60 discussion

Actual exam question from Databricks' Certified Data Engineer Associate
Question #: 60
Topic #: 1
[All Certified Data Engineer Associate Questions]

A data analyst has developed a query that runs against a Delta table. They want help from the data engineering team to implement a series of tests to ensure the data returned by the query is clean. However, the data engineering team uses Python for its tests rather than SQL.

Which of the following operations could the data engineering team use to run the query and operate with the results in PySpark?

  • A. SELECT * FROM sales
  • B. spark.delta.table
  • C. spark.sql
  • D. There is no way to share data between PySpark and SQL.
  • E. spark.table
Suggested Answer: C

Comments

kishanu
Highly Voted 1 year, 1 month ago
Selected Answer: C
spark.sql() should be used to execute a SQL query with PySpark; spark.table() can only be used to load a table, not run a query.
upvoted 8 times
benni_ale
Most Recent 7 months ago
Selected Answer: E
I am not sure whether it is C or E. I see the majority went for E, but you can still query your data with spark.table using purely PySpark syntax. I don't see any part of the question specifying you HAVE to use SQL syntax.
upvoted 1 times
CommanderBigMac
2 months ago
I think the key is the first part of the question: "has developed a query". From a quick Google search, it looks like queries are SQL and scripts are Python. So when it says a query was developed, it means it's coded in SQL.
upvoted 2 times
meow_akk
1 year, 1 month ago
C is correct. E.g.:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.sql("SELECT * FROM sales")
print(df.count())
upvoted 3 times
Community vote distribution: A (35%), C (25%), B (20%), Other
