Exam Certified Data Engineer Professional topic 1 question 93 discussion

Actual exam question from Databricks's Certified Data Engineer Professional

Question #: 93
Topic #: 1

You are performing a join operation to combine values from a static userLookup table with a streaming DataFrame streamingDF.

Which code block attempts to perform an invalid stream-static join?

A. userLookup.join(streamingDF, ["userid"], how="inner")
B. streamingDF.join(userLookup, ["user_id"], how="outer")
C. streamingDF.join(userLookup, ["user_id"], how="left")
D. streamingDF.join(userLookup, ["userid"], how="inner")
E. userLookup.join(streamingDF, ["user_id"], how="right")

Suggested Answer: B

by Enduresoul at Nov. 26, 2023, 6:50 p.m.

Comments

Enduresoul

Highly Voted 1 year, 1 month ago

Selected Answer: B

Answer B is correct, see the join support matrix: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#support-matrix-for-joins-in-streaming-queries According to that matrix, a Stream-Static outer join (how="outer", i.e. a full outer join) is not supported. Answer E is wrong, because a Static-Stream right outer join is supported.
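To make the lookup concrete, here is a small pure-Python sketch that transcribes the stream-static and static-stream rows of that support matrix into a dictionary and checks each answer option against it. It is not Spark code: the names SUPPORT, ALIASES, STREAMING, OPTIONS and is_supported are hypothetical, the mapping of "how" strings ("outer" to full outer, "left" to left outer, "right" to right outer) follows PySpark's join aliases, and the join key column is ignored because only the input sides and the join type decide validity here.

```python
# Hand-transcribed from the "Support matrix for joins in streaming queries"
# in the Spark Structured Streaming guide (stream-static rows only).
# Key: (left_is_streaming, right_is_streaming, join_type) -> supported?
SUPPORT = {
    # Stream on the left, static on the right
    (True, False, "inner"): True,
    (True, False, "left_outer"): True,
    (True, False, "right_outer"): False,
    (True, False, "full_outer"): False,
    (True, False, "left_semi"): True,
    # Static on the left, stream on the right
    (False, True, "inner"): True,
    (False, True, "left_outer"): False,
    (False, True, "right_outer"): True,
    (False, True, "full_outer"): False,
    (False, True, "left_semi"): False,
}

# PySpark accepts several spellings for "how"; normalize the ones we need.
ALIASES = {
    "inner": "inner",
    "left": "left_outer", "leftouter": "left_outer", "left_outer": "left_outer",
    "right": "right_outer", "rightouter": "right_outer", "right_outer": "right_outer",
    "outer": "full_outer", "full": "full_outer", "fullouter": "full_outer",
    "full_outer": "full_outer",
    "semi": "left_semi", "leftsemi": "left_semi", "left_semi": "left_semi",
}

STREAMING = {"streamingDF"}  # names of the streaming inputs in the question

# Answer options as (left DataFrame, right DataFrame, how)
OPTIONS = {
    "A": ("userLookup", "streamingDF", "inner"),
    "B": ("streamingDF", "userLookup", "outer"),
    "C": ("streamingDF", "userLookup", "left"),
    "D": ("streamingDF", "userLookup", "inner"),
    "E": ("userLookup", "streamingDF", "right"),
}


def is_supported(left: str, right: str, how: str) -> bool:
    """Look up a stream-static join in the transcribed support matrix."""
    key = (left in STREAMING, right in STREAMING, ALIASES[how.lower()])
    return SUPPORT[key]


invalid = [name for name, (l, r, how) in OPTIONS.items() if not is_supported(l, r, how)]
print(invalid)  # ['B']
```

The matrix is asymmetric: a right outer join is rejected with the stream on the left but accepted with the stream on the right, which is why option E is valid, while a full outer join is rejected in both orientations.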

upvoted 11 times

KadELbied

Most Recent 2 months ago

Selected Answer: B

Surely B.

upvoted 1 time

_lene_

5 months, 3 weeks ago

Selected Answer: B

According to the support matrix for joins in streaming queries.

upvoted 1 time

AlejandroU

6 months, 1 week ago

Selected Answer: B

Answer B. We can directly discard options C and D: in both, the streaming DataFrame (streamingDF) is the left table and the join type is inner or left outer, which are supported stream-static combinations. Thus, the code block that is invalid because of an unsupported join type is B.

upvoted 1 time

imatheushenrique

7 months ago

B. A full outer join must return every record of the static DataFrame, whether or not it matches the stream DataFrame. If a static record has no match in the stream DataFrame, the system cannot return it padded with nulls, because new data keeps arriving on the stream DataFrame and we cannot guarantee that a matching record will not arrive later. That is why a full outer join between a stream and a static DataFrame is not supported.
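The argument can be illustrated with a toy pure-Python model of micro-batch processing. This is a simplified sketch of the reasoning above, not how Spark executes joins; the sample data and the function names left_outer_batch and naive_full_outer_batch are hypothetical. Each output row is a (stream user_id, static name) pair, with None standing in for a null.

```python
# Hypothetical static lookup table: user_id -> name
user_lookup = {1: "alice", 2: "bob", 3: "carol"}

# Hypothetical stream of user_id events, split into two micro-batches
micro_batches = [[1, 1], [3]]


def left_outer_batch(batch, static):
    """Stream-left / static-right left outer join for one micro-batch.

    Each streaming row is compared against the whole static table, so its
    result (a match, or None for no match) is final as soon as it is computed.
    """
    return [(uid, static.get(uid)) for uid in batch]


def naive_full_outer_batch(batch, static):
    """A per-batch full outer join (NOT something Spark allows).

    In addition to the left outer rows, it emits every static row that had no
    match in *this* batch, padded with None on the stream side.
    """
    rows = left_outer_batch(batch, static)
    seen = set(batch)
    rows += [(None, name) for uid, name in static.items() if uid not in seen]
    return rows


emitted = [naive_full_outer_batch(b, user_lookup) for b in micro_batches]
for i, rows in enumerate(emitted):
    print(i, rows)
# 0 [(1, 'alice'), (1, 'alice'), (None, 'bob'), (None, 'carol')]
# 1 [(3, 'carol'), (None, 'alice'), (None, 'bob')]
```

Batch 0 already reported carol as unmatched, and batch 1 reports alice as unmatched, yet both users do have matching events in some batch. Under the assumption of an append-only output, those null-padded rows can never be taken back, whereas the left outer rows never need revising.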

upvoted 2 times

hal2401me

9 months, 3 weeks ago

Selected Answer: E

In my exam today, options B, C and D were removed. I chose E, because I recall that stream-static right joins are less supported.

upvoted 4 times

Curious76

10 months, 1 week ago

Selected Answer: B

B is correct.

upvoted 1 time

vctrhugo

11 months ago

Selected Answer: B

Specifically, full outer joins (like right outer joins) are not supported with a streaming DataFrame on the left and a static DataFrame on the right. This is because it is not possible to guarantee that a static row has no match in the streaming DataFrame: a future micro-batch may still bring a matching row.

upvoted 1 time
