Exam Certified Associate Developer for Apache Spark topic 1 question 93 discussion

Actual exam question from Databricks's Certified Associate Developer for Apache Spark

Question #: 93
Topic #: 1


The code block shown below should efficiently perform a broadcast join of DataFrame storesDF and the much larger DataFrame employeesDF using key column storeId.

Choose the response that correctly fills in the numbered blanks within the code block to complete this task.

Code block:

__1__.join(__2__(__3__), "storeId")

A. 1. employeesDF; 2. broadcast; 3. storesDF
B. 1. broadcast(employeesDF); 2. broadcast; 3. storesDF
C. 1. broadcast; 2. employeesDF; 3. storesDF
D. 1. storesDF; 2. broadcast; 3. employeesDF
E. 1. broadcast(storesDF); 2. broadcast; 3. employeesDF


Suggested Answer: A

by ryanmu at June 23, 2023, 3:40 p.m.

Comments


ryanmu

Highly Voted 1 year, 6 months ago

The correct answer is A. storesDF is smaller and should be broadcast.
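Why the smaller side is the one to broadcast can be illustrated with a toy model of a broadcast hash join written with plain Scala collections (no Spark). This is a conceptual sketch, not Spark's implementation: the `Store`/`Employee` case classes and the `broadcastJoin` helper are hypothetical stand-ins for `storesDF` and `employeesDF`. The small side is turned into a hash map once and a full copy goes to every partition of the large side, which is then scanned locally without being shuffled.

```scala
// Conceptual sketch only: plain Scala collections, not Spark internals.
// Hypothetical row types standing in for storesDF and employeesDF.
final case class Store(storeId: Int, city: String)
final case class Employee(employeeId: Int, storeId: Int, name: String)

object BroadcastHashJoinSketch {
  // Inner join: every partition of the large side probes one shared hash map
  // built from the small side (the "broadcast" copy each partition receives).
  def broadcastJoin(
      largePartitions: Seq[Seq[Employee]],
      small: Seq[Store]
  ): Seq[(Employee, Store)] = {
    // Build side: must fit in memory, since every partition gets a full copy.
    val broadcastMap: Map[Int, Seq[Store]] = small.groupBy(_.storeId)
    // Probe side: each partition is scanned in place; rows with no match are dropped.
    largePartitions.flatMap { partition =>
      partition.flatMap { e =>
        broadcastMap.getOrElse(e.storeId, Seq.empty).map(s => (e, s))
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val stores = Seq(Store(1, "Austin"), Store(2, "Boston"))
    val employeePartitions = Seq(
      Seq(Employee(10, 1, "Ana"), Employee(11, 2, "Bo")),
      Seq(Employee(12, 1, "Cy"), Employee(13, 3, "Di")) // storeId 3 has no store
    )
    broadcastJoin(employeePartitions, stores).foreach(println)
  }
}
```

Broadcasting `employeesDF` instead (what option D does) would mean copying the large table to every partition, which is exactly the cost a broadcast join is meant to avoid.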

upvoted 8 times

cookiemonster42

1 year, 5 months ago

Agreed!

upvoted 1 time


azure_bimonster

Most Recent 11 months, 1 week ago

Selected Answer: A

I would go with A, as storesDF is smaller and is the right one to broadcast.

upvoted 1 time


veli4ko

1 year, 3 months ago

A is the correct answer!

upvoted 2 times


thanab

1 year, 4 months ago

The correct answer is A: 1. employeesDF, 2. broadcast, 3. storesDF.

So the correct code would be:

```scala
import org.apache.spark.sql.functions.broadcast

employeesDF.join(broadcast(storesDF), "storeId")
```

This code performs a broadcast join of the smaller DataFrame `storesDF` with the much larger DataFrame `employeesDF` on the key column `storeId`. The `broadcast()` function marks a DataFrame to be broadcast when it is used in a join. The smaller DataFrame `storesDF` is sent in full to every executor, where it is joined with the partitions of the larger DataFrame `employeesDF`, so `employeesDF` does not need to be shuffled.
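The same join can be run end to end as a small local program. This is a sketch assuming Spark 3.x with `spark-sql` on the classpath and a local master; the `BroadcastJoinExample` object, the `joinStoresAndEmployees` helper, and the sample rows are hypothetical, while `broadcast`, `join`, `explain`, and `show` are standard Spark API. Calling `explain()` on the result should show a `BroadcastHashJoin` node in the physical plan, which is one way to confirm the hint was applied.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.broadcast

// Sketch assuming Spark 3.x; the object, helper, and sample rows are hypothetical.
object BroadcastJoinExample {
  // Builds tiny sample DataFrames and joins them as in answer A.
  def joinStoresAndEmployees(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // storesDF plays the small table; employeesDF stands in for the much larger one.
    val storesDF = Seq((1, "Austin"), (2, "Boston")).toDF("storeId", "city")
    val employeesDF = Seq((10, 1, "Ana"), (11, 2, "Bo"), (12, 1, "Cy"))
      .toDF("employeeId", "storeId", "name")

    // Large DataFrame on the left, small one wrapped in broadcast() on the right.
    employeesDF.join(broadcast(storesDF), "storeId")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("broadcast-join-sketch")
      .master("local[*]")
      .getOrCreate()
    val joined = joinStoresAndEmployees(spark)
    joined.explain() // the physical plan should contain a BroadcastHashJoin node
    joined.show()
    spark.stop()
  }
}
```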

upvoted 1 time


Ram459

1 year, 4 months ago

Selected Answer: A

The smaller dataset needs to be broadcast.

upvoted 2 times


cookiemonster42

1 year, 5 months ago

Selected Answer: A

The larger dataset has to be the initial DataFrame (the one `join` is called on), and the smaller one should be broadcast.

upvoted 2 times
