Exam Certified Associate Developer for Apache Spark topic 1 question 93 discussion

The code block shown below should efficiently perform a broadcast join of DataFrame storesDF and the much larger DataFrame employeesDF using key column storeId.

Choose the response that correctly fills in the numbered blanks within the code block to complete this task.

Code block:

__1__.join(__2__(__3__), "storeId")

  • A. 1. employeesDF
    2. broadcast
    3. storesDF
  • B. 1. broadcast(employeesDF)
    2. broadcast
    3. storesDF
  • C. 1. broadcast
    2. employeesDF
    3. storesDF
  • D. 1. storesDF
    2. broadcast
    3. employeesDF
  • E. 1. broadcast(storesDF)
    2. broadcast
    3. employeesDF
Suggested Answer: A

Comments

ryanmu
Highly Voted 1 year, 5 months ago
Correct answer is A. storesDF is smaller and should be broadcasted.
upvoted 8 times
cookiemonster42
1 year, 3 months ago
Agreed!
upvoted 1 times
azure_bimonster
Most Recent 9 months, 3 weeks ago
Selected Answer: A
I would go with A as storesDF is smaller and right one to broadcast
upvoted 1 times
veli4ko
1 year, 1 month ago
A is the correct answer!
upvoted 2 times
thanab
1 year, 2 months ago
The correct answer is A: 1. employeesDF, 2. broadcast, 3. storesDF.

So the correct code would be:

```scala
import org.apache.spark.sql.functions.broadcast

employeesDF.join(broadcast(storesDF), "storeId")
```

This code performs a broadcast join of the smaller DataFrame `storesDF` with the much larger DataFrame `employeesDF` on the key column `storeId`. The `broadcast()` function marks a DataFrame to be broadcast when it is used in a join. A copy of `storesDF` is sent to every executor, where it is joined with the local partitions of `employeesDF`, so the large DataFrame does not have to be shuffled across the cluster.
upvoted 1 times
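For anyone who wants to try answer A locally, here is a minimal sketch rather than anything official. It assumes Spark is on the classpath and runs in local mode. The sample rows, the `region` and `employeeId` columns, and the `BroadcastJoinSketch` / `broadcastJoin` names are hypothetical and only there for illustration. Automatic broadcasting is switched off so that any broadcast in the plan comes from the explicit `broadcast()` hint.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.broadcast

object BroadcastJoinSketch {

  // Answer A: the large DataFrame is the receiver of join(),
  // and the small DataFrame is wrapped in broadcast().
  def broadcastJoin(employeesDF: DataFrame, storesDF: DataFrame): DataFrame =
    employeesDF.join(broadcast(storesDF), "storeId")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("broadcast-join-sketch")
      // Disable size-based automatic broadcasting so only the hint applies.
      .config("spark.sql.autoBroadcastJoinThreshold", "-1")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: a small lookup table and a "large" fact table.
    val storesDF = Seq((1, "North"), (2, "South")).toDF("storeId", "region")
    val employeesDF = Seq((100, 1), (101, 1), (102, 2), (103, 3))
      .toDF("employeeId", "storeId")

    val joined = broadcastJoin(employeesDF, storesDF)

    // Print the physical plan; look for a broadcast join operator in it.
    joined.explain()
    joined.show()

    spark.stop()
  }
}
```

The join is an inner join by default, so the employee whose `storeId` has no match in `storesDF` is dropped from the result.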
Ram459
1 year, 3 months ago
Selected Answer: A
smaller dataset needs to be broadcasted
upvoted 2 times
cookiemonster42
1 year, 3 months ago
Selected Answer: A
the larger dataset has to be the initial and the smaller one should be broadcasted
upvoted 2 times
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted