Exam Certified Associate Developer for Apache Spark topic 1 question 14 discussion

Of the following situations, in which will it be most advantageous to store DataFrame df at the MEMORY_AND_DISK storage level rather than the MEMORY_ONLY storage level?

  • A. When all of the computed data in DataFrame df can fit into memory.
  • B. When the memory is full and it’s faster to recompute all the data in DataFrame df rather than read it from disk.
  • C. When it’s faster to recompute all the data in DataFrame df that cannot fit into memory based on its logical plan rather than read it from disk.
  • D. When it’s faster to read all the computed data in DataFrame df that cannot fit into memory from disk rather than recompute it based on its logical plan.
  • E. The storage level MEMORY_ONLY will always be more advantageous because it’s faster to read data from memory than it is to read data from disk.
Suggested Answer: D

Comments

sousouka
Highly Voted 1 year, 8 months ago
D. When it’s faster to read all the computed data in DataFrame df that cannot fit into memory from disk rather than recompute it based on its logical plan.
upvoted 9 times
...
ZSun
Highly Voted 1 year, 5 months ago
The other explanations are either wrong or misleading. To understand the question, you need to understand the difference between MEMORY_ONLY and MEMORY_AND_DISK.
1. MEMORY_AND_DISK is the default storage level when you cache or persist a DataFrame. If the data is larger than the available memory, the partitions that do not fit are stored on disk. The next time the data is read, Spark reads what it can from memory and reads the rest from disk.
2. MEMORY_ONLY means that if the data is larger than the available memory, the partitions that do not fit are not stored at all. The next time the data is read, Spark reads what it can from memory and recomputes the partitions that could not be stored.
PS. Mr. 4be8126 is wrong about an error being raised when memory runs out.
So the difference between MEMORY_ONLY and MEMORY_AND_DISK lies in how they handle the data that does not fit in memory. That is option D: if reading that data from disk is faster than recomputing it, choose MEMORY_AND_DISK.
upvoted 7 times
...
newusername
Most Recent 1 year ago
Selected Answer: D
D is correct
upvoted 1 times
...
astone42
1 year, 3 months ago
Selected Answer: D
D is correct
upvoted 1 times
...
singh100
1 year, 3 months ago
D. It is faster to read the computed data from disk instead of recomputing it based on its logical plan when the recomputation is costly and time-consuming.
upvoted 1 times
...
SonicBoom10C9
1 year, 6 months ago
Selected Answer: D
If it's faster to read from memory and the data fits, there is no reason to use MEMORY_AND_DISK; MEMORY_ONLY is sufficient. Likewise, if it's faster to recompute than to read from disk, you would recompute. The only remaining case is when the data is too big to fit in memory and too expensive to recompute, so reading from disk (or rather, caching from disk into memory on the fly) is faster.
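The case analysis above can be written down as a toy cost model in plain Python. This is not Spark code and does not measure anything: the class, the function names, and the per-partition costs are all made up for illustration, and the only assumption carried over from the thread is that overflow partitions are recomputed under MEMORY_ONLY and read from disk under MEMORY_AND_DISK.

```python
from dataclasses import dataclass


@dataclass
class CachePlan:
    """Hypothetical per-partition costs, in arbitrary time units."""
    total_partitions: int
    memory_capacity: int      # how many partitions fit in memory
    memory_read_cost: float
    disk_read_cost: float
    recompute_cost: float


def read_cost(plan: CachePlan, storage_level: str) -> float:
    """Cost of one full read of the cached data under a storage level."""
    in_memory = min(plan.total_partitions, plan.memory_capacity)
    overflow = plan.total_partitions - in_memory
    if storage_level == "MEMORY_ONLY":
        overflow_cost = plan.recompute_cost   # overflow is recomputed
    elif storage_level == "MEMORY_AND_DISK":
        overflow_cost = plan.disk_read_cost   # overflow is read from disk
    else:
        raise ValueError(f"unsupported storage level: {storage_level}")
    return in_memory * plan.memory_read_cost + overflow * overflow_cost


def cheaper_level(plan: CachePlan) -> str:
    """Pick the cheaper level; ties go to MEMORY_ONLY."""
    memory_only = read_cost(plan, "MEMORY_ONLY")
    memory_and_disk = read_cost(plan, "MEMORY_AND_DISK")
    return "MEMORY_AND_DISK" if memory_and_disk < memory_only else "MEMORY_ONLY"


# Data overflows memory and recomputation is expensive: option D's situation.
expensive = CachePlan(10, 6, 1.0, 5.0, 20.0)
# Everything fits in memory: the two levels cost the same.
fits = CachePlan(10, 10, 1.0, 5.0, 20.0)
# Data overflows memory but recomputation is cheaper than a disk read.
cheap = CachePlan(10, 6, 1.0, 5.0, 2.0)

for name, plan in [("expensive", expensive), ("fits", fits), ("cheap", cheap)]:
    print(name, cheaper_level(plan))
```

MEMORY_AND_DISK only comes out ahead in the first scenario, where some partitions do not fit and reading them back from disk is cheaper than recomputing them.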
upvoted 1 times
...
4be8126
1 year, 6 months ago
Selected Answer: D
The most advantageous situation to store a DataFrame at the MEMORY_AND_DISK storage level instead of the MEMORY_ONLY storage level is option D - when it’s faster to read all the computed data in DataFrame df that cannot fit into memory from disk rather than recompute it based on its logical plan. This is because the MEMORY_ONLY storage level only stores data in memory, which can result in an out-of-memory error if the data exceeds the available memory. On the other hand, the MEMORY_AND_DISK storage level will spill data to disk if there is not enough memory available, allowing more data to be processed without errors. In situations where the computed data can fit entirely into memory, it is best to use the MEMORY_ONLY storage level as it will be faster than reading from disk. However, when there is not enough memory to store all the computed data, it may be necessary to use the MEMORY_AND_DISK storage level.
upvoted 1 times
...
sly75
1 year, 6 months ago
Yes, but what about the link with the question? I would say B too :)
upvoted 1 times
...
Indiee
1 year, 7 months ago
Answer is D. This is the whole idea behind caching
upvoted 2 times
...