Which indicators would you look for in the Spark UI’s Storage tab to signal that a cached table is not performing optimally? Assume you are using Spark’s MEMORY_ONLY storage level.
A. Size on Disk is < Size in Memory
B. The RDD Block Name includes the “*” annotation signaling a failure to cache
C. Size on Disk is > 0
D. The number of Cached Partitions > the number of Spark Partitions
E. On Heap Memory Usage is within 75% of Off Heap Memory Usage
Correct Answer: C. Size on Disk is > 0
When using Spark's MEMORY_ONLY storage level, the ideal scenario is that the data is fully cached in memory, so Size on Disk in the Storage tab should be 0. If Size on Disk is greater than 0, the table is not being served entirely from memory, which degrades performance because reading from disk is slower than reading from memory.
Under MEMORY_ONLY, Spark does not write cached blocks to disk; partitions that do not fit in memory are dropped and recomputed on demand, so Size on Disk should be 0.
Under MEMORY_ONLY, off-heap memory is not used, which rules out option E.
In the Storage tab, an asterisk (*) next to an RDD block name (e.g., rdd_42_3*) indicates that the partition could not be cached due to memory constraints.
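For a quick check outside the UI, here is a minimal PySpark sketch. The app name and row count are illustrative, and the storage lookup goes through Spark's internal `_jsc` handle to the developer API getRDDStorageInfo, which can change between versions:

```python
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-check").getOrCreate()

# Cache a DataFrame at MEMORY_ONLY. Caching is lazy, so run an action
# to actually materialize the blocks shown in the Storage tab.
df = spark.range(10_000_000)
df.persist(StorageLevel.MEMORY_ONLY)
df.count()

# The effective storage level should report useDisk=False for MEMORY_ONLY.
print(df.storageLevel)

# Per-RDD storage stats mirroring the Storage tab (internal/developer API).
# A healthy MEMORY_ONLY cache shows diskSize == 0 and numCachedPartitions
# equal to numPartitions; any shortfall means partitions were dropped
# and will be recomputed when accessed.
for info in spark.sparkContext._jsc.sc().getRDDStorageInfo():
    print(info.name(), info.memSize(), info.diskSize(),
          info.numCachedPartitions(), "/", info.numPartitions())
```

Note that Cached Partitions can never exceed total partitions, which is why option D is a distractor; the signal to watch for is cached partitions falling short of the total, or any nonzero Size on Disk.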