An ML engineer needs to use an Amazon EMR cluster to process large volumes of data in batches. Any data loss is unacceptable. Which instance purchasing option will meet these requirements MOST cost-effectively?
A. Run the primary node, core nodes, and task nodes on On-Demand Instances.
B. Run the primary node, core nodes, and task nodes on Spot Instances.
C. Run the primary node on an On-Demand Instance. Run the core nodes and task nodes on Spot Instances.
D. Run the primary node and core nodes on On-Demand Instances. Run the task nodes on Spot Instances.
If data loss were acceptable: Spot Instances could be used, but you can't change an instance purchasing option while a cluster is running. To change from On-Demand to Spot Instances or vice versa for the primary and core nodes, you must terminate the cluster and launch a new one. For task nodes, you can launch a new task instance group or instance fleet and remove the old one.
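Since data loss is unacceptable:
Primary node: if only Spot Instances are used, the cluster sometimes fails to start.
Core nodes: partial loss of HDFS data is possible on Spot Instances.
Task nodes: no data loss on Spot Instances, because they do not hold persistent data.
That makes D the cheapest option that still avoids data loss.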
https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-instances-guidelines.html#emr-plan-spot-instances
"The task nodes process data but do not hold persistent data in HDFS. If they terminate because the Spot price has risen above your maximum Spot price, no data is lost"
upvoted 2 times
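For anyone who wants to see what option D looks like as a cluster definition, here is a minimal sketch. It only builds the list that would go under `Instances={"InstanceGroups": ...}` in a boto3 `run_job_flow` call and does not contact AWS. The function name, instance type, and node counts are my own assumptions for illustration, not something from the question.

```python
from typing import Any, Dict, List


def build_instance_groups(
    core_count: int = 2,               # assumed count, not from the question
    task_count: int = 4,               # assumed count, not from the question
    instance_type: str = "m5.xlarge",  # assumed instance type
) -> List[Dict[str, Any]]:
    """Return EMR instance groups matching option D.

    Primary and core nodes run On-Demand because a Spot interruption there
    can stop the cluster or lose HDFS blocks. Task nodes hold no persistent
    HDFS data, so they can run on Spot.
    """
    return [
        {
            "Name": "primary",
            "InstanceRole": "MASTER",  # the EMR API still names the primary role MASTER
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": 1,
        },
        {
            "Name": "core",
            "InstanceRole": "CORE",
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": core_count,
        },
        {
            "Name": "task",
            "InstanceRole": "TASK",
            "Market": "SPOT",
            "InstanceType": instance_type,
            "InstanceCount": task_count,
        },
    ]


if __name__ == "__main__":
    for group in build_instance_groups():
        print(group["InstanceRole"], group["Market"], group["InstanceCount"])
```

Because the purchasing option of the primary and core groups can't be changed on a running cluster, it makes sense to fix those two at launch and treat only the task group as the part you swap out later.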
Saransundar, 3 months ago
Saransundar, 3 months ago
GiorgioGss, 3 months, 1 week ago