Exam AWS Certified Solutions Architect - Professional SAP-C02 topic 1 question 263 discussion

A solutions architect needs to review the design of an Amazon EMR cluster that is using the EMR File System (EMRFS). The cluster performs tasks that are critical to business needs. The cluster is running Amazon EC2 On-Demand Instances at all times for all task, primary, and core nodes. The EMR tasks run each morning, starting at 1:00 AM, and take 6 hours to finish running. The amount of time to complete the processing is not a priority because the data is not referenced until late in the day.

The solutions architect must review the architecture and suggest a solution to minimize the compute costs.

Which solution should the solutions architect recommend to meet these requirements?

  • A. Launch all task, primary, and core nodes on Spot Instances in an instance fleet. Terminate the cluster, including all instances, when the processing is completed.
  • B. Launch the primary and core nodes on On-Demand Instances. Launch the task nodes on Spot Instances in an instance fleet. Terminate the cluster, including all instances, when the processing is completed. Purchase Compute Savings Plans to cover the On-Demand Instance usage.
  • C. Continue to launch all nodes on On-Demand Instances. Terminate the cluster, including all instances, when the processing is completed. Purchase Compute Savings Plans to cover the On-Demand Instance usage.
  • D. Launch the primary and core nodes on On-Demand Instances. Launch the task nodes on Spot Instances in an instance fleet. Terminate only the task node instances when the processing is completed. Purchase Compute Savings Plans to cover the On-Demand Instance usage.
Suggested Answer: D

Comments

aviathor
Highly Voted 1 year, 8 months ago
Selected Answer: D
The problem statement says: "The EMR tasks run each morning, starting at 1:00 AM. and take 6 hours to finish running. The amount of time to complete the processing is not a priority because *the data is not referenced until late in the day.*" So later in the day, clients will be using the cluster to read data. Therefore my understanding is that core and primary nodes need to be available, but the task nodes can be terminated once the tasks have finished their daily run.
upvoted 27 times
sashenka
6 months ago
One does not need the cluster to read the data. EMRFS enables storing persistent data in Amazon S3. This means data remains available even after an EMR cluster is terminated, allowing for cost savings and data reuse across multiple clusters.
upvoted 2 times
...
...
javitech83
Highly Voted 1 year, 9 months ago
Selected Answer: D
The correct answer is D. In B, it makes no sense to terminate the primary instance if we have already purchased a Savings Plan.
upvoted 15 times
sashenka
6 months ago
One chooses the usage commitment when purchasing a Compute Savings Plan. So, one can base it on the fact that the on-demand nodes will only need to run for a min amount of time. In this case for 6 hrs a day.
upvoted 1 times
...
...
LeoSantos121212121212121
Most Recent 1 month ago
Selected Answer: B
I go with B, once all the data is processed, there is no point in keeping the cluster running.
upvoted 1 times
...
youonebe
5 months ago
Selected Answer: B
B - Correct answer.
D - Keeping the cluster running with only task nodes terminated wastes resources. Unnecessary costs are incurred by maintaining primary and core nodes when they are not needed.
upvoted 2 times
...
sashenka
6 months ago
Selected Answer: B
EMRFS enables storing persistent data in Amazon S3. This means data remains available even after an EMR cluster is terminated, allowing for cost savings and data reuse across multiple clusters.
upvoted 3 times
sashenka
5 months, 2 weeks ago
The question specifically states that the EMR cluster is built with EMRFS rather than the default HDFS, which is not persistent. Using EMRFS with transient clusters is not only possible but is considered the recommended architecture pattern for EMR deployments.
upvoted 1 times
...
...
pk0619
6 months, 1 week ago
Selected Answer: B
Just B
upvoted 2 times
...
that1guy
6 months, 1 week ago
Selected Answer: B
B, with EMRFS we can decouple storage from the nodes and write directly to S3, no need to keep all the nodes running. If you were to use HDFS you would have to keep the core nodes running as they store the data for HDFS.
upvoted 1 times
...
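For readers who want to see what the transient setup described in option B might look like, here is a minimal sketch that builds the keyword arguments for boto3's EMR `run_job_flow` call. The cluster name, release label, instance types, capacities, and S3 URIs are all assumptions for illustration, and networking settings such as the subnet are omitted. The relevant parts are the On-Demand primary and core fleets, the Spot task fleet, and `KeepJobFlowAliveWhenNoSteps` set to `False` so the cluster terminates after its steps finish.

```python
def build_transient_cluster_request(log_uri, script_uri):
    """Return keyword arguments for boto3's EMR run_job_flow call.

    All names, instance types, and capacities are hypothetical placeholders.
    """
    return {
        "Name": "nightly-batch",  # hypothetical cluster name
        "ReleaseLabel": "emr-6.15.0",  # assumed release; pick your own
        "LogUri": log_uri,
        "Applications": [{"Name": "Spark"}],
        "ServiceRole": "EMR_DefaultRole",
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "Instances": {
            # Terminate the whole cluster once the last step has completed.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
            "InstanceFleets": [
                {
                    "Name": "primary",
                    "InstanceFleetType": "MASTER",
                    "TargetOnDemandCapacity": 1,
                    "InstanceTypeConfigs": [{"InstanceType": "m5.xlarge"}],
                },
                {
                    "Name": "core",
                    "InstanceFleetType": "CORE",
                    "TargetOnDemandCapacity": 2,
                    "InstanceTypeConfigs": [{"InstanceType": "m5.xlarge"}],
                },
                {
                    "Name": "task",
                    "InstanceFleetType": "TASK",
                    "TargetSpotCapacity": 4,
                    # Several instance types give the fleet more Spot pools to draw from.
                    "InstanceTypeConfigs": [
                        {"InstanceType": "m5.xlarge", "WeightedCapacity": 1},
                        {"InstanceType": "m5a.xlarge", "WeightedCapacity": 1},
                    ],
                    "LaunchSpecifications": {
                        "SpotSpecification": {
                            "TimeoutDurationMinutes": 20,
                            "TimeoutAction": "SWITCH_TO_ON_DEMAND",
                        }
                    },
                },
            ],
        },
        "Steps": [
            {
                "Name": "process-daily-data",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", script_uri],
                },
            }
        ],
    }
```

With AWS credentials configured, the request could be submitted with `boto3.client("emr").run_job_flow(**build_transient_cluster_request(log_uri, script_uri))`, typically from a scheduler that fires at 1:00 AM. Building the request as a plain dictionary keeps the sketch inspectable without touching an AWS account.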
teo2157
11 months, 3 weeks ago
Selected Answer: A
The key point here is "Amazon EMR cluster that is using the EMR File System (EMRFS)". EMRFS uses S3 as persistent storage, so once the cluster has finished processing the data, the data is ready for the users, the cluster is no longer needed, and it can be terminated without any issue.
upvoted 1 times
helloworldabc
8 months, 1 week ago
just D
upvoted 1 times
...
teo2157
10 months, 2 weeks ago
Changing my mind to B, as the process is business critical and you shouldn't use Spot Instances for any critical processing. The cluster can still be terminated, because the data is in S3 once it has been processed.
upvoted 1 times
...
...
seetpt
11 months, 3 weeks ago
Selected Answer: D
D for me
upvoted 1 times
...
43c89f4
12 months ago
B - we should not terminate the cluster. D - once the tasks are done, we can terminate the task nodes. So my answer is D.
upvoted 1 times
...
TonytheTiger
1 year ago
Selected Answer: D
Option D: How To / Use Case https://aws.amazon.com/blogs/big-data/strategies-for-reducing-your-amazon-emr-costs/
upvoted 2 times
...
Keval12345
1 year ago
Selected Answer: D
Terminating all instances makes sense as these are not frequent jobs. They run once a day. https://www.cloudforecast.io/blog/aws-emr-cost-optimization-guide/
upvoted 2 times
...
pangchn
1 year ago
Selected Answer: D
D. For those who chose B: a Compute Savings Plan is an hourly commitment intended for a consistent usage pattern. You will be charged even if you shut down the whole stack.
upvoted 2 times
altonh
1 month, 4 weeks ago
But your hourly commitment will be lower.
upvoted 1 times
...
...
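The exchange above turns on how an hourly commitment interacts with a cluster that runs only part of the day. A rough sketch of that arithmetic follows. It assumes a simplified model in which the commitment is billed for every hour of the day, covers usage only in the hours when the cluster is actually running, and does not carry unused commitment over to later hours. The rates (10.0 and 7.0 dollars per hour) are hypothetical illustration values, not AWS prices, and the model ignores any other workloads in the account that might absorb the commitment.

```python
HOURS_PER_DAY = 24


def daily_cost(on_demand_rate, plan_rate, hours_running, commitment):
    """Estimate one day's compute bill under a simplified Savings Plan model.

    on_demand_rate: hypothetical cluster cost per running hour at On-Demand prices.
    plan_rate:      hypothetical cost per running hour for the same usage under the plan.
    hours_running:  hours per day the On-Demand nodes are up.
    commitment:     hourly commitment in dollars, billed every hour of the day.
    """
    # Share of each running hour's usage that the commitment can cover.
    covered_fraction = min(commitment, plan_rate) / plan_rate
    # Whatever the commitment does not cover is billed at On-Demand prices.
    uncovered = hours_running * (1.0 - covered_fraction) * on_demand_rate
    # The commitment itself is charged for all 24 hours, used or not.
    return HOURS_PER_DAY * commitment + uncovered


if __name__ == "__main__":
    scenarios = {
        "always on, no plan": daily_cost(10.0, 7.0, 24, 0.0),
        "always on, full commitment": daily_cost(10.0, 7.0, 24, 7.0),
        "6 h/day, no plan": daily_cost(10.0, 7.0, 6, 0.0),
        "6 h/day, full commitment": daily_cost(10.0, 7.0, 6, 7.0),
    }
    for name, cost in scenarios.items():
        print(f"{name}: {cost:.2f} per day")
```

Under these assumed numbers, the outcome depends on how large a commitment is purchased and on whether other compute usage exists to consume it during the hours when the cluster is down, which is the point both commenters are circling.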
yog927
1 year, 1 month ago
Selected Answer: B
We can terminate the cluster and then read the results from S3. See the EMR FAQ below:
Q: How does Amazon EMR use Amazon EC2 and Amazon S3?
"You can upload your input data and a data processing application into Amazon S3. Amazon EMR then launches a number of Amazon EC2 instances that you specified. The service begins the cluster execution while pulling the input data from Amazon S3 using S3 URI scheme into the launched Amazon EC2 instances. Once the cluster is finished, Amazon EMR transfers the output data to Amazon S3, where you can then retrieve it or use as input in another cluster."
https://aws.amazon.com/emr/faqs/
upvoted 5 times
...
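To illustrate the point that output written to S3 can be retrieved with no cluster running, here is a small sketch that lists the objects under an output prefix. The function takes an S3 client as a parameter, so it works with a real `boto3.client("s3")` or a stand-in. The bucket and prefix names in the usage note are hypothetical.

```python
def list_output_keys(s3_client, bucket, prefix):
    """Return every object key under the prefix, following pagination.

    s3_client is expected to offer list_objects_v2, as a boto3 S3 client does.
    """
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        # "Contents" is absent when the prefix matches no objects.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            return keys
        # More results remain; ask for the next page.
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
```

For example, `list_output_keys(boto3.client("s3"), "example-results-bucket", "daily-output/")` would return the keys written by the morning run, assuming the job wrote to that location. No EMR API is involved in the read path.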
Dgix
1 year, 1 month ago
Selected Answer: B
We _can_ terminate the entire cluster, as EMRFS is specified – which stores the computational results in S3. Therefore, the cluster is not required after processing.
upvoted 3 times
...
career360guru
1 year, 1 month ago
Selected Answer: D
Option D because processed data is used later in the day.
upvoted 2 times
...
a54b16f
1 year, 1 month ago
Selected Answer: D
The difference between D and B is whether to terminate the whole EMR cluster, that is, whether we still need the EMR cluster after the 6 hours of processing. The answer is yes: "the data is not referenced until late in the day", and EMRFS can't be accessed without an EMR cluster. You may argue that you can access the underlying S3 bucket directly. But you would lose the benefits of EMR/EMRFS, which provide security control and, most importantly, the performance and system throughput needed for big data.
upvoted 3 times
Syre
7 months, 2 weeks ago
Yes, EMRFS can be accessed without an active EMR cluster because EMRFS stores data in Amazon S3, which is a persistent object storage service independent of the EMR cluster. Here's how it works: EMRFS is essentially an extension of Amazon S3, allowing EMR clusters to use S3 as a storage layer for data. When you terminate an EMR cluster, the data in S3 remains intact and accessible.
upvoted 1 times
...
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted