Your analytics team wants to build a simple statistical model to determine which customers are most likely to work with your company again, based on a few different metrics. They want to run the model on Apache Spark, using data housed in Google Cloud Storage, and you have recommended using Google Cloud Dataproc to execute this job. Testing has shown that this workload can run in approximately 30 minutes on a 15-node cluster, outputting the results into Google BigQuery. The plan is to run this workload weekly. How should you optimize the cluster for cost?
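For context, here is a minimal PySpark sketch of the kind of job the question describes: read customer metrics from Cloud Storage, score them, and write the results to BigQuery. The bucket, dataset, table, and file names are all hypothetical, and the write step assumes the spark-bigquery connector is available on the cluster (for example, supplied via the --jars flag when submitting the job):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("repeat-customer-model").getOrCreate()

    # Read customer metrics housed in Cloud Storage (path is hypothetical).
    df = (spark.read
          .option("header", True)
          .csv("gs://example-bucket/customer_metrics.csv"))

    # ... fit the statistical model and score each customer here ...

    # Write the scored results to BigQuery via the spark-bigquery connector;
    # the connector stages data through a temporary GCS bucket.
    (df.write.format("bigquery")
       .option("table", "example_dataset.repeat_customer_scores")
       .option("temporaryGcsBucket", "example-temp-bucket")
       .mode("overwrite")
       .save())

Note that, per the question, this job needs the 15-node cluster for only about 30 minutes once a week, which is the detail the cost-optimization answer should hinge on.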