Exam Certified Machine Learning Associate topic 1 question 12 discussion

Actual exam question from Databricks's Certified Machine Learning Associate

Question #: 12
Topic #: 1


A data scientist wants to parallelize the training of trees in a gradient boosted tree to speed up the training process. A colleague suggests that parallelizing a boosted tree algorithm can be difficult.
Which of the following describes why?

A. Gradient boosting is not a linear algebra-based algorithm which is required for parallelization.
B. Gradient boosting requires access to all data at once which cannot happen during parallelization.
C. Gradient boosting calculates gradients in evaluation metrics using all cores which prevents parallelization.
D. Gradient boosting is an iterative algorithm that requires information from the previous iteration to perform the next step.
E. Gradient boosting uses decision trees in each iteration which cannot be parallelized.


Suggested Answer: D

by Deuterium44 at Nov. 7, 2024, 10:13 a.m.

Comments


Deuterium44

7 months, 4 weeks ago

Selected Answer: D

D: Gradient boosting is an iterative, sequential algorithm in which each tree is trained to correct the errors of the trees before it. Because each step relies on the output of the previous step, the trees cannot be trained independently in parallel.
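To make the dependency concrete, here is a minimal sketch of gradient boosting for squared-error loss, assuming depth-1 trees (stumps) on a single feature. The names fit_stump, predict_stump and boost are hypothetical and written for illustration; this is not how Spark MLlib or any particular library implements it. The point to notice is that the residuals for each new tree are computed from the predictions of all earlier trees, so the loop body for iteration m cannot start until iteration m-1 has finished.

```python
# Minimal gradient boosting sketch (squared-error loss, depth-1 trees).
# All function names are hypothetical and exist only for illustration.

def fit_stump(xs, residuals):
    """Fit a one-split tree to the residuals; return (threshold, left, right)."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue  # skip splits that leave one side empty
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    if best is None:
        # Degenerate case: all x values identical, so predict the mean residual.
        mean = sum(residuals) / len(residuals)
        return (xs[0], mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs, ys, n_trees=20, learning_rate=0.5):
    """Train n_trees stumps one after another; return (base, stumps, losses)."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps, losses = [], []
    for _ in range(n_trees):
        # Sequential dependency: these residuals use the predictions of
        # every tree trained so far, so this step cannot run ahead of time.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
        stumps.append(stump)
        losses.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return base, stumps, losses


# Made-up toy data, only to exercise the loop.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]
base, stumps, losses = boost(xs, ys)
print(f"MSE after first tree: {losses[0]:.4f}, after last tree: {losses[-1]:.4f}")
```

Work inside a single iteration, such as the search over candidate splits in fit_stump, has no such dependency, which is where boosted-tree implementations typically find their parallelism. Compare this with a random forest, where every tree is fit to the original targets and the trees can be trained at the same time.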


...