A cloud-native organization is not meeting their service level objective (SLO) but has not exhausted their error budget. What should the organization prioritize?
If you're within the error budget, there's no need to pause for stability just yet. In Site Reliability Engineering (SRE) practice, an error budget is the acceptable amount of unreliability based on the SLO. If the SLO is not being met, but the error budget is not yet exhausted, it indicates that there’s still tolerance for some risk. Therefore, the organization can prioritize speed and innovation, such as releasing new features, as long as they stay within the error budget.
C. Stability to avoid prolonged user downtime
Since the organization is not meeting their SLO, they need to prioritize stability to avoid prolonged user downtime. This means focusing on fixing the underlying issues that are causing the SLO violations, rather than innovating on new features or releasing new features quickly.
In this scenario, the organization should prioritize stability to avoid prolonged user downtime. This means focusing on ensuring that the service is stable and reliable, even if it means delaying new feature releases or reducing the rate of innovation temporarily. By prioritizing stability, the organization can prevent prolonged outages or downtime, which can negatively impact the user experience and erode customer trust. Once stability is achieved, the organization can then focus on innovation and improving the user experience.
They still have error budget left.
The error budget gives developers clarity into how many failed fixes they can attempt without affecting the end user experience.
"The error budget is typically the space between the SLA and the SLO. This error budget gives developers clarity into how many failed fixes they can attempt without affecting the end user experience."
When we monitor SLOs, our system aggregates the data and ties it back to error budgets, which determine the allowable amount of system downtime or latency over a specific timeframe.
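To make "allowable amount of system downtime over a specific timeframe" concrete, here is a minimal sketch of the arithmetic. The function name `allowed_downtime` and the example values (a 99.9% availability target over a 30-day window) are my own assumptions for illustration, not something from the question.

```python
from datetime import timedelta


def allowed_downtime(slo_target: float, window: timedelta) -> timedelta:
    """Return the downtime an availability SLO tolerates over a window.

    slo_target is a fraction such as 0.999 (hypothetical example value).
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    # The error budget is the share of the window that is allowed to be "bad".
    return window * (1.0 - slo_target)


if __name__ == "__main__":
    budget = allowed_downtime(0.999, timedelta(days=30))
    print(f"Downtime budget: {budget.total_seconds() / 60:.1f} minutes")
```

With those assumed numbers the budget works out to roughly 43 minutes of downtime for the month.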
Response is correct (C). Both Devs and SRE team must ensure that the error budget does not become exhausted. To avoid it, releases have to stop for the time being until the error budget resets. The team would have to reprioritise to focus on reliability to get it back to an acceptable state.
Error budgets let you track how many bad individual events (like requests) are allowed to occur during the remainder of your compliance period before you violate the SLO. You can use the error budget to help you manage maintenance tasks like deployment of new versions. https://cloud.google.com/stackdriver/docs/solutions/slo-monitoring#defn-error-budget
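The "how many bad events are still allowed" idea can be sketched for a request-based SLO as below. This is only an illustration: `remaining_error_budget` is a hypothetical helper, and it assumes you already know (or can estimate) the total number of events in the compliance period.

```python
import math
from fractions import Fraction


def remaining_error_budget(slo_target: float,
                           total_events: int,
                           bad_events_so_far: int) -> int:
    """Return how many more bad events fit before the SLO is violated.

    A negative result means the budget is already overspent.
    """
    # Convert via str() so 0.999 becomes exactly 999/1000, avoiding float drift.
    allowed_fraction = 1 - Fraction(str(slo_target))
    allowed_bad = math.floor(allowed_fraction * total_events)
    return allowed_bad - bad_events_so_far


if __name__ == "__main__":
    # Assumed numbers: 99.9% target, 1,000,000 requests, 400 failures so far.
    print(remaining_error_budget(0.999, 1_000_000, 400))
```

In this made-up case 1,000 bad requests are allowed in total, so 600 remain.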
“The development team can ‘spend’ this error budget in any way they like. If the product is currently running flawlessly, with few or no errors, they can launch whatever they want, whenever they want. Conversely, if they have met or exceeded the error budget and are operating at or below the defined SLA, all launches are frozen until they reduce the number of errors to a level that allows the launch to proceed.”
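The rule in that quote (spend freely while budget remains, freeze launches once it is met or exceeded) could be expressed as a simple gate. This is a sketch under my own assumptions; `launch_decision` and its string return values are hypothetical, and a real policy would likely have more conditions.

```python
def launch_decision(budget: int, consumed: int) -> str:
    """Return "proceed" while error budget remains, otherwise "freeze".

    budget:   total bad events allowed in the compliance period
    consumed: bad events observed so far in that period
    """
    if budget < 0 or consumed < 0:
        raise ValueError("budget and consumed must be non-negative")
    # "Met or exceeded" the budget means launches are frozen.
    return "freeze" if consumed >= budget else "proceed"


if __name__ == "__main__":
    print(launch_decision(budget=1000, consumed=400))   # budget left
    print(launch_decision(budget=1000, consumed=1000))  # budget exhausted
```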
Minza (3 days, 17 hours ago)
chesterlp (3 months, 3 weeks ago)
chai_gpt (4 months, 3 weeks ago)
__rajan__ (5 months, 1 week ago)
Jkzz (10 months, 1 week ago)
Arimaverick (1 year ago)
SoftSami (1 year ago)
Matro71 (1 year ago)
tbolick6 (1 year ago)
mtpro (1 year, 4 months ago)
Akshay0403 (1 year, 4 months ago)
zellck (1 year, 6 months ago)
Gigante (1 year, 6 months ago)
Sbgani (1 year, 6 months ago)
LimeCake (1 year, 6 months ago)
Govindaraj (1 year, 6 months ago)
jexmtropicscheatchatya (1 year, 6 months ago)
moi23 (1 year, 1 month ago)