SLO/Template instructions/Client-facing

From Wikitech

Who are the service's clients?

If all the clients are other internal WMF services, they should be identifiable. If some or all clients are external users (human or automated), characterize them in as much detail as possible, so that their reliability needs can be assessed.

For complicated services with a variety of distinct use cases, one way to catalogue your clients is to make a list of user journeys -- that is, scenarios like "a user logging into their account" or "a user editing an article" -- that eventually depend on your service's functionality. Then break down that list of user journeys by identifying, in each case, what piece of software directly contacts your service to play its role. ("For logged-in page views, service X calls us to fetch a key, but for edits, service Y fetches the key from us and then service Z writes it back.")
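As a rough illustration of that breakdown, the sketch below records each user journey together with the software that directly contacts the service, then inverts the list to get the direct clients. The journey names and the "service X/Y/Z" labels are taken from the example above; the data layout and the `direct_clients` helper are assumptions made for illustration, not part of any WMF tooling.

```python
from collections import defaultdict

# Hypothetical catalogue: each user journey maps to the (direct caller, operation)
# pairs that reach our service. Names follow the example in the text.
USER_JOURNEYS = {
    "logged-in page view": [("service X", "fetch key")],
    "edit": [("service Y", "fetch key"), ("service Z", "write key")],
}


def direct_clients(journeys):
    """Return a mapping of direct client -> sorted list of journeys involving it."""
    clients = defaultdict(set)
    for journey, calls in journeys.items():
        for caller, _operation in calls:
            clients[caller].add(journey)
    return {caller: sorted(names) for caller, names in sorted(clients.items())}


if __name__ == "__main__":
    for caller, names in direct_clients(USER_JOURNEYS).items():
        print(f"{caller}: {', '.join(names)}")
```

Only the direct callers appear in the result; anything that reaches the service solely through service X, Y, or Z is deliberately left out.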

📝 List, or characterize, the service's clients. You don't need to list your clients' clients, unless they also depend on you directly.

What are the request classes?

Some services receive more than one kind of request, where each is subject to a different SLO. For example, read requests may have better latency guarantees than write requests.

Some classes of requests might be ineligible for any SLO guarantees, such as batch requests over a certain size. (These requests aren't invalid; malformed requests are a separate matter and will, of course, be served errors. SLO-ineligible requests might still be served in a "best effort" fashion: no guarantees, but in practice they'll often work.)

Ideally, the classes should be constructed such that a request can be classified based only on the request -- not on the response or server state. In other words, a client should be able to classify a request before sending it. In practice, this isn't always possible, and that's okay.
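A minimal sketch of what "classifiable from the request alone" could look like follows. The `Request` fields, the class names, and the batch-size threshold are all hypothetical, chosen only to mirror the read/write and oversized-batch examples above. The point is that `classify` reads nothing but the request, so a client could run the same logic before sending.

```python
from dataclasses import dataclass

# Assumed threshold, purely illustrative; a real service would pick its own.
MAX_ELIGIBLE_BATCH_SIZE = 100


@dataclass(frozen=True)
class Request:
    """Hypothetical request shape: only fields the client knows up front."""
    method: str
    batch_size: int = 1


def classify(request: Request) -> str:
    """Assign a request class using only the request, never server state."""
    if request.batch_size > MAX_ELIGIBLE_BATCH_SIZE:
        return "ineligible"  # served best-effort, outside the SLO
    if request.method.upper() in {"GET", "HEAD"}:
        return "read"
    return "write"


if __name__ == "__main__":
    for req in (Request("GET"), Request("POST"), Request("GET", batch_size=500)):
        print(req, "->", classify(req))
```

A class that depends on, say, whether the response was served from cache could not be written this way, which is the situation the paragraph above describes as acceptable but not ideal.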

As above, one way to catalogue your request classes is to make a list of user journeys and identify what types of request are made in each case. (In the above example, maybe the fetches from services X and Y are functionally the same, making up one request class, and the write from service Z belongs to another. Or maybe the fetch from service Y requires a freshness guarantee and consequently has a longer latency deadline.)

📝 If your service has only one request class, you can delete this section. Otherwise, list all request classes, and the criteria for determining what class a request belongs to. If any classes are ineligible for the SLO as described above, label them.

Next, move on to writing the service level indicators.