Quite often these days, I hear that UODP and reliability are two things that contradict each other. Indeed, at first glance, if you are focused on onboarding customers, why, when, and how should you focus on reliability? Is it even needed? How can one justify the objective "to improve reliability" at the expense of onboarding fewer customers next month, quarter, or year?
All these questions are usually symptoms of misunderstanding parts of the UODP framework, which leads to one of two outcomes:
- teams do not do anything related to reliability at all until it is too late
- teams religiously follow a book of "best reliability practices" and apply everything in it
The second case (applying every possible best practice) is often much more interesting. Such teams often struggle to decide how many of their resources to put into reliability versus other initiatives. All of them? What is even more interesting is that such a team still often ends up with unhappy customers.
So, let's focus today on the following question: according to UODP, when and how should we prioritize reliability efforts?
Define Reliability
Let's start by defining reliability according to the UODP framework. The critical rule of UODP is this: we should always work backward from the customer's needs. If your product has no customers, there is nothing to care about. If it has a single customer, releases, escalations, and communication can all be handled through direct engagement with that customer. However, once you have thousands of customers, you can no longer maintain a direct connection with each of them. How do you make sure you are still focused on the customer when you have thousands of them?
Now let's look at teams that claim to use UODP but do no reliability work. This almost always means they are focusing either only on new customers or on a very small number of existing ones. Because they are still focused on customers, it creates the illusion that all work is done backward from the customer's needs. In reality, while the team is focused on new customers, existing customers may be struggling.
So how do you stay customer-oriented when you have thousands of customers? This is where reliability helps. According to UODP, reliability consists of two parts:
- Building a model of your average user
- Keeping this user happy
This definition has two important implications.
First, it tells you when to start paying attention to reliability: when your total number of customers is so large that they can no longer all be covered by deep engagements.
Second, it helps you avoid doing reliability for the sake of reliability. It may be surprising, but measuring too many things can be as wrong as measuring nothing. I've seen many teams invest heavily in reliability only to find out that their customers were still unhappy. UODP helps you avoid the "limping cow problem".
How To Implement Reliability?
OK, so now we have a good understanding of when and why we should pay attention to reliability. But how many resources should we allocate? And to what, exactly?
Let's start by answering the question: what exactly does it mean to "build a model of your customer" and "keep it happy"? UODP always prioritizes deep engagement; for your modeled customer, deep engagement means the following:
- Using your previous deep engagements, create a list of the main critical user journeys (CUJs) that your users frequently perform
- Build key service level indicators (SLIs) that let you measure the "health status" of each CUJ
- Define a service level objective (SLO) for each SLI
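To make the three steps more concrete, here is a minimal sketch of the "modeled user" as data. It assumes ratio-style SLIs (good events divided by total events), which is a common choice but not something UODP prescribes. The class names, the `unhappy_journeys` helper, and the CUJ names and numbers in the usage lines are all hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class SLI:
    """Hypothetical ratio-style indicator: good events divided by total events."""
    name: str
    good_events: int
    total_events: int

    def value(self) -> float:
        # Assumption: with no traffic there is nothing to be unhappy about.
        if self.total_events == 0:
            return 1.0
        return self.good_events / self.total_events


@dataclass
class SLO:
    """Target for one SLI, e.g. 0.999 means 99.9% of events must be good."""
    sli: SLI
    target: float

    def is_met(self) -> bool:
        return self.sli.value() >= self.target


@dataclass
class CUJ:
    """A critical user journey of the modeled user and the SLOs describing its health."""
    name: str
    slos: list[SLO]

    def is_healthy(self) -> bool:
        return all(slo.is_met() for slo in self.slos)


def unhappy_journeys(cujs: list[CUJ]) -> list[str]:
    """Names of the CUJs where at least one SLO is currently missed."""
    return [cuj.name for cuj in cujs if not cuj.is_healthy()]


# Hypothetical journeys and numbers, for illustration only.
journeys = [
    CUJ("sign in", [SLO(SLI("sign-in success rate", 99_950, 100_000), target=0.999)]),
    CUJ("checkout", [
        SLO(SLI("checkout success rate", 99_700, 100_000), target=0.999),
        SLO(SLI("checkout under 2s", 98_000, 100_000), target=0.95),
    ]),
]
print(unhappy_journeys(journeys))  # → ['checkout']
```

The point of the structure is that "is the modeled user happy?" becomes a question you can answer per journey, which is what lets you ignore everything that is not tied to a CUJ.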
Once you have created this "deep engagement" with your "modeled user", you will know when that user is unhappy. More importantly, you will know what you need to fix and by when, so the question of how much to invest in reliability resolves itself.
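One way to turn an SLO into "by when" is the error budget idea from SRE practice: an SLO target of 99.9% tolerates 0.1% bad events over a window, and the current failure rate tells you roughly when that allowance runs out. This is a swapped-in technique used for illustration, not necessarily how UODP frames it. The function names, the 30-day window, and the numbers below are hypothetical.

```python
import math


def error_budget(target: float, total_events: int) -> float:
    """Number of bad events the SLO tolerates over the window (the error budget)."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    return (1.0 - target) * total_events


def days_until_exhausted(target: float, expected_window_events: int,
                         bad_events_so_far: int, days_elapsed: float) -> float:
    """Days left before the error budget runs out if the current failure rate continues.

    Returns 0.0 if the budget is already spent, math.inf if nothing has failed yet.
    """
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    remaining = error_budget(target, expected_window_events) - bad_events_so_far
    if remaining <= 0:
        return 0.0
    burn_per_day = bad_events_so_far / days_elapsed
    if burn_per_day == 0:
        return math.inf
    return remaining / burn_per_day


# Hypothetical: a 99.9% checkout SLO over a 30-day window expected to see
# 3,000,000 checkouts, with 1,500 failures in the first 10 days.
print(round(days_until_exhausted(0.999, 3_000_000, 1_500, 10), 1))  # → 10.0
```

In this made-up scenario the budget would be gone around day 20, ten days before the window closes, which is a concrete signal that checkout reliability should take priority now rather than "someday".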