At Takealot, we work with Python-based microservices. There are many moving parts and some services may be down at any time. To ensure that everything still works, we need to build in mechanisms to handle failure when requesting data from an upstream service.
We developed patterns using retries and circuit breakers so that we can handle a service that is slow or unavailable, and avoid overwhelming it when it becomes available again.
I will take you through some of the techniques that we use to achieve this, and give a live demo of the techniques in action.
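To give a flavour of the two patterns named above, here is a minimal, single-threaded sketch of a circuit breaker combined with retries and exponential backoff. It is an illustration only, not the code we run at Takealot: the names `CircuitBreaker`, `CircuitOpenError` and `call_with_retries`, and the default thresholds and delays, are all assumptions made for this example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without being attempted."""


class CircuitBreaker:
    """Hypothetical breaker: opens after `max_failures` consecutive failures."""

    def __init__(self, max_failures=3, reset_timeout=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so the behaviour can be tested without waiting
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast: do not send traffic to a service we believe is down.
                raise CircuitOpenError("upstream marked unavailable")
            # Timeout elapsed: fall through and allow a trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def call_with_retries(breaker, func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Retry `func` through the breaker, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return breaker.call(func)
        except CircuitOpenError:
            raise  # retrying against an open circuit would defeat its purpose
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The retries cover brief glitches, while the breaker stops the retries from piling onto a service that is already struggling and lets a single trial call through once the timeout has passed. A production version would likely also need thread safety and some random jitter on the delays.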