Sponsored Webinar + Live Q&A
Using Generic Mitigations and Playbooks as Code to Improve Reliability
Every company lives in fear of an outage or other production incident that wakes engineers up in the middle of the night and leaves customers unhappy when they try to access services. It's more vital than ever to mitigate the customer impact of outages as quickly as possible. In this session, StackPulse CTO Leonid Belkind discusses the concept of "generic mitigations" and how this approach leads to a saner incident experience for on-call teams, as well as an improved customer experience during outages. He'll also share tips on getting started with generic mitigations by turning incident response playbooks from documents into executable code, and on the benefits this brings.
Speaker
Leonid Belkind
Co-Founder and Chief Technology Officer @StackPulse
Leonid Belkind is a Co-Founder and Chief Technology Officer at StackPulse, a Site Reliability Engineering orchestration platform. Prior to StackPulse, Leonid co-founded (and was CTO of) Luminate Security, a pioneer in Zero Trust Network Access and Secure Access Service Edge. At Luminate, Leonid...
Session Sponsored By
StackPulse is a reliability platform for developers, SREs and on-call engineers. StackPulse helps teams detect, respond to, and remediate production incidents faster by turning troubleshooting and incident response workflows into code-based playbooks. Teams use StackPulse to reduce alert fatigue, eliminate manual toil, and provide more reliable services to their end users.
From the same track
Finding Harmony: Resilient Systems and Compensatory Growth
Thursday May 27 / 01:10PM EDT
Many engineering teams are locked in the Sisyphean battle to prevent incidents while also accepting the SRE tenet that failure is normal. Often this results in burnout. What if there’s a better, more harmonious way? I’ll share how truly embracing failure can lead to more resilient...
Jason Yee
Director of Advocacy @Gremlin