Session + Live Q&A

Resources & Transactions: A Fundamental Duality in Observability

Fundamentally, there are only two types of “things worth observing” when it comes to production systems:

  1. Resources, and
  2. Transactions

The tricky (and interesting) part is that they’re entirely codependent. “Transactions” are the things that traverse your system and (hopefully) “do something.” The classic example would be an end-user request that propagates across networks and process boundaries. “Resources” are the things in your system that transactions *consume* in order to do their work. Resources survive across transactions. As Transactions transact, they use Resources. And as Resources saturate, Transactions suffer.

You know what that suffering feels like: ordinary DB queries that suddenly get really slow; backend services that stop responding to requests; OOM flapping; etc. The tricky part is that *end-users don't care about Resources!* At all. End-users only care about their Transactions, and yet, as operators, we cannot disentangle those from the Resources they consume. It turns out that observability can help here, but only if we move far beyond the misleading telemetry-based silos of "traces, metrics, and logs." We must reframe the question in terms of *Change*, and the join points between Resources and Transactions. In this talk, we will explore these topics, both theoretically and through some real-world examples, and develop an intuition for how we can understand our systems more completely.


Speaker

Ben Sigelman

CEO and co-founder @LightStepHQ, Co-creator @OpenTracing API standard

Ben Sigelman is a co-founder and CEO at Lightstep, a company that makes complex microservice applications more transparent and reliable. He is an expert in distributed tracing and also co-founded the OpenTelemetry project.

Date

Tuesday May 18 / 12:00PM EDT (40 minutes)

Track

Observability and Understandability in Production

Topics

Observability, DevOps

From the same track

Session + Live Q&A Incident Management

More More More! Why the Most Resilient Companies Want More Incidents

Tuesday May 18 / 10:00AM EDT

Major tech companies like Facebook, Google, and Netflix want more incidents, not fewer. NASA wants them so urgently that they import incidents from other companies. The reason? Postmortems. This talk will focus on how companies of any scale can improve their ingestion of understandability by...

John Egan

CEO and Co-Founder @Kintaba

Session + Live Q&A Observability

Observing and Understanding Failures: SRE Apprentices

Tuesday May 18 / 11:00AM EDT

In this session, Tammy will share how Padawans and Jedis can inspire and teach us how to help people of a wide variety of backgrounds, ages, and experience levels to observe and understand failures in production. Tammy will share how she and a colleague created an SRE Apprentice program to hire...

Tammy Bryant Butow

Principal Site Reliability Engineer @Gremlin

PANEL DISCUSSION + Live Q&A Observability

Panel: Observability and Understandability

Tuesday May 18 / 01:00PM EDT

This panel will feature experienced practitioners who have worked in the engineering teams of Google, Facebook, Dropbox, and MongoDB. They are all now working at startups focused on helping engineers improve their ability to reduce downtime and customer-impacting failures. Hear from this panel...

Jason Yee

Director of Advocacy @Gremlin

John Egan

CEO and Co-Founder @Kintaba

Ben Sigelman

CEO and co-founder @LightStepHQ, Co-creator @OpenTracing API standard