I was excited by the title and thought this was going to be about versioning the observability contracts of services, dashboards, alerts, etc., which are typically exceptionally brittle. Boy, am I disappointed.

I get what Charity is shouting about. And Honeycomb is incredible. But I think this framing oversimplifies things.

Let's step back and imagine everything emitted JSON only. No other form of telemetry is allowed. This is functionally equivalent to wide events, albeit inherently flawed and problematic, as I'll demonstrate.

Every time something happens somewhere, you emit an Event object. You slurp these to a central place, and now you can count them, connect them as a graph, index and search, compress, transpose, etc., etc.

I agree, this works! Let's assume we build it and all the necessary query and aggregation tools, storage, dashboards, whatever. Hurray! But sooner or later you will have this problem: a developer comes to you and says "my service is falling over," and you'll look and see that for every 1 MiB of traffic it receives, it also sends roughly 1 MiB of traffic, but it produces 10 MiB of JSON Event objects. Possibly more. Look, it's a very complex service, or so they tell you.

You smile and tell them, "Not a problem! We'll simply pre-aggregate some of these events in the service and emit a periodic summary." Done and done.

Then you find out there's a certain request that causes problems, so you add more Events, but this also causes an unacceptable amount of Event traffic. Not to worry: we can add a special flag to emit extra logs only for certain requests, or we'll randomly add extra logging ~5% of the time. That should do it.

Great! It all works. That's the end of the story, but the result is that you've reinvented metrics and traces. Sure, logs -- or "wide events," which for the sake of this example are the same thing -- work well enough for almost everything, except of course for all the places they don't. And in the places they don't, you have to reinvent all this *stuff*.

Metrics and traces solve these problems up front, in a way designed to accommodate scale before you suffer an outage, without necessarily making your life significantly harder along the way. At least that's the intention; whether it holds in practice is certainly not addressed by TFA.

What's more, in practice metrics and traces *today* are in fact *wide events*. They're *metrics* events, or *tracing* events. It doesn't really matter whether a metric ends up scraped from a Prometheus metrics page or emitted as a JSON log line; that's beside the point. The point is they are fit for purpose.

Observability 2.0 doesn't fix this; it just shifts the problem around. Remind me, how did we do things *before* Observability 1.0? Because as far as I can tell, it looks strikingly similar to Observability 2.0.

So forgive me if I read all of this as lipstick on the pig that is Observability 0.1.

And finally, I *get* that you *can* make it work. Google certainly gets that. But then they built Monarch anyway. Why? That's worth understanding, if you ask me. Perhaps we should start by educating the general audience on this matter, but I'm guessing that wouldn't aid the sale of a solution that eschews those very learnings.
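
Since I'm grumbling anyway, here's roughly what that pre-aggregation step looks like. A minimal sketch in Python, with made-up names (emit() stands in for whatever ships a wide event to your pipeline); the point is that the "periodic summary" is just a counter metric wearing a JSON costume:

    import json
    import threading
    import time
    from collections import Counter

    def emit(event: dict) -> None:
        # Stand-in for shipping one wide Event to the central pipeline.
        print(json.dumps(event))

    # The naive version: one Event per request. This is what drowns the
    # service in 10 MiB of JSON per 1 MiB of real traffic.
    def handle_request(path: str, status: int) -> None:
        emit({"kind": "request", "path": path, "status": status})

    # The "fix": count in-process, emit a periodic summary.
    # Congratulations, you have reinvented a counter metric with a
    # scrape interval.
    _counts: Counter = Counter()
    _lock = threading.Lock()

    def handle_request_aggregated(path: str, status: int) -> None:
        with _lock:
            _counts[(path, status)] += 1

    def flush_summaries(interval_s: float = 60.0) -> None:
        while True:
            time.sleep(interval_s)
            with _lock:
                snapshot = dict(_counts)
                _counts.clear()
            emit({
                "kind": "request_summary",
                "window_s": interval_s,
                "counts": [{"path": p, "status": s, "n": n}
                           for (p, s), n in snapshot.items()],
            })

    # Run the flusher in the background, e.g.:
    # threading.Thread(target=flush_summaries, daemon=True).start()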
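
And the "extra logging ~5% of the time" patch is head-based trace sampling with the serial numbers filed off. Same disclaimer: hypothetical names, reusing emit() from the sketch above:

    import random

    SAMPLE_RATE = 0.05  # verbose Events for roughly 5% of requests

    def should_sample(headers: dict) -> bool:
        # The special flag that forces verbose Events for certain requests...
        if headers.get("x-debug-events") == "1":
            return True
        # ...and the random ~5% for everything else. In other words, a
        # sampling decision made at the head of the request.
        return random.random() < SAMPLE_RATE

    def handle_request_sampled(headers: dict, path: str) -> None:
        verbose = should_sample(headers)
        emit({"kind": "request", "path": path, "sampled": verbose})
        if verbose:
            # The extra Events that were tipping the service over,
            # now gated behind the sampling decision.
            emit({"kind": "request_detail", "path": path})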