
Fast and flexible observability with canonical log lines

221 points | by mglukhovsky | almost 6 years ago

12 comments

ianstormtaylor · almost 6 years ago
Great article! I always love hearing Stripe talk about their internals.

I've been using this practice and I agree that it's incredibly useful. I think because people tend to think in terms of "logs", they end up overlooking the much more useful construct of "canonical logs". Many fine-grained logs are almost always less useful than fewer, fully-described canonical logs. Other observability tools often call these "events" instead of "logs" for that reason.

There's a tool called Honeycomb [1] that gives you exactly what this article's talking about, in a really nicely designed package, out of the box. And since it handles all of the ingestion and visualization, you don't have to worry about setting up Kafka, or the performance of logplexes, or teaching everyone SQL, or how to get nice graphs. I was a little skeptical at first, but after using it for over a year now I'm completely converted.

If you record fully-described "events" for each request, and use sub-spans for the smaller segments of each request, you also get a waterfall-style trace visualization, which eliminates the last remaining need for fine-grained logs.

If this article seems interesting to you, I'd highly, highly recommend Honeycomb. (Completely unaffiliated; I just think it's a great product.)

[1]: https://www.honeycomb.io/
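The pattern being praised here — accumulate context as a request progresses, then emit a single fully-described event at the end — is easy to sketch. A minimal illustration in Python; the class and field names are hypothetical, not Stripe's or Honeycomb's API:

```python
import json
import time

class CanonicalLine:
    """Accumulates key/value context over a request's lifetime,
    then emits one fully-described event at the end."""

    def __init__(self):
        self.fields = {}

    def add(self, **kwargs):
        self.fields.update(kwargs)

    def emit(self):
        # One line per request: the canonical log line.
        print(json.dumps(self.fields))

# Hypothetical request lifecycle wiring.
line = CanonicalLine()
start = time.monotonic()
line.add(http_method="POST", http_path="/v1/charges")
line.add(auth_type="api_key", user_id="usr_123")        # set during auth
line.add(http_status=200)                               # set at response time
line.add(duration=round(time.monotonic() - start, 4))
line.emit()
```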
firethief · almost 6 years ago
It's interesting that they've found denormalizing their log data so useful. I'm surprised to hear that it performs better for practical queries than a database with appropriate indexes, and that they've been able to build more ergonomic interfaces for querying it than the standard relational approach a lot of people already have experience with. But I don't know much about log management at scale, so I'm only mildly surprised.
manigandham · almost 6 years ago
Strange that they went with plain text when the industry is converging on (newline-delimited) JSON logs for structured data. This also serves as the backbone of observability, with metrics and tracing being folded in and output as JSON as well.

Call them events and you can claim all the event-sourcing buzzwords too.
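For contrast, the newline-delimited JSON shape being described would render the article's example roughly like this (a hand-written sample, not from the article). Each line is a complete JSON document, so it still greps as plain text:

```
{"ts":"2019-03-18T22:48:32.990Z","event":"request_started","http_method":"POST","http_path":"/v1/charges","request_id":"req_123"}
{"ts":"2019-03-18T22:48:32.999Z","event":"request_finished","http_status":200,"duration":0.009,"request_id":"req_123"}
```

And tools can filter on any field, e.g. `jq 'select(.http_status == 200)'` to pull out finished requests.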
chrisweekly · almost 6 years ago
Related tangent: I can't say enough good things about [lnav](https://lnav.org). It's like a mini-ETL power tool at your fingertips, with an embedded SQLite db and a terrific API. As of mid-2016 when I first used it, querying logs was extremely easy and reasonably fast (with up to several million rows). Highest recommendation.

Disclaimer: I have no affiliation with the project or its maintainer -- but out of gratitude I mention it pretty much every time it's appropriate.
osswid · almost 6 years ago
We've been using logging like this, but with JSONL lines. Still easy to grep as straight text, but very handy to be able to parse with jq or other tools, and to have rich values (or even substructures) as part of the log lines.
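A quick sketch of the "rich values" point — nested structures survive the round trip intact, which flat text formats can't offer (the record shape here is made up for illustration):

```python
import json

line = '{"event":"charge_created","amount":{"value":1000,"currency":"usd"},"tags":["retry","api"]}'
record = json.loads(line)

# Substructures come back as real types, not strings.
print(record["amount"]["currency"])  # usd
print(record["tags"][0])             # retry
```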
edsiper2 · almost 6 years ago
Log structure is really important. From the examples provided, I would suggest the same approach can be taken with a full 'logfmt' style, so the timestamp and the event type are set as keys, e.g.:

```
ts="2019-03-18 22:48:32.990" event="Request started" http_method=POST http_path=/v1/charges request_id=req_123
```

The main difference is that you make parsing easier, since many tools can parse logfmt without problems.

One interesting use case here, for me, is the ability to perform queries in a schema-less fashion, so I'll give a quick pitch for what we are working on in Fluent Bit[0] (an open source log project): pretty much the ability to query your data while it is still in motion (stream processing on the edge[1]). Consider the following data samples in a log file:

```
ts="2019-03-18 22:48:32.990" event="Request started" http_method=POST http_path=/v1/charges request_id=req_123
ts="2019-03-18 22:48:32.991" event="User authenticated" auth_type=api_key key_id=mk_123 user_id=usr_123
ts="2019-03-18 22:48:32.992" event="Rate limiting ran" rate_allowed=true rate_quota=100 rate_remaining=99
ts="2019-03-18 22:48:32.998" event="Charge created" charge_id=ch_123 permissions_used=account_write team=acquiring
ts="2019-03-18 22:48:32.999" event="Request finished" alloc_count=9123 database_queries=34 duration=0.009 http_status=200
```

So if I wanted to retrieve all events associated with user 123, I would process the file as follows:

```
$ fluent-bit -R conf/parsers.conf \
    -i tail -p alias=data -p path=canonical.log -p parser=logfmt \
    -T "SELECT * FROM STREAM:data WHERE user_id='usr_123';" -o null -f 1
```

The output is:

```
[1552949312.991000, {"event"=>"User authenticated", "auth_type"=>"api_key", "key_id"=>"mk_123", "user_id"=>"usr_123"}]
```

The results are in a raw mode, but can be exported to stdout as JSON, or to Elasticsearch, Kafka, or any supported output destination.

One of the great things about the stream processor engine is that you can create new streams of data based on results, and use windows of time (tumbling) for aggregation queries and such.

[0] https://fluentbit.io

[1] https://docs.fluentbit.io/stream-processing
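For anyone without Fluent Bit at hand, the same filter is a few lines of Python — a minimal sketch that assumes well-formed logfmt with no escaped quotes inside values, reading the `canonical.log` file from the example above:

```python
import re

# Minimal logfmt parser: handles bare values and double-quoted values
# (no escaped quotes), which is enough for the sample lines above.
PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')

def parse_logfmt(line):
    return {key: quoted or bare for key, quoted, bare in PAIR.findall(line)}

with open("canonical.log") as f:
    for raw in f:
        event = parse_logfmt(raw)
        # Same predicate as the fluent-bit SELECT above.
        if event.get("user_id") == "usr_123":
            print(event)
```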
sethammons · almost 6 years ago
This is not unlike what we've been doing for years. We generate billions of log lines like this daily as JSON and inspect them with Splunk. By having consistent values across log lines, we can query and do neat things: "What was our system timing in relation to users who have feature x?" "What correlations can we find between users whose requests took too long and were not throttled? -> ah, 99% of those requests show $correlation_in_other_kv_pair!"
thomas536 · almost 6 years ago
What's the primary use case for this? I almost always only look at logs to debug things; very rarely to perform some sort of event math/analysis.
perq · almost 6 years ago
Very similar to what logsense.com does with logs, except you don't need a canonical log line, since patterns are found automatically.
msoad · almost 6 years ago
Is there any legal restriction on how long you can keep internal system logs? If it's done right they don't contain PII, but they _can_ be used to track people if you have enough logs.
winrid · almost 6 years ago
This kind of debugging is one reason I loved Sumologic's Join query feature.
chasers · almost 6 years ago
We do exactly this with Logflare and send stuff to BigQuery.