
Log Everything as JSON. Make Your Life Easier.

155 points by kiyoto about 13 years ago

27 comments

NyxWulf about 13 years ago

I've seen several articles like this, and there are a number of things to consider.

Logging to ASCII means that the standard unix tools work out of the box with your log files. Especially if you use something like a tab delimiter, you typically don't even need to specify the delimiter.

As an upside, you aren't storing the column definition in every single line, which definitely matters if you are handling large-volume traffic. For instance, we store gigabytes of log files per hour; grossing up that space by a significant margin impacts storage, transit, and processing time during write (marshallers and custom log formatting). Writes are the hardest to scale, so if I'm going to add scale or extra parsing time, I'd rather handle that in Hadoop, where I can throw massive parallel resources at it.

Next, you can achieve much of the advantage of JSON or protocol buffers by having a defined format and a structured release process before someone can change the format. Add fields to the end and don't remove defunct fields. This is the same process you have to use with protocol buffers, or conceptually with JSON, to make it work.

Overall there are advantages to these other formats, but the articles like this that I've seen gloss over the havoc this creates with a standard linux tool chain. You can process a LOT of data with simple tools like gawk and bash pipelines. It turns out you can even scale those same processes all the way up to Hadoop streaming.
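The size point is easy to check: a JSON log repeats every column name on every line, while a delimited format keeps the schema out of band. A minimal Python sketch (field names and values invented for illustration):

    import json

    fields = ["ts", "ip", "method", "path", "status", "bytes"]
    record = ["2012-04-27T10:00:00Z", "10.0.0.1", "GET", "/index.html", "200", "5120"]

    tsv_line = "\t".join(record)                       # keys live in the schema
    json_line = json.dumps(dict(zip(fields, record)))  # keys repeat on every line

    print(len(tsv_line), tsv_line)
    print(len(json_line), json_line)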
rachelbythebay about 13 years ago

This article feels like it would work just as well with "Protocol Buffers", "Thrift", "XML", or maybe even "ASN.1". If that's truly the case, maybe the better thing to say is "please don't (only) log in ASCII", followed by "please use a format which is hard to get wrong".

JSON scares me a little. Don't you have to worry about escaping a whole bunch of characters, just in case something gets the wrong idea about what you have in a field? I saw a page not too long ago which listed about a dozen characters that should be substituted in some manner when used in JSON.

Full disclosure: I got tired of ASCII logging from my web server and wrote something to stream binary protocol buffers (!) to a file instead: http://rachelbythebay.com/w/2012/02/12/progress/
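For what it's worth, a serious JSON library handles that escaping automatically; the characters in question are the quote, the backslash, and the control characters. A quick illustration with Python's json module:

    import json

    # Quotes, backslashes, tabs, and newlines in a value are escaped
    # by the serializer and round-trip cleanly.
    msg = 'said "hi"\tthen\nleft \\o/'
    encoded = json.dumps({"msg": msg})
    print(encoded)  # {"msg": "said \"hi\"\tthen\nleft \\o/"}
    assert json.loads(encoded)["msg"] == msg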
skrebbel about 13 years ago

The real takeaway is that log files invariably tend to become interfaces for something. They often end up being used for monitoring tools, business intelligence, system diagnostic tools, system tests, and so on. And they're great for this. But not when they contain sentences like "Opening conection...", which break half those tools the moment someone fixes the typo.

The log strings became an interface. Avoid this. If it's an interface, it has to be specced, and it has to allow for backward compatibility, just like any other interface that crosses component/tool boundaries.

Whether you do the actual data storage with JSON or something else doesn't matter. It's an implementation detail (though I agree that keeping it not only machine-readable but also human-readable is probably a good thing).

Design the classes that represent log files, and treat them like they're part of a library API. Don't remove fields. Ideally, use the same classes for writing (from your main software) and parsing the logs (from all that other tooling), and include version information in the parser so that the class interface can be current yet the data can be ancient.
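One way to read "treat the log format like a library API" in code is a single record class with an explicit version field, used by both the writer and the parser. A hypothetical Python sketch (class and field names invented):

    import json
    from dataclasses import dataclass, asdict

    SCHEMA_VERSION = 2  # bump when adding fields; never remove fields

    @dataclass
    class RequestLog:
        v: int
        ts: str
        path: str
        status: int = 0  # added in v2; the default keeps old data parseable

        def to_line(self) -> str:
            return json.dumps(asdict(self))

        @classmethod
        def from_line(cls, line: str) -> "RequestLog":
            data = json.loads(line)
            if data.get("v", 1) < 2:
                data.setdefault("status", 0)  # backfill fields newer than the record
            return cls(**data)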
sciurus about 13 years ago

There's been a lot of noise about logging in the linux ecosystem lately.

There's Project Lumberjack (http://bazsi.blogs.balabit.com/2012/02/project-lumberjack-to-improve-linux-logging/) to encourage applications to generate structured logs, and to better document and integrate tools for working with those logs. The proposed structure is Common Event Expression (http://cee.mitre.org/).

At the last kernel summit, ideas were presented (http://lwn.net/Articles/492125/) on how to make kernel messages more structured.

More radically, there's The Journal (http://lwn.net/Articles/468049/), a proposed replacement for syslog.
a3_nm about 13 years ago

What if I need, say, to find the 10 IPs that make the most requests? With the Apache log format, I can write the following in about 15 seconds:

    cut -d ' ' -f1 log | sort | uniq -c | sort -nr | head

Say you need to follow accesses to a particular file? The following quick-and-dirty one-liner probably works well enough:

    tail -f log | grep --line-buffered file.pdf

How do you do that with JSON?

Granted, as soon as your logs stop being a sequence of records (lines) with a fixed sequence of neatly delimited fields, you will need something more than text. However, I still don't know of tools for working with JSON from the command line that are as concise, efficient, flexible, and robust as the standard unix utilities for text.
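Assuming one JSON object per line, the first one-liner does have a reasonably short equivalent, though not a 15-second one. A Python sketch (the "ip" key is hypothetical):

    import json
    import sys
    from collections import Counter

    # Top 10 client IPs from a line-delimited JSON log on stdin,
    # e.g.  python top_ips.py < access.log.json
    counts = Counter(json.loads(line)["ip"] for line in sys.stdin)
    for ip, n in counts.most_common(10):
        print(n, ip)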
Ixiaus about 13 years ago

I dunno, this feels like the "web developer's" approach to logging. I can't say it wouldn't be cool to be able to parse logs into a structured format, but honestly, very powerful tools already exist to parse logs (gawk + shell pipes + {whatever_unix_tool_you_can_think_of}). If you don't have programmers who can knock out a quick awk one-liner to process a logfile in any custom way you want, then I can see where this approach (using JSON) is useful, because then they can use something they are familiar with instead of something they are not. But really, you should know how to use the unix tool chain if you're a programmer.
leif about 13 years ago

TSV and you're done.

Smaller.

Readable, especially with column -t <log.

Works with awk/cut/join/grep/sort/column/etc./etc./etc.

If your logs are complicated enough that you can't maintain the shell scripts that parse them, you probably also have enough log data that JSON is going to blow up your space, and you probably want indexes anyway, so throw it in a real database (oh hi, I work for one of these; log analysis is actually one of our strong suits).

But others have already commented to this effect.
delinka about 13 years ago

"Alex ... [realizes] that someone added an extra field in each line"

Someone?!? Who's touching the server configuration, and why? Unless Alex put a publicly accessible web interface on his .conf files, this shouldn't be happening.

Back on topic: the increase in size from logging in JSON could easily be a deal breaker.
jakejake about 13 years ago

I've done various log formats over the years, including JSON.

One thing I've done for logging errors or warnings is to log them in RSS format. I monitor them just like any RSS feed. It's really handy because there are already tons of ways to read these logs, so we don't have to create anything.

I wouldn't use this for a debug log, because it would probably be unusable at a large volume of logs, but for watching errors it's great.
zmj about 13 years ago

This idea is as old as Lisp: http://sites.google.com/site/steveyegge2/the-emacs-problem
jacques_chester about 13 years ago

One of the non-functional requirements of logs is that they should be fast to write. Marshalling data into a structured format takes longer than spitting out sprintfs.

If you really need structure for ease of querying, you might as well go all the way and throw it into a proper data store.
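The write-side cost is measurable. A rough micro-benchmark sketch in Python (numbers will vary by machine; the point is only that the structured path does more work per record):

    import json
    import timeit

    record = {"ts": 1335500000, "ip": "10.0.0.1", "path": "/x", "status": 200}

    plain = timeit.timeit(
        lambda: "%d %s %s %d" % (record["ts"], record["ip"],
                                 record["path"], record["status"]),
        number=100000,
    )
    structured = timeit.timeit(lambda: json.dumps(record), number=100000)
    print("printf-style: %.3fs  json.dumps: %.3fs" % (plain, structured))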
frsyuki about 13 years ago

We're also using Fluentd, as well as our original JSON-based logging libraries.

Fluentd deals with JSON-based logs. JSON is good as a human-facing interface because it is human-readable and greppable.

On the other side, Fluentd handles logs in MessagePack format internally. MessagePack is a serialization format compatible with JSON, and can be an efficient replacement for it.

I wrote a plugin for Fluentd that sends those structured logs to Librato Metrics (https://metrics.librato.com/), which provides charting and dashboard features.

With Fluentd, our logs became program-friendly as well as human-friendly.
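The JSON/MessagePack compatibility is easy to see with the msgpack Python package (assuming it is installed; the record is invented):

    import json
    import msgpack  # pip install msgpack

    record = {"ts": 1335500000, "ip": "10.0.0.1", "path": "/x", "status": 200}

    as_json = json.dumps(record).encode()
    as_msgpack = msgpack.packb(record)

    # Same structure, fewer bytes on the wire.
    print(len(as_json), len(as_msgpack))
    assert msgpack.unpackb(as_msgpack) == record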
dasil003 about 13 years ago

Loggly supports this, and they provide a good interface for querying the data as well. We used it for a while as a way to unify a couple of GB of daily log data from our Rails app running on multiple instances. I even wrote a library that lets you quickly add arbitrary keys to the request log entry anywhere in the app.

Unfortunately we had to disable it temporarily, as the Ruby client did not cope well when latency to the Loggly service increased. It was fine for a while, since we are both on AWS, but one day our site started getting super slow. It took a while to track down the problem, because the Loggly client has threaded delivery, so a given request would not be delayed. The problem was that the next request couldn't be started until the delivery thread terminated.

Okay, I realize this is not the best architecture. There should be a completely isolated process that pushes the queued logs to Loggly, so that the app never deals with anything but a local logging service. Loggly supports syslog-ng, but that would be standard logging, not JSON, so I think if we want to go this route we need to come up with something of our own...
Simpletoon about 13 years ago

I only need three programs to deal with the anti-ASCII, pro-complexity JSON, XML, etc. crowd: tr, sed, and lex.

All the effort these Javascripters expend putting data into JSON just gets undone by my custom UNIX-style filters; then I can actually work with the text.

Are they making life easier? For whom? It seems like it's just more work for everybody, translating text back and forth between myriad formats.

But what can you do?
rhizome about 13 years ago

Except that fixed-field log lines are much faster to process than parsing JSON, which makes a difference when working with large logs.
daenz about 13 years ago

Logging to Mongo (as JSON) has proven useful for us. It makes it easy to slice and dice the data.
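A minimal sketch of that setup with pymongo (assuming a local mongod; database, collection, and field names are invented):

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    logs = client.app.logs

    # Each log record is just a document...
    logs.insert_one({"level": "error", "path": "/checkout", "ms": 842})

    # ...so slicing and dicing is a query, not a parsing job.
    for doc in logs.find({"level": "error"}).sort("ms", -1).limit(10):
        print(doc["path"], doc["ms"])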
joelthelion about 13 years ago

I would love to see a JSON-based shell, instead of the traditional shells based on raw strings. Heck, we could have a whole ecosystem of tools built around JSON or similar semi-structured representations.
mmphosis about 13 years ago

Log many things as [my favorite format]. Make my life easier by doing the difficult work.

I would log in a fast, compact, but not limited, and heavily documented binary format at a hardware level, with lots of fail-safes. Maybe what I am doing is more appropriately called creating a journal. [My favorite scheduler] would, very lazily and at opportunistic idle times, convert the older non-human-readable binary logs and insert the log data into [my favorite] database as very query-friendly information.
Hopka about 13 years ago

How do you even log as JSON?

Is your entire log file one giant JSON array? That would be challenging for most parsers I know, because they would have to read the entire array into memory first.

Or do you log one JSON object per line? Then you would get problems as soon as you have line breaks inside strings, and would still have to parse until the object ends on some other line. Also, JSON objects do not have to be single-line to be valid, so you would in fact be working with some self-defined subset of JSON.
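In practice the usual answer is the one-object-per-line subset: the serializer escapes embedded newlines as \n inside strings, so a record can never span lines. A sketch of a minimal line-delimited writer in Python:

    import json

    def log_line(f, **fields):
        # json.dumps escapes literal newlines in values as \n, so every
        # record is guaranteed to occupy exactly one physical line.
        f.write(json.dumps(fields) + "\n")

    with open("app.log.json", "a") as f:
        log_line(f, level="info", msg="multi\nline\nmessage")

    # Reading back is one json.loads per line -- no cross-line parsing.
    with open("app.log.json") as f:
        for line in f:
            print(json.loads(line)["msg"])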
wolframarnold about 13 years ago

I like this idea a lot. Frameworks like Rails come with excellent log messages, granularity, and a pub/sub mechanism. Often this can be lower-hanging fruit than throwing in a ton of custom instrumentation for some third-party analytics tool, especially when you're pressed for time.

My question is how fluentd can be hooked into Rails so that Rails' native messages use it, and how it works in the Heroku infrastructure.
kablamo about 13 years ago

I've been thinking about this recently as well. I wrote a simple JSON logger for Perl recently; it will probably be on CPAN this weekend. Until then you can see it on PrePAN and GitHub.

PrePAN: http://prepan.org/module/3Yz7PYrBSd

GitHub: https://github.com/kablamo/Log-JSON
thezilch about 13 years ago

Or provide a unit test for said log parser and require (or don't) all tests to pass pre-commit. A JSON struct isn't going to stop your colleague from removing or renaming a field, removing the logging altogether, or changing the format himself, if your company is really set up to allow colleagues to so easily break your code -- not that anyone's perfect.
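A hypothetical shape for such a test (field names invented), pinning the parser to the fields downstream tools rely on:

    import json

    def parse_log_line(line):
        return json.loads(line)

    def test_log_line_keeps_required_fields():
        # A sample line in the agreed format; if a colleague drops or
        # renames a field, this test fails before the commit lands.
        line = '{"ts": "2012-04-27T10:00:00Z", "level": "info", "msg": "ok"}'
        record = parse_log_line(line)
        assert {"ts", "level", "msg"} <= record.keys()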
anonymoushn about 13 years ago

Is it worth switching to JSON just to avoid having to edit your bash one-liner when you change the format of the log?
sauravc about 13 years ago

We've been logging all of our analytics data as JSON for years now.
majmun about 13 years ago

I tried this; it was no good (because of the escaping of special characters, and parsing performance).

Then I switched to newline-delimited records with \n and \r escaped, and all my problems were solved (for now).
webjunkie about 13 years ago

Okay, and as soon as I switch to JSON, I don't just have 5 million referrers logged per day, I also have 5 million copies of the word "referrer" in my log. Nice.
wooptoo about 13 years ago
Why not go even further and store them in MongoDB?