
Grepping logs is still terrible

100 points by _5csa about 10 years ago

27 comments

ghshephard about 10 years ago

It's beyond me how he doesn't understand that text logs are a universal format, easily accessible, that can be instantly turned into whatever binary format you desire with a highly efficient insertion process (Splunk is just one of the tools that does a great job of this).

Here is the thing he doesn't seem to understand: all of us who are sysadmins absolutely understand the value of placing complex and large log files into a database so that we can query them efficiently. We also understand why having multi-terabyte text log files is not useful.

But what we find totally unacceptable is log files being shoved into binary repositories as the primary storage location. Because, you know what, *everyone* has their own idea of what that primary storage location should be, and those ideas are mostly incompatible with each other.

The nice thing about text: for the last 40 years it's been universally readable, and it will be for the next 40 years. Many of these binary repositories will be unreadable within a short period, and will be immediately unreadable to anyone who doesn't know the magic tool to open them.
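To make the conversion claim above concrete, here is a minimal sketch (Python; the log path and line format are assumptions for illustration, not anything from the thread) of turning a text log into an indexed binary store on demand:

    # Text logs convert to a queryable binary store whenever you need one.
    # Assumes syslog-style lines beginning with an ISO timestamp; both the
    # path and the format are hypothetical.
    import sqlite3

    conn = sqlite3.connect("logs.db")
    conn.execute("CREATE TABLE IF NOT EXISTS logs (ts TEXT, line TEXT)")

    with open("/var/log/messages") as f:
        rows = ((line.split(" ", 1)[0], line.rstrip("\n")) for line in f)
        conn.executemany("INSERT INTO logs VALUES (?, ?)", rows)
    conn.commit()

    # The "binary" advantage -- indexed range queries -- is now available:
    conn.execute("CREATE INDEX IF NOT EXISTS idx_ts ON logs (ts)")
    for ts, line in conn.execute(
            "SELECT ts, line FROM logs WHERE ts >= ? AND ts < ?",
            ("2013-12-24", "2015-04-12")):
        print(line)
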
thaumaturgy about 10 years ago

Cool, so which standard binary log storage format should we all switch to?

Should I submit patches to jawstats so that it'll support google-log-format 1.0 beta, or the newer Amazon Cloud Storage 5 format? Or both? Or just go with the older Microsoft Log Storage Format? Or wait until Gruber releases Fireball Format? Has he decided yet whether to store dates as little-endian Unix 64-bit int timestamps, or is he still thinking about going with the Visual FoxPro date format, y'know, where the first 4 bytes are a 32-bit little-endian integer representation of the Julian date (so Oct. 15, 1582 = 2299161) and the last 4 bytes are the little-endian integer time of day represented as milliseconds since midnight? (True story, I had to figure that one out once. Without documentation.)

Should I write a new plugin for Sublime Text to handle the binary log formats? Or write something that will read the binary storage format and spit out text? Or is that too inefficient? Or should I give up on reading logs in text form at all and write a GUI for it (maybe in Visual Basic)?

Do you know when I should expect suexec to start writing the same binary log format as Apache, or should I give up waiting on that and just write a daemon to read the suexec binary logs and translate them to the Apache binary logs?

Should I take the time to write a natural language parsing search engine for my custom binary log format? Do you think that's worth the time investment? I would really like to be able to search for common misspellings when users ask about a missing email, you know, like "/[^\s]+@domain.com/" does now.

I look forward to your guidance. I've been eagerly awaiting the day that I can have an urgent situation on my hands and I can dig through server logs with all of the ease and convenience of the Windows system logs.
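For what it's worth, the FoxPro layout described above decodes in a few lines. A sketch (Python) that follows the comment's own description of the format; the example bytes are invented:

    # Decode the FoxPro-style 8-byte datetime the comment describes:
    # 4-byte little-endian Julian day, then 4-byte little-endian
    # milliseconds since midnight.
    import struct
    from datetime import datetime, timedelta

    JD_1582_10_15 = 2299161            # Julian day of Oct. 15, 1582, per the comment
    GREGORIAN_START = datetime(1582, 10, 15)

    def decode_foxpro_datetime(raw: bytes) -> datetime:
        julian_day, ms_since_midnight = struct.unpack("<ii", raw)
        return (GREGORIAN_START
                + timedelta(days=julian_day - JD_1582_10_15,
                            milliseconds=ms_since_midnight))

    # Round-trip an invented value: Julian day 2457146 is 2015-05-03.
    raw = struct.pack("<ii", 2457146, (12 * 3600 + 34 * 60 + 56) * 1000)
    print(decode_foxpro_datetime(raw))  # 2015-05-03 12:34:56
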
pjc50 about 10 years ago

*Binary logs may be fine for you, but don't force them on us!*

This is really the important point here. For small systems, grep works fine. The number of people administering small systems is much greater than the number of people administering large systems. The systemd controversy has caused people to fear that change they don't want will be imposed on them and their objections insultingly dismissed: a consequence of incredibly bad *social* "change management" by its proponents.

They are therefore deploying pre-emptive rhetorical covering fire against the day when greppable logs are removed from the popular Linux distributions. Plain text is the lingua franca; binary formats bind you to their tools, with a particular set of design choices, bugs, and disadvantages. My ad-hoc log-grepping workflow has a different set of bugs and disadvantages, but they're *mine*.
rlpb about 10 years ago

Take this philosophy to an extreme and you end up with a dedicated data format and tooling/APIs to access the data for every subsystem, not just logging. Essentially, this is Windows.

The downside to this is that you no longer have a set of global tools which can easily operate across these separate datasets without writing code against an API. I hear PowerShell tackles this; I don't know how well. The general principle, though, harms velocity at just getting something simple done, to the benefit of being able to do extremely complex things more easily. See Event Viewer for a good example of this.

Logs don't exist in isolation. I want to use generally global tooling to access and manipulate everything. I don't want to have to write (non-shell) code, recall a logging-specific API, or take the extra step of converting my logs back to the text domain in order to manipulate data from them against text files I have exported from elsewhere for a one-off job. An example might be if I have a bunch of mbox files and need to process them against log files that have message IDs in them (see the sketch below). I could have an API to read the emails, and an API to read the logs, or I could just use textutils, because I know an exact, validating regexp is not necessary and log-format injection would have no consequence in this particular task.

I do see the benefits of having logs be better-structured data, but I also see downsides to taking plain text logs away. Claiming that there are *no* downsides, and therefore no trade-off to be made, is futile. It's like playing whack-a-mole, because nobody is capable of covering every single use case.
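A sketch of that one-off job (Python; the file paths, and the assumption that the MTA writes Message-IDs verbatim into its log, are hypothetical):

    # Pull Message-IDs out of an mbox and find the mail-log lines that
    # mention them.
    import mailbox

    msg_ids = set()
    for msg in mailbox.mbox("inbox.mbox"):
        mid = msg["Message-ID"]
        if mid:
            msg_ids.add(mid.strip())

    # Plain substring matching over the text log -- no validating regexp
    # needed, which is the commenter's point about one-off textutils jobs.
    with open("mail.log") as log:
        for line in log:
            if any(mid in line for mid in msg_ids):
                print(line, end="")
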
mugsie about 10 years ago

Honestly, I agree about the ELK stack side: piping all your logs into ES/Logstash is a great idea. (Or Splunk / Graylog / Logentries.)

If you run any sort of distributed system, this is vital. And while that counts as binary logs, I would argue that on the local boxes it should stay text.

I would also agree that if you are running any sort of complex queries on your data, go to Logstash and do it there; it's much nicer than regexes.

If, on the other hand, you just want to see how a development environment is getting on, or to troubleshoot a known bad component, tail'ing to | grep (or just tail'ing, depending on the verbosity of your logs) is fine.

I don't have to remember some weird incantation to see the local logs, worry about corruption, etc.

One problem I will point out with the setup described is that syslog-ng can be blocking. If the user is disconnected from the central Logstash, and their local one dies, then as soon as the FIFO queue in syslog-ng fills, good luck writing to /dev/log, which means things like 'sudo' and 'login' have... issues.

Instead, if you have text files being written out, and something like Beaver collecting them and sending them to Logstash, you have the best of both worlds.
Spooky23 about 10 years ago

Windows has had binary logging forever. Is Windows administration some wonderland of awesome capability for getting intelligence out of logs? Hell no.

For administering Unix-like systems, the ability to use a variety of tools to process streams of text is an advantage and a valuable capability.

That said, your needs do change when you're talking about managing 10 vs. 10,000 vs. 100,000 hosts. I think what you're really seeing here is a movement to "industrialize" the operations of these systems and push capabilities from paid management tools into the OS.
indymike about 10 years ago

Grepping logs is terrible. Reverse engineering a binary format so you can diagnose why you are down/crashing/losing data is far worse. Logs should be handled as text until they reach their long-term storage... then whatever helps analyze and query is fine...
phn about 10 years ago

Yeah, in the presence of adequate tooling you don't need to grep logs. But how much more effort is required to use those tool-friendly logging setups? Where is your god when the tool fails?

For me the main reason to access plaintext logs is that they seldom fail, and they are simple. They may be a bore to analyse, but they CAN be analysed.

Anyway, this discussion only makes sense if the task at hand involves *heavy* log analysis; don't complicate what is simple when it isn't needed.

As for the razor analogy, you're right; however, I wouldn't change my beard to be "razor compatible only". In the software world I'd say it is still not uncommon to find yourself "stranded on a desert island".
laumars about 10 years ago

Oh jeez. Yes, there are better and more performant tools for parsing optimised binary databases; nobody disputes that. And yes, tools like Splunk are more user-friendly than grep; nobody disputes that either. But to advocate a binary-only system for logs is short-sighted, because logs are the go-to when everything else fails, and thus need to be readable when every other tool dies. There are quite a few scenarios that could cause this, too:

    * log file corruption - text parsing would still work,
    * tooling gets deleted - there's a million ways you can still render
      plain text even when you've lost half your POSIX/GNU userland,
    * network connection problems, breaking push to a centralised
      database - local text copies would still be readable.

In his previous blog post he commented that there's no point running both a local text version and a binary version, but since the entirety of his rant is really about tooling rather than log file format, I'm yet to see a convincing argument against running the two paradigms in parallel.
arpa about 10 years ago

This is a discussion for the sake of discussion. The way I see it, the author has a niche situation on his hands and therefore should use a product designed for that particular niche, instead of complaining that everyone's wrong and trying to shove his perspective down people's throats.
4ydx about 10 years ago

Sounds like somebody in the systemd camp. I really dislike added complexity when it is totally unnecessary. If people want to transform their logs into a different storage format, that is up to them. Text files, however, are a fantastically simple way of storing... (drumroll please) text. Surprising /s
robinhouston about 10 years ago

> For example: find all logs between 2013-12-24 and 2015-04-11, valid dates only.

That's a straw man. If you're grepping logs, you don't need a regular expression that matches only valid dates, because you can assume that the timestamps on the log records are valid dates. But I suppose

    2013-12-(2[4-9]|3.)|2014-..-..|2015-0([123]-..|4-(0.|1[01]))

doesn't look so bad.

The whole thing is similarly exaggerated.
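For what it's worth, that regex really does cover exactly the requested range when run over valid dates, per the commenter's assumption. A quick check (Python, for illustration):

    # Verify the range regex over every valid date, as the comment argues
    # you may assume for log timestamps.
    import re
    from datetime import date, timedelta

    pattern = re.compile(
        r"2013-12-(2[4-9]|3.)|2014-..-..|2015-0([123]-..|4-(0.|1[01]))")

    d, matched = date(2013, 1, 1), []
    while d <= date(2015, 12, 31):
        if pattern.match(d.isoformat()):
            matched.append(d)
        d += timedelta(days=1)

    assert matched[0] == date(2013, 12, 24)
    assert matched[-1] == date(2015, 4, 11)
    # No gaps: every valid date in the range matches, none outside it.
    assert len(matched) == (date(2015, 4, 11) - date(2013, 12, 24)).days + 1
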
erikb about 10 years ago

After reading the article, I wonder: if there are lots of tools that deliver all the binary advantages via indexes but leave the logs as text files, why is that not fine? To get the binary advantage, the log does not have to be binary.

The example with the timestamps is also strange. No matter how you store the timestamps, turning a humanly reasonable query like "give me 10 hours starting from last Friday 2am" into an actual filter is a complex problem. The problem is complex no matter how you store your timestamp. You can choose to do the complexity up front and create complex index structures. You can choose to have complex algorithms that parse simple timestamps in binary or text form, or you can build complex regexes. But something needs to be complex, because the problem space is. Just being binary doesn't help you.

And that's really the point here, isn't it? Just being binary is not in itself an advantage. It doesn't even mean, by itself, that it will save disk space. But text is in itself an advantage, always, because text can be read by humans without help (and in some instances without any training or IT education); binary cannot.

Yesterday I was thinking there might be something to binary logs. Now I'm convinced there isn't. The only disadvantage seems to be that you lose disk space if you store everything in clear text. But disk space isn't an issue in most situations (and in many situations where it is an issue, you might have resources and tools at hand to handle that as well). It is added complexity for no real advantage. Thanks for clearing that up.
4ydx about 10 years ago

My main problem with this is that ASCII is not something that will ever change over time. The data format is wonderfully static. Forever. Introduce a binary format? You get versioning. It is a major downside.
Frondo about 10 years ago

What you lose when you move away from text logs is not any real benefit; what you lose is the illusion of control you have with text logs.

Text logs can be corrupted, text logs can be made unusable, you need a ton of domain-specific knowledge to even begin to make sense of text logs, etc.

But there's always a sense that, if you had the time, you could still personally extract meaning from them. With binary logs, you couldn't personally sit there and read them out line by line.

The issue is psychology, not pragmatism, and that's why text logs have been so sticky for so long.
jack9 about 10 years ago

> Does the database store the data in text files? No? That's my point.

This guy is a first-class idiot who knows enough to reformulate a decided issue into yet another troll article. "a database (which then goes and stores the data in a binary format)". How about: a text file IS a database. It's encoded 1s and 0s in a universal format, unlike the binary DB format, which can be corrupted by the slightest modification or hardware failure.
KaiserPro about 10 years ago

I think there are a number of issues that are getting mushed into one:

* journald is just terrible.

* Some text logs are perfectly fine.

* When you are in rescue mode, you want text logs.

* Some people use text logs as a way to compile metrics.

I think the most annoying thing for me about journald is that it forces you to do things its way. However, it's optional, and in CentOS 7 it's turned off, or it's beaten into such a shape that I haven't noticed it's there... (If that is the case, I've not really bothered to look; I poked about to see if logs still live in /var/log/, they did, and that was the end of it. Yes, I know that if this is the case, I've just undermined my case. Shhhhh.)

/var/log/messages for kernel oopses, auth for logins, and all the traditional systemy things are good for text logs, mainly because 99.9% of the time you get less than 10 lines a minute.

Being able to sed, grep, tee, and pipe text files is brilliant on a slow connection with limited time/mental capacity, i.e. a rescue situation. I'm sure a multitude of stable tools will pop up to deal with a standardised binary log format, in about ten years.

The last point is the big kicker here. This is where, quite correctly, it's time to question the use of grep. Regex is terrible. It's a force/problem amplifier. If you get it correct, well done. Wrong? You might not even know.

Unless you don't have a choice, you need to make sure that your app kicks out metrics directly, or as close to directly as possible. Failing that, you need to use something like Elasticsearch. However, because you're getting the metrics as an afterthought, you have to do much more work to make sure they are correct. (Although forcing metrics into an app is often non-trivial.)

If you're starting from scratch, writing custom software, and think that log diving is a great way to collect metrics, you've failed.

If you are using off-the-shelf parts, it's worth spending the time interrogating the API to gather stats directly; you never know, collectd might have already done the hard work for you.

The basic argument he puts forth is this: text logs are a terrible way to interchange and store metrics. And yes, he is correct.
sika_grr about 10 years ago
Of course you need to log some data in textual format for emergencies, but if you had a tool that indexes events on timestamps, servers, monitorees, severity and event type, while severely reducing the storage required, you would be able to log much more data, and find problems faster. Arguing binary vs text logs is like arguing serial port vs USB on some industrial systems.

arenaninja about 10 years ago

Great to see some effort in this area. I've been using New Relic and it's pretty great for errors, because we've set up Slack/email notifications. However, there's nothing for general log (e.g. access log) parsing. I'm installing an ELK stack on my machine right now and hope that it's enough.
amelius about 10 years ago

Doesn't this just mean that we should have a more "intelligent" version of grep? For example, this "supergrep" could periodically index the files it is used on, so searching becomes faster.
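A minimal sketch of what such a "supergrep" might look like (Python; the name, the index layout, and the log path are all invented for illustration):

    # Build (and reuse) a token -> byte-offset index per file, so later
    # searches can seek straight to candidate lines instead of rescanning.
    import json
    import os
    import re
    from collections import defaultdict

    def build_index(path):
        index = defaultdict(list)
        with open(path, "rb") as f:
            offset = 0
            for line in f:
                for token in set(re.findall(rb"\w+", line)):
                    index[token.decode("utf-8", "replace")].append(offset)
                offset += len(line)
        with open(path + ".idx", "w") as f:
            json.dump(index, f)

    def supergrep(path, token):
        # Rebuild the index if the log changed since it was last indexed.
        idx = path + ".idx"
        if not os.path.exists(idx) or os.path.getmtime(idx) < os.path.getmtime(path):
            build_index(path)
        with open(idx) as f:
            index = json.load(f)
        with open(path, "rb") as f:
            for offset in index.get(token, []):
                f.seek(offset)
                yield f.readline().decode("utf-8", "replace")

    for hit in supergrep("app.log", "ERROR"):
        print(hit, end="")
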
erikb about 10 years ago

*Edit: I'm wrong, this was not the link posted yesterday: https://news.ycombinator.com/item?id=9496850
hartator about 10 years ago

Wouldn't everything be solved by some kind of grep that's date/timespan-aware?
lurkinggrue about 10 years ago
But then how will I watch the log files go by in real time?

deathanatos about 10 years ago

It seems to me that most of the worry about a binary log file being "opaque" could be solved with a single utility:

    log-cat <binary-log-file>

… that just outputs it as text. Then you can attack the problem with whatever text-based tools you want.

But to me, having a utility with which I could do things like get a range of log lines, in sorted order, or grep on just the message, would be amazing. These are all things that grep's proponents will, I'm sure, say you *can* do with grep… but you can't.

The dates example was a good one. I'd much rather:

    log-cat <bin-log> --from 2014-12-14 --to 2015-01-27

Also, my log files are not "sorted". They are, but they're sorted _per-process_, and I might have multiple instances of some daemon running (perhaps on this VM, perhaps across many VMs), and it's really useful to see their logs merged together[2]. For this, you need to understand the notion of where a record starts and ends, because you need to re-order whole records. (And log records' messages _are_ going to contain newlines. I'm not logging a backtrace on one line.) grep doesn't sort. |sort doesn't know enough about a text log to adequately sort, but

    $ log-cat logs/*.log --from 2014-12-14 --to 2015-01-27
    <sorted output!>

Binary files offer the opportunity for structured data. It's really annoying to try to find all 5xx's in a log when your grep matches the process ID, the line number, the time of day…

I've seen some well-meaning attempts at doing JSON logs, s.t. each line is a JSON object[1]. (I've also seen it attempted where all that is available is a rudimentary format string, and the first " breaks everything.)

Lastly, log files sometimes go into metrics (I don't really think this is a good idea, personally, but we need better libraries here too…). Is your log format even parseable? I've yet to run across one that had an unambiguous grammar: a newline in the middle of a log message, with the right text on the second line, can easily get picked up as a date, and suddenly it's a new record. Every log file "parser" I've seen was a heuristic matcher, and I've seen most of them make mistakes. With the simple "log-cat" above, you can instantly turn a binary log into a text one. The reverse, if possible at all, is likely to be a "best-effort" transformation.

[1]: The log writer is forbidden to output a newline inside the object. This doesn't diminish what you can output in JSON, and allows newline to be the record separator.

[2]: I get requests from mobile developers telling me that the server isn't acting correctly, all the time. In order to debug the situation, I first need to _find_ their request in the log. I don't know what process on what VM handled their request, but I often have a _very_ narrow time range that it occurred in.
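A sketch of such a "log-cat" over the JSON-lines scheme from footnote [1] (Python; the field names "ts" and "msg" and the file paths are assumptions for illustration): it merges many per-process logs into one timestamp-sorted stream and filters by day range.

    import glob
    import heapq
    import json

    def records(path):
        # One JSON object per line; writers never emit a raw newline inside
        # a record, so newline remains the record separator.
        with open(path) as f:
            for line in f:
                yield json.loads(line)

    def log_cat(paths, start_day, end_day):
        # Each file is sorted per-process; heapq.merge re-orders whole
        # records across files by timestamp, which "| sort" on raw text
        # cannot do safely.
        streams = (records(p) for p in paths)
        for rec in heapq.merge(*streams, key=lambda r: r["ts"]):
            if start_day <= rec["ts"][:10] <= end_day:
                # Multi-line messages (backtraces) survive intact: their
                # newlines were JSON-escaped, not record separators.
                print(rec["ts"], rec["msg"].replace("\n", "\n    "))

    log_cat(glob.glob("logs/*.log"), "2014-12-14", "2015-01-27")
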
imaginenore about 10 years ago
Logstash, Kibana, Splunk

geographomics about 10 years ago

Windows systems have had better log-querying tools than grep for years now, with a well-structured log file format to match. It's good to see Linux distributions finally catching up in this regard.

Not that the log files on Linux are all entirely text-based anyway. The wtmpx and btmpx files are in a binary format, with specialised tools for querying. I don't see anyone complaining about these and insisting that they be converted to a text-only format.