<i>"Imagine having to go through 2.5GB of log entries from a failed software build — 3 million lines — to search for a bug or a regression that happened on line 1M."</i><p>If your build log files approach gigabytes in size, what you're going to need to do is <i>not</i> search for that bug. You're going to need to rethink your career and life choices. You're going to need to take a vacation, do some traveling, get some perspective. Figure out where it all went wrong and what to make of the years that remain. Life is short.
It would be really interesting to learn more about their domain. Build logs are something you control and should be able to keep clean and actionable. Service or batch job logs could present a more difficult problem, though. It would be useful to be able to spot a key message halfway through a failed 20-minute process that is the reason you hit an error at the end. Again, though, logging levels and grep should usually be enough. You can also flag excessively noisy code and improve the signal-to-noise ratio of the output it produces.<p>Specifically, with a build system like Bazel you might execute 1,000,000 actions in a build, producing lots of <i>internal</i> output, but you only see the errors, and at most you have a few hundred lines to look through. That is managed in a few key ways:
<p>- Test output is completely hidden unless that specific test fails<p>- Build actions only produce output when they have an error, and it’s easy to keep it that way (because they stand out, and can be quickly fixed)<p>- Bazel does not tell you about <i>every</i> action that is run, only the ones that fail or take a long time
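<p>For anyone not on Bazel, the "stay quiet unless something fails" idea is easy to approximate in a wrapper script. This is only a rough sketch, not how Bazel itself does it: `run_quiet` and `run_all` are names made up for illustration, and the two commands at the bottom are stand-ins for real build or test steps.

```python
import subprocess
import sys


def run_quiet(cmd):
    """Run a command, capturing stdout and stderr together.

    Returns (ok, output). Nothing is printed here; the caller decides
    whether the captured output is worth showing.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return result.returncode == 0, result.stdout


def run_all(actions):
    """Run (name, cmd) pairs and return report entries for failures only."""
    report = []
    for name, cmd in actions:
        ok, output = run_quiet(cmd)
        if not ok:
            # Output of successful actions is dropped on purpose.
            report.append(f"FAILED: {name}\n{output}")
    return report


if __name__ == "__main__":
    # Hypothetical actions: one chatty success, one failure.
    actions = [
        ("noisy_but_fine", [sys.executable, "-c", "print('lots of chatter')"]),
        ("broken", [sys.executable, "-c", "import sys; print('boom'); sys.exit(1)"]),
    ]
    for entry in run_all(actions):
        print(entry)
```

With this shape, a run with a million successful steps and one failure leaves you with one entry to read, which is the property the list above describes.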
Hey folks, I am one of the authors of the article. Seeing an interesting conversation here.
Wanted to clarify a few points:
Our logs in general, and build logs produced by Jenkins in particular, are all over the place. The usual suspects like Java build logs can be grepped or tailed, but the complicated use cases that actually produce anything from 10-20 megabytes to gigabytes of console output leave very little room to investigate, and we have lots and lots of these.
I am happy to answer any questions here.
Friendly reminder: if you can grep, then grep. If you can tail, then tail. If you can diff, then diff. This is an idea to try for all the cases left over.
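To make the "if you can diff, then diff" case concrete, here is a minimal sketch of diffing a failed log against a known-good one after masking timestamps, so that only genuinely new lines remain. It is an illustration, not anything from the article: the timestamp pattern and the `new_lines_in_failed` name are assumptions, and `difflib.SequenceMatcher` can be slow on long inputs, so this suits modest logs rather than multi-gigabyte ones.

```python
import difflib
import re

# Assumed timestamp shape (e.g. "2024-01-01 10:00:00"); adjust to your logs.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?")


def normalize(line):
    """Mask timestamps so otherwise identical lines compare equal."""
    return TIMESTAMP.sub("<ts>", line.rstrip("\n"))


def new_lines_in_failed(good_lines, bad_lines):
    """Return normalized lines present in the failed log but not the good one."""
    good = [normalize(line) for line in good_lines]
    bad = [normalize(line) for line in bad_lines]
    matcher = difflib.SequenceMatcher(a=good, b=bad, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both mean the failed log has lines here
        # that the good log does not.
        if tag in ("insert", "replace"):
            added.extend(bad[j1:j2])
    return added


if __name__ == "__main__":
    good = [
        "2024-01-01 10:00:00 start",
        "2024-01-01 10:00:01 compile ok",
        "2024-01-01 10:00:02 done",
    ]
    bad = [
        "2024-02-02 11:00:00 start",
        "2024-02-02 11:00:05 compile ok",
        "2024-02-02 11:00:06 ERROR: disk full",
        "2024-02-02 11:00:07 done",
    ]
    for line in new_lines_in_failed(good, bad):
        print(line)
```

When the output varies too much between runs for this kind of masking to line things up, you are in the "cases left over" territory.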
Not sure I understand the problem they are solving. Aren’t errors in logs something that can just be grepped for?<p>(Also: really cool illustrations. Very unexpected in a tech blog.)