Using GPT-3 for plain language incident root cause from logs

106 points by stochastimus over 4 years ago

8 comments

wzdd over 4 years ago
It's nice to get the textual description, but pretty much every specific detail of the extended explanation teased out at the end is more or less incorrect while nonetheless sounding very believable. In essence, what happened at the end was that GPT-3 was asked to write an OOM-killer-inspired story. I think this should be a cautionary tale against trying to use GPT-3 to provide commentary beyond a high-level summary.

This isn't a slight against the short-summary technique, which seems very cool.

Details: oom_adj isn't a flag, it's an int which can disable OOM on a per-process-leader basis but can also be used to reduce the "badness" of a process when considering what to kill. oom_adj is also deprecated and has been replaced by oom_score_adj. The OOM algorithm isn't called RSS. It doesn't seem to have been explicitly named, but the function which performs the key calculation is named oom_badness. This function assigns an integer "badness" to each process. A process's resident set size *is* an important part of calculating badness, but it's affected by several other factors (what they are depends on kernel version, but they include the adjustment parameter). RSS is not (part of) the OOM calculation "by default" -- it's always included unless OOM is disabled entirely. RSS isn't a comparison of reserved physical memory against current virtual size; it's just the amount of RAM currently occupied by a process (i.e. not in swap or on disk). The OOM killer doesn't compare RSS against virtual size. RSS doesn't trigger the OOM killer. RSS isn't an algorithm.

Another interesting aspect of this, of course, is that GPT-3 likely wasn't trained on any single kernel version, but on a large number of versions depending on which part of the Internet it happened to be reading. This means it probably can't give a good account of any single version of fast-changing parts of the kernel like the OOM killer.

Source: https://github.com/torvalds/linux/blob/master/mm/oom_kill.c
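For reference, the per-process inputs described above are all exposed under /proc on Linux: oom_score holds the current badness as computed by oom_badness, oom_score_adj holds the user adjustment, and VmRSS in status is the resident set size. A minimal sketch for inspecting them, assuming standard /proc interfaces:

    # Inspect the OOM-killer inputs for a process on Linux.
    # Assumes standard /proc interfaces: oom_score, oom_score_adj, status.
    import sys

    def oom_info(pid: int) -> dict:
        base = f"/proc/{pid}"
        info = {}
        # Current "badness" as computed by oom_badness (higher = killed first).
        with open(f"{base}/oom_score") as f:
            info["oom_score"] = int(f.read())
        # User-settable adjustment, -1000 (never kill) .. 1000 (prefer to kill);
        # this is the interface that replaced the deprecated oom_adj.
        with open(f"{base}/oom_score_adj") as f:
            info["oom_score_adj"] = int(f.read())
        # Resident set size: RAM the process currently occupies (not swap/disk).
        with open(f"{base}/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    info["vm_rss_kb"] = int(line.split()[1])  # reported in kB
                    break
        return info

    if __name__ == "__main__":
        pid = int(sys.argv[1]) if len(sys.argv) > 1 else 1
        print(oom_info(pid))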
bbu over 4 years ago
This is pretty cool! However, these two samples are very simple to solve. I'd love an "AI" to find root causes for problems that are not obvious. Just throw the whole log collection at it and let it solve all the issues. One can dream ;)
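The short-summary technique being discussed boils down to a completion prompt over a log excerpt. A minimal sketch, assuming the 2021-era OpenAI Python client; the log excerpt and prompt wording are illustrative, not taken from the post:

    # Sketch of the prompt-based log-summarization technique under discussion.
    # Assumes the 2021-era OpenAI Python library (openai.Completion).
    import openai

    openai.api_key = "sk-..."  # placeholder; set your own API key

    LOG_EXCERPT = (
        "kernel: Out of memory: Kill process 4321 (java) "
        "score 912 or sacrifice child\n"
        "kernel: Killed process 4321 (java) total-vm:18563072kB, "
        "anon-rss:14327148kB\n"
    )

    prompt = (
        "The following is an excerpt from a server log:\n\n"
        + LOG_EXCERPT
        + "\nAn expert explained the root cause in plain language:"
    )

    resp = openai.Completion.create(
        engine="davinci",   # GPT-3 base model available at the time
        prompt=prompt,
        max_tokens=80,
        temperature=0.3,    # low temperature keeps the completion close to the facts
        stop=["\n\n"],
    )
    print(resp.choices[0].text.strip())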
mckirk over 4 years ago
That's cool and all, but I'm pretty sure what we really want to see is

"The expert described what had happened, in the form of a Haiku:"
ativzzz over 4 years ago
So what do you do when GPT generates nonsense? Because, at least in my experiments, it will sometimes create something that is irrelevant or just plain wrong and requires human intervention. In other words, what is an acceptable failure rate for these summaries you generate?
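One cheap mitigation, sketched below as an illustrative heuristic rather than anything from the post: treat a summary as suspect when it names numbers or identifiers that never appear in the source log, and route those cases to a human.

    # Sketch of a simple grounding check: a summary that mentions specific
    # tokens (numbers, paths, identifiers) absent from the source log is
    # flagged for human review. Heuristic regexes are illustrative.
    import re

    def ungrounded_tokens(log_text: str, summary: str) -> set[str]:
        # "Specific" tokens: anything containing a digit, or snake_case names.
        specific = re.findall(r"[\w/.\-]*\d[\w/.\-]*|\w+_\w+", summary)
        log_lower = log_text.lower()
        return {t for t in specific if t.lower() not in log_lower}

    def needs_review(log_text: str, summary: str) -> bool:
        return bool(ungrounded_tokens(log_text, summary))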
brianjunyinchan over 4 years ago
Super interesting. I wonder what other latent domain-specific intelligence GPT-3 picked up during training that is parseable with text in and text out. Like a flashcard generator?
EQVEYWDCHQ over 4 years ago
This is interesting - I worked on a similar use case by parsing and tokenizing ZooKeeper logs, then converting logs to integer sequences and trying to determine whether or not services were going to experience a fault by training on said sequences, and thus determining what the cause of the fault was/would be. Wasn't too successful but definitely showed me how difficult it can be to work backwards from logs to root cause, esp. with limited data.
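The front end of that kind of pipeline might look like the sketch below; the templating regexes and grow-on-first-sight vocabulary are illustrative choices, not the commenter's actual code.

    # Sketch of turning raw log lines into integer sequences for a sequence
    # model: collapse variable fields into templates, then index the templates.
    import re

    def template(line: str) -> str:
        # Collapse variable fields so lines map to a manageable vocabulary.
        line = re.sub(r"\d+\.\d+\.\d+\.\d+", "<IP>", line)  # IP addresses
        line = re.sub(r"\d+", "<NUM>", line)                # other numbers
        return line.strip()

    def encode(lines: list[str], vocab: dict[str, int]) -> list[int]:
        seq = []
        for line in lines:
            key = template(line)
            if key not in vocab:
                vocab[key] = len(vocab)  # grow vocabulary on first sight
            seq.append(vocab[key])
        return seq

    vocab: dict[str, int] = {}
    logs = [
        "2021-01-12 10:03:01 session 0x17 expired for client 10.0.0.5:2181",
        "2021-01-12 10:03:02 session 0x18 expired for client 10.0.0.6:2181",
    ]
    print(encode(logs, vocab))  # both lines share one template -> [0, 0]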
king_magic over 4 years ago
I'm fairly bearish on GPT-3, but this is actually a pretty cool application.
jacques_chester over 4 years ago
Is there a reason I'd use this approach over a process mining / log mining system? I feel like it needs me to guess the right question to get an answer.