How I hunt down and fix errors in production

64 points by The_Amp_Walrus, about 3 years ago

8 comments

aaronbwebber, about 3 years ago
An important step missing here is evaluating whether your fix is going to cause other, potentially worse problems. I suspect that in this case it's fairly unlikely that increasing the maximum POST body size to 60 MB is going to cause problems: eyeballing that Sendgrid chart, it looks like we are not dealing with very high throughput here. But it's not hard to imagine a situation where tripling the max POST body size would result in a large increase in server memory usage, which could result in things like OOM kills, which could result in a lot of people not getting their reply emails or whatever.

So don't just rush a fix out. Think about what the effects of a configuration change like this might be, and whether you are just making more problems for yourself down the line by trying to fix something quickly.
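
A rough way to sanity-check the memory concern above is a worst-case estimate. This is only a sketch: it assumes every in-flight request carries a body at the limit and that bodies are held fully in memory, which may not hold if the server spools large bodies to disk. The 20 MB "before" value follows from "tripling" to 60 MB; the concurrency figure of 8 is hypothetical.

```python
def worst_case_body_memory_mb(max_body_mb: int, concurrent_requests: int) -> int:
    """Upper bound on memory held by request bodies.

    Assumes every in-flight request is at the size limit and is
    buffered entirely in memory (a deliberately pessimistic model).
    """
    return max_body_mb * concurrent_requests


# Hypothetical concurrency: 8 requests handled at once.
before = worst_case_body_memory_mb(20, 8)
after = worst_case_body_memory_mb(60, 8)
print(f"worst case before: {before} MB, after: {after} MB")
```

If the "after" number is close to the memory available on the host, the OOM-kill scenario is worth taking seriously before shipping the change.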
mtippett, about 3 years ago
I agree with most of what is suggested in the article.

However, a big part that is missing is the reality that there is a set of hypotheses in play at any point in time. A lot of debugging is the cycle of:

1. Think about the system and gather any available data (you can't boil the ocean).
2. Consider a set of hypotheses about the possible cause (even if it is a partial cause).
3. Seek any method to either refute or confirm the possible cause, which gives more data.

Wash, rinse, repeat. Each cycle will likely get closer to the problem.

Each cycle is also likely to find other tech debt that needs to be solved.

Rarely is there a single hypothesis that is right the first time, although an experienced person will prune out a lot of poor ideas automatically, and likely subconsciously.

Observability goes a long way toward getting the data needed to confirm or refute.
notaspecialist, about 3 years ago
When a user comes over and says "this isn't happening", I write a test and, sure enough, the test fails. I fix the case, re-run all the tests, push to UAT, and ask the user to verify it works in the UAT system. It's pushed into production after hours.

Prior to TDD I would spend hours stepping through code, setting variables to replicate the scenario, scratching my head, and usually fix it after a week or so. Then I would get a bug report of something else weird happening. And repeat that process.
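
The "write a failing test first" workflow described above might look roughly like this. It is a minimal sketch, not the commenter's code: `accepts_body` and `MAX_BODY_BYTES` are hypothetical names, and the scenario (a 30 MB body rejected under an old 20 MB limit) is borrowed from the body-size example elsewhere in this thread.

```python
import unittest

# Hypothetical limit after the fix; the old value would have been 20 MB.
MAX_BODY_BYTES = 60 * 1024 * 1024


def accepts_body(size_bytes: int, limit: int = MAX_BODY_BYTES) -> bool:
    """Hypothetical check: True if a request body of this size is accepted."""
    return 0 <= size_bytes <= limit


class ReportedBugTest(unittest.TestCase):
    def test_reported_30mb_body_is_accepted(self):
        # Encodes the user's report. With the old 20 MB limit this fails,
        # which confirms the bug before any fix is attempted.
        self.assertTrue(accepts_body(30 * 1024 * 1024))

    def test_oversized_body_is_still_rejected(self):
        # Guards against "fixing" the bug by removing the limit entirely.
        self.assertFalse(accepts_body(61 * 1024 * 1024))
```

Running the whole suite (for example with `python -m unittest`) after the fix is what catches the "something else weird" regressions before they reach UAT.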
chaps, about 3 years ago
Here's how I do it:

```shell
xargs -I'hostname' -a hosts.txt -P128 bash -c "ssh 'hostname' find / -type f -mmin -20 | xargs -P128 -Ifilename grep -cHia error filename 2>/dev/null | sed 's/^/hostname:/' ; :" | sort -nrk3 -t':'
```
rmbyrro, about 3 years ago
If there's an issue receiving emails, and there's an endpoint /email/receive/ and nginx log files, I would have promptly searched these logs for "[error] * /email/receive/"
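
As a sketch of that search, the snippet below filters log lines that are tagged `[error]` and mention the endpoint. The sample lines are made up for illustration and only approximate the shape of an nginx error log; `find_endpoint_errors` is a hypothetical helper name, and in practice the lines would come from the real log file.

```python
import re


def find_endpoint_errors(lines, path="/email/receive/"):
    """Return lines that contain an [error] tag followed by the given path."""
    pattern = re.compile(r"\[error\].*" + re.escape(path))
    return [line for line in lines if pattern.search(line)]


# Illustrative, made-up log lines (not taken from the article).
sample_lines = [
    '2022/05/04 10:00:00 [error] 1234#1234: *5678 client intended to send '
    'too large body: 31457280 bytes, request: "POST /email/receive/ HTTP/1.1"',
    '2022/05/04 10:00:05 [warn] 1234#1234: *5679 slow upstream response, '
    'request: "POST /email/receive/ HTTP/1.1"',
    '2022/05/04 10:00:09 [error] 1234#1234: *5680 file not found, '
    'request: "GET /favicon.ico HTTP/1.1"',
]

matches = find_endpoint_errors(sample_lines)
for line in matches:
    print(line)
```

Only the first sample line matches: the second is a warning and the third is an error on a different path.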
ricardobayes, about 3 years ago
Lately the only technical question we ask when hiring is to debug an issue. Experience in this is really difficult to fake, unlike memorizing leetcode problems etc.
ge96, about 3 years ago
random thoughts about this subject

- sucks when your bug completely blows your project up (type error, blank page)

- I'm tempted to track every click/event and log it for reproducibility

- sucks when your product fails not because of a bug but just people not knowing how to use it (training issue I guess), e.g. permissions not accepted, why isn't it working?
invalidname, about 3 years ago
I'm very much in favor of this, but his observability stack is seriously lacking. With Developer Observability tools this is much easier and more powerful: https://www.youtube.com/watch?v=k0DPO5jlZtU