科技回声

8 comments

aaronbwebber, about 3 years ago

An important step that is missing here is evaluating whether your fix is going to cause other, potentially worse problems. I suspect that in this case, it's fairly unlikely that increasing the maximum POST body size to 60 MB is going to cause problems - eyeballing that Sendgrid chart, it looks like we are not dealing with very high throughput here. But it's not hard to imagine a situation where tripling the max POST body size would result in a large increase in server memory usage, which could result in things like OOM kills, which could result in a lot of people not getting their reply emails or whatever.

So don't just rush a fix out. Think about what the effects of a configuration change like this might be, and whether you are just making more problems for yourself down the line by trying to fix something quickly.
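The memory concern above can be put in rough numbers. This is only a back-of-the-envelope sketch: it assumes every in-flight request carries a maximum-size body held fully in memory, which many servers avoid by spooling large bodies to disk. The function name and the figure of 50 concurrent requests are hypothetical, and the 20 MB starting limit is inferred from "tripling" to 60 MB.

```shell
# Rough upper bound on memory used for request bodies, in MB.
# Assumption: each concurrent request holds a full max-size body in memory.
worst_case_body_mb() {
  concurrent_requests="$1"   # hypothetical number of simultaneous uploads
  max_body_mb="$2"           # configured maximum POST body size in MB
  echo $(( concurrent_requests * max_body_mb ))
}

worst_case_body_mb 50 20   # inferred old limit: 20 MB
worst_case_body_mb 50 60   # new limit from the comment: 60 MB
```

Comparing the two results against the memory actually available on the host gives a quick sanity check before the change is rolled out.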

mtippett, about 3 years ago

I agree with most of what is suggested in the article.

However, a big part that is missing is the reality that there is a set of hypotheses (is that right?) in play at any point in time. A lot of debugging is the cycle of:

1. Think about the system and gather any available data - you can't boil the ocean.
2. Consider a set of hypotheses about the possible cause (even if it is a partial cause).
3. Seek any method to either refute or confirm the possible cause, which gives more data.

Wash, rinse, repeat. Each cycle will likely get closer to the problem. Each cycle is also likely to find other tech debt that needs to be solved.

Rarely is there a single hypothesis that is right the first time, although an experienced person will prune out a lot of poor ideas automatically, and likely subconsciously.

Observability goes a long way toward getting the data needed to confirm or refute.

notaspecialist, about 3 years ago

When a user comes over and says "this isn't happening", I write a test and, sure enough, the test fails. I fix the case, re-run all the tests, push to UAT, and ask the user to verify that it works in the UAT system. It's pushed into production after hours.

Prior to TDD I would spend hours stepping through code, setting variables to replicate the scenario, scratching my head, and usually fix it after a week or so. Then I would get a bug report of something else weird happening, and repeat that process.

chaps, about 3 years ago

Here's how I do it:

    xargs -I'hostname' -a hosts.txt -P128 bash -c "ssh 'hostname' find / -type f -mmin -20 | xargs -P128 -Ifilename grep -cHia error filename 2>/dev/null | sed 's/^/hostname:/' ; :" | sort -nrk3 -t':'

rmbyrro, about 3 years ago

If there's an issue receiving emails, an endpoint /email/receive/, and nginx log files, I would have promptly searched those logs for "[error] * /email/receive/"
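A search along those lines could look like the sketch below. It assumes the default error log path `/var/log/nginx/error.log`, which varies by install, and the function name is hypothetical. The `*` in the comment's pattern is read here as "anything in between", so the sketch simply chains two fixed-string matches.

```shell
# Print nginx error-log lines that are errors for the /email/receive/ endpoint.
# Assumption: the log path below is a common default, not taken from the article.
search_receive_errors() {
  log_file="${1:-/var/log/nginx/error.log}"
  # -F treats the patterns as literal strings, so "[error]" is not parsed as a regex class.
  grep -F '[error]' "$log_file" | grep -F '/email/receive/'
}
```

Calling `search_receive_errors /path/to/error.log` prints only the matching lines, which narrows the search before reading the log by hand.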

ricardobayes, about 3 years ago

Lately the only technical question we ask when hiring is to debug an issue. Experience in this is really difficult to fake, unlike memorizing leetcode problems etc.

ge96, about 3 years ago

Random thoughts about this subject:

- It sucks when your bug completely blows your project up (type error, blank page).
- I'm tempted to track every click/event and log it for reproducibility.
- It sucks when your product fails not because of a bug but just because people don't know how to use it (a training issue, I guess), e.g. permissions not accepted, "why isn't it working?"

invalidname, about 3 years ago

I'm very much in favor of this, but his observability stack is seriously lacking. With Developer Observability tools this is much easier and more powerful: https://www.youtube.com/watch?v=k0DPO5jlZtU

How I hunt down and fix errors in production
