CrowdStrike's outage should not have happened

49 points by b-man, 10 months ago

16 comments

beardedwizard, 10 months ago
This outage says more to me about the state of software engineering in 2024 than it does about CrowdStrike. That starts with the fact that software like CrowdStrike exists to compensate for even poorer software rife with exploitable vulnerabilities. It is certainly hard to defend CrowdStrike, but it is even harder to hear so many hot takes when the engineer emperor has no clothes.

Critical software engineering is a race to the bottom across many domains: healthcare, banking, flight systems, etc.

ethbr1, 10 months ago
>> *No staged deployment {changing to} Add staged deployment*

That's the thing that amazed me.

How do you *regularly* YOLO patches worldwide to something that runs with enough permissions to crash a system?

I don't care if this was a configuration update vs a new sensor capability -- universal rollout should never have been allowed by CrowdStrike's release team.
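
For illustration only, here is a minimal sketch of what a staged rollout gate might look like. Everything in it is hypothetical (`Stage`, `RolloutResult`, `staged_rollout`, the `deploy` and `is_healthy` callbacks, the 1% failure threshold); nothing here describes CrowdStrike's actual release tooling.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Stage:
    name: str
    fraction: float  # cumulative share of the fleet updated once this stage finishes


@dataclass
class RolloutResult:
    completed: bool
    halted_at: Optional[str]  # name of the stage that tripped the gate, if any
    hosts_updated: int


def staged_rollout(
    hosts: Sequence[str],
    stages: Sequence[Stage],
    deploy: Callable[[str], None],      # hypothetical: pushes the update to one host
    is_healthy: Callable[[str], bool],  # hypothetical: post-update health probe
    max_failure_rate: float = 0.01,     # assumed threshold, purely illustrative
) -> RolloutResult:
    """Push an update in growing slices and stop as soon as one slice looks bad."""
    updated = 0
    for stage in stages:
        target = min(len(hosts), round(len(hosts) * stage.fraction))
        batch = hosts[updated:target]
        if not batch:
            continue
        for host in batch:
            deploy(host)
        updated = target
        failures = sum(1 for host in batch if not is_healthy(host))
        if failures / len(batch) > max_failure_rate:
            # Halt here: the hosts in later stages never receive the update.
            return RolloutResult(False, stage.name, updated)
    return RolloutResult(True, None, updated)
```

With a fleet of 1000 hosts and stages at 1%, 10% and 100%, an update that breaks every host it touches would stop after the first 10 hosts instead of reaching all 1000.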

eugenekolo, 10 months ago
The sentiment might hold some merit, but this article is 80% copy-pasted from an RCA report, plus 2 sentences saying nothing more than "This shouldn't happen" while offering no alternative or deep thought about improvement...

dwheeler, 10 months ago
The CrowdStrike report explains why it crashed, but *not* how it passed final end-to-end testing. There appear to have been many tests of piece parts (unit testing), but that's not the same as testing the full system.

I would think *any* end-to-end test of the full system would have instantly detected the problem and prevented it, because the update would have failed every one of them.

Did I miss something? Did they never test the complete system as deployed? Looks like it, but maybe I misunderstood something.

greenthrow, 10 months ago
Extremely low quality post by the submitter. Yes, these shouldn't happen, but software engineers -- so far -- are all human. It's more useful to talk about the ways this could be mitigated than to just post a few sentences repeating that it shouldn't happen.

siliconc0w, 10 months ago
What commonly happens in these organizations is that they have a software delivery path with a lot of these best practices, but soon people figure out that it is too slow, so they invent a new, faster path. From what I can tell, CrowdStrike had a lot of the usual best practices, like canary rollouts, on their binary, but not on this configuration file, despite it having the same consequences as a bad binary push. This wasn't even an edge case; it reliably BSOD'd every Windows machine that got this update.

One strategy Google SRE uses is that the team ensuring reliability has a different reporting path than the product team, so there is always a check and balance when things like rollout policies get worked around by clever product teams.

It's a shame, because I hear it's actually a pretty good product.

zamadatix, 10 months ago
What is whooshing over my head about "Figure 1"?

satisfice, 10 months ago
I’m not concerned about the technical solutions. Any technical solution has to be implemented by people.

The thing not mentioned in CrowdStrike’s report is anything about people, especially management. Bad management and understaffed teams will defeat any technical solution, any day.

halayli, 10 months ago
> Multiple engineers identified the issue via analysis of stack dumps as being triggered by a null pointer bug in the C++ the Crowdstrike update was written in; it appears to have tried to call an invalid region of memory that results in a process getting immediately killed by Windows, but that take looked increasingly controversial and Crowdstrike itself said that the incident was not due to "null bytes contained within Channel File 291 [the update that triggered the crashes] or any other Channel File."

randerson, 10 months ago
Nation-state hackers of the world must *love* the idea of a supply chain that pushes out immediate untested updates to half the US Fortune 500, to be processed by a C++ kernel driver. If CrowdStrike's goal is to secure companies at scale, they could easily be doing the opposite.

luxuryballs, 10 months ago
I still think it was intentional: someone activated the CrowdStrike feature that was purchased by the DoD.

Maybe people with inside knowledge of recent events were trying to make an exit, so they had to smash the glass and hit the red button to stop air travel so they could snag them in time?

Making it a perfect update failure is clever enough, but the name of the product is the best part. Imagine a system that can stop breaches even after they occur ;)

Dwedit, 10 months ago
So you have an "Index Out Of Bounds" problem. It could either directly lead to reading out-of-bounds memory and generating an Access Violation exception, or you could see the out-of-bounds array access and throw an exception.

Either way, you've got a kernel-mode exception that isn't being caught, and that's a BSOD.
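
As a rough sketch of a third option (validate the indexes up front and reject the update rather than letting anything propagate), something like the following could work. All names are hypothetical (`ContentError`, `validate_rule`, `evaluate_rule`, `load_update`), the rule format is invented for illustration, and a real kernel driver would of course not be written in Python.

```python
from typing import List, Optional, Sequence


class ContentError(Exception):
    """Raised when an update references an input the sensor does not supply."""


def validate_rule(field_indexes: Sequence[int], inputs: Sequence[str]) -> None:
    # Explicit range check: Python would silently wrap a negative index, and an
    # unchecked C++ array access could read memory past the end of the array.
    for index in field_indexes:
        if not 0 <= index < len(inputs):
            raise ContentError(
                f"rule reads field {index}, but only {len(inputs)} inputs were supplied"
            )


def evaluate_rule(field_indexes: Sequence[int], inputs: Sequence[str]) -> List[str]:
    """Return the referenced fields, after checking every index is in range."""
    validate_rule(field_indexes, inputs)
    return [inputs[index] for index in field_indexes]


def load_update(field_indexes: Sequence[int], inputs: Sequence[str]) -> Optional[List[str]]:
    """Reject a bad update instead of letting the error escape uncaught."""
    try:
        return evaluate_rule(field_indexes, inputs)
    except ContentError:
        return None  # caller keeps running the previous known-good content
```

The design choice here is that a malformed update degrades to "no update" rather than taking the host down with it.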

insane_dreamer, 10 months ago
That's quite a list of problems with that update; it wasn't just a single bug that slipped through the cracks.

999900000999, 10 months ago
CrowdStrike outsourced their SDET positions to save a buck.

This is what happens. Stop skimping on QA.

jokoon, 10 months ago
I'm not a fan of Rust, but if Microsoft required that this sort of critical software be written in Rust, it would be a good thing.

Anything that is doing something sensitive or critical that can crash the system should be written in Rust.

If not, insurance companies would be mandated by law to run static analysis on such C++ code.

echelon, 10 months ago
There is *no reason* not to use Rust for these systems anymore. It's why the US Government is pushing so much for Rust adoption.

We're going to keep seeing these horror stories until C/C++ go away.