Teaching a new way to prevent outages at Google

106 points by motxilo, 2 months ago

13 comments

hinkley, 2 months ago
> In one particular case at Google, a software controller–acting on bad feedback from another software system–determined that it should issue an unsafe control action. It scheduled this action to happen after 30 days. Even though there were indicators that this unsafe action was going to occur, no software engineers–humans–were actually monitoring the indicators. So, after 30 days, the unsafe control action occurred, resulting in an outage.

Isn't this the time they accidentally deleted governmental databases? I love the attempt at blameless generalization, but wow.
mimikatz, 2 months ago
Thanks to all the people here pointing out how bloated, overly broad and useless this is. I went to read it thinking I would pick up something applicable, but it was written in such an overwrought, humanless style that I gave up having learned nothing and thought the problem was me. I am glad to learn I am not alone.
smcameron, 2 months ago
> "The class itself is very well structured. I've heard about STPA in past years, but this was the first time I saw it explained with concrete examples. The Google example at the end was also really helpful."

But the article itself contains no concrete examples.
irjustin, 2 months ago
I don't understand and I really really want to.

This seems so cool at a scale that I can't fathom. Tell me specifically how it's done at Google with regard to a specific service, with at least enough information to understand what's going on. Make it concrete. Like "B lacks feedback from C": why is this bad?

You've told me absolutely nothing and it makes me angry.
snorkel, 2 months ago
In other words, STPA is a design review framework for finding some less obvious failure modes. FMEA is more popular but relies on making a list of all of the knowable failure modes in a system, so the failure modes you haven’t thought of don’t make it onto the list. STPA helps fill in some of those gaps.
primitivesuave, 2 months ago
This would have been a lot more compelling had they provided a single real-world example of STPA actually solving a reliability issue at Google.
MinelloGiacomo, 2 months ago
STAMP/STPA work well as a model and methodology for complex systems. I was interested in them a while ago in the context of cyber risk quantification. Having a fairly easy model to reason about unsafe control actions is not a given in other approaches. I just wish they were adopted by more companies; I have seen too many of them stuck with ERM-based frameworks that do not make sense most of the time when scaled down to system-level granularity.
dooglius, 2 months ago
> After working with the system experts to build this control structure, we immediately noticed missing feedback from controller C to controller B–in other words, controller B did not have enough information to support the decisions it needed to make

There is a feedback loop through D? And why does the same issue not apply to the missing directed edge from B to D?

EDIT: I figured it out on a reread: the vertical up/down orientation matters for whether an edge represents control vs feedback, so B is merely not controlling D, which is fine. But if B is only controlling C as a way to get through to D (which is what I would have guessed, absent other information), what's the issue with that?
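
For readers trying to picture the "missing feedback from C to B" check, here is a minimal sketch in Python. It assumes a control structure can be modeled as two sets of directed edges (control going down, feedback going up) and flags any control edge with no direct feedback edge back to the controller. The `ControlStructure` class, its method names, and the B/C/D topology are all hypothetical illustrations; they are not taken from the article's diagram or from any Google tooling.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ControlStructure:
    """Hypothetical model: control edges point down, feedback edges point up."""

    # (controller, controlled) pairs
    control: set[tuple[str, str]] = field(default_factory=set)
    # (source, recipient) pairs
    feedback: set[tuple[str, str]] = field(default_factory=set)

    def add_control(self, controller: str, controlled: str) -> None:
        self.control.add((controller, controlled))

    def add_feedback(self, source: str, recipient: str) -> None:
        self.feedback.add((source, recipient))

    def missing_feedback(self) -> list[tuple[str, str]]:
        """Return control edges whose controlled process sends no direct
        feedback to its controller, sorted for stable output."""
        return sorted(
            (controller, controlled)
            for controller, controlled in self.control
            if (controlled, controller) not in self.feedback
        )


# Assumed topology for illustration only: B controls C, C controls D.
structure = ControlStructure()
structure.add_control("B", "C")
structure.add_control("C", "D")
structure.add_feedback("D", "C")  # C hears back from D
structure.add_feedback("D", "B")  # B hears from D, but never from C

for controller, controlled in structure.missing_feedback():
    print(f"{controller} controls {controlled} but gets no feedback from it")
```

In this assumed layout, B issues control actions to C without any direct signal of what C did with them, which is the kind of gap the quoted passage describes. The absence of a B-to-D control edge is not flagged, since the check only looks at edges where control is actually exercised.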
mianos, 2 months ago
This is peak corporate drivel: bloated storytelling, buzzwords everywhere, and a desperate attempt to make an old idea sound revolutionary.

The article spends paragraphs on some childhood radio repair story before awkwardly linking it to STPA, a safety analysis method that’s been around for decades. Google didn’t invent it, but they act like adapting it for software is a major breakthrough.

Most of the piece is just filler about feedback loops and control structures (basic engineering concepts) framed as deep insights. The actual message? "We made an internal training program because existing STPA examples didn’t click with Googlers." That’s it. But instead of just saying that, they pad it out with corporate storytelling, self-congratulation, and hand-wringing over how hard it is to teach people things.

The ending is especially cringe: You can’t afford NOT to use this! Classic corporate play: take something mundane, slap on some urgency, and act like ignoring it is a reckless gamble.

TL;DR: Google is training engineers in STPA. That’s the whole story.
pcdoodle, 2 months ago
I'd love for Google to just go down and create a vacuum suction sound for a year...
1970-01-01, 2 months ago
Ctrl+F "DNS"

Hmm..
croisillon, 2 months ago
An early April Fools' joke?
ikiris, 2 months ago
... So where's the training or examples of application?