Ask HN: High availability control system software architectures

8 points by zeus_hammer, over 3 years ago
Hi HN. I'm interested in resources describing the architecture and implementation of critical control system software. Specifically, I'd like to understand how these systems are, or can be, designed so that loading, patching, and deploying software in these environments can be done with zero downtime.

Are there any books, code, or other resources you would recommend?

4 comments

chermi, over 3 years ago
Can you specify exactly what you mean by "control system" here? Are you talking about software that actually sends signals to hardware, which eventually make it to physical equipment that does something?

I'm trying to understand the actual environment. When you say "deployment", what is changed, where does it start, and how far does it propagate?

For example, would one option for zero downtime be to have replicated (2 or more) "control systems" beyond some "layer" (sorry, it's hard to be precise without knowing more), enforcing synchronicity between them while only one is actually controlling at any time? Then, when you are patching or updating, you freeze one, update the other, and then switch over to the updated one? Not advocating a solution, just trying to understand the situation by throwing out an example to talk around.

I'm not an expert in this at all, but if what I'm talking about above is even close to being on track, I'd recommend this book for starters: https://www.amazon.com/Introduction-Embedded-Systems-Cyber-Physical-Approach/dp/0262533812/ref=sr_1_14?keywords=embedded+systems&qid=1638584424&sr=8-14
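
The replicated-controller idea floated above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the `Controller` and `RedundantPair` classes, the `step` control law, and the state-mirroring scheme are all hypothetical names invented here, not taken from any real control system. It shows one replica driving the output while the other mirrors its state, and an upgrade that patches the idle replica before swapping roles.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical replica of a control process (illustrative only)."""
    name: str
    version: str
    state: dict = field(default_factory=dict)

    def step(self, sensor_value: float) -> float:
        # Toy control law: remember the last input and return a command.
        self.state["last_input"] = sensor_value
        return sensor_value * 0.5


class RedundantPair:
    """Two replicas kept in sync; exactly one drives the output at a time."""

    def __init__(self, active: Controller, standby: Controller):
        self.active = active
        self.standby = standby

    def step(self, sensor_value: float) -> float:
        command = self.active.step(sensor_value)
        # Mirror state so the standby could take over without a gap.
        self.standby.state = dict(self.active.state)
        return command

    def upgrade(self, new_version: str) -> None:
        # Patch the idle replica, refresh its state, then swap roles.
        # The previous active replica keeps the old version as a fallback.
        self.standby.version = new_version
        self.standby.state = dict(self.active.state)
        self.active, self.standby = self.standby, self.active


pair = RedundantPair(Controller("a", "1.0"), Controller("b", "1.0"))
pair.step(4.0)
pair.upgrade("1.1")
print(pair.active.name, pair.active.version, pair.active.state)
```

A real system would also need to handle state that changes format between versions, and a way to detect a bad upgrade and switch back; the sketch leaves the old replica untouched precisely so that a rollback target exists.
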
karmakaze, over 3 years ago
Loading, patching, and deploying without downtime is not very complicated on the surface. Almost all cloud product/service providers do this with fault-tolerant network design, routing/load-balancing, distributed/fault-tolerant datastores, and blue-green continuous integration/delivery (CI/CD) pipelines.

The hard part is being strict enough to ensure that every change is safe, and/or being able to rapidly or automatically restore a working state, so that you stay within a very low error budget. Each additional '9' in 99.9...% uptime is an order of magnitude (or more) harder.
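
To put numbers on the error budget mentioned above, here is a small Python calculation of the downtime allowed per year at a given number of nines. It assumes a 365-day year, and the function name `downtime_budget_minutes` is made up for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_budget_minutes(nines: int) -> float:
    """Allowed downtime per year for an availability with `nines` nines.

    For example, nines=3 means 99.9% availability.
    """
    unavailability = 10.0 ** -nines
    return MINUTES_PER_YEAR * unavailability


for n in range(2, 7):
    print(f"{n} nines: {downtime_budget_minutes(n):.2f} minutes per year")
```

Each extra nine cuts the budget by a factor of ten: roughly 8.8 hours per year at three nines, about 53 minutes at four, and a little over 5 minutes at five.
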
GianFabien, over 3 years ago
Telephony central office switches are an example of the sort of systems you are asking about. A typical switch is designed to run for 30 years with less than 30 minutes of downtime in all that time. Ericsson AXE and Western Electric are two such systems I have worked with. A more recent example would be the Erlang language and the related OTP environment.
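
As a rough check on what that design target implies, the following Python snippet converts "30 minutes of downtime in 30 years" into an availability figure. It assumes 365-day years, and the helper name `availability` is hypothetical.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # ignoring leap years


def availability(downtime_minutes: float, years: float) -> float:
    """Fraction of time the system is up over the given period."""
    total_minutes = years * MINUTES_PER_YEAR
    return 1.0 - downtime_minutes / total_minutes


a = availability(downtime_minutes=30, years=30)
nines = -math.log10(1.0 - a)
print(f"availability: {a:.7%}, about {nines:.2f} nines")
```

Under these assumptions the target works out to an average of one minute of downtime per year, which is better than five nines but short of six.
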
yuppie_scum, over 3 years ago
Some topics for you to research:

12 Factor Applications

Site Reliability Engineering

Chaos Engineering

Kubernetes