TechEcho

Safety-critical realtime with Linux

118 points by corbet over 7 years ago

9 comments

WalterBright over 7 years ago
Every industry seems determined to learn from scratch on their own how to make safety critical systems. None look at an industry that figured this out 50+ years ago - the airframe industry, which has an incredibly good track record of making safe systems out of unreliable parts.

I wrote a couple articles on the general idea:

https://digitalmars.com/articles/b39.html

https://digitalmars.com/articles/b40.html
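A common building block for making safe systems out of unreliable parts is redundancy with voting. The Python sketch below is an illustration, not taken from the linked articles: `majority_vote` is a hypothetical helper that accepts the value reported by a strict majority of redundant channels (as in triple modular redundancy) and returns None when no majority exists, leaving the caller to fail safe.

```python
from collections import Counter
from typing import Optional, Sequence

def majority_vote(readings: Sequence[int]) -> Optional[int]:
    """Hypothetical voter: return the value reported by a strict majority of
    redundant channels, or None if there is no majority (caller must fail safe)."""
    if not readings:
        return None
    value, count = Counter(readings).most_common(1)[0]
    return value if count * 2 > len(readings) else None

# One faulty channel out of three is outvoted.
print(majority_vote([42, 42, 17]))  # 42
# Three-way disagreement: no trustworthy value.
print(majority_vote([1, 2, 3]))     # None
```

Requiring a strict majority, rather than just the most common value, is what keeps a two-way split from being silently resolved in favour of an arbitrary channel.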

kev009 over 7 years ago
I find that subject line completely terrifying. Please use a small trusted compute base, hopefully with rigorous auditing and attempts at formal modeling, for safety critical systems. The Linux kernel development process is not suitable for this domain.

srcmap over 7 years ago
Designing realtime systems with Linux is non-trivial.

I worked on an HA (highly available) system running Linux on the PPC inside Xilinx's Virtex Pro. It was a redundant system with multiple fault detectors, and it switched over if any subsystem detected a failure.

There was one 250 ms hard real-time requirement: if I am the slave and don't detect the master's UDP ping for 250 ms, I assume the master has failed somehow, start action, and take over control as master.

The subsystem did trigger from time to time while the master was alive and working perfectly OK.

Eventually I figured out that one of the system APIs was taking > 250 ms. (I forget which one now; that was > 10 years ago.) I had to profile very carefully and redesign the code to get around that API.
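The failover rule described above can be sketched as a heartbeat monitor on the standby side. This is a minimal, hypothetical Python illustration: the class name, method names, and injectable clock are assumptions, not the original system's code. It uses a monotonic clock so that wall-clock adjustments cannot trigger or mask a takeover.

```python
import time
from typing import Callable

class HeartbeatMonitor:
    """Hypothetical standby-side watchdog: if no ping from the master arrives
    within timeout_s, the standby should take over as master."""

    def __init__(self, timeout_s: float = 0.250,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the logic can be tested without real delays
        self.last_ping = clock()

    def on_ping(self) -> None:
        # Call whenever a heartbeat packet from the master is received.
        self.last_ping = self.clock()

    def master_presumed_dead(self) -> bool:
        # True once the gap since the last ping exceeds the timeout.
        return self.clock() - self.last_ping > self.timeout_s

# Usage with a fake clock (seconds):
t = [0.0]
mon = HeartbeatMonitor(clock=lambda: t[0])
t[0] = 0.2
mon.on_ping()
t[0] = 0.4
print(mon.master_presumed_dead())  # False (200 ms since last ping)
t[0] = 0.5
print(mon.master_presumed_dead())  # True (300 ms since last ping)
```

This check cannot tell a dead master from a live one whose ping path stalled for more than `timeout_s`, for example behind a slow system call, which is exactly the spurious takeover described in the comment.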

aidenn0 over 7 years ago
Segmentation faults are a bad example of a fault you don't want in a safety-critical system. A crash is okay, because you will usually have some sort of fallback (e.g. most power steering systems work unpowered). It's the failures that don't crash, and instead cause silent improper behavior, that are bad.

Of course, a segmentation fault is usually a symptom of pointer misuse, which means your code is likely to also suffer from memory corruption.
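The distinction above, a detectable stop with a fallback versus silently continuing on bad data, can be sketched as follows. This is a hypothetical Python illustration: `checked_torque`, `assist_command`, the torque limit of 10.0, and the gain of 2.0 are made-up names and values, not taken from any real steering system.

```python
import math

class SensorFault(Exception):
    """Raised when a reading is implausible: better to stop than to guess."""

def checked_torque(raw: float, limit: float = 10.0) -> float:
    # Reject NaN, infinity, and out-of-range values instead of passing them on.
    if not math.isfinite(raw) or abs(raw) > limit:
        raise SensorFault(f"implausible torque reading: {raw!r}")
    return raw

def assist_command(raw: float, gain: float = 2.0) -> float:
    # On any detected fault, command zero assist: the steering falls back to
    # its unpowered mechanical path instead of acting on a corrupt value.
    try:
        return gain * checked_torque(raw)
    except SensorFault:
        return 0.0

print(assist_command(3.0))           # 6.0
print(assist_command(float("nan")))  # 0.0
```

The dangerous case is the one this check cannot see: a corrupted value that still falls inside the plausible range is passed straight through.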

zurn over 7 years ago
I wonder how they arrive at the X-microsecond worst-case number for the software-based solutions. Does it take into account a perfect storm of APIC interprocessor events, interrupts, worst-case SMP cache-coherency protocol behaviour and cross-CPU TLB shootdowns, misses at every level of the instruction/data TLBs, caches and DRAM, CPU trace-cache behaviour, ECC machine-check events, worst-case OoO core behaviour with respect to branch prediction and speculative execution, worst-case interference from other SMT threads, other SoC functions accessing DRAM, etc.?

It would seem to me that a worst-case scenario could easily cause slowdowns of many orders of magnitude. You could mitigate some of them with careful manual memory layout and hardware-specific tricks like hardwired TLB entries, but you would still be left with a lot of uncovered cases.
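Worst-case latency figures for Linux are commonly obtained empirically: run a periodic task for a long time and record how late each wake-up is. The rt-tests tool cyclictest takes this measurement-based approach. The Python sketch below is a hypothetical helper chosen for brevity; a real measurement would be written in C and run under a real-time scheduling policy such as SCHED_FIFO with locked memory. Its result is only the worst case observed during that run under that load, which is why it cannot account for the rare combinations of events listed above.

```python
import time

def observed_wakeup_latency_us(period_ns: int = 1_000_000,
                               iterations: int = 1000) -> dict:
    """Hypothetical helper: run a periodic loop against absolute deadlines and
    record how late each wake-up is, in microseconds. The maximum is only the
    worst case *observed* in this run; it is not a guaranteed bound."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    lateness_us = []
    next_deadline = time.monotonic_ns() + period_ns
    for _ in range(iterations):
        remaining = next_deadline - time.monotonic_ns()
        if remaining > 0:
            time.sleep(remaining / 1e9)
        # Lateness: how far past the intended wake-up time we actually resumed.
        lateness_us.append(max(0, time.monotonic_ns() - next_deadline) / 1000)
        next_deadline += period_ns  # absolute schedule, so late wake-ups don't shift later deadlines
    return {"max_us": max(lateness_us),
            "mean_us": sum(lateness_us) / len(lateness_us)}

stats = observed_wakeup_latency_us()
print(stats)
```

Running longer, or under heavier background load, can only raise the observed maximum. No amount of testing turns it into a proven upper bound.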

cjbillington over 7 years ago
Looking forward to CPUs getting on-die FPGAs so we can actually chuck some fully deterministic timing tasks onto them. Even if you're running on bare metal without an OS, there are heaps of things that can stop your code from running with predictable timing, and it seems to be getting worse as CPUs get more complex.

arca_vorago over 7 years ago
I thought that the kernel had improved enough in recent years for SIL 3... perhaps not, though.

I wasn't as aware of the issue of safety-critical systems as I should have been until I was inside a couple of industrial companies where PLCs were everywhere (for this very reason). What interests me now is how hard I see net-connected PLCs being pushed into industrial applications, mostly because everyone in industry is on the edge of their seat for IoT to hit so they can use and abuse the data (instead of waiting for a service call to pull data like they used to, why not just use an LTE-modem PLC, for example?). Do you see where I am going with this? Safety-critical industrial applications below SIL 4 are increasingly vulnerable, and it's not from a lack of realtime response to stimuli. In the end, using Linux for realtime just seems to exacerbate this particular angle on the issue. It does make me wonder about the implications of microkernel versus monolithic designs in such applications, though.

http://www.nfpa.org/codes-and-standards/all-codes-and-standards/list-of-codes-and-standards/detail?code=79

https://en.wikipedia.org/wiki/IEC_61508

https://webstore.ansi.org/RecordDetail.aspx?sku=ANSI%2fRIA+R15.06-2012

https://www.iso.org/standard/69883.html

https://webstore.iec.ch/publication/22797

https://en.wikipedia.org/wiki/Comparison_of_real-time_operating_systems

irundebian over 7 years ago
Can somebody here recommend any books on developing safety-critical systems? I've read part of Kleidermacher's "Embedded Systems Security" and found it very helpful.

rocqua over 7 years ago
The title says [LWN subscriber-only content], and the link seems to suggest the same.

It feels like LWN has bad access control, and someone abused that to post an article that shouldn't be free.