The thing we're still missing is the distributed OS. Kubernetes only exists because Linux lacks the abstractions to do computation, discovery, message passing/IO, and instrumentation across multiple nodes. If you could run <i>ps -A</i> and see all processes on all nodes, or run a program and have it automatically execute on a random node, or if (<i>grumble grumble</i>) systemd unit files could schedule a minimum of X processes across N nodes, most of the K8s ecosystem would become redundant. A lot of the other components, like unified AuthZ for Linux, already exist, as does the networking (WireGuard, anyone?).
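A minimal sketch of what a cluster-wide <i>ps -A</i> could look like, assuming passwordless SSH to every node; <i>node_ps</i> and <i>cluster_ps</i> are hypothetical names, not anything that exists today:

```python
import subprocess

def node_ps(node):
    """List (pid, command) pairs on one node: plain `ps -A` locally,
    the same command wrapped in `ssh` for a remote node."""
    cmd = ["ps", "-A", "-o", "pid=,comm="]
    if node != "localhost":
        cmd = ["ssh", node] + cmd  # assumes passwordless ssh to `node`
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    procs = []
    for line in out.splitlines():
        pid, _, comm = line.strip().partition(" ")
        if pid.isdigit():
            procs.append((int(pid), comm.strip()))
    return procs

def cluster_ps(nodes):
    """The cluster-wide `ps -A`: just the union over all nodes."""
    return {node: node_ps(node) for node in nodes}
```

The point being that the mechanism is trivial; what's missing is the OS treating "all nodes" as the default scope instead of "this node".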
>[The fact that computers are made of many components separated by communication buses] suggests that it may be possible to abstract away the distributed nature of larger-scale systems.<p>This is a neat line of thought, but I don't think it can go very far. There is a huge difference in reliability and predictability between small-scale and large-scale systems. One way to see this is to look at power supplies. Two ICs on the same board can be running off of the same 3.3V supply, and will almost certainly have a single upstream AC connection to the mains. When thinking about communications between the ICs, you don't have to consider power failure because a power failure will take down both ICs. Compare this to a WiFi network where two devices could be on separate parts of the power grid!<p>Other kinds of failures are rare enough to be ignored completely for most applications. An Ethernet cable can be unplugged. A PCB trace can't.<p>I used to work with a low-level digital communication protocol called I²C. It's designed for communication between two chips on the same board. There is no defined timeout for communication. A single malfunctioning slave device can hang the entire bus. According to the official protocol spec, the recommended way of dealing with this is to reset every device on the bus (which may mean resetting the entire board). If a hardware reset is not available, the recommendation is to power-cycle the system! [1]<p>Now I²C is a particularly sloppy protocol, and higher-level versions (SMBus and PMBus) do fix these problems, so this is a bit of an extreme example. But the fact that I²C is still commonly used today shows how reliable a small-scale electronic system can be. 
Even at the PC level, low-level hardware faults are rare enough that they're often indicated only by weird behavior ("My system hangs when the GPU gets hot"), and the solution is often for the user to guess which component is broken and replace it.<p>[1] Section 3.1.16 of <a href="https://www.nxp.com/docs/en/user-guide/UM10204.pdf" rel="nofollow">https://www.nxp.com/docs/en/user-guide/UM10204.pdf</a>
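To be fair to the spec, section 3.1.16 does define a softer first resort before the reset/power-cycle: if a slave is holding SDA low, the master toggles SCL up to nine times so the slave can finish shifting out the byte it thinks it owes and release the line. A toy simulation of that bus-clear procedure (the class and function names are mine, and the "slave" is a deliberately simple model):

```python
class StuckSlave:
    """Toy model of a slave left mid-transfer (e.g. by a glitch on SCL):
    it holds SDA low until it has shifted out the bits it still expects to."""
    def __init__(self, bits_left):
        self.bits_left = bits_left

    def sda(self):
        # Open-drain data line: reads low while the slave still drives it.
        return 1 if self.bits_left == 0 else 0

    def clock_pulse(self):
        # Each SCL pulse lets the slave shift out one more bit.
        if self.bits_left:
            self.bits_left -= 1

def bus_clear(slave, max_pulses=9):
    """UM10204-style bus clear: toggle SCL up to nine times, checking
    SDA after each pulse; return True if the slave released the line."""
    for _ in range(max_pulses):
        if slave.sda() == 1:
            return True
        slave.clock_pulse()
    return slave.sda() == 1
```

If the slave is confused in some deeper way than "stuck mid-byte", the nine pulses don't help, which is exactly when the spec falls back to reset or power-cycle.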
Yes, but your computer will not gracefully handle CPUs or RAM randomly failing. Sure, storage devices can come and go, but that's been the case forever, and most programs are not written to handle that edge case gracefully; the OS kernel is the exception.<p>The links between the components of your computer are solid and cannot fail the way actual computer network connections can.<p>In terms of the "CAP" theorem, the system has no partition tolerance. If one of the links connecting CPUs/GPUs/RAM breaks, all hell breaks loose. If a single instruction is not processed correctly, all hell might break loose.<p>So I find the analogy misleading.
This was true of several home computers from the late '70s on. Atari 8-bit computers had all peripherals connect via a serial bus, each with its own little processor, ROM, RAM, and I/O (the only exception, IIRC, was the cassette drive). Commodore used a similar design for its disk drives. A couple of months back, a 1541 drive was demoed running standalone with custom software, generating a valid NTSC signal.
Great post called "Achieving 11M IOPS & 66 GB/s IO on a Single ThreadRipper Workstation" [1, 2] that basically walks through step-by-step that your computer is just a bunch of interconnected networks.<p>Highly recommend the post if you're into this and also sort of amazing how far single systems have come. You can basically do "big data" type things on this single box.<p>[1] <a href="https://tanelpoder.com/posts/11m-iops-with-10-ssds-on-amd-threadripper-pro-workstation/" rel="nofollow">https://tanelpoder.com/posts/11m-iops-with-10-ssds-on-amd-th...</a><p>[2] <a href="https://news.ycombinator.com/item?id=25956670" rel="nofollow">https://news.ycombinator.com/item?id=25956670</a>
I worked on this for my master's thesis! The thesis was on a specific part, but the group worked on the problem as a whole; see <a href="https://dspace.mit.edu/handle/1721.1/49844" rel="nofollow">https://dspace.mit.edu/handle/1721.1/49844</a><p>IMO there are two things that make the current abstraction of a computer as a unit make sense:<p>- You (mostly) don't have to handle partial failures within a computer. Partial failures are what make distributed systems hard.<p>- The cost of communication between two cores in a single machine is several orders of magnitude lower than the cost of communicating with a separate machine over commodity technologies. So while yes, "it's all just distributed" and you could use a common abstraction, a large enough constant-factor difference means you will still have to look through the abstraction to build a performant system.
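That constant factor is easy to demonstrate at home: a sketch comparing a plain in-process call with a TCP round trip over loopback, which is itself still wildly optimistic compared to a real network hop. The function names and exact iteration counts are mine:

```python
import socket
import threading
import time

def time_local_calls(n=100_000):
    """Average time of a round trip through a plain function call
    (the shared-memory analogue of a request/response)."""
    def echo(x):
        return x
    t0 = time.perf_counter()
    for i in range(n):
        echo(i)
    return (time.perf_counter() - t0) / n

def time_loopback_roundtrips(n=1_000):
    """Average time of a TCP round trip on localhost: the 'separate
    machine' analogue, minus the actual wire."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    port = srv.getsockname()[1]

    def serve():
        conn, _ = srv.accept()
        while (data := conn.recv(16)):
            conn.sendall(data)  # echo back
        conn.close()

    threading.Thread(target=serve, daemon=True).start()
    cli = socket.create_connection(("127.0.0.1", port))
    t0 = time.perf_counter()
    for _ in range(n):
        cli.sendall(b"ping")
        cli.recv(16)
    per_call = (time.perf_counter() - t0) / n
    cli.close()
    srv.close()
    return per_call
```

On typical hardware the loopback round trip comes out orders of magnitude slower than the function call, even before adding a switch, a NIC, or a congested link.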
I think there’s a big difference which is that your computer is allowed to crash when one component breaks whereas a distributed system is typically more fault tolerant.
So much of programming-language design exists to hide the distributed nature of what the computer is doing on a regular basis. This is somewhat obvious for thread abstractions, where you can have two things happening at once. It is blatant for CUDA-style programming.<p>As this link points out, it gets a bit more difficult with some of the larger machines we have to keep the abstractions useful. That said, it does mostly work. Despite being able to find and harp on the areas where it fails, it is amazing how well so many of the abstractions have held up.<p>It would be neat to see an explicit accounting of which features are essentially hiding the distributed nature of the computer completely.
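A trivial sketch of that hiding in action: with Python's executor abstraction, the fanned-out version of a computation reads exactly like the sequential one, and the abstraction conceals which worker thread ran which call (and preserves the result order for you).

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

data = [1, 2, 3, 4]

# Sequential and fanned-out versions read the same; the executor
# hides the scheduling entirely and keeps results in input order.
sequential = list(map(square, data))
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(square, data))
```

The abstraction only leaks when you care about shared mutable state or relative timing, which is exactly the distributed part it was hiding.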
There are lots of good resources in this area:
The programming language of the transputer <a href="https://en.m.wikipedia.org/wiki/Occam_(programming_language)" rel="nofollow">https://en.m.wikipedia.org/wiki/Occam_(programming_language)</a><p>Bluebottle active objects <a href="https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/147091/eth-26082-02.pdf" rel="nofollow">https://www.research-collection.ethz.ch/bitstream/handle/20....</a> with some discussion of DMA<p>Composita components <a href="http://concurrency.ch/Content/publications/Blaeser_Component_Operating_System_PLOS_2007.pdf" rel="nofollow">http://concurrency.ch/Content/publications/Blaeser_Component...</a><p>Mobile Maude (only a spec) <a href="http://maude.sip.ucm.es/mobilemaude/mobile-maude.maude" rel="nofollow">http://maude.sip.ucm.es/mobilemaude/mobile-maude.maude</a><p>Kali scheme (atop Scheme48 secure capability OS) <a href="https://dl.acm.org/doi/pdf/10.1145/213978.213986" rel="nofollow">https://dl.acm.org/doi/pdf/10.1145/213978.213986</a><p>Kali is probably the closest to a distributed OS, supporting secure thread and process migration across local and remote systems (and makes that explicit), distributed profiling and monitoring tools, etc.
It is basically an OS based on the actor model. It doesn't scale massively, since routing between nodes was out of scope (it connects all nodes on a bus), but that could easily be added.<p>It is extremely small (running in 2 MB of RAM), it covers all of R5RS, and the VM has been adapted to run on bare metal.<p>I feel there is more to do, but a combination of these is probably the right direction.
This proves that conventional wisdom (such as the idea that abstracting distributed computation is unworkable) is often wrong.<p>What happens is that enough people try to do something and can't quite get it to work right that it eventually becomes assumed that anyone trying that approach is naive. Then people actively avoid trying because they don't want others to think they don't know "best practices".<p>Remember the post from the other day about magnetic amplifiers? Engineers in the US gave up on them. But for the Russians, mag amps never became "unworkable" and uncool to try, and they eventually solved the hard problems and made them extremely useful.<p>Technology is much more about trends and psychology than people realize. In some ways, so is the whole world. It seems to me that at some level, most humans never _really_ progress beyond middle-school level.<p>The starting point for analyzing most things should probably be from the context of teenage primates.
> This is something unique: an abstraction that hides the distributed nature of a system and actually succeeds.<p>That's not even remotely unique.<p>OP is grappling with "the map is not the territory" vs. maps have many valid uses.<p>Abstractions can be both not accurate in every context and 100% useful in many, many common contexts.<p>Also (before you get too excited), abstractions have quality: there are good abstractions -- which are useful in many common contexts -- and bad abstractions -- which overpromise and turn out to be misleading in some or many common contexts.<p>I'll put it this way: the idea that <i>The Truth</i> exists is a rough (and not particularly useful) abstraction. If you have a problem with that, it just means you have something to learn to engage reality more fruitfully.
Unfortunately, this idea fights against the idea of least responsibility. User-level programs all sit at one level of abstraction, while this kind of distribution is spread across many levels of abstraction. So in desktop systems, by which I mean mostly the successors of business micro machines, access to other levels of abstraction is intentionally locked down for security and reliability.
The same thing applies to cloud computing: there, too, VPSes are isolated from the hardware and from other VPSes.<p>These measures are usually avoided in game systems and embedded systems, but those are then not allowed to run multiple programs from independent developers (for security and reliability), and programming them is magnitudes more expensive than desktop or even server-side development (yes, you may be surprised, but game-console software is in many cases more reliable than military software, and usually far surpasses business software).<p>To resolve this contradiction we need totally new paradigms and technologies, maybe something revolutionary, like using generative AI to write code.
Yes, and concurrency is, in fact, an implementation detail. Which is why I think that in most <i>applied</i> scenarios it should be hidden, and taken care of, by the compiler.
Eric Brewer thinks this is a good point of view on such things:<p><a href="https://codahale.com/you-cant-sacrifice-partition-tolerance/" rel="nofollow">https://codahale.com/you-cant-sacrifice-partition-tolerance/</a><p>L1-blockchain entrepreneurs and people who got locked into MongoDB aside, I think most agree.
I recommend people model their apps this way. Spin up more threads than needed: one each for the API, DB, LB, async workers, pipelines, etc. You can model an entire stack in one memory space. It's a great way to prototype your complete data model before scaling out to the proper solutions. Lots of design constraints are found this way: everything looks great on paper but then falls apart when you integrate the layers.
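A minimal sketch of that style, using nothing beyond the standard library; the three "services" and their wiring are of course stand-ins for real components:

```python
import queue
import threading

def stage(name, inbox, outbox, fn):
    """One 'service' in the stack: reads requests from its inbox,
    applies fn, forwards the result, and propagates shutdown (None)."""
    def run():
        while (item := inbox.get()) is not None:
            outbox.put(fn(item))
        outbox.put(None)  # pass the shutdown signal downstream
    t = threading.Thread(target=run, name=name, daemon=True)
    t.start()
    return t

# Hypothetical three-stage "stack": LB -> API -> DB, wired with queues
# standing in for the network hops you'd have in the real deployment.
q_lb, q_api, q_db, q_out = (queue.Queue() for _ in range(4))
stage("lb", q_lb, q_api, lambda r: {**r, "routed": True})
stage("api", q_api, q_db, lambda r: {**r, "validated": True})
stage("db", q_db, q_out, lambda r: {**r, "stored": True})

q_lb.put({"req": 1})
q_lb.put(None)  # shut the pipeline down after one request

results = []
while (item := q_out.get()) is not None:
    results.append(item)
```

Swapping a queue for a socket later changes the transport, not the data model, which is why the prototype surfaces the integration problems early.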
Once you learn to bias toward thinking in terms of message passing between actors, and toward immutable shared state, a lot of problems become easier to decompose and solve elegantly, especially at scale.
Your body is a distributed system.
Your brain is a distributed system.
A live cell is a distributed system.
A molecule is a distributed system.
In other news, water is wet.