TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.


Linear Address Spaces: Unsafe at any speed

271 points, by g0xA52A2A, almost 3 years ago

36 comments

kazinator, almost 3 years ago

> *Why do we even have linear physical and virtual addresses in the first place, when pretty much everything today is object-oriented?*

Simple: we don't want some low-level kernel memory management dictating what constitutes an "object".

Everything isn't object-oriented. E.g. large arrays, memory-mapped files, including executables and libraries.

Linear memory sucks, but every other organization sucks more.

Segmented has been done; the benefit-to-clunk ratio was negligible.
phkamp, almost 3 years ago

About flat vs. object stores:

I wrote this piece out of frustration that, as it looks to me, the entire semiconductor industry is exploring only one single computer storage organization, despite the fact that recent inventions like flash practically beg for innovation.

For instance, few people realize that the Flash Adaptation Layer in SSD devices means that we literally run two filesystems on top of each other, because nobody has seriously tried to get rid of "a disk is an array of individually rewritable sectors", despite this literally being untrue both for modern disks and in particular for flash-based storage.

Similarly, the "flat physical/flat virtual" MMU model is a relic from the days of the IBM 360 and VAX 11/780, and utterly inefficient and unsuitable for what we do in userland these days.

As Robert has shown with CHERI, there is plenty of space to innovate without breaking existing code.

And yes, C can be object-oriented; all you have to do is keep your hands off the primitives which are necessary to access hardware directly.

Architecturally, GPUs are a superoptimized distraction, like the vector units on Cray and Convex computers were 40-50 years ago, but those too can function in a non-flat address space.

But even C in OO mode, and C++, Go, Rust, PHP, and for that matter Lisp and Smalltalk, would benefit from a HW/MMU architecture focused on delivering fast object service, rather than flat address spaces which software must then convert into objects.

But to innovate, we must first realize that we are currently stuck in a box, and dare to look outside it.
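The "two filesystems on top of each other" point can be made concrete with a toy flash translation layer. This is a deliberately simplified sketch, not how any real SSD firmware works: it only shows why "a disk is an array of individually rewritable sectors" is a fiction maintained by remapping, since flash pages cannot be rewritten in place.

```python
# Toy flash translation layer (FTL). Illustrative only: real FTLs add
# garbage collection, wear leveling, and persistence of the mapping.

PAGES_PER_BLOCK = 4

class ToyFTL:
    def __init__(self, num_blocks):
        self.flash = [[None] * PAGES_PER_BLOCK for _ in range(num_blocks)]
        self.mapping = {}      # logical sector -> (block, page)
        self.cursor = (0, 0)   # next free physical page

    def write(self, sector, data):
        # Flash pages cannot be rewritten in place: every logical write
        # goes to a fresh physical page; the old copy becomes garbage.
        block, page = self.cursor
        self.flash[block][page] = data
        self.mapping[sector] = (block, page)
        page += 1
        if page == PAGES_PER_BLOCK:
            block, page = block + 1, 0
        self.cursor = (block, page)

    def read(self, sector):
        # Reads go through the mapping, exactly like a filesystem's
        # own block lookup -- hence "two filesystems on top of each other".
        block, page = self.mapping[sector]
        return self.flash[block][page]
```

Rewriting logical sector 0 twice leaves the latest data at a different physical page than the first write, while the "array of sectors" illusion is preserved for the layer above.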
phkamp, almost 3 years ago

Hi everybody!

About the R1000:

We have pretty comprehensive user-side documentation of the R1000, but very, very little from the system/vendor side, so there are lots of things we simply do not know yet.

We have digitized everything we have here:

https://datamuseum.dk/wiki/Bits:Keyword/RATIONAL_1000

And our top-level wiki page for the project is here:

https://datamuseum.dk/wiki/Rational/R1000s400

All the documentation we have about the hardware type/object stuff and the instruction set is in these course slides:

https://datamuseum.dk/bits/30000916

If you are into data archaeology and lack challenges, we have several good outstanding questions for research, for instance the layout of the object store/filesystem.

If you want to see an R1000 running, and experience the world's first truly semantic IDE, come to Datamuseum.dk just outside Copenhagen, because we have the only (approximately 1.83) running R1000 computers.

(We also just started fundraising for a new permanent building for our huge collection; we are at €113K of a €3M goal. See the top right corner of the homepage, or email.)

We know of only four surviving computers: we have two, one is privately owned in NZ, and IBM donated one to the CHM. The rest have been shredded because of classified mil-spec workloads.

If you are local to/affiliated with the CHM, and are interested/allowed, we would love to know more about their machine and, if possible, assist in getting that running too.

PS: Here is a piece of Ada source code:

http://datamuseum.dk/aa/r1k_backup/13/1329b5ea7.html

And this may be what it compiles into:

http://datamuseum.dk/aa/r1k_backup/85/85b414c73.html
infogulch, almost 3 years ago

The Mill's memory model is one of its most interesting features IMO [1], and it solves some of the same problems, but by going the other way.

On the Mill, the whole processor bank uses a single global virtual address space. The TLB and the mapping to physical memory live at the *memory controller*. Everything above the memory controller is in the same virtual address space, including the L1-L3+ caches. This solves *a lot* of problems. For example: if you go out to main memory you're already paying ~300 cycles of latency, so a large silicon area / data structure for translation is no longer a 1-cycle-latency problem. Writes to main memory are flushed down the same memory hierarchy that reads come from, and succeed as soon as they hit L1. Since all cache lines are in the same virtual address space, you don't have to track and synchronize reads and writes across translation zones within the cache hierarchy. When you request an unallocated page you get a whole pre-zeroed page back *instantly*, since it doesn't need to be mapped to physical pages until writes are flushed out of L3. This means it's possible for a page to be allocated, written to, read, and deallocated and *never actually touch physical memory* throughout the whole sequence: the whole workload is served purely within the cache hierarchy.

Protection is a separate system (the "PLB") and can be much smaller and more streamlined, since it's not trying to do two jobs at once. The PLB allows a process to give fine-grained temporary access to a portion of its memory to another process: RW, RO, WO, byte-addressed ranges, for one call or longer, etc. Processes get allocated available address space on start; they can't just assume they own the whole address space or start at some specific address (you should be using ASLR anyway, so this should have no effect on well-formed programs, though there is a legacy fallback).

[1]: My previous comment: https://news.ycombinator.com/item?id=27952660
ajb, almost 3 years ago

This article compares CHERI to an '80s computer, the Rational R1000 (which I'm glad to know of). It's worth noting that CHERI's main idea was explored in the '70s by the CAP computer [1]. CAP and CHERI are both projects of the University of Cambridge's Computer Laboratory. It's fairly clear that CAP inspired CHERI.

[1] https://en.wikipedia.org/wiki/CAP_computer
kimixa, almost 3 years ago

I'm a little confused about how the object base is looked up in these systems: whether the tables are sparse or dense, whether there are any size or total-object-count limitations, and whether that ends up with the same limits on total count as the page tables that required the current multi-level approach.

Surely you could consider a page table as effectively implementing a fixed-size "object cache"? It is just a lookup for an offset into physical memory, after all, with the "object ID" just being the masked first part of the address. And if the objects are variable-sized, is it possible to end up with physical address fragmentation as objects of different sizes are allocated and freed?

The claim of single-cycle lookups today would require an on-chip, fixed-size (and small!) fast SRAM, as there's a pretty hard limit on the amount of memory you can read in a single clock cycle, no matter how fancy or simple the logic behind deciding to look up. If we call this area the "TLB", haven't we got back to page tables again?

And as for the size of the SRAM holding the TLB/object-cache entries: increasing the amount of data stored per entry means fewer entries in total. A current x86_64 CPU supports 2^48 of physical address space, reduced to a 36-bit frame number if you know it's 4k-aligned, and 2^57 of virtual address space as the tag, again reduced to 45 bits if we know it's 4k-aligned. That means that to store the tag and physical address you need a total of 81 bits of SRAM. A 64-bit object ID, plus a 64-bit physical address, plus a 64-bit size, is 192 bits, over 2x that, so you could pack more than 2x the number of conventional TLB entries into the same SRAM block. To match the capabilities of the example above, 57 bits of base address (which cannot be reduced, as arbitrary sizes mean it's not aligned), plus a similarly reduced 48-bit object ID and 48-bit size, still adds up to 153 bits, only slightly less than 2x. I'm sure people could argue that reducing the capabilities here has merit; I don't know how many objects, or what maximum object size, such a system would need. And that's against the "worst case" 4k pages for the page-table system too.

I can't see how this idea could be implemented without extreme limitations. Look at the TLB size of modern processors: that's the maximum number of objects you could have while meeting the claims of speed and simplicity. There may be some advantage in making entries flexible in size rather than fixed, but then you run into the same fragmentation issues, and need to keep that size somewhere in the extremely tight TLB memory.
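The bit-budget arithmetic in that comment can be checked directly. This sketch just redoes the sums under the comment's own assumptions (4 KiB pages, 57-bit virtual and 48-bit physical addresses); the entry layouts are the comment's hypotheticals, not any real CPU's TLB format.

```python
# Entry-size arithmetic for a conventional TLB vs. a hypothetical
# object cache, per the comment's assumptions. Illustrative only.

PAGE_BITS = 12                    # 4 KiB pages

virt_tag = 57 - PAGE_BITS         # virtual page number: 45 bits
phys_frame = 48 - PAGE_BITS       # physical frame number: 36 bits
tlb_entry = virt_tag + phys_frame
print(tlb_entry)                  # 81 bits per conventional TLB entry

# Naive object-cache entry: 64-bit object ID + 64-bit base + 64-bit size
naive_object_entry = 64 + 64 + 64
print(naive_object_entry)         # 192 bits, more than 2x the TLB entry

# Trimmed variant: unaligned 57-bit base, 48-bit object ID, 48-bit size
trimmed_object_entry = 57 + 48 + 48
print(trimmed_object_entry)       # 153 bits, still nearly 2x
```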
dragontamer, almost 3 years ago

> Why do we even have linear physical and virtual addresses in the first place, when pretty much everything today is object-oriented?

Well, GPU code is certainly not object-oriented, and I hope it never becomes that. SIMD code won't be able to jump between objects like typical CPU-oriented OOP does (unless all objects within a warp/workgroup jump to the same function pointers?).

GPU code is common in video games. DirectX needs to lay out its memory very specifically as you write out the triangles and other vertex/pixel data for the GPU to later process. This memory layout is then memcpy'd over PCIe using the linear address space mechanism, and GPUs are now cohesive with this space (thanks to shared virtual memory).

So today, thanks to shared virtual memory and advanced atomics, we can have atomic compare-and-swap coordinate CPU and GPU code operating over the same data (and copies of that data can be cached in CPU RAM or GPU VRAM and transferred automatically with PCIe memory barriers and whatnot).

Similarly, shared linear address spaces operate over RDMA (remote direct memory access), a protocol built on top of Ethernet. This means that your linear memory space is mmap'd on your CPU, but then asks for access to someone else's RAM over the network. The mmap then turns all this "inefficient pointer traversal" into Ethernet packets that share RAM between CPUs.

Ultimately, when you start dealing with high-speed data sharing between "external" compute units (i.e. a GPU, or an Ethernet-connected far-away CPU), rather than "just" a NUMA node or other nearby CPU, the linear address space seems ideal.

Even the most basic laptop, or even cell phone, these days is a distributed system consisting of a CPU + GPU. Apple chips even have a DSP and a few other elements. Passing data between all of these things makes sense in a distributed linear address space (albeit a really wonky one, with PCIe, mmaps, base address pointers, and all sorts of complications... but they are figured out, and it does work every day).

I/O devices working directly in memory are only going to become more common. 100 Gbps network connections exist in supercomputer labs; 10 Gbps Ethernet is around the corner for consumers. NVMe drives are pushing I/O to bandwidths that would make DDR2 RAM blush. GPUs are growing more complicated and are rumored to start turning into distributed chiplets soon. USB 3.0 and beyond are high-speed links that drop data directly into linear address spaces (or so I've been told). Etc., etc.
ww520, almost 3 years ago

I think he's advocating fitting a high-level language like Ada in the kernel or in the CPU(?), with one global address space and no separate per-process address spaces for memory protection, relying instead on the high-level language to provide memory protection. That's where his bizarre hyping of the "object oriented" address space comes from.

It has been done. See the Singularity project from Microsoft Research, which used C# as the language to provide memory protection: no processes, all programs running in the same global memory space, and all of them running in ring 0. It was a fun research project, but never really made it out. There were other research projects like it.

Also, his (object, offset) address space is essentially a segmented memory model. The object ID is the segment ID. I bet the object IDs are in a linear space.
throw34, almost 3 years ago

"The R1000 addresses 64 bits of address space instantly in every single memory access. And before you tell me this is impossible: The computer is in the next room, built with 74xx-TTL (transistor-transistor logic) chips in the late 1980s. It worked back then, and it still works today."

That statement has to come with some hidden caveats. 64 bits of address space is so huge that it's unlikely the entire range was even present. If only a subset of the range was "instantly" available, we have that now: turn off main memory and run right out of the L1 cache. Done.

We need to keep in mind that the DRAM ICs themselves have a hierarchy with latency trade-offs: https://www.cse.iitk.ac.in/users/biswap/CS698Y/lectures/L15.pdf

This does seem pretty neat, though: "CHERI makes pointers a different data type than integers in hardware and prevents conversion between the two types."

I'm definitely curious how the runtime loader works.
martincmartin, almost 3 years ago

"Unsafe at Any Speed" is the name of Ralph Nader's book on car manufacturers resisting car safety measures. It resulted in the creation of the United States Department of Transportation in 1966 and the predecessor agencies of the National Highway Traffic Safety Administration in 1970.
edave64, almost 3 years ago

There is often quite a significant distance between the beautiful, elegant, and efficient design that brings tears to the eyes of a designer, and what is pragmatic and financially viable.

Building a new competitive processor architecture isn't feasible if you can't at least ensure compile-time compatibility with existing programs. People won't buy a processor that won't run their programs.
gumby, almost 3 years ago

The Multics system was designed to have segments (for this discussion, == pages) that were handled the way he describes, down to the pointer handling. Not bad for the 1960s, though Unix was designed for machines with far fewer transistors, back when that mattered a lot.

Things like TLBs (not a new invention; they go back to the 1960s) really only matter to systems programmers, as he says, and judicious use has simplified programming for a long time. I think if he really wants to go down this path he'll discover that the worst-case behavior (five probes to find a page) really is worth it in the long run.
cmrdporcupine, almost 3 years ago

Another system that had an object-based non-linear address space, I believe, was the "Rekursiv" CPU developed at Linn (yes, the Swedish audio/drum machine company; EDIT: Linn. Scottish. Not drum machine. Thanks for the corrections. In fact I even knew this at one time. Yay, brain.) in the '80s.

https://en.wikipedia.org/wiki/Rekursiv

I actually have a copy of the book they wrote about it here somewhere. I often fantasize about implementing a version of it in an FPGA someday.
StillBored, almost 3 years ago

It amazes me that articles like this can talk about obscure architectures, and now CHERI, but completely fail to notice that just about every PC in use is actually a lightweight capability machine! AKA all that "cruft" everyone complains about in ia32: the LDT/GDT/IDT, the variable-length segments referenced there, and the selector+offset format (aka CS, SS, ES, DS, FS, GS) map perfectly onto the concept of a data structure and its offset. The task gates, call gates, interrupt gates, etc. are all there to support a proper per-segment security model.

We have these machines, although granted, over the past few decades those mostly unused operations have gotten quite slow, and the model harkens back to a time when people didn't have a lot of RAM, so there aren't a lot of "caches" (aka segment registers, etc.) in place to support modern computing.

Which is why I find these articles amusing: suddenly it's in vogue to rediscover what most computer architects of the '60s-'80s were doing, until RISC and Unix basically destroyed it all with leaky abstractions and insecure designs.

And since the PC is just a pile of legacy garbage, no one looks at it closely enough to discover they have the HW sitting on their desk to try out some of these ideas.
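The selector+offset model described above can be sketched with a toy descriptor table. This is a simplification of ia32 protected-mode segmentation: real GDT/LDT descriptors also carry type, privilege, and present bits, and the selector encodes a table indicator and privilege level, all omitted here; the two example segments are invented for illustration.

```python
# Toy model of ia32-style segmentation: a descriptor table maps a
# selector to a (base, limit) pair, and every access is bounds-checked
# against the segment limit before a linear address is formed.

class SegmentFault(Exception):
    pass

# Hypothetical descriptor table: selector -> (base, limit)
gdt = {
    0x08: (0x0000, 0x0FFF),   # a 4 KiB "code" segment
    0x10: (0x1000, 0x00FF),   # a 256-byte "data" segment
}

def translate(selector, offset):
    """Return the linear address for selector:offset, or fault."""
    base, limit = gdt[selector]
    if offset > limit:        # the limit check is a per-segment bounds check
        raise SegmentFault(f"offset {offset:#x} beyond limit {limit:#x}")
    return base + offset

print(hex(translate(0x10, 0x10)))   # 0x1010
```

The per-segment limit check is what makes this a (lightweight) capability mechanism: a pointer is only meaningful relative to a descriptor, and the hardware refuses accesses past the structure's declared size.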
cmrdporcupine, almost 3 years ago

"The R1000 has many interesting aspects ... the data bus is 128 bits wide: 64-bit for the data and 64-bit for the data's type"

*what what what?*

How on earth would you ever need a type enumeration 2^64 long?

Neat, though.
a-dub, almost 3 years ago

> Why do we even have linear physical and virtual addresses in the first place, when pretty much everything today is object-oriented?

Are there alternatives to linearly growing call stacks?
verdagon, almost 3 years ago

The linear address space is the root reason why any language with FFI support is inherently unsafe, unfortunately. Any errant pointer from C can accidentally (or intentionally) corrupt objects in the safe language. It's a difficult problem to solve.

Vale's "Fearless FFI" design [0] suggests we could sandbox entire third-party libraries using a WebAssembly compilation step, which might work well. Sometimes I wonder what would happen if we made an entire OS using Vale, and what kind of security improvements it might bring.

[0] https://verdagon.dev/blog/fearless-ffi
gpderetta, almost 3 years ago

At Intel they probably still have nightmares about the iAPX 432. They are not going to try an OO architecture again.

Having said that, I wouldn't be surprised if some form of segmentation became popular again.
avodonosov, almost 3 years ago

Since this addressing scheme is <object, offset>, and these pairs need to fit in 64 bits, I am curious whether the number of bits for each part is fixed, and what those fixed widths are. In other words, what is the maximum possible offset within one object, and the maximum number of objects?

Probably segment registers in x86 can be thought of as object identifiers, allowing the same non-linear approach? (Isn't that the purpose of segments, even?)

Update: BTW, another term for what the author calls "linear" is "flat".
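One way to see the trade-off the question raises is to pick a split and work out the limits. The 32/32 split below is purely an assumption for illustration; the article does not state the R1000's actual field widths.

```python
# Hypothetical packing of an <object, offset> pair into a 64-bit word.
# The 32/32 split is an assumption, not the R1000's documented layout.

OFFSET_BITS = 32
OFFSET_MASK = (1 << OFFSET_BITS) - 1

def pack(obj_id, offset):
    assert 0 <= obj_id < (1 << (64 - OFFSET_BITS))
    assert 0 <= offset <= OFFSET_MASK
    return (obj_id << OFFSET_BITS) | offset

def unpack(addr):
    return addr >> OFFSET_BITS, addr & OFFSET_MASK

addr = pack(7, 0x100)
print(unpack(addr))   # (7, 256)
```

With this split, each object is limited to 2^32 bytes and the system to 2^32 objects; widening one field narrows the other, which is exactly the tension between maximum offset and maximum object count that the comment asks about.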
Veserv, almost 3 years ago

Of course things would be faster if we did away with coarse-grained virtual memory protection, merged everything into a single address space, and guaranteed protection using fine-grained permission mechanisms.

The problem is that a single error in the fine-grained mechanism, anywhere in the entire system, can quite easily cause complete system compromise. Achieving any safety guarantee requires achieving perfect safety guarantees across all arbitrary code in your entire deployed system. This is astronomically harder than ensuring safety using virtual memory protection, where you only need to analyze the small trusted code base establishing the linear address space, and do not need to analyze or even understand arbitrary code to enforce safety and separation.

For that matter, fine-grained permissions are a strict superset of the prevailing virtual memory paradigm, as you can trivially model the existing coarse-grained protection by just making the fine-grained protection more coarse. So, if you can make a safe system using fine-grained permissions, then you can trivially create a safe system using coarse-grained virtual memory protection. And if you can do that, then you can create an unhackable operating system right now using those techniques. So where is it?

Anybody who claims to be able to solve this problem should first demonstrate a mathematically proven unhackable operating system, as that is *strictly easier* than what is being proposed. Until they do, the entire idea is a total pipedream with respect to multi-tenant systems.
scottlamb, almost 3 years ago

tl;dr: conventional design bad, me smart, capability-based pointers (base+offset with provenance) can replace virtual memory, CHERI good (a real modern implementation of capability-based pointers).

The first two points are similar to other Poul-Henning Kamp articles [1]. The last two are more interesting.

I'm inclined to agree with "CHERI good". Memory safety is a huge problem. I'm a fan of improving it by software means (e.g. Rust), but CHERI seems attractive at least for the huge corpus of existing C/C++ software. The cost is doubling the size of pointers, but I think it's worth it in many cases.

I would have liked to see more explanation of how capability-based pointers replacing virtual memory would actually work on a modern system:

* Would we give up fork() and other COW sorts of tricks? Personally I'd be fine with that, but it's worth mentioning.

* What about paging/swap/mmap (to compressed memory contents, SSD/disk, the recently discussed "transparent memory offload" [2], etc.)? That seems more problematic. Or would we do something intermediate like The Mill [3], where there's still a virtual address space, but only one, rather than per-process mappings?

* What bookkeeping is needed, and how does it compare with the status quo? My understanding with CHERI is that the hardware verifies provenance [4]. The OS would still need to handle the assignment. My best guess is the OS would maintain analogous data structures to track assignment to processes (or maybe an extent-based system rather than pages), but maybe the hardware wouldn't need them?

* How would performance compare? I'm not sure. On the one hand, double pointer size => more memory, worse cache usage. On the other hand, I've seen large systems spend >15% of their time waiting on the TLB. Huge pages have taken a chunk out of that already, so maybe the benefit isn't as much as it seemed a few years ago. Still, if this nearly eliminates that time, that may be significant, and it's something you can measure with e.g. "perf"/"pmu-tools"/"toplev" on Linux.

* etc.

[1] eyeroll at https://queue.acm.org/detail.cfm?id=1814327

[2] https://news.ycombinator.com/item?id=31814804

[3] http://millcomputing.com/wiki/Memory#Address_Translation

[4] I haven't dug into *how* when fetching pointers from RAM rather than pure register operations, but for the moment I'll just assume it works, unless it's probabilistic?
watersb, almost 3 years ago

The slab allocator in Linux keeps an object tag for each slab, and slabs can be of different sizes.

Then we get the ZFS extent allocator, written mostly by the same programmer (Jeff Bonwick).

https://en.m.wikipedia.org/wiki/Slab_allocation

Doing tagged slab allocation seems like a reasonable thing for a hardware microarchitecture to consider.

The article starts by complaining about modern CPU complexity: five-level page tables to maintain the illusion of a linear, uniform memory space. It then juxtaposes that complexity with an architecture that would use it to support tagged memory objects more directly, as part of the system ABI.

If we're going to spend all of our time manipulating memory objects anyway, why mess around?

Was the Intel iAPX so much of a disaster, compared to a modern x86_64 architecture, that we are forever forbidden even to consider such hardware?

https://en.m.wikipedia.org/wiki/Intel_iAPX_432
jart, almost 3 years ago

You can avoid the five levels of indirection by using "unreal mode". I just wish it were possible to do with 64-bit code.
anonymoushn, almost 3 years ago

Huge pages cause 10-20% speedups for all sorts of applications by nearly eliminating the pain of walking a bunch of page tables all the time. Unfortunately, they are completely unusable on Windows, even though it's been at least 26 years since they first shipped in consumer CPUs.
bogomipz, almost 3 years ago

> "Show me somebody who calls the IBM S/360 a RISC design, and I will show you somebody who works with the s390 instruction set today."

Could someone explain this quote to me? I don't know enough about the IBM S/360 to understand it.
akdor1154, almost 3 years ago

> They also made it a four-CPU system, with all CPUs operating in the same 64-bit global address space. It also needed a good 1,000 amperes at 5 volts delivered to the backplane through a dozen welding cables.

That is absolutely terrifying.
gralx, almost 3 years ago

The link didn't work for me. A direct link did: https://dl.acm.org/doi/abs/10.1145/3534854
perryizgr8, almost 3 years ago

Physically, RAM chips have their storage cells in linear arrangements, one after the other. Storage cell #1,334,455,224 is physically next to storage cell #1,334,455,223. Both may contain data from different "objects", or not; that does not change the physical reality.

Having a representation of reality in your system/software can be helpful in many cases. A nefarious example: if you were attempting to write a rowhammer attack, how would you do it if the computer cannot reason about the physical location of storage cells in RAM?
pabs3, almost 3 years ago

I wonder what the author thinks of the design of the Mill architecture: https://millcomputing.com/
peter_d_sherman, almost 3 years ago

> "The R1000 addresses 64 bits of address space instantly in every single memory access. And before you tell me this is impossible: The computer is in the next room, built with 74xx-TTL (transistor-transistor logic) chips in the late 1980s. It worked back then, and it still works today."

Nice! (This is nothing short of a miracle, considering that this technology originated in the 1980s and was built with TTL!)
nielsbot, almost 3 years ago

This article makes me wonder, now that Apple makes its own CPUs and owns the "whole widget", whether they might do something more radical with their CPU architecture. They could build in Swift-specific hardware support, for example. Apple could get an "unfair" power and performance advantage over their CPU competition.
ur-whale, almost 3 years ago

> Why do we even have linear physical and virtual addresses in the first place

Mmmh, there was a time when PCs had non-linear memory [1]. Coding for this was, IIRC, an effing nightmare.

[1] https://en.wikipedia.org/wiki/X86_memory_segmentation
pvillano, almost 3 years ago

So, right now: a process thinks it has a linear memory to itself and imagines it can write anywhere. Writes trigger the expansion of a complex sparse tree data structure that transparently converts virtual linear addresses to non-contiguous actual locations in main memory.

The R1000: a program can repeatedly ask for an x-KiB page and is given an ID, which is actually an index into an array holding the actual RAM address. Data is fetched with an id+offset, which is range/PID-checked against metadata in the array. The IDs a program gets back aren't guaranteed to be sequential, BUT the array is dense, which is why no tree structure is needed.

Dynamic stacks have a little extra work to do: a data return pointer in addition to an instruction return pointer.
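That dense-array scheme can be sketched in a few lines. This is a toy model built from the comment's description, not from R1000 documentation: page IDs index a flat table of (owner, base, size) entries, and every access is checked against that metadata in a single indexed lookup, with no multi-level tree.

```python
# Toy model of a dense translation array: page IDs index a flat table,
# and each access is owner- and bounds-checked against the entry.
# Based on the comment's description, not actual R1000 internals.

PAGE_SIZE = 1024

class DenseTranslator:
    def __init__(self):
        self.ram = bytearray(8 * PAGE_SIZE)
        self.table = []          # dense array: index == page ID
        self.next_free = 0       # bump allocator over physical RAM

    def alloc(self, pid):
        base = self.next_free
        self.next_free += PAGE_SIZE
        self.table.append((pid, base, PAGE_SIZE))
        return len(self.table) - 1          # the page ID

    def load(self, pid, page_id, offset):
        owner, base, size = self.table[page_id]  # one indexed lookup
        if owner != pid or not 0 <= offset < size:
            raise MemoryError("protection fault")
        return self.ram[base + offset]
```

Because the table is dense, translation is a single array index plus a compare, which is the point of the comment's contrast with sparse multi-level page trees.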
eternalban, almost 3 years ago

An intro to CHERI: https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-941.pdf
mwcremer, almost 3 years ago

tl;dr: page-based linear addressing induces performance loss via complicated access policies, e.g. multilevel page tables. Mr. Kamp would prefer an object model of memory access and protection. Also, CHERI (https://dl.acm.org/doi/10.5555/2665671.2665740) increases code safety by treating pointers and integers as distinct types.
anewpersonality, almost 3 years ago

CHERI is a gamechanger.