I looked at the Go implementation of this in "tagged pointers" [0].

The amount of data that can be used for the tag is architecture-dependent, and the routine discards any tag bits that don't fit into the tagged pointer without telling the caller.

To me, this seems ridiculous - why not just use a struct with a tag and a pointer, and not run the risk of your tag being destroyed without you knowing because the architecture can't fit that many bits?

But the Go folks are smart, and must be doing this for a reason. Can anyone explain the thinking here?

[0] https://github.com/golang/go/blob/master/src/runtime/tagptr_64bit.go
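For concreteness, here's the shape of the thing I mean, as a hypothetical C++ sketch (my own names and layout, not Go's actual code):

    #include <cstdint>

    // Hypothetical sketch: pack a 16-bit tag into the unused top bits
    // of a 48-bit virtual address.
    constexpr int kAddrBits = 48;
    constexpr uint64_t kAddrMask = (uint64_t{1} << kAddrBits) - 1;

    uint64_t pack(void* p, uint16_t tag) {
        // A wider tag would be silently truncated here, which is the
        // hazard described above.
        return (reinterpret_cast<uint64_t>(p) & kAddrMask)
             | (uint64_t{tag} << kAddrBits);
    }

    void* pointer(uint64_t tp) {
        return reinterpret_cast<void*>(tp & kAddrMask);
    }

    uint16_t tag(uint64_t tp) {
        return static_cast<uint16_t>(tp >> kAddrBits);
    }

I assume part of the answer is that the packed form fits in a single 64-bit word, so it can be read or compare-and-swapped in one atomic operation, which a two-word struct generally can't.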
There's a trick here I hadn't noticed. Good times.

If the plan is 8-byte-aligned data with the three low bits used for other stuff, you mask them off before loads/stores. An alternative is to use the high three bits, store the pointer/8, and multiply by 8 to retrieve. That's appealing on x64 because memory operations can include a *8 in the encoding.

Specifically, I've long wondered whether the pointer-masking tricks mess up prefetch/speculation. It seems plausible that making the dereference look more like a normal memory operation helps there.

(It also means the low alignment bits and the high unused bits can be treated as contiguous, modulo the *8 on decode, which is probably less annoying when using both ranges.)
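A sketch of that high-bits variant, assuming 8-byte-aligned pointers (the names are mine):

    #include <cassert>
    #include <cstdint>

    // High-bits variant: keep ptr/8 in the low 61 bits and a 3-bit
    // tag in the top bits of the word.
    uint64_t encode(void* p, uint8_t tag) {
        uint64_t addr = reinterpret_cast<uint64_t>(p);
        assert(addr % 8 == 0 && tag < 8);
        return (addr >> 3) | (uint64_t{tag} << 61);
    }

    void* decode(uint64_t v) {
        // One shift both drops the tag off the top and multiplies the
        // stored ptr/8 back up by 8 -- the two "spare" bit ranges
        // behave as if they were contiguous.
        return reinterpret_cast<void*>(v << 3);
    }

    uint8_t tag_of(uint64_t v) { return static_cast<uint8_t>(v >> 61); }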
Note that Intel's 5-level paging uses 57-bit addresses, so on those processors you can only safely assume the top 7 bits are unused. https://en.m.wikipedia.org/wiki/Intel_5-level_paging

I also believe that when masking the tag bits off before loads/stores, you need to set them to the same value as the highest used address bit (bit 47, or bit 56 with 5-level paging). That bit is often 0, but not necessarily - something to be aware of.

Finally, when using C++, you gotta be careful of strict aliasing. In C++20 you can use std::bit_cast: https://en.cppreference.com/w/cpp/numeric/bit_cast
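A sketch of that sign-extension step, assuming C++20 (where right-shifting a negative signed value is defined to be arithmetic) and a 48-bit address space:

    #include <cstdint>

    // Restore a canonical x86-64 address after using the top 16 bits
    // as a tag: bits 48..63 must be copies of bit 47, so zeroing the
    // tag is only correct for lower-half (user-space) addresses.
    void* untag_canonical(uint64_t tagged) {
        // Shift the 48 address bits to the top, then arithmetic-shift
        // back down, replicating bit 47 through the high bits.
        int64_t v = static_cast<int64_t>(tagged << 16) >> 16;
        return reinterpret_cast<void*>(v);
    }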
It's really unfortunate that all of the mainstream OSes run userspace in the lower portion of the address space.

Setting the most-significant 13 bits of an IEEE-754 double (really, setting the 2nd through 12th bits and at least one of the following bits) yields a NaN bit pattern. That means any pointer into the top 2 petabytes of a 64-bit address space, reinterpreted as an IEEE-754 double, is already NaN. So the NaN-boxing used in Safari's and Firefox's JavaScript engines, LuaJIT, etc. would become a no-op. (Safari and Firefox use different mechanisms, but they'd converge on the same thing if moved to the top of the address space.)

It's not enough of a performance difference to re-jigger everything in mainstream OSes, but I imagine if someone were to come up with a unikernel/exokernel OS specifically for JITing some dynamic language, there's some performance to be had by keeping all of the dynamic-language objects in the upper 2 petabytes of the address space.
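For the unfamiliar, a rough C++20 sketch of conventional NaN-boxing (not Safari's or Firefox's exact scheme; the mask value and names here are illustrative):

    #include <bit>
    #include <cstdint>

    // Doubles are stored as-is; pointers are boxed under a NaN pattern
    // (sign bit + all-ones exponent + quiet bit) with the payload in
    // the low 48 bits.
    constexpr uint64_t kBoxMask = 0xFFF8'0000'0000'0000ull;

    uint64_t box_pointer(void* p) {
        // This OR is the step that becomes a no-op if the pointer
        // already lives in the top 2 petabytes of the address space.
        return kBoxMask | reinterpret_cast<uint64_t>(p);
    }

    bool is_boxed_pointer(uint64_t v) {
        // (A real engine also has to keep genuine NaNs produced by
        // arithmetic out of this range.)
        return (v & kBoxMask) == kBoxMask;
    }

    double as_double(uint64_t v) {
        return std::bit_cast<double>(v); // plain doubles pass through
    }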
Very old Macs used this trick to squeeze their ROM routines down a bit, operating with 24-bit addressing and using the top bits for flags and whatnot. Of course, they ran into trouble when machines with more than 16MB of memory started appearing. If you do this, you might be making more work for yourself in the future when you buy a new machine with 256EB of main memory.
This is a nice summary of the practical aspects and considerations. It isn't something anyone should be doing explicitly on a regular basis, but there are occasions, particularly in libraries, where it is the perfect tool for the job.

There is also the inverse use case: smuggling pointers inside status codes, enums, and similar. For example, optionally encoding a pointer to additional error metadata for non-zero result codes. In C++ it isn't that uncommon to also see Result types implemented similarly when the type allows it.
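A hedged sketch of that inverse case in C++ (all names hypothetical): a single 64-bit result word that is either a small status code or a pointer to richer error metadata, discriminated by the alignment bit.

    #include <cstdint>

    struct ErrorInfo { int code; const char* detail; };

    // Aligned ErrorInfo pointers never have bit 0 set, so bit 0
    // distinguishes "plain status code" from "pointer to metadata".
    uint64_t make_status(uint32_t code) {
        return (uint64_t{code} << 1) | 1;
    }

    uint64_t make_rich_error(ErrorInfo* info) {
        return reinterpret_cast<uint64_t>(info);
    }

    bool is_plain_status(uint64_t r) { return r & 1; }

    uint32_t status_code(uint64_t r) {
        return static_cast<uint32_t>(r >> 1);
    }

    ErrorInfo* error_info(uint64_t r) {
        return reinterpret_cast<ErrorInfo*>(r);
    }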
There's also the Rust library smartstring, which allows creating short strings without heap allocations: a Rust String consists of three words (pointer, capacity, length), so 24 bytes on a 64-bit architecture. With one byte used as a tag, that leaves 23 bytes for storing the string contents in the same space.
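The same layout idea sketched in C++ (not smartstring's actual implementation):

    #include <cstddef>
    #include <cstdint>

    // 24 bytes either way: one byte discriminates heap vs. inline,
    // leaving 23 bytes of inline storage for short strings.
    struct SmallString {
        struct Heap  { char* ptr; std::size_t cap; std::size_t len; };
        struct Small { char data[23]; uint8_t tag_and_len; };
        union { Heap heap; Small small; };
    };
    static_assert(sizeof(SmallString) == 24, "assumes a 64-bit target");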
This will be a problem when trying this on Capability Hardware Enhanced RISC Instructions (CHERI) systems: https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/

Currently the only CPU with this implemented is Arm's Morello prototype. Pointers are replaced by 128-bit capabilities containing which operations are valid, a 64-bit base address, and a size. These capabilities are unforgeable, so trying to play with "unused" parts of 64-bit pointers simply won't work.

FreeBSD and GNOME are running on the hardware after minor source-code changes, along with a significant proportion of the FreeBSD ports collection.

As well as Arm, Microsoft is interested: https://www.microsoft.com/en-us/research/project/portmeirion/overview/
My favorite hack back in the MFC days was a combo box where I stored the pointer address in the item text (far to the right, after a lot of spaces, so it was hidden). When a user chose an item, I'd parse the pointer out and dereference it back to the object.
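Roughly this, as a from-memory sketch rather than the original code (MFC's CComboBox::SetItemDataPtr would be the sanctioned way to do it):

    #include <cstdio>

    // Pad the label to a fixed width, append the pointer, and parse
    // it back on selection. %p round-trips within one program run.
    void format_item(char* buf, std::size_t n,
                     const char* label, void* obj) {
        std::snprintf(buf, n, "%-64s%p", label, obj);
    }

    void* parse_item(const char* text) {
        void* obj = nullptr;
        std::sscanf(text + 64, "%p", &obj); // pointer begins after padding
        return obj;
    }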
"I think it's quite well known that on a 64-bit system, the maximum bit-width
of a virtual address is somewhat lower (commonly 48-bits)." might actually be a perfect example of the Average Framiliarity xkcd[0]. It's perfectly fine to write it that way and the article obviously knows its intended audience. But I'm wondering what percentage of readers here actually knew this beforehand. (I learned it only pretty recently myself. And yes, I should have known this sooner, but ... didn't.)<p>[0] <a href="https://xkcd.com/2501/" rel="nofollow noreferrer">https://xkcd.com/2501/</a>
My first thought was to use the address-calculation logic as an additional ALU... then my second thought was trying to justify that during a code review... and lastly, I thought of why Microsoft used the LDT upper byte to make Xenix 286 incompatible, and the headache that changing architectures made for programs built on poor assumptions.
The Alpha only accessed 8-byte-aligned memory addresses (originally, anyway). The bottom 3 bits were ignored (masked out) on loads/stores, per the spec, to allow users to stuff juicy extra info into them.