This is not directly relevant, but Power ISA 3.1 (from May 2020) has 128-bit division instructions (vdivuq and vdivsq) with a maximum latency of 61 cycles. I don't have access to a Power10 machine to see how they compare to what's presented here, but I thought they were an interesting addition to the ISA.<p><a href="https://wiki.raptorcs.com/w/images/f/f5/PowerISA_public.v3.1.pdf" rel="nofollow">https://wiki.raptorcs.com/w/images/f/f5/PowerISA_public.v3.1...</a>
<a href="https://files.openpower.foundation/s/EgCy7C43p2NSRfR" rel="nofollow">https://files.openpower.foundation/s/EgCy7C43p2NSRfR</a>
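For anyone who wants the operation spelled out: vdivuq divides one unsigned 128-bit quadword by another in a single instruction. Below is a minimal sketch in C of the same 128-by-128-bit unsigned divide, assuming GCC or Clang for the non-standard `unsigned __int128` type. `udiv128_restoring` is a hypothetical name, and the one-bit-per-iteration restoring loop is a textbook reference, not a claim about how the hardware does it.

```c
#include <assert.h>

/* unsigned __int128 is a GCC/Clang extension, not ISO C. */
typedef unsigned __int128 u128;

/* Hypothetical helper: textbook restoring (shift-and-subtract) division,
   producing one quotient bit per iteration. It only illustrates what a
   full 128-by-128-bit unsigned divide computes; it is not how vdivuq is
   implemented. Precondition: d != 0. */
static u128 udiv128_restoring(u128 n, u128 d, u128 *rem)
{
    u128 q = 0, r = 0;
    assert(d != 0);
    for (int i = 127; i >= 0; i--) {
        /* Bring down the next dividend bit. Before this shift r is at most
           n >> (i + 1) < 2^127, so r << 1 cannot overflow. */
        r = (r << 1) | ((n >> i) & 1);
        /* r < 2d here, so at most one subtraction is needed. */
        if (r >= d) {
            r -= d;
            q |= (u128)1 << i;
        }
    }
    if (rem)
        *rem = r;
    return q;
}
```

For comparison, writing `n / d` on `unsigned __int128` with GCC or Clang on x86-64 typically compiles to a call to the runtime helper `__udivti3`; I haven't checked whether either compiler lowers it to vdivuq with `-mcpu=power10`.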