This is a lot of words about things that probably don't matter.<p>IMO, the bottleneck of lookup tables is concisely described as follows: a modern CPU core can do only 2 or 3 loads from L1 cache per cycle, and roughly 10x to 500x fewer depending on how far away the data is (L2, L3, DDR near, or DDR remote).<p>Meanwhile, modern CPU instructions like AES, multiply, XOR, ADD and more can effectively execute 3, 4, or even more times per clock tick as long as their operands are already in registers.<p>--------<p>Don't even talk about cache hierarchies. You are already behind at the L1 cache level due to the relatively few load/store units in a modern CPU (or GPU) core. And hitting L2 or L3 cache gets dramatically worse, not just a little worse.
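<p>To make the tradeoff concrete, here is a rough sketch of the same job done both ways. Popcount is just my pick for illustration, and the function names are made up for this example. The table version needs 4 loads per 32-bit word, so it competes for those 2 or 3 load slots per cycle; the ALU version is only shifts, ANDs, adds and one multiply on registers. Which one actually wins depends on the core and the surrounding code, so measure before believing either.

```c
#include <assert.h>
#include <stdint.h>

/* 256-entry table: popcount_table[b] holds the number of set bits in byte b. */
static uint8_t popcount_table[256];

/* Fill the table once, counting bits the slow, obvious way. */
static void init_popcount_table(void) {
    for (int i = 0; i < 256; i++) {
        int bits = 0;
        for (int v = i; v != 0; v >>= 1) {
            bits += v & 1;
        }
        popcount_table[i] = (uint8_t)bits;
    }
}

/* Lookup-table version: four loads per 32-bit word, one per byte. */
static uint32_t popcount_lookup(uint32_t x) {
    return (uint32_t)popcount_table[x & 0xFF]
         + popcount_table[(x >> 8) & 0xFF]
         + popcount_table[(x >> 16) & 0xFF]
         + popcount_table[x >> 24];
}

/* ALU-only version (classic SWAR bit trick): no memory loads at all. */
static uint32_t popcount_alu(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit partial sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit partial sums */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit partial sums */
    return (x * 0x01010101u) >> 24;                   /* add the four bytes */
}

/* Sanity check: both versions must agree on a spread of inputs. */
static void check_popcounts_agree(void) {
    init_popcount_table();
    for (uint32_t i = 0; i < 1000; i++) {
        uint32_t x = i * 2654435761u; /* scramble i to cover all 32 bits */
        assert(popcount_lookup(x) == popcount_alu(x));
    }
}
```

Call check_popcounts_agree() once at startup if you want to convince yourself the two are interchangeable. And note the table here is only 256 bytes, so it will sit in L1 the whole time; that is the best case for the lookup version, and it is still limited by the load ports.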