I found the paper didn't live up to its claims. It said:
<i>the ever-changing behavior of these SoCs is also visible via internal measurement sensors, allowing us to distinguish between executed instructions, and even different operands of the same instruction</i><p>But when you read further and see what they tried:<p><i>We then selected one Arm instruction from each data-processing bucket, testing stores (str), AES instructions (aese, aesmc), rotate right (ror), bitwise and (and), and both integer and floating-point addition (add, fadd) and multiplication (mul, fmul). We run each instruction in a loop on all available P-cores on each test device</i><p>What they did was define a handful of known workloads with very different power profiles, run each one in a tight loop on every P-core, and then find that they could tell them apart by looking at the chip's power draw. Well, duh.