RISC-V has an extension for constant-time coding (Zkt, the "data-independent execution latency" extension: https://riscv-software-src.github.io/riscv-unified-db/manual/html/isa/isa_20240411/exts/Zkt.html). I believe the idea is that a core implementing Zkt guarantees that a defined list of instructions executes with latency independent of the operand values, so a known set of operations runs in constant time. I'm not sure that anyone has actually implemented it.

Do other ISAs have anything similar?
So basically compilers, and optimizations inside CPUs, will keep aggressively optimizing code to create the vulnerability.

Perhaps this indicates a missing concept in our compilation and execution of code: metadata that *demands* a section not be optimized.
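A partial version of that metadata already exists as compiler attributes, though only at function granularity and with no timing guarantee. A minimal sketch (optnone and optimize("O0") are real Clang/GCC features, but "don't optimize" is not the same as "constant time", and it says nothing about the CPU's own runtime optimizations):

    #include <stddef.h>

    /* "Do not optimize this" metadata, as it exists today: disables
       optimization for one function. Restrains the compiler only. */
    #if defined(__clang__)
    __attribute__((optnone))
    #elif defined(__GNUC__)
    __attribute__((optimize("O0")))
    #endif
    unsigned sensitive_compare(const unsigned char *a,
                               const unsigned char *b, size_t n)
    {
        unsigned d = 0;
        for (size_t i = 0; i < n; i++)
            d |= (unsigned)(a[i] ^ b[i]);
        return d;   /* 0 iff the buffers are equal */
    }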
The article details how current techniques for preventing timing attacks are becoming ineffective, for the following reasons:

> 1. Compilers are applying optimization techniques that are heuristically good for performance on general software, but happen to leak information through timing-based sidechannels when used on secret data. Since general-purpose CPUs and compilers are applied to a wide variety of tasks, compilers have no reason to stop doing such optimizations.

> 2. Constant-time coding techniques mostly aim at fooling the compiler, to prevent it from applying these optimizations. Since compilers keep getting smarter, these techniques lose their efficiency.

> 3. Just-in-time (JIT) compilers have access to runtime data; they can, and will, use such information to perform extra optimizations that static compilers cannot do, and further destroy the developer’s attempt at achieving constant-time code.

> 4. JIT compilation is becoming pervasive and is now employed in various contexts, in particular inside CPUs. Such compilers can do the same optimizations as “external” JIT compilers, but are not publicly documented, and the output of their “machine” code translation cannot be inspected to detect deviations from constant-time execution.

> 5. Modern hardware and software stacks use a very layered structure, with each layer striving to hide its internal functioning details from upper layers. The socio-economic context which allowed such hardware to exist inherently relies on such abstraction (under the name of “industrial secrecy”). An effective constant-time coding process would require a model of computation with strong guarantees on timing-related characteristics, that would be maintained through all these layers, down to the semiconductors that implement the logic gates. This industry-wide vertical cooperation is unlikely to happen.
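For a concrete picture of points 1 and 2, here is the kind of constant-time selection idiom the article is about, as a minimal sketch. The whole problem is that a sufficiently smart compiler may recognize the masking trick and lower it back into a branch or conditional move, which is exactly what the idiom is trying to avoid:

    #include <stdint.h>

    /* Classic constant-time selection: return a if bit == 1, else b,
       with no data-dependent branch. mask is 0 or all-ones. */
    uint32_t ct_select(uint32_t bit, uint32_t a, uint32_t b)
    {
        uint32_t mask = (uint32_t)-(int32_t)bit;
        return (a & mask) | (b & ~mask);
    }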
Can an integrated CPU-FPGA design (like the ones AMD sells) be a solution to this problem? Point 5 of the paper says:

> Modern hardware and software stacks use a very layered structure, with each layer striving to hide its internal functioning details from upper layers. The socio-economic context which allowed such hardware to exist inherently relies on such abstraction (under the name of “industrial secrecy”). An effective constant-time coding process would require a model of computation with strong guarantees on timing-related characteristics, that would be maintained through all these layers, down to the semiconductors that implement the logic gates. This industry-wide vertical cooperation is unlikely to happen.

It's not mentioned in the paper, but hardware description languages provide strong guarantees on timing-related characteristics at the semiconductor level. Hardware companies are willing to provide user access to FPGAs, as evidenced by the userspace FPGA subsystem in the Linux kernel.[1] You can buy Versal SoCs right now that combine ARM CPUs with FPGAs on one die, run Linux on them, and outsource your crypto tasks to the FPGA.

I see a future where we give up on constant-time coding for CPUs and move crypto libraries to FPGAs, given the hundreds of billions of dollars in our economy reliant on strong encryption. That'd be a world in which every corporate laptop is required to have an FPGA for compliance purposes.

It'd be interesting to know if anyone has *tried* this.

[1] https://www.phoronix.com/news/FPGA-User-Space-Interface-AMD
> The intent is to produce the mask values without using a plain comparison (“x != 0”) so that the compiler does not notice that we are really making a conditional move.
The compiler is not fooled.

So practice your martial arts on the bow of a rowboat. Balance, Daniel-san.

The whole timing thing is essentially an issue of getting off-balance, as in martial arts, and the solution is that there are actions you simply never do, and ones that look like they lack economy of motion because so much of the work goes into keeping your center of gravity from shifting outside your base. Tai chi is infinitely better at protecting you from concussing yourself on black ice than it is at protecting you from a mugger.

So if you have a conditional move for one calculation, then move a to b if !c and move it to d if c. And when the compilers chase that down in another generation, or someone notices that the L1 cache timing is different if d gets cold, then use both calculations in the output (see the sketch below). Calculate complements where the runtimes of a and !a sum to 1, to several decimal places.

Will it work for eternity? Unlikely. Will it work for a decade? Probably.
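A minimal sketch of the "do both calculations and use both" idea; the two constants are arbitrary stand-ins for whatever the real taken/not-taken calculations would be:

    #include <stdint.h>

    /* Compute both paths unconditionally, route them through b and d
       with a mask derived from c (0 or 1), and fold both into the
       output so neither path is dead code the compiler can remove. */
    uint32_t both_paths(uint32_t x, uint32_t c)
    {
        uint32_t mask = (uint32_t)-(int32_t)c;  /* 0 or 0xFFFFFFFF */
        uint32_t r0 = x * 0x9E3779B9u;          /* the "!c" calculation */
        uint32_t r1 = x ^ 0x5BD1E995u;          /* the "c" calculation  */
        uint32_t b = r0 & ~mask;                /* move r0 to b if !c   */
        uint32_t d = r1 & mask;                 /* move r1 to d if c    */
        return b | d;                           /* both feed the output */
    }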
At https://rwc.iacr.org/2025/program.php you can see there is a talk scheduled to be given in a couple of weeks, titled "Testing Side-channel Security of Cryptographic Implementations against Future Microarchitectures", with the following bits in the abstract: "Using this framework, we conduct an empirical study of 18 proposed microarchitectural optimizations on 25 implementations of eight cryptographic primitives in five popular libraries. We find that every implementation would contain secret-dependent leaks, sometimes sufficient to recover a victim’s secret key, if these optimizations were realized. Ironically, some leaks are possible only because of coding idioms used to prevent leaks under the standard constant-time model."
So far, SIMD operations are constant-time, since the whole vector needs to be computed, not just a single element of it.

ChaCha20 or Kyber implementations might be exploiting that. Although I don't know of any cryptographic algorithm that depends on vector-only calculation as a basis for its strength.

Alternatively, I suppose that specialised cryptographic CPUs could be made to follow strict standards ensuring constant-time execution.
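For illustration, here is roughly what that looks like in practice: a ChaCha20 quarter-round over 128-bit vectors, a sketch assuming x86 with SSE2. All four lanes are processed unconditionally, with only adds, XORs, and rotations, so there are no data-dependent branches or lookups; whether every instruction also has data-independent *latency* remains a microarchitectural question, which is the article's point:

    #include <emmintrin.h>   /* SSE2 intrinsics */

    /* Rotate each 32-bit lane left by n bits. */
    #define ROTL32(v, n) _mm_or_si128(_mm_slli_epi32((v), (n)), \
                                      _mm_srli_epi32((v), 32 - (n)))

    /* Four ChaCha20 quarter-rounds in parallel, one per 32-bit lane. */
    static void chacha_qround(__m128i *a, __m128i *b, __m128i *c, __m128i *d)
    {
        *a = _mm_add_epi32(*a, *b); *d = _mm_xor_si128(*d, *a); *d = ROTL32(*d, 16);
        *c = _mm_add_epi32(*c, *d); *b = _mm_xor_si128(*b, *c); *b = ROTL32(*b, 12);
        *a = _mm_add_epi32(*a, *b); *d = _mm_xor_si128(*d, *a); *d = ROTL32(*d, 8);
        *c = _mm_add_epi32(*c, *d); *b = _mm_xor_si128(*b, *c); *b = ROTL32(*b, 7);
    }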
One old technique that can counter this is as follows:

1. Clock how long the operation takes on any type of data. Here, "operation" means a function call, perhaps for a whole block or buffer.

2. Add a small margin of time to that.

3. Modify the function call to make sure it has waited that long before returning a response, whether it finished early or not.

The result prevents the function call from leaking timing information (see the sketch below).

High-assurance security certification in the 1980s also required mitigating covert storage channels. That would require, at a minimum, overwriting any memory used during the call, and all the CPU registers, before returning to the caller.
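A minimal sketch of steps 2 and 3, assuming POSIX clock_gettime; TARGET_NS is an illustrative margin that, per step 1, must exceed the worst-case runtime of the wrapped operation:

    #include <stdint.h>
    #include <time.h>

    #define TARGET_NS 2000000L   /* illustrative 2 ms margin */

    static int64_t now_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (int64_t)ts.tv_sec * 1000000000L + ts.tv_nsec;
    }

    /* Run op(ctx) (a hypothetical variable-time operation), then
       busy-wait until the fixed deadline so every call appears to
       take (at least) TARGET_NS from the caller's point of view. */
    int padded_call(int (*op)(void *), void *ctx)
    {
        int64_t deadline = now_ns() + TARGET_NS;
        int result = op(ctx);
        while (now_ns() < deadline)
            ;   /* spin rather than sleep: sleep granularity leaks */
        return result;
    }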
Editorialized title; originally "Constant-Time Code: The Pessimist Case".

Most of the points brought up in the article have been happening already for a decade or more; it's not something that 'will soon' happen.
Give me hardware with a discrete cryptographic processing unit, which is just a CPU that has no speculation, caching, or other nonsense. In other words, a CPU that actually works the way it's supposed to work. I don't care that idiots want their wrong software to be as fast as possible; I want correct software that's as fast as possible, and no faster.
My immediate thought upon seeing the condmove() example was "Slap 'volatile' on all those local variables". This forces a standards-compliant compiler to reload their values every time they are referenced. Basically, the compiler is no longer allowed to assume that the current thread of execution is the only actor that updates the variable -- any other mechanism (including another thread, but not limited to that -- e.g., a hardware DMA channel) could have modified the variable's value.

That doesn't solve the hardware-JIT-level problem, but it would appear to solve the compiler-level problem.

ETA: I only read up to p. 3, but I did search for the text "volatile" and didn't find it.
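Roughly, and assuming condmove() looks like the usual mask-based copy (the paper's exact code may differ), the suggestion is something like:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical reconstruction of condmove(): copy src over dst
       when ctl == 1, leave dst alone when ctl == 0. Making mask
       volatile forces a genuine load on each use, so the compiler
       cannot prove mask is 0 or all-ones and collapse the masking
       into a branch or a recognizable conditional move. */
    void condmove_volatile(uint32_t *dst, const uint32_t *src,
                           size_t len, uint32_t ctl)
    {
        volatile uint32_t mask = (uint32_t)-(int32_t)ctl;
        for (size_t i = 0; i < len; i++) {
            uint32_t m = mask;   /* fresh load, opaque to the optimizer */
            dst[i] = (src[i] & m) | (dst[i] & ~m);
        }
    }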
x86 and ARM both have options for executing certain instructions with data-independent timing. The problem here is that languages and compilers need to expose and honor data types that compile down to only those instructions.

This would be something like a 'secret' keyword that would qualify integer types; i.e. in C you could have a variable of type 'secret unsigned int' and the compiler would reject optimizations that would reveal timing information about the variable's contents, while optimizing the rest of the program's non-secret variables. Values can be promoted to secret but not demoted. Code that intends to reveal secrets (e.g. to write them out to secure storage) must explicitly cast away the secretness.

AFAIK Golang has crypto/subtle, but that's more special functions to do certain things in a data-independent way and not pervasive language support for it (and, presumably, they have to keep updating the compiler to not optimize that specific module).
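For reference, the ARM option is the DIT (Data Independent Timing) PSTATE bit; here is a sketch of detecting and enabling it from userspace on Linux/AArch64. HWCAP_DIT is the real Linux hwcap bit, but the named "msr dit" form is an assumption that your assembler is ARMv8.4-aware (e.g. built with -march=armv8.4-a). Intel's counterpart (DOITM) lives in an MSR that only the kernel can set:

    #if defined(__aarch64__) && defined(__linux__)
    #include <sys/auxv.h>

    #ifndef HWCAP_DIT
    #define HWCAP_DIT (1UL << 24)   /* from Linux asm/hwcap.h */
    #endif

    /* Enable Data Independent Timing: with DIT set, instructions on
       the architecture's DIT list must execute in a time that does
       not depend on their operand values. */
    int enable_dit(void)
    {
        if (!(getauxval(AT_HWCAP) & HWCAP_DIT))
            return -1;   /* CPU does not implement FEAT_DIT */
        __asm__ volatile("msr dit, #1" ::: "memory");
        return 0;
    }
    #endif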
I didn't read TFA, but does this imply that all encryption needs to be done in a special enclave with dedicated hardware (that isn't vulnerable to this)?
One technique not mentioned at all in the paper or comments is bitslicing. For non-branching code (e.g. symmetric ciphers) it's guaranteed constant-time, and it would be a remarkable compiler indeed that could introduce optimizations and timing variations into bit-sliced code...
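A minimal sketch of the idea: in bitsliced form, bit i of each word belongs to instance i, so one word-wide boolean expression evaluates 64 independent instances at once. The chi step from Keccak is used here just as an example of a nonlinear operation:

    #include <stdint.h>

    /* Bitsliced evaluation of x ^ (~y & z) (Keccak's "chi" step) for
       64 independent instances in parallel. Pure boolean word ops:
       no branches, no table lookups, nothing data-dependent for a
       compiler or CPU to specialize on. */
    static inline uint64_t chi64(uint64_t x, uint64_t y, uint64_t z)
    {
        return x ^ (~y & z);
    }

The cost is transposing inputs into and out of bitsliced form, which is why the technique suits symmetric primitives processing many blocks in parallel.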
(URL changed from https://eprint.iacr.org/2025/435, which points to this.)