Spectre and Meltdown opened the Pandora's box of a new class of speculative execution attacks that defeat standard memory protection mechanisms. These attacks are not theoretical: they pose a real and immediate security threat, and have reportedly been exploited by cybercriminals. A few presentations, from the ISCA 2018 Panel, Paul Kocher @ Stanford, Jon Masters from Red Hat, and Mark Hill @ CCC BLOG, provide an excellent overview.
In this blog we raise some concerns about the emerging software mitigations of Spectre-V1 attacks due to the under-specified hardware behavior that they rely upon.
There are two aspects of the Spectre and Meltdown attacks that distinguish them from attacks on software. First, perfectly correct and safe software may still be vulnerable. Second, software-based mitigations can only curtail the symptoms of the attacks rather than address the particular bugs that enable them, so the software remains potentially vulnerable to many other attacks of the same class. Furthermore, software mitigations, such as KPTI and retpoline, are fairly intrusive, and require either OS modifications or full recompilation. Thus, the hope and the expectation is that future CPUs will resolve these problems at the hardware level.
Spectre V1: no hardware mitigations
Unfortunately, Intel did not disclose any plans to provide a hardware fix to the Bounds Check Bypass (BCB) attack (aka, Spectre V1), leaving the mitigation to software alone. Worse, the attack cannot be prevented by patching the OS, so the potentially vulnerable code must be refactored or recompiled. There are already many examples of sensitive code that is classified as potentially vulnerable.
Reminder: Spectre V1
In the Spectre attacks there are two processes: a victim and an attacker. For the attack to work, the attacker must be able to control certain inputs to the victim process and must be able to train the branch predictor used by the victim.
Bounds Check Bypass exploits the following gadget in the victim code.
1 if (is_safe(input_ptr))
2     secret = load(input_ptr)
3     leak(secret)
The attacker controls the input_ptr variable and uses it to access the victim’s memory by specifying an arbitrary value in the victim’s address space (i.e., by accessing outside the valid range). Such an access would normally have been prevented by the conditional statement in line 1. However, this is not the case when the branch is executed speculatively. The attacker trains the branch predictor so that the branch is predicted as taken. When the victim then runs with a malicious input, the branch outcome is mispredicted and the body of the branch is executed speculatively. This allows the attacker to read the victim’s memory (line 2) and leak the values (line 3) via a micro-architectural channel whose state is not rolled back when the misspeculation is detected (i.e., the cache). Thus, if the misspeculation is detected only after lines 2 and 3 have been executed, the attack succeeds.
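To make the gadget concrete, here is a minimal C sketch of what such victim code might look like. The names (public_array, probe, victim) and the sizes are hypothetical and chosen only for illustration; the sketch shows the shape of the vulnerable code, not a working exploit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE   16    /* hypothetical size of the victim's array */
#define PROBE_STRIDE 4096  /* assumed spacing so each byte value maps to its own page */

static uint8_t public_array[ARRAY_SIZE];
static uint8_t probe[256 * PROBE_STRIDE];

/* Victim routine containing the gadget; `index` is attacker-controlled. */
uint8_t victim(size_t index) {
    uint8_t observed = 0;
    if (index < ARRAY_SIZE) {                             /* line 1: safety check */
        uint8_t secret = public_array[index];             /* line 2: may run speculatively with a bad index */
        observed = probe[(size_t)secret * PROBE_STRIDE];  /* line 3: secret-dependent access touches the cache */
    }
    return observed;
}

/* Architecturally the check always holds: an out-of-range index never reaches the load. */
void example_usage(void) {
    assert(victim(ARRAY_SIZE + 1000) == 0);
}
```

Note that nothing in this code is incorrect at the architectural level. The leak exists only in the transient execution of lines 2 and 3 and in the cache state that they leave behind.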
Mitigation 1: full serialization
The most straightforward way of defending against Bounds Check Bypass is to prevent the speculation altogether. In line with this observation, Intel proposes to patch vulnerable regions of code by using LFENCE. Intel claims that LFENCE is, effectively, a serialization instruction that ensures that “any modifications to flags, registers, and memory for previous instructions are completed before the next instruction is fetched and executed” (Chapter 7 of the IA-32 Intel Architecture Software Developer’s Manual, Volume 3). Thus, LFENCE can force the leaky memory access to wait for the safety check if placed right after the conditional statement:
1 if (safe(input_ptr))
2     LFENCE
3     secret = read(input_ptr)
4     leak(secret)
There are two important observations worth noting.
First, LFENCE is a Load FENCE, so in theory it is not supposed to serialize the execution of the branch with respect to the load in line 3. However, Intel reportedly changed the semantics of LFENCE so that it also blocks speculative execution of the instructions that follow it. In fact, there is a footnote in the processor manual stating that LFENCE is, indeed, a serializing instruction, but it was rather challenging to find.
Second, as a result of the serialization behavior, this approach is too restrictive because it serializes the execution of all the instructions following the comparison, not only the leaking ones. Thus, if conservatively applied to all the branches in a program, this technique results in fairly high overheads, in certain cases slowing down the execution by 10x. Such prohibitively high overheads are clearly unacceptable. Minimizing the overheads by instrumenting only suspicious branches turns out to be highly non-trivial. Microsoft’s compiler patch attempted to do just that, but this solution was later shown to miss many vulnerable branches that slightly deviate from the exact Spectre gadget.
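In C, the fence placement described above might be sketched as follows. The sketch assumes an x86-64 compiler that provides the _mm_lfence() intrinsic in <emmintrin.h>; the macro SPECULATION_BARRIER and the names table, probe, and victim_fenced are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <emmintrin.h>                        /* declares _mm_lfence() */
#define SPECULATION_BARRIER() _mm_lfence()
#else
/* Assumption: placeholder so the sketch compiles elsewhere. It gives NO protection. */
#define SPECULATION_BARRIER() ((void)0)
#endif

#define TABLE_SIZE   16    /* hypothetical */
#define PROBE_STRIDE 4096  /* hypothetical */

static uint8_t table[TABLE_SIZE];
static uint8_t probe[256 * PROBE_STRIDE];

uint8_t victim_fenced(size_t index) {
    uint8_t observed = 0;
    if (index < TABLE_SIZE) {                             /* line 1: safety check */
        SPECULATION_BARRIER();                            /* line 2: later instructions wait for the check */
        uint8_t secret = table[index];                    /* line 3 */
        observed = probe[(size_t)secret * PROBE_STRIDE];  /* line 4 */
    }
    return observed;
}

/* The fence does not change the architectural result, only the timing. */
void example_usage(void) {
    assert(victim_fenced(TABLE_SIZE) == 0);
}
```

Everything after the fence waits, including instructions that have nothing to do with the secret, which is where the overhead discussed above comes from.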
Mitigation 2: Speculative Load Hardening
Another technique is called Speculative Load Hardening (SLH). SLH uses an interesting trick: it injects an artificial data dependency between the branch condition evaluation and the potentially leaking load. This data dependency forces in-order execution of branch-load pairs, without serializing the rest of the instructions in the branch (for more details see the original document).
1 all_ones = 0xFFFFF...
2 all_zero = 0x0
3 mask = all_ones
4 if (safe(adversarial_input))
5     CMOVcc all_zero, mask
6     secret = read(adversarial_input)
7     secret &= mask
8     leak(secret)
SLH introduces a conditional move (CMOVcc) instruction right after the branch. CMOVcc is a generic name for instructions that check the EFLAGS register to determine whether to update the destination register. Thus, CMOVcc adds a dependency on the outcome of the condition in line 4. SLH then adds an AND operation (line 7), which forces the execution of the secret-leaking instruction in line 8 to depend on CMOVcc, and, in turn, on the evaluation of the branch condition.
Another interesting aspect of SLH is that it “hardens” the access to the secret (line 7). The assumption is that even if the branch is mispredicted and the evaluation of the condition in line 4 is delayed, CMOVcc will reset the mask to 0, so the AND in line 7 zeroes the loaded value and prevents the secret from leaking in line 8.
This solution indeed has much lower overheads than LFENCE, and can therefore be applied far more widely, instrumenting branches conservatively to ensure comprehensive protection.
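The data flow of SLH can be approximated in C as shown below. This is only an illustration of the dependency chain: an optimizing C compiler is free to fold the mask away, which is why the real transformation is performed inside the compiler (for example, by Clang's -mspeculative-load-hardening option). The helper mask_if, standing in for CMOVcc, and all other names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE   16    /* hypothetical */
#define PROBE_STRIDE 4096  /* hypothetical */

static uint8_t table[TABLE_SIZE];
static uint8_t probe[256 * PROBE_STRIDE];

/* Branch-free stand-in for CMOVcc: all ones if the condition holds, zero otherwise. */
static uintptr_t mask_if(int condition) {
    return (uintptr_t)0 - (uintptr_t)(condition != 0);
}

uint8_t victim_hardened(size_t index) {
    uintptr_t mask = ~(uintptr_t)0;                       /* line 3: mask = all_ones */
    uint8_t observed = 0;
    if (index < TABLE_SIZE) {                             /* line 4: safety check */
        mask &= mask_if(index < TABLE_SIZE);              /* line 5: mask now depends on the real condition */
        uint8_t secret = table[index];                    /* line 6 */
        secret &= (uint8_t)mask;                          /* line 7: zeroed if the condition was false */
        observed = probe[(size_t)secret * PROBE_STRIDE];  /* line 8: access depends on the masked value */
    }
    return observed;
}

/* Architecturally the mask is all ones inside the branch, so results are unchanged. */
void example_usage(void) {
    assert(mask_if(0) == 0);
    assert(victim_hardened(TABLE_SIZE) == 0);
}
```

The point of the sketch is the chain condition, mask, AND, leak: line 8 cannot use a meaningful value until the condition feeding the mask has actually been computed.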
No formal correctness guarantees
The problem, however, is that the correctness of SLH relies on the assumption that the CPU does not speculate on the value of the EFLAGS register, nor does it perform any other kind of value speculation. While in principle this assumption may be correct, there is nothing in the Intel spec that appears to guarantee its validity. Moreover, even if SLH does work in the current hardware generation, it might be broken in the next one.
This solution, as well as other similar proposals, is described as a tradeoff between security and performance. While these techniques are much more efficient than LFENCE, their security guarantees are not quite clear. Thus, until the speculation behavior is specified more explicitly in the processor spec, software developers will be left speculating about the actual system behavior, and will end up with fairly brittle performance-security tradeoffs. The fear is that the damage of such speculation may be much harder to mitigate than the original hardware flaws.
About the Authors: Mark Silberstein is an Assistant Professor in the Electrical Engineering Department at the Technion – Israel Institute of Technology, where he heads the Accelerated Computer Systems Lab. Oleksii Oleksenko is a PhD student at TU Dresden. Christof Fetzer is a Professor of Computer Science at TU Dresden.
Disclaimer: These posts are written by individual contributors to share their thoughts on the Computer Architecture Today blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGARCH or its parent organization, ACM.