This blog post is a continuation of Gururaj’s SIGARCH blog from three years ago. It revisits the design of secure caches and, primarily, the two design choices available to designers: partitioned caches and randomized caches. In the last three years, significant advances in this field have resulted in multiple papers at computer architecture and security venues. These designs improve performance-security tradeoffs and overall security guarantees.
In general, two kinds of threats are pertinent to the last-level cache (LLC) shared by multiple processes running on multiple cores: (i) conflict-based and (ii) occupancy-based. Zhao et al. (ASPLOS 2024) demonstrated an LLC side-channel attack in the public cloud (FaaS Google Cloud Run), which makes a stronger case for the need for a secure LLC. Over the last three years, more than a dozen papers on randomized or partitioned caches have appeared at architecture and security conferences, focusing on conflict-based and occupancy-based attacks. A partitioned cache ensures complete isolation of the LLC space among all applications (trusted and untrusted), providing complete security. The problem is solved, right? No, not really. A partitioned cache can incur a significant performance slowdown as the core count increases, especially for memory-intensive workloads. A randomized cache, on the other hand, does not incur a significant performance slowdown. However, most randomized caches are prone to occupancy-based attacks.
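To make the two threat models concrete, here is a deliberately simplified Python sketch. It is a toy model only (a direct-mapped cache with hit/miss flags instead of a real set-associative LLC probed through timing): a conflict-based attacker learns *which* sets the victim touched, while an occupancy-based attacker only learns *how many* of its lines the victim displaced.

```python
NUM_SETS = 16  # toy cache size, purely illustrative

class ToyCache:
    """Direct-mapped toy LLC: one line per set."""
    def __init__(self):
        self.sets = [None] * NUM_SETS

    def access(self, addr):
        """Return True on a hit; install the line on a miss."""
        idx = addr % NUM_SETS
        hit = self.sets[idx] == addr
        self.sets[idx] = addr
        return hit

def probe(cache, addrs):
    """Re-access primed lines and report which ones were evicted."""
    return [a for a in addrs if not cache.access(a)]

cache = ToyCache()
attacker_lines = list(range(NUM_SETS))   # prime: one line per set
for a in attacker_lines:
    cache.access(a)

for v in (16, 17, 18):                   # victim touches sets 0, 1, 2
    cache.access(v)

evicted = probe(cache, attacker_lines)
# Conflict-based view: WHICH sets the victim touched (fine-grained).
print("evicted sets:", evicted)          # [0, 1, 2]
# Occupancy-based view: only HOW MANY lines were displaced (coarse-grained).
print("occupancy:", len(evicted))        # 3
```

The coarser occupancy signal is exactly why randomization, which scrambles *which* set an address maps to, does not by itself hide *how much* cache a victim uses.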
In their ISCA 2022 paper, Cook et al. demystified the cache-occupancy-based website fingerprinting attack proposed by Shusterman et al. (USENIX SECURITY 2019). They showed that system interrupts, not cache occupancy, were the primary cause of the attack. This brings us to the first open problem: can we demonstrate a side-channel attack purely through cache occupancy? Intuitively, it should be possible; however, it has yet to be demonstrated comprehensively.
Randomized caches post-MIRAGE (USENIX SECURITY 2021). Verma et al. (SEED 2022) showcased a stochastic prime+probe communication channel (an occupancy-based channel) through randomized caches. However, MIRAGE only claims to mitigate the attacks that a fully associative cache can mitigate, and occupancy-based attacks are possible with fully associative caches too.
An exciting avenue for future research will be to showcase an occupancy-based side-channel attack through MIRAGE, as its global random eviction may facilitate creating an occupancy-based channel faster than a conventional fully associative cache. However, the community should pay attention to subtle nuances that, if ignored, can lead to wrong conclusions (HPCA 2023, IEEE CAL 2023).
SassCache (S&P 2023) is the latest randomized cache that uses cryptographic functions to provide index-based soft partitioning between the attacker and the victim. This randomized yet soft-partitioned design helps defend against both conflict-based and occupancy-based attacks. However, SassCache incurs a significant performance slowdown, with a significant increase in LLC misses. Also, SassCache may become vulnerable if the attacker occupies many security domains. A recent paper, VARP (ASIACCS 2024), discusses the effect of LLC replacement policies on the effectiveness of randomized caches and proposes a policy that makes the creation of eviction sets 25 times slower than under a random replacement policy. Another paper at ASIACCS 2024, by Ramkrishnan et al., makes a case for non-fusion-based coherent cache randomization. The paper argues that, with previous randomization techniques, domain fusion forces different security domains to use the same randomization function, thus reducing the security level significantly, and it introduces a new randomization-with-sharing approach (RAWS). Finally, Maya cache (ISCA 2024) improves the storage efficiency of MIRAGE with similar security guarantees, answering one of the research problems mentioned in the future work of Gururaj’s blog: how to reduce the storage and power overheads of randomized caches while maintaining security. Maya cache exploits the number of dead-on-arrival entries in the LLC to shrink the data store and provisions extra tag-store entries to track reuse, providing a solid security guarantee.
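The core mechanism shared by these randomized designs is a keyed set-index function. The sketch below illustrates the idea only: SHA-256 stands in for the low-latency hardware cipher a real design would use, and the set count, key names, and address are hypothetical. With per-domain keys (as in SassCache-style soft partitioning), an eviction set built in one domain does not transfer to another.

```python
import hashlib

NUM_SETS = 1024  # hypothetical LLC set count

def set_index(addr: int, domain_key: bytes) -> int:
    """Map a line address to a set index under a per-domain key.
    SHA-256 is a stand-in for a hardware block cipher."""
    digest = hashlib.sha256(domain_key + addr.to_bytes(8, "little")).digest()
    return int.from_bytes(digest[:4], "little") % NUM_SETS

addr = 0x4000
idx_a = set_index(addr, b"domain-A")
idx_b = set_index(addr, b"domain-B")

# Deterministic within a domain (required for lookups to work) ...
assert set_index(addr, b"domain-A") == idx_a
# ... but, with high probability, the same address maps to different sets
# under different keys, so attacker-built eviction sets do not carry over.
print(idx_a, idx_b)
```

Periodic re-keying (as in CEASER-style designs) additionally bounds how long any eviction set an attacker does manage to build remains useful.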
Another open problem to explore is the following: do cache occupancy attacks present a threat in the real world, and if so, do they leak as much as, less than, or more than conflict-based (prime+probe) attacks? It would be great if the architecture and security communities could examine this problem more holistically, rather than just focusing on whether MIRAGE or Maya makes cache occupancy attacks worse.
Cache partitioning post-MIRAGE. Chunked-Cache (NDSS 2022) provides a security-performance tradeoff by servicing on-demand security requirements: domains can specify which memory regions need security and the required cache capacity. This configuration dedicates specific cache sets to the domain, ensuring its memory addresses are exclusively mapped, while other domains without security needs can freely access the mainstream cache. SCALE (HOST 2023) provides a secure, dynamically partitioned LLC by enabling bank-level way partitioning; here, the number of partitions is limited by the product of the number of banks and the cache associativity. To dynamically allocate cache partitions based on application usage, it adds non-determinism to the allocation policy by introducing randomness. TEE-SHirT (NDSS 2024) partitions the LLC by combining way and set partitioning, and is motivated by composable cachelets (USENIX SECURITY 2022). Finally, Ceviche (S&P 2025) introduces a hardware virtualization strategy that enables capability-based cache lookups, ensuring fine-grained cache partitioning at the granularity of a single cache line. Each capability encodes access rights and permitted operations on the physical cache line, providing robust guarantees for confidentiality, availability, and fairness. INTERFACE (SEED 2024) combines randomization and partitioning: it is an indirect, partitioned, random, and fully associative cache consisting of a fully associative data store and a skewed set-associative tag store.
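Way partitioning, the building block behind several of these designs, is easy to illustrate: each security domain is confined to its own subset of ways in every set, so domains can never evict each other's lines. The sketch below is a toy model with hypothetical way allocations; replacement policy, banking, and coherence are all omitted.

```python
ASSOC, NUM_SETS = 8, 4
# Hypothetical allocation: domain A gets ways 0-5, domain B gets ways 6-7.
WAY_ALLOC = {"A": range(0, 6), "B": range(6, 8)}

class WayPartitionedCache:
    def __init__(self):
        # lines[set][way] holds (domain, addr) or None
        self.lines = [[None] * ASSOC for _ in range(NUM_SETS)]

    def access(self, domain, addr):
        """Return True on a hit; on a miss, fill only the domain's own ways."""
        ways = WAY_ALLOC[domain]
        cset = self.lines[addr % NUM_SETS]
        if (domain, addr) in (cset[w] for w in ways):
            return True
        cset[min(ways)] = (domain, addr)  # simplistic replacement
        return False

cache = WayPartitionedCache()
cache.access("A", 0x100)
cache.access("B", 0x100)                 # same set index, different ways
hit = cache.access("A", 0x100)           # A's line survived B's access
print(hit)                               # True
```

The cost is equally visible in the sketch: domain B is stuck with two ways per set regardless of its working-set size, which is exactly the capacity pressure that dynamic schemes like SCALE try to relieve.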
Security evaluation of randomized caches. Song et al. (S&P 2021) proposed two metrics, the eviction rate and the probability of creating an eviction set, to evaluate randomized caches against conflict-based attacks. CacheFX (ASIACCS 2023) proposed a flexible framework for evaluating and comparing secure cache designs using three different metrics: (i) the entropy induced by memory accesses, (ii) the complexity of building an eviction set, and (iii) protection against cryptographic attacks. The authors argued that occupancy attacks are more practical for highly random cache designs and should be considered when evaluating their security. Metior (ISCA 2023) proposed a model to evaluate the effectiveness of side-channel mitigations quantitatively. Using the metric of information flow from the victim to the attacker, Metior revealed new insights into recent randomized cache designs and their defense against various classes of attacks. One of the critical conclusions from Metior is that for certain real-world victim applications, occupancy attacks can sometimes achieve a similar level of leakage as conflict-based attacks.
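As a back-of-envelope illustration of the first metric, consider an idealized cache with global random eviction, where each attacker fill evicts a uniformly random line. Under that assumption (a sketch, not any paper's actual model), the eviction rate has a closed form:

```python
def eviction_rate(k: int, num_lines: int) -> float:
    """P(victim line evicted after k attacker insertions), assuming each
    insertion independently evicts the victim with probability
    1/num_lines (idealized global random eviction)."""
    return 1.0 - (1.0 - 1.0 / num_lines) ** k

# With a 16K-line LLC, even 16K attacker insertions evict the victim line
# only with probability ~1 - 1/e.
print(round(eviction_rate(16384, 16384), 3))   # 0.632
```

This is why such metrics are reported as attack *rates* over many trials: a single probe round against a randomized design gives the attacker only a noisy, probabilistic signal.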
Performance evaluation. Many randomized cache and secure cache partitioning proposals report a reduction in the LLC miss rate as their performance metric, which can be misleading, as more pertinent metrics like MPKI cannot be inferred from the miss rate alone. Also, most partitioning techniques are not evaluated on many-core systems (16 cores and higher), which makes it hard to comment on their performance trends. Additionally, the workloads used for evaluation must include homogeneous mixes (e.g., SPEC RATE mode) and heterogeneous mixes (both multi-programmed and multi-threaded); otherwise, it is difficult to get the overall performance trend across all techniques.
A fixed and comprehensive set of parameters is essential for evaluating the performance and security of new secure LLC designs, ensuring fair comparison and analysis.
What is next? Both randomized and partitioned caches have their pros and cons. Instead of searching for a single solution in this space, it may be a good idea to start with a randomized cache that provides the best tradeoff in terms of storage, power, and performance overhead. If the user needs the system to be isolated from occupancy attacks, partitioning can be applied through a BIOS knob. In this way, the cloud provider can offer security as a service. We already have different mitigations for different attacks; for example, STT (MICRO 2019) and GhostMinion (MICRO 2021) mitigate transient execution attacks through caches but do not mitigate conflict-based and occupancy-based attacks. Another possibility is an agile and accurate attack detector like CYCLONE (MICRO 2019) that can detect different kinds of attacks; only when an attack is detected should the cache reconfigure itself into a randomized or partitioned cache. One such approach is Avenger (SEED 2022).
In summary, the cloud provider can provide different knobs that can be used to reconfigure the cache into a randomized or partitioned cache.
What about a secure multi-level hierarchy? The primary focus so far has been on securing the last-level cache, as multiple cores share it. However, whether these ideas should be directly extended to private caches and coherence directories remains an open question. There are two ways to think about it: a homogeneous approach for the entire cache hierarchy, or a heterogeneous approach, such as randomization at the large LLC and partitioning at the private caches and coherence directories. TEE-SHirT (NDSS 2024) partitions the LLC and private L2 caches and flushes private L1 caches. However, it introduces cache coherence issues, which require additional mechanisms to address. Most mitigations ignore cache coherence issues.
What if coherence implications are one of the driving forces for what defense to apply?
Finally, there is a new attack in town that has received little attention in the community: the confused deputy through the last-level cache. van Schaik et al. (USENIX SECURITY 2018) showed that a trusted component such as the MMU can act as a confused deputy and proposed MMU-based indirect cache attacks. It is unclear whether recent randomized and partitioned caches are affected by this indirect attack; indeed, the authors of Ceviche (S&P 2025) showed that many recent proposals are vulnerable to it.
Acknowledgements: Many thanks to Gururaj, Dmitry, and my mentees Anubhav and Prerna for useful feedback on early drafts of this article.
About the Author: Biswabandan Panda is an assistant professor at the Computer Science and Engineering Department, Indian Institute of Technology Bombay. His research interests span microarchitecture for performance and security.
Disclaimer: These posts are written by individual contributors to share their thoughts on the Computer Architecture Today blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGARCH or its parent organization, ACM.