Computer Architecture Today

Informing the broad computing community about current activities, advances and future directions in computer architecture.

Serverless 101

Serverless computing has emerged as a pivotal development and deployment paradigm for cloud computing. According to a recent analyst report, over 50% of companies that use cloud services have adopted serverless computing. Usage of serverless technologies is rapidly expanding, with a compound annual growth rate (CAGR) of over 20%. This substantial and growing uptake underscores the transformative potential and increasing acceptance of serverless computing.

In serverless, developers program their applications as workflows of stateless functions, which are invoked on-demand in response to user actions (e.g., clicking a link) or by another serverless function. Cloud providers are responsible for scheduling and placing function instances on physical nodes based on real-time demand, thus freeing developers from infrastructure management concerns. Moreover, the stateless nature of serverless functions facilitates extreme scalability, from zero – when no function instances are running yet the service remains available – to thousands of concurrent instances.
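For concreteness, here is a minimal sketch of what such a function looks like, written as an AWS Lambda-style Python handler (the event fields and the greeting logic are illustrative, not from any real deployment):

```python
import json

# A stateless serverless function: the platform invokes the handler
# on demand with an event payload (e.g., an HTTP request triggered by
# a click), and no state survives between invocations.
def handler(event, context):
    # All inputs arrive in the event; nothing is read from local state.
    name = event.get("name", "world")
    # The platform scales instances of this handler from zero to thousands.
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```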

Developers find serverless appealing because it vastly simplifies infrastructure management and offers potential cost savings over traditional cloud deployment practices. Indeed, serverless functions are billed only for the resources (CPU time, memory, storage I/O) they consume while executing, which contrasts sharply with traditional virtual machines, where billing is based on provisioned capacity regardless of actual utilization. The financial and operational efficiency of serverless allows developers to focus their energy on core application logic instead of infrastructure management.
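A back-of-the-envelope calculation illustrates the billing difference; the prices and workload parameters below are hypothetical placeholders, not actual quotes:

```python
# Illustrative cost comparison: serverless bills per GB-second actually
# consumed, while a VM bills per provisioned hour, busy or idle.
PRICE_PER_GB_SECOND = 0.0000167   # hypothetical serverless price
VM_PRICE_PER_HOUR = 0.10          # hypothetical VM price

invocations_per_day = 100_000
seconds_per_invocation = 0.2
memory_gb = 0.128

serverless_daily = (invocations_per_day * seconds_per_invocation
                    * memory_gb * PRICE_PER_GB_SECOND)
vm_daily = 24 * VM_PRICE_PER_HOUR  # charged even when the VM sits idle

print(f"serverless: ${serverless_daily:.2f}/day, VM: ${vm_daily:.2f}/day")
```

For a bursty, low-duty-cycle workload like this one, the pay-per-use model comes out far cheaper than keeping a VM provisioned around the clock.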

From the cloud provider’s perspective, serverless computing is advantageous because it leaves them in control of their IT infrastructure, which facilitates efficient usage of the deployed hardware. For instance, instead of running a small number of traditional virtual machines, some of which may consume precious hardware resources while staying idle for extended periods of time, cloud providers can pack hundreds or even thousands of serverless function instances on a server, each instance using fractional CPU and memory resources. Thus, serverless allows cloud providers to maximize the efficiency and profitability of their infrastructure, positioning serverless computing as a mutually beneficial solution for both developers and providers.

Trends in Serverless Research

With its stateless functions and on-demand instantiations, the serverless model is rife with inefficiencies. This fact has not gone unnoticed by researchers, who have sought to address various bottlenecks using both software and hardware approaches. Here, we discuss three of these bottlenecks and some of the proposed mitigations. 

One pervasive concern for serverless developers is the cold-start latency, i.e., the time needed to launch a new serverless function instance in response to an incoming invocation. While cold-starting an instance involves many sequential steps, all of which contribute to cold-start latency, the dominant components are booting the virtual machine on the target node and launching the language runtime (e.g., Java or Python).
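One simple way to observe this effect is to time two back-to-back invocations of an idle function: the first pays the cold-start cost, while the second typically lands on a warm instance. A minimal sketch, assuming a hypothetical deployed endpoint:

```python
import time
import urllib.request

# Hypothetical endpoint of a deployed serverless function.
FUNCTION_URL = "https://example.com/my-function"

def timed_invocation(url: str) -> float:
    """Invoke the function once and return the end-to-end latency in seconds."""
    start = time.perf_counter()
    urllib.request.urlopen(url).read()
    return time.perf_counter() - start

cold = timed_invocation(FUNCTION_URL)  # likely pays VM boot + runtime launch
warm = timed_invocation(FUNCTION_URL)  # likely reuses the now-warm instance
print(f"cold: {cold * 1000:.0f} ms, warm: {warm * 1000:.0f} ms")
```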

A number of works have sought to ameliorate the performance impact of cold starts. One common approach is to capture, or snapshot, the allocated memory of a ready-to-run function instance in local or network-attached storage. When a new instance of the function needs to be created, the snapshot is used to restore the state of the entire VM directly, thus avoiding the lengthy boot process. Several works have sought to optimize the performance of snapshots (e.g., Catalyzer, REAP, FaaSnap) and, recently, Amazon announced support for snapshots for serverless Java deployments. Another approach to the cold-start bottleneck tries to predict upcoming invocations of a given serverless function and proactively launch new instances, thus moving the cold-start latency off the critical path of the actual execution (e.g., Shahrad et al., SpecFaaS).
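To make the prediction idea concrete, below is a toy sketch loosely inspired by the histogram-based keep-alive policy of Shahrad et al.; for simplicity it predicts the next invocation from the mean inter-arrival time rather than a full histogram, and the class and method names are our own:

```python
import time
from collections import defaultdict, deque

class Prewarmer:
    """Toy predictor of the next invocation time of each function."""

    def __init__(self, window: int = 32):
        # Recent invocation timestamps per function, bounded by `window`.
        self.arrivals = defaultdict(lambda: deque(maxlen=window))

    def record_invocation(self, function_id: str) -> None:
        self.arrivals[function_id].append(time.time())

    def predicted_next_arrival(self, function_id: str):
        ts = list(self.arrivals[function_id])
        if len(ts) < 2:
            return None  # not enough history to predict
        # Mean gap between recent invocations (a real system would use
        # a histogram of inter-arrival times instead).
        gaps = [b - a for a, b in zip(ts, ts[1:])]
        return ts[-1] + sum(gaps) / len(gaps)
```

A scheduler could poll `predicted_next_arrival` and launch an instance slightly before the predicted time, so the instance is already warm when the invocation actually arrives.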

Another significant challenge in today’s serverless model is the stateless nature of serverless functions, which introduces both performance and cost overheads since all state read or written by an instance must be externalized. To make matters worse, functions cannot communicate with each other directly: any data above a few megabytes transferred between a pair of functions must be moved via a storage service, which adds latency and monetary cost.
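The resulting pattern typically looks like the following sketch, which relays intermediate data between two Python functions through the boto3 S3 API; the bucket and key names are hypothetical:

```python
import boto3

s3 = boto3.client("s3")

def producer(event, context):
    result = b"...intermediate data..."
    # Externalize the output so a downstream function can read it:
    # the data takes a round trip through the storage service.
    s3.put_object(Bucket="workflow-scratch", Key="stage1/output", Body=result)
    return {"key": "stage1/output"}

def consumer(event, context):
    # Fetch the upstream output: an extra network hop that adds latency
    # and storage cost to every cross-function transfer.
    obj = s3.get_object(Bucket="workflow-scratch", Key=event["key"])
    data = obj["Body"].read()
    return {"size": len(data)}
```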

In response, researchers have proposed several enhancements to the serverless communication and data access mechanisms. Works such as SONIC, Faa$T, and CloudBurst explore cross-function communication through dedicated storage services, often referred to as ephemeral storage, for caching intermediate results or even providing shared-memory semantics. Another line of work enables direct instance-to-instance transfers to avoid passing data through intermediate storage (e.g., Boxer, XDT). A key consideration for direct transfers is maintaining compatibility with existing auto-scaling infrastructure, which is a hallmark of serverless computing. Finally, to reduce or eliminate data movement, works such as Palette and Pheromone exploit data affinity across function invocations through developer-specified locality hints.
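As an illustration of the locality-hint idea, here is a hypothetical sketch in the spirit of Palette; `invoke_with_hint` is not a real API but stands in for the scheduler interface such systems propose:

```python
def invoke_with_hint(function_name: str, payload: dict, locality_hint: str):
    """Hypothetical scheduler call: route the invocation to an instance
    associated with locality_hint, creating one if none exists
    (scheduler logic elided)."""
    ...

# All invocations tagged with the same hint land on the same instance,
# letting it keep the relevant data (e.g., one shard) warm in memory
# across invocations instead of re-fetching it from storage each time.
for record in [{"id": 1}, {"id": 2}]:
    invoke_with_hint("process-record", record, locality_hint="shard-7")
```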

Opportunities for innovation in the serverless space are not limited to just software and storage. Recent research has demonstrated gross inefficiencies at the microarchitectural level stemming from thousands of serverless functions interleaving their executions on a single server, thereby inducing extremely high context switch rates (e.g., Shahrad et al., Jukebox). Given the relatively infrequent invocation rates and the high degree of interleaving, a warm serverless function will typically find all of its on-chip microarchitectural state (including caches and in-core structures) completely cold when it starts running – a phenomenon termed lukewarm execution. Indeed, recent works have demonstrated that interleaving can degrade the execution time of serverless functions by a factor of two or more compared to back-to-back invocations with no interleaving, indicating a need for specialized microarchitectural support (e.g., Jukebox, Ignite).

Getting in on the Serverless Action

The abundance of inefficiencies in today’s serverless stacks is a gold mine for computer architects and systems researchers. So, how does one dive in? 

Over the last several years, we, along with a number of collaborators and contributors, have developed infrastructure and tools that enable full-stack serverless research at any scale. These include vHive (a complete serverless system representative of leading commercial platforms such as AWS Lambda and Google Cloud Run), vSwarm (a suite of meaningful applications turned into containerized serverless workflows), vSwarm-u (a set of containerized serverless functions directly runnable in the gem5 simulator), In-Vitro (a framework for cluster-scale studies with production traces), and several others. All of these projects are open-source, well-documented, and have been adopted for research and teaching by a number of institutions in academia and industry.

The serverless world is your oyster. Dig in!

About the authors:

Boris Grot is a Professor in the School of Informatics at the University of Edinburgh. His research interests include server hardware and software stacks, networking and datacenter-scale computing. Boris was the Program Chair for MICRO 2022 and is the General Chair for HPCA 2024.

Dmitrii Ustiugov is an Assistant Professor at NTU Singapore. Dmitrii’s research interests lie at the intersection of Computer Systems and Architecture with a current focus on serverless computing systems and systems for AI.

Disclaimer: These posts are written by individual contributors to share their thoughts on the Computer Architecture Today blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGARCH or its parent organization, ACM.