Computer Architecture Today

Informing the broad computing community about current activities, advances and future directions in computer architecture.

Digital technologies have enabled a plethora of new applications that have unlocked significant economic growth and improved many aspects of our lives. Despite these societal benefits, as computing becomes increasingly ubiquitous, its environmental footprint grows with it. At this year’s ISCA, Carole-Jean Wu organized and moderated a panel on Designing Computer Systems for Sustainability, joined by Tamar Eilam, Babak Falsafi, Gage Hills, and Srilatha Manne. The panel provided an opportunity to examine the environmental implications of computing, from ever-increasing energy use to the greenhouse gas (GHG) emissions associated with manufacturing devices and e-waste.

Thanks to decades of laser focus on efficiency optimization, the global electricity demand of datacenter computing increased by merely 6% between 2010 and 2018, despite a greater than 5X increase in the overall compute instances deployed in datacenters worldwide. With new application drivers, however, global datacenter electricity demand could double from its 2019 level by 2026 in a business-as-usual scenario.

We must look beyond designing computer systems for the use phase alone. Taking the Apple iPhone 15 Pro as an example, more than 80% of the iPhone’s lifecycle emissions come from production (including semiconductor integrated circuit manufacturing), whereas operational carbon emissions account for only 15%, with transportation and end-of-life processing accounting for the rest. As more intelligence is brought to the edge for richer contextual signals, interactive experiences, and improved privacy, the next wave of computing is expected to come with significantly higher embodied carbon: wearables deployed at the scale of billions, plus new datacenters to be built for on-device intelligence!

New Challenges and Opportunities

Addressing the sustainability challenges of computing requires a holistic approach that considers the entire lifecycle of computing systems: design, manufacturing, and use. To do so, we need a standardized methodology to quantify the environmental footprint of computing across that lifecycle, which requires tools beyond what we have today. We need new metrics to tackle ever-increasing operational energy, as well as ways to balance operational and embodied carbon footprints. Being able to measure will open new directions and development opportunities for a sustainable hardware-software ecosystem. Opportunities for optimization exist in every phase of the lifecycle; the challenge is to develop tools that help us pursue them across the use, design, and manufacturing phases of computing systems.

Use. In the operational use phase, conventional efficiency metrics, such as Power Usage Effectiveness (PUE), are becoming insufficient to address the ever-increasing energy use of datacenter computing. With PUEs approaching 1.0 for hyperscaler datacenters and below 1.2 for the co-location datacenter market, the majority of electricity is delivered to and used by the IT equipment. However, today’s computer systems are not equipped with meters or methodologies to attribute and report where this electricity goes, or how efficiently it is used to generate computation outputs in the datacenter. Effective optimization of datacenter operations relies on quantifying the energy consumed by different applications. This is, however, a complex challenge, as it requires consideration of resource sharing and the provenance chain across the distributed set of resources used by an application. Developing robust methodologies and techniques for energy quantification, such as Kepler, is essential for identifying opportunities for meaningful optimization and measuring its impact.
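The PUE metric itself is simple enough to sketch in a few lines. The numbers below are purely illustrative, not measurements from any real facility:

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy divided by the
    energy delivered to IT equipment. 1.0 would mean zero overhead for
    cooling, power delivery, and everything else."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh

# A hypothetical co-location facility: 12 GWh/year drawn from the grid,
# of which 10 GWh/year reaches the IT gear.
print(round(pue(12_000_000, 10_000_000), 2))  # 1.2
```

Note that even a perfect PUE says nothing about how efficiently the IT equipment itself turns energy into useful computation, which is exactly the attribution gap described above.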

Furthermore, computing systems in the datacenter are not energy proportional, so maximizing utilization helps achieve higher efficiency. Conventional technologies that improve IT equipment utilization while maintaining application quality-of-service (QoS), such as virtualization, are effective at improving operational efficiency. Tools are also emerging for datacenters to quantify efficiency based on utilization, such as the SDEA Navigator. Utilization is only a first step, however, because it does not gauge what fraction of the energy is consumed performing computation and what fraction is spent in the system software.

Given the scale of computing infrastructures, with power capacities on the order of tens of megawatts, infrastructure support for power load flexibility while meeting service-level objectives (SLOs) provides new opportunities for demand-response management. When carried out in coordination, datacenter load flexibility gives power grid operators freedom to optimize energy and carbon emissions in a cost-effective, reliable manner.

Design. In the design phase, operational energy efficiency has always been a focus of chip design, and it is ever more so now that we have reached the limits of Dennard scaling. However, the set of tools for assessing operational efficiency now needs to expand to also capture the semiconductor manufacturing cost corresponding to a chip design. Design tools that go beyond operational efficiency are emerging, including ACT, GreenChip, FOCAL, and Carbon Explorer, enabling designers to quantify the embodied carbon footprint at early design stages. Hardware designers can now balance embodied and operational carbon together with conventional metrics such as power, performance, and area. While these models for existing and next-to-market technologies are becoming available, it is also important to be predictive about the impact of emerging technologies that are actively being developed for future generations of computing systems.
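As a first-order illustration of the embodied/operational trade-off these tools expose, a back-of-the-envelope lifecycle model might look like the sketch below. All numbers are made-up assumptions for a hypothetical mobile SoC; real tools such as ACT model far more detail (process node, wafer carbon intensity, fab energy mix, and so on):

```python
def total_carbon_kg(embodied_kg: float,
                    avg_power_w: float,
                    lifetime_hours: float,
                    grid_kg_co2_per_kwh: float) -> float:
    """First-order lifecycle carbon: embodied (manufacturing) plus
    operational (energy used over the lifetime times the carbon
    intensity of the electricity that powers the device)."""
    operational_kg = (avg_power_w / 1000.0) * lifetime_hours * grid_kg_co2_per_kwh
    return embodied_kg + operational_kg

# Hypothetical device: 80 kg CO2e embodied, 5 W average draw,
# 4-year lifetime (~35,040 h), 0.4 kg CO2e per kWh of grid electricity.
print(round(total_carbon_kg(80, 5, 4 * 365 * 24, 0.4), 2))  # 150.08
```

In this (illustrative) regime, embodied carbon dominates, echoing the iPhone breakdown above; for an always-busy server the operational term can dominate instead, which is why the balance must be struck per design rather than by a single rule of thumb.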

Taking chiplet-based computing systems as an example, various forms of interposers and 3D packaging techniques have been proposed to enable dense integration of heterogeneous chiplets, leading to higher operational energy efficiency. However, manufacturing chiplet-based systems requires fabrication steps beyond those of monolithic integrated circuits, including die stacking, creation of through-silicon vias (TSVs), and interposer/system-in-package integration. These increasingly complex manufacturing processes have a direct impact on embodied carbon. To accurately quantify computing’s total carbon footprint, we need models that project embodied carbon not only for today’s systems but also for future computing technologies under active development. Many of these technologies go beyond conventional electrical wires and silicon devices, such as optical interconnects and next-generation memories (resistive, phase-change, or magnetic RAMs).

As electronics manufacturing growth accelerates, we must design computer systems for recycling or upcycling. Designing computers with modularity in mind goes a long way towards reuse, but more innovation is needed to reduce the amount of electronic waste. Enabling organic, biodegradable semiconductors and designing sustainable electronics are just two examples of innovative ideas that may address sustainability concerns across the lifetime of the parts.

Manufacturing. In manufacturing, we must focus on die size, process technology, yield, and lifetime. Each wafer has associated carbon emissions that depend on the process technology node; wafers manufactured at smaller nodes come with significantly higher carbon cost. The larger the die and/or the lower the yield, the greater the carbon cost per yielded die.

At the same time, to improve performance and efficiency, more components are being integrated into a single SoC. We are now seeing CPU SoCs with 128 or more cores and 12 or more memory controllers, and monolithic GPUs are reaching the reticle limit of semiconductor manufacturing. Chiplets may address the sustainability implications of manufacturing large SoCs by improving yield, enabling reuse across multiple products, and limiting the use of newer process nodes to the components that benefit from the latest technology.

As die size increases, yield decreases: fewer gross die fit on a wafer, and each defect kills a larger fraction of them. Chiplets may help by enabling the use of smaller die, which both yield more gross die per wafer and improve net die yield, since each defect impacts a smaller percentage of the gross die. Chiplets also let silicon designers use the latest process technology only where it provides the most benefit. For example, the third-generation AMD EPYC™ server processor uses 7nm technology for the core chiplets and a more mature, less carbon-intensive 12nm technology for the IO die.
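A classic Poisson defect-yield model makes the die-size argument concrete. The defect density, die areas, and per-wafer carbon figure below are illustrative assumptions, not data for any real process, and the sketch deliberately ignores the extra packaging and interposer carbon that chiplet integration adds:

```python
import math

WAFER_AREA_CM2 = math.pi * 15 ** 2  # 300 mm wafer, ignoring edge loss

def poisson_yield(die_area_cm2: float, defect_density_per_cm2: float) -> float:
    """Fraction of die expected to contain zero killer defects."""
    return math.exp(-die_area_cm2 * defect_density_per_cm2)

def carbon_per_good_die(die_area_cm2: float, defect_density: float,
                        wafer_carbon_kg: float) -> float:
    """Amortize the wafer's embodied carbon over the yielded die only."""
    gross_die = WAFER_AREA_CM2 / die_area_cm2
    good_die = gross_die * poisson_yield(die_area_cm2, defect_density)
    return wafer_carbon_kg / good_die

# One hypothetical 6 cm^2 monolithic die vs. four 1.5 cm^2 chiplets,
# assuming 0.1 defects/cm^2 and 150 kg CO2e per processed wafer.
mono = carbon_per_good_die(6.0, 0.1, 150.0)
chiplets = 4 * carbon_per_good_die(1.5, 0.1, 150.0)
print(mono > chiplets)  # True: smaller die waste less silicon to defects
```

Under these assumptions the monolithic die yields exp(-0.6) ≈ 55% while each chiplet yields exp(-0.15) ≈ 86%, so the same logic area costs noticeably less embodied carbon when split into chiplets; whether that advantage survives the added packaging steps is precisely the kind of question the embodied-carbon models above need to answer.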

Semiconductor manufacturing improvements go beyond yield. We must optimize the manufacturing process itself to reduce the use of energy, water, and other resources such as rare-earth materials. Another challenge is to invest in research and development to discover substitutes for harmful materials used in the semiconductor sector, such as PFAS. Tools are emerging to help accelerate the scientific discovery of new materials.

Concluding Remarks

The panel focused on carbon emissions, but there are many facets to sustainability from water, to ecosystems, to environmental justice. The scope is broad, even within the realm of carbon emissions. We focused on the operational and embodied carbon implications of computing across its life cycle, but there are other aspects to computing’s environmental implications such as economic and policy directives that dictate what, how, when, and for how long we use compute resources. We are just scratching the surface on the rich and complex landscape of sustainability and computing, and we hope to see significant advances on this topic in the coming years. 

We would like to thank the General Chairs: Augusto Vega and Esteban Mocskos, and the Program Chairs: Sandhya Dwarkadas and Rajeev Balasubramonian for the opportunity to put together the panel for ISCA 2024! 

Links to other websites, software projects, specific products, etc., are only for reference and mentioning them in the article does not constitute any form of endorsement or support. 

About the Authors:

  • Carole-Jean Wu is a Director at Meta’s Fundamental AI Research Lab, where she leads the Systems and Machine Learning team. Her work spans across datacenter infrastructures and edge systems with a focus on performance, energy efficiency, and sustainability.
  • Tamar Eilam is an IBM Fellow and Chief Scientist for Sustainable Computing in the IBM T. J. Watson Research Center. Tamar is leading research aimed at drastically reducing the carbon footprint associated with computing across infrastructure, systems and software, data, and AI.
  • Babak Falsafi is a Professor at EPFL, the founding president of Swiss Datacenter Efficiency Association (SDEA) — an industrial/academic consortium certifying full-stack efficiency and emissions in datacenter operation, and the founder of EcoCloud — a research center at EPFL investigating sustainable information technology since 2012. 
  • Gage Hills is an Assistant Professor of Electrical Engineering at Harvard. His research focuses on developing energy-efficient and environmentally sustainable computing systems, by combining new technology advances across nanomaterials, devices, sensors, circuits, architectures, and integration techniques.
  • Srilatha (Bobbie) Manne is a Senior Fellow at AMD Research and Advanced Development (RAD), analyzing efficiency and sustainability issues in CPU and GPU designs.

Disclaimer: These posts are written by individual contributors to share their thoughts on the Computer Architecture Today blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGARCH or its parent organization, ACM.