It’s been almost five years since the last post on the SIGARCH blog about gem5. In that post, we talked about improving community outreach, providing a simpler interface to get started with gem5, and adding support for full machine learning stacks. I’m excited to say that we accomplished these goals, and much more, over the past five years! In this blog post, I want to discuss some of the cool new features and developments from the gem5 community over the past five years, how we have continued to build a vibrant and inclusive community, and where we’re going from here (hopefully with your involvement!). To learn more about gem5, I encourage you to read the gem5 paper by Binkert et al. from 2011 and the gem5 v20.0+ paper by Lowe-Power et al., which covers the improvements to gem5 from 2011 to 2020. To learn how to use gem5, we have a YouTube channel, a myriad of documentation, and we frequently host tutorials at major conferences and bootcamps.
Five years of gem5 development
The gem5 simulator infrastructure is an incredibly vibrant development community. In the past five years, we have made 11 major releases. There have been over 8000 commits (that’s about four commits per day!) from approximately 250 unique authors from at least 70 different institutions. Below, I highlight a few big-ticket items, but this is only a tiny subset of the thousands of contributions. You can see the RELEASE-NOTES.md for more details.
Standard library and gem5 resources
How many lines of code do you think it takes to create a model of a multi-core x86 system with detailed cache coherence, out-of-order cores, and DDR4 DRAM that can boot Linux and execute a benchmark application? If you said “four” you would be right!
```python
board = X86DemoBoard()
board.set_workload(obtain_resource("x86-ubuntu-24.04-npb-cg-b"))
simulator = Simulator(board=board)
simulator.run()
```
Using gem5’s new standard library, there are a variety of pre-configured or pre-built components which you can select from and plug into your board. The standard library also includes a simulator module which simplifies controlling the simulation and supports checkpointing, region-of-interest markers, and more. Extending the standard library with your own components and customizing the simulation for your experiments is easy with its modular and well-documented architecture. Simulations don’t just require a system; they also require workloads, which, for full-system simulation, are made up of disk images, kernels, benchmarks, etc. This is where gem5-resources comes in. The gem5-resources website contains a plethora of ready-to-use benchmarks and is being expanded every day.
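To give a flavor of the simulator module, here is a hedged sketch of how checkpointing can be hooked to a workload’s region-of-interest marker. The module paths and exit-event names follow the standard library’s documented patterns, but verify them against your gem5 version (this script must be run with the gem5 binary, not plain Python):

```python
# Sketch: customizing a standard-library simulation (run with the gem5 binary).
# Module paths follow the stdlib layout; check them against your gem5 release.
from gem5.prebuilt.demo.x86_demo_board import X86DemoBoard
from gem5.resources.resource import obtain_resource
from gem5.simulate.simulator import Simulator
from gem5.simulate.exit_event import ExitEvent

board = X86DemoBoard()
board.set_workload(obtain_resource("x86-ubuntu-24.04-npb-cg-b"))

def on_work_begin():
    # Fires when the workload hits its region-of-interest "work begin" marker;
    # save a checkpoint so later runs can skip Linux boot entirely.
    simulator.save_checkpoint("cpt-roi-begin")
    yield False  # False means: keep simulating

simulator = Simulator(
    board=board,
    on_exit_event={ExitEvent.WORKBEGIN: on_work_begin()},
)
simulator.run()
```

A later experiment can then restore from `cpt-roi-begin` and simulate only the region of interest with a detailed core model.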
CHI
When gem5 was created by combining M5 with GEMS, there were two different models for cache systems, the “classic caches” from M5 and Ruby from GEMS. Ruby was preferred as it enables high fidelity modeling of cache coherence and interconnects. Unfortunately, Ruby did not offer flexible configuration (the cache topology was encoded in the coherence protocol). So, the more configurable classic caches were kept for simpler use cases. In other words, if you wanted to quickly create a two-level or three-level cache hierarchy, the classic caches were easy, but you would need to write a whole new coherence protocol from scratch if using Ruby.
The contribution of the CHI coherence protocol (an implementation of Arm’s AMBA Coherent Hub Interface) changed the tradeoff between classic and Ruby caches. CHI is implemented in Ruby, but it exposes a configurable cache hierarchy. Now, you don’t have to choose between configurability and fidelity: CHI gives you both! Researchers at UC Davis have used CHI to create a variety of cache hierarchy models, including a coherent mesh network (CMN) implementation similar to the one in Arm’s N1 architecture and a hierarchy like AMD’s chiplet-based Epyc systems. These are implemented using gem5’s standard library, and we are currently working to improve the accuracy of these systems before contributing them upstream.
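Because CHI plugs into the standard library like any other hierarchy, swapping it into a board is a one-line change. The sketch below is illustrative only: the exact class and module names (e.g., `PrivateL1CacheHierarchy` under `cachehierarchies/chi`) are assumptions based on the stdlib’s layout, and gem5 must be built with the CHI protocol for this to work.

```python
# Sketch: using a CHI-based Ruby hierarchy through the standard library.
# Class/module names are assumptions; check gem5/components/cachehierarchies/chi
# in your checkout, and build gem5 with PROTOCOL=CHI.
from gem5.components.boards.simple_board import SimpleBoard
from gem5.components.cachehierarchies.chi.private_l1_cache_hierarchy import (
    PrivateL1CacheHierarchy,
)
from gem5.components.memory import SingleChannelDDR4_2400
from gem5.components.processors.simple_processor import SimpleProcessor
from gem5.components.processors.cpu_types import CPUTypes
from gem5.isas import ISA

board = SimpleBoard(
    clk_freq="3GHz",
    processor=SimpleProcessor(cpu_type=CPUTypes.TIMING, isa=ISA.X86, num_cores=2),
    # A CHI protocol underneath, configured like any other stdlib hierarchy:
    cache_hierarchy=PrivateL1CacheHierarchy(size="32KiB", assoc=8),
    memory=SingleChannelDDR4_2400(size="2GiB"),
)
```

The point is the last two constructor arguments: the detailed Ruby/CHI model is now just another interchangeable component, rather than a hierarchy hard-coded into the protocol.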
Support for full system machine learning stacks
In addition to detailed CPU, cache coherence, and memory models, gem5 also has a detailed compute-GPU model. We have recently extended this model to support full system execution. Now, you can use the unmodified open-source AMD drivers, a modern ROCm runtime (e.g., ROCm 6), and unmodified ML frameworks like PyTorch and TensorFlow with gem5’s GPU model.
As demonstrated in our most recent tutorial at ISCA 2024, you can write a few lines of PyTorch code, pass that code to gem5 and actually execute the code on the simulated GPU.
```sh
gem5 gem5/configs/example/gpufs/mi300.py --kernel ./vmlinux-gpu-ml-isca --disk-image ./x86-ubuntu-gpu-ml-isca --app pytorch_test.py
```
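The application passed via `--app` is just an ordinary script run inside the simulated system. A minimal, hypothetical `pytorch_test.py` might look like the following (under ROCm, PyTorch exposes AMD GPUs through the `torch.cuda` namespace):

```python
# pytorch_test.py -- a minimal, hypothetical workload for the simulated GPU.
# Under ROCm, PyTorch's torch.cuda namespace maps onto the AMD GPU.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(256, 256, device=device)
b = torch.randn(256, 256, device=device)
c = torch.relu(a @ b)  # one GEMM plus an activation, enough to exercise the GPU
print(f"ran on {device}, result shape {tuple(c.shape)}")
```

Inside the simulated disk image, `torch.cuda.is_available()` reflects the simulated MI300 GPU, so the same script runs unchanged on real hardware and under gem5.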
Other improvements
There have been so many improvements to gem5 that it is impossible to list them all here, but I do want to mention a few other exciting features:
- Full system and vector support for the RISC-V ISA
- Vector and matrix extensions for Arm (FEAT_SVE, FEAT_SME), and many other Arm features
- Improvements to inter-simulator APIs (e.g., for SystemC, SST, DRAMSim, DRAMSys, and others)
- Support for devices like HBM2, Arm GICv3 and SMMUv3, DDR5, and many more
- MultiSim parallel-simulation support, access to statistics from Python, SimPoints/LoopPoints, ELFies, and many other usability improvements
- Support for multiple ISAs in a single simulation
Outreach
We are also working to make gem5 more accessible and useful for a wide range of users, from students to industry professionals. We believe that outreach and education are essential for achieving this goal and fostering a vibrant and diverse gem5 community. That’s why we organized the first-ever gem5 bootcamp in 2022, a five-day in-person event attended by over 50 junior computer architecture researchers. The bootcamp materials, including slides, videos, and code, are available for free. We have also run many tutorials and workshops at different computer architecture conferences. We are organizing a second gem5 bootcamp this summer and plan to continue presenting at conferences and bootcamps in the future. If you have been unable to attend a bootcamp or tutorial, all of our material is available online in various git repositories, in the online documentation, and on YouTube.
Future community
To make it easier for people to contribute to the development of gem5, last year we migrated from Gerrit to GitHub for our development and code review. This migration has further accelerated contributions to gem5, with 30% more contributions in the past year compared to the prior year.
If you would like to join the gem5 community, we encourage you to check out our CONTRIBUTING.md documentation, our CODE-OF-CONDUCT.md, and our monthly developer meetings on Zoom. Developer meetings occur every second Thursday at 9am PT. Zoom information is available on gem5’s discussion pages before the meeting.
Thanks
The gem5 project is a community effort and cannot thrive without the community contributors. I would like to specifically thank Bobby Bruce, Melissa Jost, Ivana Mitrovic, Hoa Nguyen, and Harshil Patel, who have been full-time contributors to gem5. I would also like to thank everyone in the DArchR group for their contributions. Other core contributors/developers over this period include Matt Sinclair (UW-Madison), Matt Poremba (AMD), and Giacomo Travaglini (Arm). This work has been supported in part by NSF OAC-2311888 and CNS-1925485, DOE DE-SC0024502 and DE-SC0024206, the Semiconductor Research Corporation, the Laboratory for Physical Sciences (LPS), and Google.
About the Author
Jason Lowe-Power is an Associate Professor in the Computer Science Department at University of California, Davis. He leads the Davis Computer Architecture Research Group (DArchR) and is the Project Management Committee Chair of the gem5 project.
Disclaimer: These posts are written by individual contributors to share their thoughts on the Computer Architecture Today blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGARCH or its parent organization, ACM.