Computer Architecture Today

Informing the broad computing community about current activities, advances and future directions in computer architecture.

Twenty years ago, research into networks-on-chip (NoCs) was launched with the publication of the foundational Dally and Towles paper, “Route Packets, Not Wires,” at the Design Automation Conference (DAC) in June 2001. In this blog post, I explore the evolution of research in this area, highlighting key innovations and impact, and then present open challenges that remain. NoCs emerged at a time when researchers also began to seriously contemplate life after Moore’s Law.

In those early days, much could be learned and borrowed from existing work on interconnection networks. This included finding promising topologies and routing algorithms as baseline designs. However, due to a different set of trade-offs such as pin limitations versus abundant on-chip wires, many novel solutions emerged. 

Early work in this domain focused on reducing low-load latency through router design and novel topologies, on power consumption, on emerging technologies, and on NoC co-design with the coherence protocol and memory hierarchy. Mesh topologies and variants such as the concentrated mesh quickly found favor among NoC researchers for their short links and regular structure that mapped well to 2D planar chips. The router pipeline offered an early opportunity to improve performance: while short mesh links enjoyed single-cycle latencies, conventional router pipelines contributed significantly to per-hop latency, typically requiring 5 stages to traverse. Techniques to shorten the router pipeline, or bypass it entirely, yielded substantial performance benefits and have seen continued improvement and study over many years. High-radix routers, which proved successful in large-scale networks, have also been explored in NoCs through the flattened butterfly design and similar work.
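To make the per-hop cost concrete, here is a back-of-the-envelope sketch of zero-load latency on a 2D mesh under dimension-order (XY) routing. The specific cycle counts (a 5-stage conventional router versus a single-cycle bypassed router, single-cycle links) are illustrative assumptions, not measurements from any particular design.

```python
def xy_hops(src, dst):
    """Hop count between two (x, y) mesh coordinates under
    dimension-order (XY) routing: travel X first, then Y."""
    (sx, sy), (dx, dy) = src, dst
    return abs(dx - sx) + abs(dy - sy)

def zero_load_latency(hops, router_cycles, link_cycles=1):
    """Zero-load latency in cycles: each hop pays one router
    traversal plus one (typically single-cycle) link traversal."""
    return hops * (router_cycles + link_cycles)

hops = xy_hops((0, 0), (3, 2))                        # 5 hops
baseline = zero_load_latency(hops, router_cycles=5)   # 5-stage pipeline
bypassed = zero_load_latency(hops, router_cycles=1)   # aggressive bypass
```

Even this toy model shows why pipeline bypassing mattered: with a 5-stage router, the router dominates per-hop latency; with a single-cycle traversal, router and link contribute equally.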

Significant attention was also paid to the power consumption of the NoC. The move towards multi-core motivated the development of NoCs; because multi-core processors were themselves a response to power-consumption challenges, NoC researchers were cognizant of power from the start. Early power models such as Orion were developed and later replaced by newer models such as DSENT. Such tools saw widespread adoption and allowed researchers to rapidly evaluate power consumption trade-offs. Throughout the last 20 years, power modeling of the interconnect has been aided by various early academic and industry prototypes, including MIT’s RAW, UT’s TRIPS, Intel’s Teraflops and SCC research prototypes, and more recently MIT’s SCORPIO and the Piton project from Princeton, as well as many others.

With the end of Moore’s Law appearing on the horizon, emerging technologies have played a critical role in NoCs research almost since its inception. In particular, much attention has been paid to the use of silicon nanophotonics as a communication medium. Other emerging technologies explored in conjunction with NoCs include wireless networks and die-stacked architectures. 

Co-design with the memory hierarchy and coherence protocol has also been a feature of much of the research in NoCs over the last two decades. This has included NoC support for multicast and broadcast operations, trade-offs between data placement and replication, and more recently, opportunities to reduce the cost of deadlock freedom at both the network and protocol levels. With strong interest in the design of non-uniform cache access (NUCA) cache hierarchies, co-designing data placement with the communication strategy garnered significant attention from researchers. This work often leveraged different technologies such as photonics, RF, and transmission lines. Traditional protocol- and routing-level deadlock freedom approaches, such as turn restrictions and extra virtual channels, either sacrifice performance or require substantially higher hardware cost; recent work looks at opportunities to provide deadlock freedom via coordination for leaner networks while maximizing performance.
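As a small illustration of the turn-restriction idea, the sketch below shows how dimension-order (XY) routing achieves deadlock freedom on a 2D mesh: by never turning from the Y dimension back into the X dimension, every potential cycle of channel dependencies is broken. This is a textbook example, not any specific recent scheme; the direction encoding is an assumption for illustration.

```python
def route_xy(src, dst):
    """Return a list of hop directions ('E','W','N','S') under
    XY routing: resolve the X offset first, then the Y offset."""
    (x, y), (dx, dy) = src, dst
    path = []
    path += ['E' if dx > x else 'W'] * abs(dx - x)
    path += ['N' if dy > y else 'S'] * abs(dy - y)
    return path

def violates_turn_restriction(path):
    """True if the path ever turns from Y (N/S) back into X (E/W),
    the class of turns XY routing forbids to prevent deadlock."""
    return any(prev in ('N', 'S') and cur in ('E', 'W')
               for prev, cur in zip(path, path[1:]))
```

Every path `route_xy` produces satisfies the restriction by construction; the cost, as noted above, is lost path diversity, which is exactly the performance sacrifice that motivates leaner alternatives.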

As the field has matured, some of the initial excitement has perhaps waned, but we’ve seen sustained impact as many industrial products adopt NoCs. More recent research has looked at NoCs for emerging systems, ML-driven design, and renewed interest in novel solutions targeting correctness considerations such as deadlock freedom.

In the early days, NoCs were primarily studied in the context of general-purpose multi-core processors and systems-on-chip (SoCs). As GPUs continued to scale and prioritized throughput over latency, NoCs specific to GPUs were subsequently explored. The recent surge in accelerator architectures has given NoC researchers another set of communication patterns and target architectures to consider. For example, work on integrating broadcast and multicast functionality into the NoC, originally done to support coherence protocols, is seeing new life as multicast represents an important traffic class in machine learning accelerators such as Eyeriss. Interest in machine learning workloads has sparked new opportunities to explore how best to manage reuse and movement of tremendous volumes of input data and neural network weights.

Recently, we’ve seen a blurring of the line between off-chip and on-chip interconnection networks. Off-chip interconnection network research has traditionally focused on supercomputer and datacenter networks. Today, we see work emerging in the space between on-chip and off-chip, including wafer-scale interconnection networks and in-package interconnection networks such as interposer-based networks that enable 2.5D integration. The latest Wafer Scale Engine from Cerebras offers a large and challenging playground for NoC researchers to investigate new solutions. The versatile, mix-and-match chiplet-style architectures enabled by 2.5D integration require novel interfaces and flexible, modular interconnect designs.

As architectural designs become more complex, researchers are turning to machine learning to help uncover the most efficient designs. This trend has permeated NoCs research as well. Machine learning has been used to optimize topologies, routing algorithms, and arbitration policies. In arbitration, for example, different features associated with a packet’s traversal through the network, such as source, destination, and message type, can interact in non-obvious ways to determine which packets to prioritize to maximize overall throughput. Machine learning can help uncover complex patterns that researchers can then distill down to practical hardware implementations.
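A minimal sketch of the arbitration idea: rank competing packets with a learned linear score over simple per-packet features and grant the highest scorer. The feature names, weight values, and packet fields below are all invented for illustration and do not correspond to any published scheme; in practice the weights would be learned offline and the resulting policy distilled into cheap hardware such as a small table or adder tree.

```python
# Hypothetical per-packet features; a real design would select these
# empirically (source, destination, message type, age, etc.).
FEATURES = ('hops_remaining', 'queue_age', 'is_control_msg')

def score(packet, weights):
    """Learned linear priority score for one packet (illustrative)."""
    return sum(weights[f] * packet[f] for f in FEATURES)

def arbitrate(packets, weights):
    """Grant the contending packet with the highest learned score."""
    return max(packets, key=lambda p: score(p, weights))

# Invented example weights and contenders for illustration only.
weights = {'hops_remaining': 0.2, 'queue_age': 0.5, 'is_control_msg': 1.0}
contenders = [
    {'id': 0, 'hops_remaining': 4, 'queue_age': 2, 'is_control_msg': 0},
    {'id': 1, 'hops_remaining': 1, 'queue_age': 1, 'is_control_msg': 1},
]
winner = arbitrate(contenders, weights)
```

The point of the sketch is the workflow, not the arithmetic: a learned scoring function can capture feature interactions that a hand-written priority rule would miss, and its structure hints at what to distill into hardware.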

As we look towards the next decade of NoCs research, there remain many interesting opportunities including security, customization to accelerator traffic patterns including reconfigurability, and continued increases in overall efficiency.

Timing side channels have been exploited to leak information via various microarchitectural and architectural design choices, including speculative execution, cache hierarchies, and coherence protocols. Security has also been studied in NoCs but to date has received less attention than in other aspects of computer architecture. In the rapidly evolving space of accelerators, particularly those targeting machine learning, designs must be agile enough to adapt to new algorithms and techniques; failure to adapt will render chips obsolete by the time they hit the market. Reconfigurability extends to the NoC as well and needs additional study. Finally, as recent work in deadlock freedom and machine-learning-driven design has shown, there is still more efficiency to be gained in the network. Leaner networks, designed through human intuition or ML-guided approaches, remain to be discovered, saving chips valuable area and power as we continue to strive for more functionality per chip despite the approaching end of Moore’s Law. In the coming decade, the cost of data movement remains important, and NoCs will continue to be critical in future designs.

About the author: Natalie Enright Jerger is the Canada Research Chair in Computer Architecture at the University of Toronto. 

Disclaimer: These posts are written by individual contributors to share their thoughts on the Computer Architecture Today blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGARCH or its parent organization, ACM.