In this post, I argue that the traditional separation of computation and communication is not representative of the reality of current distributed systems. Instead, a more continuous perspective of computation and communication may be of value. This change requires that we take a new look at how we think of, teach, and use computer and network systems.
The classical distributed systems perspective
The classical view of a distributed system is a software application running on two or more processors that communicate via a network. In this view, processing and communication are two separate aspects of the system with distinct roles. As a result, a system is seen as either processing or communicating at any given point. Data that is transferred from one node to another is unchanged, and data that is processed stays on the node where processing takes place.
This view is also reflected in the Internet ecosystem, where we view web services, cloud computing, and content providers as computational service providers “on the other side” of the Internet. Network providers provide the “pipes” that connect our computing system to the computational service providers.
The need for processing everywhere
Networks traditionally have aimed at moving information from one location to another. The famous end-to-end principle suggests that application-specific functionality (i.e., processing) should be done at the end-points of the network and not on intermediate nodes. However, the reality of networks is not as clean-cut.
New paradigms that place processing functions inside the network are being actively developed: fog computing places processing functionality at the network edge, and virtual network functions enable network nodes to perform arbitrary processing functions on traversing traffic. Together with cloud computing at the network end-points, processing functionality has infused communication systems across the entire network.
There are multiple reasons why there is a need for such a continuum of computation and communication:
- Application needs: Emerging application spaces, such as the Internet of Things (IoT), connect extremely low-end sensor and actuator components to cloud-based computation and control. In such environments, components may not be able to communicate directly with all systems due to performance and functional constraints. Instead, intermediate processing gateways can relay information. Thus, the overall information flow traverses intermediate processing nodes.
- Operational realities: The Internet is an agglomeration of networks that are operated by different entities (i.e., autonomous systems). Each autonomous system pursues its own operational policies and economic goals. As a result, there are practical reasons why computation and communication need to be combined. For example, an entity may perform content inspection to determine if network traffic poses a threat to systems. This type of processing could be done once at the end-system (following the end-to-end principle), but administrative boundaries in the network are also trust boundaries. A network operator may not trust the results of a security check on network traffic that was performed by another entity (unless there are service contracts in place that incentivize the other entity to act truthfully). Similarly, performing processing operations creates economic opportunities. For example, caching or ad insertion can be offered as revenue-generating services.
- Performance opportunities: A system in which communication and computation are clearly separated requires that communication is controlled entirely from the end-system. The dynamic nature of networks, however, offers opportunities for performance improvements if adaptation inside the network is possible. For example, load balancing across multiple links and WAN caching are functions that cannot be implemented on end-systems, since they need to be performed inside the network at runtime. Thus, processing steps combined with communication afford opportunities to improve overall system performance.
Even the textbook example of a traditional distributed system, fetching a web page from a server, can involve several processing steps: The wide-area network may use WAN optimization for processing and caching, the local service provider may perform dynamic ad insertion at the edge, and the user’s router node performs intrusion detection processing. This example shows that processing is already becoming integrated with networking functionality.
Implications for computer system design
One key challenge arising from this new approach to communication and computation in distributed systems is to find suitable abstractions to program such systems. The domain of parallel and distributed computing has certainly provided some insights on how to divide processing tasks and spread them over multiple processing resources. However, in this context, the communication aspect is merely a tool to get computation to a node and get the result back. Communication itself is not a prime objective of the system.
In contrast, a continuous communication and computation system moves information from one place to another and transforms the information along the way. This problem has appeared in the networking community, where virtual network functions needed to be programmed in the context of data transfers. To capture both aspects of computing and communication, “service chains” have been developed. Service chains describe the operations that need to be performed on a data transfer between its source and destination. Service nodes can be concatenated (i.e., “chained”), executed in parallel, made conditional, etc. Thus, complex directed acyclic graphs can be constructed to describe processing operations on a data transfer.
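To make the idea of a service chain as a directed acyclic graph more concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular network function framework works: the node names (`inspect`, `cache`, `ad`), the payload fields, and the `CACHE` contents are all made up for this example. Two independent nodes may run in either order, and a third node is conditional on the result of the inspection.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

CACHE = {"/index.html"}  # hypothetical set of URLs held by a WAN cache


def inspect(payload):
    # Hypothetical content inspection: flag payloads containing a marker string.
    return {**payload, "threat": "malware" in payload["body"]}


def cache_lookup(payload):
    # Hypothetical cache check: record whether the URL is already cached.
    return {**payload, "cached": payload["url"] in CACHE}


def insert_ad(payload):
    # Hypothetical ad insertion: transform the data while it is in transit.
    return {**payload, "body": payload["body"] + " [ad]"}


# Node name -> (processing function, predicate deciding whether the node runs).
NODES = {
    "inspect": (inspect, lambda p: True),
    "cache": (cache_lookup, lambda p: True),
    "ad": (insert_ad, lambda p: not p["threat"]),  # conditional node
}

# Node name -> set of nodes that must run before it (the edges of the DAG).
DEPS = {
    "inspect": set(),
    "cache": set(),  # independent of "inspect", so the two could run in parallel
    "ad": {"inspect", "cache"},
}


def run_chain(payload, nodes=NODES, deps=DEPS):
    """Apply every service node to the payload in a dependency-respecting order."""
    for name in TopologicalSorter(deps).static_order():
        func, should_run = nodes[name]
        if should_run(payload):
            payload = func(payload)
    return payload


result = run_chain({"url": "/index.html", "body": "hello"})
```

The sketch runs the nodes sequentially in a topological order. A real system would distribute the nodes across the network, which is exactly the mapping question discussed next.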
The mapping of processing tasks in a service chain to physical processing resources then determines where the various computing steps take place in the continuum from local processing and edge computing to access provider services and cloud services. In practice, system, performance, and policy constraints need to be considered.
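As a rough illustration of such a mapping, the following Python sketch places tasks on the nearest tier that satisfies a system constraint (free CPU), a performance constraint (latency), and a policy constraint (allowed tiers). The tier names follow the continuum described above, but all numbers, task names, and the first-fit strategy itself are assumptions chosen for this example. Real placement problems are typically solved with more sophisticated optimization.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    latency_ms: float  # assumed latency from the end-system to this tier
    free_cpu: float    # assumed spare processing capacity (arbitrary units)


@dataclass
class Task:
    name: str
    cpu: float               # system constraint: capacity the task needs
    max_latency_ms: float    # performance constraint
    allowed: frozenset       # policy constraint: tiers the task may run on


def place(tasks, tiers):
    """First-fit placement: try tiers from nearest to farthest for each task."""
    free = {tier.name: tier.free_cpu for tier in tiers}
    placement = {}
    for task in tasks:
        placement[task.name] = None  # stays None if no tier is feasible
        for tier in tiers:
            if (tier.name in task.allowed
                    and tier.latency_ms <= task.max_latency_ms
                    and free[tier.name] >= task.cpu):
                placement[task.name] = tier.name
                free[tier.name] -= task.cpu
                break
    return placement


# Hypothetical tiers, ordered from nearest to farthest.
TIERS = [
    Tier("local", 1, 1.0),
    Tier("edge", 5, 4.0),
    Tier("access", 15, 16.0),
    Tier("cloud", 60, 1000.0),
]
ALL = frozenset(t.name for t in TIERS)

# Hypothetical tasks with made-up requirements.
TASKS = [
    Task("intrusion_detection", 0.5, 2, frozenset({"local", "edge"})),
    Task("ad_insertion", 2.0, 20, frozenset({"edge", "access"})),
    Task("analytics", 50.0, 100, ALL),
    Task("hard_realtime", 8.0, 2, ALL),
]

placement = place(TASKS, TIERS)
```

With these made-up numbers, the last task cannot be placed anywhere, which shows how the three kinds of constraints can conflict: the only tier that is fast enough lacks the capacity.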
Where are we going?
I believe there is a need to change the way we think about computer system components. We need to move away from looking at processing systems and networks as separate entities. Clearly, there are communities where the boundaries are already blurring: in high-performance computing, the system interconnect is a key aspect of the processing system; in data center networks, communication and computation are tightly coupled. We need to further expand this continuous perspective to most computing systems.
Working in academia, I see the separation of topics particularly starkly in the curriculum. Computer architecture is typically a separate course from computer networking. It is fine to look at these topics independently to study their foundations. What is also needed is an overarching course that brings together these topics and explores the combination of computation and communication. This integrative system perspective is lacking in many curricula.
Further, we need to think about which abstractions to develop so that communication and computation can be specified and implemented together. How can a distributed operating system manage resources at runtime? How can operations be verified when they are performed by different entities on various nodes? What are the security implications of spreading processing tasks across an internetwork? Could service chains be employed as first-class programming abstractions?
There are many open questions in this rich area of research and practice. Challenges abound, and with them come opportunities for shaping our perspective of computer systems as a whole. I believe that a more integrative, continuous perspective on computation and communication will allow us to understand current systems more accurately and will enable us to build better systems in the future.
Disclaimer: These posts are written by individual contributors to share their thoughts on the Computer Architecture Today blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGARCH or its parent organization, ACM.