Consider the following thought experiment. You are on the program committee of a conference like ASPLOS or PLDI, and your committee must choose one of the following two papers for publication.
- Paper 1 describes a new program optimization and its implementation in LLVM. The experiments section evaluates the effectiveness of this optimization on a large suite of benchmarks such as SPEC CPU 2017 and shows that performance improves by 5% on average.
- Paper 2 is an in-depth case study of a highly optimized implementation of an important application such as a recommendation system based on matrix completion. The paper starts with a naïve implementation and shows how performance can be improved by exploiting more and more features of the architecture, obtaining in the end an implementation that performs as well as the best handwritten code for that problem.
Which paper would you pick?
In my experience, most program committees will pick the first paper, arguing that the second paper “deals with only a single application” and that the lessons from the case study may or may not be useful for other applications, whereas the first paper shows some performance improvements across an entire benchmark suite. In fact, most PCs will reject the second paper even if there is room in the program for both papers.
To put it bluntly, we prefer breadth (benchmark suites) to depth (case studies)!
Case studies are of course used extensively in other fields such as business, medicine, and law. Chris Christensen, a pioneer of the case-study method at the Harvard Business School, advocates their use because case studies “keep the discussion grounded upon some of the stubborn facts that must be faced in real-life situations.” In the same spirit, I believe that systems research will benefit if our community performs more in-depth case studies of important applications in collaboration with domain experts. However, this will happen only if such studies are recognized as being valuable and are published at major systems conferences.
Why do case studies?
First, careful studies of highly optimized applications give systems researchers performance targets to shoot for. If you work on compilers, for example, such studies give you an idea of how much headroom there is for improving the code generated by your compiler. A broad-spectrum study of the performance improvement resulting from applying some optimization to a benchmark suite is not helpful for this.
A related point is that good case studies enable you to quickly identify bad implementations. A few years ago, I refereed a paper claiming that its cache-oblivious matrix multiplication code outperformed cache-optimized matrix multiplication code by a substantial margin. Since I did not have access to the code, I could not check the claim by running experiments, but a quick calculation showed that the allegedly cache-optimized matrix multiplication was achieving less than 1% of peak performance! In case you are wondering, I rejected the paper.
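To make the arithmetic concrete, here is a minimal sketch in C of that kind of back-of-the-envelope check. Peak performance is the product of cores, clock rate, and floating-point operations per cycle, and an n-by-n matrix multiplication performs roughly 2n³ floating-point operations; every number below is a hypothetical placeholder, not a figure from the paper in question.

```c
/* Back-of-the-envelope check: what fraction of machine peak does a
 * reported matrix-multiplication time achieve? All parameters are
 * hypothetical placeholders, for illustration only. */
#include <stdio.h>

int main(void) {
    /* Hypothetical machine: 1 core at 3 GHz doing 16 double-precision
     * FLOPs per cycle (e.g., two 4-wide FMA units). */
    double peak = 1 * 3.0e9 * 16;            /* FLOPs per second */

    /* Hypothetical reported result: an n-by-n matmul in t seconds. */
    double n = 4096.0, t = 300.0;
    double work = 2.0 * n * n * n;           /* ~2n^3 FLOPs for GEMM */

    double achieved = work / t;              /* FLOPs per second */
    printf("achieved: %.2f GFLOP/s (%.1f%% of peak)\n",
           achieved / 1e9, 100.0 * achieved / peak);
    return 0;
}
```

If the percentage comes out at a percent or two, the “optimized” baseline is not a credible baseline, and no experiments are needed to see it.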
A second reason why case studies are important is that they require you to take a holistic view of programs and architectures, forcing you to deal with “some of the stubborn facts that must be faced in real-life situations,” as Christensen says.
There are literally hundreds of compiler papers on computing optimal tile sizes for blocking loops for memory hierarchies, but most of them consider just a single cache level and show only that blocking for the L1 cache is better than not blocking the loop at all. To get really good performance, however, you must also block for registers, and it turns out that the register block size affects the block sizes for the other levels of the memory hierarchy. You would not learn that from compiler papers that focus on a small part of the problem; instead, you need to study a highly optimized GEMM code in a numerical linear algebra library to understand the subtleties of optimizing code for multiple levels of the memory hierarchy, as the sketch below illustrates.
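Here is a minimal sketch in C of a two-level blocked GEMM, with a cache-level tile size NB and an MR × NR register tile (the micro-kernel). All sizes here are illustrative placeholders, and a real library kernel would also pack tiles and use SIMD intrinsics; the point is simply that NB must be divisible by MR and NR, so the register-block choice constrains the cache-block choices.

```c
/* Sketch of two-level blocking for C += A*B on N-by-N matrices.
 * NB is a cache-level tile size; MR x NR is the register tile computed
 * by the micro-kernel. Assumes N % NB == 0, NB % MR == 0, NB % NR == 0.
 * Sizes are illustrative; real libraries tune them per architecture. */
#include <stddef.h>

enum { N = 1024, NB = 128, MR = 4, NR = 4 };

void gemm_blocked(const double A[N][N], const double B[N][N], double C[N][N]) {
    for (size_t ic = 0; ic < N; ic += NB)         /* cache-level tiles */
    for (size_t jc = 0; jc < N; jc += NB)
    for (size_t kc = 0; kc < N; kc += NB)
        /* micro-kernel: accumulate an MR x NR tile of C in registers */
        for (size_t ir = ic; ir < ic + NB; ir += MR)
        for (size_t jr = jc; jr < jc + NB; jr += NR) {
            double c[MR][NR] = {{0.0}};           /* register tile */
            for (size_t k = kc; k < kc + NB; ++k)
                for (size_t i = 0; i < MR; ++i)
                    for (size_t j = 0; j < NR; ++j)
                        c[i][j] += A[ir + i][k] * B[k][jr + j];
            for (size_t i = 0; i < MR; ++i)       /* write back */
                for (size_t j = 0; j < NR; ++j)
                    C[ir + i][jr + j] += c[i][j];
        }
}
```

Notice that choosing MR and NR first, to fit the register file, immediately restricts the legal values of NB. That coupling between levels is exactly what single-level tiling models miss.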
A third reason why case studies are useful is that an application may be important enough that it is worthwhile, from a scientific point of view, to understand what it takes to optimize it even if the techniques do not apply broadly to a benchmark suite.
My favorite example of this is LU with pivoting, which my friend Nick Trefethen, an eminent numerical analyst at Oxford, calls the most important algorithm in numerical analysis. Optimizing this algorithm for memory hierarchies requires performing loop transformations that violate dependences but are nevertheless legal. Given the central role of dependences in modern compilers, it was long believed that such transformations were outside the reach of automatic program restructuring, but Vijay Menon showed that a clever variation of symbolic analysis called fractal symbolic analysis permits the transformations to be performed automatically by a compiler.
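As a reminder of the baseline being discussed, here is a minimal textbook sketch in C of right-looking, in-place LU factorization with partial pivoting; this is the standard unblocked algorithm, not the transformed variant a tuned library or Menon's analysis would produce. The row swaps in the middle are precisely the dependences that make blocked variants look illegal to a conventional dependence-based compiler.

```c
/* Textbook in-place LU factorization with partial pivoting: on exit,
 * the strict lower triangle of A holds L (unit diagonal implied), the
 * upper triangle holds U, and piv[k] is the row swapped with row k.
 * Singularity checks are omitted for brevity. */
#include <math.h>
#include <stddef.h>

void lu_pivot(size_t n, double A[n][n], size_t piv[n]) {
    for (size_t k = 0; k < n; ++k) {
        /* partial pivoting: pick the largest |A[i][k]| on or below
         * the diagonal ... */
        size_t p = k;
        for (size_t i = k + 1; i < n; ++i)
            if (fabs(A[i][k]) > fabs(A[p][k])) p = i;
        piv[k] = p;
        /* ... and swap it into the pivot position */
        if (p != k)
            for (size_t j = 0; j < n; ++j) {
                double t = A[k][j]; A[k][j] = A[p][j]; A[p][j] = t;
            }
        /* eliminate below the diagonal, storing multipliers in L */
        for (size_t i = k + 1; i < n; ++i) {
            A[i][k] /= A[k][k];
            for (size_t j = k + 1; j < n; ++j)
                A[i][j] -= A[i][k] * A[k][j];
        }
    }
}
```

Each swap touches an entire row, so blocking these loops reorders swaps relative to updates in ways that violate the dependences a compiler computes, even though the resulting factorization is still correct.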
Variations of LU factorization with pivoting called Fangcheng were studied by Chinese mathematicians more than 2000 years ago, and it is likely that this algorithm will continue to be used and taught 2000 years from now. Given the importance of LU factorization and pivoting in numerical linear algebra, there is surely scientific value in identifying what it takes to optimize such algorithms even if the techniques are not useful for the benchmark suite du jour. After all, SPEC benchmarks come and go but LU factorization is forever!
Looking ahead
There is a final reason why in-depth case studies may be useful for our community at this juncture. One of the paradoxes of our times is that although computers have become much better than human beings at tasks such as playing chess and Go, really high-performance code still has to be written by human beings. Compilers can do a good job with parts of a program, but the optimized parts often do not play well together. I refer to this as the last-mile problem in program optimization.
It is likely that this last-mile problem will be solved only by taking a holistic view of programs and architectures – in short, by doing careful case studies. I hope program committees will agree with this and act accordingly.
About the Author: Keshav Pingali (pingali@cs.utexas.edu) is a professor in the CS department and ICES at the University of Texas at Austin.
Disclaimer: These posts are written by individual contributors to share their thoughts on the Computer Architecture Today blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGARCH or its parent organization, ACM.