Language Optimisations vs Domain-Specific Optimisations
To quote part of the abstract from Arch D. Robinson’s awesome paper, The Impact of Economics on Compiler Optimization:
Compile-time program optimizations are similar to poetry: more are written than are actually published in commercial compilers. Hard economic reality is that many interesting optimizations have too narrow an audience to justify their cost in a general-purpose compiler, and custom compilers are too expensive to write. This has already happened accidentally for C++, albeit imperfectly, in the form of template metaprogramming.
In a nutshell, writing optimisations is hard. Often, it’s too much of an engineering effort to implement tons of optimisations in a production-quality compiler: the gains you see simply aren’t worth the amount of effort. I won’t ramble on about this, since that’s the focus of Robinson’s paper, and it’s an enlightening read if you’re interested in that topic. (It’s also an enlightening read of you’re wondering what all the fuss is about with C++ templates.) Related to Robison’s paper is a talk given by Bill Pugh, titled “Is Code Optimization Research Relevant?”. One standout slide from that talk is this one:
18 years from now, if we pull a Pentium III out of the deep freeze, apply our future compiler technology to SPECINT2000, and get an additional 2x speed improvement … I will be impressed/amazed.
And Bill is pretty much right. But I won’t talk about whether research on optimisations is relevant or not — again, have a browse of Bill’s talk if you want to debate that with him.
What I do want to talk about is that both Robinson’s paper and Pugh’s talk both hint at something which is never explicitly stated: the successful optimisations which have been done in the past few years are mostly domain-specific optimisations. These are optimisations which apply to only to a particular problem area, e.g. graphics (thus the word “domain” in domain-specific). Pugh gives the example that optimising matrix multiplication is quite successful: here, the domain that is being optimised is matrices, a well-defined and well-known area of maths. Robinson makes the point that C++ templates are successful because they can be used to create domain-specific optimisations, since they can be used to meta-program the compiler.
One interesting example of a domain-specific optimisation are the vector units on a modern desktop system: AltiVec on the PowerPC, the PlayStation 2’s Emotion Engine, MMX/SSE on Intel x86s. While they’re not part of the compiler, they are a perfect example of just how well you can optimise code when you start having domain-specific constructs available (in this case, vector operations). Modern-day programmable graphics cards such as the latest NVidias and ATI Radeons are proof that native language (and hardware) support for vectors can reap massive speed benefits, and IBM’s new Cell CPU begs for properly vectorised code to get any decent performance out of it. The point of all this is: if your programming language natively supports expressing the problem domain you want to solve, you can really optimise the hell out of it.
In the future, I hope that languages or compilers will enable us to easily write domain-specific optimisations, because language optimisation research — especially for imperative languages — simply aren’t going to give us that much of a speedup anymore. Look at the number of optimisation options in the gcc manpage and count how many of them will really give you a significant speedup. Beyond some basic, well-known optimisations, such as constant propagation and inlining, are the rest of those obscure optimisations really worthwhile? Is it worth it to put the engineering effort into those optimisations when you not only have to code them up, but also maintain them and make sure that they don’t cause optimised program to behave differently to non-optimised programs?
I’d rather that all that engineering effort be put into making the language and compiler more extensible, so that libraries can come bundled with their own optimisations that can be added to the compiler. This potentially gives a much greater speed increase. In low-level languages such as C, this doesn’t make such a big difference because you’re programming so close to the hardware (though it certainly can sometimes: observe the success of C++ templates). High-level languages that enable the programmer to more closely express the problem domain in the language, such as Python and Haskell, have much more to gain.