Patterns in Software
Paul Graham writes in his “Revenge of the Nerds” article:
… in the OO world you hear a good deal about “patterns”. I wonder if these patterns are not sometimes evidence of … the human compiler, at work. When I see patterns in my programs, I consider it a sign of trouble. The shape of a program should reflect only the problem it needs to solve. Any other regularity in the code is a sign, to me at least, that I’m using abstractions that aren’t powerful enough — often that I’m generating by hand the expansions of some macro that I need to write.
Language Optimisations vs Domain-Specific Optimisations
To quote part of the abstract from Arch D. Robison’s awesome paper, The Impact of Economics on Compiler Optimization:
Compile-time program optimizations are similar to poetry: more are written than are actually published in commercial compilers. Hard economic reality is that many interesting optimizations have too narrow an audience to justify their cost in a general-purpose compiler, and custom compilers are too expensive to write. This has already happened accidentally for C++, albeit imperfectly, in the form of template metaprogramming.
In a nutshell, writing optimisations is hard. Often, it’s too much of an engineering effort to implement tons of optimisations in a production-quality compiler: the gains you see simply aren’t worth the amount of effort. I won’t ramble on about this, since that’s the focus of Robison’s paper, and it’s an enlightening read if you’re interested in that topic. (It’s also an enlightening read if you’re wondering what all the fuss is about with C++ templates.) Related to Robison’s paper is a talk given by Bill Pugh, titled “Is Code Optimization Research Relevant?”. One standout slide from that talk is this one:
18 years from now, if we pull a Pentium III out of the deep freeze, apply our future compiler technology to SPECINT2000, and get an additional 2x speed improvement … I will be impressed/amazed.
And Bill is pretty much right. But I won’t talk about whether research on optimisations is relevant or not — again, have a browse of Bill’s talk if you want to debate that with him.
What I do want to talk about is that both Robison’s paper and Pugh’s talk hint at something which is never explicitly stated: the successful optimisations of the past few years have mostly been domain-specific optimisations. These are optimisations which apply only to a particular problem area, e.g. graphics (thus the word “domain” in domain-specific). Pugh gives the example that optimising matrix multiplication has been quite successful: here, the domain being optimised is matrices, a well-defined and well-known area of maths. Robison makes the point that C++ templates are successful because they can be used to create domain-specific optimisations, since they can be used to meta-program the compiler.
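To make that concrete, here’s a minimal, purely illustrative sketch of template metaprogramming (the names are mine, not from any real library): the compiler evaluates the recursion while instantiating the templates, so the whole computation collapses into a constant without any dedicated compiler pass.

```cpp
#include <iostream>

// A toy example of template metaprogramming: pow3<N> is computed entirely
// at compile time, by the compiler instantiating templates recursively.
template <unsigned N>
struct pow3 {
    static const unsigned long value = 3 * pow3<N - 1>::value;
};

template <>
struct pow3<0> {
    static const unsigned long value = 1;  // base case stops the recursion
};

int main() {
    // pow3<8>::value is already the constant 6561 in the compiled code;
    // no multiplication happens at run time.
    std::cout << pow3<8>::value << std::endl;
    return 0;
}
```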
One interesting example of domain-specific optimisation is the vector units on modern desktop systems: AltiVec on the PowerPC, the PlayStation 2’s Emotion Engine, and MMX/SSE on Intel x86s. While they’re not part of the compiler, they are a perfect example of just how well you can optimise code when you have domain-specific constructs available (in this case, vector operations). Modern programmable graphics cards, such as the latest NVIDIA and ATI Radeon models, are proof that native language (and hardware) support for vectors can reap massive speed benefits, and IBM’s new Cell CPU begs for properly vectorised code to get any decent performance out of it. The point of all this is: if your programming language natively supports expressing the problem domain you want to solve, you can really optimise the hell out of it.
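As a rough illustration, here’s the same loop written two ways in C++, once as plain scalar code and once using Intel’s SSE intrinsics (the function names and the alignment/size assumptions in the comments are mine): the second version expresses the work directly in the hardware’s vector vocabulary.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Scalar version: one float addition per loop iteration.
void add_scalar(const float* a, const float* b, float* out, int n) {
    for (int i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// Vectorised version: four additions per iteration, expressed in the
// hardware's domain-specific vocabulary. Assumes n is a multiple of 4
// and that the arrays are 16-byte aligned.
void add_sse(const float* a, const float* b, float* out, int n) {
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(a + i);
        __m128 vb = _mm_load_ps(b + i);
        _mm_store_ps(out + i, _mm_add_ps(va, vb));
    }
}
```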
In the future, I hope that languages or compilers will enable us to easily write domain-specific optimisations, because language optimisation research, especially for imperative languages, simply isn’t going to give us much more of a speedup. Look at the number of optimisation options in the gcc manpage and count how many of them will really give you a significant speedup. Beyond some basic, well-known optimisations, such as constant propagation and inlining, are the rest of those obscure optimisations really worthwhile? Is it worth putting the engineering effort into them when you not only have to code them up, but also maintain them and make sure that they don’t cause optimised programs to behave differently from non-optimised programs?
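(For reference, this is roughly what those two basic optimisations buy you; the function names here are just for illustration.)

```cpp
// square() is trivially inlinable; combined with constant propagation,
// a call like circle_area(2.0) can be folded down to a single constant.
inline double square(double x) { return x * x; }

double circle_area(double r) {
    const double pi = 3.14159265358979;
    return pi * square(r);  // after inlining: pi * r * r
}

// At a call site such as circle_area(2.0), a compiler doing inlining and
// constant propagation can emit (approximately) the constant 12.566,
// with no multiplications left to do at run time.
```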
I’d rather that all that engineering effort be put into making the language and compiler more extensible, so that libraries can come bundled with their own optimisations that can be added to the compiler. This potentially gives a much greater speed increase. In low-level languages such as C, this doesn’t make such a big difference because you’re programming so close to the hardware (though it certainly can sometimes: observe the success of C++ templates). High-level languages that enable the programmer to more closely express the problem domain in the language, such as Python and Haskell, have much more to gain.
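To sketch what a library bundling its own optimisation can look like even today, here’s a toy expression-template vector class (illustrative only, not any real library’s API): assigning a + b + c evaluates the whole expression in one fused loop, with no temporary vectors, and that “optimisation” lives entirely in library code.

```cpp
#include <cstddef>
#include <vector>

// A node in the expression tree: represents "lhs + rhs" without evaluating it.
template <typename L, typename R>
struct Add {
    const L& lhs;
    const R& rhs;
    Add(const L& l, const R& r) : lhs(l), rhs(r) {}
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n) : data(n) {}
    double operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i) { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning from an expression walks the tree once per element: the
    // "optimisation" (no temporaries, one fused loop) lives in the library.
    template <typename Expr>
    Vec& operator=(const Expr& e) {
        for (std::size_t i = 0; i < size(); ++i) data[i] = e[i];
        return *this;
    }
};

// Overloads restricted to our own types, so v1 + v2 builds a lightweight
// Add node instead of allocating an intermediate Vec.
inline Add<Vec, Vec> operator+(const Vec& a, const Vec& b) {
    return Add<Vec, Vec>(a, b);
}
template <typename L, typename R>
Add<Add<L, R>, Vec> operator+(const Add<L, R>& a, const Vec& b) {
    return Add<Add<L, R>, Vec>(a, b);
}

int main() {
    Vec a(1000), b(1000), c(1000), result(1000);
    result = a + b + c;  // evaluated as a single loop over 1000 elements
    return 0;
}
```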
Flash Player 8 public beta for Mac
Macromedia’s Flash Player 8 has entered public beta testing. For Mac OS X folks, the most significant change is that, well, it finally doesn’t suck. Or, to put it another way, it doesn’t feel like you’re running it on a 16MHz 68000. Finally, enjoy the delights of Strong Bad and godskitchendigital without Flash grinding 100% of your CPU!
With joy, I can scratch “write Safari plugin from open-source Flash implementation” off my TODO list …
Update: Flash Player 8 is now out of the beta-testing phase. You can download the full version from Macromedia’s Flash download page.
Valgrind vs C and C++
From an interview with Julian Seward, author of the superb Valgrind, as well as cacheprof and bzip2, and a co-author of the Glorious Glasgow Haskell Compiler (quite a remarkably productive fellow, huh?):
Valgrind is loaded with assertion checks and internal sanity checkers which periodically inspect critical data structures. These are permanently enabled. I don’t care if 5 percent or even 10 percent of the total run-time is spent in these checks; automated debugging is the way to go. As a result, Valgrind almost never segfaults; instead it emits some kind of a useful error message before dying. That’s something I’m rather proud of.
Nice indeed; I wonder if programming languages of the future will be able to do automatic assertions of such data structures in their run-time systems. All you’d need is to annotate the structures with some simple boolean conditions, and the language can merrily check them whenever a garbage collection pass is executed (since programming languages of the future will all have automatic memory management, of course. Right? Helllllllo?)
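Here’s a hand-rolled version of that idea (the names are hypothetical, not from any real runtime): the structure carries its invariant as a simple boolean check, and the program, rather than the language runtime, calls it at convenient points.

```cpp
#include <cassert>
#include <vector>

// A data structure that carries its own invariant. Valgrind runs checks
// like this periodically by hand; the hypothetical language feature above
// would run them automatically, e.g. on every garbage-collection pass.
class SortedList {
public:
    void insert(int value) {
        auto it = items_.begin();
        while (it != items_.end() && *it < value) ++it;
        items_.insert(it, value);
        check_invariant();  // fail loudly here, not with a distant segfault later
    }

    void check_invariant() const {
        // The "annotation": a simple boolean condition over the structure.
        for (std::size_t i = 1; i < items_.size(); ++i)
            assert(items_[i - 1] <= items_[i] && "SortedList invariant violated");
    }

private:
    std::vector<int> items_;
};
```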
And, one more little quote (sorry, I couldn’t resist):
Valgrind is only useful because C and C++ are such crappy programming languages.
A-men to that, Haskell brother. Snigger snigger snigger …