Numerics Support in Programming Languages

A nice quote I found from a comment on Slashdot:

Languages like OCAML and Java are defective as general purpose languages because they don’t support efficient data abstraction for numerical types. The fact that their designers just don’t get that fact is a testament to the ignorance of their designers. It’s also what people really mean when they say that those kinds of languages are “just not efficient as C/C++”: it means that in C/C++, you can get whatever code you write to run fast, while in OCAML or Java, there are always problems where you have to drop down to C.

I’ll insert the usual “I agree” here. This is especially a problem with language benchmarks, which typically have benchmarks that operate on huge numbers of tiny bits of data. This usually destroys any chances that a functional language has of competing with a very low-level language like C/C++, because often these big arrays are represented as a series of pointers to the actual data, rather than simply big arrays that directly contain the data. This means one more pointer indirection for every index operation in the array, blowing away your cache hits and thus making your program run several orders of magnitudes slower. (If your language is also lazy, like Haskell is, you basically cannot work around this performance restriction unless you make your data structure strict … in which case, well, you’re not using laziness any more.)

This problem needs to be solved, without forcing the programmer to spend lots of time annotating exactly what data structures should be unboxed and what should be boxed, and what functions are OK to work with unboxed/strict data structures. Otherwise, people just aren’t going to these other languages for processing large quantities of small data. And in this day and age of computing, processing large quantities of small data is required for quite a lot of applications …

There’s also some interesting comments about Python 2.4’s new generator expressions, and how they are similar yet different to lambda expressions/anonymous functions: in particular, how they appear to give rather nice performance benefits. I haven’t given generators too much thought at all yet, assuming they were less elegant, ad-hoc implementation of representing lazy data structures. Sounds like I have some investigation to do!

blog comments powered by Disqus