Sep 2006

rcp in Erlang in 5 lines of code

Joe Armstrong, the Main Man behind Erlang, shows how to write an FTP client in Erlang in a couple of lines of code. Some of my own points:

  • People on his blog who comment that “what operating system comes without an FTP server?” are totally missing the point, which is that Erlang makes it easy to write network programs. How many lines of code do you think it would have taken to do that in C? That file transfer program he wrote is less than half a dozen lines.
  • Yes, it’s not a real FTP client, duh. Erlang does actually come with an FTP module which you can use in your own program, though.

Extensible Design in C++

Namespaces vs Static Methods

In C++, if you want a bunch of plain functions that you can put into a library, you can either do this:

class UtilityFunctions
  static void Foo();
  static void Bar();

or this:

namespace UtilityFunctions
  void Foo();
  void Bar();

Either way, you call the functions with the same syntax:

void SomeFunction()

So, what’s the difference between using a class full of static methods vs using namespaces? While I’m sure there’s plenty of differences about how they’re implemented internally, the big difference is that namespaces are extensible, while a class full of static methods isn’t. That means that in another file, you can just add more functions to the namespace:

namespace UtilityFunctions
  void Baz();
  void Quux();

but you can’t add more static methods to the class.

The Expression Problem

This is rather handy, since it means that C++ can quite nicely solve the expression problem, a problem that plagues nearly all modern programming languages, even ones with expressive type systems such as Haskell and Ocaml. (Note that the expression problem specifically concerns statically typed languages, so while there are solutions for it in modern dynamic languages such as Python, Perl and Ruby, they don’t really count since they’re not statically typed. It’s easy to solve the problem if you’re prepared to throw away all notions of type safety at compile time!)

The expression problem is basically this:

  • You want to able to add new data types, and have existing functions work on those data types. This is easy in an object-oriented language: just subclass the existing data types, and all existing functions will work just fine with your new subclass. This is (very) hard in a functional language, because if you add new cases to a variant type, you must update every pattern match to work properly with the new case.
  • However, you also want to add new functions that will work with those data types. This is very easy in a functional language: just define a new function. This is solvable in an object-oriented language, but isn’t very elegant, because most object-oriented languages can’t add new methods to existing classes (Objective-C is a notable exception; see the footnote below). This means that you are forced to declare a function when you really wanted to add a new method to the class, or, in OO languages which don’t even have normal functions (e.g. Java), you have to declare a totally new class with a static method instead. Ouch.

However, since C++ provides (1) objects, (2) normal functions, and (3) extensible namespaces, this means that you solve the expression problem nicely using the above techniques. It still requires some forethought by planning to use a namespace for sets of functions that you expect to be able to extend, but it’s an elegant solution to the expression problem, as opposed to no solution or a crappy solution. (And I thought I’d never say “C++” and “elegant” in the same sentence).

Extensible Object Factories

There’s one more piece to the puzzle, however. If you’re making your own new subclass, you also want to be able to create objects of that class. However, what if you only know the exact type of object you want to create at runtime? Use a runtime-extensible object factory instead.

Let’s say you’re designing an extensible image library, to read a bunch of image formats such as JPG, PNG, GIF, etc. You can design an abstract Image class that a JPGImage, PNGImage, or GIFImage can then subclass. If you want a uniform interface to create such images, you can use the factory design pattern:

Image* image = ImageFactory::CreateImage("/path/to/image");

In this case, CreateImage() is a factory function that will return you an appropriate Image* object. (Well, if you’re really disciplined, you’ll be using the wonderful boost::shared_ptr rather than an evil raw pointer, but I digress…)

Now, let’s say you want to make this library extensible, so users could add in their own JPEG2000Image subclass outside of your library. How, then, do you let the CreateImage() function know about the user’s new JPEG2000Image class?

There are plenty of solutions to this, but since this is meant to be a didactic post, here’s a cheap’n’cheery solution for you: use a data structure to hold references to functions that are responsible for creating each different type of JPGImage, PNGImage, etc. You can then add to the data structure at runtime (usually called registering the creation function). The CreateImage() function can then look up the registered functions in the extensible data structure and call the appropriate function, no matter whether the image class is provided by your library (JPG, PNG), or by the user (JPEG2000).

If you put together all the above techniques, what you have is a fully extensible framework. A user can:

  • register new data types with the library at run-time,
  • use exactly the same interface to create new types of objects,
  • add new functions to your library without the awkwardness of using a different namespace,
  • … and still retain complete static type safety.

Footnote: Objective-C has a particularly interesting solution to the expression problem, via categories, which are statically type-checked despite Objective-C being a “dynamic” language.


Insights into AppleScript

I recently encountered a paper written by William Cook about a mysterious little programming language that even many programming languages researchers don’t know about: AppleScript. Yep, that’d be the humble, notoriously English-like Mac scripting language that’s renown to be slow and annoying more than anything else. The paper is a fascinating look at the history of AppleScript, and details many insights and innovations that were largely unknown. Here’s some things that I was pleasantly surprised about.

Cook had never used a Mac before he was employed to work on AppleScript: in fact, he had a very strong UNIX background, and had a great amount of experience with UNIX shell scripting. So, one can instantly dismiss the notion that whoever designed AppleScript had “no idea about the elegance of interoperating UNIX tools”: a remark that I’m sure many would have made about the language (myself included). Cook even remarks that Apple’s Automator tool, introduced in Mac OS 10.4 Tiger, was quite similar to UNIX pipes:

The most interesting thing about Automator is that each action has an input and an output, much like a command in a Unix pipe. The resulting model is quite intuitive and easy to use for simple automation tasks.

More on UNIX pipes, he writes that

the sed stream editor can create the customized text file, which is then piped into the mail command for delivery. This new script can be saved as a mail-merge command, which is now available for manual execution or invocation from other scripts.

He then continues with something seemingly obvious, but is nevertheless something I have never thought about UNIX scripts:

One appealing aspect of this model is its compositionality [emphasis mine]: users can create new commands that are invoked in the same way as built-in commands.”

Indeed! In a way, the ability to save executable shell scripts is the equivalent of writing a named function to denote function composition in a functional programming language: it enables that composed code to be re-used and re-executed. It’s no coincidence that the Haskell scripts used in Don Stewart’s h4sh project are semantically quite similar to their equivalent Bourne shell scripts, where Haskell’s laziness emulates the blocking nature of pipes.

More on UNIX: Cook later writes that

A second side effect of pervasive scripting is uniform access to all system data. With Unix, access to information in a machine is idiosyncratic, in the sense that one program was used to list print jobs, another to list users, another for files, and another for hardware configuration. I envisioned a way in which all these different kinds of information could be referenced uniformly… A uniform naming model allows every piece of information anywhere in the system, be it an application or the operating system, to be accessed and updated uniformly.

The uniform naming model sounds eerily familiar who had read Hans Reiser’s white paper about unified namespaces. Has the UNIX world recognised yet just how powerful a unified namespace can be? (For all the warts of the Windows registry, providing the one structure for manipulating configuration data can be a huge benefit.)

Cook was also quite aware of formal programming language theory and other advanced programming languages: his Ph.D thesis was in fact on “A Denotational Semantics of Inheritance”, and his biography includes papers on subtyping and F-bounded polymorphism. Scratch another urban myth that AppleScript was designed by someone who had no idea about programming language theory. He makes references to Self and Common LISP as design influences when talking about AppleScript’s design. However,

No formal semantics was created for the language, despite an awareness that such tools existed. One reason was that only one person on the team was familiar with these tools, so writing a formal semantics would not be an effective means of communication… Sketches of a formal semantics were developed, but the primary guidance for language design came from solving practical problems and user studies, rather than a-priori formal analysis.

(There’s some interesting notes regarding user studies later in this post.) Speaking of programming language influences,

HyperCard, originally released in 1987, was the most direct influence on AppleScript.

Ah, HyperCard… I still remember writing little programs on HyperCard stacks in high school programming camps when I was a young(er) lad. It’s undoubtedly one of the great programming environment gems of the late 80s (and was enormously accessible to kids at the same time), but that’s an entire story unto itself…

The Dylan programming language is also mentioned at one point, as part of an Apple project to create a new Macintosh development environment (named Family Farm). ICFPers will be familiar with Dylan since it’s consistently in the top places for the judge’s prize each year; if you’re not familiar with it, think of it as Common LISP with a saner syntax.

AppleScript also had a different approach to inter-appication messaging. Due to a design flaw in the classic MacOS, AppleScript had to package as much information into its inter-application data-passing as possible, because context switches between applications on early MacOS systems were very costly:

A fine-grained communication model, at the level of individual procedure or method calls between remote objects, would be far too slow… it would take several seconds to perform this script if every method call required a remote message and process switch. As a result, traditional RPC and CORBA were not appropriate… For many years I believed that COM and CORBA would beat the AppleScript communication model in the long run. However, COM and CORBA are now being overwhelmed by web services, which are loosely coupled and use large granularity objects.

Web Services, eh? Later in the paper, Cook mentions:

There may also be lessons from AppleScript for the design of web services. Both are loosely coupled and support large-granularity communication. Apple Events data descriptors are similar to XML, in that they describe arbitrary labeled tree structures without fixed semantics. AppleScript terminologies are similar to web service description language (WDSL) files. One difference is that AppleScript includes a standard query model for identifying remote objects. A similar approach could be useful for web services.

As for interesting programming language features,

AppleScript also supports objects and a simple transaction mechanism.

A transaction mechanism? Nice. When was the last time you saw a transaction mechanism built into a programming language (besides SQL)? Speaking of SQL and domain-specific languages, do you like embedded domain-specific languages, as is the vogue in the programming language research community these days? Well, AppleScript did it over a decade ago:

The AppleScript parser integrates the terminology of applications with its built-in language constructs. For example, when targeting the Microsoft Excel application, spreadsheet terms are known by the parser—nouns like cell and formula, and verbs like recalculate. The statement tell application “Excel” introduces a block in which the Excel terminology is available.

Plus, if you’ve ever toyed with the idea of a programming language that could be written with different syntaxes, AppleScript beat you to that idea as well (and actually implemented it, although see the caveat later in this note):

A dialect defines a presentation for the internal language. Dialects contain lexing and parsing tables, and printing routines. A script can be presented using any dialect—so a script written using the English dialect can be viewed in Japanese… Apple developed dialects for Japanese and French. A “professional” dialect which resembled Java was created but not released… There are numerous difficulties in parsing a programming language that resembles a natural language. For example, Japanese does not have explicit separation between words. This is not a problem for language keywords and names from the terminology, but special conventions were required to recognize user-defined identifiers. Other languages have complex conjugation and agreement rules, which are difficult to implement. Nonetheless, the internal representation of AppleScript and the terminology resources contain information to support these features. A terminology can define names as plural or masculine/feminine, and this information can be used by the custom parser in a dialect.

Jesus, support for masculine and feminine nouns in a programming language? Hell yeah, check this out:

How cool is that? Unfortunately, Apple dropped support for multiple dialects in 1998:

The experiment in designing a language that resembled natural languages (English and Japanese) was not successful. It was assume that scripts should be presented in “natural language” so that average people could read and write them. This lead to the invention of multi-token keywords and the ability to disambiguate tokens without spaces for Japanese Kanji. In the end the syntactic variations and flexibility did more to confuse programmers than help them out. The main problem is that AppleScript only appears to be a natural language on the surface. In fact is an artificial language, like any other programming language… It is very easy to read AppleScript, but quite hard to write it… When writing programs or scripts, users prefer a more conventional programming language structure. Later versions of AppleScript dropped support for dialects. In hindsight, we believe that AppleScript should have adopted the Programmerís Dialect that was developed but never shipped.

A sad end to a truly innovative language feature—even if the feature didn’t work out. I wonder how much more AppleScript would be respected by programmers if it did use a more conventional programming language syntax rather than being based on English. Cook seems to share these sentiments: he states in the closing paragraph to the paper that

Many of the current problems in AppleScript can be traced to the use of syntax based on natural language; however, the ability to create pluggable dialects may provide a solution in the future, by creating a new syntax based on more conventional programming language styles.

Indeed, it’s possible to write Perl and Python code right now to construct and send AppleEvents. Some of you will know that AppleScript is just one of the languages supported by the Open Scripting Architecture (OSA) present in Mac OS X. The story leading to this though, is rather interesting:

In February of 1992, just before the first AppleScript alpha release, Dave Winer convinced Apple management that having one scripting language would not be good for the Macintosh… Dave had created an alternative scripting language, called Frontier… when Dave complained that the impending release of AppleScript was interfering with his product, Apple decided the AppleScript should be opened up to multiple scripting languages. The AppleScript team modified the OSA APIs so that they could be implemented by multiple scripting systems, not just AppleScript… Frontier, created by Dave Winer, is a complete scripting and application development environment. It is also available as an Open Scripting component. Dave went on to participate in the design of web services and SOAP. Tcl, JavaScript, Python and Perl have also been packaged as Open Scripting components.

Well done, Dave!

As for AppleScript’s actual development, there’s an interesting reference to a SubEthaEdit/Gobby/Wiki-like tool that was used for their internal documentation:

The team made extensive use of a nearly collaborative document management/writing tool called Instant Update. It was used in a very wiki-like fashion, a living document constantly updated with the current design. Instant Update provides a shared space of multiple documents that could be viewed and edited simultaneously by any number of users. Each userís text was color-coded and time-stamped.

And also,

Mitch worked during the entire project to provide that documentation, and in the process managed to be a significant communication point for the entire team.

Interesting that their main documentation writer was the communication point for the team, no?

Finally, AppleScript went through usability testing, a practice practically unheard of for programming languages. (Perl 6’s apocalypses and exegeses are probably the closest thing that I know of to getting feedback from users, as opposed to the language designers or committee simply deciding on everything without feedback from their actual userbase!)

Following Appleís standard practice, we user-tested the language in a variety of ways. We identified novice users and asked them “what do you think this script does?” As an example, it turned out that the average user thought that after the command put x into y the variable x no longer retained its old value. The language was changed to use copy x into y instead.

Even more interesting:

We also conducted interviews and a round-table discussion about what kind of functionality users would like to see in the system.

The survey questions looked like this:

The other survey questions in the paper were even more interesting; I’ve omitted them in this article due to lack of space.

So, those were the interesting points that I picked up when I read the paper. I encourage you to read it if you’re interested in programming languages: AppleScript’s focus on pragmatics, ease of use for non-programmers, and its role in a very heavily-based GUI environment makes it a very interesting case study. Thankfully, many Mac OS X applications are now scriptable so that fluent users can automate them effectively with Automator, AppleScript, or even Perl, Python and Ruby and the UNIX shell these days.

Honestly, the more I discover about Apple’s development technologies, the more impressed I am with their technological prowess and know-how: Cocoa, Carbon, CoreFoundation, CoreVideo, QuickTime, vecLib and Accelerate, CoreAudio, CoreData, DiscRecording, SyncServices, Quartz, FSEvents, WebKit, Core Animation, IOKit… their developer frameworks are, for the most part, superbly architectured, layered, and implemented. I used to view AppleScript as a little hack that Apple brought to the table to satisfy the Mac OS X power users, but reading the paper has changed my opinion on that. I now view AppleScript in the same way as QuickTime: incredible architecture and power, with an outside interface that’s draconian and slightly evil because it’s been around and largely unchanged for 15 freaking years.


Pushing the Limits

OK, this is both ridiculous and cool at the same time. I need to write code for Mac OS X, Windows and Linux for work, and I like to work offline at cafes since I actually tend to get more work done when I’m not on the Internet (totally amazing, I know). This presents two problems:

  1. I need a laptop that will run Windows, Mac OS X and Linux.
  2. I need to work offline when we use Subversion for our revision control system at work.

Solving problem #1 turns out to be quite easy: get a MacBook (Pro), and run Mac OS X on it. Our server runs fine on Darwin (Mac OS X’s UNIX layer), and I can always run Windows and Linux with Parallels Desktop if I need to.

For serious Windows coding and testing, though, I actually need to boot into real Windows from time to time (since the program I work on, cineSync, requires decent video card support, which Parallels doesn’t virtualise very well yet). Again, no problem: use Apple’s Boot Camp to boot into Windows XP. Ah, but our server requires a UNIX environment and won’t run under Windows! Again, no problem: just install coLinux, a not very well known but truly awesome port of the Linux kernel that runs as a process on Windows at blazing speeds (with full networking support!).

Problem #2 — working offline with Subversion — is also easily solved. Download and install svk, and bingo, you have a fully distributed Subversion repository. Hack offline, commit your changes offline, and push them back to the master repository when you’re back online. Done.

Where it starts to get stupid is when I want to:

  • check in changes locally to the SVK repository on my laptop when I’m on Mac OS X…
  • and use those changes from the Mac partition’s SVK repository while I’m booted in Windows.

Stumped, eh? Not quite! Simply:

  • purchase one copy of MacDrive 6, which lets you read Mac OS X’s HFS+ partitions from Windows XP,
  • install SVK for Windows, and
  • set the %SVKROOT% environment variable in Windows to point to my home directory on the Mac partition.

Boom! I get full access to my local SVK repository from Windows, can commit back to it, and push those changes back to our main Subversion server whenever I get my lazy cafe-loving arse back online. So now, I can code up and commit changes for both Windows and the Mac while accessing a local test server when I’m totally offline. Beautiful!

But, the thing is… I’m using svk — a distributed front-end to a non-distributed revision control system — on a MacBook Pro running Windows XP — a machine intended to run Mac OS X — while Windows is merrily accessing my Mac HFS+ partition, and oh yeah, I need to run our server in Linux, which is actually coLinux running in Windows… which is, again, running on Mac. If I said this a year ago, people would have given me a crazy look. (Then again, I suppose I’m saying it today and people still give me crazy looks.) Somehow, somewhere, I think this is somewhat toward the evil end of the scale.


Add Cmd-1 and Cmd-2 back to iTunes 7

iTunes 7.0 removed the Cmd-1 and Cmd-2 shortcuts to access the iTunes window and the equaliser, for whatever reason. You can add them back in via the Keyboard preference in System Preferences:

  • launch System Preferences,
  • go to the Keyboard & Mouse preference,
  • click on the Keyboard Shortcuts tab,
  • hit the + button, pick iTunes as the application, type in “Show Equalizer” as the menu title, and use Cmd-2 for the keyboard shortcut.
  • hit the + button, pick iTunes as the application, type in “Hide Equalizer” as the menu title, and use Cmd-2 for the keyboard shortcut.
  • hit the + button, pick iTunes as the application, type in “iTunes” as the menu title, and use Cmd-1 for the keyboard shortcut.