Interview With Albert Gräf - Author of the Pure Programming Language [proxy]

Albert Gräf is the author of the Pure programming language. Pure is a functional programming language based on term rewriting. It has a syntax featuring curried function applications, lexical closures and equational definitions with pattern matching, and thus is somewhat similar to languages of the Haskell and ML variety. Pure is also a dynamic language, and is more like Lisp in this respect. The interpreter has an LLVM backend that does JIT compilation.

Why did you name the programming language Pure?

I don't really remember how that name came about, maybe I originally envisioned the language to be much purer than it came out to be. ;-) But I liked the name, others liked it, too, so it stuck. In retrospect, it's a sort of pun and it also has a nice recursive backronym (Pure universal rewriting engine). Moreover, in this day and age it's quite a feat to find a short catchy name that hasn't been used for a programming language yet.

Pure is based on term rewriting. Can you give a simple explanation of term rewriting as it applies to Pure?

Basically, Pure is just about evaluating expressions, like we all learned in high school algebra. There are some equations with some placeholders (variables) in them, so what you do is apply these equations in order to simplify the expression that you want to evaluate. For instance, you know that, for any x, y, (x+y)^2 = x^2+2*x*y+y^2. If you apply that to an expression like (a*x+b)^2, you obtain (a*x)^2+2*(a*x)*b+b^2 as the result (not really that much simpler, but you can now go on applying other algebraic identities).

That's basically all that Pure does. In fact, you can type this very example in the Pure interpreter and it will work:

> (x+y)^2 = x^2+2*x*y+y^2; > (a*x+b)^2; (a*x)^2+2*(a*x)*b+b^2

However, there's a special twist here. An equation can be applied both ways, but that leads to problems; all too often, after going through a bunch of such replacement steps you may end up with exactly the same expression that you started with. Therefore, in term rewriting we only apply the rules in one direction, i.e., from left to right. That's also how it works in Pure.

Thus, what the Pure interpreter does is to apply equations specified by the programmer as term rewriting rules until it reaches an expression which can't be simplified further because there are no more rewriting rules that apply to it (or to one of its subexpressions). Such an expression is called a normal form in term rewriting, and those normal forms are what constitutes a "value" in Pure. The rewriting steps take place in a specific order (expressions are normally evaluated left-to-right, innermost expressions first, as it's commonly done also in more traditional languages).

Term rewriting is a universal model of computation and thus it wasn't long until someone came up with the idea to use it as a programming language (as far as I can say, among the first to do this was Michael O'Donnell at the beginning of the 1980s). Of course, the first implementations were quite slow and not really practical, but we've learned a lot since then. Pure goes to some lengths to implement term rewriting in an efficient way, by collecting the rewriting rules for a given function and compiling them all to a native code function. Primitive operations such as integer arithmetic are compiled to native code as well.

Beyond the basic term rewriting calculus, Pure also adds a bunch of features that programmers have come to expect from functional languages, such as local functions (closures), macros, namespaces, the ability to call code written in other languages (C, specifically) etc., in order to become a practical general-purpose programming language. So most of the programming techniques that you use in other FPLs such as Lisp, ML and Haskell carry over to Pure pretty easily.

How does term rewriting make Pure different from other programming languages from a users perspective?

I think that the most important point is the conceptual simplicity of term rewriting. If you understand what a mathematical expression is and how you can use equations to manipulate those expressions, then you basically understand how Pure works. You may still have to learn the expression syntax and come to grips with curried function applications and closures, but other than that you can start using Pure right away.

On the other hand, the term rewriting model also offers the advanced programmer a considerable amount of additional flexibility and freedom. Pure is dynamically typed and thus functions and data structures can be polymorphic to an unlimited degree; you can even add rules for the built-in operations such as '+' to make them work with your own data structures. Also, keeping with the spirit of term rewriting, Pure eliminates the segregation of "defined functions" and "data constructors". Any function symbol can occur in a normal form term, so a (pure) constructor symbol is nothing but a function symbol which happens to not have any equations. This might first seem like an arcane feature, but it makes it possible to deal with symbolic terms and "constructors with equations" in a direct fashion. For instance, if you want you can write constructor equations for the "list cons" (denoted ':', like in Haskell) which keep your lists sorted and duplicate free:

> x:y:xs = y:x:xs if x>y; x:y:xs = x:xs if x==y; > [13,7,9,7,1]+[1,9,7,5]; [1,5,7,9,13]

As far as I know, there's no other functional language (outside the special realm of symbolic algebra systems) which allows you to do these things in a direct fashion -- i.e., without adding some level of interpretation, which essentially amounts to writing a little term rewriting interpreter in the language. In Pure that term rewriting evaluator is always at your command, it's at the core of the language.

More detailed answers and further questions concerning data constructors exist in a second post Albert Graef Interview Continuation - Constructor Symbols and Constructors with Equations.

How is Pure different from your prior programming language Q?

It's the next step in the evolution of the Q language. The Q interpreter was quite slow because it interpreted bytecode, and the language had some other shortcomings as well (in particular, the lack of local functions was a real limitation in everyday programming). We had been tossing around the idea to write a compiler for Q on the Q mailing list for quite some time, but the interpreter I had wasn't really designed for that, so I eventually decided to start a new project from scratch.

Pure programs thus run much faster than their equivalents in Q, as Pure uses the LLVM toolkit to compile everything to native code. That is, the Pure interpreter actually doesn't interpret any code at all any more, it's just a frontend to the JIT compiler. But it still provides a dynamic, interactive environment which has the look and feel and the convenience of an interpreter. In addition, the interpreter can also compile scripts to native standalone executables.

The syntax got a major face lift, too. It's much more Haskell-like now in some ways, although I kept the free format. (I think that Haskell is for the most part a very pretty language, but I still can't make myself like indentation-based syntax; probably that's because I was educated using free-format languages such as Algol 60 and Pascal in my youth.)

The language itself also has some important new features, in particular local functions and macros. The latter are also defined by rewriting rules which are applied on the right-hand sides of other definitions before compilation. The macros are hygienic, of course (no name capture). Lazy evaluation is implemented in a different way, using a kind of lazy futures as in Alice ML. Q had a kind of user-defined special forms instead, which was an interesting feature IMHO, but it was somewhat hard to use, and Pure's lazy futures and its macros certainly make up for it.

The data type facilities are also completely different. Q had a kind of algebraic data types with single inheritance, which was workable and efficient, but I never really liked it very much because it seemed to be an ad hoc solution born out of practical requirements, not very much in the spirit of term rewriting. Thus Pure started out without any kind of data types (like in Lisp, they were all in your head), but in the end that just turned out to be impractical. We had an extensive discussion on the Pure mailing list about this, and what I ended up adding to the language just recently is a kind of "types-as-predicates" which are implemented using rewriting rules just like anything else in Pure.

There's one feature in Q that Pure still lacks right now: concurrency. Q used a basic POSIX threads model for that. Pure will offer something higher-level instead, probably some kind of concurrent futures, as well as parallel matrix maps and comprehensions; that's the next big item on the TODO list after version 1.0 is out.

Besides the programming language Q, what other programming languages or technology influenced your design of Pure?

Many. Q was heavily influenced by the original edition of the Bird/Wadler book on functional programming (which used a notation similar to Miranda), and that heritage still shows in Pure, e.g., in the core of the standard library. Some of that has since been updated to be more Haskell-like, though. Lazy futures were adopted from Alice ML. The macro and metaprogramming capabilities are quite similar to those of Lisp. Some of the language elements (in particular, the 'when' and 'with' clauses) were inspired by Wouter van Oortmerssen's Aardappel language (also a programming language based on term rewriting). And the matrix and vector data structures work in a fashion similar to MATLAB/Octave.

When would you use Haskell and when would you use Pure?

To answer that, I would have to confess that I very much prefer dynamically typed FPLs, so I'm going to keep my mouth shut. :)

What made you decide to use LLVM as the backend for Pure? How has that decision worked out so far?

I was originally pointed to LLVM by some members on the Q mailing list and we subsequently discussed it as a potential backend since around 2005, if I remember correctly. Some two years later I also talked about it with friends from Grame who develop the Faust DSP programming language (very cool language). They told me that they planned to use LLVM, too. As it turned out, I beat them to it. The first Pure version was released in 2008, but Faust got its LLVM backend only last summer. I then quickly worked with Stephane Letz at Grame on the integration of Pure and Faust via LLVM bitcode, so you can now call Faust signal processors from Pure very easily (and even inline Faust code in Pure). This was dead easy, a testament to LLVM's capabilities.

For compiler writers, LLVM is a thing from heaven. It makes developing compiler backends so much easier that I regret that I didn't start to use it earlier. I also considered some alternatives in the beginning (GNU Lightning and the JVM come to my mind), but I still think that going with LLVM was the right choice.

One of the potential downsides of LLVM is that its C++ API isn't stable, so we do have to work around backward compatibility issues every once in a while. On the other hand, I'm certainly happy that LLVM is making so much progress. And Pure still works with every LLVM version >= 2.5, so the backward compatibility issues haven't been show stoppers so far.

How much of Pure is written in C++ and how much in Pure?

The compiler, interpreter frontend and the runtime (which implements the primitive operations) is completely written in C++ right now. As we're using LLVM, that was the most reasonable choice. The latest version of the interpreter also enables you to implement interactive commands as Pure functions, though, so this is beginning to change. There's no reason why Pure shouldn't eventually be self-hosting (apart from the dependency on LLVM), but it's not high on my priority list right now.

On the other hand, most of the standard library is written in Pure. The prelude also declares some stuff that is actually implemented in the runtime, either for efficiency (e.g., matrix operations) or because it needs access to the internals of the interpreter and the compiler (the metaprograming features). The list operations and the container types in the library are all written in Pure, however.

What programming environment/tools (editor, OS, debugger etc.) do you use to develop Pure?

Emacs, gcc and gdb, the holy trinity. ;-) Linux as operating system, I've been using that since 1993. Windows with mingw/msys only for building msi packages for Pure. For developing in Pure itself, I also use emacs (there's a Pure mode for it) and the debugger of the Pure interpreter.

Can you list some of your favorite programming books and resources?

I'm reading interesting articles about programming all the time, mostly online; Lambda the Ultimate is a great resource for that.

Van Roy and Haridi's Concepts, Techniques, and Models of Computer Programming is on my bedside locker; being the tome that it is, it will probably stay there for a while.

Like most programmers (I guess) I have fond memories of all-time classics such as C Programming Language and Structure and Interpretation of Computer Programs (the first edition of the latter was a real eye-opener when I started working on Q).

And recently I very much enjoyed Seibel's Coders at Work.

Online Resources

Lambda the Ultimate - http://lambda-the-ultimate.org
Concepts, Techniques, and Models of Computer Programming - http://www.info.ucl.ac.be/~pvr/book.html
The C Programming Language - http://cm.bell-labs.com/cm/cs/cbook/
Structure and Interpretation of Computer Programs - http://mitpress.mit.edu/sicp/
Coders at Work - http://codersatwork.com/