Tom Lee

Software development geekery.

The Chimp Programming Language

Disclaimer: chimp is still very young and incomplete. It’s by no means ready for prime time, merely getting to a point where I figure other language nerds might be interested in tinkering with it. You’ve been warned!

Here’s one for my “Doing It Wrong” bucket. :)

Over the last few months I’ve been hacking away (with some help from Amjith; thanks, Amjith!) on chimp: a little dynamic, strongly typed programming language experiment written in C. Each task/thread in chimp gets its own heap, garbage collector and virtual machine for executing bytecode. As a result, chimp tasks are “shared nothing” & communicate via message passing.

The language itself kind of grew out of a naive copying collector I wrote after prodding around in the MRI/Ruby code base (I eventually dropped the copying collector in favour of a naive mark/sweep). A Python-y object model was slapped on top of the copying collector & from there it kind of grew legs, arms & fur.
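For anyone who hasn’t written one, a “naive mark/sweep” collector comes down to two passes. The C sketch below shows the general technique, not chimp’s actual GC. Every allocation is threaded onto a per-heap list. heap_collect marks everything reachable from a hypothetical roots array, which stands in for the VM’s stack and globals, then sweeps the list and frees whatever wasn’t marked. The Heap and Obj types, the two-reference object layout and all function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object layout: each object may point at up to two others. */
typedef struct Obj {
    struct Obj *next;     /* intrusive list of every object in this heap */
    struct Obj *refs[2];  /* outgoing references, may be NULL */
    bool marked;
} Obj;

/* One heap per task (assumption): nothing in here is shared. */
typedef struct {
    Obj *all;             /* head of the allocation list */
    Obj *roots[16];       /* stand-in for the VM stack and globals */
    size_t nroots;
    size_t live;          /* number of objects currently allocated */
} Heap;

static Obj *heap_alloc(Heap *h) {
    Obj *o = calloc(1, sizeof *o);
    if (o == NULL) abort();
    o->next = h->all;
    h->all = o;
    h->live++;
    return o;
}

static void heap_add_root(Heap *h, Obj *o) {
    assert(h->nroots < sizeof h->roots / sizeof h->roots[0]);
    h->roots[h->nroots++] = o;
}

/* Mark phase: recursive for brevity; the marked check also handles cycles. */
static void mark(Obj *o) {
    if (o == NULL || o->marked) return;
    o->marked = true;
    mark(o->refs[0]);
    mark(o->refs[1]);
}

/* Sweep phase: free unmarked objects, clear marks on survivors. */
static void sweep(Heap *h) {
    Obj **link = &h->all;
    while (*link != NULL) {
        Obj *o = *link;
        if (o->marked) {
            o->marked = false;  /* reset for the next collection */
            link = &o->next;
        } else {
            *link = o->next;    /* unlink garbage and free it */
            free(o);
            h->live--;
        }
    }
}

static void heap_collect(Heap *h) {
    for (size_t i = 0; i < h->nroots; i++) mark(h->roots[i]);
    sweep(h);
}
```

Because the heap is a plain struct rather than a global, giving each task its own Heap in this sketch lets one task collect without touching (or pausing) any other.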

The language that has evolved is kinda interesting. If you can imagine Rust or Erlang carrying Ruby & Python’s bastard surrogate child … well, yeah. It’s kinda like that.

My imagination sucks. What does it actually look like?

Though the concurrency / message passing stuff is the part of chimp that perhaps best distinguishes it from Ruby/Python, here’s an example without any concurrency voodoo:

use io;

main argv {
  io.print("What's your name?");
  var name = io.readline();
  io.print(str("Hello, ", name));
}

(Yep. My blog totally has chimp syntax highlighting support. :) )

Inside the chimp programming language

The syntax of chimp is probably best described as an odd blend of Python, Ruby, JavaScript and maybe even Go, but it has some pretty significant differences when it comes to the language runtime.

To reiterate what I mentioned earlier, chimp has a per-task heap, GC & VM. As a result, tasks run in isolation from one another. Currently this “isolation” is enforced only by the runtime: the compiler doesn’t yet do a good job of enforcing the rules, so programs can “compile” and then bork hard when the VM hits something strange.

Tasks (currently mapped 1:1 with OS threads) communicate via message passing. Because each task gets its own GC, collections can occur on one task without blocking others. Since each VM runs in isolation, there’s no need for a GIL, or for any other special locking at the VM level for that matter. In theory tasks should be largely “shared nothing”, though there’s still much to do to enforce that. All of this opens up the potential for Real Concurrency without forking off new heavyweight processes.
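Here’s a rough C sketch of the tasks-as-threads idea. It assumes one pthread per task and a hypothetical bounded mailbox that carries plain ints; chimp’s real messages are strings, integers, arrays and tasks. The thing to notice is where the locking lives: only the mailbox is ever touched by two threads, so in this model nothing inside a task’s own heap or VM needs a lock. Mailbox, mailbox_send, mailbox_recv and count_task are illustrative names, not chimp’s API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-task mailbox: the only state shared between threads. */
#define MAILBOX_CAP 8

typedef struct {
    int items[MAILBOX_CAP];   /* ring buffer of pending messages */
    size_t head, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} Mailbox;

static void mailbox_init(Mailbox *mb) {
    mb->head = mb->count = 0;
    pthread_mutex_init(&mb->lock, NULL);
    pthread_cond_init(&mb->not_empty, NULL);
    pthread_cond_init(&mb->not_full, NULL);
}

/* Called by the sending task; blocks while the mailbox is full. */
static void mailbox_send(Mailbox *mb, int msg) {
    pthread_mutex_lock(&mb->lock);
    while (mb->count == MAILBOX_CAP)
        pthread_cond_wait(&mb->not_full, &mb->lock);
    mb->items[(mb->head + mb->count) % MAILBOX_CAP] = msg;
    mb->count++;
    pthread_cond_signal(&mb->not_empty);
    pthread_mutex_unlock(&mb->lock);
}

/* Called by the owning task; blocks until a message arrives. */
static int mailbox_recv(Mailbox *mb) {
    pthread_mutex_lock(&mb->lock);
    while (mb->count == 0)
        pthread_cond_wait(&mb->not_empty, &mb->lock);
    int msg = mb->items[mb->head];
    mb->head = (mb->head + 1) % MAILBOX_CAP;
    mb->count--;
    pthread_cond_signal(&mb->not_full);
    pthread_mutex_unlock(&mb->lock);
    return msg;
}

/* Example task body (hypothetical): sends 1..n to another task's mailbox. */
typedef struct { Mailbox *out; int n; } CountArgs;

static void *count_task(void *arg) {
    CountArgs *a = arg;
    for (int i = 1; i <= a->n; i++) mailbox_send(a->out, i);
    return NULL;
}
```

A receiving task just calls mailbox_recv in a loop. It blocks until another task’s mailbox_send delivers something, and messages from a single sender arrive in the order they were sent.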

FP geeks will hate me for this one. :) Since heaps etc. aren’t shared between tasks, there’s only ever one task fiddling with any given bit of data. As a result, it’s “safe” to mutate data. I anticipate the language itself will encourage a functional style, but it would be best to expect something more like Ruby/Python than Haskell/Erlang.

I’m currently restricting (at runtime) the types that can be passed as messages between tasks — only strings, integers, arrays and tasks are permitted. Messages are serialized & deserialized between tasks in the current implementation. There’s no notion of anything like Rust’s exchange heap here.
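A minimal sketch of copy-by-serialization in C. It assumes a hypothetical tagged Value with integer, string and array kinds, plus an MSG_OTHER stand-in for any type that isn’t allowed across tasks; task handles are left out to keep it short. The sender flattens the value into a byte buffer and refuses anything outside the whitelist. The receiver rebuilds a brand new copy, so the two tasks never share memory. The wire format (one tag byte, 64-bit length prefixes, native byte order) is invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tagged value; MSG_OTHER stands in for any disallowed type. */
typedef enum { MSG_INT = 1, MSG_STR, MSG_ARRAY, MSG_OTHER } MsgKind;

typedef struct Value {
    MsgKind kind;
    int64_t i;            /* MSG_INT payload */
    char *s;              /* MSG_STR payload, NUL-terminated */
    struct Value *items;  /* MSG_ARRAY elements */
    size_t len;           /* MSG_ARRAY length */
} Value;

/* Growable byte buffer holding the serialized message. */
typedef struct { unsigned char *data; size_t len, cap; } Buf;

static void buf_put(Buf *b, const void *p, size_t n) {
    if (b->len + n > b->cap) {
        b->cap = (b->len + n) * 2;
        b->data = realloc(b->data, b->cap);
        if (b->data == NULL) abort();
    }
    if (n > 0) memcpy(b->data + b->len, p, n);
    b->len += n;
}

/* Sender side: returns false for any type that may not cross tasks. */
static bool serialize(Buf *b, const Value *v) {
    unsigned char tag = (unsigned char)v->kind;
    uint64_t n;
    switch (v->kind) {
    case MSG_INT:
        buf_put(b, &tag, 1);
        buf_put(b, &v->i, sizeof v->i);
        return true;
    case MSG_STR:
        n = strlen(v->s);
        buf_put(b, &tag, 1);
        buf_put(b, &n, sizeof n);
        buf_put(b, v->s, n);
        return true;
    case MSG_ARRAY:
        n = v->len;
        buf_put(b, &tag, 1);
        buf_put(b, &n, sizeof n);
        for (size_t k = 0; k < v->len; k++)
            if (!serialize(b, &v->items[k])) return false;
        return true;
    default:
        return false;  /* refused at runtime, like chimp's type check */
    }
}

/* Receiver side: builds a fresh copy; returns the position after it. */
static const unsigned char *deserialize(const unsigned char *p, Value *out) {
    uint64_t n;
    memset(out, 0, sizeof *out);
    out->kind = (MsgKind)*p++;
    switch (out->kind) {
    case MSG_INT:
        memcpy(&out->i, p, sizeof out->i);
        return p + sizeof out->i;
    case MSG_STR:
        memcpy(&n, p, sizeof n);
        p += sizeof n;
        out->s = malloc(n + 1);
        if (out->s == NULL) abort();
        memcpy(out->s, p, n);
        out->s[n] = '\0';
        return p + n;
    case MSG_ARRAY:
        memcpy(&n, p, sizeof n);
        p += sizeof n;
        out->len = n;
        out->items = calloc(n ? n : 1, sizeof *out->items);
        if (out->items == NULL) abort();
        for (size_t k = 0; k < n; k++) p = deserialize(p, &out->items[k]);
        return p;
    default:
        break;
    }
    abort();  /* unknown tag: serialize() never produces one */
}
```

The price of this approach is an extra copy per message. The upside is that a receiving task’s GC never has to worry about objects owned by some other task’s heap.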

Message Passing Example

Here’s a (relatively ugly) chimp port of an example from an Erlang tutorial I stumbled across the other day:

use io;

pong me {
  var pinger = me.recv();
  var msg = me.recv();
  while msg != "finished" {
    io.print("Pong received ping");
    pinger.send("pong");
    msg = me.recv();
  }
  io.print("Pong finished");
}

ping me {
  var ponger = me.recv();
  var i = 0;
  while i < 3 {
    ponger.send("ping");
    var m = me.recv();
    if m == "pong" {
      io.print("Ping received pong");
    }
    i = i + 1;
  }
  ponger.send("finished");
  io.print("Ping finished");
}

main argv {
  var ponger = spawn { |me| pong(me); };
  var pinger = spawn { |me| ping(me); };
  ponger.send(pinger);
  pinger.send(ponger);
  pinger.join();
  ponger.join();
}

The spawn keyword, seen in the first two lines of main, spins up a new task & returns a pipe/handle to it. This pipe can be used to communicate with the new task.

The explicit joins are kind of ugly & may be something to eventually do away with. The same goes for explicitly passing the task handles around via send/recv. But y’know, you get the idea.

Check out some of the other examples, if you’re still interested. sandbox.chimp is probably the best example of most of the syntax in the language to date.

What still needs to be done?

*sigh* So very much.

I’m keeping a TODO list of the immediate things I want to improve, but it’s not really comprehensive. Tasks deserve a rewrite. Tasks are too “heavy” thanks to aggressive GC memory grabs. Adding new data types isn’t as easy as I’d like. The garbage collector could be a lot better. There are no structured data types. Tests would be nice.

It’s not yet possible to import external chimp source files. Some basic arithmetic operators are unimplemented (modulo, for example). We need more modules. The bytecode instruction set kind of sucks. The VM is inefficient. There’s no optimization pass at all. Code enforcing certain scoping rules is non-existent … the list goes on!

All that said, if you’re a {,would-be} compiler/runtime geek eager to play with a small, dynamic language with some neat little features despite its infancy … well, I’ll take any code I can get. :)

Get involved!

It’s kind of exhausting hammering away on this all by my lonesome. :)

Get in touch or send me pull requests. Happy to answer any questions about where to begin, what needs doing, etc.

I’m also open to the (ideally friendly!) criticism & thoughts of folks who know better. Right now I’m kinda fumbling my way through in places & I’m sure it shows.

Last of all, you can take a page out of my buddy Glenn’s book and actively work to break it using contrived examples. ;)