PyMark

08/06/2012

Using Python's native type system for object markup is an idea that has been bouncing around my head for a very long time. Previously I'd written an implementation and used it for object markup in an XNA game engine. Eventually I extended it into more of a full featured serialiser, so that I could use Python as a scripting language in the engine - with communication going via stdin/stdout or a socket.

It was kind of a cool solution but ultimately I ruined it. I made it too specialised to this particular use in the game engine, and it had a bunch of hacked in features which just overcomplicated everything and made it a pain. It ended up slow and troublesome, and now it just appears wildly backward.

When I decided to do a rewrite I simplified everything and must have cut the lines of code count by almost a factor of 10. The result is a solution with a much better vision. At the end of the day a new object markup language can be seen as even more useless than designing a new programming language. Still, like a new language designer, I think there are some novel ideas behind it.

The system works by essentially using Python to compile any native python object into a simple binary format. This format can then be read into various other languages.

That doesn't sound very novel, but using Python gives us quite a few cool things "for free". First of all we get a front end for free, one which has the power of a complete programming language. Transforming the data, and all sorts of stuff like that, can be done at compile time with relatively little trouble. Secondly - the source can be hand written, yet syntax errors are still caught at compile time.
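As a rough illustration of what "transformation at compile time" can mean, here is a source module that builds its data with ordinary Python instead of spelling every record out by hand. The names (`base_toys`, `_rows`, `pets`) are made up for this sketch, and how the compiler treats helper names such as `_rows` is left aside here.

```python
""" Derived data - built by plain Python when the module is run """

# Toys every pet gets by default (hypothetical data for this sketch).
base_toys = ["String", "Ball"]

# A compact table of (name, type, colour) rows.
_rows = [
    ("benny", "Dog", "Brown"),
    ("roger", "Horse", "White"),
    ("catherine", "Cat", "Ginger"),
]

# Expand the table into one dictionary per pet. This runs before
# anything is exported, so the output only ever sees the final data.
pets = {
    name: {"type": kind, "color": color, "toys": sorted(base_toys)}
    for name, kind, color in _rows
}
```

A mistake such as a missing bracket in the table stops the module from running at all, which is the "syntax errors are caught at compile time" point in practice.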

And using a binary format for serialisation is sensible for many situations too. It can be parsed very fast and simply (less than 250 lines of C). Like bytecode, it is stable and you can be almost certain that it is valid and will not throw errors in loading (incorrect access at runtime on the other hand will throw errors).
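To give a feel for why such a format is quick to parse, here is a minimal sketch of a type-tagged binary encoding written with Python's `struct` module. It is not PyMark's actual file format: the tag values, the field widths and the `pack`/`unpack` names are all assumptions made for illustration.

```python
import struct

# Hypothetical one-byte type tags; the real PyMark format may differ.
TAG_INT, TAG_STR, TAG_LIST, TAG_DICT = 1, 2, 3, 4


def pack(obj):
    """Encode ints, strings, lists/tuples and dicts into bytes."""
    if isinstance(obj, bool):
        raise TypeError("bool is not handled in this sketch")
    if isinstance(obj, int):
        # Tag byte followed by a little-endian signed 64-bit integer.
        return struct.pack("<Bq", TAG_INT, obj)
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return struct.pack("<BI", TAG_STR, len(data)) + data
    if isinstance(obj, (list, tuple)):
        body = b"".join(pack(item) for item in obj)
        return struct.pack("<BI", TAG_LIST, len(obj)) + body
    if isinstance(obj, dict):
        body = b"".join(pack(k) + pack(v) for k, v in obj.items())
        return struct.pack("<BI", TAG_DICT, len(obj)) + body
    raise TypeError("unsupported type: %s" % type(obj).__name__)


def unpack(buf, pos=0):
    """Decode one object starting at pos; return (object, next position)."""
    tag = buf[pos]
    pos += 1
    if tag == TAG_INT:
        (value,) = struct.unpack_from("<q", buf, pos)
        return value, pos + 8
    if tag not in (TAG_STR, TAG_LIST, TAG_DICT):
        raise ValueError("unknown tag: %d" % tag)
    # Every other tag is followed by a 32-bit length or item count.
    (count,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    if tag == TAG_STR:
        return buf[pos:pos + count].decode("utf-8"), pos + count
    if tag == TAG_LIST:
        items = []
        for _ in range(count):
            item, pos = unpack(buf, pos)
            items.append(item)
        return items, pos
    result = {}
    for _ in range(count):
        key, pos = unpack(buf, pos)
        value, pos = unpack(buf, pos)
        result[key] = value
    return result, pos
```

The reader never has to tokenise text or guess at types: it reads a tag, then a fixed-size value or a count, and recurses. Note that in this sketch tuples come back as lists.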

To start, one writes a Python module. All top level objects of native Python types will be exported. PyMark can act like JSON and look something like this:

""" My Favourite Pets - A basic example """

benny = {
  "type"  : "Dog",
  "name"  : "Benny Boos",
  "color" : "Brown",
  "toys"  : ["Bone", "Ball"]
}

roger = {
  "type"  : "Horse",
  "name"  : "Roger Horse",
  "color" : "White",
  "toys"  : ["Brush", "String"]
}

catherine = {
  "type"  : "Cat",
  "name"  : "Catherine",
  "color" : "Ginger",
  "toys"  : ["String", "Mouse"]
}

But having Python as a front end allows you to be much more expressive if you wish.

""" My Favourite Pets - Another example """

from pymark import enum, module, struct

# Constants

Types = enum("Dog", "Horse", "Cat")
Toys = enum("String", "Mouse", "Brush", "Bone", "Ball")

Colors = struct(
    Brown = (94, 83, 51),
    White = (255, 255, 255),
    Ginger = (237, 133, 14),
)

# Module

pets = module(

  benny = struct(
    type = Types.Dog,
    name = "Benny Boos",
    color = Colors.Brown,
    toys = [Toys.Bone, Toys.Ball]
  ),

  roger = struct(
    type = Types.Horse,
    name = "Roger Horse",
    color = Colors.White,
    toys = [Toys.Brush, Toys.String]
  ),

  catherine = struct(
    type = Types.Cat,
    name = "Catherine",
    color = Colors.Ginger,
    toys = [Toys.String, Toys.Mouse]
  )

)

Once this is done you feed it through the pymark binary, "pymark pets_two.py pets_two.pmk", which outputs a pymark file that can then be loaded into your application. I've written parsers for a lot of languages, because I've found writing a parser to be a very good way to learn a new language. It instantly highlights all of the important issues of a language, from the type system to the low level capabilities and library support. In C, data access looks something like this.

#include <stdio.h>
#include "../pymark/parsers/PyMark.h"

int main(int argc, char** argv) {

  PyMarkObject* pets_two = PyMark_Unpack("pets_two.pmk");

  printf("TypeID: %i\n", pets_two->get(pets_two, "pets.catherine.type")->as_int);
  printf("Name: %s\n", pets_two->get(pets_two, "pets.catherine.name")->as_string);

  PyMarkObject* color = pets_two->get(pets_two, "pets.catherine.color");
  printf("Color: (%i, %i, %i)\n", color->items[0]->as_int, 
                                  color->items[1]->as_int, 
                                  color->items[2]->as_int);

  PyMark_Delete(pets_two);

  return 0;
}

So overall pretty simple. I've tried to make the access as clean as possible and not overload it with search and tree features like an XML reader has. As for future developments - perhaps a way to serialise functions using Python bytecode. Then, if the application has Python/C API access, it could even call functions!
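The dotted path access used in the C example ("pets.catherine.type") is easy to picture as a walk down nested dictionaries. Below is a minimal Python sketch of that idea. The `get` function is hypothetical and written for illustration, not taken from any of the PyMark parsers, and it only follows dictionary keys.

```python
def get(obj, path):
    """Follow a dotted path such as "pets.catherine.name" through nested dicts."""
    for part in path.split("."):
        # Each path component is a key into the current dictionary.
        # A missing key raises KeyError, i.e. bad access fails at runtime.
        obj = obj[part]
    return obj


# Example data shaped like the pets module (values simplified).
data = {
    "pets": {
        "catherine": {
            "type": 2,
            "name": "Catherine",
            "color": (237, 133, 14),
        }
    }
}

name = get(data, "pets.catherine.name")
red, green, blue = get(data, "pets.catherine.color")
```

Keeping the lookup this small is the design choice described above: one access function, with no query language layered on top.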