Open or Die

Every programming language I can think of does an optimization, usually called short-circuit evaluation, where boolean operators determine control flow. It seems like a good idea at first, but it's also inconsistent. Here's the phenomenon I'm talking about, starting with C:

C

#include <stdio.h>

int foo()
{
    printf("foo\n");
    return 1;
}

int bar()
{
    printf("bar\n");
    return 1;
}

int main(int argc, char** args)
{
    printf("%d\n", foo() || bar());
    return 0;
}

Output:

foo
1

The question is, why doesn't it print this?:

foo
bar
1

And the answer is: bar() didn't run, because C said to itself:

foo() || bar(): that's two things or-ed together. The first thing is true, which means that no matter what the second thing is, the expression is true, so we don't have to evaluate it.
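The same rule works in mirror image for "and": if the left side is false, the right side is skipped. Here's a small Python 3 sketch that records which functions actually ran, so both cases are visible. The names calls, yes and no are just illustrative.

```python
calls = []  # records the order in which the operand functions run

def yes():
    calls.append("yes")
    return True

def no():
    calls.append("no")
    return False

# "or": the left side is true, so the right side is never called.
calls.clear()
result_or = yes() or no()
calls_or = list(calls)

# "and": the left side is false, so the right side is never called.
calls.clear()
result_and = no() and yes()
calls_and = list(calls)

# When the left side doesn't settle the answer, both sides run.
calls.clear()
result_both = yes() and no()
calls_both = list(calls)

print(result_or, calls_or)      # True ['yes']
print(result_and, calls_and)    # False ['no']
print(result_both, calls_both)  # False ['yes', 'no']
```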

If you've never seen this before, you might be forgiven for thinking it's a C thing. But weirdly, it seems fairly universal.

Python

def foo():
    print("foo")
    return True

def bar():
    print("bar")
    return True

print(foo() or bar())

Output:

foo
True

Lua

function foo()
    print("foo")
    return true
end

function bar()
    print("bar")
    return true
end

print(foo() or bar())

Output:

foo
true

JavaScript

function foo()
{
    console.log("foo")
    return true
}

function bar()
{
    console.log("bar")
    return true
}

console.log(foo() || bar())

Output:

foo
true

Java

class MyFooBar
{
    public static boolean foo()
    {
        System.out.println("foo");
        return true;
    }

    public static boolean bar()
    {
        System.out.println("bar");
        return true;
    }

    public static void main(String args[])
    {
        System.out.println(foo() || bar());
    }
}

Output:

foo
true

Rust

fn foo() -> bool
{
    println!("foo");
    true
}

fn bar() -> bool
{
    println!("bar");
    true
}

fn main()
{
    println!("{}", foo() || bar());
}

Output:

foo
true

Ruby

def foo
    print "foo\n"
    true
end

def bar
    print "bar\n"
    true
end

print (foo() or bar())

Output:

foo
true

Perl

sub foo {
    print "foo";
    return 1
}

sub bar {
    print "bar";
    return 1
}

print (foo or bar)

Output:

foo1

It's kinda remarkable to me that all those languages made the same call, especially when they disagree about so many other things. I guess I can see why, I mean, it comes in handy. In Perl, you often see this:

open(my $fh, "<", "somefile.txt") or die "error!";

So, it kinda says "open or die", which is whimsical, and maybe reads naturally to English-speakers.

In JavaScript and Lua you'll sometimes see a similar trick for getting a default value when a variable is undefined. In JavaScript, for example, you might see something like this:

function resize(height, width)
{
    height = height || 600;
    width = width || 800;
    // ...
}
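One catch with this idiom: in JavaScript, || substitutes the default for any falsy value, not just undefined, so a caller who passes 0 gets 600. Python's "or" behaves the same way. Here's a Python sketch contrasting the idiom with an explicit None check; resize_or and resize_explicit are hypothetical names.

```python
def resize_or(height=None, width=None):
    # Mirrors `height = height || 600`: any falsy value (None, 0, "") is replaced.
    height = height or 600
    width = width or 800
    return height, width

def resize_explicit(height=None, width=None):
    # Only substitute the default when the argument was actually omitted.
    height = 600 if height is None else height
    width = 800 if width is None else width
    return height, width

print(resize_or())            # (600, 800)
print(resize_or(0, 0))        # (600, 800): the zeros are silently lost
print(resize_explicit(0, 0))  # (0, 0)
```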

In Lua, there's a convention of using "and" and "or" to hand-roll a ternary operator, so you'll see this:

local width = isBig() and 300 or 30

instead of:

local width
if isBig() then
    width = 300
else
    width = 30
end
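The hand-rolled ternary has a trap of its own: if the middle value is itself falsy, the expression falls through to the last value. In Lua that happens when the middle value is false or nil; Python allows the same idiom, and there it also happens with 0 or an empty string. A Python sketch, with hypothetical function names:

```python
def pick_and_or(cond, a, b):
    # The hand-rolled ternary: `cond and a or b`.
    return cond and a or b

def pick_conditional(cond, a, b):
    # Python's real conditional expression evaluates exactly one branch.
    return a if cond else b

print(pick_and_or(True, 300, 30))     # 300
print(pick_and_or(True, 0, 30))       # 30, even though cond is True
print(pick_conditional(True, 0, 30))  # 0
```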

It's just one line, so that's better, right?

Eh... I don't know about all this. It's neat how these short expressions read, but the underlying optimization actually represents a tiny inconsistency in the language. It's inconsistent because "and" and "or" are binary operators, but they have something no other binary operator has: they control flow. With other binary operators...

foo() + bar()
foo() - bar()
foo() * bar()
foo() / bar()
foo() | bar()
foo() & bar()
foo() == bar()
foo() != bar()

...you can depend on foo() and bar() to both execute. But not with boolean operators? Huh?

Consider this C code:

#include <stdio.h>

int foo()
{
    printf("foo\n");
    return 0;
}

int bar()
{
    printf("bar\n");
    return 0;
}

int main(int argc, char** args)
{
    printf("%d\n", foo() * bar());
    return 0;
}

Output:

foo
bar
0

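Both functions run, even though a similar optimization could work here. (Strictly speaking, C doesn't specify which operand of * is evaluated first, but both are always evaluated.) Remember, the rationale with this: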

foo() or bar()

If foo() is true, we needn't bother running bar() because we know the value of the whole expression already.

That's also the case here when foo() is 0:

foo() * bar()

and yet, the language does not cull the call to bar().

I don't like it. To me, it would be cleaner if I could think about boolean operators as two-argument functions. But they aren't like functions; they're actually more like if-statements, because they determine what code executes.

What if we made a language without this exceptional behavior? What if "and" and "or" were just functions? Come on, let's dream big!
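We can sketch what that would look like in Python. An ordinary function evaluates all of its arguments before its body runs, so a hypothetical or_ function behaves like * does above: both sides always execute. If you still want to skip work, the caller has to say so at the call site by passing a function (a "thunk") instead of a value, as in the hypothetical lazy_or.

```python
calls = []  # order in which foo() and bar() actually ran

def foo():
    calls.append("foo")
    return True

def bar():
    calls.append("bar")
    return True

def or_(a, b):
    # An ordinary function: both arguments were evaluated before the body runs.
    return a or b

def lazy_or(a, b_thunk):
    # The caller opts in to skipping the right side by passing a function.
    return a or b_thunk()

calls.clear()
eager = or_(foo(), bar())
eager_calls = list(calls)

calls.clear()
lazy = lazy_or(foo(), bar)
lazy_calls = list(calls)

print(eager, eager_calls)  # True ['foo', 'bar']
print(lazy, lazy_calls)    # True ['foo']
```

With this design the skipping is visible where the call is written, rather than being a special property of the operator.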

How to write Asteroids without Casting

Since very early in my programming life, I have been writing Asteroids clones. I've done it many times, and I pretty much always encounter the same obstacle: how do you use polymorphism when there are two objects with different dynamic types interacting with each other? Actually the question comes up in a lot of contexts, not just Asteroids, but Asteroids is a fun example. Anyway, I recently found a new approach (new to me), and I really like it, but I can't really explain it in just one sentence, so take a deep breath, and come with me on an Asteroids endeavor.

So, we're writing Asteroids using instances of C++ classes to represent the objects in the game, something like this:

class Object
{
};

class Ship : public Object
{
};

class Bullet : public Object
{
};

class Asteroid : public Object
{
};

There's a bunch of objects floating around in space, and when two of them hit each other, something should happen, so it seems like there should be a function like:

class Object
{
public:
    virtual void handleCollision(Object* other) = 0;
};

The subclasses then provide a version of handleCollision() for each kind of object, to do whatever should happen when those things collide, like so:

class Object
{
public:
    virtual void handleCollision(Object* other) = 0;
};

class Ship : public Object
{
public:
    virtual void handleCollision(Ship* other);
    virtual void handleCollision(Bullet* other);
    virtual void handleCollision(Asteroid* other);
};

class Bullet : public Object
{
public:
    virtual void handleCollision(Ship* other);
    virtual void handleCollision(Bullet* other);
    virtual void handleCollision(Asteroid* other);
};

class Asteroid : public Object
{
public:
    virtual void handleCollision(Ship* other);
    virtual void handleCollision(Bullet* other);
    virtual void handleCollision(Asteroid* other);
};

And then you'd have a loop in the game that checks collisions:

for (Object* A : objects)
for (Object* B : objects)
{
    if( A->collidesWith(B) )
    {
        A->handleCollision(B);
    }
}

And then polymorphism happens and the right handleCollision() gets called, right?

Well, no. Or rather, it's not that simple. When you call this:

A->handleCollision(B);

A and B both have static type Object*. C++ chooses among overloads at compile time using the static type of the argument, so if A's dynamic type is Asteroid and B's dynamic type is Bullet, C++ will look in A's virtual-table for a function like this:

void Asteroid::handleCollision(Object*)
{
}

But it will not look at B's dynamic type, and then search A's virtual-table for a function like this:

void Asteroid::handleCollision(Bullet*)
{
    // Asteroid hit bullet go boom!
}

So, how do you make handleCollision() take into account the dynamic types of both this and the function argument? It doesn't seem like you can.

I've written stuff like this many times, and around now is when I usually give up and just tag my data:

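enum Type
{
    kShip,
    kAsteroid,
    kBullet
};

class Object
{
public:
    Object(Type type) : type(type) {}
    virtual ~Object() {}
    virtual void handleCollision(Object* other) = 0;

    Type type;
};

class Ship : public Object
{
public:
    Ship() : Object(kShip) {}
    virtual void handleCollision(Object* other);
};

class Bullet : public Object
{
public:
    Bullet() : Object(kBullet) {}
    virtual void handleCollision(Object* other);
};

class Asteroid : public Object
{
public:
    Asteroid() : Object(kAsteroid) {}
    virtual void handleCollision(Object* other);
};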

And then make handleCollision() do its thing by switching on type:

void Asteroid::handleCollision(Object* other)
{
    switch(other->type)
    {
        case kShip:
            // Ship explode lose life / game over
        break;
        
        case kAsteroid:
            // Asteroids pass like ships in the night
        break;
        
        case kBullet:
            // Asteroid hit bullet go boom!
        break;
    }
}

That gets the job done, but there's something unsettling about it. The virtual-table already encodes the dynamic type of the object, and here I am doing it again. Another way would be to use dynamic_cast. That would eliminate the redundant data, but it would feel even worse.

Let's look at this line again:

A->handleCollision(B);

That will look at the dynamic type of A, but not B, so it will call this:

void Asteroid::handleCollision(Object* other)
{
}

What if that function, instead of trying to figure out what type other is, called another function on other and let C++ figure it out:

void Asteroid::handleCollision(Object* other)
{
    other->back_handleCollision(this);
}

Inside an Asteroid member function, this has static type Asteroid*, so the compiler picks the back_handleCollision(Asteroid*) overload at compile time, and the virtual call then dispatches on other's dynamic type. So C++ looks in other's virtual-table and finds this function:

void Bullet::back_handleCollision(Asteroid* other)
{
    other->handleCollision(this);
}

Which calls back to handleCollision(), this time with an argument of static type Bullet*, so it goes here:

void Asteroid::handleCollision(Bullet* other)
{
    // Asteroid hit bullet go boom!
}

And JUST LIKE THAT, we're where we want to be, and we didn't have to switch or cast or do any tags!
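For comparison, here's the same two-hop idea sketched in Python with just two of the classes. Python has no overloading on argument types, so the hypothetical method names spell out the type that C++ picks from the static type of this: back_from_asteroid plays the role of back_handleCollision(Asteroid*), and collide_with_bullet the role of handleCollision(Bullet*). Each call still dispatches only on the object it's called on; two calls in a row cover both types. (In Python specifically, a dict keyed on a pair of types would be a more common approach; this sketch just mirrors the C++ structure.)

```python
class Asteroid:
    def handle_collision(self, other):
        # First hop picked Asteroid; now let `other` dispatch on its own type.
        return other.back_from_asteroid(self)

    def back_from_asteroid(self, asteroid):
        return asteroid.collide_with_asteroid(self)

    def back_from_bullet(self, bullet):
        return bullet.collide_with_asteroid(self)

    def collide_with_asteroid(self, other):
        return "Asteroid pass asteroid in night"

    def collide_with_bullet(self, other):
        return "Asteroid hit bullet BOOM"


class Bullet:
    def handle_collision(self, other):
        return other.back_from_bullet(self)

    def back_from_asteroid(self, asteroid):
        return asteroid.collide_with_bullet(self)

    def back_from_bullet(self, bullet):
        return bullet.collide_with_bullet(self)

    def collide_with_asteroid(self, other):
        return "Bullet hit asteroid"

    def collide_with_bullet(self, other):
        return "Bullet hits bullet"


objects = [Asteroid(), Bullet()]
results = [a.handle_collision(b) for a in objects for b in objects]
for line in results:
    print(line)
```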

Of course, doing that for all the classes would be a lot of tedious C++ code to write.

That's why I wrote a python script to generate it!

import sys
import json

with open(sys.argv[1]) as f:
    s = f.read()
    config = json.loads(s)

classes = config["classes"]
functions = config["functions"]
baseClass = config["baseType"]
returnType = config["returnType"]

back_declaration_template = "virtual ReturnType back_foo(Secondary*);"

back_definition_template = ""
if returnType == "void":
    back_definition_template = "ReturnType Primary::back_foo(Secondary* _) { _->foo(this); }"
else:
    back_definition_template = "ReturnType Primary::back_foo(Secondary* _) { return _->foo(this); }"

declaration_template = "virtual ReturnType foo(Secondary*);"

definition_template = ""
if returnType == "void":
    definition_template = "ReturnType Primary::foo(Secondary* _) { _->back_foo(this); }"
else:
    definition_template = "ReturnType Primary::foo(Secondary* _) { return _->back_foo(this); }"

body_template = "ReturnType Primary::foo(Secondary* _)\n{\n    defaultCode\n}\n"
forward_declaration_template = "class Primary;"

class_declaration_template = """
class Primary : public Base
{
public:
definitions
};""".replace("Base", baseClass)

base_class_declaration_template = """
class Base
{
public:
definitions
};""".replace("Base", baseClass)

class_declarations = []


def make_macro():
    prototypes = []
    for foo in functions:
        prototypes.append(declaration_template \
            .replace("ReturnType", returnType) \
            .replace("Secondary", baseClass) \
            .replace("foo", foo))

        for secondary in classes:
            prototypes.append( back_declaration_template \
                .replace("ReturnType", returnType) \
                .replace("foo", foo) \
                .replace("Secondary", secondary) )

        for secondary in classes:
            prototypes.append( declaration_template \
                .replace("ReturnType", returnType) \
                .replace("foo", foo) \
                .replace("Secondary", secondary) )

    return "    " + "\\\n    ".join(prototypes)


macroName = config["macro"]

def make_base_class_declaration():
    prototypes = []
    for foo in functions:
        prototypes += [declaration_template \
            .replace("ReturnType", returnType) \
            .replace("Secondary", baseClass) \
            .replace("foo", foo) \
            .replace(";", " = 0;")]

        for secondary in classes:
            prototypes.append( back_declaration_template \
                .replace("ReturnType", returnType) \
                .replace("foo", foo) \
                .replace("Secondary", secondary) \
                .replace(";", " = 0;") )

    return base_class_declaration_template \
        .replace("definitions", "    " + "\n    ".join(prototypes) )


def make_primary_class_declarations():
    declarations = []
    for primary in classes:
        declarations.append(class_declaration_template \
            .replace("definitions", macroName) \
            .replace("Primary", primary)
            )
    return "\n".join(declarations)


def make_class_declarations():
    return make_base_class_declaration() + "\n"\
        + make_primary_class_declarations()


def make_level_one_source():
    bodies = []
    for foo in functions:
        for primary in classes:
            bodies.append(definition_template \
                .replace("ReturnType", returnType) \
                .replace("foo", foo) \
                .replace("Primary", primary) \
                .replace("Secondary", baseClass))

            for secondary in classes:
                bodies.append(back_definition_template \
                    .replace("ReturnType", returnType) \
                    .replace("foo", foo) \
                    .replace("Primary", primary) \
                    .replace("Secondary", secondary))
    return "\n".join(bodies)

defaultCode = config["defaultCode"]

def make_source():
    bodies = []
    for foo in functions:
        for primary in classes:
            for secondary in classes:
                bodies.append(body_template \
                    .replace("ReturnType", returnType) \
                    .replace("foo", foo) \
                    .replace("Primary", primary) \
                    .replace("Secondary", secondary) \
                    .replace("defaultCode", defaultCode))
    return "\n".join(bodies)


def make_forward_declarations():
    declarations = [forward_declaration_template \
            .replace("Primary", baseClass)]
    for c in classes:
        declarations.append(forward_declaration_template.replace("Primary", c))
    return "\n".join(declarations)


headerName = config["header"]
sourceName = config["source"]
namespaceName = config["namespace"]

with open(headerName, "w") as f:
    headerOnceConstant = headerName.replace(".","_")
    f.write("#ifndef _" + headerOnceConstant + "_\n")
    f.write("#define _" + headerOnceConstant + "_\n\n")
    f.write("namespace " + namespaceName + "\n{\n")
    f.write(make_forward_declarations() + "\n\n")
    f.write("#define " + macroName + " \\\n" + make_macro() + "\n\n")
    f.write(make_class_declarations() + "\n\n")
    f.write("\n}\n")
    f.write("\n#endif\n")


with open(sourceName, "w") as f:

    f.write("#include \"" + headerName + "\"\n")
    f.write("namespace " + namespaceName + "\n")
    f.write("{\n")

    f.write(make_source() + "\n")
    f.write(make_level_one_source() + "\n\n")

    f.write("}")

Run with this config file:

{
    "classes" : [
        "Asteroid",
        "Ship",
        "Bullet"
    ]

    , "baseType" : "Object"
    , "returnType" : "void"

    , "functions" : [
        "handleCollision"
    ]

    , "defaultCode" : ""
    , "namespace" : "oids"
    , "macro" : "ASTEROID_DEFINITIONS"
    , "header" : "asteroids.h"
    , "source" : "asteroids.cpp"
}

To get this header:

#ifndef _asteroids_h_
#define _asteroids_h_

namespace oids
{
class Object;
class Asteroid;
class Ship;
class Bullet;

#define ASTEROID_DEFINITIONS \
    virtual void handleCollision(Object*);\
    virtual void back_handleCollision(Asteroid*);\
    virtual void back_handleCollision(Ship*);\
    virtual void back_handleCollision(Bullet*);\
    virtual void handleCollision(Asteroid*);\
    virtual void handleCollision(Ship*);\
    virtual void handleCollision(Bullet*);


class Object
{
public:
    virtual void handleCollision(Object*) = 0;
    virtual void back_handleCollision(Asteroid*) = 0;
    virtual void back_handleCollision(Ship*) = 0;
    virtual void back_handleCollision(Bullet*) = 0;
};

class Asteroid : public Object
{
public:
ASTEROID_DEFINITIONS
};

class Ship : public Object
{
public:
ASTEROID_DEFINITIONS
};

class Bullet : public Object
{
public:
ASTEROID_DEFINITIONS
};

}

#endif

And this source file:

#include "asteroids.h"

#include <stdio.h>

namespace oids
{
void Asteroid::handleCollision(Asteroid* _)
{
    printf( "Asteroid pass asteroid in night\n" );
}

void Asteroid::handleCollision(Ship* _)
{
    printf( "Asteroid hit ship BOOM split make smaller asteroids\n" );
}

void Asteroid::handleCollision(Bullet* _)
{
    printf( "Asteroid hit bullet BOOM split make smaller asteorids\n" );
}

void Ship::handleCollision(Asteroid* _)
{
    printf( "Ship hit asteroid BOOM ship go bye-bye\n" );
}

void Ship::handleCollision(Ship* _)
{
    printf( "Ship pass ship in night (they're allies)\n" );
}

void Ship::handleCollision(Bullet* _)
{
    printf( "Ship hits bullet.  FRIENDLY FIRE!\n" );
}

void Bullet::handleCollision(Asteroid* _)
{
    printf( "Bullet hit asteroid !\n" );
}

void Bullet::handleCollision(Ship* _)
{
    printf( "Bullet hits ship.  FRIENDLY FIRE!\n" );
}

void Bullet::handleCollision(Bullet* _)
{
    printf( "Bullet hits bullet.  Amazing marksmanship!\n" );
}

void Asteroid::handleCollision(Object* _) { _->back_handleCollision(this); }
void Asteroid::back_handleCollision(Asteroid* _) { _->handleCollision(this); }
void Asteroid::back_handleCollision(Ship* _) { _->handleCollision(this); }
void Asteroid::back_handleCollision(Bullet* _) { _->handleCollision(this); }
void Ship::handleCollision(Object* _) { _->back_handleCollision(this); }
void Ship::back_handleCollision(Asteroid* _) { _->handleCollision(this); }
void Ship::back_handleCollision(Ship* _) { _->handleCollision(this); }
void Ship::back_handleCollision(Bullet* _) { _->handleCollision(this); }
void Bullet::handleCollision(Object* _) { _->back_handleCollision(this); }
void Bullet::back_handleCollision(Asteroid* _) { _->handleCollision(this); }
void Bullet::back_handleCollision(Ship* _) { _->handleCollision(this); }
void Bullet::back_handleCollision(Bullet* _) { _->handleCollision(this); }

}

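I had to insert the printfs (and the #include <stdio.h>) by hand; the script fills every body with defaultCode, which is empty in this config.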

Then run with this main:


#include <vector>
#include "asteroids.h"

using namespace oids;

int main()
{
    Asteroid* asteroid = new Asteroid;
    Ship* ship = new Ship;
    Bullet* bullet = new Bullet;

    std::vector<Object*> objects;

    objects.push_back(asteroid);
    objects.push_back(ship);
    objects.push_back(bullet);

    for (std::vector<Object*>::iterator A = objects.begin(); A < objects.end(); A++)
    for (std::vector<Object*>::iterator B = objects.begin(); B < objects.end(); B++)
    {
        (*A)->handleCollision(*B);
    }

    delete asteroid;
    delete ship;
    delete bullet;

    return 0;
}

And lo.

Asteroid pass asteroid in night
Asteroid hit ship BOOM split make smaller asteroids
Asteroid hit bullet BOOM split make smaller asteorids
Ship hit asteroid BOOM ship go bye-bye
Ship pass ship in night (they're allies)
Ship hits bullet.  FRIENDLY FIRE!
Bullet hit asteroid !
Bullet hits ship.  FRIENDLY FIRE!
Bullet hits bullet.  Amazing marksmanship!

There you have it, asteroids without casting (or tagging (which feels like casting)).

Python vs C++: Brevity Category

The task: write a command-line utility that takes a single filename, opens that file and outputs the base64-encoded SHA-512 hash of the file.

Here is the python code:

import hashlib
import base64
import sys

with open(sys.argv[1], "rb") as f:
    print(base64.b64encode(hashlib.sha512(f.read()).digest()).decode("ascii"))
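That one-liner reads the whole file into memory at once. The C++ version feeds the file to SHA-512 in fixed-size chunks instead; a chunked Python variant is still short. This is a sketch, and sechash and chunk_size are illustrative names, not anything from a library.

```python
import base64
import hashlib
import sys

def sechash(path, chunk_size=256):
    # Feed the file to SHA-512 piece by piece instead of reading it all at once.
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    # b64encode returns bytes; decode so print() shows plain text.
    return base64.b64encode(h.digest()).decode("ascii")

if __name__ == "__main__" and len(sys.argv) == 2:
    print(sechash(sys.argv[1]))
```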

In C++, the first thing I did was look up how to compute a SHA-512 hash. The easiest way I could find was to link against OpenSSL and use that library's built-in functions. So, I needed an OpenSSL installation, and for that I turned to Homebrew. I made this makefile:

CPP = c++

OPENSSL_DIR = /usr/local/Cellar/openssl/1.0.2/

OPENSSL_INCLUDES = -I$(OPENSSL_DIR)/include/
OPENSSL_LIBDIR = -L$(OPENSSL_DIR)/lib/
OPENSSL_LIBS = -lssl -lcrypto

sechash: sechash.cpp
	$(CPP) \
		$(OPENSSL_INCLUDES) \
		sechash.cpp \
		$(OPENSSL_LIBDIR) \
		$(OPENSSL_LIBS) \
		-o sechash

which of course depends on where Homebrew puts it, and also on the version, which is a pain, but it worked, so I stopped searching for a better way.

Then I needed a way of base64-encoding raw data. For that, I found some snippets on StackOverflow, and rewrote them to be more in accordance with my coding style. Then I reminded myself how to use fread()... Anyway, here's what I wrote:

#include <openssl/sha.h>

#include <string>

#include <string.h>
#include <stdlib.h>
#include <stdint.h>


static const std::string base64_chars =
     "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
     "abcdefghijklmnopqrstuvwxyz"
     "0123456789+/";


static inline bool is_base64(uint8_t c)
{
    return ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || (c == '+') || (c == '/'));
}


static void base64_encode(const uint8_t* bytes_to_encode, size_t in_len, std::string& out)
{
    uint8_t char_array_3[3];
    uint8_t char_array_4[4];

    size_t i = 0;

    while (in_len--)
    {
        char_array_3[i++] = *(bytes_to_encode++);

        if( i == 3 )
        {
            char_array_4[0] = (char_array_3[0] & 0xfc) >> 2;
            char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4);
            char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6);
            char_array_4[3] = char_array_3[2] & 0x3f;

            for( i = 0; i < 4; i++ )
            {
                out += base64_chars[char_array_4[i]];
            }
            i = 0;
        }
    }

    if( i )
    {
        int j = 0;

        for( j = i; j < 3; j++ )
        {
            char_array_3[j] = '\0';
        }

        char_array_4[0] = (char_array_3[0] & 0xfc) >> 2;
        char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4);
        char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6);
        char_array_4[3] = char_array_3[2] & 0x3f;

        for( j = 0; (j < i + 1); j++ )
        {
            out += base64_chars[char_array_4[j]];
        }

        while( i++ < 3 )
        {
            out += '=';
        }
    }
}

enum SecHashError
{
    NO_ERROR = 0,
    FILE_NOT_FOUND,
    FILE_NOT_FULLY_READ,
    SHA_ERROR
};

class SecHash
{
public:
    SecHash(SecHashError err)
        : err(err)
    {
    }

    SecHash(SHA512_CTX* shaContextPtr)
        : err(NO_ERROR)
    {
        if( ! SHA512_Final(bytes, shaContextPtr) )
        {
            err = SHA_ERROR;
        }
    }

    const uint8_t* getBytes() const
    {
        return bytes;
    }

    SecHashError getError() const
    {
        return err;
    }

    const char* getBase64() const
    {
        if( base64.empty() )
        {
            base64_encode(bytes, SHA512_DIGEST_LENGTH, base64);
        }

        return base64.c_str();
    }

private:
    SecHashError err;
    uint8_t bytes[SHA512_DIGEST_LENGTH];
    mutable std::string base64;
};


#include <stdio.h>

const size_t CHUNK_SIZE = 256;

SecHash hashFile(const char* filename)
{
    SHA512_CTX shaContext;
    SHA512_Init(&shaContext);

    uint8_t chunk[CHUNK_SIZE];

    FILE* fp = fopen(filename, "rb");
    if( ! fp )
    {
        return SecHash(FILE_NOT_FOUND);
    }

    size_t numBytesRead;
    while( (numBytesRead = fread(chunk, sizeof(uint8_t), CHUNK_SIZE, fp)) > 0 )
    {
        SHA512_Update(&shaContext, chunk, numBytesRead);
    }

    // Check for a read error before closing, then always close the file.
    bool readFailed = ferror(fp) != 0;
    fclose(fp);

    if( readFailed )
    {
        return SecHash(FILE_NOT_FULLY_READ);
    }

    return SecHash(&shaContext);
}


int main(int argc, char** args)
{
    if( argc != 2 )
    {
        fprintf(stderr, "wrong number of args, takes filepath\n");
    }
    else
    {
        SecHash hash(hashFile(args[1]));
        if( hash.getError() == NO_ERROR )
        {
            printf("%s\n", hash.getBase64());
        }
        else
        {
            switch(hash.getError())
            {
                case NO_ERROR:
                    fprintf(stderr, "What?  No error?\n");
                break;

                case FILE_NOT_FOUND:
                    fprintf(stderr, "File not found.\n");
                break;

                case FILE_NOT_FULLY_READ:
                    fprintf(stderr, "File not fully read.\n");
                break;

                case SHA_ERROR:
                    fprintf(stderr, "SHA512 computation failed.\n");
                break;

                default:
                    fprintf(stderr, "Unknown error.\n");
                break;
            }
        }

    }

    return 0;
}

And in the words of Tom Lehrer, I can see from the look on your faces that I've made my point, and that pleases me.
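For the curious, the bit-twiddling in base64_encode() is the standard Base64 grouping: every 3 input bytes (24 bits) become four 6-bit indices into the 64-character alphabet, and a final partial group is zero-padded and marked with '='. Here's a Python sketch of the same steps, checked against the standard library; b64_by_hand is a hypothetical name.

```python
import base64

ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")

def b64_by_hand(data):
    out = []
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        # Zero-pad the last group to 3 bytes, as the C++ code does with '\0'.
        padded = group + b"\x00" * (3 - len(group))
        n = (padded[0] << 16) | (padded[1] << 8) | padded[2]
        # Split the 24 bits into four 6-bit indices.
        indices = [(n >> 18) & 0x3F, (n >> 12) & 0x3F, (n >> 6) & 0x3F, n & 0x3F]
        # A group of k bytes yields k + 1 real characters; the rest are '='.
        chars = [ALPHABET[j] for j in indices[:len(group) + 1]]
        out.append("".join(chars) + "=" * (3 - len(group)))
    return "".join(out)

print(b64_by_hand(b"Man"))  # TWFu
print(b64_by_hand(b"Ma"))   # TWE=
```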

Writing a Sublime Plugin

Inspired by a Medium article, "Coding in Color", I've been thinking a lot about semantic code highlighting lately.

I've been writing a lot of Lua at work that gets interpreted by C++ code, and owing to the structure of the workflow, there's a long time between writing code and testing it, which means little mistakes like typos cost me a lot of time. Rather than trying to get better at not making mistakes, I try to find better tools. I started wondering if semantic highlighting would make certain mistakes more visually obvious.

I looked for a Sublime plugin that would do this, and I found a couple, but when I installed them they didn't work on most of the code for some reason. I could tell the semantic highlighter was doing something, but most of the code was still a neutral color.

My theory as to why is that these plugins are trying to understand the language too much. To really (really) properly color the code in a way that's aware of the language, the text editor has to do a lot of the work of the compiler and that seems impractical.

But I don't really need that. I don't mind, for instance, if the highlighter colors a variable called foo the same color in two different functions. I don't mind if the highlighter doesn't have a very advanced concept of the language at all, in fact.

So, the other day, I spent the morning writing this:

import sublime
import sublime_plugin
import hashlib

def hash_string(s):
    # Map a word (as UTF-8 bytes) to a pseudo-random hue bucket, 0..359.
    return int(hashlib.md5(s).hexdigest(), 16) % 360

def get_scopename(h):
    # One scope per hue; the color scheme assigns each scope its color.
    return "explicit-hue-" + str(h)

class ColorerListener(sublime_plugin.EventListener):
    def redo(self, view):
        # Every identifier-shaped word in the buffer.
        word_regions = view.find_all("[_A-Za-z][_a-zA-Z0-9]*")

        # Group the regions by the hue of the word they contain.
        hash_to_region_list = {}
        for region in word_regions:
            h = hash_string(view.substr(region).encode('utf-8'))
            hash_to_region_list.setdefault(h, []).append(region)

        # Clear the previous coloring before adding the new one.
        for h in range(0, 360):
            view.erase_regions(get_scopename(h))

        for h, regions in hash_to_region_list.items():
            scopename = get_scopename(h)
            view.add_regions(scopename, regions, scopename, "", sublime.DRAW_NO_OUTLINE)

    def on_modified(self, view):
        self.redo(view)

    def on_activated(self, view):
        self.redo(view)

That is the source code of a Sublime plugin. Sublime colors text using a collection of objects of type Region. A Region is an interval of characters with a starting and ending index. The function view.add_regions() registers a list of regions, under a key, in the current text view, and the regions can be assigned a "scope" when they're added. A scope is a string that determines the regions' color according to the XML color-scheme settings. The intended use for scopes seems to be something like: the highlighter marks some text with a scope saying it's a keyword (or some other language construct), and then every keyword gets the same color, which the user controls in their preferences.
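Stripped of the Sublime API, the core of the plugin is: find identifier-shaped words, hash each one to a number from 0 to 359, and group the spans by that number so each group can be registered under its own scope. Here's a plain-Python sketch of that step. group_by_hue and hue_for are hypothetical helpers, and (start, end) tuples stand in for Regions.

```python
import hashlib
import re

WORD = re.compile(r"[_A-Za-z][_a-zA-Z0-9]*")

def hue_for(word):
    # Same idea as hash_string(): md5 of the word, reduced to a hue bucket 0..359.
    return int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % 360

def group_by_hue(text):
    groups = {}
    for m in WORD.finditer(text):
        groups.setdefault(hue_for(m.group()), []).append(m.span())
    return groups

source = "local currentIndex = 1\ncurrentIndex = currentIndex + 1"
groups = group_by_hue(source)
# Every occurrence of currentIndex lands in the same bucket, i.e. the same color.
print(groups[hue_for("currentIndex")])
```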

Of course, what I want is to color each word a different color with a pseudo-random hue, but I'm stuck going through the same pipeline, so what I've done is populate a settings XML file with 360 entries like this...

<dict>
    <key>name</key>
    <string>explicit-hue-94</string>
    <key>scope</key>
    <string>explicit-hue-94</string>
    <key>settings</key>
    <dict>
        <key>foreground</key>
        <string>#5b6ed6</string>
        <key>background</key>
        <string>#000000</string>
    </dict>
</dict>

(Which I generated using this script)

import colorsys
import math

template = """
        <dict>
            <key>name</key>
            <string>steve</string>
            <key>scope</key>
            <string>steve</string>
            <key>settings</key>
            <dict>
                <key>foreground</key>
                <string>#0000ff</string>
                <key>background</key>
                <string>#000000</string>
            </dict>
        </dict>
"""

def rgb_to_hex(rgb):
    return '#%02x%02x%02x' % tuple(map( lambda x: int(x * 255), rgb ))

for i in range(0,360):
    rgb = colorsys.hls_to_rgb(i * math.pi / 180.0, 0.6, 0.6)

    this_text = template.replace("steve", "explicit-hue-" + str(i))
    this_text = this_text.replace("#0000ff", rgb_to_hex(rgb))

    print(this_text)

...giving me 360 scopes of different hue.
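One detail worth knowing about colorsys: its hue argument is a fraction of a full turn, from 0 to 1, not an angle, and values past 1 wrap around. So i * math.pi / 180.0 goes around the color wheel a little over six times, and the 360 scopes get scattered hues rather than one-degree steps; some pairs end up very close (i = 0 and i = 172 differ by less than one degree). If you want explicit-hue-N to mean N degrees, divide by 360 instead. A sketch, with hue_to_hex as a hypothetical helper:

```python
import colorsys

def hue_to_hex(degrees, lightness=0.6, saturation=0.6):
    # colorsys wants hue as a fraction of a turn in [0, 1).
    r, g, b = colorsys.hls_to_rgb((degrees % 360) / 360.0, lightness, saturation)
    return "#%02x%02x%02x" % (int(r * 255), int(g * 255), int(b * 255))

print(hue_to_hex(0))    # reddish
print(hue_to_hex(120))  # greenish
print(hue_to_hex(240))  # bluish
```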

Here's the result:

I mean, it really doesn't know what language that is. It has the basic functionality. currentIndex is the same color throughout, and that's the point. It doesn't know what is commented, maybe I'll fix that some day. But such as it is, I've been using this at work and it's already saved me some time.

A WebGL Tutorial

I have a job interview later today. The specific position I'm applying for is WebGL developer. At some point I knew a lot of WebGL. In fact, at one point I think I knew all of it, but that was a long time ago, and right now I need some review. So, as an exercise, I'm writing this tutorial.

To get started with WebGL, you need a canvas in a webpage. So, we'll start with this HTML file.

<html>
<head>
<script src="webgl-debug.js"></script>
<script type="text/javascript">
    function get_webgl_context() {
        // Insert javascript here.
    }

    function draw() {
        // Insert javascript here.
    }
</script>
</head>

<body onload="draw();">
<div>
<canvas id="mycanvas" height=300 width=400></canvas>
</div>

<textarea id="vertexshader">
        // Insert glsl here.
</textarea>

<textarea id="fragmentshader">
        // Insert glsl here.
</textarea>

</body>

</html>

If you've ever used a canvas to draw in 2D, you'll know that the 2D drawing functions don't live in the canvas itself; they live in another object, called the context, which you obtain from the canvas. WebGL is similar: you extract an object called a WebGLRenderingContext from the canvas using some code like this:

function get_webgl_context() {
	var canvas = document.getElementById('mycanvas');
	var gl = null;
	if (canvas.getContext) {
		try {
			gl = canvas.getContext("webgl") ||
				canvas.getContext("experimental-webgl");
		}
		catch(e) {
			alert( "getting gl context threw exception" );
		}
	} else {
		alert( "can't get context" );
	}

	// getContext() returns null (rather than throwing) when WebGL is unavailable.
	if (!gl) {
		alert( "WebGL is not available" );
		return null;
	}
	
	gl = WebGLDebugUtils.makeDebugContext(gl);
	
	return gl;
}

There are a lot of caveats and addenda (error checking) in there; the important line is this one:

gl = canvas.getContext("webgl") || canvas.getContext("experimental-webgl");

That gets the WebGL context object, which contains all the functions and constants you need to write code to draw into that canvas.

To stake out a namespace, the C OpenGL API names all its functions with a "gl" prefix and all its constants with a "GL_" prefix. The original developers of WebGL thought that WebGL JavaScript code should resemble OpenGL C code as closely as possible, so the innocent might suspect that they named the functions in the context object exactly the same as the C functions, but they didn't. Instead, functions like glBindBuffer() and constants like GL_ELEMENT_ARRAY_BUFFER lose their namespace prefixes and become bindBuffer() and ELEMENT_ARRAY_BUFFER. So if you're cunning, you name the context object gl, and then the code reads similarly:

glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, indexBuffer);

... becomes ...

gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, indexBuffer);

Neat, huh?
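The renaming rule is mechanical enough to write down. Here's a sketch with a hypothetical helper, webglNameFor (not part of any WebGL API), that applies it. It only covers the renaming: some C calls have no same-named counterpart at all. For instance, glGenBuffers is replaced by createBuffer, which shows up below.

```javascript
// Hypothetical helper: map an OpenGL C identifier to the name of the
// corresponding member on a WebGLRenderingContext, following the prefix rule.
function webglNameFor(cName) {
    if (cName.startsWith("GL_")) {
        // Constants: GL_ELEMENT_ARRAY_BUFFER -> ELEMENT_ARRAY_BUFFER
        return cName.slice(3);
    }
    if (/^gl[A-Z]/.test(cName)) {
        // Functions: glBindBuffer -> bindBuffer (drop "gl", lower-case next letter)
        return cName.charAt(2).toLowerCase() + cName.slice(3);
    }
    throw new Error("not a gl-prefixed name: " + cName);
}

console.log("gl." + webglNameFor("glBindBuffer") + "(gl." +
            webglNameFor("GL_ELEMENT_ARRAY_BUFFER") + ", indexBuffer);");
// gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, indexBuffer);
```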

Anyway, let's get down to business. We're going to start populating this function:

function draw() {
}

First, we get our WebGL context by calling the function above:

var gl = get_webgl_context();

Step One: clear the screen. The canvas now represents a pixel display, and clearing means setting all the pixels to the same color and depth. These functions set the color and depth values that clearing will use:

gl.clearColor(0,0,1,1);
gl.clearDepth(1);

These functions don't draw anything; they set what amount to global variables. gl.clearColor() takes red, green, blue and alpha components on a scale from 0 to 1, so the line above sets the clear color to fully opaque blue. gl.clearDepth() takes a float from 0 to 1. Since 1 is the default value, I only added that line for completeness. Now this line:

gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);

...clears the display.
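The argument to gl.clear() is a bitmask. Each buffer has its own single-bit constant, and the bitwise | (one bar, not the short-circuiting ||) merges them into one integer. Here's a sketch using the constant values from the WebGL 1.0 specification, copied in as plain numbers so it runs outside a browser. describeClearMask is a hypothetical helper that decodes which buffers a mask selects:

```javascript
// WebGL 1.0 clear-mask constants, copied as plain numbers so this runs
// without a browser. In real code, use gl.COLOR_BUFFER_BIT and friends.
var DEPTH_BUFFER_BIT   = 0x00000100;
var STENCIL_BUFFER_BIT = 0x00000400;
var COLOR_BUFFER_BIT   = 0x00004000;

// Hypothetical helper: list which buffers a clear mask would touch.
function describeClearMask(mask) {
    var names = [];
    if (mask & COLOR_BUFFER_BIT)   names.push("color");
    if (mask & DEPTH_BUFFER_BIT)   names.push("depth");
    if (mask & STENCIL_BUFFER_BIT) names.push("stencil");
    return names;
}

var mask = COLOR_BUFFER_BIT | DEPTH_BUFFER_BIT;
console.log(mask.toString(16));                   // 4100
console.log(describeClearMask(mask).join(", "));  // color, depth

// With || instead of |, the expression stops at the first truthy operand,
// so only the color bit survives and the depth buffer would not be cleared.
console.log(describeClearMask(COLOR_BUFFER_BIT || DEPTH_BUFFER_BIT).join(", ")); // color
```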

If you run the code with just this clear, you should see a blue rectangle filling the canvas. Here is a full HTML file for reference:

<html>
<head>
<script src="webgl-debug.js"></script>
<script type="text/javascript">
    function get_webgl_context() {
        var canvas = document.getElementById('mycanvas');
        var gl;
        if (canvas.getContext) {
            try {
                gl = canvas.getContext("webgl") ||
                    canvas.getContext("experimental-webgl");
            }
            catch(e) {
                alert( "getting gl context threw exception" );
            }
        } else {
            alert( "can't get context" );
        }

        if (!gl) {
            alert( "WebGL isn't available in this browser" );
            return null;
        }

        gl = WebGLDebugUtils.makeDebugContext(gl);

        return gl;
    }

    function draw() {
        var gl = get_webgl_context();
        if (!gl) {
            return;
        }

        gl.clearColor(0,0,1,1);
        gl.clearDepth(1);
        gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);
    }
</script>
</head>

<body onload="draw();">
<div>
<canvas id="mycanvas" height=300 width=400></canvas>
</div>

<textarea id="vertexshader">
</textarea>

<textarea id="fragmentshader">
</textarea>

</body>

</html>

Now we need to get started drawing some actual geometry. This is where it gets a bit daunting. You have to call a lot of functions just to draw one triangle on the screen. There are a lot of steps, but by the end, I promise the pay-off is that you'll see enormous potential in this beautiful low-level process. First, we'll need an array of vertex coordinates:

var array = new Float32Array([0,0,0, 1,0,0, 1,1,0]);

Great. Next we need an array-buffer object. The difference between a buffer object and a JavaScript array is that the buffer object is managed by OpenGL and typically lives in graphics memory. Create a buffer object by calling:

var buffer = gl.createBuffer();

Then call:

gl.bindBuffer(gl.ARRAY_BUFFER, buffer);

In OpenGL, functions with "bind" in their names set a state variable. When I read "bind" I think "make current". gl.bindBuffer() makes the specified buffer object the current object associated with the name ARRAY_BUFFER. The next call...

gl.bufferData(gl.ARRAY_BUFFER, array, gl.STATIC_DRAW);

...loads the data from array into the buffer currently bound to ARRAY_BUFFER (which we just set in the line before.)
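If "bind means make current" feels abstract, here's a toy model of the idea in plain JavaScript. FakeGL is hypothetical and tracks only buffer bindings. It isn't how a real driver works, but it shows why bufferData() needs no buffer argument: it writes into whatever is currently bound to the target you name.

```javascript
// Toy model of OpenGL's bind-then-operate style. Not a real WebGL context.
function FakeGL() {
    this.ARRAY_BUFFER = "ARRAY_BUFFER";
    this.ELEMENT_ARRAY_BUFFER = "ELEMENT_ARRAY_BUFFER";
    this.STATIC_DRAW = "STATIC_DRAW";
    this.bindings = {};   // target name -> currently bound buffer object
}

FakeGL.prototype.createBuffer = function () {
    return { data: null, usage: null };
};

FakeGL.prototype.bindBuffer = function (target, buffer) {
    this.bindings[target] = buffer;   // "make current"
};

FakeGL.prototype.bufferData = function (target, srcData, usage) {
    var buffer = this.bindings[target];
    if (!buffer) {
        throw new Error("no buffer bound to " + target);
    }
    // The bytes are copied, so later edits to srcData don't affect the buffer.
    buffer.data = new Uint8Array(srcData.buffer, srcData.byteOffset,
                                 srcData.byteLength).slice();
    buffer.usage = usage;
};

var gl = new FakeGL();
var array = new Float32Array([0,0,0, 1,0,0, 1,1,0]);
var buffer = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
gl.bufferData(gl.ARRAY_BUFFER, array, gl.STATIC_DRAW);

console.log(buffer.data.byteLength);  // 36: nine floats, four bytes each
```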

So we have now handed OpenGL an array containing the 3-space coordinates of a triangle's corners. But of course, OpenGL doesn't know that's what they are. It just has an array of floats, and we still need to describe how the numbers in that array get accessed to make the coordinates of shapes. Part of that job is done by something called an element-array-buffer, which says which vertices make up each shape. (How the floats group into vertices comes later, with gl.vertexAttribPointer().)

var indexArray = new Uint16Array([0,1,2]);
var indexBuffer = gl.createBuffer();
gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, indexBuffer);
gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, indexArray, gl.STATIC_DRAW);

That is an array of integer indices. Each one picks a vertex out of the array-buffer, and when we draw triangles, each consecutive group of three indices makes one triangle.
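To make the sampling concrete, here's a CPU-side sketch of what drawing with gl.TRIANGLES does with these two arrays, assuming three floats per vertex (the layout we describe to WebGL later). Index i refers to floats 3i, 3i+1 and 3i+2 of the vertex array. assembleTriangles is a hypothetical name; the GPU does this internally.

```javascript
// Hypothetical CPU-side model of indexed triangle assembly.
// positions: flat array with 3 floats (x, y, z) per vertex.
// indices:   each consecutive group of 3 entries forms one triangle.
function assembleTriangles(positions, indices) {
    var triangles = [];
    for (var t = 0; t + 2 < indices.length; t += 3) {
        var tri = [];
        for (var k = 0; k < 3; k++) {
            var i = indices[t + k];
            tri.push([positions[3 * i], positions[3 * i + 1], positions[3 * i + 2]]);
        }
        triangles.push(tri);
    }
    return triangles;
}

var positions = new Float32Array([0,0,0, 1,0,0, 1,1,0]);
console.log(JSON.stringify(assembleTriangles(positions, new Uint16Array([0,1,2]))));
// [[[0,0,0],[1,0,0],[1,1,0]]]

// A square as two triangles: vertices 0 and 2 are shared, so they are stored
// once and simply referenced twice from the index array.
var square = new Float32Array([0,0,0, 1,0,0, 1,1,0, 0,1,0]);
console.log(assembleTriangles(square, new Uint16Array([0,1,2, 0,2,3])).length); // 2
```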

Now comes the really fun part: shaders. Pretty much every pixel you draw using WebGL is drawn with shaders (clearing is the exception). A shader is a tiny program that runs on the GPU. There are two types: vertex and fragment. A vertex shader runs once for each vertex of the geometry you draw and determines where on the screen that vertex ends up. A fragment shader runs for each pixel the geometry covers and determines that pixel's color.
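Here's a rough CPU analogy of that division of labor, under simplifying assumptions: the "vertex shader" returns pixel coordinates directly (real vertex shaders output clip-space coordinates), there's no depth test or blending, and a pixel counts as covered when its center lies inside the triangle. This ignores the tie-breaking rules real rasterizers apply to centers exactly on an edge. runPipeline and both shader callbacks are hypothetical. The sketch counts how often each stage runs:

```javascript
// Hypothetical software model of the two shader stages for one triangle.
// vertexShader(v) -> [x, y] in pixel units; fragmentShader() -> a color.
function runPipeline(vertices, vertexShader, fragmentShader, width, height) {
    var stats = { vertexCalls: 0, fragmentCalls: 0 };
    var pixels = new Array(width * height).fill(null);

    // Vertex stage: exactly once per vertex.
    var p = vertices.map(function (v) {
        stats.vertexCalls++;
        return vertexShader(v);
    });

    // Signed area of (a, b, c): tells which side of edge a->b point c is on.
    function edge(a, b, c) {
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]);
    }
    var area = edge(p[0], p[1], p[2]);

    // Fragment stage: once per pixel whose center is inside the triangle.
    for (var y = 0; y < height; y++) {
        for (var x = 0; x < width; x++) {
            var c = [x + 0.5, y + 0.5];
            var w0 = edge(p[1], p[2], c);
            var w1 = edge(p[2], p[0], c);
            var w2 = edge(p[0], p[1], c);
            // Inside when no edge test disagrees with the triangle's orientation.
            var inside = area > 0 ? (w0 >= 0 && w1 >= 0 && w2 >= 0)
                                  : (w0 <= 0 && w1 <= 0 && w2 <= 0);
            if (inside) {
                stats.fragmentCalls++;
                pixels[y * width + x] = fragmentShader();
            }
        }
    }
    return { stats: stats, pixels: pixels };
}

var result = runPipeline(
    [[0, 0], [4, 0], [4, 4]],
    function (v) { return v; },            // "vertex shader": pass-through
    function () { return [1, 0, 0, 1]; },  // "fragment shader": always red
    4, 4);
console.log(result.stats); // { vertexCalls: 3, fragmentCalls: 10 }
```

Three vertices mean three vertex-shader runs no matter how big the triangle is, while the fragment count grows with the area it covers.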

We're going to put our shader code inside textarea elements in the HTML of our webpage, and then use some JavaScript to extract the strings.

First, here's our vertex shader:

<textarea id="vertexshader">
  attribute vec4 position;

  void main() {
    gl_Position = position;
  }
</textarea>

Typically, a vertex shader is responsible for converting the coordinates given in the array-buffer into something called clip-space coordinates. It can apply a perspective distortion to create an illusion of 3D, or it can simply echo the coordinates given in the buffer, which is what this shader does.
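Two details decide where this pass-through shader puts the triangle. First, position is declared as a vec4 but we supply three floats per vertex, so WebGL fills in w = 1. Second, after the shader runs, x and y are divided by w and mapped from the -1..1 range onto the viewport. Here's a sketch of that mapping for our 400x300 canvas. clipToWindow is a hypothetical name, and the sketch assumes the default viewport covering the whole canvas, with y measured upward from the bottom edge as in GL:

```javascript
// Hypothetical sketch of the fixed steps after the vertex shader:
// perspective divide, then the viewport transform (viewport = whole canvas).
function clipToWindow(clip, width, height) {
    // Attribute fetch fills a missing w component with 1.
    var w = clip.length > 3 ? clip[3] : 1;
    var ndcX = clip[0] / w;                // normalized device coords, -1..1
    var ndcY = clip[1] / w;
    return [(ndcX + 1) * width / 2, (ndcY + 1) * height / 2];
}

var corners = [[0, 0, 0], [1, 0, 0], [1, 1, 0]];
corners.forEach(function (c) {
    console.log(JSON.stringify(c) + " -> " + JSON.stringify(clipToWindow(c, 400, 300)));
});
// [0,0,0] -> [200,150]
// [1,0,0] -> [400,150]
// [1,1,0] -> [400,300]
```

So the triangle runs from the center of the canvas to the middle of its right edge and up to the top-right corner.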

Now, here's our fragment shader:

<textarea id="fragmentshader">
  void main() {
    gl_FragColor = vec4(1, 0, 0, 1);
  }
</textarea>

A typical fragment shader does things like lighting calculations and texture sampling to produce all the fancy colors you're used to seeing in Doom or Halo or whatever your favorite game is. This shader simply outputs red with full alpha, no matter where the geometry is.

Now let's go back to JavaScript. Call this to create a program object. A program object represents a vertex shader and a fragment shader linked together; we'll populate this one with the shaders we just wrote:

var program = gl.createProgram();

Then create vertex and fragment shader objects using these calls:

var vertexShader = gl.createShader(gl.VERTEX_SHADER);
var fragmentShader = gl.createShader(gl.FRAGMENT_SHADER);

Obtain the strings for the shader code from the textareas we made:

var vertexShaderString = document.getElementById('vertexshader').value;
var fragmentShaderString = document.getElementById('fragmentshader').value;

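Use this call to attach the shader code to the WebGL shader object. The string glued onto the front declares a default floating-point precision. Vertex shaders already have one, but GLSL ES fragment shaders don't, so the same header goes on both for simplicity. The #ifdef keeps it harmless for desktop GLSL, which doesn't define GL_ES: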

gl.shaderSource(
    vertexShader,
    "#ifdef GL_ES\nprecision highp float;\n#endif\n" +
    vertexShaderString);

And this line to compile:

gl.compileShader(vertexShader);

We now have a compiled vertex shader which we can attach to our program:

gl.attachShader(program, vertexShader);

Do the same thing for the fragment code:

gl.shaderSource(
    fragmentShader,
    "#ifdef GL_ES\nprecision highp float;\n#endif\n" +
    fragmentShaderString);
gl.compileShader(fragmentShader);
gl.attachShader(program, fragmentShader);
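None of these calls tells you about a typo in your GLSL. compileShader() doesn't throw, and a failed compile doesn't raise a GL error, so the debug context won't flag it either. You have to ask for the compile status yourself. Here's a sketch of a helper that keeps the precision header, the compile and the check in one place. compileShaderSource is a hypothetical name, while getShaderParameter, COMPILE_STATUS, getShaderInfoLog and deleteShader are standard WebGL. To keep it runnable outside a browser, the usage passes a hypothetical stub in place of a real context:

```javascript
// The precision header used in this tutorial, kept in one place.
var PRECISION_HEADER = "#ifdef GL_ES\nprecision highp float;\n#endif\n";

// Hypothetical helper: create, compile and check one shader.
// `gl` is a WebGLRenderingContext, or anything with the same methods.
function compileShaderSource(gl, type, source) {
    var shader = gl.createShader(type);
    gl.shaderSource(shader, PRECISION_HEADER + source);
    gl.compileShader(shader);
    // Failure shows up in COMPILE_STATUS and the info log, not as an exception.
    if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
        var log = gl.getShaderInfoLog(shader);
        gl.deleteShader(shader);
        throw new Error("shader compile failed: " + log);
    }
    return shader;
}

// In draw() this would replace each shaderSource/compileShader pair:
//   var vertexShader = compileShaderSource(gl, gl.VERTEX_SHADER, vertexShaderString);
//   gl.attachShader(program, vertexShader);

// Stub context (an assumption, so this runs without a browser): it
// "fails" any source that doesn't contain "void main".
var stubGL = {
    VERTEX_SHADER: "VERTEX_SHADER",
    COMPILE_STATUS: "COMPILE_STATUS",
    createShader: function (type) { return { type: type, source: "", ok: false }; },
    shaderSource: function (shader, src) { shader.source = src; },
    compileShader: function (shader) { shader.ok = shader.source.indexOf("void main") >= 0; },
    getShaderParameter: function (shader, pname) { return shader.ok; },
    getShaderInfoLog: function (shader) { return "stub: no main() found"; },
    deleteShader: function (shader) {}
};

var good = compileShaderSource(stubGL, stubGL.VERTEX_SHADER, "void main() {}");
console.log(good.source.startsWith(PRECISION_HEADER)); // true
try {
    compileShaderSource(stubGL, stubGL.VERTEX_SHADER, "oops");
} catch (e) {
    console.log(e.message); // shader compile failed: stub: no main() found
}
```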

After all that, what we now have is a program object with a vertex and a fragment shader attached. Before linking it, we need to tell WebGL how to route the vertex coordinate data in our buffers into the program. For this, we set up a vertex attribute. In general, vertices can have lots of attributes, like position, normal vector coordinates, texture coordinates or anything else you would like. In this tutorial all we have is position information. We assign an index (0) to the position attribute variable in the shader. That assignment only takes effect when the program is linked, so it has to come before the link call:

var positionAttribIndex = 0;
gl.bindAttribLocation(program, positionAttribIndex, 'position');

Now one more call links the vertex and fragment shaders together within that program:

gl.linkProgram(program);

Then switch on the attribute at that index so it reads its data from a buffer:

gl.enableVertexAttribArray(positionAttribIndex);
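Linking can fail too (for example, when the fragment shader reads a varying the vertex shader never declares), and like compiling, it fails quietly. Here's a sketch of a check to run right after gl.linkProgram(program). checkProgram is hypothetical, while getProgramParameter, LINK_STATUS, getProgramInfoLog and getAttribLocation are standard WebGL calls. getAttribLocation also confirms that position really ended up at index 0; it returns -1 for an attribute the linked program doesn't use. As before, a hypothetical stub stands in for a real context:

```javascript
// Hypothetical helper: verify a linked program and look up one attribute.
// Usage in draw(): var loc = checkProgram(gl, program, 'position');
function checkProgram(gl, program, attribName) {
    if (!gl.getProgramParameter(program, gl.LINK_STATUS)) {
        throw new Error("program link failed: " + gl.getProgramInfoLog(program));
    }
    var location = gl.getAttribLocation(program, attribName);
    if (location === -1) {
        throw new Error("attribute '" + attribName + "' is not active in the program");
    }
    return location;
}

// Stub context for running outside a browser (an assumption, not WebGL):
// each fake program object records its link result and attribute locations.
var stubGL = {
    LINK_STATUS: "LINK_STATUS",
    getProgramParameter: function (p, pname) { return p.linked; },
    getProgramInfoLog: function (p) { return p.log; },
    getAttribLocation: function (p, name) {
        return name in p.attribs ? p.attribs[name] : -1;
    }
};

var linked = { linked: true, log: "", attribs: { position: 0 } };
console.log(checkProgram(stubGL, linked, "position")); // 0

var broken = { linked: false, log: "stub: varying mismatch", attribs: {} };
try {
    checkProgram(stubGL, broken, "position");
} catch (e) {
    console.log(e.message); // program link failed: stub: varying mismatch
}
```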

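Then we need to tell OpenGL which buffer to use and how to sample it to provide the right data to the position attribute. gl.vertexAttribPointer() reads from whichever buffer is currently bound to ARRAY_BUFFER, which is still our vertex buffer. Its arguments say: 3 components per vertex, of type FLOAT, not normalized, a stride of 3 floats (given in bytes) from one vertex to the next, starting at byte offset 0: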

var kFloatSize = Float32Array.BYTES_PER_ELEMENT;
gl.vertexAttribPointer(positionAttribIndex,
    3, gl.FLOAT, false, 3 * kFloatSize, 0 * kFloatSize);
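The stride and offset are byte counts, which is why both get multiplied by kFloatSize. Here's a sketch of the address arithmetic behind one attribute fetch. fetchAttribute is a hypothetical name, and it assumes FLOAT components so every byte address is a multiple of 4. The interleaved position-plus-color buffer in the usage is also hypothetical. It shows why the stride exists: when each vertex carries more than one attribute, the stride covers the whole vertex and the offset picks out one attribute inside it.

```javascript
var kFloatSize = Float32Array.BYTES_PER_ELEMENT; // 4 bytes

// Hypothetical model of one attribute fetch. size, stride and offset mirror
// the corresponding gl.vertexAttribPointer() arguments; stride and offset
// are in BYTES. Assumes FLOAT components.
function fetchAttribute(floats, vertexIndex, size, stride, offset) {
    // Byte address of this vertex's first component.
    var start = offset + vertexIndex * stride;
    var components = [];
    for (var c = 0; c < size; c++) {
        // Turn the byte address back into an index into the float array.
        components.push(floats[(start + c * kFloatSize) / kFloatSize]);
    }
    return components;
}

// The tutorial's buffer: 3 floats per vertex, tightly packed.
var positions = new Float32Array([0,0,0, 1,0,0, 1,1,0]);
console.log(fetchAttribute(positions, 2, 3, 3 * kFloatSize, 0 * kFloatSize).join(",")); // 1,1,0

// Hypothetical interleaved layout: position (x,y,z) then color (r,g,b) per vertex.
var interleaved = new Float32Array([
    0,0,0,  1,0,0,
    1,0,0,  0,1,0,
    1,1,0,  0,0,1
]);
console.log(fetchAttribute(interleaved, 1, 3, 6 * kFloatSize, 0 * kFloatSize).join(",")); // 1,0,0
console.log(fetchAttribute(interleaved, 1, 3, 6 * kFloatSize, 3 * kFloatSize).join(",")); // 0,1,0
```

For a tightly packed buffer like ours, passing 0 as the stride means the same thing as 3 * kFloatSize, because WebGL treats a zero stride as "tightly packed".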

And we're almost ready to draw. This line...

gl.useProgram(program);

... makes the program current, so subsequent draw calls use it. And this line (the draw call)...

gl.drawElements(gl.TRIANGLES, 3, gl.UNSIGNED_SHORT, 0);

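... actually draws the damn triangle. Its arguments say: draw TRIANGLES using 3 indices of type UNSIGNED_SHORT (matching our Uint16Array), starting at byte offset 0 of the buffer bound to ELEMENT_ARRAY_BUFFER.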

For reference, here is the full HTML file with everything we just did.

<html>
<head>
<script src="webgl-debug.js"></script>
<script type="text/javascript">
    function get_webgl_context() {
        var canvas = document.getElementById('mycanvas');
        var gl;
        if (canvas.getContext) {
            try {
                gl = canvas.getContext("webgl") ||
                    canvas.getContext("experimental-webgl");
            }
            catch(e) {
                alert( "getting gl context threw exception" );
            }
        } else {
            alert( "can't get context" );
        }

        if (!gl) {
            alert( "WebGL isn't available in this browser" );
            return null;
        }

        gl = WebGLDebugUtils.makeDebugContext(gl);

        return gl;
    }

    function draw() {
        var gl = get_webgl_context();
        if (!gl) {
            return;
        }

        gl.clearColor(0,0,1,1);
        gl.clearDepth(1);
        gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);

        var array = new Float32Array([0,0,0, 1,0,0, 1,1,0]);
        var buffer = gl.createBuffer();
        gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
        gl.bufferData(gl.ARRAY_BUFFER, array, gl.STATIC_DRAW);

        var indexArray = new Uint16Array([0,1,2]);
        var indexBuffer = gl.createBuffer();
        gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, indexBuffer);
        gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, indexArray, gl.STATIC_DRAW);

        var program = gl.createProgram();

        var vertexShader = gl.createShader(gl.VERTEX_SHADER);
        var fragmentShader = gl.createShader(gl.FRAGMENT_SHADER);

        var vertexShaderString = document.getElementById('vertexshader').value;
        var fragmentShaderString = document.getElementById('fragmentshader').value;

        gl.shaderSource(
            vertexShader,
            "#ifdef GL_ES\nprecision highp float;\n#endif\n" +
            vertexShaderString);
        gl.compileShader(vertexShader);
        gl.attachShader(program, vertexShader);

        gl.shaderSource(
            fragmentShader,
            "#ifdef GL_ES\nprecision highp float;\n#endif\n" +
            fragmentShaderString);
        gl.compileShader(fragmentShader);
        gl.attachShader(program, fragmentShader);

        // Attribute bindings only take effect at link time, so bind first.
        var positionAttribIndex = 0;
        gl.bindAttribLocation(program, positionAttribIndex, 'position');

        gl.linkProgram(program);

        gl.enableVertexAttribArray(positionAttribIndex);

        var kFloatSize = Float32Array.BYTES_PER_ELEMENT;
        gl.vertexAttribPointer(positionAttribIndex,
            3, gl.FLOAT, false, 3 * kFloatSize, 0 * kFloatSize);
        
        gl.useProgram(program);

        gl.drawElements(gl.TRIANGLES, 3, gl.UNSIGNED_SHORT, 0);
    }
</script>
</head>

<body onload="draw();">
<canvas id="mycanvas" height=300 width=400>
  canvas text
</canvas><br/>

<textarea id="vertexshader">
  attribute vec4 position;

  void main() {
    gl_Position = position;
  }
</textarea>

<textarea id="fragmentshader">
  void main() {
    gl_FragColor = vec4(1, 0, 0, 1);
  }
</textarea>

</body>

</html>

Whew. Not my finest hour pedagogically perhaps. My explanations got a bit brusque at the end, because my interview is coming up, and I still need to take a shower, shave and print out a paper resume.