Inspired by a Medium article...

Coding in Color

... I've been thinking a lot about semantic code highlighting lately.

I've been writing a lot of Lua at work that gets interpreted by C++ code, and owing to the structure of the workflow, there's a long gap between writing code and testing it, which means little mistakes like typos cost me a lot of time. Rather than trying to get better at not making mistakes... I try to find better tools. I started wondering if semantic highlighting would make certain mistakes more visually obvious.

I looked for a Sublime plugin that would do this, and I found a couple, but when I installed them they didn't work on most of the code for some reason. I could tell the semantic highlighter was doing something, but most of the code was still a neutral color.

My theory as to why is that these plugins are trying to understand the language too much. To really (really) properly color the code in a way that's aware of the language, the text editor has to do a lot of the work of the compiler and that seems impractical.

But I don't really need that. I don't mind, for instance, if the highlighter colors a variable called foo the same color in two different functions. I don't mind if the highlighter doesn't have a very advanced concept of the language at all, in fact.

So, the other day, I spent the morning writing this:

import sublime
import sublime_plugin
import hashlib

def hash_string(s):
    # s must be bytes; map its MD5 digest to a hue bucket in 0..359.
    return int(hashlib.md5(s).hexdigest(), 16) % 360

def get_scopename(h):
    return "explicit-hue-"+str(h)

class ColorerListener(sublime_plugin.EventListener):
    def redo(self, view):
        word_regions = view.find_all("[_A-Za-z][_a-zA-Z0-9]*")

        # Group every identifier's region by the hue its text hashes to.
        hash_to_region_list = {}
        for region in word_regions:
            h = hash_string(view.substr(region).encode('utf-8'))
            hash_to_region_list.setdefault(h, []).append(region)

        for h in range(0, 360):
            view.erase_regions(get_scopename(h))

        for h, regions in hash_to_region_list.items():
            scopename = get_scopename(h)
            view.add_regions(scopename, regions, scopename, "", sublime.DRAW_NO_OUTLINE)

    def on_modified(self, view):
        self.redo(view)

    def on_activated(self, view):
        self.redo(view)

That is the source code for a Sublime plugin. Sublime colors text using a collection of objects of type Region. A Region is an interval of characters with a starting and ending index. The function view.add_regions makes a list of regions all active in the current text view, and the regions can be assigned a "scope" when they're added. A "scope" is a string associated with the regions which determines their color according to XML in user-defined settings. The intended use for these scopes seems to be something like: the highlighter uses a scope to indicate that something is a keyword or some such language construct, and then all keywords are assigned the same color, which the user can control in preferences.
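Stripped of the Sublime API, the grouping step in redo can be sketched in plain Python. This is only an illustration: the names hue_for and group_spans_by_hue are made up for this sketch, and it assumes Python's re module matches the same identifiers that view.find_all does for this pattern. Plain (start, end) tuples stand in for Region objects.

```python
import hashlib
import re
from collections import defaultdict

# Same identifier pattern the plugin hands to view.find_all.
IDENTIFIER = re.compile(r"[_A-Za-z][_a-zA-Z0-9]*")


def hue_for(word):
    """Hash a word to a hue bucket in 0..359, as hash_string does."""
    digest = hashlib.md5(word.encode("utf-8")).hexdigest()
    return int(digest, 16) % 360


def group_spans_by_hue(text):
    """Map each hue bucket to the (start, end) spans of the words in it."""
    buckets = defaultdict(list)
    for match in IDENTIFIER.finditer(text):
        buckets[hue_for(match.group())].append(match.span())
    return dict(buckets)


if __name__ == "__main__":
    sample = "foo = bar + foo"
    for hue, spans in sorted(group_spans_by_hue(sample).items()):
        print(hue, spans)
```

Both occurrences of foo land in the same bucket because the hue depends only on the word's text, which is the whole trick. Two different words can still collide, since there are only 360 buckets.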

Of course, what I want is to color each word a different color with a pseudo-random hue, but I'm stuck going through the same pipeline, so what I've done is populate a settings XML file with 360 entries like this...

<dict>
    <key>name</key>
    <string>explicit-hue-94</string>
    <key>scope</key>
    <string>explicit-hue-94</string>
    <key>settings</key>
    <dict>
        <key>foreground</key>
        <string>#5b6ed6</string>
        <key>background</key>
        <string>#000000</string>
    </dict>
</dict>

(Which I generated using this script)

import colorsys
import math

template = """
        <dict>
            <key>name</key>
            <string>steve</string>
            <key>scope</key>
            <string>steve</string>
            <key>settings</key>
            <dict>
                <key>foreground</key>
                <string>#0000ff</string>
                <key>background</key>
                <string>#000000</string>
            </dict>
        </dict>
"""

def rgb_to_hex(rgb):
    return '#%02x%02x%02x' % tuple(map( lambda x: int(x * 255), rgb ))

for i in range(0,360):
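    # colorsys expects hue as a fraction in [0, 1], not radians. Larger
    # values wrap modulo 1, so each index still gets its own hue, but
    # explicit-hue-N is not literally N degrees around the wheel.
    rgb = colorsys.hls_to_rgb(i * math.pi / 180.0, 0.6, 0.6)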

    this_text = template.replace("steve", "explicit-hue-" + str(i))
    this_text = this_text.replace("#0000ff", rgb_to_hex(rgb))

    print(this_text)

...giving me 360 scopes of different hue.
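One detail worth knowing about colorsys: it takes hue as a fraction of a full turn in the range 0 to 1, not as degrees or radians. A variant that steps the hues evenly around the wheel, so that index N really is N degrees, might look like the sketch below. This is not what the script above does, and the function name evenly_spaced_hues is made up for the example. The lightness and saturation of 0.6 are carried over from the script.

```python
import colorsys


def rgb_to_hex(rgb):
    """Format an (r, g, b) triple of floats in [0, 1] as #rrggbb."""
    return "#%02x%02x%02x" % tuple(int(channel * 255) for channel in rgb)


def evenly_spaced_hues(count=360, lightness=0.6, saturation=0.6):
    """Return `count` hex colors whose hues step evenly around the wheel."""
    return [
        rgb_to_hex(colorsys.hls_to_rgb(i / float(count), lightness, saturation))
        for i in range(count)
    ]


if __name__ == "__main__":
    colors = evenly_spaced_hues()
    print(colors[0], colors[120], colors[240])
```

For the plugin itself the ordering hardly matters, because the hash scatters words across the buckets anyway. Even spacing mainly makes the generated settings file easier to eyeball.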

Here's the result:

I mean, it really doesn't know what language that is. It has the basic functionality: currentIndex is the same color throughout, and that's the point. It doesn't know what is commented; maybe I'll fix that some day. But such as it is, I've been using this at work and it's already saved me some time.