The Secret Java Preprocessor

Ask a random person on the street if Java has a preprocessor and they’ll probably say “no, that’s a C thing”.

Of course, that doesn’t really make sense. Preprocessing Java would mean modifying Java code before it gets interpreted as Java. That’s totally possible. Java tools just don’t have that functionality built in.

gcc has a built-in preprocessor. How about we use it?

Here’s some Java code with a #if in it. (Paste it into a file called Hello.java if you want to try this at home.)

class Hello {
    public static void main(String[] args) {
#if DEBUG
        System.out.println("About to print 'Hello Internet!'");
#endif
        System.out.println("Hello Internet!");
    }
}

Step 1. Disguise the Java as C.

mv Hello.java Hello.c

Step 2. Run the C preprocessor.

gcc -E -P Hello.c -o Hello.java

Step 3. Compile the resulting file as Java.

javac Hello.java

Step 4. Recover the original Java.

mv Hello.c Hello.java

Step 5. Run the program.

$ java Hello
Hello Internet!

Now, try it with DEBUG=1:

mv Hello.java Hello.c
gcc -E -P -DDEBUG=1 Hello.c -o Hello.java
javac Hello.java
mv Hello.c Hello.java

Run the program again:

$ java Hello
About to print 'Hello Internet!'
Hello Internet!

Behold! The Java preprocessor!
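If the rename-and-restore dance gets old, the steps can be scripted. Here is a rough Python sketch, not a polished build tool. It assumes gcc and javac are on your PATH, and it relies on gcc's -x c option to treat Hello.java as C without renaming it. The function names and the build directory are my own inventions for illustration.

```python
import subprocess
from pathlib import Path


def preprocess_command(source, output, defines=None):
    """Build the gcc command line. `-x c` tells gcc to read the input as C."""
    command = ["gcc", "-E", "-P", "-x", "c"]
    for name, value in (defines or {}).items():
        command.append(f"-D{name}={value}")
    return command + [str(source), "-o", str(output)]


def build(source="Hello.java", build_dir="build", defines=None):
    """Preprocess into build_dir (hypothetical layout), then compile with javac."""
    out_dir = Path(build_dir)
    out_dir.mkdir(exist_ok=True)
    output = out_dir / Path(source).name
    subprocess.run(preprocess_command(source, output, defines), check=True)
    subprocess.run(["javac", str(output)], check=True)
```

Calling build(defines={"DEBUG": 1}) should leave the class file next to the preprocessed source, so the program would run with java -cp build Hello. The original Hello.java is never touched, which removes the need for Step 4.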

Now, you’re in for some work if you want to use this in your toolchain. You’ll probably want to take into account the fact that the preprocessor can muck with line numbers, so your Java compiler will lie to you about the locations of errors. If you remove -P, gcc will output a file with metadata (line markers) that you can use to track the origin of an error, but this is starting to get complicated. Making your text editor color your code right is going to be just about impossible. In short, getting back the comfort you had before would be a hassle.
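To make the line-number problem concrete, here is a toy preprocessor in Python. It is not the C preprocessor: it understands only #if NAME and #endif, and the function name is made up for this example. The one trick it shows is that replacing directives and skipped lines with blank lines keeps line N of the output lined up with line N of the input.

```python
def preprocess(source, defines):
    """Toy preprocessor: handles only `#if NAME` and `#endif`.

    Directive lines and disabled lines become blank lines, so every
    surviving line keeps its original line number.
    """
    output = []
    enabled = []  # one entry per open #if: is that block switched on?
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#if "):
            name = stripped[4:].strip()
            enabled.append(bool(defines.get(name, 0)))
            output.append("")
        elif stripped == "#endif":
            enabled.pop()
            output.append("")
        else:
            # Keep the line only if every enclosing #if is switched on.
            output.append(line if all(enabled) else "")
    return "\n".join(output)


HELLO = """\
class Hello {
    public static void main(String[] args) {
#if DEBUG
        System.out.println("About to print 'Hello Internet!'");
#endif
        System.out.println("Hello Internet!");
    }
}
"""

plain = preprocess(HELLO, {})
debug = preprocess(HELLO, {"DEBUG": 1})
```

Both plain and debug have the same number of lines as HELLO, so a compiler error reported on line 6 would still point at line 6 of the file you actually edit.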

And probably you don’t want to deal with even a small hassle just to create an arcane tool that will enable you to write .java files no other Java development environment can understand.

But I think I’ve made my point. The C preprocessor is not actually that specific to C. It’s a separable module that can be used all by itself.

Who Killed Make?

Remember Make? You know, the program that runs when you type “make” at the command-line?

I use Make all the time at home. I’ve worked at seven software companies, and I’ve only ever dealt with one makefile at work. The makefile in question was only used for the Linux version of the product. At that company, Linux was a low priority and the makefile always felt like an afterthought.

Lately, I’ve been in mobile development and Make is even more scarce. In a room full of Linux nerds at a party, I asked out loud, “Who killed Make?”, and somebody said, “What are you talking about? I wrote a makefile yesterday.”

Apparently Make isn’t dead after all. The Wikipedia article says:

Though integrated development environments and language-specific compiler features can also be used to manage a build process, Make remains widely used, especially in Unix.

That’s great and all, but I think of Make as being a multi-purpose program that runs on almost any computer, so it disappoints me to observe it relegated to the Linux universe and used only to compile legacy C code.

I think this happens a lot: a group of brilliant people develop a flexible tool that can handle a variety of use-cases. The tool gets popular for one particular use-case, and then everybody thinks that’s all it can do. Eventually the tool and the use-case become synonymous. (What is X? It’s a tool for doing Y.) And all the blood, sweat, and tears of those brilliant people go down the drain.

Well, sometimes that’s just the way things go. Time marches on. It’s out with the old build system, in with the new. And unless you’re stuck on Linux writing C, you’re better off using the current, fully-featured IDE, whatever it is: Xcode, Eclipse, Android Studio or Microsoft Visual Studio. This is largely true, and these are all great pieces of software, but they are not replacements for Make, and here’s why:

Make is not a tool for compiling software.

Make is a language for defining a dependence graph.

Any time you use programs that take files as input and generate files as output (which is pretty much always), situations arise where some files need to update whenever certain other files change. Inevitably, the way in which files depend on each other gets complex. Make can help you manage that complexity by automating the update process.

The need to update files based on dependences is not exclusive to C-based development on Linux.

  • Maybe you’re writing a document in TeX with diagrams made in Illustrator and the PDF needs to re-render any time one of the diagrams changes.

  • Maybe you have a custom script that scrapes comments from the source of the numerical analysis API you’re writing and uses them to build the developer website.

  • Perhaps the documentation and library files for your proprietary Matlab module need to be copied into a directory along with sample code and then zipped.
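To show what "dependence graph" means in practice, here is a small Python sketch of the decision Make automates: given who depends on whom and when each file last changed, which targets are stale? This is a simplified model, not Make's actual algorithm, and the file names and timestamps are invented to resemble the packaging scenario above. It assumes the graph has no cycles.

```python
def rebuild_plan(goal, rules, mtimes):
    """List the targets that need rebuilding to bring `goal` up to date.

    rules  maps each target to the files it is built from.
    mtimes maps each existing file to its modification time.
    """
    plan = []       # stale targets, prerequisites before dependents
    effective = {}  # memo: timestamp each file will have after the build

    def visit(name):
        if name in effective:
            return effective[name]
        if name not in rules:
            # A source file: nothing to build, it just has to exist.
            effective[name] = mtimes[name]
            return effective[name]
        dep_times = [visit(dep) for dep in rules[name]]
        own_time = mtimes.get(name)  # None means the target is missing
        if own_time is None or any(t > own_time for t in dep_times):
            plan.append(name)
            own_time = float("inf")  # a rebuilt file is newer than everything
        effective[name] = own_time
        return own_time

    visit(goal)
    return plan


# Hypothetical project: the package bundles docs, a compiled module, and a sample.
RULES = {
    "package.zip": ["docs.html", "module.mex", "sample.m"],
    "docs.html": ["module.c"],
    "module.mex": ["module.c"],
}
MTIMES = {"module.c": 5, "sample.m": 1, "docs.html": 3,
          "module.mex": 6, "package.zip": 7}

stale = rebuild_plan("package.zip", RULES, MTIMES)
```

Here module.c changed (time 5) after docs.html was generated (time 3), so the docs are stale, and so is the package that contains them. The compiled module is newer than its source and gets left alone.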

Any development process that involves more than one piece of software is going to have a dependence graph. But in a world where Make is dismissed as the build tool for Linux neckbeards, what people tend to do (whether or not they realize it) is memorize the graph and perform updates by hand. This leads to such situations as:

Oh, the client needs a new package, uh... is this one current? Let me check the created date. Wednesday. Was that after the bug fix? I think it was, but hang on, I’m not sure the documentation in this package reflects the API changes that we did last week, because I might have forgotten at the time to do the doc generation step. I guess I’ll just remake the whole thing.

Make exists to liberate you from that nonsense. Why not use it?

If Statements (Part 3)

... It is now the present.

I've started making a conscious effort to reduce if-statements in my code. When I write an if-statement, I at least consider ways of replacing the functionality with things like maps and polymorphism.
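Here's the kind of replacement I mean, as a small Python sketch. The function and table names are made up for the example. Both versions do the same job, but in the second one the decision lives in a table instead of a chain of branches.

```python
def move_with_ifs(position, key):
    """Branchy version: one if per key."""
    x, y = position
    if key == "left":
        return (x - 1, y)
    elif key == "right":
        return (x + 1, y)
    elif key == "up":
        return (x, y - 1)
    elif key == "down":
        return (x, y + 1)
    return position


# The same decision expressed as data: key -> (dx, dy).
STEP = {"left": (-1, 0), "right": (1, 0), "up": (0, -1), "down": (0, 1)}


def move_with_map(position, key):
    """Table version: unknown keys fall back to 'no movement'."""
    dx, dy = STEP.get(key, (0, 0))
    return (position[0] + dx, position[1] + dy)
```

Adding a new key to the second version means adding one entry to STEP, and nothing in the function body changes.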

I discovered there's a whole community around this concept:

The Anti-If Campaign

There are also lots of posts on Stack Overflow about polymorphism replacing conditionals. One thing that sometimes comes up is that Smalltalk, the most successful vaporware ever, has no if-statement. Instead, it has an ifTrue: method on the Boolean class, which means you can write code like this:

(x < y) ifTrue: [ 'Yes, x is less than y!' printNl ]

I'm not sure that's any better, actually, but my point is: I'm not the only one thinking about this.

And it is fun to think about, but what about in practice? In practice there's the problem of...

Does my boss care?

You can't impress your boss by writing beautiful code. You can only impress your boss by writing code that does what it's supposed to. Oftentimes you'll trace a bug to a nasty, if-ridden mess. It's your job to fix it. Probably you're being pressured to fix it quickly. Also the code works most of the time.... Maybe the bug only shows up on iPads and not iPhones.... You don't want your fix to accidentally break other devices, do you? What's one more if-statement in a sea of if-statements?

if([UIDevice currentDevice].userInterfaceIdiom==UIUserInterfaceIdiomPad)
{
    // My code to fix the bug.
}

So it comes down to either:

  1. Do a large change that reorganizes the code to use fewer if-statements.
  2. Do a small change that fixes the problem but adds to the mess.

It's a classic choice between slow-but-sustainable and fast-but-crappy. In the midst of it, you choose the second thing, because it seems less risky. And then it eats at you. You fixed the bug, but you didn't fix the code. And you know that the next sad fuck that looks at it won't have a clue what to do.

Well, that next sad fuck might be me, and I don't want you to compromise.

I want you to go to your boss and say, "I will NOT fix code by making it worse."

I want you to STOP WRITING IF STATEMENTS

GET UP RIGHT NOW, GO TO YOUR CODE, FACTOR OUT A BUNCH OF IF STATEMENTS, AND MAKE A PULL REQUEST

STOP WRITING IF STATEMENTS

STOP!

If Statements (Part 2)

Two years later....

...I met Aaron, an engineer on the server team, and a modern-day prophet denouncing the evil coding practices of our time!

Aaron had a problem with switch-statements. He said:

Every time you write a switch statement, a little part of Aaron dies.

I asked him to explain this to me. He told me most of the time, when you write a switch statement, you're really switching on the type of something. So, remove the switch statement, and make an actual type in the language, then use polymorphism to accomplish the same job.

Take this pseudocode using OpenGLES for example. OpenGLES has different versions. It's possible to draw a cube in each one, but how you draw a cube varies quite a bit. Maybe you wrote this code in your game:

enum OpenGLVersion {
    GLES10 = 1,
    GLES20 = 2,
    GLES30 = 3
};

class Game {
    private:
        OpenGLVersion openGLVersion;
        void draw() const;
// ...etc etc...
};

void Game::draw() const {
    // Code that's common to
    // drawing a cube
    // using any GL version.
    switch(openGLVersion) {
        case GLES10:
            // draw the cube the GLES10 way.
        break;
        case GLES20:
            // draw the cube the GLES20 way.
        break;
        case GLES30:
            // draw the cube the GLES30 way.
        break;
    }
}

(A little part of Aaron dies when you do that.)

We're switching on something. Let's take Aaron's advice, and convince ourselves that something is a type. GLES10, GLES20, GLES30: those are different versions of OpenGL... which is a graphics engine. Eureka!

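class GraphicsEngine {
public:
    virtual ~GraphicsEngine() = default;
    // Each GL version supplies its own way to draw the cube.
    virtual void drawCube() const = 0;
protected:
    void beginDrawCube() const;
};

class GLES10 : public GraphicsEngine {
public:
    void drawCube() const override;
};

class GLES20 : public GraphicsEngine {
public:
    void drawCube() const override;
};

class GLES30 : public GraphicsEngine {
public:
    void drawCube() const override;
};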

void GraphicsEngine::beginDrawCube() const {
    // Code that's common to
    // drawing a cube
    // using any GL version.
}

void GLES10::drawCube() const {
    beginDrawCube();
    // draw the cube the GLES10 way
}

void GLES20::drawCube() const {
    beginDrawCube();
    // draw the cube the GLES20 way
}

void GLES30::drawCube() const {
    beginDrawCube();
    // draw the cube the GLES30 way
}

And finally

class Game {
    private:
        GraphicsEngine* graphicsEngine;
        void draw() const;
// ...etc etc...
};

void Game::draw() const
{
    graphicsEngine->drawCube();  // The switch is gone!
}

Why is this better?

  1. The functions are shorter. There are more of them, but they're shorter. That generally makes the code more readable, since each function is a more easily-digested unit with a descriptive name.

  2. It's more debuggable. Suppose you're tracing the origin of a bug that only shows up in OpenGLES 3.0. Imagine stepping through the new code versus code with a switch. The arrow is pointed at graphicsEngine->drawCube();, you step in, now the arrow is in a new function that only does the OpenGLES 3.0 method. If the other draw functions are in different files, you might not even see them, which is good because you're not debugging them right now. With the switch statement, the other switch cases are cluttering up the screen with code that isn't currently relevant.

  3. I would argue that the new code reflects a more accurate mental model. The game doesn't draw the cube itself, it appeals to the graphics engine to draw the cube. I think of the graphics engine as being an object. Now, it's an object in the code. Before, the graphics engine was an object in my mind, but in the code, I had a number.

  4. If the classes GLES10, GLES20, GLES30 are separated into files, you can optimize the build by only including headers that each file needs. Suppose some day, OpenGLES 1.0 is no longer supported on the platform you're on, and you need to take out the OpenGLES 1.0 code. If you do things right, you just delete a file and you're done.

The thing is, now that I see things Aaron's way, I've started to see all the places this pattern gets violated. I understand the painful sensation Aaron experiences whenever someone in the world writes a switch.

There's no rest for the righteous, I guess.


If Statements (Part 1)

Three years ago, a hackathon changed my life.

Well, it influenced my coding philosophy anyway.

The rules of the hackathon were this: Work with a partner to write Conway's Game of Life in an hour in some language / framework. Then, at the end of the hour, throw away your work, and start over with a new partner / language / framework. All in all, we wrote Game of Life seven times in a day.

It turns out, once you've practiced it a couple of times, writing the Game of Life in an hour is pretty doable. So, as the day went on, various extra challenges arose, like: write it in Haskell, write it for the web, write it in Python... with no if statements.

That one appealed to me. Python with no if-statements.

I managed to write it. This is the code. (I'm posting in flagrant violation of the rules of the hackathon)

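from functools import reduce  # reduce is not a builtin in Python 3


class Game:
    def __init__(self, cells=()):
        # Live cells, stored as (x, y) tuples.
        self.cells = list(cells)

    def neighbors(self, cell):
        x, y = cell
        return [(x-1, y-1), (x, y-1), (x+1, y-1), (x-1, y),
                (x+1, y), (x-1,y+1), (x, y+1), (x+1, y+1)]

    def next(self):
        neighbor_count = {}

        # Each live cell adds one to the count of each of its neighbors.
        for center in self.cells:
            for cell in self.neighbors(center):
                neighbor_count[cell] = neighbor_count.get(cell, 0) + 1

        def alive(cell):
            # .get avoids a KeyError for a live cell with no live neighbors.
            n = neighbor_count.get(cell, 0)
            return n == 3 or (cell in self.cells and n == 2)

        # Candidates are the live cells plus all of their neighbors.
        # The initial [] keeps reduce from failing on an empty board.
        self.cells = list(set(filter(alive, self.cells +
                reduce(list.__add__, map(self.neighbors, self.cells), []))))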

    def __repr__(self):
        return repr(self.cells)

You can try it at the command line. Here's how you make the classic glider:

>>> g = Game([(0,0), (1,0), (2,0), (2,1), (1,2)])

Then type this over and over, and you'll see the glider move:

>>> g.next(); g

Anyway, Game of Life aside, I want you to try this. Try to figure out how to rewrite your code, whatever it is, without if statements. You might learn something. I did.
