Redundant Code Gets a Bad Rap (Part 2)

Since you were a young child, people have probably been telling you that redundant code is a terrible scourge and should be eliminated. But it’s not that simple. Sometimes factoring redundant code introduces other problems by stealth.

Once again it’s time for a parable. Eddie is an engineer at a hypothetical company. While perusing some Python scripts in the company code base...

copy_files_and_zip.py
upload_zipped_package.py
upload_all_things.py

… Eddie discovers that copy_files_and_zip.py and upload_zipped_package.py have identical code blocks that zip a collection of files. Eddie dutifully factors that code into a new function:

def zip_stuff():
    blah blah blah
    blah zip blah
    blah blah blah

Also, upload_zipped_package.py and upload_all_things.py have functionality in common, so Eddie factors it into this function:

def upload_zipfile():
    blah blah blah
    blah upload blah
    blah blah blah

He puts the common functions in a new file called utilities.py which the other files can import.

def zip_stuff():
    blah blah blah
    blah zip blah
    blah blah blah

def upload_zipfile():
    blah blah blah
    blah upload blah
    blah blah blah

At first, this doesn’t work, and then Eddie realizes that zip_stuff() and upload_zipfile() use modules, so he needs to import those modules in utilities.py.

import zipfile
import boto

def zip_stuff():
    blah blah blah
    blah zip blah
    blah blah blah

def upload_zipfile():
    blah blah blah
    blah upload blah
    blah blah blah

Great.

Except now all three scripts import utilities which in turn imports boto, a third-party module that doesn’t come with standard Python installs.

And guess what, not everybody at the company uses all three of these scripts on a regular basis. Franka, for instance, only ever runs copy_files_and_zip.py. She doesn’t have boto on her computer. So, several days later, when she grabs the latest version to get some other feature, it doesn't work.

When Franka asserts that the script no longer works, there's initially a lot of confusion about why and a lot of back and forth about how "it works on MY computer" et cetera. Finally, Eddie realizes what has happened, and he fixes it... by having Franka install boto.

Franka can run the script again, but copy_files_and_zip.py still imports boto without actually using it. Fixing this inconsistency would mean separating the two utility functions into separate files (which seems to spread the code unnecessarily thin) or doing this…

def upload_zipfile():
    import boto
    blah blah blah
    blah upload blah
    blah blah blah

…which is kinda weird. I mean, imports go at the top of the file, right?

So, Eddie leaves it the way it is, with a dependence in the code that doesn’t reflect reality.

Now, maybe this is no big deal. Everybody needs to install boto to use the scripts now. So what? You have to install things to run the stuff you need all the time.

Well, what if it were something other than boto? What if it were a library that didn’t exist on Windows, and then nobody with Windows could run any of the scripts even if all they do is copy and zip files?!?!

And who, by the way, has it helped to factor these functions into a common file, exactly? It didn’t fix any bugs, it didn’t add any features. It made the total volume of code a tiny bit smaller.

Maybe it’s time we started telling our children that eliminating redundancy is fine if you’re really careful.

Redundant Code Gets a Bad Rap (Part 1)

If you’ve ever done any kind of Computer Science, you’ve probably heard that redundant code is bad. Nobody ever talks about how eliminating redundancy can get you into trouble.

Let me explain what I mean with a parable. Andy is a highly principled engineer. He knows that redundant code is bad, so when he spots these two functions in the code base…

void initPTCModuleA() {
    mHnd = generatePTCHandle();
    unlockInputChannel(mHnd);
    syncToSystemClock(mHnd);
    
    embellishWithA(mHnd);
}
void initPTCModuleB() {
    mHnd = generatePTCHandle();
    unlockInputChannel(mHnd);
    syncToSystemClock(mHnd);
    
    embellishWithB(mHnd);
}

…he immediately factors the duplicate code into a function.

void newBasicPTCHandle() {
    mHnd = generatePTCHandle();
    unlockInputChannel(mHnd);
    syncToSystemClock(mHnd);
}
void initPTCModuleA() {
    newBasicPTCHandle();
    embellishWithA(mHnd);
}
void initPTCModuleB() {
    newBasicPTCHandle();
    embellishWithB(mHnd);
}

Time goes by.

Another engineer, Bethany, discovers a rare bug and traces the bug to the common function:

void newBasicPTCHandle() {
    mHnd = generatePTCHandle();
    unlockInputChannel(mHnd);
    syncToSystemClock(mHnd);
}

She fixes it by changing the order of initialization calls.

void newBasicPTCHandle() {
    mHnd = generatePTCHandle();
    syncToSystemClock(mHnd);
    unlockInputChannel(mHnd);
}

Except I forgot to tell you something important. Bethany is on the iOS team. And initPTCModuleA() only ever gets called on Android. Bethany’s change actually introduces a bug on Android, but she doesn’t realize it because she only has iOS devices at her desk.

Calvin is on the Android team, but he hasn’t pulled in a while, because he’s working on a big UI revamp, so his next pull brings a tidal wave of new commits including the change that broke this init function. Calvin wastes hours looking for the bug, because he assumes it has to do with his UI changes. Finally, he resorts to a git bisect and finds Bethany’s commit.

Now it might seem like Bethany made this mess and she should have to clean it up, but it isn’t that simple. Maybe the real issue here is that embellishWithA() makes problematic assumptions about the code that precedes it.

Of course, that explanation isn’t going to impress Darcy the tech lead. The way she sees it, the Android build worked before, now it doesn’t, the solution was difficult to track down, and none of this would have happened if Andy had left the redundant code alone.

Sharing code between functions is great, but sharing code between projects actually requires a lot of diligence. Contracts between calling code and common code need to be very explicit, and people need to be aware that changes made to common code affect everybody. It’s virtuous to have your coworkers’ backs, but it also represents extra work. Redundant code isn’t simply bad; there are tradeoffs.
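One way to make that kind of contract explicit is to write it down next to the shared code and back it with a check that runs in every platform's build. Here's a minimal C++ sketch. The stub functions, the g_calls log, and the specific rule (unlock before sync) are all assumptions made up for illustration; the parable doesn't say what embellishWithA() actually depends on.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the functions in the parable. Each stub only
// records that it was called, so the call order can be inspected afterwards.
static std::vector<std::string> g_calls;
static int mHnd = 0;

int generatePTCHandle() { g_calls.push_back("generate"); return 1; }
void unlockInputChannel(int) { g_calls.push_back("unlock"); }
void syncToSystemClock(int) { g_calls.push_back("sync"); }

// Shared code, with the contract written where the next editor will see it.
// Contract (assumed for this sketch): the input channel is unlocked BEFORE
// the clock sync, because embellishWithA() relies on that order.
void newBasicPTCHandle() {
    mHnd = generatePTCHandle();
    unlockInputChannel(mHnd);
    syncToSystemClock(mHnd);
}

// A check that any team can run, whatever devices sit on their desks.
bool unlockHappensBeforeSync() {
    g_calls.clear();
    newBasicPTCHandle();
    std::size_t unlock = 0;
    std::size_t sync = 0;
    for (std::size_t i = 0; i < g_calls.size(); ++i) {
        if (g_calls[i] == "unlock") unlock = i;
        if (g_calls[i] == "sync") sync = i;
    }
    return unlock < sync;
}
```

With something like this in the shared test suite, swapping the two calls fails on the iOS team's machines too, instead of surfacing weeks later on Android.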

Rip it Out

Fifteen years ago, at one of my first jobs, I fried a piece of hardware.

I slipped while inspecting a live circuit board with a voltmeter. The voltmeter probe itself conducts, and when I slipped, I accidentally connected a sensitive component to a raw, unresisted power source. Sparks flew, smoke rose, and then BOARD NO WORKIE.

For help repairing the board, I turned sheepishly to Dave, the hardware expert who sat behind me. I didn't like Dave, but that's another story.

What's interesting, and suddenly seems relevant again in my career, is the way Dave fixed it. He started removing parts and retesting the board with parts removed. He did this until the board's behavior matched what he expected given the damage he could see. He then took the board to the manufacturing department, and asked them to replace the missing parts.

Unfortunately, that wasn't quite enough, because my accidental short-circuit also melted a trace. A trace is one of the thin wires that run through the middle of the circuit board. It's integral to the plastic, so we couldn't just ask manufacturing to stick another one in there. Instead, Dave had to replace the trace with a hand-soldered wire which hung inelegantly off the board from then on.

Many things happened that day that I took lessons from, but the ones that seem relevant right this second are:

  1. Removing parts led the way to understanding the problem.
  2. Damaging the board ultimately revealed previously hidden information about how it worked.

These days, I stick to software for the most part.

Recently, at work, I removed a large piece of functionality from the software we make, because it's no longer supported. Doing this was VERY educational. I wish I had done it years ago, when I started at the company. The next time I start on a new project, and I need to familiarize myself with the codebase, I intend to make a secret branch, pick an arbitrary, big feature, and remove it. I recommend you do it, too. Seriously. It informs you of the greater architecture of the software in a very practical way. It might change your mind about some design decisions.

Puzzle Solving

This is the story of a friendly competition I engaged in with Haitao, one of my coworkers. We raced to find the solution to a plastic puzzle called the Tetris Cube that the office obtained somehow. He solved it by hand. I wrote a Python script. I didn't tell Haitao that it was a race, and it's unclear who won.

The Tetris Cube consists of twelve colored plastic Tetris-like pieces that fit together to make a cube.

Here is a picture of the pieces:

This is the program I wrote:

from functools import reduce

import numpy as np

# Rotations about the x, y, and z axes in 90-degree steps.
# (The identity matrix appears once per axis.)
orientations = list(map(np.matrix, [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[1, 0, 0], [0, 0, -1], [0, 1, 0]],
    [[1, 0, 0], [0, -1, 0], [0, 0, -1]],
    [[1, 0, 0], [0, 0, 1], [0, -1, 0]],
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[0, 0, -1], [0, 1, 0], [1, 0, 0]],
    [[-1, 0, 0], [0, 1, 0], [0, 0, -1]],
    [[0, 0, 1], [0, 1, 0], [-1, 0, 0]],
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[0, -1, 0], [1, 0, 0], [0, 0, 1]],
    [[-1, 0, 0], [0, -1, 0], [0, 0, 1]],
    [[0, 1, 0], [-1, 0, 0], [0, 0, 1]]]))

# Each piece is a list of the unit cells it occupies.
piece_templates = {
    "red t/" : [(0,0,0), (1,0,0), (2,0,0), (3,0,0), (2,1,0)],
    "yellow tx" : [(1,0,0), (0,1,0), (1,1,0), (2,1,0), (1,0,1)],
    "yellow zz" : [(1,0,0), (0,1,0), (1,1,0), (1,0,1), (2,0,1)],
    "yellow t+" : [(1,0,0), (0,1,0), (1,1,0), (2,1,0), (1,1,1), (1,2,1)],
    "yellow Ll" : [(0,0,0), (1,0,0), (2,0,0), (0,0,1), (0,1,1)],
    "red T," : [(2,0,0), (0,1,0), (1,1,0), (2,1,0), (1,2,0), (2,0,1)],
    "red O'" : [(0,0,0), (1,0,0), (1,1,0), (0,1,0), (0,0,1)],
    "red L" : [(0,0,0), (1,0,0), (2,0,0), (0,1,0), (0,2,0)],
    "blue L'" : [(0,0,0), (1,0,0), (2,0,0), (0,1,0), (0,2,0), (1,0,1)],
    "blue z" : [(0,0,0), (1,0,0), (2,0,0), (2,1,0), (3,1,0)],
    "blue zl" : [(1,0,0), (2,0,0), (0,1,0), (1,1,0), (2,0,1)],
    "blue rl" : [(0,0,0), (1,0,0), (2,0,0), (0,1,0), (2,0,1), (2,0,2)]}


def transform(m, piece):
    # Rotate every cell of the piece by the matrix m.
    def t(p):
        y = p*m
        return (y.item(0), y.item(1), y.item(2))
    return set(map(t, piece))

def translate(v, piece):
    # Shift every cell of the piece by the vector v.
    def t(p):
        return (v[0]+p[0], v[1]+p[1], v[2]+p[2])
    return set(map(t, piece))

def extreme(foo, index, piece):
    # Smallest or largest coordinate (foo is min or max) along one axis.
    def t(p):
        return p[index]
    return reduce(foo, map(t, piece))

def all_fits(pt):
    # Every placement of the piece that lies inside the 4x4x4 cube.
    result = []
    for m in orientations:
        piece = transform(m, pt)
        for i in range(-extreme(min, 0, piece), 4-extreme(max, 0, piece)):
            for j in range(-extreme(min, 1, piece), 4-extreme(max, 1, piece)):
                for k in range(-extreme(min, 2, piece), 4-extreme(max, 2, piece)):
                     result.append(translate((i,j,k), piece))
    return result


initial_fitmap = {}
for name in piece_templates:
    initial_fitmap[name] = all_fits(piece_templates[name])


def solve(piecesplaced, fitmap):
    if len(fitmap)==0:
        # Every piece is placed. Pieces are placed in dict order (dicts keep
        # insertion order in Python 3.7+), so piecesplaced lines up with
        # the names in initial_fitmap.
        print('\n')
        for name, piece in zip(initial_fitmap, piecesplaced):
            print(name, piece)
        print('\n')
        return

    newfitmap = {}

    # Cells already occupied by the pieces placed so far.
    dirt = set().union(*piecesplaced)
    def cull(s):
        return dirt.isdisjoint(s)

    names = list(fitmap)
    firstname = names[0]

    firstfit = list(filter(cull, fitmap[firstname]))
    if len(firstfit) == 0:
        return

    # Give up on this branch as soon as any remaining piece has nowhere to go.
    for name in names[1:]:
        newfit = list(filter(cull, fitmap[name]))
        if len(newfit) == 0:
            return
        newfitmap[name] = newfit

    for piece in firstfit:
        solve(piecesplaced + [piece], newfitmap)


print("solving...")
solve([], initial_fitmap)
print("... done solving")

Haitao started solving the puzzle by hand using wits and elbow-grease. I secretly wrote the program and left it running overnight. The next day, I saw a printed solution on my screen. Haitao still had the pieces at his desk. He finished the puzzle that morning. Later, I verified that the program's solution was correct by assembling the cube by hand according to the program's output.

So, who was the winner? Hard to call.

Why I Prefer C++ to Java

I realize The Internet doesn’t need another rant about which programming language is better, but I need to write one, apparently. Here is why I prefer C++ to Java in various categories.

Portability

If you ask a random person on the street which of C++ or Java is more portable, they’ll probably say “Java”. But it’s really not that simple.

Java is the language of choice for Android. Android is pretty ubiquitous these days. However, there’s a big difference between an Android app and a Java applet. Your Android app probably won’t run on your PC, so when somebody says “Java is portable”, I don’t know what they’re talking about.

C++ needs to be compiled into processor-specific machine code, so you have to compile your C++ into as many libraries as architectures you support. SO WHAT? It’s not much more difficult to compile for a few architectures than it is to compile for one. Also, Intel-based Android devices can run ARM binaries. Most games made in Unity only have one ARM version of the binary in them, and nobody seems to be losing any sleep.

Garbage Collection

Another advantage of Java is garbage collection. Early iPhone apps had a lot of memory leaks because developers didn’t understand the Objective-C memory allocation paradigm. Mobile OSes tend (for some reason) to have an unclear concept of quitting an app. So, maybe it’s a good idea to have garbage collection on mobile.

Personally, I’ve never felt like garbage collection actually helped me very much. Yeah, memory leaks are elusive, but so are a lot of bugs. And garbage collection doesn’t actually make memory leaks impossible, it just makes them invisible. It replaces the thought process of “when would I like this object to be deallocated?” with “how is it that I’m retaining a reference to this object without realizing it?” Admittedly in Java this contemplation doesn’t happen as often, but it’s harder to deal with when it does.

On Android, many Java framework classes are backed by an underlying C++ object. Sometimes, finalize() gets used to deallocate that underlying C++ object. If there's ever a bug with this, it becomes a weird race-y bug that's nearly impossible to find. For example, the issue described here:

http://stackoverflow.com/questions/30879831/garbage-collection-invalidates-filedescriptor

It's hard to tell at first that a bug is connected to garbage collection because garbage collection passes are (hopefully) rare. I suppose this is not a fault with Java the language though. So, let's talk about language features.

Const

C and C++ have const. const can be used in a number of places in C++, but in particular, you can make a method const like this:

class MusicNote
{
    void draw() const;
    // ...
};

Here const forbids draw() from changing member variables of MusicNote and from calling non-const methods. A lot of bugs in software come from programmers not realizing all the states the objects can be in. It’s helpful to be able to look at a function’s prototype and tell whether it might be changing the state of that object. In Java you can use final to get some of the uses of const, but not this very important one. For instance, you can put final here:

final int NUM_STEREO_CHANNELS = 2;

and it does something similar to C++. But in this context:

final void close();

it forbids the function from being overridden.

I get the logic of that, and it’s a good idea to have a mechanism for forbidding a subclass from overriding a particular function. That way, a class can defend against subclasses accidentally sabotaging it. In C++, this is the role of virtual: a method can only be overridden if the base class declares it virtual. Again, I prefer the C++ way, because the default behavior is the more defensive one.
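Here's a minimal sketch of what "defensive by default" looks like in practice. Stream and LoggingStream are hypothetical classes invented for this example. Note that a C++ subclass can still declare a method with the same name as a non-virtual one, but that only hides the original; code written against the base class keeps getting the base class's version.

```cpp
#include <string>

// Hypothetical classes, for illustration only.
class Stream {
public:
    virtual ~Stream() = default;

    // Non-virtual (the C++ default): calls made through a Stream reference
    // always run this version.
    std::string close() { return "Stream::close"; }

    // Virtual: the class explicitly opts in to being overridden.
    virtual std::string name() const { return "Stream"; }
};

class LoggingStream : public Stream {
public:
    // This hides Stream::close() rather than overriding it.
    std::string close() { return "LoggingStream::close"; }

    std::string name() const override { return "LoggingStream"; }
};

// Code written against the base class sees the base class's close(),
// but the subclass's name().
std::string closeViaBase(Stream& s) { return s.close(); }
std::string nameViaBase(const Stream& s) { return s.name(); }
```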

People sometimes complain that C++ has too many exceptions to its own rules. “const is great except I can’t trust it because of mutable.” I understand this, but mutable has its place. When a const function caches intermediate results, for instance. Or a graphics function like this:

void display() const
{
   // ... graphics code... 
}

which changes the state of an object in a retained-mode graphics API. Java doesn't have mutable because it doesn't have const. Everybody can just change everything they can see. Which brings us to visibility.
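To make the caching case concrete, here's a small sketch of a const method that fills in a cache through mutable members. Polyline and everything in it are hypothetical names made up for this example, and recomputeCount() exists only so the caching is observable.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical class, for illustration only.
class Polyline {
public:
    // A real state change, so this method is non-const.
    void addPoint(double x, double y) {
        xs_.push_back(x);
        ys_.push_back(y);
        cacheValid_ = false;
    }

    // Logically read-only: the points are untouched. The cache members are
    // declared mutable so that this const method is allowed to update them.
    double length() const {
        if (!cacheValid_) {
            cachedLength_ = 0.0;
            for (std::size_t i = 1; i < xs_.size(); ++i) {
                cachedLength_ += std::hypot(xs_[i] - xs_[i - 1],
                                            ys_[i] - ys_[i - 1]);
            }
            cacheValid_ = true;
            ++recomputeCount_;
        }
        return cachedLength_;
    }

    int recomputeCount() const { return recomputeCount_; }

private:
    std::vector<double> xs_;
    std::vector<double> ys_;
    mutable double cachedLength_ = 0.0;
    mutable bool cacheValid_ = false;
    mutable int recomputeCount_ = 0;
};

// A function holding only a const reference can ask for the length,
// but a call to p.addPoint() would not compile here.
double measure(const Polyline& p) { return p.length(); }
```

The prototype of measure() still tells the reader that the Polyline's visible state won't change, which is the property I care about.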

Visibility

Java and C++ both have public, private and protected. The basic functionality in the two languages is the same, but the Devil is in the details.

Java has a fourth visibility mode, “package”. A lot of people don't realize this at first, because there’s no keyword for it. Instead, package visibility is what you get by default if the visibility modifier is omitted. My first complaint about this is that it’s confusing for beginners.

I understand the need for something like package visibility, but I still prefer the C++ way which is to use friend. friend allows one class to permit another class access to private and protected members. Again, this is an area where proponents of Java would say C++ has too many exceptions to its own rules. So, why do I like this? Because sometimes the way classes interface with each other needs to be more intimate than the way the client of those classes interfaces with them. Maybe you have

class List
{
    //...
}

class Iterator
{
    //...
}

It’s easy to imagine List and Iterator needing access to members of each other that are invisible to the client. The Java way of doing this would be to put List and Iterator in the same package and use package visibility. But everything in the package can see an element with package visibility…. Maybe the package is company wide… Also, there's a Java convention where the directory structure of files reflects the package structure, so if I change my mind about package organization, then I have to move the files!
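For comparison, here's roughly how the friend version could look in C++. This is a sketch under my own assumptions: IntList, IntListIterator, and their members are hypothetical, and a real container would be more involved.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical classes, for illustration only.
class IntList {
public:
    void append(int value) { items_.push_back(value); }

private:
    std::vector<int> items_;       // invisible to clients...
    friend class IntListIterator;  // ...but visible to exactly one other class
};

class IntListIterator {
public:
    explicit IntListIterator(const IntList& list) : list_(list), index_(0) {}

    // Both methods reach into IntList's private storage.
    bool hasNext() const { return index_ < list_.items_.size(); }
    int next() { return list_.items_[index_++]; }

private:
    const IntList& list_;
    std::size_t index_;
};

// Client code can walk the list through the iterator,
// but writing "list.items_" here would not compile.
int sum(const IntList& list) {
    int total = 0;
    for (IntListIterator it(list); it.hasNext();) {
        total += it.next();
    }
    return total;
}
```

The access is granted class by class, so it doesn't matter which directory or namespace anything lives in.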

Also, C++ allows a class to declare a visibility modifier on its parent, i.e.

class MyIntList : private list<int>
{
    void push();
    void pop();
    // ...
};

This makes even the public and protected members of the parent list<int> accessible only from within MyIntList functions. That way, if MyIntList makes assumptions in its use of the parent, it can ensure those assumptions won’t be sabotaged by an outside caller.
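Here's a fuller sketch of that idea, treating MyIntList as a stack. The method bodies, the int parameter on push(), top(), and the decision to re-export size() and empty() are all my assumptions for the sake of a compilable example.

```cpp
#include <list>

// A sketch of MyIntList as a stack built on a privately inherited list.
class MyIntList : private std::list<int> {
public:
    void push(int value) { push_back(value); }
    void pop() { pop_back(); }
    int top() const { return back(); }

    // Selected parent members can be re-exported one at a time.
    using std::list<int>::size;
    using std::list<int>::empty;
};

// Outside callers cannot reach the rest of the parent's interface:
//   MyIntList s;
//   s.push_front(7);   // error: push_front is inaccessible
//   s.clear();         // error: clear is inaccessible
```

Because nobody outside can call push_front() or clear(), MyIntList gets to assume that elements only ever arrive and leave at the back.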

Pass-By-Reference / Operator Overloading

In Java, primitives are passed by copy, and objects are passed by reference. That’s fairly easy to get used to, and a lot of other languages are like that.

int A = 4;
int B = 3;
A = B;

The code above does the same thing in C++ and Java. A and B represent different words in memory, and assigning B to A copies the word represented by B into A’s location. They are still different, but identical in content.

String A = "apple";
String B = "banana";
A = B;

In this code, when interpreted as Java, A and B first reference different objects, but then, after the assignment, A and B reference the same object.

The same code in C++ (assuming a class called String) is actually more similar to the previous example with ints. A and B represent different spaces in memory, one of whose contents gets copied to the other.

I find the Java way slightly inconsistent, but Python, Ruby and JavaScript are also that way, so fine, I guess.
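The C++ side of that comparison is easy to check for yourself. In this sketch, std::string stands in for the hypothetical String class, and the two function names are made up for illustration.

```cpp
#include <string>

// Assignment copies contents: A and B stay separate objects.
std::string assignThenChangeSource() {
    std::string A = "apple";
    std::string B = "banana";
    A = B;      // copies B's contents into A's own storage
    B += "s";   // changing B afterwards...
    return A;   // ...leaves A as "banana"
}

// Sharing has to be asked for explicitly, here with a reference.
std::string aliasThenChangeSource() {
    std::string B = "banana";
    std::string& A = B;  // A is another name for B, closer to how Java
                         // object variables behave
    B += "s";
    return A;            // "bananas"
}
```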

The place where I really miss the C++ scheme is in something like a mathematical vector class.

class Vector
{
    double x;
    double y;
    double z;
    // ...
};

Combined with operator overloading (another C++ feature Java doesn’t have) you can make code that looks like this:

Vector midPoint(Vector A, Vector B)
{
    return 0.5 * (A + B);
}

The equivalent function in Java would look something like this:

public Vector midPoint(Vector A, Vector B)
{
    return A.add(B).multiply(0.5);
}

Objects pass by reference, so the Java midPoint has to call new twice. Am I the only one who finds that wasteful? Will the temporary Vector that Java makes when this function executes get deallocated at the end of the function? Or will it get deallocated at some arbitrary point in the future when garbage collection runs? I'm not sure.
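On the C++ side, at least, the answer is simple. Here's a sketch of the overloads the C++ midPoint relies on. Using a struct with public members and these particular operator signatures are my assumptions; the point is that the temporary is an ordinary value and nothing calls new.

```cpp
// A sketch of the overloads midPoint() needs.
struct Vector {
    double x;
    double y;
    double z;
};

// Component-wise sum, returned by value.
Vector operator+(const Vector& a, const Vector& b) {
    return Vector{a.x + b.x, a.y + b.y, a.z + b.z};
}

// Scalar on the left, so that "0.5 * v" compiles.
Vector operator*(double s, const Vector& v) {
    return Vector{s * v.x, s * v.y, s * v.z};
}

// Same body as in the text. The temporary produced by (A + B) has
// automatic storage and is gone by the time the function returns.
Vector midPoint(Vector A, Vector B) {
    return 0.5 * (A + B);
}
```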

I have more to say, but I have to get to work.