Perl and Bad Consciences

Five years ago, I took a perl class, and I'm still angry about it.

This was the last class I took in an official, letter-graded, collegiate context. I had finished my academic career three years prior. I signed up mainly so that a dear friend of mine wouldn't have to take the class alone. And I just want to say (in case she is reading this) none of this is her fault. I had every intention of doing it with good grace. It crossed my mind that I might learn something, or at least I might be able to construct an educated basis for my general disdain for perl.

Around week 7, the instructor (to whom I will refer as Steve) introduced an extra credit programming contest where we could presumably win prizes by solving a challenge problem about something called a sum-product number.

I'm now going to explain the challenge problem, but you should understand: this is a trap. Don't try to do this problem. Remember that scene in Portal 2 where GLaDOS thinks she can defeat Wheatley by giving him a paradox, but it doesn't work because Wheatley is too thick to appreciate it? Well, this isn't like that, because you are not Wheatley. You're intelligent and self-aware and care about math and computer science or you wouldn't have read this far. You'll get sucked into this, and you won't be able to get out. I'm warning you.

A sum-product number (in base-10) is an integer that is equal to the sum of its digits times the product of its digits. 135 is a sum-product number because:

135 = (1 + 3 + 5) × (1 × 3 × 5)

There are only four sum-product numbers in base-10, and they are:

0, 1, 135, and 144

The proof that these are the only four (attributed to David Wilson) is computer-assisted. A formal argument can be made that there are no sum-product numbers past a certain point. A computer program verifies that up to that point, there are only the four listed above.
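For illustration, here is a minimal Python sketch of the "computer program" half of that argument. It is not the program from the proof: the names are made up for this example, and LIMIT is an arbitrary cutoff, not the bound the formal argument establishes.

```python
def digit_list(n):
    """Return the base-10 digits of a non-negative integer."""
    return [int(ch) for ch in str(n)]


def is_sum_product(n):
    """True if n equals (sum of its digits) * (product of its digits)."""
    digits = digit_list(n)
    total = sum(digits)
    product = 1
    for d in digits:
        product *= d
    return n == total * product


# LIMIT is an arbitrary cutoff for this sketch, not the bound from the proof.
LIMIT = 100_000
found = [n for n in range(LIMIT) if is_sum_product(n)]
print(found)  # [0, 1, 135, 144]
```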

Computer-assisted proofs are cool, and programming contests are also cool, but they don't mix very well. Our assignment for the contest was to write a perl script which (as efficiently as possible) printed all the sum-product numbers from 1 to 10,000,000, one to a line, each with the text: " is a sum-product number!!"

This is the code I wrote to get started:

#!/usr/local/bin/perl

use integer;

my ($i, $t, $s, $p, $d);

for($i = 1; $i < 10000000; $i++) {
    $t = $i;
    $s = 0;
    $p = 1;

    do {
        $d = $t % 10;
        $p *= $d;
        ($d != 0) or next;
        $s += $d;
    } while($t/=10);


    if( ($p != 0) and $s * $p == $i ) {
        print $i, " is a sum-product number!!\n";
    }
}

Here's the issue. Normal programming contests ask for a program that takes input and computes something which depends on that input. A submitted program gets judged by running it with input or inputs which are kept secret. Because the input is kept secret from you, you must write a correct algorithm. Your logic cannot be wrong or you would risk getting the wrong answer and therefore losing the contest.

But the challenge from this class didn't take an input; it simply asked for the result of one particular query, so it just had one correct output, this:

1 is a sum-product number!!
135 is a sum-product number!!
144 is a sum-product number!!

Presumably, the fastest-executing entry wins, so one is tempted at first to enter this script:

#!/usr/local/bin/perl

print  <<'X';
1 is a sum-product number!!
135 is a sum-product number!!
144 is a sum-product number!!
X

...which prints the correct output without doing all that pesky, time-consuming computation.

Obviously this is not how the contest is intended to be won, but why not? I guess because it goes against the spirit of the problem which is to replicate the work of the original computer-assisted proof.

So then, it seems like for an optimization to be valid, it would have to be accompanied by an argument that the work saved was non-essential to the computer-assisted proof. For instance, if you can argue that none of the numbers in some large range have the sum-product property, then you can rewrite the loop to skip over that range. But what if your argument is wrong? Then you cull away a bunch of work based on your wrong logic, get the right answer and win the contest? That's not fair.
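To make that concrete, here is one culling argument that is easy to check, sketched in Python rather than perl. Any number containing a 0 digit has a digit product of 0, so its sum times its product is 0, which no positive number equals. That justifies searching only numbers whose digits are all 1 through 9. The function name and the max_digits parameter are inventions for this sketch; the contest range would correspond to 7 digits, and 4 is used here only to keep the run short.

```python
from itertools import product


def zero_free_sum_product_numbers(max_digits):
    """Search only numbers whose digits are all 1-9, up to max_digits digits."""
    hits = []
    for length in range(1, max_digits + 1):
        # Every tuple here is a number with no zero digit.
        for digits in product(range(1, 10), repeat=length):
            value = int("".join(map(str, digits)))
            digit_sum = sum(digits)
            digit_product = 1
            for d in digits:
                digit_product *= d
            if value == digit_sum * digit_product:
                hits.append(value)
    return hits


print(zero_free_sum_product_numbers(4))  # [1, 135, 144]
```

Whether a contest judge would accept the argument is, of course, exactly the problem.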

Thinking along these lines led me into a torpor. I had already gotten kinda sick of the class.

Today, I dug through some old emails and found the rant I wrote to the other students in my study group. Here it is, exactly as I wrote it except with the instructor's name replaced.

I pissed away the whole day ruminating about perl and struggling with the profound powerlessness I feel in the face of the pedagogical catastrophe that is this class. This class has a way of sending me into a maelstrom of second-guessing and self-doubt. I try to work on the homework, but I wind up pacing and obsessing, pouring over details and swishing thoughts around in my mind, trying to understand perl, or worse, trying to understand Steve's motivation. It starts out logical, but it winds up a cognitive slurry, and by the end, I'm sulking in the tub, pouring back over the day, trying to figure out when the transition to madness occurred. Have you looked at the extra credit assignment? May the devil take this goddamn piece of shit. If you do look, I promise, you'll discover something about yourself. You'll think you're solving a puzzle, but you're not, what you're doing is allowing the inception of thought-poison. Soon you'll be looking for loopholes in the rules, and wondering if the solution you've come up with is cheating or not. You'll contemplate doing it anyway, just to have a fighting chance against the idiot who didn't even think about whether it was cheating. That's not fun! Coding contests are supposed to be fun! They are, in fact, one of my favorite things. I love the art of creating cunning, corner-cutting code that does something brilliant from within a constrained framework. It's the white wine of computer engineering distilled to make cognac. It's the thrill of moving the world with Archimedes' lever. But instead of relishing the contest, I'm playing prisoner's dilemma and smudging my personal ethics. Up until now, I've found Steve's bad pedagogy tolerable. He designs each homework around calendar-related application instead of selecting an application that's well suited to the current topic. In my opinion, he alienates people on the discussion forum by answering every question he can by reference. 
He devotes an inappropriate portion of the text he writes to warning the students about how challenging the class is. It's defensive and wastes time. But this... this is psychopathy. He's taken away Christmas.

I got an A in the class.

Let's Talk About Closures

The way I first understood it, a closure is a function that returns another function. A lot of languages can have that, but I seem to encounter closures mostly in JavaScript. Over the past two months I have been making a lot of closures because of superstitions I've developed about needing them. After torturing myself writing closures in places I probably didn't have to, I decided to do some experiments to try to really understand the problem they solve. I'm writing this article as a way to record my results, so it's mostly for my own education. But also, I was trying to explain this stuff to my friends in the car, and I got it slightly wrong, so I wanted to clear that up.

First off, we're going to need to run some JavaScript snippets and see what happens, so we need some kind of logging. It so happens that I'll be running these experiments using the utility CodeRunner, which is a nice little app for running test scripts. However, CodeRunner doesn't route console.log to a place where you can see it. So, instead of using console.log, I'll use a line of jQuery to append text onto the HTML body.

function log(message)
{
    $("body").append($("<div>").text(message));
}

Now that we have a log function, we can run some tests! First, let's try something that definitely works the way we expect. We're going to count from 0 to 4 and log the numbers.

for( var i = 0; i < 5; i++ )
{
    log(i);
}

And the screen displays a nice, pleasing:

0
1
2
3
4

Great. Now, without changing the behavior, let's add a layer of complexity by wrapping the log statement in a function, and then calling that function.

for( var i = 0; i < 5; i++ )
{
    function foo()
    {
        log(i);
    }
    
    foo();
}

Again, the output:

0
1
2
3
4

Groovy.

Now's when the trouble starts. Each iteration, we're going to push foo onto a list foos. Then we'll call each function in the list in a separate loop.

foos = [];

for( var i = 0; i < 5; i++ )
{
    function foo()
    {
        log(i);
    }
    
    foos.push(foo);
}

for( var j = 0; j < 5; j++ )
{
    foos[j]();
}

Now the output is:

5
5
5
5
5

HUH?

How could it print "5"? The loop never made it to 5!

I found this really confusing at first. I expected the output 0 through 4, like before. My logic was: because foo is declared inside the loop, each iteration makes a different foo that prints the value of i for that iteration. But that's not how it works. Instead, each function in foos actually does the exact same thing; it prints i. There's only one i, and at the time of the loop's exiting, i is 5, so it prints 5.

Side-note: I used j for the iterator in the second loop. If you use i instead, you get 0 through 4 again as the output, but that's a deceptive accident: the second loop is resetting the one shared i just before each call.

At first, I thought this might be a weird idiosyncrasy of JavaScript, but Python (my beloved Python) does essentially the same thing. This code...

foos = []
for i in range(0, 5):
    def foo():
        print(i)
    foos.append(foo)

for f in foos:
    f()

...prints out:

4
4
4
4
4

It's 4's, not 5's, because Python's looping conventions are different, but the principle is the same.
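The "deceptive accident" from the side-note shows up in Python too. Here is a small sketch of it; the build_loggers wrapper and the choice to return values instead of printing them are just conveniences for this example.

```python
def build_loggers():
    """Build five functions that all read the same loop variable i."""
    loggers = []
    for i in range(5):
        def foo():
            return i
        loggers.append(foo)

    # Called with a different loop variable, every function sees the final i.
    separate = [loggers[j]() for j in range(5)]

    # Reusing i rebinds the one shared variable just before each call,
    # so the output only looks like each function remembered its own value.
    reused = []
    for i in range(5):
        reused.append(loggers[i]())

    return separate, reused


separate, reused = build_loggers()
print(separate)  # [4, 4, 4, 4, 4]
print(reused)    # [0, 1, 2, 3, 4]
```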

So, how do you make a list of 5 functions that print the numbers 0 through 4? Answer: you use a closure.

function make_number_logger(n)
{
    function foo()
    {
        log(n);
    }
    
    return foo;
}

foos = [];

for( var i = 0; i < 5; i++ )
{
    foos.push( make_number_logger(i) );
}

for( var j = 0; j < 5; j++ )
{
    foos[j]();
}

And now we're back to the output we're used to:

0
1
2
3
4

Everything is fine. Except... What? Why does that make a difference? Before, foo logged i at whatever value i was set to last. Now we have a function that logs n. Why doesn't it log n at whatever value n was set to last? What's the difference? Well, the difference is that n is the argument of make_number_logger and whenever that function gets called, a new, distinct n is created on a stack-frame. It's not the same n every time.

Normally, when a function returns, its stack-frame is disposed. But here, we're returning foo, and foo knows about n, so the language has to hold on to the stack-frame for that reference to still work. That's the key.
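The same fix carries over to Python. This is a minimal sketch, assuming a plain list named output stands in for the log function; each call to make_number_logger gets its own n, so each returned function remembers a different value.

```python
def make_number_logger(n, output):
    """Return a function that records the n it was created with."""
    def foo():
        output.append(n)
    return foo


output = []  # stands in for the log function in this sketch
foos = [make_number_logger(i, output) for i in range(5)]

for f in foos:
    f()

print(output)  # [0, 1, 2, 3, 4]
```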

This is where I have to apologize to my friends for explaining this wrong. What makes this work is not some magic about functions returning functions. To see what I mean, here are a couple more experiments.

function make_number_logger()
{
    function foo()
    {
        log(n);
    }
    
    return foo;
}

var n = 9;
var f = make_number_logger();

n = 17;
f();

Output:

17

It doesn't print "9" because n is global, so the line n=17; supplants the 9 with a 17. It doesn't matter that foo was returned by make_number_logger.

Also, suppose we don't return a function from make_number_logger, and instead assign it to a global variable?

function make_number_logger(n)  
{
    function foo()
    {
        log(n);
    }

    g_foo = foo;
}

make_number_logger(9);

n = 17;
g_foo();

Output:

9

This time, n is an argument again, so inside make_number_logger, n is a new n, distinct from the global one that gets the assignment to 17. JavaScript holds on to it when it does g_foo = foo;.

So, my conclusion is that I've been writing a bunch of closures that I don't need to, and also getting the explanation wrong for a while. Hopefully, this article redeems me.

Irony

I have a theory.

The smarter you are, the worse code you write.

When I was taking Computer Science classes in college in the early 2000's, I was told that the landscape of development tools was ever-changing, and rather than try to predict what cutting-edge technologies would still be relevant in four years, the curriculum designers focused on teaching general techniques for managing the complexity of a software project.

That phrase stuck with me: "managing the complexity."

When I started my first full-time software engineering job, very dispiriting things would happen to me. I would submit a change for review, and then sit there and watch as the reviewer unfolded all manner of subtleties I had overlooked about how my change was going to break a bunch of shit. In the end, the reviewer wound up effectively writing a lot of my code, slowly, via a series of revisions driven by objections in review.

At that point in time, I had already been making software more than half of my life. I was a newbie on a team with industry professionals, but still, my contribution to the code seemed disproportionately low. Engineers that had twice as many years of experience as me were dominating the code base, writing hundreds of times more lines. I wound up quitting with the feeling that I couldn't cut it.

Since then I've had other jobs, but stuff like that never seems to go away. Here's another example: sometimes I'll be implementing a feature by adapting some existing code that does part of what I need. Another engineer at the company will assure me that the existing code "works fine", so I think the process of adapting it to my purpose will be straightforward. But when I commence... I discover a bunch of details making the task an unexpected slog. Again, the obstacles seem to be in the sea of caveats external to the code I'm editing.

Maybe when other people encounter things like this they dismiss it as simply part of the learning process. I tend to see it as a huge problem. And I used to think the problem was with me. I used to think I just didn't have the cognitive capacity to see all the details. To some extent this is certainly true; I've learned a lot since I started, and I've gotten better at thinking about code. But over the years, stuff like this continues to happen, and I've formed a new theory. I now understand that engineers who play an authoritative role on a project aren't necessarily better engineers. They don't write better code, they just know the codebase better. They seem like better engineers because you have to keep asking them how the code works. But that's actually an indication of why they suck as engineers: they wrote a bunch of code you can't comprehend.

I'm not saying all tech leads got where they are by luck, I'm sure they're really smart, too. But there's a problem with being really smart; the smarter you are, the worse code you write. You see, smarter people are better at managing the complexity of software mentally, and they don't need the code to be organized and readable.

Take function length for instance. There are two advantages to writing functions of a manageable size. The first is you can give those functions descriptive names, which makes the code more comprehensible. The second advantage is subtler. It's a well-known fact that a great proportion of bugs arise from the programmer failing to realize all the states code can execute in. Code with shorter functions tends to be written in a more functional style, which means less state, which means fewer bugs.

But the thing is: only stupid people need that. Smart people don't mind long functions because they can mentally manage all the possible variable states, nested control structures and crap.

As a stupid person, you'll probably get shut down if you try to factor a long function into shorter ones, because you'll probably break it. It is very difficult to reorganize code without changing the behavior in some way. Then some smart person will tell you to put it back the way it was because it "worked" before.

Fellow stupid people, we need to rise above that. It is possible to be smart and organized. Stand up for organization. If you made an organizational change and broke something, argue that the code was unmaintainable in the first place. I'm pulling for you. We're all in this together.

Comments

Let's talk about comments.

There are a lot of varying philosophies about comments and when and how often to write them. Some people think comments are good, some people think comments don't belong in code at all. I don't think I'm telling anybody anything they don't already know when I say: the right approach is somewhere in the middle.

But that's not fun to write about. I present to you:

Top 10 Problematic Comments

I started writing this article with a serious tone. I have now abandoned that. Most comments are reprehensible. These comments are the worst. They'll rot your code, infect your repository, and keep your company from competing in the marketplace. If you see comments like this in your code, delete them immediately, then make a pull request. And then cite this article when the reviewer complains.

10. The Rotten Comment

This is a comment that persists despite its original purpose having been lost. The code around it changes, making it obsolete. Sometimes these comments are obvious.

// Run the status check 5 times just to be safe:
for(int i = 0; i < 16; i++)
    runStatusCheck();

And sometimes it's not as obvious.

// It is very important to call these functions in this order:
initResources();
initNetworkComm();

Is it? Maybe that used to be important, but then somebody repaired initResources() and initNetworkComm() so that they no longer irrationally depend on each other. Who knows? Only the code knows.

9. The Commented-Out Old Method

/*
// We don't do it this way any more
if( argv.size() < 3 )
{
    LOG( "Not enough arguments given.  See docs." );
    exit(0);
}
*/
checkArgumentsFailGracefully(argv);

"We don't do it this way any more." Thanks, asshole. What kind of weird cowardice is this? Just delete it; revision control will keep track of the way you used to do things.

8. The Code-Parroting Comment

Every so often I see code like this:

// This function initializes the AES event handler
void initAESEventHandler()
{
    // Allocates AES handler with a size of 20
    mHndlr = allocateAESHandler(20);

    // Set the ticks per second of the handler to 60
    setAESHandlerParameter( mHndlr, AES_TICKS_PER_SECOND, 60);

    // Set the sample-rate of the handler to 44k.
    setAESHandlerParameter( mHndlr, AES_SAMPLE_RATE, 44100);
}

Just imagine for a second, that you don't know what AES stands for. Obviously these comments don't help you. In fact, they don't help anybody, and yet, I see stuff like this all the time: incomprehensible code surrounded by equally incomprehensible comments.

I can only imagine some jerk, in a fit of mentorship, said

At this company, we comment our code

And a young, impressionable engineer thought they needed to write all this crap to fit in. Thanks for nothing.

The whole idea of commenting every line dates back to a time when languages were inherently hard to read. We don't live in that world now. Languages are better; they've evolved to a nice middle ground where they are readable by computers and humans. Get with the times.

7. The Naively Optimistic Comment

This is a comment that says that the code works, when it actually does not. Suppose a server request is meant to get sent after a 5-second delay, but instead it's getting sent immediately. You trace the bug to this code:

// After 5 seconds...
dispatch_after(
    dispatch_time(DISPATCH_TIME_NOW, 5e6),
    dispatch_get_main_queue(), ^(void)
{
    // ... send a request to the server.
    [WebClient sendRequest];
});

Now, unless you know that dispatch_time takes a duration measured in nanoseconds (so 5e6 is 5 milliseconds, not 5 seconds), you will probably be very distracted by a comment that assumes the code does what it's supposed to.
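The arithmetic is quick to check. This is only a sketch of the unit conversion, written in Python rather than Objective-C. NSEC_PER_SEC mirrors the constant of the same name in Apple's headers, and seconds_to_nanoseconds is a hypothetical helper for this example.

```python
NSEC_PER_SEC = 1_000_000_000  # nanoseconds in one second


def seconds_to_nanoseconds(seconds):
    """Convert a duration in seconds to whole nanoseconds."""
    return int(seconds * NSEC_PER_SEC)


buggy_delay_ns = int(5e6)                     # what the snippet passes
intended_delay_ns = seconds_to_nanoseconds(5) # what the comment promises

print(buggy_delay_ns / NSEC_PER_SEC)  # 0.005 (seconds, i.e. 5 milliseconds)
print(intended_delay_ns)              # 5000000000
```

Naming the unit in the code makes the "After 5 seconds..." comment redundant, and unlike the comment it can't be wrong.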

6. The Standard Header

I open a file and the first thing I see, at the top of the file is this horseshit:

/*****************************************************/
/*****************************************************/

/*** Everhard Corporation ****************************/
/*** Standard Source Header Comment ******************/
/*** This file is called utils.c *********************/
/*** Written By Jeff Hanson **************************/
/*** September 20, 2009 ******************************/

/*****************************************************/
/*****************************************************/

Who the fuck is Jeff Hanson? Does he even work here anymore?

We all use revision control now. What is the point of putting creator and created-at date at the top of a file? All that information and more is recorded in revision control.

Sometimes these goddamn things take up a window's worth of lines, and they're completely irrelevant. Seriously. What bug do they fix?

If there's a name at the top of the file, I like to immediately email that person whenever something even-a-little-bit goes wrong with the code in that file. They could be a tech lead on another team, at another company, I don't care. "It says here, you wrote this code..."

If you are a potential employer reading my blog to learn something about my personality as an engineer before hiring me on, you should know that I'm going to write a script to remove all of these from your codebase.

5. The File Name at the Top of the File

At the top of a file called ZucchiniSlicer.cpp you'll see:

//////////////////////////////
// File: ZucchiniSlicer.cpp //
//////////////////////////////

And then, later

class ZucchiniSlicer {
    // ...
};

This is just stupid. It's so stupid that it deserves special mention even though it's usually part of a bigger problematic comment: The Standard Header.

I've never seen a text editor in my life that didn't display the name of the file prominently at the top of the window. We don't need a comment at the top of the file. Who does that help? Also, every so often somebody will make a new file by duplicating an existing file, forget to change the comment, and then it's wrong.

4. The Content-Free Warning

Maybe you've got some Android Java code designed to detect if a particular class is present in the game engine you're using.

// DO NOT remove this temporary variable.
String className = "com.engine.Object";
Class<?> theClass = Class.forName(className);

What! Why?!?!! It's only used in that one place. I want so badly to remove it, and you didn't give a reason. How about:

// Putting the class name in a temporary variable
// works around a warning in the obfuscator.
String className = "com.engine.Object";
Class<?> theClass = Class.forName(className);

At least maybe I can figure out what that's about.

3. The TODO

def treat_files(pathlist):
    for path in pathlist:
        f = open(path, 'r')
        s = f.read()
        f.close()
        if not s.endswith('\n'):
            f = open(path, 'w')
            f.write(s + '\n') ## TODO: use append-mode instead.
            f.close()

Fuck you. I've never gone back and done a TODO in my life, and neither have you, you prick. Stop showing me all the places you failed to follow through on some fleeting idea for an optimization.

2. The Code Snippet Label

Okay, this one is actually important, because it's an anti-pattern. A function gets long and complex, and instead of dividing it further into functions, you label each block inside with a comment that says what that block does.

void Game::initEverything()
{
    // First, set up the graphics engine
    glBlabityBlah();
    glSomeMore();
    glEtCetera();

    // Then, prepare the audio stuff.
    alCreateWhatever();
    alMakeChannelSomething();
    alConnectPieces();

    // Then put up the splashscreen for the game and
    // play a sound effect.
    glCreateSomeGeometry();
    glDrawStuff();
    alSproing();
}

FOR CHRISSAKE!

void Game::initEverything()
{
    initGraphics();
    initSound();
    initFirstScene();
}

void Game::initGraphics()
{
    glBlabityBlah();
    glSomeMore();
    glEtCetera();
}

void Game::initSound()
{
    alCreateWhatever();
    alMakeChannelSomething();
    alConnectPieces();
}

void Game::initFirstScene()
{
    glCreateSomeGeometry();
    glDrawStuff();
    alSproing();
}

Which brings me to the final category of problematic comments.

1. Any Comment Explaining How Code Works

If you find yourself writing comments that explain the code around them, it probably means your code doesn't connect well enough to your mental model. Instead of trying to bridge the gap with words, try reorganizing your code so that it reflects how you naturally think.

I'm not saying all comments are bad. There are rare circumstances where a comment is necessary, like the Android Java example above. But the distinction there is: the code does something weird to work around a systemic issue. Mostly, I think you'll find you can convey more meaning by organizing the code and choosing sensible variable and function names.

I'm for documenting at the public function level. If other people's code calls a function you write, write a description of what that function does, not how it works. That's what I think.

Redundant Code Gets a Bad Rap (Part 2)

Since you were a young child, people have probably been telling you that redundant code is a terrible scourge and should be eliminated. But it’s not that simple. Sometimes factoring redundant code introduces other problems by stealth.

Once again it’s time for a parable. Eddie is an engineer at a hypothetical company. While perusing some python scripts in the company code base...

copy_files_and_zip.py
upload_zipped_package.py
upload_all_things.py

… Eddie discovers that copy_files_and_zip.py and upload_zipped_package.py have identical code blocks that zip a collection of files. Eddie dutifully factors that code into a new function:

def zip_stuff():
    blah blah blah
    blah zip blah
    blah blah blah

Also, upload_zipped_package.py and upload_all_things.py have functionality in common, so Eddie factors it into this function:

def upload_zipfile():
    blah blah blah
    blah upload blah
    blah blah blah

He puts the common functions in a new file called utilities.py which the other files can import.

def zip_stuff():
    blah blah blah
    blah zip blah
    blah blah blah

def upload_zipfile():
    blah blah blah
    blah upload blah
    blah blah blah

At first, this doesn’t work, and then Eddie realizes that zip_stuff() and upload_zipfile() use modules, so he needs to import those modules in utilities.py.

import zipfile
import boto

def zip_stuff():
    blah blah blah
    blah zip blah
    blah blah blah

def upload_zipfile():
    blah blah blah
    blah upload blah
    blah blah blah

Great.

Except now all three scripts import utilities which in turn imports boto, a third-party module that doesn’t come with standard Python installs.

And guess what, not everybody at the company uses all three of these scripts on a regular basis. Franka, for instance, only ever runs copy_files_and_zip.py. She doesn’t have boto on her computer. So, several days later, when she grabs the latest version to get some other feature, it doesn't work.

When Franka asserts that the script no longer works, there's initially a lot of confusion about why and a lot of back and forth about how "it works on MY computer" et cetera. Finally, Eddie realizes what has happened, and he fixes it... by having Franka install boto.

Franka can run the script again, but copy_files_and_zip.py still imports boto without actually using it. Fixing this inconsistency would mean separating the two utility functions into separate files (which seems to spread the code unnecessarily thin) or doing this…

def upload_zipfile():
    import boto
    blah blah blah
    blah upload blah
    blah blah blah

…which is kinda weird. I mean, imports go at the top of the file, right?

So, Eddie leaves it the way it is, with a dependence in the code that doesn’t reflect reality.
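For what it's worth, the function-level import does work in Python, weird or not. Here is a minimal sketch of utilities.py written that way. The function signatures, the error message, and the arcname choice are assumptions for this example, and the actual upload is left elided just as it is above.

```python
import os
import zipfile  # standard library, so every script can import this safely


def zip_stuff(paths, archive_path):
    """Zip the given files into archive_path (hypothetical signature)."""
    with zipfile.ZipFile(archive_path, "w") as archive:
        for path in paths:
            archive.write(path, arcname=os.path.basename(path))


def upload_zipfile(archive_path):
    """Upload an archive; boto is imported only when this is called."""
    try:
        import boto  # noqa: F401  third-party, needed only for uploads
    except ImportError as error:
        raise RuntimeError("upload_zipfile needs boto installed") from error
    # The real upload call is elided here, just as it is in the article.
    raise NotImplementedError("upload logic not shown")
```

With this layout, Franka's copy_files_and_zip.py can import utilities and call zip_stuff without boto on her machine. Only the scripts that actually upload ever hit the import.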

Now, maybe this is no big deal. Everybody needs to install boto to use the scripts, now. So what? You have to install things to run the stuff you need all the time.

Well, what if it were something other than boto? What if it were a library that didn’t exist on Windows, and then nobody with Windows could run any of the scripts even if all they do is copy and zip files?!?!

And who, by the way, has it helped to factor these functions into a common file, exactly? It didn’t fix any bugs, it didn’t add any features. It made the total volume of code a tiny bit smaller.

Maybe it’s time we started telling our children that eliminating redundancy is fine if you’re really careful.