When perl is too fast

The idea for this post has been brewing for a few months, but I was inspired by this post from David MacIver to get on it.

I like Perl. From a trendiness standpoint this falls somewhere between admitting that I like peanut-butter-and-pickle sandwiches and admitting that I like clubbing baby seals. I understand and can appreciate the common complaints: the language is too big, its syntax is baroque, it is write-only, TMTOWTDI is a bug rather than a feature. But I want to talk about something else that has been irking me.

I am a code quality freak. I like to write and work on good, readable, maintainable code. When I find bad code, I fix it. And I don't believe in blaming my tools, so I don't rail against Perl syntax -- I just write the best, clearest, simplest code I can.

I am also a speed freak. Micro-optimizing, benchmarking, profiling, shaving the milliseconds off things that already run fast, these are some of my favorite things to do. And I am more often surprised by how fast perl is than by how slow it can be.

Most of the time, these two inner freaks aren't at odds, and I'm able to do my job (make a moderately-sized, heavily-loaded piece of Perl software run really bloody fast) without compromising my quality standards. But several times recently, I have bumped into problems where Perl has pushed me to seriously compromise on code quality in order to get the performance I needed. The problem isn't that perl is slow. Not at all! On the contrary, some of the bits of perl are too damn fast relative to other approaches -- so fast that performance critical Perl code has effectively no choice. And in some cases they are the worst bits you can imagine, from the point of view of writing clean, readable code.

Let me jump right away into an example. Consider a trivial perl function that takes a single scalar argument and returns its square. I'd bet that anyone with any imperative programming experience can read it (though they might wonder about the 'shift'):

sub square {
    my $n = shift;
    return $n * $n;

There is nothing wrong with this code, and in any sane language the only way to improve its performance would be to inline it. But in Perl you can make it roughly 25% faster by writing it thus:

sub square {
    return $_[0] * $_[0];

The trick is using the scalar directly off the function's parameter stack (no copy is made). The code is brutally bad (and it gets way, way worse if there are multiple arguments or the function is more than 2 lines long) but from a performance standpoint it is "best practice" for critical functions in Perl. And that is actually a well enough known optimization that some coders apply it automatically, and it is moderately common in CPAN code.

Knuth's "Premature optimization is the root of all evil" is true in any language... but Perl isn't helping here.

Onto a more complex example. The codebase I'm working in is absolutely littered with blocks like this:

my $stuff = compute_50_megs_of_stuff();
if ($debug_flags{'stuff'}) {
    print Data::Dumper::Dumper($stuff);

There are a lot of paths to improving this code, but we really wanted a couple of "meta" features -- a flag to enable all other flags, and a flag to automatically timestamp every debugging print -- which pointed to getting all of the debug prints to go through a single point. Perl has a simple mechanism for adding new syntax, so I decided what I wanted was that the above code should be rewritten as:

my $stuff = compute_50_megs_of_stuff();
debug_stuff { Data::Dumper::Dumper($stuff) }

(Aside: not debug_stuff(Data::Dumper::Dumper($stuff)). That would generate a 50-megabyte string representation of $stuff, which would then be discarded 99.9999% of the time (i.e. when not debugging)).

So debug_stuff would look like this:

sub debug_stuff (&) {
    if ($debug_flags{'stuff'} or $debug_flags{'all'}) {
        my $string_generator = shift;
        my $eol = "\n";
        if ($debug_flags{'timestamp'}) {
            $eol = "[[[" . time() . "]]]\n";
        print $string_generator->() . $eol;

I determined through experimentation that we could afford the overhead of a function call for each debugging statement (there are a *lot* of them, some in very tight loops). Even that wasn't an easy decision, which is sadly telling. Function call overhead is the reason I keep bumping into "mega methods": where all possible inlining has been done, at the expense of all maintainability and reusability of the code...

I digress. Once implemented, system performance was unacceptable, and I had to scramble to figure out why. It turned out that it was not because of the function overhead, but because of the second-class syntax extension. Specifically, Perl syntax extension is sugar for higher-order functions with anonymous function arguments. These anonymous functions may be closures, which (it turns out) impose significant overhead. In a trivial example, an inline 'if' is almost 4x faster than an extension if the block isn't a closure (that's the function call overhead) but over 25x faster if the block is a closure. Again: in performance-critical code, where you're scrambling for every millisecond, you "can't" use domain-specific syntax (or any higher-order abstraction which relies on closures, for that matter) which is really unfortunate from the standpoint of quality.

I would rather see a slower perl, if that's what it took for me to get away with writing higher quality code. If extension syntax had the same performance characteristics as built-in syntax, I would have been able to get away with that. If creating and passing anonymous subroutines were the same regardless of whether they were closures, I'd be able to write higher-order code whenever appropriate...

Of course, I probably wouldn't be working in Perl at all. It wouldn't be fast enough.