2010-08-26

When to Forget

It is well known that we programmers fill our brains with a lot of complex details while we're working on a problem in code. Some time later, due to our sadly finite brains, we go through the process of forgetting most of these details. Paul Graham, in his essay "Holding a Program in One's Head", summarizes this process nicely: "You can start to treat parts as black boxes once you feel confident you've fully explored them."

This forgetting is important! Software is complicated, and the details of even a modest body of code will easily outstrip any programmer's mental capacity. We must forget things! But when? When is it safe to forget the details? When is it reasonable to feel confident you've fully explored them?

In my career I've worked with programmers who coped with this problem in various ways. Some were strictly FIFO (with various queue sizes, nudge-nudge wink-wink say no more). Some seemed to forget as soon as they checked the code in, others after it was merged to trunk, or tagged as a release candidate, or passed testing.

I think this question deserves some attention. When a defect is found (or a change is requested -- I'll treat these as the same from now on), a programmer with the implementation details in mind will be able to react faster and with more confidence than one who has to re-learn them. It's something like the difference between reading a page which is in RAM vs. reading a page which has been swapped out to disk: a couple of orders of magnitude more work.

I'll go ahead and be opinionated here: the right time to start forgetting about code is after it's in production. There is a qualitative difference between the production release and everything that comes before it: the release is the only thing that actually matters. The real-world behavior of the software is the software, from the point of view of everyone who isn't a programmer. Nobody else cares that it worked in your sandbox!

Then, too, there's this rough graph I completely made up:


Briefly: defects in code are discovered throughout the software development process. But shortly after release (for some fairly variable definition of "shortly") there is a very sharp drop-off in defects found in new code. Until that sharp drop-off, a good programmer should have all of the pertinent details in mind, so that coming up with a fix doesn't involve a lot of head-scratching "what, wait, how does this work again?"

It's worth talking about that hand-wavy "shortly" for a sec. I've spent my entire career writing software that is deployed to internal servers under crushing load; for me, shortly has always been less than 24 hours. I know that other types of software and other types of deployments have very different values of "shortly". I don't think that matters much -- the important point is that long-standing issues are vastly, vastly less common than ones which are discovered when the software first leaves the nest.

Flipping this whole thought around: you've probably seen a graph like this, showing how the cost to fix a defect rises drastically the longer the defect resides in the system:


This is typically (and reasonably) used to argue for better and earlier testing. But I've got a hunch, which I can only substantiate with my own limited experience: this graph is a lot flatter and less frightening in organizations which do rapid or continuous deployments. Partly because such organizations must (by definition) have scrubbed most of the time and effort from the deployment process. But also because the programmers don't have to re-learn all of the details of code written weeks or months (or years, yagh!) earlier.

Yes, in a perfect universe, we'd have all of the details of our software in mind forever. And learning the intricacies of other people's code would be easy. And defects would always be caught in testing and never in production. Here in reality: strive to keep the details in mind until after the code is released.

And if that's hard or impossible to do with your code, ask yourself: why? In my next post, I will go over a few specific bad practices I've encountered that work against this effort, and a couple of concrete things that can help.