2007-10-11

Code Review: Constants.pm

(cohesion, scope, laziness)



The Code

Today I took home a chunk of code to review. It's a chunk of code which I've run across, in one form or another, for 10 years now, and it drives me around the bend. In Perl: Constants.pm. In Java: Constants.java. In C/C++: constants.h.

Let's have a look at the perl module KitchenSink/Constants.pm:
$ ls -ln Constants.pm
-rw-r--r--   1 501  501  41387 Sep 17 2007 Constants.pm
First: holy Larry Wall, that's a big module! What the heck is going on in there? There can't really be 40k worth of constants in this project, can there? Yes, apparently there can. Okay, in keeping with my study strategy, how can I describe the problems with this module in precise terms, and how can I improve it?

A peek inside the file shows more or less what you'd expect:

Constants.pm
...
our @EXPORT = qw(
DEBUGLOG_LEVEL

CC_STATUS_SUCCESS
CC_STATUS_CCPROBLEM

RET_SUCCESS
RET_API_ERROR
RET_INTERNAL_ERROR
...
);

use constant DEBUGLOG_LEVEL => 2;

use constant CC_STATUS_SUCCESS => 0;
use constant CC_STATUS_CCPROBLEM => -1;

use constant RET_SUCCESS => 0;
use constant RET_API_ERROR => 1000;
use constant RET_INTERNAL_ERROR => 2000;
...

And the consumers of these values:

Log.pm
...
sub log {
   my ($level, $message) = @_;
   if ($level >= DEBUGLOG_LEVEL) {
       print LOG $message;
   }
}
...

CreditCard.pm
...
   if ($vrc == CC_STATUS_SUCCESS) {
       $total_charged += $amount;
   } elsif ($vrc == CC_STATUS_CCPROBLEM) {
       $total_failed += $amount;
   }
...

hundreds of files throughout the project
...
$retval = someServiceMethod();
die KitchenSink::Error::describeRet($retval)
    unless $retval == RET_SUCCESS;
...

Way back at the dawn of time, when there were only a few thousand lines of code and a dozen or so constants, someone lumped them all together. It was a bad idea from the beginning, and nobody ever questioned this organization, so now there are a half million lines of code and almost a thousand constants, and it's a great stinky mess.

Let's look at cohesion -- how closely related are the things in this package? Highly cohesive modules are desirable. Well... everything in here is constant, so they're all the same, so the cohesion is very high! This counter-argument is absolutely true, but the code is still making me want to gouge out my eyeballs, so what's going on?

It turns out there are lots of kinds of cohesion. What we have here is a form of logical cohesion -- grouping into some logical "category". Check out this wikipedia page for a reasonable breakdown of different types of cohesion. In short, though, logical cohesion is among the worst kinds. It is all the more dangerous because it's possible to defend it as strongly cohesive.

How Can I Improve This Code?

As long as there's a file called Constants.pm, I know there's some poorly organized code. Consider the fictional modules VariablesBeginningWithG.pm and ArraysCreatedByDlowe.pm: You know just by looking at the names of these modules that the code inside them is poorly organized. So since this module's very name implies a poor form of cohesion, my goal will be to completely eliminate it.

In looking at the file (all 40k of it!) I found three major categories of values. The first are things which should really be runtime options -- characterized by DEBUGLOG_LEVEL, in the above sample. These sneak in when we're deep in coding mode and don't want to switch gears, often with a # TODO: move to configuration file comment. With compiled languages, they generally actually to get moved to configuration (or we get busted trying to explain how to enable debug logging in production). With Perl, though, this seems to be just tolerable enough to end up in released software. This category is best dealt with by moving out to configuration in some form. In this particular case, it's very very easy, because the value is actually only used from one place:

Log.pm
...
use constant DEBUGLOG_LEVEL =>
    Config->get("debuglog_level");
...

Next, I found component-specific constants like CC_STATUS_*. These ended up in Constants.pm because some well intentioned person was following the "When in Rome" rule: "looks like all the project constants are in one place; I'd better do the same." Consistency may be desirable, but in this case it is seriously misguided. For absolutely zero gain, this adds an inter-module dependency, namespace pollution, and a global constant. Again, the fix is trivial, because the values of these constants are only used from one module:

CreditCard.pm
...
use constant CC_STATUS_SUCCESS => 0;
use constant CC_STATUS_CCPROBLEM => -1;
...
   if ($vrc == CC_STATUS_SUCCESS) {
       $total_charged += $amount;
   } elsif ($vrc == CC_STATUS_CCPROBLEM) {
       $total_failed += $amount;
   }
...

While on the subject: for an excellent (if perhaps somewhat long-winded) argument on the value of keeping scope as small as possible, I recommend section 10.4 of Steve McConnell's "Code Complete 2". It boils down to intellectual manageability: it is easier to understand code when all of the pieces are close at hand.

Also note that this is an improvement regardless of how well- or poorly-organized the destination module might be. Even if CreditCard.pm is total garbage, both it and Constants.pm are simplified and improved by moving the constants closer to where they're used.

Finally, I found constants which are part of some API -- here the RET_* constants are my example, which are used all over the code to standardize on return values from service-layer functions. Their scope truly is global, in this project. Here is where we need to improve the cohesion. Referring back to that wikipedia article, I would suggest that for the RET_* constants, we can achieve communicational cohesion by considering the set of constants as a "datatype" (in quotes because this is Perl, after all...). In other words, we will group together these constants with all functions which operate on them: currently in the KitchenSink::Error module.

While I'm moving to the new module, I am not going to carry over the symbol exporting code. Without getting into the memory bloat problem (which can be really serious), symbol importing and exporting is the wrong kind of laziness: the kind that makes code easier to write at the expense of maintainability, as it obscures (to the human reader of code) the origins of symbols. And since users of these constants will have to namespace qualify them, I did away with the RET_ prefix. So here's my addition to the KitchenSink::Error module:

KitchenSink/Error.pm
...
## "Enumeration" of all of the possible return values
## from the service layer.
use constant {
    SUCCESS => 0,
    API_ERROR => 1000,
    INTERNAL_ERROR => 2000,
};
...

Consumers of these values are now:

hundreds of files
...
$retval = someServiceMethod();
die KitchenSink::Error::describeRet($retval)
    unless $retval == KitchenSink::Error::SUCCESS;
...

It'd be possible, too, to make KitchenSink::Error into a "class" (in quotes because this is Perl, after all...), where the consumers could become:

hundreds of files
...
$retval = someServiceMethod();
die $retval->describe()
    unless $retval->isSuccess();
...

Some might consider this a major improvement; I personally do not, at least in Perl (because it's so weakly typed: the maintainer cannot find out what to expect to find in $retval without looking at the *implementation* of someServiceMethod()). This is, however, basically a matter of taste; one could certainly argue for the class-based approach.

There are also instances (not shown) in the Constants.pm module of constants similar to the RET_* constants, in that they are intended basically as enumeration data types, but where there's no obvious existing module with methods that work on the data type. In these cases, a new module gets created, which just defines (but doesn't export) the enumeration. This is a degenerate case of communicational cohesion: all the functions which operate strictly on the datatype are in one place -- there just aren't any such functions.

If you're balking at the creation of such tiny modules, think of it from the perspective of the maintainer of the software. When something about this datatype breaks, would they rather see it in a module by itself, or mixed in with a bunch of unrelated stuff? Which would make it easier to focus on and comprehend the problem?

In the end, I wasn't able to justify the continued existence of the KitchenSink::Constants module -- mission accomplished. My eyeballs can remain un-gouged-out for a little while longer.