Monday 30 April 2012

Long Identifiers Make Code Unreadable


I recently wrote an article for CodeProject about common fallacies of good code.  (It was written with the C language in mind but is generally applicable to any language.)  The first fallacy I covered was that of using very long identifiers.  I thought this debate had been settled a long time ago and it was agreed that using identifiers that are too long makes code hard to read.  I only added it to the article because every now and then I see some code that uses really long variable names, so I thought there must be a few people around who still think it is a good idea.

So I was very surprised to get quite a bit of feedback objecting to this "fallacy".  It seems that the idea (and the idea that code can be self-describing) is having a resurgence.  For example, I have just been reading the book "Clean Code" by Robert C. Martin which also seemed to say that really long variable names are a good idea.  So I think I need to explain this in more detail for those who are still unconvinced.

Initial Problem

I guess the problem all began because K&R and other books used example code with very short names, often single characters. This is not necessarily a bad style in that context since these examples are small and often demonstrate abstract concepts.

The real problem started when many C programmers emulated this style everywhere, even in much larger and more practical software and for variables with much greater scope.  (In these situations longer and more descriptive variable names should, of course, be used as they are more easily seen, remembered and differentiated.)

The reaction to this problem of poor variable names was to mandate (e.g., in coding standards) that variable names should fully describe the purpose of the variable.  This was an overreaction.

Self Describing Code

Another contributing factor was that many authors have promoted the idea that code should be self-describing, rendering the use of comments unnecessary.  First it is argued that well-written code should not need any explanation. However, the main justification was really that comments are sometimes out of date or just plain wrong.

The problems with the idea of self-describing code are many.  First, it is much easier to describe many things in plain English than to contort the code in an effort to make things clearer.  Trying to make the code self-describing actually can make it harder to read, at least for the casual reader.

I will say that comments on every line that simply reflect what the code is doing are worse than useless, but comments at the start of blocks of code or for every function can make it much easier to quickly understand what is going on.  In a large program occasional explanatory comments can be an absolute godsend, allowing you to quickly home in on what you are looking for.

Also, just because comments are often incorrect does not mean that comments are inherently bad.  Some comments are useless and can be removed, but sometimes the comments should simply be improved, not removed.  I often think that the idea of self-describing code was invented by someone who had just read too many bad comments.

Finally, the truly unfortunate thing is that many programmers used the idea as justification for not adding any comments to code.  I have read a lot of code in my time and invariably the worst code has no comments and the best has several (or at least a few).  Rightly or not, when I see code with no comments I immediately assume it is of poor quality.

Clean Code

I have been reading this book by Robert (Uncle Bob) Martin and have found many useful ideas in it.  The book actually has a whole chapter devoted to naming of identifiers (Chapter 2: Meaningful Names) which I find ridiculous in itself since most of the ideas presented are bleedingly obvious and really not worth anyone reading let alone putting to paper.  However, what I really did not like were his ideas on the length of identifiers.

Actually the book seems to contradict itself.  For example, on page 18 it says that an identifier should describe "why it exists, what it does, and how it is used", but on page 30 "Shorter names are generally better than longer ones, so long as they are clear."  I guess these statements are too vague to be contradictory but they are at least confusing.  Perhaps his code example will clarify...

    int d;
    int elapsedTimeInDays;

According to Uncle Bob the first line is bad but the second is good.  I agree with that, but what about the obvious:

    int days;

I actually don't mind the name "elapsedTimeInDays" (though at 17 characters it is already slightly over my rule of thumb of a maximum of about 15 characters), but that would depend on how it is used.  If it is used many times within a few lines of code a shorter name (like "days") will make the code much easier to scan.
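To illustrate the scanning cost, here is a minimal sketch of the same small function written twice, once with each name.  The "span" struct and both function names are hypothetical, invented for this example, and a year is assumed to be 365 days to keep the arithmetic short.

```c
/* Hypothetical result type: an elapsed time split into parts. */
struct span { int years, weeks, days; };

/* Long-name version: the identifier dominates every expression. */
struct span split_long(int elapsedTimeInDays)
{
    struct span s;
    s.years = elapsedTimeInDays / 365;
    elapsedTimeInDays = elapsedTimeInDays % 365;
    s.weeks = elapsedTimeInDays / 7;
    s.days  = elapsedTimeInDays % 7;
    return s;
}

/* Short-name version: same logic, the arithmetic is what stands out. */
struct span split_short(int days)   /* days: elapsed time, in whole days */
{
    struct span s;
    s.years = days / 365;
    days %= 365;
    s.weeks = days / 7;
    s.days  = days % 7;
    return s;
}
```

Both functions compute the same thing.  The only difference is how much of each line is taken up by the name rather than by the operation being performed.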

But what I really don't like is:

    "If a name requires a comment, then the name does not reveal its intent." - page 18
    "A long descriptive name is better than a long descriptive comment."  - page 39

This brings us to my main point that it is generally best to use a shorter name and a comment when declaring a variable.

Long Names vs Comments

Another idea of Uncle Bob's is that code should read like a good novel, rather than some technical document.  I agree completely.  Taking the analogy even further: declaring a variable in a program is like introducing a new character in a novel.  In a novel the author might take a paragraph or two to introduce a new character, but thenceforth he or she would be referred to by first or last name.  In fact the main character of a story would be referred to as "he" or "she" most of the time.  The point is that each time a character is mentioned, the author does not fully describe that character again or even use their full name.  (Of course, characters should have names that are memorable and not similar to the names of other characters in the story to avoid confusion.)

Similarly, in a program a variable is introduced when it is declared.  A comment should appear at this point that fully describes the purpose of the variable and how it is used.  Thereafter you need to be able to refer to the variable by a name that is meaningful enough that it is easily remembered what it is for, but it must not be so long that it makes reading the code tedious.  Contrary to what Uncle Bob would have us believe, the name should not try to include every interesting thing about the variable.
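A minimal sketch of what this might look like in practice follows.  The function "count_stale", the "STALE_AFTER" threshold and the meaning given to negative values are all assumptions made up for this illustration.

```c
#include <stddef.h>

#define STALE_AFTER 30   /* hypothetical threshold, in days */

/* Returns how many of the n record ages in age[] count as stale. */
size_t count_stale(const int *age, size_t n)
{
    /* days: elapsed time, in whole days, since the current record was
     * last modified.  A negative age means the timestamp lies in the
     * future (clock skew), so it is clamped to 0 and treated as fresh. */
    int days;
    size_t stale = 0;

    for (size_t i = 0; i < n; i++) {
        days = age[i];
        if (days < 0)
            days = 0;
        if (days > STALE_AFTER)
            stale++;
    }
    return stale;
}
```

The comment carries the units, the reference point and the special case once, at the declaration.  Trying to pack the same information into the name would give something like "elapsedWholeDaysSinceLastModifiedClampedToZero" on every line that uses it.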

Compiler Limits

Originally C compilers typically treated only the first eight characters of an identifier as significant (and some linkers limited the length of external names to 6 characters).  When the C99 standard raised the guaranteed limit for internal identifiers to 63 characters (C89 had already raised it to 31) many programmers took this as meaning they should use much longer names.  However, the real reason the length was increased was to alleviate problems with machine generated code.
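To see what such a limit means in practice, here is a small sketch that checks whether two names would be indistinguishable to a compiler that only examines a given number of leading characters.  The function "names_collide" is a hypothetical helper written for this post, not part of any compiler or library.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if a compiler that only looks at the first `sig` characters
 * would treat the two names as the same identifier, 0 otherwise. */
int names_collide(const char *a, const char *b, size_t sig)
{
    return strncmp(a, b, sig) == 0;
}
```

For example, names_collide("elapsedTimeInDays", "elapsedTimeInHours", 8) returns 1 because both names begin with "elapsedT", while the same call with a limit of 63 returns 0.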

DRY

A final point is that having lots of information in an identifier contravenes a fundamental principle of good software: DRY (don't repeat yourself).  Every time you use the long, overly descriptive identifier you are repeating yourself.  (Even Uncle Bob mentions the DRY principle.)

Conclusion

I firmly stand by my original recommendations in the CodeProject article.  When you try to put too much information into a variable name it makes the code hard to read, especially if that variable is used often.  It is far better to put all that information in one place - in a comment where the variable is declared.