Sunday, 13 October 2013

Book Review: Clean Code

Over a year ago I was given a copy of the book Clean Code by Robert (Uncle Bob) Martin. There are many good things in this book. I guess the highest compliment I can pay is to note that I have changed my coding practices as a result of reading it.

Actually all the developers where I work were given a copy of this book. In retrospect this is odd as all the example code is in Java and we do our coding in C, C++ and C#. (We probably got a free pallet of the books as we had Object Mentor come in and do an audit on our development practices a few years ago.)


Anyway, I think everyone here who read the book got something out of it (though everybody I talked to said they skipped the Java examples).

The Cover

Of course, you should not judge a book by it's cover but I found a few things about the cover misleading. (I am not sure that anything is insinuated by the picture of the M104 galaxy on the cover which is both beautiful and distant and home to one of the largest known black holes!)

First the name on the cover says Robert C. Martin so the implication is that he wrote it. It's not until you start reading some later chapters that you notice that some of them say "By ..." or "With ...". Apparently Uncle Bob did not write all of it.

I generally avoid reading anthologies, as I prefer a book to be written by one author (or group of cooperating authors), rather than separate loosely related offerings from different people. This makes for a more concise, consistent and generally holistic text; whereas an anthology is invariably disjointed, contradictory and often repetitive.

Admittedly, this book is not as bad as some anthologies, since most of the text seems to have been written or at least edited by Uncle Bob, but there are examples of repetition and contradiction which I mention later.


My second complaint is that there is a long blurb on the back cover but absolutely no mention is made of Java. The perception I obtained by reading the blurb and the Introduction (which also does not mention Java) is that the book is suitable for all programmers, but that is not true at least for some chapters.

Some parts and even whole chapters are almost completely Java-centric (eg Chapters 11 and 13). Not only are the examples in Java but often the text gives advice specifically aimed at Java programmers. This is another problem with using different authors as the chapters by Uncle Bob are aimed at all programmers (but still with Java examples) but some of the other chapters are aimed squarely at Java developers.

Some code examples can be followed without knowledge of Java, but many require a fairly deep understanding. I did some Java programming about 15 years ago but I still could not understand most of them. I would prefer examples in another language or at least a better explanation of the bits that only an hardened Java programmer would understand.


My final complaint about the cover is the inclusion of the word Agile in the title. (The full title is: Clean Code - A Handbook of Agile Software Development.) The publisher probably insisted on this title since Agile is the flavour of the decade. (The previous decade book titles had to include Object Oriented.)

Admittedly, there is quite a bit of content specific to Agile Methodologies, but most of the book is not about agile techniques. In fact many of the ideas far predate agile.

Good Things

This book is full of good advice and ideas. Please don't take my negative comments (above and below) as lack of endorsement. Most developers, even experienced ones, can get an enormous amount out of the book.

That said, I did find a few things that I strongly disagree with. I hope the arguments below can likewise convince you.

The best thing about the book is that it presents a well-rounded summary of the important points of designing and writing good code. A lot of these are old ideas (though they seem to be presented as if they are new), but it is good to have them all in one place and presented in a reasonable order.

There are many places in the book where I felt that I could have written the same thing almost word for word, such as the description of maximum code line length. (My personal convention is code not to go past line 100 and end of line comments not to exceed column 120.)

There are also a few worthwhile things that I had not considered or read about before. An example is the section on creating code at different levels of abstraction (see page 36).

Bad Things

Probably my biggest complaint is that the first few chapters are far too detailed, stating the bleedingly obvious (though there are a few things that are bleedingly wrong - see Identifiers and Comments below). For example, there is a whole chapter on creating names for identifiers, then much of the same thing is considered in the next chapter (Functions). When I first wrote some coding standards (for a team of developers at AMP in 1986) all I said on the subject is:
  • identifiers with broad scope (eg, global variables and functions) should have long descriptive names
  • variables with narrow scope (eg, local variables) should have shorter names
  • all variables and functions should have a comment describing their purpose where they are declared
I still believe this is enough. (I actually thought this might be too long and considered cutting it down.) I really can't see how Uncle Bob can justify writing dozens of pages on this subject.

Finally, I will mention that Uncle Bob loves his TLAs (Three Letter Acronyms) such as DRY, SRP, etc which I am not sure is a good or bad thing. The book gives the impression that he not only invented all these acronyms but also the ideas behind them. Of course, this is not true as the ideas have been known and practised (not under those names) for decades. (An arguable exception is IOC, though I have seen similar approaches used in the past.)

Unit Tests

Unit Tests have been my pet subject for more than two decades, so I was pleased that there was a chapter on them. Like many people I independently discovered what are now called units tests (in my case in 1985) but I gave them the unglamorous name of automated module regression tests.

Again, this chapter is a little verbose at explaining the simple things. It also misses some important areas like mock objects and black-box vs white-box testing.

The Unit Tests chapter also covers TDD (test driven development). TDD should have been given its own chapter, and explained in more depth, due to its importance (it is a lot more than just unit tests).

In this chapter Uncle Bob also says that unit tests do not need to be efficient. I disagree. Unit tests are just like any other piece of code and should have the quality attributes required of them. Of course, production code is more likely to require optimization, but that does not mean that units tests should be slovenly. In fact, I remember reading somewhere else in the book where he (or another of the authors) says that tests should run fast to avoid not being run at all - another contradiction.

Also on the subject of units tests, it is mentioned in the following chapter (page 136), that unit test code can be allowed access to the internals of an object. This is simply wrong, tests should only ever test the interface of an object never how it is implemented internally.

Identifiers and Comments

There is one area this book really goes off the rails. Uncle Bob insists that you should try to avoid adding comments in your code by instead using long descriptive identifiers. This is the old nugget of self-describing code which I have already refuted in a previous blog (see Self Describing Code).

But Uncle Bob goes further by promoting the idea of creating even more identifiers. The first way he does this is by creating temporary variables solely for the purpose of giving them a meaningful name. His other idea is to extract bits of code into tiny little functions for the same reason. Both of these ideas I really hate, for many reasons (described below), not the least of which is that I already have enough trouble thinking up good identifiers without this extra burden.

I will use a numbered list to emphasize how many things are wrong with this approach:

1. Over the past few decades many people have been infatuated with the idea of self-describing code. In fact this was the guiding principle behind the design of COBOL. All have been failures. (COBOL was a successful language but it was recognized that this aspect of the language was a failure.)

2. The whole idea that long descriptive identifiers are good and comments are bad is contradictory. In many ways identifiers are the same as comments - the compiler doesn't care about the characters of the identifier, just that whenever it is used it is spelt the same way.

Using long variable names instead of comments makes no sense.

3. Repeatedly typing a long variable name becomes more and more tedious. I know that many editors/IDEs provide name-completion but it is still distracting to have to look at a list of names and pick the right one (and name-completion propagates horrible typos).

Worse, whoever has to read the code must scan these long, tedious names. You can skip over long comments but variables names are harder to skip as they are embedded in the code.

Research has shown that identifiers that are too long (ie more than 16-20 characters) make code difficult to read. This affects the understandability of the code and consequently the maintainability.

4. Uncle Bob promotes the idea that code should read like a well-written novel. This is something I wholeheartedly agree with. I guess then, a variable would be analogous to a character in the novel. Let's look at how characters in novels are described.

When a (major) character first appears in a novel they are introduced to the reader with their full name, and any relevant description of their appearance and/or character, etc (in Dickens this can go on for many pages). Subsequently, the character is only referred to by their first or last name or even simply as him or her.

These characters also have relatively short, easily remembered names. More importantly different characters generally have very different names so that they are easily distinguished from each other. (Though I have read novels which were very confusing because there were two characters with similar names.)

The analogy in code is that you "introduce" a variable by declaring it with a relatively short (but descriptive) name as well as using a comment that describes its purpose. (In fact, this is something that I have required of my team members for the last 27 years - ie all variables and functions to be described when declared.) The important thing about the variable name is that it is easily remembered, that its name gives an indication of its purpose, and that it is quite different to other identifiers in the same scope.

Using a very long, overly descriptive, name that tries to describe the full purpose of the variable is equivalent to repetitively re-describing the same character throughout a novel. This is tiresome when reading code just as it would be reading a novel.

5. Uncle Bob's assumption seems to be that all comments have to be read. Some comments (usually end of line comments) are additional tips in case you can't understand the code. If you understand the code you can simply ignore the comments.

6. The first criticism Uncle Bob has of comments (page 54) is that they "lie". I agree that many comments have absolutely no value or even, if incorrect, a negative value. Even if they are initially correct they can quickly become out of sync with the code as code changes are made and the comments not updated accordingly.

I guess Uncle Bob is coming from the agile stance of "favour working code over documentation" (another thing with which I wholeheartedly agree).

However, as mentioned in point 2 (above), identifiers are just as much "documentation" as comments. In fact, in my experience misleading identifier names are far more of a problem than misleading comments.

The other point is that just because something is done badly does not infer you should stop doing it. (Did we stop making airships after the Hindenberg and other airship disasters? Woops! That's a bad example.) Instead we should find ways to improve comments (and identifiers). For example, in my experience comments are better written and better maintained in code that is subject to code reviews.

Please note at this point that I do think we can eliminate the needs for some comments by using unit tests. In this case we are actually favouring working code (the tests) over documentation (the comments).

7. One thing I haven't mentioned yet but really bugs me is Uncle Bob's penchant for taking a small expression or even part of an expression and turning it into a function. The sole purpose of this is to be able to add another identifier (ie, the function name) to describe what is happening, rather than adding a comment.

Now I have nothing against short functions. I believe that the ideal length for a function is less than 10 lines, but these functions that are actually a fraction of a line are bad for several reasons.

First, Uncle Bob mixes this reason for creating short functions with the other reasons. I believe he should make clear that he promotes short functions to aid organization and understanding of the code, but also is promoting adding short functions as a replacement for comments.

Second, it moves the actual code somewhere else. If you really want to look at the code, not just the function name, you have to go and find it.

Third, the name that is given to the function may make perfect sense to whoever created it, but it may be gibberish to a later reader of the code. In my experience, no matter how long and descriptive a name, someone (actually most, if not all, people) will misinterpret it.

Fourth, I don't think it is a good idea to change the actual code for the purpose of trying to explain it. It is tricky enough to create quality code without this extra consideration.

Lastly, this approach actually goes against the agile principle of favouring code over documentation. You are replacing a piece of working code with the name of a function, and as I mentioned above identifiers are more documentation than code.

8. A similar one is using temporary variables for the purpose of introducing another descriptive identifier and hence avoiding a comment.

Using too many temporary variables has several problems. First, they often lead to bugs for many reasons, such as not being initialized, being of the wrong type (eg leading to overflows), using the wrong one when there are many of them, etc.

Another problem is that the code bloats when using lots of temporaries, which makes it difficult to understand. I would much rather read a concise one line expression (even a complicated one) than try to decipher 10 lines of code using half a dozen temporary variables.

Finally, when temporaries are used for control flow it can be very difficult to understand what the code is doing by reading it. This begins to look like self-modifying code which has been regarded as unacceptable for more than 50 years.

9. I don't know about you but I often have to manually type in an identifier. I'm not sure why but it might be because I use multiple systems and I need to search on one system for something I found on a different system. Or it may be that a colleague has asked me to search for the use of a particular variable in the code.

Long variable names are very tedious and difficult to type in correctly, especially if they use incorrect/ambiguous camel-casing (eg Bitmap vs BitMap).

This is just one more reason to use short, simple, descriptive, easily remembered, easily differentiated variable names.

10.Here is an example of a function name taken from one of Uncle Bob's good code examples:

    isLeastrelevantMultipleOfNextLaregrPrimeFactor (page 145)

The problem is that despite it being almost 50 characters long I still do not understand it's purpose.

Contents/Index

Unfortunately, ever since I first read K&R (with its brilliant index), I tend to judge a book on how easily I can find information.

At least Clean Code has a table of contents and an index but there are problems such as wrong page numbers. The index is below average - for example, look up "error handling" and you are directed to pages 8 and 47-48. However, there is a whole chapter on error handling on pages 103-112. Again this is probably symptomatic of the book being written by different people.

Further the Introduction mentions that the book is divided into three sections but it is not really clear where the sections start and end.

Conclusion

I like almost everything about this book, except for the few things I have mentioned above. It is definitely worth reading by any programmer.

However, I will note that unless you use Java you may not get as much out of it as you want. And it seems to me that you need to be an advanced Java programmer to understand some of the chapters not written by Uncle Bob. There are also some areas that are repetitive and contradictory due to the use of multiple authors.

One thing I didn't like was the excruciating detail in the first few chapters - but this might be useful for inexperienced developers. On the other hand there was not enough information in other areas such as TDD, Unit Tests and Concurrent Programming.

No comments:

Post a Comment