Saturday 26 October 2013

Change (Unit Tests Intro)

A perennial problem in software development is coping with change.
It's happened again! I started writing a blog post about Unit Tests (see my next post, coming soon) and ended up writing a lengthy introduction about Change. So I have split the intro into this separate post - that way I don't overload you, and you can skip it if you're not interested. (However, you should probably read the Conclusion below anyway.)
This post explains the types and reasons for change and the costs involved. It also looks at how change is traditionally handled and how Agile development and Unit Tests can help.

The Cost of Change

First, it may be helpful to understand one aspect of change - how much it costs, and particularly how the cost varies over time. This has been extensively studied and documented under the Waterfall development model - for example, search for Barry Boehm's articles on the subject if you are interested.

However, I can save you the trouble with this summary: the longer it takes to find and fix a mistake, the greater the cost. As the graph below shows, the growth in the cost is much worse than linear - perhaps even exponential.



Diagram 1. Cost of Change (Waterfall model)

The rule of thumb I was given (about 20 years ago) is that the cost of fixing a mistake goes up one order of magnitude for every phase of development it survives. So an analysis mistake not picked up until testing costs 1000 times more to fix, since it passed through three phases (analysis > design > coding > testing), while a coding bug picked up in testing costs only 10 times more than if it had been found straight away.
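
Taking the rule of thumb literally, the multiplier is just ten raised to the number of phases a defect survives. A trivial illustration in C++ (the numbers are the rule of thumb only, not measured data):

    #include <cstdio>
    #include <cmath>

    int main()
    {
        // Waterfall phases a defect can pass through before being found.
        const char* phases[] = { "analysis", "design", "coding", "testing" };

        // A defect introduced during analysis costs ten times more to fix
        // for every phase it survives.
        for (int n = 0; n < 4; ++n)
            std::printf("found in %s: %.0fx the original fix cost\n",
                        phases[n], std::pow(10.0, n));
        return 0;
    }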

I believe there are two fundamentally different reasons for this, which I have called the cost of delay and the cost of rework.

Cost of Delay

Experienced programmers know that if you find a bug while working on the code you can often correct it quickly, or even immediately, but fixing the same bug a month or two down the track can take hours or even days - and even then the fix may not be done properly. This is primarily due to limitations of the human brain: you will have forgotten exactly how the code works. (Of course, if the original developer has left the company and there are no Unit Tests or comments/documentation, then it can take much longer.)

Worse, you may not even realize you have forgotten how the code works and make changes based on a misapprehension. In my experience, these are the worst source of bugs. I have often written, thoroughly tested and understood a module which was working perfectly. Then a simple change made later causes all sorts of problems. Again, as I discuss later, Unit Tests help here.

The cost of delay is also due to other tedious and time-consuming problems, like setting up an environment for testing and debugging. You may not even be able to reproduce the problem in the latest version of the code, so you need to find and rebuild the older version in which the problem occurs in the field - which may require tracking down old versions of compilers, tools, libraries etc, if that information was even written down somewhere.

All this should convince you that it is best to find bugs as soon as possible - something I previously discussed in JIT Testing. Actually this is a good example of the principle of DIRE (Don't Isolate Related Entities). In this case we are talking about not isolating the coding from the testing in time. This is one advantage of using Unit Tests, and particularly TDD (Test Driven Development).

More generally, reducing the cost of delay is one of the principal advantages of Agile methodologies. The continuous feedback from the customer/users (eg at least at the end of every Sprint in Scrum) means that problems are found and fixed much more quickly when the context is still fresh in everyone's mind.

Cost of Rework

The other problem with delaying changes is that in the meantime a lot of work may have been done, based on the original software. This problem is not due to the length of time that has passed but simply due to the fact that this work has to be repeated. For example, extensive testing may have been performed that would need to be redone to ensure that no new bugs were introduced.

Another cost, for software already released, would be notifying and updating users of the problem. Rolling out a new release, even just for a bug fix, can be costly.

There can be coding costs too, since the change may require a major internal re-design. In this case, the behaviour of the code must be understood once more, the code modified and once again a great deal of regression testing is required to make sure nothing has been broken.

Unit Tests can help reduce the cost of rework too. If the original software has comprehensive Unit Tests then changes can be made to the code easily. Generally, if all the Unit Tests pass, it is reasonably safe to assume that no bugs have been introduced by the changes. This reduces the costs associated with coding, debugging and even regression testing.
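
For example, here is a minimal sketch of what such a safety net might look like (the function, names and values are all invented for illustration, and I use plain assert rather than any particular test framework):

    #include <cassert>
    #include <cmath>

    // Function under test (hypothetical): apply a percentage discount.
    double discounted_price(double price, double percent)
    {
        return price * (1.0 - percent / 100.0);
    }

    // If these tests still pass after the code is changed, we have good
    // (though not absolute) confidence that nothing has been broken.
    int main()
    {
        assert(discounted_price(100.0, 0.0) == 100.0);                   // no discount
        assert(std::fabs(discounted_price(100.0, 10.0) - 90.0) < 1e-9);  // simple case
        assert(discounted_price(0.0, 50.0) == 0.0);                      // zero price
        return 0;
    }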

But there is an even greater advantage to Unit Tests. Because of the large cost of rework (explained above), it is not uncommon to find that software changes are made not in the best way, but in the way that minimises these costs - that is, in a way that minimises the risk of introducing new bugs.

I can give many examples of how this occurs, but a common one is to duplicate a complete function or module and modify the copy, leaving the original untouched (so that the original handles the commonly used existing functionality in exactly the same way). This strategy results in code duplication (often on a massive scale), which contravenes the principle of DRY (see Principles of Software Design).
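
A caricature of what this looks like in code (names and numbers invented):

    // The original function is battle-tested, so nobody dares touch it.
    double shipping_cost(double weight)
    {
        return 5.0 + 0.5 * weight;
    }

    // A copy made for the new "express" feature: almost identical, duplicated
    // purely so that existing callers cannot possibly be affected.
    double express_shipping_cost(double weight)
    {
        return 5.0 + 0.5 * weight + 10.0;   // same formula plus a surcharge
    }

Now every future change has to be made (and tested) twice - and sooner or later one copy is fixed and the other forgotten.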
I briefly discussed this in early 2012 (see When Good Programs Go Bad) and a reader responded with a quote from Bruce Eckel:

"Management's reluctance
to let you tamper with
a functioning system
robs code of the resilience
it needs to endure."

This is very insightful, but Unit Tests can help to assuage that reluctance.

This sort of practice is so ingrained in the industry that most developers do not even realise they are doing it. However, developers are not entirely to blame here since it is often the managers that react very badly when bugs are found (and it was probably the same managers who imposed draconian deadlines that precluded creation of Unit Tests).

If you haven't seen the light yet, here is a summary: Unit Tests allow software to be changed without compromising the original design and without fear of introducing new bugs. The code can then adapt and evolve, and never needs to be completely rewritten or discarded.

Reasons for Change

Why do we need to change software? Ask most people and you get two answers: fixing bugs and adding features. But there is a lot more to it than that! More generally, changes are made to improve the quality of the software (of which bug fixing is just one aspect) and, yes, to add enhancements.

Enhancements

I really don't have much to say about adding functionality, except that there is often a reluctance to add new features due to the cost and the possibility of breaking existing features. Unit Tests help by making the code more modifiable (see next blog) and by catching bugs caused by the changes.

Fixing Defects etc (User quality attributes)

The other reason for change is to improve the quality of the software. There are many aspects to the quality of code not just correctness (which is what fixing bugs is all about). The software might also need to be improved if specific problems have been identified in areas such as performance, usability, reliability and other "user" quality attributes.

Refactoring (Developer quality attributes)

Often neglected are changes that improve developer quality attributes, especially maintainability (but also verifiability, portability, reusability, etc). See my previous post on The Importance of Developer Quality Attributes for an explanation of the difference between user and developer quality attributes. Changing software to improve developer quality attributes is known as refactoring, and many people are realising that it is essential to the long-term viability of any software.

There are actually many benefits to software that is easily modified (and hence easily refactored) as I discuss later. I will mention Unit Tests again here as they allow you to improve the software without fear of introducing bugs.

Change Management

Before Agile methodologies came along, change was thought of as something to be avoided. Considering the cost of change graph (above), this was completely understandable. I will first look at how change has traditionally been managed, and then at the Agile approach.

Eliminate Mistakes

If you don't make mistakes then you don't have to fix them. This is the attitude epitomized by the SQA motto Do It Right the First Time, or simply Right First Time. Over many decades, starting sometime in the 1960s, a huge number of software projects have failed, and the reason given has invariably been inadequate, poorly documented, or continually changing requirements. (Some studies have found poor estimation to be the primary cause, but that is just a consequence of poor requirements.) In other words: we didn't really know what we were doing.

So the thinking was always that more time should have been spent on analyzing the problem in order to eliminate all the mistakes and omissions in the requirements and anticipate any "unanticipated" changes.

This is why there has been a huge amount of research, and even more debate, on how to avoid these mistakes, including:
  • better and more thorough analysis techniques
  • better communication with the customer
  • the invention of various estimation techniques
  • using prototypes so the customer better understands what is specified
  • modelling languages and diagrams
  • formal proofs of correctness
  • etc
The problem is that with any reasonably large, real-world software project you can never get it right the first time! Moreover, an enormous amount of time and effort can be consumed trying. The customer or sponsor also becomes concerned when nothing appears to be happening except a lot of people trying to understand the problem.

Further, not all change comes about from mistakes. Even if you can avoid making mistakes (which you can't) there will be other reasons for change. You can't anticipate unanticipated changes. Some change you just can't avoid (such as regulatory changes).

Discourage Change

Whether deliberate or not, another strategy commonly used in a large project (under Waterfall) is to do everything possible to prevent changes from being made.

First, a complex and tedious procedure, with lots of forms, is set up for the approval of all changes. All proposals go to a change review board consisting of managers, analysts and architects with the knowledge and ability to give reasonable grounds for rejecting almost any proposal.

In this sort of environment refactoring the code is never even considered. This leads to a snowball effect; if code is not refactored to make it more maintainable then it becomes more and more expensive to make changes for other reasons.

In the end only the worst bugs and the most desirable new features are approved. It is hard to quantify, but the resulting lost opportunities can make a huge difference to the long-term viability of the product. Advances in software and hardware are accelerating -- if the software cannot be adapted to use them then it will be at a disadvantage to competitors who can.

Software that does not adapt to change will eventually atrophy and die.

Minimize Risk

OK, a change has been approved -- a major new feature or perhaps a serious bug needs to be fixed. But the problems don't stop there. There is the even more pernicious problem of how the change is made.

Most of the time changes are made, not in the best way, but in a way that reduces short-term risks. This may be a conscious decision of the designers but more often is due to the way the programmers work. (See my example above under The Cost of Rework).

Why do programmers work like this? It is sometimes due to laziness or fear (which is the main motivation behind the XP idea of Courage). But more often it is due to conditioning by poor management practices (see Why Good Programs Go Bad for a full explanation):
  •  programmers do things the easy way (Code Reviews and Unit Tests can help here)
  •  unrealistic deadlines leave no time to refactor later
  •  management intolerance of bugs caused by making changes properly (Unit Tests help here)
The end result is software that degenerates into the classic unmaintainable Ball of Mud.

Agile Approach

Agile methodologies take a completely different approach, recognizing that when creating software you can't even get close to getting it right first time. The Agile catch-cry is instead Embrace Change. Many people think this simply means we have to accept change (and its costs), but Agile actually questions the Waterfall assumptions and tries to find ways to genuinely reduce the costs of change.

In other words, rather than cope with the horrible cost of change curve above (associated with Waterfall methodologies), it changes the shape of the curve. There has been a lot of debate about how the curve looks under Agile, but it might be something like this:

Diagram 2. Cost of Change (Agile)

However, in all the debate about the shape of the curve, an important point is missed. Mistakes are caught sooner due to continuous feedback from the customer, so we don't get so far along the curve before finding and fixing defects. For example, many problems are found under Scrum in the Sprint Review which would not be found until months later under the Waterfall model.

Further, Unit Tests, which I consider a fundamental part of Agile, have an even greater benefit. If Unit Tests are written at the same time as the code (or, better still, TDD is practiced) then many bugs are found straight away that would otherwise not surface until much later. This reduces the cost of delay.
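
As a minimal sketch of the test-first idea (an invented example, using no particular framework):

    #include <cassert>

    // Step 1 (red): write a failing test that expresses the required behaviour.
    // Step 2 (green): write just enough code to make the test pass.
    // Step 3 (refactor): clean up, re-running the tests to confirm nothing broke.

    bool is_leap_year(int year)
    {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }

    int main()
    {
        assert(is_leap_year(2000));     // divisible by 400
        assert(!is_leap_year(1900));    // divisible by 100 but not 400
        assert(is_leap_year(2012));     // ordinary leap year
        assert(!is_leap_year(2013));    // common year
        return 0;
    }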

Even more important, if changes are required, Unit Tests allow them to be made more easily and reliably. This reduces the cost of rework.

Finally, Unit Tests greatly reduce the disincentive to refactor the code, which reduces the cost of lost opportunities and extends the life of the software.

Conclusion

Under a Waterfall development methodology the costs of change are prohibitive, so changes are either avoided or performed in a way that reduces risk (eg, the risk of introducing new bugs or of needing a full regression test). Software developed and maintained like this becomes very expensive to maintain. To remain competitive (unless you have a nice cushy monopoly on your market) it will eventually need to be discarded and rewritten from scratch.

An Agile development methodology turns this problem on its head by embracing change and reducing the costs of change. For example, the continuous customer feedback means problems are found much more quickly.

Further, the Agile approach to change is greatly enhanced by use of Unit Tests since they:
  • make code more maintainable and verifiable
  • reduce the cost of change by allowing changes to be made easily
  • allow code to be refactored to take advantage of a better design or new technology
  • facilitate making changes properly, thus avoiding a maintenance nightmare
  • allow changes to be made without fear of introducing bugs or needing lots of regression testing
I will elaborate on this and other advantages of Unit Tests next...

    Sunday 13 October 2013

    Book Review: Clean Code

    Over a year ago I was given a copy of the book Clean Code by Robert (Uncle Bob) Martin. There are many good things in this book. I guess the highest compliment I can pay is to note that I have changed my coding practices as a result of reading it.

    Actually all the developers where I work were given a copy of this book. In retrospect this is odd as all the example code is in Java and we do our coding in C, C++ and C#. (We probably got a free pallet of the books as we had Object Mentor come in and do an audit on our development practices a few years ago.)


    Anyway, I think everyone here who read the book got something out of it (though everybody I talked to said they skipped the Java examples).

    The Cover

    Of course, you should not judge a book by its cover, but I found a few things about this cover misleading. (I am not sure anything is insinuated by the picture of the M104 galaxy on the cover, which is both beautiful and distant, and home to one of the largest known black holes!)

    First the name on the cover says Robert C. Martin so the implication is that he wrote it. It's not until you start reading some later chapters that you notice that some of them say "By ..." or "With ...". Apparently Uncle Bob did not write all of it.

    I generally avoid reading anthologies, as I prefer a book to be written by one author (or group of cooperating authors), rather than separate loosely related offerings from different people. This makes for a more concise, consistent and generally holistic text; whereas an anthology is invariably disjointed, contradictory and often repetitive.

    Admittedly, this book is not as bad as some anthologies, since most of the text seems to have been written or at least edited by Uncle Bob, but there are examples of repetition and contradiction which I mention later.


    My second complaint is that there is a long blurb on the back cover but absolutely no mention is made of Java. The perception I obtained by reading the blurb and the Introduction (which also does not mention Java) is that the book is suitable for all programmers, but that is not true at least for some chapters.

    Some parts and even whole chapters are almost completely Java-centric (eg Chapters 11 and 13). Not only are the examples in Java but often the text gives advice specifically aimed at Java programmers. This is another problem with using different authors as the chapters by Uncle Bob are aimed at all programmers (but still with Java examples) but some of the other chapters are aimed squarely at Java developers.

    Some code examples can be followed without knowledge of Java, but many require a fairly deep understanding. I did some Java programming about 15 years ago, but I still could not understand most of them. I would prefer examples in another language, or at least a better explanation of the bits that only a hardened Java programmer would understand.


    My final complaint about the cover is the inclusion of the word Agile in the title. (The full title is: Clean Code - A Handbook of Agile Software Craftsmanship.) The publisher probably insisted on this title since Agile is the flavour of the decade. (In the previous decade, book titles had to include Object Oriented.)

    Admittedly, there is quite a bit of content specific to Agile Methodologies, but most of the book is not about agile techniques. In fact many of the ideas far predate agile.

    Good Things

    This book is full of good advice and ideas. Please don't take my negative comments (above and below) as lack of endorsement. Most developers, even experienced ones, can get an enormous amount out of the book.

    That said, I did find a few things that I strongly disagree with. I hope the arguments below will convince you too.

    The best thing about the book is that it presents a well-rounded summary of the important points of designing and writing good code. A lot of these are old ideas (though they seem to be presented as if they are new), but it is good to have them all in one place and presented in a reasonable order.

    There are many places in the book where I felt that I could have written the same thing almost word for word, such as the description of maximum code line length. (My personal convention is for code not to go past column 100 and for end-of-line comments not to extend past column 120.)

    There are also a few worthwhile things that I had not considered or read about before. An example is the section on creating code at different levels of abstraction (see page 36).

    Bad Things

    Probably my biggest complaint is that the first few chapters are far too detailed, stating the bleedingly obvious (though there are a few things that are bleedingly wrong - see Identifiers and Comments below). For example, there is a whole chapter on creating names for identifiers, then much the same ground is covered again in the next chapter (Functions). When I first wrote some coding standards (for a team of developers at AMP in 1986) all I said on the subject was:
    • identifiers with broad scope (eg, global variables and functions) should have long descriptive names
    • variables with narrow scope (eg, local variables) should have shorter names
    • all variables and functions should have a comment describing their purpose where they are declared
    I still believe this is enough. (I actually thought this might be too long and considered cutting it down.) I really can't see how Uncle Bob can justify writing dozens of pages on this subject.

    Finally, I will mention that Uncle Bob loves his TLAs (Three Letter Acronyms), such as DRY, SRP, etc, which I am not sure is a good or a bad thing. The book gives the impression that he not only invented all these acronyms but also the ideas behind them. Of course, this is not true, as the ideas have been known and practised (though not under those names) for decades. (An arguable exception is IOC, though I have seen similar approaches used in the past.)

    Unit Tests

    Unit Tests have been my pet subject for more than two decades, so I was pleased that there was a chapter on them. Like many people, I independently discovered what are now called unit tests (in my case in 1985), but I gave them the unglamorous name of automated module regression tests.

    Again, this chapter is a little verbose in explaining the simple things. It also misses some important areas, like mock objects and black-box vs white-box testing.

    The Unit Tests chapter also covers TDD (test driven development). TDD should have been given its own chapter, and explained in more depth, due to its importance (it is a lot more than just unit tests).

    In this chapter Uncle Bob also says that unit tests do not need to be efficient. I disagree. Unit tests are just like any other piece of code and should have the quality attributes required of them. Of course, production code is more likely to require optimization, but that does not mean that unit tests should be slovenly. In fact, I remember reading somewhere else in the book that tests should run fast, lest they not be run at all - another contradiction.

    Also on the subject of unit tests, the following chapter mentions (page 136) that unit test code can be allowed access to the internals of an object. This is simply wrong: tests should only ever test the interface of an object, never how it is implemented internally.

    Identifiers and Comments

    There is one area where this book really goes off the rails. Uncle Bob insists that you should try to avoid adding comments to your code by instead using long descriptive identifiers. This is the old chestnut of self-describing code, which I have already refuted in a previous blog (see Self Describing Code).

    But Uncle Bob goes further by promoting the idea of creating even more identifiers. The first way he does this is by creating temporary variables solely for the purpose of giving them a meaningful name. His other idea is to extract bits of code into tiny little functions for the same reason. Both of these ideas I really hate, for many reasons (described below), not the least of which is that I already have enough trouble thinking up good identifiers without this extra burden.

    I will use a numbered list to emphasize how many things are wrong with this approach:

    1. Over the past few decades many people have been infatuated with the idea of self-describing code. In fact this was the guiding principle behind the design of COBOL. All have been failures. (COBOL was a successful language but it was recognized that this aspect of the language was a failure.)

    2. The whole idea that long descriptive identifiers are good and comments are bad is contradictory. In many ways identifiers are comments - the compiler doesn't care about the characters of the identifier, just that whenever it is used it is spelt the same way.

    Using long variable names instead of comments makes no sense.

    3. Repeatedly typing a long variable name becomes more and more tedious. I know that many editors/IDEs provide name-completion but it is still distracting to have to look at a list of names and pick the right one (and name-completion propagates horrible typos).

    Worse, whoever has to read the code must scan these long, tedious names. You can skip over long comments, but variable names are harder to skip as they are embedded in the code.

    Research has shown that identifiers that are too long (ie more than 16-20 characters) make code difficult to read. This affects the understandability of the code and consequently the maintainability.

    4. Uncle Bob promotes the idea that code should read like a well-written novel. This is something I wholeheartedly agree with. I guess then, a variable would be analogous to a character in the novel. Let's look at how characters in novels are described.

    When a (major) character first appears in a novel they are introduced to the reader with their full name, and any relevant description of their appearance and/or character, etc (in Dickens this can go on for many pages). Subsequently, the character is only referred to by their first or last name or even simply as him or her.

    These characters also have relatively short, easily remembered names. More importantly different characters generally have very different names so that they are easily distinguished from each other. (Though I have read novels which were very confusing because there were two characters with similar names.)

    The analogy in code is that you "introduce" a variable by declaring it with a relatively short (but descriptive) name, together with a comment that describes its purpose. (In fact, this is something I have required of my team members for the last 27 years - ie, all variables and functions are described where they are declared.) The important things about a variable name are that it is easily remembered, that it gives an indication of the variable's purpose, and that it is quite different from other identifiers in the same scope.
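
    In code, the convention might look something like this (a hypothetical C++ fragment):

        int retries;     // number of times to retry a failed send before giving up
        double latency;  // round-trip time of the last request, in seconds

        // Parse a configuration line of the form "name=value"; returns false if
        // the line is malformed.  (Short name; the full purpose is in the comment.)
        bool parse_line(const char* line);

    Thereafter the code simply says retries - the long description lives in one place, just as a character in a novel is introduced once.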

    Using a very long, overly descriptive name that tries to capture the full purpose of the variable is equivalent to re-describing the same character repeatedly throughout a novel. This is as tiresome when reading code as it would be when reading a novel.

    5. Uncle Bob's assumption seems to be that all comments have to be read. Some comments (usually end of line comments) are additional tips in case you can't understand the code. If you understand the code you don't need to read the comments.

    6. The first criticism Uncle Bob has of comments (page 54) is that they "lie". I agree that many comments have absolutely no value or even, if incorrect, a negative value. Even if they are initially correct they can quickly become out of sync with the code as code changes are made and the comments not updated accordingly.

    I guess Uncle Bob is coming from the agile stance of "favour working code over documentation" (another thing with which I wholeheartedly agree).

    However, as mentioned in point 2 (above), identifiers are just as much "documentation" as comments. In fact, in my experience misleading identifier names are far more of a problem than misleading comments.

    The other point is that just because something is done badly does not mean we should stop doing it. (Did we stop making airships after the Hindenburg and other airship disasters? Actually, you may think that's a counter-example, but airships are a great alternative form of transport that, with modern weather forecasting, could be made very safe.) Instead we should find ways to improve comments (and identifiers). For example, in my experience comments are better written and better maintained in code that is subject to code reviews.

    Please note at this point that I do think we can eliminate the need for some comments by using unit tests. In this case we really are favouring working code (the tests) over documentation (the comments).

    7. One thing I haven't mentioned yet but really bugs me is Uncle Bob's penchant for taking a small expression or even part of an expression and turning it into a function. The sole purpose of this is to be able to add another identifier (ie, the function name) to describe what is happening, rather than adding a comment.

    Now I have nothing against short functions. I believe that the ideal length for a function is less than 10 lines, but these functions that are actually a fraction of a line are bad for several reasons.

    First, Uncle Bob conflates this reason for creating short functions with the other reasons. He should make it clear that he promotes short functions not only to aid the organization and understanding of the code, but also as a replacement for comments.

    Second, it moves the actual code somewhere else. If you really want to look at the code, not just the function name, you have to go and find it.

    Third, the name that is given to the function may make perfect sense to whoever created it, but it may be gibberish to a later reader of the code. In my experience, no matter how long and descriptive a name, someone (actually most, if not all, people) will misinterpret it.

    Fourth, I don't think it is a good idea to change the actual code for the purpose of trying to explain it. It is tricky enough to create quality code without this extra consideration.

    Lastly, this approach actually goes against the agile principle of favouring code over documentation. You are replacing a piece of working code with the name of a function, and as I mentioned above identifiers are more documentation than code.
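
    To make this concrete, here is the sort of extraction I mean (a sketch modelled on the kind of example the book uses, with invented types):

        struct Employee { int flags; int age; };
        enum { HOURLY_FLAG = 0x1 };

        // The book's approach: extract part of an expression into a tiny
        // function purely so that its name can describe the condition.
        bool isEligibleForFullBenefits(const Employee& e)
        {
            return (e.flags & HOURLY_FLAG) && e.age > 65;
        }

        // The alternative I prefer keeps the code in place and adds a comment:
        //
        //     if ((e.flags & HOURLY_FLAG) && e.age > 65)  // eligible for full benefits?
        //         payFullBenefits(e);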

    8. A similar one is using temporary variables for the purpose of introducing another descriptive identifier and hence avoiding a comment.

    Using too many temporary variables has several problems. First, they often lead to bugs for many reasons, such as not being initialized, being of the wrong type (eg leading to overflows), using the wrong one when there are many of them, etc.

    Another problem is that the code bloats when using lots of temporaries, which makes it difficult to understand. I would much rather read a concise one line expression (even a complicated one) than try to decipher 10 lines of code using half a dozen temporary variables.

    Finally, when temporaries are used to direct control flow it can be very difficult to understand what the code is doing without stepping through it. Control flow based on flag variables begins to look like self-modifying code, which has been regarded as unacceptable for more than 50 years.
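
    For instance, compare these two versions of the same (invented) calculation:

        // The style the book encourages: a temporary for every descriptive name.
        double invoice_total(int quantity, double item_price,
                             double discount_percent, double shipping)
        {
            double base_price = quantity * item_price;
            double discount_factor = 1.0 - discount_percent / 100.0;
            double discounted_price = base_price * discount_factor;
            return discounted_price + shipping;
        }

        // The concise alternative: one expression plus one comment.
        double invoice_total2(int quantity, double item_price,
                              double discount_percent, double shipping)
        {
            // discounted item cost plus shipping
            return quantity * item_price * (1.0 - discount_percent / 100.0) + shipping;
        }

    The first version has four invented names to read and keep in sync; the second has one comment.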

    9. I don't know about you but I often have to manually type in an identifier. I'm not sure why but it might be because I use multiple systems and I need to search on one system for something I found on a different system. Or it may be that a colleague has asked me to search for the use of a particular variable in the code.

    Long variable names are very tedious and difficult to type in correctly, especially if they use incorrect/ambiguous camel-casing (eg Bitmap vs BitMap).

    This is just one more reason to use short, simple, descriptive, easily remembered, easily differentiated variable names.

    10. Here is an example of a function name taken from one of Uncle Bob's good code examples:

        isLeastRelevantMultipleOfNextLargerPrimeFactor (page 145)

    The problem is that, despite being almost 50 characters long, I still do not understand its purpose.

    Contents/Index

    Unfortunately, ever since I first read K&R (with its brilliant index), I tend to judge a book on how easily I can find information.

    At least Clean Code has a table of contents and an index but there are problems such as wrong page numbers. The index is below average - for example, look up "error handling" and you are directed to pages 8 and 47-48. However, there is a whole chapter on error handling on pages 103-112. Again this is probably symptomatic of the book being written by different people.

    Further, the Introduction mentions that the book is divided into three sections, but it is not really clear where the sections start and end.

    Conclusion

    I like almost everything about this book, except for the few things I have mentioned above. It is definitely worth reading by any programmer.

    However, I will note that unless you use Java you may not get as much out of it as you want. And it seems to me that you need to be an advanced Java programmer to understand some of the chapters not written by Uncle Bob. There are also some areas that are repetitive and contradictory due to the use of multiple authors.

    One thing I didn't like was the excruciating detail in the first few chapters - but this might be useful for inexperienced developers. On the other hand there was not enough information in other areas such as TDD, Unit Tests and Concurrent Programming.