Saturday 31 March 2012

Why Good Programs Go Bad

My blog post of February 1 talked a little about why good programs go bad in the section called Software Rusts.  I recently read some other ideas about this and want to explore it a bit more.

Why does software rust?

Late last year I was in a Borders bookstore as they were (sadly) having a closing-down sale and everything was 50% off.  I stumbled across a book I had never seen before, The Pragmatic Programmer by Andrew Hunt and David Thomas.

Flipping through the pages I came across an intriguing story.  Apparently an unused building in New York can go for a very long time without being vandalized or touched in any way, but as soon as a single window is broken and left unrepaired the building quickly degenerates until it is covered in graffiti and all the windows are broken.

I came across this story again, retold in another book, Clean Code by Robert Martin, which I have just started reading.  This is what triggered me to write this post.  I will review both of these books in the future.

This analogy was used to explain why code degrades over time.  The point was to keep the code in good repair, otherwise things will go downhill rapidly.

The story is extremely appealing but, even though there is an element of truth in it, I don't believe it is a very good analogy.  First, I don't think the average programmer is a vandal -- most will do their best to keep the software ostensibly in good condition.  The real problem is that while developers are concerned with the more visible details the important things are often neglected.  A better analogy is that while the programmers are busily repairing windows and painting the outside, termites are destroying the core of the building.  (See my February 1 post for more about this.)

Actually, on second thought I don't like that analogy either, since it does not reflect that while the programmers are making changes that appear to be improving the software their very changes are causing structural problems that will cause the whole thing to eventually collapse.  I'm trying to think of a better analogy....

Imagine that an old building is being renovated.  The owners need to use the building for a purpose significantly different to its original use.  So they have doors added or sealed, rooms added or even whole walls knocked out.  However, there are no drawings of the original design and the changes cause problems.  For example, a doorway was added in a structural wall weakening the whole structure (I got this idea from a Fawlty Towers episode).  Further renovations compound the problem until the building is ready to be condemned.

This is how most software degrades.  Developers are too willing to give the users exactly what they want without regard to the consequences for the overall design.

The analogy falls down because a builder can say "I can't remove that wall as the whole building will collapse or I will have to add an enormous steel beam which you can't afford", but if a developer says something similar it is regarded with much more skepticism, mainly because there is nothing physical that can be seen and understood.  And, of course, most developers are not fully aware of all the consequences of their changes anyway.

Another problem with the building analogy is that there is much greater variety in types of software than there is in buildings.  There are far more design patterns in software (even though the idea for them came from architecture) and much software development does not have any formal design anyway.

Epiphany

Many years ago (about 8) I had one of those very rare moments, an epiphany of sorts, where a lot of things came together in my understanding of why software degrades over time and how to avoid it.  I had been using C++ for about 6 years and had just moved from a company where the quality of the code was average at best, to a company where the code was much better.

My first task at the new company was to make an addition to a market-leading piece of software.  I had also recently been studying agile methodologies, particularly Extreme Programming (XP).  I was particularly interested in unit tests, something I had discovered myself many years before (calling them automated module regression tests).

I was a little nervous about my first code review.  The code change was fairly trivial, but I had spent a lot of time making sure that it was done in such a way that it was efficient and, most importantly, had no possibility of affecting existing behavior.  In other words, my primary concern was not to introduce any bugs.  My change did not affect the interface to any class and did not even add any private members.

The code review did not go well.  It was suggested to me (thanks Dima) that a better way would have been to make an addition to an existing class.  On reflection I could see that the suggested alternative fitted much better with the existing design, despite the greater possibility of breaking something.  I also realized how my change was small but cumbersome and would have hobbled the design.  If retained, it would have made future changes even more cumbersome.

On further reflection I realized many more things.  I realized that I had been conditioned over many years of programming to make code modifications, not in the best way, but in a way that minimizes risk! I realized that most code changes I had seen and reviewed over many years for other C programmers were also made in this flawed fashion and that over time this will contort any good design and exacerbate even the worst designed piece of software.

This led to an appreciation of refactoring.  I had read a bit about refactoring and thought it was a good idea, but I had missed its true significance.  I also realized what is meant by courage in XP, which was something that had puzzled me until then -- modifying code properly (refactoring if necessary) means you often have to have the courage to take risks, particularly the risk that you will introduce new bugs.

Most significantly, I had a renewed respect for unit tests, at least as a way to avoid this sort of mess, if not a way to solve it.

Conditioning

If there is anything managers hate it is bugs.  Some calmly accept that mistakes may happen, others hit the roof, but we all know that they do not like them.  Apart from (in some cases) meeting deadlines, minimizing bugs is probably the highest priority of all managers I have ever worked with.

Like most programmers I have a very high regard for my ability, so I don't like anyone to know when I have created bugs.  The worst case is when you are belittled in front of colleagues by your manager for an apparently glaring mistake.  Given this sort of working environment, it is essential to programmers' psychological well-being that they do all they can to avoid creating bugs.

Now, in my experience, most programming work involves making changes to large programs that are anything from poorly written to appalling.  To make changes to these Balls of Mud you have to be very careful.  The only way to make changes to this sort of code is to try to change as little as possible, keeping to the very fringes to avoid upsetting the "house of cards".  In this sort of environment changes are made without any regard for preserving any existing design, if there was one; the emphasis is entirely on just getting it to work properly without side-effects.

For example, I have worked on a large program which over time had required many (usually minor) enhancements.  Requests for slight variations to existing features were handled by copying the code then adjusting the copied code.  In one case a large function (i.e. at least several hundred lines of code) was duplicated in its entirety then two small changes made.  Having such a massive amount of code duplication, I hope you agree, is mind-bogglingly bad in so many ways.
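To make the alternative concrete, here is a minimal sketch in C++ of handling a "slight variation" without copying a function.  Everything in it is made up for illustration (format_report, ReportOptions and the two options are hypothetical, not taken from the program I worked on): the two small differences become parameters, and the variant becomes a thin wrapper around the one shared implementation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical options capturing the "two small changes" that would
// otherwise have been the excuse for copying the whole function.
struct ReportOptions {
    std::string separator = ", ";
    bool include_header = true;
};

// One shared implementation instead of two near-identical copies.
std::string format_report(const std::vector<std::string>& rows,
                          const ReportOptions& opts = ReportOptions()) {
    std::string out;
    if (opts.include_header) out += "REPORT\n";
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (i > 0) out += opts.separator;  // separator goes between rows only
        out += rows[i];
    }
    return out;
}

// The variant that would have been a full copy is now a thin wrapper.
std::string format_report_compact(const std::vector<std::string>& rows) {
    ReportOptions opts;
    opts.separator = ";";
    opts.include_header = false;
    return format_report(rows, opts);
}
```

A bug fixed in format_report is now fixed for both variants at once, which is exactly what the copy-and-adjust approach gives up.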

In this sort of work environment any attempt at refactoring has so many obstacles as to be impossible.  Even to make the suggestion that some of the code be refactored takes a great deal of courage.

In the end, the net effect is that most programmers are conditioned, by working with bad managers and bad code, to make changes in a way that minimizes risks but, in the long run, makes an absolute mess of the code.  If they are lucky a programmer might get to work with good code (and good managers) but by then the conditioning is so strong it can be hard to break.

Unit Tests

So, once again, we arrive at unit tests.  I believe unit tests are the most important recent innovation in software development, for many reasons.  One of the most important reasons is that they allow code to be modified properly, and even refactored, with relative impunity.  (Does this mean you don't need courage if you have unit tests?)

With comprehensive unit tests you no longer have to worry about changes introducing bugs, since the unit tests will catch them before anybody else even sees them.  So the code can be modified in the best way, not the way that minimizes risk.  Further, even if the programmer does not fully understand the design and the best way to change it, good unit tests will also catch attempts to modify the code in ways that compromise the original design.

In addition, because changing requirements often shift the design in unanticipated directions the code will need to be refactored.  The reason refactoring is so often avoided is the knowledge that many of the bugs in the original code that have been found and fixed (sometimes over many years) will, frustratingly, reappear.  But if unit tests were used properly then all of the bugs will have had unit tests added when they were found to ensure they do not reappear.  Unit tests mean you can refactor with impunity!
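As a small sketch of what "a unit test added when the bug was found" might look like, here is a C++ example using nothing more than assert.  The function trim and the bug history are invented for illustration; a real project would more likely use a proper unit test framework.

```cpp
#include <cassert>
#include <string>

// Hypothetical function under test: strips leading and trailing spaces.
std::string trim(const std::string& s) {
    const std::string::size_type first = s.find_first_not_of(' ');
    if (first == std::string::npos) return "";  // empty or all spaces
    const std::string::size_type last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}

// Tests for the ordinary behaviour.
void test_trim_basic() {
    assert(trim("  abc ") == "abc");
    assert(trim("abc") == "abc");
}

// Regression test for an imagined past bug: suppose an all-space input
// once reached substr() without the npos guard above.  The test stays in
// the suite so that a later refactoring cannot quietly bring the bug back.
void test_trim_all_spaces_regression() {
    assert(trim("   ") == "");
    assert(trim("") == "");
}
```

Note that assert compiles to nothing when NDEBUG is defined, so tests written this way have to be built without it.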

Of course, unit tests are not without their problems.  Actually there are many problems (not insurmountable) which I will address in a later post.  One problem with unit tests is that after refactoring the code you may need to refactor the tests.  This is because unit tests should be white box (not black box) tests, in that they need to know something about how the code works internally (even though they should only call public interfaces).  If how the code works internally changes (i.e. after refactoring) then more unit tests will need to be added in light of these internal changes.
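Here is a rough C++ sketch of what I mean by a white box test that still only calls public interfaces.  The class IntStack and its growth policy (room for 4 items, doubling when full) are assumptions made up for this example.  The test pushes exactly 5 items because we know the fifth push triggers the internal reallocation, yet it never touches anything private.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical class: a stack of ints that starts with room for 4 items
// and doubles its storage when full.  pop() requires a non-empty stack.
class IntStack {
public:
    IntStack() : data_(new int[4]), size_(0), capacity_(4) {}
    void push(int v) {
        if (size_ == capacity_) grow();
        data_[size_++] = v;
    }
    int pop() { return data_[--size_]; }
    std::size_t size() const { return size_; }

private:
    void grow() {
        std::unique_ptr<int[]> bigger(new int[capacity_ * 2]);
        for (std::size_t i = 0; i < size_; ++i) bigger[i] = data_[i];
        data_ = std::move(bigger);
        capacity_ *= 2;
    }
    std::unique_ptr<int[]> data_;
    std::size_t size_;
    std::size_t capacity_;
};

// White box test through the public interface: 5 pushes are chosen
// because the fifth one crosses the internal growth boundary.
void test_push_across_growth_boundary() {
    IntStack s;
    for (int i = 0; i < 5; ++i) s.push(i);
    assert(s.size() == 5);
    for (int i = 4; i >= 0; --i) {
        const int v = s.pop();
        assert(v == i);  // items survive the reallocation, in LIFO order
    }
    assert(s.size() == 0);
}
```

If the class were refactored to start with room for 8 items, this test would still pass but would no longer exercise the growth path, so a new test that pushes 9 items would need to be added.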