Saturday 9 November 2013

Unit Tests - White Box Testing

When writing Unit Tests you should always/never use White Box Testing. Which version of that statement is correct depends on what you mean by White Box Testing.

I have seen (and been involved in) this debate a lot. These are the opposing arguments:

1. You should never use white box testing, because then you are testing the module's implementation, not its interface. The tests should not have to change when the implementation changes.

2. You should always use white box testing since you can't possibly test every combination of inputs. Making use of knowledge of how the module is implemented allows you to test inputs that are likely to give incorrect results, such as boundary conditions.

Paradoxically, both of these arguments are valid. The problem comes down to what is meant by white box testing. Unfortunately, the meaning has never been made clear, since it is obviously just the opposite of black box testing, and everyone knows what black box testing means. The problem is semantic - there are different interpretations of what the opposite of black box testing actually is.

Here is my attempt to describe two different meanings of white box testing:

Meaning 1: Test that a module works according to its (current) design by checking how it behaves internally, eg, by reading private data or intercepting private messages.

Meaning 2: Test a module using knowledge of how it is implemented internally, but only ever testing through the public interface.

Like cholesterol, there are both bad (Meaning 1) and good (Meaning 2) forms of white box testing.

Bad White Box Testing

Recently a colleague came and asked me how a Unit Test should access private data of the module under test. The question completely threw me, as I had never needed to do this - or even thought of doing it. I mumbled something about not really understanding why he wanted to.

Later, when I thought about it, I realised that Unit Tests should not be accessing private data or methods of the modules they are testing. Doing so would make the unit test dependent on the internal details of the module. Unit Tests should only test the external behaviour of a module, never how it is actually implemented.
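
To make this concrete, here is a minimal sketch (the Stack class, its public Size() method and its private m_count member are all invented for illustration) contrasting a test that reads private data with one that checks the same thing through the public interface:

   // BAD: the test reads the module's private data (eg, via a friend
   // declaration), so it breaks whenever the internal representation
   // changes, even if the behaviour is unchanged
   Stack s;
   s.Push(42);
   assert(s.m_count == 1);

   // GOOD: the same check made through the public interface
   assert(s.Size() == 1);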

As I said in my previous post (Unit Tests - What's so good about them?), one of their main advantages is that they allow you to easily modify and refactor code and then simply run the Unit Tests to ensure that nothing has been broken. This advantage would be lost if the test depends on the internal implementation of the module. Changing the implementation (without changing the external behaviour) could break the test. This is not what you want.

Unit Tests should only test the interface of the module and never access private parts.

Good White Box Testing

The whole point of Unit Tests is that they are written by the same person(s) who wrote the code. The best tests are those that test areas that are likely to cause problems (eg, internal boundary conditions) and hence can only be written by someone with an intimate knowledge of the internal workings of the module.

For example, say you are writing a C++ class for infinite precision unsigned integers and you want to create some Unit Tests for the "add" operation (implemented using operator+). Some obvious tests would be:

   assert(BigInt(0) + BigInt(0) == BigInt(0));
   assert(BigInt(0) + BigInt(1) == BigInt(1));
   assert(BigInt(1) + BigInt(1) == BigInt(2));
   // etc

Of course, the critical part of a BigInt class is that carries are performed correctly when an internal storage unit overflows into the next significant unit. With simple black box testing you can only guess at how the class is implemented. (8-bit integers, 32-bit integers, BCD or even just ASCII characters could be used to store the numbers.) However, if you wrote the code you would know that, for example, 32-bit integers are used internally, in which case this is a good test:

   // Check that (2^32-1) + 1 == 2^32
   assert(BigInt::FromString("4294967295") + BigInt(1) ==
          BigInt::FromString("4294967296"));

This test will immediately tell you if the code does not carry from the lowest 32-bit integer properly. Obviously, you need other tests too, but this tests a critical boundary condition.
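
To see why this boundary matters, here is a minimal sketch of how the addition might work internally, assuming (purely for illustration) that the number is stored as a little-endian vector of 32-bit units:

   #include <cstddef>
   #include <cstdint>
   #include <vector>

   // Hypothetical internals: the number is a vector of 32-bit units,
   // least significant unit first
   std::vector<std::uint32_t> Add(const std::vector<std::uint32_t>& a,
                                  const std::vector<std::uint32_t>& b)
   {
       std::vector<std::uint32_t> result;
       std::uint64_t carry = 0;
       for (std::size_t i = 0; i < a.size() || i < b.size(); ++i)
       {
           std::uint64_t sum = carry;
           if (i < a.size()) sum += a[i];
           if (i < b.size()) sum += b[i];
           result.push_back(static_cast<std::uint32_t>(sum)); // keep low 32 bits
           carry = sum >> 32;                                 // carry into next unit
       }
       if (carry > 0)
           result.push_back(static_cast<std::uint32_t>(carry));
       return result;
   }

A bug in the carry handling is invisible for any sum below 2^32, which is exactly why the test above targets that boundary.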

Finding the sort of defect that the above test checks for, using only black box testing, would require a brute-force loop that would probably take days to run. (I will talk next month about why Unit Tests should not be slow.)

Which Is the Accepted Definition?

The whole confusion about White Box Testing is that the distinction above is rarely made. Most definitions imply the "good" meaning, but then say something that contradicts this. For example, the Wikipedia page on White Box testing (as at November 2013) does not make the distinction and seems to imply the "good" meaning, but then suggests it is talking about the "bad" form - for example, it says it is like in-circuit testing, which is the hardware equivalent of accessing private data.

My general conclusion, from talking to colleagues, is that the "good" definition is what most people think White Box Testing is (or should be), but there are some people who use the "bad" definition.

I don't really care which definition of white box testing you think is correct as long as you distinguish between them, and as long as you specifically use "good" white box testing with your unit tests.

Disadvantages of White Box Testing

At this point I will note a few problems.

First, with some code the distinction between interface and implementation is blurred. This is a bad thing for many reasons, one of which is that you can't tell whether you are doing "bad" white box testing. See Best Practice for Modules in C/C++ on how to separate interface from implementation. Also see the sections on Information Hiding and Decoupling at Software Design.

Second, with white box testing it is easy to forget to add new tests when the implementation changes. For example, using the BigInt example above, imagine that the code was enhanced to use 64-bit integers (eg, because most users were moving to 64-bit processors, where there would be a large performance boost). If, after changing the implementation, the test was not modified (or a new test added) to check for overflow of the 64-bit unit, then the Unit Tests would no longer be complete.
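
For example, following the interface used in the tests above, the new boundary test might be:

   // Check that (2^64-1) + 1 == 2^64
   assert(BigInt::FromString("18446744073709551615") + BigInt(1) ==
          BigInt::FromString("18446744073709551616"));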

Conclusion

There are two points to this post.

1. Make sure you understand what someone means by white box testing. When they say Unit Tests should not use white box testing, they are probably talking about "bad" white box testing, which accesses the internals of the module under test.

2. When writing Unit Tests you should always use "good" white box testing to test for different combinations of parameters, boundary conditions and other likely problem areas.

It is pointless to write Unit Tests that simply do black box testing. The whole point of Unit Tests is that they are written by whoever wrote the code, in order to test likely problem areas.

Sunday 3 November 2013

Unit Tests - What's so good about them?

This time I will briefly describe the many benefits of Unit Tests, then later look at their limitations and challenges and how to overcome them.

What is a Unit Test?

The term Unit Test was once a synonym for module test, referring to any test of an individual unit or module. Originally such testing was done manually, for example by using a test rig to enter input and check output, and later by stepping through code in a debugger. It was typically performed only once, after a module had been implemented or an enhancement completed.

Nowadays, the term Unit Test has evolved to the stricter meaning of a comprehensive set of automated tests of a module that can be run at any time to check that it works.
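
As a minimal sketch of this modern meaning (the Stack module and its methods are invented for illustration), a Unit Test is just ordinary code that exercises the module and checks the results, so it can be re-run automatically after every change:

   #include <cassert>
   #include "Stack.h"  // hypothetical module under test

   void TestStack()
   {
       Stack s;
       assert(s.IsEmpty());
       s.Push(42);
       assert(s.Size() == 1);
       assert(s.Pop() == 42);
       assert(s.IsEmpty());
   }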

I'm sure you have noticed that I have talked about Unit Tests in past posts whenever relevant - which has meant mentioning them in almost every post, since they have benefits in many areas of software development, particularly Agile software development. It's now time to tie everything together (and even put a nice red bow on it for Christmas :).

The idea for what are now called Unit Tests has been around for many years. I myself independently discovered the idea in 1985, giving it the possibly more accurate but duller name of "automated module regression tests" (AMRT). However, the use of Unit Tests has only really gained momentum in the last decade or so, with the rise of XP (Extreme Programming) and TDD (Test Driven Development).

Better Design

I talked a little about how Unit Tests help with software maintenance last week, and I will summarize that again below, but first I will explain how Unit Tests can assist with the original design and implementation, before we even consider changing it.

There is a lot of evidence that creating Unit Tests, at the same time as you write the code to be tested, results in a much better initial design. It helps the programmer to think about things (boundary conditions, error conditions, etc) that are often ignored in the rush to get something completed.

TDD (which is, in a way, an extension of Unit Testing) further improves the software quality, for example in verifiability (see Verifiability). It also assists in ensuring that the Unit Tests themselves are correct.

Another advantage is that creating Unit Tests while writing the code (eg, using TDD) means that bugs are found much earlier. This allows the design to be suitably modified before it becomes ossified.

Finally, the main advantage to the design is that Unit Tests make it much easier to resist the urge to add unnecessary code. This is the idea behind YAGNI in XP.

I could write a whole post on YAGNI (and probably will) and have also briefly talked about it in Reusability Futility. In brief, it is the idea that you do the absolute minimum to implement something.

Why are extra things done anyway? There are lots of reasons:
  • a future extension is thought to be certain
  • adding it now avoids later costs of change (see below)
  • something is seen as a nice addition for the user and is trivial to add
  • the developer wants to try something more challenging
  • a more general problem is solved in the name of reusability, rather than the specific one required
  • anticipated code degradation means it will be hard/impossible to add later
  • distrust of code maintainers to later add a feature properly
  • as an attempt to avoid repeated regression testing when things are later added
  • finally (and ironically) changes are made to make it more maintainable
The consequences are:
  • a more complex design, which is less easily understood
  • undocumented features which are not debugged and tested properly
  • when software is rewritten or just refactored, undocumented features are forgotten or inadvertently disabled - this can be very annoying for users
  • reluctance to refactor from fear of breaking undocumented features
  • can often constrain the design making future changes more difficult
  • usually makes the code less maintainable
Note that there is one claimed benefit that I have not touted here. I have seen it said (for example, see the Wikipedia page for YAGNI) that it is faster and requires less effort to create a simple design. This may sometimes be true, but in my experience it is often harder and more time-consuming to create the simpler design than a more complex one.

Unit Tests mean that most of the anticipated problems with YAGNI disappear. You don't need to add the feature now, because Unit Tests allow you to add it easily later; that is, they reduce the costs of change. Moreover, the code does not degrade, and maintainers are more likely to make modifications in a way that does not subvert the original design, because they don't have to fret over introducing new bugs - and good Unit Tests can even guide them in how to make the changes.

Further, I have found that the use of Unit Tests with a simple design makes the code easier to maintain than a design which is explicitly built to be maintainable!

Finally, Unit Tests free developers to concentrate on the task at hand, rather than being distracted by tasks they are not really good at (like predicting future user requirements).

Handling Change

In brief, the problem is that the costs of changing software are traditionally very high, primarily due to the risk of introducing new bugs (and the consequent problems) and the need for regression testing. These costs cause two effects:
  1. resisting change, which is bad for the long-term viability of software
  2. an emphasis on avoiding mistakes, which causes an unnecessarily large up-front effort
Resisting change can result in lost business opportunities, simply because new features are not added. Further, the code is not refactored to improve maintainability or to take advantage of a better design. Further still, improvements due to external technical advances (eg, improved hardware or software libraries) are often missed.

Worst of all, even when changes are made they are not done properly for various reasons like:

1. It may be very hard to make the change in a way that is consistent with the original design. A good example of this is given at the end of Item 43 in Scott Meyers' book Effective C++, where multiple inheritance is used to avoid making the correct changes to the class hierarchy.

2. The changes may affect other people. For example, a library or DLL from another section may need to be changed. There can be a reluctance to ask others to make changes that might appear to be for the sake of one's own convenience. I gave an example of how this happened to me in a previous post - Why Good Programs Go Bad.

3. Often making changes properly may increase the chance of new bugs appearing (or old bugs reappearing). An example of bad code changes made in the name of avoiding bugs is the common practice of cloning a function or module to handle a change, making slight modifications for the new circumstance (see the sketch after this list). This preserves the original function or module so that there is no chance of introducing bugs into existing behaviour; but the consequence is duplicate code, which violates the DRY principle and creates a maintenance problem.

4. Even a simple change has the possibility of introducing new bugs. Hence manual regression testing is required. This can have a large cost and is usually very tedious for those that do the testing.
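
Here is a small invented illustration of the cloning anti-pattern mentioned in point 3 - the formula now lives in two places, and any future fix must be made twice:

   // Original function, left untouched "to be safe"
   double ShippingCost(double weightKg)
   {
       return 5.0 + 1.2 * weightKg;
   }

   // Clone with a slight modification for express orders - a DRY
   // violation, since the shared formula is now duplicated
   double ExpressShippingCost(double weightKg)
   {
       return 9.0 + 1.2 * weightKg;
   }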

Finally, the approach of spending an enormous effort up-front to get the specification right first time has many well-known, undesirable consequences. First, it makes managers very nervous when there appears to be a large cost (wages) with very little tangible evidence that anything has been accomplished. The analysts also soon realise that they don't really know what is required and/or what they are doing, and that there is no chance of getting the analysis and design right first time. All of this effort could be better spent in another way - the Agile way.

Agile methodologies generally help greatly to reduce the cost of change by catching bugs (and other defects) much earlier. But Unit Tests especially enhance this aspect of Agile methodologies by:
  • detecting bugs earlier (reducing the cost of delay)
  • making code more maintainable (reducing the cost of rework)
  • allowing changes to be made properly
  • making refactoring easier and less risky
  • working as a form of "living documentation", making changes easier (see below)
  • even guiding developers to ensure that future code changes are done properly

Documentation

Technical documentation has always been one of the biggest problems in software development. Studies have shown (at least in Waterfall-style projects) that poor technical specifications are the major reason for project failure, followed by poor estimations (which are usually due to incomplete specs).

Before I talk about the relationship between Unit Tests and documentation, let's just look at the problems and their causes. There are a lot of problems with technical documentation which I group into three rough categories:

1. Accurate. One problem is that documents are simply incorrect, and the reviewing/proof-reading required to remove all errors would be onerous. Of course, another major problem is that they are incomplete, often with gaping holes, and no amount of reviewing is going to find a gap that nobody has thought of.

One way to attempt to overcome this has been to require people with authority to sign off on a document (presumably after having read it). That way, at least you have somebody to blame when things go wrong! Having more people read and sign off is good (in one way) as it increases the chance of spotting mistakes, but then (in another way) the blame-ability of each individual is diluted. And trying to assign blame is not the Agile approach.

2. Up to date. One reason documents are incorrect is that things change, usually a lot. Documents, even if correct initially, are almost never in accord with the software at any particular point in time. One of the main reasons is finding the time to update them. You don't want to keep modifying a document when you suspect it will change again in the near future. The end result is that you keep postponing the update until the document becomes irrelevant and everyone has forgotten about it, or until the task is so big that you can never find the time.

One reason documents are not updated is that they are hard to verify. It can be very difficult to check whether there is a discrepancy between the actual software and the documentation. Even though the document and the code are very closely related, there is no direct connection between them, except via the brains of the developers. This is another good example of DIRE (Don't Isolate Related Entities).

3. Understandable. Finally, documentation is often difficult to read. There are many reasons: being too vague, including irrelevant information, bad grammar, incorrect terminology, etc. A common problem is assumed knowledge on the part of the reader - a good introduction/summary at the start of a document is almost never provided, but can be a great time-saver and avoid a lot of confusion.

The lack of a summary is symptomatic of the basic problem - the author is entirely focussed on getting all the facts down. This is like what happens with a newbie programmer - they are so focussed on getting the software working (ie, correct) they give no regard to other important attributes (like understandability). Unfortunately, document writers rarely get past the "newbie" stage, being mainly concerned with correctness not with the understandability of what they write.

Favour Working Software over Comprehensive Documentation

It is no mistake that this, the second of the four values of the Agile Manifesto, deals with documentation. And there is no better example of favouring code over documentation than Unit Tests, since they are working software that actually obviates the need for much documentation. On top of their other advantages, Unit Tests work as a form of living documentation, recording not only how the code is supposed to work but also showing others how to use it.

Most documentation is poor, but even with the best you are never certain you have understood it completely. Most programmers create a little test program to check their understanding. With Unit Tests, that little test program is already written for you.

There are many other ways that Unit Tests work as improved documentation. By now you probably get the gist, so I will just give a list:
  • correct - unlike with documentation, mistakes are immediately obvious
  • verifiable - you can easily check that Unit Tests are correct by running them
  • understandable - working code is much easier to read and try out than documentation
  • modifiable - Unit Tests allow the code to be more readily modified
  • up to date - Unit Tests (if run and maintained) are never out of date
Of course, like documentation, Unit Tests can be incomplete (and often are). This is something I will talk about in the next blog, but for now I will simply say that code coverage analysis can help ensure that the tests are reasonably complete and up to date.

Finally, using Unit Tests as documentation can be thought of as a form of automation. Automation is another thing I am keen on, as I hate performing tedious tasks - and reading documentation to check that my code matches it is a perfect example of a tedious task. Running Unit Tests automates this task.

Organization

Finally, Unit Tests are good for organizing your work. When using Unit Tests (and particularly when using TDD) you develop a rhythm, getting into a cycle of coding/testing/fixing. Somehow it becomes more obvious what you need to do next. For example, after implementing a feature you just "fix the next red light" until all Unit Tests pass.

Summary

Once you start using Unit Tests you keep finding more things to like about them. Your design is more likely to be correct and more verifiable. Further, the initial design can be simpler because you don't have to guess what will happen in the future. And when things do change, enhancements are made more quickly and reliably, and bugs are found faster. Best of all, changes can be made properly and the code refactored without fear of introducing bugs.

However, the main point I was trying to get across is that without Unit Tests I don't believe an Agile methodology can work. (This is one of my criticisms of Scrum - that it does not prescribe Unit Tests as essential to the methodology.) Unit Tests allow the software to start out simple and evolve into what it needs to be. They allow us to resist trying to predict the future.

So why aren't they universally used? First, there is the perceived poor cost/benefit ratio - though I think the real benefits are greatly underestimated, especially when combined with other Agile techniques. Another reason they are not widespread is the several challenges they present, which I will cover in my next post...