Saturday 25 January 2014

Unit Tests - Best Practice

In my last article I looked at the inherent difficulties of creating Unit Tests. This time I will look at how they are often poorly implemented. For each of the points I also discuss how the problem can be avoided, that is, what the best practice is.

1. False success


This is a common problem that is either generally unrecognized or not discussed. At least, I have never seen it discussed.

The root problem is caused by the fact that Unit Tests are code and code often has bugs. Generally, for Unit Tests you think that there is just a binary result - the test passes or fails, but since Unit Tests can also have bugs, there are actually four different possibilities:

     1. Code is correct, Unit Test passes
     2. Code is incorrect, Unit Test fails
     3. Code is correct, Unit Test fails
     4. Code is incorrect, Unit Test passes

The first case is obviously the desirable situation. The next two are problems that will be flagged and consequently tracked down and fixed. Of course, in both cases 3 and 4 there are bugs in the Unit Tests, but case 4 is the major problem as it is indistinguishable from case 1.

Even more insidious is when the code is correct and the Unit Test passes (case 1), then a later code change causes a bug that should be detected by the Unit Test but it still passes (case 4)!


To be honest, I've rarely seen this situation but I suspect that there are a large number of hidden problems of this type lurking in Unit Tests. Why? Well, in the past I have fixed quite a few bugs in unit test code which were causing the test to fail (case 3), but I cannot recall ever having discovered and fixed a bug which caused a test to return a false positive (case 4). Since both types of bugs seem equally likely it seems that there are several such problems in those Unit Tests that have never been discovered.

So how do we avoid this problem? Using TDD goes a long way to eliminating the problem. That is, write the test first and make sure it fails before you write the code that makes it pass.
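
Here is a minimal Python sketch of case 4 and of the fail-first check. All the names (is_even_correct, is_even_buggy, flawed_test, sound_test) are made up for illustration. The flawed test only checks an input for which the buggy code happens to give the right answer, so it passes against both versions. Running each test against code that is known to be wrong, as TDD encourages, is what exposes the difference.

```python
def is_even_correct(n):
    return n % 2 == 0


def is_even_buggy(n):
    # Deliberate bug: every number is reported as even.
    return True


def flawed_test(is_even):
    # Test bug: only an even input is checked, never an odd one.
    return is_even(4) is True


def sound_test(is_even):
    return is_even(4) is True and is_even(3) is False


print(flawed_test(is_even_correct))  # True  (case 1)
print(flawed_test(is_even_buggy))    # True  (case 4: false success)
print(sound_test(is_even_buggy))     # False (case 2: the bug is caught)
```

If flawed_test had first been run before the real code existed (or against an obviously wrong version), the fact that it still passed would have shown that the test itself was at fault.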

2. Incomplete Tests

Probably the most common problem with Unit Tests is they're incomplete. That is, there are many cases that are not tested. In fact, studies have shown that on average Unit Tests only cover about half the code (though they usually cover the most commonly executed parts). For example, much error-handling is poorly tested.

A good way to estimate how complete your Unit Tests are is to use a code coverage tool. This checks which lines of code are actually executed when running the tests. The aim should be to have 100% code coverage. If 100% coverage is not reasonable, for some reason, then the lines that are not tested should be thoroughly code-reviewed.

It is also important to remember that code coverage is only an indicator of how thorough your Unit Tests are. Even with 100% code coverage there may be gaps in your tests. For example, consider this test:


void LessThanTest()
{
    BigInt op1 = new BigInt(1);
    BigInt op2 = new BigInt(2);

    AssertTrue(op1 < op2);  // 1 < 2 is true
    AssertFalse(op2 < op1); // 2 < 1 is false
}

This code may have 100% code coverage but it still does not test the boundary condition where both operands are equal. We also need this test:



    AssertFalse(op1 < op1);  // 1 < 1 is false
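
To see why the extra assertion matters, here is a small Python sketch. It uses plain integers rather than BigInt, and the function names are hypothetical. The buggy comparison is a single line, so the first two checks execute all of it (100% line coverage) and still pass; only the boundary check reveals the bug.

```python
def less_than_buggy(a, b):
    # Hypothetical bug: <= was typed where < was intended.
    return a <= b


def less_than_fixed(a, b):
    return a < b


def original_checks(less_than):
    # Mirrors the two assertions in LessThanTest.
    return less_than(1, 2) and not less_than(2, 1)


def boundary_check(less_than):
    # The extra case: both operands equal.
    return not less_than(1, 1)


print(original_checks(less_than_buggy))  # True: the bug slips through
print(boundary_check(less_than_buggy))   # False: the bug is detected
```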


3. Poor Code


Some developers have the perception that test code is not as important as application code and does not have to be written to the same standard. This may be a result of conditioning from earlier times where test rigs were written then discarded.

Or it may be a consequence of the "customer-centric" culture of most development. Unit Tests are not part of the application, so they are of no concern to the customer and hence are seen as unimportant.

Forget that. It is just as important for Unit Test code to be well-written and maintained as it is for any code.

This is discussed at length in Clean Code by Robert C. "Uncle Bob" Martin, in the section Keeping Tests Clean (page 123). In that, Uncle Bob explains how a team created quick and dirty test code, which made their tests unmaintainable. This led to the Unit Tests being abandoned, which in turn led to application code that stagnated because it could not easily evolve.

My experience is very different to Uncle Bob's. I have found that Unit Tests rarely need to change as the software evolves (assuming they test one simple thing - see my next point). Instead new tests are created to test new functionality, leaving the old tests unchanged to test old functionality. Only when existing behavior changes do existing tests need to change; but this usually requires a rewrite of the module including completely rewriting all the tests.

Of course, I agree completely with Uncle Bob that Unit Test code should be kept as clean as production code. However, in my experience this is not a major issue as good coders habitually create good code irrespective of where it is used.

The moral is that test code should be of good quality -- but there are many aspects to the quality of Unit Test code. For example, Unit Tests usually need not be optimized; on the other hand, they should not be too slow otherwise they will be disabled (see item 6 below). I discuss other aspects of writing good Unit Tests in the following sections.

4. Multi-purpose Tests

I have seen a lot of tests that are too long, trying to test many things. The problem is that if the behavior of one thing changes the whole test may be affected even though most of what is being tested has not changed. You may have to rewrite a large test just for one small difference.

A better way is to have a separate Unit Test for each piece of functionality. One school of thought is that there should only be one assertion per test. However, I think this is unnecessary and even complicates things. I prefer Uncle Bob's rule of one concept per test (see Clean Code page 131).

Why are large cumbersome tests created? One argument given to me for creating a large multi-purpose test is that the set-up and tear-down would have to be repeated if the code was split up into multiple tests. And copying and pasting the same set-up and tear-down code into multiple tests would be a violation of the DRY (Don't Repeat Yourself) principle.

My original counter-argument to this was that it's better to have multiple simple tests and violate DRY since the test code is not production code. However, this goes against my previous point that Unit Test code is not "2nd class", and hence should not violate DRY. On further consideration the obvious solution is to place the set-up and tear-down code into separate functions that are called as necessary from each of the tests.
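
As a rough sketch of that idea in Python: the set_up and tear_down helpers below are shared by two short tests, each of which checks one concept. The functions under test (count_lines, first_line) and the file contents are invented for the example.

```python
import os
import shutil
import tempfile


def set_up():
    """Create a scratch directory holding one known input file."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "input.txt")
    with open(path, "w") as f:
        f.write("alpha\nbeta\n")
    return workdir, path


def tear_down(workdir):
    """Remove everything set_up created."""
    shutil.rmtree(workdir)


def count_lines(path):
    # Hypothetical code under test.
    with open(path) as f:
        return sum(1 for _ in f)


def first_line(path):
    # Hypothetical code under test.
    with open(path) as f:
        return f.readline().rstrip("\n")


def test_count_lines():
    workdir, path = set_up()
    try:
        assert count_lines(path) == 2
    finally:
        tear_down(workdir)


def test_first_line():
    workdir, path = set_up()
    try:
        assert first_line(path) == "alpha"
    finally:
        tear_down(workdir)


test_count_lines()
test_first_line()
```

Most test frameworks offer their own hooks for this (set-up and tear-down methods or fixtures), which avoids even the repeated try/finally.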

It's important to note that one concept per test does not preclude having multiple tests per concept. In fact, it is quite normal to have a lot of almost identical tests verifying different aspects of the same concept.

In summary, each Unit Test should only test one concept. As a consequence you will have a great deal of very short tests, but that is a good thing. This makes it easier to find the test(s) related to some functionality, and makes it easier to modify, add and delete tests when required.

5. Tests that depend on other tests

Sometimes you see tests that call other tests in order to share code. Additionally, tests can unintentionally depend on side-effects of other tests, such as the existence of a file or other properties of the environment. Like the last point this is also bad for the maintainability and understandability of your Unit Tests.

To avoid this, without violating DRY, you can extract infrastructure (set-up and tear-down) code into separate functions which are then called from different tests. These infrastructure functions should never test anything themselves.

It also sometimes happens that a test has side-effects that subsequent tests will depend upon, whether intentional or accidental. A way to detect this is to initially run tests individually. Also, if your test framework allows, let it run the tests in random order.
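
The following Python sketch shows how both techniques expose a hidden dependency. The two tests and the tiny runner are hypothetical; a real test framework would provide the runner (and often a random-order option).

```python
import random

shared = {}  # module-level state that leaks between tests


def test_create():
    shared["user"] = "alice"
    assert shared["user"] == "alice"


def test_read():
    # Hidden dependency: only passes if test_create has already run.
    assert shared.get("user") == "alice"


def run(tests):
    """Run tests in the given order from a clean state; return names that failed."""
    shared.clear()
    failed = []
    for test in tests:
        try:
            test()
        except AssertionError:
            failed.append(test.__name__)
    return failed


def run_individually(tests):
    """Run every test on its own, each from a clean state."""
    failed = []
    for test in tests:
        failed += run([test])
    return failed


def run_shuffled(tests, seed):
    """Run tests in a pseudo-random order; return the order used and the failures."""
    order = list(tests)
    random.Random(seed).shuffle(order)
    return [t.__name__ for t in order], run(order)


tests = [test_create, test_read]
print(run(tests))               # [] : the usual order hides the problem
print(run_individually(tests))  # ['test_read'] : running alone exposes it
```

Passing a seed to the shuffle means a failing order can be reproduced later.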

In brief, tests should be self-contained and independent of other tests.

6. Tests with external dependencies

As we saw above, having tests depend on other tests is not a good idea. A similar problem is when tests depend on some external module or function. Sometimes you need a test double to stand in for an external dependency.

How far you go with creating test doubles is a matter of judgement and considerable debate. One school of thought is that all external dependencies should be eliminated. This may necessitate creating a lot of test doubles and in my opinion this is not always necessary. For example, you rarely need test doubles for the operating system or CRT (compiler run-time) library calls.

Apart from avoiding troublesome externals, test doubles have another use. You may need to use them to simulate conditions that the external does not normally produce. I talked about this at length last month.

My rule of thumb is if the external is fast, reliable and predictable you don't need a test double.

An example of when you would use a double is when the external takes a long time, since you risk that the tests will run too slowly and be disabled. Also, if the externals have bugs you don't want to spend time investigating failing Unit Tests only to find that they are caused by external bugs.

In summary, use test doubles when external dependencies cause problems (or error conditions need to be emulated). See my post last month for more information on why, when, and how to use test doubles.
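
For instance, here is a small Python sketch in which a slow remote lookup is replaced by two stubs: one that answers instantly and one that emulates a time-out. The function fetch_price_with_fallback and the stubs are hypothetical, and the external is passed in as a parameter so that the test can substitute it.

```python
def fetch_price_with_fallback(fetch, symbol, fallback):
    """Hypothetical code under test: return a live price, or the fallback on time-out."""
    try:
        return fetch(symbol)
    except TimeoutError:
        return fallback


def stub_fetch_ok(symbol):
    # Stands in for a slow remote call; answers immediately with a canned value.
    return 42.0


def stub_fetch_timeout(symbol):
    # Emulates an error condition that the real service rarely produces.
    raise TimeoutError("simulated time-out")


print(fetch_price_with_fallback(stub_fetch_ok, "XYZ", 0.0))       # 42.0
print(fetch_price_with_fallback(stub_fetch_timeout, "XYZ", 0.0))  # 0.0
```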

7. Black box testing

An important thing to remember is that the tests should be written by whoever wrote the code in order to take advantage of knowledge of how it works internally. Employing black box testing (where the tester knows nothing of the internals) is the wrong approach since, even with the simplest modules, testing every combination of inputs is simply not practical.

In brief, Unit Tests need to be written by someone familiar with the internals of the SUT. This is commonly referred to as White Box Testing. I talked about this in detail in my previous post on White Box Testing.

8. Delayed Test Writing

In the previous point we saw that Unit Tests should be written by the developer(s) who wrote the SUT. A similar requirement is that the Unit Tests should be written at the same time as the code they are testing. This is another example of DIRE (see my blog next month for more on DIRE).

If writing the tests is delayed then the code will no longer be fresh in the developers' minds and things will be missed. In my experience this will result in poor and incomplete tests (if they are even done at all). Further, writing the tests alongside the code greatly improves the verifiability of the code (see Verifiability).

In summary, Unit Tests should be written at the same time as the code being tested to take advantage of information only understood/remembered at the time. Ideally the developers should practice TDD (see point 1 above), where the tests are written before the code.

9. Tests not run regularly

Tests need to be run regularly. When found, problems with the tests need to be fixed immediately. I saw one project where the Unit Tests had not been run for some weeks (perhaps months). When I ran them more than half failed due to a change in behavior. This wasted quite a bit of time while we worked out what had happened.

It may seem obvious but tests should be run regularly and maintained to be consistent with the application code that they test.

10. No High-Level Tests

One final problem is when lovely Unit Tests are written for low-level modules but "high-level" modules are not tested. In many designs larger modules are built from smaller ones. In other designs some modules communicate with many other modules and are not easily tested.

In these cases, even though they are less amenable to Unit Tests, these modules should still have Unit Tests. For example, a high-level module that communicates with a lot of other modules may require a lot of test doubles.


Consider the analogy of a plane engine. An engine is composed of many parts each of which is thoroughly tested before being incorporated into the final engine design. For example, the spark plug supplier would have "Unit Tested" the spark plug design in isolation from the engine in which it is to be used.

So all parts are tested before being incorporated into the prototype for the engine. However, just because all the parts are thought to work correctly does not mean that we should not test the engine in toto before incorporating it into the plane prototype.

Too often, in software development, you see a lot of low-level Unit Tests that test things like the "spark plugs". Then you have high-level acceptance tests - which are the equivalent of a test flight (with the engine already in the plane). What is missing is testing at the levels in between. Some people might call this integration testing (testing that modules communicate with each other correctly) but it is more than that.


Don't stop creating tests when you get to higher levels in the design.

Summary

Unit Tests are extremely worthwhile but only if implemented properly. You need to watch out for the above 10 pitfalls and try to follow the corresponding guidelines.

This is almost my last post on Unit Tests, which started last October with the post on Change.

In my next post I will summarize all the things I have said about Unit Tests and highlight the salient points.

Sunday 5 January 2014

Unit Tests - Challenges

Happy New Year! Here I return to my series of posts on Unit Tests.

I guess you know by now that Unit Tests have many advantages, so why isn't everybody using them? Well, you'd be surprised how many people are using them! But the uptake has been slow, probably due to ignorance of their benefits (already covered) and various "challenges". This week I will explain the challenges, why they are far outweighed by the benefits, and how to overcome them or at least lessen their effect.

First, I will mention something that is a major challenge in itself - actually implementing Unit Tests properly. Poor practices are the main reason that Unit Tests are rejected after they have been tried. (I had intended to cover this now but it will be in the next post Unit Tests - Best Practice which will include a discussion of poor practices like incomplete test coverage, etc -- so stay tuned!!)

1. You need to Understand the Code

The first problem with Unit Tests is something that nobody talks about - except that some people mention it as an advantage of them. (Maybe it is an advantage sometimes, but in my experience it is a drawback.)

In C, you can write code and know it will just work without truly understanding it. The design of the language, such as proper handling of zero (see Zero), asymmetric bounds (see Asymmetric Bounds), etc, makes coding fast and reliable. But with Unit Tests you suddenly have to understand things like boundary conditions or different combinations of inputs.

For example, a C programmer can typically write the code for the standard library routine strncpy(), and get it right, in a minute or two. (As a test, I wrote the code for strncpy() in 81 seconds.) However, to consider all possible values and combinations of input parameters and create Unit Tests for them all would take at least an order of magnitude longer - at least 30 minutes, probably more.

In this example I assume the coder knows the correct behavior of strncpy() - eg, that the destination string is not nul-terminated if the source string is at least as long as the given length.

Despite this I still believe that Unit Tests are more than worthwhile. (Though this was the main reason that I initially rejected them - see my previous post on Personal Experiences). First, for larger modules actually trying to understand how the code might react to different combinations of inputs may reveal bugs and lead to better code (see the Design section in Unit Tests - Advantages).

For modules that are likely to change (ie, most) there is an even more important consideration: you can change the module to your heart's content and then simply run the Unit Tests to ensure that bugs have not been inserted. This can more than compensate for the effort needed to understand the code and create the Unit Tests.

Even the code for strncpy(), mentioned above, might need to change. For example, it may have to be rewritten to copy 4 (aligned) bytes at a time for better performance. In this case having Unit Tests can be very useful for checking that the code still behaves correctly. (Of course, you probably need to add new tests after this change, for example that 4-character and 5-character strings, strings with different alignment, etc are copied correctly.)
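
To give a feel for how many cases even this small routine has, here is a sketch of the boundary tests. It is written in Python against a simple model of strncpy() that works on a bytearray; the model is my own stand-in for illustration (it assumes the source contains no embedded nul), not the C library routine itself.

```python
def strncpy(dest, src, n):
    """Model of C strncpy: copy at most n bytes of src into dest.

    If src is shorter than n the remainder is padded with nul bytes.
    If src has n or more bytes, no nul terminator is written.
    """
    for i in range(n):
        dest[i] = src[i] if i < len(src) else 0
    return dest


def fresh():
    # Destination pre-filled with 'X' so untouched bytes are visible.
    return bytearray(b"XXXXXX")


# Source shorter than n: copied, then padded with nuls up to n.
assert strncpy(fresh(), b"ab", 4) == b"ab\x00\x00XX"
# Source exactly n long: copied with no terminator.
assert strncpy(fresh(), b"abcd", 4) == b"abcdXX"
# Source longer than n: truncated with no terminator.
assert strncpy(fresh(), b"abcdef", 4) == b"abcdXX"
# n of zero: destination untouched.
assert strncpy(fresh(), b"abc", 0) == b"XXXXXX"
# Empty source: n nuls are written.
assert strncpy(fresh(), b"", 3) == b"\x00\x00\x00XXX"
```

The same five cases would carry over unchanged to a rewritten version that copies several bytes at a time, with extra cases added around the new word-size boundaries.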

Finally, there are tools available to address this problem such as the excellent tool from Microsoft called Pex. Pex analyses your code and works out different sets of test values, for example, to check boundary conditions. It is not a fully automated system so you still have to understand the code, but it can save a lot of time.

2. They Take Time

There is no getting around it - writing Unit Tests takes time. Writing good Unit Tests can take much more time than the time to write the code being tested, for the reasons discussed below and next week. Many proponents of Unit Tests say that once you are used to writing them it does not add appreciably to the time to write the code, but either they are not writing tests properly, are lying, or they know something that I don't.

Of course, you understand by now that, even if they take time, this is not wasted time. In the long term this is time exceedingly well spent. The trouble is you can't always control how you spend your own time. Many managers would not agree to anything that adds 10% to development time, 50% extra would make them very angry, 100% would be unthinkable - so imagine their reaction if you tell them you need to create some mock objects and a meta language for your Unit Tests which will add 500% to development time!

One day it will be standard practice to create Unit Tests (with full code coverage) as an accepted part of the process of creating software. However, until that time you either have to pray for a good boss, work overtime creating Unit Tests, or just grossly overestimate to give yourself some free time.

The good news is that there are a lot of tools and libraries around to assist with creating Unit Tests. I mentioned Pex above, but there are also tools that make it easy to create mock objects. These can save a lot of time.

3. They Require a Good Design

Before all else, in order to create Unit Tests you need a good design. (Of course, that is not the only advantage of good design - see Fundamentals of Good Design.) You need well-defined modules, each with simple well-understood interface(s). This is one reason it is hard to add Unit Tests to legacy code (see the next section).

There are different ways to split any design into modules. The way this is done can also affect how easy it is to test them. For example, the MVVM (Model, View, View-Model) design pattern is very useful for isolating the GUI from the rest of the system, which makes it much easier to test the whole system in isolation from the GUI.

There are other aspects of the design which are rather specific to Unit Testing. For example, Dependency Injection (DI) makes it very easy to test modules with mock objects (see the section after next).

Finally, even just creating Unit Tests while you write the code (especially if you use TDD) will result in a better design.

4. They're Hard to Retrofit

Like a lot of programmers I have spent a large part of my career maintaining and modifying existing code (and cursing the bad to egregious design I had been lumbered with :). I have been quietly pushing the idea of Unit Tests to management at various employers for about 20 years - to which I get the obvious response: "Add some Unit Tests to our existing code to demonstrate the benefits".

Unfortunately, adding Unit Tests to existing code is difficult to impossible. Some people have reported some success in retrofitting Unit Tests to legacy code but in my experience these tests are not that useful (probably missing at least 80% of test cases).

A major problem for many legacy projects is that they do not have well-defined interfaces, which are mandatory for being able to add Unit Tests (see previous section). Some do not even have any recognizable modules at all.

Even if you are lucky enough to have a well-designed legacy program there is still another problem. Only when the code is being written is it well understood. Even the person(s) who originally wrote it will begin to lose some understanding of it within a few days.

Unit Tests should be written at the same time as the code by the person who wrote the code since they (at that time) understand it best.

5. Often Need Test Doubles

The main reason that Unit Tests are not used for many new projects is that modules/objects have external dependencies that make it difficult or impossible to create Unit Tests without creating code to emulate those external dependencies. There are various ways that this emulation can be accomplished but they all take time and/or have other limitations. All methods of emulation nowadays fall under the umbrella term of test doubles, but there are various types (simulators, stubs, mock objects, etc) with different applicability and advantages/disadvantages as discussed below and in my next post.

Why do we need test doubles?

There are two general reasons to use test doubles. First, there may be undesirable consequences of using the real module which we want to avoid. Second, we may want to test conditions which the real module does not normally produce.

Here are examples of modules that require test doubles due to their undesirable behaviour:
  • communicates with a remote server and so is much too slow to use in a test
  • modifies a live database, such as updating customer details
  • uses resources, such as a printer module that consumes paper
  • depends on an environment that may not exist where the Unit Tests are run
  • sends test emails to real customers
  • requires manual intervention, such as clicking to confirm a message
  • returns different results each time it is used, such as stock values
  • does not even exist, for example a sub-module that has not been written
  • is not the final version, so has defects (bugs, poor performance, ...)
Here are some examples of external modules that require test doubles in order to simulate certain conditions:
  • hardware device driver, so you can simulate any possible error condition
  • system time, so that you can test what happens at midnight
  • remote communications, so you can test time-outs and other errors
  • any external module, so you can have it return out-of-range or atypical values
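
As an illustration of the system time case, here is a minimal Python sketch. The FixedClock double and the daily_log_name function are hypothetical; the point is only that the code under test asks a clock object for the time instead of calling the system directly, so a test can place it on either side of midnight.

```python
from datetime import datetime


class SystemClock:
    """The real thing: asks the operating system for the time."""

    def now(self):
        return datetime.now()


class FixedClock:
    """Test double: always reports the moment it was constructed with."""

    def __init__(self, moment):
        self.moment = moment

    def now(self):
        return self.moment


def daily_log_name(clock):
    """Hypothetical code under test: one log file per calendar day."""
    return clock.now().strftime("app-%Y-%m-%d.log")


just_before = FixedClock(datetime(2014, 1, 25, 23, 59, 59))
midnight = FixedClock(datetime(2014, 1, 26, 0, 0, 0))

print(daily_log_name(just_before))  # app-2014-01-25.log
print(daily_log_name(midnight))     # app-2014-01-26.log
```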

Problems with Test Doubles

That's enough about explaining what test doubles are. Now, what are their problems? First, though often necessary, they can take a lot of time to create. Here again tools, like mock objects, can save time but may have their own limitations which in turn makes it difficult to decide what to do.

Even deciding whether or not to use a test double can be difficult. In my experience it is better not to use a test double, unless you really have to. By not using a double you may discover integration problems, caused by communication between modules, that would otherwise not be found until much later.

(The module or unit being tested is conventionally called the System Under Test. I will use the acronym SUT.)

On the other hand many people recommend using test doubles for all external dependencies to avoid chasing problems not related to the SUT. This appeals to those of a scientific background since you isolate the SUT from any external influence that may affect your test results; that is, it reduces complexity by eliminating unknown variables.

I personally am of the opinion that it all boils down to how reliably the external module performs its task, and equally important how well-defined its interface is (which goes back to point 3 of having a good design). If it has bugs then it may be better to use a test double in order to avoid chasing bugs not caused by the SUT. Another possibility is to create the same test twice, first using the real external module and then using a test double, so it is obvious where the problem is occurring.

Another problem to watch for with test doubles is that they may return the correct or expected results even when sent the wrong information. This problem of "false positives" is very common.

The final problem that I will mention with test doubles is deciding which type to use. For example, it might be simple to use a mock object but in some circumstances mock objects are not a good idea as they can end up testing how the SUT is implemented internally. As I have mentioned before (in Unit Tests - White Box Testing) testing the implementation is a bad idea since changing how a module is implemented without changing its public interface should not invalidate your Unit Tests. This is a common mistake with mock objects that I will discuss next week.

Solutions

Mock objects are a neat solution for quickly creating test doubles. There are many mock frameworks that allow you to create a "mock" object on the fly that accepts function calls and responds with a canned response. (How easy this is depends on the framework and the language being used.) But note that with complex tests even setting up mock objects can be time-consuming.

There is also a danger to using mock objects -- as mentioned above, you may end up testing how the SUT is implemented, not simply its external behaviour. I will discuss this next week but in brief it depends on the type of module the test double is created for. If you want to test how a module communicates with other peer modules then mocks are ideal. If the module is a private (subservient) module you normally want to check "what, not how", and avoid mocking.
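
A short Python sketch of the two situations, using the standard unittest.mock module. The functions notify_overdue and total_overdue, and the account data, are invented for the example. In the first case the conversation with a peer module (the mailer) is the behaviour being tested, so a mock fits. In the second the helper is private, so the test only looks at the result.

```python
from unittest.mock import Mock


def notify_overdue(accounts, mailer):
    """Hypothetical SUT: tells a peer 'mailer' module about overdue accounts."""
    for account in accounts:
        if account["days_overdue"] > 30:
            mailer.send(account["email"], "Your account is overdue")


def _overdue_amount(account):
    # Private helper: an implementation detail that tests should not pin down.
    return account["balance"] if account["days_overdue"] > 30 else 0


def total_overdue(accounts):
    """Hypothetical SUT that relies on a private helper."""
    return sum(_overdue_amount(a) for a in accounts)


accounts = [
    {"email": "a@example.com", "balance": 10, "days_overdue": 5},
    {"email": "b@example.com", "balance": 25, "days_overdue": 45},
]

# Peer module: mock it and check the conversation.
mailer = Mock()
notify_overdue(accounts, mailer)
mailer.send.assert_called_once_with("b@example.com", "Your account is overdue")

# Private helper: check "what, not how" through the public function.
assert total_overdue(accounts) == 25
```

If _overdue_amount were mocked instead, merging it into total_overdue later would break the test even though the public behaviour had not changed.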

An alternative is to use a simulator (some people call them "fakes") but the problem is they may take a lot longer to create than mock objects. An accurate simulation may take almost as long to create as the module it is simulating, while a trivial simulation may be useless. My advice is to start with a simple simulator and enhance it if and when required.

Another alternative is to ask the supplier of a 3rd-party module to provide a simulator. This can be a perfect solution as it will save you time and give more accurate results (since they should know better than anyone how their software is supposed to behave), and they can provide the same simulator to other customers. This practice is sometimes used by hardware manufacturers but could be used for any sort of module. It couldn't hurt to ask.

If the module to be simulated is developed in-house then another alternative is to clone the original and hollow it out into a shell that emulates the real thing without actually doing anything. You can also add the ability for the simulator to return possible error conditions so you can test error-handling.
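
Here is a rough idea of what such a hollowed-out shell might look like in Python. The StorageSimulator class and the save_with_retry function are hypothetical: the simulator keeps the interface of an imagined in-house storage module but just records data in memory, and it has a switch that makes the next write fail so that error-handling can be tested.

```python
class StorageSimulator:
    """Hollowed-out stand-in for a hypothetical in-house storage module."""

    def __init__(self):
        self.saved = {}
        self.fail_next_write = False  # set by a test to inject an error

    def write(self, key, data):
        if self.fail_next_write:
            self.fail_next_write = False
            raise OSError("simulated disk full")
        self.saved[key] = data


def save_with_retry(storage, key, data, attempts=2):
    """Hypothetical code under test: retry a failed write a limited number of times."""
    for _ in range(attempts):
        try:
            storage.write(key, data)
            return True
        except OSError:
            continue
    return False


sim = StorageSimulator()
sim.fail_next_write = True
print(save_with_retry(sim, "report", b"data"))  # True: second attempt succeeds
print(sim.saved)                                # {'report': b'data'}
```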

6. Need Provability

Another problem with Unit Tests is that even if you have lots of tests and they all give a green light you still cannot be sure that they are adequate. People are lazy and/or make mistakes. Analysis of typical Unit Tests shows they cover only about 50%-60% of the code.

Like all the problems I have mentioned here, there is a (partial) solution. A code coverage tool can tell you how much of the code your Unit Tests are executing. The target should be to get to 100% code coverage, or as close as possible. (Of course even 100% code coverage does not mean that there are no bugs.)

7. Poorly Implemented

Many shops try Unit Tests but strike problems due to poor practices. For example, if the tests are too slow people will stop running them. There are lots of pitfalls like these. I will discuss them next time, as well as how to look out for and avoid them.

Conclusion

That concludes the discussion on the inherent problems of Unit Testing. In summary, in order to create good Unit Tests you need a good design and to write the tests at the same time as (or before) writing the code to be tested. They do take time to create but the benefits far outweigh the costs. Plus there are many time-saving tools and techniques such as mock frameworks, though you need to be aware of the limitations of any tools and techniques you use.