Saturday 14 December 2013

The Gas Factory Anti-Pattern

I know I promised to talk about the challenges of Unit Tests next (and I will soon) but I thought it an opportune moment to discuss a growing problem that greatly benefits from Unit Tests. I consider it to be one of the worst anti-patterns as it is largely unrecognized and has dreadful consequences for maintainability of software.

Luckily, most anti-patterns are occurring less frequently as more and more developers learn how to design and build software better. However, this anti-pattern seems to be on the rise. It also seems to occur most commonly in overtly well-designed code. I am talking about what some people have called the Gas Factory anti-pattern.

I sometimes discuss anti-patterns that I think are not well known or under-emphasized. For example, see Layer Anti-pattern. If you don't know what anti-patterns are, see the Wikipedia page.

The Problem

Essentially the problem is caused by code that tries to do too much or be too flexible at the expense of being overly complex for the task for which it was intended. (I used to call it the Swiss Army Knife anti-pattern but Gas Factory has better negative connotations. :) This strikes at the heart of software design - see my first blog post Handling Software Design Complexity.

I first encountered this problem about 20 years ago (though it did not have a name until recently). I believe it is mainly due to the fact that about that time (or earlier) code reuse was being pushed as the software development "silver bullet" and a generation of designers got into the habit of trying to make their designs more general and flexible. But this flexibility comes at the largely unappreciated cost of complexity (and reducing complexity has always been my number one aim in life :). I have already covered a lot of this in my blog on the problems of making code reusable at Reusability Futility, so I won't go into that again.

A Google search suggests that some people think the Gas Factory is caused by the code bloat of a very inexperienced programmer. (You know the sort of code you would write when you first started to learn to program.) However, that sort of bloat is simply due to inexperience.

The Gas Factory anti-pattern actually tends to affect experienced developers more than inexperienced ones.

The problem is that the "best" designers are often the worst offenders. Just look at the horrendous Windows API which I have heard was created by very experienced developers. (Luckily .Net is much better.) The practice is often motivated by the best of intentions and on the surface may even seem like the right thing to do, but the side-effects can be severe.

Thought Processes

I have identified some thought processes that I, and others, have been guilty of in the past that have caused the problem in some designs I have used:

1. An ambition to build some general purpose library that just happens to solve the problems at hand as a subset of a much more general set of problems. The cost is a library that has many disadvantages (longer to develop, harder to verify etc) but the main cost is in the increased complexity for anyone using or maintaining it.

The way to avoid this is to focus on the problem at hand and not on any more general problem, or possible future changes. This behavior is the motivation in Extreme Programming for the rule that you should never code for the future, only the immediate requirements (see YAGNI below).

2. When coding, a temptation to add some unneeded facility now, for a perceived negligible cost which would be hard to add later and (it is assumed) would surely be needed. I actually believe that it is sometimes a good idea to add such "free" features while the code is fresh in your memory as long as you can squeeze it into the sprint and it doesn't become an undocumented feature.

However, it may be a good idea to disallow this as a general policy of YAGNI. Prohibition reduces the chance of unauthorized changes and undocumented features creeping in, which can become a maintenance problem down the track.

The crucial part of this thought process is the phrase "hard to add later". Here again, Unit Tests (see previous post on Unit Tests) can help by making later changes much easier.

3. The desire to test some new technique or technology, even though its use will complicate the design for no real advantage. This is the worst reason, at least morally, as it places self-interest ahead of the interests of the customer or company. I am appalled that this practice is extremely common in the industry, since it is often tolerated or even goes unnoticed.

This problem is not often discussed but I am very glad that I read about it recently in "97 Things Every Software Architect Should Know". In fact it is "thing" number 1: Don't put your resume ahead of the requirements by Nitin Borwankar.

Of course, it may not be done consciously. It is often due to something the developer has recently read. They may, at that brief moment in time, honestly believe that trying some new pattern, practice, tool or algorithm is the best way to accomplish the task. However, I still feel that at the back of their mind there is some nagging guilt that it is not in the customer's best interests.

YAGNI

I have talked about the XP (Extreme Programming) principle of YAGNI (You Ain't Gonna Need It) before. To recap, this is a policy of only ever building what you know is needed right now. Actually many Agile proponents go further: only ever add the smallest incremental change that still leaves you with working software. YAGNI puzzled me at first but now I realize that not following this idea resulted in many of the problems and frustrations I have had to overcome in many projects in the past few decades.

Many of the libraries and tools I have worked with (and created too) have been overly complex by trying to anticipate future changes or other uses that almost never eventuate. One former colleague, after spending many months creating such a library to be used as the basis of another program (hi Alex), described it as like "building the foundations for an airport runway, only to put a shack on it".
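To make the contrast concrete, here is a contrived sketch in C. Every name in it is hypothetical and it is not taken from any real library. Assume the only requirement is to skip leading whitespace in a string. The first function tries to anticipate other uses (either end of the string, any character class, a limit, an inverted match); the second does only what was asked.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Over-general version. Every name here is hypothetical. */
enum skip_from { SKIP_FROM_START, SKIP_FROM_END };

struct skip_options
{
   enum skip_from from;     /* which end of the string to work from */
   int (*match)(int ch);    /* predicate selecting characters to skip */
   size_t max_skip;         /* 0 means no limit */
   int invert;              /* non-zero: skip characters that do NOT match */
};

static int should_skip(const struct skip_options *opt, char ch)
{
   int hit = opt->match((unsigned char)ch) != 0;
   return hit != (opt->invert != 0);
}

/* From the start: returns the first character kept.
 * From the end: returns one past the last character kept. */
const char *skip_general(const char *str, const struct skip_options *opt)
{
   const char *p = str;
   size_t skipped = 0;

   if (opt->from == SKIP_FROM_END)
   {
      p = str + strlen(str);
      while (p > str
             && (opt->max_skip == 0 || skipped < opt->max_skip)
             && should_skip(opt, p[-1]))
      {
         --p;
         ++skipped;
      }
      return p;
   }

   while (*p != '\0'
          && (opt->max_skip == 0 || skipped < opt->max_skip)
          && should_skip(opt, *p))
   {
      ++p;
      ++skipped;
   }
   return p;
}

/* Focused version: does the one thing that is actually required. */
const char *skip_blanks(const char *str)
{
   while (isspace((unsigned char)*str))
      ++str;
   return str;
}
```

Both give the same answer for the case that is actually needed, but anyone calling or maintaining skip_general has to understand four options and two different meanings of the return value first.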

The counter-argument against YAGNI is that it will result in software that is harder to modify, because you are not allowing for future changes. First, I will say that often the simpler software ends up being easier to modify than the software that was designed to be modified. This is my experience, and I'm not sure why, but it may be that changes rarely eventuate as anticipated. However, I do agree that there is a lot of validity to this counter-argument but I have an ace up my sleeve...

There is also one final, crucial part of the puzzle: Unit Tests! If you have a full set of Unit Tests you can modify with impunity. This frees you from the burden of thinking about the future so you can concentrate on the important thing - what the customer wants right now.

Conclusion

The Gas Factory anti-pattern is growing. The unnecessary complexity it causes has many bad effects. First and foremost, it makes the design of software hard to understand, which makes it hard to build and maintain. Moreover, the reasons for doing it are obviated by the use of Unit Tests.

If not (yet) the most serious bad design practice then it is the most insidious. It is so insidious that many people are not even aware of it -- it was even recently removed from the Wikipedia page on anti-patterns because somebody thought it was a furphy.

Saturday 7 December 2013

Unit Tests - Personal Experiences

Blogs are supposed to be about personal experiences, so I'm going to talk about my own personal experiences with Unit Tests, or as I first called them Automated Module Regression Tests. (This continues my coverage of Unit Tests which started last month and will continue next week when I talk about the problems of Unit Tests and how to overcome them.)

Discovery

Like a lot of people I discovered Unit Tests a long time ago but did not realise their full importance for a long time.

In 1985 I was working on several MSDOS projects for a small software company called Encom. We were using Microsoft C version 2.0 which was a re-badged version of Lattice C. In 1985 or 1986 Microsoft released version 3.0 of their C compiler (which I'll abbreviate as MSC3) which was, in fact, their Xenix (MS variant of UNIX) C compiler ported to MSDOS.

I had the job of rebuilding all our software with the new MSC3. This taught me a few subtleties of portability (especially about writing structs to binary files). For example, when Lattice C creates the memory layout for a struct it builds its bit-fields top-down (ie, using the top bits of the bit-field storage unit first), whereas MSC3 (and most C compilers) build bit-fields from the LSB up. This caused a bit (no pun intended) of havoc with our binary files.
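To show why bit-fields and binary files mix badly, here is a minimal sketch. The record layout and function names are hypothetical, not code from the project. Writing the bytes of a struct straight to disk ties the file format to whichever bit-field order the compiler chooses. Packing the fields with explicit shifts and masks fixes the layout in the code instead.

```c
/* Hypothetical record: field names and widths are made up for illustration. */
struct record_flags
{
   unsigned type    : 4;
   unsigned level   : 3;
   unsigned deleted : 1;
};

/* Pack into one byte with an explicit layout: type in bits 0-3,
 * level in bits 4-6, deleted in bit 7. The result does not depend
 * on how the compiler orders bit-fields in memory. */
unsigned char pack_flags(const struct record_flags *f)
{
   return (unsigned char)((f->type & 0x0Fu)
                        | ((f->level & 0x07u) << 4)
                        | ((f->deleted & 0x01u) << 7));
}

/* Reverse of pack_flags: rebuild the struct from the stored byte. */
struct record_flags unpack_flags(unsigned char b)
{
   struct record_flags f;
   f.type    = b & 0x0Fu;
   f.level   = (b >> 4) & 0x07u;
   f.deleted = (b >> 7) & 0x01u;
   return f;
}
```

A byte written with pack_flags reads back the same way with unpack_flags whichever compiler built the program, at the cost of a little hand-written code per record.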

Luckily, apart from the bit-fields problem and a few places where it was assumed that char was unsigned, our code was very portable, but one problem was that I'd used some Lattice string routines. The MSC3 run-time library instead had most of the "de facto" string routines found in UNIX compilers and omitted to implement the Lattice ones. (Microsoft provided a header file called v2tov3.h to assist with portability but it was pretty much useless.)

Lattice String Functions       

Note that I found the Lattice string routines very useful and better thought out and named than the MSC3 (ie, UNIX) string routines. For example, routine names had a prefix indicating what was returned (which connoted not only the type but the purpose of the value). For example, a function returning a pointer to a string had the prefix "stp". 


This is strangely reminiscent of Hungarian notation (ie, Application Hungarian not System Hungarian) that Microsoft revealed soon afterwards.

A few of the Lattice string routines had no equivalent in MSC3, so I had to rewrite these routines for the new C compiler. I gave each of the Lattice functions its own source file and, as I believe was common practice even then, I added a test rig at the end of the source file, using #ifdef TEST, something like this:

/* stpblk - skip leading blanks
 * port of Lattice func to MSC3
 * ... */
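#include <ctype.h>
#include <assert.h>
#include <stdio.h>   /* NULL; also printf and fgets in the test rig */
#include <string.h>  /* strcspn in the test rig */

char *stpblk(char *str)
{
   assert(str != NULL);
   /* cast avoids undefined behaviour where char is signed and *str is negative */
   while (*str != '\0' && isspace((unsigned char)*str))
      ++str;
   return str;
}

#ifdef TEST /* use -DTEST cmd line option to build test rig */
int main(void)
{
   char buf[256];
   do
   {
      /* fgets instead of gets, which cannot limit the input length */
      if (fgets(buf, sizeof buf, stdin) == NULL)
         break;
      buf[strcspn(buf, "\n")] = '\0';   /* drop the newline that fgets keeps */
      printf("stpblk on <%s> returned <%s>\n", buf, stpblk(buf));
   } while (buf[0] != '\0');

   return 0;
}
#endif /* TEST */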

This test rig allowed me to enter a test string and see the result of the call to stpblk(). That way I could do a lot of exploratory testing to check that the function was working correctly.

It occurred to me that it would be more thorough to create a complete set of test cases and code the tests directly, rather than the more haphazard approach of manually trying all sorts of different values. That is, something like this:

/* ... as above */
#ifdef TEST
#include <string.h>   /* strcmp */

int main(void)
{
   assert(*stpblk("") == '\0'); /* Test empty string */

   assert(*stpblk(" ") == '\0'); /* just whitespace */
   assert(*stpblk("  ") == '\0');
   assert(*stpblk("\t") == '\0');
   assert(*stpblk("\n") == '\0');

   assert(strcmp(stpblk("a"), "a") == 0);
   assert(strcmp(stpblk(" a"), "a") == 0);
   assert(strcmp(stpblk("\ta"), "a") == 0);
   assert(strcmp(stpblk("a "), "a ") == 0);

   assert(strcmp(stpblk("abc"), "abc") == 0);
   assert(strcmp(stpblk(" abc"), "abc") == 0);
   assert(strcmp(stpblk("abc "), "abc ") == 0);
   assert(strcmp(stpblk("a b c"), "a b c") == 0);
   assert(strcmp(stpblk(" a b c"), "a b c") == 0);

   assert(strcmp(stpblk(" \xFF"), "\xFF") == 0);

   stpblk(NULL); /* should cause assertion */

   return 0;
}
#endif /* TEST */

I could then run the tests on the original Lattice compiler functions and my supposedly equivalent versions of the same functions to make sure they produced the same result. Further, if the function later needed to change (eg, to make it faster) I could simply re-run the tests to make sure it still worked. However, there were a few difficulties:
  • sometimes creating the tests took ten times longer than writing the code!
  • none of the tests actually found any bugs
  • there was no way to write tests to check that assertions worked without manual intervention (see last test above)
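Running the same cases against both the original routine and its replacement can be sketched like this. It is only an illustration: stpblk_original is a hypothetical stand-in, since the real Lattice source is not shown here, and count_mismatches is an invented helper name.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef char *(*stp_fn)(char *);

/* Stand-in for the original library routine (hypothetical). */
char *stpblk_original(char *str)
{
   size_t i = 0;
   while (isspace((unsigned char)str[i]))
      ++i;
   return str + i;
}

/* The re-implementation under test. */
char *stpblk_port(char *str)
{
   while (*str != '\0' && isspace((unsigned char)*str))
      ++str;
   return str;
}

/* Returns the number of test strings on which the two functions disagree. */
int count_mismatches(stp_fn a, stp_fn b)
{
   static const char *cases[] = {
      "", " ", "\t", "\n", "a", " a", "\ta", "a ",
      "abc", " abc", "abc ", "a b c", " a b c"
   };
   char buf[32];
   size_t i;
   int bad = 0;

   for (i = 0; i < sizeof cases / sizeof cases[0]; ++i)
   {
      strcpy(buf, cases[i]);     /* writable copy: the functions take char * */
      if (a(buf) != b(buf))      /* both results point into the same buffer */
         ++bad;
   }
   return bad;
}
```

A result of zero from count_mismatches(stpblk_original, stpblk_port) means the two agree on every case in the table. It says nothing about inputs that are not in the table.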

First Appraisal

Of course, my next thought was maybe we should do this for other functions/modules in our software. However, after some thought I rejected it because:
  • it was "impossible" to write tests for most of the code as it was mainly UI
  • a lot of code interacted with hardware which would require simulation
  • the cost/benefit ratio would not make it worthwhile
  • but mainly, the tests did not find any bugs

Another Attempt

I put the idea aside for a while. I started working for a larger company (AMP) in 1986, and most of my work was user-interface stuff, and I assumed that automated tests like these would be impossible to create for that sort of code.

However, after a year or so I was given charge of a team with the responsibility of creating a library of modules designed to model many of AMP's policies (initially some new superannuation products but eventually we were to model many others such as insurance products, etc).

To me this seemed like an ideal application of my earlier discovery because:

  • each module was concerned purely with calculations - there was no messy interaction with the user, hardware, 3rd party libraries, etc
  • the calculations were complex and it was important to check that they were correct for all boundary conditions
  • the modules were likely to change as the company actuaries sometimes changed the rules
  • sometimes the initial code was far too slow - we would need to check that nothing was broken after optimization

It was around this time that I invented the term Automated Module Regression Tests (AMRT). I tried to promote the idea to management. Unfortunately, they were not keen on the idea as I admitted that it would at least double development time. Another problem was that the company had only recently embarked on an automated testing project which had been a major failure and anything that suggested "automated testing" was looked on with scorn. Finally, the technical manager of the section (PD) was very keen on some nonsense called "object oriented programming" and was not to be distracted from OOP by other silly ideas.

SQA

I left AMP (and C coding) soon afterwards. My next job was mainly working with assembler and performing sys admin tasks (UNIX), though I still did a bit of system programming in C.

None of these areas gave me any opportunity to try out AMRT, but, in 1993, I did a postgraduate course in SQA (software quality assurance) and I came to appreciate many of the subtleties of testing. (Only a small component of the course dealt with testing since quality assurance is about a lot more than testing.) Some important points on testing that I learnt:
  1. It is important to test code as soon as possible (see JIT testing) to avoid all sorts of problems (see Cost of Delay in a previous post on Change).
  2. Studies have shown that typical (Black Box) testing finds only 25% of all bugs.
  3. Studies show even the most thorough user testing finds less than 50% of bugs.
  4. It is impractical to test all combinations of inputs even for simple modules.
  5. Programmers should test their code thoroughly as only they know how it works.
  6. Code coverage tools should be used to check that all (or most?) of the code is executed during testing.
  7. Ongoing changes cause most bugs even in projects with a low initial bug count.

However, no mention was ever made about Unit Tests. To me it seemed that AMRT was an ideal solution to many of these problems. First, the tests can be written at the same time as the code - an example of JIT. Moreover, because it is done by the original coder it can be directed at areas where there might be problems (or where future modifications may cause bugs). But the major advantage I saw is that it allowed the code to be modified with greatly reduced chances of changing existing behavior, ie creating new bugs.

In 1994 I got to work as the Team Leader on a new project. This gave me an ideal opportunity to try several new things that I had been wanting to try:

  • C++ development
  • Barry Boehm's Spiral Development Methodology
  • AMRT (ie, Unit Tests)

The project was a major failure but I did learn a few things about AMRT. First, they made making changes easy. Second, they could work as a form of documentation.

Agile

In 1995 I rejoined Encom to get back to programming (in C and later C++). Again I tried to convince others of the benefits of AMRT. Then in the late 90's a colleague (SR) introduced me to XP (Extreme Programming). I actually wasn't that impressed with XP when I first read about it as it mainly seemed to be a re-hash of many ideas I had already seen proposed in the SQA course. However, after a while I did realize that at the core of XP there were some new ideas which seemed to address a lot of problems that I had even recently encountered in software development.

One of the practices in XP was Unit Tests. I did not realize for a while that what was meant by Unit Tests was exactly what I called AMRT. And I don't believe that XP emphasizes Unit Tests enough.

Conclusion

I took a very circuitous route to come to a full appreciation of Unit Tests. Initially I did not appreciate their worth. When I tried them they did not find any bugs. When combined with the obvious costs of creating them they did not seem worthwhile.

Over many years I slowly came to realize that the benefit of Unit Tests is not that they find bugs, but that you can run them at any time to ensure that the code behaves correctly. This allows you to refactor code without fear. It also allows the software to evolve easily, which is fundamental to the Agile approach. There are other benefits which I mentioned here last month. Further, the costs can be overestimated, and they can be mitigated by tools and techniques which I will talk about next month.

What is very surprising to me is that Unit Tests are not given more prominence in Agile development. Having read many books on Scrum I have not seen them mentioned once! Even in XP they are not explained well and not emphasized enough.

A fundamental aspect of the Agile approach is that the code keeps changing in response to feedback from the users of the software. Unit Tests are absolutely crucial to this approach since they enable the changes to be made that allow the software to evolve.