Tuesday 24 July 2012

Dependency Injection

Last month I explained Inversion of Control (IoC).  A related term is Dependency Injection (DI), another technique that assists modularity and decoupling.  The two terms appear together so often that some people think they are closely related or even the same thing.  This is not true, though DI can be seen as an example of IoC, as well as a way to resolve dependencies in other uses of IoC.

There are many articles on DI on various web pages and blogs, and almost all are misleading or simply wrong.  They often describe design patterns that can use DI (like Strategy, Bridge, Adapter, etc) but do not explain DI itself.  For example, this article gives a beautiful description of the Bridge Pattern and its advantages but wrongly calls it dependency injection.

In Brief

In essence, DI separates the mechanism for resolving references into a separate module, which is why it is good for modularity.  Remember the callbacks (and interfaces) that I talked about in the IoC post?  These are just function pointers that can point to any function with the correct signature.  The actual function pointed to is what the code depends upon to get the job done, and so is referred to as a dependency.  How the address of a specific function gets assigned to a function pointer is precisely what DI addresses: a separate module resolves the dependencies and injects them into the other modules that require them.

An example always helps to make things clearer. The example below is in C, as I know that several readers of this blog are very familiar with C.  The fact that it is in C might also help to dispel the fallacy that DI is necessarily an object-oriented technique.

Version One - Original Code

The example is actually based on the first C program I ever wrote, which was an MSDOS program to display a list of files.  One of its features was that it could display the files sorted in different ways (by name, size, time of last modification, etc).  The information for the files was stored in an array of structures like this:

  struct dir_entry
  {
    char name[MAXFN];
    time_t modified;
    long size;
  } entries[MAX_ENTRIES];

Note that I have left out a lot of the code including #includes.  For example, you would need to include stdlib.h, string.h, assert.h, time.h, etc.

The code to display the files looked something like this:

  /* display.c */
  static int compare_file_name(const void *p1, const void *p2)
  {
     const struct dir_entry * d1 = (const struct dir_entry *)p1;
     const struct dir_entry * d2 = (const struct dir_entry *)p2;

     return _stricmp(d1->name, d2->name);
  }

  static int compare_modified(const void *p1, const void *p2) { ...
  static int compare_size(const void *p1, const void *p2) { ...
  static int compare_file_extension(const void *p1, const void *p2) { ...

  void display_files()
  {
    /* Get files of current directory into entries[] */
     ...

     /* Sort the files according to user preference */
     enum sort_type how = options_get_sortorder();

     switch (how)
     {
     case SORT_NAME:
          qsort(entries, num_entries, sizeof(struct dir_entry), &compare_file_name);
          break;
     case SORT_MOD:
          qsort(entries, num_entries, sizeof(struct dir_entry), &compare_modified);
          break;
     case SORT_SIZE:
          qsort(entries, num_entries, sizeof(struct dir_entry), &compare_size);
          break;
     case SORT_EXT:
          qsort(entries, num_entries, sizeof(struct dir_entry), &compare_file_extension);
          break;
     default:
          assert(0);
          break;
     }

     /* Display the files */
     ...
  }

As we saw in the previous blog, this is an example of IoC.  Control of the sorting is inverted by being passed to qsort() (a standard C library function).

This is a reasonable design.  First, it separates the code for sorting the files into a separate (standard) module (ie qsort).  There is also a separate module that handles the user options:

  /* options.h */
  enum sort_type { SORT_NAME, SORT_MOD, SORT_SIZE, SORT_EXT, };

  /* options.c */
  static enum sort_type current_sort_order;

  enum sort_type options_get_sortorder()
  {
    return current_sort_order;
  }

Diagram 1.

However, the above original code did not seem quite right to me.  One thing was that the DISPLAY module knows how the sorting is done.  It seemed to me that only the SORT module should know any of the details of the sorting.  For example, if a new sort option was added then the above switch statement would need to be modified to add a new case, and a new compare function written.

Version Two - Hide Sort Details

My instinct was to create a SORT module that wrapped all details of the sorting.  In fact I am sure that at least one tutor at university advised me that whenever code has to deal with details that are irrelevant to the task at hand, those details should just be pushed down into a lower-level function.  (I later found that this is a simplistic approach but more on that in an upcoming blog post.)

So I created a new version of the code that hid the details of sorting in a separate SORT module.

  /* display.c */
  void display_files()
  {
    /* Get files of current directory into entries[] */
     ...

     /* Sort the files according to user preference */
     sort_files(entries, num_entries);

     /* Display the files */
     ...
  }

  /* sort.c */
  static int compare_file_name(const void *p1, const void *p2) { ...
  static int compare_modified(const void *p1, const void *p2) { ...
  static int compare_size(const void *p1, const void *p2) { ...
  static int compare_file_extension(const void *p1, const void *p2) { ...

  void sort_files(struct dir_entry *entries, size_t num_entries)
  {
     switch (options_get_sortorder())
     {
     case SORT_NAME:
          qsort(entries, num_entries, sizeof(struct dir_entry), &compare_file_name);
          break;
     case SORT_MOD:
          qsort(entries, num_entries, sizeof(struct dir_entry), &compare_modified);
          break;
     case SORT_SIZE:
          qsort(entries, num_entries, sizeof(struct dir_entry), &compare_size);
          break;
     case SORT_EXT:
          qsort(entries, num_entries, sizeof(struct dir_entry), &compare_file_extension);
          break;
     default:
          assert(0);
          break;
     }
  }

Diagram 2.

This is an attempt to decouple the sorting code from the display code, but is it any better?  It's basically the same as the previous code but has simply pushed the sorting into a subroutine.

In many years of reading (and often writing :) lots of bad code, I have come to dislike this approach.  It results in software that is very hard to decipher as function calls can end up being nested dozens of levels deep.  Moreover, it often means that related pieces of code can become separated resulting in a design that is inefficient and (more importantly) difficult to understand and modify.  It also seems to encourage duplicate code as you often see similar pieces of code all over the place.

Another indicator of a bad design is the number of modules that need to be modified to accommodate a simple change.  If a new sort order option were required then, using the above design, we would have to both modify the OPTIONS module and add a new compare function to the SORT module.

Finally, a principle you should always try to uphold is DRY (don't repeat yourself).  All the calls to qsort() in the above switch statement are almost identical.  Is there a way to eliminate this redundancy?
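One possible answer (a sketch of my own, not from the original program - the trimmed struct and the extra `how` parameter are inventions for illustration) is to index a table of function pointers by the enum value, assuming the enum constants run contiguously from zero.  All four qsort() calls collapse into one:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAXFN 64

/* Trimmed version of the post's struct, just enough for the sketch. */
struct dir_entry {
    char name[MAXFN];
    long size;
};

static int compare_file_name(const void *p1, const void *p2)
{
    const struct dir_entry *d1 = p1;
    const struct dir_entry *d2 = p2;
    return strcmp(d1->name, d2->name);  /* strcmp: portable stand-in for _stricmp */
}

static int compare_size(const void *p1, const void *p2)
{
    const struct dir_entry *d1 = p1;
    const struct dir_entry *d2 = p2;
    return (d1->size > d2->size) - (d1->size < d2->size);
}

/* Only two orders here; the real program had four. */
enum sort_type { SORT_NAME, SORT_SIZE };

/* One table slot per enum value replaces the repeated switch cases. */
static int (*const comparers[])(const void *, const void *) = {
    compare_file_name,  /* SORT_NAME */
    compare_size,       /* SORT_SIZE */
};

void sort_files(struct dir_entry *entries, size_t num_entries, enum sort_type how)
{
    assert((size_t)how < sizeof comparers / sizeof comparers[0]);
    qsort(entries, num_entries, sizeof entries[0], comparers[how]);
}
```

Adding a new sort order is now one new compare function plus one table entry; the dispatch code itself never changes.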

Version Three - Separating Comparison from Sorting

Looking again at the above code...  Is the SORT module the best place for the comparison functions?  Sure, sorting needs to know how to call the appropriate comparison function, but this is just a matter of knowing the signature of the function.  A breakthrough was realizing that comparing can, and should, be decoupled from sorting.  This is an example of the strategy pattern: the caller of the strategy need only know the interface, while the concrete strategy (the actual strategy used) can be in a different module.

In the next version of the software I moved the comparison functions to the OPTIONS module.  A pointer to the current comparison function can be obtained from the OPTIONS module and passed to qsort().  This avoids having the extra sorting layer - now the sorting "module" (ie, qsort) has the single responsibility of performing the sort.

  /* display.c */
  void display_files()
  {
    /* Get files of current directory into entries[] */
     ...

    /* Sort the files according to user preference */
     qsort(entries, num_entries, sizeof(struct dir_entry), options_sort_strategy());

     /* Display the files */
     ...
  }

  /* options.c */
  typedef int (* PCOMPFUNC)(const void *p1, const void *p2);

  static int compare_file_name(const void *p1, const void *p2) { ...
  static int compare_modified(const void *p1, const void *p2) { ...
  static int compare_size(const void *p1, const void *p2) { ...
  static int compare_file_extension(const void *p1, const void *p2) { ...

  PCOMPFUNC options_sort_strategy()
  {
    switch (current_sort_order)
    {
    case SORT_NAME:
        return &compare_file_name;
    case SORT_MOD:
        return &compare_modified;
    case SORT_SIZE:
        return &compare_size;
    case SORT_EXT:
        return &compare_file_extension;
    default:
        assert(0);
        return &compare_file_name;
     }
  }


Diagram 3.

Now the OPTIONS module is responsible for determining the sort order.  If new sort options are added, only that module needs to change.  However, it is still up to display_files() to call options_sort_strategy() to find out how to sort.  Relieving the DISPLAY module of this last burden is accomplished with DI.

Version Four - Dependency Injection

Using DI, the CONFIG module is given control (using IoC) of setting up the callbacks.  The inject_comparer() function was created to inject the comparer dependency into the SORT module.  It is invoked at startup to set the default sort order, and again whenever the sort order changes.

  /* config.c */
  static PCOMPFUNC current_file_comparer;
  static void inject_comparer(void);    /* forward declaration */

  static void initialize()
  {
    current_file_comparer = find_default_comparer();
    inject_comparer();
  }

  void config_set_comparer(PCOMPFUNC f)
  {
    current_file_comparer = f;
    inject_comparer();
  }

  static void inject_comparer(void)
  {
    sort_set_comparer(current_file_comparer);
  }

  /* options.c */

    ...
    config_set_comparer(new_comparer);
    ...

  /* sort.c */
  static PCOMPFUNC comparer;

  void sort_set_comparer(PCOMPFUNC f)
  {
    comparer = f;
  }

  void sort_files(struct dir_entry *entries, size_t num_entries)
  {
    qsort(entries, num_entries, sizeof(struct dir_entry), comparer);
  }



Diagram 4.

The main difference here is that the CONFIG module has control of setting the sort order.  It injects the comparer into the SORT module.  The SORT module is just a wrapper of qsort() that also stores the current comparer.  The concrete comparer functions are no longer part of the DISPLAY or SORT module, but are set up in the CONFIG module (eg, at startup, when the user changes the sort order, etc).

The advantage is that the configuration is dynamic.  The CONFIG module can even discover or create new comparers at run-time.

Moreover, the existing software does not need to change when a new comparer is added.  This is great for maintainability since it avoids code changes that have to be reviewed, tested, etc, and could potentially introduce new bugs.

Dependency Injection

As we saw above DI is useful when used with the strategy pattern; but, in general, it can be used with any design that uses run-time polymorphism.  The point is that there is a separate module that resolves references at run-time and injects them into the relevant module(s).  These references can be injected as an interface (ie, pointer to a table of function pointers), or anything that allows code to be executed indirectly.  In our C example we simply used a function pointer.
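To show what injecting "a pointer to a table of function pointers" might look like, here is a small sketch (all names invented for illustration, not from the program above): the injected "interface" is just a struct of function pointers, and the injector swaps whole implementations at once.

```c
#include <stddef.h>

/* An "interface" in C: a table of function pointers. */
struct comparer_iface {
    int (*compare)(const void *p1, const void *p2);
    const char *(*describe)(void);
};

/* The consuming module holds only a pointer to the injected table... */
static const struct comparer_iface *current_iface;

/* ...and exposes a plain setter as its injection point, just like
   sort_set_comparer() did for a single function pointer. */
void sort_set_iface(const struct comparer_iface *iface)
{
    current_iface = iface;
}

/* The consumer calls through whichever implementation was injected. */
int sort_compare(const void *p1, const void *p2)
{
    return current_iface->compare(p1, p2);
}

/* A concrete implementation, which could live in any other module. */
static int cmp_int(const void *p1, const void *p2)
{
    int a = *(const int *)p1, b = *(const int *)p2;
    return (a > b) - (a < b);
}
static const char *describe_int(void) { return "ascending int order"; }

const struct comparer_iface int_comparer = { cmp_int, describe_int };
```

A config module could then call sort_set_iface(&int_comparer) at startup, and later inject a completely different table without the consumer changing at all.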


One thing I particularly like about DI is that new modules can be added without changing the existing software.  As long as the new modules can be loaded dynamically then it is just a matter of configuration to inject them into the system.  Maintainability is usually the most important attribute of software and this is the ultimate in maintainability as the existing code does not have to be modified or even rebuilt.

The concept behind DI has been used in a more limited way by Windows software that supported plug-ins.  (I first saw this used in Windows 3.1 more than 20 years ago.)  Plug-ins rely on DLLs that all implement the same interface (ie set of functions).  A plug-in manager is often responsible for discovering the plug-in DLLs (eg, by looking in a certain directory) and injecting them into the relevant parts of the system.

Finally, I should clarify the relationship between Dependency Injection and Inversion of Control.  DI is related to IoC in two ways.  First, it is an example of IoC since it inverts control by passing control of setting up dependencies to the CONFIG module.  It is also often used with other types of IoC in setting up dependencies, since IoC is usually implemented using callbacks (pointers to interfaces or pointers to functions as we saw in the blog last month).

Conclusion

DI is a useful technique for decoupling of modules.  It separates out the responsibility for making connections between other modules.  However, many of the advantages often put forward for DI are not due to DI per se, but to its uses with design patterns such as Strategy, Bridge, Adapter, etc.

The true advantage of DI is that it allows better configuration.  This can go as far as discovering and using new code at run-time (as in the plug-ins we discussed above).  It is very useful for connecting up mock objects for unit tests.  Also by allowing software to be extended without any change to the original code it is great for maintainability.
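As a sketch of the mock-object point (again with invented names, and the SORT module trimmed to sort plain ints): a unit test can inject a comparer that records how often it was called, verifying that the sort really consulted the injected dependency.

```c
#include <stddef.h>
#include <stdlib.h>

/* The module under test: the same shape as sort.c in Version Four,
   but trimmed to sort plain ints for this sketch. */
typedef int (*PCOMPFUNC)(const void *p1, const void *p2);

static PCOMPFUNC comparer;

void sort_set_comparer(PCOMPFUNC f)
{
    comparer = f;
}

void sort_items(int *items, size_t num_items)
{
    qsort(items, num_items, sizeof items[0], comparer);
}

/* The mock: a comparer that also counts its calls, so a test
   can check that the sort really used the injected function. */
static int mock_calls;

static int mock_compare(const void *p1, const void *p2)
{
    int a = *(const int *)p1, b = *(const int *)p2;
    ++mock_calls;
    return (a > b) - (a < b);
}
```

A test then does sort_set_comparer(mock_compare), sorts an array, and asserts both that the result is ordered and that mock_calls is non-zero - no change to sort.c is needed to make it testable.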

Sunday 1 July 2012

SQA and Agile Methodologies


In 1993 I did an SQA (software quality assurance) course at UTS (University of Technology, Sydney).  This got me thinking about things like development methodologies, software design etc and ultimately resulted in this blog.

Since that time the idea (or at least the term) QA in software development has become very popular.  The problem is that for most companies QA is just a fancy term for testing.  In reality, QA (quality assurance) has little to do with post-production inspection (ie testing).  Now, I am not trying to denigrate test teams as they perform a valuable service and generally do a good job of what is a tedious task, but testing is not QA.  At best it is a QC (quality control) procedure.

So what is SQA?  SQA is mainly based on ideas from mainstream QA so I will explain QA before getting into the specifics of SQA.  Later I look at an apparently irreconcilable difference between SQA and agile methodologies.

Principles of QA

Process Improvement

The emphasis in QA is on the process not the product.  For example, many years of production line experience have shown that inspecting the end product for defects is much less efficient (and more error-prone) than concentrating on improving the process that creates the product.

The QA emphasis is on doing things right first time.  That is, eliminating problems in the process will reduce defects and rework, with consequent improvements in quality and productivity.  This is based on the adage "prevention is better than cure".

Further, everyone is responsible for quality not a specialized QA team (though a QA team may be responsible for training and monitoring of quality).  The target is to have zero defects through continuous improvement of the process.

Cultural change

It is well recognized that a commitment to quality by all employees requires a large cultural change in many organizations.  (Remember quality is everyone's responsibility.)  Of course, the culture of any organization starts from the top, so it is important that senior management is committed to this cultural change and empowers people to make the changes.

Customer Focus

The focus is always on the customer (ie, the end user of whatever is being produced).  All employees should have an awareness of, if not direct interaction with, the actual users of the product.  Feedback from the customer is encouraged.  Employees whose job affects the customer (ie, everyone) should not be insulated from them by layers of bureaucracy.

Quality is always seen as the suitability of the product for the customer's use, not how many bells and whistles it has.  Further, the emphasis is always on giving the customer what they need, not trying to make as much money out of them as possible.  (Similarly, it's better to build a partnership with suppliers than to try to milk them for the lowest price for their services.)

When all employees have a clear focus on the customer, they will work together towards the common goal.

Another aspect is that most employees have internal customers - those people who directly depend on what they do.  Providing a good service to internal customers is also important.

Empowerment

QA promotes a management style where all employees are empowered to make decisions to improve the work that they do.  They are provided with the information and equipment to do their job properly.  Most important is to drive out fear, for example, of making a mistake or not reaching production quotas.

Communication between the workers is important, in order to have a clear understanding of their common goal.  Barriers to communication include things like performance appraisals (which essentially promote competition not cooperation), and anything that promotes fear.

Also, employees are given training to perform their work or to take up new positions and self-improvement is encouraged.  

Techniques

There are many techniques associated with QA (and SQA) but one specific approach is the idea that related processes should be kept as close as possible to each other, especially in time.  For example, if a process is performed before it is needed then when it is eventually needed it is more likely to be incorrect (and require rework).  Furthermore, people's memories of the process are not fresh which can cause delays and further mistakes.  One example is the idea of JIT (just in time) from Japan.

SQA

SQA (software quality assurance) is essentially the same as QA but with emphasis on specific areas and the addition of a few techniques appropriate to software development.

First, the emphasis on improving the process is even more important for software.  Studies show that testing finds, at best, 30% of bugs.  Any programmer can confirm that there are so many possible permutations in the use of even a simple piece of software that it is impossible to test exhaustively using a black-box (ie, post-production) approach.

Further, the quality of the software is not just related to what the user sees but also to the effect on developer quality attributes (see my post: Developer Quality Attributes).  A programmer needs to remember that whoever ends up maintaining her code is her internal customer.

Finally, there are specific techniques promoted in SQA that are based on QA principles.  For example, code reviews are a means of process improvement and a means of improving communication.  Further many agile techniques come directly from QA principles (see below).

SQA vs Agile

When I was first introduced to Extreme Programming (XP) by a colleague about 12 years ago (thanks Sam), I dismissed it as just a rehashing of SQA principles.  Admittedly there are a lot of similarities such as customer focus and feedback, empowerment, communication and especially emphasis on improving the development process.  Even the idea of YAGNI (you ain't gonna need it) is related to the QA idea of not doing something until it is needed - which is probably why it is also called JIT-design.  (Testing also benefits from the idea - see my post: JIT Testing).

However there is one glaring difference: SQA emphasizes doing things right first time, whereas agile relies on iterative refinement.  XP, and agile methodologies in general, say that you can never get the design of a large piece of software right the first time.

So which is correct?

I believe that in this case agile is correct but in general the QA principle of right first time is sound.  Software design is an exception that proves the rule.  Many of the techniques of agile actually support the idea of right first time since they are about avoiding errors and doing things properly.  For example, unit tests (popularized by agile methodologies) are incredibly useful for reducing defects (among other things) with reduced rework (bug fixing) and increased quality and productivity.

It is simply that it is often impossible to get the design of software correct on the first try no matter how hard you try.  The iterative approach of agile is not an abandonment of right first time but an attempt to get the design right as soon as possible.  It also improves the process and gets the customer more involved which are both fundamentals of QA.

Conclusion

Just a few simple points here:

1. QA in software is not about testing but about building quality into the product.
2. Agile methodologies are based on SQA principles more than most people realize.
3. Right First Time is not anathema to the agile approach.