Monday 25 February 2013

Ignore Divide By Zero

In over 30 years of reading and writing C code I have seen (and written) a lot of code that tries to detect and recover from division by zero. Generally, this is a mistake. In almost all cases it is better to let the run-time system terminate the program (perhaps after first triggering an assertion failure). I have tried to explain this to colleagues (even experienced ones) and they think I am crazy, but please hear me out.

Note that I am not saying that you should be oblivious to the possibility of division by zero. I am simply saying that trying to recover from it at the point of the error is usually a mistake.

Floating Point Division By Zero

I am talking about integer divide by zero, not floating point, unless otherwise stated. However, the same arguments generally apply to floating point calculations.

The major difference is that floating point division by zero usually will not terminate the program but will instead produce an infinite result, or NaN ("not a number") in the case of 0.0 / 0.0. (Most implementations nowadays provide floating point numbers that include positive and negative infinity.)

Conventional Wisdom

When I first began learning about programming (at Sydney University in the late 1970s) the first thing that was drummed into us was that it is not enough to get your program working. You also need to cater for error conditions. (I wholeheartedly agree with this - lack of proper error handling is by far the biggest blight on C software.)

The first example my lecturer gave of error-handling was division by zero. The guidance was simply to make sure it cannot occur. A few years later I became an aficionado of defensive programming, and my policy became that whenever I used division I needed to add extra code to check for the possibility of the divisor being zero and somehow recover from the situation - usually simply by setting the result of the operation to zero.

Most experienced C programmers take this approach, but I have come to believe that it is almost always wrong. Let's first look at the situations where division by zero may occur, and then consider each one in more detail.
  • a bug causes the divisor to take a zero value when it should never be zero
  • a zero divisor resulting from user input
  • incorrect data from an external source
  • in rare cases division by zero may be mathematically valid and handled specially
Bugs

Most of the time the problem occurs due to bugs in other parts of the code that have slipped through. It is often argued that the problem should be detected and handled something like this:

  if (numRequests > 0)
    aveTime = totalTime / numRequests;
  else
    aveTime = 0;        // Bad idea

The problem with this is that the bug is now silently hidden. Perhaps that is acceptable in the final release of the software, but it is certainly not good when debugging and testing. It's better to find and fix the bug than to cover it up. This is the problem with defensive programming that I talked about in a previous post.

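Personally, I would just leave the test out altogether and let the run-time system terminate the program. This is the fail fast approach. But I would also add an assertion, especially when using floating point values, since most implementations will not terminate but will instead produce infinity (or NaN), which is probably not what was desired. Even for integers the assertion is worthwhile: in C, integer division by zero is undefined behavior, and while x86 processors trap it, others (64-bit ARM, for example) silently return zero and carry on.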

  assert(numRequests > 0);
  aveTime = totalTime / numRequests;
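To make the floating point case concrete, here is a minimal sketch of the fail fast version alongside an unchecked one, assuming an IEEE 754 floating point implementation (what the C standard calls Annex F). The function names are hypothetical. Without the assertion nothing stops the program; the bad value simply propagates.

```c
#include <assert.h>
#include <math.h>

/* Fail fast: a zero divisor here means a bug elsewhere, so in debug and
   test builds the assertion stops the program at the point of the error. */
double average_time(double totalTime, int numRequests)
{
    assert(numRequests > 0);
    return totalTime / numRequests;
}

/* The same calculation with no check at all. Assuming IEEE 754
   arithmetic, a zero divisor does not stop the program: a non-zero
   numerator yields an infinity and 0.0 / 0.0 yields a NaN, and either
   value flows silently into later calculations. */
double unchecked_average_time(double totalTime, int numRequests)
{
    return totalTime / numRequests;
}
```

Calling unchecked_average_time(10.0, 0) returns positive infinity, and isinf() or isnan() from <math.h> is the only way later code would notice.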

This should be adequate with good design, thorough testing and software that is adequately verifiable (see my post on verifiability). But with badly written software you cannot be certain that a bug has not slipped through, so the only alternative is to try to recover. Even then, continuing with a strange value may cause subsequent problems or even data corruption. In the above case, for example, it may be that aveTime should always be greater than zero.

Having studied a great many of these situations, I have found that it can be very difficult to decide on a value that makes sense for the continued safe and sensible operation of the software. For example, it may make most sense to set aveTime to some very large value (since, loosely speaking, dividing by zero produces an infinite result); but if totalTime and numRequests are both zero then aveTime should probably also be zero.

  if (numRequests > 0)
    aveTime = totalTime / numRequests;
  else
  {
    assert(0);          // Don't hide this bug in debug/test

    // Try to recover sensibly in case a bug was never found
    if (totalTime == 0)
      aveTime = 0;
    else
      aveTime = INT_MAX;
  }

In C++, another alternative is to throw an exception and let the software recover at a higher level.
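C has no exceptions, but a similar effect can be had by returning a status and leaving the recovery to the caller. This is a substitute technique, not one the post itself describes, and try_average is a hypothetical name. A minimal sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Refuse to invent a result: report failure and let the caller, which
   knows more about the context, decide how to recover. On failure
   *result is left unchanged. */
bool try_average(long total, long count, long *result)
{
    assert(result != NULL);
    if (count <= 0)
        return false;
    *result = total / count;
    return true;
}
```

The caller can then assert, log, or fall back as appropriate; the point is that the low-level function never silently substitutes a value of its own.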

User Input

Sometimes a calculation depends on user input. It is important to validate user input when it is entered. Failing to validate it can introduce inconsistencies into the data, which later lead to problems like division by zero.

  int numberOfItems;    // Declared outside the loop so it is still in scope below
  for (;;)
  {
    numberOfItems = GetNumberOfItemsFromUser();
    if (numberOfItems > 0)
      break;
    DisplayErrorMessage("You must have at least one item");
  }
  .
  .
  aveCost = totalCost / numberOfItems;  // No divide by zero here

Of course, a lot of things could happen between the input validation and the use of the value. If it is at all possible that the value could be corrupted, or the validation bypassed, then the previous section (Bugs) applies again.
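GetNumberOfItemsFromUser is left abstract above. As a sketch of the validation it might perform, assuming the input arrives as a text string, the hypothetical helper below uses the standard strtol function and accepts only a positive integer that fits in an int:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: validate the user's text at the point of entry,
   so that only a positive item count can ever reach the division. */
bool parse_item_count(const char *text, int *count)
{
    char *end;
    long value;

    assert(text != NULL && count != NULL);
    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0')    /* no digits, or trailing junk */
        return false;
    if (errno == ERANGE || value <= 0 || value > INT_MAX)
        return false;                   /* out of range or not positive */
    *count = (int)value;
    return true;
}
```

A string read with fgets still ends with a newline, which this strict check rejects, so a real input routine would strip it first.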

Bad Data

A lot of programs carefully validate user input but assume that data from other sources is valid. Unless you are certain that such data is valid, you should validate it when you receive it, for example by checking a CRC.

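Data can be corrupted by many things, such as hardware or software failure or human error. In software with security implications, deliberate tampering may be an issue, and a CRC is not sufficient because an attacker can simply recompute it. Use a cryptographic check such as a keyed hash (HMAC) or a digital signature, built on a modern hash like SHA-256; SHA-1, once the common choice, is no longer considered secure.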
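As a minimal sketch of validating on receipt, assume a record arrives as raw payload bytes together with a CRC-32 computed by the sender, and has already been decoded into a struct (the record layout, the decoding step and the function names are all hypothetical). The check has two stages: integrity first, then whether the decoded values make sense, which is where a zero divisor is caught.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 using the reflected polynomial 0xEDB88320 (the variant
   used by zlib and Ethernet). Short but slow; production code usually
   uses a lookup table. */
uint32_t crc32_bitwise(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical record, already decoded from the raw payload bytes. */
struct cost_record {
    long total_cost;
    long item_count;   /* later used as a divisor */
};

/* Validate on receipt: first integrity (was the payload damaged?), then
   plausibility (do the decoded values make sense?). */
bool accept_record(const unsigned char *payload, size_t len,
                   uint32_t sent_crc, const struct cost_record *rec)
{
    if (crc32_bitwise(payload, len) != sent_crc)
        return false;               /* corrupted in transit or storage */
    if (rec->item_count <= 0 || rec->total_cost < 0)
        return false;               /* intact but not sensible */
    return true;
}
```

The standard CRC-32 check value, 0xCBF43926 for the ASCII string "123456789", is a quick way to confirm an implementation like this. Remember that it only detects accidental corruption, not deliberate tampering.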

Expected

In very rare cases division by zero may not actually be an error condition, in which case you may need to handle it specially. This is one reason that IEEE floating point numbers include positive and negative infinity. Generally, this sort of code would be for a specialized scientific or mathematical purpose and would use floating point numbers anyway.

Otherwise, it could be handled like this:

  if (elapsedTime == 0)
  {
    isInfiniteSpeed = true;
    speed = -1;         // Not meaningful; check isInfiniteSpeed first
  }
  else
  {
    speed = distance / elapsedTime;
    isInfiniteSpeed = false;
  }

Of course, any later code that uses speed would also need to check isInfiniteSpeed first.
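Since such code would usually use floating point anyway, the IEEE infinity itself can stand in for the isInfiniteSpeed flag. A minimal sketch, assuming IEEE 754 arithmetic; speed_of is a hypothetical name. The zero case is still handled explicitly, because the C standard only guarantees an infinite result from dividing by zero on implementations that follow its IEEE annex (Annex F).

```c
#include <assert.h>
#include <math.h>

/* Returns distance / elapsedTime, or positive infinity when no time has
   elapsed. The zero case is handled explicitly rather than by actually
   dividing by zero. */
double speed_of(double distance, double elapsedTime)
{
    assert(elapsedTime >= 0.0);   /* assumption: negative time is a bug */
    if (elapsedTime == 0.0)
        return INFINITY;
    return distance / elapsedTime;
}
```

Callers then test isinf(speed) rather than a separate flag. The flag can no longer get out of step with the value, but code that forgets the check will quietly compute with an infinity.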

Conclusion

Generally, it is a mistake to detect and try to recover from a divide by zero error. Except in the very unusual situation where it is not an error condition (see the Expected section above), it indicates that there was a problem earlier, such as a bug, corrupt data, or user input that was not validated.

With verifiable and thoroughly tested software the problem should not happen. Trying to recover is actually detrimental as it may hide a bug which would normally be found in testing.

For poorly written software (not well designed and not easily verifiable) it might be worth trying to recover from the problem. Recovery would also be necessary in fail-safe systems, where software termination could have dire consequences.

The problem with trying to recover is that there may not be a reasonable value to use in the circumstance. It is conventional to use a value of zero but often this is the worst possible value to use.

The most important point is that if you do recover from a divide by zero error, you must not hide the fact that there is a defect in the code. During debugging and testing the software should generate an exception or otherwise make it immediately obvious that there is a problem with the code. For released software the problem should be detected and reported - for example, to a monitored error log file.
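A minimal sketch of recovering without hiding the defect: debug and test builds stop at the assertion, while release builds (compiled with NDEBUG) report the problem and fall back to the zero or INT_MAX values used earlier. report_defect and average_or_recover are hypothetical names; a real system would write to a monitored log rather than stderr.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical reporter: a real system might write to a monitored error
   log file; this sketch just writes to stderr. */
static void report_defect(const char *file, int line, const char *msg)
{
    fprintf(stderr, "DEFECT %s:%d: %s\n", file, line, msg);
}

int average_or_recover(int total, int count)
{
    if (count > 0)
        return total / count;

    /* Debug and test builds stop here, so the bug cannot be overlooked. */
    assert(!"average_or_recover: count must be positive");

    /* Release builds (NDEBUG defined) record the defect, then recover
       using the same fallback values as the earlier example. */
    report_defect(__FILE__, __LINE__, "zero divisor in average_or_recover");
    return total == 0 ? 0 : INT_MAX;
}
```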