Saturday 13 July 2013

Nesting Functions Using Lambdas

A major problem with a lot of C code has always been that functions tend to become too long. It is sometimes difficult to move code into sub-routines if there are a lot of local variables you need to access. This is one advantage Pascal has over C, because it supports nested functions. But using lambda functions (currently supported in C++ and C#) you can achieve the same effect.

My rule of thumb is that functions should be 10 or less lines of code, but occasionally it makes sense for them to extend to 20 or even 30 lines.
Due to the ability to capture local variables lambda functions can be used to create nested functions. Lambda functions are provided in C++ and C# but not in C. (They are also rumored to appear in Java in the future.)

Lambda Functions

Last month I talked about the new lambda functions in C++11. It is great to be able to create simple unnamed functions especially when calling STL algorithms. If you create a lambda function like this:

   [] (int x, int y) { return x < y; }

the compiler will automatically create a little function and insert a call to the function at the point of its use. Note that the function does not need a name as it is defined and used right where it is required. The compiler infers that it has a boolean return value from the type of the return statement. (It also takes two integer parameters.)

But lambdas have more tricks than that. They can also capture variables from the containing scope where the lambda function is declared and used. For example:

   int value = something();
   ...
   [=] (int x) { return x < value; }

In this case the compiler has to create a temporary object that saves value in a private member and implements operator() in order for the function to be called. It is this ability to capture variables that allows lambda functions to be used as nested functions.

Nested Functions

If you haven't ever used a language that supports nested functions (like Pascal) then you probably don't realize what you are missing. If you find a function is becoming too big, it is a simple matter to split some of the code off into a nested function. Since a nested function has access to all local variables of its parent (and any ancestor functions) this is trivial to do. (If you have read much of this blog then you may know that I don't like Pascal much but this is one area where Pascal is better than C.)

In C, or any language that does not have nested functions, creating a sub-routine is a lot more difficult if it needs access to many local variables of the "parent" function. This is probably the greatest contributor to the fact that C functions are generally too long. When a function becomes too big it should be split up. (However, see Layer Anti-Pattern which explains that the obvious way to do this, using layers, is not always the best way.)

There are several ways you can give access to local variables in a sub-routine, such as:
  • pass a long list of local variables as parameters to the sub-routine
  • put all required local variables in a struct and pass a pointer to it
  • move the variables outside the function giving them file scope
  • make the variables into private member variables (in C++) to give all class methods access to them
None of these are good solutions in general, but let's consider each of these in a bit more detail to see why.

1. Pass as Arguments

Passing a long list of arguments is a bad idea. The more parameters a function has the harder it is to read the code. It is also tedious and error-prone to put the parameters at the right position in the function call. It's generally accepted that functions should have at most two (perhaps three) parameters.

Further, if any of the variables need to be modified, then they need to be passed by reference (ie, pointer or C++ reference). These sort of output parameters are also error-prone. Output from a function is conventionally performed through its return value. Having output also occur somewhere in a long list of parameters is confusing.

Finally, it can be inefficient, especially if some parameters are large objects (and not passed by const pointer/reference).

2. Pass in a Struct

The local variables to be shared could all be placed in a local struct. A pointer (or reference in C++) can be passed to the called function. The pointer can be const if none of the values are to be modified.

This is probably the best solution. It is fairly straightforward and efficient. However, it is tedious to set up and it sometimes makes little sense to put unrelated variables together in the same structure.

3. Use File Scope

The most common approach is to move shared variables outside the function to make them global. For example, if you have a function like this:

void g()
{
   int local = 1;

   for (...
      local *= something_complicated() + something_else();
   ...
}

You could move some of the code from g() into a new function, do_something() and give access to local as in the code below. (Note that this code has several problems, and at least one bug.)

int local = 1;

void do_something()
{
   local *= something_complicated() + something_else();
}

void g()
{
   for (...
      do_something();
   ...
}

The first problem with this is that the variable local now has global lifetime and is now only initialized once. Unless it is re-initialized every time g() is called then this change will probably cause a bug.

The other problem is that local and do_something() should be declared static. This is important so that the interface to the module is better defined. If this is not done then local can be modified and do_something() called from anywhere else in the program. This could be due to a deliberate attempt to bypass the official interface to the module, or it could be accidental due to name conflicts (ie, other global variables with the same name).

This is better:

static int local;

static void do_something()
{
   local *= something_complicated() + something_else();
}

void g()
{
   local = 1;

   for (...
      do_something();
   ...
}

That's better, but there is still a potential problem with g() -- it is not re-entrant. A re-entrant function is one that is called again while an earlier call is still active.

The most obvious form of a re-entrant function is simple recursion. However, functions can be re-entered in other ways, such as if the same function is called from different threads.

A common problem in Windows software is where a function is unwittingly re-entrant. This occurs when a function (directly or indirectly) makes a Windows system call that causes a message to be sent back to the caller and the message handler ends up calling the same function again. I have encountered several nasty problems of this nature in the past.

For the above reasons you should avoid file scope variables (and global/static variables in general). There is something comforting about having all you variables declared locally and safely tucked away on the stack.

4. Use Member variables

In C++ (and C#) there is another solution - use private member variables. This is an enormously popular approach and possibly one of the main reasons that people prefer C++ to C. However, I do not think it is always appropriate either.

For example, I have seen large classes, with hundreds of methods (member functions) that tuck away numerous private member variables where each is just used in two or a few methods. In fact, in my early C++ days, I created such classes (see the source code for my hex editor, which you can download from hexedit.com). A member variable that is used by only a small subset of all member functions is indicative of a bad design.

My rule of thumb is that a member variable should be used in at least half of all member functions.
In essence, using member variables like this is no that much better than that most heinous of crimes - using global variables. See my previous post if you don't know why global variables are evil.

Nested Functions Using Lambdas

The above discussion should have convinced you that there is no simple and elegant solution to nesting functions in C and even C++/C#. The advent of lambda functions provides that solution.

As I mentioned above a lambda function can refer to local variables of the scope in which it is declared. The compiler (in both C++ and C#) creates a special object to allow you to use any referenced variables from within the lambda function. In C# a reference to all the variables is stored in the object, but in C++ you can access them by value, reference or even const reference.

But enough of my confusing babbling. An example is the best way to explain what I mean. The example below shows how a long function can be made more readable by using lambda functions to create nested functions.

Example

It was difficult to find an example with which to demonstrate this problem and the solution. (The function must be necessarily long to demonstrate the problem we are trying to solve, but not too long as to be completely unwieldy.) After quite a bit of searching through old C code I found the function below, which, given an RGB color, generates the corresponding HLS values (hue, luminance and saturation). Note that this code was originally written in C a long time ago, but I have converted it to C# (as I don't have access to a C++ compiler that supports lambda functions at the moment).

First, here is the long version in C# (without lambdas). It is a bit too long at over 40 lines. Using nested functions could make the design more modular and the logic a bit clearer.

static void GetHLS(Color col, out int hue, 
                   out int luminance, out int saturation)
{
   const int hlsmax = 100; // Determines range of return value
   const int rgbmax = byte.MaxValue; // max of R, G, or B = 255

   int cmin = Math .Min(col.R, Math.Min(col.G, col.B));
   int cmax = Math .Max(col.R, Math.Max(col.G, col.B));
   int cdiff = cmax - cmin;

   luminance = ((cmax + cmin) * hlsmax + rgbmax) / (2 * rgbmax);

   if (cmax == cmin)
   {
      hue = -1;
      saturation = 0;
   }
   else
   {
      // Work out saturation
      if (luminance <= hlsmax / 2)
         saturation = (cdiff*hlsmax + (cmax + cmin)/2) / 
                                           (cmax + cmin);
      else
         saturation = (cdiff*hlsmax + (2*rgbmax-cmax-cmin)/2) / 
                              (2*rgbmax - cmax - cmin);

      // Work out hue
      int Rdelta = ((cmax - col.R)*(hlsmax/6) + cdiff/2)/cdiff;
      int Gdelta = ((cmax - col.G)*(hlsmax/6) + cdiff/2)/cdiff;
      int Bdelta = ((cmax - col.B)*(hlsmax/6) + cdiff/2)/cdiff;
      if (col.R == cmax)
         hue = Bdelta - Gdelta;
      else if (col.G == cmax)
         hue = (hlsmax / 3) + Rdelta - Bdelta;
      else
         hue = ((2 * hlsmax) / 3) + Gdelta - Rdelta;

      if (hue < 0)
         hue += hlsmax;
      if (hue > hlsmax)
         hue -= hlsmax;
   }
}

In the code below I have used lambdas to create six nested functions.

static void GetHLS(Color col, out int hue,
                   out int luminance, out int saturation)
{
   const int hlsmax = 100; // Determines range of return values
   const int rgbmax = byte.MaxValue;

   int cmin = Math.Min(col.R, Math.Min(col.G, col.B));
   int cmax = Math.Max(col.R, Math.Max(col.G, col.B));
   int cdiff = cmax - cmin;

   // Work out luminance first - required for others calcs
   int lum = ((cmax + cmin) * hlsmax + rgbmax) / (2 * rgbmax);

   // Nested functions
    Func<bool> IsGray = () => cmax == cmin;
    Func<int> Rdelta = () => ((cmax - col.R)*(hlsmax/6) + cdiff/2) / cdiff;
    Func<int> Gdelta = () => ((cmax - col.G)*(hlsmax/6) + cdiff/2) / cdiff;
    Func<int> Bdelta = () => ((cmax - col.B)*(hlsmax/6) + cdiff/2) / cdiff;
    Func<int> GetSaturation = () =>
    {
      if (IsGray())
         return 0;
      else if (lum <= hlsmax / 2)
         return (cdiff*hlsmax + (cmax + cmin) / 2) / 
                                      (cmax + cmin);
      else
         return (cdiff*hlsmax + (2*rgbmax - cmax - cmin)/2) /
                                    (2*rgbmax - cmax - cmin);
    };
    Func<int> GetHue = () =>
    {
      if (IsGray())
         return -1;

      int h;  // Calculated hue value
      if (col.R == cmax)
         h = Bdelta() - Gdelta();
      else if (col.G == cmax)
         h = (hlsmax / 3) + Rdelta() - Bdelta();
      else
         h = ((2 * hlsmax) / 3) + Gdelta() - Rdelta();

      if (h < 0)
         return h + hlsmax;
      else if (h > hlsmax)
         return h - hlsmax;
      else
         return h;
    };
   // End of nested functions

   luminance = lum;
   saturation = GetSaturation();
   hue = GetHue();
}

Note that, in this version, I have changed the logic a little. First, I create the IsGray() lambda function, which in this version is called twice (whereas the equivalent code was only called once in the original code above). This is somewhat compensated for by the fact that only two of the three delta functions (Rdelta, Gdelta, Bdelta) functions are called.

Note that creating the lambda functions will probably reduce the performance of GetHLS() to a small extent, due to the indirection of the function calls.

Conclusion

Lambda functions have another use beyond their obvious one of being able to create simple in-line, unnamed pieces of code. You can use them to create nested functions.

This is useful in C++ (if C++11 features are available to you) and C#. It is a shame you still can't do this easily in C - though you could compile C code as C++ just to use lambdas.

No comments:

Post a Comment