Sunday 8 November 2020

Rangeset - Go Generic Set

Last month I introduced you to generics in Go. This time I discuss a useful and substantial generic container that I created, and my experiences using the go2go tool to write and test it.

What about generic chans?

Last month I also mentioned generic chans and some potential cool uses.

Sorry, but first I want to discuss the rangeset package which has been working for a while now.

I'll talk about a generic channels package I'm working on very soon - promise!

This container I called rangeset. It contains a Set type similar to the one in the go2go example set package but with some interesting properties and performance advantages.

Its main advantage is that it can save huge amounts of memory for sets that have a large number of contiguous elements, and can be faster too. It will regurgitate elements in order, unlike sets based on hash tables. You can also invert a set and create the Universal set unlike any other Go set implementations I have seen.

Introduction

I first created a primitive type of "range" set in C many decades ago and found it useful in many scenarios. With the addition of templates (and STL) to C++ I created a complete STL compatible class. Many people found it very useful as it had much better performance (esp. memory usage) characteristics than the C++ std::set for may common types of sets.

Sets are typically implemented as a balanced binary tree (eg: C++ std::set) or a hash table (eg: C++ std:unordered_set). An important aspect of sets is how fast you can check if an element is in a set and balanced trees have a lookup time of O(log(n)) whereas hash tables are O(1). In practice, I have found balanced trees are adequate (since O(log) is not too bad) unless you have very large sets. They also have the advantage that you can "iterate" the elements in order which you can't do with a hash table (without additional complications like a separate index).

“lookup time ...
is O(log(r))”

A range set takes a different approach that makes it useful for sets with large numbers of elements that are all in one or a few contiguous ranges. Internally it uses a list of ranges, where each range is simply the first and (one past) the end of a contiguous span of elements. In my original C++ version of rangeset I used a linked-list of ranges. In the Go version I completely rewrote it to use a slice.

The lookup time for a range set is O(log(r)), where r is the number of contiguous ranges. In the worst case (where no elements are next to any other elements) the lookup time is no better than a balanced tree, or O(log(n)), where n is the number of elements. But in the best case (where all elements are in a single range) it is O(1) the same as a hash table.

Space performance is more interesting. In the worst case it is O(n) which is still better than a binary tree or a hash table. In the best case it is O(1). In fact, storing the universal set (the set of all elements) requires a tiny amount of memory -- just one range.

It's important to remember that you need to know something of the likely distribution of elements -- it's only useful for large sets if the elements tend to clump together. For example, the set of all prime numbers (which only has one contiguous range, ie: 2-3) is counter-indicated, but a good fit would be something like a set containing all numbers with a leading decimal digit "2" (ie: 2,20-29,200-299,2000-2999,...). It turns out that there are many real-world scenarios where a range set is useful.

One final thing to note is that the element type of the Go rangeset.Set type is constrained to be an integer type (byte, uint, int64, etc). Like a balanced tree the elements must be orderable - ie support comparison using less than (<) etc. (A hash table based set only requires the elements to be comparable - ie support the == and != operations.) In theory a range set could be created that allowed floating point or string elements but I found that these are not useful in practice and I found it handy to be able to increment values which is only supported by integer types.

Sets in Go (without Generics)

Maps as Sets

Traditionally sets in Go are implemented using a map with the map key type being the set's element type and ignoring the map value. So for a set of ints you would use a map with a key of int. Perhaps the obvious value type for the map is a bool (ie map[int]bool for a set of ints), but whether an element is in the set or not is better represented by the presence of the key in the map so an "empty" value type is used. The empty type conventionally used in Go is the empty struct, which uses no memory. So a set of ints would be map[int]struct{}.

When using a map as a set the set element type is restricted to types that are comparable - ie types that can have == and != applied. Hence you cannot have a set of slices, maps or funcs, since these types cannot be compared (except to nil). Additionally, a type is not comparable if it is a struct, array or interface that contains non-comparable component(s). (See my previous post for a full explanation of comparable types.)

Go's built-in map has no special support for sets so if you want to perform set operations you have to write functions for the specific type of set you are implementing. For example, you would need a function like this to find the intersection of sets of strings:

  func Intersect(set ...map[string]struct{}) map[string]struct{} {
 
    retval := make(map[string]struct{})
    if len(set) == 0
  { return retval }

      outer:

      for elt := range set[0] {

          for _, other := range set[1:] {

              if _, ok := other[elt]; !ok {

                  continue outer

}

}

          retval[elt] = struct{}{} // found in all sets

}

      return retval

}

Another deficiency of using a map is that the syntax is a little clumsier than it would be with a set type built into the language (or a generic set). Creating sets requires dealing with the unused empty struct as the map's value type.

      setA := StringSet{"a": {}, "b": {}, "c": {}}

      setB := StringSet{"c": {}, "d": {}, "e": {}}

      setC := StringSet{"e": {}, "a": {}, "c": {}}

      log.Println(Intersect(setA, setB, setC))  // {"c"}

Open Source Set Packages

Recognizing the problems with using the built-in map type to implement sets, there have been several open-source packages created (such as the excellent golang-set). These provide methods which allow all manner of set operations such as union, intersection, comparison (for equality), and some I don't even understand like Cartesian product.

But since Go does not (yet) have parametric polymorphism, they must use interfaces for polymorphism. This means internally they use a map something like this:

      type set map[interface{}]struct{}

The problem with storing the elements in an interface is that you lose compile-time type saftey. For example, it would be easy to accidentally add an integer to a set of strings and you could only discover the problem at run-time using type assertions. You could even add a value of a non-comparable type to a set which would cause a run-time panic.

Furthermore, it also affects performance since set elements must be "boxed" and "unboxed" as they are stored in an interface{}.

Sets using Generics

Go Authors sets package

Generics allow the above-mentioned problems with using sets in Go to be addressed (without the need for another built-in type). Undoubtedly, when Go has generics, sets will be one of the first things added to the Go standard library. In fact the "dev.gog2go" branch of the Go compiler repository includes a "proposed" implementation of exactly that in a sets package.

Internally this sets package uses a map (see sets.go2 line 10) to implement a generic set. It has the safety and performance of a map with convenience and reliability of tried and tested methods. There are basic methods (create a set, copy a set, compare sets, add/delete elements, size of the set, etc) and set operations (union, intersection, subset, etc)

There are also some useful methods to get or filter the elements of the set. But note that, since the elements are stored in a map, the order the elements are processed/filtered is indeterminate.

C++ range_set

I first became familiar with parametric polymorphism when "templates" were added to C++. I was particularly enamored with STL library originally designed by Alex Stepanov, and later incorporated into the C++ standard library. I won't go into all the wonderful details but one of the most useful things it provided was containers (vectors, lists, deques, maps, sets, etc).

My first experiment with C++ templates was to create a "range" set which was drop-in compatible with the STL set class, apart from the fact that the element type had to be an integer type. (Technically it could be any type that implemented operator<() and operator++() which would include integers, and possibly, other user-defined types.) Internally it was implemented using a linked-list of range structs rather than the balanced binary tree used by the STL set class.

If you are interested, the full source code (C++ header file) can be found here. I also wrote an article for the C/C++ User Journal in June 1999 that describes it (see A Container for a Set of Ranges).

Go rangeset package

A few months ago I decided to try implementing a generic range set class in Go using the experimental go2go tool. Although my C++ implementation used a linked list of ranges, I found that in Go a slice of ranges worked as well if not better. Each "range" simply stores the bounds of the range using asymmetric bounds (inclusive lower bound, exclusive upper bound). All operations maintain the property that the ranges are kept in numeric order and non-overlapping.

Like my C++ range set, the Go implementation tries to be compatible with Go authors sets package mentioned above. I have used the same method names, including Contains, AddSet, SubSet, Iterate, etc, so a rangeset could act as a drop-in replacement. Of course, there are additional methods that take advantage of the unique properties of a rangeset, such as the ability to return all the ranges. And as I've already mentioned there is a limitation that the element type must be an integer type.

Here is a simple example. (For a complete list of methods and functions see the README file.)

      var squares, evens rangeset.Set[int]

      for i := 1; i <= 12; i++ {

          squares.Add(i * i)

}

      for i := 2; i < 1000; i += 2 {

          evens.Add(i)

}

      // Find all the even squares

      fmt.Println(Intersect(squares, evens)) // {4,16,36,64,100,144}

Rangeset Implementation

Using the go2go Tool

I developed the rangeset package using the go2go tool. It is an amazing tool with very complete support for what I think Go generics will look like. In essence it translates .go2 source files (containing generics types/functions) into .go files which can then be built as normal.

It does have a few minor deficiencies such as no ability to do conditional compilation (build tags). It has very good support for tests, except that it does not support external tests, though that may be fixed by the time you read this. (Note that rangeset has a lot of tests as I discuss below).

There are many places you can find detailed steps on how to install and use the go2go tool but in brief you need to have a recent Go compiler and git installed, then clone the latest Go repo and build the dev.go2go branch. Then you can just run

$ go tool go2go build

to translate .go2 file to .go and invoke the Go compiler to complete the build of the package.

Spans

Internally, a set in the rangeset package (rangeset.Set) uses a slice where each element is a struct containing the start and end elements of a range. Essentially it is just:

 
        type Set[T constraints.Integer]  []struct{bottom, top T}
    

Note that constraints.Integer is a constraint defined in the constraints package that just specifies all of Go's integer types.

In years of using my C++ rangeset class I found it occasionally useful to have access to the actual ranges, so I created a separate type for this. I called this type Span to avoid confusion with other uses of the word range and to avoid conflict with Go's range keyword.

        type Span[T constraints.Integer]  struct{bottom, top T}
    

There is a Spans() method that returns all the ranges in the set as a slice of Span[T]. You can also quickly add a range of values to a rangeset using the AddRange() method. Of course, internally Spans are merged and split as necessary to maintain a normalized layout, which is the most efficient and allows operations such as comparing sets for equality. Interestingly this means that adding an element to a rangeset can reduce its memory usage if the new element is the missing value between two existing ranges, causing them to be joined.

See for the actual code defining these types.

It's also important to note that ranges are specified (and stored internally in Spans) using asymmetric bounds. A counter-note is that when a set is serialized to a string the ranges are encoded using inclusive bounds; this avoids confusion to users who may be exposed to these strings.

Number of Elements

To be compatible with the Go Authors set, rangeset.Set implements a Len() method. The problem with this method is that it returns an int but, depending on the element type, the number of elements can easily exceed the maximum int value. For example, in Go implementations where an int is 32 bits then uint, uint32, int64, uint64, and uintpr can have sets with more elements than the maximum int. (In the Go Authors set this is not typically a problem as you run out of memory before you can have a set that big.) Even a the set of all int or int32 (ie, rangeset.Universal[int]()) has one more element than can be represented in an int! In these cases the Len() method will silently overflow, since for compatibility it can't return an error.

If this can possibly occur then you need to use the Length() method instead. This returns two values -- the number of elements (as uint64) and the number of ranges in the set. Note that even using a uint64 for the length the number of elements in a universal set of int64, uint64 (and possibly int, uint and uintptr) is one more than the maximum uint64 - in this case you also need to check the number of ranges. That is, if both values returned from rangeset.Length() are zero then you have an empty set, but if length == 0 and number of ranges == 1 then you have a universal set.

Set Operations

There are simple methods to add or delete elements (or ranges of elements) and check for the presence of an element in a set. There are also methods to perform set operations such as Intersect() which find the set intersection of a set with another set. Similarly, the AddSet() method performs a set union, and SubSet() performs a set difference operation.

There are also functions that take a variable number of sets as parameters and return things like the union, intersection or whether they are all equal. I may later add other fancy set operations like power-sets and Cartesian products if there is demand for it.

The Complement operation and Universal Set

The Complement() method returns the inverse of a set - that is, any element in the set is removed and any element not in the set is added. This operation is fast, only taking time proportional to the number of ranges - O(r). The complement of the empty set is the universal set (and vice versa).

The function rangeset.Universal[T]() returns a Universal set - that is, as set that contains all elements possible for its element type. It's faster than taking the complement of an empty set.

Serialization and Deserialization

The String() method encodes a range set as a string; the NewFromString() function does the opposite converting an encoded string to a set. These are useful for serializing sets for transmission or storage, but can also be handy for seeing or even entering sets.

The format of the string is that the set is always enclosed in braces ({}) and elements or ranges are separated by a comma (,), while a range is two numbers separated by a colon (:). For example: "{1,3,5:9}" encodes a set of 7 elements. Note that to make it easier for users to deal with these string they use "inclusive" bounds (unlike how they are represented normally using asymmetric bounds as discussed above), so the string "{1:10}" represents ten elements (1 to 10 inclusive).

The string produced by the String() method is normalized - ie, elements and ranges are in order and not overlapping, but this is not a requirement of the string passed to NewFromString(). Plus, the special element "E" represent the end, or very first/last element of a set, so this code produces a set with the elements 1 and 99 to 255 inclusive:

 
          rangeset.NewFromString[uint8]("{99:E,1}")
    

Furthermore the shortcut "{U}" means the universal set - the same as "{E:E}".

Obtaining and Processing All Elements

Like the Go Authors set, a rangeset has a method to return all elements as a slice (called Values()), but be careful as this could produce a huge slice if there are any large ranges. There is also a Spans() method that returns all the ranges as a slice of Spans, which may be safer.

Also like the Go Authors set, there is a method (called Iterate()) that can be used to run a function on every method in the set. Also `Filter()` can be used to selectively delete elements from the set.

Unlike the Go Authors set, there are also methods to handle channels of elements. Iterator() (not to be confused with Iterate() mentioned above) returns a chan that receives all the elements in order. In contrast, ReadAll() adds all items it receives on a channel to the set. Both of these methods take a context.Context as their first parameter so they can be cancelled at any time, or after a specified timeout.

Testing

The rangeset package has about 20 table-driven and 20 non-table driven test functions. In all there are about 700 tests with 100% code coverage. I believe that every boundary condition and possibility is tested, but please tell me if you spot a gap anywhere.

Note that currently all the tests are internal tests since I could not get external test (using package rangeset_test) to work. However, all the tests are "external" tests - ie only test the public API of the package. I will convert the tests to external tests when possible.

To run all the tests:

$ go tool go2go test

Conclusion

A rangeset is not appropriate for every type of set. It has some performance advantages for sets with large ranges of contiguous elements. In fact very large sets can be implemented in a small amount of memory which would be impossible with other types of sets, such as sets with trillions of elements.

Many operations, such as adding, deleting and finding elements, use a binary search on the list of ranges. In the worst case (where few or no elements are contiguous) this gives time complexity of O(log) but in the best case (all elements are in a single contiguous range) the time complexity of O(1), like other set implementations. In practice it can have better performance than even a hash table.

Unlike hash table based sets, when the elements are iterated (or otherwise processed) they are seen in order. An unusual and sometimes useful feature is the ability to invert a set and even create a universal set (inverse of the empty set).

Another cool feature is the ability to send/receive sets to/from a chan. The Iterator() method returns a (generic) chan that returns all the elements of the set (in order); the ReadAll() method adds new elements to a set by reading from a chan. These channels can be used with a generic channels package I am currently working on which I will talk about next time!

Sunday 18 October 2020

Go Generics

There's been a lot written about generics in Go as they get closer to graduating into the language. I just want to talk about a few important things that I've not seen discussed elsewhere. In particular:

what parametric polymorphism actually is
why Go does not especially need it, but can benefit from it
C++ templates, and why Go doesn't need all that
latest proposed syntax (I'll update this post if/as it changes)
the importance of constraints
performance - of builds and generated code
how to try generics with the go2go tool or go2go playground
how generic channels may be a game changer

But first an introduction if you know nothing about it...

Parametric Polymorphism

As Rob Pike keeps reminding us, what is commonly called generics is technically parametric polymorphism. But even he admitted (when he talked at the Sydney Go Meetup last December) after saying "parametric polymorphism" many times, and stumbling on the words at least once, that generics is less of a mouthful. Personally I don't like the word either but it seems that generics is now the de facto, even official, name so I am going to stick with it. And, though not exactly the same, it is similar to what are often referred to as generics in other languages like C#, Java, etc, which I think is helpful for new Gophers.

So what is it? To simplify, there are two basic forms allowing you to specify type parameters either on a function or on a type. These are "compile time" not "run time" parameters as they evaluated and used by the compiler as it works.

Note that every programming language ever invented has generics built into the language. For example, Fortran (the first high-level language) had these built-in:

arrays - a generic type where the type of the element(s) is the type parameter
min - a generic function where the type parameter is implied by the type of the parameter(s)

Similarly Go already has built-in generic types (arrays, maps, slices) and functions (eg copy() which can work with any type of slice). Generally, when we talk about generics in a language we are usually talking about the "user-defined" types and functions. Go does not (yet) have this which is why you can't write a function that does what copy() does.

Function example

Here is a "user-defined" generic function in (the proposed new) Go, that prints the type of it's type parameter:

 
func f[T any]() {
    var v T // declare dummy variable of the type
    fmt.Printf("%T\n", v)

}

where the function f has one type parameter T [in square brackets] and no normal parameters (in round brackets). T is followed by the constraint any which means that the type parameter can be of any type at all. We can call the function like this:

func main() {    f[int]() // type parameter T is int
}

Type example

Here is a generic type:

type Pair[T1, T2 any] struct { first T1; second T2 }

where the Pair type has two type parameters, T1 and T2, used for struct fields which may be of any (unrelated) types.

func main() {    v := Pair[float64, string]{ 0, "abc"}
    point := Pair[int, int]{ 1, 2 }
    fmt.Printf("%v %v\n", v, point)
}

Does Go Need Generics?

This has been a big ongoing debate and I think that the Go community is finally leaning towards yes. People, who have used generics in other languages and have read about Go or only superficially tried it, are usually incredulous that Go does not have generics (myself included a few years ago). However, it has managed to get by without them for a long time so let's look at how and why.

One rarely discussed reluctance for adding generics to Go is the concern that build times will be reduced. Some may scoff but part of the joy of Go is the quick builds. In 2009 Russ Cox gave a convincing argument but the go2go tool seems to be pretty quick (see benchmarks later). My hope and firm belief is that when generics are added to the compiler proper, it will have negligible effect on compile times, at least for code that does not use generics.

Another argument for not adding generics to Go is that the inbuilt facilities are sufficient. I agree with this. Though enamoured with C++ templates, especially the STL, in all honesty 97% of what I used the STL for in C++ can be achieved just as nicely in Go using slices and maps. I'd even say that I make much more use of the in-built generic maps in Go than I ever did with the analogous C++ containers (std::map/set/unordered_map/multimap, et al).

Another point is that the Go community is far more tolerant of duplicated code. It's OK to have two functions with similar code - eg, one for ints and another for strings. DRY is good but some repetition can be tolerated.

A final argument for not having generics is that you can use go:generate to run tools to convert some form of generic code into Go code. Sure you can do this but it is a bit tedious and "non-standard". I don't really buy this one.

“use of interface{} ...
affects performance
and type safety”

On balance this lead me (and I think the Go community) to conclude that we can probably get by without generics. On the other hand, there has always been one thing that grates with me - the use of interface{} with containers and algorithms as the "poor mans" alternative to generics (for example, the linked list package from the Go standard library, which I have never heard of anyone liking or using). This affects performance due to boxing/unboxing (admittedly not always important) and type safety (always important unless you like Python or Javascript).

Generics will be useful for things like adding generic containers not already built into the language. But there is one killer reason Go should have generics - type-parameters on chans allow creation of some very useful facilities. More on this soon!

C++ Templates vs Go Generics

Before we go further, I want to talk about C++ templates. If that's not what you want, then you can skip this section. It may be useful background to see what C++ templates have that Go generics don't have.

Contracts, constraints, concepts

Is it all a con?

Constraints allow the creator of a generic type/function to say what capabilities a type must have to be used as the type parameter. For example, a Min() function makes no sense for a type that has no inherent ordering.

I was a bit dubious about the point of constraints at first. Why not just let the compiler work out if a type is unsuitable when it goes to use it. For example, Min() would need to use the less than operator, so the compiler would be able to detect the problem when you try to take the minimum of two structs because structs can't be compared for order (in Go).

However, there are some good reasons to have constraints in Go (or any language) as I discuss below.

I should mention first the one thing that Go generics will have and C++ doesn't have (yet) is constraints, as they are called in the latest design draft - see Type Parameters - Draft Design. (Compilers implementing the new C++ 20 standard will have a similar thing called concepts.)

Constraints (and the functionally similar contracts in earlier draft designs) have been the major sticking point for implementing generics. I discuss the syntax and use of constraints in detail later.

Anyway, here's what C++ templates have that Go generics do not have:

Specialization

Sometimes you want a type parameter of a specific type to be handled differently. For example, in C++ the STL provides the vector container (called vector<type T>). There is a specialization for bool (called vector<bool>) that is primarily intended to save space by cramming the elements 8 to a byte. This can save lots of memory for large vectors over a non-specialised version that would use a byte per element. However, vector<bool> is generally deprecated and rarely used.

C++ also supports "partial specialization" of a template that has more than one type parameter. Without getting into the details - some type parameter(s) are specialized with special code that keeps some type parameter(s) "generalized".

I don't believe we need these features in Go. Their usefulness is limited and it just adds complexity (and increases build times).

Member function templates

Originally C++ allowed type parameters on functions and types, but you could not add type parameters to methods of a class (called member function templates in C++ parlance). However, it was found necessary and added soon after. In Go, these would be called generic methods and here is an example (not valid code):

type BoolStack []bool

func (s *BoolStack) PushEqual[T comparable](v1, v2 T) { // not valid

*s = append(*s, v1 == v2)

}

I have occasionally found generic methods useful in C++ but only on certain large classes (which probably should have been avoided anyway). I'm not sure adding generic methods would be useful in Go. Moreover, there is a simple workaround - simply convert the method to a function like this.

func PushEqual[T comparable](s *BoolStack, v1, v2 T) {

*s = append(*s, v1 == v2)

}

There is still the limitation with the workaround that a func cannot be used to allow a type to implement an interface the way that a method can. We will have to see if Go needs generic methods.

Note that this is often confused with using a type parameter on a receiver of a generic type. That is allowed and is particularly useful. For example:

type stack[T any] []T

func (s *stack[T]) Top() T { // OK: method receiver *can* be generic

return (*s)[len(*s)-1]

}

Metaprogramming

C++ is famous for template metaprogramming -- there have been whole books written about it. To explain what it is, remember that when the C++ compiler processes templates it essentially generates internal code for the templates for each used "instantiation" of the type parameters. With some bewildering (even by C++ standards) tricks the compiler can be induced to create all manner of code at compile-time. Essentially you can write source code that generates source code (at compile-time) to become part of the resulting program.

What's the point? Well it can be used different things such as what go:generate is used for. For example, many algorithms require pre-computed values or tables. If you can't generate them during the build you could compute these values at program startup, but for efficiency (ie to avoid delays when the software starts running) it can be better to precompute the values and incorporate them into the code. Before template meta-programming it was common in C++ to write code to generate header files that are then incorporated into the build. Template metaprogramming allows these sorts of things to be actually done by the compiler at compile-time.

There are some advantages to this, of course, such as the source code being self-contained. However, it does complicate the compiler (and the code) with one consequence that builds become slower. For Go I think we can safely say that nothing like this will ever be added to the language, where these things are typically done at run-time (where the increased startup time is unimportant) or if necessary can be performed as part of the build, eg using go:generate.

Non-type Parameters

As well as type parameters, C++ also supports non-type template parameters. This allows a constant value to be used in the instantiation of a template. Why are they necessary? To see the point consider the declaration of an array in Go. An array (as I said above) is a generic type built into the language. In a declaration like this:

var a [4]int

we are declaring the array with two (compile-time) parameters. There is the type parameter (int) which says we want an array of integers. But there is also the non-type parameter (4) which says that the array has 4 elements.

In certain situations non-type compile-time parameters can be very useful in generic code (for example if arrays weren't built into Go). I would not be surprised if they were added to Go in the future - at least integer constant parameters.

One thing I would like to be able to create is a generic computational geometry package in Go. For example, a Point type would take a generic integer parameter for the number of dimensions, so a lot of code could be shared between 2-D and 3-D (etc) points.

A workaround in Go, that could allow an integer generic parameter may be to use arrays, though you couldn't have an unlimited number of values as you would have to specify all the possible arrays sizes in the parameter's constraint.

type Sizer interface {

type [0]int, [1]int, [2]int, [3]int // only used for size

}

[Ed 2021: the above is the old constraint syntax - here is what you need for Go 1.18+]

type Sizer interface {

[0]int | [1]int | [2]int | [3]int // only used for size

}

You can try this now on the go2go playground. [Ed: try Go Playground]

Defaulted and Variadic Template Parameters

C++ also supports default values for template parameters, if they are left out by the user of the template. It even supports variadic template parameters. Go does not support these sorts of things for normal (non-generic function) parameters, so it is unlikely to ever have these features.

Go Generics Syntax

How exactly Go generics will work has been hugely debated especially after the 2019 Draft Design. It's been clear for awhile that generics will simply add type-parameters to functions and types (and not have any of the bells and whistles of C++ that I mentioned above). But there were still a few unresolved issues -- how to handle constraints plus the use of brackets and precise syntax. I'll quickly explain the syntax then talk a bit more about the important subject of constraints.

Brackets

There has been too much debate about what sort of brackets are used to enclose type parameters - round (parentheses), angle, square, chevrons or guillemets (these are Unicode characters but that's OK since Go source code is UTF-8). The obvious choice, and one I would have liked, would have been angle brackets (<>) but it turns out this creates crazy ambiguities in the language. Round brackets were tried for a while but there were also ambiguities (with simple but clunky workarounds). Currently, perhaps finally, square brackets are being used and they work well enough.

A closely related change has been the removal of the need for the type keyword in a generic declaration. So a generic function in the original Draft Design such as this:

func Print(type T)(s []T) {  // 2019 syntax

is now written like this:

func Print[T any](s []T) { // current syntax

Note that the type keyword is gone and the round brackets have been replaced with square brackets. You might also have spotted the addition of the word any - this is a type constraint. Previously if there were no constraints on the types allowed for a type parameter then you could leave it out, but now you must provide a constraint, where any obviously means there's no constraint at all.

Speaking of constraints...

Constraints

Constraints allow you to say which types for a type parameter are suitable. For example, in the Min[T]() function already discussed it should only be used with type parameter that supports the less than (<) operation. How to handle constraints has probably been the most contentious part of adding generics to Go in the last year and has resulted in draft designs that (before this year) involved the contract keyword. (I contributed to the debate but I won't go into that history now.)

I'll just tell you how constraints work, including a few gotchas, in a moment. First we need to understand why constraints are even needed (after all C++ got by without them for decades).

Having come from C++, I did not see the need for constraints, at first. I knew it would make it easier for the compiler to give more meaningful error messages (rather than the monstrosities of C++), but there is a bit more to it than that.

Improved Error Messages

First, let's consider what the Min() function would look like without constraints:

func Min[T](v1, v2 T) T { // NOTE: invalid (missing constraint)

if v1 < v2 {

return v1

}

return v2

}

Now if someone called this (hypothetical) generic function with bool arguments like this:

Min(true, false) // ERROR

they'd get an error message about operator < not being defined for bool, and you'd have to look at the source code for Min() to understand exactly what is happening. This may not look too hard but with more complicated use of generics it becomes extremely difficult if not impossible for the compiler to produce easily decipherable error messages.

Now let's see how constraints help: because we are using less than (operator <) on the values of type T, we need to use the constraints.Ordered constraint:

func Min[T constraints.Ordered](v1, v2 T) T { // OK

if v1 < v2 {

return v1

}

return v2

}

Now if you try to call Min() with bool parameters you get an error that tells you that you can't - ie an error message like: bool does not satisfy constraints.Ordered.

On the other hand if you don't constrain the type parameter correctly, eg:

func Min[T any](v1, v2 T) T {

if v1 < v2 { // ERROR: cannot compare v1 < v2 (op < not defined for T)

return v1

}

return v2

}

you get an error since not all types support the less than operation.

Of course, you should ensure that you don't unnecessarily constrain a type parameter. So if you used constraints.Integer for the above Min() function's type parameters you'd be preventing Min() being called for float64 or string types.

The Other Benefit of Constraints

The other benefit of constraints, that I did not pick up on immediately, is that they act as a contract between the creator and the user of the generic function/type. Without a constraint the creator of the generic function/type may update their code and inadvertently change the allowed types. For example, if the creator of Min() made this code "improvement" (modifying our un-constrained Min() example from above):

func Min[T](v1, v2 T) T { // NOTE: currently invalid (missing constraint)

if v1 - v2 < 0 {

return v1

}

return v2

}

they may not realise they have made a change that will possibly break code of some clients. Only when someone uses Min() with string parameters would the problem be detected.

Min("abc", "xyz") // ERROR: operator - not defined for string

Min(1, 2) // still OK

But using constraints, the creator will know right away if they have violated the "contract", since not all types of constraints.Orderered allow subtraction.

func Min[T constraints.Ordered](v1, v2 T) T {

if v1 - v2 < 0 { // ERROR: invalid operation: (op - not defined for T)

return v1

}

return v2

}

Without constraints the problem may not be found until some unsuspecting user updates the package and finds that their code won't build.

Creating Constraints

So how do you use constraints when you create a generic function or type? First, you need to decide on what capabilities are needed for a type parameter. As we saw above the Min() function requires its type parameters to support the less than operation which is specified with a constraint of constraints.Ordered.

There are essentially three ways that you can constrain a type parameter, by specifying:

specific method signature(s) that the type must implement
explicit list of types (such as used in contraints.Ordered)
built-in constraints (such as comparable) - determined by the compiler

1. Method Signature Constraints

Lets look at method signatures first as I think they are the reason for the latest change to use interface instead of contract. Currently in Go, a list of method signatures can be bundled together for the purpose of defining an interface type. A similar requirement for constraints is to restrict a type to only types that implement certain method signature(s). Due to this similarity it was decided constraints can just recycle the syntax used for interface.

... it's important
to distinguish
the two different
uses of interface”

For example, say we want a generic function called Stringify() that requires its parameter to implement the String() method. This constraint is specified using the following interface (which is already available from the standard library fmt package):

type Stringer interface {

String() string

}

func Stringify[T Stringer](s T) string {

return s.String()

}

Note that it's important to distinguish the two very different uses of interface that this change introduces:

a type for variables of any type that implement certain method(s)
new use of creating a generic constraint for a type parameter

I think of these as run-time and compile-time use of interfaces.

I have already seen questions in the Go issue tracker where this double duty causes confusion, so you need to be clear on how the new "compile-time" use of interfaces for generics differs from the original use.

There is also the further confusion that interfaces (since they are a type) can also be generic and these generic interfaces can even be used a constraints on other generic types. I'm not going to give an example as, to be honest, I am not fully cognizant myself. BTW I believe that my earlier proposal for constraints is simpler and less confusing (see Example Types as Contracts).

2. List of Types

To allow the interface syntax to serve double duty as type parameter constraints it has been extended to allow a list of types to be included in an interface. For example, if a generic function only works for unsigned integers then you could create a constraint like this:

type UnsignedInteger interface {
//type uint, uint8, uint16, uint32, uint64

// Ed 2021: the above syntax is out of date, use this now:

uint | uint8 | uint16 | uint32 | uint64

}

func MinValue[T UnsignedInteger]() T { // min value for unsigned ints

return 0

}

It's important to note that this interface syntax can only be used for a constraint. An interface that includes a list of types cannot be used for the original "run-time" duty of interfaces like declaring an interface variable. (Ian Lance Taylor has a further proposal to allow this.)

You can, of course, combine a list of types plus method signatures in the same interface. Here is a constraint that only allows unsigned integers that have a String() method:

type Unsigned interface {

uint | uint8 | uint16 | uint32 | uint64

String() string

}

Looking back at the definition of our Min() function you might now guess where the constraints.Ordered constraint comes from. It's defined in the constraints package, which undoubtedly will be added to the Go standard library once the language supports generics. It is simply defined as a list of types that support comparison/ordering operations (==, <, >=, etc):

type Ordered interface {

//type int, int8, int16, int32, int64,

// uint, uint8, uint16, uint32, uint64, uintptr,

// float32, float64,

// string

// Ed 2021: the above syntax is out of date, use this now:

int | int8 | int16 | int32 | int64 |

float32 | float64 | string

}

Most of the time you won't need to define your own constraints but use those included in constraints package (or built in to the language). In fact it's more robust to do so in case the language changes (for example more types might be added to constraints.Ordered in the future).

3. Built-in Constraints

Finally, there are some constraints that can't be handled by a simple list of types. The only one (apart from any) that I am aware of is the comparable constraint, which just determines if two objects of a certain type can be compared for equality (ie support == and != operations). This is not a simple rule - for example some types of struct can be compared but others can't.

I think I talked about the rules for comparability of types before, but I'll explain it again in brief. Some basic types are not comparable, for reasons we won't go into (except to nil) - these are slices, maps and funcs. Further, if you create an array of a non-comparable type, or a struct that has one of these types as a field, or an interface that stores one of these types then that type is also non-comparable (though in the case of interfaces this cannot be determined at compile time and will cause a run-time panic).

Anyway, the simple rule is that if values of your generic type-parameter need to be compared for equality then you need to use the comparable constraint.

Trying Go Generics

You can try Go generics using the latest syntax very easily using either the go2go playground. You can also build the dev branch of the Go compiler source then use the included go2go tool - see go2go README for instructions.

The go2go tool translates your code quickly and the code runs fast. There are a few little deficiencies that I have found such as lack of support for external tests, build tags and unsafe.Sizeof but overall it works very well.

Conclusion

I can see some great packages being built with Go generics in the future. Especially important, I think, will be generic channels as this is something not offered in any other language and I think will greatly enhance the possibilities for concurrency.

I've also written a completely new generic set container called rangeset that implements sets (constrained to integers) using ranges. This has some very interesting performance capabilities for some types of sets and also allows inversion and universal sets (something not supported by the example go2go set package). I'll talk about this in a post soon.

I was also going to say a few more things that I haven't had time for. I'll try to flesh out my notes and publish another post in the next few weeks. This will include:

some benchmarks of generic code
+ compilation times of generic code using the go2go tool
my experiments with generic channels