Monthly Archives: January 2017

Accessing the elements of a tuple as variant

A further digression, because it turns out I want to be able to permute a tuple at run time. That means treating the element of a tuple generically. And I can just barely do this, for some tuples, in c++17.

So a slight digression into ADTs. Which in this case means Algebraic Data Types, not Abstract Data Types. But just algebra. No calculus, or differentiation, of data types. Not today.

1 Tuple is Product, Variant is Sum

1.1 Products

In algebra, we usually start out with addition. It’s simpler. But for types, multiplication, or product, is in many ways much more natural. Your basic struct, record, etc is a natural product of types. A type is some kind of collection of things. And I’m being a bit vague here because this is right in the area where set seems like a good idea, and then we get into sets of sets, sets that might contain themselves, and barbers who shave all the people who don’t shave themselves. There is rigour, but I don’t really want to have to go there.

But, if we start with the idea that a type is a collection of things, and that we don’t look to closely at the infinities, we are not going to be terribly wrong. So a type is a way of describing if a thing is in or out of the collection.

Now, I could pretend we don’t know what a struct is. Start with pairs, where there are no names of the components of the struct, and build that up. But we all have a notion of struct. It’s an ordered collection of types. The instances of the struct are all of the elements of each type contained in the struct, matched up with all of the other elements of all the other types in the struct. Known as the Cartesion product. So if you have a type A, and a type B, the collection of things in struct {A a; B b;} is the cross of As and Bs. That is {{a1, b1}, {a1, b2}, {a1, b3}, … , {a2, b1}, {a2, b2}, … {an, b1}, … {an, bm}} is all of the elements that are part of the type struct {A a; B b;}. The cardinality of {A, B} is the product of the cardinalities of A and B.

Structs are very natural in C++, but hard to deal with generically, so there’s a type that does it all for you, std::tuple. Getting at the parts of the tuple is a little more difficult that with a struct. You have to say std::get<0>(tuple), or std::get<int>(tuple). And the second might not even compile, if the tuple has more than one int. But you get tools for composing and decomposing tuples at compile time. And std::tuple lets you put pretty much any C++ type into the tuple, only restricting you when you try to, e.g. move a tuple that has an element that can’t be moved.

There should also be a type that acts as a unit for the product, the equivalent of 1 for multiplication. The empty tuple can work as a unit. It contains any of the list of no types. This implies that all empty tuples are equivalent, so its cardinality is 1. There can be only one. The product of a type with the empty tuple is entirely equivalent to the the type itself. There are no additional elements in the type, and you can convert back and forth between them. They are isomorphic, having the same shape.

Isomorphisms are important in talking about types, because most of the time we can’t actually distinguish between isomorphic types, at least for proving things. The phrase “up to isomorphism” shows up a lot. To be isomorphic means that we can write a transformation X from type A to type B, and a reverse transformation Y from type B to type A, such that Y(X(a)) == a for all a, and that for any function from a1 to a2, there is an equivalent function from b1 to b2. We could mechanically replace instances of a with the appropriate b and add calls to X and Y without changing the behavior of a program.

1.2 Sums

The other basic algebraic type is the sum type. The corresponding primitive in C++ is a union, with one difference. In most type systems, the sum type automatically remembers which of the allowed types is in it. A union doesn’t, so the standard technique is to embed the union in a struct that carries a tag saying which type in the union was most recently written, and can be read from. I’ll be ignoring type-punning schemes allowing a read of a different type than was written.

So a Sum type of type A and type B is the union of all of the things in A and all of the things in B. {a1, a2, a3, … , an, b1, b2, … , bm}. The cardinality of is the sum of the cardinalities of A and B.

The unit type of the sum is equivalent to zero. The empty sum type, although a valid type, has no elements in the type. It’s like the empty set. It’s often known as Void, where the unit for product is often called Unit. It may also be known as Bottom, where that is a computation that never completes. Since there are no elements of the type Void, it can’t be instantiated. And a product of Void and any other type is equivalent to Void. The c++ type void is related, but not exactly the same, because it also represents an empty argument list, a function that returns, but does not return any value (a subroutine), and is also functions as the universal pointer.

C++17 recently standardized a sum type to go with the long standardized std::tuple, std::variant. Std::variant remembers which of the alternative types was last set. It is almost never empty, only so if a write into one of the alternatives threw an exception. It is not allowed to hold void, references, arrays, or to contain no types. This is a bit unfortunate, because except for void std::tuple can do all of those things.

There were several competing models for what std::variant should be, with various tradeoffs being made. It was always clear that std::tuple had to be able to represent everything a struct can, and in fact there are now language features to destructure a struct into a tuple. There is no equivalent model for sum types. Union can’t hold anything but trivial types because there is no mechanism to track what to do on destruction, since there is no built-in mechanism to determine what the union was last written as.

One of the popular models for variant rises out of database-like interfaces. Even though databases are internally strongly typed, SQL queries are not. And the model of sending text over and getting some kind of response back makes it difficult to expose that to a host language. Particularly when the database schema may change, the query still be perfectly valid, but no longer return the same types. However, since we do know there is a relatively short list of permitted types in the database, a variant that allows just those types and the ability to query what type was returned can be quite useful, and not terribly hard to implement. There are JSON parsers taking similar approaches, only with the addition that a JSON type may have JSON objects contained in them recursively, and those have to be outside the object somehow, or the size of the object is unbounded.

From the implementors point of view, supporting pointers and arrays is a huge amout of extra work. Not allowing an array to decay to a pointer is quite difficult. References have issues when treated generically. Not to mention that references have decidely odd semantics in the differences between construction and assignment. And the degenerate case of an empty variant was also difficult. If that needs to be represented, the type std::monostate has been introduced, which is a type designed to have exactly one item in it, so that all instances of std::monostate are identical. This is also the same as the unit type for product types. It’s not an accident that it’s represented in Haskell as (), which is the empty tuple. All empty lists are equivalent. It could have been std::tuple<>, but no one in the room happened to think of that.

2 Tuple is a Heterogenous Container, what is the iterator?

The C++ standard says “tuples are heterogeneous, fixed-size collections of values” – [tuple.general]. Collections generally have iterator types associated with them, but that’s a bit of a challenge since the iterator model in C++ assumes that for a collection, the type of *(Collection<T>::iterator) is T. But if the collection isn’t on T, but on Types…, you doesn’t quite work to say *(Collection<typename… Types>) is of type …Types. You need something to hold that. But in many cases, std::variant can work. It doesn’t quiet work, since we’d really need a variant of references to the elements of the tuple, so that they could be written to. However, for many purposes we can come close. For the case I was looking at, making copies is perfectly fine. What I’m looking for is something roughly with the signature

template <typename... Types
auto getElement(size_t i, std::tuple<Types...> tuple) -> std::variant<Types...>;

That is, something that will get me the ith element of a tuple, as a variant with the same typelist as the tuple, with the index determined at runtime. All of the normal accessors are compile time. So need to do something that will make the compile time information available at runtime.

Start with something I do know how to do, idiomatically printing a tuple.

template <typename Func, typename Tuple, std::size_t... I>
void tuple_for_each_impl(Tuple&& tuple, Func&& f, std::index_sequence<I...>)
{
    auto swallow = {0,
                    (std::forward<Func>(f)(
                        I, std::get<I>(std::forward<Tuple>(tuple))))...};
    (void)swallow;
}

template <typename Func, typename... Args>
void tuple_for_each(std::tuple<Args...> const& tuple, Func&& f)
{
    tuple_for_each_impl(tuple, f, std::index_sequence_for<Args...>{});
}

template <typename... Args>
void print(std::ostream& os, std::tuple<Args...> const& tuple)
{
    auto printer = [&os](auto i, auto el) {
        os << (i == 0 ? "" : ", ") << el;
        return 0;
    };
    return tuple_for_each(tuple, printer);
}

Actually, a bit more complicated than the totally standard idiom, since it factors out the printer into a application across the tuple, but it’s not much more compilcated. The tuple_for_each constructs an index sequence based on the argument list, and delegates that to the impl, which uses it to apply the function to each element of the tuple. The _impl ought to be in a nested detail namespace, so as not to leak out. Swallow is the typical name for using an otherwise unnamed, and uninteresting, type to apply something to each element of the tuple for a side-effect. The void cast is to make sure the variable is used, and is evaluated.

The next step is, instead of an application of a function for its side-effect, instead a mapping of the tuple, returning the transformed tuple.

template <typename Func, typename Tuple, std::size_t... I>
auto tuple_transform_impl(Tuple&& tuple, Func&& f, std::index_sequence<I...>)
{
    return std::make_tuple(
        std::forward<Func>(f)(std::get<I>(std::forward<Tuple>(tuple)))...);
}

template <typename Func, typename... Args>
auto tuple_transform(std::tuple<Args...>&& tuple, Func&& f)
{
    return tuple_transform_impl(tuple, f, std::index_sequence_for<Args...>{});
}

template <typename Func, typename... Args>
auto tuple_transform(std::tuple<Args...> const& tuple, Func&& f)
{
    return tuple_transform_impl(tuple, f, std::index_sequence_for<Args...>{});
}

Because the std::tuple is not a template parameter, I have to supply a const& and a forwarding-reference form to cover both cases. And I’m ignoring volatile quals. The _impl function uses forwarding-reference parameters, which will decay or forward properly using std::forward. Using it is straightforward.

std::tuple<int, double, long> t = std::make_tuple(1, 2.3, 1l);
auto transform = tupleutil::tuple_transform(t,
                                            [](auto i) { return i + 1; });

EXPECT_EQ(3.3, std::get<1>(transform));

auto t2 = tupleutil::tuple_transform(std::make_tuple(4, 5.0),
                                     [](auto i) { return i + 1; });
EXPECT_EQ(6, std::get<1>(t2));

So, for functions over all the types in a tuple, tuple is a Functor. That is, we can apply the function to all elements in the tuple, and it’s just like making a tuple out of applying the functions to elements before making the tuple. If this sounds like a trivial distinction, you are mostly right. Almost all container-ish things are Functors, and a few non-containerish things are also. Plus Functor sounds more impressive.

The transform also suggests a way of solving the problem I was originally looking at. An array of the elements of the tuple each as a variant will let me permute them with std tools.

template <typename... Args, std::size_t... I>
constexpr std::array<std::variant<Args...>, sizeof...(Args)>
tuple_to_array_impl(std::tuple<Args...> const& tuple,
                    std::index_sequence<I...>)
{
    using V = std::variant<Args...>;
    std::array<V, sizeof...(Args)> array = {
        {V(std::in_place_index_t<I>{}, std::get<I>(tuple))...}};
    return array;
}

template <typename... Args>
constexpr std::array<std::variant<Args...>, sizeof...(Args)>
tuple_to_array(std::tuple<Args...> const& tuple)

{
    return tuple_to_array_impl(tuple, std::index_sequence_for<Args...>{});
}

And that can be used something like:

TEST(TupleTest, to_array)
{
    constexpr std::tuple<int, double, long> t = std::make_tuple(1, 2.3, 1l);
    auto arr = tupleutil::tuple_to_array(t);
    int  i   = std::get<int>(arr[0]);
    EXPECT_EQ(1, i);
}

TEST(TupleTest, to_array_repeated)
{
    constexpr std::tuple<int, int, int> t = std::make_tuple(1, 2, 3);
    auto arr = tupleutil::tuple_to_array(t);
    int  i   = std::get<2>(arr[2]);
    EXPECT_EQ(3, i);
}

The second test is there because I was about to write, “as you can see, we can tell the differece between variants holding the same type”, except that wasn’t true. The original version of to_ar

    constexpr std::tuple<int, double, long> t = std::make_tuple(1, 2.3, 1l);
    std::variant<int, double, long> v0{1};
    auto v = tupleutil::get(0, t);
    EXPECT_EQ(v0, v);

ray didn’t use the constructor form with std::in_place_index_t. The code I ended up with did, but not at this point. There’s nothing like writing out what something is supposed to do to make you look and keep you honest.

So here, we’re constructing an array of std::variant<Args…> and constructing each member with the argument pack expansion into the std::variant constructor using the Ith index value to get that element of the tuple, and recording that we’re constructing the ith alternative of the variant. The second test checks that. The 2nd element of the array must be the 2nd variant of the tuple, and can be retrieved only by std::get<2>().

This would allow me to permutate the elements of a tuple, but I’m fairly close now to being able to writing a version that allows choice of the element at runtime, rather than at compile time.

    constexpr std::tuple<int, double, long> t = std::make_tuple(1, 2.3, 1l);
    std::variant<int, double, long> v0{1};
    auto v = tupleutil::get(0, t);
    EXPECT_EQ(v0, v);

What I’m going to do is construct an array of the getters for the tuple, each of which will return the element wrapped in a variant. The signature of the array will be of function pointer type, because, quite conveniently, a non-capturing lambda can decay to a function pointer.

First getting the array of getters for the tuple

template <typename V, typename T, size_t I> auto get_getter()
{
    return [](T const& t) {
        return V{std::in_place_index_t<I>{}, std::get<I>(t)};
    };
}

template <typename... Args, std::size_t... I>
auto tuple_getters_impl(std::index_sequence<I...>)
{
    using V = std::variant<Args...>;
    using T = std::tuple<Args...>;
    using F = V (*)(T const&);
    std::array<F, sizeof...(Args)> array
        //        = {{[](T const& tuple){return V{std::get<I>(tuple)};}...}};
        = {{get_getter<V, T, I>()...}};
    return array;
}

template <typename... Args> auto tuple_getters(std::tuple<Args...>)
{
    return tuple_getters_impl<Args...>(std::index_sequence_for<Args...>{});
}

So first a function that returns a function that constructs a variant around the value of what’s returned from std::get<I>. Well, it could return anything that happens to have a constructor that takes a an in_place_index_t, take as the thing to be converted something that std::get<I> can extract from. This is actually a separate function because GCC was unhappy doing the template parameter pack expansion inline in the _impl function. Clang was happy with the expansion noted in the comment. I really have no idea who is wrong here, and the workaround was straight forward. The array is one of function pointers, which the returned lambdas can decay to.

Now the only remaining trick is to use this array as a table to dispatch to the appropriate getter for the tuple.

const auto get = [](size_t i, auto t) {
    static auto tbl = tupleutil::tuple_getters(t);
    return tbl[i](t);
};

Get the array as a static, so we only need to computer it once, and simply return tbl[i](t)

TEST(TupleTest, gettersStatic)
{
    constexpr std::tuple<int, double, long> t = std::make_tuple(1, 2.3, 1l);
    std::variant<int, double, long> v0{1};
    auto v = tupleutil::get(0, t);
    EXPECT_EQ(v0, v);

    int  i = std::get<0>(v);
    EXPECT_EQ(1, i);

    auto v2 = tupleutil::get(1, t);

    EXPECT_EQ(1ul, v2.index());
    double d = std::get<double>(v2);

    EXPECT_EQ(2.3, d);

    constexpr auto t2 = std::make_tuple(2.4, 1l);
    auto           v3 = tupleutil::get(0, t2);
    double         d2 = std::get<double>(v3);

    EXPECT_EQ(2.4, d2);
}

3 Source

All source is available at TupleUtil on GitHub, including org source for this post.