Currying in rust Part 3 (The circle of life ... aka why borrowchecker ... why?!)

We stopped last time with changing our explicit curried function into a generic application. Also I added an addendum about 1 advantage of using generics over explicit definitions.

fn add<T>(x: T) -> impl Fn(T)-> T
    where T: num::Num + Copy
{
    move |y: T| x + y
}

Victor Moroz was kind enough to point out in comment that we don't need the num trait and can stay without external dependencies for the moment. And I am a huge fan of reducing the dependency graph if we can.

His suggestion was:

fn add<T: Add<Output=T> + Copy>(x: T) -> impl Fn(T) -> T {
  move | y: T | x + y 
}

which can be expressed in the following form as well:

fn add<T>(x: T) -> impl Fn(T)-> T
    where T: Add<Output=T> + Copy
{
    move |y: T| x + y
}

The where clause is mainly to move parts of the rules for our type based behaviour out of the function definition .

We will now take a look at the changes just to be sure we understand the differences what has happened:

we remove the num trait
we replaced it with the std::ops::Add; trait
we defined the output type of Add has to be the same type as our type we apply our function to
it still has to implement the Copy trait (because we're moving x into a closure and never return it)

The Copy trait is our first contact with the borrow checker and the 'cheap' solution. Instead of thinking about lifetimes and references in our program, we just copy the value.

Now no one wonders why our new starting point is:

fn add_curry<T>(x: T) -> impl Fn(T)-> T
    where T: Add<Output=T> + Copy
{
    move |y: T| x + y
}

Today we're going to take a look at the 'pipe' function my friend has written and why all of the sudden lifetimes get important esp. when using references.

Lets start at the problem that pipe solves. Pipe is, as the name suggests, a pipeline which means we prepare a pipeline and everything inside the pipeline gets applied to the value that gets passed into the pipeline.

This sound easy enough. Lets start with an arbitrary use case of application.

use std::ops::Add;

fn main() {
    let add1 = add(1);
    let add2 = add(2);
    let add3 = add(3);

    let result = add3(add2(add1(1)));

    println!("{}", result); // prints 7
}

fn add<T>(x: T) -> impl Fn(T)-> T
    where T: Add<Output=T> + Copy
{
    move |y: T| x + y
}

as we can see we prepare our 3 lambdas, apply them on the value 1 and print the result (the code is copy and paste ready give it a try)

This is nice but what if we want to reapply our 3 lambdas over and over again?

use std::ops::Add;

fn main() {
    let add1 = add(1);
    let add2 = add(2);
    let add3 = add(3);

    let result = add3(add2(add1(1)));
    let result2 = add3(add2(add1(2)));
    let result3 = add3(add2(add1(3)));

    println!("{}", result); // prints 7
    println!("{}", result2); // prints 8
    println!("{}", result3); // prints 9
}

fn add<T>(x: T) -> impl Fn(T)-> T
    where T: Add<Output=T> + Copy
{
    move |y: T| x + y
}

This no to bad, but as programmers we want to have less work maintaining our code. Which means we want to prepare add1, add2 and add3 only one time, then apply it over and over again so we just have to maintain it a 1 point.

if we want to do 1 thing over and over again, what would be our normal solution? Usually ... a loop. however we are able to achieve it (recursions, gotos for example) .

We will try to solve it with both ways, recursions (recursion in the next article) and loops but we will start with loops since most of us are more familiar writing 'for/do-while/while/loop' than applying the more functional recursion pattern. Gotos are - to my knowledge - not supported as of yet.

In order to store and use our functions as a pipeline we need a data-structure to put our functions in. A short disclaimer: Yes it's possible with macros we just don't want to have to many moving parts in one topic.

So lets talk about data structures in rust and which one to pick.

We in general only go in 1 direction.
We are not that interested in changing the parts inside of our structure.
We are not interested in addressing a particular member from the outside.

1 direction? multiple members? Vector to the rescue.

https://youtu.be/ylkBY1KMvsc?t=10

So we need to build our vector containing the prepared functions:

fn main() {
    let add1 = add(1);
    let add2 = add(2);
    let add3 = add(3);

    let mut pipeline = Vec::new();
    pipeline.push(&add1);
    pipeline.push(&add2);
    pipeline.push(&add3);

    // let pipeline = vec![&add1, &add2, &add3];
    let mut input = 1;

    for apply_on in pipeline.iter() {
        input = apply_on(input);
    }

    println!("{}", input); // prints 7
}

In it's very basic form this already all we need, but it's hard to reuse this code so we have to extract it to another function, but first a small cleanup to reduce some of the noise

fn main() {
    let add1 = add(1);
    let add2 = add(2);
    let add3 = add(3);

    let pipeline = vec![
        &add1,
        &add2,
        &add3
    ];
     .......
}

using the macro vec! for creating our vector lets us move into a more declarative form of writing code.

now we want to extract our business logic / intent / behaviour into a function so we can reuse it.

Why use let on the the add function and don't just pass in the results of our function invocation as reference? Well mainly because we are currently lazy and we want our compile to guess the needed information we're not providing at the moment. ;) We will however reach that state within this article and I hope by then it will be clear why we only change certain bits and parts to get to our goal.

fn pipe<T>(pipeline: Vec<Fn(T) -> T>, data: T) -> T {
    let mut _result = data;

    for apply_on in pipeline.iter() {
        _result = apply_on(_result);
    }

    _result
}

lets take a look at the parts of our pipe function definition:

name is pipe
T as our generic Type
it takes a vector of functions
- those functions get our generic type T as input and return our generic type T as output
it takes a second parameter data of our type T
it returns something that is our type T

lets try and run it:

  --> src/main.rs:19:1
   |
19 | / fn pipe<T>(pipeline: Vec<Fn(T) -> T>, data: T) -> T
20 | | {
21 | |     let mut _result = data;
22 | |
...  |
27 | |     _result
28 | | }
   | |_^ `std::ops::Fn(T) -> T + 'static` does not have a constant size known at compile-time
   |
   = help: the trait `std::marker::Sized` is not implemented for `std::ops::Fn(T) -> T + 'static`
   = note: required by `std::vec::Vec`

not really what we want, lets look at the problem. The vector that needs to be prepared by our compiler does not know the actual size of itself.

Either we try to provide a certain size or since references in rust are always the same size we switch to references.

A brief summary: Pointers VS References

A reference is nothing but more like a meta pointer .... which doesn't really help if you're not familiar with those concepts I know!

Lets just say rather than copying a block of code in the memory (value based) we just point to an address in the memory and say 'hey look there'. That would be the idea of a pointer. The simplistic difference between a pointer and a reference is that a reference is something on its own in memory. It also allows us to add extra functionality where a pointer only points to a specific address.

This gets interesting with modern GCs (Garbage Collectors) for example in Java, JS or GO. One such use case would be the so called reference counting where the GC keeps track how many other parts of our code are using our reference so it can cleanup afterwards

Back to the problem at hand.

Lets just add the & in front of our pipe and our closures in our vector!

....
let pipeline = vec![
        &add1,
        &add2,
        &add3,
    ];
....
fn pipe<T>(pipeline: Vec<&Fn(T) -> T>, data: T) -> T

if we run this we get:

error[E0308]: mismatched types
  --> src/main.rs:14:23
   |
14 |     let result = pipe(pipeline, 1);
   |                       ^^^^^^^^ expected trait std::ops::Fn, found anonymized type
   |
   = note: expected type `std::vec::Vec<&std::ops::Fn(_) -> _>`
              found type `std::vec::Vec<&impl std::ops::Fn<({integer},)>>`

error: aborting due to previous error

For more information about this error, try `rustc --explain E0308`.

So it does not know what our types are. hmmm lets try something different

let pipeline = vec![
        |x: i32| x  + 1,
        |x: i32| x  + 2,
        |x: i32| x  + 3,
    ];

Now our compiler knows what's coming in ... and

error[E0308]: mismatched types
  --> src/main.rs:10:9
   |
10 |         |x: i32| x + 2,
   |         ^^^^^^^^^^^^^^ expected closure, found a different closure
   |
   = note: expected type `[closure@src/main.rs:9:9: 9:23]`
              found type `[closure@src/main.rs:10:9: 10:23]`
   = note: no two closures, even if identical, have the same type
   = help: consider boxing your closure and/or using it as a trait object

error[E0308]: mismatched types
  --> src/main.rs:14:23
   |
14 |     let result = pipe(pipeline, 1);
   |                       ^^^^^^^^ expected reference, found closure
   |
   = note: expected type `std::vec::Vec<&std::ops::Fn(_) -> _>`
              found type `std::vec::Vec<[closure@src/main.rs:9:9: 9:23]>`

error: aborting due to 2 previous errors

For more information about this error, try `rustc --explain E0308`

thanks for nothing .... it seam we need to change our approach. Lets return to our starting position where we changed our functions into references.

use std::ops::Add;

fn main() {
    let add1 = add(1);
    let add2 = add(2);
    let add3 = add(3);

    let pipeline = vec![
        &add1,
        &add2,
        &add3
    ];

    let result = pipe(pipeline, 1);

    println!("{}", result); // prints 7
}

fn pipe<T>(pipeline: Vec<&Fn(T) -> T>, data: T) -> T
{
    let mut _result = data;

    for apply_on in pipeline.iter() {
        _result = apply_on(_result);
    }

    _result
}

fn add<T>(x: T) -> impl Fn(T)-> T
    where T: Add<Output=T> + Copy
{
    move |y: T| x + y
}

changing the return type of add to fn (lowercase) wont help either since rust sees every closure as unique type .... hmm ... lets try something else .... lets put it in a box ... wait ... what is a box?

A box is to quote the rust documentation:

A pointer type for heap allocation.

great ... everything's clear now? I guess if you're a C / C++ Programmer who is really practicing it probably is ... but for us mere mortals who spend our life on the fruitful steps in proximity of our VM castles close to GC town .... we usually are not even sure what the heap and the stack is. So I will try to not embarrass myself to much but give an outline.

The Stack is the part of working memory that exists in context of our function calls
- call over -> memory free for reuse
The Heap is the part of working memory that exists outside the function call context
- call over -> memory still reserved for our stuff

So in our case the problem is that our compiler cannot resolve our closures as a specific type. So we move our closures outside of the function context into a Box and onto the heap. This way we can defer the unwrapped box to our closures. So we tell our compiler what is in the box and it does not define every closure as it's own type at compile time.

But what does this look like? first we need to change our add function and put the closure into a box

fn add<T>(x: T) -> Box<Fn(T)-> T>
    where T: Add<Output=T> + Copy
{
    Box::new(move |y: T| x + y)
}

what has changed?

we don't return an impl of Fn(T) -> T anymore but a Box containing a Fn(T) -> T
we put our entire closure into the Box and onto the Heap

fn pipe<T>(pipeline: Vec<&Box<Fn(T) -> T>>, data: T) -> T
    where T: Add<Output=T> + Copy
{
    let mut _result = data;

    for apply_on in pipeline.iter() {
        _result = apply_on(_result);
    }

    _result
}

what has changed?

we added a reference to a Box around our Fn inside the vector
we moved the T definition to the where clause

if we try to compile this we will get the following error message:

error[E0310]: the parameter type `T` may not live long enough
  --> src/main.rs:34:5
   |
31 | fn add<T>(x: T) -> Box<Fn(T)-> T>
   |        - help: consider adding an explicit lifetime bound `T: 'static`...
...
34 |     Box::new(move |y: T| x + y)
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |

We got a lifetime issue our compiler does not know how to handle the lifetimes of our implementations. in our code we see it will work so we now need to convince our compiler it does as well.

fn add<'a, T>(x: T) -> Box<Fn(T)-> T + 'a>
    where T: Add<Output=T> + Copy, T: 'a
{
    Box::new(move |y: T| x + y)
}

So what did we do? we added a lifetime annotation: 'a and we tell our compiler that everything that we connect with 'a lives the same amount of time and as long as we're using one of the parts connected it should not clear the memory.

Now we can use our pipe function as intended :)

Thanks for reading :)

Currying in rust Part 3 (The circle of life ... aka why borrowchecker ... why?!)

A brief summary: Pointers VS References

Product

Explore

Company

Blogs

Support