Introduction
Welcome to Using Box
In this lab, we will learn about using Box
Welcome to Using Box
In this lab, we will learn about using Box
<T>
{=html} to Point to Data on the HeapThe most straightforward smart pointer is a box, whose type is written Box<T>
. Boxes allow you to store data on the heap rather than the stack. What remains on the stack is the pointer to the heap data. Refer to Chapter 4 to review the difference between the stack and the heap.
Boxes don't have performance overhead, other than storing their data on the heap instead of on the stack. But they don't have many extra capabilities either. You'll use them most often in these situations:
We'll demonstrate the first situation in "Enabling Recursive Types with Boxes". In the second case, transferring ownership of a large amount of data can take a long time because the data is copied around on the stack. To improve performance in this situation, we can store the large amount of data on the heap in a box. Then, only the small amount of pointer data is copied around on the stack, while the data it references stays in one place on the heap. The third case is known as a trait object, and "Using Trait Objects That Allow for Values of Different Types" is devoted to that topic. So what you learn here you'll apply again in that section!
<T>
{=html} to Store Data on the HeapBefore we discuss the heap storage use case for Box<T>
, we'll cover the syntax and how to interact with values stored within a Box<T>
.
Listing 15-1 shows how to use a box to store an i32
value on the heap.
Filename: src/main.rs
fn main() {
let b = Box::new(5);
println!("b = {b}");
}
Listing 15-1: Storing an i32
value on the heap using a box
We define the variable b
to have the value of a Box
that points to the value 5
, which is allocated on the heap. This program will print b = 5
; in this case, we can access the data in the box similar to how we would if this data were on the stack. Just like any owned value, when a box goes out of scope, as b
does at the end of main
, it will be deallocated. The deallocation happens both for the box (stored on the stack) and the data it points to (stored on the heap).
Putting a single value on the heap isn't very useful, so you won't use boxes by themselves in this way very often. Having values like a single i32
on the stack, where they're stored by default, is more appropriate in the majority of situations. Let's look at a case where boxes allow us to define types that we wouldn't be allowed to define if we didn't have boxes.
A value of a recursive type can have another value of the same type as part of itself. Recursive types pose an issue because at compile time Rust needs to know how much space a type takes up. However, the nesting of values of recursive types could theoretically continue infinitely, so Rust can't know how much space the value needs. Because boxes have a known size, we can enable recursive types by inserting a box in the recursive type definition.
As an example of a recursive type, let's explore the cons list. This is a data type commonly found in functional programming languages. The cons list type we'll define is straightforward except for the recursion; therefore, the concepts in the example we'll work with will be useful any time you get into more complex situations involving recursive types.
A cons list is a data structure that comes from the Lisp programming language and its dialects, is made up of nested pairs, and is the Lisp version of a linked list. Its name comes from the cons
function (short for construct function) in Lisp that constructs a new pair from its two arguments. By calling cons
on a pair consisting of a value and another pair, we can construct cons lists made up of recursive pairs.
For example, here's a pseudocode representation of a cons list containing the list 1, 2, 3
with each pair in parentheses:
(1, (2, (3, Nil)))
Each item in a cons list contains two elements: the value of the current item and the next item. The last item in the list contains only a value called Nil
without a next item. A cons list is produced by recursively calling the cons
function. The canonical name to denote the base case of the recursion is Nil
. Note that this is not the same as the "null" or "nil" concept in Chapter 6, which is an invalid or absent value.
The cons list isn't a commonly used data structure in Rust. Most of the time when you have a list of items in Rust, Vec<T>
is a better choice to use. Other, more complex recursive data types are useful in various situations, but by starting with the cons list in this chapter, we can explore how boxes let us define a recursive data type without much distraction.
Listing 15-2 contains an enum definition for a cons list. Note that this code won't compile yet because the List
type doesn't have a known size, which we'll demonstrate.
Filename: src/main.rs
enum List {
Cons(i32, List),
Nil,
}
Listing 15-2: The first attempt at defining an enum to represent a cons list data structure of i32
values
Note: We're implementing a cons list that holds only
i32
values for the purposes of this example. We could have implemented it using generics, as we discussed in Chapter 10, to define a cons list type that could store values of any type.
Using the List
type to store the list 1, 2, 3
would look like the code in Listing 15-3.
Filename: src/main.rs
--snip--
use crate::List::{Cons, Nil};
fn main() {
let list = Cons(1, Cons(2, Cons(3, Nil)));
}
Listing 15-3: Using the List
enum to store the list 1, 2, 3
The first Cons
value holds 1
and another List
value. This List
value is another Cons
value that holds 2
and another List
value. This List
value is one more Cons
value that holds 3
and a List
value, which is finally Nil
, the non-recursive variant that signals the end of the list.
If we try to compile the code in Listing 15-3, we get the error shown in Listing 15-4.
error[E0072]: recursive type `List` has infinite size
--> src/main.rs:1:1
|
1 | enum List {
| ^^^^^^^^^ recursive type has infinite size
2 | Cons(i32, List),
| ---- recursive without indirection
|
help: insert some indirection (e.g., a `Box`, `Rc`, or `&`) to make `List`
representable
|
2 | Cons(i32, Box<List>),
| ++++ +
Listing 15-4: The error we get when attempting to define a recursive enum
The error shows this type "has infinite size." The reason is that we've defined List
with a variant that is recursive: it holds another value of itself directly. As a result, Rust can't figure out how much space it needs to store a List
value. Let's break down why we get this error. First we'll look at how Rust decides how much space it needs to store a value of a non-recursive type.
Recall the Message
enum we defined in Listing 6-2 when we discussed enum definitions in Chapter 6:
enum Message {
Quit,
Move { x: i32, y: i32 },
Write(String),
ChangeColor(i32, i32, i32),
}
To determine how much space to allocate for a Message
value, Rust goes through each of the variants to see which variant needs the most space. Rust sees that Message::Quit
doesn't need any space, Message::Move
needs enough space to store two i32
values, and so forth. Because only one variant will be used, the most space a Message
value will need is the space it would take to store the largest of its variants.
Contrast this with what happens when Rust tries to determine how much space a recursive type like the List
enum in Listing 15-2 needs. The compiler starts by looking at the Cons
variant, which holds a value of type i32
and a value of type List
. Therefore, Cons
needs an amount of space equal to the size of an i32
plus the size of a List
. To figure out how much memory the List
type needs, the compiler looks at the variants, starting with the Cons
variant. The Cons
variant holds a value of type i32
and a value of type List
, and this process continues infinitely, as shown in Figure 15-1.
Figure 15-1: An infinite List
consisting of infinite Cons
variants
<T>
{=html} to Get a Recursive Type with a Known SizeBecause Rust can't figure out how much space to allocate for recursively defined types, the compiler gives an error with this helpful suggestion:
help: insert some indirection (e.g., a `Box`, `Rc`, or `&`) to make `List`
representable
|
2 | Cons(i32, Box<List>),
| ++++ +
In this suggestion, indirection means that instead of storing a value directly, we should change the data structure to store the value indirectly by storing a pointer to the value instead.
Because a Box<T>
is a pointer, Rust always knows how much space a Box<T>
needs: a pointer's size doesn't change based on the amount of data it's pointing to. This means we can put a Box<T>
inside the Cons
variant instead of another List
value directly. The Box<T>
will point to the next List
value that will be on the heap rather than inside the Cons
variant. Conceptually, we still have a list, created with lists holding other lists, but this implementation is now more like placing the items next to one another rather than inside one another.
We can change the definition of the List
enum in Listing 15-2 and the usage of the List
in Listing 15-3 to the code in Listing 15-5, which will compile.
Filename: src/main.rs
enum List {
Cons(i32, Box<List>),
Nil,
}
use crate::List::{Cons, Nil};
fn main() {
let list = Cons(
1,
Box::new(Cons(
2,
Box::new(Cons(
3,
Box::new(Nil)
))
))
);
}
Listing 15-5: Definition of List
that uses Box<T>
in order to have a known size
The Cons
variant needs the size of an i32
plus the space to store the box's pointer data. The Nil
variant stores no values, so it needs less space than the Cons
variant. We now know that any List
value will take up the size of an i32
plus the size of a box's pointer data. By using a box, we've broken the infinite, recursive chain, so the compiler can figure out the size it needs to store a List
value. Figure 15-2 shows what the Cons
variant looks like now.
Figure 15-2: A List
that is not infinitely sized, because Cons
holds a Box
Boxes provide only the indirection and heap allocation; they don't have any other special capabilities, like those we'll see with the other smart pointer types. They also don't have the performance overhead that these special capabilities incur, so they can be useful in cases like the cons list where the indirection is the only feature we need. We'll look at more use cases for boxes in Chapter 17.
The Box<T>
type is a smart pointer because it implements the Deref
trait, which allows Box<T>
values to be treated like references. When a Box<T>
value goes out of scope, the heap data that the box is pointing to is cleaned up as well because of the Drop
trait implementation. These two traits will be even more important to the functionality provided by the other smart pointer types we'll discuss in the rest of this chapter. Let's explore these two traits in more detail.
Congratulations! You have completed the Using Box