The Scope of Unsafe
I’d like to talk about an important aspect of dealing with unsafe code, that still regularly seems to catch people on the wrong foot:
When checking unsafe code, it is not enough to just check the contents of every
unsafe
block.
The “scope” in the title refers to the extent of the code that has to be manually checked for correctness, once unsafe
is used.
What I am saying is that the scope of unsafe
is larger than the unsafe
block itself.
It turns out that the underlying reason for this observation is also a nice illustration for the concept of semantic types that comes up in my work on formalizing Rust (or rather, its type system). Finally, this discussion will once again lead us to realize that we rely on our type systems to provide much more than just type safety.
Update (2016-01-11): Clarified the role of privacy; argued why evil
is the problem.
An Example
Before we dive into the deeper reasons for why we have to check (seemingly) safe code, let me start by convincing your that this is actually the case.
Consider the type Vec
, which is roughly defined as follows:
(I know this definition is not entirely right. If you want to know how the actual Vec
works, check out the corresponding section of the Rustonomicon.)
Roughly speaking, ptr
points to the heap-allocated block of memory holding the contents of the vector, cap
holds the size of that memory block (counted in number of T
), and len
stores how many elements are actually stored in the vector right now.
cap
and len
can be different if the memory block contains some extra (uninitialized) space beyond the end of the vector, which speeds up pushing an element to the end of the vector (at the cost of some extra memory usage).
It is very easy to add a function to Vec
that contains no unsafe
code, and still breaks the safety of the data structure:
Why is this bad? We can now “access” two more elements of the vector that have never been added! This means we will read from uninitialized memory, or worse, from unallocated memory. Oops!
So, this example clearly shows that to evaluate the safety of types like Vec
, we have to look at every single function provided by that data structure, even if it does not contain any unsafe
code.
The Reason Why
Why is it the case that a safe function can break Vec
?
How can we even say that it is the safe function which is at fault, rather than some piece of unsafe
code elsewhere in Vec
?
The intuitive answer is that Vec
has additional invariants on its fields.
For example, cap
must be greater than or equal to len
.
More precisely speaking, ptr
points to an array of type T
and size cap
, of which the first len
elements have already been initialized.
The function evil
above violates this invariant, while all the functions actually provided by Vec
(including the ones that are implemented unsafely) preserve the invariant.
That’s why evil
is the bad guy. (The name kind of gave it away, didn’t it?)
Some will disagree here and say: “Wait, but there is some unsafe
code in Vec
, and without that unsafe
code evil
would be all right, so isn’t the problem actually that unsafe
code?”
This observation is correct, however I don’t think this position is useful in practice.
Vec
with evil
clearly is a faulty data structure, and to fix the bug, we would remove evil
.
We would never even think about changing the unsafe
code such that evil
would be okay, that would defeat the entire purpose of Vec
.
In that sense, it is evil
which is the problem, and not the unsafe
code.
This may seem obvious in hindsight (and it is also discussed in the Rustonomicon), but I think it is actually fairly subtle.
There used to be claims on the interwebs that “if a Rust program crashes, the bug must be in some unsafe
block”. (And there probably still are.)
Even academic researchers working on Rust got this wrong, arguing that in order to detect bugs in data structures like Vec
it suffices to check functions involving unsafe code.
That’s why I think it’s worth dedicating an entire blog post to this point.
But we are not done yet, we can actually use this observation to learn more about types and type systems.
The Semantic Perspective
There is another way to phrase the intuition of types having additional invariants:
There is more to types like
Vec
than their syntactic definition gives away.
Imagine we define another type in our Rust program:
We will define only one function for this type:
This type looks exactly like Vec
, doesn’t it?
The two types are syntactically equal, and the same goes for the two evil
functions.
Still, MyType::evil
is a perfectly benign function (despite its name).
How can this be?
Remember that in a previous blog post, I argued that types have a semantic aspect. For example, a function is semantically well-typed if it behaves properly on all valid arguments, independently of how, syntactically, the function body has been written down.
The reason for MyType::evil
being fine, while Vec::evil
is bad, is that semantically speaking, Vec
is very different from MyType
– even though they look so similar.
If I have a Vec
, I actually know that ptr
is valid, that len <= cap
, and so on.
I know nothing like that about an arbitrary instance of MyType
: All I have here is that cap
and len
are valid elements of usize
.
In other words, the additional invariants that we associate with Vec
actually make this an entirely different type.
In order to formally describe what it means to be a Vec
, we can’t just say “well, that’s a struct
with these three fields”.
That would be wrong.
But still, that’s actually all the Rust compiler knows, and it is all that it actually checks:
When type checking Vec::evil
, the compiler thinks that Vec
is like MyType
, and under that assumption, it deems evil
a good function in both cases.
(Clearly, the compiler did not watch enough movies with serious villains.)
In other words, we have to look beyond the mere syntactic appearance of a type, and into its deeper, semantic meaning.
As part of my work on formalizing Rust, I will eventually want to prove the soundness of types like Vec
or RefCell
and their associated operations.
In order to do so, I will first have to figure out what exactly the semantics of these types are. (This will be fairly easy for Vec
, but really hard for RefCell
. Can you guess why?)
Then I will have to check every function operating on these types and prove that they are actually well-typed.
The compiler, however, cannot guess or check the actual semantics of Vec
, the additional invariants we came up with; and this can make seemingly safe code like Vec::evil
become unsafe.
Even if the Rust compiler says “yup, that function over there is fine, I checked every single statement of it”, that’s not enough because the Rust compiler has no idea about which types these functions are actually about.
All it sees is the syntactic surface.
While proving soundness of Vec
, it would then turn out that Vec::evil
is actually not a semantically well-typed function.
It will invalidate self
, such that is is no longer an instance of the actual (semantic) type Vec
.
This is just as bad as a function assigning *f = 42u32
to a f: &mut f32
.
The difference between Vec
and MyType
is no less significant than the difference between f32
and u32
.
It’s just that the compiler has been specifically taught about f32
, while it doesn’t know enough about the semantics of Vec
.
The Actual Scope of Unsafe
At this point, you may be slightly worried about the safety of the Rust ecosystem.
I just spent two sections arguing that the Rust compiler actually doesn’t know what it is doing when it comes to checking functions that work on Vec
.
How does it come that people carelessly use Vec
in their programs without causing havoc all the time?
Or, to put it slightly differently: If the scope of unsafe
grows beyond the syntactic unsafe
blocks, then just how far does it reach?
Does it sprawl through all our code, silently infecting everything we write – or is there some limit to its effect?
As you probably imagined, of course there is a limit. Rust would not be a useful language otherwise.
If all your additional invariants are about private fields of your data structure, then the scope of unsafe
ends at the next abstraction boundary.
This means that everything outside of the std::vec
module does not have to worry about Vec
.
Due to the privacy rules enforced by the compiler, code outside of that module cannot access the private fields of Vec
.
That code does not have a chance to violate the additional invariants of Vec
– it cannot tell the difference between the syntactic appearance of Vec
and its actual, semantic meaning.
Of course, this also means that everything inside std::vec
is potentially dangerous and needs to be proven to respect the semantics of Vec
.
Abstraction Safety
This nicely brings us to another important point, which I can only glimpse at here:
The purpose of type systems goes far beyond type safety: They (also) serve to establish safe abstractions.
If the type system of Rust lacked a mechanism to establish abstraction (i.e., if there were no private fields), type safety would not be affected.
However, it would be very dangerous to write a type like Vec
that has a semantic meaning beyond its syntactic appearance.
All code could perform invalid operations like Vec::evil
, operations that rely on the assumption that Vec
is just like MyType
.
There would actually be no bound to the scope of unsafe
.
To formally establish safety, one would have to literally go over the entire program and prove that it doesn’t misuse Vec
.
The safety promise of Rust would be pretty much useless.
This should not be entirely surprising if you read the aforementioned post about formalizing Rust’s type system, where I already argued that a proof of (syntactic) type safety does not help to justify safety of most Rust programs out there. I am now making a similar point, coming from a different angle.
The fact that Rust programmers can use Vec
and many other types without much care is a property of the type system that is independent of type safety.
We rely on the fact that the type system lets us build abstraction boundaries that cannot be violated by safe code.
We may call this abstraction safety.
Many type systems have this property, just think of private
fields in Java or C++, or opaque signatures in ML.
However, few languages embrace using unsafe primitives and encapsulating them behind a layer of abstraction as much as Rust does.
In some sense, Rust relies way more on abstraction safety than most languages – though in practice, programmers of all languages rely on abstraction safety a lot, if it is available.
(Most scripting languages, I am looking at you.)
Abstraction safety is a very deep point that deserves its own blog post, I only skimmed the surface here. Watch this space, a link to such a post will eventually appear. (And I hope I don’t have to write it myself.)
Posted on Ralf's Ramblings on Jan 9, 2016 and licensed under CC BY-SA 4.0.
Comments? Drop me a mail or leave a note on reddit!