Argosy: Verifying Layered Storage Systems With Recovery Refinement


[MUSIC]>>All right. Thanks everybody
for coming and thanks Tej Chajed for coming
to give us talk. It’s nice that many of you
were able to come attend this. He is a systems student working
on formal verification of software at MIT advised by Frans
Kaashoek and Nickolai Zeldovich. There is still some space
on his schedule. If later this afternoon
you’d like to talk to him, come to me and I will
arrange a spot schedule. So, let’s hear from Tej.>>Thanks, Jay. So to motivate why we’re talking
about layered storage systems, let me start with a story. The story will tragically
lack modularity, but then Argosy will
come in and fix that. Story starts with
Bob, a storage system developer who’s writing
a replication system. The replication system
takes two disks that are unreliable and it exports a single
logical disk with reliability. The way it works is
as writes come in, it writes them to
both disks and that ensures that your data is safe even
if one of these disks fails. Now, this is a storage system so Bob wants it to be able to
handle a crashes at anytime, say due to a power failure. The problem is that if the system crashes in the middle
of a right operation, then it leaves the disks
in an inconsistent state. So Bob knows how to handle this. He writes a recovery
procedure that copies from the first disk to the second and that restores
the invariance of the system. Now Bob is a careful guy. So he writes a machine check proof of correctness that his code is correct. Specifically that read and
write are atomic and have the right behavior as long as you run rep recover after every crash, and because he’s using the latest
verification infrastructure, his proof handles crashes
during recovery itself. Meanwhile, Alice writes
a write-ahead logging system that implements
a transactional API on top of a single disk and she
also proves her code correct with a machine
checked proof of correctness with this
log recovery procedure. Now, you’ll get
this picture and you think surely we can run
the write-ahead log on top of the disk replication
system and we can prove that composed
implementation correct. The trouble is that
the recovery procedure to make this code correct
needs to run both of the underlying recoveries and it turns out that this is
tricky to reason about formally. So when we have a proof that each of these recovery procedures
is correct in isolation including
crashes during recovery, crashes during the composed
recovery procedure don’t directly resemble crashes during
either of the sub recoveries. So the challenge is how do we
actually reuse these proofs? In prior work, it turns
out can’t do this. So Crash Hoare Logic our own work
from a verified file system has no way to move from a proof about one recovery procedure to
a proof about another. You can only have one
recovery procedure at a time. Argosy similarly has
no meta theory for composing two recovery procedures
and sort of implicitly assumes a global recovery procedure and [inaudible] while they
do support composition, they place some restrictions
on recovery procedures to make that proof go through and it just
allows both of these systems. This is where Argosy comes in. Argosy supports modular proofs that support exactly the style
that we wanted here where two developers
independently proved two implementations correct and there’s a generic theorem that
the composition is correct including
crashes during recovery. Argosy is compatible with
existing techniques. So we re-implemented Crash
Hoare Logic for the sake of this work in order to prove the implementation is
correct individually. So the main contribution
in this work is recovery refinement which
is the approach that we use to get modular proofs
and I’ll also talk if I have time about Crash
Hoare Logic which is how we connect Argosy to Crash Hoare Logic to prove
a particular implementation correct and see the paper for details on an example that we’ve verified featuring write-ahead
logging on top of replication. The same example I’ve
been using and all of our code is backed by
machine check proofs in the Coq proof assistant. I should say that if you want to interrupt me at any time feel free, if you have questions,
don’t wait till the end. So let me go into
recovery refinement next. To explain recovery refinement,
I’ll do so in two steps. First, I’ll talk about just
some background on how to prove normal execution correctness
using refinement and then next, I’ll talk about how to extend
refinement with notions of crash and recovery to arrive
at recovery refinement. So here’s some background
on refinement before we get into
recovery refinement. In refinement, the goal
is to prove that an implementation correctly
implements some exported interface, say this single disk in
the case of replication, on top of some code interface
that the code is written against. In this case, these
two unreliable disks. The replication system implements every high level operation say a write in terms of
the low-level operations. In this case, the write implementation
just writes to both disks in turn and similarly for the other
operations in the interface. So what does it mean for this
replication system to be correct? We define correctness
based on how you would use the replication system. Specifically, you take
some piece of code and you’d write it against
the disk interface and what you want to know
is that when you run this code on top of two disks
using the replication system, it behaves the right way. It behaves the way you expect. The way we formalized that
is the standard idea of trace inclusion where
the specification’s behaviors at the disk layer should include all of the running code behaviors
at the two-disk layer. So that’s just normal execution. But how do we prove
this property correct? The way we do that is using
an abstraction relation. This whole technique is also
called forward simulation. Now, this is all still
standard stuff so background. So with a forward simulation, the developer provides
an abstraction relation R which has two purposes. First, it relates the states
of the specification to the states of the code and also it enforces any
invariants over the code. So in this case,
the replication system maintains this invariant that
the two disks are equal. Then the developer proves that
for every execution of the code, there’s a corresponding execution of the specification that preserves the abstraction relation
and that lets you run more pieces of code
after this operation. To get the full trace
inclusion property, the developer just
needs to prove this for every operation in the system. So that’s all in the background
I have on refinement. Now we’ll extend all these
notions for the crashes and recovery which is
the new part of this paper. So here is the picture that
we had for an implementation. Now, we’ll extend it to
the crashes and recovery. Recovery just goes into
the implementation, but I’ve also added these power
symbols to both interfaces. These power symbols represent the semantics of what
happens on crash. For example, volatile
state is cleared. Your memories wiped, anything that’s in say buffered in
your disk goes away. So that’s just handles
modeling crashes and recovery. What about the correctness
for recovery? We still use trace inclusion, but we’ll extend it with notions
of crashes and recovery. To do that I’ll introduce
our crash semantics to describe the specification and then the recovery semantics to describe
what code we actually run. The crash semantics that we use is intuitively its atomicity and
the way we formalized that is you can crash before or after
between any pair of atomic operations and
standard assumptions for what’s atomic are things like
four kilobyte writes or file system operations
you might assume are atomic with respect to crashes. I’ll represent a crash in
the middle of a piece of code between any atomic operations with this jagged edged
followed by a power symbol. So that handles the specification.
What about the code? We’re definitely
going to run recovery after a crash in the middle
of the implementation. We know that the code
needs to do that. But we also want a model
crashes during recovery. So we insert zero or more iterations
of crashes during recovery first and that’s all we need to specify what trace
inclusion with recovery is. Its normal execution plus correctness
under crashes and recovery. How do we prove this property? We’re still going to use an idea
that’s like forward simulation. Well, let me introduce how we
come to the right condition. So the first observation
we make is that if you crash in the middle of
some long sequence of code, you must crash in the middle
of some operation. In this example, execution
that crash occurs in the middle of op2 and
prior to that crash, you can just use normal execution
correctness to establish that the abstraction relation
is preserved up to that point. So we can sort of ignore that because normal execution correctness,
we already handled that. What’s the specification
for a crash during op2? We said it’s atomicity. So we simply use either all of op two occurred and then
the system crashed or none of op2 ran and the system crashed and that condition for all the operations is all we
need for recovery refinement. So we still have normal
execution correctness for non crash execution and then
we also have this new condition that relates recovery to crashes and proving this for every operation
in the system is sufficient to prove the strong trace
inclusion property for running arbitrary pieces of code against
the top-level interface. So what I’ve talked about so far
only handles a single layer, but this talk was
supposed to be about layered storage systems.
Sorry, go ahead.>>Why do this specification after having inclusion crashing at all, can crashes be invisible
with specifications.>>Yes. Good question. So many
systems would try to hide crashes, but we also want to
reason about systems like file systems that cannot hide
crashes from the caller. Just fundamentally if you’re running code on top of a file
system and you crash, something will be lost or at least we’re going to lose
track of what you were doing. So we want to be able to give
these weaker specifications. If you go off and you write
some large distributed system, you’ll probably try to hide crashes. Basically, you need to
replicate for that to work.>>You said the relation about R and the way if I wanted to
reason about the low-level system, I would like to reason about it
through the abstraction with either System and R is
my interface to do that?>>So our specification
does not have R. So the specification should be that. So there’s a distinction between,
R is talking about states. The states are just helpful to specify what the behavior
of a piece of code is, but once you get to
the specification, you only need to look
at the interface and the externally visible behaviors which in this in our system
are returned values. But a natural extension
is things like the network packets
that you send out. Then you don’t need to
worry about how R works.>>You were talking about
a high-level trace.>>Yeah.>>I want to conclude
something about a local trace.>>Yes.>>You have to somehow transport
my property across R, right?>>So I think your property
should not be about states. It should only be about return values and if there is something
else that you care about, you should externalize it as
I/O behavior of some sort. The intermediate states
should be irrelevant. That’s the intention of this system. I didn’t talk about
this, but you do need to establish R initially as well. It’s an inductive invariant
and you’ll need a base case. It’s just generally not interesting. But if you don’t do that then you would need to look closely at R to make sure that it’s
actually satisfiable. Okay. So next I’ll talk about the Composition theorem that lets us actually take two
recovery refinements and compose them together for modularity. Before I do that, let me
talk about Kleene algebra, which is the approach
that we actually use to prove the composition theorem. So I’ve been using these
expressions informally in the talk, this alternation symbol and the star. It turns out that these
are exactly the operators of Kleene algebra, which is a formalism that
generalizes regular expressions. So you can use all, basically
all of your intuition from regular expressions to think
about these operators. We don’t just have these informally, we give them a formal meaning
in terms of matching transitions of the system. So sequencing corresponds to
sequencing of transitions, this alternation symbol is non-deterministic choice
between two possibilities, and the star is
zero or more iterations. If you look, you’ll see that these
are exactly the operators that we needed to define the semantics
of crafters and recovery. So Kleene algebra is quite closely, it seems like a good formulas to
at least model what’s going on. But it’s actually also useful
for reasoning principles. Kleene algebra has a rich theory for when two expressions are equal. I’ll talk about that theory in a bit when I go into
the composition terms proof. So let me just formalize what
the modularity that we get is. Now that we have recovery refinement, I can more precisely state
what composition is. If we have two recovery refinements
that are stacked like this, then all you’re going to see
is a theorem that says that the composed implementation is also a recovery refinement
and therefore, has this trace inclusion property, where that composition needs to
run both recovery procedures. So before, we have
this challenge of how do we get a proof about the composed recovery from
both individual recoveries. Now, I can formalize
a little bit more how we might go about
doing this reasoning. Before I do that, let me just
abbreviate these two with rep for the replication system
and log for the logging system, just to keep the slides
a little less cluttered. So now we can formalize what we
mean at least by under crashes. It’s zero or more iterations
of crashes during recovery, before recovery finally finishes. Here’s an expression that
represents crashes during the composed recovery and just
to break this down a little bit, it starts with either we can crash during the replication system itself or the replication system might fully recover but then re-crash
during the logging recovery. There’s some number of iterations of crashes during
recovery and finally, the system stops crashing and
the whole composed recovery runs. So this doesn’t seem any easier, the top two expressions don’t
show up in the bottom expression, but Kleene algebra can fix that. So Kleene algebra has, as I said, it has an equational
theory that lets us rewrite these expressions
to equivalent ones. So the first thing we apply it’s
a theorem called de-nesting, which will let us get rid of
this alternation altogether. Next, we apply theorem
called sliding. This might not seem simpler
but I’ll go through why this bottom expression
is easier to reason about. Kleene algebra has told
us that it’s exactly equivalent to the
original in terms of, it matches exactly
the same transitions. So after this rewrite, both of
these proofs start to apply. Before the parentheses,
the replication proofs establishes that the
recovery procedure restores the invariant
of that system. Then within the parentheses, the trace inclusion property
for the replication system says this behaves like a crash during the high level
logging recovery code. Where from the perspective
of the replication system, log recovery is just another program
that you wanted to run. That’s all we need from
the replication proof. Then we can take what we now know and plug it into
the write-ahead logging proof, which shows that the
logging systems recovery correctly restores its invariants. The upshot is that at the end
of this composed recovery, despite crashes during recovery, both systems invariants are restored. So I’ve talked about this in terms of informally restoring invariants, but the paper and especially the code formalize this more
precisely as recovery refinement. Let me just give a small flavor
of how not only is Kleene algebra useful for equational reasoning, just rewriting expressions
to equivalent ones, it also actually lets us
do refinement reasoning. So this little refinement diagram, just like the standard one that
we’ve been using in this talk, you can encode this as
an expression in Kleene algebra, as a subset relationship between
two possible transitions. There’s just one thing slightly
weird going on in this expression, which is that we run R
as if it were a program. Much like a program, R is also
a relation between states. Then that subset is just another operator in
Kleene algebra that say one regular expression matches the subset of one another
regular expression matches. Kleene algebra also has
a theory for reasoning about these subset relationships. So that was a lot of
abstract description of composition. Let me just go back and motivate why you might want this modularity, with a real example from our work. So here’s an anecdote about
where modularity failed us, in some prior work that we did. So we wrote this verified
file system called FSCQ and in one of those papers, we wanted to prove
some application correct on top of the specification and that job
fell to Frans, my advisor. So Frans wanted to prove
this little snippet of code correct. It’s not super
important what it does, all it’s trying to do is
atomically save a file, by first going through a temp file. He wanted to prove
that this is atomic. To make it atomic, we need to run this application specific
recovery procedure, that cleans up
the temp file on crash. It turns out that this code
took a 1,500 line proof because of this little fs recover that’s hiding in the application
recovery procedure. So as I said, password logic
has no way to move from the proof about one
recovery procedure, in this case the whole
system is proven with fs recover as the global
recovery procedure. We now we need to
move to a proof that has this recovery procedure, with one extra line of code. Proving that this is correct, basically required in
lining the entire proof, redoing the entire proof that fs recover handles
crashes during recovery, fs recover is correct when
composed with unlink, and that this entire procedure is the correct thing for atomic save. So we didn’t redo a file system, but in Argosy this should
have been completely trivial because you don’t even observe the lower levels
recovery procedure when you’re reasoning
about atomic save. So the story so far has had recovery refinement and
from recovery refinement, we’ve gotten to properties, correctness in the sense
of trace inclusion and also modularity because of
the composition theorem. But I haven’t talked about
how you’d actually establish a recovery refinement in the first
place, given just a program. This is where a Crash
Hoare Logic comes in. So Crash Hoare Logic is
the program logic from FSCQ, which lets us reason about crashes during code as well
as during recovery. So I’ll describe Crash Hoare Logic
which is mostly prior work, but then it will
connect back to Argosy. First, I’ll just give
a very brief overview of Hoare Logic in one slide. So the basis of Hoare Logic is this statement
called the Hoare triple, which it’s called a triple
because it’s three parts; a precondition, a post-condition,
and a piece of code. The interpretation
of this Hoare triple is that if you run the code in a state satisfying
P and it terminates, then the resulting state
will satisfy Q. So Crash Hoare Logic is
just an extension of Hoare Logic where instead
of writing Hoare triples, we write these crash specifications that describe what happens on crash. We have an extra post condition
which is a crash invariant, which will hold following a crash at anytime during this piece of code. So that’s just Crash Hoare law, that’s just crash specifications, but we also want to
reason about recovery. So Crash Hoare Logic
is another statement called a recovery
specification that it has, instead of just a piece of code, it also has a recovery procedure
that you intend to run. It describes what
happens if you crash in this code and you run recovery, including crashes during recovery. So here, I’ve just baked crashes during recovery into the definition, that doesn’t help us
prove this correct. So these crash
specifications are pretty easy to reason about
because they chain nicely. If you crash in
the beginning or the end, the new crash invariant
is going to just be the Hoare of the two
crash invariants. These don’t compose nicely at all. So how do we actually prove
these things correct? So for that basically in
FSCQ and again in Argosy, we prove a theorem
that lets us establish these from just crash specifications. So let me show you what that theorem
looks like diagrammatically. So first, you write
some crash spec for the code, and pre and post-condition
just chain as expected because that’s
just normal execution and recovery doesn’t even
come into the picture. Then you prove a crash
specification for recovery. The interesting thing
about the specification is it should be item potent. Meaning that crashes during recovery should imply
the precondition for recovery. Intuitively, this is what you want because if you crash during recovery, you’re going to need
to rerun recovery. Then we just need to
chain these two together. So when you crash in
the middle of the code, you’re going to go into
this recovery crash loop, and then finally when
you stop crashing, we’ll get this post wreck condition. All we really need to do
in Argosy is connected Crash Hoare Logic to
recovery refinements. We still need
an abstraction relation. Then all we actually need is to prove a particular Crash Hoare
Logic recovery specification, using the standard techniques
of Crash Hoare Logic and these will encode
recovery refinement. I’ll just show that
graphically. There’s a builder. So the pre-condition is the precondition for both of
these refinement diagrams, that there should exist
some abstract state satisfying R. R is
also a precondition. Then the post-condition is the normal execution part
of recovery refinement, while the recovery post-condition
is the crash and recovery part. I’ve de-emphasized these lower-left
parts because they’re actually baked into the definition of what this recovery
specification needs. I’ll skip this. Okay. So while this Crash Hoare
Logic part of our Argosy is fairly standard, it
does complete the picture. It actually lets us
prove programs correct. In particular, Crash Hoare Logic is what lets us reason
about process during recovery for logging and
for replication separately. So we didn’t just
write a paper on this, we also implemented and verified all of this in the
Coq Proof Assistant. Our framework is fairly small, as far as our systems go. It’s only about 3,200 lines of
code for the generic theory. Our example on top of
that is 4,000 lines. By contrast, FSCQ was
like 40,000 lines of code of which 20,000
was the framework. This example does extract
to Haskell and run. Our code is open source. Let
me briefly talk a little bit about what some directions for future work that we’re
actively working on now. So the first is, the big limitation of Argosy is that it’s sequential, it only supports
sequential storage systems. So we’re now working on
a story for crashes, crash safety combined
with concurrency. The way we’re doing that is
instead of extending Argosy, we’re instead taking the ideas
from Argosy and extending in existing concurrency
infrastructure. It’s based on Concurrent
Separation Logic, a very old idea, but a modern instantiation of it
in the framework called Iris. Iris has been used successfully for a lot of concurrency projects with a long line of [inaudible] papers
establishing the framework, and then many more papers
with applications. But we’re the first to come up with a crash safety
techniques for Iris. We also want a better story
for running code. So currently, in many of our systems, I have extracted the Haskell and this gets us a lot of
performance problems. I don’t want to be
too down on Haskell, but we lack control in this workflow, because we can’t really control what the extracted code looks like and get good performance out of Haskell, especially when trying to get
high performance for concurrency. One of the things you
need to do, for example, is reduce memory allocations
which is quite hard to do. Our new plan is instead to import Go into Coq, which I’m
very excited about. This is a very fun thing to do. Another thing that we’re doing is we are subjecting students
to this- oh, yeah.>>[inaudible]>>Sure. I was like half ready to just give
a separate presentation on this. But I’m happy to chat
about this offline. So it’s a completely
trusted pipeline, where I basically wrote a small translator to translate
a small subset of Go into Coq, instead of going the
other way around. So we translate Go programs
into some model in Coq.>>[inaudible] Go? Essentially yes.So Iris is
actually fairly general. So Iris doesn’t really come into
the picture of the specification, but we have a transition
relation that describes all of the operations
that we care about from Go. So heap operations and file
system operations, because we need to
use the file system.>>[inaudible] go with some sort of a dedicated model for every
[inaudible] into the file system?>>Yeah, it’s not quite deep. We translate into our standard
representation of programs, which is a shallow embedding. In the coming fall semester, we will subject students to this, including setting it actually. So Argosy actually spun off
from course infrastructure. Now, we want to backport
the improvements on Argosy back to
that course infrastructure. For the sake of
students, we basically need to make the Crash
Hoare Logic part more usable and make sure that this whole infrastructure
is a little bit easier to use. So to summarize, I’ve
talked about Argosy, which gives us modular proofs
of layered storage systems. It appeals to Kleene algebra, both to model crashes and recovery, as well as for principles to reason about the composition
theorem as well as correctness. We’ve introduced recovery refinement, which is the approach that we use to get all of these nice features and that it gives us the right conditions to establish
trace inclusion for recovery. Finally, I’ve talked about
how recovery refinement also gives us modular proofs, and lets us compose
implementations together. Thanks, and I’m happy to take more questions. Yeah.>>When you talked about
the compositionality of using this Kleene algebra, you ended up at this big expression, and one of the things
that cycles like the aligning system runs and crashes.>>Yeah, it can find it.>>Division runs and crashes.>>Yes, this.>>Then you said well, that thing
behaves like something simpler.>>Yeah.>>So how hard is it to do that through and why do you
have to do that through?>>I’ve instantiated
this example with rep and log, with those two systems, but we did a single
composition theorem for two generic layers without ever looking at what the layers
actually do. Internally.>>Okay. So that behaves like inclusion merit is
works for everything.>>Yes. That’s something that
you would have to prove. That’s something that you get from the fact that the system
is a recovery refinement. That’s the only assumption
that we make.>>Can you give us
some more info for why that refinement is [inaudible]. The replication part [inaudible]>>Yeah. It’s because
there’s a small pond, which is not great of this example. The thing that’s
being refined is this is the logging recovery procedure, compiled down to run on top of the two disk interfaces using
the replication system. So before I said that code input, followed by crashes during recovery, refines the high-level code
at the upper layer. What the refinement is
establishing is that the log recovery procedure
at the replication layer, behaves like a log of recovery
procedure at the disk layer. Yeah.>>You show this philosophical code
that Frank had to write it.>>[inaudible]>>In Argosy, would they have
to write it differently? Would they have to use
FS recover at all?>>No, you would not. Sorry.
let me find the slide. So you would express
atomic copy either as- so you’d probably express this atomic copy example as a layer that has
one operation, atomic save. Maybe you’d also re-export
all the file system operations. Then your recovery procedure would not be able to
mention FS recover. They would be added in
the process of combining with the files [inaudible]. Now, also another thing
you’d get for free is that unlink would become
an atomic operation, whereas this unlink call is actually a call to a bunch
of disk operations, because in FSCQ, like everything is written against the disk
API. There is no layer.>>One more question.>>Yeah.>>In practice, what
invariants you can use as the invariant of
a recovery procedure?>>Oh, yeah, like the crash
invariant. This invariant under root.>>If you have this invariant to get [inaudible] and
you require invariant. I’m wondering, in practice,
what invariants you have.>>Yeah. So the replication system,
it’s fairly simple. There’s sort of
a standard trick which is if I were to finish recovery now, it would be correct, but
like a standard invariant. That’s a standard way to think
about what the invariant says. That’s like an always true statement. In the case of
the replication system, because this is
a sequential system and everything is stable right away, if you crash, you can only ever crash in the middle of one write operation. So, the invariant is
either the disk are fully synchronized or they are
desynchronized at one point. Recovery, basically, finds
that point in a loop. In the logging example, it’s a little bit more complicated. But similarly, if you crash, we can basically ignore
whatever progress was made so far. Because actually, in reality,
we’re going to redo it. That’s because we do physical
logging, so it’s really simple. There’s also like undo-redo logging, which is more complicated
where if you crash, you actually need to undo
what you did during recovery. But I think similarly, you’d use this idea that if I
were to finish now, I’d be okay, and you just
proved that that’s true for even the things that you’ve
done right now, so done so far.>>Does this exclude things like
arbitrary corruption on my disk? From what you were saying, you differ at most one location that you’re
trying to find and fix up, right?>>Yeah. So I have another->>Totally trashed by drive
by sort of a bunch of, I don’t know, magnets and
cosmic rays, or whatever. Right? Then this may
not be true. Okay.>>Yeah. So in
the replication system, we have a model of
how it’s unreliable. The model says that at anytime, one of the disks might
go away entirely. Then from then on, reads will fail, when you try to read from that desk. So if a disk fails, then actually, the disks are now synchronized,
and so we can return. We have to make this assumption
that only one disk can fail, because if both disks failed,
then clearly, you’re screwed. I think this is actually
like a big difficulty in proving replication systems correct is they’re
not unconditionally correct. You have to rely on some assumption.>>Any more questions? All right. Let’s thank Tej.

Leave a Reply

Your email address will not be published. Required fields are marked *