[MUSIC]>>All right. Thanks everybody

for coming, and thanks to Tej Chajed for coming

to give us a talk. It’s nice that so many of you

were able to attend. He is a systems student working

on formal verification of software at MIT advised by Frans

Kaashoek and Nickolai Zeldovich. There is still some space

on his schedule. If later this afternoon

you’d like to talk to him, come to me and I will

arrange a spot on his schedule. So, let’s hear from Tej.>>Thanks, Jay. So to motivate why we’re talking

about layered storage systems, let me start with a story. The story will tragically

lack modularity, but then Argosy will

come in and fix that. Story starts with

Bob, a storage system developer who’s writing

a replication system. The replication system

takes two disks that are unreliable and it exports a single

logical disk with reliability. The way it works is

as writes come in, it writes them to

both disks and that ensures that your data is safe even

if one of these disks fails. Now, this is a storage system so Bob wants it to be able to

handle crashes at any time, say due to a power failure. The problem is that if the system crashes in the middle

of a write operation, then it leaves the disks

in an inconsistent state. So Bob knows how to handle this. He writes a recovery

procedure that copies from the first disk to the second, and that restores

the invariant of the system. Now Bob is a careful guy, so he writes a machine-checked proof that his code is correct. Specifically, that read and

write are atomic and have the right behavior as long as you run rep recover after every crash, and because he’s using the latest

verification infrastructure, his proof handles crashes

during recovery itself. Meanwhile, Alice writes

a write-ahead logging system that implements

a transactional API on top of a single disk and she

also proves her code correct with a machine-checked

proof of correctness, covering this

log recovery procedure. Now, you look at

this picture and you think surely we can run

the write-ahead log on top of the disk replication

system and we can prove that composed

implementation correct. The trouble is that

the recovery procedure to make this code correct

needs to run both of the underlying recoveries and it turns out that this is

tricky to reason about formally. So when we have a proof that each of these recovery procedures

is correct in isolation including

crashes during recovery, crashes during the composed

recovery procedure don’t directly resemble crashes during

either of the sub recoveries. So the challenge is how do we

actually reuse these proofs? In prior work, it turns

out you can’t do this. Crash Hoare Logic, our own work

from a verified file system, has no way to move from a proof about one recovery procedure to

a proof about another. You can only have one

recovery procedure at a time. Other prior work similarly has

no meta-theory for composing two recovery procedures

and sort of implicitly assumes a global recovery procedure, and [inaudible], while they

do support composition, they place some restrictions

on recovery procedures to make that proof go through, and those restrictions

disallow both of these systems. This is where Argosy comes in. Argosy supports modular proofs in exactly the style

that we wanted here where two developers

independently proved two implementations correct and there’s a generic theorem that

the composition is correct including

crashes during recovery. Argosy is compatible with

existing techniques. So we re-implemented Crash

Hoare Logic for the sake of this work, in order to prove the implementations

correct individually. So the main contribution

in this work is recovery refinement which

is the approach that we use to get modular proofs

and I’ll also talk, if I have time, about how

we connect Argosy to Crash Hoare Logic to prove

a particular implementation correct. See the paper for details on an example that we’ve verified, featuring write-ahead

logging on top of replication. It’s the same example I’ve

been using. All of our code is backed by

machine-checked proofs in the Coq proof assistant. I should say that if you want to interrupt me at any time, feel free; if you have questions,

don’t wait till the end. So let me go into

recovery refinement next. To explain recovery refinement,

I’ll do so in two steps. First, I’ll talk about just

some background on how to prove normal execution correctness

using refinement and then next, I’ll talk about how to extend

refinement with notions of crash and recovery to arrive

at recovery refinement. So here’s some background

on refinement before we get into

recovery refinement. In refinement, the goal

is to prove that an implementation correctly

implements some exported interface, say this single disk in

the case of replication, on top of some code interface

that the code is written against. In this case, these

two unreliable disks. The replication system implements every high level operation say a write in terms of

the low-level operations. In this case, the write implementation

just writes to both disks in turn and similarly for the other

operations in the interface. So what does it mean for this

replication system to be correct? We define correctness

based on how you would use the replication system. Specifically, you take

some piece of code and you’d write it against

the disk interface and what you want to know

is that when you run this code on top of two disks

using the replication system, it behaves the right way. It behaves the way you expect. The way we formalized that

is the standard idea of trace inclusion where

the specification’s behaviors at the disk layer should include all of the running code behaviors

at the two-disk layer. So that’s just normal execution. But how do we prove

this property correct? The way we do that is using

an abstraction relation. This whole technique is also

called forward simulation. Now, this is all still

standard stuff so background. So with a forward simulation, the developer provides

an abstraction relation R which has two purposes. First, it relates the states

of the specification to the states of the code and also it enforces any

invariants over the code. So in this case,

the replication system maintains this invariant that

the two disks are equal. Then the developer proves that

for every execution of the code, there’s a corresponding execution of the specification that preserves the abstraction relation

and that lets you run more pieces of code

after this operation. To get the full trace

inclusion property, the developer just

needs to prove this for every operation in the system. So that’s all in the background

I have on refinement. Now we’ll extend all these

notions for the crashes and recovery which is

the new part of this paper. So here is the picture that

we had for an implementation. Now, we’ll extend it to

the crashes and recovery. Recovery just goes into

the implementation, but I’ve also added these power

symbols to both interfaces. These power symbols represent the semantics of what

happens on crash. For example, volatile

state is cleared: your memory is wiped, and anything that’s buffered in, say,

your disk goes away. So that handles

modeling crashes and recovery. What about the correctness

for recovery? We still use trace inclusion, but we’ll extend it with notions

of crashes and recovery. To do that I’ll introduce

our crash semantics to describe the specification and then the recovery semantics to describe

what code we actually run. The crash semantics that we use is, intuitively, atomicity, and

the way we formalize that is that you can crash

between any pair of atomic operations, and

standard assumptions for what’s atomic are things like

four kilobyte writes or file system operations

you might assume are atomic with respect to crashes. I’ll represent a crash in

the middle of a piece of code, between atomic operations, with this jagged edge

followed by a power symbol. So that handles the specification.
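The atomicity-style crash semantics just described can be sketched concretely. Here is a minimal Python model (hypothetical, not from the paper’s Coq development): a specification-level program is a list of atomic operations, and a crash, the power symbol, may occur before, between, or after them.

```python
# Hypothetical sketch of the spec-level crash semantics described above:
# a program is a sequence of atomic operations (functions from state to state),
# and a crash (the "power symbol") may occur before, between, or after them.

def crash_behaviors(ops, state):
    """Return the states visible after crashing at each possible crash point:
    after running the first i atomic operations, for i = 0 .. len(ops)."""
    results = []
    for i in range(len(ops) + 1):
        results.append(state)
        if i < len(ops):
            state = ops[i](state)
    return results

# Example: two atomic block writes against a one-disk specification.
def write(addr, val):
    return lambda disk: {**disk, addr: val}

prog = [write(0, "a"), write(1, "b")]
print(crash_behaviors(prog, {0: "x", 1: "y"}))
# One state per crash point: before op1, between op1 and op2, after op2.
```

Trace inclusion with recovery then says every behavior of the running code, with crashes and recovery, must land in one of these specification-level crash behaviors.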

What about the code? We’re definitely

going to run recovery after a crash in the middle

of the implementation. We know that the code

needs to do that. But we also want to model

crashes during recovery. So we insert zero or more iterations

of crashes during recovery first and that’s all we need to specify what trace

inclusion with recovery is. It’s normal execution plus correctness

under crashes and recovery. How do we prove this property? We’re still going to use an idea

that’s like forward simulation. Well, let me introduce how we

come to the right condition. So the first observation

we make is that if you crash in the middle of

some long sequence of code, you must crash in the middle

of some operation. In this example execution,

the crash occurs in the middle of op2, and

prior to that crash, you can just use normal execution

correctness to establish that the abstraction relation

is preserved up to that point. So we can sort of ignore that, because

we already handled normal execution correctness. What’s the specification

for a crash during op2? We said it’s atomicity. So we simply say either all of op2 occurred and then

the system crashed or none of op2 ran and the system crashed and that condition for all the operations is all we

need for recovery refinement. So we still have normal

execution correctness for non-crash executions, and then

we also have this new condition that relates recovery to crashes and proving this for every operation

in the system is sufficient to prove the strong trace

inclusion property for running arbitrary pieces of code against

the top-level interface. So what I’ve talked about so far

only handles a single layer, but this talk was

supposed to be about layered storage systems.
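The per-operation condition above can be made concrete with a small, self-contained Python sketch (hypothetical names; the real development is in Coq). It crashes the replication system’s write at each intermediate point, runs the copying recovery, and checks that the recovered state abstracts to either all of the operation or none of it.

```python
# Hypothetical sketch of the recovery refinement condition for the replication
# system's write: crash anywhere inside the implementation, run recovery, and
# the result must look like either all of the operation or none of it.

def crash_points_of_write(disks, addr, val):
    """Code-level states at each crash point inside write (which writes to
    disk 0, then disk 1)."""
    d0, d1 = disks
    states = [(d0, d1)]                          # crash before any disk write
    d0 = d0[:addr] + [val] + d0[addr + 1:]
    states.append((d0, d1))                      # crash after writing disk 0
    d1 = d1[:addr] + [val] + d1[addr + 1:]
    states.append((d0, d1))                      # crash after writing both
    return states

def recover(disks):
    """Replication recovery: copy disk 0 onto disk 1, restoring the invariant
    that the two disks are equal."""
    d0, _ = disks
    return (d0, list(d0))

def check_write_is_atomic(spec_disk, addr, val):
    new_spec = spec_disk[:addr] + [val] + spec_disk[addr + 1:]
    start = (list(spec_disk), list(spec_disk))   # abstraction: disks equal
    for crashed in crash_points_of_write(start, addr, val):
        d0, d1 = recover(crashed)
        assert d0 == d1                          # invariant is restored
        assert d0 in (spec_disk, new_spec)       # all of the op, or none of it
    return True

print(check_write_is_atomic(["x", "y"], 1, "b"))  # prints True
```

This sketch models recovery itself as atomic; the actual proofs in the talk also cover crashes in the middle of recovery.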

Sorry, go ahead.>>Why does the specification have to include crashing at all? Can crashes be invisible

in the specification?>>Yes. Good question. So many

systems would try to hide crashes, but we also want to

reason about systems like file systems that cannot hide

crashes from the caller. Just fundamentally if you’re running code on top of a file

system and you crash, something will be lost or at least we’re going to lose

track of what you were doing. So we want to be able to give

these weaker specifications. If you go off and you write

some large distributed system, you’ll probably try to hide crashes. Basically, you need to

replicate for that to work.>>You have this relation R. If I wanted to

reason about the low-level system, I would like to reason about it

through the abstraction, with the spec system, and R is

my interface to do that?>>So our specification

does not have R in it. There’s a distinction here:

R is talking about states. The states are just helpful for specifying what the behavior

of a piece of code is, but once you get to

the specification, you only need to look

at the interface and the externally visible behaviors, which in our system

are return values. But a natural extension

is things like the network packets

that you send out. Then you don’t need to

worry about how R works.>>You were talking about

a high-level trace.>>Yeah.>>I want to conclude

something about a low-level trace.>>Yes.>>You have to somehow transport

my property across R, right?>>So I think your property

should not be about states. It should only be about return values and if there is something

else that you care about, you should externalize it as

I/O behavior of some sort. The intermediate states

should be irrelevant. That’s the intention of this system. I didn’t talk about

this, but you do need to establish R initially as well. It’s an inductive invariant

and you’ll need a base case. It’s just generally not interesting. But if you don’t do that then you would need to look closely at R to make sure that it’s

actually satisfiable. Okay. So next I’ll talk about the composition theorem that lets us actually take two

recovery refinements and compose them together for modularity. Before I do that, let me

talk about Kleene algebra, which is the approach

that we actually use to prove the composition theorem. So I’ve been using these

expressions informally in the talk, this alternation symbol and the star. It turns out that these

are exactly the operators of Kleene algebra, which is a formalism that

generalizes regular expressions. So you can use all, basically

all of your intuition from regular expressions to think

about these operators. We don’t just have these informally, we give them a formal meaning

in terms of matching transitions of the system. So sequencing corresponds to

sequencing of transitions, this alternation symbol is non-deterministic choice

between two possibilities, and the star is

zero or more iterations. If you look, you’ll see that these

are exactly the operators that we needed to define the semantics

of crashes and recovery. So Kleene algebra fits quite closely; it seems like a good formalism to

at least model what’s going on. But it’s actually also useful

for reasoning principles. Kleene algebra has a rich theory for when two expressions are equal. I’ll talk about that theory in a bit when I go into

the composition theorem’s proof. So let me just formalize what

the modularity that we get is. Now that we have recovery refinement, I can more precisely state

what composition is. If we have two recovery refinements

that are stacked like this, then what Argosy gives you

is a theorem that says that the composed implementation is also a recovery refinement

and therefore, has this trace inclusion property, where that composition needs to

run both recovery procedures. So before, we have

this challenge of how do we get a proof about the composed recovery from

both individual recoveries. Now, I can formalize

a little bit more how we might go about

doing this reasoning. Before I do that, let me just

abbreviate these two with rep for the replication system

and log for the logging system, just to keep the slides

a little less cluttered. So now we can formalize what we

mean, at least, by recovery under crashes. It’s zero or more iterations

of crashes during recovery, before recovery finally finishes. Here’s an expression that

represents crashes during the composed recovery and just

to break this down a little bit, it starts with either we can crash during the replication system itself or the replication system might fully recover but then re-crash

during the logging recovery. There’s some number of iterations of crashes during

recovery and finally, the system stops crashing and

the whole composed recovery runs. So this doesn’t seem any easier, the top two expressions don’t

show up in the bottom expression, but Kleene algebra can fix that. So Kleene algebra has, as I said, it has an equational

theory that lets us rewrite these expressions

to equivalent ones. So the first thing we apply is

a theorem called de-nesting, which will let us get rid of

this alternation altogether. Next, we apply a theorem

called sliding. This might not seem simpler

but I’ll go through why this bottom expression

is easier to reason about. Kleene algebra has told

us that it’s exactly equivalent to the

original in terms of, it matches exactly

the same transitions. So after this rewrite, both of

these proofs start to apply. Before the parentheses,

the replication proof establishes that the

recovery procedure restores the invariant

of that system. Then within the parentheses, the trace inclusion property

for the replication system says this behaves like a crash during the high level

logging recovery code. Where from the perspective

of the replication system, log recovery is just another program

that you wanted to run. That’s all we need from

the replication proof. Then we can take what we now know and plug it into

the write-ahead logging proof, which shows that the

logging system’s recovery correctly restores its invariants. The upshot is that at the end

of this composed recovery, despite crashes during recovery, both systems’ invariants are restored. So I’ve talked about this in terms of informally restoring invariants, but the paper and especially the code formalize this more

precisely as recovery refinement. Let me just give a small flavor

of how not only is Kleene algebra useful for equational reasoning, just rewriting expressions

to equivalent ones, it also actually lets us

do refinement reasoning. So this little refinement diagram, just like the standard one that

we’ve been using in this talk, you can encode this as

an expression in Kleene algebra, as a subset relationship between

two possible transitions. There’s just one thing slightly

weird going on in this expression, which is that we run R

as if it were a program. Much like a program, R is also

a relation between states. Then that subset is just another operator in

Kleene algebra that says the set one regular expression matches is a subset of what another

regular expression matches. Kleene algebra also has

a theory for reasoning about these subset relationships. So that was a lot of

abstract description of composition. Let me just go back and motivate why you might want this modularity, with a real example from our work. So here’s an anecdote about

where modularity failed us, in some prior work that we did. So we wrote this verified

file system called FSCQ and in one of those papers, we wanted to prove

some application correct on top of the specification and that job

fell to Frans, my advisor. So Frans wanted to prove

this little snippet of code correct. It’s not super

important what it does, all it’s trying to do is

atomically save a file, by first going through a temp file. He wanted to prove

that this is atomic. To make it atomic, we need to run this application specific

recovery procedure, that cleans up

the temp file on crash. It turns out that this code

took a 1,500-line proof because of this little fs recover that’s hiding in the application

recovery procedure. So as I said, Crash Hoare Logic

has no way to move from the proof about one

recovery procedure, in this case the whole

system is proven with fs recover as the global

recovery procedure. Now we need to

move to a proof that has this recovery procedure, with one extra line of code. Proving that this is correct basically required

inlining the entire proof: redoing the proof that fs recover handles

crashes during recovery, that fs recover is correct when

composed with unlink, and that this entire procedure is the correct thing for atomic save. So we didn’t redo a file system, but in Argosy this should

have been completely trivial, because you don’t even observe the lower level’s

recovery procedure when you’re reasoning

about atomic save. So the story so far has had recovery refinement and

from recovery refinement, we’ve gotten two properties: correctness in the sense

of trace inclusion and also modularity because of

the composition theorem. But I haven’t talked about

how you’d actually establish a recovery refinement in the first

place, given just a program. This is where Crash

Hoare Logic comes in. So Crash Hoare Logic is

the program logic from FSCQ, which lets us reason about crashes during code as well

as during recovery. So I’ll describe Crash Hoare Logic

which is mostly prior work, but then it will

connect back to Argosy. First, I’ll just give

a very brief overview of Hoare Logic in one slide. So the basis of Hoare Logic is this statement

called the Hoare triple. It’s called a triple

because it has three parts: a precondition, a post-condition,

and a piece of code. The interpretation

of this Hoare triple is that if you run the code in a state satisfying

P and it terminates, then the resulting state

will satisfy Q. So Crash Hoare Logic is

just an extension of Hoare Logic where instead

of writing Hoare triples, we write these crash specifications that describe what happens on crash. We have an extra post condition

which is a crash invariant, which will hold following a crash at any time during this piece of code. So that’s just crash specifications, but we also want to

reason about recovery. So Crash Hoare Logic

has another statement, called a recovery

specification. Instead of just a piece of code, it also has a recovery procedure

that you intend to run. It describes what

happens if you crash in this code and you run recovery, including crashes during recovery. So here, I’ve just baked crashes during recovery into the definition, but that doesn’t by itself help us

prove this correct. So these crash

specifications are pretty easy to reason about

because they chain nicely. If you crash in

the beginning or the end, the new crash invariant

is going to just be the disjunction of the two

crash invariants. Recovery specifications, by contrast, don’t compose nicely at all. So how do we actually prove

these things correct? So for that basically in

FSCQ and again in Argosy, we prove a theorem

that lets us establish these from just crash specifications. So let me show you what that theorem

looks like diagrammatically. So first, you write

some crash spec for the code, and the pre- and post-conditions

just chain as expected because that’s

just normal execution and recovery doesn’t even

come into the picture. Then you prove a crash

specification for recovery. The interesting thing

about this specification is that it should be idempotent, meaning that crashes during recovery should imply

the precondition for recovery. Intuitively, this is what you want because if you crash during recovery, you’re going to need

to rerun recovery. Then we just need to

chain these two together. So when you crash in

the middle of the code, you’re going to go into

this recovery crash loop, and then finally when

you stop crashing, we’ll get this post-recovery condition. All we really need to do

in Argosy is connect Crash Hoare Logic to

recovery refinements. We still need

an abstraction relation. Then all we actually need is to prove a particular Crash Hoare

Logic recovery specification, using the standard techniques

of Crash Hoare Logic and these will encode

recovery refinement. I’ll just show that

graphically. So the pre-condition is the precondition for both of

these refinement diagrams, that there should exist

some abstract state satisfying R. R is

also a precondition. Then the post-condition is the normal execution part

of recovery refinement, while the recovery post-condition

is the crash and recovery part. I’ve de-emphasized these lower-left

parts because they’re actually baked into the definition of what this recovery

specification needs. I’ll skip this. Okay. So while this Crash Hoare

Logic part of Argosy is fairly standard, it

does complete the picture. It actually lets us

prove programs correct. In particular, Crash Hoare Logic is what lets us reason

about crashes during recovery for logging and

for replication separately. So we didn’t just

write a paper on this, we also implemented and verified all of this in the

Coq Proof Assistant. Our framework is fairly small, as far as our systems go. It’s only about 3,200 lines of

code for the generic theory. Our example on top of

that is 4,000 lines. By contrast, FSCQ was

like 40,000 lines of code of which 20,000

was the framework. This example does extract

to Haskell and run. Our code is open source. Let

me briefly talk a little bit about some directions for future work that we’re

actively working on now. So the first is: the big limitation of Argosy is that it’s sequential, it only supports

sequential storage systems. So we’re now working on

a story for crashes, crash safety combined

with concurrency. The way we’re doing that is

instead of extending Argosy, we’re instead taking the ideas

from Argosy and extending in existing concurrency

infrastructure. It’s based on Concurrent

Separation Logic, a very old idea, but a modern instantiation of it

in the framework called Iris. Iris has been used successfully for a lot of concurrency projects with a long line of [inaudible] papers

establishing the framework, and then many more papers

with applications. But we’re the first to come up with a crash safety

techniques for Iris. We also want a better story

for running code. So currently, in many of our systems, we extract to Haskell, and this gives us a lot of

performance problems. I don’t want to be

too down on Haskell, but we lack control in this workflow, because we can’t really control what the extracted code looks like and get good performance out of Haskell, especially when trying to get

high performance for concurrency. One of the things you

need to do, for example, is reduce memory allocations

which is quite hard to do. Our new plan is instead to import Go into Coq, which I’m

very excited about. This is a very fun thing to do. Another thing that we’re doing is we are subjecting students

to this- oh, yeah.>>[inaudible]>>Sure. I was like half ready to just give

a separate presentation on this. But I’m happy to chat

about this offline. So it’s a completely

trusted pipeline, where I basically wrote a small translator to translate

a small subset of Go into Coq, instead of going the

other way around. So we translate Go programs

into some model in Coq.>>[inaudible] Go?>>Essentially yes. So Iris is

actually fairly general. So Iris doesn’t really come into

the picture of the specification, but we have a transition

relation that describes all of the operations

that we care about from Go. So heap operations and file

system operations, because we need to

use the file system.>>[inaudible] go with some sort of a dedicated model for every

[inaudible] into the file system?>>Yeah, it’s not quite deep. We translate into our standard

representation of programs, which is a shallow embedding. In the coming fall semester, we will subject students to this. So Argosy actually spun off

from course infrastructure. Now, we want to backport

the improvements on Argosy back to

that course infrastructure. For the sake of

students, we basically need to make the Crash

Hoare Logic part more usable and make sure that this whole infrastructure

is a little bit easier to use. So to summarize, I’ve

talked about Argosy, which gives us modular proofs

of layered storage systems. It appeals to Kleene algebra, both to model crashes and recovery, as well as for principles to reason about the composition

theorem as well as correctness. We’ve introduced recovery refinement, which is the approach that we use to get all of these nice features and that it gives us the right conditions to establish

trace inclusion for recovery. Finally, I’ve talked about

how recovery refinement also gives us modular proofs, and lets us compose

implementations together. Thanks, and I’m happy to take more questions. Yeah.>>When you talked about

the compositionality of using this Kleene algebra, you ended up at this big expression, and one of the things

in that cycle is, like, the logging system runs and crashes.>>Yeah, I can find it.>>The replication runs and crashes.>>Yes, this.>>Then you said well, that thing

behaves like something simpler.>>Yeah.>>So how hard is it to do that step, and why do you

have to do that step?>>I’ve instantiated

this example with rep and log, with those two systems, but we did a single

composition theorem for two generic layers without ever looking at what the layers

actually do internally.>>Okay. So that “behaves like” conclusion

works for everything.>>Yes. That’s something that

you would have to prove. That’s something that you get from the fact that the system

is a recovery refinement. That’s the only assumption

that we make.>>Can you give us

some more info for why that refinement is [inaudible]. The replication part [inaudible]>>Yeah. It’s because

there’s a small pun, which is not great, in this example. The thing that’s

being refined is the logging recovery procedure, compiled down to run on top of the two-disk interface using

the replication system. So before, I said that the compiled code, followed by crashes during recovery, refines the high-level code

at the upper layer. What the refinement is

establishing is that the log recovery procedure

at the replication layer, behaves like the log recovery

procedure at the disk layer. Yeah.>>You showed this code

that Frans had to write.>>[inaudible]>>In Argosy, would they have

to write it differently? Would they have to use

FS recover at all?>>No, you would not. Sorry.

let me find the slide. So you would express

atomic copy either as- so you’d probably express this atomic copy example as a layer that has

one operation, atomic save. Maybe you’d also re-export

all the file system operations. Then your recovery procedure would not be able to

mention FS recover. They would be added in

the process of combining with the file system [inaudible]. Now, also another thing

you’d get for free is that unlink would become

an atomic operation, whereas this unlink call is actually a call to a bunch

of disk operations, because in FSCQ, like everything is written against the disk

API. There is no layer.>>One more question.>>Yeah.>>In practice, what

invariants you can use as the invariant of

a recovery procedure?>>Oh, yeah, like the crash

invariant, the invariant during recovery.>>If you have this invariant to get [inaudible] and

you require the invariant. I’m wondering, in practice,

what invariants you have.>>Yeah. So the replication system,

it’s fairly simple. There’s sort of

a standard trick which is if I were to finish recovery now, it would be correct, but

like a standard invariant. That’s a standard way to think

about what the invariant says. That’s like an always true statement. In the case of

the replication system, because this is

a sequential system and everything is stable right away, if you crash, you can only ever crash in the middle of one write operation. So, the invariant is

either the disks are fully synchronized or they are

desynchronized at one point. Recovery, basically, finds

that point in a loop. In the logging example, it’s a little bit more complicated. But similarly, if you crash, we can basically ignore

whatever progress was made so far. Because actually, in reality,

we’re going to redo it. That’s because we do physical

logging, so it’s really simple. There’s also like undo-redo logging, which is more complicated

where if you crash, you actually need to undo

what you did during recovery. But I think similarly, you’d use this idea that if I

were to finish now, I’d be okay, and you just

proved that that’s true for even the things that you’ve

done right now, so done so far.>>Does this exclude things like

arbitrary corruption on my disk? From what you were saying, the disks differ in at most one location, which you’re

trying to find and fix up, right?>>Yeah. So I have another->>If my drive is totally trashed

by sort of a bunch of, I don’t know, magnets and

cosmic rays, or whatever. Right? Then this may

not be true. Okay.>>Yeah. So in

the replication system, we have a model of

how it’s unreliable. The model says that at any time, one of the disks might

go away entirely. Then from then on, reads will fail when you try to read from that disk. So if a disk fails, then actually, the disks are now synchronized,

and so we can return. We have to make this assumption

that only one disk can fail, because if both disks failed,

then clearly, you’re screwed. I think this is actually

like a big difficulty in proving replication systems correct: they’re

not unconditionally correct. You have to rely on some assumption.>>Any more questions? All right. Let’s thank Tej.