– Good afternoon, thanks for coming. We have quite a treat this afternoon. We have Jacob Steinhardt

from Stanford University. He was advised by Percy Liang and got his undergrad from MIT. His research is very

timely, as machine learning has gotten very, very

popular and applied widely. It’s also been abused widely. Jacob attempts to develop methods that actually defend against attacks on learned models by people who maliciously give bad training data and things like this. He also believes machine learning should not just be an

optimization procedure but actually give results

that are intuitive and predictable, not just

the output of a black box. A lot of his work lies at the intersection of optimization, machine learning, statistical learning theory, inference, and computation. He is the recipient of a Hertz fellowship and an NSF fellowship. We are really excited to have him, so please join me in welcoming him. (audience applauding) – Thanks a lot, Kevin. Before I start, I just want to

acknowledge my collaborators, Moses, Ilias, Gautam,

Daniel, Pravesh, Pang Wei, Jerry, Percy, Aditi, Alistair, and Greg. They’ve all been a huge help

throughout all of this process and a lot of this work wouldn’t have been possible without them. So today I’m going to be

talking to you about provably secure machine learning,

but before I dive into that, I want to tell you two stories. The first story is about Twitter. On April 23rd, 2013, Syrian hackers compromised the Associated

Press’ Twitter feed and they put out the following tweet. It reads: “Breaking: Two Explosions in the White House and Barack Obama is injured.” And afterward, this is what

happened to the stock market. There was about a $136

billion drop in three minutes, although it was fortunately reversed. But part of what caused

this was analytics software and other statistical algorithms

being fooled by this tweet. This isn’t the only time that

Twitter has been attacked. Perhaps more famously,

during recent elections such as the US and French elections, bots have taken to Twitter

to try to influence public opinion and one

of the things they do is they put out fake tweets

to try to fool Twitter’s statistical rating

algorithms to show topics that people wouldn’t have seen otherwise. And an important point

in both of these cases is that it’s not the

traditional components of software that are being attacked here, it’s actually the underlying

statistical algorithms. This is the first story,

and the second story is about computer vision over

the past six or seven years, largely driven by

advances in deep learning, open-domain object recognition

has improved substantially to the extent that there’s

even papers stating that they’ve exceeded

human level performance. However, it turns out that the accuracy actually drops to zero in

the presence of an attacker and in fact, has been

zero this entire time. (audience laughing)

Yes. To understand what’s happening here, we can look at a phenomenon

discovered in 2014 by Szegedy et al. called

adversarial images. What this phenomenon is, is they can take essentially any image, in this

case, an image of a panda, add some small imperceptible noise, so small that these images render the same on this projector screen,

and cause state-of-the-art computer vision systems

such as neural networks to misclassify them,

essentially arbitrarily, in this case as a gibbon

with high confidence. This goes beyond pandas and gibbons. It’s been shown that

you can take stop signs, put stickers on them

that look like graffiti and cause them to be

classified as yield signs. You can create imperceptible audio noise that’s interpreted as a command by your smartphone, and hackers can modify non-functional parts of malware code to evade detection by malware detection systems. What’s going on here? It’s not that the people

designing these systems don’t know what they’re doing, it’s due to a fairly fundamental issue with the way that most machine

learning is done today, which is that most

machine learning assumes, either implicitly or explicitly, that the training distribution,

the distribution of data that was collected to build the system, is similar in some way

to the test distribution, the data that the system is deployed on, and an attacker is free to play something that looks very different

and therefore violates the assumption that the

system was built on, which creates an

exploitable vulnerability. Beyond these evasion attacks at test time, another source of

vulnerability for ML systems, is in the data collection process itself via a phenomenon called data poisoning. Here, the idea is that

often, if data is collected from users, for instance

on a web based system, simply by creating fake user accounts, an attacker can inject

fake data into the system. This is perhaps most familiar

in the case of fake reviews, although it happens

more generally any time one is building

a recommender system, and even more generally whenever one is trying to collect data in a crowdsourced manner, and perhaps more subtly, in the case of malware and spam detection, because many of these classifiers are built based on historical data. And since this data is malware and spam, this actually creates a

backdoor for attackers and there’s a line of

work showing that this backdoor can

actually be exploited to degrade performance

of the system. Let’s suppose that we

wanted to try to deal with these issues. One thing that is perhaps

tempting is to just design as many attacks as we

can think of in the lab and if a system works

well against every attack we can think of, we might be

tempted to label it secure. However, an attacker

has a strong incentive to find whatever attack

you didn’t think of and this leads to a sort

of arms race which is actually played

out in this literature on adversarial images. So since their discovery

in 2014, there’s been a number of defenses proposed to try to mitigate this issue, but eventually all of these

defenses have been broken. In fact, this has gone

on for quite a while. There’s been at least 100

papers published on this in the last three or four years. But at the end of the

day, it’s really the case that the attacker just keeps winning and usually within a few months

of a defense being proposed, a new attack comes out that beats it. In fact, at the most

recent ICLR conference, which is one of the main

deep learning conferences, the majority of submitted

defenses were actually broken before the peer review process

ended and this was only counting papers that

were eventually accepted. This goes beyond

adversarial images. This has also played out in

the literature on malware and more generally it

happens behind closed doors at many tech companies. This brings us to an

important takeaway, which is that if we only

rely on empirical evaluation against a known set of attacks, then we’re often going to end

up in a security arms race where the defender is usually

going to be playing catch up. The perspective we’re

going to take in this talk is that to avoid this arms race, we should do the following two things. We should first define

a formal threat model, laying out a comprehensive

family of attacks based on the capabilities

that the attacker has, and then design systems where we can prove formal security guarantees

under that threat model. Now, from the perspective of

traditional computer security, this is actually a fairly

common thing to try to do, but it hasn’t been pursued

as much in machine learning, partially because there’s

a number of difficulties. Often the systems are interacting directly with the physical world

so you end up with a very broad attack surface, and also the formal

specification is often unclear. It’s not clear how you would

formally specify the event that a self-driving car

crashes, for instance. Nevertheless, in this talk we will see that it’s possible to

achieve meaningful proofs of security under two

different threat models. To preview what these models will be, we could consider the

following schematic of a machine learning

pipeline, where for instance, if we’re building a

spam detector, we would first collect

training data. So these would be emails labeled by users as spam or non-spam. Then we would train a model to separate the legitimate emails

from the spam emails. And then finally, we

would deploy this model on new incoming test data,

in this case these two green emails here to classify

them as spam or legitimate. As we’ve already seen, there’s at least two sources of vulnerability here. The first is at the training stage where an attacker can

inject fake training data via data poisoning. And the second is at the

deployment or test stage, where an attacker can

manipulate examples arbitrarily. For instance, perhaps

changing irrelevant words in this spam email to cause it to be misclassified as legitimate. The first two parts of this

talk are going to focus on data poisoning and the final

part is going to focus on test time on these adversarial examples. These are going to be in the

traditions of robust statistics and robust optimization, respectively. I’m going to dive right in

and we’re going to start with the first part,

which is data poisoning, and in particular, trying to understand the space of optimal attacks. Again, recall that data

poisoning is where an attacker adds fake data to manipulate

the learned model. Here, we only have clean data and this is the model that we learn, but if an attacker can add fake data, they might cause us to

learn a very different model and in particular, could even cause legitimate email to be misclassified. We can formalize this via the following threat model, where we say that there are n points of clean data, and an adversary is allowed to add epsilon n additional poisoned points, and they can add these however they want, and in fact can even add them with knowledge of the clean data. The motivation here is that this parameter

epsilon controls the strength of the attacker, so it’s

implicitly making an assumption of an upper bound on, for instance, the number of fake accounts

that an adversary could create before they run out of

resources or get detected. Let’s try to understand

what happens in this model. Suppose we aren’t careful and do the normal machine learning thing, which is to choose a model that minimizes the overall error on all of our data, in this case the union of the clean and the poisoned data. Well, then what could

happen, the problem is that it actually turns

out that in some cases, even with a single poisoned data point, an attacker could move

the model arbitrarily far just by playing some extreme

outlier that completely dominates all other

considerations in the model. But of course, this is a
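very easy thing to detect. To make the failure concrete, here is a minimal sketch (my own illustration, not the talk’s), assuming squared loss, in which case the fitted “model” is just the empirical mean:

```python
# Hypothetical one-dimensional illustration: with squared loss, minimizing
# overall error on all the data means taking the empirical mean, and a
# single extreme outlier drags that mean arbitrarily far.
clean = [1.0] * 100                # 100 clean points, true mean 1.0

def fit(points):
    # The squared-loss minimizer over a set of points is their mean.
    return sum(points) / len(points)

print(fit(clean))                  # 1.0
print(fit(clean + [1e6]))          # roughly 9902: one poisoned point dominates
```

As the sketch suggests, though, an extreme outlier like this is a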

very easy thing to detect, so to make things less degenerate,

we’ll assume that we have some form of outlier detector

that’s going to try to detect and remove suspicious-looking points. We’ll specify this outlier detector in terms of a feasible set: the set of points that don’t get removed

by the outlier detector. The reason we call it the

feasible set is it’s the places that an attacker can

play without their attack just being useless because

it’s just going to get removed. Yes? – But ordinary regularization

would avoid problems like this without any particular consideration of adversarial manipulation. – I think even if you regularized, there’s situations where a single point can move you arbitrarily far. It depends on your loss function. (student mumbling) – So I agree you could

tune the regularizer and I can maybe talk about

that a bit more later in the empirical section. It turns out that actually sometimes, if you regularize too much, you become even more vulnerable to attack

rather than less vulnerable. Cool. Let’s see, where were we? So, yes, I agree that you could

try to tune the regularizer, but in general, things could

take you arbitrarily far. We’re going to use this outlier detector to try to prevent this. A question, though, is once

we have this outlier detector, is it actually good enough? Even if we’re removing outliers, you could run into problems

where the adversary still has a fair amount of power if your outlier detector

isn’t very specific. We can actually state this a

bit more formally by saying, given a description of the clean data and a description of the feasible set, what is the worst case error

that an attacker can induce, get for a given algorithm? And we can state this even more formally via the following optimization problem, where we’re going to simulate an attacker. The attacker is trying to maximize the overall error on the clean data, where their maximizing

over all poison points that lie within the feasible set, and this error is for the model parameters that minimize the overall error on the clean plus the poisoned data. Note, just for clarity, now
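I’m only showing one feasible set. Written out in symbols (notation of my own choosing, since the slide itself isn’t reproduced here), the problem is roughly:

```latex
\max_{\mathcal{D}_p \subseteq \mathcal{F},\; |\mathcal{D}_p| = \epsilon n} \; L(\hat{\theta};\, \mathcal{D}_c)
\qquad \text{where} \qquad
\hat{\theta} \in \operatorname*{arg\,min}_{\theta} \; L(\theta;\, \mathcal{D}_c \cup \mathcal{D}_p),
```

where D_c is the clean data, D_p the poisoned points, F the feasible set, and L(theta; D) the total error of parameters theta on dataset D. Note, just for clarity, now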

I’ve only shown the feasible set for the positive class

and there should also be another feasible set

for the negative class. But to avoid cluttering the

slides, I’m just showing one. Yes? – Can you clarify this? Usually attackers just try to maximize errors, or try to maximize a particular type of error. Is this somehow equivalent? – It’s not equivalent,

so you could imagine targeted attacks, as well, where instead of having this overall error function, you have an error function on a sub-population. In this talk, it’s going to turn out that actually even if you’re just trying to maximize overall error, the attacker can be quite successful. We’re starting here

because even here it’s actually quite

hard to defend. But yes, targeted attacks

would be even harder. Other questions before

I move on? Cool. We have this optimization problem and this optimization

problem actually has a name. It’s called a bilevel optimization problem to denote the fact that we have this inner second level optimization for a theta hat that’s plugged into

the outer maximization. If we wanted to compute the optimal attack which would help us to understand how vulnerable or not we were

for this outlier detector, we could just try to solve this. The problem is that bilevel optimization in general is quite challenging. And in fact, even if

you’re handed the solution and you just want to verify

that the solution is optimal, this is already NP-hard, even for linear and quadratic functions. In practice, it’s also hard. Another place you could

look is to the literature on robust statistics, where

in fact there is a tool called influence functions,

which give you some notion of how vulnerable a model is to outliers. However, formally these require you to have a negligible fraction of outliers and in practice, also

once you start having even a small constant

fraction of outliers, they no longer necessarily

tell you the right thing. Finally, you might be tempted

to simply design attacks and test empirically, but

as we’ve already seen, this is heuristic and could be

quite far from the worst case and leads to this arms race

that we’ve seen before. This brings us to our contribution, which is based on work with

Pang Wei Koh and Percy Liang where we showed how

to upper bound this bilevel problem by

a convex min-max problem, which can be solved very efficiently. And in particular, since

it’s an upper bound, it gives us certificates of

robustness on a given data set. So this gives you a proof that no attacker can induce more than

a certain error. But also it turns out that we can sort of invert this upper bound

to attain strong attacks on many data sets, as well,

and we’ll see this later. Let’s try to understand how

this min-max problem arises. Recall, our goal is to

bound the error of the worst case attack

which corresponds to this bilevel

optimization problem. We’re trying to maximize

the error on the clean data. And the basic intuition

is that if the attacker is going to do a lot of damage, then they better be

able to find some point that has large error under

the true model parameters. Because if they can’t find

any points with large error, then at least intuitively

with only a small fraction of bad data, they’re not

going to be able to move the model by very much. We can actually formalize

this via the following analytical device of a

super attacker, where what we’re going to

do is we’re going to take our attacker and we’re going

to make their life easier. So instead of only giving

them credit for error on the clean data, we’ll

give them credit for error on either of the clean

or the poisoned data. And now the key point is

that the thing that the attacker is

maximizing, and the thing that the learner is minimizing,

are the same function. So this reduces to a

min-max problem which turns out to be much easier to solve than the original

bilevel problem. The intuition behind this
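relaxation can be stated compactly. In symbols (again my own notation):

```latex
\max_{\mathcal{D}_p \subseteq \mathcal{F}} \; L(\hat{\theta};\, \mathcal{D}_c)
\;\le\;
\max_{\mathcal{D}_p \subseteq \mathcal{F}} \; \min_{\theta} \; L(\theta;\, \mathcal{D}_c \cup \mathcal{D}_p),
```

which holds whenever L is a sum of non-negative per-example losses: the clean-data loss is at most the loss on clean plus poisoned, and theta hat is by definition the minimizer of the latter. To restate, the intuition behind this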

min-max problem is you have this learner that’s trying

to fit all of the points, both the clean and the

poisoned points, and there’s an adversary that’s

trying to add poison points such that it’s hard to fit them at the same time as the

clean data. This gives you the following

algorithm which corresponds to doing gradient descent

on this min-max error. You’re going to start with the clean model on the clean data, this is

simulating the attacker. And note that here I’m showing

you both feasible sets now, just so I can show you

the algorithm. We’re going to first

find the feasible point that has the highest error. In this case, this is this point, and then we’re going to incrementally update the model parameters towards that point. And then we’re just going to keep repeating until convergence. So we’re going to add the new worst point and

we’re going to keep going. And at the end of the day, one of two things is
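going to happen. Before that, here is a toy sketch of the loop (my own one-dimensional instantiation, not the paper’s code), where the model is a scalar fit with squared loss and the feasible set is a small finite set of candidate poison points:

```python
# Alternate between (a) the attacker finding the feasible point with the
# highest loss under the current model and (b) a gradient step on the
# combined objective: average clean loss plus eps times the worst point's loss.

def worst_feasible_point(theta, feasible):
    # Attacker's move: the feasible point with the largest squared error.
    return max(feasible, key=lambda x: (x - theta) ** 2)

def minmax_descent(clean, feasible, eps=0.1, lr=0.01, steps=2000):
    theta = sum(clean) / len(clean)      # start from the clean fit
    attack = []                          # the "red points" generated along the way
    for _ in range(steps):
        x_bad = worst_feasible_point(theta, feasible)
        attack.append(x_bad)
        # Gradient of: average squared error on clean data
        #            + eps * squared error on the current worst point.
        grad = sum(2 * (theta - x) for x in clean) / len(clean)
        grad += eps * 2 * (theta - x_bad)
        theta -= lr * grad
    return theta, attack

theta, attack = minmax_descent(clean=[0.0] * 50, feasible=[-1.0, 1.0])
```

In this toy run the model stays near the clean fit, which is the certificate case; when the feasible set allows much more damaging points, the collected attack points form the strong attack instead. At the end of the day, then, one of two things is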

going to happen. Either you’re going to end up with a model that successfully fits all of this data, both the blue data and

the red data, in which case this is going

to give us a certificate, so a proof that there’s no

attack that can work well, or you’re not going to have such a model, that is, the model isn’t going to fit all of the data well. And in this case,

actually these red points, that we generated along the way, are going to form a

strong attack in a way that we can actually formalize. – Yeah, Sean? – I think I missed something, so why was the previous

formulation not zero sum? Why isn’t that min-max? – [Jacob] Which, the bilevel problem, or? – Yeah. – So this is zero sum here. – [Sean] So that’s zero sum,

but that’s when you put in– – But here it’s no longer zero sum because we’re not taking a learner that’s somehow playing the minimax equilibrium, we’re taking a learner that’s just minimizing the normal empirical error. – I missed a point, why is

this optimization problem that you’re trying to solve here equivalent to, or substitutable by, the next one, which is easier? – Oh, so it’s not equivalent

in the sense of being equal. It’s that this is an upper bound, because this is the sum of the errors on the clean data points, and this is the sum of the errors on the clean plus the

poisoned data points. And so at least if you have a non-negative error function, which you typically do, then that’s just going

to be strictly larger. You’ll always get an

upper bound. – Why is the learner not

trying to win the game? Right, the learner is

our learner. The learner knows we’re playing a game, so it shouldn’t just try to fit the data. It should try to beat the adversary. – Yeah, that’s right, so you could imagine taking other strategies

other than this one. I mean, in some sense, this is already trying to do something to defeat the adversary, because you have this outlier detector that is trying to detect outliers. Now you’re saying maybe we should modify the learning algorithm in some way. – You’ve limited what the learner can do artificially. – What would be things

that you’d want to do? You’d want to like tune the

regularization parameter or something like that? – The parameters of the model. Whatever my best function

is against the adversary, I want to tune my

parameters to optimize that. – I’m not 100% sure

what you’re suggesting, but in the next section, we will, I will present an algorithm that tries to deal with the adversary. It’s possible I’ll answer

your question then. And if not, feel free to ask again. Any other questions before I move on? So where were we? Right. We have this algorithm where

we’re either going to get the certificate or we’re

going to get this attack via these red points. And here’s where we can say formally, if we let theta star be the model that we obtain at the

end of this algorithm, and we let X star be the

corresponding red points that we generated, then we get

the following two guarantees. The first is an upper bound,

which says that the error against any attack, including

the worst case attack, is at most this quantity E star which is the average error of theta star on the clean data plus epsilon times the error against

the final attack iterate. So no attack can create
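training loss larger than this. In rough symbols (my own rendering of the statement, since the slide isn’t reproduced here):

```latex
\max_{\mathcal{D}_p} \; L(\hat{\theta};\, \mathcal{D}_c)
\;\le\;
E^{\star} \;=\; L(\theta^{\star};\, \mathcal{D}_c) \;+\; \epsilon \cdot \ell(\theta^{\star};\, x^{\star}),
```

where theta star is the final model, x star the final attack iterate, and the last term the per-example error on that iterate. So again, no attack can create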

training loss larger than this. And then conversely, we have

the following lower bound, which says that this particular attack X star actually realizes this error, at least with respect to the super attacker that we defined. The error under the super attacker is at least E star minus some quantity that goes to zero as n goes to infinity. Here I’ve only talked about training loss, but we should really care about test error. You can actually bound the test error via what’s called a uniform

convergence argument. I won’t go into detail

of that in the talk, but I’d be happy to

talk about that offline or in the questions. So this is the theory, but let’s look at what this actually looks like in practice. Here’s results on the Enron spam dataset, so this is a corpus of

emails that was released into the public domain

during the Enron lawsuit. It’s about 70% legitimate

emails, 30% spam emails. It’s about 4,000 examples total. And what we did was we trained

a support vector machine, what we considered a learner that was a support vector machine, on

this dataset and for features, we used what’s called the

bag-of-words representation. This representation has one coordinate for every word in the vocabulary, and the value of that coordinate

is the number of times that word appears in the email. And then for the feasible set, the outlier detector’s going to remove points that are too far away from the mean of their class, and also remove points that are invalid in the sense that they aren’t non-negative integers in every coordinate, because if they weren’t non-negative integers, they couldn’t correspond to word counts. The attacker has to play
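points that look like valid word-count vectors. As a small sketch of these two checks (toy vocabulary and threshold of my own choosing, not the paper’s):

```python
from collections import Counter

# Bag-of-words features plus the two feasibility checks described above:
# valid word counts, and bounded distance from the class mean.
VOCAB = ["free", "money", "meeting", "report"]

def bag_of_words(email):
    counts = Counter(email.lower().split())
    return [counts[w] for w in VOCAB]

def is_feasible(x, class_mean, radius):
    # Valid word counts: non-negative integers in every coordinate.
    if any(v < 0 or v != int(v) for v in x):
        return False
    # Not too far (in Euclidean distance) from the mean of its class.
    dist = sum((a - b) ** 2 for a, b in zip(x, class_mean)) ** 0.5
    return dist <= radius

x = bag_of_words("free money free")
print(x)                                                    # [2, 1, 0, 0]
print(is_feasible(x, class_mean=[1, 1, 0, 0], radius=2.0))  # True
```

In other words, the attacker has to play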

points that look like this. Here’s the results, so first of all, here’s what we get for our certificate. This green line is our

provable upper bound on the training loss on this dataset. And this blue line is the actual training loss that’s induced by the corresponding attack. And this red line is what the test error looks like. You can see that they all track

each other relatively well, but you should perhaps be unhappy because if you look at the x-axis, this only goes to about 3% poisoned data, and this y-axis is on the order of one, so the error is actually

increasing quite rapidly. And in fact this is

what we generally found, so except on very simple,

low-dimensional datasets where you could actually get robustness, in general, most datasets

are quite attackable. And you can make this even more stark, if instead of looking at the error of, under the SVM’s error metric,

you look at the actual zero-one error, so the fraction

of misclassified examples and it goes from about 3% error in the absence of an

attacker to around 14% error with only a half a percentage

point of poisoned data. And note that on this dataset, this only corresponds to

adding about 20 emails and we’re increasing the

overall error by about 10%. It’s not just that this particular

defense is a bad defense, the same attack evades many

other detection algorithms based on, say k-nearest neighbors, or PCA. And to get at Pedro’s

question from before, it’s also the case that this is robust to different hyperparameters of the algorithm, so it still works even if you tune your regularization parameter up or down, and it also still works,

for instance, if I swap out my learning algorithm from exact

learning to SGD or Adagrad. There’s reasonably good

transferability there, as well. It actually turns out that this is a fairly fundamental consequence of the fact that this dataset is just

very high dimensional and we’ll get to this in the next section. Just to summarize this section, we first of all saw that once we define a formal threat model, often security problems can

turn into optimization problems. In this case, we ended

up with this bilevel optimization problem. And then we saw how to get a certificate via analyzing the super attacker and how to get strong

attacks via exploiting the duality of this

min-max problem. However, the attacker ends

up winning in high dimensions and so this motivates the

next section, where we’re going to ask can

we design learning algorithms that are actually robust

in high dimensions? We’re still talking about data poisoning, but now we’re going to look at

this high dimensional setting and I’m also going to abstract, yes, Sean? – In these convex settings, can you just figure out the value of the game and figure out what’s achievable? Because now you’re giving an upper bound, you look at a particular

algorithm and corrupt it. You actually have written out

what the value of the game is. Like in this SVM case. Can you actually just try to? – Well, so it’s a bit, it’s a bit tricky because the game is defined

in terms of the clean data set. And of course the optimal strategy, if you know the clean dataset, is to just return the

minimizer on the clean dataset. So you have to somehow

define a distribution over clean datasets that

the learner doesn’t know. So I think even formalizing

that is a bit tricky. – One of my students has a paper where in a similar setting, he does this. I’ll give you a pointer later. – Okay, great, yeah. Cool. Let’s abstract things

away a bit. Now instead of thinking about just linear classification tasks,

we’re going to think about general learning tasks,

where we say we observe n points in some d-dimensional space. One minus epsilon fraction of these points come from some distribution

p star that we care about, and the remaining epsilon

n points are controlled by an attacker that can

place points arbitrarily and with knowledge of our algorithm. Our goal is to estimate some parameter of this distribution p star. In the previous section,

it would have been some linear separator, but more generally, we could care about the mean. We could care about the best-fit line in the case of regression. We could care about a ranking in the case of recommender systems. But for this section of the

talk, I’m going to mainly focus on the problem of mean estimation, although our results hold more generally. Here the problem is: given n points in d dimensions, where a one-minus-epsilon fraction of them come from the distribution we care about and the rest are outliers, we want to estimate the mean of the distribution. In one dimension, this is

fairly well understood, where the empirical mean could

be far from the true mean due to outliers, and so you

might use more robust estimates such as the median or the trimmed mean. In higher dimensions, it’s tempting to generalize this strategy. For instance, in the presence of outliers, as we did in the previous section, remove points that are far from the mean and take the empirical
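mean of the remainder. Before the higher-dimensional story, the one-dimensional estimators just mentioned are easy to sketch (toy data of my own choosing, not from the talk):

```python
import statistics

# The empirical mean is dragged by a single outlier, while the median
# and the trimmed mean barely move.
clean = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05]
poisoned = clean + [1000.0]

def trimmed_mean(xs, trim=0.2):
    # Drop the lowest and highest `trim` fraction, then average the rest.
    xs = sorted(xs)
    k = int(len(xs) * trim)
    kept = xs[k: len(xs) - k] if k > 0 else xs
    return sum(kept) / len(kept)

print(statistics.mean(poisoned))     # roughly 143.7: ruined by the outlier
print(statistics.median(poisoned))   # 1.0
print(trimmed_mean(poisoned))        # 1.02
```

In higher dimensions, then, the tempting generalization is again to remove points that are far from the mean and take the empirical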

mean of the remainder. However, it turns out that this, and actually a large class of natural strategies, fail in high dimensions, in the sense that they

incur an error that grows with the square root of the dimension, which makes essentially

most results vacuous. I’ll explain this more on this slide. To understand what’s going

on, let’s imagine the case where actually the data is

a very simple distribution. The clean data is just a

Gaussian with some unknown mean, mu, and variance one in each coordinate. Now the basic problem is

that in high dimensions, all of the sampled points

are going to be far away from the true mean. This is just because of

how L two distance works in high dimensions. You have d coordinates,

you have variance one in each coordinate, and so

these are going to add up to give you a distance of square root of d from the true mean. But now this means that an adversary, only by playing points that

are no further from the mean than any of the good

points can actually shift the empirical mean by quite a bit. And it’s even worse than that. So it’s not just that the
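distances match. A quick simulation (mine, not the talk’s) shows the square-root-of-d concentration:

```python
import math
import random

# Draw points from a standard Gaussian in d dimensions and measure their
# distance from the true mean (the origin here). Each coordinate contributes
# variance 1, so the distances concentrate around sqrt(d).
random.seed(0)

def sample_gaussian(d):
    return [random.gauss(0.0, 1.0) for _ in range(d)]

def dist_from_origin(point):
    return math.sqrt(sum(x * x for x in point))

d = 10_000
dists = [dist_from_origin(sample_gaussian(d)) for _ in range(5)]
print([round(r, 1) for r in dists])    # each close to sqrt(10000) = 100
```

An epsilon fraction of adversarial points placed at this same, perfectly typical distance, but all in one shared direction, shifts the empirical mean by about epsilon times the square root of d. And it gets worse: it’s not just that the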

distance isn’t any further away. All of these points have the

exact same probability density under the true data

generating distribution as the good points, and so no strategy that simply looks at points one by one, decides whether they look legitimate or not, and then throws them out can avoid this dimension-dependent error. This brings us to another

important takeaway, which is that in high dimensions, identifying points that don’t look like real data and removing them is actually the wrong goal. An adversary, only by planting points that look perfectly normal, can nevertheless shift the learned model substantially, and so instead we need to look at more subtle things, such as some sort of population-level notion of influence on the learned parameters. Perhaps because of this,

even though there’s been a long standing interest

in robust estimation, it’s only quite recently

that actually progress in this high dimensional

setting has been made, so in fact, the first efficient algorithms even for this problem of mean estimation that avoided dimension

dependent error bounds were only in the last couple of years. And so, in this section of the talk, I’m going to discuss our contribution to this literature, which is based on work with Moses Charikar, Pravesh Kothari, and Greg

Valiant, where we showed how to develop estimators

that work for a large class of estimation and learning problems, and also could tolerate a

larger fraction of outliers than was previously thought possible, and this required actually introducing a new model of learning, which

I’ll get to in a little bit. Let me briefly sketch our approach. It’s based on a reduction from

this mean estimation problem to the problem of matrix de-noising. Now, this might seem odd

because mean estimation is a very simple problem

and matrix de-noising is a very complex problem,

but it turns out to be fairly fruitful in this

adversarial setting. The idea is that we’re

going to take our data, we’re going to stack it into a matrix. And we’re going to think of

this matrix as a noisy version of the following rank one matrix, which is just the true mean

repeated over and over again. To recover this rank one matrix, we can solve the following

minimization problem, which trades off between some

squared reconstruction error to the observed data and some

convex relaxation of the rank that tries to encourage the

solution to have low rank. The problem is, we’re
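unlikely to recover that clean rank-one matrix directly. One standard way to write such an objective (my guess at the form; the talk doesn’t display the exact program) is:

```latex
\min_{X} \;\; \lVert X - Y \rVert_F^2 \;+\; \lambda\, \lVert X \rVert_{*},
```

where Y is the stacked data matrix, the Frobenius term is the squared reconstruction error, and the nuclear norm is the usual convex relaxation of rank. The problem, to repeat, is that we’re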

unlikely to end up with this. What we’ll actually probably

end up with is this, where we have a bunch of copies

of the mean of the bad data. And in fact, it’s going

to be even worse than this because we’re going to

end up with all sorts of linear combinations of the

good mean and the bad mean, and possibly there might be even several directions of variation and so it won’t be clear

how to read things off. But the key observation

is that if this happens, there must be a very

small fraction of points, the bad points that are responsible for these additional

directions of variation, and they must jointly exert high influence on the learned parameters x-bar. And by moving to the dual of this problem, it’s actually going to be possible to decompose that influence

into an additive form and assign blame back

to individual points. And so, in that case, we’ll

be able to remove them and so whenever the adversary

exerts high influence on the learned parameters,

we’ll be able to detect that and remove those points. And if the adversary doesn’t

exert high influence, then we’re just in this

case, and then we’re happy, so we don’t even need to do anything. That’s the flavor and I won’t go further into the technical details, but let me just tell

you what result we get for this problem of mean estimation. Our first result is the following. Let's assume that the fraction of outliers epsilon is at most 1/8, and that the true distribution has variance bounded by sigma squared. Then it's possible to recover an estimate of the mean whose error is at most sigma, the bound on the standard deviation, times the square root of epsilon, the fraction of outliers. The key point here is that

the error doesn’t depend on d, so we’ve gotten rid of the square root of d factor from before, which

is essentially the difference between vacuous bounds

and meaningful bounds. But it turns out that you

can do even better than this. So we can get rid of this

assumption on epsilon altogether. And in fact, with no

upper bound on epsilon, we can nevertheless get an

error that goes as sigma over the square root of one minus epsilon. So note that this is

bounded even as the fraction of outliers epsilon approaches one. In particular, this means that we can actually handle even a

majority of outliers, though only in a new model, the semi-verified model, which I'll explain on the next slide. Yes? – Sorry, quick question,

is this independent of the distribution or

for multivariate Gaussian? – This doesn’t assume

Gaussian, it just assumes that the distribution

has bounded covariance. – [Pedro] That’s it. – Yes. Other questions? Yes? – You need n to be much larger

than the one minus epsilon, right, so this doesn’t cover the example that you gave before where

you had n like around 4,000 and d was also like 5,000. – Yeah, so formally you need d to, you need the number of good points to at least be on the

order of the dimension. Even if it’s a constant factor

less than the dimension, you can still get meaningful bounds. Cool. So let me explain what’s going on here, how we can handle the

majority of bad data. It should perhaps seem

like this is impossible, because for instance, say

if 2/3 of the data is bad, then you could end up with

two clusters of bad data that look identical to the good data, and you have no way of

telling which is which. But the key point is that actually, if this is what we see,

we actually do know a lot because at least if we have

this bound on the variance, and these clusters are

further apart than that bound, then we know that the

good data must mostly lie within a single one of these clusters. And so the mean of the data must be close to one of these three green points. We can actually narrow things down to just three possibilities,

and if we observe even a small number of

points, say from this cluster, and know that those points

are all legitimate data, then we can actually be quite confident that the mean is within the

cluster and get a good estimate. So this led us to introduce a model that we call the

semi-verified learning model where we assume that

there’s some small number of verified points that are known to come from the true distribution. Aside from security guarantees, this actually had an
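an immediate, concrete reading. A minimal sketch (my illustration; candidate_means stands in for the cluster centers, like the three green points on the slide):

```python
import numpy as np

def pick_mean(candidate_means, verified):
    # Semi-verified estimation in miniature: even with a majority of
    # outliers, a handful of points known to be legitimate identifies
    # which candidate cluster mean is the true one.
    v = np.asarray(verified).mean(axis=0)
    dists = [np.linalg.norm(np.asarray(m) - v) for m in candidate_means]
    return candidate_means[int(np.argmin(dists))]

# Three well-separated candidate means; five verified points near the first.
cands = [np.zeros(2), np.array([10.0, 0.0]), np.array([0.0, 10.0])]
rng = np.random.default_rng(0)
verified = rng.normal(0.0, 1.0, size=(5, 2))
chosen = pick_mean(cands, verified)     # selects the mean at the origin
```

It also had an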

interesting implication to another problem in learning based on unsupervised clustering. So this is the problem

of learning mixtures of distributions, where we can reduce this to the semi-verified model

by thinking of the good data, or sorry, thinking of

one mixture component as the good data and the remaining mixture component as the arbitrary data. And so once we can handle

a majority of outliers, we can actually solve clustering problems. This is actually interesting

because in general, these clustering problems

are NP-hard, but in practice, are often solved with ease

and so there’s a line of work in the learning theory

community trying to understand when we can actually cluster efficiently. There's one result going back to Vempala and Wang showing that with a separation of about the fourth root of k (that is, if the means of the mixture components are separated by at least the fourth root of the total number of components), then if everything is Gaussian, we can successfully learn efficiently. But Gaussianity is of

course a strong assumption, so there’s another line of work showing that if you just have bounded variance, then you can learn with a separation that goes as about the square root of k. And if you just apply our

results as a black box, you actually recover

this clustering result, but you get robustness for free. This is the first thing that made us think that something good was

happening in this setting, that we were able to reproduce state-of-the-art results with robustness. But you can actually do even better, and we pushed the techniques

further to get clustering to work under an assumption

called the Poincare assumption. This is stronger than second moments, but substantially weaker than Gaussianity. We can deal with a separation

that is essentially independent of the total

number of clusters. So this is improving on

this 15-year-old result of Vempala and Wang, both in

terms of weaker assumptions, better separation, and we get robustness. These techniques apply to other problems, as well, as I’ve mentioned. We’ve applied them, for instance, to problems in crowd sourcing,

and community detection, and item frequency estimation,

and also to general convex, and even non-convex optimization problems. Just briefly, we in fact

revisited this attack on Enron and we showed that using our techniques, we can actually do substantially better, reducing the error from

about 25% to about 8%. To summarize this section of the talk, we’ve first of all seen

that outlier removal fails in high dimensions in a

fairly fundamental way, but we can nevertheless

get robust algorithms via this notion of

population-level influence. We introduced a new model that can handle a majority of outliers and used this model to actually improve on

long standing bounds in other problems in learning. Now I want to take us to the

final section of the talk. We’ve mostly focused on

training data attacks so far, but now we’re going to go to test time. In particular to revisit this

problem of adversarial images. So remember that these images are images that look innocuous, but

that are misclassified by state-of-the-art

machine learning models such as neural networks. I think even if you don’t

care about security, you should be worried

about this phenomenon because they really show that we’re not learning the right thing. These are fairly persistent,

so despite hundreds of papers trying to avoid them, they still exist. You can even create

them in the real world. This we’ve already seen, this stop sign that can be transformed into a yield sign. This is a 3D printed turtle

that is classified as a rifle if you take a picture of

it on your smartphone. And this is a sticker that

you can place on anything to turn it into a toaster. There’s also a Kaggle competition on this, recently, hosted by Google Brain, where about 100 teams competed

in attack and defense. We competed in both and did fairly well. But what we found was that attack is really much, much easier than defense. In fact, all of these attacks

were completely black box, so they didn’t know

anything about the model that they were attacking. You couldn’t even query the model. It was really a black box. And nevertheless, if

you take the top three attacks together, at

least one of them beats essentially every single

defense in the competition in the sense of driving the accuracy from around 90% to say below 30%. This underscores this arms

race that we saw before and in this section of the talk, I’m going to discuss our

work towards trying to start to end this arms race

by building a network that is provably secure

on the MNIST dataset. To talk about provable security, I need to give you a formal threat model, so the model we’re going to use is what’s called the L

infinity attacker model. Here, an attacker is given

some clean input image and is allowed to modify every

single pixel in the image but only by some small amount, epsilon. So each of the pixels

is allowed to be changed by some small amount. This can be seen pictorially here, where this point here
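is robust or not depending on epsilon. Before the picture, note that the threat model itself fits in a couple of lines (a sketch with made-up names, pixels scaled to [0, 1]):

```python
import numpy as np

def project_linf(x_adv, x_clean, eps):
    # The L-infinity attacker: every pixel may move, but each by at most
    # eps, and pixels must stay in the valid [0, 1] range.
    return np.clip(np.clip(x_adv, x_clean - eps, x_clean + eps), 0.0, 1.0)

x = np.array([0.2, 0.5, 0.9])
x_adv = project_linf(x + np.array([0.3, -0.3, 0.3]), x, eps=0.05)
# every coordinate of x_adv stays within 0.05 of x
```

In the picture, this first point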

is robust in the sense that every perturbation

around it has the same label, whereas this point is not

robust because there’s this region here where the label gets flipped. This picture is exactly what’s underlying this adversarial image with

the panda and the gibbon. This L infinity model is,

there’s actually a lot of problems with it in

the sense that attackers could do many things other than these L infinity perturbations. But it turns out that it’s actually quite difficult to defend,

even in this model. In fact, this panda and gibbon example, and this entire arms race

that I showed before, and the Kaggle competition, were all under this L infinity model. We thought that this was a good test bed for trying to understand how

formal verification methods could work for these neural networks. In particular, our goal is going to be to train a classifier

that is verifiably robust under this L infinity threat model. You should perhaps think that this problem would be quite hard because

even verifying facts about traditional software

is quite difficult, and there, things are actually designed with some purpose in mind, whereas for machine learning systems,

it’s just this mess of nodes, and edges, and weights that

are learned organically from data and often

have no explicit design. However, we’re going to be able to do the following two steps, which

is actually going to cause learning and verification to

work together hand in hand. The basic idea is we’re

first going to formulate a verification strategy,

which is actually going to be essentially vacuous on

arbitrary neural networks. If you just take a network

with random weights, the bounds are going to

be completely meaningless. However, there’s going

to be a rich subfamily of neural networks where

the bounds are actually good and where the verification strategy gives you a relatively tight analysis. We’ll just be able to bias learning towards these verifiable networks. So even though there’s other networks where we don’t do well, we can

just avoid them and find ones where the verification

strategy actually works. This leads to another important take away, which is that learning,

rather than actually fighting against verification and

making everything harder, actually has the potential

to make verification easier. Let’s try to understand

this, so recall that our goal is to train a classifier

that’s going to be robust in some neighborhood of the data. This is actually a fairly

classical thing to want to do and falls within the purview of robust optimization, where what you would typically do is take whatever

error function you care about and define a robust error function which is just the maximum

of the error function over all permissible perturbations. In this case, all the perturbations in the L infinity attacker model. The problem, however, is
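that, for a linear classifier at least, this maximum is exact and easy, which makes the definition concrete (a toy sketch of mine, with labels in {-1, +1}):

```python
import numpy as np

def robust_margin(w, b, x, y, eps):
    # Worst case over ||delta||_inf <= eps: the attacker shifts each input
    # coordinate by eps against the label, so the margin y * (w.x + b)
    # shrinks by exactly eps * ||w||_1.
    return y * (w @ x + b) - eps * np.abs(w).sum()

w, b = np.array([2.0, -1.0]), 0.0
x, y = np.array([1.0, 0.5]), 1
clean = y * (w @ x + b)                  # clean margin: 1.5
robust = robust_margin(w, b, x, y, 0.1)  # shrinks by 0.1 * 3.0 = 0.3
```

For a neural network, though, the problem is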

that this maximization is going to be quite complex here, because this is maximizing over the inputs to a neural network. It’s highly non-convex, and

it’s very high dimensional, so we’re not going to be able to compute this maximum exactly. What might we do to try to deal with this? The first thing to realize is

that this decision boundary from the neural net isn’t

just some discrete boundary. It’s actually the zero level set of some continuous scoring function that assigns a score to each class. So we can look at the

gradient of that function. We might want to conclude

that if the distance to the boundary is very large compared to the magnitude of the gradient, then we would actually be

robust and not attackable. The problem, of course,

is that there could be some other direction that is perhaps even essentially

orthogonal to the gradient, in which we’re much closer to the boundary due to curvature in this scoring surface. And in fact, this is

what happens in practice. If you just try to train a network to have small gradient relative

to the decision boundary, then you’ll end up with

networks where the gradient is essentially zero at

every single training point. In fact, the property generalizes, so it’s also zero at

every single test point. But there’s points nearby

every training point and nearby every test point

that have very high gradient, and so you're actually not robust, and attacks can still be generated. This is called the gradient

masking phenomenon, which has been observed

by a number of authors. We need to do something more difficult, we need to somehow get a uniform bound on the gradient over this

entire L infinity ball. To do this, we actually

need to open the hood on the network and look inside. I'll describe the strategy that works for two-layer ReLU networks, which relies on the following observation: the gradient only depends

on which units are active. Don’t worry if this doesn’t

make complete sense, but the basic idea is that

there is kind of these three layers and the gradient depends only on the sum of all

paths from the input layer to the output layer through
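some subset of active units. In miniature, with brute force over activation patterns standing in for the semidefinite relaxation (so this is only feasible for a tiny hidden layer):

```python
import numpy as np
from itertools import product

def worst_grad_l1(W, v):
    # Two-layer ReLU net f(x) = v . relu(W x): once the set of active
    # hidden units is fixed (pattern s), f is linear with gradient
    # W^T (s * v).  Conservatively take the worst L1 gradient norm over
    # all 2^h patterns; L1 is the dual norm to L-infinity perturbations.
    h = W.shape[0]
    best = 0.0
    for s in product([0.0, 1.0], repeat=h):
        g = W.T @ (np.array(s) * v)
        best = max(best, np.abs(g).sum())
    return best

W = np.array([[1.0, -2.0], [0.5, 1.0], [-1.0, 0.0]])  # 3 hidden units
v = np.array([1.0, -1.0, 2.0])
bound = worst_grad_l1(W, v)   # a uniform bound, valid at every input
```

Again, the gradient is determined by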

some subset of active nodes. This turns into a combinatorial optimization problem, which actually ends up being an instance of the weighted maximum cut problem. While this is NP-hard, you can bound it with a semidefinite programming relaxation. If you push this bound on the gradient back through everything,

you get an upper bound on this robust error function. If you minimize this, you get a network that is both robust and verifiably robust. For those of you who

are into optimization, if you just did this,

you’d get an algorithm that’s very slow. But you can, there’s a version of this based on dual gradient

descent that ends up being still slower than normal

training, but reasonably doable. Let’s look at the actual, yes? – For this to work, you need

a relatively small fraction of the hidden units to be

active for any given sample. – No. We’re actually taking worst case overall possible sets of hidden units. – Then how are you exploiting the fact that only some units are active? – We’re not exploiting the fact that only some units are active, we’re saying that the gradient,

basically the point is if I condition on which units are active, then my function’s actually

just linear, for a ReLU network. Essentially, the way we get an upper bound is by de-coupling

over these two choices. We’re saying, conservatively, we’ll assume that the adversary could

put us in any of these possible linear regions,

and we want the gradient in all of these regions to look small. – And how does this,

okay, how does this vary with the number of such linear regions? – You’re saying, how does the

efficacy of the bound vary with the number of linear regions? – Exactly. – I don’t know if I can

say anything in theory, but at least empirically,

it seems like you could, well, first of all, there’s

lots of zero crossings. You get about 50% active and 50% inactive, so about half of the units are active. It didn’t seem like increasing

the number of hidden units substantially degraded

the quality of the bound. That’s just an empirical statement, on MNIST, yes, that’s right. Let me show you our results. So as Pedro said, this

is on the MNIST dataset. This is a dataset where you’re

trying to classify things into 10 different handwritten digits. Oh, you look like you have

a question, or, oh, okay. We trained a two layer neural

network with 500 hidden units. We considered two methods of training. The first is normal training,

so in machine learning, this is actually what would be

called adversarial training. We take some known attack strategy and we optimize our network to do well against that specific attack. Then we also consider our

method where we’re minimizing an upper bound on the worst case attack. Here’s what happens, so first
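a quick parenthetical on what training against a known attack looks like, in the linear case where the inner attack is exact (a sketch of mine, labels in {-1, +1}):

```python
import numpy as np

def adv_train(X, y, eps, steps=300, lr=0.5):
    # Adversarial training: at every step, shift each input by eps against
    # its label (the exact worst-case L-infinity attack on a linear model),
    # then take a logistic-loss gradient step on the perturbed batch.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        X_adv = X - eps * y[:, None] * np.sign(w)
        m = y * (X_adv @ w)
        grad = -(y[:, None] * X_adv / (1.0 + np.exp(m))[:, None]).mean(axis=0)
        w -= lr * grad
    return w

X = np.array([[1.0, 1.0], [1.0, 0.5], [-1.0, -1.0], [-1.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = adv_train(X, y, eps=0.1)
robust_margins = y * (X @ w) - 0.1 * np.abs(w).sum()
```

On this toy separable set the trained classifier ends up with positive robust margins. Now, the results: first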

of all, this blue dashed line is the accuracy of the

normal training method against the attack that

it was optimized for. In some sense, this is

an oracle that knows what attack it’s going

to have to deal with. You can see that it in

fact does quite well, but we can’t prove anything about it. So it’s possible that

this method is robust, but the best we can show is that in fact, it could be extremely non-robust. If we instead minimize our upper bound, which recall, doesn’t know anything about this specific attack, it actually does almost as well as the oracle method. And, we can achieve reasonably

good provable robustness, as well, against the worst case attack. What’s happened, is we’ve

gone from this picture where we have this, where we

have potential non-robustness, to this picture where we’re

actually provably robust. We’ve seen that minimizing

even this loose bound can lead to robust models

and you can achieve both empirical and provable robustness. I guess to get to your

question about MNIST, I think to scale this up,

the issue is not necessarily with the looseness of

this particular bound, but extending things to deeper networks. We do have approaches to doing that– – I wasn’t thinking so

much about scaling up as the fact that MNIST is a

very special class of problems where things are in a very

low dimensional manifold, because they're just digits

that are mostly white. – We’ve tried this on some other datasets. I think it’s, it works on

things other than just MNIST, although for instance, for

CIFAR and for house numbers, the bounds are quite loose

and this is something that we’re working on right now. Two minutes, okay, so I should

get to future work, then. We’ve now, so now we’ve sort of seen how to get both robustness at

training time and test time. Let me just summarize and tell you about some other directions I’m interested in. First of all, some take aways. We saw in general that

empirical success is not enough and leads to this arms race,

but that formal threat models can actually turn security

into optimization. For data poisoning, we

saw that removing outliers is the wrong goal, and we also saw that robustness can be useful as another learning

problem, such as clustering. For adversarial images,

we also saw that learning can actually make verification easier. Beyond security, my

sort of general interest is in trying to treat machine learning as software, because it's increasingly being used as part of larger software systems, and to give it the reliability properties you would want good software to have. Security's of course one,

but there’s other properties you’d want, as well, for instance, you would like things

to not fail silently. You would like to be

able to detect failures and respond appropriately,

which has led us to a line of work on having good estimates of uncertainty under limited computation. You’d also like algorithms

that will behave predictably and generalize well to new situations. This has led us to work trying

to guarantee performance for a well-defined family of

counterfactual distributions. In both cases, an important

theme is that you can’t just rely on the data or on

measured prediction accuracy to succeed, you actually

need some underlying theoretical scaffolding to ensure success. In addition to these lines of work, an area that I’m interested

in in the future, is what you might call humans as software. This is trying to take seriously the fact that machine learning algorithms run on data and this data

is collected by humans, and it might be imperfect

in a number of ways. There’s easier versions

of this, for instance, where data might just

be labeled unreliably, which fall fairly squarely into the robust learning framework

we’ve seen in this talk. But there’s other more difficult problems, for instance, in robotics,

one would often train robots based on demonstrations. These demonstrations could have

all sorts of idiosyncrasies that you would want to remove from the actual deployed system. And then maybe the hardest version of this is what I call proxy signals. Many web companies such

as Facebook or Quora, maybe in theory, at least optimistically, they would like to optimize

for human happiness, but they don’t get to observe

the happiness of their users. They only get to observe

what things they click on or how long they dwell on a webpage. So you get systems that are very good at optimizing clicks and dwell time, but maybe not at making

people happy and fulfilled. I think this is, in general,

a very important problem because machine learning

is this very powerful optimization tool that’s

being increasingly applied to many areas of society. It’s important to make sure

that we can actually optimize the things we care about

rather than just the things that are easy to measure. I’ll leave you with that, and I’d like to once again

thank my collaborators. (audience applauding) – [Kevin] We have time

for a question or two. – In many domains, what

attackers do is the combination of data poisoning and

adversarial examples. They poison the data to make their example they’re gonna generate be on

the wrong side of the boundary. They don’t actually need to

change the boundary a lot because they know and they

can also design their example. Any thoughts on what to do

when both of these things are present at the same time? – I think that’s definitely a problem that you’d actually need to worry about if you want these to be secure end-to-end. The main reason we focused

on each individually is just because even if you

can only do one of these, right now the attacker

is generally winning. I think, I don’t see any strong obstacles to sort of combining these,

I think actually the thing that’s harder is dealing

with targeted attacks rather than these

population level attacks. I think there are a lot

of challenges there. – This is more like,

slightly disappointing thing about some of these experiments. Like you would, suppose I

didn’t care about security, but I just wanted to do

better machine learning. In a sense, the middle part

of your talk was saying, look, these algorithms

can do better estimation. And I feel like in these image examples, the fact that I’m getting

corrupted with this noise and, you take a human and give

this L infinity perturbation to images, you’re gonna

get the same digits. It feels like forcing the

system to be more stable. It should actually give us

better learning algorithms because we’re kind of

telling it more examples of what isn’t the right answer. And often when we write these things, it gives us a trade

off, and that’s sort of, like in these object

recognition, for human, it seems like it would

only help the system if you told it these are not stop signs. Typically, when you do this, it just destroys the performance. – Yeah. I mean, there has been some work, not by us, but by other people showing that at least in

some cases, you can actually use this as a regularizer

which improves things. But the issue is, you

need much more model capacity than you

would otherwise to actually robustly

fit these things. One place people have looked at this is, for instance, in

semi-supervised learning, where you can kind of

try to impose smoothness on your unsupervised

data to try to do things. I guess one other thing

I’d say is, aside from the regularization, if there are actually real

outliers in your data set, then, which I think often can happen

in a lot of applications, especially say genomic

data which often has very systematic

corruptions in it. Then you might hope to

gain something. We actually did get

slightly higher accuracy when we ran our method

than just naive training on one of our datasets, but it turned out it’s ’cause there was like

one outlier that was really messing things up. It would be an easy-to-detect outlier, so I don't think our method should get very much

credit for that. (inspiring music)

(audience applauding)


– In blockchain technology, data mutability is decentralized: the idea is to not have a central authority controlling the data. To identify data attacks, some verified data is used as a reference, because without a reference the attacks can shift the trusted data set and get wrong information into the database. How do these things work together? Are many data users verified in blockchain technology, or is it more like an efficient population sampling that can analyze more samples?

– What happens when attackers feed in a lot of pseudo-legitimate data? In the sense that if the data seem authentic, the radius of the confident data shrinks, and so false data can now be fed in at a distance closer than previously possible; then, if the user base grows over time, the radius increases again, now consuming the false data as legitimate. This could happen during New Year sales, where just before the sale attackers feed good data and decrease the range, then add false data outside the range. But as the sales go on, the range increases and the database takes the false data as legitimate, sort of like a bounce phenomenon. Is that a sensible question to ask?