Hi! This video is a 10-minute introduction to

the basics of robust statistics. We will start by looking at estimating the

location and correlation parameters to see how robust estimators behave relative to their

classical non-robust counterparts. I will then discuss some of my own research

into estimating a sparse precision matrix in the presence of cellwise contamination. At the end are some references you can use

to find out more about the various techniques presented. So, let’s get started. The aim of robust statistics is to develop

estimators that model the bulk of the data and are not unduly influenced by outlying

observations or observations that are not representative of the true underlying data

generating process. To explore this idea we consider the simple

case of location estimation. We will look at three estimators, the mean, the median

and the Hodges-Lehmann estimator of location which is just the median of the pairwise means. In this example we have 10 observations drawn

from a uniform distribution over the range 0 to 10. All three estimates of location start

off close to the true parameter value, which is 5 in this case. What we will do is observe

how they behave when we artificially corrupt some of the observations. We start by taking the largest observation

and moving it to the right. When we do this, the mean also starts to increase, which is

what you would expect from a non-robust estimator. In contrast, the Hodges-Lehmann estimator

and the median remain unchanged. It is this resilience to contamination that makes them

what we call robust estimators. If we take the two largest observations and

move them to the right, the median stays the same; the Hodges-Lehmann estimate jumps from

5 to 5.5 but then stays constant, and the mean reacts as before by increasing as the contaminated

observations increase. We observe similar behaviour when the three

largest observations are contaminated. When the four largest observations are contaminated

the Hodges-Lehmann estimate now behaves just like the mean, in that both estimators are

in breakdown, that is to say they are no longer representative of the bulk of the data. And when five observations are contaminated,

even the median is no longer sure which observations represent the original data and which are

the contaminated observations. So it also returns an estimate that is no longer representative

of the location parameter of the original data generating process. We can observe similar behaviour in multivariate

data sets. In this example we are looking to estimate the correlation between variables. In the lower half of the matrix we have scatter

plots of 30 observations. For example this scatter plot shows variable 2 on the y-axis

and variable 1 on the x-axis. Whereas this one has variable 2 on the x-axis and variable

3 on the y-axis. In the upper half of the matrix we have the

true parameter values that I used to generate the data, the classical correlation estimates

and the robust correlation estimates between each of the pairs of variables. The robust estimator that I have used is the

MCD estimator and you can find links to further details at the end of the presentation. We are going to observe what happens to our

estimates when we take three observations and move them away from the main data cloud.
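
This kind of experiment is easy to reproduce. Below is a minimal sketch, assuming scikit-learn is available; it uses MinCovDet for the MCD estimator and simulated data in place of the example in the video, with three points dragged away from the cloud:

```python
# Classical vs robust (MCD) correlation when three of thirty points are
# dragged away from the main data cloud. Simulated stand-in for the video's
# example; scikit-learn's MinCovDet is assumed to be available.
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.8], [0.8, 1.0]])     # true correlation is 0.8
X = rng.multivariate_normal([0.0, 0.0], true_cov, size=30)
X[:3] += np.array([8.0, -8.0])                    # drag three points away

def corr_from_cov(cov):
    # Convert a covariance matrix to a correlation matrix
    d = np.sqrt(np.diag(cov))
    return cov / np.outer(d, d)

classical = corr_from_cov(np.cov(X, rowvar=False))[0, 1]
robust = corr_from_cov(MinCovDet(random_state=0).fit(X).covariance_)[0, 1]
print(f"classical: {classical:.2f}, robust (MCD): {robust:.2f}")
# The classical estimate is dragged negative by the three outliers; the MCD
# estimate stays near the true value because it models the core of the data.
```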

Let’s focus on the relationship between variable 1 and variable 2. As the observations move

further away from the data cloud, the classical correlation estimator is decreasing and eventually

it becomes zero suggesting that there is no correlation between variable 1 and variable

2. As the contamination moves further away, the classical estimate is now giving a negative

value suggesting a negative relationship between variable 1 and variable 2. In contrast, the robust estimates have all

stayed reasonably close to the true parameter values as they are modelling the core of the

data and are not as influenced by these outlying values. Let’s look at a real example. Some work I

do for the industry body Meat and Livestock Australia is concerned with predicting the

eating quality of beef. They run a large number of consumer tests,

where consumers are asked to rate pieces of meat on their tenderness, juiciness, flavour

and give an overall score. The aim is to develop a predictive model that

rates each cut of meat as 3, 4 or 5 star based on what we know about consumer preferences. Here’s a scatter plot of ratings for almost

3000 pieces of meat. There’s obvious structure in the data set,

with tenderness, juiciness, flavour and overall being highly positively related. However,

there is also a lot of noise, with some consumers giving very high scores for some variables,

and low scores for other variables. For example up here you have some observations

where consumers have given very high tenderness scores but very low overall scores. If you calculate classical covariances, you

can see that there are reasonably strong positive linear relationships between the variables.

However, robust techniques such as the Minimum Covariance Determinant can be used to highlight

the tightest core of the data, which represents the relationship between the variables for

the majority of consumers. This suggests that the true underlying correlations between

variables are much stronger than the classical method would suggest. For example, the relationship between flavour

and overall goes from 0.86 to 0.99 when you restrict attention to the tightest half

of the data. Something that I’ve looked at in my research

is estimation in the presence of cellwise contamination. Traditional robust techniques, such as the

Minimum Covariance Determinant we used earlier, assume that contamination happens within the

rows of a data set. Furthermore, even the most robust estimators

can only cope with at most 50% of the rows being contaminated before they no longer work

effectively (as we saw earlier with the median breaking down when there were 5 contaminated

and 5 uncontaminated observations). This assumption of row-wise contamination

may not be appropriate for large data sets. What we have here is a heat map of a data

matrix. The white rectangles represent corrupted cells. In this situation with a small amount

of cellwise corruption that affects less than half of the rows, classical robust methods

will still perform adequately. However, as the proportion of scattered contamination

increases, or the contamination is allowed to spread over all the rows, then you might

end up in a situation where all observations have at least one variable that is contaminated. There is still a lot of “good” data in this

data set; the challenge is extracting information about the core of the data without your estimates

being overwhelmed by the contaminating cells. In particular, I have looked at estimating

precision matrices in the presence of cellwise contamination. A precision matrix is just

the inverse of the covariance matrix. However, in large data sets you often also want to

assume that the precision matrix is sparse, that is, there are a number of zero elements

in the precision matrix. This is particularly useful when modelling Gaussian Markov random

fields where the zeros correspond to conditional independence between the variables. We’ll briefly apply these ideas to a real

data set to finish off. We have 452 stocks or shares over 1258 trading days. We’ll observe

the closing prices, which we convert to daily return series, and we want to estimate a sparse

precision matrix, to identify clusters of stocks. That is, groups of stocks that behave

similarly. To do this we used a robust covariance matrix

as an input to the graphical lasso, details of which can be found in the references at

the end of the presentation. If we look at the return series for the first

6 stocks in the data set, we can see that there are a number of unusual observations

scattered throughout the data. For example you have this negative return for 3M Co, another

for Adobe, and AMD also has a number of unusual observations. This suggests that perhaps there

is a need for robust estimators when analysing this data set. We’re going to visualise the results as a

network of stocks. If there is a non-zero entry in the estimated precision matrix between

two stocks, then they will both appear in the graph with a line linking them. Furthermore,

I have coloured the stocks by industry. We can see here in the classical approach

that we have some clusters of stocks. However, if we use the robust approach, these clusters

are much more densely populated. So it has identified linkages between more stocks than

the classical approach did, reflecting the fact that the robust method is not as influenced

by those unusual outlying observations. So we’ve got a financials cluster over here.
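
The edge rule just described can be sketched as follows. This is a simplified stand-in rather than the analysis from the talk: it assumes scikit-learn, uses MinCovDet in place of whichever robust covariance estimator was actually used, and runs on simulated data with two planted linkages instead of the stock returns:

```python
# Robust covariance -> graphical lasso -> edges of the network.
# Simulated data with two planted linkages; MinCovDet stands in for the
# robust covariance input (an assumption, not necessarily the talk's choice).
import numpy as np
from sklearn.covariance import MinCovDet, graphical_lasso

rng = np.random.default_rng(1)
n, p = 200, 5
X = rng.standard_normal((n, p))
X[:, 1] += 0.9 * X[:, 0]                          # link variables 0 and 1
X[:, 3] += 0.9 * X[:, 2]                          # link variables 2 and 3
X[rng.choice(n, size=10, replace=False)] += 10.0  # a few outlying rows

robust_cov = MinCovDet(random_state=0).fit(X).covariance_

# graphical_lasso returns (covariance, precision); alpha controls sparsity
_, precision = graphical_lasso(robust_cov, alpha=0.2)

# A non-zero off-diagonal precision entry means an edge in the network
edges = [(i, j) for i in range(p) for j in range(i + 1, p)
         if abs(precision[i, j]) > 1e-8]
print(edges)
```

With the stock data, each node is a stock and an edge list like this is what drives the network plots.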

We’ve got an information technology cluster down here. A utilities cluster and an energy

cluster. If we add some additional contamination, the

classical approach is no longer able to identify those clusters of stocks, it gives you just

this soup of linkages with no clear structure. The robust approach, on the other hand, gives

you essentially the same result as before the extra contamination was added. So you’ve still got an

information technology cluster. You’ve still got a utilities cluster

over here and a financials cluster down there. So what this tells you is that the robust

methods are modelling the core of the data and are relatively unaffected by unusual or

extreme observations. So, that’s been an extremely brief introduction

into the idea of robust statistics. The take-home message is that robust methods

are designed to model the core of the data without being unduly influenced by outlying

observations that are not representative of the true data generating process. If you’d like to know more about any of the

methods presented here you can check out these links. For the Hodges-Lehmann estimator, the

Minimum Covariance Determinant, this is a nice recent review paper, or the graphical

lasso. Here’s some of my work. There’s a paper that’s

to appear in Computational Statistics and Data Analysis on the robust estimation of

precision matrices in cellwise contamination. To understand that, it would perhaps help

to go back to an earlier paper on robust scale estimation. Or you could check out my PhD

thesis which is available at that link there. If you would like to get in touch, there are

my details and you can find a copy of these slides at this link.
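
Finally, if you would like to experiment with the location estimators from the start of the video yourself, here is a minimal sketch in Python. The data are simulated, not the original example, and the Hodges-Lehmann estimator is implemented here as the median of the Walsh averages, which is one common convention:

```python
# Sketch of the location demo: mean, median and Hodges-Lehmann estimates
# as the k largest of 10 observations are dragged to the right.
import itertools
import numpy as np

def hodges_lehmann(x):
    # Median of the pairwise means (Walsh averages, including self-pairs)
    pairs = [(a + b) / 2
             for a, b in itertools.combinations_with_replacement(x, 2)]
    return float(np.median(pairs))

rng = np.random.default_rng(2)
base = np.sort(rng.uniform(0, 10, size=10))   # true location is 5

for k in (0, 1, 3, 5):
    y = base.copy()
    y[len(y) - k:] += 100.0                   # contaminate the k largest
    print(k, round(float(np.mean(y)), 2), round(float(np.median(y)), 2),
          round(hodges_lehmann(y), 2))
# The mean moves as soon as one observation is contaminated; the median
# only breaks down once half the sample (k = 5) is contaminated.
```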
