Intro of Correlation “r” to measure linear strength


Hello! Hello, it is Mr. Tarrou, finishing up my notes for introducing scatter
plots. Now, as I have said, we are early in the course, and as you
are probably figuring out, Statistics is a course for which we do not have a whole lot
of background knowledge from other classes. So unfortunately, just like the 15 minutes
of notes you watched a minute ago, all of this again is just the concepts, to show you
what to pay attention to; we are still not actually going to do any problems. So I apologize for that. But there is so much to throw out at you at
once. The next video will be about how to actually
create these graphs with your calculator. And um… actually we might have one more
graph in here that I am not going to be able to go over today either. But let’s try and get this done. Take a couple of minutes and copy this. Ok. So we are going to start talking about how we
are going to measure the strength and direction of a linear relationship. Scatter plots can have all
kinds of patterns in them, but since we are only in Introduction to Statistics we can
only deal with the linear patterns. This video is all about the definition and the
properties of correlation, which is how we are going to measure that linear relationship; we are
not going to see how to calculate it in this video. So if you need to know that, skip to another
video where you see a picture of a calculator. Correlation ‘r’ (that is the standard
letter for correlation, at least in my textbook) measures the strength and direction of a linear
relationship. Again, it has to be linear. Let’s take a look at the formula before we
look at these three bullets here. The formula for ‘r’, which your calculator
or computer will do for you, is r equals 1/(n-1) times the summation of the products of (x sub
i minus the mean of x) over s sub x and (y sub i minus the mean of y) over s sub y, where s sub x and s sub y are the standard
deviations of x and y:

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

These little pieces, if you have been following
along with me, or hopefully working in your own stats book, look like z score calculations. They look like standardized test statistics
and indeed they are. So we are going to convert all of the measures
of x and y into standard deviations away from the mean. So keeping that in mind, you are standardizing
the x and y values, so r has no unit of measure. It is not in inches, or feet, or miles, or
grams. It has no unit of measure. It has been standardized out with the… well,
what you know right now as the z score process. You will learn another standardized score called a t-score
much later. Changing the units of measure does not affect
r. Well, if r does not have a unit of measure,
then say you want to change the x axis from… I don’t know… for some reason inches to
feet. If you want to change that unit of measure,
it is not going to affect r because r does not have a unit of measure. So if you convert the unit of measure on one axis or the other,
it is not going to affect r. And it is also not affected by flipping the
x and the y values. In other words, if for some reason I felt
like whatever was on the x axis I now want to be on the y axis, that is not going to change
the strength measurement for the linear relationship. It is not going to change r. Now on another screen you are going to see
that r ranges from -1 to 1. So that might look like if you have a scattering
of points like this, where they are very widely spread out, but maybe see a small downward
trend to those points that might have a correlation value of say negative .2. Whereas if I eliminate some of the randomness
of these points, but yet still keep a downward slope, that might give you… Excuse me, that should have been -.2. Negative one to positive one for r. So if I narrow up the band of that downward
trend, maybe that would make it an r value of say negative point eight. If my pattern of points is positively associated,
and I kept kind of a tight pattern here maybe instead of -.8 it is positive .8. Now positive .8, this is not a slope. So whether my line looks like this, or like
this, or even like this if I keep the pattern of these points that gave me this r value
of .8 the same but somehow could rotate their slope, that would not affect r. r is not slope. It is the strength and direction of a linear relationship. Now if I were to rotate these points though
and make the pattern completely horizontal, whether there was a band of points or even
if they formed a perfectly horizontal line, if I give you a pattern where one variable is
changing and the other variable is not, then there is no relationship. That would yield you an r value of zero. Now your textbook will have a lot of examples
of what a scatter plot looks like and the correlation value that goes along with it
so you can estimate it. As far as actually calculating correlation,
if you are using a graphing calculator you will need to make your regression line which
[laughing] you have not heard me talk about yet. So this is all about the definition and the
properties of correlation. We are not going to calculate, because we
are not going to do a calculator lab and make a regression line which I have not even introduced
yet. Moving on! WHOOO! I have a lot to talk about. So…. [nanananana] Here are some other properties
of correlation. As I said, horizontal and vertical patterns
will yield a correlation value of zero, even if that pattern is perfectly linear. Now I am talking perfectly horizontal or perfectly
vertical. One variable is changing and the other is
not. (Strictly speaking, if one variable does not change at all, its standard deviation is zero, so the formula divides by zero and r is undefined rather than exactly zero.) r takes on values between negative one and
positive one. If the r value is exactly 1 or -1, that
means that the points follow a perfectly straight linear pattern. It is not about the slope, it is about the
strength of that linear relationship. r is not resistant… If you back up a little bit, you will see
that the calculation of r uses the average, and again, any calculation that uses the average is
not resistant. Remember, a measurement that is not resistant is strongly
affected by outliers. And correlation requires both variables to
be quantitative. Well as I talked about in the last video,
scatter plots… this is a measurement that you take on scatter plots… are for quantitative
data. Both the x and y axis have to be quantitative,
so of course the same has to be true when you are working with a math formula and you
are calculating r or correlation. Alright. Trying not to use all my 15 minutes. BAM! Or not run out
of time. So… [nanananana] Ok. Hopefully you go back and you pause that video
so you can copy the notes. Now one thing about finding correlation, your
calculator has no idea what those numbers represent when you put them in there… or
the computer. So if you give a calculator or computer some
numbers, it is going to run the calculation whether or not it is appropriate. You can have…. let’s read what I wrote. Curved scatter plots can yield correlation
values close to one or negative one, but since the calculation is only good for linear patterns
these results are misleading. Again, I am talking about curved scatter plots. We cannot find the strength of curved relationships with r. This is only intro to Statistics. You will have some questions in your textbook
that maybe have a linear pattern, and you have maybe another graph that is… let’s
say… curved somehow, or you may have a pattern that is somewhat linear but has an outlier
somewhere. Well, all of these may, and there will be
a question in your book somewhere, where all of these graphs give the same r value. Maybe this r value is .7. Ok. Well this is not linear, this is not linear,
and this is linear. I will show you this when we do a calculator
lab. But it is possible for very weird looking graphs
to all give you the exact same correlation. And yet only one of them is linear. So when you make a… or when you want to
talk about correlation, the r value, as the strength of a linear relationship, you
must make a residual plot. Am I going to show you that today? Unfortunately no! I am trying to just give you an outline of
concepts you need to know. But the residual plot, at least in structure,
is the original x values still on the x axis and the residuals on the y axis. The residuals of the y’s. And when you make a residual plot, I will
go more in-depth later, but when you make a residual plot and you want to validate linearity
you do not want to see a pattern in that residual plot. Because what a residual plot will do when
you see it being made, it magnifies any curvature that is in the graph. So if you try to run a straight line through
a pattern that is curved it is going to magnify that curvature. So you might have a scatter plot that looks
straight, but when you make that residual plot it will highlight that curve. So residual plots will verify that data is
linear, and again, like we talked about with Normality… and Normality… and Normality…
it might be a couple of videos before I define that for you. Actually it will be the next one where we
do a calculator lab. The mean and standard deviation of both the x and y variables
should be included in your description of a scatter plot, not just the correlation. That is an incomplete description. You should not just say that a scatter plot
has a correlation of .7 and move on. Like, what else is there? Where does it start and end? What is the slope? Is it curved or straight? Are there outliers? So don’t just think that r is the end-all
be-all of describing a linear scatter plot. Alright. One more little page. [nanananana] And if you take correlation and
you square it, you get something called the coefficient of determination. Please do not change the wording of this definition. Changing it can change the meaning, or…
you can end up with complete nonsense very, very easily. So if I take the value of r and I square it,
I get something called r^2 or the Coefficient of Determination. It is the percent of variation in y that is
explained by the least squares regression line of y on x. What the heck does that mean? How are you supposed to interpret that? I have given you a little made up scatter
plot here. I did not put a vertical scale. I apologize, but it is just for the definition
of r squared or this coefficient of determination. So supposedly I tracked speeds between
30 and 70 miles per hour and compared them to the miles per gallon I was getting in a vehicle
or a collection of vehicles. We get this downward trend, which makes sense. The faster you drive, the more wind resistance
and such, and your mileage will go down. Well, let’s say I have collected this data,
I have shown that it is linear, which again I will show you in a calculator lab video
a little later, and we get an r value of -.8 (negative, because the trend is downward), and the r squared value then would be .64. (-.8) squared is .64. This literally is squaring. It is not just a little footnote or notation. How are you supposed to interpret that .64
based on this definition? Well, your summary statement would be, “64
percent of the variation in miles per gallon… the variation in y… is explained by your
speed between 30 and 70 miles per hour.” 64 percent of the variation in y is explained
by the speed you are traveling between 30 and 70 mph. What could the other 36% be? I don’t know. The weight of the car, the air in the tires,
the speed of the wind that particular time you were driving within that speed. That is not really anything that we have to
come up with or explain. But we do have to know the definition of r^2,
we do have to make a good summary statement based on that definition. Please do not change the wording. Just use it as is, and there you go. The next time you hear me talk about a regression
line, or scatter plots, you will be drawing regression lines through those but we need
another set of notes for that. Then a calculator lab to finally figure out
how to get all of these numbers that we want to use later on. BAM! Have a great day:)
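
The formula and the properties in these notes can be checked numerically. What follows is a minimal Python sketch, not the calculator method used in the videos. The height and weight numbers and the function name correlation are made up for illustration. It computes r as the sum of the z score products divided by n-1, then shows that changing units, swapping x and y, or making the line steeper leaves r alone, while reversing the direction of the trend only flips the sign.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Return r = 1/(n-1) * sum(z_x * z_y), using sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # stdev divides by n-1, like s_x and s_y in the notes
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)


# Hypothetical data, invented for this example only.
heights_in = [60, 62, 65, 68, 70, 72]        # x in inches
weights_lb = [115, 120, 140, 155, 160, 180]  # y in pounds

r = correlation(heights_in, weights_lb)

# Changing the unit of measure (inches to feet) does not change r.
r_feet = correlation([h / 12 for h in heights_in], weights_lb)

# Swapping which variable is x and which is y does not change r.
r_swapped = correlation(weights_lb, heights_in)

# Tripling every y value makes the line steeper, but r is not slope.
r_steeper = correlation(heights_in, [3 * w for w in weights_lb])

# Reversing the direction of the trend flips only the sign of r.
r_downward = correlation(heights_in, [-w for w in weights_lb])

print(f"original:        {r:.4f}")
print(f"inches to feet:  {r_feet:.4f}")
print(f"x and y swapped: {r_swapped:.4f}")
print(f"steeper slope:   {r_steeper:.4f}")
print(f"downward trend:  {r_downward:.4f}")
```

The first four printed values agree, and the last is the same number with a negative sign, which is the point of saying that r measures strength and direction, not slope.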

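The cautions in the notes can be sketched the same way: r^2 is literally r squared, a curved pattern can still give an r close to 1, and r is not resistant to outliers. This is a hedged illustration with invented numbers and the same z score formula, with correlation as an assumed helper name. It does not replace the residual plot that the notes say you must make to check linearity.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Return r as the sum of z score products divided by n-1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


# r^2 really is just r squared, so the sign of r disappears.
r_speed_mpg = -0.8                      # downward trend, so r is negative
r_squared = round(r_speed_mpg ** 2, 2)  # 0.64

# A clearly curved pattern (y = x^2) still produces an r close to 1,
# which is why a large r alone does not prove the pattern is linear.
xs_curved = list(range(1, 11))
ys_curved = [x ** 2 for x in xs_curved]
r_curved = correlation(xs_curved, ys_curved)

# r is not resistant: one outlier can change it a lot.
# These points are hypothetical, invented for this example only.
xs_clean = [1, 2, 3, 4, 5, 6, 7, 8]
ys_clean = [2, 4, 5, 4, 6, 7, 8, 9]
r_clean = correlation(xs_clean, ys_clean)
r_outlier = correlation(xs_clean + [9], ys_clean + [0])  # add the point (9, 0)

print(f"r squared for r = -0.8: {r_squared}")
print(f"curved data (y = x^2):  {r_curved:.3f}")
print(f"without the outlier:    {r_clean:.3f}")
print(f"with the outlier:       {r_outlier:.3f}")
```
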
21 Replies to “Intro of Correlation “r” to measure linear strength”

  1. Your videos help me so much, it makes stats seem so easy, yet still gives me a solid foundation for understanding the material, your videos are honestly the reason I have an A in stats right now haha, I just work out of the book and watch your videos

  2. Hi, in the last part of video with speed and MPG scatter plot, shouldn't r = – .8 since it is a downward slope? Not sure, just asking?

  3. THANK YOU!!! Yes I forgot the negative sign and no one else let me know for nine months. You also made me realize how bad the audio was on my old camera…YIKES! I am going to have to re-record these someday. Right now I am working on Calculus videos and will go back and do Geometry when the next school year starts. Thank you again…sorry for the mistake.

  4. You're welcome, all the way to the Philippines!…and thanks for helping make the Philippines one of my biggest viewer bases:) Keep watching, liking and spreading the word about my channel:D

  5. I just want to let you know that as a 10th grader whose teacher comes into class on some days and simply says, "Eh, I don't feel like teaching. You guys can have a study hall," you are a savior. I have my AP Statistics test this Friday and I was so frustrated and stressed to the point that my nails had become nubs! Thank you so, so, so, so much! You can't comprehend how much you're helping students everywhere!

  6. Well I'm glad my lessons were so helpful and you can start letting your nails grow back now that you found my channel and subscribed:) As a full-time teacher, I can tell you some days are harder than others but you didn't let that stop you from taking steps to seek out extra help. That's the kind of self-motivation that will get you through school and life…keep it up! So BAM!!!…go into your test with confidence on Friday and you'll do great:) THANKS for sharing your kind words.

  7. You're an amazing teacher. I've done so bad on my first two tests, was confused by the material. Reading the book wasn't enough for me. But after watching these videos I feel confident about the upcoming test for chapter three. 🙂 Thank you so much.

  8. THANKS:)…and good luck on your upcoming tests and thanks for subscribing!
    Please continue to spread the word about my channel and help me grow :D

  9. Stats has been going against everything I've learned from the past math classes 🙁 Well not everything..but it's really confusing. For example in geometry/trig/algebra the question would tell you to round to the nearest whole#, tenth, hundredth, or whichever. Statistics is not like that… Is it okay if we don't round at all? Like say my calculator gives me r=0.84653. Would it be okay if I wrote down 0.84? I've never come across a question in stats that says "round to the nearest tenth" so feel a bit lost and exposed since I'm not sure what to write down. 🙁 What would you suggest?

  10. I have to say you are wonderful, my class is online and well when I read it I did not understand much at all until I came to your channel. I most definitely will let others know about you. Again thanks.

  11. Hi Prof Rob. Just wondering if values are quite tight can you still do correlations on it? I also noticed that you have said this measures the linear relationship is there one for a non-linear or when you have a curved scatter plot? Thanks for the video.

  12. THANK YOU!!! (no I am not shouting at you) 15 minutes of your video is more educational than a week in a Stat class of university. Thanks again!
