Hello! Hello, it is Mr. Tarrou. Finishing up my notes for introducing scatter

plots. Now we are early, as I have said, and you

are probably figuring out early in the course of Statistics. A course of which there is not a whole lot

of background knowledge we have from other classes. So unfortunately if you watched 15 minutes

of my notes a minute ago, and all of this again is just the concepts and to show you

what to pay attention to, but we are still not actually going to do any problems. So I apologize for that. But there is so much to throw out at you at

once. The next video will be about how to actually

create these graphs with your calculator. And um… actually we might have one more

graph in here I am not going to be able to go over today either. But let’s try and get this done. Take a couple of minutes and copy this. Ok. So we are going to start talking about we

are going to measure the strength and direction of a linear relationship. All kinds of…. Scatter plots can have all

kinds of patterns in them, but since we are only in Introduction to Statistics we can

only deal with the linear patterns. And how we are going to measure that linear

relationship, and this is all about the definition and the properties of correlation, we are

not going to see how to make that in this video. So if you need to know that, skip to another

video where you see a picture of a calculator. Correlation ‘r’ measures, that is a standard

letter for correlation.. at least in my textbook, measures the strength and direction of a linear

relationship. Again, it has to be linear. Let’s take a look at the formula before we

look at these three bullets here. The formula for ‘r’, which your calculator or computer will do for you, is going to be r equals 1 over (n - 1) times the summation of (x sub i minus the mean of x) over s sub x, times (y sub i minus the mean of y) divided by the standard deviation of y. In symbols: $r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$. These little pieces if you have been following

along with me, or hopefully working in your own stats book, look like z score calculations. They look like standardized test statistics

and indeed they are. So we are going to convert all of the measures

of x and y into standard deviations away from the mean. So keeping that in mind, you are standardizing

the x and y values, so r has no unit of measure. It is not in inches, or feet, or miles, or

grams. It has no unit of measure. It has been standardized out with the… well

you know it right now as the z-score process. You will learn another test called a t-score

much later. Changing the units of measure does not affect

r. Well if r does not have a unit of measure,

then maybe say if you want to change the x axis from… I don’t know… for some reason inches to

feet. If you want to change that unit of measure,

it is not going to affect r because r does not have a unit of measure. So if you convert one or the other axis unit

of measure it is not going to affect r. And it is also not affected by flipping the

x and the y values. In other words, if for some reason I felt

like whatever was on the x axis I now want to be on the y axis that is not going to change

the strength measurement for the linear relationship. It is not going to change r. Now on another screen you are going to see

that r ranges from -1 to 1. So that might look like if you have a scattering

of points like this, where they are very widely spread out, but maybe see a small downward

trend to those points that might have a correlation value of say negative .2. Whereas if I eliminate some of the randomness

of these points, but yet still keep a downward slope, that might give you… Excuse me, that should have been -.2. Negative one to positive one for r. So if I narrow up the band of that downward

trend, maybe that would make it an r value of say negative point eight. If my pattern of points is positively associated,

and I kept kind of a tight pattern here maybe instead of -.8 it is positive .8. Now positive .8, this is not a slope. So whether my line looks like this, or like

this, or even like this if I keep the pattern of these points that gave me this r value

of .8 the same but somehow could rotate their slope, that would not affect r. r is not slope. It is strength and direction of a linear relationship. Now if I were to rotate these points though

and make the pattern completely horizontal, whether there was a band of points or even

if they were a perfectly horizontal line, if I give you a pattern where one variable is

changing and the other variable is not then there is no relationship. That would yield you an r value of zero. Now your textbook will have a lot of examples

of what a scatter plot looks like and the correlation value that goes along with it

so you can estimate it. As far as actually calculating correlation,

if you are using a graphing calculator you will need to make your regression line which

[laughing] you have not heard me talk about yet. So this is all about the definition and the

properties of correlation. We are not going to calculate, because we

are not going to do a calculator lab and make a regression line which I have not even introduced

yet. Moving on! WHOOO! I have a lot to talk about. So…. [nanananana] Here are some other properties

of correlation. What I said, horizontal and vertical patterns

will yield a correlation value of zero, even if that pattern is perfectly linear. Now I am talking perfectly horizontal or perfectly

vertical. One variable is changing and the other is

not. r takes on values between negative one and

positive one. If the r values are exactly 1 or -1, that

means that the points follow an exactly straight linear pattern. It is not about the slope, it is about the

strength of that linear relationship. r is not resistant… If you back up a little bit, you will see

that the calculation of r uses the average. Again, any calculation that uses an average is

not resistant. Remember that is a measurement that is strongly

affected by outliers. And correlation requires both variables to

be quantitative. Well as I talked about in the last video,

scatter plots… this is a measurement that you take on scatter plots… are for quantitative

data. Both the x and y axis have to be quantitative,

so of course the same has to be true when you are working with a math formula and you

are calculating r or correlation. Alright. Trying not to use all my 15 minutes. BAM! Or not run out

of time. So… [nanananana] Ok. Hopefully you go back and you pause that video

so you can copy the notes. Now one thing about finding correlation, your

calculator has no idea what those numbers represent when you put them in there… or

the computer. So if you give a calculator or computer some

numbers, it is going to calculate those numbers whether or not it should or if it is appropriate. You can have…. let’s read what I wrote. Curved scatter plots can yield correlation

values close to one or negative one, but since the calculation is only good for linear patterns

these results are misleading. Again, I am talking about curved scatter plots. We cannot find strength of curved relationships. This is only intro to Statistics. You will have some questions in your textbook

that maybe have a linear pattern, and you have maybe another graph that is… let’s

say… curved somehow, or you may have a pattern that is somewhat linear but has an outlier

somewhere. Well, all of these may, and there will be

a question in your book somewhere, where all of these graphs give the same r value. Maybe this r value is .7. Ok. Well this is not linear, this is not linear,

and this is linear. I will show you this when we do a calculator

lab. But it is possible for very weird looking graphs

to all give you the exact same correlation. But yet, only one of them is linear. So when you make a… or when you want to

talk about correlation and the r value of the strength of a linear relationship, you

must make a residual plot. Am I going to show you that today? Unfortunately no! I am trying to just give you an outline of

concepts you need to know. But the residual plot, at least in structure,

is the original x values still on the x axis and the residuals on the y axis. The residuals of the y’s. And when you make a residual plot, I will

go more in-depth later, but when you make a residual plot and you want to validate linearity

you do not want to see a pattern in that residual plot. Because what a residual plot will do when

you see it being made, it magnifies any curvature that is in the graph. So if you try to run a straight line through

a pattern that is curved it is going to magnify that curvature. So you might have a scatter plot that looks

straight, but when you make that residual plot it will highlight that curve. So residual plots will verify that data is

linear and again like we talked about with Normality… and Normality.. and Normality…

it might be a couple of videos before I define that for you. Actually it will be the next one where we

do a calculator lab. Mean and standard deviation of x and y axis

should be included in your description of a scatter plot. Not just correlation. That is an incomplete description. You should not just say that a scatter plot

has a correlation of .7 and move on. Like what else is there? Where does it start and end? What is the slope? Is it curved or straight? Are there outliers? So don’t just think that r is the end-all

be-all to describing a linear scatter plot. Alright. One more little page. [nanananana] And if you take correlation and

you square it, you get something called the coefficient of determination. Please do not change the wording of this definition. It can change the meaning of it, or

you can end up with complete nonsense very, very easily. So if I take the value of r and I square it,

I get something called r^2 or the Coefficient of Determination. It is the percent of variation in y that is

explained by the least squares regression line of y on x. What the heck does that mean? How are you supposed to interpret that? I have given you a little made up scatter

plot here. I did not put a vertical scale. I apologize, but it is just for the definition

of r squared or this coefficient of determination. So I tracked mileage supposedly between the

speeds of 30 and 70 and compared that to the miles per gallon I was getting in a vehicle

or a collection of vehicles. We get this downward trend which makes sense. The faster you drive the more wind resistance

and such and your mileage will go down. Well let’s say I have collected this data,

I have shown that it is linear which again I will show you in a calculator lab video

a little later, and we get an r value of -.8, negative because the trend is downward, and an r squared value then would be .64. -.8 squared is .64. This literally is squaring. It is not just a little footnote or notation. How are you supposed to interpret that .64

based on this definition? Well, your summary statement would be, “64

percent of the variation in miles per gallon… the variation in y… is explained by your

speed between 30 and 70 miles per hour.” 64 percent of variation in y is explained

by the speed you are traveling between 30 and 70 mph. What could the other 36% be? I don’t know. The weight of the car, the air in the tires,

the speed of the wind that particular time you were driving within that speed. That is not really anything that we have to

come up with or explain. But we do have to know the definition of r^2,

we do have to make a good summary statement based on that definition. Please do not change the wording. Just use it as is, and there you go. The next time you hear me talk about a regression

line, or scatter plots, you will be drawing regression lines through those but we need

another set of notes for that. Then a calculator lab to finally figure out how

to get all of these numbers that we want to use later on. BAM! Have a great day:)
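
The properties of r described in these notes can be checked with a few lines of code. The video itself leaves the calculation to a later calculator lab, so what follows is only a minimal sketch in Python, not the method used in the lesson. The data values and the names `correlation`, `speed_mph`, and `mpg` are made up for illustration. It computes r from the z-score formula and then checks that r does not change when the units change, when x and y are swapped, or when the line is made steeper.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divisor n - 1)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


# Hypothetical data (not from the video): speed and fuel economy.
speed_mph = [30, 40, 50, 60, 70]
mpg = [34, 33, 29, 28, 24]

r = correlation(speed_mph, mpg)

# Changing units (mph to km/h) should leave r unchanged.
speed_kmh = [s * 1.609344 for s in speed_mph]
r_kmh = correlation(speed_kmh, mpg)

# Swapping which variable is x and which is y should leave r unchanged.
r_swapped = correlation(mpg, speed_mph)

# Doubling every y value makes the trend twice as steep; r is not slope.
mpg_steeper = [2 * m for m in mpg]
r_steeper = correlation(speed_mph, mpg_steeper)

# Squaring r gives the coefficient of determination.
r_squared = r ** 2

print(f"r = {r:.4f}, km/h: {r_kmh:.4f}, swapped: {r_swapped:.4f}, steeper: {r_steeper:.4f}")
print(f"r squared = {r_squared:.4f}")
```

All four r values should print the same negative number, because the made-up trend is downward, and r squared is simply that number multiplied by itself.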

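The warning about curved scatter plots can also be sketched in code. The video saves residual plots for a later lesson, so this is an illustration under assumptions, not the calculator procedure: the data are an invented curve (y equals x squared), and the helper names `fit_line` and `residuals_from_line` are hypothetical. The sketch shows a clearly curved pattern producing an r close to 1, while the residuals from the least squares line show a pattern (positive, then negative, then positive) that gives the curve away.

```python
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least squares line of y on x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


def residuals_from_line(xs, ys):
    """Residual = observed y minus the y predicted by the fitted line."""
    slope, intercept, _ = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Invented curved data: y = x squared, which is not a linear pattern.
xs = list(range(1, 11))
curved = [x ** 2 for x in xs]

_, _, r_curved = fit_line(xs, curved)
res = residuals_from_line(xs, curved)

print(f"r for the curved data = {r_curved:.4f}")
for x, e in zip(xs, res):
    # A residual plot would put x on the horizontal axis and e on the vertical axis.
    print(f"x = {x:2d}  residual = {e:7.2f}")
```

The residuals start positive, dip negative in the middle, and end positive again. That U shape is the kind of pattern you do not want to see when you are trying to validate linearity, even though r by itself looks strong.
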
@bubblegum16ful

Thank you very much:)

Your videos help me so much, it makes stats seem so easy, yet still gives me a solid foundation for understanding the material, your videos are honestly the reason I have an A in stats right now haha, I just work out of the book and watch your videos

@Free411 You made my night. Thank you very much. Keep up the good work…and that A:)

Thank you:)

Hi, in the last part of video with speed and MPG scatter plot, shouldn't r = – .8 since it is a downward slope? Not sure, just asking?

THANK YOU!!! Yes I forgot the negative sign and no one else let me know for nine months. You also made me realize how bad the audio was on my old camera…YIKES! I am going to have to re-record these someday. Right now I am working on Calculus videos and will go back and do Geometry when the next school year starts. Thank you again…sorry for the mistake.

how i wish that my teacher is as enthusiastic as you… thanks a lot!

You're welcome, all the way to the Philippines!…and thanks for helping make the Philippines one of my biggest viewer bases:) Keep watching, liking and spreading the word about my channel:D

I just want to let you know that as a 10th grader whose teacher comes into class on some days, and simply says "Eh, I don't feel like teaching. You guys can have a study hall," you are a savior. I have my AP Statistics test this Friday and I was so frustrated and stressed to the point that my nails had become nubs! Thank you so, so, so, so much! You can't comprehend how much you're helping students everywhere!

Well I'm glad my lessons were so helpful and you can start letting your nails grow back now that you found my channel and subscribed:) As a full-time teacher, I can tell you some days are harder than others but you didn't let that stop you from taking steps to seek out extra help. That's the kind of self-motivation that will get you through school and life…keep it up! So BAM!!!…go into your test with confidence on Friday and you'll do great:) THANKS for sharing your kind words.

You're an amazing teacher. I've done so bad on my first two tests, was confused by the material. Reading the book wasn't enough for me. But after watching these videos I feel confident about the upcoming test for chapter three. 🙂 Thank you so much.

THANKS:)…and good luck on your upcoming tests and thanks for subscribing!

Please continue to spread the word about my channel and help me grow :D

Stats has been going against everything I've learned from the past math classes 🙁 Well not everything..but it's really confusing. For example in geometry/trig/algebra the question would tell you to round to the nearest whole#, tenth, hundredth, or whichever. Statistics is not like that… Is it okay if we don't round at all? Like say my calculator gives me r = 0.84653. Would it be okay if I wrote down 0.84? I've never come across a question in stats that says "round to the nearest tenth" so feel a bit lost and exposed since I'm not sure what to write down. 🙁 What would you suggest?

I have a question: how do you specify whether a point is an outlier or an influential point??

I have to say you are wonderful, my class is online and well when I read it I did not understand much until I came to your channel. I most definitely will let others know about you. Again thanks.

Fantastic, as always.

Hi Prof Rob. Just wondering if values are quite tight can you still do correlations on it? I also noticed that you have said this measures the linear relationship is there one for a non-linear or when you have a curved scatter plot? Thanks for the video.

THANK YOU!!! (no I am not shouting at you) 15 minutes of your video is more educational than a week in a Stat class of university. Thanks again!

I love this teacher. He helped me get an 81. Thanks!

Sharing with colleagues and liking. Thank you for such great videos!

Subbed