3.1 Scatter Plots And Linear Correlation

Scatter plots are often used when the answers to the questions being investigated are not clear-cut. For example: Do pollution levels affect the ozone layer in the atmosphere? Is job performance related to high marks in high school? Questions like these rarely yield data with a perfect positive or negative correlation; instead, the data points are scattered. Two-variable statistics provide methods for detecting relationships between variables and for developing mathematical models of these relationships. The visual pattern in a graph or plot can often reveal the nature of the relationship between two variables.

In plotting data, you need to determine which of the two variables is the dependent (or response) variable, y, that is affected by the other variable, the independent (or explanatory) variable, x. Two variables have a linear correlation when changes in one are proportional to changes in the other. In a positive (or direct) linear correlation, y tends to increase at a constant rate as x increases; in a perfect positive correlation, every point lies exactly on a rising straight line. In a negative (or inverse) linear correlation, y tends to decrease at a constant rate as x increases; in a perfect negative correlation, every point lies exactly on a falling straight line.

Scatter plots are very similar to line graphs: both use horizontal and vertical axes to plot data points. However, scatter plots have a very specific purpose: they show how strongly one variable is related to another. The relationship between two variables is called their correlation. A scatter plot shows such relationships graphically, usually with the independent variable on the horizontal axis and the dependent variable on the vertical axis. The line of best fit is a straight line that passes as close as possible to all the points on the scatter plot. The stronger the correlation, the more closely the data points cluster around the line of best fit. For example:

These are examples of the five different kinds of correlation seen in scatter plots: high positive correlation, low positive correlation, low negative correlation, high negative correlation, and no correlation.

A perfect positive correlation is given the value of 1. A perfect negative correlation is given the value of -1. If there is absolutely no correlation present, the value given is 0. The closer the number is to 1 or -1, the stronger the correlation, or the stronger the relationship between the variables. The closer the number is to 0, the weaker the correlation. So data that correlate fairly well in a positive direction might have a value of 0.67, whereas data with a very weak negative correlation might have a value of -0.21. These values come from the correlation coefficient.

The correlation coefficient, a concept from statistics, is a measure of how well trends in the predicted values follow trends in past actual values. It is a measure of how well the predicted values from a forecast model "fit" the real-life data.
The correlation coefficient is a number between -1 and 1. If there is no relationship between the predicted values and the actual values, the correlation coefficient is 0 or very close to 0 (the predicted values are no better than random numbers). As the strength of the relationship between the predicted values and the actual values increases, so does the magnitude of the correlation coefficient. A perfect fit gives a coefficient of 1.0. Thus, for a forecast model, the closer the correlation coefficient is to 1, the better.
The closer the correlation coefficient is to one, the more the points will fall along a line stretching from the lower left to the upper right. The closer the correlation coefficient is to negative one, the more the points will fall along a line stretching from the upper left to the lower right.

The following diagram illustrates how the correlation coefficient, r, corresponds to the strength of a linear correlation.
| Negative Linear Correlation | Positive Linear Correlation |
| Strong | Moderate | Weak | Weak | Moderate | Strong |
| -1 to -0.67 | -0.67 to -0.33 | -0.33 to 0 | 0 to 0.33 | 0.33 to 0.67 | 0.67 to 1 |
Correlation Coefficient, r
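As a rough illustration of how the scale above could be applied, here is a minimal Python sketch. The function name `describe_correlation` is hypothetical, and the decision to count the boundary values 0.33 and 0.67 in the stronger category is an assumption, since the diagram does not say which side they belong to.

```python
def describe_correlation(r: float) -> str:
    """Label r using the cut-offs from the diagram.

    Assumption: boundary values (0.33, 0.67) count as the stronger category.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "no correlation"
    size = abs(r)
    if size >= 0.67:
        strength = "strong"
    elif size >= 0.33:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


print(describe_correlation(-0.21))  # weak negative
print(describe_correlation(0.8))    # strong positive
```

These labels are only a convention for describing r; the cut-offs are not sharp dividing lines.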
The equation for the correlation coefficient, in its algebraically manipulated form, is:

r = [n∑xy - (∑x)(∑y)] / √([n∑x² - (∑x)²][n∑y² - (∑y)²])

where n is the number of data points in the sample, x represents individual values of the variable X, and y represents individual values of the variable Y. (Note that ∑x² is the sum of the squares of all the individual values of X, while (∑x)² is the square of the sum of all the individual values.)
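The calculation can be sketched in Python as follows. This assumes the equation is the standard computational form of the correlation coefficient, which uses exactly the quantities defined above (n, ∑x, ∑y, ∑xy, ∑x², ∑y²). The function name `correlation_coefficient` and the small test data are made up for illustration.

```python
import math


def correlation_coefficient(xs, ys):
    """Compute r from paired data using the sums n, Σx, Σy, Σxy, Σx², Σy²."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two points")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)   # sum of the squares, Σx²
    sum_y2 = sum(y * y for y in ys)   # sum of the squares, Σy²

    numerator = n * sum_xy - sum_x * sum_y
    # sum_x ** 2 is the square of the sum, (Σx)², not the sum of squares
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when x or y does not vary")
    return numerator / denominator


print(correlation_coefficient([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(correlation_coefficient([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```

The two made-up data sets lie exactly on a rising and a falling straight line, so they give the perfect values 1 and -1.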

Example:

A company studied whether there was a relationship between its employees’ years of service and number of days absent. The data for eight randomly selected employees are shown below.

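| Employee | Years of Service (x) | Days Absent Last Year (y) | x² | y² | xy |
| Jim | 5 | 2 | 25 | 4 | 10 |
| Leah | 2 | 6 | 4 | 36 | 12 |
| Efraim | 7 | 3 | 49 | 9 | 21 |
| Dawn | 6 | 3 | 36 | 9 | 18 |
| Chris | 4 | 4 | 16 | 16 | 16 |
| Cheyenne | 8 | 0 | 64 | 0 | 0 |
| Karrie | 1 | 2 | 1 | 4 | 2 |
| Luke | 10 | 1 | 100 | 1 | 10 |
| Total | ∑x = 43 | ∑y = 21 | ∑x² = 295 | ∑y² = 79 | ∑xy = 89 |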
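
As a worked check, the totals in the last row can be substituted into the correlation coefficient with n = 8. This assumes the standard computational form of the coefficient, which uses exactly these totals:

```latex
\begin{align*}
r &= \frac{n\sum xy - (\sum x)(\sum y)}
          {\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} \\
  &= \frac{8(89) - (43)(21)}
          {\sqrt{\left[8(295) - 43^2\right]\left[8(79) - 21^2\right]}} \\
  &= \frac{712 - 903}{\sqrt{(2360 - 1849)(632 - 441)}} \\
  &= \frac{-191}{\sqrt{(511)(191)}} \approx -0.61
\end{align*}
```

A value of about -0.61 lies between -0.67 and -0.33, which suggests a moderate negative linear correlation: in this small sample, employees with more years of service tended to have fewer days absent.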

To see the examples, graphs, scatter plots, and equations, click "file" and download the "scater plots" file. Thank you.

Comment:
By: Osman Osman
You have spent a good effort on this ….. Well done
Still needs some visuals like graphics ..

Peace *__*
by: bridget
good work on the amount of information posted, very easy to
understand. could add some pictures/ graphs. =)

page revision: 12, last edited: 15 May 2007 03:35