3.2 Linear Regression Regression is the analytical technique for determining the relationship between a dependent variable and an independent variable. When the two variables have a linear correlation, you can develop a simple mathematical formula of the relationship between the two variables by finding the line of best fit.

Interpolation (estimating between data points) and Extrapolation (estimating beyond the range of data) are used for predicting the equation for this line.

Generally, you can "eyeball" a good estimate of the line of best fit on a scatter plot when the linear correlation is strong. But an analytical method of using a least square fit gives more accurate results, especially for the weaker correlations.

For the line of best fit in the least-squares method,

• the sum of the residuals is zero (the positive and negative residuals cancel out)
• the sum of the squares of the residuals has the least possible value

It can be shown that this line has the equation

y=ax+b, where a= n(∑xy) – (∑x)( ∑y)/n(∑x2) – (∑x)2

Example 1: Applying the Least-Squares Formula

This table shows data for the full- time employees of a small packaging company.

Age (years) Annual Income (\$000)
33 33
25 31
19 18
44 52
50 56
54 60
38 44
29 35

a) Use a scatter plot to classify the correlation between age and income. The scatter plat shows a strong linear correlation between age and income level.

b) Find the equation of the line of best fit analytically.

Age ,x Income,y x2 xy
33 33 1089 1089
25 31 625 775
19 18 361 342
44 52 1936 2288
50 56 2500 2800
54 60 2916 3240
38 44 1444 1672
29 35 841 1015
∑x=292 ∑y=329 ∑x2=11712 ∑xy=13221

Substitute these totals into the formula for a.

a= n(∑xy) – (∑x)( ∑y)
n(∑x2) – (∑x)2
=8(13221) – (292)(329)
8(11712) – (292)2

=1.15

Now, you need the means of x and y.

Mean of x= ∑x/n mean of y=∑y/n b=41.125 – 1.15(36.5)
=292/8 =329/8 = -0.85
=36.5 =41.125

Now substitude the values of a and b into the equation for the line of best fit.
y = ax + b
1.15x – 0.85

Therefore, the equation of the line of best fit is y = 1.15x – 0.85.

c) Predict the income for a new employee who is 21 and for an employee retiring at 65.
Substitute:

y = ax + b

For the 21 year old: y = 1.15(21) – 0.85
= 23.3

For the 65 year old: y = 1.15(65) – 0.85
=73.9

Therefore, if predicted properly the new employee’s income would be \$23300 and for the retiring employee would have an income of about \$73900.

Comment:
By: Osman Osman
Nicely done…. Good job
I think u have the best explanation among all..!!

Peace *__*

page revision: 11, last edited: 09 May 2007 17:09