Linear Regression

http://www.emathzone.com is a very good website for simple explanations. Regression is about “predicting” dependent variables with independent variables. “Predict” is the important word here.

Regression:

The word regression was used by Francis Galton in 1885. It is defined as “the dependence of one variable upon another variable”. For example, weight depends upon height, and the yield of wheat depends upon the amount of fertilizer. In regression we can estimate the unknown values of one (dependent) variable from known values of the other (independent) variable.

Linear Regression:

When the dependence of the dependent variable on the independent variable is represented by a straight line, it is called linear regression; otherwise it is said to be nonlinear or curvilinear regression.
For example, if ‘X’ is the independent variable and ‘Y’ is the dependent variable, then the relation Y = a + bX is a linear regression.
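As a quick sketch, the coefficients a and b in Y = a + bX can be estimated by ordinary least squares. The data below are made up purely for illustration:

```python
import numpy as np

# Made-up sample values of the independent variable X and dependent variable Y
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

# Least-squares estimates of the slope b and intercept a in Y = a + bX
b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()

# Predict an unknown Y from a known X -- the whole point of regression
y_hat = a + b * 6.0
```

With these numbers the fitted line comes out to roughly Y = 0.09 + 1.99X, so a known X of 6.0 predicts a Y of about 12.03.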

….

And that is from http://dss.princeton.edu/online_help/analysis/regression_intro.htm

Simple linear regression is when you want to predict values of one variable, given values of another variable. For example, you might want to predict a person’s height (in inches) from his weight (in pounds). Imagine a sample of ten people for whom you know their height and weight. You could plot the values on a graph, with weight on the x axis and height on the y axis. If there were a perfect linear relationship between height and weight, then all 10 points on the graph would fit on a straight line. But this is never the case (unless your data are rigged). If there is a (non-perfect) linear relationship between height and weight (presumably a positive one), then you would get a cluster of points on the graph which slopes upward. In other words, people who weigh more should tend to be taller than people who weigh less.

The purpose of regression analysis is to come up with an equation of a line that fits through that cluster of points with the minimum amount of deviation from the line. The deviation of the points from the line is called “error.” Once you have this regression equation, if you knew a person’s weight, you could then predict their height. Simple linear regression is actually the same as a bivariate correlation between the independent and dependent variable.
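A minimal sketch of this idea, using an invented ten-person sample (the weights and heights are assumptions, not real data):

```python
import numpy as np

# Invented data for ten people: weight in pounds, height in inches
weight = np.array([120, 135, 150, 160, 170, 180, 190, 200, 215, 230], dtype=float)
height = np.array([62, 64, 65, 67, 68, 69, 70, 71, 73, 74], dtype=float)

# np.polyfit with degree 1 returns the least-squares [slope, intercept]
b, a = np.polyfit(weight, height, 1)

predicted = a + b * weight
error = height - predicted  # deviation of each point from the fitted line

# The fitted line slopes upward (positive relationship), and the errors of a
# least-squares line with an intercept sum to roughly zero by construction
```

The "minimal deviation" criterion here is least squares: the line minimizes the sum of the squared errors, which is why the residuals balance out around the line.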

Standard multiple regression

is the same idea as simple linear regression, except now you have several independent variables predicting the dependent variable. To continue with the previous example, imagine that you now wanted to predict a person’s height from the gender of the person and from the weight. You would use standard multiple regression in which gender and weight were the independent variables and height was the dependent variable. The resulting output would tell you a number of things. First, it would tell you how much of the variance of height was accounted for by the joint predictive power of knowing a person’s weight and gender. This value is denoted by “R²”. The output would also tell you if the model allows you to predict a person’s height at a rate better than chance. This is denoted by the significance level of the overall F of the model. If the significance is .05 (or less), then the model is considered significant. In other words, there is only a 5 in 100 chance (or less) that there really is no relationship between height and weight and gender. For whatever reason, within the social sciences, a significance level of .05 is often considered the standard for what is acceptable. If the significance level is between .05 and .10, then the model is considered marginal. In other words, the model is fairly good at predicting a person’s height, but there is between a 5 and 10% probability that there really is no relationship between height and weight and gender.

In addition to telling you the predictive value of the overall model, standard multiple regression tells you how well each independent variable predicts the dependent variable, controlling for each of the other independent variables. In our example, then, the regression would tell you how well weight predicted a person’s height, controlling for gender, as well as how well gender predicted a person’s height, controlling for weight.
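A sketch of the height-from-weight-and-gender model, again with invented data (gender coded 0 = female, 1 = male). The coefficients and R² are computed directly by least squares; the F test and p-values mentioned above would come from a stats package and are not shown here:

```python
import numpy as np

# Invented data: predict height (inches) from weight (pounds) and
# gender (coded 0 = female, 1 = male), as in the example above
weight = np.array([120, 135, 150, 160, 170, 180, 190, 200, 215, 230], dtype=float)
gender = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
height = np.array([62, 63, 65, 66, 67, 70, 71, 72, 74, 75], dtype=float)

# Design matrix with an intercept column; solve for the coefficients by least squares
X = np.column_stack([np.ones_like(weight), weight, gender])
coef, _, _, _ = np.linalg.lstsq(X, height, rcond=None)
intercept, b_weight, b_gender = coef

# R^2: share of the variance of height jointly accounted for by the predictors
predicted = X @ coef
ss_res = np.sum((height - predicted) ** 2)
ss_tot = np.sum((height - height.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
```

Each coefficient here is a B weight: b_weight is the predicted change in height per pound, holding gender constant, and b_gender is the predicted height difference between males and females, holding weight constant.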

To see if weight was a “significant” predictor of height you would look at the significance level associated with weight on the printout. Again, significance levels of .05 or lower would be considered significant, and significance levels between .05 and .10 would be considered marginal. Once you have determined that weight was a significant predictor of height, you would then want to examine the relationship between the two variables more closely. In other words, is the relationship positive or negative? In this example, we would expect a positive relationship: the greater a person’s weight, the greater his height. (A negative relationship would be the case in which the greater a person’s weight, the shorter his height.) We can determine the direction of the relationship between weight and height by looking at the regression coefficient associated with weight. There are two kinds of regression coefficients: B (unstandardized) and beta (standardized). The B weight associated with each variable is given in terms of the units of that variable. For weight, the unit would be pounds, and for height, the unit is inches. The beta uses a standard unit that is the same for all variables in the equation. In our example, this would be a unit of measurement common to weight and height. Beta weights are useful because they let you compare two variables that are measured in different units, as are height and weight.
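One way to see what a beta weight is: it is the B weight you get after converting every variable to z-scores (standard units), and in simple regression it equals the bivariate correlation r. A sketch with the same invented sample:

```python
import numpy as np

# Invented weight/height sample, as above
weight = np.array([120, 135, 150, 160, 170, 180, 190, 200, 215, 230], dtype=float)
height = np.array([62, 64, 65, 67, 68, 69, 70, 71, 73, 74], dtype=float)

def zscore(v):
    # Standard units: mean 0, standard deviation 1
    return (v - v.mean()) / v.std()

# Beta (standardized) coefficient = slope of the regression on z-scored data
beta = np.polyfit(zscore(weight), zscore(height), 1)[0]

# In simple regression, the beta equals the bivariate correlation r
r = np.corrcoef(weight, height)[0, 1]
```

This is also why simple linear regression "is actually the same as" a bivariate correlation: on standardized variables, the slope and the correlation are one and the same number.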

If the regression coefficient is positive, then there is a positive relationship between height and weight. If this value is negative, then there is a negative relationship between height and weight. We can more specifically determine the relationship between height and weight by looking at the beta coefficient for weight. If the beta = .35, for example, then that would mean that for a one unit increase in weight, height would increase by .35 units. If the beta = -.25, then for a one unit increase in weight, height would decrease by .25 units. Of course, this relationship is valid only when holding gender constant.

A similar procedure would be done to see how well gender predicted height. However, because gender is a dichotomous variable, the interpretation of the printouts is slightly different. As with weight, you would check to see if gender was a significant predictor of height, controlling for weight. The difference comes when determining the exact nature of the relationship between gender and height. That is, it does not make sense to talk about the effect on height as gender increases or decreases, since gender is not a continuous variable (we would hope). Imagine that gender had been coded as either 0 or 1, with 0 = female and 1 = male. If the beta coefficient of gender were positive, this would mean that males are taller than females. If the beta coefficient of gender were negative, this would mean that males are shorter than females. Looking at the magnitude of the beta, you can more closely determine the relationship between height and gender. Imagine that the beta of gender were .25. That means that males would be .25 units taller than females. Conversely, if the beta coefficient were -.25, this would mean that males were .25 units shorter than females. Of course, this relationship would be true only when controlling for weight.

As mentioned, the significance levels given for each independent variable indicate whether that particular independent variable is a significant predictor of the dependent variable, over and above the other independent variables. Because of this, an independent variable that is a significant predictor of a dependent variable in simple linear regression may not be significant in multiple regression (i.e., when other independent variables are added into the equation). This could happen because the variance that the first independent variable shares with the dependent variable could overlap with the variance that is shared between the second independent variable and the dependent variable. Consequently, the first independent variable is no longer uniquely predictive and thus would not show up as being significant in the multiple regression. Because of this, it is possible to get a highly significant R², but have none of the independent variables be significant.
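The overlapping-variance effect can be sketched with synthetic data (all of it invented for illustration): a predictor that is strong on its own contributes almost nothing uniquely once a nearly identical predictor enters the model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Two predictors that share almost all their variance:
# x2 is x1 plus a little noise, and y depends on x1
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
y = x1 + rng.normal(size=n)

def r_squared(X, y):
    # R^2 of a least-squares fit with an intercept column
    X = np.column_stack([np.ones(len(y)), X])
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

r2_x1_alone = r_squared(x1.reshape(-1, 1), y)      # x1 alone predicts y well
r2_full = r_squared(np.column_stack([x1, x2]), y)  # joint model
r2_without_x1 = r_squared(x2.reshape(-1, 1), y)    # drop x1 from the model

# x1's *unique* contribution over x2 is tiny, even though x1 alone is a
# strong predictor -- its shared variance with y overlaps with x2's
unique_x1 = r2_full - r2_without_x1
```

Here the joint R² is high, but neither predictor adds much over the other, which is exactly the situation described above: a highly significant R² with no individually significant predictors.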
