Multicollinearity

It was last week's nightmare: Stata complained about multicollinearity between variables. I tried excluding one variable and then another, and the problem was finally solved. We identified the highly collinear variables with a correlation matrix: http://www.stata.com/help.cgi?correlate
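The same check can be sketched outside Stata. Here is a minimal Python/numpy illustration of spotting a collinear pair in a correlation matrix, using made-up variables (`x1`, `x2`, `x3` and the 0.90 cutoff are assumptions for the sketch, not my actual data):

```python
import numpy as np

# Hypothetical data: x3 is nearly a copy of x1, so the pair is highly collinear.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + rng.normal(scale=0.05, size=n)  # almost identical to x1

X = np.column_stack([x1, x2, x3])
corr = np.corrcoef(X, rowvar=False)  # analogue of Stata's `correlate`
print(np.round(corr, 2))

# Flag pairs with |r| >= 0.90, a common multicollinearity warning threshold
i, j = np.triu_indices_from(corr, k=1)
high = [(a, b, corr[a, b]) for a, b in zip(i, j) if abs(corr[a, b]) >= 0.90]
print(high)  # the (x1, x3) pair should appear here
```

In practice you would run this on your own regressors and then drop or combine one variable from each flagged pair, as I did in Stata.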

After that, variables whose P>|t| values were greater than 0.1 became the problem. I combined some variables and excluded others again. We are continuing. (I must add something about interpreting the frontier and regression results.)

Anyway, I wanted to share "what is collinearity", from http://dss.princeton.edu/online_help/analysis/regression_intro.htm

Multicollinearity is a condition in which the IVs are very highly correlated (.90 or greater), and singularity is when the IVs are perfectly correlated and one IV is a combination of one or more of the other IVs. Multicollinearity and singularity can be caused by high bivariate correlations (usually of .90 or greater) or by high multivariate correlations. High bivariate correlations are easy to spot by simply running correlations among your IVs. If you do have high bivariate correlations, your problem is easily solved by deleting one of the two variables, but you should check your programming first; often this is a mistake made when you created the variables. It's harder to spot high multivariate correlations. To do this, you need to calculate the SMC for each IV. SMC is the squared multiple correlation (R²) of the IV when it serves as the DV predicted by the rest of the IVs. Tolerance, a related concept, is calculated as 1 − SMC. Tolerance is the proportion of a variable's variance that is not accounted for by the other IVs in the equation. You don't need to worry too much about tolerance, in that most programs will not allow a variable to enter the regression model if tolerance is too low.
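The SMC/tolerance calculation described above can be sketched directly: regress each IV on the remaining IVs and take the R² of that auxiliary regression. A minimal Python/numpy version, with made-up data in which `x3` is built from `x1` and `x2` (all names here are assumptions for illustration):

```python
import numpy as np

# Hypothetical data with one redundant predictor (x3 is mostly x1 and x2).
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.7 * x1 + 0.3 * x2 + rng.normal(scale=0.1, size=n)
X = np.column_stack([x1, x2, x3])

def smc_and_tolerance(X):
    """For each IV, regress it on the remaining IVs by least squares and
    return its squared multiple correlation (SMC) and tolerance = 1 - SMC."""
    k = X.shape[1]
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])  # add an intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        smc = 1 - resid.var() / y.var()  # R^2 of the auxiliary regression
        out.append((smc, 1 - smc))
    return out

results = smc_and_tolerance(X)
for j, (smc, tol) in enumerate(results):
    print(f"x{j + 1}: SMC={smc:.3f}  tolerance={tol:.3f}")
```

A redundant variable like `x3` shows an SMC near 1 (tolerance near 0), which is exactly the signal that it adds little beyond the other IVs.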

Statistically, you do not want singularity or multicollinearity because calculation of the regression coefficients is done through matrix inversion. Consequently, if singularity exists, the inversion is impossible, and if multicollinearity exists the inversion is unstable. Logically, you don't want multicollinearity or singularity because if they exist, then your IVs are redundant with one another. In such a case, one IV doesn't add any predictive value over another IV, but you do lose a degree of freedom. As such, having multicollinearity/singularity can weaken your analysis. In general, you probably wouldn't want to include two IVs that correlate with one another at .70 or greater.
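The matrix-inversion point above can be made concrete: OLS solves beta = (X'X)⁻¹X'y, so if one column of X is an exact combination of the others, X'X is rank-deficient and cannot be inverted; if it is a near-combination, the inversion is merely unstable. A small numpy sketch with invented variables (the exact relation `x3 = 2*x1 - x2` is an assumption for the demo):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 2 * x1 - x2  # exact linear combination -> singularity
X = np.column_stack([np.ones(n), x1, x2, x3])

# X'X loses rank, so (X'X)^{-1} does not exist: the coefficients are undefined.
XtX = X.T @ X
rank = np.linalg.matrix_rank(XtX)
print(f"rank {rank} of {XtX.shape[0]}")  # rank-deficient

# Near-collinearity makes the inversion unstable rather than impossible:
# the condition number of X'X blows up, amplifying noise in the estimates.
x3_noisy = 2 * x1 - x2 + rng.normal(scale=1e-6, size=n)
X_noisy = np.column_stack([np.ones(n), x1, x2, x3_noisy])
cond = np.linalg.cond(X_noisy.T @ X_noisy)
print("condition number:", cond)
```

This is the numerical side of the redundancy argument: the huge condition number is what shows up in practice as wildly inflated standard errors on the collinear coefficients.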
