Tuesday, 15 February 2011

Lecture 1: Regression and Multiple Regression

Questions about the relationship between two variables can be answered by looking at correlation or by using regression analysis.

Typical regression analysis:
  1. Make a scatterplot to see whether there is a relationship.
  2. Run the regression analysis.
  3. Look for problems with the data, including non-normality, outliers, etc.

y = observed value
y(hat) = predicted value for the point (its value on the fitted regression line)
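The residual for a point is the gap between these two values, y − y(hat). A small sketch, assuming a hypothetical fitted line y(hat) = 1 + 2x; the helper names `predict` and `residuals` are illustrative only:

```python
def predict(xs, slope, intercept):
    """Predicted values y(hat) on the fitted line."""
    return [intercept + slope * x for x in xs]

def residuals(ys, y_hat):
    """Residual for each point: observed y minus predicted y(hat)."""
    return [y - yh for y, yh in zip(ys, y_hat)]

# Hypothetical line y(hat) = 1 + 2x and three observed points
xs = [0.0, 1.0, 2.0]
ys = [1.5, 2.5, 5.0]
y_hat = predict(xs, slope=2.0, intercept=1.0)  # [1.0, 3.0, 5.0]
print(residuals(ys, y_hat))                    # [0.5, -0.5, 0.0]
```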

A point with high influence is one that has both high leverage and a large residual (i.e. it is an outlier). Cook's distance (Cook's D) can be used to identify points with large influence.
A point should not be deleted without good reason. The results can be analysed both with and without the point.
An outlier may produce a large residual without changing the slope much, unless it lies at one end of the data set (where it has high leverage).

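Rules of thumb for outliers and influential points
Studentized residuals: values beyond about 2 to 3 (in SD units) suggest an outlier
Cook's distance: values should be < 1; a value above 1 suggests a highly influential point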
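These rules of thumb can be applied to a simple regression by hand. The sketch below assumes internally studentized residuals (externally studentized ones differ slightly) and uses the standard simple-regression formulas: leverage h = 1/n + (x − x̄)²/Sxx, r = e / (s·√(1 − h)), and Cook's D = (r²/p)·h/(1 − h) with p = 2 parameters. The data and the cut-offs (|r| > 2, D > 1) are illustrative choices.

```python
import math

def influence_diagnostics(x, y):
    """Leverage, internally studentized residual and Cook's D for y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))  # residual standard error
    p = 2                                                # parameters: intercept and slope
    out = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - x_bar) ** 2 / sxx              # leverage
        r = e / (s * math.sqrt(1 - h))                   # studentized residual
        d = (r ** 2 / p) * h / (1 - h)                   # Cook's distance
        out.append((h, r, d))
    return out

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 2, 3, 4, 5, 6, 7, 8, 19]  # hypothetical: outlier at the high end of x

for i, (h, r, d) in enumerate(influence_diagnostics(x, y)):
    flag = "  <- check" if abs(r) > 2 or d > 1 else ""
    print(f"x={x[i]}  h={h:.3f}  r={r:+.2f}  D={d:.2f}{flag}")
```

Only the last point is flagged: it has the largest leverage and a large residual, so it exceeds both thresholds.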

A mathematical transformation changes the numerical description of the data, not the data themselves: for positive data, the points keep the same order relative to one another.
Typical transformations: squaring, square root, logarithmic, or exponential.
Squaring or cubing is useful if the data are negatively skewed; it makes them more normally distributed.
Taking the square root or natural logarithm is useful if the data are positively skewed.
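The effect of a transformation on skew can be checked by computing skewness before and after. This sketch uses the moment-based sample skewness g1 = m3 / m2^1.5 on two made-up data sets: a right-tailed set that the natural log makes symmetric, and a left-tailed set that squaring makes symmetric.

```python
import math

def skewness(data):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    return m3 / m2 ** 1.5

# Hypothetical positively skewed data: long tail to the right
pos = [1, 2, 4, 8, 16, 32, 64]
log_pos = [math.log(v) for v in pos]       # natural log pulls in the right tail

# Hypothetical negatively skewed data: long tail to the left
neg = [math.sqrt(k) for k in range(1, 10)]
sq_neg = [v ** 2 for v in neg]             # squaring stretches the upper values

print(f"positive skew: {skewness(pos):.2f} -> after log: {skewness(log_pos):.2f}")
print(f"negative skew: {skewness(neg):.2f} -> after squaring: {skewness(sq_neg):.2f}")
```

Both transformed sets come out with skewness close to zero (the printed value may show as -0.00 due to rounding). Real data rarely become perfectly symmetric; the aim is only to reduce the skew.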