

EPSY 6210, 03.22.2010

(last time: finished going through the handout and output file on health/doctor visits, etc.)

 

SCHEDULE ADJUSTMENTS

 

Review

 

Case Four Regression

one or more predictors that are correlated with the dependent variable (and may be correlated with each other as well), plus one or more predictor variables that are NOT correlated with the dependent variable BUT which ARE correlated with the other predictors (suppressor variables); the suppressors help the other predictors do a better job of explaining variance (predicting).

 

suppression = a good thing!

 

your Pearson's r between your y and your x-suppressor = 0 

 

what is the structure coefficient for the x-suppressor? = 0 

 

beta weight for the x-suppressor? = NOT 0
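
a quick simulation can make this concrete. this is a minimal sketch, not class data -- all the names and numbers here are invented: x1 carries the signal plus some noise, and the suppressor x2 is that same noise, so it correlates with x1 but not with y.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    signal = rng.normal(size=n)   # the part of x1 that predicts y
    noise = rng.normal(size=n)    # the part of x1 unrelated to y
    x1 = signal + noise
    x2 = noise                    # suppressor: shares only x1's irrelevant variance
    y = signal + rng.normal(size=n)

    print(np.corrcoef(y, x2)[0, 1])        # ~0: the suppressor doesn't correlate with y

    X1 = np.column_stack([np.ones(n), x1])
    X12 = np.column_stack([np.ones(n), x1, x2])
    for X in (X1, X12):
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ b
        print(1 - resid.var() / y.var())   # R^2: ~.25 alone, ~.50 with the suppressor

adding the suppressor roughly doubles r-squared even though its own correlation with y is zero, and its weight in the two-predictor model is clearly not 0.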

 

Horst first discovered this suppression stuff around the WWII era.

you don't usually look for suppression as an expectation; it's something you discover as you go along.

statistics isn't about prophesying, but about trial and error until things make sense.

 

          beta    r-s (structure coefficient)    squared structure coefficient
    x1    .30     .10                             .01
    x2    .58     .48                             .23
    x3    .22     .41                             .17

 

 

to identify suppressors: look for a predictor whose beta weight is large relative to its structure coefficient (like x1 above: beta = .30 but r-s = .10, so it gets a big weight without sharing much variance with y). see the sketch below.
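
one way to build the beta / structure-coefficient table yourself (a sketch; the function name and data are invented): standardize everything, get the beta weights, then correlate each predictor with y-hat to get the structure coefficients.

    import numpy as np

    def beta_and_structure(X, y):
        """Return standardized beta weights and structure coefficients."""
        Xz = (X - X.mean(axis=0)) / X.std(axis=0)
        yz = (y - y.mean()) / y.std()
        beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
        yhat = Xz @ beta
        # structure coefficient: correlation of each predictor with yhat
        r_s = np.array([np.corrcoef(Xz[:, j], yhat)[0, 1] for j in range(X.shape[1])])
        return beta, r_s

    # a suppressor shows up as a beta far from 0 paired with an r_s near 0
    rng = np.random.default_rng(1)
    n = 5_000
    signal, noise = rng.normal(size=n), rng.normal(size=n)
    X = np.column_stack([signal + noise, noise, rng.normal(size=n)])
    y = signal + rng.normal(size=n)
    beta, r_s = beta_and_structure(X, y)
    print(np.column_stack([beta, r_s, r_s**2]))  # row 2 (the suppressor): r_s ~ 0, beta not 0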

 

<<< BREAK >>>

 

effect size statistics

 

both groups of effect size statistics are related; some can be transformed into each other across groups.
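
the notes don't name the two groups here, but assuming they're the variance-accounted-for (r) family and the standardized-mean-difference (d) family, the standard conversions (for equal group sizes) look like this -- a sketch:

    import math

    def r_to_d(r):
        # Cohen's d from a point-biserial r (assumes equal group sizes)
        return 2 * r / math.sqrt(1 - r**2)

    def d_to_r(d):
        # r from Cohen's d (same equal-n assumption)
        return d / math.sqrt(d**2 + 4)

    print(r_to_d(0.371))  # ~0.80: a "large" d corresponds to r ~ .37
    print(d_to_r(0.8))    # round-trips back to ~0.371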

 

we haven't talked about corrected effect sizes; so far we have talked about uncorrected effect sizes.

 

uncorrected = they are all biased; the uncorrected statistics are probably overestimates of what you would get in a future sample (or the population).

 

generalizability matters; we want our stats to be replicable in future studies.

 

why might our r-squared for our particular sample be higher than it might be in a future study?

 

OLS analyses, OLS regression = Ordinary Least Squares

 

sampling error has a direct impact on where the regression line is drawn

 

there are different kinds of corrections that you can make to r-squared.

 

there are two levels of sampling error:

  1. the current sample
  2. the future sample

 

what affects sampling error? (all these corrections assume a truly random sample)

  1. n (size of the sample); the larger the sample, the lower the sampling error; as n goes down, we need to correct more 
  2. number of variables (k) you've measured; the more variables you measure, the greater chance you have of error 
  3. theoretical population effect; the population r-squared, including future groups to be measured; the effect you would get if you measured everyone in the population. the bigger the theoretical population effect, the less the sampling error. (see the simulation sketch after this list.)
    1. does total GRE score predict grad-level GPA? -- for all students in the US. assume that it's perfect: r-squared = 1.
      1. what might the scatterplot look like? all points would be on the regression line. (population effect would be perfect).
      2. sample 20 people from that population: what r-squared would you expect? 1; because the population r-squared is PERFECT, any sample from that population is also PERFECT.
      3. (doesn't even matter how you choose your sample, in all cases it would be 1.)
    2. what if the population r was 0? you don't know what a sample from that no-effect population would look like.
    3. as the population effect goes down, the sampling error goes up.
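
a simulation sketch of all three factors at once (the numbers and function name are invented for illustration): draw many samples of size n from a population with a known effect, regress y on k predictors, and average the sample r-squareds.

    import numpy as np

    rng = np.random.default_rng(2)

    def mean_sample_r2(n, rho, k=3, reps=2_000):
        """Average sample R^2 across many random samples; the population
        R^2 is rho**2, carried entirely by the first of k predictors."""
        r2s = []
        for _ in range(reps):
            X = rng.normal(size=(n, k))
            y = rho * X[:, 0] + (1 - rho**2) ** 0.5 * rng.normal(size=n)
            Xd = np.column_stack([np.ones(n), X])
            b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
            r2s.append(1 - (y - Xd @ b).var() / y.var())
        return float(np.mean(r2s))

    print(mean_sample_r2(n=20, rho=0.0))   # ~.16: small n + zero effect = big inflation
    print(mean_sample_r2(n=200, rho=0.0))  # ~.015: bigger n, much less inflation
    print(mean_sample_r2(n=20, rho=0.9))   # ~.82: big population effect, little room to inflate

even with a true population r-squared of 0, twenty cases and three predictors give an average sample r-squared around .16 -- that's the bias the corrections are for.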

 

there are many forms of corrections; we're going to look at just one.

 

this is a theoretical correction, therefore there are different ways to produce a correction formula;

some may be more or less accurate based on the situation.

 

the Ezekiel correction (or Wherry correction) is what we're looking at: adjusted r-squared = 1 - (1 - r-squared)(n - 1)/(n - k - 1). (this is what SPSS uses for its Adjusted R Square.)
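
the formula in code (a sketch; the function name and example numbers are mine, not the homework data):

    def ezekiel_adjusted_r2(r2, n, k):
        # Ezekiel/Wherry correction -- SPSS's "Adjusted R Square"
        return 1 - (1 - r2) * (n - 1) / (n - k - 1)

    print(ezekiel_adjusted_r2(0.50, 20, 3))   # ~.406: small n, lots of shrinkage
    print(ezekiel_adjusted_r2(0.50, 200, 3))  # ~.492: large n, little shrinkage

notice the same R^2 shrinks a lot or a little depending on n and k, matching the sampling-error factors above.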

 

(using data from the first homework to calculate shrinkage.)

 

MIDTERM