EPSY6210_20100222


EPSY 6210

02.22.2010

 

  • review of last week's 3-D example for multiple regression
  • you're trying to find out whether x1 and x2 can predict y
  • a plane of best fit, rather than a line of best fit (see the equation below)
  • both the slope for x1 and the slope for x2 affect this plane
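
In equation form (this is the same regression equation that appears later in these notes), the plane has one intercept and two slopes:

  \hat{y} = a + b_1 x_1 + b_2 x_2

where b_1 and b_2 are the two slopes that tilt the plane along the x1 and x2 axes.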

 

 

More on Case Two Multiple Regression

 

in Area World (the squared metric):  a Venn diagram for y, x1, and x2 (the SOS for each of these)

  • r-squared for x1 and r-squared for x2 are each .25 (each explains 25% of y)
  • x1 and x2 DO NOT overlap... what does this mean?
  • r-squared for x1,x2 is 0
  • overall variance explained is .5 (50%) = multiple R-squared (R-squared for y,x1,x2); the R is capitalized to signify this "multiple" relationship
  • R-squared = r-squared1 + r-squared2 IF the predictors are orthogonal or unrelated (see the worked equations after this list)
  • what is the shaded-in area (the two r-squared areas on the Venn diagram)?  = SOSy-hat
  • 3 observed variables (y, x1, x2), 1 error variable, 1 y-hat variable
  • get the y-hat score from the regression equation
    • y-hat = a + b1x1 + b2x2   (TWO slopes to account for); it's unstandardized
    • standardized: y-hat = Beta1x1 + Beta2x2 (all variables in z-score form)
      • Beta1 = r1; Beta2 = r2 IF the predictors are orthogonal or unrelated (neither Beta/r relationship is affected by the other)
        • in that case, this tells you what the relationship between your predictors is
  • if a Beta weight = 0 in any situation (EVEN when predictors are orthogonal or unrelated), THEN none of the variance in x is useful to predict the outcome -- it's not just that there is no relationship between x and y, but that it is USELESS in that situation (so we multiply it by 0 in the regression equation)
  • R-squared = the r-squared between y and y-hat  (R-squared is just the cumulative explained variance, essentially r-squared-yx1 + r-squared-yx2)
    • thus, r-y-hat (the square root of R-squared) = .71
    • we've added both predictor variables (x1, x2) into one synthetic variable, y-hat
    • y-hat's Pearson r is the relevant thing we're trying to get to, the relationship between y-hat and the dependent variable y
    • how do the two predictors (x1, x2) relate to the variance that was explained (y-hat, or the portion of y that has been explained by x1 and x2)?
      • they each make up half of it
    • thus, a structure coefficient = a correlation (Pearson r) between an observed variable and a synthetic variable. (noted as r-sub-s)
      • in Area World, it's a squared structure coefficient
      • if one Beta has a 0 and the other Beta has a 1, then the 0 Beta is useless for predicting y, and the 1 Beta is useful for predicting y
      • (SIDE NOTE ABOUT BETA: If there is more than one predictor variable, then the literature tends to use Beta more often)
      • instead of asking about the relationship between the predictor and y (r-squared or Beta), the squared structure coefficient (r-sub-s-squared) tells about the relationship between the predictor and y-hat
      • y-hat's variance is built from x1's and x2's contributions (which the structure coefficients describe)
        • IF the predictors are orthogonal (unrelated), then r-sub-s1-squared + r-sub-s2-squared should ALWAYS = 1
      • how can you find out the correlation between the self-esteem and IQ of the 12 people here?
        • compute the Pearson r between self-esteem and IQ
      • the structure coefficient is the Pearson r between an observed and synthetic variable
        • correlate the values of y-hat and x1 in the computer (calculate the Pearson r in SPSS)
        • if we square that Pearson r, we get the squared structure coefficient
          • that gives us the amount of y-hat that is accounted for by the variance in x1
          • the more you re-read the Thompson Beta weights paper, the more this will make sense
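
Pulling the Case Two pieces above into one set of worked equations (nothing new here, just the class's .25/.25 example restated):

  \hat{y} = a + b_1 x_1 + b_2 x_2
  R^2_{y \cdot x_1 x_2} = r^2_{y x_1} + r^2_{y x_2} = .25 + .25 = .50   (only when r_{x_1 x_2} = 0)
  r_{s_i} = r_{y x_i} / R   so   r_{s_1}^2 = .25 / .50 = .50   and   r_{s_2}^2 = .25 / .50 = .50
  r_{s_1}^2 + r_{s_2}^2 = 1   (each predictor makes up half of y-hat, as noted above)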

 

  • writing up this example on a Venn diagram using graph paper...
    • SOSy = 6
    • SOSx1 = 8 (1 explains y)
    • SOSx2 = 12 (2 explains y)
    • SOSy-hat = 3
    • variables: x1 (height) and x2 (age) are the predictors; y (GPA) is the outcome
    • so the stats are... (the arithmetic is checked after this list)
      • r-squaredyx1 = .17 (1 divided by 6)
      • r-squaredyx2 = .33 (2 divided by 6)
      • together, x1 and x2 account for 50% of the variance in y (R-squared or  multiple-R-squared = .5)
      • r-squaredx1x2 = 0 (they are orthogonal or not related)
      • r-sub-s1-squared = .33 (how much of the effect (area of y-hat; amount of the variance in y explained) is due to x1; 1 divided by 3)
      • r-sub-s2-squared = .67 (same question for x2)
      • the squared structure coefficient is a SECONDARY process in SPSS; correlate your predictors with y-hat (Pearson r for y-hat and predictor, then square it manually)
      • r-sub-s1 = ryx1 divided by R (multiple-r; the square-root of R-squared)
      • r-sub-s1-squared = r-squared-yx1 divided by R-squared
        • in SPSS: R comes from the regression output; r-squared-yx1 comes from a Pearson r computed separately from the regression
        • since Beta = r, then y-hat = (square root of r-squared-yx1)x1 + (square root of r-squared-yx2)x2
        • in this case, y-hat = .41x1 + .58x2
        • the Latoya (LT) equation:  R-squared-y-x1-x2 = .50 = r-squared-y-x1 + r-squared-y-x2 = Beta-squared-x1 + Beta-squared-x2
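
Checking the arithmetic for this orthogonal example (all numbers come from the bullets above, rounded to two decimals):

  r_{s_1}^2 = r^2_{y x_1} / R^2 = .17 / .50 = .33
  r_{s_2}^2 = r^2_{y x_2} / R^2 = .33 / .50 = .67
  r_{s_1}^2 + r_{s_2}^2 = 1.00   (as required when the predictors are orthogonal)
  \beta_1 = r_{y x_1} = \sqrt{1/6} = .41   and   \beta_2 = r_{y x_2} = \sqrt{2/6} = .58
  R^2 = \beta_1^2 + \beta_2^2 = .17 + .33 = .50   (the LT equation)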

 

breaktime

 

Case Three Multiple Regression (aka real-world scenario)

 

Case Three Multiple Regression:  two or more predictors that are correlated in some way (their correlation is what separates this from Case Two Multiple Regression, which is not a real-world scenario)

 

Example:

  • SOSy = 6
  • SOSx1 = 6 (2 explains y; 2 correlates with x2, but only 1 of that overlap also explains y)
  • SOSx2 = 6 (2 explains y; 2 correlates with x1, and of that 1 also explains y)
  • R-squared = .5 (50% of y can be explained by both x1 and x2 together)
  • r-sub-s-squared-1 =  .67
  • r-sub-s-squared-2 =  .67
    • some of y-hat can be explained by both predictor variables, THUS the two structure coefficients add up to MORE THAN 1
  • because the predictors aren't orthogonal, Beta NO LONGER = r
  • Beta weights arbitrarily carve up the effect; the shared portion cannot be cleanly divided between the predictors
    • THUS we must also consult the squared structure coefficients, not interpret the Beta weights alone
  • new equation:  R-squared = Beta1(r-y-x1) + Beta2(r-y-x2) = the Knowles formula
    • when the predictors were orthogonal, we were saying R-squared = (r*r) + (r*r) = (Beta*Beta) + (Beta*Beta); this uses the same principle, adjusted because we have non-orthogonal predictors
    • Beta times r works now instead of Beta-squared (since Beta no longer = r)
    • WHY DOESN'T BETA = r?  BECAUSE: r-y-x1 is x1's relationship with y; Beta-x1 can't account for ALL of that relationship because x1 and x2 FIGHT over the part of their relationship with y that they SHARE or both account for (where x1 and x2 are correlated)
    • Beta1 = [r-y-x1 - (r-y-x2)(r-x1-x2)] DIVIDED BY [1 - r-squared-x1-x2]
    • Beta2 = [r-y-x2 - (r-y-x1)(r-x1-x2)] DIVIDED BY [1 - r-squared-x1-x2]
      • now we can mathematically see that when there is no correlation between x1 and x2 (r-x1-x2 = 0), Beta1 = r-y-x1 (checked numerically after this list)
      • if they are perfectly correlated (r-x1-x2 = 1), then because 1 - r-squared-x1-x2 = 0, the equation gives you an error, because you can't divide by 0... so the Beta weight isn't computable (there is no Beta weight). But this will NEVER HAPPEN IN THE REAL WORLD.
    • to say that a predictor has a SMALL effect because of a small Beta weight is a WRONG interpretation of that data; DO NOT *only* interpret Beta weights!!!
    • what matters in determining how much x1 and x2 explain of the dependent variable is three relationships: x1 with y, x2 with y, and x1 with x2
    • the fact that these two predictors overlap and are related in some way DOES NOT NECESSARILY MEAN that they have an interaction effect
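
A quick numeric check of the Beta-weight formulas and the Knowles formula, plugging in the orthogonal numbers from the earlier height/age/GPA example (so we should recover Beta = r):

  \beta_1 = \frac{r_{y x_1} - r_{y x_2} \, r_{x_1 x_2}}{1 - r^2_{x_1 x_2}} = \frac{.41 - (.58)(0)}{1 - 0} = .41
  \beta_2 = \frac{.58 - (.41)(0)}{1 - 0} = .58
  R^2 = \beta_1 r_{y x_1} + \beta_2 r_{y x_2} = (.41)(.41) + (.58)(.58) = .17 + .33 = .50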

 

HOMEWORK THAT IS DUE NEXT WEEK (assignment #3)

 

opening SPSS and trying this out...

 

  • first step (determining relationship between v1 and all of your predictors): running a regression
    • 3 predictors (v2, v3, v4), 1 outcome (v1)
    • step one: bivariate correlation -- get Pearson r's on everything
    • THEN run multiple regression (you will automatically get b's and Beta weights)
    • SPSS: analyze (correlations for all 4 variables), click PASTE not OK
    • add comment: ** assignment 3, step 1.  ** run correlations among the variables.
    • now for the regression:  go to regression, then linear, then v1 is dependent variable--other 3 are independents (predictors), hit PASTE not OK
    • now add comment: ** run multiple regression with 3 predictors.
    • now highlight all the syntax and RUN it (a sketch of the pasted syntax follows this list)
    • in the "coefficients" section under "B", where it says "(Constant)" instead of a variable name, that's your a (y-intercept)
    • if you get weird scientific notation on B, and Beta = 0, then essentially b = 0 (the slope is zero -- that predictor's line is flat)
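
A minimal sketch of what the pasted step-1 syntax should look like, assuming the variable names v1-v4 used above (the actual PASTE output will also include SPSS's default subcommands, which is fine):

  ** assignment 3, step 1.
  ** run correlations among the variables.
  CORRELATIONS
    /VARIABLES=v1 v2 v3 v4.
  ** run multiple regression with 3 predictors.
  REGRESSION
    /DEPENDENT v1
    /METHOD=ENTER v2 v3 v4.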

 

  • note: there is no "How to do SPSS" handout although it's mentioned in the homework
  • step 2:
    • use the unstandardized weights to compute synthetic variables (y-hat scores and error scores)
    • compute y-hat first, since y minus y-hat = error scores
    • compute yhat=a+(b2*v2)+(b3*v3)+(b4*v4).
      • one example:  compute yhat=2.2+(0*v2)+(.2*v3)+(.3*v4).
      • you have all this information (a and b2, b3, b4) in your "coefficients" readout in the Output file
      • DON'T FORGET to end that command line with a PERIOD
    • compute e=v1-yhat.
    • EXECUTE.
    • now go back to the dataset, and you will see new columns of scores for yhat and for e (an optional sanity check follows this list)
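
One optional sanity check (not in the notes, but it follows from how yhat and e are defined): the error scores should be uncorrelated with the yhat scores, and the mean of e should be 0.

  CORRELATIONS
    /VARIABLES=yhat e.

If the regression and the COMPUTE lines are right, this Pearson r comes out at (essentially) zero.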

 

  • step 3:
    • list variables v1 v2 v3 v4 yhat e.
    • (now highlight and run)
    • they are listed in the Output File now.

 

  • step 4: 
    • syntax I didn't catch... it's output to the Output File (a guess at the syntax follows this list)
    • probably something like correlate v1 v2 v3 v4.
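
Since step 4 feeds the by-hand work in step 5, a plausible reconstruction of the missing syntax (a guess, not what was actually shown in class) is to correlate everything, including the synthetic variables:

  CORRELATIONS
    /VARIABLES=v1 v2 v3 v4 yhat e.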

 

  • step 5: COMPUTE BY HAND (presumably the squared structure coefficients; see below)
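
The notes don't record what gets computed by hand, but given the earlier point that the squared structure coefficient is a secondary, manual step in SPSS, step 5 is presumably:

  r_{s_i} = r_{y x_i} / R   (equivalently, r_{s_i}^2 = r^2_{y x_i} / R^2)

i.e., take each predictor's Pearson r with yhat from the correlation output (or divide r-y-xi by multiple R) and square it by hand.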

 

 

 
