# EPSY 6210

02.01.2010

(2nd class)

normal curves can take lots of different shapes (get rid of the "bell curve" idea)

• normal distribution is a symmetrical curve

## Turned in First Homework:

here's what we learned from it...

• mean = goes up (or down) by the same amount as the constant added/subtracted to/from the dataset
• SD = same (their variability is the same)
• SD for population = same (see above)
• skewness = same (adding a constant doesn't make central tendency stronger or weaker--same PATTERN on the graph); if you have a normal (symmetrical) distribution, your skewness should be 0
• kurtosis = same (measures peakedness/tail weight relative to the normal curve; note it may be calculated differently in SPSS or other packages, e.g. reported as excess kurtosis so a normal distribution comes out as 0)
• covariance = same
• pearson's r = same
• this is fairly clearly indicated by the scatterplot (graph)
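
The additive-constant rules above can be checked with a quick Python sketch (the data here are made up for illustration, not the homework dataset):

```python
# Adding a constant c shifts the mean by c and leaves SD, covariance, and r alone.
from statistics import mean, stdev

def cov(a, b):
    # sample covariance: average cross-product of deviations (n - 1 denominator)
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def pearson_r(a, b):
    # pearson's r: covariance with the influence of the SDs removed
    return cov(a, b) / (stdev(a) * stdev(b))

x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
x2 = [xi + 5.0 for xi in x]      # add the constant c = 5

print(mean(x2) - mean(x))        # 5.0: mean goes up by exactly c
print(stdev(x2) - stdev(x))      # 0.0: SD unchanged
print(cov(x2, y) - cov(x, y))    # 0.0: covariance unchanged
print(pearson_r(x2, y) - pearson_r(x, y))  # 0.0: r unchanged
```

The deviations from the mean are identical before and after the shift, which is why every spread and shape statistic comes out the same.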

multiplicative constants:

• mean = multiplied by the amount of the constant

• SD = multiplied by the amount of the constant

• SD for population = multiplied by the absolute value of the constant (essentially; actually done by squaring, which drops any negative sign)--statistical spread (spreadoutness) can't be negative.

• skewness = same magnitude (but possibly with the opposite positive/negative sign: the shape of the pattern won't change, but a negative constant flips its direction) (multiplying doesn't change central tendency/symmetry)

• remember: skewness is positive when the tail stretches out toward the HIGHER numbers (a right/positive tail)

• kurtosis = same (multiplying rescales the distribution but doesn't change its shape/normality, so kurtosis is unchanged)

• covariance = multiplied by the amount of the constant (can see in the scatterplot--the spreadoutness between the Y datapoints is larger, although the pattern is the same)

• covariance is similar to variance (which is SD squared, or simply without the last square root step)

• covariance = the variance of two variables, crossed together (what is the pattern of their relationship together)

• the covariance will be affected by a change in the SD, because it does not remove the influence of SD anywhere in its formula

• if multiplied by a negative constant, it will change direction on the scatterplot (and thus be negative covariance result)

• pearson's r = same (because we removed the influence of the SD)

• if multiplied by a negative constant, it will change direction on the scatterplot (so r comes out negative, with the same magnitude)

• if both constants are applied, trace the effect of the additive constant first, then the multiplicative one (so see above)
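
A matching sketch for the multiplicative case (again, made-up data): multiplying x by c scales the mean and covariance by c, scales the SD by |c|, and leaves the magnitude of r alone--only the sign can flip.

```python
from statistics import mean, stdev

def cov(a, b):
    # sample covariance
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def pearson_r(a, b):
    return cov(a, b) / (stdev(a) * stdev(b))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 8.0]
c = -3.0
x2 = [c * xi for xi in x]        # multiply by a NEGATIVE constant

print(mean(x2))          # ≈ c * mean(x)
print(stdev(x2))         # ≈ |c| * stdev(x): spread can't be negative
print(cov(x2, y))        # ≈ c * cov(x, y): sign flips with a negative constant
print(pearson_r(x2, y))  # ≈ -pearson_r(x, y): same magnitude, flipped sign
```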

• Z scores = remove the influence of the SD (divide by SD--that's how you remove its influence)
• the property of all Z scores: the mean = 0 and the SD = 1.
• we standardize Z scores to make that happen; this puts the scores in a standard metric
• SD = shows a datapoint's relation to the rest of the dataset (gives it context and thus meaning)
• pearson r (correlation) = standardized covariance (put the scores/datapoints on the same metric--can compare different groups of data now.) -- you standardize it by removing the influence of the SDs... to put it in a broader (standardized) context.
• property of all r = ranges from -1 to +1
• because pearson's r is standardized (and covariance is not), pearson's r is used more often in journal articles--you can't compare variables using the raw covariance. because r always falls between -1 and +1, it's easy to see an r in an article and understand what it means: a strong positive correlation (near +1), a weak or non-existent correlation (near 0), or a strong inverse correlation (near -1).
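
The Z-score and "standardized covariance" ideas above, sketched in Python (made-up data; using the sample SD, as SPSS does for saved z scores):

```python
from statistics import mean, stdev

def zscores(a):
    # remove the influence of the mean (subtract it) and the SD (divide by it)
    m, s = mean(a), stdev(a)
    return [(ai - m) / s for ai in a]

def cov(a, b):
    # sample covariance
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

x = [10.0, 20.0, 30.0, 40.0, 50.0]
y = [3.0, 1.0, 4.0, 5.0, 9.0]

zx, zy = zscores(x), zscores(y)
print(round(mean(zx), 10), round(stdev(zx), 10))  # 0.0 1.0 -- property of all z scores

# pearson's r is the covariance of the z scores ("standardized covariance")
r = cov(zx, zy)
print(round(abs(r - cov(x, y) / (stdev(x) * stdev(y))), 10))  # 0.0
```

Because every z-scored variable lives on the same metric (mean 0, SD 1), their covariance is directly comparable across datasets--which is exactly what r gives us.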

### when in doubt, graph something.

• if x = 1, 2, 3, 4
• and y = 2, 4, 6, 8
• then the sample size is  4 (four people tested on 2 variables)
• these variables are positively related (they go up together)
• graph it: a straight line, going up a tad steeply
• it's a perfect pattern; as x goes up, y doubles
• if you can do mathematical stuff to x (like multiplying by 2), and turn it into y exactly, you have a perfect relationship
• in this case, 2x = y
• in another instance, 2x + 1 = y (linear equation; because when graphed, it makes a line)
• a few cases can dramatically influence our results (on a graph, a few more datapoints can change a curvilinear pattern (best fit) to a linear pattern (line of best fit)).
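
The x/y example above can be verified directly--since y is exactly 2x, the relationship is perfect and r comes out at 1:

```python
from statistics import mean, stdev

def pearson_r(a, b):
    ma, mb = mean(a), mean(b)
    c = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)
    return c / (stdev(a) * stdev(b))

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0 * xi for xi in x]         # 2x = y: a perfect linear pattern

print(round(pearson_r(x, y), 10))  # 1.0: perfect positive correlation
```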

this points to a major point of multiple regression: it's all about predicting a relationship between variables

• we want to hammer at x until we can find a value (prediction) as close to y as possible.
• y with a caret on top ("y hat") = predicted score
• should be as close to y as possible IF you want a strong, positive correlation.
• if the pattern/relationship is NOT perfectly linear, then r is less strong (|r| < 1)

r = if x gets bigger as y gets bigger, then r is between 0 and +1 (positive value).

centroid (a Cartesian coordinate): on the graph, the point at (mean of x, mean of y)

• the line of best fit MUST go through the centroid (even though it might not go through any others)
• the means of x and y are the numbers that best represent x and y... therefore the line of best fit must go thru both means.

line of best fit:

• goes thru centroid
• trying to get y hat (predicted from x) as close to y as possible (a better predictor means a more perfect line, and thus closer to r = 1)
• y hat = a + bx ...so we can USE this formula to calculate y hats (predicted scores) for future x scores (no error and y hat = y?  then r = 1)
• y - (y hat) = "error" = tells us how close a prediction we have
• this can be negative or positive, depending on its direction (if y hat is bigger or smaller than y)
• four variables: x, y, y hat, and error
• plot x with y hat and you get a line (because it's a linear equation)
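
The line of best fit can be worked by hand in Python (made-up data): compute a and b for y hat = a + b*x, then check that the line passes through the centroid and that error = y - y hat.

```python
from statistics import mean

def fit_line(x, y):
    # least-squares slope and intercept for y_hat = a + b*x
    mx, my = mean(x), mean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx              # forces the line through the centroid (mx, my)
    return a, b

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.0, 5.0, 4.0, 6.0]

a, b = fit_line(x, y)
y_hat = [a + b * xi for xi in x]                 # predicted scores
error = [yi - yh for yi, yh in zip(y, y_hat)]    # error = y - y hat

print(round(a + b * mean(x), 10))   # 4.0, the mean of y: line hits the centroid
print(round(abs(mean(error)), 10))  # 0.0: errors balance out around the line
```

Here we have all four variables from the notes--x, y, y hat, and error--and plotting x against y hat would give a straight line, since y hat is a linear function of x.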

all of our analyses (this semester) are related to each other;

1. they are all correlational
2. they all yield r-squared-type effect sizes
3. they all apply weights to observed variables to create synthetic (unobserved) variables
4. the synthetic/unobserved variables become the focus of the analysis

## SPSS/PASW

• "what's the relationship between x and y?" (pearson r)

• "does x predict y?" (regression; this is more relevant when we have more than one predictor)

• need to know how to calculate y hat and error scores in SPSS...

• i want to take x and predict y; i want to perform a regression

• SPSS: analyze - descriptive stats - descriptives - calculates various descriptive stats  (we've done this before)

• tip: click/look around and try to find things (he won't always tell us how to calculate things in SPSS)

• save the file: format is ".sav" for data files (the "Daddy" file)
• SPSS (regression) = analyze - regression - linear - (y is usually dependent variable--the outcome of interest); (x is the independent variable or predictor) - statistics (pick descriptive)
• then click SAVE (y hat = predicted values, unstandardized; error = residuals, unstandardized (standardized forms transform them into Z scores))
• then click PASTE (NOT "ok") -- this builds a command file
• "Syntax Editor" (syntax or command file) opens in a new window
• can add comments to show myself what I'm doing by typing an asterisk and space, then comment, and end each comment/command with a period (* .)
• file, save as: (give it the same name as the datafile--different extension); file extension = .sps (the "Mommy" file)
• need to put the Daddy & Mommy files together to get a "Baby" file
• highlight what you want to run; click the "play" arrow button on the top menu (it runs the selection)
• that produces the Output file (the "Baby" file)
• ANOVA = really our regression summary table; (regression = between; residuals = within)
• click back to the dataset; PRE = y hat; RES = error
• if you run the same regression a second time, you'll get two more variable columns, appended with "2" instead of "1" -- values will be the same, simply calculated them a second time
• it's lined up in the dataset table just like you'd calculate it:  y - PRE (y hat) should = RES (error)
• if you want to check it graphically, graphs - scatterplot - for the variables you want to check
• --add fit line - linear - apply (ignore the curves)
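
For reference, the command that PASTE builds for a simple regression of y on x (with predicted values and residuals saved) looks roughly like the sketch below--variable names are placeholders for whatever the dataset uses, and the exact subcommands depend on which options were checked:

```
* regression of y on x -- saves y hat (PRE_1) and error (RES_1).
REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /STATISTICS COEFF OUTS R ANOVA
  /DEPENDENT y
  /METHOD=ENTER x
  /SAVE PRED RESID.
```

Highlighting that block in the Syntax Editor and clicking the play arrow runs it, which adds the PRE_1 and RES_1 columns back in the dataset.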

# TO DO:

• email Paul and ask him to put SPSS/PASW on work computer
• let Annie know what he says
• email Dr. Henson about graphical book on figuring out basic statistical concepts
• read handouts from Dr. Henson (last week, and workshop)
• do homework
• look at book from Annie / library
• look at textbook...
• start research topic -- paper / project