Copy of EPSY6210_20100201

Page history last edited by PatriciaSimpson@unt.edu 13 years, 1 month ago

EPSY 6210

02.01.2010

(2nd class)

 

normal curves can take lots of different shapes (get rid of the "bell curve" idea)

  • normal distribution is a symmetrical curve

 

Turned in First Homework:

here's what we learned from it...

 

additive constants (alter dataset by adding/subtracting a specific amount): 

  • mean = goes up (or down) by the same amount as the constant added/subtracted to/from the dataset
  • SD = same (their variability is the same)
  • SD for population = same (see above)
  • skewness = same (adding a constant slides the whole distribution left or right without changing its shape--same PATTERN on graph); if you have a normal (symmetrical) distribution, then your skewness should be 0
  • kurtosis = same (describes how peaked or heavy-tailed the curve is compared with a normal curve; may be calculated differently in SPSS or other methods)
  • covariance = same
  • pearson's r = same
    • this is fairly clearly indicated by the scatterplot (graph)
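
The additive-constant rules above can be checked by computing each statistic before and after the shift. This is only a sketch in Python (the class itself uses SPSS); the data values and the constant 10 are made up for illustration, and the helper functions use the sample (n - 1) formulas.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def sd(v):
    # sample SD (n - 1 in the denominator)
    m = mean(v)
    return sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

def cov(x, y):
    # sample covariance (n - 1 in the denominator)
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def pearson_r(x, y):
    return cov(x, y) / (sd(x) * sd(y))

x = [1, 2, 4, 7]                 # made-up scores
y = [2, 3, 9, 8]                 # made-up scores
x_plus = [a + 10 for a in x]     # additive constant of 10 (arbitrary choice)

print("mean:", mean(x), "->", mean(x_plus))          # shifts by the constant
print("SD:  ", sd(x), "->", sd(x_plus))              # unchanged
print("cov: ", cov(x, y), "->", cov(x_plus, y))      # unchanged
print("r:   ", pearson_r(x, y), "->", pearson_r(x_plus, y))  # unchanged
```

Only the mean moves, because every deviation from the mean stays exactly the same after the shift.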

 

multiplicative constants:

  • mean = multiplied by the amount of the constant

  • SD = multiplied by the absolute value of the constant (a measure of spread can't be negative)

  • SD for population = multiplied by the absolute value of the constant (the formula squares the deviations and then takes the square root, so the sign drops out) --statistical difference (spreadoutness) can't be negative.

  • skewness = same size, but the sign flips if the constant is negative (the pattern won't change in shape, but may change in direction; multiplying doesn't change how symmetrical the distribution is)

    • remember: skewness is positive when the long tail stretches out toward the HIGHER numbers

  • kurtosis = same (multiplying stretches or shrinks the scale of the distribution but doesn't change its "shape")

  • covariance = multiplied by the amount of the constant, when the constant is applied to one of the two variables (can see in the scatterplot--the spreadoutness between the Y datapoints is larger, although the pattern is the same)

    • covariance is similar to variance (which is SD squared, or simply without the last square root step)

    • covariance = the variance of two variables, crossed together (what is the pattern of their relationship together)

    • the covariance will be affected by a change in the SD, because it does not remove the influence of SD anywhere in its formula

    • if multiplied by a negative constant, it will change direction on the scatterplot (and thus be negative covariance result)

  • pearson's r = same (because we removed the influence of the SD)

    • if one variable is multiplied by a negative constant, the pattern changes direction on the scatterplot (and thus r keeps its size but changes sign)
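
The multiplicative-constant rules can be checked the same way. The sketch below multiplies one variable by a negative constant so the sign changes show up. The data, the constant -2, and the helper names are all made up for illustration. The helpers use the simple population (divide by n) moment formulas, so the skewness and kurtosis values will likely not match SPSS output exactly, but the before/after pattern is the point.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def moment(v, k):
    # k-th central moment, population form (divides by n)
    m = mean(v)
    return sum((a - m) ** k for a in v) / len(v)

def sd(v):
    return sqrt(moment(v, 2))

def skewness(v):
    return moment(v, 3) / moment(v, 2) ** 1.5

def kurtosis(v):
    return moment(v, 4) / moment(v, 2) ** 2

def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def pearson_r(x, y):
    return cov(x, y) / (sd(x) * sd(y))

c = -2                        # hypothetical multiplicative constant
x = [1, 2, 3, 4]
y = [1, 2, 3, 10]             # made-up data with a long tail toward high values
y_c = [c * b for b in y]      # multiply only y by the constant

rows = [
    ("mean", mean(y), mean(y_c)),                  # multiplied by c
    ("SD", sd(y), sd(y_c)),                        # multiplied by |c|
    ("skewness", skewness(y), skewness(y_c)),      # same size, sign flips
    ("kurtosis", kurtosis(y), kurtosis(y_c)),      # unchanged
    ("cov", cov(x, y), cov(x, y_c)),               # multiplied by c
    ("r", pearson_r(x, y), pearson_r(x, y_c)),     # same size, sign flips
]
for name, before, after in rows:
    print(f"{name:10s} {before:8.3f} {after:8.3f}")
```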

 

additive & multiplicative constants together:

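  • each statistic changes first by the additive constant, then by the multiplicative one, following the rules above (only the mean responds to both; the SD, covariance, and pearson's r respond only to the multiplying)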
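
One thing worth seeing with numbers: the order in which the two constants are applied matters for the mean but not for the SD. A small sketch, with made-up data and arbitrary constants (10 and 3), using Python's standard `statistics` module:

```python
from statistics import mean, stdev

x = [1, 2, 4, 7]      # made-up scores
a, b = 10, 3          # arbitrary additive and multiplicative constants

add_then_mult = [(v + a) * b for v in x]   # add first, then multiply
mult_then_add = [v * b + a for v in x]     # multiply first, then add

# The mean follows whichever order was used.
print(mean(add_then_mult), (mean(x) + a) * b)
print(mean(mult_then_add), mean(x) * b + a)

# The SD ignores the additive constant in either order.
print(stdev(add_then_mult), stdev(mult_then_add), abs(b) * stdev(x))
```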

 

more info on these concepts:

  • Z scores = remove the influence of the mean and the SD (subtract the mean, then divide by SD--dividing is how you remove the SD's influence)
    • the property of all Z scores: the mean = 0 and the SD = 1.
    • we standardize raw scores into Z scores to make that happen; this puts the scores in a standard metric
  • SD = shows a datapoint's relation to the rest of the dataset (gives it context and thus meaning)
  • pearson r (correlation) = standardized covariance (put the scores/datapoints on the same metric--can compare different groups of data now.) -- you standardize it by removing the influence of the SDs... to put it in a broader (standardized) context.
    • property of all r = ranges from -1 to +1
    • because pearson's r is standardized (and covariance is not), pearson's r is used more often in journal articles--you can't compare variables using just the covariance. (Because r falls within -1 to +1, it's easy to see the r in an article and understand what that means: near +1 is a strong positive correlation, near 0 is a weak/non-existent correlation, and near -1 is a strong inverse correlation.)
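
To make "r is standardized covariance" concrete, the sketch below converts two made-up variables to Z scores and shows that the covariance of the Z scores is the same number as r computed from the raw scores. The function names `z_scores` and `covariance` are just illustrative helpers, and the sample (n - 1) formulas are assumed throughout.

```python
from statistics import mean, stdev

def z_scores(v):
    # subtract the mean, then divide by the SD
    m, s = mean(v), stdev(v)
    return [(a - m) / s for a in v]

def covariance(x, y):
    # sample covariance (n - 1 in the denominator)
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

x = [1, 2, 3, 4]      # made-up scores
y = [1, 2, 3, 10]     # made-up scores

zx, zy = z_scores(x), z_scores(y)
print("mean of Z scores:", mean(zx))    # approximately 0
print("SD of Z scores:  ", stdev(zx))   # approximately 1

r = covariance(x, y) / (stdev(x) * stdev(y))   # covariance with the SDs divided out
r_from_z = covariance(zx, zy)                  # covariance of the standardized scores
print("r:", r, "covariance of Z scores:", r_from_z)
```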

 

"Some people like to think about this crap."  (lol)

 

when in doubt, graph something.

  • if x = 1, 2, 3, 4
  • and y = 2, 4, 6, 8
  • then the sample size is  4 (four people tested on 2 variables)
  • these variables are positively related (they go up together)
  • graph it: a straight line, going up a tad steeply
  • it's a perfect pattern; y is always double x
  • if you can do mathematical stuff to x (like multiplying by 2), and turn it into y exactly, you have a perfect relationship
    • in this case, 2x = y
    • in another instance, 2x + 1 = y (linear equation; because when graphed, it makes a line)
  • a few cases can dramatically influence our results (on a graph, a few more datapoints can change a curvilinear pattern (best fit) to a linear pattern (line of best fit)).
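
The small example above can be run directly. The sketch below computes r for y = 2x and for y = 2x + 1, then adds one extra case to show how much a single datapoint can change the result. The extra case (x = 5, y = 0) is made up for illustration and is not from the class.

```python
from statistics import mean, stdev

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]                    # y = 2x
y_shift = [2 * a + 1 for a in x]    # y = 2x + 1

r_double = pearson_r(x, y)
r_linear = pearson_r(x, y_shift)

# One hypothetical extra person who scores high on x but 0 on y.
r_outlier = pearson_r(x + [5], y + [0])

print("y = 2x:          r =", r_double)
print("y = 2x + 1:      r =", r_linear)
print("with extra case: r =", r_outlier)
```

Both exact linear rules give a perfect r, while the single added case pulls r all the way down to about zero in this tiny sample.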

 

this points to a major point of multiple regression: it's all about predicting a relationship between variables

  • we want to hammer at x until we can find a value (prediction) as close to y as possible.
  • y with a caret on top ("y hat") = predicted score
    • should be as close to y as possible IF you want a strong correlation.
    • if the pattern/relationship is NOT perfect (linear), then r = less strong

 

r = if x gets bigger as y gets bigger, then r is between 0 and +1 (positive value).

 

centroid (cartesian coordinate): on the graph, the plotted mean of x and the plotted mean of y 

  • the line of best fit MUST go through the centroid (even though it might not go through any of the actual datapoints)
  • the means of x and y are the numbers that best represent x and y... therefore the line of best fit must go thru both means.
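
The centroid property can be checked numerically. The sketch below fits a least-squares line using the raw-sum formulas (which never use the means directly) and then confirms that the line passes through the point (mean of x, mean of y). The data are made up for illustration.

```python
x = [1, 2, 3, 4]      # made-up scores
y = [1, 2, 3, 10]     # made-up scores
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xx = sum(a * a for a in x)
sum_xy = sum(a * b for a, b in zip(x, y))

# Least-squares slope and intercept from the raw sums.
denom = n * sum_xx - sum_x ** 2
slope = (n * sum_xy - sum_x * sum_y) / denom
intercept = (sum_y * sum_xx - sum_x * sum_xy) / denom

centroid = (sum_x / n, sum_y / n)
line_at_mean_x = intercept + slope * centroid[0]

print("centroid:", centroid)
print("height of the line at the mean of x:", line_at_mean_x)
```

In this dataset none of the four datapoints sits on the fitted line, yet the line still hits the centroid.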

 

line of best fit:

  • goes thru centroid
  • trying to get y hat as close to y as possible (better predictor, thus a more perfect line and thus closer to r = 1)
    • y hat = a + bx ...so we can USE this formula to calculate y hats (predicted scores) for future x scores (no error and y hat = y?  then r = 1)
  • y - (y hat) = "error" = tells us how close a prediction we have
    • this can be negative or positive, depending on its direction (if y hat is bigger or smaller than y)
  • four variables: x, y, y hat, and error
  • plot x with y hat and you get a line (because it's a linear equation)
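
Here is a sketch of the four variables (x, y, y hat, and error) for a small made-up dataset. It assumes the usual least-squares choices for a and b (b = covariance of x and y divided by the variance of x; a = mean of y minus b times mean of x), which the notes above do not spell out.

```python
from statistics import mean

x = [1, 2, 3, 4]      # made-up predictor scores
y = [1, 2, 3, 10]     # made-up outcome scores

mx, my = mean(x), mean(y)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

y_hat = [a + b * xi for xi in x]                 # predicted scores
error = [yi - yh for yi, yh in zip(y, y_hat)]    # y - (y hat)

print("x, y, y hat, error")
for row in zip(x, y, y_hat, error):
    print(row)

# Positive and negative errors cancel out for a least-squares line.
print("sum of errors:", sum(error))

# Prediction for a future x score (7 is an arbitrary example value).
print("y hat for x = 7:", a + b * 7)
```

Because y hat is computed from a linear equation in x, plotting x against y hat gives a perfectly straight line, while the errors show how far each real y falls above or below it.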

 

all of our analyses (this semester) are related to each other;

  1. they are all correlational
  2. they all yield r-squared-type effect sizes
  3. they all apply weights to observed variables to create synthetic (unobserved) variables
  4. the synthetic/unobserved variables become the focus of the analysis

 

SPSS/PASW

  • "what's the relationship between x and y?" (pearson r)

  • "does x predict y?" (regression; this is more relevant when we have more than one predictor)

    • need to know how to calculate y hat and error scores in SPSS...

    • i want to take x and predict y; i want to perform a regression

  • SPSS: analyze - descriptive stats - descriptives - calculates various descriptive stats  (we've done this before)

  • tip: click/look around and try to find things (he won't always tell us how to calculate things in SPSS)

  • save the file: format is ".sav" for data files (the "Daddy" file)
  • SPSS (regression) = analyze - regression - linear - (y is usually dependent variable--the outcome of interest); (x is the independent variable or predictor) - statistics (pick descriptive)
    • then click SAVE (y hat = predicted values, unstandardized; error = residuals, unstandardized (standardized forms transform them into Z scores))
    • then click PASTE (NOT "ok") -- this builds a command file
    • "Syntax Editor" (syntax or command file) opens in a new window
    • can add comments to show myself what I'm doing by typing an asterisk and space, then comment, and end each comment/command with a period (* .)
    • file, save as: (give it same name as the datafile--different extension); file extension = .sps (the "Mommy" file)
    • need to put the Daddy & Mommy files together to get a "Baby" file
      • highlight what you want to run; click the "play" arrow button on the top menu (it runs the selection)
    • that produces the Output file (the "Baby" file)
      • ANOVA = really our regression summary table; (regression = between; residuals = within)
    • click back to the dataset; two new columns have been added: PRE_1 = y hat; RES_1 = error
      • if you run the same regression a second time, you'll get two more variable columns, appended with "2" instead of "1" -- the values will be the same; they were simply calculated a second time
      • it's lined up in the dataset table just like you'd calculate it:  y - PRE (y hat) should = RES (error)
      • if you want to check it graphically, graphs - scatterplot - for the variables you want to check
        • --add fit line - linear - apply (ignore the curves)
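
The regression summary (ANOVA) table mentioned above splits the total variability in y into a regression part and a residual part. The sketch below reproduces that split by hand for a small made-up dataset, outside of SPSS, and shows that regression sum of squares divided by total sum of squares equals r squared. The variable names are illustrative and are not SPSS output labels.

```python
from statistics import mean

x = [1, 2, 3, 4]      # made-up predictor scores
y = [1, 2, 3, 10]     # made-up outcome scores

mx, my = mean(x), mean(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)

b = sxy / sxx
a = my - b * mx
y_hat = [a + b * xi for xi in x]

ss_total = sum((yi - my) ** 2 for yi in y)                       # all variability in y
ss_regression = sum((yh - my) ** 2 for yh in y_hat)              # explained by the line
ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # left over as error

r = sxy / (sxx * ss_total) ** 0.5
r_squared = ss_regression / ss_total

print("SS total:     ", ss_total)
print("SS regression:", ss_regression)
print("SS residual:  ", ss_residual)
print("r squared from the SS split:", r_squared)
print("r squared from pearson r:   ", r ** 2)
```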

 

TO DO:

  • email Paul and ask him to put SPSS/PASW on work computer
    • let Annie know what he says
  • email Dr. Henson about graphical book on figuring out basic statistical concepts
  • read handouts from Dr. Henson (last week, and workshop)
  • do homework
  • look at book from Annie / library
  • look at textbook...
  • start research topic -- paper / project

 

 
