Psychology 6140

The SAS input files are linked on this page; some R and SPSS versions are available on the Hebb server or the web.

For **TWO** of these problems,

- perform the analyses guided by the Questions section, some more detailed than others.
- answer the questions, and,
- prepare a Results/Discussion section, with description of your methods of analysis, results and your interpretation, and any accompanying figures and/or tables deemed necessary, suitable for a research report on the data.

For ease of reading, please format your paper
**with figures and tables
presented inline** where possible, rather than as a manuscript submission,
where figures and tables generally appear at the end.
If you use R and RStudio, you may find it convenient to write your
reports using R Markdown,
which allows you to mix normal writing with R code and output.

Weight and body girth measures were taken on three occasions: 9 weeks, 3 months, and 1 year. The weight measures were first expressed as a percentage overweight value, taking the person's height and age into account. These overweight percentages were then expressed as a percentage change on each occasion, relative to the initial baseline value taken prior to the start of the course. For example, a value of -5.5 for OW9 means a 5.5% decrease in the overweight percentage at 9 weeks, relative to the overweight percentage at baseline. Similarly, the girth measures (the 3-month values to be analyzed here) were expressed as percentage change from the baseline values.
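
The percentage-change scoring described above can be sketched in a few lines of Python. This is only an illustration: it assumes the usual definition, 100 × (value − baseline) / baseline, and the function name and the example numbers are made up, not taken from the data.

```python
def percent_change(value, baseline):
    """Percentage change of `value` relative to `baseline` (assumed definition)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (value - baseline) / baseline


# Made-up example: 40% overweight at baseline, 37.8% overweight at 9 weeks.
ow9 = percent_change(37.8, 40.0)
print(round(ow9, 1))  # a negative value indicates a decrease from baseline
```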

- COND - Condition: 1=Experimental, 2=Control
- STATUS - Slimming status: 1=Experienced, 2=Novice
- CLINIC - A or B
- OW9 OW3 OW1 - change in overweight percentage, at 9 weeks, 3 months, 1 year, relative to baseline.
- BUST -- ARM - percentage change in various girth measures at 3 months vs. baseline.

- The researchers' first concern was with the overweight variables. Carry out a multivariate analysis to determine if mean differences exist in the OW variables according to the between-S variables. Perform a parallel analysis treating the OW variables as a repeated measure factor (for this purpose, assume the measures were equally spaced in time). Summarize and contrast the results of these analyses.
- The researchers next wished to assess the impact of condition and status on the girth change measures; in particular, they wished to know if the behavioural manual or slimming status had differential effects on the measures at different body locations. Perform an appropriate analysis to answer these questions. Summarize these results and describe how your analysis relates to the questions asked.
- A final question is whether weight change measures add anything to the analysis of treatment and status effects. Repeat the analysis performed for the previous question, but enter the overweight measures as covariates (predictors).

A study was carried out with two randomly assigned groups of
Alzheimer's patients, one group being given lecithin and the other
given a placebo over a 6 month period. To assess memory functioning
in a sensitive way, two types of free recall tests were given to each
subject at each of five times: 0, 1, 2, 4, and 6 months. In the
first type the *same* words were repeated at each test; in the
second, *different* but equivalent words were used each time.
Hence, differences in performance on the two types of tests should be
attributable to long-term learning.

The design therefore has one between-S factor and two within-S factors. The major question is whether the difference in performance on the two test types is the same or not for the two groups.

Scores on the repeated test are denoted A1 - A5; scores on the non-repeated test are B1 - B5. Each score is the number of words recalled out of 30. Group is coded 1 = Placebo, 2 = Lecithin.
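
Because each score is a count out of 30, two transformations commonly considered for proportion-like data are the arcsine square root and the empirical logit. The following Python sketch is a package-neutral illustration, not a prescribed method; the function names and the 0.5 continuity correction in the logit are assumptions made for this example.

```python
import math

N_WORDS = 30  # each score is the number of words recalled out of 30


def arcsine_sqrt(count, n=N_WORDS):
    """Arcsine square-root transform of the proportion count / n."""
    return math.asin(math.sqrt(count / n))


def empirical_logit(count, n=N_WORDS):
    """Log-odds with a 0.5 correction (an assumption) so 0 and n stay finite."""
    return math.log((count + 0.5) / (n - count + 0.5))


for count in (0, 15, 30):
    print(count, round(arcsine_sqrt(count), 3), round(empirical_logit(count), 3))
```

Either transformed score can then be examined for symmetry in the same way as the raw counts.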

- Examine the data for multivariate outliers, and examine the need for a transformation of these variables to approximate symmetry. [Since the data are counts out of a maximum, they are analogous to proportions.]
- Carry out the complete repeated measures analysis for the 3-way design for these data, with appropriate tests for (a) whether assumptions of the univariate (mixed model) analysis are met; and (b) polynomial trends for the TIME factor. Note that the time points are unequally spaced, so you will have to include the time values in the REPEATED statement.
- In a data step, construct the new variables

      ABAR  = mean(of A1-A5);
      BBAR  = mean(of B1-B5);
      SBAR  = mean(of A1--B5);
      ABDIF = ABAR - BBAR;

  Carry out a univariate analysis of group differences on each of the variables ABAR, BBAR, SBAR, and ABDIF, and show how these relate to the repeated measures analysis carried out in the previous step.
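
To see why the unequal spacing of the time points matters for the polynomial trend tests, the sketch below builds orthonormal polynomial contrasts for the actual time values 0, 1, 2, 4, 6 by Gram-Schmidt orthogonalization. It is an illustration in Python, not the SAS computation itself; the function name is hypothetical, and a statistics package may scale the contrasts differently.

```python
def orthogonal_polynomials(times, degree):
    """Orthonormal polynomial contrasts (linear, quadratic, ...) for the
    given, possibly unequally spaced, time values, via Gram-Schmidt."""

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    basis = []  # orthonormal vectors, starting with the constant term
    for power in range(degree + 1):
        v = [float(t) ** power for t in times]
        for b in basis:  # remove the components along lower-order terms
            proj = dot(v, b)
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = dot(v, v) ** 0.5
        basis.append([vi / norm for vi in v])
    return basis[1:]  # drop the constant; keep linear, quadratic, ...


times = [0, 1, 2, 4, 6]  # months, as in the study
contrasts = orthogonal_polynomials(times, degree=4)
for name, c in zip(["linear", "quadratic", "cubic", "quartic"], contrasts):
    print(name, [round(x, 3) for x in c])
```

The linear contrast is proportional to each time value minus the mean time, so it differs from the equally spaced coefficients that would be used if the time values were omitted.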

The response variable Y= `died` indicates
whether the subject was alive (0) or dead (1) when he/she left ICU.
The remaining variables in columns 3-21 are the predictor
variables. The binary predictor variables are all coded so that
the value 1 corresponds to a possible risk factor.
In addition, the 3-level variable `race` has been supplemented
by a binary variable `white` and the variable `coma`
supplemented by a binary variable `uncons = (coma>0)`.
A code sheet for the variables is provided in **Table 1**. You can find the
data in `N:\data\icu.sas` as a SAS input file,
in `N:\data\icu.sps` as an SPSS input file,
in `icu.sav` as an SPSS system file,
and in `N:\data\icu.dat` as a plain ASCII data file (for use with
any other statistics package).
In R, the data set `ICU` is contained in the vcdExtra package.

Column | Description | Codes/Values | Variable |
---|---|---|---|
1 | Identification Code | ID Number | id |
2 | Vital Status | 0 = Lived, 1 = Died | died |
3 | Age | Years | age |
4 | Sex | 0 = Male, 1 = Female | sex |
5 | Race | 1 = White, 2 = Black, 3 = Other | race |
6 | Service at ICU admission | 0 = Medical, 1 = Surgical | service |
7 | Cancer part of present problem | 0 = No, 1 = Yes | cancer |
8 | History of Chronic Renal Failure | 0 = No, 1 = Yes | renal |
9 | Infection Probable at ICU admission | 0 = No, 1 = Yes | infect |
10 | CPR prior to ICU admission | 0 = No, 1 = Yes | cpr |
11 | Systolic blood pressure at ICU admission | mm Hg | systolic |
12 | Heart Rate at ICU admission | beats/min | hrtrate |
13 | Previous ICU admission within 6 mths. | 0 = No, 1 = Yes | previcu |
14 | Type of admission | 0 = Elective, 1 = Emergency | admit |
15 | Fracture: Long bone, multiple, neck, Single area or Neck | 0 = No, 1 = Yes | fracture |
16 | PO₂ from initial Blood Gases | 0 = >60, 1 = ≤60 | po2 |
17 | pH from initial Blood Gases | 0 = ≥7.25, 1 = <7.25 | ph |
18 | PCO₂ from initial Blood Gases | 0 = ≤45, 1 = >45 | pco |
19 | Bicarbonate from initial Blood Gases | 0 = ≥18, 1 = <18 | bic |
20 | Creatinine from initial Blood Gases | 0 = ≤2.0, 1 = >2.0 | creatin |
21 | Level of Consciousness at ICU admission | 0 = No Coma/Stupor, 1 = Deep Stupor, 2 = Coma | coma |

- Run a logistic regression using all 19 predictor variables:
  age sex white service cancer renal infect cpr systolic hrtrate previcu admit fracture po2 ph pco bic creatin uncons

  Which variables appear to be strong predictors of survival? Which variables appear to be unnecessary? Are there any variables which, on logical grounds, should be included in any model?
- Use forward, backward, or stepwise selection to determine a minimal model where all predictors are individually significant at the 0.05 level, but perhaps forcing any variables you consider necessary to remain. Compare this model to your model with all variables in terms of goodness of fit and lack of fit.
- Investigate whether any quadratic terms (in a quantitative predictor) or interaction terms are necessary among those predictors in your model from the previous step.
- Without further analysis (or biological background), does the model seem reasonable? Which terms did you expect to see?
- In a sample of this size and heterogeneity, it may be considered likely
that there are one or more cases of high leverage or influence.
Find the 3-6 cases with the largest value of Cook's D (the C statistic
in SAS) and interpret the nature of their influence on your model.
[Hint: Use the `%inflogis` macro.]
- Investigate the predicted probability of death while in the ICU for the patients in this sample. Use your final model to obtain predicted probabilities. Plot Pr(died) as a function of age, with other variables held fixed at high or low values of some of your important risk factors.
- Given the level of success of your model in predicting a patient's outcome, how might this study be re-done to increase the accuracy of prediction? Is there an alternative design, or other variables which should be measured?
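
The predicted-probability step above amounts to evaluating the inverse logit of the fitted linear predictor over a grid of ages, once for each setting of the other risk factors. The Python sketch below illustrates only the arithmetic. The coefficients are HYPOTHETICAL placeholders, not estimates from the ICU data, and the choice of `uncons` as the held-fixed risk factor is likewise just an assumption for the example.

```python
import math

# HYPOTHETICAL coefficients, for illustration only; replace them with the
# estimates from your own fitted model.
COEF = {"intercept": -5.0, "age": 0.04, "uncons": 3.0}


def prob_died(age, uncons, coef=COEF):
    """Inverse logit of the linear predictor: Pr(died) for one patient."""
    eta = coef["intercept"] + coef["age"] * age + coef["uncons"] * uncons
    return 1.0 / (1.0 + math.exp(-eta))


ages = range(20, 91, 10)
# One curve of Pr(died) against age for each level of the risk factor.
curves = {u: [prob_died(a, u) for a in ages] for u in (0, 1)}
for a, p0, p1 in zip(ages, curves[0], curves[1]):
    print(f"age={a:2d}  uncons=0: {p0:.3f}  uncons=1: {p1:.3f}")
```

Plotting each list in `curves` against `ages` gives one fitted curve per risk-factor setting.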

Holzinger & Swineford (1939) gave 24 tests of a variety of psychological abilities to junior high school students at two schools. These data are typical of the kinds of ability tests which have been used throughout the history of factor analysis and are one of the most widely studied sets of correlations in the factor analysis literature. The factor analytic problem is to determine the number and kind of dimensions or latent abilities which may be used to describe the correlations among these tests. Sample test items from the tests are given in HolzingerSwineford.pdf. The orienting questions here are more detailed than in other problems; you are free to choose a reasonable subset.

The data for this problem are available in several forms on the Hebb N: disk:

The raw data for both samples is contained in the file
**psych24r.sas**,
in SPSS format as **psych24r.sps**,
and in CSV format as **psych24r.csv**. An R script,
**psych24r.R**, is also provided for reading in the raw data from the CSV file.
The raw data also gives sex and chronological age
for all subjects. The two samples are distinguished by the variable
GRP.

The correlations for the Grant-White sample (N=145) are
contained in the file
**psych24c.sas**. The means, standard
deviations and measures of reliability(2) are
also contained in the correlation file for the Grant-White data.
The same correlations are also provided in R format, in
**psych24c.R**.

------------------------

(2) Gorsuch (1974) reports that "the raw data
does not always agree with the published statistics". Here
we will assume that the correlations are correct.

------------------------

The common practice in analysis of these data is to include
variables 25 and 26 but not variables 3 & 4 (25 & 26 were attempts
to develop better tests for variables 3 & 4) when the Grant-White
sample alone is analyzed. However, in order to be able to compare
the results of the two schools, we will ignore variables 25 and 26
here. Moreover, to reduce the size of the problem somewhat, we will
only use the first 18 variables, named V1-V18 in the SAS files. (For
those wishing to try the AMOS program, two raw data files, **GRANT AMD**
and **PASTEUR AMD**, containing only V1-V18, are available
on the Hebb server.)

From the descriptions of the tests, attempt to develop some theory, however vague, of the manifest content of these 18 tests. Which ones should tend to tap the same underlying abilities? How many different abilities? What results are reported in previous analyses? Do they make sense?

Examine the raw data in the Grant-White sample (GRP=1) for:

- Univariate outliers or unusual observations on individual variables (Proc UNIVARIATE with PLOT option).
- Univariate normality (skewness, kurtosis)
- Multivariate outliers: observations whose distance from the centroid is large (see OUTLIER SAS on the class disk)
- Do the means and standard deviations agree with the published data?
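
The multivariate outlier check in the list above rests on the squared Mahalanobis distance of each observation from the centroid. The Python sketch below shows that computation on a few made-up rows. It is not the OUTLIER SAS program; the function name and the toy data are assumptions for illustration.

```python
def mahalanobis_sq(rows):
    """Squared Mahalanobis distance of each row from the sample centroid."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (divisor n - 1).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]

    def solve(a, b):
        """Solve a x = b by Gaussian elimination with partial pivoting."""
        m = [row[:] + [bi] for row, bi in zip(a, b)]
        for k in range(p):
            piv = max(range(k, p), key=lambda i: abs(m[i][k]))
            m[k], m[piv] = m[piv], m[k]
            for i in range(k + 1, p):
                f = m[i][k] / m[k][k]
                for j in range(k, p + 1):
                    m[i][j] -= f * m[k][j]
        x = [0.0] * p
        for i in reversed(range(p)):
            tail = sum(m[i][j] * x[j] for j in range(i + 1, p))
            x[i] = (m[i][p] - tail) / m[i][i]
        return x

    # d^2 = c' S^{-1} c, computed by solving S x = c for each centered row c.
    return [sum(ci * si for ci, si in zip(c, solve(cov, c))) for c in centered]


# Made-up scores on two tests; the last row is deliberately unusual.
data = [[10, 12], [11, 13], [9, 11], [12, 14], [10, 13], [11, 12], [20, 5]]
d2 = mahalanobis_sq(data)
for row, d in zip(data, d2):
    print(row, round(d, 2))
```

Under multivariate normality these squared distances are commonly compared with a chi-square reference distribution on p degrees of freedom, where p is the number of variables.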

Use the correlation matrix from the Grant-White data (in psych24c.sas) for this part.(3)

------------------------

(3) Since the data set contains the standard deviations, the analysis can be done using either the correlation or covariance matrix.

------------------------

- Determine the number of factors necessary to adequately explain the correlations among these tests. Do various criteria tend to converge or do they indicate different numbers of factors?
- Find a rotated factor solution which provides an interpretable description of the correlations among the tests. Are orthogonal or oblique factors more reasonable? If oblique, are all factors correlated, or can some pairs be considered independent?

- The simplest approach is to perform a Procrustes target rotation,
specifying a target factor pattern determined from your
exploratory analyses. [The
*target* factor pattern is a matrix of 0s and 1s where the 1s specify the variables hypothesized to load on each factor. For PROC FACTOR, this matrix is read in with columns specifying the variables.]
- A stronger test is available by fitting a restricted factor model using PROC CALIS, LISREL, AMOS, or the R packages sem and lavaan.
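
The layout of a 0/1 target pattern can be sketched as follows, with one row per factor and one column per variable, matching the description above. The grouping of tests into factors is a HYPOTHETICAL placeholder, not a recommended solution; substitute the pattern suggested by your own exploratory analyses.

```python
variables = [f"V{i}" for i in range(1, 19)]  # V1-V18, as named in the SAS files

# HYPOTHETICAL assignment of tests to factors, for illustration only.
hypothesized = {
    "Factor1": ["V1", "V2", "V3", "V4"],
    "Factor2": ["V5", "V6", "V7", "V8", "V9"],
    "Factor3": ["V10", "V11", "V12", "V13"],
    "Factor4": ["V14", "V15", "V16", "V17", "V18"],
}

# One row per factor, one column per variable: 1 = hypothesized loading.
target = [[1 if v in members else 0 for v in variables]
          for members in hypothesized.values()]

print("        " + " ".join(f"{v:>3}" for v in variables))
for name, row in zip(hypothesized, target):
    print(f"{name:<8}" + " ".join(f"{x:>3}" for x in row))
```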