
Assignment 6

Multivariate Data Analysis
Psychology 6140

The readings and problems assigned here are meant to cover the weeks until the start of next term.

Readings

  1. Cliff, Chapters 7-9

Supplements

Modelling and interpreting interactions in multiple regression
A discussion of why interaction terms in a regression model should use (products of) centered variables.
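
As a rough illustration of the point this supplement makes, the sketch below compares how strongly a predictor correlates with a product term built from raw scores versus mean-centered scores. The data values and the helper names (`center`, `pearson`) are made up for this illustration; they are not taken from the supplement or from the class data.

```python
from math import sqrt

def center(values):
    """Subtract the mean from every value."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    da, db = center(a), center(b)
    sab = sum(p * q for p, q in zip(da, db))
    return sab / sqrt(sum(p * p for p in da) * sum(q * q for q in db))

# Hypothetical scores, chosen only to make the effect easy to see.
x = [1, 2, 3, 4, 5, 6, 7, 8]
z = [2, 1, 4, 3, 6, 5, 8, 7]

# Product term formed from the raw scores.
raw_product = [xi * zi for xi, zi in zip(x, z)]

# Product term formed after centering each variable at its mean.
xc, zc = center(x), center(z)
centered_product = [a * b for a, b in zip(xc, zc)]

r_raw = pearson(x, raw_product)
r_centered = pearson(xc, centered_product)
print(round(r_raw, 3), round(r_centered, 3))
```

For these particular made-up values, the raw product is strongly correlated with x, while the centered product is not. Real data will rarely be this clean.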

Problems (due: Fri, Jan 13)

A survey of salaries was undertaken in a large corporation to quantify the factors that determine salary differentials and to determine whether the company's salary policy guidelines were being followed. The data set records each person's salary, number of years of experience, education level (coded 1=completed high school; 2=completed college; 3=completed advanced degree), and whether the person had management responsibilities (coded 1=mgt; 0 otherwise).

You can do the calculations for this problem with APL or SAS. With APL, you can use the function REGRES in the library 6140 GLM for the regressions and SCAT for the plots. With SAS, you can use PROC REG and/or PROC GLM, together with PROC PLOT. The data are available in two forms on the class disk:

  1. Fit a linear regression predicting salary from years of experience alone.
    1. Find the overall F* value for this model, and the observed t* for the hypothesis $H_0 : \beta_1 = 0$. In what sense are these values equivalent?
    2. Find the fitted values and residuals from this model. Make a scatter plot of residuals vs. years of experience, identifying the 6 different education-management groups with different plotting symbols. [In APL, you will have to color or identify the points by hand; in SAS, use PLOT yvar * xvar = GROUP.] Is there any evidence in this plot that education and/or management group predicts salary after years of experience have been taken into account? Examine a univariate display of the residuals as one batch. Is there any evidence of violations of the model's assumptions or of unusual observations?
  2. Construct two dummy (indicator) variables for level of education, and add education and management to the prediction equation.
    1. Test the hypothesis that education and management variables together add significantly (beyond experience) to the prediction of salary.
    2. Find 95% confidence intervals for each of the regression weights in this model. Describe verbally what each of the raw regression weights means in terms of the problem situation.
    3. Find fitted values and residuals. Scatter plot residuals against years of experience, identifying the 6 groups by different plotting symbols. Is there any evidence in this plot that there remains systematic variation in salaries which is not accounted for in this model? Any indication of unusual observations?
  3. Construct variables to represent the interaction between education and management and add these to the model.
    1. Do the interaction variables provide a significant improvement in the goodness of fit of the model? Test the marginal (partial) increase in regression SS due to each interaction variable; state formally the hypothesis which is being tested.
    2. Obtain residuals from this model and plot against experience as before. Is there now any evidence of systematic variation related to group? Any unusual observations? Any high-leverage observations?
  4. Write a short (1-2 page) description summarizing the results of these analyses. One's normal expectations are that salary would increase with experience (career progress), education and management responsibility. Given this, which of the three models provides the best account of the data, and how should this model be interpreted?
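
The coding step in problems 2 and 3 can be sketched as follows. This is a minimal Python sketch, not the REGRES or PROC REG / PROC GLM workflow named above. The function names, the choice of high school as the reference category, and all example numbers are assumptions made for illustration. The second helper computes the usual F ratio for comparing a reduced model with a fuller one from their error sums of squares.

```python
def design_row(years, educ, mgt, interactions=False):
    """Build one row of predictors: intercept, experience, two education
    dummies (reference category: educ == 1, an assumed choice), and the
    management indicator. Optionally append education x management products."""
    college = 1 if educ == 2 else 0
    advanced = 1 if educ == 3 else 0
    row = [1, years, college, advanced, mgt]
    if interactions:
        row += [college * mgt, advanced * mgt]
    return row

def incremental_f(sse_reduced, sse_full, df_added, df_error_full):
    """F ratio for the predictors added in going from the reduced model
    to the full model, based on the drop in error sum of squares."""
    return ((sse_reduced - sse_full) / df_added) / (sse_full / df_error_full)

# Hypothetical person: 5 years of experience, advanced degree, manager.
print(design_row(5, 3, 1, interactions=True))

# Hypothetical error sums of squares, not results from the class data.
print(incremental_f(sse_reduced=100.0, sse_full=60.0, df_added=2, df_error_full=20))
```

The resulting ratio would be referred to an F distribution with `df_added` and `df_error_full` degrees of freedom.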

© 1995 Michael Friendly

Author: Michael Friendly
Email: friendly AT yorku.ca
