Multivariate Data Analysis
The readings and problems assigned here are meant to cover the
weeks until the start of next term.
- Cliff, Chapters 7-9
A survey of salaries was undertaken in a large corporation to
quantify the factors that determine salary differentials, and to
determine whether the company's salary policy guidelines were being
followed. The data set records each person's salary, number
of years of experience, education level (coded 1=completed high
school; 2=completed college; 3=completed advanced degree), and
whether the person had management responsibilities (coded 1=mgt; 0=non-mgt).
- Modelling and interpreting interactions in multiple regression
- A discussion of why interaction terms in a regression model should
use (products of) centered variables.
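One argument commonly given for centering can be seen numerically. The sketch below uses made-up values for two hypothetical predictors `x` and `z` (not the SURVEY data), and the helpers `mean` and `corr` are defined in the block rather than taken from any library. When both predictors lie far from zero, their raw product is nearly collinear with the predictors themselves; the product of the centered variables is much less so.

```python
# Made-up values (NOT the SURVEY data) illustrating why products of
# centered variables are preferred for interaction terms.

def mean(v):
    return sum(v) / len(v)

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

# Two hypothetical positive-valued predictors, both far from zero.
x = [1, 3, 4, 6, 8, 10, 12, 15, 17, 20]
z = [2, 5, 3, 6, 4, 7, 5, 8, 6, 9]

# Interaction term formed from the raw variables.
raw_product = [xi * zi for xi, zi in zip(x, z)]

# Interaction term formed from mean-centered variables.
mx, mz = mean(x), mean(z)
xc = [xi - mx for xi in x]
zc = [zi - mz for zi in z]
centered_product = [a * b for a, b in zip(xc, zc)]

r_raw = corr(x, raw_product)
r_centered = corr(xc, centered_product)
print(f"corr(x, x*z), uncentered: {r_raw:.3f}")
print(f"corr(xc, xc*zc), centered: {r_centered:.3f}")
```

With these particular values the uncentered correlation is above 0.9 while the centered one is below 0.2; the exact figures depend on the data.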
You can do the calculations for this problem with APL or SAS.
With APL, you can use the function REGRES in the library 6140 GLM
for the regressions and SCAT for the plots. With SAS, you can use
PROC REG and/or PROC GLM, together with PROC PLOT. The data are
available in two forms on the class disk:
- The file SURVEY SAS creates the data set SALARY with
the variables SALARY, EXPRNC, EDUC, and MGT. You can copy
this file to your A-disk and add SAS statements to it to
answer the questions below.
- The data are also in the APL library 6140 DATA as the variable
SURVEY, which you can get with the command:
COPY '6140 DATA SURVEY'
- Fit a linear regression predicting salary from years of experience.
- Find the overall F* value for this model, and the
observed t* for the hypothesis $H_0: \beta_1 = 0$.
In what sense are these values equivalent?
- Find the fitted values and residuals from this model.
Make a scatter plot of residual vs. years of
experience, identifying the 6 different education -
management groups with different plotting symbols.
[In APL, you will have to color or identify the
points by hand; in SAS, use PLOT yvar * xvar =
GROUP.] Is there any evidence in this plot that
education and/or management group predicts salary
after years of experience have been taken into
account? Examine a univariate display of the
residuals as one batch. Is there any evidence of
violations of the model assumptions, or of unusual observations?
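For the first model, with years of experience as the only predictor, the equivalence can be checked from the standard simple-regression formulas. The notation below ($b_1$ for the fitted slope, $\bar{X}$ for mean experience, $SSR$, $MSR$, $SSE$, $MSE$ for the regression and error sums of squares and mean squares) is assumed here rather than taken from the assignment.

```latex
\begin{aligned}
SSR &= b_1^2 \sum_{i=1}^{n} (X_i - \bar{X})^2, \qquad
MSR = \frac{SSR}{1}, \qquad MSE = \frac{SSE}{n-2},\\
s^2(b_1) &= \frac{MSE}{\sum_{i=1}^{n} (X_i - \bar{X})^2},\\
F^* &= \frac{MSR}{MSE}
     = \frac{b_1^2 \sum_{i=1}^{n} (X_i - \bar{X})^2}{MSE}
     = \frac{b_1^2}{s^2(b_1)}
     = \left(\frac{b_1}{s(b_1)}\right)^2
     = (t^*)^2 .
\end{aligned}
```

So with a single predictor, the overall F test on 1 and n-2 df and the two-sided t test of $H_0: \beta_1 = 0$ on n-2 df give the same p-value.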
- Construct two dummy (indicator) variables for level of
education, and add education and management to the model.
- Test the hypothesis that education and management
variables together add significantly (beyond
experience) to the prediction of salary.
- Find 95% confidence intervals for each of the regression
weights in this model. Describe verbally what each of
the raw regression weights means in terms of salary differentials.
- Find fitted values and residuals. Scatter plot residuals
against years of experience, identifying the 6 groups
by different plotting symbols. Is there any evidence
in this plot that there remains systematic variation
in salaries which is not accounted for in this model?
Any indication of unusual observations?
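The dummy coding and the test that education and management together add to experience can be sketched as an extra-sum-of-squares F test, $F^* = \frac{(SSE_R - SSE_F)/(df_R - df_F)}{SSE_F / df_F}$, where R is the experience-only model and F adds the two education dummies and MGT. This is a plain-Python sketch, not the SAS or APL route the assignment describes; the records are hypothetical, generated from an assumed salary formula, and `solve`, `fit` and `dummies` are illustrative helper names. Treating high school (EDUC=1) as the reference level is also an assumption; any level could serve.

```python
# Plain-Python sketch of the Model 2 comparison, using MADE-UP data
# generated from an assumed salary formula (not the SURVEY data).

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit(X, y):
    """Least squares via the normal equations; returns (coefficients, SSE)."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    b = solve(xtx, xty)
    resid = [yi - sum(bj * xj for bj, xj in zip(b, row)) for row, yi in zip(X, y)]
    return b, sum(e * e for e in resid)

def dummies(educ):
    """Two 0/1 indicators; EDUC=1 (high school) is the reference level."""
    return [(1 if e == 2 else 0, 1 if e == 3 else 0) for e in educ]

# Hypothetical records (salary, exprnc, educ, mgt): assumed formula plus a
# fixed perturbation so the fit is not exact. All 6 groups appear 4 times.
bumps = [300, -450, 120, -80, 500, -260, 40, -330, 210, -150, 90, -20]
records = []
for k in range(24):
    exprnc, educ, mgt = 1 + (7 * k) % 15, 1 + k % 3, (k // 3) % 2
    salary = (10000 + 500 * exprnc + 3000 * (educ == 2)
              + 6000 * (educ == 3) + 7000 * mgt + bumps[k % 12])
    records.append((salary, exprnc, educ, mgt))

y = [r[0] for r in records]
d = dummies([r[2] for r in records])
X_reduced = [[1.0, r[1]] for r in records]                               # intercept, EXPRNC
X_full = [[1.0, r[1], e2, e3, r[3]] for r, (e2, e3) in zip(records, d)]  # + E2, E3, MGT

n = len(y)
_, sse_r = fit(X_reduced, y)
b_full, sse_f = fit(X_full, y)
df_r, df_f = n - len(X_reduced[0]), n - len(X_full[0])

# Extra-SS F*: do E2, E3 and MGT together reduce SSE beyond EXPRNC?
f_star = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
print("b0, EXPRNC, E2, E3, MGT:", [round(b, 1) for b in b_full])
print(f"F* = {f_star:.1f} on {df_r - df_f} and {df_f} df")
```

In SAS the same comparison can be obtained by fitting both models, or with a TEST statement in PROC REG; the p-value comes from the F distribution on 3 and n-5 df.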
- Construct variables to represent the interaction between
education and management and add these to the model.
- Do the interaction variables provide a significant
improvement in the goodness of fit of the model? Test
the marginal (partial) increase in regression SS due
to each interaction variable; state formally
the hypothesis which is being tested.
- Obtain residuals from this model and plot against
experience as before. Is there now any evidence of
systematic variation related to group? Any unusual
observations? Any high-leverage observations?
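The interaction variables and the leverage check can be sketched the same way. The product columns E2*MGT and E3*MGT are formed from the education dummies, and leverages are the diagonal elements $h_{ii}$ of the hat matrix $H = X(X^TX)^{-1}X^T$. The cutoff $h_{ii} > 2p/n$ is a common rule of thumb, assumed here rather than taken from the assignment. The records are made up, with one deliberately extreme experience value, and `inverse` and `leverages` are hypothetical helper names.

```python
# Hypothetical sketch: interaction columns and leverages for Model 3,
# using MADE-UP records (not the SURVEY data).

def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [row[n:] for row in m]

def leverages(X):
    """Diagonal of H = X (X'X)^{-1} X', i.e. h_ii = x_i' (X'X)^{-1} x_i."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    inv = inverse(xtx)
    return [sum(row[i] * inv[i][j] * row[j] for i in range(p) for j in range(p))
            for row in X]

# Made-up (exprnc, educ, mgt): 3 per education-management group, plus one
# extra person with unusually long experience.
records = [(1 + (5 * k) % 11, 1 + k % 3, (k // 3) % 2) for k in range(18)]
records.append((40, 1, 0))

X = []
for exprnc, educ, mgt in records:
    e2, e3 = float(educ == 2), float(educ == 3)
    # intercept, EXPRNC, E2, E3, MGT, E2*MGT, E3*MGT
    X.append([1.0, exprnc, e2, e3, mgt, e2 * mgt, e3 * mgt])

h = leverages(X)
n, p = len(X), len(X[0])
cutoff = 2 * p / n
for i, hv in enumerate(h):
    if hv > cutoff:
        print(f"record {i}: h = {hv:.3f} exceeds 2p/n = {cutoff:.3f}")
print(f"sum of leverages = {sum(h):.3f} (equals p = {p})")
```

Because the leverages sum to $p$ (the trace of $H$), their average is $p/n$, which is where the $2p/n$ cutoff comes from.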
- Write a short (1-2 page) description summarizing the results of
these analyses. Normally one would expect salary to increase
with experience (career progress), education, and management
responsibility. Given this, which of the
three models provides the best account of the data, and how
should this model be interpreted?
© 1995 Michael Friendly
Email:friendly AT yorku.ca