Silver Blaze Problems in Regression and Multivariate Analyses
Multivariate Data Analysis
Psychology 6140
In reviewing or critiquing research reports, some of the most important
questions to consider are these:
- Is the research problem sufficiently well-defined?
- Do the methods of data collection (subject populations,
research instruments, etc.) bear on the research problem in
an unambiguous way?
- Do the methods of analysis do justice to the data and the
problems posed?
- Are the results reported adequately, in the sense that they both allow
readers to understand them fully and can stand up to the scrutiny of
extensions, replications, and challenges?
However, the most difficult problems for the reader to judge are
often those of the "Silver Blaze" variety.(1) Because the author(s) typically attempt to
present their results in the most coherent way, finding problems
often involves going beyond the information given, reading between
the lines, and asking yourself about things that are not described
explicitly.
-----------------------
(1) In "Silver Blaze", Inspector
Gregory asked Holmes how he knew the identity of the thief, to
which Holmes replied, "Because of the curious incident of
the dog in the night-time." Gregory protested, "But
the dog did nothing in the night-time," and Holmes said,
"That was the curious incident: the dog should
have been barking."
-----------------------
As a group effort in reacting to the research papers you are
each critiquing, let us see if we can make up a list of Silver
Blaze problems that might occur in regression studies. Some of
these are examples of what may be called lurking variables:
variables that have an important effect, yet are not
included among the predictor variables considered (or presented) by
the author. Such a variable may be omitted because its existence is
unknown, because its influence is thought to be negligible, or simply
because data on it are unavailable or difficult to obtain. See
Joiner (1981, The American Statistician, 35, 227-233) for
examples of lurking variables. I've started the list off with a few
examples of things you might look for.
- 1. Underlying associations with outside variables.
- A classical example of spurious correlation is the
astounding correlation of 0.998, cited by Yule and Kendall
(1950, Introduction to the Theory of
Statistics), between the number of people in the U.K.
classified as "notified mental defectives" and the
number of "wireless licenses issued". Both
variables are yearly figures from 1925 to 1937. The
spurious correlation arises from the fact that both
variables happened to be increasing over time: radios were
becoming common household items in the U.K., while an
increase in recognition of mental illness and facilities for
its treatment was also taking place.
A more recent example: the Places Rated
Almanac (Boyer & Savageau, 1985) contains nine
composite variables related to climate, housing costs,
health care, arts and cultural facilities, etc. for 329
metropolitan areas in the US. Several analyses of these data
pointed out a rather high correlation between the arts and
health measures. This is due, however, to an underlying
correlation of each of these with population.
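A quick check for this kind of problem is to compare the simple correlation between the two variables with their partial correlation, controlling for the suspected lurking variable. The following SAS sketch uses simulated data in the spirit of the Yule and Kendall example; the numbers and variable names are made up for illustration, not the historical figures:

    /* Simulated data: two series that both increase with time */
    data spurious;
      do year = 1925 to 1937;
        licenses   = 100 + 50*(year - 1925) + 5*rannor(20061); /* driven by time */
        defectives =  20 +  3*(year - 1925) +   rannor(20061); /* also driven by time */
        output;
      end;
    run;

    /* The simple correlation is huge ... */
    proc corr data=spurious;
      var licenses defectives;
    run;

    /* ... but the partial correlation, controlling for the lurking
       variable (year), nearly vanishes */
    proc corr data=spurious;
      var licenses defectives;
      partial year;
    run;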
- 2. Unmeasured variables and influential observations
- The data on fuel consumption (fuel.sas) in the US showed a moderately
good prediction of fuel consumption per capita from gasoline
tax, proportion of licensed drivers and per capita income.
It was thought that expressing the variables in per capita
terms eliminated the effects of varying state population.
Influence plots, however, pointed to a few states (Wyoming,
South Dakota) as greatly underpredicted, influential
observations. Some thought led to the suggestion that
population density might be important. This
variable, when tried, led to a better one-predictor model than
the best model from the other variables! Had the
influential outliers been deleted, this conclusion would not
have been reached.
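One safeguard is to examine influence diagnostics routinely before trusting a fitted model. A minimal PROC REG sketch follows; the variable names (fuel, tax, dlic, inc, state) are assumed for illustration and may not match those actually used in fuel.sas:

    /* Influence diagnostics for the fuel-consumption regression
       (variable names assumed for illustration) */
    proc reg data=fuel;
      model fuel = tax dlic inc / influence;   /* prints leverage, DFFITS, etc. */
      output out=diag rstudent=rstud cookd=cookd h=leverage;
    run;
    quit;

    /* Plot Cook's distance against leverage to spot influential states */
    proc sgplot data=diag;
      scatter x=leverage y=cookd / datalabel=state;  /* 'state' assumed to be an ID variable */
    run;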
- 3. Improper randomization or experimental control
- Draper & Smith (1966) give data on an experiment on the
effect of three variables (solar radiation, soil moisture,
and temperature) on the amount of vitamin B2 in turnip
greens. Relatively careful analysis and graphical display
showed nothing unusual. However, an index plot of the
response against the order in which the data are listed in the
textbook shows a nearly straight line, with a better fit than the
three predictors! (See the analysis in the file turnip.sas.)
Joiner (1988) suggests that either the vitamin content of
the turnips or the chemical reagent used to measure vitamin
B2 may have decayed over time.
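A cheap safeguard against this kind of problem is to plot the response (or the residuals) against the serial order of the observations; a systematic trend suggests that something drifted during data collection. A minimal sketch, with placeholder data set and variable names:

    /* Index plot: response vs. the order in which observations are listed.
       Data set and variable names are placeholders for illustration. */
    data turnip2;
      set turnip;
      obsorder = _n_;     /* serial order of each observation */
    run;

    proc sgplot data=turnip2;
      series x=obsorder y=vitaminB2 / markers;  /* trace over order */
      reg    x=obsorder y=vitaminB2;            /* linear trend over order */
    run;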
- 4. The New England Blackout
- A curious reporter found that there was an unexpectedly high
number of births on the Monday and Tuesday exactly nine months
after the famous New England blackout in 1965. He wrote an article
suggesting the obvious causal inference. What's wrong with his
reasoning? [Click here for an answer]
- 5. The dangers of over-fitting
In the heyday of mathematical modelling, someone is reported
to have said, "Give me three parameters, I can fit an elephant.
Give me four, I can make it wag its tail."
Bob Agnew has a short piece on Fitting Sickness
with a real-world illustration. Another nice illustration
concerns fitting polynomial models to
Galileo's experiments
on inclined planes. See also stepsim2.sas for an example of overfitting when random predictors
are added to a data set and stepwise fitting is used.
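The flavor of stepsim2.sas can be conveyed by a small simulation: generate a response that is pure noise along with a batch of random predictors, then turn stepwise selection loose. With liberal entry criteria it will usually retain several "predictors" that are really just noise. The sketch below is only illustrative; it is not the actual stepsim2.sas program:

    /* Simulated overfitting: y is pure noise, unrelated to x1-x20 */
    data noise;
      array x{20};
      do i = 1 to 50;                     /* 50 observations */
        do j = 1 to 20;
          x{j} = rannor(12345);
        end;
        y = rannor(12345);
        output;
      end;
      drop i j;
    run;

    /* Stepwise selection with liberal criteria will usually declare
       some of the purely random predictors 'significant' */
    proc reg data=noise;
      model y = x1-x20 / selection=stepwise slentry=0.15 slstay=0.15;
    run;
    quit;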
Try to suggest additional items or questions to consider under the
following headings; feel free to add other topics as well.
Sample design:
- Is the population studied clearly described?
- Is the sample representative of this population? Is it random?
- Is the sample size adequate relative to the number of predictors?
- Is there severe restriction of range on the predictors?
Measures and instruments used:
- Reliability?
- Validity?
Analysis (model building):
- Only stepwise selection?
- Evidence of multicollinearity? (Take particular care with polynomial models or
models with moderator (interaction) variables; a VIF sketch appears after this checklist.)
- Examination of influence?
- Cross-validation or replication attempted?
Reporting:
- Coefficients and standard errors?
- Are signs and magnitudes of coefficients interpreted? Do they make sense?
- Incremental test statistics?
- Do the stated conclusions actually follow from the results and analyses?
Global:
- Alternative research strategies? Tradeoffs?
- What else might be considered?
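On the multicollinearity point above: variance inflation factors (VIFs) are easy to request, and centering a predictor before forming polynomial or interaction terms usually reduces the problem. A minimal sketch, with made-up data set and variable names:

    /* Center the predictor, then form the quadratic term
       (data set and variable names are illustrative) */
    proc standard data=mydata mean=0 out=centered;
      var x;
    run;

    data centered;
      set centered;
      x2 = x*x;
    run;

    /* VIFs well above 10 are a common warning sign of collinearity */
    proc reg data=centered;
      model y = x x2 / vif;
    run;
    quit;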
Questions for Reviewers
The following questions are suggested in Leon Gleser's article,
"Some Notes on Refereeing", The American Statistician,
1986, 40(4), 310-312.
- Is the problem of substantial interest?
- Are problems from previous research solved?
- Does the study introduce novel methods?
- Are the authors' points argued clearly and concisely?
- Is the article appropriate for the journal in question?
Other pointers:
- How to Review a Journal Article: Guidelines for Reviewing, from the Journal of Marriage and Family.
- How to Review a Research Paper, from L. Shebilske (1997). How to review a journal article. ISSPR Bulletin, 13(2), 19-20.
- B. A. Maher (1978). A reader's, writer's, and reviewer's guide to assessing research reports in clinical psychology. Journal of Consulting and Clinical Psychology, 46, 835-838.
- Statistical Methods in Psychology Journals: Guidelines and Explanations. An excellent paper on the application and reporting of statistical methods to psychological research (American Psychologist, August 1999, 54(8), 594-604).
- Each person volunteers as the primary reviewer for a paper from the
Research Applications document, and also as the secondary reviewer
for one other paper selected by another student.
- Grades for this work will be based on the primary reviewer's oral
and written presentation only.
- The primary reviewer should prepare and distribute a 1-2 page summary
of the paper (with your name and the article reference)
in advance of the discussion class.
- The primary reviewer should then prepare a brief written referee's report
(say, 4-8 double-spaced pages, with a preference for the lower half of that
range; half that if single-spaced), as if reviewing the paper for journal
publication. You may attach a copy of your 1-2 page summary if you like, or
include a brief synopsis. An additional goal is to be both helpful and
critical, as warranted.
This is due one week after the in-class presentations.
- The secondary reader can take the role
of the author and reply to criticisms by the primary reader, or
take some other role which will complement that of the primary reader.
- A small, additional task for the secondary reader:
Imagine you are the social science reporter for a national newspaper.
Write a brief article (1-2 paragraphs), with a headline, describing what is newsworthy about this paper.