Silver Blaze Problems in Regression and Multivariate Analyses
Multivariate Data Analysis
Psychology 6140
In reviewing or critiquing research reports, some of the most important
questions to consider are these:
- Is the research problem sufficiently well-defined?
- Do the methods of data collection (subject populations,
research instruments, etc.) bear on the research problem in
an unambiguous way?
- Do the methods of analysis do justice to the data and the
problems posed?
- Are the results reported adequately, in the sense that they both allow
readers to understand them fully and can stand up to the scrutiny of
extensions, replications, and challenges?
However, the most difficult problems for the reader to judge are
often those of the "Silver Blaze" variety.(1) Because the author(s) typically attempt to
present their results in the most coherent way, finding problems
often involves going beyond the information given, reading between
the lines, and asking yourself about things that are not described
explicitly.
-----------------------
(1) In "Silver Blaze", Inspector
Gregory asked Holmes how he knew the identity of the thief, to
which Holmes replied, "Because of the curious incident of
the dog in the night-time." Gregory protested, "But
the dog did nothing in the night-time," and Holmes said,
"That was the curious incident: the dog should
have been barking."
-----------------------
As a group effort in reacting to the research papers you are
each critiquing, let us see if we can make up a list of Silver
Blaze problems that might occur in regression studies. Some of
these are examples of what may be called lurking variables:
variables that have an important effect, yet are not
included among the predictor variables considered (or presented) by
the author. Such a variable may be omitted because its existence is
unknown, because its influence is thought to be negligible, or simply
because data on it are unavailable or difficult to obtain. See
Joiner (1981, The American Statistician, 35, 227-233) for
examples of lurking variables. I've started the list off with a few
examples of things you might look for.
- 1. Underlying associations with outside variables.
- A classical example of spurious correlation is the
astounding correlation of 0.998, cited by Yule and Kendall
(1950, Introduction to the Theory of
Statistics), between the number of people in the U.K.
classified as "notified mental defectives" and the
number of "wireless licenses issued". Both
variables are yearly figures from 1925 to 1937. The
spurious correlation arises from the fact that both
variables happened to be increasing over time: radios were
becoming common household items in the U.K., while an
increase in recognition of mental illness and facilities for
its treatment was also taking place.
A more recent example: the Places Rated
Almanac (Boyer & Savageau, 1985) contains nine
composite variables related to climate, housing costs,
health care, arts and cultural facilities, etc. for 329
metropolitan areas in the US. Several analyses of these data
pointed out a rather high correlation between the arts and
health measures. This is due, however, to an underlying
correlation of each of these with population.
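A quick check for this kind of problem is to compare the simple correlation between the two variables with their partial correlation, controlling for the suspected lurking variable. The following SAS sketch uses simulated data in the spirit of the Yule and Kendall example; the numbers and variable names are made up for illustration, not the historical figures:

    /* Simulated data: two series that both increase with time */
    data spurious;
      do year = 1925 to 1937;
        licenses   = 100 + 50*(year - 1925) + 5*rannor(20061); /* driven by time */
        defectives =  20 +  3*(year - 1925) +   rannor(20061); /* also driven by time */
        output;
      end;
    run;

    /* The simple correlation is huge ... */
    proc corr data=spurious;
      var licenses defectives;
    run;

    /* ... but the partial correlation, controlling for the lurking
       variable (year), nearly vanishes */
    proc corr data=spurious;
      var licenses defectives;
      partial year;
    run;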
- 2. Unmeasured variables and influential observations
- The data on fuel consumption (fuel.sas) in the US showed a moderately
good prediction of fuel consumption per capita from gasoline
tax, proportion of licensed drivers and per capita income.
It was thought that expressing the variables in per capita
terms eliminated the effects of varying state population.
Influence plots, however, pointed to a few states (Wyoming,
South Dakota) as greatly underpredicted, influential
observations. Some thought led to the suggestion that
population density might be important. This
variable, when tried, led to a better one-predictor model than
the best model from the other variables! Had the
influential outliers been deleted, this conclusion would not
have been reached.
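One safeguard is to examine influence diagnostics routinely before trusting a fitted model. A minimal PROC REG sketch follows; the variable names (fuel, tax, dlic, inc, state) are assumed for illustration and may not match those actually used in fuel.sas:

    /* Influence diagnostics for the fuel-consumption regression
       (variable names assumed for illustration) */
    proc reg data=fuel;
      model fuel = tax dlic inc / influence;   /* prints leverage, DFFITS, etc. */
      output out=diag rstudent=rstud cookd=cookd h=leverage;
    run;
    quit;

    /* Plot Cook's distance against leverage to spot influential states */
    proc sgplot data=diag;
      scatter x=leverage y=cookd / datalabel=state;  /* 'state' assumed to be an ID variable */
    run;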
- 3. Improper randomization or experimental control
- Draper & Smith (1966) give data on an experiment on the
effect of three variables (solar radiation, soil moisture,
and temperature) on the amount of vitamin B2 in turnip
greens. Relatively careful analysis and graphical display
showed nothing unusual. However, an index plot of the
response against the order in which the data are listed in the
textbook shows a nearly straight line, with a better fit than the
three predictors! (See the analysis in the file turnip.sas.)
Joiner (1988) suggests that either the vitamin content of
the turnips or the chemical reagent used to measure vitamin
B2 may have decayed over time.
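A cheap safeguard against this kind of problem is to plot the response (or the residuals) against the serial order of the observations; a systematic trend suggests that something drifted during data collection. A minimal sketch, with placeholder data set and variable names:

    /* Index plot: response vs. the order in which observations are listed.
       Data set and variable names are placeholders for illustration. */
    data turnip2;
      set turnip;
      obsorder = _n_;     /* serial order of each observation */
    run;

    proc sgplot data=turnip2;
      series x=obsorder y=vitaminB2 / markers;  /* trace over order */
      reg    x=obsorder y=vitaminB2;            /* linear trend over order */
    run;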
- 4. The New England Blackout
- A curious reporter found that there was an unexpectedly high
number of births on the Monday and Tuesday exactly nine months
after the famous New England blackout in 1965. He wrote an article
suggesting the obvious causal inference. What's wrong with his
reasoning? [Click here for an answer]
- 5. The dangers of over-fitting
In the heyday of mathematical modelling, someone is reported
to have said, "Give me three parameters, I can fit an elephant.
Give me four, I can make it wag its tail."
Bob Agnew has a short piece on Fitting Sickness
with a real-world illustration. Another nice illustration
concerns fitting polynomial models to
Galileo's experiments
on inclined planes. See also stepsim2.sas for an example of overfitting when random predictors
are added to a data set and stepwise fitting is used.
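The flavor of stepsim2.sas can be conveyed by a small simulation: generate a response that is pure noise along with a batch of random predictors, then turn stepwise selection loose. With liberal entry criteria it will usually retain several "predictors" that are really just noise. The sketch below is only illustrative; it is not the actual stepsim2.sas program:

    /* Simulated overfitting: y is pure noise, unrelated to x1-x20 */
    data noise;
      array x{20};
      do i = 1 to 50;                     /* 50 observations */
        do j = 1 to 20;
          x{j} = rannor(12345);
        end;
        y = rannor(12345);
        output;
      end;
      drop i j;
    run;

    /* Stepwise selection with liberal criteria will usually declare
       some of the purely random predictors 'significant' */
    proc reg data=noise;
      model y = x1-x20 / selection=stepwise slentry=0.15 slstay=0.15;
    run;
    quit;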
Try to suggest additional items or questions to consider under the
following headings; feel free to add other topics as well.
Sample design:
- Is the population studied clearly described?
- Is the sample representative of this population? Is it random?
- Is the sample size adequate relative to the number of predictors?
- Is there severe restriction of range on the predictors?
Measures and instruments used:
- Reliability?
- Validity?
Analysis (model building):
- Only stepwise selection?
- Evidence of multicollinearity? (Take particular care with polynomial models or
models with moderator (interaction) variables; a VIF sketch appears after this checklist.)
- Examination of influence?
- Cross-validation or replication attempted?
Reporting:
- Coefficients and standard errors?
- Are signs and magnitudes of coefficients interpreted? Do they make sense?
- Incremental test statistics?
- Do the stated conclusions actually follow from the results and analyses?
Global:
- Alternative research strategies? Tradeoffs?
- What else might be considered?
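On the multicollinearity point above: variance inflation factors (VIFs) are easy to request, and centering a predictor before forming polynomial or interaction terms usually reduces the problem. A minimal sketch, with made-up data set and variable names:

    /* Center the predictor, then form the quadratic term
       (data set and variable names are illustrative) */
    proc standard data=mydata mean=0 out=centered;
      var x;
    run;

    data centered;
      set centered;
      x2 = x*x;
    run;

    /* VIFs well above 10 are a common warning sign of collinearity */
    proc reg data=centered;
      model y = x x2 / vif;
    run;
    quit;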
Questions for Reviewers
The following questions are suggested in Leon Gleser's article,
"Some Notes on Refereeing", The American Statistician,
1986, 40(4), 310-312.
- Is the problem of substantial interest?
- Are problems from previous research solved?
- Does the study introduce novel methods?
- Are the authors' points argued clearly and concisely?
- Is the article appropriate for the journal in question?
Other pointers:
- How to Review a Journal Article: Guidelines for Reviewing, from the Journal of Marriage and Family.
- How to Review a Research Paper, from L. Shebilske (1997). How to review a journal article. ISSPR Bulletin, 13(2), 19-20.
- B. A. Maher (1978). A reader's, writer's, and reviewer's guide to assessing research reports in clinical psychology. Journal of Consulting and Clinical Psychology, 46, 835-838.
- Statistical Methods in Psychology Journals: Guidelines and Explanations. An excellent paper on the application and reporting of statistical methods to psychological research (American Psychologist, August 1999, 54(8), 594-604).
- Each person volunteers as the primary reviewer for a paper from the
Research Applications document, and also as the secondary reviewer
for one other paper selected by another student.
- Grades for this work will be based on the primary reviewer's oral
and written presentation only.
- The primary reviewer should prepare and distribute a 1-2 page summary
of the paper (with your name and the article reference)
in advance of the discussion class.
- The primary reviewer should then prepare a brief written referee's report
(say, 4-8 double-spaced pages, with a preference for the lower half of that
range; half that if single-spaced), as if reviewing the paper for journal
publication. You may attach a copy of your 1-2 page summary if you like, or
include a brief synopsis. An additional goal is to be both helpful and
critical, as warranted.
This is due one week after the in-class presentations.
- The secondary reader can take the role
of the author and reply to criticisms by the primary reader, or
take some other role which will complement that of the primary reader.
- A small, additional task for the secondary reader:
Imagine you are the social science reporter for a national newspaper.
Write a brief article (1-2 paragraphs), with a headline, describing what is newsworthy about this paper.