The results of each analysis will be discussed during the session. You have the option of presenting a 15-minute oral discussion of your results or preparing a poster. Each participant will be given the opportunity to display his/her results and other conference attendees will be encouraged to discuss them with you.
Since this session is treated as an organized contributed-paper session, you must submit a title and abstract to the Society office, chaired by Prof. Friendly. The abstract must be postmarked not later than January 15, 2017. The abstract can be very general. A sample as a model is provided below. Variations might include descriptions of the analyses you carried out, or what you found. You have until the conference to work on the analysis. Please send a (possibly revised) copy of your abstract by email to the conference chair no later than two weeks before the conference. After the meetings you will be given an opportunity to include a short paper describing your analysis in the 2017 Proceedings of the Statistical Education Section of the Hotelling Society.
This paper describes and summarizes the relationships between the 1987 salaries of major league baseball players and the players' performance. We use regression methods to relate a player's salary to his 1986 and career performance statistics and to the player's team. Some surprising findings are discussed.
The data is stored in various formats (SAS input files, SAS stored datasets, SPSS system (.sav) files) on the Hebb server,
library(corrgram)
data(baseball)

baseball <- read.csv("http://friendly.apps01.yorku.ca/psy6140/lib/baseball.csv")
pitcher <- read.csv("http://friendly.apps01.yorku.ca/psy6140/lib/pitcher.csv")
team <- read.csv("http://friendly.apps01.yorku.ca/psy6140/lib/team.csv")
File | SAS input file (raw data) | SAS dataset in psy6140\lib\ | View it | SPSS dataset | CSV dataset (comma delimited) |
---|---|---|---|---|---|
Hitters | baseball.sas | psy614.baseball (V9: baseball.sas7bdat) | hitters.htm | baseball.sav | baseball.csv |
Pitchers | basepitc.sas | psy614.pitcher (V9: pitcher.sas7bdat) | pitcher.htm | pitcher.sav | pitcher.csv |
Team | baseteam.sas | psy614.team (V9: team.sas7bdat) | teams.htm | team.sav | team.csv |
Hitters 92¹ | bball92.sas | | bball92.html | | bball92.csv |
Attendance² | MLBattend.sas | | | | MLBattend.csv |

¹ Contains the Major League baseball players who played at least one game in both the 1991 and 1992 seasons, excluding pitchers, with 1992 salaries.
² Contains home game attendance, wins, losses, etc. for all teams from 1969-2000.
Note: The data was entered by the Society's clerical staff from coding forms provided by the symposium organizers. While the staff were generally careful and skilled at data entry, and printouts were checked for obvious errors, no checks against original sources were carried out, and the possibility that errors are contained in the data cannot be ruled out. Standard statistical practices of data screening should therefore be included in your analyses.
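The kind of data screening recommended above might start with simple range and consistency checks. The following is a minimal sketch in Python, not part of the original materials: the records are invented for illustration, the column names follow the hitters variable list, and the particular checks (negative counts, more hits than at-bats, missing salary) are assumptions about what is worth flagging.

```python
# Hypothetical hitter records; the values are invented for illustration
# and do not come from the data files.
rows = [
    {"NAME": "Player A", "ATBAT": 400, "HITS": 110, "YEARS": 5, "SALARY": 500.0},
    {"NAME": "Player B", "ATBAT": 120, "HITS": 150, "YEARS": 3, "SALARY": 90.0},
    {"NAME": "Player C", "ATBAT": 300, "HITS": 80, "YEARS": 2, "SALARY": None},
    {"NAME": "Player D", "ATBAT": -5, "HITS": 0, "YEARS": 1, "SALARY": 70.0},
]


def screen(row):
    """Return a list describing the suspicious values found in one record."""
    problems = []
    # Counts can be neither missing nor negative.
    for key in ("ATBAT", "HITS", "YEARS"):
        value = row.get(key)
        if value is None:
            problems.append(f"{key} is missing")
        elif value < 0:
            problems.append(f"{key} is negative")
    # A hitter cannot have more hits than official at-bats.
    at_bat, hits = row.get("ATBAT"), row.get("HITS")
    if at_bat is not None and hits is not None and hits > at_bat:
        problems.append("HITS exceeds ATBAT")
    if row.get("SALARY") is None:
        problems.append("SALARY is missing")
    return problems


# Keep only the records that have at least one problem.
flagged = {}
for row in rows:
    problems = screen(row)
    if problems:
        flagged[row["NAME"]] = problems

for name, problems in flagged.items():
    print(name, "->", "; ".join(problems))
```

Flagged records would still need to be checked against an outside source before being corrected or dropped.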
If, on examination of the data, you deem some other question(s) of greater interest, feel free to make that the basis of your contribution. In addition you are free to construct other variables from those given (such as batting average, which is calculated as a ratio of the number of hits to times at bat).
Finally, you are also free to gather and use other data related to your chosen question.
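As an illustration of constructing a derived variable, batting average can be computed from hits and times at bat, using the same scaling as the BATAVG variable in the hitters file (1000*(HITS/ATBAT)). This is a minimal Python sketch: the function name `batting_average` is hypothetical, and returning `None` for a hitter with no at-bats is an assumption about how to treat an undefined ratio.

```python
def batting_average(hits, at_bat):
    """Return 1000 * hits / at_bat, or None when at_bat is zero or missing."""
    if not at_bat:
        return None
    return 1000 * hits / at_bat


# Invented values for illustration only.
print(batting_average(150, 500))  # 300.0
print(batting_average(0, 0))      # None
```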
There is one observation per hitter in the file baseball.sas. Unless otherwise noted, all performance statistics refer to the 1986 baseball season. Career statistics count all years that a player actually played in the major leagues (some players are sent back down to the minors and called up again). The variables are:
NAME | Hitter's name |
LEAGUE | Player's league |
TEAM | Player's team |
ATBAT | Times at Bat: Number of official plate appearances by a hitter. It counts as an official at-bat as long as the batter does not walk, sacrifice, get hit by a pitch or reach base due to catcher's interference. |
HITS | Hits: |
HOMER | Home Runs |
RUNS | Runs: The number of runs scored by a player. A run is scored by an offensive player who advances from batter to runner and touches first, second, third and home base in that order without being put out. |
RBI | Runs Batted In: A hitter earns a run batted in when he drives in a run via a hit, walk, sacrifice (bunt or fly), fielder's choice, hit batsman, or on an error (when the official scorer rules that the run would have scored anyway). |
WALKS | Walks: A "walk" (or "base on balls") is an award of first base granted to a batter who receives four pitches outside the strike zone. |
YEARS | Years in the Major Leagues. As far as we can tell, this counts all years a player has actually played in the Major Leagues, not necessarily consecutive. |
ATBATC | Career Times at Bat |
HITSC | Career Hits |
HOMERC | Career Home Runs |
RUNSC | Career Runs Scored |
RBIC | Career Runs Batted In |
POSITION | Player's position(s). See list of codes used below under Coding for some of the variables. (You are free to recode these as you see fit.) |
PUTOUTS | Put Outs. A put out is credited when a fielder causes a batter or runner to be, well, put out; e.g., catches the batter's fly ball, tags a base runner out before he reaches the base, etc. |
ASSISTS | Assists. An assist is credited when a fielder assists in a play causing a player to be put out; e.g., |
ERRORS | Errors |
SALARY | 1987 annual salary on opening day (in $1000s) |
BATAVG | Batting Average, calculated as 1000*(HITS/ATBAT) |
BATAVGC | Career Batting Average, calculated as 1000*(HITSC/ATBATC) |
NAME | Pitcher's name |
TEAM | Team at the end of 1986 |
LEAGUE | League at the end of 1986 |
WINS | Number of Wins |
LOSSES | Number of Losses |
ERA | Earned Run Average |
GAMES | Number of Games |
INNINGS | Number of Innings pitched |
SAVES | Number of Saves |
YEARS | Years in the major leagues |
WINSC | Number of Wins during his career |
LOSSESC | Number of Losses during his career |
ERAC | Earned Run Average during his career |
GAMESC | Number of Games during his career |
INNINGC | Number of Innings pitched during his career |
SAVESC | Number of Saves during his career |
SALARY | 1987 annual salary ($1000s) |
LEAGUE7 | League at the beginning of 1987 |
TEAM7 | Team at the beginning of 1987 |
LEAGUE | League |
DIVISION | Division |
RANK | Position in final league standings 1986 |
TEAM | Team |
WINS | Number of wins in 1986 |
LOSSES | Number of losses in 1986 |
ATTHOME | Attendance for home games in 1986 |
ATTAWAY | Attendance for away games in 1986 |
SALARY | 1987 average salary ($1000s) |
The team codes define a SAS format, $team., which can be used to print or display the team names in more readable form. (Note that team codes uniquely distinguish American and National League teams in the same city.)
value $team
   'ATL'='Atlanta '
   'BAL'='Baltimore '
   'BOS'='Boston '
   'CAL'='California '
   'CHA'='Chicago A '      /* Sox */
   'CHN'='Chicago N '      /* Cubs */
   'CIN'='Cincinnati '
   'CLE'='Cleveland '
   'DET'='Detroit '
   'HOU'='Houston '
   'KC '='Kansas City '
   'LA '='Los Angeles '
   'MIL'='Milwaukee '
   'MIN'='Minnesota '
   'MON'='Montreal '
   'NYA'='New York A '     /* Yankees */
   'NYN'='New York N '     /* Mets */
   'OAK'='Oakland '
   'PHI'='Philadelphia '
   'PIT'='Pittsburgh '
   'SD '='San Diego '
   'SEA'='Seattle '
   'SF '='San Francisco'
   'STL'='St. Louis '
   'TEX'='Texas '
   'TOR'='Toronto '
   ;
N = National, A = American
W = West, E = East
The list below shows the complete set of 2-character codes used for a player's position in the baseball.sas hitters file. These values define a SAS format, $posfmt., which can be used to print the positions in the form shown on the right of the = sign.
value $posfmt
   '1B' = 'First Base'
   '2B' = 'Second Base'
   'SS' = 'Short Stop'
   '3B' = 'Third Base'
   'RF' = 'Right Field'
   'CF' = 'Center Field'
   'LF' = 'Left Field'
   'C ' = 'Catcher'
   'DH' = 'Designated Hitter'
   'OF' = 'Outfield'
   'UT' = 'Utility'
   'OS' = 'Outfield & Short Stop'
   '3S' = 'Third Base & Short Stop'
   '13' = 'First & Third Base'
   '3O' = 'Third Base & Outfield'
   'O1' = 'Outfield & First Base'
   'S3' = 'Short Stop & Third Base'
   '32' = 'Third & Second Base'
   'DO' = 'Designated Hitter & Outfield'
   'OD' = 'Outfield & Designated Hitter'
   'CD' = 'Catcher & Designated Hitter'
   'CS' = 'Catcher & Short Stop'
   '23' = 'Second & Third Base'
   '1O' = 'First Base and Outfield'
   '2S' = 'Second Base and Short Stop';

In addition, the BASEBALL SAS file defines another format, $pos., which can be used to collapse positions into a shorter list, based on a player's primary fielding position:
/* Recode position to short list */
value $pos
   'CS','CD'      ='C '
   'OS','O1','OD' ='OF'
   'CF','RF','LF' ='OF'
   '1O','13'      ='1B'
   '2S','23'      ='2B'
   'DO'           ='DH'
   'S3'           ='SS'
   '32','3S','3O' ='3B'
   ;

For example, to find the average salary of players according to the collapsed position codes, use format position $pos.; with the MEANS procedure:

proc means data=psy614.baseball;
   class position;
   format position $pos.;
   var salary;
run;
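For readers working outside SAS, the same idea (collapse the position codes, then average salary within each collapsed group) can be sketched in Python. This is an illustrative sketch only: the dictionary `POS` mirrors the $pos. recoding above, the helper `collapse` and the player records are hypothetical, and the salaries are invented. Codes not listed in the mapping are passed through unchanged, and records with a missing salary are left out of the averages.

```python
from collections import defaultdict
from statistics import mean

# Mirror of the $pos. recoding; any code not listed is kept as-is.
POS = {
    "CS": "C", "CD": "C",
    "OS": "OF", "O1": "OF", "OD": "OF",
    "CF": "OF", "RF": "OF", "LF": "OF",
    "1O": "1B", "13": "1B",
    "2S": "2B", "23": "2B",
    "DO": "DH",
    "S3": "SS",
    "32": "3B", "3S": "3B", "3O": "3B",
}


def collapse(position):
    """Map a 2-character position code to its collapsed primary position."""
    code = position.strip()  # codes such as 'C ' carry a trailing blank
    return POS.get(code, code)


# Hypothetical (position, salary in $1000s) records, invented for illustration.
players = [
    ("CF", 600.0), ("RF", 400.0), ("OD", 200.0),
    ("C ", 300.0), ("CD", 500.0), ("SS", 250.0),
    ("1B", None),  # missing salary: excluded from the averages
]

# Group the non-missing salaries by collapsed position.
salaries = defaultdict(list)
for position, salary in players:
    if salary is not None:
        salaries[collapse(position)].append(salary)

mean_salary = {pos: mean(values) for pos, values in salaries.items()}
for pos, value in sorted(mean_salary.items()):
    print(pos, value)
```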