Baseball teams face an interesting problem every day: arranging nine players in the order that yields the most potential runs. Effective optimization can increase a team’s win total by 3 or more per year, which is enough for some teams to make or miss the playoffs. To be clear, there is no one perfect lineup. Perfect optimization of a lineup is not robust or consistent: out of 200 attempts, the single most optimal lineup holds up only 17 times when put through a regression system. Near-optimization, however, is very robust with regard to runs scored, because a near-optimal lineup is easy to reproduce. With 9! = 362,880 possible batting orders, there are also plenty of very bad ones. The overall goal should be to find an order as close to optimal as the data allows.
But how does one go about finding a near-optimal or optimal lineup? Joel S. Sokol of the Georgia Institute of Technology published a paper in 1998 that goes into heavy detail on the subject, often using equations and assumptions that even I do not fully understand. His findings seem rock solid, even though his methods are quite unorthodox. The paper behind the findings in this article can be found here: http://www2.isye.gatech.edu/~jsokol/boouu.pdf
While Sokol’s data is from 1998, the number of runs each event (such as a single) is worth is very unlikely to change much over time, especially since he used the entire 1998 season as his dataset.
Sokol begins his paper stating:
“An important question for the batting order problem is how a player’s skills interact with those of other players. There are many ways by which players can contribute offensively to their team, and we show that an appropriately-designed two-dimensional measure based on interactions with other players gives a much more complete view of player value and player utility.”
Or, in layman’s terms, players affect both their own results and the batters who follow them. If we measure both of those effects, we can build a lineup that generates the greatest number of runs.
Therefore, I used the information derived by Sokol to determine two values for each Indians player: a “realization value” (how many runs per plate appearance that batter will create for himself) and a “potential value” (how many runs per plate appearance that batter will generate for the hitters who follow him).
For each player, I multiplied the count of every event by its run coefficient, added up the results, and divided by the number of plate appearances to find the realization (R) and potential (P) values. Here are the coefficients used, or roughly how many runs on average each event creates. Omitted from the model are situational factors such as clutch hitting and batting order protection, which have been shown to be chance or myth.
The statistics I used were usually from the player’s most recent seasons (typically the past 4 or 5 years). I believe that using only this season’s statistics would give much too small a sample; it is best to have a sample of over 1000 plate appearances. Here are the values for every Indians batter.
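The per-plate-appearance calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `value_per_pa` and the event labels are hypothetical, the coefficients below are made-up placeholders rather than Sokol's numbers, and it assumes every plate appearance is recorded as exactly one event.

```python
# Placeholder run values per event. These are NOT Sokol's coefficients;
# they are invented numbers used only to show the shape of the calculation.
REALIZATION_COEF = {"1B": 0.25, "2B": 0.40, "3B": 0.55, "HR": 1.40, "BB": 0.15, "OUT": -0.10}
POTENTIAL_COEF = {"1B": 0.25, "2B": 0.35, "3B": 0.50, "HR": 0.00, "BB": 0.20, "OUT": -0.15}


def value_per_pa(events: dict[str, int], coef: dict[str, float]) -> float:
    """Weighted sum of event counts divided by plate appearances.

    Assumes each plate appearance appears in `events` as exactly one event,
    so the plate-appearance total is the sum of all counts.
    """
    plate_appearances = sum(events.values())
    if plate_appearances == 0:
        raise ValueError("player has no plate appearances")
    total_runs = sum(coef[event] * count for event, count in events.items())
    return total_runs / plate_appearances
```

Calling `value_per_pa(events, REALIZATION_COEF)` and `value_per_pa(events, POTENTIAL_COEF)` on the same event counts would give a player's R and P values under these placeholder coefficients.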
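One way to build a multi-season sample like this is to pool event counts starting from the most recent season. The sketch below is an assumption about how that could be done, not a record of the exact procedure used here: the function name `pool_recent_seasons` is hypothetical, and stopping as soon as 1000 plate appearances are reached is a choice made for the example.

```python
from collections import Counter

MIN_PLATE_APPEARANCES = 1000  # sample size suggested in the text
MAX_SEASONS = 5               # "past 4 or 5 years"


def pool_recent_seasons(seasons: list[dict[str, int]]) -> Counter:
    """Sum event counts from newest to oldest season until the sample is large enough.

    `seasons` must be ordered newest first. Each dict maps an event name to a
    count, and every plate appearance is assumed to be exactly one event.
    """
    pooled: Counter = Counter()
    for season in seasons[:MAX_SEASONS]:
        pooled.update(season)  # adds this season's counts to the running totals
        if sum(pooled.values()) >= MIN_PLATE_APPEARANCES:
            break
    return pooled
```

A player with fewer than 1000 plate appearances across all available seasons simply gets everything pooled, which is the small-sample situation to watch out for.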
|Player||Realization Value||Potential Value|
Red denotes that a player is poor in a specific area; green denotes that a player is good in a specific area.
(R-, P+) Table-setters: Michael Bourn, Michael Brantley
(R+, P+) All-around contributors: Nick Swisher, Carlos Santana, Jason Giambi
(R+, P-) Run producers: Asdrubal Cabrera, Mark Reynolds, Ryan Raburn, Yan Gomes
(R-, P-) Weak hitters: Mike Aviles, Drew Stubbs
Jason Kipnis is the odd man out; he does not really fit into any of the categories above. He is neither a run producer nor really a table-setter. He is not good enough at both to be an all-around contributor, but he surely is not a weak hitter.
Yan Gomes’ sample size is much too small, which is why his values are so high. Therefore, we will stick him lower in the order, because a manager should not change his lineup based on “hot streaks”, which do not actually exist in the first place.
Optimally, you want the players with high potential values batting at the top, followed by all-around hitters, then your run producers, and finally your weak hitters at the bottom. Therefore, we can come up with the following as our optimal batting order:
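The four groups above amount to checking whether each of a player's two values is high or low. Here is a minimal sketch of that check, assuming hypothetical cutoffs `r_cutoff` and `p_cutoff` (for example, the team averages); the article does not state where the line between "+" and "-" was drawn, so both cutoffs and the function name `classify` are assumptions.

```python
def classify(realization: float, potential: float,
             r_cutoff: float, p_cutoff: float) -> str:
    """Label a hitter by whether each value is at or above its cutoff.

    The cutoffs are hypothetical inputs; choosing them (team average,
    league average, etc.) is left to the caller.
    """
    high_r = realization >= r_cutoff
    high_p = potential >= p_cutoff
    if high_r and high_p:
        return "all-around contributor"  # (R+, P+)
    if high_p:
        return "table-setter"            # (R-, P+)
    if high_r:
        return "run producer"            # (R+, P-)
    return "weak hitter"                 # (R-, P-)
```

A hard cutoff like this will always force a borderline hitter into one of the four groups, so a case like Kipnis, or a small-sample case like Gomes, still calls for judgment on top of the labels.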
1. CF Michael Bourn – high-potential hitter, gets on base at a high rate and does not hit home runs. Speed is a plus.
2. LF Michael Brantley – second highest potential hitter, speed is a plus.
3. 1B Carlos Santana – provides both power and the ability to get on base.
4. DH Nick Swisher – provides both power and the ability to get on base.
5. SS Asdrubal Cabrera – good run producer.
6. 3B Mark Reynolds – good run producer.
7. 2B Jason Kipnis – best of the weaker hitters.
8. C Yan Gomes – too small of a sample size.
9. RF Drew Stubbs – worst hitter in the starting lineup.
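The ordering rule behind this lineup (table-setters, then all-around hitters, then run producers, then weak hitters) can be written as a simple sort. This is a rough sketch rather than the exact procedure used to build the order above: the function name `batting_order` is hypothetical, and breaking ties within a group by potential value, then by realization value, is an assumption.

```python
# Lineup priority taken from the rule in the text: table-setters first,
# then all-around hitters, then run producers, then weak hitters.
GROUP_PRIORITY = {
    "table-setter": 0,
    "all-around contributor": 1,
    "run producer": 2,
    "weak hitter": 3,
}


def batting_order(players: list[tuple[str, str, float, float]]) -> list[str]:
    """Sort (name, group, realization, potential) tuples into a batting order.

    Ties within a group go to the higher potential value, then the higher
    realization value; that tie-break is an assumption of this sketch.
    """
    ranked = sorted(players, key=lambda p: (GROUP_PRIORITY[p[1]], -p[3], -p[2]))
    return [name for name, _group, _r, _p in ranked]
```

A purely mechanical sort will not reproduce every spot in the order above, since Kipnis and Gomes were placed by judgment rather than by their group labels.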
Terry Francona errs in placing hitters such as Michael Brantley and Carlos Santana so low in the order. Santana is great at getting on base, and Brantley is not a run producer; he is a singles hitter. Therefore, those two should be batting at the top of the order. Players like Asdrubal Cabrera and Jason Kipnis are run producers, and there are better options to hit second and third.
Asdrubal Cabrera is in line for many more RBI opportunities with this lineup, and he must be ready for them. So far this season he is batting .224 with runners in scoring position (RISP), with 0 HR and 15 RBI, which are poor numbers for a run producer. Asdrubal has shown his ability to hit for power since 2011, and he would be a solid five hitter if he could go for extra bases more often.
The Indians lineup is robust when you can bat a player like Jason Kipnis 7th. He probably would not like the change much, but he is not getting it done in the number two spot, especially with an OBP of .314 this season. Compare that to Michael Brantley’s .350 OBP. I have no idea why Terry is batting Michael Brantley 7th. One reason could be that Terry is very loyal to his players and knows Brantley will move around the lineup without a problem.
Many may say “If it ain’t broke, don’t fix it.” However, that is a bogus argument. Whenever an opportunity arises to increase the number of runs this team could score, it should be taken. Using the model described above, the Indians could create a more optimal batting order that could possibly win them a game or two more per season.