Optimizing the Indians’ Batting Order Using Mathematical Modeling

June 2, 2013

Baseball teams face an interesting problem every day: arranging nine players in an order that maximizes the number of runs the lineup can be expected to score. Effective optimization can increase a team’s win total by 3 or more per year, which is enough for some teams to make or miss the playoffs. Let it be known, though, that there is no one perfect lineup. Perfect optimization of a lineup is not robust or consistent: out of 200 attempts, the most optimal lineup holds only 17 times when put through a regression system. Near-optimization, however, is very robust with regard to runs scored, because it is easily duplicated. With hundreds of thousands of possible batting orders, there are plenty of very bad ones, so the overall goal should be to find an order as close to optimal as the data allow.
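To give a sense of the size of the search space, here is a minimal sketch that counts the possible batting orders. It assumes a fixed set of nine starters, so it counts orderings only and not the choice of which nine players start.

```python
import math

# Number of distinct ways to order nine fixed starters (9 factorial).
LINEUP_SIZE = 9
orderings = math.factorial(LINEUP_SIZE)
print(orderings)
```

Checking every one of those orders by hand is not practical, which is part of why a model-based shortcut is attractive.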

But how does one go about finding an optimal or near-optimal lineup? Joel S. Sokol of the Georgia Institute of Technology published a paper in 1998 that goes into heavy detail on the subject, often using equations and assumptions that even I do not understand. However, his findings seem rock solid, although his methods are very unorthodox. The paper behind the findings in this article can be found here: http://www2.isye.gatech.edu/~jsokol/boouu.pdf

While Sokol’s data is from 1998, the number of runs each event (such as a single) creates is very unlikely to change over time, especially since he used the entire 1998 season as a dataset.

Sokol begins his paper stating:

“An important question for the batting order problem is how a player’s skills interact with those of other players. There are many ways by which players can contribute offensively to their team, and we show that an appropriately-designed two-dimensional measure based on interactions with other players gives a much more complete view of player value and player utility.”

Or, in layman’s terms, players affect themselves and also the batters who follow them. If we measure both of those effects, we can create a lineup that generates the greatest number of runs.

Therefore, I used the information derived by Sokol to determine two values for each Indians player: a “realization value” (how many runs per plate appearance that batter will create for himself) and a “potential value” (how many runs per plate appearance that batter will generate for the following hitters).

I added up every event a player produced, weighted by its coefficient, and divided by the number of plate appearances to find the realization (R) and potential (P) values for each player. Here are the coefficients used, or more or less how many runs on average each event creates. Omitted from the model are situational factors such as clutch hitting and batting order protection, which have been proven to be chance/myth.
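The calculation described above can be sketched roughly as follows. This is only an illustration: the player, the event counts, and every coefficient below are made-up placeholders, not Sokol’s published values, and the helper name `value_per_pa` is hypothetical.

```python
def value_per_pa(event_counts, coefficients, plate_appearances):
    """Sum coefficient * count over all events, then divide by plate appearances."""
    total = sum(coefficients[event] * count for event, count in event_counts.items())
    return total / plate_appearances


# HYPOTHETICAL player: these counts are invented for illustration only.
event_counts = {"single": 100, "walk": 50, "home_run": 20, "out": 430}
plate_appearances = sum(event_counts.values())

# PLACEHOLDER coefficients, NOT the ones from Sokol's paper.
# Realization: runs the event creates for the batter himself.
realization_coefficients = {"single": 0.2, "walk": 0.1, "home_run": 1.0, "out": 0.0}
# Potential: runs the event sets up for the hitters who follow.
potential_coefficients = {"single": 0.3, "walk": 0.3, "home_run": 0.0, "out": -0.3}

realization_value = value_per_pa(event_counts, realization_coefficients, plate_appearances)
potential_value = value_per_pa(event_counts, potential_coefficients, plate_appearances)
print(round(realization_value, 3), round(potential_value, 3))
```

With placeholder weights like these, outs drag the potential value below zero, which is one plausible reading of why every potential value in the table is negative.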

The statistics I used were usually from the player’s most recent seasons (I usually used the past 4 or 5 years). I believe that using only this season’s statistics would give much too small a sample; it would be best to have a sample of over 1000 plate appearances. Here are the values for every Indians batter.

Player	Realization Value	Potential Value
Aviles (2010-2013)	.116	-.141
Bourn (2009-2013)	.095	-.086
Brantley (career)	.100	-.113
Cabrera (2011-2013)	.129	-.130
Giambi (2009-2013)	.123	-.118
Gomes (career)	.195	-.170
Kipnis (career)	.118	-.117
Raburn (2009-2013)	.130	-.160
Reynolds (career)	.125	-.134
Santana (career)	.127	-.111
Stubbs (2011-2013)	.100	-.124
Swisher (2009-2013)	.137	-.113

Red denotes that a player is poor in a specific area; green denotes that a player is good in it.

(R-, P+) Table-setters: Michael Bourn, Michael Brantley

(R+, P+) All-around contributors: Nick Swisher, Carlos Santana, Jason Giambi

(R+, P-) Run producers: Asdrubal Cabrera, Mark Reynolds, Ryan Raburn, Yan Gomes

(R-, P-) Weak hitters: Mike Aviles, Drew Stubbs
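The four groups above can be reproduced from the table with a simple two-cutoff rule. The article does not state its cutoffs, so the two values below are assumptions, chosen only because they put the eleven classified players where the article puts them. Kipnis is left out because the article does not assign him a category.

```python
# Realization (R) and potential (P) values copied from the table above.
VALUES = {
    "Aviles": (0.116, -0.141),
    "Bourn": (0.095, -0.086),
    "Brantley": (0.100, -0.113),
    "Cabrera": (0.129, -0.130),
    "Giambi": (0.123, -0.118),
    "Gomes": (0.195, -0.170),
    "Raburn": (0.130, -0.160),
    "Reynolds": (0.125, -0.134),
    "Santana": (0.127, -0.111),
    "Stubbs": (0.100, -0.124),
    "Swisher": (0.137, -0.113),
}

# ASSUMED cutoffs: not given in the article, picked to match its groupings.
R_CUTOFF = 0.120
P_CUTOFF = -0.120

# Keys are (R at or above cutoff, P at or above cutoff).
LABELS = {
    (False, True): "table-setter",
    (True, True): "all-around contributor",
    (True, False): "run producer",
    (False, False): "weak hitter",
}


def classify(r, p):
    """Return the category for a pair of realization and potential values."""
    return LABELS[(r >= R_CUTOFF, p >= P_CUTOFF)]


groups = {}
for name, (r, p) in VALUES.items():
    groups.setdefault(classify(r, p), []).append(name)

for label, names in groups.items():
    print(label, sorted(names))
```

Under these assumed cutoffs, Kipnis (.118, -.117) sits within about .003 of both lines, which is consistent with him being hard to place.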

Jason Kipnis is the odd man out; he does not really fit into any of the categories above. He is neither a run producer nor really a table-setter. He is not good enough at both to be an all-around contributor, but he surely is not a weak hitter.

Yan Gomes’ sample size is much too small, which is why his values are so high. Therefore, we will stick him lower in the order, because a manager should not change his lineup based on “hot streaks”, which do not actually exist in the first place.

Optimally, you want the players with high potential values batting at the top, followed by your all-around hitters, then your run producers, and finally your weak hitters at the bottom. Therefore, we can come up with the following as our optimal batting order:

1. CF Michael Bourn – high potential hitter, gets on-base at a high rate and does not hit home runs. Speed is a plus.

2. LF Michael Brantley – second highest potential hitter, speed is a plus.

3. 1B Carlos Santana – provides both power and the ability to get on base.

4. DH Nick Swisher – provides both power and the ability to get on base.

5. SS Asdrubal Cabrera – good run producer.

6. 3B Mark Reynolds – good run producer.

7. 2B Jason Kipnis – best of the weaker hitters.

8. C Yan Gomes – too small of a sample size.

9. RF Drew Stubbs – worst hitter in the starting lineup.
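The ordering rule behind this lineup can be sketched as a sort by tier, then by potential value within a tier. The tier numbers are my reading of the reasoning above (including Kipnis as "best of the weaker hitters" and Gomes held back for sample size), so treat them as assumptions. This is a heuristic illustration, not Sokol’s algorithm.

```python
# (name, tier, potential value). Potential values come from the table above.
# ASSUMED tiers: 0 table-setter, 1 all-around, 2 run producer,
# 3 unclassified (Kipnis), 4 small sample (Gomes), 5 weak hitter.
STARTERS = [
    ("Stubbs", 5, -0.124),
    ("Cabrera", 2, -0.130),
    ("Gomes", 4, -0.170),
    ("Swisher", 1, -0.113),
    ("Bourn", 0, -0.086),
    ("Kipnis", 3, -0.117),
    ("Reynolds", 2, -0.134),
    ("Santana", 1, -0.111),
    ("Brantley", 0, -0.113),
]

# Lower tier bats first; within a tier, the higher potential value bats first.
ordered = sorted(STARTERS, key=lambda starter: (starter[1], -starter[2]))
lineup = [name for name, _tier, _potential in ordered]

for slot, name in enumerate(lineup, start=1):
    print(slot, name)
```

Sorting on potential value alone would not give the same order, since Santana would jump ahead of Brantley; the tiers are what keep the table-setters on top.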

Terry Francona errs in placing hitters such as Michael Brantley and Carlos Santana so low in the order. Santana is great at getting on base, and Brantley is not a run producer; he is a singles hitter. Therefore, those two should be batting at the top of the order. Players like Asdrubal Cabrera and Jason Kipnis are run producers, and there are more optimal hitters to hit second and third.

Asdrubal Cabrera is in line for many more RBI opportunities with this lineup, and he must be ready for it. So far this season he is batting .224 with RISP, with 0 HR and 15 RBI, poor numbers for a run producer. Asdrubal has shown his ability to hit for power since 2011, and he would be a solid number five hitter if he could hit for extra bases more often.

The Indians lineup is robust when you can bat a player like Jason Kipnis 7th. He probably would not like the change too much, but he is not getting it done in the number two spot, especially with an OBP of .314 this season. Compare that to Michael Brantley’s .350 OBP. I have no idea why Terry is batting Michael Brantley 7th. One reason could be that Terry is very loyal to his players and knows Brantley will move around the lineup without a problem.

Many may say “If it ain’t broke, don’t fix it.” However, that is a bogus argument. Whenever an opportunity to increase the number of runs this team could score presents itself, it should be taken. Using the model described above, the Indians could create a more optimal batting order that could possibly win them a game or two more per season.


35 Comments

Sean Porter says:

June 2, 2013 at 9:45 pm

It’s like you read my mind, David – especially concerning switching Kipnis and Brantley. Brantley in my mind is a prototypical #2 hitter (patient, consistent, gets on base) while Kipnis would be better suited right behind the middle of the order.

I’m a big Gomes fan, but keeping him in the bottom third of the lineup now makes sense. While he’s hitting around .300 (and in my opinion has good enough patience and a consistent, quick swing to maintain a good average) he does not draw walks yet. Like ever.
Steve Alex says:

June 3, 2013 at 1:38 am

I agree with everything except batting Swisher 4th. He’s already doing that and has produced 20 RBI in a third of a season. That’s terrible. He’s a good player, but he has no business batting 4th.
- David White says:
  
  June 3, 2013 at 3:20 pm
  
  Swisher’s line (.264 AVG/.362 OBP/.829 OPS) is right at his career averages (.256 AVG/.361 OBP/.828 OPS). Extrapolated to 150 games, Swisher will finish with around 21 home runs, slightly below his career average. However, RBI is a stat over which Swisher does not have control. Swisher hasn’t been at the plate with runners on base as often; therefore his RBI number will drop even if he hits at the same clip. Is that his fault? The answer is no.
Sean Porter says:

June 3, 2013 at 7:14 am

The problem is, this team is loaded with #5 or 6 hitters. There really isn’t a prototypical #3 or 4 guy on the roster.
- David White says:
  
  June 3, 2013 at 3:21 pm
  
  I think Swisher and Santana work pretty well in the 3 and 4, both have high career OBPs (Swisher: .361 Santana: .367) and can hit for power.
Steve Alex says:

June 3, 2013 at 8:39 am

I think Santana is an ideal #3 hitter with his power and high on-base average. All those walks would be much more productive at the front end of a rally instead of at the end. Reynolds has shown a good approach with men on base and would probably make a good #4 hitter even with the low batting average. Then you could put Swisher, Cabrera and Kipnis behind them. Brantley is such an obvious choice for #2 that I can’t believe they aren’t doing it. He’s like Carney Lansford. He works counts, giving Bourn time to steal. He hits for a high average but not much power, and puts the ball in play with a low strikeout rate. He can move runners and is good at situational hitting. He’s a perfect #2.
- David White says:
  
  June 3, 2013 at 3:22 pm
  
  The reason I wouldn’t want to bat Reynolds fourth is that he removes runners from the game without replacing them. Therefore, while he may have high RBI totals, we’d be starting from square one for the 5-9 hitters. I would want him to hit after the “potential value” has reached its peak.
Justin E says:

June 3, 2013 at 9:22 am

I love that line-up. The offense can be explosive but is going through too many dry spells. Francona has done a decent job of mixing people in but he has really mixed up his order. The biggest thing that fans want to see is ACab drop in the order. His SO rate is high and his OBP is low. That is not what you need from a #3 hitter.
Joseph Werner says:

June 3, 2013 at 9:54 am

I’m going to respectfully disagree on a few aspects, David.

First, the run values of the events in the data Mr. Sokol used are more likely than not going to change. The AL averaged 5.01 runs/game in 1998. Thus far, the AL average is more than a full half-run lower, at 4.45, a mark that’s tied for the second lowest in the AL since 1992.

Tom Tango, as well as a few other mathematicians, wrote a book called, well, The Book. In there one of the chapters describes lineup construction.

Here’s a quick overview: http://www.beyondtheboxscore.com/2009/3/17/795946/optimizing-your-lineup-by

Basically, “your three best hitters should bat somewhere in the #1, #2, and #4 slots. Your fourth- and fifth-best hitters should occupy #3 and #5 slots.” And, “From slot #6 through #9, put the players in descending order of quality.” This is just the surface of it and the chapter goes into far greater detail with expanded context.

Using wOBA, the lineup would look like:

1. Bourn, CF
2. Santana, C
3. Cabrera, SS
4. Swisher, 1B
5. Reynolds, 3B
6. Kipnis, 2B
7. Brantley, LF
8. Stubbs, RF
9. Giambi, DH

It doesn’t change your lineup too much. But it does provide a different look/reasoning to that of Sokol’s.
- David White says:
  
  June 3, 2013 at 3:33 pm
  
  Granted, this is one way to construct a lineup. To use cliches, there is more than one way to skin a cat.
  
  I really think the problem with the wOBA-modeled lineup is that the run production would more or less cease after Reynolds, whereas Sokol’s lineup tries to supercharge the top of the order with potential runs and then bring in heavy realization values. The merits of both systems could be discussed at length, and granted, I’ve only been looking at this information for two weeks.
  
  The Book’s formula for wOBA is surely more accurate than the model used in 1998. However, I found using Sokol’s model very interesting because it was the first algorithmic model that optimized only one lineup at a time to determine efficiency, which made it n times faster.
  
  I wish I had league average information and the time to create a regression model to calculate how many runs per game are expected out of different lineups.
  
  The number of runs in the AL may have changed; however, the runs that each event creates are unlikely to change. There is just a different frequency of said events (leading to more or fewer runs). Hopefully I’ll be able to write a follow-up piece using other sorts of weighted averages and models.
  
  Thank you for the kind comment, I greatly appreciate it Joe.
- medfest says:
  
  June 3, 2013 at 6:13 pm
  
  I served jury duty last week and finally got around to reading “The Book”. It’s a must-read for any stat geek.
  
  I think the Indians lineup would be effective in several forms. I do have a problem with batting two left-handers back to back at the top of the order. This leaves the Tribe open to a LOOGY derailing a potential run-scoring inning.
  - David White says:
    
    June 3, 2013 at 7:25 pm
    
    I see, that is a valid argument. I think I’ll write a follow-up argument once I read “The Book” and look at other arguments I’ve seen. Thanks for the feedback, it’s always appreciated.
Joseph Werner says:

June 3, 2013 at 10:04 am

I would like to add, David, that I enjoyed the piece. As you pointed out, there is no perfect lineup construction. And if my memory serves me correctly, I think the most optimized lineup only adds about 10-20 runs per season, or the equivalent of about one to two wins.

Great piece though!
DaveR says:

June 3, 2013 at 10:28 am

I was reluctant on Stubbs as fill-in leadoff but he has been adequate there when given a chance. He is absolutely terrible in the 9. I agree with sliding a Giambi, Gomes, or better hitter there. Stubbs kills rallies more often than not when given the chance. It’s too bad we need his glove. I’m not sure where else to hide him.
- David White says:
  
  June 3, 2013 at 3:38 pm
  
  Stubbs was brutal last year in 106 games batting either leadoff or second. His .299 OBP killed the Reds last season, and Dusty Baker was silly enough to keep him in that spot just because he had speed.
  
  I like Stubbs’ defensive ability and his speed, but we have to stick him ninth. He’s just too much of a liability offensively. I wouldn’t rule out the Indians acquiring another bat at the trade deadline that could play right field, and Stubbs could become a fourth outfielder. But that is just mere speculation.
MyTribe says:

June 3, 2013 at 4:42 pm

Asdrubal Cabrera a run producer and Carlos Santana a table setter? Sounds like those two should have been switched in the line-up in 2012 when Cabrera stopped walking from the 2 spot, going 81 at bats in a row without a walk, while Santana was averaging a walk/hit by pitch per game during the same stretch of time.

Unfortunately, your assessment of Brantley is wrong as rain. Brantley’s ability to make contact and not strike out often along with his speed makes him an excellent batter in between two players who strike out much more frequently.

And this study, done in 1998, would put it at the height of the steroid wars as well, no?
- David White says:
  
  June 3, 2013 at 4:52 pm
  
  I don’t see how putting a singles hitter around strikeout hitters will help benefit anything. Brantley’s .299 AVG and .353 OBP and low SLG% (.378) make him an ideal hitter to put in front of power hitters. I’d want Cabrera and Reynolds hitting their home runs with men on-base. But once again, I respect your opinion and even The Book’s formula has Brantley batting seventh.
  
  Hitting more home runs would not change the coefficients in the model, because a home run in 1998 didn’t score more runs than one does now, it was just hit more often.
  
  These coefficients may have changed slightly, and there may be better modeling for factors such as errors, sacrifice flies, and different types of stats. However, I do not have the time/resources to create a better, more recent model. I do think that Sokol’s model does a very fair job of evaluating a player’s talent and then compiling said players into a logical nine-man order.
  
  Thank you for your comment, it’s greatly appreciated.
Steve Alex says:

June 3, 2013 at 10:25 pm

David White says it isn’t Swisher’s fault that he’s not driving in runs because he hasn’t batted with men on base very often and it’s out of his control. True? You be the judge. Here are the stats with runners in scoring position:
Swisher: 11 for 49 (.224), 21 RBI
Reynolds: 18 for 61 (.295), 41 RBI
Brantley: 16 for 42 (.381), 26 RBI
Santana: 13 for 40 (.325), 22 RBI.
As you can see, Swisher has had more chances than Brantley or Santana, yet driven in fewer runs. Reynolds has had 25% more opportunities, but driven in twice as many runs. Your argument doesn’t withstand the facts, Mr. White. The Yankees didn’t bat Swisher 4th. Neither should we.
- David White says:
  
  June 4, 2013 at 11:09 am
  
  As my data shows, either Carlos Santana or Nick Swisher would be a suitable cleanup hitter. If you had actually read my piece entirely, you would have seen that, and also seen that I stated there is no perfect lineup.
  
  However, based on your line of thought, we should bat Yan Gomes cleanup (.998 OPS). Using only RISP as a baseline for who should bat where is narrow-minded and does not give the whole picture. If Nick Swisher hits a home run with a man on first and two outs in a one-run ball game, that won’t appear in the “RISP” category.
  
  I don’t see how comparing the Yankees (average payroll of $211,841,690 during Swisher’s years, inflation adjusted) and the Indians (2013 payroll of $78,430,300) is at all fair. I’m sure if we had Robinson Cano, Mark Teixeira and Curtis Granderson, we wouldn’t be batting Swisher 4th.
  
  I thank you for your comment. I appreciate the feedback.
Steve Alex says:

June 4, 2013 at 12:54 pm

I apologize for putting you on the defensive. I did read your article entirely. I just don’t agree that Swisher is a good #4 hitter based on how he is performing now relative to his teammates.
- Steve Alex says:
  
  June 4, 2013 at 1:16 pm
  
  Let me add this: It’s a moot point anyway because Swisher isn’t going anywhere. The team promised him the cleanup spot when he signed and Francona is loyal to a fault when it comes to his veterans. I just hope it doesn’t hurt the team. Not many contenders would stay with a guy in the #4 spot who has 20 RBI in June. That’s a good two weeks for Miguel Cabrera.
  - David White says:
    
    June 4, 2013 at 2:25 pm
    
    I hope he picks it up offensively. Luckily, we’re only about a third of the way into the season, so there is a ton of time for Swisher to calibrate.
Kyle says:

June 4, 2013 at 2:44 pm

Really liked this article; however, I would have to disagree somewhat. I like Santana batting in the fifth hole. Sure, he “profiles” more as a 3 or 4 hitter, but Terry has made, in my opinion, the valid point that batting him in the 5 spot takes some of the pressure off of him, and I think that is why we are seeing a much better Carlos Santana this season. I love the guy, might be my favorite player, but I think the 5 hole is right for him with Reynolds behind him because of Santana’s high OBP. This team has two glaring holes when it comes to the lineup: no real #2 hitter or #3 hitter. Kipnis is struggling and I would like a move down, but I’m not sure who you would move up to replace him. Brantley is an option, but personally I have liked what I have seen from him in the middle of the order (3-6). He gets on base, and even though he is a singles hitter he does have doubles power and speed. When he has been in those positions this year he has done fairly well. You aren’t going to get the home runs from him, but anytime you are getting on base you have the chance to drive runs in or score, so that is why I don’t mind him hitting in the middle. I’m looking for players that have a higher BA and OBP in that role.
- David White says:
  
  June 5, 2013 at 8:39 am
  
  Thanks Kyle, I appreciate it.
Sean Porter says:

June 4, 2013 at 6:24 pm

While I enjoy the relatively new “Moneyball” or Bill James outlook on baseball, there is one flaw in over-relying on stats: Baseball is not played by robots.

Anyone who played baseball for any amount of time knows that there inevitably is a player on every team that will put up overall very good stats, but when the going gets tough, will fold like a cheap tent. (See: Alex Rodriguez)

Anyone who played baseball for any amount of time knows that there inevitably is a player on every team that will put up “decent” stats, but when the going gets tough, when you need that clutch hit, they are the player you hope is in the batter’s box. (See: Paul O’Neill)

I know sabermetrics guys scream and shout that there is no such thing as a “clutch” hitter, but I played years of baseball. It’s complete crap. There are players who rise to the occasion, and there are players who go 5-5 against crap teams who completely shit the bed against elite competition.
- David White says:
  
  June 4, 2013 at 9:05 pm
  
  Clutch hitting or pitching skill may exist, but if you see extreme clutch statistics, either positive or negative, expect regression toward the player’s normal capability.
  - David White says:
    
    June 4, 2013 at 9:10 pm
    
    Moreover, contact hitters would be more willing to give in to the pitcher and settle for a single with an RBI or two rather than aggressively shoot for an extra-base hit. And that would explain why traditional power hitters score poorly in this “clutch” stat: they aren’t willing, or are just plain unable, to give in to the pitcher and instead approach the high-leverage plate appearance just as they would any other situation.
- The Doctor says:
  
  June 7, 2013 at 6:52 pm
  
  any lineup that moves the out-making duo of kipnis and cabrera out of the 2-3 spots is fine with me.
  - The Doctor says:
    
    June 7, 2013 at 6:53 pm
    
    d’oh – not sure why this came through as a reply instead of as its own comment. my bad.
Sean Porter says:

June 4, 2013 at 6:29 pm

With Cabrera’s injury, it will be interesting if THIS is the time the lineup gets overhauled…

I’d start instantly with moving Brantley to the #2 hole, moving Kipnis down, and putting Santana at #3.

cf M. Bourn
lf M. Brantley
dh C. Santana
1b N. Swisher
3b M. Reynolds
c Y. Gomes
2b J. Kipnis
ss M. Aviles
rf D. Stubbs/R. Raburn

When Cabrera comes back, I’d put him in either the #5 or 6 spot.
- David White says:
  
  June 4, 2013 at 9:02 pm
  
  I don’t think Terry would put a rookie in front of Jason Kipnis. But I like that lineup.
Steve Alex says:

June 5, 2013 at 12:38 am

That’s a good lineup, Sean. Brantley works counts and puts it in play in the #2 spot. Santana draws 100 walks and hits for decent average and power in the #3 spot, and the Yanimal moves up!!!
AndyS says:

June 6, 2013 at 11:47 am

If strictly optimizing the lineup (when all are healthy), I’m interested to hear others’ opinions on…

1 – Bourn – CF
2 – Brantley – LF
3 – Santana – C
4 – Swisher – RF
5 – ACab – SS
6 – Reynolds – 1B (struggling at 3rd)
7 – Aviles – 3B
8 – Gomes – DH (can interchange w/ Santana positionally)
9 – Kipnis – 2B (round the lineup with speed)

Leaves Pinch hitting to Raburn (righty) and Giambi (lefty). Stubbs a situational pinch runner late in the game (or just for outfield coverage).

I’d like to see this as our primary lineup but continue to move guys around for rest.

Is this crazy? Of the 30 lineups we’ve tried this year I don’t think I’ve seen these same 9 guys in the lineup at the same time (not necessarily in that order).
- David White says:
  
  June 6, 2013 at 3:53 pm
  
  The batting order is in flux, with Cabrera’s injury, Gomes’ power surge, among other things. With that lineup, I’d like to see Gomes and Kipnis batting in front of Aviles, because they are better hitters and I don’t want a rally stopping with Aviles at the plate. Jason Kipnis doesn’t deserve to bat ninth, he just doesn’t deserve to bat 2nd or 3rd, imo. However, your suggestions are interesting.
  
  According to baseball-reference.com, these are the Indians’ most frequently used lineups: http://i.imgur.com/F5fEqp6.png
Sean Porter says:

June 7, 2013 at 10:40 pm

What we are seeing lately is what happens to a lineup of streaky .260 hitters.

I’d give anything for a player on the Indians who could consistently hit .300 with moderate power. I’m not asking for Albert Pujols circa 2007 – just a good, consistent hitter with some pop to put in the 3 hole.

Swisher batting 3rd tonight was a Manny Acta-like move by Francona.