The 2001 Major League Baseball season starts today in Puerto Rico, with the Toronto Blue Jays taking on the Texas Rangers. Toronto's rookie manager, Buck Martinez, has to choose a batting order for the opening game. Much has been made in the Toronto media about the "battle for the 2 slot". The popular consensus seems to be that Martinez has one of three choices for his lineup: one, Homer Bush bats in the second slot (the opening day lineup from 2000); two, Alex Gonzalez bats number two; or three, Jose Cruz, Jr. bats in the second slot. Actually, Martinez has far more than three possible lineups. With a 25-man roster he has 741 354 768 000 possible batting orders! Of course, we can assume he is not going to start any pitcher in the field, and this cuts his choices down to a mere 259 459 200 or 79 833 600, depending on whether he has 12 or 13 pitchers on the roster respectively. If we assume that we are only going to look at the starters from last year's opening day lineup, who are all projected to be the starters in this year's lineup, Martinez has a much more manageable 362 880 possible lineups. The Jays could play 2240 seasons with these nine starters and never once repeat a batting lineup! So the questions of the day are:
For Blue Jays fans such as myself, the year 2000 was the year of the homer. The Jays hit a lot of home runs (244). Despite the power, the Jays had more difficulty with run scoring in general, and were below the AL median in runs scored (8th with 861 runs scored). The other reason it was the year of the homer: Homer Bush. Two words that still make me cringe as a Jays fan. Homer Bush put up a terrible OPS of 524 (to put that in perspective, 6 of the 9 Jays starters had a SLG of over 500). Bush was 20.1 runs below replacement level while playing just 47% of the season [1]. So where in your batting order would you want to put, arguably, the least valuable position player in the majors last year? The Jays went with the second slot on opening day, and for most of the time Bush was in the lineup last year. As a fan following the Jays, I feel that this just has to have cost us something, and I want to see if my gut feeling is right or wrong.
I follow the newsgroups rec.sports.baseball and alt.sports.baseball.tor-bluejays and read various sabermetric analyses online, so I knew some research had been done in this space. My feeling was that the status quo consensus (as much as one can ever say such a beast exists) on batting order strategy was that:
My simulation calculates the mean runs per game the Jays score with a given batting order. Each Jay's 2000 numbers were used to estimate the probabilities of the various outcomes of his plate appearances. Each lineup tested then bats for a series of nine-inning games, with the runs scored in each game tracked and summary statistics produced for each lineup. From this we can try to answer the three questions I asked in the introduction. I'll go into much further detail on the simulation, and its limitations, later on, but first, let's review the previous sabermetric analysis.
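To make the idea concrete, here is a minimal Python sketch of this kind of simulation. It is an illustration, not the program used for the results in this article. The outcome counts are made-up placeholders rather than any player's real 2000 line, runners simply advance as many bases as the batter (a deliberate simplification), and the names `walk`, `hit`, `play_game`, and `simulate` are hypothetical.

```python
import random
from statistics import mean, stdev

OUTCOMES = ("BB", "1B", "2B", "3B", "HR", "OUT")
BASES_FOR_HIT = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}


def walk(bases):
    """Apply a walk: only forced runners move. Returns runs scored."""
    scored = 0
    if bases[0] and bases[1] and bases[2]:
        scored = 1            # bases loaded: runner on third is forced home
    elif bases[0] and bases[1]:
        bases[2] = True       # runner on second is forced to third
    elif bases[0]:
        bases[1] = True       # runner on first is forced to second
    bases[0] = True           # batter takes first
    return scored


def hit(bases, n):
    """Apply an n-base hit. ASSUMPTION: every runner advances exactly n bases."""
    scored = 0
    new = [False, False, False]
    for i in (2, 1, 0):
        if bases[i]:
            if i + n >= 3:
                scored += 1
            else:
                new[i + n] = True
    if n == 4:
        scored += 1           # the batter scores on a home run
    else:
        new[n - 1] = True
    bases[:] = new
    return scored


def play_game(lineup, rng, innings=9):
    """lineup: nine tuples of outcome counts, aligned with OUTCOMES."""
    runs, batter = 0, 0
    for _ in range(innings):
        outs, bases = 0, [False, False, False]
        while outs < 3:
            # Drawing with counts as weights mirrors picking a number
            # between 1 and the batter's total plate appearances.
            outcome = rng.choices(OUTCOMES, weights=lineup[batter % 9])[0]
            batter += 1
            if outcome == "OUT":
                outs += 1
            elif outcome == "BB":
                runs += walk(bases)
            else:
                runs += hit(bases, BASES_FOR_HIT[outcome])
    return runs


def simulate(lineup, games=32400, seed=0):
    """Return (mean, standard deviation) of runs per game for one lineup."""
    rng = random.Random(seed)
    scores = [play_game(lineup, rng) for _ in range(games)]
    return mean(scores), stdev(scores)


if __name__ == "__main__":
    # Placeholder counts (BB, 1B, 2B, 3B, HR, OUT); NOT real player data.
    placeholder = (60, 100, 30, 3, 20, 400)
    print(simulate([placeholder] * 9, games=2000))
```

Comparing two batting orders then amounts to calling `simulate` on each ordering of the same nine count tuples and comparing the means.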
The first analysis I could find on lineup construction was from 1954! Branch Rickey published an article analyzing baseball in Life magazine [2]. Rickey was quite ahead of his time, noting that "[a]s a statistic, RBIs were not only misleading but dishonest". Nearly half a century later, many managers and general managers still worship the misleading and dishonest RBI. About lineup construction, Rickey noted that a likely key to success was "a closer grouping in the batting order of the club's high OBA hitters" than most teams had, as this led to better team performance in "clutch" situations, because you had the team's best batsmen up more often in these key situations.
A more modern, and more focused, approach to lineup construction was taken by Mark Pankin in 1991 [3]. Pankin used a Markov model to analyze 1800 possible lineups for each team in the majors in 1986, and then tried to deduce the key characteristics of each lineup slot by running a regression on the characteristics of the players who produced the best results in that slot. Among the conclusions Pankin draws is that OBA, rather than speed, is what should determine your leadoff hitter. Further, in Pankin's calculation the typical manager's lineup was about 0.05 runs per game worse than Pankin's lineups constructed around the regressed qualities (although it varied from 0 to 0.1 runs per game depending on the team). One quite surprising thing was that even though Pankin's ideal lineups and the traditional lineups were quite similar in performance, they were very different in composition.
A later study, posted in 1997 by rec.sports.baseball regular Roger Moore, again tackled the question of lineup construction, this time using a simulation and the 1996 LA Dodgers [4]. Moore pitted the Dodgers against themselves. One team had the conventional Dodger lineup, and their opponents had one of three different types of lineups: descending OBA, ascending OBA, and randomly ordered. The conventional lineup fared better than ascending OBA (winning 52.006% of its games) but worse than descending OBA (winning 49.895%); the randomly ordered lineups fell in between, doing worse than the conventional lineup. The 1986 Dodgers scored 638 runs on the season [5]. So, using the Pythagorean rule (with 1.83 as the exponent) [6], we can calculate that in Moore's simulation the ascending OBA lineup probably scored about 0.17 runs per game fewer than the actual Dodgers lineup, the random orders about 0.07 runs per game fewer, and the descending OBA lineup approximately 0.01 runs per game more.
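The conversion from a head-to-head win percentage to a difference in runs per game can be sketched as follows. This assumes the usual form of the Pythagorean rule, W% = R^x / (R^x + RA^x) with x = 1.83, and takes the 638-run season quoted above as the conventional lineup's scoring level. The function name `opponent_rpg` is hypothetical.

```python
def opponent_rpg(team_rpg, team_win_pct, exponent=1.83):
    """Runs per game the opponent must score for `team` to win at `team_win_pct`.

    Inverts W% = R**x / (R**x + RA**x), which gives
    R / RA = (W% / (1 - W%)) ** (1 / x).
    """
    ratio = (team_win_pct / (1.0 - team_win_pct)) ** (1.0 / exponent)
    return team_rpg / ratio


conventional = 638 / 162  # runs per game for the conventional lineup

# Conventional lineup won 52.006% against ascending OBA.
ascending = opponent_rpg(conventional, 0.52006)
# Conventional lineup won 49.895% against descending OBA.
descending = opponent_rpg(conventional, 0.49895)

print(round(conventional - ascending, 2))   # runs per game lost by ascending OBA
print(round(descending - conventional, 2))  # runs per game gained by descending OBA
```

Under these assumptions the two differences come out at roughly 0.17 and 0.01 runs per game, in line with the figures above.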
Let's first look at the opening day 2000 lineup, and what kind of production one should expect from: 1. Stewart; 2. Bush; 3. Mondesi; 4. Delgado; 5. Fullmer; 6. Batista; 7. Fletcher; 8. Cruz Jr.; 9. Gonzalez. In my simulation this ordering produces an average of 5.37 RPG (869 runs per season). Not bad, but what if we move Cruz Jr. to the second spot, Gonzalez to the eighth spot, and Bush to the ninth? This produces an average of 5.42 RPG (878 runs per season). A little better. Now what if Gonzalez gets the second slot (which, from what I last heard, is Martinez's most likely choice), with Cruz Jr. eighth and Bush ninth? The Jays now produce 5.43 RPG (879 runs per season). While this seems slightly better than Cruz Jr. in the second slot, the difference between Cruz Jr. second and Gonzalez second is smaller than the precision my simulated run size allows [7]. It is clear, though, that Bush in the two slot, based on last year's numbers, is not the wisest of choices [8]. Still, the difference between these orders was very small: about 0.85 expected wins (using the Pythagorean method again) between Bush in the second spot and Gonzalez. So it looks like Martinez is leaning the right way on which of these batters should fill the second slot.
Well, what about the sabermetric lineup orderings? I tried Jays lineups sorted increasing and decreasing by AVG, OBA, SLG, and OPS. Here, listed from worst to best, you can see a much larger difference:
| Number | Description | Runs Per Game | Runs Per Season | Extra Wins (compared to opening day 2000) |
|---|---|---|---|---|
| 1 | Increasing by AVG | 5.22 | 846 | -2.0 |
| 2 | Increasing by SLG | 5.24 | 848 | -1.8 |
| 3 | Increasing by OBA | 5.24 | 849 | -1.7 |
| 4 | Increasing by OPS | 5.26 | 852 | -1.5 |
| 5 | Decreasing by AVG | 5.45 | 883 | 1.2 |
| 6 | Decreasing by SLG | 5.45 | 884 | 1.3 |
| 7 | Decreasing by OBA | 5.51 | 893 | 2.0 |
| 8 | Decreasing by OPS | 5.52 | 894 | 2.1 |
I also wanted to try some random lineups to see what I could determine from them, and what range of values one gets with random lineups.
I tried 250 randomly generated lineups. I would have liked to do more, but each lineup takes 10 to 15 minutes to calculate, so I had limited time. 250 lineups means I was only sampling about 0.07% of the possible lineups, so there certainly are more to try. The average of the 250 random lineups produced 5.35 RPG (867 runs per season), which is very near the result for the opening day lineup from 2000 (0.2 wins worse, a smaller difference than the precision of the simulation). The worst random lineup (Bush, Cruz, Batista, Gonzalez, Fullmer, Mondesi, Stewart, Fletcher, Delgado) produced 5.24 RPG (849 runs per season), pretty much identical to the increasing by SLG and increasing by OBA strategies. The best random lineup (Delgado, Stewart, Mondesi, Cruz, Fullmer, Fletcher, Batista, Gonzalez, Bush) produced 5.47 RPG (886 runs per season), better than the decreasing by AVG and SLG lineups, but not as good as the decreasing by OBA and OPS lineups. Here is a table summarizing how the team scored when a player batted at each position:
|Name||Runs Team Scores Per Game When Player Hits In Position||Best Position|
Testing this lineup, we do indeed see a relatively strong lineup, scoring 5.48 RPG (888 runs per season). This lineup was about as good as the best of the random lineups, but not as good as a lineup of decreasing OBA or decreasing OPS.
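One plausible way to build a per-player, per-slot table from the random lineup results is to average each lineup's RPG over every (player, slot) pair it contains and then pick each player's highest-scoring slot. The sketch below assumes that reading; the function names, the placeholder players "A", "B", "C", and the toy RPG values are all hypothetical and are not results from the simulation.

```python
from collections import defaultdict


def slot_averages(results):
    """results: iterable of (lineup, rpg), where lineup is a tuple of names.

    Returns {(player, slot): mean RPG of the lineups with player in that slot},
    with slots numbered from 1.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (player, slot) -> [sum, count]
    for lineup, rpg in results:
        for slot, player in enumerate(lineup, start=1):
            entry = totals[(player, slot)]
            entry[0] += rpg
            entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def best_slots(averages):
    """Return {player: (slot, mean RPG)} for each player's best-scoring slot."""
    best = {}
    for (player, slot), avg in averages.items():
        if player not in best or avg > best[player][1]:
            best[player] = (slot, avg)
    return best


# Toy input: three-player "lineups" with made-up RPG values.
toy_results = [
    (("A", "B", "C"), 5.0),
    (("B", "A", "C"), 4.0),
    (("A", "C", "B"), 6.0),
]
print(best_slots(slot_averages(toy_results)))
```

Nothing in this procedure stops two players from sharing the same best slot, so a lineup assembled from such a table may need ties and collisions resolved by hand.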
There are two main questions that come to mind reading about a simulation like this: How precise is the simulation? How accurate is the simulation?
The first is the far simpler concern, and basically asks: which of the results presented, if any, are statistically sound and repeatable? Or am I just making much ado about nothing? In running the simulation for each lineup, I made sure that I ran enough games that the error on the RPG would be small enough for significant differences to be detected over the range of values I expected from some small samples I had run (5.2 to 5.6). I knew that the standard deviation of runs in a given game was about 3.3-3.4 runs, and from that I could calculate what magnitude of trials I'd need. I chose 32400 games for each lineup because it is an even multiple of 162 (200 seasons), large enough to give a small standard error on the mean RPG (between 0.018 and 0.019, which maps to about 3 runs a season), and small enough to run in under 15 minutes per lineup. This means that while some of the differences between the OBA and OPS lineups may not quite have been statistically significant, the difference between Bush in the second slot and Gonzalez in the second slot was.
As for the table above, assigning players to their best batting slots, I'm fairly confident that Delgado belongs in one of the first two slots and Bush towards the end, but I'm sure the information is not significant to the number of digits listed in the table. To get that kind of significance I'd need a lot more runs of the data. That may partially explain why "the Jays' strongest lineup" above is not quite as strong as decreasing OPS. The other reason may be that the best slot for a hitter depends on who bats around him: while batting Bush ninth and Delgado first might each be the best choice individually, maybe Delgado's best slot, given that Bush is batting ninth, is really the two slot. But calculating conditional probabilities like that leaves us back with 9! choices.
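The precision figures above follow from the standard error of a mean, SD / sqrt(n). The short sketch below reproduces them and checks the Bush versus Gonzalez comparison, assuming the two lineups' game samples are independent and that the per-game standard deviation is about 3.35 runs (the midpoint of the range quoted above). The function names are hypothetical.

```python
import math


def standard_error(sd, games):
    """Standard error of mean runs per game over `games` simulated games."""
    return sd / math.sqrt(games)


def difference_z(rpg_a, rpg_b, sd, games):
    """z-score for the gap between two lineups, each simulated for `games` games.

    ASSUMPTION: the two samples are independent with the same per-game SD.
    """
    se_diff = math.sqrt(2.0) * standard_error(sd, games)
    return (rpg_a - rpg_b) / se_diff


games = 32400  # 200 seasons of 162 games
for sd in (3.3, 3.4):
    se = standard_error(sd, games)
    print(f"SD {sd}: SE = {se:.4f} RPG, or {se * 162:.2f} runs per season")

# Gonzalez second (5.43 RPG) versus Bush second (5.37 RPG).
print(f"z = {difference_z(5.43, 5.37, 3.35, games):.2f}")
```

A z-score above about 2 supports treating the Bush versus Gonzalez gap as real, whereas the 0.01 RPG gap between the decreasing OBA and decreasing OPS lineups is well inside one standard error.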
So the initial data is statistically significant, but is the simulation accurate? In making a simulation there are always simplifying assumptions that need to be made, and when true values are not known, it is difficult to ensure that all of the simplifying assumptions maintain a reasonable relation between the simulation and reality.
There are two different types of possible inaccuracies: those which are unlikely to be incorporated into any simulation, and those that just weren't incorporated into this simulation. Amongst the first type are such things as:
I will now spend more time explaining exactly what my simulation does, as that is the easiest way to discuss its inaccuracies. As I've already mentioned, it assumes that the Jays' 2000 numbers reflect their true ability. I hope that in Delgado's case this is true, and in Bush's case far from true (he was injured, and hopefully can be average, or at least above replacement level, in 2001). For each plate appearance I generated a random number between 1 and the batter's total number of 2000 PA. Depending on the number, the batter got a BB, single, 2B, 3B, HR, or OUT, based on his own personal distribution of these events. If there was a hit with runners on base, the runners advanced based on probabilities from a previous major league season (which include the possibility of an out on the basepaths, trying to score from second on a single, say) [9]. Runs were counted in each inning until three outs, and innings in each game until nine were up, and the number of runs was recorded for each of the 32400 games. Thus my simulation has the following flaws:
I still feel that, even with all of the above drawbacks, my simulation has value. Last year's opening day lineup was roughly what was used where possible, and the Jays scored 861 runs, well within spitting distance of the 869 my simulation would predict (well, actually it is not clear what my simulation would predict, as you'd need to always play nine-inning games and always play your starters to get this prediction). It seems clear to me that batting Bush second last year was not the best move, although I was a little surprised at how little it mattered. Still, an extra 2 wins last year would have made it much closer. I think that managers ought to consider experimenting with moving their higher OBA and OPS guys to the top of the order. It appears that the effect of getting more at bats is more important than the perfectly constructed "little ball" first inning. This may partially be because the 2000 Blue Jays played in an offensive explosion and most of the Jays liked to swing for the fences. Still, Roger Moore's study found similar results with the 1996 L.A. Dodgers. So that brings us back to the first three questions: