Wednesday, September 25
Updated: September 26, 11:03 AM ET

Statistically speaking, Bonds in '02 is the best

By Alan Schwarz
Special to ESPN.com

Wednesday night at the Boston chapter meeting of the American Statistical Association, Harvard statistics professor Carl Morris will not be discussing new graphical methods in health-care modeling or legal technometrics, the society's conventional fare. Instead, the subject will be one dear to Morris' heart: "Professional Baseball and Markov Modeling," including the proof of a new theorem that, when used to evaluate player performance, rates Barry Bonds as finishing up by far the best offensive season of all time.

Morris' approach will sound familiar to many fans of baseball statistics: He has built a way to determine how many runs per game a team of players of various calibers will score. Bill James approximates this through his Runs Created per 27 outs; others, such as Tom Tippett of the Diamond Mind Baseball game, crunch it out through tens of thousands of computer iterations. Such methods can claim that a lineup of nine Rey Ordonezes would score, say, 2.2 runs per game, Jason Giambis 7.8 and so on.

What Morris has done, he will assert, is devise the first approach that is neither an estimate nor a computer simulation, but a relatively straightforward algebraic formula whose answer is probabilistically true -- and can be backed up by rigorous mathematical proof. When he's done, a team of nine 2002 Bondses winds up averaging 22.4 runs per game -- besting seasons by Babe Ruth and Ted Williams, and Bonds' own 2001 total, as the best in major-league history.

"This is an exact calculation, not an estimate," says Morris, the former chairman of Harvard's statistics department. "It is correct." Before baseball statistics experts balk at Morris' confidence, they should know that he is not just some academic barging into sabermetrics with no idea of what has come before. The 64-year-old is aware of the field's accepted and often very accurate evaluation methods such as OPS, Runs Created, Linear Weights and more. He also acknowledges that his own procedure builds upon the concepts presented in a paper published in 1977 by Stanford University professor Thomas Cover and colleague Carroll W. Keilers in the journal "Operations Research." Morris himself has published several sports-related papers, and recently has met with one major-league team interested in his ideas.

His conviction about the significance of this new method -- which he calls Runs Per Game (RPG) -- derives from how it in effect merges the Jamesian approach of simple inputs (singles, walks, at-bats, etc.) with a device to reconstruct innings and how runners advance before three outs are made, all in a manner that should be understandable to anyone with a good high school mathematics background (and a little patience).

Before we continue, and for those appropriately skeptical, here's the formula with some explanation of its derivation:

RPG = 9 * [E(br) - E(lob)]

Not particularly helpful? Well, this is more intuitive than it might first appear. E(br) is the expected number of baserunners per inning, while E(lob) is the expected number of runners left on base per inning. The difference between those two represents the runners who have scored; multiply that by 9 and you get runs per nine innings, or game.
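
For readers who would rather see it on a screen, the top-level formula can be sketched in a few lines of Python. The function and parameter names are illustrative, and the inputs in the example are made-up round numbers, not figures from Morris' work.

```python
def runs_per_game(expected_baserunners: float, expected_left_on_base: float) -> float:
    """RPG = 9 * [E(br) - E(lob)], with both inputs measured per inning."""
    # Runners who reach but are not stranded must have scored.
    runs_per_inning = expected_baserunners - expected_left_on_base
    return 9 * runs_per_inning


# Illustrative inputs only: 1.5 baserunners and 1.0 left on base per inning.
print(runs_per_game(1.5, 1.0))  # 4.5
```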

Now, how do we get E(br) and E(lob)? This is where it gets considerably more complicated but not ridiculously so. First, the easier half, E(br):

E(br) = 3 * [obp / (1 - obp)]

"Obp" is just what you think it is -- on-base percentage, otherwise known as the probability of not making an out. (1 - obp), then, is the probability of making an out. The ratio of these two is the expected number of runners who reach per out; multiplied by three, it becomes the expected number of runners per three outs, or inning. This is easier to see by plugging in a few representative obp's: a .500 OBP results in an average of three runners per inning, while a .333 OBP gets you an average of 1.5.

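The E(br) half is easy to check by hand or in code. Here is a minimal Python sketch of it, using the two on-base percentages from the example above; the function name and the input check are illustrative additions, not part of Morris' presentation.

```python
def expected_baserunners(obp: float) -> float:
    """E(br) = 3 * [obp / (1 - obp)]: expected runners reaching base per inning."""
    # An obp of 1.0 would mean the inning never ends, so it is rejected here.
    if not 0.0 <= obp < 1.0:
        raise ValueError("obp must be at least 0 and less than 1")
    return 3 * obp / (1 - obp)


print(expected_baserunners(0.500))            # 3.0
print(round(expected_baserunners(1 / 3), 3))  # 1.5
```
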
So, what is E(lob), the expected number of men left on base? This is the esoteric part of the calculation -- a considerable part of which most readers will be thankful to be spared. Here is the easiest representation of it:

E(lob) = [L1 * (1 - p0)] + [L2 * (1 - p0 - p1)] + [L3 * (1 - p0 - p1 - p2)]

Skip this current paragraph if you want, but again, the above formula isn't quite as awful as it looks. It's just the sum of three terms, using two concepts represented by the p's and the L's. The p's are the probabilities that 0, 1 and 2 baserunners reach in an inning -- for instance, p0, the probability of three outs in a row, is (1 - obp) cubed; the others are determined through elementary algebra. The L's quantify the number of men who will be left on base in certain situations -- for example, L2 is the probability that two men reach and do not score. The manner in which the L's and p's combine to create men left on base is too complicated to explain here; however, it does recreate, given certain assumptions, all the combinations of events in which runners reach, advance but ultimately do not score. (The L's also are the principal spot where a player's power comes into play, in that slugging clears runners from the bases rather than leaving them on.)
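
The p's, at least, can be written down directly. The Python sketch below assumes that every plate appearance is independent, that each batter reaches with probability obp, and that no outs are made on the bases; under those assumptions the number of runners who reach before the third out follows a negative binomial distribution. The article does not give the L values, so they are left as inputs, and the ones used in the demo are placeholders rather than Morris' numbers. All function names are illustrative.

```python
from math import comb


def reach_probabilities(obp: float) -> tuple:
    """Return (p0, p1, p2): chance that exactly 0, 1 or 2 batters reach in an inning.

    Assumption: independent plate appearances and no outs on the bases, so an
    inning with k runners is k successes and 3 outs, ending on an out.
    """
    out = 1 - obp
    return tuple(comb(k + 2, 2) * obp**k * out**3 for k in range(3))


def expected_left_on_base(obp: float, l1: float, l2: float, l3: float) -> float:
    """E(lob) as given in the article; l1, l2, l3 must be supplied by the caller."""
    p0, p1, p2 = reach_probabilities(obp)
    return l1 * (1 - p0) + l2 * (1 - p0 - p1) + l3 * (1 - p0 - p1 - p2)


print(reach_probabilities(0.5))  # (0.125, 0.1875, 0.1875)

# Placeholder L values, chosen only to show the arithmetic.
print(expected_left_on_base(0.5, 0.5, 0.25, 0.125))
```

Note that p0 comes out as (1 - obp) cubed, matching the value quoted above.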

OK. Enough explanation. Given that the method does determine the expected number of men who reach and men who are left there logically and accurately -- and it does -- at this point let's do what it was designed for, and have some fun by plugging numbers in from this season. The top 10 RPG totals for the 2002 season, entering this final week:

Rank  Player               RPG
  1.  Barry Bonds         22.42
  2.  Manny Ramirez       10.82
  3.  Jim Thome           10.66
  4.  Brian Giles         10.10
  5.  Jason Giambi         9.36
  6.  Larry Walker         9.34
  7.  Todd Helton          9.28
  8.  Chipper Jones        9.05
  9.  Vladimir Guerrero    8.91
 10.  Sammy Sosa           8.86

No, Barry Bonds' total is no typo. Bonds is so good at all three major batting skills -- hitting for average, hitting for power and drawing walks (i.e., not making outs) -- that a team of nine Bondses would score more than twice as many runs as a team built from any other major-league player this season. The average player nets out at 4.44, a tick lower than what the actual average team scores. (And where's Alex Rodriguez? A surprising 14th, at 8.55 RPG, mainly because, while he does everything very well and has 56 home runs, he has needed a whopping 700 plate appearances to do it, and he has the lowest walk rate among these hitters, meaning he has made many more outs.)

Moreover, as Morris will announce Wednesday night, Bonds also is putting up -- by a wide margin -- the best season of all time. Here are the top five RPG performances in major-league history:

Rank  Player         Year   RPG
  1.  Barry Bonds    2002   22.4 (through Sunday)
  2.  Babe Ruth      1923   19.6
  3.  Ted Williams   1941   18.9
  4.  Babe Ruth      1920   18.2
  5.  Barry Bonds    2001   17.1

The most attractive feature of this method is that it is not an estimate. Most attempts at quantifying some sort of "runs per game" total have, in the past, been heuristic -- meaning people have started with a method that seems to approximate the matter well, then tinkered with it to make the correlation stronger. ("Heuristic" comes from the same Greek root as "Eureka," or, "I have found it.") Bill James' runs created per 27 outs is heuristic in that it combines hits and walks, etc., in a manner -- a cousin of multiplying on-base times slugging -- that winds up approximating runs per game extremely well; however, it does not recreate the scoring of runs in each inning through the rules and logic of how baseball runs are actually scored.

Morris' method is new in that it is an exact calculation based on these criteria, can be proven to be correct mathematically, and is simple enough to quickly set up on a spreadsheet.

"Runs created is very thoughtful and reasonable, but it is still an approximation," Morris explains. "It is not fundamental. It won't necessarily work if scoring increases dramatically or other major things change. This method is true -- and it will be true 100 years from now."

Of course, there are plenty of advanced methods that also rate Barry Bonds' amazing 2002 season as one of the best ever, and often the best. His 1.377 OPS is two points away from Babe Ruth's all-time record set in 1920. His 18.59 Runs Created per 27 Outs should contend for the highest total ever depending on which formula you use. But how many runs would a team of Bondses score? Morris' answer is the only one that is not an approximation, but logically and correctly constructed through the rules of the game.

Now, its being correct brings some problems. For example, RPG distorts the defense's use of the walk, particularly the intentional walk. (A team of nine Barry Bondses wouldn't get 63 -- 63! -- intentional walks because no opponent would walk Bonds to get at Bonds.) This can be fixed by making the lineup one Bonds and eight average players -- and Barry still winds up posting the highest score ever, though by a smaller margin. The method also doesn't take into account park effects or the era in which a hitter plays. Double plays, stolen bases and other outs on the bases are not included, either, because doing so would force the use of Markovian probability theory and rule out a straightforward formula. The importance of Morris' method, he will claim Wednesday night, is that it is the most detailed and logical single equation of runs scored possible.

"Yes, sometimes he's out stealing. And sometimes he's out trying to stretch a single," explains Morris, who plans to post the formula on his Harvard website along with an input device so fans can use it. "But this couldn't be kept simple if I incorporated that. If we really wanted to do it right, we'd incorporate all those other categories. But it would be so right, people wouldn't understand it and would be intimidated by it.

"Half will say it's too complicated, and half will say it's not complicated enough. But that doesn't mean you shouldn't do the calculation. When I say it's 'correct,' it's a correct determination of what we're trying to find. If batters really follow the assumptions, then it's inescapably true that this is the number of runs that will be scored."

Besides, what Morris is excited about most is not the equation, which he has worked on at various times for more than 20 years. It's the discovery that Bonds, literally as Morris speaks to his ASA colleagues Wednesday night, is shattering the all-time record in this new statistic.

"The stat isn't the star," Morris says. "Barry Bonds is."

Alan Schwarz is the Senior Writer of Baseball America magazine and a regular contributor to ESPN.com.
