Friday, April 25
Concentrating on the fragments of the fragments
By Rob Neyer
In response to Wednesday's column, in which I predicted 110-plus wins for the Yankees and 110-plus losses for the Tigers, Tom Tippett sent in the following ...
As you know, we (editor's note -- Tom is the founder and president of Diamond Mind Baseball) simulated the 2003 season 50 times last month and wrote up the results for ESPN.com's season preview. In those 50 seasons, the Yankees averaged 104 wins, one more than your projection. In 14 of those 50 seasons, the Yankees won at least 110 games, and they maxed out at 119. In other words, even before they got off to a hot start, I'd have given them a roughly 1-in-4 chance of reaching 110 wins and a small chance of topping the record of 116 wins.
The Tigers, on the other hand, averaged 55 wins and lost at least 110 games in 14 of 50 simulated seasons. Combine that with their awful start, and I agree that 110 losses are a pretty good bet for this team.
There is one minor element of your argument that I have to disagree with, however. You correctly pointed out that the Tigers' run margin in 2002 was more consistent with 112 losses than their actual total of 106. I share your belief in the value of looking at run margins, but I also like to ask whether a team's run margin is consistent with the batting events they created and allowed during the season.
As I pointed out in my Team Efficiency article last December (which appeared on ESPN.com and is archived on the Diamond Mind web site), the 2002 Tigers were awful at converting offensive events into runs and almost as bad in preventing their opponents from doing the same. It's true that their run margin suggests that they deserved to lose 112 games, but their underlying batting and pitching stats were more consistent with a 64-98 record, so I can't agree that they were even worse than their record showed.
There's no question that the 2002 Tigers were a bad team, and bad teams often fail to capitalize on their opportunities or get those critical outs. But these inefficiencies tend not to persist from one season to the next, and it's reasonable to expect the Tigers to be more successful on both counts this year. So far, their offensive efficiency has been even worse than last year, but their defensive efficiency is near the league average, and it'll be interesting to see how they do the rest of the way.
Aside from making my analysis look pretty simplistic -- which, come to think of it, isn't all that tough a chore -- Tom's come up with a truly interesting question, which is:
How deep should we go?
Until about 20 years ago, the analysis of a team's fortunes began with the wins and the losses ... and that's where it ended. Then Bill James started writing about what he called the Pythagorean method, which simply predicts a team's record from its runs scored and allowed. Why would anybody want to predict wins and losses after the fact? Because it turns out that between a team's hypothetical record and its actual record, the hypothetical record is better at predicting its future record.
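For readers who want to see the arithmetic, here is a minimal sketch of the Pythagorean method in its commonly cited form: winning percentage equals runs scored squared, divided by the sum of runs scored squared and runs allowed squared. The exponent of 2 is an assumption (it is the classic version, and refinements use other exponents), and the function names and the 800/700 run totals are made up for illustration, not taken from any real team.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Expected winning percentage from runs scored and allowed.

    Uses the classic form RS^x / (RS^x + RA^x); exponent=2 is an assumption.
    """
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    if rs + ra == 0:
        raise ValueError("at least one run total must be positive")
    return rs / (rs + ra)


def expected_record(runs_scored: float, runs_allowed: float,
                    games: int = 162, exponent: float = 2.0):
    """Return (expected wins, expected losses) over a schedule of `games`."""
    wins = pythagorean_win_pct(runs_scored, runs_allowed, exponent) * games
    return wins, games - wins


if __name__ == "__main__":
    # Hypothetical team: 800 runs scored, 700 allowed.
    wins, losses = expected_record(800, 700)
    print(f"Expected record: {wins:.1f}-{losses:.1f}")
```

Comparing that expected record with a team's actual record is the "after the fact" prediction described above: the gap between the two is the part of the record that runs alone don't explain.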
And so for most of us who managed to get past wins and losses, runs scored and runs allowed are enough.
But as Tom notes, we can -- and probably should -- go even deeper.
In Moneyball: The Art of Winning an Unfair Game, Michael Lewis writes about "fragments," also known as "derivatives." These terms were first applied in the financial markets, right around the same time that Bill James was coming up with his Pythagorean method. According to Lewis, people figured out that if you could piece together the identity and the value of the fragments of a stock or a bond, you could figure out the inherent, underlying value of the stock or bond. And if that value differed from the actual price of the stock or bond, you could make a killing.
In baseball, runs are fragments of wins, which is why the Pythagorean method works as well as it does. If you want to determine the underlying quality of a baseball team, you might as well skip the wins and losses, and look instead at the runs scored and allowed.
But as Tom Tippett points out, there are fragments of runs, too. They're called hits and walks and errors and stolen bases, etc. We might simply describe these fragments, these derivatives, as "the Official Statistics." There are a few fragments that aren't official, I guess -- runners advanced on ground outs, maybe a few others -- but the official stats can explain the vast majority of runs.
And Tom's right. Runs are fine, but fragments of runs are finer. It's relatively simple to put the fragments together -- using Runs Created, or BaseRuns, or Super Linear Weights, or whatever -- and arrive at something we might call Expected Runs. Then you plug that into the Pythagorean method (or Pythagenport instead, for you discerning Baseball Prospectizens), and you've got a better method for determining the underlying quality of a baseball team.
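The two-step idea above can be sketched roughly as follows. This version uses the basic Runs Created formula, (hits + walks) x total bases / (at-bats + walks), which is only one of the estimators mentioned and far simpler than BaseRuns or the more detailed Runs Created variants. The `BattingLine` type, the function names, the exponent of 2, and the sample totals are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Season batting totals (hypothetical container for this sketch)."""
    at_bats: int
    hits: int
    walks: int
    total_bases: int


def runs_created(line: BattingLine) -> float:
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    opportunities = line.at_bats + line.walks
    if opportunities == 0:
        return 0.0
    return (line.hits + line.walks) * line.total_bases / opportunities


def pythagorean_win_pct(runs_for: float, runs_against: float,
                        exponent: float = 2.0) -> float:
    """Classic Pythagorean form; exponent=2 is an assumption."""
    rf = runs_for ** exponent
    ra = runs_against ** exponent
    if rf + ra == 0:
        raise ValueError("at least one run total must be positive")
    return rf / (rf + ra)


def expected_wins(offense: BattingLine, opponents: BattingLine,
                  games: int = 162) -> float:
    """Expected wins from batting events created (offense) and allowed (opponents)."""
    expected_runs_for = runs_created(offense)
    expected_runs_against = runs_created(opponents)
    return pythagorean_win_pct(expected_runs_for, expected_runs_against) * games


if __name__ == "__main__":
    # Made-up totals, not any real team's statistics.
    team = BattingLine(at_bats=5500, hits=1450, walks=550, total_bases=2300)
    allowed = BattingLine(at_bats=5550, hits=1500, walks=520, total_bases=2400)
    print(f"Expected runs scored:  {runs_created(team):.0f}")
    print(f"Expected runs allowed: {runs_created(allowed):.0f}")
    print(f"Expected wins:         {expected_wins(team, allowed):.1f}")
```

The point of the design is that actual runs never enter the calculation. A team that was unusually poor at converting hits and walks into runs will show more expected wins here than its run margin alone would suggest.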
But is that as far as we can go? No, it's not. As Lewis points out in Moneyball, there's yet one more layer (at least) of fragments. All hits, for example, are not created equal. If two players each hit 120 singles, we consider those accomplishments the same. But what if one of the players hit 80 line drives and 40 ground balls with eyes, and the other hit 120 line drives? Would we expect them to match performances the next season?
No, we wouldn't. We'd expect the guy with 120 line drives to outperform the guy who got lucky with the grounders.
That is just one tiny example of the hundreds we could come up with. And for the people who care about such things, finding the fragments of the fragments of the fragments is the next great frontier.
Senior writer Rob Neyer writes four columns per week during the baseball season. His new book, "Rob Neyer's Big Book of Baseball Lineups," has just been published by Fireside. For more information, visit Rob's Web site.