Predicting the standings is pure folly

No, pitchers and catchers haven't reported to spring training yet. But another harbinger of spring is here: the 2004 baseball-preview magazines are on the newsstands. Every February, I pick up at least one of these -- The Sporting News, or Street & Smith's -- to help me get in the mood, just as I start listening to my favorite Christmas music in early December.

I spotted the new Street & Smith's in the grocery store on Tuesday, and I brought it home with me. The 2004 edition of this venerable publication -- it's been around since World War II, at least -- gets off to a nice start, with a few well-chosen words about the great and now late Warren Spahn, and features on Jack McKeon, Jose Reyes, and the differences between ballparks. These are the sorts of things that Street & Smith's have been doing for more than half a century, and they're still doing them well.

Unfortunately, this year Street & Smith's made a foray into statistical analysis, and that was a big mistake. It seems that a fellow named G. Scott Thomas -- by the way, I've found it unwise to trust anybody who starts his name with an initial -- has devised a method to project not just the standings for the coming season (which is hard enough), but also the standings for the next five seasons. As Thomas writes,

... Our computer compared each franchise's 2001-03 performance against the historical records of 1,118 teams between 1948 and 1998, looking for parallels in win-loss records, offensive firepower, pitching skill and average margins of victory (or defeat).

This process identified each current team's closest match ... as well as 19 other teams from 1948-98 that were strikingly similar. The computer then looked at what happened to these 20 teams from the past during the next five seasons, averaging their records (giving extra weight to the best matches) to predict the current team's future performance.
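Thomas doesn't publish his actual formula, but the method he describes -- score each current team against a pool of historical teams, keep the 20 closest matches, and take a weighted average of their next five seasons -- can be sketched in a few lines. Everything here is an assumption for illustration: the Euclidean distance metric, the inverse-distance weighting, and the toy data standing in for the 1,118 historical teams.

```python
import math

# Hypothetical toy data: each historical team is a profile tuple
# (3-yr avg wins, avg runs scored, avg runs allowed, avg margin)
# paired with its win totals over the NEXT five seasons.
# The real pool would hold the 1,118 teams from 1948-98.
historical = [
    ((94, 850, 700, 0.9), [92, 90, 88, 85, 84]),
    ((81, 760, 760, 0.0), [80, 82, 79, 81, 80]),
    ((68, 680, 820, -0.8), [70, 74, 77, 76, 79]),
    # ... many more teams in a real dataset
]

def similarity_distance(a, b):
    """Euclidean distance between two team profiles (one plausible
    metric; Thomas' actual similarity formula isn't published)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def project_five_years(current_profile, pool, k=20):
    """Average the next-five-season win totals of the k most similar
    historical teams, weighting the closest matches most heavily."""
    ranked = sorted(pool, key=lambda t: similarity_distance(current_profile, t[0]))
    matches = ranked[:k]
    # Inverse-distance weights: "extra weight to the best matches."
    weights = [1.0 / (1.0 + similarity_distance(current_profile, m[0]))
               for m in matches]
    total = sum(weights)
    return [
        sum(w * m[1][season] for w, m in zip(weights, matches)) / total
        for season in range(5)
    ]

# Example: project a league-average team using the three toy comparables.
projection = project_five_years((81, 760, 760, 0.0), historical, k=3)
```

Note what the last step does: no matter which matches you find, the output is an average over a group of teams -- which, as we'll see, is exactly where the trouble starts.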

Um, OK. So what happens? Just to prove that I'm not cherry-picking, here are the "projections" for the third-place (in 2004) teams in each division, 2004 through 2008 ...

Blue Jays: 84, 80, 79, 79, 80
Indians: 80, 77, 78, 78, 81
Angels: 80, 77, 78, 81, 81
Expos: 82, 76, 78, 81, 78
Pirates: 81, 82, 79, 80, 81
Dodgers: 83, 82, 80, 79, 83

I suspect that the 98 percent of you with functioning cortexes don't need me to tell you this, but those numbers are incredibly non-illuminating.

Here's another fun one ...

White Sox: 83, 83, 83, 82, 84

If nothing else, you have to admire the White Sox for their consistency. They won 83 games in 2001, 81 in 2002, 86 in 2003, and now they're going to win 82, 83, or 84 games in each of the next five seasons. This would, I suspect, be the most consistent eight-season run in major-league history, by quite a good margin.

Hey, somebody has to set the record, right? Except, as we saw earlier, a lot of teams are going to be this consistent. Just a few more five-year projections, because they amuse me ...

Royals: 77, 77, 77, 76, 79
Phillies: 84, 83, 80, 83, 83
Marlins: 80, 81, 80, 83, 81

Hey, good news for Marlins fans! In 2007, they'll actually top the .500 mark ... before dropping to .500 exactly (again) in 2008.

Yes, all of this is absurd. And I would worry that it's unfair to the author, except he's so strident in his claims. All these numbers are listed under the heading, "Five-year outlook," as if they bear some relationship to reality, and they even come with analysis. My particular favorite summaries (among many) are those for the Dodgers and Rockies.

Dodgers will be no better than also-rans. Good pitching plus no hitting equals a permanent lock on third place.

Rockies will be no better than Dodgers. Good hitting plus no pitching equals a permanent lock on fourth place.

Two teams, finishing in the same slots for five straight seasons? It could happen. But when you look at the projected wins for all the teams, you find that the standings will remain virtually the same in every division, and in each of the next five seasons (though you Orioles fans will be pleased to learn that in 2007 the O's will finally finish third, before dropping back to fourth again in 2008).

I have admitted, many times in this space, that I'm not a trained statistician. I know (as they say) just enough to be dangerous. But the problem with Thomas' analysis is so obvious, even to me, that I'm stunned that in 2004 an actual magazine with an actual history would actually waste seven pages on this actual pap.

Here's the basic problem ... You can't use groups of teams to predict the performance of one team, because any moderately sized group of teams is going to do the same thing: regress to the mean. So when you look at Thomas' predictions, you see the great teams regressing to 90 wins in 2008, the bad teams regressing (in a good way) to 75 wins, and everybody else landing at 80 or 81. In 2003, exactly two teams won between 79 and 83 games. Yet according to Thomas' "computer," in 2008 there will be 17 teams in that range ... which is, of course, almost precisely the mean.
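A quick simulation makes the point. The numbers below are illustrative, not Thomas' data: assume season win totals scatter around the league mean of 81 with a spread comparable to real baseball (a standard deviation of about 11 wins). Individual seasons range all over the map, but the moment you average 20 of them -- which is what the projection method does -- nearly everything lands within a few wins of 81.

```python
import random
import statistics

random.seed(2004)

# Illustrative model: season win totals scattered around the league
# mean of 81 wins with sd ~ 11 (roughly the real MLB spread).
def random_season():
    return random.gauss(81, 11)

# Individual seasons vary widely -- plenty of 65-win and 95-win teams.
seasons = [random_season() for _ in range(1000)]

# But a "projection" built by averaging 20 teams' seasons compresses
# toward 81: the standard error of a 20-team mean is 11/sqrt(20),
# about 2.5 wins. One projection per hypothetical current team:
projections = [
    statistics.mean(random_season() for _ in range(20))
    for _ in range(30)
]
```

That 2.5-win standard error is why 17 of Thomas' 30 teams pile up between 79 and 83 wins: the averaging step guarantees it, regardless of what the matching step finds.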

If you asked a reasonably intelligent person to devise a system to predict how every major-league team will fare over the next five seasons, he might come up with something exactly like what Thomas came up with. For every team, find the 20 most similar teams, and see how they fared. Seems reasonable enough, doesn't it?

So you run the numbers, and you discover that your output is completely meaningless. Drivel. What do you do?

A) Start over.
B) Realize the exercise is pointless.
C) Publish what you've got, and hope nobody notices.

I happen to think the exercise is pointless, but I wouldn't blame a guy for thinking that maybe you really can figure out the 2008 standings. What I don't understand is how anyone would choose C.

In a recent (and wonderful) Baseball Prospectus chat with Red Sox GM Theo Epstein (unfortunately, you have to be a subscriber to read the chat), Epstein was asked if he believed in "the hot hand."

Epstein's response was, of course, sensible: "No. I think regression to the mean is a more powerful force than the 'hot hand.' But an element in baseball overlooked by sabermetricians is coaching, instruction, development, improvement. There is such a thing as a player, even mid-season or mid-game, making an adjustment and changing certain elements of his performance. So no, I think the 'hot hand' doesn't necessarily exist, and regression to the mean is powerful. But players are dynamic -- they're not bound by their track records."

I don't care to argue with Epstein, because of course there's some truth to what he says. Players do improve ... but as Epstein would have to admit, players usually improve in a somewhat predictable fashion; they generally improve until they reach their late 20s, at which point they stop improving and begin to decline. Not all players. But most players. As for players improving in the middle of a season, it certainly does happen. But most of those players regress to the mean before long. Yes, players are dynamic. But they're not that dynamic.

Teams, though, are dynamic. Teams do regress to the mean, too, but that's where the similarity ends. You can coach and instruct and develop Pokey Reese until the cows come home, but at the end of the day -- or the season, or the career -- he's still Pokey Reese. Teams aren't like that, though. Five seasons ago, the Yankees won the World Series.

You know how many Yankees who played in that World Series are still on the roster? Four:

Derek Jeter. Jorge Posada. Mariano Rivera. Bernie Williams.

Mind you, that was a great team. Other, lesser teams have seen even greater turnover. Which is just one of the reasons why it's impossible to predict, with any accuracy at all, the standings in 2005, let alone 2008. And anybody who tells you he can is either foolish or lying.

Senior writer Rob Neyer writes three columns per week during baseball's offseason. This spring, Fireside will publish Rob's next book, "The Neyer/James Guide to Pitchers" (co-authored with Bill James); for more information, visit Rob's Web site.