Wednesday, October 2
May the best team win ... some of the time
By Tom Tippett
Diamond Mind Baseball
Tom Tippett is the founder of Diamond Mind Baseball, a highly realistic, strategy-oriented computer baseball game. He is also the designer of the simulation used in ESPN Classic Fantasy Baseball. This article originally appeared on the Diamond Mind web site.
Baseball cliché: "Anything can happen in a short series."

Observation #1: Since 1990, the team with the best overall regular-season record has won one World Series.
Observation #2: Since the extra round of playoffs was added in 1995, the team with the better regular-season record has won 13 division series and lost 13 of them, with two contested by teams with identical records.
Observation #3: In league championship series play since 1990, the team with the better regular-season record (among the two teams in the LCS) has won 12 times and lost 9 times, with one LCS involving teams with the same record.
In other words, it's not easy to go all the way. And this isn't a recent phenomenon. Since division play began in 1969, the team with the best regular-season record in baseball has won the World Series only 8 times in 32 tries. (Can you name these eight teams? Answer below.)
To put it another way, it's a fool's game to predict the winner of a postseason tournament, even when one team has dominated the regular season. But baseball is supposed to be fun, so we're going to have a little fun with the numbers to see what we can learn about the chances of each of this year's contenders.
To do that, we'll start by assessing each team's chances to win one game against a given opponent. We'll use that information to estimate each team's chances to win a series against that opponent. And we'll put those figures together to estimate each team's chances of winning three consecutive series.
Estimating one-game winning percentages
The log5 method estimates team A's chance of beating team B in a single game as

$$\text{WPct} = \frac{A - A \cdot B}{A + B - 2 \cdot A \cdot B}$$

where A is team A's winning percentage and B is team B's winning percentage. In other words, if you have a .600 team playing a .400 team, this method shows that the better team can be expected to win 69.2% of the games between these two teams:

$$\text{WPct} = \frac{.600 - .600 \cdot .400}{.600 + .400 - 2 \cdot .600 \cdot .400} = \frac{.360}{.520} = .692$$
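For readers who want to try other matchups, here is a minimal Python sketch of the log5 formula above. The function name `log5` and the coin-flip fallback for the degenerate .000-vs-.000 and 1.000-vs-1.000 cases are choices made for this illustration, not part of the original method.

```python
def log5(a: float, b: float) -> float:
    """Single-game chance that a team with winning percentage `a` beats a team
    with winning percentage `b`, using the log5 formula."""
    denominator = a + b - 2 * a * b
    if denominator == 0:
        # Only possible when a == b == 0 or a == b == 1; call it a coin flip.
        return 0.5
    return (a - a * b) / denominator


# The worked example above: a .600 team against a .400 team.
print(f"{log5(0.600, 0.400):.3f}")  # prints 0.692
```

Note that `log5(a, b) + log5(b, a)` always comes to 1, so the formula never gives both teams better than even odds against each other.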
If you were to take A's winning percentage as a given (say .600) and evaluate this formula for all possible values of B, you could determine A's chances in games against any conceivable opponent. And if you graphed those values, you'd see a curve, not a straight line.
But it's a gentle curve and the middle portion of that curve is very close to a straight line. That makes it possible to substitute a simpler straight-line formula that gives very similar results in the range of .400 to .600:
$$\text{WPct} = .500 + A - B$$
For example, if A is .550, the log5 and straight-line methods produce values that differ by no more than .003 whenever B is in the range of .400 to .600. The further A gets away from .500, the bigger the differences, but they are still manageable. If A is .600, the difference is as much as .008 when B is close to .400 (.692 versus .700) but is still within .004 for all values of B from .440 to .630.
In other words, because almost all baseball teams fall into the range of .400 to .600, and because the differences are smallest when A and B are close to each other, the straight-line formula is a handy alternative that works for the vast majority of matchups.
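A quick numerical check of these differences is easy to script. The sketch below evaluates both formulas on a grid of B values and reports the largest gap; the .001 grid spacing and the helper names are arbitrary choices for this illustration.

```python
def log5(a: float, b: float) -> float:
    """Log5 single-game estimate for team A (winning pct a) vs team B (pct b)."""
    return (a - a * b) / (a + b - 2 * a * b)


def straight_line(a: float, b: float) -> float:
    """Straight-line approximation: .500 + A - B."""
    return 0.500 + a - b


def max_gap(a: float, b_lo: float, b_hi: float, step: float = 0.001) -> float:
    """Largest |log5 - straight-line| for B on a grid from b_lo to b_hi."""
    steps = round((b_hi - b_lo) / step)
    grid = [b_lo + i * step for i in range(steps + 1)]
    return max(abs(log5(a, b) - straight_line(a, b)) for b in grid)


for a, b_lo, b_hi in [(0.550, 0.400, 0.600), (0.600, 0.400, 0.600), (0.600, 0.440, 0.630)]:
    print(f"A = {a:.3f}, B from {b_lo:.3f} to {b_hi:.3f}: "
          f"largest gap {max_gap(a, b_lo, b_hi):.4f}")
```

The two formulas agree exactly at B = .500 and at B = A, which is why the gaps stay small for evenly matched teams.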
A few years ago, Tom Ruane wrote a program that looked at the result of every AL and NL game from 1901 to 1997. The program placed each team into one of twenty groups based on its winning percentage for that season. All teams with winning percentages below .330 went into group A; those with winning percentages between .330 and .350 went into group B, and so on up to the top group, which had all teams with winning percentages greater than .690. For each game, the program figured out what type of matchup it was (e.g. group C vs group F) and then added the game result to the totals for that matchup.
That study showed that these formulas are very accurate predictors of the actual winning percentages in matchups involving these different groups. If you read that article, you'll see that we focused on the straight-line method, but it's not hard to see that the log5 method would have provided an even better fit for the 1901-1997 results that we compiled. We'll use the log5 method for the remainder of this article.
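The grouping step in Tom Ruane's study can be sketched in a few lines. This is not his program; it is a reconstruction from the description above, with a hypothetical (winner, loser) input format and an assumed rule for values that fall exactly on a group boundary.

```python
import string
from collections import defaultdict


def wpct_group(wpct: float) -> str:
    """Assign a season winning percentage to one of twenty groups, 'A'..'T'.

    'A' holds teams below .330, 'B' through 'S' are .020-wide bands from .330
    to .690, and 'T' holds the rest. Assumption: a value exactly on a band edge
    goes into the higher band (e.g. .350 is in 'C').
    """
    thousandths = round(wpct * 1000)   # work in whole points to avoid float noise
    if thousandths < 330:
        return "A"
    if thousandths >= 690:
        return "T"
    band = (thousandths - 330) // 20   # 0 for .330-.349, ..., 17 for .670-.689
    return string.ascii_uppercase[1 + band]


def tally_matchups(games, season_wpct):
    """Accumulate game results by group matchup.

    games: iterable of (winner, loser) team keys (hypothetical format).
    season_wpct: dict mapping team key -> that team's season winning pct.
    Returns {(group_x, group_y): [wins by group_x teams, games played]}.
    """
    totals = defaultdict(lambda: [0, 0])
    for winner, loser in games:
        gw = wpct_group(season_wpct[winner])
        gl = wpct_group(season_wpct[loser])
        totals[(gw, gl)][0] += 1   # a win for the winner's group...
        totals[(gw, gl)][1] += 1   # ...in one game of this matchup
        totals[(gl, gw)][1] += 1   # and one game (a loss) for the loser's group
    return dict(totals)


season = {"team1": 0.700, "team2": 0.400}   # hypothetical teams
print(tally_matchups([("team1", "team2"), ("team2", "team1"), ("team1", "team2")], season))
```

Dividing wins by games for each key gives the observed winning percentage for that kind of matchup, which is what gets compared against the log5 and straight-line predictions.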
Adding in the home-field advantage
Estimating winning percentages for a 5-game or 7-game series
For example, the probability of winning a five-game series is the sum of the chances of sweeping, winning 3-1, or winning 3-2. There are ten ways a team can be first to win three games:
Result   Patterns
3-0      WWW
3-1      LWWW, WLWW, WWLW
3-2      LLWWW, LWLWW, LWWLW, WLLWW, WLWLW, WWLLW

For example, if a .600 team is playing a .400 team, we've already established that it has a .692 chance to win each game on a neutral field. If games one and two are at home and game three is on the road, its chances to sweep the series in three games are:
$$(.692 + .042) \times (.692 + .042) \times (.692 - .042) = .350$$

That's a 35% chance. We can use similar logic to compute the probabilities for the patterns that produce a 3-1 or 3-2 win, add them up, and presto, we have the probability that the favored team will win the series one way or another. Using this method, here are the results for this year's division series matchups:
[Table: division series matchups, with columns Matchup and Favorite]
In other words, this model says that the Yankees have a 57.4% chance of beating the Angels in a five-game series when New York has the home-field advantage, with Oakland having a bigger edge over the Twins.
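The pattern-summing procedure behind these division series figures is straightforward to code. In this minimal sketch the flat .042 home-field adjustment is taken from the worked sweep example, the 2-2-1 home/road layout (favorite at home for games 1, 2 and 5) is an assumption about the format, and all function names are illustrative.

```python
def winning_patterns(wins_needed: int) -> list[str]:
    """Every W/L sequence in which the favored team is first to `wins_needed` wins."""
    patterns = []

    def extend(seq: str, wins: int, losses: int) -> None:
        if wins == wins_needed:
            patterns.append(seq)           # series over: favorite won
        elif losses < wins_needed:         # series still alive
            extend(seq + "W", wins + 1, losses)
            extend(seq + "L", wins, losses + 1)

    extend("", 0, 0)
    return patterns


def pattern_prob(pattern: str, p_neutral: float, at_home: list[bool], hfa: float = 0.042) -> float:
    """Probability of one W/L pattern, adding `hfa` at home and subtracting it on the road."""
    prob = 1.0
    for game, result in enumerate(pattern):
        p = p_neutral + hfa if at_home[game] else p_neutral - hfa
        prob *= p if result == "W" else 1 - p
    return prob


def series_prob(p_neutral: float, at_home: list[bool], wins_needed: int, hfa: float = 0.042) -> float:
    """Chance the favored team wins the series: the sum over all winning patterns."""
    return sum(pattern_prob(pat, p_neutral, at_home, hfa)
               for pat in winning_patterns(wins_needed))


# Assumed 2-2-1 layout: favorite at home for games 1, 2 and 5.
FIVE_GAME_HOME = [True, True, False, False, True]

print(len(winning_patterns(3)))                             # prints 10
print(f"{pattern_prob('WWW', 0.692, FIVE_GAME_HOME):.3f}")  # prints 0.350
print(f"{series_prob(0.692, FIVE_GAME_HOME, 3):.3f}")
```

Because every game is treated as independent, the probability of a pattern is just the product of its game-by-game chances, and the patterns are mutually exclusive, so they can be added.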
We can move on to project the league championship series results. Of course, we don't know yet who will win each of the first-round matchups, so we'll need to do this for all possible outcomes of the first round:
[Table: possible league championship series matchups, with columns Matchup and Favorite]
Finally, there are sixteen possible matchups for the World Series, with the AL champion having the home-field advantage no matter who makes it that far:
        @ NY       @ OAK      @ MIN      @ ANA
        ---------  ---------  ---------  ---------
ATL     NY  .534   OAK .525   ATL .594   ATL .537
ARI     NY  .594   OAK .585   ARI .535   ANA .527
SL      NY  .607   OAK .598   SL  .521   ANA .541
SF      NY  .627   OAK .618   tie .500   ANA .561
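The league championship and World Series figures come from the same calculation stretched to seven games. Here is a sketch of that step, written recursively instead of by listing patterns; it assumes the 2-3-2 home/road layout and reuses the flat .042 home-field adjustment from the five-game example. The article's team-by-team inputs aren't reproduced here, so the usage lines use hypothetical values.

```python
from math import comb


def series_prob(p_neutral, at_home, wins_needed, hfa=0.042):
    """Chance that the team with home-field advantage wins a best-of series.

    p_neutral:   its single-game chance on a neutral field (e.g. from log5)
    at_home:     one bool per possible game, True when that team is at home
    wins_needed: 4 for a seven-game series
    hfa:         flat home-field adjustment, added at home, subtracted on the road
    """
    def from_state(game, wins, losses):
        if wins == wins_needed:
            return 1.0
        if losses == wins_needed:
            return 0.0
        p = p_neutral + hfa if at_home[game] else p_neutral - hfa
        return (p * from_state(game + 1, wins + 1, losses)
                + (1 - p) * from_state(game + 1, wins, losses + 1))

    return from_state(0, 0, 0)


# Assumed 2-3-2 layout: the home-field team hosts games 1, 2, 6 and 7.
SEVEN_GAME_HOME = [True, True, False, False, False, True, True]

# Cross-check: with no home-field edge the result matches the closed form
# sum over k = 0..3 of C(3 + k, k) * p^4 * (1 - p)^k.
p = 0.55
closed_form = sum(comb(3 + k, k) * p**4 * (1 - p)**k for k in range(4))
print(f"{series_prob(p, SEVEN_GAME_HOME, 4, hfa=0.0):.6f} {closed_form:.6f}")
print(f"{series_prob(0.500, SEVEN_GAME_HOME, 4):.3f}")  # evenly matched, home edge only
```

A "tie .500" cell like Minnesota vs San Francisco arises when the home-field edge exactly offsets a small single-game disadvantage.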
Going all the way
The Yankees, for example, have a probability of .574 to beat Anaheim and advance to the ALCS. There's a .617 chance they'll face Oakland and a .523 chance they would beat the A's in that series, so their chances to go to the World Series through Oakland are .574 * .617 * .523 = .185.
But there's a .383 chance they'll face Minnesota and a .640 chance they'd beat the Twins, so their chances to go to the World Series through Minnesota are .574 * .383 * .640 = .141.
Add these two possibilities together and you get a probability of .326, or about one chance in three, that Yankee Stadium will host game one of the World Series.
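The same bookkeeping in code, using only the figures quoted above for the Yankees. `pennant_prob` is a hypothetical helper name; the point is that each path to the World Series is a product of series probabilities, and the separate paths are then added.

```python
def pennant_prob(p_first_round, lcs_paths):
    """Chance of winning the first round and then the LCS.

    lcs_paths maps each possible LCS opponent to a pair:
    (chance that opponent reaches the LCS, chance of beating it in the LCS).
    """
    return sum(p_first_round * p_meet * p_beat for p_meet, p_beat in lcs_paths.values())


# Figures quoted in the text for the Yankees.
yankees = pennant_prob(0.574, {"OAK": (0.617, 0.523), "MIN": (0.383, 0.640)})
print(f"{yankees:.3f}")  # prints 0.326
```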
We can repeat this process for the other seven teams and then extend it to include the probability of winning the World Series. And when we do that, we come up with the following (drumroll, please):
Team   Chance to win World Series
NY     19.0%
OAK    18.3%
ATL    17.4%
ARI    11.1%
ANA    10.4%
SL      9.3%
SF      7.6%
MIN     6.9%
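Repeating the path calculation for every team and every round amounts to walking the bracket. The sketch below does that for any nested bracket, given a `series_prob(a, b, round_no)` callback that returns the chance that `a` beats `b` in that round. The callback is a hypothetical hook where a series model would plug in. The AL pairings come from the text; the NL pairings are the actual 2002 division series matchups (Atlanta vs San Francisco, Arizona vs St. Louis). The coin-flip model in the usage lines is only a placeholder, not the article's model.

```python
def bracket_odds(node, series_prob):
    """Chance that each team emerges from `node` (a team name or a pair of sub-brackets)."""
    odds, _ = _odds_and_round(node, series_prob)
    return odds


def _odds_and_round(node, series_prob):
    if isinstance(node, str):
        return {node: 1.0}, 0                    # a lone team has played no rounds yet
    left, left_round = _odds_and_round(node[0], series_prob)
    right, right_round = _odds_and_round(node[1], series_prob)
    round_no = max(left_round, right_round) + 1  # 1 = division series, 2 = LCS, 3 = World Series
    odds = {}
    for a, pa in left.items():
        # Reach this round, then beat whichever team comes out of the other half.
        odds[a] = pa * sum(pb * series_prob(a, b, round_no) for b, pb in right.items())
    for b, pb in right.items():
        odds[b] = pb * sum(pa * series_prob(b, a, round_no) for a, pa in left.items())
    return odds, round_no


BRACKET_2002 = ((("NY", "ANA"), ("OAK", "MIN")),
                (("ATL", "SF"), ("ARI", "SL")))


def coin_flip(a, b, round_no):
    """Placeholder model: every series is a toss-up."""
    return 0.5


# With coin flips every team should come out at 1/8.
for team, p in sorted(bracket_odds(BRACKET_2002, coin_flip).items()):
    print(f"{team:4s} {p:.3f}")
```

The callback should satisfy `series_prob(a, b, r) + series_prob(b, a, r) == 1`; otherwise the team chances will not add up to 100%.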
Aren't we missing something?
This approach doesn't take into account the starting pitchers in each game. If Randy Johnson and Curt Schilling can replicate what they did last year, Arizona's chances increase. Schilling hasn't pitched well lately, but he might be able to turn it on again when things really matter.
We're assuming the home-field advantage is the same for everyone, and Minnesota fans can point to 1987 and 1991 as proof that their home field edge is bigger than most. Then again, all of this year's playoff teams won between 50 and 55 games at home during the regular season, so nobody stands out in this regard.
This method uses regular-season winning percentages as the basis for all matchups. You could argue that other indicators, such as runs scored minus runs allowed, might be a better gauge of team talent. Using run differentials, the chances for Anaheim and San Francisco increase, mostly at the expense of Minnesota. (Of course, if run differentials were paramount, the Red Sox would still be playing, the A's would be booking tee times, and the White Sox and Twins would be in a one-game playoff for the AL Central title.)
The use of regular-season winning percentages also assumes that what happened over a six-month period is indicative of how the teams stack up right now. The unbalanced schedule skews things, with the teams in the two West divisions having battled much harder to achieve their records. Nobody would argue that Arizona is at full strength with Luis Gonzalez on the sidelines for the duration. And anyone who has watched the Yankees dial it up about three notches in almost every October since 1996 has to consider the possibility that they could do that again.
One other thing. This method assumes that the probability of winning one game in a series is independent of anything that has already happened in earlier games of that series. Any baseball fan knows this isn't true. One team may wear out its bullpen more than the other. "Destiny" or "momentum" may somehow favor one side or the other. Underdogs who lose a couple of close games may subconsciously realize they're not going to come back to win the series. Since 1989, there have been quite a few more series sweeps than this model would predict, suggesting that there are real effects that carry over from game to game.
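One way to test the independence assumption is to add up the model's sweep chances across a set of past series and compare the total with the number of sweeps that actually happened. The sketch below covers only the model side; the historical series data it would need is not included, and the helper names are hypothetical.

```python
def sweep_prob(fav_game_probs):
    """Chance that either team sweeps, given the favorite's chance in each game
    of a minimum-length series (games assumed independent, as in the model)."""
    fav_sweep, dog_sweep = 1.0, 1.0
    for p in fav_game_probs:
        fav_sweep *= p
        dog_sweep *= 1 - p
    return fav_sweep + dog_sweep


def expected_sweeps(series_list):
    """Model's expected number of sweeps across many series (sum of chances)."""
    return sum(sweep_prob(probs) for probs in series_list)


# The .600-vs-.400 division series example: games 1-2 at home, game 3 away.
print(f"{sweep_prob([0.734, 0.734, 0.650]):.3f}")  # prints 0.375
```

If the actual sweep count across many series sits well above `expected_sweeps`, the games are probably not independent.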
Let's cut these guys a little slack
In seven cracks at the division series, Atlanta has won six times. That includes a 5-for-5 showing when they entered that series with a better record than their opponent, 1-for-1 as the underdog, and one loss (to St. Louis in 2000) when the records were the same.
Atlanta has been in nine of the last ten National League Championship Series. As the favorites, they won four times in seven tries, which is about par for the course. As underdogs, they have one win in two tries. That's not bad.
It's only in the World Series that the Braves have failed to achieve their full potential, going 1-for-5 since 1991. As underdogs, they've won one (over Cleveland in 1995) and lost one (to Minnesota in 1991). As favorites, they were upset by the Blue Jays in 1992, the Yankees in 1996, and the Yankees again in 1999.
Overall, in 21 postseason series against very good teams, the Braves have 12 series wins and 9 losses to their credit. As favorites, they are 9-6. As underdogs, they've gone 3-2. And they lost the one series (to St. Louis in 2000) in which the records were even. That's not bad, not bad at all.
Of course, what stands out are those three World Series losses when they were favored. But all it would take is one more run to the title to erase a lot of those bad memories.
Consider this. If Atlanta does go all the way this year, that would give them a 15-9 record in postseason series since 1991 and two World Series wins in 11 trips to the postseason, with both wins coming since the third round of postseason play made this journey so much more difficult. If that happens, I hope they get the monkeys off their backs once and for all.
But that's a very big IF.
The bottom line
Atlanta's no-name bullpen must keep doing what it's been doing, and their 10th-ranked offense mustn't break down. Arizona has to make up for the loss of Gonzalez and hope that Schilling gets going again. The Cardinals have had a remarkable season given everything they've had to deal with, and may be this year's team of destiny, but their starting rotation is a very big question mark right now. For San Francisco, Barry Bonds must come up big and he must get some help, while the Giants' pitching staff needs to show that its #2 ranking in the NL isn't just an illusion created by their home park.
The bottom line is that anything can happen this year, especially on the NL side. That's not news, of course. The record over the past 32 years is proof enough of that.
So even though the model described in this article leaves some things out, it's still worth noting that it's much more likely that the top-seeded Yankees won't win the whole thing than that they will. George Steinbrenner may be able to buy enough talent to win the AL East title every year, but it's not nearly as easy to buy three straight series wins against good teams.
Trivia answer: Since divisional play began in 1969, the eight teams that have won the World Series after posting the best regular-season record are the 1970 Orioles, 1975-76 Reds, 1978 Yankees, 1984 Tigers, 1986 Mets, 1989 Athletics, and 1998 Yankees.