Every year in November, the debate over which teams should compete for college football's championship grows in volume and passion.
Unlike some other debates that come up in sports, this one really matters. Beginning with the 2014 season, there are four playoff spots available to teams who -- through the judgment of a committee -- are deemed worthy. Debating those four teams (or two during the BCS era) is a challenge that really does decide winning and losing.
The debate that the committee will go through won't vary too much from what we've all heard over the 15 years of the BCS. The names change, but the arguments are familiar:
• Team A had a better record.
• Team B had a tougher schedule.
• Team A beat Team Z, which beat Team B.
• Team A's loss was a quality loss.
• Team B lost because it had an extra game.
• Team A would beat Team B if they played head-to-head right now.
• Team B took care of business.
During the BCS period, numerous computer ratings contributed to the decision of the two teams in the championship game. These computer ratings were mostly tasked with answering the question, "Who's best?" What we've realized over the years of debate is that there is a second question, subtly different from that, which underlies the decision of who competes for a title: "Who is deserving?"
This is why ESPN is debuting two complementary metrics for evaluating college football teams this season. One of them, the Championship Drive Ratings, answers the question of who deserves to compete for the title. The other, the Football Power Index (FPI), answers the more traditional question of which teams are best.
The 2012 season was a great example of why both metrics are useful. In 2012, Notre Dame completed an undefeated regular season, beating quality opponents in Stanford and Oklahoma as well as traditional powers Michigan and Michigan State, which weren't as good as usual. The Fighting Irish deserved to be in the BCS Championship, said one side of the debate. But they also had several unconvincing wins, including some against relatively weak teams. They just weren't very good, said the other side of the debate.
By these metrics, both sides of the debate were right. By the Championship Drive Ratings, the Irish were No. 2 behind Alabama, deserving a spot in the BCS Championship Game. By FPI, reflecting how good they were, the Irish were No. 10 behind Oregon, three Big 12 teams, four SEC teams (including Alabama) and Florida State. Notre Dame deserved to be there even if it wasn't as good as several other teams. Florida State was probably a better team than Notre Dame, but the Seminoles blew it, losing a couple of games where they held leads of at least a touchdown into the fourth quarter.
With two metrics that answer different questions, these are a few examples of the stories that can be told in a typical November:
• In 2012, Texas A&M, with its 10-2 record before the bowl games, was about 5 points better per game than Notre Dame, but the Aggies didn't deserve to compete for the title because going 10-2 against their schedule was easier than going 12-0 against Notre Dame's schedule. Since 2004, which is as far as our records go back, Notre Dame's 12-0 regular season, given its schedule strength, was the third-most impressive win-loss record. It is behind only Alabama going 13-0 in 2009 and USC going 12-0 in 2004. Texas A&M's 10-2 record against a good schedule was only the 46th-most impressive win-loss record, given schedule strength.
• Florida's 11-1 record in 2012 was almost as difficult to achieve as Notre Dame's 12-0 record and more difficult than Alabama's 12-1 record, but its struggles against more modest opponents (comebacks against Louisiana-Lafayette and Bowling Green, especially) drop the Gators below the Crimson Tide in deserving to compete for a title. Specifically, Florida was "ahead" by win probability on 70 percent of its plays in its wins, but Alabama was "ahead" on 85 percent of its plays in its wins; Alabama controlled its wins a lot more than Florida did. FPI also said that Alabama was about nine points better than Florida, even before the Gators got trounced in their bowl game.
• Perhaps teams should get seeded in the College Football Playoff differently from how they get chosen. For example, had there been a four-team playoff in 2012, should Notre Dame, with its relatively poor FPI, have been seeded at No. 4? This would be the best way to ensure a competitive final game, though that isn't the only factor involved in seeding.
Ultimately, with college football's unbalanced and short schedules, there will always be a need to untangle which teams are the best and, importantly, which teams are the most deserving. We hope that these two metrics provide some strong guidance for how to do so.
How to Interpret the Ratings
CHAMP: Championship Drive Ratings. These are on a scale from 0 to 100, where 100 is better.
AVG GM WP: The average in-game win probability for a team, NOT adjusted for its opponents' strength.
SOS RK: Rank of schedule strength, based on difficulty of winning and controlling games versus that team's schedule (accounts for both site and strength of opponent).
ADJ WIN%: Win percentage adjusted for strength of schedule, based on chance of average team having that W-L versus team's schedule.
ADJ GM WP: The average in-game win probability for a team, adjusted for its opponents' strength, based on chance of average team having that average in-game win probability versus team's schedule.
FPI: Football Power Index. These are on a scale of scoring margin, i.e., points on the scoreboard. A team with an FPI value of 24 would be a seven-point favorite when playing a team with an FPI value of 17, if playing on a neutral field.
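Because FPI is expressed in points, the neutral-field comparison described above is just a subtraction. The short sketch below reproduces the 24-versus-17 example; the function name is ours, chosen for illustration, and the sketch deliberately ignores home-field advantage, which the description above does not quantify.

```python
def neutral_field_margin(fpi_a: float, fpi_b: float) -> float:
    """Projected scoring margin for team A over team B on a neutral field.

    Hypothetical helper: it assumes only what the FPI description states,
    namely that ratings are on a scoring-margin scale.
    """
    return fpi_a - fpi_b


# The example from the text: FPI 24 against FPI 17.
margin = neutral_field_margin(24.0, 17.0)
print(f"Team A favored by {margin:.0f} points")  # Team A favored by 7 points
```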
Some Technical Details on the Ratings
Football Power Index
• The Football Power Index was developed by looking to "predict" games historically. Using many years of pregame opponent-adjusted information, we looked at what information best projected the winner of competitive games played after week 5 of the college season. The biggest pieces of information were team-level efficiencies for the offense, defense and special teams. Plays that could be considered lucky or unlucky (fumbles, interceptions and big plays) may carry less weight than other plays. Importantly, we did not use preseason information, which can definitely help predictions and make early-season FPI values appear more consistent with other metrics. We may incorporate it in the future.
• Though our intention was not specifically to beat the Las Vegas point spread, we did compare the percentage of games FPI picked correctly with the percentage Vegas picked correctly, and the numbers are very similar. In 2012, both systems picked 76 percent of games correctly. Over the 2004-2012 span, both systems picked 72 percent of games correctly. For the purpose of identifying the best teams, this sufficed, although gambling would definitely require a higher standard.
• As with the Championship Drive Ratings, FPI depends upon adjusting for opponents. The basic method for doing this is the Simple Rating System. Doing well offensively against a good defensive team improves the offensive rating. Doing well against a poor defensive team could even hurt a team's offensive rating if the performance wasn't as good as expected.
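For readers unfamiliar with the Simple Rating System, here is a minimal sketch of the general idea: a team's rating is its average margin plus the average rating of its opponents, solved by repeated substitution. This is not ESPN's implementation. It assumes the textbook version applied to final scoring margins, whereas FPI applies opponent adjustment to offensive, defensive and special teams efficiencies. The teams and scores are invented.

```python
from collections import defaultdict


def simple_rating_system(games, iterations=200):
    """Textbook SRS: rating = average margin + average opponent rating.

    games: list of (team, opponent, margin), margin from `team`'s point of
    view, with each game listed once. Ratings are re-centered on zero after
    every pass so that 0 means "average team".
    """
    margins = defaultdict(list)
    opponents = defaultdict(list)
    for team, opp, margin in games:
        margins[team].append(margin)
        opponents[team].append(opp)
        margins[opp].append(-margin)
        opponents[opp].append(team)

    avg_margin = {t: sum(m) / len(m) for t, m in margins.items()}
    ratings = dict(avg_margin)  # start from raw average margin
    for _ in range(iterations):
        updated = {
            t: avg_margin[t]
            + sum(ratings[o] for o in opponents[t]) / len(opponents[t])
            for t in ratings
        }
        mean = sum(updated.values()) / len(updated)
        ratings = {t: r - mean for t, r in updated.items()}
    return ratings


# Invented three-team round robin: A beat B by 7, B beat C by 3, A beat C by 14.
games = [("A", "B", 7), ("B", "C", 3), ("A", "C", 14)]
ratings = simple_rating_system(games)
for team in sorted(ratings, key=ratings.get, reverse=True):
    print(team, round(ratings[team], 2))
```

In this toy schedule, Team A's raw average margin of 10.5 points shrinks to a rating of about 7 because its opponents are both below average, which is the same logic by which a strong showing against a weak defense can fail to help an offensive rating.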
Championship Drive Ratings
• The Championship Drive Ratings were developed by trying to answer these two questions:
1. What is the chance that an average FBS team accomplishes that record against that schedule?
2. What is the chance that an average FBS team controls the in-game win probability to that degree against that schedule?
Both of these questions account for strength of opponent and site of each game on the team's schedule, including distance traveled for neutral-site games.
• Overall win-loss record counts more than in-game win probability toward the final rating. Using win probability helps color in a team's record by looking at each play rather than just the end result.
• Using in-game win probability rather than final score eliminates the incentive to run up the score and instead rewards controlling games from start to finish.
• All in all, this metric measures the difficulty of a team's accomplishment against its schedule, allowing fair comparison of undefeated, 1-loss and 2-loss teams from different conferences on the same scale.
• As with the Football Power Index, this metric does not include preseason information. We did not want preseason information to influence the ultimate answer of who deserves to be in a title game. We could have excluded preseason information only from the final version of the Championship Drive Ratings and we may do that in the future.
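To make the first of the two questions above concrete, here is a hedged sketch of how one might compute the chance that an average team matches a given record against a given schedule. It is not the Championship Drive formula. It assumes games are independent, that "accomplishes that record" means winning at least that many games, and that the per-game win probabilities for an average FBS team are already known. The schedules below are invented for illustration.

```python
def prob_at_least_wins(win_probs, wins):
    """Chance of winning at least `wins` games, given per-game win chances.

    win_probs: an average team's win probability in each game on the
    schedule (hypothetical inputs). Games are treated as independent, so the
    win total follows a Poisson binomial distribution, built up one game at
    a time.
    """
    dist = [1.0]  # dist[k] = probability of exactly k wins so far
    for p in win_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # lose this game
            nxt[k + 1] += mass * p       # win this game
        dist = nxt
    return sum(dist[wins:])


# Invented 12-game schedules, seen through the eyes of an average FBS team.
tough_schedule = [0.30, 0.35, 0.40, 0.45, 0.50, 0.50,
                  0.55, 0.60, 0.60, 0.65, 0.70, 0.80]
easier_schedule = [0.45, 0.50, 0.55, 0.60, 0.65, 0.70,
                   0.70, 0.75, 0.80, 0.85, 0.90, 0.95]

p_unbeaten = prob_at_least_wins(tough_schedule, 12)  # going 12-0
p_ten_wins = prob_at_least_wins(easier_schedule, 10)  # going 10-2 or better
print(f"12-0 vs tough schedule:   {p_unbeaten:.4%}")
print(f"10-2+ vs easier schedule: {p_ten_wins:.4%}")
```

The smaller the probability, the harder the record was to achieve, which is how an undefeated team and a two-loss team can be placed on the same scale.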