Since the introduction of the Total QBR metric, we have followed some of the commentary and questions regarding it. The following is the first step in trying to clear up misconceptions and answer relevant questions.
We also intend to explain more about the methodology at the Sloan Sports Analytics Conference in Boston in March 2012. We invite you to come to that and ask questions.
1. Is it subjective?
2. Is it a black box?
3. How does the division of credit work?
4. How come quarterbacks get so much blame for sacks?
5. Explain the Clutch Index.
6. Why doesn't it have a defensive adjustment?
7. What is the point of a new quarterback rating?
8. How could Eli Manning be above average with 25 interceptions in 2010? How did Philip Rivers' interception against Dallas not matter? How come Aaron Rodgers doesn't rank higher with how he played in the playoffs? Was Jay Cutler that bad? How is Matt Ryan or Colt McCoy so high in the rankings?
9. Does QBR relate to winning?
Q. Is it subjective?
A. There is a perception, mainly from people who watched the QBR special, that QBR has a lot of subjective components. Some who watched the show thought that the expected points were allocated subjectively by people watching games -- give this guy a point or take away a point. This is not true and something we need to clear up.
What underlies QBR is expected points and win probability. In determining expected points and win probability as they relate to field position, down, and distance, there is no subjectivity other than slight differences in how these models are built (which is why AdvancedNFLStats.com doesn't have exactly the same numbers we have). These differences are small.
The part of QBR that could be cynically called "subjective" is that there are judgment calls with regard to what are dropped passes vs overthrows or underthrows or defended passes. ESPN's video trackers have strict guidelines on how to chart these items so that they are consistent across the different people doing charting. If you as a fan go out and chart these yourself for a game or two, you will see how several calls are easy, but some are quite hard to judge. We have standards that make things more uniform and every game is done twice to reconcile inconsistencies. Despite the standards, the gray areas will still exist and, because they exist, the division of credit quantitative analysis described below is important. That analysis is what says that a "drop" isn't necessarily all about a receiver because there are gray areas in drops.
Notably, judgment calls of this kind are not unique to QBR. Every week, statistics like hurries, tackles, or targets get used, and they require similar judgment to decide. None of these are official NFL statistics and all come with clear gray areas. Coaches are known to spend hours going back to evaluate credit on various plays. Our hope is that any statistics used to evaluate individuals in football come with analysis to help split the credit in these gray areas. We did that analysis to limit subjectivity.
Q. Is it a black box?
A. One of the main concerns that people have had with Total QBR is that it is a "black box," spitting out results without saying how those results are generated.
It is true that there isn't a simple formula to just calculate QBR, but we wrote the QBR Guide and are writing this to make clear that it is not meant to be a statistic where we say, "Trust us." ESPN The Magazine's NFL Preview edition also explains aspects of this to be more transparent.
The method we followed to generate QBR is another step in the line of work done in Hidden Game of Football, at AdvancedNFLStats.com, and at FootballOutsiders.com. The methodology for Football Outsiders' Adjusted Line Yards is the same one used for dividing credit in QBR. The basic premise behind AdvancedNFLStats' work on expected points and expected wins is the same for QBR; we did it a different way and accounted for a few more factors, but a lot of results are essentially the same. If you know the work at those sites, the work behind QBR is similar.
Will it ever be calculable with a calculator? Not easily, because it looks at every play. But neither are the expected points of AdvancedNFLStats easily calculated. They have an online tool to generate expected points and win probability; those and other tools will eventually come to ESPN, as well.
Will the components of QBR be available? Yes. We can look at how much interceptions, sacks, and fumbles hurt a QB and we intend to display that on ESPN.com. We can look at completions and scrambles and designed rushes and see how much those added to a QB's rating. These will be available online so that people can see where players fit. A representation of the QB's average clutch index will also be shown. These elements will add to the story and putting those out there will help fans see what QBR sees.
Q. How does the division of credit work?
A. Division of credit for a play is done a lot in Total QBR. Pass plays are primarily the result of an offensive line giving a QB time, a QB making a good decision and throwing accurately, and a receiver holding on to the ball and turning it into as many yards as he can. In most other work, all of the yards the team gets are given to both the quarterback and the receiver (and the offensive line, if it ever got talked about), but this really double-counts those yards. We know that the credit really should be split, and the analysis to split the credit has been available in sports for a while.
Specifically, Ben Alamar did this for the Adjusted Line Yards work of FootballOutsiders, where he looked at what percentage of yards are attributable to the offensive line for long and short gains. He also did it for a baseball paper on whether the pitcher or the hitter controls the plate. From that work, we brought him in to do the same thing here.
The details of how it works are spelled out in the paper on baseball, but we will attempt to describe it conceptually here.
Let's use the example of a quarterback throwing to a receiver. If you have a quarterback who throws a lot of passes that get dropped and a receiver who doesn't drop a lot of passes, what will happen when that quarterback throws to that receiver? Intuitively, we think of dropped passes as mostly associated with receivers, so the QB with a lot of drops would probably be labeled unlucky and the receiver would have "good hands." So the pass is more likely to be complete and not dropped. That is intuition.
The analysis looks at the mathematical way to predict whether the ball will be dropped and whether the factors that make that prediction are more quarterback-related or more receiver-related. If the factors are more receiver-related, then drops are more on the receiver. If the factors are more associated with the QB stats, then drops are more on the quarterback. In the end, analysis does generally support intuition -- drops are more on the receiver.
So it would predict that the pass wouldn't be dropped and the QB needs more receivers like this one who don't drop the ball.
The benefit of doing this is that it uses the existing data from our video tracking team to assess the division of credit. So, given the way they have charted drops and overthrows, for example, this method suggests how much is the QB and how much is the receiver. There are some gray areas in charting drops, overthrows, etc. where a drop could be an overthrow or an overthrow could be a drop. By doing this analysis, it accounts for that gray area and the division of credit at the same time.
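As a rough illustration of the idea only, and not the analysis actually used for QBR, suppose drops are predicted from a blend of the quarterback's and the receiver's historical drop rates. The relative size of the two weights then indicates whose history is more informative. The league rate, both weights, and the function names below are hypothetical placeholders.

```python
# Illustrative only: the rates and weights are made-up placeholders,
# not values from the QBR analysis.
LEAGUE_DROP_RATE = 0.05   # assumed league-wide drop rate per catchable pass
W_RECEIVER = 0.8          # assumed weight on the receiver's history
W_QB = 0.2                # assumed weight on the quarterback's history


def predicted_drop_rate(qb_rate: float, receiver_rate: float) -> float:
    """Blend each player's deviation from the league drop rate."""
    rate = (LEAGUE_DROP_RATE
            + W_QB * (qb_rate - LEAGUE_DROP_RATE)
            + W_RECEIVER * (receiver_rate - LEAGUE_DROP_RATE))
    return min(max(rate, 0.0), 1.0)  # keep the result a valid probability


def receiver_share_of_drops() -> float:
    """Share of responsibility implied by the relative weights."""
    return W_RECEIVER / (W_RECEIVER + W_QB)


# A QB whose passes are often dropped, throwing to a sure-handed receiver.
p = predicted_drop_rate(qb_rate=0.09, receiver_rate=0.02)
print(p < LEAGUE_DROP_RATE, receiver_share_of_drops())
```

With the receiver weight larger than the quarterback weight, the prediction leans toward the receiver's record, which matches the intuition described above.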
Q. How come quarterbacks get so much blame for sacks?
A. One of the earliest conversations we had when we took on the task of developing QBR was about sacks. We had heard from Trent Dilfer and others that many sacks are on the quarterback, not the offensive line. Some quarterbacks just hold the ball too long because they are indecisive or not confident in their ability. Pro-football-reference.com did a study suggesting that quarterbacks are "more responsible for sack rate than we believe." Anecdotally, when we looked at Matt Cassel taking over for Tom Brady in New England in 2008, Brady's sack percentage was 3.5% in 2007 and 2.8% in 2009, but Cassel was sacked 8.3% of the time in 2008, which is closer to Cassel's career sack percentage of 7.4% than the sack percentage of the New England offensive line over that period.
Dividing credit on this factor is difficult because the line and the quarterback are often tightly linked, with quarterbacks rarely changing and their offensive lines also not changing much. What was available were things like the number of rushers and, for a select group of games, the amount of time a quarterback had to throw. Intuitively, one would think that the more time a quarterback had before he got sacked, the more responsible the quarterback (and his receivers) is for taking the sack -- the line can only hold on for so long. This is not always true, but it is a rule of thumb.
Similarly, with more rushers, there is less that the offensive line can do to keep them from the quarterback. The quarterback can better see and adapt to extra rushers by finding receivers with single or no coverage. The analysis we did with the data supported both these ideas. That analysis also suggested that, on average, a little more than half of the blame for sacks is on the quarterback. With extra rushers, it goes up to about 60% on the quarterback. The data for time to throw the ball was more sparse and thus less conclusive, but suggested about 53% of blame for sacks goes to quarterbacks on average.
These are average results and there are variations in offensive lines, but it is important to see that the analysis suggests roughly half of the blame for sacks is typically on the QB. One of the long-term improvements we do envision is more refined data to split credit between quarterbacks and their blockers for sacks and other plays.
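To make the split concrete, here is a minimal sketch that divides the expected points lost on a sack using the average shares quoted above. Treating those averages as fixed per-play constants is an assumption made for illustration, as are the function name and the 2.0-point example.

```python
# Shares come from the averages quoted in the text; applying them as
# fixed per-play constants is an assumption of this sketch.
QB_SHARE_AVERAGE = 0.53        # "about 53%" on average
QB_SHARE_EXTRA_RUSHERS = 0.60  # "about 60%" with extra rushers


def split_sack_blame(expected_points_lost: float, extra_rushers: bool) -> dict:
    """Divide the expected points lost on a sack between QB and blockers."""
    qb_share = QB_SHARE_EXTRA_RUSHERS if extra_rushers else QB_SHARE_AVERAGE
    return {
        "quarterback": expected_points_lost * qb_share,
        "blockers": expected_points_lost * (1.0 - qb_share),
    }


# Hypothetical sack costing 2.0 expected points.
standard = split_sack_blame(2.0, extra_rushers=False)
blitz = split_sack_blame(2.0, extra_rushers=True)
print(standard, blitz)
```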
Q. Explain the Clutch Index.
A. The Clutch Index or Clutch Weight or, for those familiar with baseball analytics, the Leverage Index is a measure of how important any play is towards changing the winning percentage of the game. This Clutch Index is calculated pre-play and does not depend on the outcome of the play. We think of it as generally reflecting how much pressure a player may feel on the play. For example, imagine a team down 4 with 3 seconds to go in the game. In one case, it is 3rd and goal from the 3 yard line. In the second case, it is 3rd and 10 from midfield. In the first case, it is a high pressure situation. The team can win with the right play call, the right block, or the right pass, but they can also lose the game with the wrong call, block, or pass. Whether they win or lose on that last play, people will be talking about the last play after the game. In the second case, it is unlikely a Hail Mary works, so if they don't win on that last play, people will be talking about the rest of the game, not the failure on the last play. Only if they win will people be talking about that last play; there was less pressure then.
The Clutch Index reflects this by assigning a much higher value in the first case than in the second.
One of the big things a Clutch Index was set up to do was to minimize the value of plays made in games already decided. When a quarterback is piling up yards in the 4th quarter down several scores, they are doing it against a defense that is not working as hard as they would normally. The defense may be in prevent mode, they may take out their better players, but they just aren't competing the same as you would see in a tight game. Our advisors, from Ron Jaworski to Trent Dilfer, felt strongly that this should be accounted for.
To develop the Clutch Index, we looked at the game time and the game closeness (defined to be how close the win probability was to 50%) and how plays affected the win probability on average. The Clutch Index then became a calculated function just of time in the game and how close it was. Late close games were more clutch than early close games. But early close games were more clutch than a lot of other situations. Blowouts at any time of the game received fairly low clutch indices. This generally did what our advisors asked for, also rewarding performance in tight games.
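The actual Clutch Index function is not given here, so the following is only a hypothetical sketch of a function with the properties just described: it depends only on time in the game and on closeness (how near the win probability is to 50%), and it is computed before the play. The functional form, the doubling late in the game, and all names are assumptions.

```python
def closeness(win_probability: float) -> float:
    """1.0 when the game is a coin flip, 0.0 when it is fully decided."""
    return 1.0 - 2.0 * abs(win_probability - 0.5)


def clutch_index(win_probability: float, fraction_elapsed: float) -> float:
    """Hypothetical pre-play weight that grows with closeness and game time.

    fraction_elapsed runs from 0.0 (kickoff) to 1.0 (end of regulation).
    """
    time_factor = 1.0 + fraction_elapsed  # assumed: late plays count up to 2x
    return closeness(win_probability) * time_factor


late_close = clutch_index(0.50, 0.95)
early_close = clutch_index(0.50, 0.10)
late_blowout = clutch_index(0.98, 0.95)
print(late_close > early_close > late_blowout)
```

Any function with the same ordering (late close games above early close games, and blowouts near the bottom at any time) would fit the description equally well.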
For those aware of AdvancedNFLStats, that site has WPA, which stands for win probability added. WPA looks at the actual change in winning percentage with every play. It is an intuitively nice concept, but it has strange consequences. Single plays can completely dominate the rest of the game. A game decided on the last play gives almost all weight to that one play, even though the other plays building up to it were important in putting the team in a position to win on that last play. WPA also has the flaw of weighting every win the same amount. A 45-3 win is not viewed any differently than a 24-20 win even though there is a big difference in those games in that one represented domination and the other could be luck. A quarterback in that first game probably played great, whereas in that second game, the QB may have played great, but threw an incompletion on 3rd and goal from the 3 with 3 seconds left to "lose" the game. WPA for that second QB would probably reflect too much of that last play and not enough of the whole game.
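A small sketch shows how a single play can dominate WPA. It computes WPA as the play-by-play change in win probability, as described above; the win probabilities themselves are invented for the example.

```python
def win_probability_added(win_probs: list[float]) -> list[float]:
    """Per-play WPA: the change in win probability caused by each play.

    win_probs[0] is the win probability before the first play and
    win_probs[i] is the win probability after play i.
    """
    return [after - before for before, after in zip(win_probs, win_probs[1:])]


# Hypothetical game: steady play keeps it close, then the final play wins it.
probs = [0.50, 0.48, 0.52, 0.47, 0.45, 1.00]
wpa = win_probability_added(probs)

# Fraction of all win-probability movement that came from the last play.
last_play_share = wpa[-1] / sum(abs(x) for x in wpa)
print(last_play_share)
```

In this made-up game, the final play accounts for most of the total movement even though the earlier plays kept the team within reach.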
Finally, a question we got from Aaron Schatz at Football Outsiders was this, "If you have two QBs with the exact same performance, but one has a bad defense and the other has a good defense, will the QB with the bad defense get a better rating?" The question is motivated by the idea that a good quarterback with a bad defense will be facing more close games than a good quarterback with a good defense. As a result, their clutch opportunities will be higher. The answer to the question is actually not clear. Because QBR normalizes by how many clutch opportunities quarterbacks get, there is no straightforward answer to the question. If they are good quarterbacks and both do exactly the same in clutch situations vs non-clutch situations, then they will both have the same value for QBR. That is as straightforward an answer as we can give.
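One way to picture the normalization, assuming it behaves like a clutch-weighted average (an assumption for this sketch, not a statement of the actual formula): if two quarterbacks produce the same value per play regardless of situation, the weighted average is identical no matter how many clutch opportunities each one gets. All numbers below are invented.

```python
def clutch_weighted_average(plays: list[tuple[float, float]]) -> float:
    """plays holds (clutch_weight, value_per_play) pairs."""
    total_weight = sum(weight for weight, _ in plays)
    return sum(weight * value for weight, value in plays) / total_weight


# Same per-play value (0.2) for both QBs; only the clutch weights differ.
qb_bad_defense = [(1.8, 0.2), (1.6, 0.2), (1.5, 0.2)]   # many close games
qb_good_defense = [(0.4, 0.2), (0.3, 0.2), (1.5, 0.2)]  # many lopsided games

rating_a = clutch_weighted_average(qb_bad_defense)
rating_b = clutch_weighted_average(qb_good_defense)
print(rating_a, rating_b)
```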
Q. Why doesn't it have a defensive adjustment?
A. In the Guide to Total QBR, we talk about why there is no adjustment for the opponent played. There are several reasons. First, an opponent adjustment really is variable. We don't know for sure how good an opponent is. The adjustment for opponent can change over the course of the season, which causes problems interpreting QBR. If a guy goes into Week 5 with a QBR of 50, then doesn't play because it's a bye week, but the teams he has played against all give up a ton of points, the estimate of the opponent adjustment can change a lot. This would then lower an opponent-adjusted QBR value. We didn't want the headline QBR statistics to have that kind of variability that is just associated with estimates of opponent quality.
The basic adjustment for opponents or defense is to look at opposing teams and iteratively estimate how good they have been, but there really are multiple ways to adjust for opponent. There is now information available about what specific players were on the field for specific plays. Should we adjust for times in the game when a team has their best or worst players on the field? This does better account for when a player like Ray Lewis gets hurt, but it then requires a more player-specific defensive evaluation.
Even further, QBR was meant to be able to be sliced to work with all sorts of situations. We can look at QBs in 3rd and long or 3rd and short, for example. The defense they face in 3rd and long is different than the one in 3rd and short. Should there be different defensive adjustments for these two? Probably so.
These complexities are what led us to keep opponent or defensive adjustments out of QBR. It complicates the story instead of simplifying it.
This being said, there will be times over the course of a season to talk about opponent adjustments, but we decided that it would be better to do that outside of Total QBR itself. Doing it outside allows the adjustment to be added when needed and in different ways, depending upon the data that are available or upon the intent of the analysis.
Q. What is the point of a new quarterback rating?
A. First of all, the NFL Passer Rating needed an update. It was meant to rate quarterbacks only as passers, not rushers or scramblers. It wasn't meant to account for fumbles. It didn't consider sacks to be part of what passers do, though there is plenty of evidence now that they are at least partly associated with the QB. It was built in the 1970s to apply to full seasons with less applicability to games or small groups of plays, like third down. As a tool for evaluating quarterbacks for all that they do, the NFL Passer Rating was missing too much.
In designing it, we wanted to create a quarterback rating that could be used as a tool for identifying a better quarterback. If a General Manager is making a decision about quarterbacks, what would he want to consider? We kept that question in mind as we built it, but part of answering that question is not only the What but the How. When using a statistical tool to help make a decision, a critical part is the ability to break it down -- understand what goes into it.
Total QBR is very easily broken down. Quarterbacks get expected points through completions and incompletions, interceptions, interception returns, sacks, fumbles, fumble recoveries, scrambles, designed rushes, defensive pass interference penalties. This tool can show how much comes from these different components. If we find that drawing defensive pass interference penalties isn't a repeatable skill, then making a decision about a quarterback can use QBR without the defensive pass interference part. Defensive pass interference was a part of his past performance, so we want it available and in there so that, when we get to ratings for other positions, all the players add to a total, but it can be separated out. QBR gives that flexibility to produce useful ratings, but tell the parts of the story as needed.
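The breakdown can be pictured as a simple sum of components, any of which can be left out. In this sketch the component names follow the list above, but the expected-points values and the helper function are hypothetical.

```python
# Hypothetical expected-points contributions for one quarterback season.
components = {
    "completions_and_incompletions": 95.0,
    "interceptions": -30.0,
    "sacks": -18.0,
    "fumbles": -6.0,
    "scrambles": 9.0,
    "designed_rushes": 4.0,
    "defensive_pass_interference": 7.0,
}


def total_points(parts: dict[str, float], exclude: tuple[str, ...] = ()) -> float:
    """Sum the components, skipping any that the analyst wants to leave out."""
    return sum(value for name, value in parts.items() if name not in exclude)


full = total_points(components)
without_dpi = total_points(components, exclude=("defensive_pass_interference",))
print(full, without_dpi)
```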
Another important design factor for Total QBR was that it be difficult to "game the system." We heard frequently that quarterbacks knew how the NFL Passer Rating worked, so they would try to maximize their Passer Rating even if it didn't help their team win. Total QBR is built upon team success, then broken down to the quarterback contribution, so that a quarterback's incentives and his team goals are better aligned. Racking up lots of yards in a meaningless situation doesn't help a QB in QBR like it did with NFL Passer Rating. Going back again to what an NFL GM would like, aligning a quarterback's statistical performance with that of the team helps make contract negotiation easier.
A. These are various result-related questions. We had to look at many of them as we went through the process to make sure things were being done right. We will address a few of them here.
How could Eli Manning be above average with 25 interceptions in 2010?
When we first saw Eli Manning at #7 for 2010 with 25 interceptions, we started investigating. One of the biggest components of the mismatch in perception and rating is that Manning had 4 interceptions last year that came after a receiver really should have caught the ball. He had several others that hit the hands of receivers. He also had a larger than average number of his incompletions dropped by receivers, yet he still was in the top 10 in completion percentage. Another component of the mismatch is that he took few sacks last year. Finally, he was, besides the interceptions, quite productive, throwing for a lot of yards downfield and a lot of touchdowns.
One thing that our advisors suggested about many quarterbacks, but with Eli Manning as the prime example, is that they often are integral to the running game. Quarterbacks with freedom to call audibles make reads at the line of scrimmage and can change which hole the backs go to, really helping certain rushers gain their yards. We looked into this but didn't have data or the right analysis to find this. Nonetheless, we have been told that Giants fans should appreciate this subtle aspect of Eli Manning's game.
How did Philip Rivers' interception against Dallas shown in the QBR Special not matter?
In the show that presented Total QBR to the public, there was a Philip Rivers interception from his own end zone that went out to about his own 30. It was portrayed as less harmful than a David Garrard interception that was returned for a touchdown. It should be clear that it is less harmful, but we want to now make clear that Rivers' interception was not a meaningless interception, which could be construed from the wording on the show. A meaningless interception is the Hail Mary at the end of the half that doesn't get returned for a touchdown. Rivers' interception wasn't as bad as many, but it was a negative play and it was underthrown. We apologize for the confusion.
How come Aaron Rodgers doesn't rank higher with how he played in the playoffs?
Aaron Rodgers played tremendously in the playoffs. It was, by QBR's count, the best playoff run in the last three years with a QBR of 86.5. But his regular season number was 67.9 -- still good, but not other-worldly as he was in the playoffs. We have not emphasized sufficiently that 67.9 is just his regular season number and doesn't incorporate what he did in the playoffs. As with all NFL statistics, we are keeping playoff and regular season numbers separate.
How is Matt Ryan or Colt McCoy so high in the rankings?
These are a couple of the more common quarterbacks whose QBR value is questioned. Matt Ryan ranked 3rd in 2010 and McCoy had a 46.6 QBR as a rookie, higher than the 41.0 for Sam Bradford.
Ryan's performance was, by most metrics, quite good. He was a Pro Bowler, his 91.0 NFL Passer Rating was 6th, the Falcons were in the top 10 in team offensive EPA, and the team won 13 games. He played poorly in his playoff matchup against Aaron Rodgers, with a QBR of 11, so the last impression people have of him is a poor one; perhaps that is part of the mismatch. The legend of "Matty Ice," that he is cool under pressure, does show up in QBR calculations. His QBR was 78 in 3rd and long and 68 in the 4th quarter of close games. Using AdvancedNFLStats and their WPA calculation for QBs, which strongly reflects performance in clutch situations, Ryan was the best QB in 2010.
Colt McCoy was 2-6 as a starter for the Cleveland Browns as a rookie. Sam Bradford drew so much more attention for his rookie year with the Rams that McCoy didn't get noticed. With just 6 TDs in those 8 games, McCoy also wasn't the guy putting the ball in the end zone. And, like Ryan, his last impressions weren't very good, throwing 6 interceptions in his last 70 attempts after throwing 3 in his first 150. But that is part of the story -- he was pretty good out of the gate, with 5 of his first 6 games having an NFL Passer Rating over 80. The Browns beat the Patriots and Saints and had 3 losses by less than one score. In the Browns' 8 games with McCoy as starting QB, they had -10 expected points added as a team; in the other 8, they had -54. And, if we want to do some opponent adjustment, he faced the Steelers twice, the Ravens, and the Jets -- all top 10 teams in defensive expected points added.
Q. Does QBR relate to winning?
A. How QBR "relates to winning" is a question that can be interpreted various ways. One way that we addressed this on TV was that the team winning the QBR battle within a game wins the game 86% of the time. Teams that win the turnover battle don't win this often. Teams that win the NFL Passer Rating battle don't win this often. This result primarily emphasizes that QBR is capturing team results and that quarterback performance is quite important to those results.
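For readers who want to check a "QBR battle" figure like this against their own game logs, a minimal sketch of the calculation follows. The four sample games are invented and the tuple layout is an assumption; it does not reproduce the 86% result.

```python
def qbr_battle_win_rate(games: list[tuple[float, float, bool]]) -> float:
    """games holds (home_qbr, away_qbr, home_won); equal QBRs are skipped."""
    decided = [(h, a, won) for h, a, won in games if h != a]
    # The QBR winner also won the game when both comparisons agree.
    wins = sum(1 for h, a, home_won in decided if (h > a) == home_won)
    return wins / len(decided)


# Hypothetical sample: the higher-QBR team wins three of four games.
sample = [
    (72.1, 40.3, True),
    (35.0, 66.4, False),
    (58.2, 61.0, True),
    (80.5, 22.9, True),
]
rate = qbr_battle_win_rate(sample)
print(rate)
```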
But QBR is not meant to be a perfect indicator. A quarterback affects the offense, so a quarterback with a good defense doesn't have to be as good. What illustrates this are the 2008 seasons of Ben Roethlisberger of the Pittsburgh Steelers and Dan Orlovsky of the defenseless Detroit Lions. Orlovsky had a Total QBR value of 51, whereas Roethlisberger posted just a 46 (but a 61 in the playoffs that people remember).
In that year, expected points allowed by team defense had the Steelers second in the NFL and the Lions dead last, with more than 300 points difference on the defensive side of the ball. With that kind of defensive difference, it was possible for Orlovsky to have a better rating than Roethlisberger and still lose a lot more. Orlovsky wasn't terrible either. In the 7 games that Orlovsky started for the Lions, the team offense was about average by expected points added; in the other 9 games, they were roughly 10 points per game worse. In the games started by Roethlisberger, the Pittsburgh team offense was about the same as the Orlovsky-led Lions offense.
So QBR is meant to correlate to offense, moving the ball downfield, turning good field position into points, avoiding giving it back to the defense. Good offensive performance is not the same as winning. To the degree that offense correlates to winning, QBR should be helpful.