The slow demystification of defensive statistics

How good is Mike Trout on defense, anyway? Jasen Vinlove/USA Today Sports

PHOENIX, Ariz. -- A couple of weeks ago, I dropped into the seventh annual SABR Analytics Conference, a three-day series of panel discussions and presentations that hint at new frontiers of baseball research. The range of topics was impressive, extending well beyond the realm of statistical analysis.

Interested in the role of neuroscience in player development? How about the success of the sports media in communicating analytical precepts? The SABR conference has you covered. But, of course, it was an analytics conference first and foremost, and the meat of the event was in its statistical presentations.

There were a lot of good ones, but the two that stuck with me were related to a topic near and dear to my heart: defense. My takeaways, in a nutshell: We've never had better tools with which to evaluate fielders. And we've still got a long way to go in making full use of them.

This is an issue in every sport, and always has been. We've developed tried-and-true methods for tracking how teams score. But we've not been nearly as successful at tracking how they keep opponents from scoring. And when you think about it, the latter is just as important as the former.

Defensive statistics have long been the bugaboo of baseball analysis. The problem has always been that the things that were tracked -- putouts, assists, errors, double plays, etc. -- didn't tell you much about whether a player was actually any good. The same held true at the team level.

Those old measures might have been fine in baseball's early days, when teams averaged more than two errors per game. Now that figure is edging toward a half-error per game. Errors have become such a rare event that almost no one judges a fielder on them anymore. That's progress. Unfortunately, there still isn't a consensus on how to rate fielders.

I've leaned heavily on defensive runs saved (DRS), a metric developed by Baseball Info Solutions and one that is readily available on baseball-reference.com, fangraphs.com and TruMedia, which we rely on heavily at ESPN. It's a good but imperfect system, a fact that BIS itself underscored with its presentation at the conference.

DRS measures how many plays a fielder makes compared to the average player at his position. Each ball hit into play is tracked and evaluated for its likelihood of being turned into an out. The number of plays a player makes is measured against the number of plays he would be expected to make based on average performance, resulting in a simple plus-minus measurement. Convert that number to runs, and you have DRS.
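The plus-minus mechanics described above can be sketched in a few lines of Python. Everything here is illustrative: the play data, the out probabilities and the runs-per-play factor are invented stand-ins, not BIS's actual inputs.

```python
# Sketch of a DRS-style plus/minus calculation.
# Each tuple: (probability an average fielder at the position converts
# this ball into an out, whether our fielder actually made the play).
# All values below are hypothetical.
plays = [
    (0.95, True),   # routine grounder, converted
    (0.60, True),   # tougher chance, converted
    (0.30, False),  # difficult ball, not reached
    (0.10, True),   # spectacular play
]

# Plus/minus: credit (1 - p) for each play made, debit p for each miss,
# so routine plays earn little and great plays earn a lot.
plus_minus = sum((1 - p) if made else -p for p, made in plays)

RUNS_PER_PLAY = 0.8  # assumed run value of turning a hit into an out
runs_saved = plus_minus * RUNS_PER_PLAY

print(round(plus_minus, 2), round(runs_saved, 2))  # 1.05 0.84
```

Note how the weighting works: the fielder above is rewarded far more for the 10 percent play he made than he is penalized for the 30 percent play he missed, which is the basic logic of any expected-outs system.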

DRS does have a positive correlation from season to season. In other words, a player's DRS from one season tells you something about what you can expect him to do the next season. Unfortunately, that correlation isn't particularly high and is far less than that of, say, strikeout rate or isolated power. The juice of any metric is in its predictability. DRS has predictive value but not enough to do what you'd really like it to do, which is forecast the pecking order of teams defensively with the same degree of confidence that you might with hitting statistics.

To illustrate, let's look at the DRS career of the game's consensus best player. Straight from the 2018 Bill James Handbook, here are Mike Trout's DRS totals by season:

2012: +19
2013: -11
2014: -12
2015: +5
2016: +6
2017: -6

This is all over the map. We might not dare expect perfection from a defensive metric, but we'd at least like to believe it's capturing the relative defensive value of the game's best player. For his career, Trout is plus-1. So is Trout really a league-average fielder? His year-to-year numbers range from great to terrible, and I wouldn't be willing to wager much on what that number will look like in 2018.
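To put a number on that noisiness, here's a quick Python sketch using the Trout figures quoted above: it sums the seasons to his career plus-1, then checks how well one season "predicts" the next by correlating each season with the one that follows. Five lagged pairs for a single player is far too small a sample to prove anything about DRS; this is purely an illustration of the volatility.

```python
# Mike Trout's season-by-season DRS, 2012-17, per the 2018
# Bill James Handbook figures quoted above.
drs = [19, -11, -12, 5, 6, -6]

career = sum(drs)  # career total: +1, roughly league average

# Lagged correlation: pair each season's DRS with the next season's,
# then compute Pearson's r on those pairs.
x, y = drs[:-1], drs[1:]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
sx = sum((a - mx) ** 2 for a in x) ** 0.5
sy = sum((b - my) ** 2 for b in y) ** 0.5
r = cov / (sx * sy)

print(career, round(r, 3))
```

For this tiny sample the lagged correlation actually comes out slightly negative (about -0.26): knowing Trout's DRS in one season told you essentially nothing about the next. That is exactly the year-to-year instability the article is describing.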

Let me emphasize this again: DRS is a fine metric. So, too, is UZR (ultimate zone rating), the play-by-play-based system developed by analyst Mitchel Lichtman, which is also available at FanGraphs. (UZR has Trout 6.5 runs above average for his career.) These tools are superior to anything that came before them. But they don't get us to where we want to be.

At the conference, BIS introduced an enhanced DRS system it is calling "PART" -- an acronym for (P)ositioning, (A)irballs, (R)ange and (T)hrowing. The new system breaks each discrete defensive skill into its own bucket, then combines the buckets back together at the end for a new version of DRS. It will utilize Statcast data for player positioning and do a much better job of evaluating fielders' range in shift situations.

Sounds great, right? Well, I have some bad news, too. This new system is being marketed to teams but won't, at first, be available publicly. On our side of things, we'll get the same version of DRS that we've had the past few years. That in itself underscores a problem with covering baseball from an analytical perspective these days: The best stuff is behind the curtain. Teams turn their quantitative hives loose on new data sets to develop any proprietary edge they can find. You can't blame them for that, but it's a tease.

That doesn't mean that the rest of us won't get some new toys this season. At the SABR conference, the brilliant folks from the Statcast wing at MLB.com put on an expanded version of a presentation that they gave us at ESPN headquarters last month. There is a lot of exciting stuff going on with all of that data captured by the motion-tracking cameras in every park. For me, the best of the bunch are the new defensive tools.

First off, they've fixed the problem that Statcast had with "wall balls" -- balls that on the charts looked routine because the system didn't recognize that the fielder had a wall to contend with. That's been rectified, and it should make for a more accurate set of catch probability statistics going forward.

Even better, the Statcast crew is close to unveiling its system for measuring performance on balls hit on the ground. In other words, this season we should have data on infield play that is as compelling as the data that Statcast has been generating for outfield play. Plus, Statcast is unveiling new tools for looking at catchers -- pop times, throw times, etc. We're getting very close to having a complete data set of how fielders perform on the field of play based on the careful tracking of every move they make.

As that "wall ball" dilemma showed, there are always unforeseen nuances that must be addressed later. One thing a questioner at the conference brought up was the "Manny" effect: How much does having a player with super range like Manny Machado affect the performance and positioning of the players next to him? Right now, we don't know, but we suspect it can't hurt.

These new metrics continue their slow crawl toward maturity. The introduction of Statcast positioning data into the BIS system should be a boon and, hopefully, we'll eventually get to see if it results in better year-to-year correlation of the DRS metric.

As for Statcast itself, we have to remember how new these tracking data are. We don't have enough year-to-year data to know exactly what to do with all of it. What are the run values involved? How strongly does catch probability correlate from year to year? How does a shortstop's defensive aging curve compare to that of, say, a center fielder? At a higher level, just how volatile is defensive performance as compared to hitting or pitching? My intuition is that it's less volatile, though right now there is no way I can reliably prove that to be true.

When defensive statistics reach their full potential -- whatever form that may take -- it could have a tremendous effect on how we view the role of fielding in baseball. Maybe it's more important than we ever thought. Or maybe the impact is marginal, as a certain level of acuity has to be achieved for a player to reach the big leagues in the first place. And once we know our current measures are in good working order, we can go back, recalibrate the older ones and answer questions that have bugged us for decades.

We'll know that defensive metrics have reached maturation when they've achieved certain benchmarks of stability. Predictability is one -- when we have a handle on year-to-year correlations and aging curves, and confidence in our actual measures of runs saved and runs cost, we'll be able to do a much better job of forecasting the pecking order of teams from a defensive perspective. Another way we'll be able to tell that defensive metrics have matured is when the various systems start to agree much more frequently than they do now.

When that happens, perhaps we'll finally know whether Mike Trout is a good, average or bad fielder.