Was Julio Teheran any good this year? WAR says yes, no ... and maybe

There are three ways to calculate a player's WAR. Sometimes, they match up relatively well. Other times, well ... you have the curious case of the Atlanta Braves' 26-year-old righty. AP Photo/John Bazemore

In June 2016, the Atlanta Braves' front office was reportedly in the middle of a debate about Julio Teheran. Was he, like the departed young stars Andrelton Simmons, Justin Upton, Craig Kimbrel, Evan Gattis, Jason Heyward and Shelby Miller, most valuable to their tear-down-and-rebuild project as trade bait? Or was he, like Freddie Freeman, the young star to keep, most valuable as a piece to build around?

The Braves chose "piece to build around." According to Mark Bradley, the well-sourced Atlanta Journal-Constitution columnist, "Had [then-Braves GM John] Coppolella dangled Teheran last summer, the asking price would have been Mookie Betts, who finished second in the American League MVP voting, and Xander Bogaerts, who's an All-Star shortstop. That's how overheated the market was." He wouldn't have gotten Betts and Bogaerts, obviously, but that's where the Braves' heads were. Coppolella had once declared he'd trade his right arm before he'd trade Freeman. In June 2016, after Teheran threw a one-hit shutout against the Mets, Coppolella tweeted Teheran was "almost into 'right-arm' type status for us now."

Teheran would make the All-Star Game a few weeks later. He finished the 2016 season as one of the National League's best young pitchers -- there was no ambiguity about this. According to three leading models for measuring player value, Teheran had been well above average:

  • Baseball-Reference: 4.8 wins above replacement (bWAR)

  • FanGraphs: 3.2 wins above replacement (fWAR)

  • Baseball Prospectus: 3.8 wins above replacement (WARP)

This is where our story begins. It is the story of the Braves' playoff rotation in 2019, and whether Teheran will be in it. It's about three systems smarter than we are coming to wildly different conclusions about the question the Braves tried to answer. It's about two home runs.

And it's about our increasingly complicated answers to a deceptively simple baseball question: Was that guy good?

'What Happened' WAR

At Baseball-Reference, Teheran was much worse in 2017. He allowed heaps more runs than he had in 2016. It's more complicated than that -- a ton of work has gone into the calibration -- but at a basic level this is what we're talking about. By bWAR, based on runs allowed adjusted for things like ballpark and the quality of his defense, Teheran was worth 1.6 wins in 2017, close to league average.
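Baseball-Reference's real calculation involves far more calibration than this, so treat the following only as a toy version of the underlying idea: compare a pitcher's context-adjusted runs allowed with what a replacement-level pitcher would allow in the same innings, then convert runs to wins. The replacement-level RA9, the simple park and defense adjustments, and the roughly-10-runs-per-win rule of thumb are all assumptions, and the function name is hypothetical.

```python
# Toy "what happened" WAR sketch; NOT Baseball-Reference's actual formula.
RUNS_PER_WIN = 10.0  # common sabermetric rule of thumb (assumption)


def runs_based_war(runs_allowed, innings, replacement_ra9,
                   park_factor=1.0, defense_runs_saved=0.0):
    """Estimate wins above replacement from runs allowed (hypothetical helper).

    park_factor > 1.0 means a hitter-friendly park, so runs allowed are deflated.
    defense_runs_saved > 0 means the fielders saved runs; that credit belongs to
    them, so it is added back to the pitcher's ledger.
    """
    adjusted_runs = runs_allowed / park_factor + defense_runs_saved
    replacement_runs = replacement_ra9 * innings / 9.0
    return (replacement_runs - adjusted_runs) / RUNS_PER_WIN
```

Because the input is runs that actually scored, any luck on balls in play flows straight into the result, which is exactly the "what happened" framing.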

This is the type of story we typically tell: A protagonist, facing a series of obstacles, ends up with a happy or a sad ending. As the protagonist, he has the most direct influence on this outcome, but there are any number of external factors -- bad guys, gusty winds, deus ex machinas -- that also contribute. As viewers, we notice the influence of these external factors, but mostly we just watch to see whether the good guy wins or the good guy loses. We watch to find out which kind of story we're watching.

Maybe Teheran got unlucky. Doesn't really matter. His story last year was a sad one.

'What Should Have Happened' WAR

At FanGraphs, Teheran was much worse in 2017, worse even than he was at Baseball-Reference. His strikeout rate went down, his walk rate went up, and he allowed way more home runs. It's more complicated than that, but at a basic level it's not much more complicated than that. By fWAR, which is based on a stat (FIP) calculated with those three factors alone, Teheran was worth 1.1 wins. He pitched considerably worse than a league-average starter.
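FIP is conventionally written as (13 × HR + 3 × (BB + HBP) − 2 × K) / IP plus a league constant. The standard version counts hit batters alongside walks, and the constant is recomputed each season so that league-wide FIP lines up with league ERA. The 3.10 default below is only a placeholder, and FanGraphs layers further adjustments on top of FIP when it turns it into fWAR. Note that balls in play never enter the formula.

```python
def fip(hr, bb, k, ip, hbp=0, constant=3.10):
    """Fielding Independent Pitching.

    hr, bb, k, hbp: home runs, walks, strikeouts and hit batters allowed.
    ip: innings as a true decimal (180 1/3 innings -> 180.333..., not 180.1).
    constant: season-specific scale factor; 3.10 is a placeholder (assumption).
    """
    if ip <= 0:
        raise ValueError("ip must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Fewer strikeouts, more walks and more homers in the same innings raise FIP.
better_year = fip(hr=20, bb=50, k=150, ip=180)
worse_year = fip(hr=30, bb=60, k=140, ip=180)
```

The 13-to-3-to-2 weights reflect how much more damage a home run does than a walk, and how much a strikeout is worth compared with a ball in play.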

This is less a story than a diagnosis, and pitchers who had "bad" years (such as Jeff Samardzija) might end up with good WARs, or (as with Ervin Santana) vice versa. Putting aside the external factors that obstructed or aided him, how did the protagonist perform? The only things that concern FanGraphs' WAR are those things the pitcher has the most control over, because it wants to know not how things went but how good the pitcher was. Whether every broken-bat blooper falls in for a single, or every scalded line drive finds the third baseman's glove, is irrelevant to this calculation.

Teheran's season wasn't bad because he allowed a lot of runs, but because he pitched like a person who should have allowed a lot of runs. He was bad.

'What Should Have Should Have Happened' WAR

Now it gets complicated, because at Baseball Prospectus Teheran's WARP was 3.8, identical to his 3.8 WARP in 2016. He ranked 24th in baseball, ahead of Alex Wood, James Paxton and Robbie Ray. We've found a story that says Teheran was actually good.

Baseball Prospectus bases its WARP on a stat called Deserved Run Average, which uses a method called mixed modeling that accounts for both fixed and random effects. As an untrained mathematician, I've never totally understood what mixed modeling is, which is part of the point. As DRA's inventor, Jonathan Judge, put it, "The last generation of sabermetric analysis, to its credit, managed to wring pretty much everything there was to be found inside plain algebra and basic linear regression. If we want further accuracy, that is going to require more complexity." Mixed modeling aims to unwind that complexity. (Disclosure: I edited Baseball Prospectus when Judge was creating DRA, though I had nothing to do with its development and still say things like "I've never understood what mixed modeling is.")
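To give a feel for what a "random effect" does, here is the simplest possible case, which is emphatically not DRA: a one-way random-intercept model in which each pitcher's true effect comes from a common league distribution. His estimate is pulled toward the league mean more strongly when we have little data on him. The two variance values are assumed known here; real mixed-model software estimates them from the data and fits many effects at once. The function name is hypothetical, and the plain overall mean stands in for the fixed effect.

```python
def shrunken_effects(observations, between_var, within_var):
    """Random-intercept estimates for a one-way random-effects model.

    observations: dict mapping a group (e.g. a pitcher) to a list of outcomes.
    between_var: variance of true group effects (tau^2), assumed known.
    within_var: variance of one outcome around its group's true effect
        (sigma^2), assumed known.
    Returns (grand_mean, {group: shrunken estimate}).
    """
    all_values = [v for values in observations.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)  # stand-in fixed effect

    estimates = {}
    for group, values in observations.items():
        n = len(values)
        group_mean = sum(values) / n
        # Weight near 1 with lots of data, near 0 with very little.
        weight = between_var / (between_var + within_var / n)
        estimates[group] = grand_mean + weight * (group_mean - grand_mean)
    return grand_mean, estimates
```

A pitcher with a handful of extreme outcomes gets treated as mostly average until the evidence piles up, which is one way a model can decide that some of what happened was not really "his."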

The bottom line is that this is a much more complicated story. It assumes those things we think are out of a pitcher's control -- like balls in play -- are at least partly under his control. And it assumes those things we think are under his control -- strikeouts, walks and home runs -- are partly out of his control.

Which takes us to a third level of storytelling, observing not just what happened or what should have happened but what should have should have happened.

In WARP's telling, Teheran walked more batters than he did in 2016, but he pitched like somebody who should have walked fewer than he did. He allowed far more home runs than he did in 2016, but he pitched like somebody who should have allowed fewer. Specifically, given his pitch types and pitch locations, he should have beaten batters who actually beat him.

Consider two sprinters, Sprinter A and Sprinter B. Both run 100 meters in 11 seconds. Which is better? Neither is better, obviously. They're equally fast.

But Sprinter A is in a race against Usain Bolt. Sprinter B is in a race against Bengie Molina. One of our original two runners wins his race while the other loses.

So now which of our two runners is faster? The answer is, of course, they're still equally fast. What if Bolt trips getting out of the blocks? What if Molina has the race of his life and somehow finishes in 10.99 seconds? It still changes nothing about how fast Sprinters A and B are, or were.

In baseball, though, we can't measure how good each player is. We can only measure how much better or worse they were than their opponents. Baseball isn't a race that players run alongside each other; it's played in direct and constant entanglement with one another. In baseball, using baseball stats, we would probably conclude that Sprinter B was faster than Sprinter A, except when Bolt trips getting out of the blocks, in which case Sprinter A would actually be faster than Sprinter B, who wasn't even running in that race.

We can adjust our assessments of a pitcher based on the general quality of the hitters he faces, but the quality of a hitter in the moment he swings is unknowable. Mike Trout makes mistakes, and in those moments he's terrible. Rougned Odor is capable of extraordinary moments, and in those moments he can hit any pitch 450 feet. What if one pitcher keeps getting Trout on his mistake days and Odor on his extraordinary days? We'd have no way of knowing, except to look at the results.

It's probably much harder to measure a pitcher than we typically acknowledge. We make adjustments for the luck and context we can see -- the hitter-friendly ballpark or the terrible defense behind him or even the terrible umpire -- but we can't really adjust for the luck or context we can't see, which is, simply, Trout missing a hittable pitch.
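One way to see how much invisible luck can hide in a season line is a toy simulation: give a pitcher a fixed, unchanging true hit rate per batter and replay the same season many times. Everything we can't see (the hitter's form that day, a few feet of carry) is lumped into a single random draw per batter. The 0.23 hit rate and 800 batters faced are hypothetical numbers chosen only for illustration.

```python
import random


def simulate_hits_allowed(true_hit_rate, batters_faced, seasons, seed=0):
    """Hits allowed in each simulated season by a pitcher whose talent never changes."""
    rng = random.Random(seed)  # seeded so the experiment is reproducible
    return [
        sum(rng.random() < true_hit_rate for _ in range(batters_faced))
        for _ in range(seasons)
    ]


results = simulate_hits_allowed(true_hit_rate=0.23, batters_faced=800, seasons=1000, seed=42)
print("fewest hits:", min(results), "most hits:", max(results))
```

The gap between the best and worst simulated seasons comes entirely from chance, because the pitcher is identical in every one of them.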

That's part of what DRA looks at. It concluded that, based on the types and locations of the pitches he threw, Teheran should have had better outcomes, and those outcomes should have led to fewer runs.

Here are some basic things we can say about Teheran:

1. He struck out fewer batters. He walked more. When he threw pitches out of the strike zone, batters were less likely to chase. When they did swing, they were more likely to make contact. When they made contact, it was more likely to be a hit. When it was a hit, it was more likely to be a home run. All bad.

2. He also threw almost exactly as many strikes as he did in 2016. When batters put the ball in play, they hit almost exactly the same proportion of ground balls, line drives and fly balls that they did in 2016. His effective fastball velocity was unchanged from 2016, according to Statcast. In fundamental ways, he strongly resembled the 2016 version of himself.

3. According to Statcast, batters hit the ball less hard against him in 2017. According to Statcast's xwOBA, which estimates the value of batted balls based on exit velocity and launch angle, batters who put the ball in play against Teheran should have produced less offense against him in 2017 than they did in 2016. At the Home Run Tracker, he led the league in "just enough" home runs: balls that barely cleared the fence, where a few feet might have been the difference between a routine fly out and another bad start.
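The core idea behind an expected-value stat like xwOBA can be sketched by grouping batted balls with similar exit velocity and launch angle and averaging what those balls were worth. This is a simplified binned-average stand-in, not Statcast's actual method; the 5 mph and 5 degree bin sizes, the sample data, and the function names are all hypothetical.

```python
from collections import defaultdict


def bin_key(exit_velo, launch_angle, velo_step=5, angle_step=5):
    """Bucket a batted ball by exit velocity (mph) and launch angle (degrees)."""
    return (int(exit_velo // velo_step), int(launch_angle // angle_step))


def build_expected_table(batted_balls):
    """batted_balls: iterable of (exit_velo, launch_angle, woba_value) tuples."""
    totals = defaultdict(lambda: [0.0, 0])  # bin -> [sum of values, count]
    for exit_velo, launch_angle, value in batted_balls:
        cell = totals[bin_key(exit_velo, launch_angle)]
        cell[0] += value
        cell[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def expected_value(table, exit_velo, launch_angle, fallback=0.0):
    """Average value of similar batted balls, or fallback if the bin is empty."""
    return table.get(bin_key(exit_velo, launch_angle), fallback)
```

Comparing the average expected value of a pitcher's batted balls with their actual results is one way to ask whether the outcomes ran ahead of the quality of contact, which is the gap this paragraph describes.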

Further, Teheran grooved fewer pitches in the middle of the strike zone than he did in 2016, according to Brooks Baseball. His command was, evidence suggests, much better than it was in 2016: Baseball Prospectus measures a pitcher's command based on how many strike calls he gets on the edge of the strike zone, and Teheran ranked fourth out of all pitchers who threw at least 100 innings. In 2016, he ranked 93rd out of 144 pitchers.
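A crude version of "strike calls on the edge of the zone" is simply the share of taken pitches near the zone boundary that were called strikes. Baseball Prospectus's real metric is model-based and adjusts for things this raw rate ignores, such as catcher, umpire and count. The fixed zone (a 17-inch plate, 1.5 to 3.5 feet high), the 3-inch edge band and all names below are assumptions for illustration; real vertical zones vary by batter.

```python
from dataclasses import dataclass


@dataclass
class TakenPitch:
    x: float  # horizontal location, feet from the middle of the plate (assumption)
    z: float  # height above the ground, feet (assumption)
    called_strike: bool


ZONE_HALF_WIDTH = 17 / 24  # half of a 17-inch plate, in feet
ZONE_BOTTOM, ZONE_TOP = 1.5, 3.5  # fixed vertical zone (assumption)
EDGE_BAND = 0.25  # feet on either side of the boundary (assumption)


def in_box(p, half_width, bottom, top):
    return abs(p.x) <= half_width and bottom <= p.z <= top


def is_edge(p, band=EDGE_BAND):
    """True if the pitch is within `band` of the zone boundary, inside or out."""
    outer = in_box(p, ZONE_HALF_WIDTH + band, ZONE_BOTTOM - band, ZONE_TOP + band)
    inner = in_box(p, ZONE_HALF_WIDTH - band, ZONE_BOTTOM + band, ZONE_TOP - band)
    return outer and not inner


def edge_called_strike_rate(pitches):
    """Share of taken edge pitches called strikes, or None if there are none."""
    edge = [p for p in pitches if is_edge(p)]
    if not edge:
        return None
    return sum(p.called_strike for p in edge) / len(edge)
```

A pitcher who lives on the edges and keeps getting those calls would score well here even while, as the next paragraph suggests, hitters lay off enough of those pitches to push his walk rate up.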

Here's one story we can tell about Teheran: With the baseball livelier than ever in 2017, he tried to make adjustments. He tried to pitch to weak contact, working on the edges of the zone -- which led to more walks -- and throwing early-count strikes, which led to fewer strikeouts. In a sense, it worked: His command was generally excellent and he got weaker contact. In a sense it didn't, because batters laid off those pitches on the edge, barreled a few extra pitches against him, and perhaps got luckier on balls in play against him.

Here's another: He threw a lot of good pitches, but his slider didn't move as much as it had in the past, so he couldn't get swinging third strikes with it. When he made mistakes in this unforgiving offensive era, they got punished.

In May, Teheran faced the Blue Jays. Luke Maile, who was probably the worst hitter in baseball last year (min. 100 PA), was batting. Teheran threw him a slider at the bottom of the strike zone, and Maile hit it a few feet over the wall. Then opposing pitcher Marcus Stroman batted, and on 0-2 Teheran threw him a fastball up and away -- well out of the strike zone. Stroman hit it the other way, another home run.

Did Teheran get beat on those two swings because his pitches were slightly more hittable this year? Because he made bad decisions and threw the two pitches these bad hitters could handle? Because he's uniquely vulnerable to those few extra feet the lively ball is giving hitters? Because he got unlucky to face Maile and Stroman at just these moments? Because they both closed their eyes and happened to connect?

There are those who complain there are multiple WAR models telling us different things about players. Stats are supposed to resolve uncertainty, we figure, not exacerbate it. But these are complicated questions. The worst thing a stat could do is mislead us about how simple baseball is, or about how much we know. It's not simple. We don't know all that much.

Teheran remains in Atlanta. The Braves are, according to reports, expected to be "collecting offers" for him this winter. I have no idea what they should do.