Do not fight the regression monster. That, my friends, is the rule to live by when thinking about the upcoming trade deadline.
You hate to begin a baseball column with a statistical precept like that, even when it's a simple concept such as regression. But while the concept is perhaps simple, it is also powerful and, still, often misunderstood.
Fifteen years ago, if you had tried to sneak the term "regression" into a baseball column at, say, a major newspaper, your editor would have either changed it or asked you to insert a wonky definition that would have disrupted the silky rhythm of your precious prose. Now we more or less use it freely. Not just writers and analysts, but managers, front-office honchos and even the occasional player.
In a nutshell: Regression, as it is usually referred to in baseball statistics, is the tendency of a subset of numbers to move toward the level of the overall set. Every player has a baseline level of performance. We typically look for that in his career statistics or perhaps by focusing on several recent seasons. While the player's pattern of performance might be inconsistent -- it can jump around from week to week, month to month or even season to season -- for the most part, we can expect his future performance to move toward those established percentages. This is what we refer to as regression to the mean, in a baseball context.
The part that I think is often misunderstood: Regression is too often used as a synonym for "decline." In other words, to say a player is going to regress is to say he's going to get worse. It's a quirk of language because, indeed, that is one definition of the word in the dictionary, probably because the term was misused so often over the years that the mistake became accepted usage. But it's not precisely what we mean when we talk about a baseball player regressing. We mean he's going to find his level, drawn to it as if it were a magnet.
Last week, Cubs president Theo Epstein held sway in the home dugout at Wrigley Field to discuss personnel matters with the gathered press. The conversation turned toward the upcoming deadline. Epstein downplayed the perception that the Cubs would be beating the bushes for a starting pitcher, saying that the club felt like there was more "positive regression" on the roster than negative, particularly when it comes to his starting rotation. Is that true? We'll deal with that in the next section. For now, let's look at the broader issue.
When Epstein or any other baseball executive drops terms like "positive regression," it almost sounds like subterfuge. Epstein isn't about to get specific on the Cubs' deadline plans, so he uses these esoteric terms to shield his office from inaction-related backlash, if in fact that's how Chicago's deadline strategy plays out. But it's not subterfuge at all.
What Epstein is expressing is one of the most important analytical concepts in sports, one that someone in his position has to comprehend at a near-instinctual level if he is going to understand his own team, not to mention the players he's looking at for a potential acquisition.
The principle that execs must adhere to is this: When you are acquiring a player at or near the trade deadline, you are acquiring his baseline performance, not his season-to-date performance. To make a decision on a potential deal based on the current season's numbers is borderline malpractice.
This won't be news to any general manager (or president of baseball operations, whatever the title might be) in a decision-making role in 2018. It is, I think, something that is still misunderstood by fans and also some writers who cover the game. At the very least, these groups tend to give regression less attention than it demands.
To illustrate this concept, I went back and created a spreadsheet of every veteran player traded during the month of July during the past five years, unless it was obvious that the player was not acquired with the intent of aiding a team's postseason push. This gave me 180 players, though I threw out 24 because of sample-size issues either before or after the trade.
Using Baseball-Reference.com, I listed three numbers for each player, by OPS+ (for hitters) or ERA+ (for pitchers): his performance over the three seasons before he was traded, his performance before the trade in the season he was moved, and his performance after the trade in that same season. The idea was that the post-trade numbers would align more closely with the three-year baselines than the pre-trade numbers. That's the theory I'm testing, anyway.
From there, I created a won-loss scorecard. At stake in each move is a bet against regression. An acquisition was assigned a W if the player was due for positive regression after a trade and achieved it or if he was due for a negative regression but avoided it.
It's important to remember that here I'm just measuring players against the expectation of regression. This is not a judgment of a trade as a whole. A player might have negative regression after a trade and still be more than good enough to help the team that acquired him. We'll get into that.
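For readers who like to see the scorecard rule spelled out, here is a minimal sketch of how one acquisition might be graded. It is not the spreadsheet used for this column. The function name `grade_bet` is hypothetical, and two details are assumptions: that "achieving" positive regression means the post-trade mark beat the pre-trade mark, and that a player sitting exactly on his baseline is treated as a Negative. The handling of Negatives follows the rule as applied in this column, where any drop from the pre-trade level counts as a loss.

```python
def grade_bet(baseline: float, pre_trade: float, post_trade: float) -> str:
    """Grade one acquisition as 'W' or 'L' against the expectation of regression.

    All inputs are OPS+ or ERA+ values, where higher is better:
      baseline   -- three-year mark before the trade season
      pre_trade  -- trade-season mark before the deal
      post_trade -- trade-season mark after the deal
    """
    if pre_trade < baseline:
        # A "Positive": underperforming his track record, so the team is
        # betting on improvement. Assumption: any improvement is a win.
        return "W" if post_trade > pre_trade else "L"
    # A "Negative": outperforming his track record (assumption: ties with the
    # baseline land here). The bet only wins if he holds or raises his level.
    return "W" if post_trade >= pre_trade else "L"
```

Fed a pitcher with a 189 baseline who went from 311 before the trade to 294 after it, this sketch returns an L, even though 294 is still far above his track record.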
Here's our first result: Execs' overall record versus regression over the past five years was 68-84. So when a team has bet on a player either turning around a below-expectation performance or overcoming the expectation of decline, 68 times that team bet correctly -- a 44 percent success rate. This is just a jumping-off point, but it shows how tough it is to navigate around regression.
Over the past five years, the best teams at making these reads have been the Pirates (6-1), Dodgers (9-5) and Yankees (8-4). The worst have been the Blue Jays (3-8), Orioles (2-7) and Mets (1-6). Epstein's Cubs have gone 3-5.
Now, let's subdivide our results into the two aforementioned groups. The first group is betting on positive regression. We'll call them the Positives. In this group, the execs acquired a player whose season-to-date numbers were below those of his three-year performance. The bet being placed is that the player will return to or exceed his established level of play.
The majority of the time, the execs guessed right on the Positives. Their record was 47-28, for a 62.7 percent success rate. In other words, nearly two-thirds of the time execs acquired an underperforming player with the expectation that he would get better and were proved right. That seems like a pretty good rate.
The other group will be the Negatives. These are the players whose pre-trade performance outstripped the expectation established by three-year performance. The teams dealing these players were selling high; the teams acquiring them were buying high, perhaps making a bet that the pre-trade performance was real. Maybe it was something scouts saw; maybe it was just a hunch.
As you probably guessed, most of the time the execs were wrong about the Negatives. You might be surprised just how often they were wrong: a woeful 19-58 mark, for a 24.7 percent success rate. Not good.
As mentioned, that doesn't mean that every player who qualifies as a Negative was a poor acquisition. A perfect example was Andrew Miller, acquired by Cleveland in a July 2016 trade with the Yankees. In the three years prior to that season, Miller posted a fine 189 ERA+. With New York in 2016, he was at an astronomical 311. Given that level of performance, he would have to come down, right? He did -- all the way to 294. By this methodology, that counts as a loss, but Miller damn near carried the Indians to the World Series title.
Those cases don't change the overall conclusions here. In fact, if you remove them, the point is only sharpened. I removed any loss by a Negative if his post-trade performance dropped below that of his season-to-date level, but not below his three-year numbers. There were 20 of those guys. That still leaves teams acquiring Negatives as losers in two out of every three cases; the mark is 19-38.
Want to home in on role? Well, the mark for hitters who were Negatives was 14-39; for pitchers, it was 5-19. For Positives, it was 32-17 for hitters and 15-11 for pitchers. It's the same phenomenon for either group. And hitters in the positive category were better in post-trade raw performance; their aggregate OPS+ was 4.1 points superior to that of the Negatives.
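The arithmetic behind those records is simple enough to check by hand, but a short sketch makes it easy to rerun. It uses only the won-loss marks quoted above. The names `records`, `win_pct` and `group_record` are hypothetical, and this is an illustration, not the original spreadsheet.

```python
# Won-loss marks by group and role, as quoted in the column.
records = {
    ("Positives", "hitters"): (32, 17),
    ("Positives", "pitchers"): (15, 11),
    ("Negatives", "hitters"): (14, 39),
    ("Negatives", "pitchers"): (5, 19),
}


def win_pct(wins: int, losses: int) -> float:
    """Success rate as a percentage of all decisions."""
    return 100.0 * wins / (wins + losses)


def group_record(group: str) -> tuple:
    """Add the hitter and pitcher marks for one group."""
    wins = 0
    losses = 0
    for (name, _role), (w, l) in records.items():
        if name == group:
            wins += w
            losses += l
    return wins, losses


# Negatives after setting aside the 20 losses by players who slipped from
# their season-to-date level but stayed above their three-year numbers.
negatives_wins, negatives_losses = group_record("Negatives")
adjusted_negatives = (negatives_wins, negatives_losses - 20)

for group in ("Positives", "Negatives"):
    w, l = group_record(group)
    print(f"{group}: {w}-{l}, {win_pct(w, l):.1f} percent")
```

The role splits add up to the group totals of 47-28 and 19-58, and the adjusted Negatives mark of 19-38 works out to a success rate of one in three.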
What about level of play? Well, there were 34 traded players who were above league average in their three-year metrics but below average season-to-date before being traded. Twenty-three of those players (67.6 percent) bounced back above average. The track records won out.
The flip side of that is startling. There were 24 cases of a team acquiring a player with an OPS+ or an ERA+ that was below average for the three years prior to the trade but was above average in the season the move was made. Mediocre players off to hot starts. Twenty of those players -- 83.3 percent -- dropped back below average after being dealt. Eighty-three percent!
Those 20 cases are our bad bets. The execs were fooled. Examples include Scott Feldman going to the Orioles in 2013, John Lackey to the Cardinals in 2014, Mike Leake to the Giants in 2015 and Trevor Cahill to the Royals last season. But there are always exceptions -- reasons why GMs will continue to mine for all possibilities.
In this fortunate group, we have Rich Hill going to the Dodgers in 2016. He entered that season with a three-year ERA+ of 97. Then he went out and posted a 182 for the Athletics. Once Hill hit L.A., he kept on going, putting up a 221 mark as a Dodger. Sometimes players just get healthy. Sometimes, they just get better. This is why teams employ scouts -- to find the exceptions.
The exceptions don't change the rule: Track record trumps season-to-date performance. Any team seeking to buck this truism -- and it does happen -- is relying heavily on the wisdom of its scouts. This might well be the way to go if there is a physical issue involved. Sometimes, the payoff for ignoring this pattern is great -- think J.D. Martinez going to Arizona last season. Sometimes, it is not. But the rule remains the same: Track record > season-to-date.
As a result of all of this, any ranking of trade-deadline candidates will be much more accurate if it's based on preseason forecasts rather than season-to-date results, with the obvious caveat that player health can play a disruptive role in all of this -- as can age, as can tangible adjustments a player has made to his game.
With that in mind, here are a few trade candidates whose season-to-date performances cry out for significant negative regression: Jacob deGrom, Mets; J.T. Realmuto, Marlins; Jed Lowrie, Athletics; J.A. Happ, Blue Jays; Scooter Gennett, Reds; Shin-Soo Choo, Rangers; Francisco Cervelli, Pirates; Eduardo Escobar, Twins; and Corey Dickerson, Pirates.
It's not that any of these players are unworthy of targeting. But to know what you'd be getting, you had better look toward past seasons and not fixate on this year's success. Structure your offers accordingly.
On the flip side, here are some other trade candidates who are better than what they've shown so far in 2018. A team acquiring them might uncover not only a bargain but also a key piece for the postseason races ahead: Noah Syndergaard, Mets; Chris Archer, Rays; Josh Donaldson, Blue Jays; Brian Dozier, Twins; Yoenis Cespedes, Mets; Jose Abreu, White Sox; Adrian Beltre, Rangers; Todd Frazier, Mets; Adam Duvall, Reds; Cole Hamels, Rangers; and Adam Jones, Orioles.
What the numbers say
The ups and downs of regression
Let's get back to Epstein's statement about his team's chances for positive regression. In a sense, that already started to manifest for the Cubs before we hit the All-Star break. Chicago lost the day he made the statement, but it won six of eight to finish the first half and entered the break with the best record in the National League.
However, even with that stretch in the books, the Cubs still are due for more positive regression than negative regression. Epstein was right. To measure this, I used the depth charts from my system and looked at the difference between each player's WAR pace at the All-Star break and what he was projected to have entering the season. Using the depth charts allows me to adjust these measurements for playing time.
Adding up all of these differences tells us how regression can be expected to help, or hurt, each team down the stretch. The teams with the highest levels of expected positive regression tend to be the ones at the bottom, so we'll cut them out for now. Being liberal with my definition of "contender" left me with 18 teams. Here are the five contenders with the highest expected WAR gains from regression for the rest of the season:
CONTENDERS WITH NET POSITIVE EXPECTED WAR REGRESSION
Minnesota Twins 2.84
Los Angeles Angels 2.52
Colorado Rockies 2.27
St. Louis Cardinals 1.88
Chicago Cubs 1.87
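Team totals like the ones above come from adding up player-level gaps between projection and pace. The sketch below shows only that bookkeeping. It is not the projection system behind this column: the teams, players and WAR figures are made up, the name `net_expected_regression` is hypothetical, and the sign convention (projected WAR minus WAR pace, so an underperformer adds to his team's total) is an assumption. It also skips the playing-time adjustment that the depth charts provide.

```python
from collections import defaultdict

# Made-up rows for illustration only:
# (team, player, projected full-season WAR, WAR pace at the break)
sample = [
    ("Team A", "Player 1", 4.0, 2.5),  # behind his projection
    ("Team A", "Player 2", 2.0, 3.0),  # ahead of his projection
    ("Team B", "Player 3", 1.5, 4.0),  # well ahead of his projection
]


def net_expected_regression(rows):
    """Sum (projected - pace) by team.

    A positive total means the roster as a whole has played below its
    projections; a negative total means it has played above them.
    """
    totals = defaultdict(float)
    for team, _player, projected, pace in rows:
        totals[team] += projected - pace
    return {team: round(total, 2) for team, total in totals.items()}


print(net_expected_regression(sample))
```

In this made-up sample, Team A nets out positive because Player 1's shortfall outweighs Player 2's surplus, while Team B comes out negative.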
The Twins and Angels are fringe contenders, at best, so you can pretty much ignore them. As for Chicago, almost by definition, if a team has a league's best record this far into the season, you can expect that at the aggregate level, it has likely played above expectation. Some negative regression would only be natural. The Cubs not only have the NL's best record, but they can be expected to improve without Epstein lifting a finger. That's a pretty sweet situation for the Northsiders to be in.
Of course, there is a flip side:
CONTENDERS WITH NET NEGATIVE EXPECTED WAR REGRESSION
Atlanta Braves -6.41
New York Yankees -2.60
Boston Red Sox -2.22
Seattle Mariners -2.00
Milwaukee Brewers -1.66
The Braves lap the field and more in this measurement. To a certain extent, this is likely misleading. Atlanta has so many young players in key roles, and those young players, whose preseason forecasts carry less certainty, might simply be better than they were projected to be. Still, that's a lot of negative regression to work around. Atlanta general manager Alex Anthopoulos has been careful in his public comments lately to emphasize that he is not likely to blow up his talented farm system just to max out this season's playoff run. The situation presented here likely plays a role in that.
The Yankees and Red Sox have outperformed their collective expectations, but they are competing against each other, so regression shouldn't have a big impact on the American League East race. The same holds true for the Mariners and Athletics (not listed here) in their race for the AL's second wild card.
Since you asked
Weighing in from afar
Most of this column was written in the north woods of Michigan, where I escaped for a little midseason break. It's definitely Tigers territory up there, but one day, I was sitting on a dock on Mackinac Island, looking out into Lake Huron. I mapped out the distance from there to Comerica Park in Detroit and Miller Park in Milwaukee, where I am headed after the break. To my surprise, I found the distances were precisely the same: 255 miles. During my stay, I saw nary a piece of Brewers merchandise. Milwaukee has ceded the Upper Peninsula to Detroit.
This did not come up in the comments I read coming out of Washington, D.C., where the industry I toil in had headed for the All-Star Game. In this spot, where I usually run a Q&A, I'm going to shift gears a bit and respond to some of the rhetoric that came out of the sessions commissioner Rob Manfred and players association executive director Tony Clark had with the gathered writers during the break.
On the DH:
Clark said that the idea of adding a universal designated hitter was gaining momentum among the players. Manfred said that the far more likely outcome was the status quo, which is the approach that I favor. Meanwhile, a couple of days earlier, Astros star Justin Verlander said he would like to see the DH eliminated altogether. He prefers the NL game and thinks that AL teams are put at a disadvantage when their unpracticed pitchers are forced to hit. His point is salient, and if a change is made, I hope it would be to kill the DH altogether.
However, I return to my central point on this topic: It is up to the fans. My sense is that NL fans vehemently do not want to see the DH added to their circuit. AL fans, to a less adamant degree, want to keep it in theirs. That is why the status quo should persist. The players' feeling on this particular topic is, to be frank, far less relevant.
On banning the shift:
The two best offenses in the American League, according to Baseball-Reference.com's OPS+, which is park-adjusted, have been the Astros and Red Sox. In the National League, the race for the top is a tight one between the Braves, Dodgers and Cubs.
The Astros, Red Sox, Braves and Cubs account for four of the six highest team batting averages in the majors. This, I think, is a good reason to wait and see if current trends are defeated by an emergent preference for well-rounded hitters, those whose balls in play can't be so easily defended. If the best offenses are succeeding this way, others will follow by valuing that trait in the marketplace.
This shift thing is still pretty new. Let's not make it worse by introducing the kind of radical rule change that can carry with it all manner of unintended consequence. Let's see where we stand in a few years.
On free agency:
We'll see how this winter plays out, but it's entirely possible that last winter's stagnant free-agent market was a product of new realities of how teams value players. More than ever, teams will avoid paying for what a player has done and focus on what he's likely to do. If so, Clark would be right to pursue structural changes that allow players to hit free agency at a younger age, when they have more peak performance in front of them.
Beyond that, I'll assume that Clark's word choices were a product of the kind of posturing that comes from being in his position. Here, I'm referring to Clark saying, "What players saw last offseason was that their free-agency rights were under attack, that's what they see."
I don't blame Clark for saying this; it's his job. But let's be clear: Players have a right to a marketplace unimpeded by collusive behavior. They do not have a right to a bad contract.
Coming right up
Dodgers lineup + Machado = Scary good
Manny Machado will make his Dodgers debut Friday night in Milwaukee against the Brewers, one of the reported finalists for his services. Will Machado be booed? Let's hope not; he didn't spurn the Brewers. The Brewers simply didn't ante up enough to acquire him.
On the regression front, the Dodgers can pretty much expect the version of Machado we saw during the first half of the season. He was on a WAR pace (6.1) that was a little ahead of his forecast (5.1), but he had a down season in 2017 that dragged down his projections. He's an impact star, one who at the very least puts L.A. back on even footing with Epstein's Cubs in the race for National League supremacy.
Beyond bottom-line value, I'd argue that Machado does more than upgrade an already very good and very deep Dodgers position group; he's the type of impact hitter who can have an exponential effect on the L.A. fortunes.
One thing that has stood out about the inconsistent but prolific Dodgers attack this season has been just how similar so many of their hitters are in terms of approach and style. This isn't a bad thing. Power and patience are the pillars on which the L.A. offense is constructed.
Still, injecting a high-level and highly aggressive masher like Machado could really jolt an L.A. offense that sometimes slips into passivity in scoring situations. The Dodgers have walked 10.6 percent of the time in high-leverage spots, the third-highest level in the majors. That's fine, but the approach has yielded the majors' second-lowest high-leverage OPS (.649) and lowest isolated power mark (.108).
Machado hit .377/.514/.585 for the woeful Orioles in high-leverage spots. Sure, there were a lot of walks there, but when facing Baltimore's lineup, why would you pitch to Machado with the game on the line? In L.A., Machado will have a lot of chances to do damage with men on base, and it won't be so easy to just pitch around him.