Ask any sharp DFS player how the season has been so far, and they'll likely tell you April was awful and May was fantastic. Ask them why, and they may tell you it was just natural variance.
Or, if you're lucky, they may tell you one of the best-kept secrets in fantasy baseball: the edge we have over our opponents drastically changes based on how much of the season has been played.
Whether we're talking DFS or season-long, it's my theory that May and June are the absolute best months for data-driven edge.
Where Does the Edge Come From?
Before we get too far down this rabbit hole, let's take a quick step back. I run one of the premier publicly available projection systems: THE BAT. It was found to be the most accurate season-long system last year. It works for DFS. It works for sports betting. Users know that a good projection system will continually fold in the newest data and adjust its underlying player talent estimates. But one question I get a lot is "Does THE BAT get more accurate as the year goes along and there's more data to work with?"
In an absolute sense, the answer is "not really." Yes, in April the most recent data we have is from September, while in-season our most recent data is from, well, yesterday. That matters a little, but it's something everyone is on an even playing field with. No edge lost, no edge gained. In a relative sense, however, a system like THE BAT does probably get quite a bit more accurate once a decent chunk of the season has been played.
You see, in April, everyone is more-or-less using the same information. Nobody is looking at Manny Machado's stat line after the first week of April and saying, "Welp, Machado is crap now, I better trade him for Tim Anderson." Early in the year, everyone is more or less using previous years' data exclusively. Everyone realizes that random things happen in small samples and that overreacting can mean fantasy death. But what actually constitutes a small sample is often something people let their gut decide instead of their brain. Instead of trusting the math that, over decades of baseball, proves to us that underperformers through the first few weeks will get better and overperformers worse, many fantasy players still try to place more meaning into what has happened thus far than it deserves. Their brain says, "Yup, this looks big enough." But it's not.
People don't like to hear it, but at the beginning of June, we're still looking at small sample sizes! And we sure as heck shouldn't be ignoring all the data we have from previous seasons. I've gotten countless, "Is Josh Bell elite now?" or "Should I drop Joey Votto for Nicky Lopez?" type questions over the past couple weeks. Josh Bell has gotten very good, and Joey Votto has gotten worse. But the answer is NO WAY.
We see it every year. Stud A gets off to a slow start, and by the end of the season we've forgotten all about it. Flash In the Pan B starts off hot, but come next March we're reluctant to draft him in even the 15th round. By placing too much emphasis on the current season numbers, we miss out on Matt Carpenter's outstanding 2018 season because he started slow. We bite on 9 homers from Matt Davidson through April and end up with just 11 more along with a .222 average the rest of the way.
But Statcast Tells Me He's Elite Now!!
We now live in a post-Statcast world where we have all kinds of fancy new data at our disposal. We have launch angles and exit velocity and sprint speed and spin rate. And it's all very exciting. For most, it's actually too exciting. There's a subset of fantasy players (and even analysts) that have tried to change the conversation, to create this new dynamic of "stats" vs. "skills". This could not be any more misguided, though. This implies a blatantly false level of certainty in these new Statcast-type metrics as true measures of a player's talent. The problem here is two-fold.
1) These stats aren't actually measuring some highly stable underlying talent. They still have plenty of noise in them, much more noise than people want to believe. Like results-based stats, they are still merely proxies for true underlying talent.
This table shows how quickly various hitter stats stabilize, represented as percentage of a 650 plate appearance season, so they are all on the same scale:
Yes, our shiny new Statcast toys stabilize quickly, but they still don't stabilize as quickly as plain old K%. Launch Angle stabilizes just as quickly as GB% and not much slower than FB%. Exit velocity stabilizes quickly, but not all that much quicker than basic HR%. These stats still have noise in them, the same as any of the ones that have been around forever.
This may make it even easier to visualize. Back in the off-season, I posted a mini-Twitter thread about why these stats don't represent some locked-in new talent level. This tweet sums it up best.
From 2007-2018, 289 players increased their Hard% from one year to the next by 5%+. The next year, Hard% dropped. Regression impacts EVERYTHING, even "skill" stats.
Year 1: 27%
Year 2: 35%
Year 3: 32%
They keep more of their gains than they lose, but they don't keep it all!
- Derek Carty (@DerekCarty) March 21, 2019
Hitters keep a bit more than half of their year-to-year gains the following season.
Eno Sarris found something very similar when it came to launch angle. Hitters keep about half of their in-season gains the following year.
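The "bit more than half" figure can be checked directly from the three Hard% numbers in the tweet above. A minimal Python sketch of that arithmetic (the function name is just an illustrative choice):

```python
def retained_share(year1, year2, year3):
    """Fraction of the Year 1 -> Year 2 gain that is still there in Year 3."""
    gain = year2 - year1   # how much the group improved
    kept = year3 - year1   # how much of that improvement survived
    return kept / gain


# Hard% figures from the tweet: 27% -> 35% -> 32%
share = retained_share(27, 35, 32)
print(f"Gain: {35 - 27} points, kept: {32 - 27} points, share kept: {share:.1%}")
```

Five of the eight points survive, or 62.5% of the gain. The other three points regress away.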
Yes, launch angle stabilizes quickly. But launch angle is also basically just batted ball data by a different name. Plus, it can be misleading because it also captures line drives, which we've long known to be noisy. We've had batted ball data for nearly 20 years, and in that time, have we ever once looked in the mirror in May and said to ourselves, "Howie Kendrick has really improved his flyball rate this year, he's a great power hitter now!" Of course not, that would be ridiculous. However, people do that all the time now with launch angle. But this new data is telling us virtually the same thing the old data has!
Yes, players are more aware of the importance of home runs and of altering their swing paths to maximize performance, so maybe it matters a little more now. And yes, improvements in 'stable' stats will always matter and should factor into our talent estimates. But regression is always still likely, as my tweet (and Eno's article) shows. Not to mention, that tweet looks at guys who have improved over the course of a full season. We're only in June right now! If you're using these types of metrics to validate a breakout and are buying in at anything close to year-to-date value, you're just doing it wrong. A good projection will account for these things. It will become more optimistic about these players. But it will do it without going overboard, giving you an advantage on your competitors who can't rein in their enthusiasm.
2) Even if a player has a major improvement or decline in his Statcast numbers, his previous numbers still matter. And after just two months of baseball, they still matter a lot. Some people are smarter and, instead of just guessing when numbers become significant, they actually look at these data-driven stabilization rates. This is important, but people often misinterpret what these numbers actually mean. They take it a step too far and assume that after 20% of the season for exit velocity, nothing before it matters. That's just wrong. A stat's stabilization point isn't some magic number where anything before it is useless and once we hit the number it's suddenly infallible. Stats become even more stable the more data you use.
I'm not going to run you through the full math, but accounting for multiple years of data is almost always more accurate than using one year. The purpose of these stabilization numbers is to tell us the likelihood that what a player has done is random in a given sample; they were never meant as a license to focus on only the past X PA and ignore all historical data. That message, however, has been garbled and misinterpreted in the public lexicon. Hey, more free edge.
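One common way to put this idea into numbers is to shrink an observed rate toward a prior expectation, with the weight set by how large the sample is relative to the stat's stabilization point. The sketch below is a generic illustration of that technique, not THE BAT's actual method. It assumes the stabilization point is the sample size at which this year's data earns exactly half the weight, and the hitter, the rates, and the stabilization point are all made-up numbers.

```python
def reliability(n, stabilization_n):
    """Weight earned by the observed sample: n / (n + stabilization_n).

    At n == stabilization_n this is exactly 0.5, and it keeps rising
    (without ever reaching 1.0) as more data comes in.
    """
    return n / (n + stabilization_n)


def regressed_rate(observed, n, prior, stabilization_n):
    """Blend an observed rate with a prior expectation (e.g. from past seasons)."""
    w = reliability(n, stabilization_n)
    return w * observed + (1 - w) * prior


# Hypothetical hitter: 40% hard-hit rate this year, 32% expected from prior seasons.
STABILIZATION_N = 150  # assumption for illustration, not a published figure
for n in (75, 150, 300, 600):
    w = reliability(n, STABILIZATION_N)
    est = regressed_rate(0.40, n, 0.32, STABILIZATION_N)
    print(f"n={n:>3}  weight on this year={w:.2f}  estimate={est:.3f}")
```

Under these assumptions, hitting the stabilization point only moves the estimate halfway from the old expectation to the new number. The history never stops counting; it just counts for less as the current sample grows.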
Taking Advantage of Our Opponents' Mistakes
So how do we capitalize on this? In DFS, we just make the right, math-backed plays while our opponents chase their tails. They'll make weaker plays and score fewer points as a result. In season-long, we identify which players may be over or undervalued based on their current season numbers and which of our leaguemates are most susceptible to these traps. Then we trade accordingly.
Max Fried, SP, Atlanta Braves
Anytime a player with pedigree starts doing well, people rush to buy in. But there's lots of reason for pessimism with Fried despite a 3.19 ERA, which is pretty well "supported" by his peripherals (including a 3.51 xFIP). Fried has always had control issues up until this year, usually walking 10%+ of batters he faces. Through the middle of May this season, it was under 5%. Sure, sometimes things just "click" for a young player, they make a mechanical adjustment, and the control gets better. But more often than not, over 7 or 8 starts, it's just natural variance and/or favorable context. In his past three starts, regression has already started to hit, as his BB% has been nearly 10% in those starts. Given that he's always given up a decent amount of home runs considering he's a groundball pitcher, and that he strikes out batters at a below-average rate (allowing extra contact in general), he could find himself in trouble once he starts putting more men on base. Moreover, he pitches for Atlanta, which is the hottest environment in baseball outside of Arlington, and we're heading into the summer months. THE BAT projects a 4.50 ERA the rest of the way. Sell high while you can.
Josh Bell, 1B, Pittsburgh Pirates
While there are real reasons for concern with Fried, Bell is a guy who is legitimately great; it's just a matter of how great your league treats him. DFS sites have his salary sky-high these days, so they certainly treat him as elite. THE BAT was actually very high on Bell coming into the year, viewing him as one of the top first base values in drafts. And, of course, he's gotten even better. He's pulling his flyballs way more (shortening the effective distance to the fences), he's hitting more flyballs in general, and he's hitting them harder, all while maintaining pretty good plate discipline. THE BAT views Bell as a borderline top-30 real-life hitter, which is highly impressive for a guy who was merely above-average coming into the season. But when people are asking me "is Josh Bell elite" or "should I trade Established Ace X" for him, that's where you have to take a step back and realize that we're just 1/3 of the way through the season. Is he Freddie Freeman or Anthony Rizzo yet? Probably not.
Justin Bour, 1B, Los Angeles Angels
THE BAT thought the Angels offense was sneaky-great coming into this season, and it's been right to a large degree: they rank 8th in wRC+, despite being without Shohei Ohtani for most and Justin Upton for all of the season to this point. It's whiffed hard on Justin Bour so far, though, who was so bad he got demoted. No surprise, he's mashed Triple-A since then with a .434 wOBA and 4 HR in 10 games.
THE BAT is still quite high on him given that we're dealing with barely a 100-PA sample with Los Angeles. During that sample, his strikeout rate was up and his hard contact was down a little, but his flyball rate was also a career best, he continued to draw walks, and his .190 BABIP was insanely unlucky. And he still has the extremely favorable park shift working for him, after playing years in Marlins Park (the 4th-worst park for LHB HR). Angel Stadium, meanwhile, has become great for power since lowering the fences prior to 2018. It now ranks 7th-best for LHB HR, between Coors Field and the Rogers Centre. Bour was super cheap in DFS before his demotion and is probably freely available in most season-long leagues. When/if he gets recalled, I'll be all over him once again.
Joey Votto, 1B, Cincinnati Reds
Even when the supporting data doesn't back up a rebound the way it does for Bour to a large extent, that doesn't mean a rebound is unlikely. Because there is noise even in the most "stable" data, we know that it can't be treated as gospel. For a player who has been elite for years, a couple months of bad numbers doesn't mean he himself is now bad. Sure, even his peripherals are bad this year. He's worse than we thought he was, and a good projection will reflect that, but the math still says he should bounce back to a reasonable level. He still has value. In a standard 12-team mixed league, he still projects as a $6 player, or the 12th best 1B. That's well below what he was drafted as, but if he's been dropped, pick him up, even if it's just to put him on your bench and wait it out.
Matt Boyd, SP, Detroit Tigers
While most breakouts are, by definition, overperforming, we can buy heavier into the ones that we saw coming. THE BAT projected Boyd for career-best numbers across the board, and context had a lot to do with it. The Tigers historically have had terrible pitch framers and have above-average ones this year. The AL Central has generally been a tough, strikeout-resistant division, but this year it's weak and strikeout-prone. And then Boyd has gotten better on top of that. He's all but scrapped his sinker. He's throwing his slider more, and it's sharper with more dive. The strikeouts are way up. THE BAT projects Boyd as a top-20 fantasy starter at this point, just behind guys like Madison Bumgarner and Aaron Nola and ahead of Zack Wheeler and Patrick Corbin.
Derek Dietrich, IF, Cincinnati Reds
Dietrich has caught fire this month, and THE BAT is biting hard. Like Bour, he's gone from Marlins Park into an elite HR park in Great American. But he's also reduced his strikeouts, increased his walks, seen a massive rise in his flyball rate/launch angle, is hitting the ball hard, is pulling his flies way more often, is at the top of the barrels leaderboards... basically anything you can think of, he's doing better, and probably by a lot. He has an elite .434 wOBA despite an absurdly low .197 BABIP. That's practically unheard of. He projects nearly as well as Bell, but it seems like people haven't caught on yet. For DFS purposes, the Reds have pinch-hit for him 40% of the time this year, which is extremely high, but that number has come down lately. If he keeps hitting like this, it wouldn't be surprising for them to stop doing it altogether, which will make him very appealing, especially as the weather heats up in Cincinnati.
The key takeaway here is that this new data matters, especially when everything has gotten better. But it often doesn't matter as much as people want it to, and it certainly can't be taken at face value.
If you're curious about other players, you can find THE BAT's season-long projections for every single player for free over at FanGraphs. And you can see THE BAT's matchup-specific projections each day over at RotoGrinders, and a DFS Cheat Sheet driven by it every day here at ESPN. You'll find that most breakouts are expected to regress and most laggards are expected to get better. That's what has happened throughout the history of baseball, and that's what the math suggests will happen again.
Go Out There and Crush It!
Now that we're into May, sample sizes on players start to look "big enough" to the human brain, and people start relying too heavily on this year's data. That data still has a lot of noise in it, much more than people want to believe, and they make worse plays because of it. Don't be one of these people. Take advantage of them!