Statistical confidence and certainty are tricky things. Even at its most confident and certain, statistical analysis requires assumptions. Sometimes they're small, sometimes they're big. But they're always there, and they're always important. Properly expressing these latent assumptions is simultaneously the most important and most ignored step in communicating statistical analysis. Given an interesting data set, just about any analyst can produce quirky analysis that changes the game -- the talent comes in making sure readers understand your inherent assumptions. When you don’t, things can quickly go awry.
Enter the hot-hand theory, a tiny assumption and a crack team of Harvard researchers.
Beckley Mason has already covered their findings admirably, and the main takeaway is some heady stuff: The hot hand might really exist. The original research on the hot-hand theory was thorough and statistically sound, but the findings were predicated on a tiny, footnote-esque assumption in the introduction of the original paper. It’s an assumption that’s a bit more troublesome than it seems at first glance.
"Each player has an ensemble of shots that vary in difficulty [depending, for example, on the distance from the basket and on defensive pressure], and each shot is randomly selected from this ensemble."
-- Gilovich, Vallone, and Tversky, 1985
As it turns out, this assumption of random selection isn’t quite true. If you think hard on all the times you’ve watched a player "get hot," you might realize the inherent flaw here. Think back -- how many times have you watched Stephen Curry make two open 3s before trying a double-teamed 26-footer? How many times have you watched LeBron James make two midrange jumpers before an unnecessary isolation at the top of the key with two men draped on him?
These aren’t the exceptions -- they’re the rule. The researchers sifted through SportVU tracking data from the 2013 season, creating a model for shot difficulty they could use to measure how players react to "hot streaks." The answer is elegant (though perhaps unsurprising): When players heat up, as a rule, they take much worse shots.
Without shot-difficulty information, the data lends itself to a clear conclusion: "Players shoot as well or worse after hot streaks than they did before." But that conclusion holds only if you ignore shot difficulty -- when you add measures of difficulty to the equation, the answer changes.
Think of it this way: Players shoot a negligible percentage from the field on full-court shots, no matter how open they are. But what if every single NBA player took a full-court shot after he got hot? Furthermore, what if these "hot" players made 20 percent on heated-up, well-guarded, full-court shots? That would be a massive improvement over the average for full-court shots, even though 20 percent is a terrible shooting percentage overall. But it would still be less than their overall shooting percentage, which means that traditional hot-hand studies would count it as proof that the hot hand doesn’t exist. "They don’t shoot any better after they heat up. Ergo, the hot hand doesn’t exist." But it would, in that situation! It would just be obfuscated if you didn’t have the shot-difficulty data to check against.
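The arithmetic behind that thought experiment can be sketched in a few lines of Python. Only the 20 percent figure comes from the scenario above; the overall field-goal percentage and the baseline full-court make rate are made-up placeholders, and the function names are purely illustrative.

```python
# Hypothetical numbers for the thought experiment. Only the 20 percent
# figure comes from the text; the other two are assumptions.
OVERALL_FG_PCT = 0.45        # assumed overall field-goal percentage
FULL_COURT_BASELINE = 0.02   # assumed "negligible" make rate on full-court shots
HOT_FULL_COURT_PCT = 0.20    # make rate on full-court shots while "hot"


def naive_effect(hot_pct: float, overall_pct: float) -> float:
    """Hot shooting minus overall shooting, ignoring shot difficulty."""
    return hot_pct - overall_pct


def adjusted_effect(hot_pct: float, expected_pct: float) -> float:
    """Hot shooting minus the expected rate for shots of the same difficulty."""
    return hot_pct - expected_pct


naive = naive_effect(HOT_FULL_COURT_PCT, OVERALL_FG_PCT)
adjusted = adjusted_effect(HOT_FULL_COURT_PCT, FULL_COURT_BASELINE)

# The naive comparison is negative ("no hot hand"), while the
# difficulty-adjusted comparison is positive.
print(f"Naive effect:    {naive:+.2f}")
print(f"Adjusted effect: {adjusted:+.2f}")
```

Under these assumed inputs, the same shooting performance looks like a slump in one comparison and a surge in the other. The only thing that changes is the baseline it is measured against.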
The Harvard study isn’t quite that stark -- players take significantly more difficult shots when they get hot, and they shoot a tiny bit better than they’d be expected to on those more difficult shots. It doesn’t necessarily prove that the hot hand exists, but it provides much-needed evidence to support the theory. It’s a great first step to properly understand the hot hand, and it exposes the inherent assumption in the previous hot-hand research, ensuring that future research doesn’t make the same mistake.
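A toy version of that comparison, actual makes versus expected makes given shot difficulty, might look like the sketch below. This is not the researchers' model. The shots are invented, the expected-make probabilities stand in for whatever a real difficulty model would output, and the gap is deliberately exaggerated so it shows up in eight shots.

```python
from statistics import mean

# Each toy shot: (taken_while_hot, expected_make_probability, made).
# All values are invented for illustration; expected_make_probability is a
# stand-in for the output of a hypothetical shot-difficulty model.
shots = [
    (False, 0.50, True),
    (False, 0.50, False),
    (False, 0.40, False),
    (False, 0.60, True),
    (True, 0.30, True),
    (True, 0.30, False),
    (True, 0.20, False),
    (True, 0.40, True),
]


def summarize(shots, hot):
    """Return (actual make rate, mean expected make rate) for hot or non-hot shots."""
    subset = [s for s in shots if s[0] == hot]
    actual = mean(1.0 if made else 0.0 for _, _, made in subset)
    expected = mean(p for _, p, _ in subset)
    return actual, expected


cold_actual, cold_expected = summarize(shots, hot=False)
hot_actual, hot_expected = summarize(shots, hot=True)

# Raw percentages match, but the hot shots were harder, so the hot shooter
# beats expectation while the non-hot shooter merely meets it.
print(f"Not hot: actual {cold_actual:.2f}, expected {cold_expected:.2f}")
print(f"Hot:     actual {hot_actual:.2f}, expected {hot_expected:.2f}")
```

In this toy data the raw make rates are identical, which is the "players shoot as well or worse" reading. The difference only appears once each shot is judged against its own expected value.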
And let’s not overstate it -- it wasn’t a big mistake, at least not on the part of the researchers who wrote the original paper. They presented their findings and properly stated their assumptions. The big mistake lies in the mass proselytization of their hot-hand disproof. Scores of intelligent people shared those findings as gospel without properly understanding their limits. It’s a mistake that’s far too common among the statistically minded.
There's a certain humility necessary to admit to your readers the flaws and limitations in your own analysis. Without understanding those flaws and limitations, one's statistical certainty within assumed parameters misses the context and misses the big picture. The researchers who construct their own theory tend to have a long view of their work’s limitations and applicability. It’s when their papers and theories become public domain that the context gets lost in translation.
The hot hand is a really fantastic example of this. Here we have a place at which incomplete data led stat-minded folks to an oversold certainty. The original paper stated the assumptions, but most everyone who held up the disproof accepted those assumptions without really questioning them.
There was a great moment when the presenter noted that statisticians and data analysts of all stripes have a bad habit of outright ignoring the consensus view when they dig up interesting statistical evidence that cuts against the grain. Often, that's a place where statistical thinking can unearth really cool findings. It's where statistics shines. But it's not always that way -- sometimes, the statistical counterpoint is just a reflection of a place where the necessary context is rooted in data we can't gather.
Shot-difficulty data simply didn’t exist as a serious discipline before the availability of motion-tracking data that could measure defensive pressure, shot angle and other important metrics for estimating how difficult a shot could be. Now it does, and the assumption the original researchers properly expressed turned out to be bunk. C’est la vie.
Once again, none of this is to say that the original hot-hand papers and work were poorly done. They weren't. The hot-hand theory has a lot of strong work behind it. A light disproof in the NBA doesn't immediately invalidate any of the research within its stated context. The original analysis itself is still apt -- assuming that players don’t make their shots more difficult as they heat up, the existing data still strongly indicates that the hot hand doesn’t exist.
But we don’t need to rely on incomplete data, at least not for this theory. Not anymore. This is where the NBA’s new data shines. It lets us reevaluate the theories we once thought were gospel. It lets us discover that our once statistically confident assertions might be a bit more complicated than we’ve let on. It lets us unearth brand-new theories that upend conventional wisdom.
But as we sift through this new data, this stirring revision to popular thought should give us all a moment of pause. There are always latent variables in our data sets to which we don’t have access. There is always a risk that our assumptions aren't quite as rock-solid as we thought they were, and there’s always a chance that later data is going to step to the plate and invalidate our well-worn wisdom.
Before you proselytize, understand your limits, both in theory and in data.