The Googlization of basketball information

I just happened across a two-year-old Wired article by Chris Anderson. It's not about sports, but you might want to read it anyway, because I think it tells us a lot about the future of, well, everything, including basketball analysis:

Google conquered the advertising world with nothing more than applied mathematics. It didn't pretend to know anything about the culture and conventions of advertising — it just assumed that better data, with better analytical tools, would win the day. And Google was right.

Google's founding philosophy is that we don't know why this page is better than that one: If the statistics of incoming links say it is, that's good enough. No semantic or causal analysis is required. That's why Google can translate languages without actually "knowing" them (given equal corpus data, Google can translate Klingon into Farsi as easily as it can translate French into German). And why it can match ads to content without any knowledge or assumptions about the ads or the content.

Speaking at the O'Reilly Emerging Technology Conference this past March, Peter Norvig, Google's research director, offered an update to George Box's maxim: "All models are wrong, and increasingly you can succeed without them."

This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.

We're all worried about that amazing team player who really helps his team but doesn't do things that are easily recognized and counted.

If I had a nickel for every time I heard that statistics can't measure heart!

But here's what I think we have to understand: Old statistics screwed those guys. When most people say they hate statistics, they mean they hate old statistics, which had almost no regard at all for things central to basketball like setting killer screens, playing good post defense, scooping up loose balls, closing out shooters, crisp defensive rotations, hitting the open man even if he does not shoot, inspiring teammates or playing through injuries.

As a case in point we could look at the Celtics in recent years. Old statistics had no way of knowing they'd be the most cohesive and tenacious defensive unit in years, which is why just about nobody predicted, as Kevin Garnett's trade to Boston was announced three years ago, that the Celtics were about to win 72 percent of their regular season games over three seasons, and an amazing eight of their next 10 playoff series.

A lot of that team's magic was in things we have not normally measured. But there were two groups of people who were not as surprised: A subset of real-deal basketball experts who understood the merits of what the Celtics were doing, and proponents of adjusted plus/minus.

And here's where hoops stats are getting to be a bit like Google in the article above. All those things I listed that matter to winning but aren't easily measured? If they matter to winning, they may be mired in a lot of noise, but they're in plus/minus somewhere. In other words, you kind of can measure heart, if it leads to winning.
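For the curious, here is roughly what "it's in plus/minus somewhere" means in practice. This is a minimal sketch of the idea behind adjusted plus/minus, not any analyst's actual model. Everything in it is invented for illustration: the six player labels, their hidden impact numbers, the toy three-on-three lineups and the tiny ridge term. It is also noise-free, which real games are not; real versions need far more data and heavier regularization. The regression only ever sees who was on the floor and what the scoring margin was, and it still works out that the hypothetical "Screener" matters most.

```python
from itertools import combinations

# Hypothetical players with invented "true" impacts (points per 100
# possessions relative to average). The regression never sees these.
TRUE_IMPACT = {"Scorer": 1.0, "Screener": 4.0, "Rebounder": 0.0,
               "Shooter": -1.0, "Defender": 2.0, "Bench": -6.0}
PLAYERS = list(TRUE_IMPACT)


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def adjusted_plus_minus(stints, players, ridge=1e-6):
    """Fit one rating per player from (lineup_a, lineup_b, margin) stints.

    Model assumption: margin = sum of lineup_a ratings - sum of lineup_b
    ratings. The small ridge term pins the ratings to average zero, since
    lineups alone only determine differences between players.
    """
    index = {p: i for i, p in enumerate(players)}
    n = len(players)
    xtx = [[0.0] * n for _ in range(n)]
    xty = [0.0] * n
    for lineup_a, lineup_b, margin in stints:
        row = [0.0] * n
        for p in lineup_a:
            row[index[p]] = 1.0
        for p in lineup_b:
            row[index[p]] = -1.0
        for i in range(n):
            xty[i] += row[i] * margin
            for j in range(n):
                xtx[i][j] += row[i] * row[j]
    for i in range(n):
        xtx[i][i] += ridge
    return dict(zip(players, solve(xtx, xty)))


# Build toy stints: every way to split the six players three against three.
stints = []
for lineup_a in combinations(PLAYERS, 3):
    lineup_b = tuple(p for p in PLAYERS if p not in lineup_a)
    margin = (sum(TRUE_IMPACT[p] for p in lineup_a)
              - sum(TRUE_IMPACT[p] for p in lineup_b))
    stints.append((lineup_a, lineup_b, margin))

ratings = adjusted_plus_minus(stints, PLAYERS)
for name, value in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {value:+.2f}")
```

Notice that nothing in the code knows what a screen is. It never asks why the lineups with the "Screener" do better. It just notices that they do.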

There's almost nothing on the Internet that Google can't index, measure, value and test in some way. It's not perfect, and it may not come with the know-how, soul and wisdom of earlier forms of advertising. Those things matter, and need to come from somewhere. But it's not either/or. You need both. It's hard to point to another system that knows more about what people are up to. If someone does something that matters on the web, Google's computers are among those most likely to value it accurately.

A similar thing can be said for players in NBA games. If they're doing things, besides scoring, that matter, it used to be that their only hope of being properly valued was for some well-placed expert -- a Jerry West here, a Pat Riley there -- to notice. But West and Riley have finite time and can watch only so many games. Stats, meanwhile, can watch every game every night, and all that data is yielding increasingly useful findings. Some of the very best of them say, basically: We may not know why, but this guy is making his team far better. If the Wests and Rileys of the world are short of time, wouldn't it make sense for them to focus their efforts on players like that?

There's a ton of work left to be done. There is not really any single system that knows anything close to all there is to know about basketball. I'm not here to tell you that cutting-edge basketball statistics are ready to replace those basketball experts, and they probably never will be.

However, no executive in 2010 who wants the best available analysis of which players help their teams the most can afford to ignore the vast and growing new trove of data. Math is changing basketball just as it changed the Internet and advertising, and for similar reasons: Huge amounts of data, cleverly analyzed, can tell you things that small amounts of data (for instance, what one human can gather in his or her own mind) never can.