Rethinking hoops with advanced analytics

Last summer, in the hoops-bereft doldrums of August, Basketball Prospectus recruiting guru Drew Cannon released the findings of a study he conducted on the ways the recruiting "market" -- the coaches and analysts who value players and create the rankings lists you see every summer -- undervalues certain players.

These players are usually "tweeners," effective high school talents who don't fit one of five predetermined college basketball positions. So their value falls. They land at schools like VCU and Butler and Lehigh and Weber State. A few years later, within the great equalizing walls of the NCAA tournament, they come back to haunt the coaches who spurned them.

For some coaches, especially those at the elite levels of the sport, this idea doesn't matter. But for coaches attempting to rebuild high-major programs -- for those with even a fleeting interest in the classic "Moneyball" idea of exploiting market inefficiency -- an arbitrage opportunity exists. Theoretically, anyway.

Today, we discuss another piece of this puzzle.

Muthu Alagappan, a Stanford senior, basketball fan and intern at Ayasdi, a data visualization company, released a topological data study -- in layman's terms, a study that uses graphs to chart and highlight patterns among varying types of information -- that rethinks and expands upon the idea of classic basketball positions. Rather than using the five normal positions (point guard, shooting guard, small forward, power forward, center) to categorize players, Alagappan grouped players by the similarities in their statistical traits. He came up with 13 overall positions, each organized according to performance in seven statistical categories: points, rebounds, assists, steals, blocked shots, turnovers and fouls.
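For the curious, here is a rough sketch of the underlying idea: describe each player by those seven numbers, then ask whose stat lines look alike. To be clear, this is not Alagappan's topological method. It swaps in a much simpler technique (standardized stats plus straight-line distance), and the players and numbers below are made up purely for illustration.

```python
from math import sqrt
from statistics import mean, pstdev

# The seven categories named in the study.
STATS = ["points", "rebounds", "assists", "steals",
         "blocks", "turnovers", "fouls"]

# Hypothetical per-game stat lines (invented for illustration only).
players = {
    "Guard A": [18.0, 3.0, 7.5, 1.8, 0.2, 3.1, 2.0],
    "Guard B": [16.5, 3.4, 8.0, 1.6, 0.3, 2.9, 2.2],
    "Big C":   [12.0, 11.0, 1.2, 0.6, 2.4, 1.8, 3.5],
    "Big D":   [10.5, 10.2, 1.0, 0.5, 2.1, 1.5, 3.8],
    "Wing E":  [14.0, 6.0, 3.0, 1.2, 0.8, 2.0, 2.6],
}

def standardize(table):
    """Convert each stat to a z-score so points don't drown out steals."""
    columns = list(zip(*table.values()))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 if a column has no spread, to avoid dividing by zero.
    spreads = [pstdev(col) or 1.0 for col in columns]
    return {
        name: [(v - m) / s for v, m, s in zip(line, means, spreads)]
        for name, line in table.items()
    }

def distance(a, b):
    """Euclidean distance between two standardized stat lines."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(name, scaled):
    """Return the other player whose stat line is closest to `name`."""
    others = (n for n in scaled if n != name)
    return min(others, key=lambda n: distance(scaled[name], scaled[n]))

scaled = standardize(players)
for name in players:
    print(name, "is most like", most_similar(name, scaled))
```

Notice that nothing in the code knows what a "point guard" or a "center" is. Players end up near one another only because their numbers look alike, which is the spirit of the study even if the real method is far more sophisticated.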

You can see the fascinating PDF of his work here. The 13 resulting positions are not complex, once you get past the fancy analytical charts. There are categories like "scoring rebounder," "offensive ballhandler," "defensive ballhandler" and "paint protector," as well as categories for elite players like "1st team all-NBA" and "one of a kind," where Alagappan lists examples like Derrick Rose and Dwight Howard, players whose statistics were so off the charts in various respects they couldn't be grouped with others. (If you look at the data itself, you'll see them as little dots in the upper right-hand corner of the array. Hey guys!)

So, what is the point of all this? Alagappan draws his examples from the NBA, but it's clear what he's after. As his graph shows, the 2010-11 Dallas Mavericks had great balance from top to bottom, a roster filled with various categories of players whose traits wonderfully complemented one another. The Minnesota Timberwolves, on the other hand, had far too many players with similar traits. They finished last in the Western Conference in 2010-11. The beauty of this model is that it matches up with what we saw on the floor, while giving us the tools to articulate exactly what it is we saw. It's a little bit sublime.
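One hedged way to put a number on "balance" is to count how evenly a roster is spread across the 13 categories. The sketch below uses normalized Shannon entropy for that. This measure is an assumption of this example, not something taken from Alagappan's study, and the two rosters are hypothetical, not the actual Mavericks or Timberwolves.

```python
from collections import Counter
from math import log

NUM_POSITIONS = 13  # the number of categories in the study

def roster_balance(roles):
    """Score from 0.0 (everyone plays the same role) to 1.0 (maximally spread).

    Uses Shannon entropy of the role counts, divided by the largest
    entropy a roster of this size could reach across 13 categories.
    """
    total = len(roles)
    max_distinct = min(total, NUM_POSITIONS)
    if max_distinct < 2:
        return 0.0  # an empty or one-man roster has nothing to balance
    counts = Counter(roles)
    entropy = -sum((c / total) * log(c / total) for c in counts.values())
    return entropy / log(max_distinct)

# Hypothetical five-man groups, labeled with category names from the study.
varied = ["scoring rebounder", "offensive ballhandler",
          "defensive ballhandler", "paint protector", "one of a kind"]
redundant = ["offensive ballhandler"] * 4 + ["paint protector"]

print("varied roster:   ", round(roster_balance(varied), 2))
print("redundant roster:", round(roster_balance(redundant), 2))
```

A higher score only says the roles are varied. It says nothing about whether the players filling them are any good, so treat it as a conversation starter rather than a verdict.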

Folks far smarter than me agree: For his trouble, Alagappan won the Evolution of Sport award at the 2012 MIT Sloan Sports Analytics Conference.

So how does this apply to college basketball? Alagappan finishes his slideshow with arrays for various college teams, like Duke and Kentucky and Butler and Stanford, so the model isn't reserved for the pros. Indeed, the "finding value" portion of the proceedings is where college hoops comes in, and where it meets Cannon's discussion from last August. Suppose a college coach hunting for recruiting value can break down recruitment beyond the normal superficial classifications -- I need a point guard; is he a true point guard? -- and begin to construct teams based on a more refined, analytical understanding of positional value. That coach can cut through the recruiting muck and field better teams than any recruiting ranking would suggest is possible.

It's not nearly as easy as that, of course. Why? It's hard to translate high school statistics to a college level. Alagappan can cite Devin Ebanks as an undervalued "scoring rebounder," but he has NBA stats and salaries to plug into his model. It would be far more difficult for a college coach to identify such value between levels of the sport. Recruiting is by its very nature unpredictable. You don't always know what you're going to get.

Still, even a base understanding of this idea -- that you're not recruiting positions but traits, and here's what you really need -- could be a breakthrough for mid-major and rebuilding high-major coaches willing to take advantage. Many are already using Ken Pomeroy's tempo-free statistics and Synergy's scouting data. This may be the next step.