How 'Outside the Lines' analyzed the U.S. Open tennis tournament draw

ESPN analyzed the men's and women's draws in each of the four Grand Slam tennis tournaments: the Australian Open, French Open, Wimbledon and U.S. Open.

The analysis, which was done by analytics specialist Alok Pattani of the ESPN Stats & Information Group, focused on the top two seeds in each tournament. It began with the compilation of the men's draws for all Grand Slams since the 2001 Wimbledon -- 41 total tournaments (11 Wimbledons and 10 each for the Australian, French and U.S. Opens).

For every player in each men's Grand Slam draw since 2001, the study used the ATP ranking from immediately before that tournament's draw. Those rankings, along with the placements for the 32 seeded players, were used to re-rank the players 1-128 -- the total number of players in a Grand Slam tournament. So if a player was ranked 575th in the ATP rankings and that was the second-worst ranking among all players in the draw, he was re-ranked 127th.
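The re-ranking step can be sketched in a few lines of Python. This is only an illustration, not ESPN's actual code: it assumes seeded players take ranks 1-32 in seed order and unseeded players fill the remaining ranks in order of their tour ranking, and the function name `rerank_draw` and the tuple layout are hypothetical.

```python
def rerank_draw(players):
    """Re-rank a draw from 1 to N.

    players: list of (name, seed, tour_rank) tuples, where seed is an int
    for seeded players and None for unseeded players.
    Assumption (not confirmed by the article): seeds come first in seed
    order, then unseeded players ordered by tour ranking.
    """
    seeded = sorted((p for p in players if p[1] is not None), key=lambda p: p[1])
    unseeded = sorted((p for p in players if p[1] is None), key=lambda p: p[2])
    ordered = seeded + unseeded
    # Position in the combined list (starting at 1) is the tournament rank.
    return {name: position for position, (name, _, _) in enumerate(ordered, start=1)}


# Toy draw with made-up players: two seeds and four unseeded entrants.
toy_draw = [
    ("Seed A", 1, 1),
    ("Seed B", 2, 2),
    ("Player C", None, 40),
    ("Player D", None, 575),
    ("Player E", None, 900),
    ("Player F", None, 35),
]
toy_ranks = rerank_draw(toy_draw)
```

In the toy draw, "Player D" holds the second-worst tour ranking (575th) and so ends up second to last, mirroring the 575th-to-127th example in a full 128-player field.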

This re-ranking was used to examine the strength of the opponents facing the top two seeds. Because seeded players do not meet each other in the first round, the best possible first-round opponent is the player re-ranked 33rd. An opponent ranking close to that minimum of 33 meant the top two seeds drew a relatively difficult opponent, while a ranking close to the maximum of 128 meant a relatively easy first-round draw.

The same data-gathering procedure was used for the women's draws, using the corresponding predraw WTA rankings.

Table A (below) shows the first-round opponents of the top two men's seeds at the U.S. Open over the past 10 years and those opponents' ranks among all players in the tournament.

Table B (below) shows the first-round opponents of the top two women's seeds at the U.S. Open over the past 10 years and those opponents' ranks among all players in the tournament.

Turning opponent rankings into "draw difficulty" scores

To further measure opponent difficulty, ESPN assigned scores to a player's draw by comparing the opponent's rank to all possible opponent ranks that the player could have faced in that round. So if a top-two seed faced the 33rd-ranked player in the first round, that seed would get a difficulty score of 0.995 for that round; facing the 128th-ranked player in the first round would produce a score of 0.005. An average opponent (ranked around 80th or 81st) would correspond to a difficulty score near 0.500, which should be the average difficulty score over several years of draws.
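The article does not spell out the scoring formula, but one midpoint-percentile formula reproduces the quoted values of 0.995, 0.005 and roughly 0.500. The sketch below is an inferred reconstruction, not ESPN's stated method, and the function name `draw_difficulty` is hypothetical.

```python
def draw_difficulty(opponent_rank, best_rank=33, worst_rank=128):
    """Midpoint-percentile difficulty of a first-round opponent.

    Assumed formula (inferred from the article's examples): the share of
    possible opponents who are weaker than this one, plus half a step so
    the scores are centered on 0.5.
    """
    if not best_rank <= opponent_rank <= worst_rank:
        raise ValueError("opponent_rank must be between best_rank and worst_rank")
    possible_opponents = worst_rank - best_rank + 1  # 96 in the first round
    weaker_opponents = worst_rank - opponent_rank
    return (weaker_opponents + 0.5) / possible_opponents


toughest = round(draw_difficulty(33), 3)   # 0.995
easiest = round(draw_difficulty(128), 3)   # 0.005
```

Under this assumed formula, opponents ranked 80th and 81st score just above and just below 0.500, and the average over all 96 possible opponents is exactly 0.500.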

Once ESPN had the first-round draw difficulty scores for each of the top two seeds, the scores were averaged for each Grand Slam across the 10-year span (11 years for Wimbledon) to see if the top two seeds got relatively easy or difficult draws at each individual Grand Slam. The findings in the first round of the U.S. Open stood out.

Simulating thousands of random draws

Fluctuations among tournaments and draws can always be attributed in part to random chance -- by design, the draws have an element of randomness built into them. However, tests can be designed to check whether the results of these draws are consistent with a truly random process. For example, the first-round average scores for the top two seeds at the U.S. Open, 0.326 for the men and 0.313 for the women, seem quite low compared to the expected average of 0.500, but how often would scores that low over a 10-year span actually occur by random chance?

Because ESPN knew the actual format of these draws, one way to answer these types of questions was to simulate the draws themselves many times and see how often results as extreme as those found in the data occurred.

So ESPN created a "fake" draw sheet, with players ranked 1-128 getting placed into slots according to the way a Grand Slam bracket works. Repeating this procedure 10 times (11 for Wimbledon) generated a set of draws comparable to what was found in the draws from actual tournaments. Then ESPN looked at the first-round opponents for the top two seeds from the simulated draws and calculated their draw difficulty scores across the same time span as with the actual draws.

This exercise was repeated 1,000 times, providing a simulated distribution of 1,000 average draw difficulty scores for the top two seeds over 10 years (11 for Wimbledon). Because the simulated distribution came from draws randomized the same way that tennis' governing officials told ESPN they randomly constructed draws for the Grand Slams, it was used to benchmark what was found with the actual draws and to determine how likely that was to occur by random chance. Those are ESPN's findings, reported here.
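The simulation described above can be approximated with a short Monte Carlo sketch. It is a simplification, not ESPN's program: instead of filling a full 128-slot bracket, it assumes the top two seeds' first-round opponents are two distinct players drawn uniformly from ranks 33-128, and it reuses an assumed midpoint-percentile scoring formula. All function names are hypothetical.

```python
import random


def difficulty(rank, best=33, worst=128):
    # Assumed midpoint-percentile score; gives 0.995 for rank 33, 0.005 for rank 128.
    return (worst - rank + 0.5) / (worst - best + 1)


def simulate_average(n_tournaments, rng):
    """Average first-round difficulty for the top two seeds over n_tournaments."""
    scores = []
    for _ in range(n_tournaments):
        # Simplifying assumption: the two opponents are distinct unseeded
        # players chosen uniformly at random from ranks 33-128.
        opponents = rng.sample(range(33, 129), 2)
        scores.extend(difficulty(r) for r in opponents)
    return sum(scores) / len(scores)


def simulate_distribution(n_tournaments=10, n_reps=1000, seed=0):
    """Repeat the multi-year simulation n_reps times with a seeded generator."""
    rng = random.Random(seed)
    return [simulate_average(n_tournaments, rng) for _ in range(n_reps)]


def share_at_or_below(distribution, observed):
    """Fraction of simulated averages that are at least as low as the observed one."""
    return sum(x <= observed for x in distribution) / len(distribution)


simulated = simulate_distribution()
```

Calling `share_at_or_below(simulated, 0.326)` would then estimate how often a random process produces a 10-year average as low as the one reported for the U.S. Open; a very small share suggests chance alone is an unlikely explanation.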

To ensure that the methodology was sound, ESPN asked Dr. Andrew Swift, past chairman of the American Statistical Association Section on Statistics in Sport and an assistant mathematics professor at the University of Nebraska at Omaha, to evaluate the data, calculations and work. He said the analysis and its methodology were sound, and he used the same methodology to run calculations -- with up to 1 million repetitions -- and got nearly identical results.

"Any way you want to look at these, there is significant evidence here that these did not come from a random draw," he said.

A note about sample size

An initial look at the data and methodology led some USTA officials to question the study's sample size.

But sample size is properly accounted for in this analysis because the 10-year average from the actual U.S. Open draws was compared to 1,000 simulated 10-year averages, each created using the same draw procedure, making this an "apples-to-apples" comparison. From this simulated distribution of 10-year averages, ESPN was able to conclude that the gap between the actual averages and what would be expected over a 10-year sample is larger than random chance could reasonably explain. The 10-year period was selected because the seeding of 32 players in the Grand Slams started at the 2001 Wimbledon tournament, and there have been exactly 10 U.S. Opens since.

Said Dr. Swift: "Their argument that 10 years of data is not a big enough sample size is invalid."