As a huge fan of competitive ultimate, I love watching the college season inch closer to the spring series, where teams fight for and earn places at the national championships, this year over Memorial Day Weekend in Milwaukee, WI. Almost as much, I look forward to the debates and discussion over the bid allocation – how many teams each of the ten geographic regions gets to send to nationals. Rather than just have the top two teams from each region qualify, USA Ultimate – ultimate’s national governing body – decided to make the process a bit more complicated, but hopefully more fair given the talent disparity across regions.
The system works like this: there are 20 bids to nationals for 10 regions [1], and each region automatically gets one bid. The top team in each region is then removed from the rankings, and the top 10 remaining teams each earn a bid for their region. Thus, those second 10 teams occupy hugely important spots in the rankings – an extra bid for your region means an easier road to nationals, and can be the difference between playing in Milwaukee and staying at home.
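To make the allocation rule concrete, here is a minimal Python sketch of it as described above. The function name `allocate_bids` and the toy teams and regions are made up for illustration, and the sketch ignores any tie-breaking or eligibility rules USA Ultimate may apply.

```python
from collections import Counter

def allocate_bids(ranked_teams, total_bids=20):
    """Allocate bids per region.

    ranked_teams: list of (team, region) tuples, best-ranked first.
    Assumes every region has at least one ranked team.
    """
    regions = {region for _, region in ranked_teams}
    # Every region starts with one automatic bid.
    bids = Counter({region: 1 for region in regions})

    # Remove the top-ranked team of each region from the list.
    seen_regions = set()
    remaining = []
    for team, region in ranked_teams:
        if region not in seen_regions:
            seen_regions.add(region)
        else:
            remaining.append((team, region))

    # The best remaining teams each earn one extra bid for their region.
    extra_bids = total_bids - len(regions)
    for _, region in remaining[:extra_bids]:
        bids[region] += 1
    return dict(bids)

# Toy example: 3 regions sharing 5 bids (hypothetical teams).
toy_rankings = [
    ("A1", "A"), ("A2", "A"), ("B1", "B"),
    ("A3", "A"), ("C1", "C"), ("B2", "B"),
]
toy_bids = allocate_bids(toy_rankings, total_bids=5)
print(toy_bids)
```

In the toy example, region A ends up with three bids because its second and third teams are the two best teams left once each region's top team is removed.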
Obviously the rankings themselves are important – but how does USAU get them in the first place? Unlike college football, there are no polls or playoff committee. The rankings are purely numerical in nature, and hopefully avoid human biases and our inability to process complex networked relationships. I won’t explain the details of the algorithm here [2], but the important takeaway is that the only inputs into the algorithm are games and scores. The algorithm does not explicitly account for total win-loss record, net goal differential, or any other statistic. Because the algorithm is focused only on games, the single most important criterion to ensure its accuracy is connectivity between the teams. As of right now, there are 346 men’s teams with 4,136 games played and 228 women’s teams with 2,840 games played.
With that in mind, I thought it would be interesting to see how the connectivity between teams changes over the course of the season, and whether we could draw some conclusions about whether or not the regular season is structured effectively to ensure connectivity between teams.
The gif below shows the connectivity for women’s teams as time goes on. Each dot represents a team, and each line represents a game. The teams are colored according to their current ranking, not their ranking as of each date.
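For anyone who wants to try something similar, here is a rough sketch of how the network could be rebuilt date by date with NetworkX. The game log, team names, dates, and the helpers `snapshot` and `connectivity_summary` are all hypothetical, and this is not necessarily how the animation itself was produced.

```python
from datetime import date
import networkx as nx

# Hypothetical game log: (date played, team_a, team_b).
games = [
    (date(2018, 2, 3), "Alpha", "Bravo"),
    (date(2018, 2, 3), "Charlie", "Delta"),
    (date(2018, 2, 17), "Bravo", "Echo"),
    (date(2018, 3, 10), "Echo", "Charlie"),
]

def snapshot(games, cutoff):
    """Graph of all games played on or before the cutoff date."""
    G = nx.Graph()
    G.add_edges_from((a, b) for played, a, b in games if played <= cutoff)
    return G

def connectivity_summary(G):
    """Return (number of connected groups, size of the largest group)."""
    components = list(nx.connected_components(G))
    largest = max((len(c) for c in components), default=0)
    return len(components), largest

# One snapshot per date on which games were played.
for cutoff in sorted({played for played, _, _ in games}):
    n_groups, largest = connectivity_summary(snapshot(games, cutoff))
    print(cutoff, "groups:", n_groups, "largest group:", largest)
```

Teams only appear in a snapshot once they have played a game, so the group count here covers active teams only. Each snapshot could be drawn as one frame of an animation.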
The first thing I noticed is that, for several weeks, there are two distinct groups of highly ranked teams connected by one purple team. This is the east coast and midwest vs. the west coast, and the team connecting them is Texas. The distinctness of the grouping diminishes substantially between March 17 and 24, but it’s still there if you look closely:
The other noticeable thing on the women’s side is how (relatively) loosely connected Clemson, Notre Dame, and Virginia – all top 10 teams – are to the action. Is it enough to make me doubt the accuracy of their ranking? I’m not sure.
One other question I wanted to answer was, “Which teams are the most important?” The measure for this is called centrality [3]. The graph below shows all of the teams colored by centrality – the brighter the team, the more important they are to making connections through the network. Here is a PDF version which should have better resolution.
So if you beat Maryland, the most central team in the women’s division, it’s likely to mean that much more for your overall ranking than the average win because of their unique connections to the rest of the field.
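As a small illustration of the centrality idea, here is a sketch using NetworkX's `betweenness_centrality` on a toy network. The teams are invented: two small clusters that only meet through a single team called "Hub", loosely mirroring a two-coast split joined by one team.

```python
import networkx as nx

# Toy network: an "east" pair and a "west" pair that only connect via Hub.
G = nx.Graph()
G.add_edges_from([
    ("East1", "East2"), ("East1", "Hub"), ("East2", "Hub"),
    ("West1", "West2"), ("West1", "Hub"), ("West2", "Hub"),
])

# Fraction of shortest paths between other teams that pass through each team
# (normalized to the range 0..1 by default).
centrality = nx.betweenness_centrality(G)
most_central = max(centrality, key=centrality.get)

for team, value in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {value:.3f}")
print("Most central:", most_central)
```

Every shortest path between an east team and a west team runs through Hub, so it gets all of the betweenness while the other teams get none. The same values can be passed as `node_color` to `nx.draw` to color a plot by centrality.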
Here’s the men’s season, with the same scheme as the women’s division:
The first thing I noticed here was the outliers at the end – Amherst and maybe Michigan and Maryland. Unlike the women’s side, these teams aren’t in the top 20 so they aren’t directly affecting nationals bid allocations.
Finally, here are the men’s teams colored by centrality (PDF here):
Alright, time for me to get back to work. If anyone wants the raw data, look for the .csv files here.
1. This is true for DI. DIII has 16 bids for 10 regions, but otherwise the ranking and allocation processes are the same.
2. Here’s USAU’s explanation. It’s essentially a pairwise comparison scheme that solves by iteration until convergence, with a few tweaks like weighting individual games by recency and choosing preference values based on the score of the game.
3. I used betweenness centrality as it seems most appropriate for this situation. Someone please correct me if I’m wrong!
4. Figures were generated using the NetworkX and matplotlib Python packages.