## Absorbing Markov Chains and Ultimate, Part II: It’s Markov chains all the way down

In Part I of this series, I went through the background of Markov chains and how each game can be modeled as one. I’ll be using the terminology from that in this post, so I’d suggest you read that post first. By the end of this post, we’ll have a way to self-consistently generate win probability charts using real game data by first calculating team strength parameters, then using them in the framework from last week to construct point-by-point win probability charts.

We left off showing how Markov chains can be used to construct win probability charts for individual games. The sticking point in our analysis is that we need good estimates for $P_o$ and $P_d$, the probability that one team scores when it starts the point on offense or defense, respectively. There are a few steps in getting there, but the key is that we’ll be choosing team strength parameters so that they maximize the likelihood of observing the outcomes of a set of real games, which we can calculate directly using the Markov chain formalism.

While entire games can be modeled as absorbing Markov chains, each individual point of a game is also an absorbing Markov chain, where the transient states represent possession and the absorbing states are goals.
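To make the point-level chain concrete, here is a minimal sketch with two transient states (Team A has the disc, Team B has the disc) and two absorbing states (A scores, B scores). The per-possession conversion rates `c_a` and `c_b` are hypothetical inputs chosen for illustration, not values from real games. The absorption probabilities come from the standard absorbing-chain result $B = (I - Q)^{-1} R$.

```python
def point_absorption_probs(c_a, c_b):
    """Absorption probabilities B = (I - Q)^-1 R for a single point.

    c_a, c_b: assumed probability that Team A / Team B scores on a given
    possession before turning the disc over (hypothetical parameters).
    Requires at least one of c_a, c_b to be positive.
    Rows of the result: [A has disc, B has disc].
    Columns of the result: [A scores, B scores].
    """
    # Q: transient -> transient (a turnover hands the disc to the other team)
    q = [[0.0, 1.0 - c_a],
         [1.0 - c_b, 0.0]]
    # R: transient -> absorbing (the team with the disc scores)
    r = [[c_a, 0.0],
         [0.0, c_b]]
    # M = I - Q, inverted by hand since it is only 2x2
    m = [[1.0 - q[0][0], -q[0][1]],
         [-q[1][0], 1.0 - q[1][1]]]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    n = [[m[1][1] / det, -m[0][1] / det],
         [-m[1][0] / det, m[0][0] / det]]
    # B = N R
    return [[sum(n[i][k] * r[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


b = point_absorption_probs(c_a=0.5, c_b=0.4)
p_o = b[0][0]  # A scores when A starts the point with the disc (offense)
p_d = b[1][0]  # A scores when B starts the point with the disc (A on defense)
print(p_o, p_d)
```

Under these assumptions, `p_o` and `p_d` play the role of $P_o$ and $P_d$ for Team A.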

## Absorbing Markov Chains and Ultimate, Part I

Inspired largely by Thomas Murray’s article in Ultiworld (and the supplementary information which he kindly sent me), I’ve been thinking about how we might be able to attack the problem of predicting which team might win a particular game of ultimate frisbee. It’s pretty difficult to do in a data-poor sport such as ultimate; often the only information we have about a game is the final score.

Thomas’s formalism is quite elegant; he employs a Bayesian inference approach to estimate the team strength $\alpha$. $\alpha$ can then be used to estimate the probability a particular team wins a game as follows: let’s say Team A with strength $\alpha_A$ is playing Team B with strength $\alpha_B$. The probability Team A wins a particular point is:

$p_A = \frac{ e^{\alpha_A }}{e^{\alpha_A} +e^{\alpha_B} }$

The probability that Team B wins a particular point is the complement, i.e.

$p_B = \frac{ e^{\alpha_B }}{e^{\alpha_A} +e^{\alpha_B} }$

This can then be translated into the probability Team A wins a game by treating each point as an independent event and combinatorially calculating how often we should expect Team A to win based on $p_A$.
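As a rough sketch of that combinatorial calculation, the snippet below computes $p_A$ from two strengths and then sums over every final score in which Team A gets to the target first. The game format (first to 15, no win-by-two or time caps) and the example strengths are assumptions for illustration only.

```python
import math


def point_win_prob(alpha_a, alpha_b):
    """Probability Team A wins a single point, given the two strengths."""
    return math.exp(alpha_a) / (math.exp(alpha_a) + math.exp(alpha_b))


def game_win_prob(p_a, target=15):
    """Probability Team A reaches `target` points first, points independent.

    Sums over k = 0..target-1 points scored by Team B. Team A must win the
    final point, so the first target-1+k points can be ordered in
    C(target-1+k, k) ways.
    """
    return sum(
        math.comb(target - 1 + k, k) * p_a ** target * (1.0 - p_a) ** k
        for k in range(target)
    )


# Hypothetical strengths, chosen only to exercise the functions
p_a = point_win_prob(alpha_a=0.4, alpha_b=0.0)
print(p_a, game_win_prob(p_a))
```

A small per-point edge compounds over a game, so the game win probability is further from 50% than $p_A$ is.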

The one key assumption here is that $p_A$ is constant for every point, i.e. that if Team A has a 60% chance of scoring on the first point of the game, they also have a 60% chance of scoring every other point in the game. While this assumption is useful for paring the complexity of the problem, it’s certainly violated in the context of ultimate simply by the way the game is run. The probability of Team A scoring on a particular point must depend on whether Team A started the point on offense (by receiving the pull) or if Team A started the point on defense (by pulling). The wrench this throws in our plans is that each point is no longer an independent event: the probability that Team A scores on this point depends on the result of the previous point. We can’t just use combinatorics to predict who’s going to win, which makes the math a bit more complicated. (This is why making the assumption that the points are independent events simplifies things a ton.)

Luckily, there’s already a well-documented way to attack this problem: we can describe a game of ultimate as a Markov Chain. Described abstractly, a Markov chain is a series of events, and at each point in time we consider the system to be in an abstract state. We define an event as the system “hopping” from the current state to a new state. What differentiates Markov processes from other stochastic processes is that the transition probability from the current state to the next state is only dependent on the current state.
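One way to picture the game-level chain: the state is the current score plus which team is starting the point on offense, and the absorbing states are the final scores. The sketch below walks that chain recursively. It assumes a game to 15 with no win-by-two, ignores the halftime possession switch, and takes `p_o` and `p_d` (Team A's scoring probability when starting on offense or defense) as given; the numbers are placeholders, not fitted values.

```python
from functools import lru_cache


def game_win_prob_markov(p_o, p_d, target=15, a_receives_first=True):
    """Probability Team A wins when scoring depends on who receives the pull.

    p_o: probability A scores a point it starts on offense (assumed input).
    p_d: probability A scores a point it starts on defense (assumed input).
    """
    @lru_cache(maxsize=None)
    def win(a, b, a_on_offense):
        if a == target:
            return 1.0  # absorbing state: A has won
        if b == target:
            return 0.0  # absorbing state: B has won
        p = p_o if a_on_offense else p_d
        # The team that scores pulls next, so it starts the next point on defense.
        return p * win(a + 1, b, False) + (1.0 - p) * win(a, b + 1, True)

    return win(0, 0, a_receives_first)


# Two evenly matched teams that each hold serve 70% of the time
print(game_win_prob_markov(0.7, 0.3, a_receives_first=True))
print(game_win_prob_markov(0.7, 0.3, a_receives_first=False))
```

With `p_o == p_d` this collapses back to the independent-points calculation, which is a handy sanity check.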

## Probabilistically inferring viscoelastic relaxation spectra using PyMC3

One of the core parts of rheology is the selection and evaluation of models used to describe and predict the complex mechanical response of materials to imposed stresses and strains. With conventional materials, such as purely elastic solids or Newtonian fluids, the models are straightforward: a single modulus $E$ or viscosity $\eta$ can perfectly describe the mechanical behavior. These parameters are single-valued, e.g. $E = 100\ \mathrm{MPa}$.

However, because we are generally dealing with non-Newtonian fluids or otherwise complex materials, we can no longer use single-valued materials parameters to adequately describe their behavior. Instead, we need to use material functions, such that the modulus is now a function of e.g. time, shear rate, stress, or any combination of factors in any possible functional form. The number of potential models is nearly endless.

In linear viscoelasticity, it can be shown that all models can be reduced to an equivalent expression of the Relaxation Spectrum, usually noted $H(\tau)$. While more conventional models specify single relaxation time constants that might be attributed to physical processes, the relaxation spectrum describes a continuous distribution of moduli across the time axis. Conceptually, thinking about viscoelastic mechanical properties in terms of the relaxation spectrum feels more pure than using a simple model with a few characteristic timescales. It forces us to consider that our materials can have many, many relaxation processes which act over orders of magnitude in time, which in my mind is the most fundamental mental model for viscoelastic relaxation. Using the relaxation spectrum description is also less constraining than using a more specific model, as we aren’t making as many assumptions about our material’s properties.
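For a feel of what the spectrum does, the relaxation modulus follows from it through the standard relation $G(t) = \int_{-\infty}^{\infty} H(\tau)\, e^{-t/\tau}\, d\ln\tau$. The sketch below evaluates that integral with the trapezoidal rule on a log-spaced grid. The log-normal shape of $H(\tau)$ and the grid limits are made up for illustration and are not taken from any measured material.

```python
import math


def relaxation_modulus(t, taus, h_values):
    """Trapezoidal approximation of G(t) = integral of H(tau) exp(-t/tau) d ln(tau).

    taus: increasing relaxation times; h_values: H evaluated at each tau.
    """
    total = 0.0
    for i in range(len(taus) - 1):
        d_ln_tau = math.log(taus[i + 1]) - math.log(taus[i])
        f0 = h_values[i] * math.exp(-t / taus[i])
        f1 = h_values[i + 1] * math.exp(-t / taus[i + 1])
        total += 0.5 * (f0 + f1) * d_ln_tau
    return total


# Hypothetical spectrum: Gaussian in ln(tau), centered at tau = 1
taus = [10.0 ** (k / 10.0) for k in range(-30, 31)]
spectrum = [math.exp(-0.5 * math.log(tau) ** 2) for tau in taus]

for t in (0.0, 0.1, 1.0, 10.0):
    print(t, relaxation_modulus(t, taus, spectrum))
```

Each $\tau$ contributes its own exponential decay, so a broad spectrum produces relaxation spread over many decades of time rather than a single characteristic timescale.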

## Fraud and Trust in Science

Over the last few weeks, two stories about scientific misconduct caught my attention:

1. John Bohannon, a science writer, published that he “Fooled Millions Into Thinking Chocolate Helps Weight Loss” by publishing a technically-true-but-intentionally-statistically-faulty weight loss study.
2. Michael LaCour, a political science grad student at UCLA, was accused of faking the data for his seminal paper on how a 30-minute conversation with a gay activist had a strong influence on people’s opinions on gay marriage ballot initiatives. He still stands by the results, although with his co-author retracting the article[1] and more accusations being leveled, it’s not looking particularly good for him.

To me, both of these stories exemplify scientists taking advantage of the trust necessary to conduct and communicate research. When I read published papers, there’s no way for me to tell a priori whether or not the authors just made the whole thing up. Bohannon and his colleagues abused the trust that journalists have for scientists by presenting the results of their intentionally-flawed study as significant. LaCour abused the trust his advisors gave him and (apparently) just made up his data.

My advisor isn’t going to check the raw data from each experiment that I do, look at the protocol for each statistical analysis I run, or watch me conduct experiments to make sure I’m following the protocols. He has to trust that I’m not just making everything up. The whole system is built on that kind of trust – so much that it’s biased against publishing contrarian findings. Anecdotally, I’ve had conversations with colleagues who, intending to expand on the original findings, couldn’t reproduce the results of a published study. Rather than publish their negative results and seek to correct the larger scientific record, they abandoned the project entirely.

So I’m not at all surprised that David Broockman seemingly had to get drunk before he would talk about his examination of Michael LaCour’s data:

On December 17, 2014, Broockman found himself a bit tipsy with someone he trusted: Neil Malhotra, a professor at Stanford’s business school. […] A few drinks in, Broockman shared his concerns about LaCour’s data. Malhotra recalled his response: “As someone in your early career stage, you don’t want to do this,” he told Broockman. “You don’t want to go public with this. Even if you have uncontroversial proof, you still shouldn’t do it. Because there’s just not much reward to someone doing this.”[2]

Even when Broockman had incontrovertible evidence that the study was faked, he had essentially no incentive to publish. Science should be a conversation, and just because a paper was peer-reviewed doesn’t mean it’s definitive, but we need to be able to trust that the authors aren’t intentionally misleading us.

1. Coincidentally (ironically? Who knows) this is John Bohannon covering the LaCour story.
2. “There’s just not much reward to doing this” should be a rallying cry for academics.

## Visualizing connectivity between teams in college ultimate frisbee

As a huge fan of competitive ultimate, I love watching the college season inch closer to the spring series, where teams fight for and earn places at the national championships, this year over Memorial Day Weekend in Milwaukee, WI. Almost as much, I look forward to the debates and discussion over the bid allocation – how many teams each of the ten geographic regions gets to send to nationals. Rather than just have the top two teams from each region qualify, USA Ultimate – ultimate’s national governing body – decided to make the process a bit more complicated, but hopefully more fair given the talent disparity across regions.

The system works like this: there are 20 bids to nationals for 10 regions[1], and each region automatically gets one bid. The top team in each region is then removed from the rankings, and the top 10 remaining teams earn a bid for their region. Thus, those second 10 teams occupy hugely important spots in the rankings – an extra bid for your region means an easier road to nationals, and the difference between playing in Milwaukee or staying at home.
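The allocation rule is simple enough to sketch in a few lines. This is only an illustration of the rule as described above, not USAU's actual implementation. The function name, the input format (a best-first list of `(team, region)` pairs), and the toy teams are all made up, and it assumes every region has at least one ranked team.

```python
def allocate_bids(ranked_teams, total_bids=20):
    """Allocate nationals bids to regions from a best-first ranking.

    ranked_teams: list of (team, region) tuples, best team first.
    Returns a dict mapping region -> number of bids.
    """
    bids = {}
    remaining = []
    for team, region in ranked_teams:
        if region not in bids:
            # Top-ranked team in its region: the region's automatic bid.
            # That team is then removed from the rankings.
            bids[region] = 1
        else:
            remaining.append((team, region))
    # The leftover bids go to the regions of the top remaining teams.
    extra = total_bids - len(bids)
    for team, region in remaining[:extra]:
        bids[region] += 1
    return bids


# Toy example: 3 regions and 5 bids instead of 10 regions and 20 bids
ranking = [
    ("North 1", "North"), ("North 2", "North"), ("South 1", "South"),
    ("North 3", "North"), ("West 1", "West"), ("South 2", "South"),
]
print(allocate_bids(ranking, total_bids=5))
```

In the toy ranking, "South 2" misses out only because two North teams sit ahead of it among the remaining teams, which is exactly why those spots matter so much.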

Obviously the rankings themselves are important – but how does USAU get them in the first place? Unlike college football, there are no polls or playoff committee. The rankings are purely numerical in nature, and hopefully avoid human biases and our inability to process complex networked relationships. I won’t explain the details of the algorithm here[2], but the important takeaway is that the only inputs into the algorithm are games and scores. The algorithm does not explicitly account for total win-loss record, net goal differential, or any other statistic. Because the algorithm is focused only on games, the single most important criterion to ensure its accuracy is connectivity between the teams. As of right now, there are 346 men’s teams with 4,136 games played and 228 women’s teams with 2,840 games played.

With that in mind, I thought it would be interesting to see how the connectivity between teams changes over the course of the season, and if we could draw some conclusions about whether or not the regular season is structured effectively to ensure connectivity between teams.

### Women

The gif below shows the connectivity for women’s teams as time goes on. Each dot represents a team, and each line represents a game. The teams are colored according to their current ranking, not their ranking as of each date.

The first thing I noticed is that, for several weeks, there are two distinct groups of highly ranked teams connected by one purple team. This is the east coast and midwest vs. the west coast, and the team connecting them is Texas. The distinctness of the grouping diminishes substantially between March 17 and 24, but it’s still there if you look closely:

The other noticeable thing on the women’s side is how (relatively) loosely connected Clemson, Notre Dame, and Virginia – all top 10 teams – are to the action. Is it enough to make me doubt the accuracy of their ranking? I’m not sure.

One other question I wanted to answer was, “Which teams are the most important?” The measure for this is called centrality[3]. The graph below shows all of the teams colored by centrality – the brighter the team, the more important they are to making connections through the network. Here is a PDF version which should have better resolution.

So if you beat Maryland, the most central team in the women’s division, it’s likely to mean that much more for your overall ranking than the average win because of their unique connections to the rest of the field.
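For anyone curious what betweenness centrality measures, here is a small standard-library sketch of Brandes' algorithm on an unweighted game graph. It is an illustration of the measure, not the code behind the figures, and the five-team network is invented: two clusters that only meet through one "Bridge" team.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalized betweenness centrality (Brandes) for an undirected,
    unweighted graph given as {node: set_of_neighbors}.

    Using sets means repeated games between two teams count as one edge.
    """
    centrality = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search, counting shortest paths from the source
        order = []
        predecessors = {v: [] for v in adjacency}
        n_paths = {v: 0 for v in adjacency}
        n_paths[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies
        dependency = {v: 0.0 for v in adjacency}
        while order:
            w = order.pop()
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1.0 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each unordered pair was counted from both endpoints
    return {v: c / 2.0 for v, c in centrality.items()}


# Hypothetical network: East and West clusters joined only through "Bridge"
games = {
    "East 1": {"East 2", "Bridge"},
    "East 2": {"East 1", "Bridge"},
    "Bridge": {"East 1", "East 2", "West 1", "West 2"},
    "West 1": {"West 2", "Bridge"},
    "West 2": {"West 1", "Bridge"},
}
scores = betweenness_centrality(games)
print(max(scores, key=scores.get), scores)
```

Every shortest path between an East team and a West team runs through "Bridge", so it ends up with all of the centrality in this toy network.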

### Men

Here’s the men’s season, with the same scheme as the women’s division:

The first thing I noticed here was the outliers at the end – Amherst and maybe Michigan and Maryland. Unlike the women’s side, these teams aren’t in the top 20 so they aren’t directly affecting nationals bid allocations.

Finally, here are the men’s teams colored by centrality (PDF here):

Alright, time for me to get back to work. If anyone wants the raw data, look for the .csv files here.

1. This is true for DI. DIII has 16 bids for 10 regions, but otherwise the ranking and allocation processes are the same.
2. Here’s USAU’s explanation. It’s essentially a pairwise comparison scheme that solves by iteration until convergence, with a few tweaks like weighting individual games by recency and choosing preference values based on the score of the game.
3. I used betweenness centrality as it seems most appropriate for this situation. Someone please correct me if I’m wrong!
4. Figures were generated using the NetworkX and matplotlib Python packages.

## Failing at writing for no one

One of my favorite Internet-people, Rembert Browne, has written a few times lately about the concept of “writing for no one.” In essence, it’s about putting thoughts on paper without care for who the audience is, and sometimes not publishing the work at all. The act of writing or creating is in and of itself the pursuit.

I miss writing, especially because scientific writing – the only kind I do now – lacks any room for expression. Everything about it is distilled and clean and perfectly concise. Scientific writing is basically the cleanroom of writing, sanitized beyond belief and devoid of personality.

Unfortunately, I have no idea if my writing will be any good or not. But I’ve been hearing a lot recently that failure is the only way to improve. So that’s what this blog will be: me failing at writing for no one.