The central question that sports scientists are grappling with these days is this: What the heck are we going to do with all this data? In endurance sports, we’ve progressed from heart rate monitors and GPS watches to sophisticated biomechanical analysis, internal oxygen levels, and continuous glucose measurements, all displayed on your wrist and then automatically synced to your computer. Team sports have undergone a similar tech revolution. The resulting data is fascinating and abundant, but is it actually useful?
A new paper in the International Journal of Sports Physiology and Performance tackles this question and presents an interesting framework for thinking about it, derived from the business analytics literature. The paper comes from Kobe Houtmeyers and Arne Jaspers of KU Leuven in Belgium, along with Pedro Figueiredo of the Portuguese Football Federation’s Portugal Football School.
Here’s their four-stage framework for data analytics, presented in order of both increasing complexity and increasing value to the athlete or coach:
- Descriptive: What happened?
- Diagnostic: Why did it happen?
- Predictive: What will happen?
- Prescriptive: How do we make it happen?
Each stage builds on the previous one, which means that the descriptive layer is the foundation for everything else. Is the data good enough? I’m pretty confident that a modern GPS watch can accurately describe how far and how fast I’ve run in training, which allows me to move to the next stage and try to diagnose whether a good or bad race resulted from training too much, too little, too hard, too easy, and so on. In contrast, the heart rate data I get from wrist sensors on sports watches is utter garbage (as verified by comparing it to data from chest straps). It took me a while to realize that, and any insights I drew from that flawed data would obviously have been meaningless and possibly damaging to my training.
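That kind of descriptive sanity check is easy to make concrete. Here’s a minimal Python sketch, with made-up numbers of my own (the readings and the function name are illustrative assumptions, not data from the article), that compares a wrist-sensor heart-rate stream against a time-aligned chest-strap reference:

```python
def mean_abs_error(wrist_bpm, chest_bpm):
    """Mean absolute difference between two time-aligned heart-rate streams."""
    if len(wrist_bpm) != len(chest_bpm):
        raise ValueError("streams must be time-aligned and equal length")
    return sum(abs(w - c) for w, c in zip(wrist_bpm, chest_bpm)) / len(wrist_bpm)

# Hypothetical once-per-minute readings from a hard interval session.
wrist = [142, 150, 171, 158, 149, 160]
chest = [148, 163, 176, 170, 152, 168]

print(f"mean absolute error: {mean_abs_error(wrist, chest):.1f} bpm")
```

A consistently large error here would tell you the wrist data isn’t fit to build diagnostic conclusions on, which is exactly the trap described above.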
Making predictions is harder (especially, as the saying goes, about the future). Scientists in a variety of sports have tried to use machine learning to comb through big sets of training data to predict who’s at high risk of getting injured. For example, a study published earlier this year by researchers at the University of Groningen in the Netherlands plugged seven years of training and injury data from 74 competitive runners into an algorithm that estimated risk based on either the previous seven days of running (with ten parameters for each day, like the total distance in different training zones, perceived exertion, and duration of cross-training) or the previous three weeks (with 22 parameters per week). The resulting model, like similar ones in other sports, was significantly better than a coin toss at predicting injuries, but not yet good enough to base training decisions on.
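To make the windowing concrete, here’s a hedged sketch of that data-shaping step; this is my own simplification, not the Groningen group’s code. Each day’s training parameters get flattened into one feature row per athlete-day, so seven days of ten parameters becomes a 70-feature row that a risk model can consume:

```python
def sliding_window_features(daily_logs, window=7):
    """Turn a chronological list of per-day parameter dicts into feature rows.

    Each row concatenates the `window` most recent days, so with 7 days and
    10 parameters per day you get 70 features per row, as in the study.
    """
    keys = sorted(daily_logs[0])  # fixed parameter order across all days
    rows = []
    for end in range(window, len(daily_logs) + 1):
        row = []
        for day in daily_logs[end - window:end]:
            row.extend(day[k] for k in keys)
        rows.append(row)
    return rows

# Hypothetical mini-log: 3 parameters per day instead of the study's 10.
log = [{"km": 10 + d, "rpe": 5, "cross_min": 20} for d in range(9)]
features = sliding_window_features(log, window=7)
print(len(features), len(features[0]))  # 3 rows of 21 features
```

The modeling step that sits on top of rows like these is where things get hard: the features are easy to build, but the signal in them is weak.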
Prescriptive analytics, the holy grail for sports scientists, is even more elusive. A simple example that doesn’t require any heavy computation is heart-rate variability (HRV), a proxy measure of stress and recovery status that (as I discussed in a 2018 article) has been proposed as a daily guide for deciding whether to train hard or easy. Even though the physiology makes sense, I’ve been skeptical of delegating crucial training decisions to an algorithm. That’s a false choice, though, according to Houtmeyers and his colleagues. Prescriptive analytics provides “decision support systems”: the algorithm isn’t replacing the coach, but is providing him or her with another perspective that’s not weighed down by the inevitable cognitive biases that afflict human decision-making.
Interestingly, Marco Altini, one of the leaders in developing approaches to HRV-guided training, posted a Twitter thread a few weeks ago in which he reflected on what has changed in the field since my 2018 article. Among the insights: the measuring technology has improved, as has knowledge about how and when to use it to get the most reliable data. That’s key for descriptive usage. But even good data doesn’t guarantee good prescriptive advice. According to Altini, studies of HRV-guided training (like this one) have moved away from tweaking workout plans based on the vagaries of that morning’s reading, relying instead on longer-term trends like running seven-day averages. Even with those caveats, I’d still view HRV as a source of decision support rather than as a decision-maker.
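A trend-based rule of that kind fits in a few lines of Python. The sketch below is my own illustration of the general idea, not Altini’s published method, and the thresholds and numbers are invented assumptions: compare the current seven-day rolling average of a morning HRV reading against a baseline band, and flag days that drift outside it.

```python
from statistics import mean, stdev

def hrv_flag(readings, baseline_days=30, window=7, z=1.0):
    """Decision support, not a decision-maker: flag when the rolling
    7-day HRV average drifts outside a baseline mean +/- z*SD band."""
    baseline = readings[:baseline_days]
    lo = mean(baseline) - z * stdev(baseline)
    hi = mean(baseline) + z * stdev(baseline)
    current = mean(readings[-window:])
    if current < lo:
        return "below baseline: consider easing off"
    if current > hi:
        return "above baseline: normal or positive adaptation"
    return "within baseline: proceed as planned"

# Hypothetical morning readings: a stable month, then a week of suppressed HRV.
readings = [58, 62] * 15 + [50] * 7
print(hrv_flag(readings))
```

Note that the function only ever returns a suggestion string; deciding what to actually do with today’s workout stays with the athlete or coach, which is the “decision support” framing in a nutshell.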
One of the reasons Houtmeyers’s paper appealed to me is that I spent a bunch of time thinking about these issues during my recent experiment with continuous glucose monitoring. The four-stage framework helps clarify my thinking. It’s clear that CGMs offer great descriptive data; and with some effort, I think you can also get some good diagnostic insights. But the sales pitch, as you’d expect, is explicitly focused on predictive and prescriptive promises: guiding you on what and when to eat in order to maximize performance and recovery. Maybe that’s possible, but I’m not yet convinced.
In fact, if there’s one simple message I take away from this paper, it’s that description and diagnosis are not the same thing as prediction and prescription. The latter doesn’t follow automatically from the former. As the data sets keep getting bigger and higher-quality, it seems inevitable that we’ll eventually reach the point when machine-learning algorithms can pick up patterns and interactions that even highly experienced coaches might miss. But that’s a big leap, and data on its own—even “big” data—won’t get us there.
For more Sweat Science, join me on Twitter and Facebook, sign up for the email newsletter, and check out my book Endure: Mind, Body, and the Curiously Elastic Limits of Human Performance.