Vetting Claims of Injury Prediction for Sports Teams and Military Units

by Fusion Sport | 11th May, 2022

In sports today, performance staff aren’t merely expected to demonstrate progress in players’ training metrics and improve their results in competition. They’re also charged with reducing the number of injuries suffered throughout the season, and in some settings, such as Division I college football, jobs can be on the line if injury rates are too high.

Some individuals and vendors claim that their models or technology can predict injury, but if you dig deeper, you’ll find many red flags and reasons for skepticism. In our recent Vanguard Roundtable podcast on the Promise & Pitfalls of Injury Prediction, we discussed this topic and provided some tips for vetting questionable claims.

Below, we dive a bit deeper and share some of the latest research to give you a more objective view of the limitations of current injury prediction capabilities.


Injury Prediction Lacking Transparency and Adequate Sample Sizes

One of the issues with assessing vendors’ injury prediction claims is that their technology is usually developed in-house. It’s understandable that a company wants to protect its intellectual property and competitive advantage in the marketplace, but when potential customers and independent researchers are unable to look under the hood to see which data sets are being used and how they’re being interpreted, it’s difficult to separate real-world facts from advertising fiction.

In a paper published in Sports Medicine, a research team stated: “Because of the profitable nature of proprietary systems, developers are often reluctant to transparently report (or make freely available) the development and validation of their prediction algorithms; the term ‘black box’ also applies to these systems. The lack of transparency and unavailability of algorithms to allow implementation by others of ‘black box’ approaches is concerning as it prevents independent evaluation of model performance, interpretability, utility, and generalizability prior to implementation within a sports medicine and performance environment.”

A co-author of this study, Wake Forest School of Medicine assistant professor Garrett Bullock, recently provided some added context on the kind of evidence you should look for when evaluating the claims of injury prediction technology systems.

Beyond ensuring that the research was conducted independently and published in a reputable, peer-reviewed journal, Bullock said on the Vanguard Roundtable podcast that the usefulness of a study “is less about timeframe and more about the total sample and the number of injuries.” He went on to say that most existing studies on injury risk modeling, even those that meet basic criteria for validity, are not comprehensive enough to support meaningful conclusions about the technology in question. “99.9% of injury risk prediction models out there are underpowered,” Bullock said. “You need multi-team or multi-base to get a big enough sample size. I just helped design a study for the military that required 5,000 participants, and we’ll need another 10,000 to 12,000 to externally validate it, and that probably still won’t be enough for any deep learning. Most of these technologies are not worth what they’re claiming.”
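Bullock’s emphasis on the number of injuries, rather than the length of a study, can be made concrete with a widely used rule of thumb: roughly 10 outcome events (here, injuries) per candidate predictor when developing a prediction model. The sketch below assumes that heuristic, plus a hypothetical injury incidence and predictor count. It gives only a rough floor; more formal sample size methods often call for larger numbers, and machine learning typically needs more still.

```python
import math


def min_participants(n_predictors: int, injury_incidence: float,
                     events_per_variable: float = 10.0) -> int:
    """Rough minimum cohort size under the events-per-variable (EPV) heuristic.

    n_predictors: number of candidate predictors in the model (hypothetical)
    injury_incidence: expected proportion of participants injured, in (0, 1]
    events_per_variable: injuries required per predictor (10 is a common rule of thumb)
    """
    if not 0 < injury_incidence <= 1:
        raise ValueError("injury_incidence must be in (0, 1]")
    required_injuries = n_predictors * events_per_variable
    # Round up: a fractional participant still means one more person to enrol.
    return math.ceil(required_injuries / injury_incidence)


# Hypothetical scenario: 20 candidate predictors and 15% of athletes injured.
print(min_participants(20, 0.15))
```

Under these assumptions the answer is 1,334 participants. A single hypothetical 50-player squad would take roughly 27 seasons to accumulate that many athlete-seasons, which is why pooling data across teams or bases becomes necessary.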


The Disparity Between Injury Prediction Theory and Practical Use

Some providers claim that their technology can help Army, Navy, Air Force, and special forces units predict injury in the same way it does for sports teams. But according to Darin Peterson, who has served as the human performance director at the US Marine Corps School of Infantry East since 2007, there’s a gap between vendors’ claims and the reality of what they can actually deliver.

“Injury prediction is a hot topic right now, because a lot of folks want to get ahead of this game and identify things that could potentially happen,” he said on the Vanguard Roundtable episode. “I think it’s a little tough, to be honest, in the military setting, because a lot of the research that currently exists is in athletics in controlled settings. I don’t know if that data is directly transferable. In the military, nothing’s controlled and the enemy always gets a vote. Training for an event that lasts for two or three hours on a Saturday is very different to preparing for a deployment that could last six months. I don’t think you can translate one to the other.”

The questionable effectiveness of injury risk modeling isn’t confined to the military, where models have not been tested in the uncontrolled battlefield environment or over the longer timelines Peterson described. Bullock believes that the value of such tools to elite sports teams is also limited at the time of writing, because of gaps between the theory and practice of how this kind of technology should be used, as well as potential flaws in the predictive models themselves.

“Almost all injury prediction in sports and its poor translation into the military – which is a completely different environment – is based on models that are not working very well at the moment,” he said. “Some of that is due to the methods, and it’s also due to the gap between what vendors are touting about their tools and the actuality of how they can be applied in a clinical or performance setting.”


Inconsistent Terminology and Limited Effectiveness

Ambiguity around terminology creates a lot of confusion and can set completely unrealistic expectations for what the models and algorithms are truly capable of. This is particularly true when vendors start using terms like “injury prevention.”

Whether it’s contact injuries on the field or court, combat casualties in battle, or simply the fallibility of the human body, there are too many external factors in play to ever completely rule out injuries. Even the phrase “injury prediction” is extremely nebulous. Does it mean that a system will be able to forecast precisely how and when an athlete or warfighter is likely to get hurt? Or is it simply an indicator that, for a variety of reasons, they’re at greater risk of sustaining an injury within a broader timeframe? Right now, it depends on which vendor you ask.

Similarly, if vendors start throwing around gaudy numbers about the reduced injury rate their system can supposedly deliver, treat it as a red flag. There may be an outlier use case in which this was true, but applying it as a blanket statement to every sports team or military unit is irresponsible. Other vendors take the opposite approach, targeting teams in specific sports with bold declarations about how their technology can predict or even prevent injury. While more research is needed to investigate the validity of such statements, the current evidence suggests that the efficacy of sport-specific machine learning models is dubious at best, and that tried-and-true methods like basic statistical regression might be more accurate and useful to a performance staff.

For example, a paper published in the International Journal of Sports Physical Therapy analyzed whether machine learning could predict humeral torsion (the degree of rotational twist in the upper-arm bone, an adaptation that has been linked to throwing-arm injury risk) in baseball pitchers. The authors concluded that “machine learning models do not improve baseball humeral torsion prediction compared to a traditional regression model. While machine learning models demonstrated improved RMSE compared to the regression, the machine learning models displayed poorer calibration compared to regression. Based on these results it is recommended to use a simple equation from a statistical model which can be quickly and efficiently integrated within a clinical setting.”
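The quoted finding rests on two different yardsticks. Root mean square error (RMSE) summarizes the typical size of a prediction error, while calibration asks whether predictions agree with observations across the whole range; for a continuous outcome it is often summarized by regressing observed values on predicted values, where a slope near 1 and an intercept near 0 indicate good calibration. The sketch below uses made-up numbers, not data from the paper, and the labels “model A” and “model B” are hypothetical. It shows that one model can win on RMSE while being worse calibrated.

```python
import math
from statistics import fmean


def rmse(observed, predicted):
    """Root mean square error: the typical size of a prediction error."""
    return math.sqrt(fmean((p - o) ** 2 for o, p in zip(observed, predicted)))


def calibration_slope_intercept(observed, predicted):
    """Ordinary least squares fit of observed values on predicted values.

    A slope near 1 and intercept near 0 suggest good calibration; a slope
    below 1 means the predictions are spread too widely (too extreme).
    """
    mean_p, mean_o = fmean(predicted), fmean(observed)
    sxy = sum((p - mean_p) * (o - mean_o) for o, p in zip(observed, predicted))
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    slope = sxy / sxx
    return slope, mean_o - slope * mean_p


# Made-up values for illustration only (not from the cited paper).
observed = [10, 20, 30, 40, 50]
model_a = [0, 15, 30, 45, 60]   # follows the trend but overstates the extremes
model_b = [19, 11, 30, 49, 41]  # noisier, but less stretched

for name, preds in [("A", model_a), ("B", model_b)]:
    slope, intercept = calibration_slope_intercept(observed, preds)
    print(f"model {name}: RMSE={rmse(observed, preds):.2f}, "
          f"calibration slope={slope:.2f}, intercept={intercept:.2f}")
```

With these values, model A has the lower RMSE (about 7.1 versus 8.0) but a calibration slope of about 0.67 against 0.85 for model B: the same kind of trade-off the study reported. For a binary outcome such as injured or not injured, calibration is instead usually checked by comparing predicted probabilities with observed injury rates.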

In other words, if a baseball team wanted to build an injury risk model, it would be more effective to gather comprehensive athlete data via an athlete management system (AMS) such as Smartabase and then have a data scientist fit a basic statistical model than to try to implement a complex and costly machine learning platform. The same principle applies not only to baseball but to any sport or branch of the military.


Building Better Models for Injury Prediction Through Transparency and Third-Party Validation

Bullock believes that, moving forward, there are steps vendors can take to increase the effectiveness of their predictive models and make it easier for human performance groups to vet their claims. “Number one is having transparency in the model development and the algorithms that they create and providing a place to access the data. Second, having complete code [and] hyperparameters if you’re doing machine learning, or the equation algorithms if you have a more statistical base model. Third, having external validation from a third party that’s not involved with the development. I think those three things would go a long way in helping develop much more robust and much more useful prediction models.”
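Bullock’s three steps can be sketched as follows, assuming a hypothetical logistic regression whose full specification (intercept and coefficients) is published as JSON. Every field name and number is invented for illustration; for a machine learning model, the published material would instead be the code and hyperparameters. Because the specification is public, a third party that played no part in development can rebuild the model and score it on its own cohort. Here the score is the Brier score: the mean squared difference between predicted risk and the observed 0/1 outcome, where lower is better.

```python
import json
import math

# Hypothetical published model specification (all names and numbers invented).
published_spec = json.dumps({
    "model": "logistic_regression",
    "outcome": "injury_within_season",
    "intercept": -2.0,
    "coefficients": {"prior_injury": 0.8, "weekly_load_z": 0.5},
})


def load_model(spec_json):
    """Rebuild a risk function from the published specification alone."""
    spec = json.loads(spec_json)
    b0, coefs = spec["intercept"], spec["coefficients"]

    def predict_risk(athlete):
        # Linear predictor, then the logistic function to get a probability.
        lp = b0 + sum(coefs[name] * athlete[name] for name in coefs)
        return 1 / (1 + math.exp(-lp))

    return predict_risk


def brier_score(probs, outcomes):
    """Mean squared difference between predicted risk and what happened (0/1)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


# An independent third party applies the model to its own (hypothetical) cohort.
predict_risk = load_model(published_spec)
external_cohort = [
    ({"prior_injury": 1, "weekly_load_z": 1.2}, 1),
    ({"prior_injury": 0, "weekly_load_z": -0.5}, 0),
    ({"prior_injury": 0, "weekly_load_z": 0.3}, 0),
    ({"prior_injury": 1, "weekly_load_z": -1.0}, 0),
]
probs = [predict_risk(features) for features, _ in external_cohort]
outcomes = [injured for _, injured in external_cohort]
print(f"external Brier score: {brier_score(probs, outcomes):.3f}")
```

Four athletes are far too few for a meaningful validation. In practice, the external cohort would need enough participants and injuries to address the sample size concerns Bullock raises.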

Peterson added that performance directors of military units can fall back on the vetting that the US military performs on technology before approving it. “We tend to go with options that have been tested by universities or validated by the Tri-Service Committee in different settings because we don’t have the luxury of buying something and then testing it out ourselves,” he said. “We also need to make sure that anything we’re considering for purchase fits the needs, methods, and coaching styles of the practitioners who will be collecting injury data and making decisions based on analytics. There’s no point in buying a flashy new system if it doesn’t give us the raw data we need to solve our problems. And if we’re collecting data that isn’t actionable and relevant to our leadership, then we’re collecting it for no reason.”
