Predicting Sports Injuries Using Machine Learning

By Madison McCarty

Faculty Mentor: Evan Coleman

Abstract

This project investigates the use of machine learning to predict injury risk across two distinct datasets characterized by extreme class imbalance. Injuries represented approximately 1–2% of all observations, making traditional accuracy-based evaluation insufficient. To address this challenge, data balancing techniques and cost‑sensitive learning were applied to emphasize injury detection. Multiple modeling approaches were evaluated, including traditional statistical models, ensemble methods, and neural networks. Model performance was assessed using precision, recall, and F1 score, with particular emphasis on recall to minimize missed injuries. Results showed that several high‑accuracy models failed to meaningfully detect injuries, while Gradient Boosting achieved the most effective balance between sensitivity and reliability across both datasets. Feature importance analysis further supported model interpretability. Overall, the findings highlight the importance of appropriate data handling, evaluation metrics, and cross‑dataset validation when applying machine learning to rare but high‑impact outcomes such as injury prediction.
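To make the evaluation argument concrete, the following is a minimal sketch of why accuracy can mislead at a 1–2% injury rate, and how precision, recall, and F1 score expose the difference. The counts are hypothetical and chosen only for illustration. They are not results from this project, and the function names are assumptions rather than the project's code.

```python
# Illustrative only: hypothetical counts, not data or results from the project.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = injury."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for the injury class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when no injuries are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical sample: 1,000 observations, 15 injuries (1.5%).
y_true = [1] * 15 + [0] * 985

# Model A (hypothetical): never predicts an injury.
never_injured = [0] * 1000

# Model B (hypothetical): catches 12 of the 15 injuries,
# and raises 48 false alarms among the 985 non-injury observations.
detector = [1] * 12 + [0] * 3 + [1] * 48 + [0] * 937

baseline = summarize(y_true, never_injured)
flagged = summarize(y_true, detector)

print("never predicts injury:", baseline)
print("hypothetical detector:", flagged)
```

In this made-up sample, the model that never predicts an injury has the higher accuracy of the two yet a recall of zero, while the detector gives up some accuracy to find most of the injuries. That trade-off is the reason recall is emphasized over accuracy here.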

