Predicting Patient-Level 3-Level Version of EQ-5D Index Scores From a Large International Database Using Machine Learning and Regression Methods
Despite the vast amount of real-life data accumulated in healthcare, EQ-5D index scores are frequently lacking for health economic analyses and have to be estimated.
We predicted 3-level version of EQ-5D (EQ-5D-3L) index scores in a large heterogeneous data set of population surveys and clinical studies using eXtreme Gradient Boosting classification, eXtreme Gradient Boosting regression, and ordinary least squares regression. Regression methods outperformed classification in terms of prediction accuracy and bias. The performance of the 3 methods depended on the applied evaluation criteria, the target population, the included predictors, and the EQ-5D-3L index score range. The prediction accuracy of individual EQ-5D-3L index scores was inadequate for the majority of respondents. For the evaluation of personalized health interventions, we encourage the systematic collection of patient-reported outcomes such as EQ-5D, with the involvement of artificial intelligence experts and outcomes researchers, to enhance the value of accumulating data in health systems.
This study aimed to evaluate the performance of machine learning and regression methods in predicting 3-level version of EQ-5D (EQ-5D-3L) index scores from a large, diverse data set.
A total of 30 studies from 3 countries were combined. Predictions were performed via eXtreme Gradient Boosting classification (XGBC), eXtreme Gradient Boosting regression (XGBR), and ordinary least squares (OLS) regression using 10-fold cross-validation and an 80%/20% partition for training and testing. We evaluated 6 prediction scenarios using 3 samples (general population, patients, total) and 2 predictor sets: demographic and disease-related variables with or without patient-reported outcomes. Model performance was evaluated by mean absolute error, the percentage of predictions within a clinically irrelevant error range, and the percentage of predictions within the correct health severity group (EQ-5D-3L index <0.45, 0.45-0.926, >0.926).
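The three evaluation criteria named above can be illustrated with a minimal Python sketch. It is not the study's code. The severity cut-offs (0.45 and 0.926) come from the text, but the handling of scores exactly at a cut-off, the function names, and the `tolerance` parameter are assumptions made for illustration; the abstract does not state the width of the clinically irrelevant error range, so the caller must supply it.

```python
from typing import Dict, Sequence

# Severity cut-offs taken from the text: <0.45, 0.45-0.926, >0.926.
LOW_CUT = 0.45
HIGH_CUT = 0.926


def severity_group(index: float) -> int:
    """Return 0, 1, or 2 for the lowest, middle, and highest index band.

    Assumption: scores exactly at 0.45 or 0.926 fall in the middle band.
    """
    if index < LOW_CUT:
        return 0
    if index <= HIGH_CUT:
        return 1
    return 2


def evaluate(
    observed: Sequence[float],
    predicted: Sequence[float],
    tolerance: float,
) -> Dict[str, float]:
    """Compute the three criteria for paired observed/predicted index scores.

    `tolerance` is a hypothetical parameter: the half-width of the error
    range treated as clinically irrelevant (not given in the abstract).
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")

    n = len(observed)
    abs_errors = [abs(o - p) for o, p in zip(observed, predicted)]

    # Mean absolute error across all respondents.
    mae = sum(abs_errors) / n
    # Share of predictions whose absolute error does not exceed the tolerance.
    pct_within = 100 * sum(e <= tolerance for e in abs_errors) / n
    # Share of predictions falling in the same severity band as the observed score.
    pct_group = 100 * sum(
        severity_group(o) == severity_group(p) for o, p in zip(observed, predicted)
    ) / n

    return {
        "mae": mae,
        "pct_within_tolerance": pct_within,
        "pct_correct_group": pct_group,
    }
```

For example, `evaluate([1.0, 0.8, 0.3, 0.5], [0.95, 0.6, 0.35, 0.44], tolerance=0.1)` uses made-up scores and an arbitrary tolerance; it shows that a prediction can fall within the tolerance while landing in the wrong severity group (0.5 versus 0.44), which is one reason the criteria can rank models differently.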
The data set comprised 26 318 individuals (clinical settings n = 6214, general population n = 20 104) and 26 predictor variables plus diagnoses. Using all predictors and the total sample, mean absolute error values were 0.153, 0.126, and 0.131; the percentages of predictions within the clinically irrelevant error range were 47.6%, 39.5%, and 37.4%; and the percentages within the correct health severity group were 56.3%, 64.9%, and 63.3% for XGBC, XGBR, and OLS, respectively. The performance of models depended on the applied evaluation criteria, the target population, the included predictors, and the EQ-5D-3L index score range.
Regression models (XGBR and OLS) outperformed XGBC, yet prediction errors were outside the clinically irrelevant error range for most respondents. Our results highlight the importance of systematic patient-reported outcome (EQ-5D) data collection. Dialogs between artificial intelligence and outcomes research experts are encouraged to enhance the value of accumulating data in health systems.