THE UNIVERSITY OF BRITISH COLUMBIA
Rationale: Automatic prediction algorithms based on routinely collected health data may be able to identify patients at high risk of hospitalization due to acute exacerbations of Chronic Obstructive Pulmonary Disease (COPD).
Objective: This was a proof-of-concept study for a population surveillance approach towards identifying individuals at high risk of severe COPD exacerbations.
Methods: We used British Columbia’s administrative health databases (1997–2016) to identify patients with diagnosed COPD. For each patient, we used data from the six months before a randomly selected index date to predict the risk of severe exacerbation in the following two months. We applied statistical and machine learning algorithms for risk prediction (logistic regression, random forest, neural network, and gradient boosting). We evaluated model performance with calibration plots and receiver operating characteristic (ROC) curves in a validation dataset built around a randomly chosen future date at least one year later (temporal validation).
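The windowing described in the Methods can be sketched as follows. This is a minimal illustration, not the study's code: the function names (`split_windows`, `make_example`), the single predictor (count of prior exacerbations), and the day counts used to approximate "six months" and "two months" are all assumptions made for the example.

```python
from datetime import date, timedelta

# Assumed window lengths in days, approximating six months and two months.
LOOKBACK_DAYS = 182
OUTCOME_DAYS = 61


def split_windows(index_date, event_dates):
    """Split events into the predictor window (before the index date)
    and the outcome window (from the index date onward)."""
    lookback_start = index_date - timedelta(days=LOOKBACK_DAYS)
    outcome_end = index_date + timedelta(days=OUTCOME_DAYS)
    history = [d for d in event_dates if lookback_start <= d < index_date]
    outcome = [d for d in event_dates if index_date <= d < outcome_end]
    return history, outcome


def make_example(index_date, exacerbation_dates):
    """Build one hypothetical training row: a predictor and a binary label."""
    history, outcome = split_windows(index_date, exacerbation_dates)
    return {
        "prior_exacerbations": len(history),  # predictor from the lookback window
        "label": int(len(outcome) > 0),       # 1 if any event in the outcome window
    }


# Usage: one patient with three recorded severe exacerbations.
row = make_example(
    date(2010, 7, 1),
    [date(2009, 1, 1), date(2010, 3, 15), date(2010, 8, 1)],
)
```

Keeping the predictor window strictly before the index date and the outcome window at or after it prevents information from the outcome period leaking into the predictors.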
Results: There were 108,433 patients in the development dataset and 113,786 in the validation dataset; of these, 1,126 and 1,136, respectively, were hospitalized due to COPD within their outcome windows. The best prediction algorithm (gradient boosting) had an area under the ROC curve of 0.82 (95% CI 0.80–0.83), significantly higher than the corresponding value of 0.68 for the model with exacerbation history as the only predictor (the current standard of care). The predicted risk scores were well calibrated in the validation dataset.
Conclusions: Imminent COPD-related hospitalizations can be predicted with good accuracy using administrative health data. This model may be used to target high-risk patients for preventive exacerbation therapies.
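The area under the ROC curve reported in the Results has a simple interpretation: it is the probability that a randomly chosen hospitalized patient receives a higher predicted risk than a randomly chosen non-hospitalized patient. The sketch below computes it by direct pairwise comparison; the function name `roc_auc` and the toy data are illustrative assumptions, and the quadratic loop is meant for small examples rather than datasets of the size used in the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve by pairwise comparison.

    labels: 1 for an event (e.g. hospitalization), 0 otherwise.
    scores: predicted risks, higher meaning more likely to have the event.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # event case ranked above non-event case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data: four patients, two with the event.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

A value of 0.5 corresponds to ranking no better than chance, and 1.0 to perfect separation of the two groups.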