THE UNIVERSITY OF BRITISH COLUMBIA
Prediction algorithms that quantify the expected benefit of a given treatment conditional on patient characteristics can critically inform medical decisions. Quantifying the performance of treatment benefit prediction algorithms is an active area of research. A recently proposed metric, the concordance statistic for benefit (cfb), evaluates the discriminative ability of a treatment benefit predictor by directly extending the concept of the concordance statistic from a risk model with a binary outcome to a model for treatment benefit. In this work, we scrutinize cfb on multiple fronts. Through numerical examples and theoretical developments, we show that cfb is not a proper scoring rule. We also show that it is sensitive to the unestimable correlation between counterfactual outcomes and to the definition of matched pairs. We argue that measures of statistical dispersion applied to predicted benefits do not suffer from these issues and can be an alternative metric for the discriminatory performance of treatment benefit predictors.