Introduction: Interpreting cardiopulmonary exercise tests (CPET) requires well-trained staff. We aimed to examine how accurately machine learning-based algorithms classify exercise limitations and their severity in clinical practice, compared with expert consensus, in patients presenting at a pulmonary clinic.
Methods: This study included 200 historical CPET data sets (48.5% female) of patients older than 40 yr who were referred for CPET because of unexplained dyspnea, preoperative examination, or evaluation of therapy progress. Data sets were independently rated by experts according to the severity of pulmonary-vascular, mechanical-ventilatory, cardiocirculatory, and muscular limitations using a visual analog scale. Decision tree and random forest analyses were performed.
Results: Mean deviations between experts in the respective limitation categories ranged from 1.0 to 1.1 points (SD, 1.2) before consensus. Random forests identified parameters of particular importance for detecting specific limitations. Central parameters were nadir ventilatory efficiency for CO₂ and ventilatory efficiency slope for CO₂ (pulmonary-vascular limitations); breathing reserve, forced expiratory volume in 1 s, and forced vital capacity (mechanical-ventilatory limitations); and peak oxygen uptake, O₂ uptake/work rate slope, and % change of the latter (cardiocirculatory limitations). Thresholds differentiating between limitation severities are reported. The accuracy of the most accurate decision tree in each category was comparable to expert ratings. Finally, a combined decision tree was created that quantifies combined system limitations within one patient.
Conclusions: Machine learning-based algorithms may be a viable option to facilitate the interpretation of CPET and identify exercise limitations. Our findings may further support clinical decision making and aid the development of standardized rating instruments.
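As an illustration of how a decision tree derives a threshold for a single CPET parameter, the following minimal sketch searches for the cut point that best separates two classes. It is not the study's implementation: the Gini impurity criterion, the function names `gini` and `best_threshold`, and all numeric values are assumptions chosen for illustration, and the values are synthetic rather than study data.

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("nan"), float("inf"))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cut point exists between identical values
        thr = (lo + hi) / 2  # candidate threshold: midpoint of neighbours
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (thr, score)
    return best


# Synthetic, made-up example values (NOT study data):
# hypothetical breathing reserve in % and a hypothetical binary expert label
# (1 = mechanical-ventilatory limitation rated present, 0 = absent).
breathing_reserve = [5.0, 8.0, 12.0, 14.0, 25.0, 30.0, 35.0, 40.0]
limited = [1, 1, 1, 1, 0, 0, 0, 0]

threshold, impurity = best_threshold(breathing_reserve, limited)
print(f"split at breathing reserve <= {threshold} (weighted Gini {impurity})")
```

A full decision tree applies this search recursively across all candidate parameters, and a random forest averages many such trees built on resampled data, which is what allows parameter importance to be ranked.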