Abstract
Objectives To train various machine learning algorithms to predict recurrence and recurrence-free survival (RFS) in high-grade endometrial cancer (HGEC).
Methods Data were retrospectively collected from 1237 patients across 8 Canadian centers and divided arbitrarily into training (50%), validation (25%) and test (25%) sets. Four models were trained to predict recurrence: random forests, boosted trees, and 2 neural networks. Receiver operating characteristic (ROC) curves were used to assess model performance, and the best model was selected as the one with the highest area under the curve (AUC) in the test set. For time to recurrence, we trained a random forest model and a Lasso model and compared them with a Cox proportional hazards model. Concordance was reported using a c-statistic.
Results Among the 4 models tested, the bootstrap random forest had the highest AUC in the test set and was the best model for predicting recurrence in HGEC; its AUCs were 85.2%, 74.1% and 71.8% in the training, validation and test sets, respectively. The top 5 predictors were stage, uterus height, specimen weight, adjuvant chemotherapy and pre-operative histology. When stratified by stage, the AUC in the test set increased to 77% for Stage III and 80% for Stage IV. For time to recurrence, there was no difference between the Lasso and Cox proportional hazards models (test set c-index 71%), while the random forest had a c-index of 60.5%.
Conclusions A bootstrap random forest model best predicted recurrence in HGEC; model prediction further improved in Stage III and IV patients. Machine learning survival models performed similarly to Cox proportional hazards but could be conducted with greater efficiency.
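
For readers less familiar with the two metrics reported above, the following is a minimal sketch of how an AUC and a c-statistic can be computed. It is not the study's code: the abstract does not state which software was used, the function names `roc_auc` and `harrell_c` are hypothetical, the toy data are invented for illustration, and the handling of tied event times is a simplifying assumption.

```python
from itertools import combinations


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen recurrence case
    receives a higher score than a randomly chosen non-recurrence case.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def harrell_c(times, events, risks):
    """Harrell's concordance index for right-censored data.
    A pair is comparable only if the patient with the shorter follow-up
    time actually had the event (events[i] is truthy)."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        first, later = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier time is censored, so the ordering is unknown
        comparable += 1
        if risks[first] > risks[later]:
            concordant += 1.0   # higher predicted risk recurred earlier
        elif risks[first] == risks[later]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Invented toy data, not taken from the study.
    labels = [1, 0, 1, 0, 0, 1]                 # 1 = recurrence
    scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]     # predicted probability
    print("AUC:", round(roc_auc(labels, scores), 3))

    times = [5, 8, 12, 3, 10]                   # months to event or censoring
    events = [1, 0, 1, 1, 0]                    # 1 = recurrence observed
    risks = [0.8, 0.3, 0.85, 0.9, 0.2]          # predicted risk score
    print("c-index:", round(harrell_c(times, events, risks), 3))
```

Both quantities range from 0.5 (no better than chance) to 1.0 (perfect ranking), which is why the abstract can report them on the same percentage scale.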