Objectives To train and compare machine learning algorithms for predicting recurrence and recurrence-free survival (RFS) in high-grade endometrial cancer (HGEC).
Methods Data were retrospectively collected from 1237 patients across 8 Canadian centers and divided arbitrarily into training (50%), validation (25%) and test (25%) sets. Four models were trained to predict recurrence: random forest, boosted trees, and 2 neural networks. Receiver operating characteristic (ROC) curves were used to assess model performance, and the best model was selected by the highest area under the curve (AUC) in the test set. For time to recurrence, we trained a random forest model and a Lasso model and compared them with Cox proportional hazards regression. Concordance was reported using the c-statistic (c-index).
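The split and the AUC criterion described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function names `split_50_25_25` and `roc_auc`, the random seed, and the toy labels and scores are all assumptions made for illustration. The AUC is computed here as the proportion of recurrence/non-recurrence pairs in which the recurrent case receives the higher predicted score, which is equivalent to the area under the ROC curve.

```python
import random


def split_50_25_25(items, seed=0):
    """Shuffle items and split them into 50% training, 25% validation, 25% test.

    The seed and the rounding rule (any remainder goes to the test set)
    are illustrative assumptions, not details taken from the study.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = len(shuffled) // 2
    n_val = len(shuffled) // 4
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 = recurrence, 0 = no recurrence.
    scores: predicted risk, higher meaning more likely to recur.
    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    correct = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                correct += 1.0
            elif p == q:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical usage with 1237 patient identifiers and toy predictions.
train, val, test = split_50_25_25(range(1237))
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is quadratic in the number of patients, which is acceptable for a cohort of this size; a rank-based computation would be preferable for much larger data sets.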
Results Among the 4 models tested, the bootstrap random forest had the highest AUC in the test set and was the best model for predicting recurrence in HGEC; its AUCs were 85.2%, 74.1% and 71.8% in the training, validation and test sets, respectively. The top 5 predictors were stage, uterus height, specimen weight, adjuvant chemotherapy and pre-operative histology. When stratified by stage, the AUC in the test set increased to 77% for Stage III and 80% for Stage IV. For time to recurrence, there was no difference between the Lasso and Cox proportional hazards models (test set c-index 71%), while the random forest had a c-index of 60.5%.
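For readers unfamiliar with the c-index used for the time to recurrence models, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is an illustration under stated assumptions rather than the authors' implementation: the function name `concordance_index`, the handling of ties, and the toy follow-up data are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's c-index for right-censored survival data.

    times:  follow-up time for each patient.
    events: 1 if recurrence was observed, 0 if the patient was censored.
    risks:  predicted risk score, higher meaning earlier expected recurrence.

    A pair (i, j) is comparable when patient i had an observed recurrence
    strictly before patient j's follow-up time. The pair is concordant when
    patient i also has the higher predicted risk; tied risks count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier event in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the second one censored at 10 months.
toy_times = [5, 10, 15, 20]
toy_events = [1, 0, 1, 1]
toy_c = concordance_index(toy_times, toy_events, [0.9, 0.4, 0.1, 0.2])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the reported c-index of 71% means that about 71% of comparable patient pairs were ranked in the correct order.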
Conclusions A bootstrap random forest model best predicted recurrence in HGEC; model prediction further improved in Stage III and IV patients. Machine learning survival models performed similarly to Cox proportional hazards regression but could be conducted with greater efficiency.