282 Feature selection for two-year prognosis in advanced stage high grade serous ovarian cancer using machine learning methods
  1. A Laios1,
  2. A Katsenou2,
  3. Y Tan3,
  4. M Otify3,
  5. R Hutson4,
  6. A Thangavelu1,
  7. T Broadhead4,
  8. G Theophilou1,
  9. D Nugent1,5 and
  10. D Dejong1
  1. 1St James’s University Hospital, Gynaecologic Oncology, Leeds, UK
  2. 2Visual Information Lab, University of Bristol, Electrical and Electronic Engineering, Bristol, UK
  3. 3St James’s University Hospital, Gynaecologic Oncology, Leeds
  4. 4St James’s University Hospital, Gynaecologic Oncology, Leeds, UK
  5. 1St James’s University Hospital, Gynaecologic Oncology, Leeds, UK


Introduction/Background: The prognosis of patients with advanced-stage high-grade serous ovarian cancer (HGSOC) is multifactorial and could be accurately predicted using Machine Learning (ML) algorithms. We designed a study that applies feature selection to selected clinical variables in order to define their relative impact on two-year prognosis prediction in HGSOC patients who received surgical treatment.

Methodology: This was a retrospective analysis of 209 women with FIGO stage III-IV HGSOC who were scheduled for cytoreductive surgery with curative or life-prolonging intent at St James’s University Hospital (SJUH), Leeds, between Jan 2015 and Dec 2018. The two-year prognosis estimation was formulated as a binary classification problem. The dataset was split into training (80%) and test (20%) cohorts with repeated random sampling until there was no significant difference (p=0.20) between the two cohorts. Ten-fold cross-validation was applied. Several state-of-the-art supervised ML classifiers, including Support-Vector-Machines (SVMs), K-Nearest Neighbors (KNNs), Ensemble Classifiers, and Naïve Bayes, were evaluated on a set of performance metrics. Their results were compared directly with conventional Logistic Regression (LR). For feature selection, multivariate feature ranking was carried out using the minimum redundancy maximum relevance (MRMR) method.
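The repeated random sampling step can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not confirm: that only the outcome rate is compared between cohorts, that the comparison is a pooled two-proportion z-test, and that p=0.20 acts as the acceptance threshold. The function names, the seed and the synthetic labels are hypothetical and are not the study data.

```python
import math
import random


def two_proportion_p_value(k1, n1, k2, n2):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # all outcomes identical, so the cohorts cannot differ
    z = (k1 / n1 - k2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def balanced_split(labels, test_fraction=0.2, min_p=0.20, seed=0, max_tries=1000):
    """Reshuffle until the outcome rates of the two cohorts are comparable."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    n_test = round(len(labels) * test_fraction)
    for _ in range(max_tries):
        rng.shuffle(indices)
        test, train = indices[:n_test], indices[n_test:]
        p = two_proportion_p_value(
            sum(labels[i] for i in train), len(train),
            sum(labels[i] for i in test), len(test),
        )
        if p >= min_p:  # assumed acceptance rule
            return train, test, p
    raise RuntimeError("no acceptable split found")


# Synthetic binary outcomes for 209 cases (assumption, not the study data).
labels = [1] * 120 + [0] * 89
train_idx, test_idx, p_value = balanced_split(labels)
print(len(train_idx), len(test_idx), round(p_value, 3))
```

In practice the same check would usually be repeated for each clinical variable of interest, not only for the outcome.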

Result(s): Two hundred and nine patients were identified. The model’s mean prediction accuracy reached 73%. SVM and Ensemble Discriminant algorithms outperformed Logistic Regression in accuracy indices. The probability of achieving a cancer-free state was maximized with a combination of primary cytoreduction, good performance status, and maximal surgical effort (AUC 0.63). Standard chemotherapy, performance status, tumor load, and residual disease were consistently predictive of two-year overall survival (AUC 0.63-0.66) (figure 1). The model’s recall and precision were greater than 80%.
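For readers less familiar with the reported metrics, the sketch below shows how accuracy, precision, recall and AUC are computed for a binary classifier. The labels, scores and the 0.5 decision threshold are made-up illustrations and are unrelated to the study results.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (assumption): 1 = favourable two-year outcome.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.7, 0.3, 0.55]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

accuracy, precision, recall = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(accuracy, precision, recall, round(auc, 3))
```

Accuracy, precision and recall depend on the chosen decision threshold, whereas AUC summarises ranking quality across all thresholds.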

Conclusion: Appropriate feature selection is required when building an HGSOC model for two-year prognosis prediction. For HGSOC prognosis, one should consider not only the patient’s disease burden but also their overall medical status and ability to undergo extensive surgery, as these are associated with survival benefit alongside standard chemotherapy.
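The idea behind MRMR feature ranking can be sketched as a greedy procedure: at each step, pick the feature with the highest relevance to the outcome minus its mean redundancy with the features already chosen. The sketch below assumes discrete features, mutual information as the measure, and the difference form of the criterion; the abstract does not say which variant or implementation was used. The feature names and toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c * n) / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_rank(features, target):
    """Greedy ranking by relevance minus mean redundancy (difference form)."""
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    remaining = list(features)
    selected = []

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (assumption): 12 cases, binary outcome and three binary features.
target = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
features = {
    "f_strong":      [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0],
    "f_redundant":   [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0],  # near copy of f_strong
    "f_independent": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
}
ranking = mrmr_rank(features, target)
print(ranking)
```

In this toy data the near-duplicate feature is more relevant to the outcome on its own than the third feature, yet it is ranked last because most of its information is already carried by the first pick.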
