Main

Even in the absence of randomized trials supporting the role of cytoreductive surgery in the treatment of advanced ovarian cancer, there is general consensus that primary surgery should aim to leave minimal or, ideally, no visible residual tumour, given the direct association between smaller residual disease and longer survival (Hunter et al, 1992; Bristow et al, 2002; Eisenkop et al, 2006).

Although the term ‘optimal’ has been applied over time to cytoreductive surgery achieving a maximal residual tumour diameter ranging from 0 to as much as 3 cm (Nickles Fader and Rose, 2007), the Gynecologic Oncology Group (GOG) currently defines residual disease ≤1 cm as ‘optimal’, and this definition is reasonably expected to evolve soon towards cytoreduction to no apparent residual disease (Bristow et al, 2002). Therefore, while awaiting the mature results of the EORTC 55971 phase III randomized trial comparing upfront debulking vs secondary cytoreduction after neo-adjuvant chemotherapy in stage IIIC/IV ovarian cancer (Vergote et al, 2008), every effort should be made to leave no residual tumour at primary surgery, as this commitment remains the keystone of the management of advanced disease.

In this context, much attention has focused on laboratory assays, clinical and radiographic parameters, and, more recently, laparoscopically assessed scores (Fagotti et al, 2006, 2008; Brun et al, 2008) able to define preoperatively each patient's chance of undergoing optimal cytoreduction. The accuracy of preoperative serum Ca125 levels ranges between 50 and 78%, and conflicting data on their predictive ability have been reported (Chi et al, 2000, 2009; Cooper et al, 2002; Saygili et al, 2002; Memarzadeh et al, 2003). Computed tomography (CT)-assessed parameters might offer better predictive performance, as they define not only the extent of disease but also, more importantly, the involvement of specific intra-abdominal sites generally recognised as major obstacles to optimal debulking, such as portal triad disease, agglutinated bowel/mesentery, bulky diaphragmatic disease, or suprarenal aortic lymph nodes (Eisenkop and Spirtos, 2001). In particular, Bristow et al (2000) developed a CT scan-based model achieving an overall accuracy of 93% in predicting successful cytoreduction. However, the recent demonstration that CT predictors are not reliably reproducible in series other than the one(s) from which a model was derived has called into question the value of CT scan in predicting surgical outcome outside the institution where the model was developed (Axtell et al, 2007). Further concerns about the usefulness of CT imaging predictors arise from the small size of previously published cohorts, which often included early-stage cases, their retrospective nature, the heterogeneity of imaging procedures across institutions, and the different combinations of CT predictors used (Nelson et al, 1993; Forstner et al, 1995; Meyer et al, 1995; Bristow et al, 2000; Byrom et al, 2002; Dowdy et al, 2004; Qayyum et al, 2005; Axtell et al, 2007).

Moreover, discrepancies across studies, and the reliability of their results, may also depend on the time frame and duration of accrual, because imaging techniques, equipment, and performance have changed over time and also differ across imaging centres.

The aim of this study was to investigate, in a large prospective trial, the overall performance of CT in predicting the feasibility of primary optimal cytoreduction in patients with advanced ovarian cancer. We also compared the performance of different predictive models, including models that incorporate clinically assessed parameters.

Materials and methods

Between January 2005 and October 2008, 195 consecutive patients with clinical and radiographic suspicion of advanced (Stage III–IV) ovarian/peritoneal cancer were enrolled at the Gynecologic Oncology Unit, Catholic University of Rome and Campobasso, Italy.

Routine staging work-up included complete physical and gynaecological examination, serum Ca125 assessment, chest X-ray, and abdomino-pelvic CT scan. ECOG performance status (ECOG-PS) was also recorded. Advanced-stage cases were selected on the basis of at least two of the following clinical and radiological criteria: ascites (>500 ml), CT evidence of metastatic disease, and elevated Ca125 levels (>500 IU ml−1).

Exclusion criteria were ECOG-PS >2 and large-volume extra-abdominal disease.
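The eligibility rule (at least two of the three advanced-disease criteria, minus the exclusions) can be expressed as a simple predicate. This is a minimal sketch: the `Candidate` record and its field names are hypothetical and do not reproduce the study's data form.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record of one patient's preoperative work-up."""
    ascites_ml: float                    # estimated ascites volume (ml)
    ct_metastatic_disease: bool          # CT evidence of metastatic disease
    ca125_iu_ml: float                   # serum Ca125 (IU/ml)
    ecog_ps: int                         # ECOG performance status (0-4)
    large_extra_abdominal_disease: bool  # large-volume extra-abdominal disease


def is_eligible(c: Candidate) -> bool:
    # Inclusion: at least two of the three advanced-stage criteria.
    criteria = [
        c.ascites_ml > 500,
        c.ct_metastatic_disease,
        c.ca125_iu_ml > 500,
    ]
    meets_inclusion = sum(criteria) >= 2  # True counts as 1
    # Exclusion: ECOG-PS > 2 or large-volume extra-abdominal disease.
    excluded = c.ecog_ps > 2 or c.large_extra_abdominal_disease
    return meets_inclusion and not excluded
```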

At the time of study conception and design (2004), Institutional Review Board approval was not required because the study did not include diagnostic or therapeutic procedures other than the standard ones (preoperative staging and cytoreductive surgery).

Imaging technique

Preoperative CT scans were performed with a high-speed scanner (CT Hi Speed Nx/i Pro; 2-slice; GE Medical System, Milwaukee, WI, USA). Computed tomography examinations were obtained after oral administration of 1000 ml of diluted iodinated water-soluble contrast medium (approximately 2% Gastrografin solution; 20 ml Gastrografin/1000 ml water). All CT scans were acquired at baseline and 70 s after i.v. administration of 120–130 ml of high-concentration non-ionic iodinated contrast medium (350–370 mgI ml−1). The usual flow rate was 3 ml s−1.

Images were obtained in a craniocaudal direction, from the diaphragm to the ischial tuberosities, with 5 mm slice thickness and 15 mm s−1 table speed. The hard-copy images were reviewed by two radiologists (GS, EC) with a special interest in gynaecologic oncology imaging, who were blinded to the patients' clinical characteristics. In case of disagreement, the scans were jointly re-evaluated until consensus was reached.

Surgical procedures

All patients underwent standard longitudinal laparotomy, and comprehensive surgical staging was attempted according to standard guidelines. Maximal surgical effort (achievement of <1 cm residual disease) was attempted in all patients and, when possible, included removal of tumour masses together with total abdominal hysterectomy, bilateral salpingo-oophorectomy, radical omentectomy, appendectomy, multiple biopsies, and, if required, additional procedures (intestinal resection (20%), diaphragm stripping (20%), abdomino-pelvic peritoneal stripping (35%), liver and pancreatic resection (9%), and splenectomy (15%)). Radical pelvic and para-aortic lymphadenectomy was performed in all patients whose primary cytoreduction left residual tumour ≤1 cm. When primary optimal cytoreduction was judged unfeasible at laparotomy, patients were triaged to neoadjuvant chemotherapy (Fanfani et al, 2003; Vergote et al, 2005).

Data analysis

The CT parameters used in the data analysis were as follows: peritoneal thickening (diffuse, linear, >1 cm thickening) or peritoneal implants >2 cm; bowel mesentery involvement; omental extension (spleen, stomach, lesser sac); pelvic sidewall involvement and/or hydroureter; suprarenal aortic lymph nodes >1 cm; infrarenal aortic lymph nodes >2 cm; superficial liver metastases >2 cm and/or intraparenchymal liver metastases of any size; and large-volume ascites (>500 ml).

Clinical data used in the analysis were age, serum Ca125 levels, and ECOG-PS. The data were analysed using two different approaches. In Approach A, the sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV), and accuracy of each radiographic parameter and clinical feature in predicting surgical outcome were calculated. Sensitivity was defined as the number of correctly predicted suboptimally debulked cases (true positives) divided by the total number of suboptimally cytoreduced patients. Specificity was defined as the number of correctly predicted optimally debulked cases (true negatives) divided by the total number of optimally cytoreduced patients.

NPV corresponded to the number of true negatives divided by the total number of negative results for each parameter, and PPV corresponded to the number of true positives divided by the total number of positive results for each parameter. Accuracy was calculated as the sum of true positives and true negatives divided by the total number of patients in the study.
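The five definitions above can be computed from the 2×2 table of a single predictor. In this minimal sketch, "positive" means predicted suboptimal cytoreduction (residual >1 cm); the function name is illustrative, and the counts in the test are hypothetical, chosen only so that they total 195 patients with 109 suboptimal cases.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity, PPV, NPV and accuracy for one predictor.

    Positive = predicted suboptimal cytoreduction (residual > 1 cm);
    negative = predicted optimal cytoreduction.
    A denominator of zero (e.g. a feature never present) raises ZeroDivisionError.
    """
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),  # TP / all suboptimally cytoreduced
        "specificity": tn / (tn + fp),  # TN / all optimally cytoreduced
        "ppv": tp / (tp + fp),          # TP / all positive results
        "npv": tn / (tn + fn),          # TN / all negative results
        "accuracy": (tp + tn) / total,  # (TP + TN) / all patients
    }
```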

Inclusion of a specific radiographic parameter in the final model required a specificity ≥75%, a PPV ≥50%, and an NPV ≥50%; each radiographic feature satisfying these three criteria was assigned a point value of 1 (Bristow et al, 2000). An additional point was assigned to parameters that, besides meeting these criteria, also showed an overall accuracy ≥60% in predicting surgical outcome. Using this scoring system, a predictive index (PI) was calculated for each patient.
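The point-assignment rule and the per-patient PI can be sketched as follows. This assumes each threshold is a minimum (≥); the parameter keys used in the test are hypothetical labels, with point values mirroring those reported for Model 1 in the Results.

```python
def bristow_points(specificity: float, ppv: float, npv: float,
                   accuracy: float) -> int:
    """Point value for one parameter; all inputs are proportions in [0, 1].

    Assumption: each threshold is read as a minimum (>=).
    """
    meets_all = specificity >= 0.75 and ppv >= 0.50 and npv >= 0.50
    if not meets_all:
        return 0
    # One point for meeting the three criteria, one more for accuracy >= 60%.
    return 2 if accuracy >= 0.60 else 1


def predictive_index(present: dict, points: dict) -> int:
    """Sum of point values over the parameters present in one patient."""
    return sum(pts for name, pts in points.items() if present.get(name, False))
```

With omentum and liver worth 1 point and diaphragm and bowel mesentery worth 2, the maximum PI is 6, which matches the range reported for Model 1.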

In Approach B, the χ2 test or Fisher's exact test for proportions was performed for each radiographic and clinical parameter, and all features showing a statistically significant association (P-value <0.05) with surgical outcome were analysed by stepwise logistic regression (Cox, 1970). Features that maintained their association with surgical outcome in multivariate analysis were used to generate a predictive model: a PI score was assigned to each patient according to the absence or presence of each of the variables identified. The predictive performance of the PI scores in Approaches A and B was tabulated across different categories, and receiver operating characteristic (ROC) curves were obtained to analyse the ability of the different PI models to predict surgical outcome. The statistical significance of differences between ROC estimates was assessed using the method of Hanley and McNeil (1982).
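The sketch below shows the AUC as a Mann–Whitney probability, the Hanley and McNeil (1982) standard error, and a z test for the difference between two AUCs. It is a minimal sketch under assumptions. The correlation `r` between two AUCs estimated on the same patients must be supplied by the user; `r=0` treats the estimates as independent, and estimating `r` is not shown. All function names are illustrative.

```python
import math


def auc_mann_whitney(pos_scores, neg_scores) -> float:
    """AUC = P(score of a suboptimal case > score of an optimal case) + 0.5 * P(tie)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hanley_mcneil_se(auc: float, n_pos: int, n_neg: int) -> float:
    """Standard error of an AUC following Hanley and McNeil (1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)


def compare_aucs(auc1: float, se1: float, auc2: float, se2: float, r: float = 0.0):
    """z statistic and two-sided P-value for AUC2 vs AUC1.

    r is the (user-supplied) correlation between the two AUC estimates.
    """
    z = (auc2 - auc1) / math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * r * se1 * se2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # 2 * (1 - Phi(|z|))
    return z, p
```

Because nested models scored on the same patients produce positively correlated AUCs, ignoring `r` (setting it to 0) makes the test conservative.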

Finally, the pre-test probability, likelihood ratios, and post-test probability of the models derived from Approaches A and B were calculated to assess their efficacy in predicting surgical outcome (Deeks and Altman, 2004).

Results

At the end of enrolment, the study included 195 patients consecutively seen at the Division of Gynecologic Oncology of the Catholic University of Rome and Campobasso; patients' characteristics at initial diagnosis, surgical outcome, and final pathology are summarised in Table 1. Median age was 59 years (range: 31–85), with 69 (35.4%) patients aged ≥65 years; approximately 25% of patients had ECOG-PS=2. Overall, the rates of cytoreduction to no residual disease and to residual disease ≤1 cm were 27.2% and 16.9%, respectively, for an overall optimal cytoreduction rate of 44.1%. One hundred seventy-four patients (89.2%) were diagnosed with primary ovarian carcinoma, whereas 19 (9.7%) had metastatic tumours from other primary sites, which underscores the need for histological confirmation in every case. Because of the prospective nature of the study, results refer to the whole study population. One hundred forty-five patients (74.4%) had serum Ca125 levels >500 IU ml−1.

Table 1 Characteristics of the patients enrolled

The percentage of optimal cytoreduction did not differ over the enrolment period or across the six operating teams (data not shown).

Imaging findings

The features assessed on CT scan were prospectively recorded in the data form presented in Table 2, which also summarises the diagnostic performance of each radiographic parameter compared with laparotomic findings. The highest specificity was documented for infrarenal aortic lymph nodes and omental extension, whereas very low specificity was documented for peritoneal thickening and ascites. Overall, accuracy ranged from 40.2% (pelvic involvement) to 81.9% (involvement of infrarenal aortic lymph nodes).

Table 2 Accuracy, negative, and positive predictive value of computed tomography scan assessed parameters vs laparotomic findings

Approach A

According to the Bristow criteria, two radiographic parameters (omental extension and liver involvement) fulfilled the criteria for a point value of 1, whereas involvement of the diaphragm or bowel mesentery and, among clinical parameters, ECOG-PS obtained a point value of 2 (Table 3). A PI score was calculated for each patient on the basis of the absence or presence of these parameters; the frequency distribution of the predictive score in the overall series is presented in Figure 1. PI scores ranged from 0 to 6 (median=2) in Model 1 (without ECOG-PS data) and from 0 to 8 (median=2) in Model 2 (including ECOG-PS data). Sensitivity and specificity were calculated for each PI cutoff from 1 to the upper limit of each model, and ROC curve analysis was performed. The AUC was 0.78±0.035 in Model 1 and 0.81±0.031 in Model 2.
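The cutoff-by-cutoff ROC analysis can be sketched as follows. Two assumptions apply here: a patient counts as "predicted suboptimal" when PI ≥ c, and the AUC is taken as the trapezoidal area through the resulting points. The study does not state how its AUCs were computed, and the function names are illustrative.

```python
def roc_points(pi_scores, suboptimal, max_score):
    """ROC points (1 - specificity, sensitivity) for the rule 'PI >= c'.

    pi_scores[i] is the PI of patient i; suboptimal[i] is True if that
    patient had residual disease > 1 cm. Cutoffs run from max_score + 1
    (nobody positive) down to 0 (everybody positive).
    """
    n_pos = sum(1 for y in suboptimal if y)
    n_neg = len(suboptimal) - n_pos
    points = []
    for c in range(max_score + 1, -1, -1):
        tp = sum(1 for s, y in zip(pi_scores, suboptimal) if s >= c and y)
        fp = sum(1 for s, y in zip(pi_scores, suboptimal) if s >= c and not y)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear ROC curve through the given points."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
```

Evaluating every integer cutoff, tied PI values contribute half credit, so this trapezoidal area equals the Mann–Whitney estimate of the AUC for the same scores.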

Table 3 Prediction of optimal cytoreduction: computed tomography-based and clinically assessed parameters assigned a point value according to Bristow criteria (Approach A)
Figure 1

Distribution of predictive index values (A) and ROC curves (B) in Model 1 and Model 2.

The addition of ECOG-PS data improved diagnostic performance, as the difference between the AUCs of Model 1 and Model 2 was statistically significant (z=2.41, P-value <0.05).

Approach B

Univariate and multivariate analyses were carried out to assess the association of radiographic and clinical features with surgical outcome (Table 4). All radiographic parameters but one (pelvic involvement) were predictive of residual disease in univariate analysis and were therefore included in the multivariate analysis.

Table 4 Prediction of optimal cytoreduction: univariate and multivariate analysis by logistic regression of CT-based and clinically assessed parameters to use for modeling (Approach B)

Among clinical parameters, only ECOG-PS was associated with the extent of primary cytoreduction. In multivariate analysis, only involvement of the peritoneum, bowel mesentery, suprarenal aortic lymph nodes, and diaphragm, as well as ECOG-PS, maintained an independent association with surgical outcome, and each was assigned a point value of 1. The PI score therefore ranged from 0 (absence of all selected radiographic features) to 4 (presence of all selected radiographic features) in Model 3, and from 0 to 5 in Model 4, which included the selected radiographic variables plus ECOG-PS. Figure 2 shows the distribution of PI scores in Models 3 and 4. Sensitivity and specificity were calculated for each PI cutoff from 1 to the upper limit of each model, and ROC curve analysis was performed. The AUC was 0.78±0.034 in Model 3 and 0.82±0.031 in Model 4. Here too, the addition of ECOG-PS data yielded a significantly higher AUC for Model 4 (z=3.41, P-value <0.05).
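In Approach B every retained variable is worth one point, so the PI is simply a count of the selected features present. In this minimal sketch, the feature keys are hypothetical labels. It also assumes ECOG-PS enters as a single yes/no flag, because the cut point used for ECOG-PS is not restated in this section.

```python
# Hypothetical keys for the radiographic variables retained in multivariate analysis.
MODEL3_FEATURES = ("peritoneum", "bowel_mesentery", "suprarenal_nodes", "diaphragm")
# Assumption: ECOG-PS is represented by one binary flag (threshold not restated here).
MODEL4_FEATURES = MODEL3_FEATURES + ("ecog_ps_flag",)


def approach_b_score(patient: dict, features) -> int:
    """One point per selected feature present; missing keys count as absent."""
    return sum(1 for f in features if patient.get(f, False))
```

Features not retained in the multivariate model (for example liver involvement) are simply ignored, so Model 3 scores range from 0 to 4 and Model 4 scores from 0 to 5.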

Figure 2

Distribution of predictive index values (A) and ROC curves (B) in Model 3 and Model 4.

The pre-test probability, likelihood ratios, and post-test probability were calculated for Models 2 and 4 at different cutoff values (Table 5). The pre-test probability was 55.8% (109 patients had residual tumour >1 cm after the primary cytoreductive effort).

Table 5 Pre-test probability, likelihood ratio, and post-test probability for different predictive models of primary optimal cytoreduction in ovarian cancer

The post-test probability increased progressively with increasing cutoff values in both Models 2 and 4. In Model 2, 27 cases had a PI score >5, and 25 of them had suboptimal cytoreduction; the positive likelihood ratio was 9.86, and the post-test probability was 92.6%, an absolute increase of 36.8 percentage points over the pre-test probability.

Similarly, in Model 4, 28 cases had a PI score >3, and 26 of them had suboptimal cytoreduction; the positive likelihood ratio was 10.25, and the post-test probability was 92.8%, an absolute increase of 37.0 percentage points over the pre-test probability.
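The likelihood-ratio arithmetic can be checked directly: LR+ = sensitivity / (1 − specificity), and the post-test probability follows from Bayes' theorem on the odds scale. The sketch below uses only counts stated in the text: 109 suboptimal and 86 optimal cases, 25 of 27 and 26 of 28 positives suboptimally cytoreduced, leaving 2 false positives in each model. The function names are illustrative.

```python
def positive_lr(tp: int, fp: int, n_pos: int, n_neg: int) -> float:
    """LR+ = sensitivity / (1 - specificity) = (tp / n_pos) / (fp / n_neg)."""
    return (tp / n_pos) / (fp / n_neg)


def post_test_probability(pre_test_p: float, lr: float) -> float:
    """Bayes' theorem on the odds scale: post-test odds = pre-test odds * LR."""
    pre_odds = pre_test_p / (1.0 - pre_test_p)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


pre = 109 / 195                       # pre-test probability of suboptimal surgery
lr_model2 = positive_lr(25, 2, 109, 86)
lr_model4 = positive_lr(26, 2, 109, 86)
print(round(lr_model2, 2), round(post_test_probability(pre, lr_model2), 3))
print(round(lr_model4, 2), round(post_test_probability(pre, lr_model4), 3))
```

These values agree with the figures reported above to within rounding. At a fixed pre-test probability, the post-test probability after a positive result equals the PPV at that cutoff (25/27 and 26/28).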

For practical use of the PI score, the rate of inappropriate unexploration, which should tend to 0, can be calculated as the complement of the PPV (1 − PPV), whereas the proportion of unnecessary exploration corresponds to the complement of the NPV (1 − NPV). In our series, only Models 2 and 4 (both including radiographic features and ECOG-PS data) provided a PPV of 100% at PI cutoff values of 7 and 4, respectively (Tables 6 and 7).
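The two error rates can be computed from PI scores and outcomes as the complements of PPV and NPV. This minimal sketch assumes a hypothetical triage rule: "do not explore" when PI ≥ cutoff, otherwise explore. The function name and the example data in the test are illustrative, not study data.

```python
def exploration_error_rates(pi_scores, suboptimal, cutoff):
    """Return (inappropriate_unexploration, unnecessary_exploration) for a cutoff.

    Predicted unresectable (not explored): PI >= cutoff.
      inappropriate unexploration = 1 - PPV (would have been optimally cytoreduced)
    Predicted resectable (explored): PI < cutoff.
      unnecessary exploration     = 1 - NPV (suboptimal cytoreduction at laparotomy)
    A rate is NaN when no patient falls on that side of the cutoff.
    """
    pos = [y for s, y in zip(pi_scores, suboptimal) if s >= cutoff]
    neg = [y for s, y in zip(pi_scores, suboptimal) if s < cutoff]
    ppv = sum(pos) / len(pos) if pos else float("nan")
    npv = sum(1 for y in neg if not y) / len(neg) if neg else float("nan")
    return 1.0 - ppv, 1.0 - npv
```

Raising the cutoff drives the first rate towards 0 at the cost of exploring more patients who turn out to be unresectable, which is the trade-off discussed for Tables 6 and 7.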

Table 6 Performance of Approach A (Model 2) in defining the rate of patients unnecessarily explored or inappropriately unexplored
Table 7 Performance of Approach B (Model 4) in defining the rate of patients unnecessarily explored or inappropriately unexplored

Discussion

This is the first study to report results from a large, prospective trial investigating the role of CT scan-based evaluation in the preoperative prediction of optimal cytoreduction in advanced ovarian cancer. We developed predictive models using two different approaches (based either on the diagnostic performance of each CT-derived parameter or on the results of multivariate analysis), both of which also took significant clinical variables into account. In both approaches, ECOG-PS was the only clinical variable fulfilling the criteria for inclusion in the predictive models, whereas preoperative serum Ca125 levels and age showed a low degree of accuracy in predicting surgical outcome. The predictive models including ECOG-PS data were more accurate than those derived from CT alone; these findings confirm previously reported results (Bristow et al, 2000; Chi et al, 2000; Cooper et al, 2002; Saygili et al, 2002; Memarzadeh et al, 2003) and underscore the impact of ECOG-PS on the preoperative prediction of primary resectability in ovarian cancer (Aletti et al, 2007). The models based on diagnostic performance and on multivariate analysis showed similar accuracy in predicting the chance of optimal cytoreduction, although they included slightly different CT-based features: involvement of bowel mesentery, omentum, liver, and diaphragm fulfilled all the required criteria (Bristow et al, 2000) in Approach A, whereas in multivariate analysis involvement of the peritoneum and suprarenal aortic lymph nodes, in addition to bowel/mesentery and diaphragm disease, was independently associated with suboptimal cytoreduction. The divergence between the two approaches remains difficult to explain, although the close and, to some extent, unpredictable associations among the variables are more likely to affect the multivariate analysis.
In any case, our findings support the relevance of assessing bowel mesentery and diaphragm involvement, which are recognised among the most important features determining the feasibility of ovarian cancer cytoreduction (Bristow et al, 2000; Axtell et al, 2007). Although the two approaches yielded similar results, we believe that the approach based on the diagnostic performance of CT parameters plus ECOG-PS is easier to understand and apply, and we therefore recommend it for future studies and clinical use.

A direct comparison of our results with those reported in the literature is difficult: most previously published studies investigated relatively small series and, given their retrospective design, were likely affected by selection bias (Table 8). Moreover, the number of CT-assessed parameters used to predict surgical outcome varied widely across studies (Nelson et al, 1993; Forstner et al, 1995; Meyer et al, 1995; Bristow et al, 2000; Byrom et al, 2002; Dowdy et al, 2004; Qayyum et al, 2005; Axtell et al, 2007) and, more importantly, a CT-based cutoff score was investigated by only three institutions (Nelson et al, 1993; Forstner et al, 1995; Eisenkop and Spirtos, 2001). Nonetheless, there is general consensus that CT scan is a valid tool for the preoperative prediction of ovarian cancer resectability at primary surgery.

Table 8 Summary of the studies analysing the performance of computed tomography (CT) scan in the prediction of optimal cytoreduction in ovarian cancer

Some issues, however, deserve discussion. First, it could be argued that the percentage of optimal cytoreduction in our series was not close to the upper limit of the range reported in the literature (Nickles Fader and Rose, 2007); however, our patients were selected on the basis of clinical and radiographic features of very advanced disease, resulting in 91.3% FIGO stage IIIC/IV cases, whereas other series also included between 19 and 45% FIGO stage I/II patients (Nelson et al, 1993; Forstner et al, 1995; Meyer et al, 1995; Byrom et al, 2002; Qayyum et al, 2005). We recognise that the predictive performance of any test is expected to lose part of its potential advantage as the rate of optimal cytoreduction increases. Although this remains to be tested experimentally, it is plausible that differences in the rate of optimal cytoreduction across centres would shift the threshold of the PI cutoff value rather than call into question the overall PI-based approach. Moreover, it must be acknowledged that very high rates of optimal cytoreduction are hardly achievable outside a few highly committed institutions; in this context, a tool that can adapt to the most commonly achievable rates of optimal cytoreduction is clinically relevant (Wakabayashi et al, 2008).

We developed predictive models able to produce different PI values, thus providing the option to choose the most adequate cutoff on the basis of patient and disease characteristics (performance status, need for very extensive surgery) and the surgeon's commitment. Obviously, the predictive performance of any model varies with the chosen PI cutoff: for instance, had we used a PI value of 2 (see Table 6, Model 2), we would have obtained a rate of unnecessary exploration of 33.3% against a rate of inappropriate unexploration of 19.4%, meaning that almost one-fifth of our patients would have been deprived of the potential survival benefit of optimal cytoreductive surgery. In this context, using a PI that minimises the rate of patients erroneously judged to have unresectable disease is of the utmost importance, and matters more than avoiding unnecessary exploration of patients who turn out to have unresectable disease at laparotomy. As a practical rule, the rate of inappropriate unexploration can therefore be derived from our models as 1 − PPV, and is 0% at cutoff values of 7 and 4 in Models 2 and 4, respectively.
Finally, more sophisticated imaging approaches such as positron emission tomography/computed tomography (PET/CT) (Risum et al, 2008), as well as laparoscopic (LPS) scores (Fagotti et al, 2006, 2008; Brun et al, 2008), have recently been investigated for predicting surgical outcome in advanced ovarian cancer. Whereas PET/CT results currently seem too preliminary to draw any definitive conclusion, data from pilot and prospective studies have proposed LPS as a reliable and flexible predictive tool (Fagotti et al, 2006, 2008; Brun et al, 2008).
Although the accuracy of LPS in assessing specific sites of disease involvement is expected to be higher than that of CT scan (Fagotti et al, 2006, 2008; Brun et al, 2008), the clinical impact of triaging advanced ovarian cancer patients to laparotomy on the basis of LPS findings urgently needs to be investigated in controlled clinical trials.

In conclusion, we showed that CT scan still represents a valid tool for the preoperative prediction of ovarian cancer resectability at primary surgery, and that its predictive performance might be improved by including ECOG-PS data. As already acknowledged (Bristow et al, 2000; Cooper et al, 2002), a multi-institutional prospective trial, ideally integrating preoperative clinical and radiographic variables, is required to test whether the predictive models maintain their accuracy when applied to different patient cohorts. Indeed, a very recently published study by Gemer et al (2009) has underscored the difficulty of devising generally applicable models able to reliably predict surgical outcome in advanced ovarian cancer patients across different institutions. This issue may become clinically more relevant in light of the upcoming mature results of the EORTC 55971 trial.