Review

A review of a journal article created by a Journal Watch contributor

International multi-institutional external validation of preoperative risk scores for 30-day in-hospital mortality in paediatric patients

British Journal of Anaesthesia

Submitted March 2025 by Dr Andrew Hughes

Overview
This is an external validation study of two patient-specific paediatric preoperative risk scores for 30-day in-hospital mortality.

---

Methods
The Pediatric Risk Assessment (PRAm) score and the intrinsic surgical risk (ISR) score were selected by the investigators as the two patient-specific risk scores to be externally validated, as these scores had previously been identified as having a low risk of bias.

Both the PRAm score and the ISR score were developed using data from the American College of Surgeons (ACS) National Surgical Quality Improvement Program-Pediatric (NSQIP-P) database (from 2012-2013 and 2012-2016 respectively).

This study aimed to externally validate both scores using retrospective observational data from the Multicenter Perioperative Outcomes Group (MPOG) registry, collected from 56 hospitals in the USA and the Netherlands between 2015 and 2020. Cardiac and diagnostic imaging procedures were excluded. The primary outcome was 30-day in-hospital mortality. The 30-day in-hospital mortality for the MPOG dataset was 0.14%, substantially lower than in the derivation NSQIP-P datasets (0.4% and 0.34%).

The two risk scores were assessed for:
- Discrimination (ability to separate survivors from non-survivors)
- Calibration (how accurately predictions matched the observed 30-day in-hospital mortality rates)
- Clinical utility/decision analysis (the extent to which the risk scores actually affect clinical decisions)
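The three assessments above can be illustrated with a minimal sketch. This is not the investigators' analysis code: the function names, the toy data and the 5% threshold are assumptions made purely for illustration, and the study may have used different calibration measures. The sketch uses the area under the ROC curve for discrimination, an observed-to-expected ratio as one simple calibration measure, and net benefit as the quantity plotted in a decision curve.

```python
def auc(predicted, died):
    """Discrimination: chance that a randomly chosen non-survivor has a
    higher predicted risk than a randomly chosen survivor (ties count 0.5)."""
    pos = [p for p, y in zip(predicted, died) if y == 1]
    neg = [p for p, y in zip(predicted, died) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def observed_to_expected(predicted, died):
    """Calibration (overall): observed deaths divided by predicted deaths.
    A value of 1 means the total predicted risk matches the observed deaths."""
    return sum(died) / sum(predicted)


def net_benefit(predicted, died, threshold):
    """Decision curve analysis: net benefit of acting on every patient whose
    predicted risk is at or above the chosen threshold probability."""
    n = len(died)
    tp = sum(1 for p, y in zip(predicted, died) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(predicted, died) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy data (hypothetical, NOT from the study): six patients.
predicted = [0.01, 0.02, 0.05, 0.10, 0.40, 0.60]
died = [0, 0, 0, 1, 0, 1]

print("AUC:", auc(predicted, died))
print("Observed/expected:", observed_to_expected(predicted, died))
print("Net benefit at 5% threshold:", net_benefit(predicted, died, 0.05))
```

In a decision curve, net benefit is computed across a range of threshold probabilities and compared with the default strategies of acting on every patient or on none.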

---

Results
• The ISR score exhibited better discrimination and specificity than the PRAm score, but both scores produced large numbers of false-positive cases (i.e. predicted death but patient survived). The superior performance of the ISR score was largely due to its inclusion of the ASA Physical Status score. Both scores showed poorer discrimination than was reported in the original studies.

• Calibration metrics were deceptively favourable for both scores because the vast majority of cases had low probabilities of mortality (81.4% of cases were ASA PS 1 or 2; 30-day in-hospital mortality in the MPOG dataset was 0.14%, compared with 0.4% and 0.34% in the NSQIP-P datasets of the original studies). This meant that predicting that a patient would not die would be correct in the vast majority of cases. Both scores were poorly calibrated at higher probabilities of mortality (mortality was overestimated for these cases).

• Decision curve analysis showed that neither score was useful in supporting clinical decision-making, except at very low probabilities of mortality.
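The effect of a very rare outcome on these results can be seen with some simple arithmetic. In the sketch below, only the 0.14% mortality figure comes from the study; the sensitivity and specificity values are hypothetical, chosen for illustration, and are not the reported performance of either score.

```python
# 30-day in-hospital mortality in the MPOG dataset (from the study).
mortality = 0.0014

# A rule that predicts survival for every patient is correct whenever
# the patient survives, so its accuracy is simply 1 - mortality.
accuracy_always_survive = 1 - mortality

# Hypothetical operating point (an assumption, not a study result).
sensitivity = 0.80
specificity = 0.90


def positive_predictive_value(prevalence, sensitivity, specificity):
    """Proportion of patients flagged as high risk who actually die."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


ppv = positive_predictive_value(mortality, sensitivity, specificity)

print(f"Accuracy of always predicting survival: {accuracy_always_survive:.2%}")
print(f"Positive predictive value at the assumed operating point: {ppv:.2%}")
```

Under these assumed values, only about 1 in 90 patients flagged as high risk would die, which shows how a score with apparently reasonable sensitivity and specificity can still generate mostly false positives when the outcome is this rare.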

---

Discussion
Risk scores can be clinically useful if they quantify risk better than clinician judgement alone. For example, they can help to guide informed perioperative decision-making and communication with patients and families, or identify high-risk patients who may benefit from the allocation of additional resources.

Risk scores are well established in the adult preoperative setting. Several preoperative risk scores are available to predict perioperative mortality in children undergoing noncardiac surgery, but none have undergone multicentre external validation.

External validation is the testing of a risk score on a different set of patients to determine whether the score performs to a satisfactory degree. Risk scores should not be recommended for general clinical use without satisfactory external validation.

This external validation study found that the PRAm and ISR scores performed poorly when compared with their performance in the original internal validation studies. This may be at least partially explained by:

• important case mix differences between the NSQIP-P and MPOG datasets
• difficulties in reproducing all of the data elements used in the original studies: importantly, the investigators could not recreate the measure of intrinsic surgical risk used in the ISR score because of differences in data nomenclature. As a result, only the patient-specific variables of the ISR score were used in this analysis.

Interestingly, the superior performance of the ISR score in sicker patients was largely driven by a subjective clinical assessment (the ASA Physical Status score), suggesting that the score adds little beyond clinical judgement in higher-risk patients.

---

Take home points
The overall performance in this study of the PRAm score and the ISR score, which were assessed as the best available risk scores, suggests that they are unlikely to be clinically useful in predicting the rare outcome of 30-day mortality in a diverse paediatric noncardiac surgery population.

This study also calls into question the role of preoperative risk prediction scores in modelling the rare outcome of 30-day mortality in paediatric patients.
