REVIEW ARTICLE. REVISTA DE LA FACULTAD DE MEDICINA HUMANA 2022 - Universidad Ricardo Palma
1Medical-Surgical Research Center, Future Surgeons Chapter, Colombian Association of Surgery, Bogotá, Colombia
2Prometheus Group and Biomedicine Applied to Clinical Sciences, Faculty of Medicine, University of Cartagena, Cartagena, Colombia
3Faculty of Medicine, Universidad Santiago de Cali, Cali, Colombia
4Faculty of Medicine, Universidad Libre, Barranquilla, Colombia
5Faculty of Medicine, Universidad del Rosario, Bogotá, Colombia
6Faculty of Medicine, National University of Colombia, Colombia
7Faculty of Medicine, Cooperative University of Colombia, Medellín, Colombia
8Faculty of Medicine, Autonomous University of Bucaramanga, Bucaramanga, Colombia
9Faculty of Medicine, University of Manizales, Manizales, Colombia
10Faculty of Medicine, Juan N. Corpas University Foundation, Bogotá, Colombia
Introduction: Critical syndromes carry a high global burden of disease. Scoring systems are practical, reproducible aids that allow patients with more severe disease to be identified quickly, admitted to intensive care, and started on structured, aggressive therapy. The Sequential Organ Failure Assessment (SOFA) score is one of the most widely used worldwide because it is simple and exists in several versions. However, with the emergence of COVID-19, several studies showed racial disparities in the estimation of mortality and associated outcomes, culminating in excess preventable mortality in certain racial groups. The performance of these scoring systems must be evaluated continually, because definition updates can alter their predictive accuracy. There is a large gap in the evidence, since existing studies come from high-income countries where the predominant racial group is Caucasian, which underscores the magnitude of the problem. Based on the above, the objective of this review is to discuss the evidence on the performance of scoring systems in critical care, particularly SOFA, and the impact of race on its predictive value.
Keywords: Organ Dysfunction Scores; Continental Population Groups; Predictive Value of Tests; Critical Care Outcomes. (Source: MeSH NLM).
The COVID-19 pandemic made evident the need to validate tools and conduct large-scale critical care research in regions that lack high-quality primary data(1). Disease burden studies show that in 2004 there were 58,772 deaths globally, with critical syndromes such as sepsis and acute lung injury, together with conditions requiring invasive mechanical ventilation, being the most frequent and carrying the greatest burden of disease worldwide (approximately 45,000 / 58,772 reported cases)(2).
Intensive care unit (ICU) bed availability per 100,000 inhabitants remains a global health challenge. While high-income countries such as the United States, Germany, and Canada have on average between 15 and 20 beds per 100,000 inhabitants, low- and middle-income countries have on average ≤ 5 beds for the same population(2). Nearly 20 years ago, promoting strategies to improve disease burden indicators was already identified as a global health objective, especially given the political, economic, and health limitations faced by a large part of the world's population(3-6).
Recently, a meta-analysis conducted by the Department of Integrated Health Services of the World Health Organization showed that the global incidence of sepsis with multi-organ dysfunction remains in ranges similar to those reported many years ago (9,300 cases per 100,000 inhabitants), and that the incidence of hospital- and ICU-acquired sepsis remains worrying, ranging between 25% and 50%(7). Reported mortality currently exceeds 50%. Notably, this meta-analysis included 51 studies, approximately half of them from low- and middle-income countries, which is of great concern given the barriers described above(7). Preventable deaths in critical care are a topic of active discussion, and different tools have been proposed to help prioritize or re-stratify, in a timely manner, patients who require more support to survive(6,8,9).
Scoring systems are practical, reproducible aids that allow rapid identification of patients with more severe disease, admission to the ICU, and initiation of structured, aggressive therapy(10). Many of these scales have been proposed and validated over the years, such as SOFA (Sequential Organ Failure Assessment)(11), APACHE (Acute Physiology and Chronic Health Evaluation)(12), SAPS (Simplified Acute Physiology Score)(12), and MPM0 (Admission Mortality Probability Model)(13), among others(8,9).
Various studies have evaluated the performance of these scales in different settings, adapting them with new variables according to the characteristics of the population in which they are used, and new versions have emerged since their creation(10). This is fundamental: because these scales are used to characterize the critically ill patient quickly, their prognostic value must be accurate regardless of the population group evaluated. Meta-research studies have highlighted the heterogeneity of some studies and its implications for real-world practice(14).
During the COVID-19 pandemic, the performance of these scales varied considerably, which undermined the validity of the evidence(15-17). In particular, during 2021, many studies suggested racial disparities in the accuracy of SOFA (one of the most widely used scales and one with the best predictive value), with a substantial impact on medical decision-making in the ICU and hospital wards for patients with the severe COVID-19 phenotype who required invasive mechanical ventilation, scarce drugs, and close monitoring, among other interventions(15-17), all of which were very limited at the time.
Given the current global burden of disease caused by critical clinical syndromes, the need for meta-research that discusses the strengths and weaknesses of the evidence, and the need to identify possible biases in the prognostic tools used to manage critically ill patients, the objective of this review is to summarize the evidence on the performance of scales used in critical care, particularly SOFA, describing the outcomes obtained over time, and to discuss the dilemma of race when using this instrument.
A bibliographic search was carried out using search terms such as "SOFA", "Score", "Critical Care", and "Race", as well as their synonyms, combined with the Boolean operators "AND" and "OR", in the PubMed, ScienceDirect, Embase, EBSCO, and MEDLINE search engines and databases. Articles were included if they evaluated the SOFA scale in patients and reported performance by race or other relevant subgroups, with priority given to original studies, systematic reviews, and meta-analyses. Articles on other critical care scores with predictive value for mortality and associated outcomes were also included.
In addition, articles had to be available in full text. Articles published in languages other than Spanish and English were excluded. Given the breadth of the topic and the great variety of publications, articles published between 2000 and 2022 were included. A total of 242 potentially relevant articles were identified; the titles and abstracts of all of them were reviewed, and 57 articles were finally included after screening against the inclusion and non-inclusion criteria. Other useful references were included for the discussion of general concepts. The estimates and calculations found were reported in their original measures: frequencies, percentages, confidence intervals (CI), mean difference (MD), relative risk (RR), odds ratio (OR), incidence rate ratio (IRR), or hazard ratio (HR).
Predictive scoring systems for outcomes in severe disease
The APACHE score and its first versions date back to the 1980s(18). This system comprises variables such as age, temperature, mean arterial pressure, heart and respiratory rates, creatinine, and the Glasgow Coma Scale, among others. The total can range from 0 to 59, with the lowest band at 0-4 points and the highest at >34 points. Mortality in the lowest band ranges between 1% and 4% across postoperative and non-operative patients(19), whereas patients in the highest band have mortality above 85%, an important value for decision-making. These figures refer to the prognostic value of the second version of the score (APACHE II)(19).
In 1991, Knaus et al.(18) conducted a multicenter prospective study of more than 17,000 patients in the United States to improve the prognostic accuracy of APACHE, resulting in a new version (APACHE III). The authors showed that each 5-point increase in APACHE III was independently associated with an increased risk of death (OR 1.10) in each of 78 disease categories. The equation also explained 90% of the variation in observed mortality rates (r² = 0.90; p < 0.0001)(18). In the 2000s, numerous studies tried to reproduce and validate the performance of these criteria, adding more and more variables according to patient subgroups and the cause of the critical clinical syndrome(19-21).
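To make the per-5-point OR concrete: if one assumes (a simplification not stated by the authors) that the log-odds of death rise linearly with the APACHE III score, an OR of 1.10 per 5 points compounds multiplicatively over larger score differences. A short worked example:

$$\frac{\text{odds}(s+\Delta)}{\text{odds}(s)} = 1.10^{\Delta/5}, \qquad \Delta = 15 \ \Rightarrow\ 1.10^{3} = 1.331$$

Under this assumption, a 15-point difference would multiply the odds (not the probability) of death by about 1.33.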
Ho et al.(19) conducted a retrospective cohort study during that period, evaluating more than 11,000 ICU admissions after non-cardiovascular surgery, with the score calculated at admission and reassessed at 24 hours. The mean scores were 12 and 15, respectively, and the predicted mortality was 15% and 19%, respectively, compared with an observed mortality of 16%. The area under the receiver operating characteristic curve (AUC) was 83.8% and 84.6% for the two time points, with no significant difference (p = 1.0)(19). This suggested that the score performed adequately from its first calculation, regardless of whether the patient's condition worsened in the 24 hours after admission.
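The AUC values reported throughout this review summarize discrimination: the probability that a randomly selected patient who died had a higher score than a randomly selected survivor, with ties counted as one half (the Mann-Whitney interpretation of the AUC). A minimal Python sketch of that interpretation follows; the function name and the scores are hypothetical and are not data from any cited study.

```python
from itertools import product


def auc_mann_whitney(scores_died, scores_survived):
    """Probability that a non-survivor scores higher than a survivor.

    Every (non-survivor, survivor) pair is compared; ties count as 0.5.
    """
    if not scores_died or not scores_survived:
        raise ValueError("both groups must contain at least one patient")
    wins = 0.0
    for died, survived in product(scores_died, scores_survived):
        if died > survived:
            wins += 1.0
        elif died == survived:
            wins += 0.5
    return wins / (len(scores_died) * len(scores_survived))


# Hypothetical admission scores (not from any cited study)
died = [10, 8, 6]
survived = [4, 6, 2]
print(round(auc_mann_whitney(died, survived), 3))  # 0.944
```

The pairwise loop is quadratic in cohort size and is meant only to show what the number means; an AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation.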
In 2006, Zimmerman et al. published two very large cohort studies(20,21) that included more than 130,000 ICU admissions from 45 hospitals in the United States, giving rise to the fourth version of the score (APACHE IV). This version added new variables, mainly related to chronic health status and ICU admission diagnosis. The authors showed adequate validation (AUC 0.88), with no significant difference between observed and predicted mortality in 90% of cases(20). Regarding ICU length of stay, the second study found that the predicted value agreed closely with the observed value (3.78 vs. 3.86 days; p < 0.001), with no significant differences in 93% of the diagnoses(21).
APACHE IV was concluded to be clinically useful for estimating ICU length of stay and mortality risk in critically ill patients(20,21). However, it was emphasized that, because of its large number of categories, the score is dynamic and individualized and should be interpreted judiciously. Recent studies continue to evaluate the performance of this latest version, highlighting factors that can significantly influence its predictive value depending on the subgroup evaluated.
For example, Xu et al.(22) evaluated the score in post-transplant patients and found that age, hormone use, and the presence of respiratory failure must be considered, since they modify the association between the score and actual mortality(22). Xiao et al.(23) performed a secondary analysis of the eICU Collaborative Research Database and showed that baseline platelet count was negatively associated with all-cause mortality, both in hospital (RR 0.87; 95% CI, 0.84-0.91) and in the ICU (RR 0.87; 95% CI, 0.83-0.92)(23). Other authors have shown that the clinical course and the timing of treatment influence the predictive value of critical care scores (by 5% for mortality prediction and by up to 4 hours for length of hospital stay)(24). Procalcitonin has also been described as a biomarker that influences, although not significantly, the prediction of survival at different time points, and it should be considered carefully in patients with involvement of the organs where this peptide is produced and released(25). APACHE is still used in critical care today and has generally shown adequate performance, although it is individualized, has many categories to evaluate, and new variables that can affect its predictive value continue to be identified.
The study that developed and validated the SAPS II score was published in 1993(26). The score comprises 17 variables: 12 physiological variables, age, type of admission, and 3 underlying diseases (metastatic cancer, hematologic malignancy, and acquired immunodeficiency syndrome). The study included more than 13,000 patients and obtained an AUC of 0.88 in the development sample and 0.86 in the validation sample(26). Like APACHE, SAPS II has been replicated throughout the world, and studies continue to report factors that modify its performance.
Recently, Clostridioides difficile colonization was shown to increase the SAPS II score, although it did not modify mortality or the frequency and/or severity of diarrhea(27). Another study found that, in trauma patients stratified by SAPS II, homocysteine, D-dimer, and procalcitonin were associated with greater disease severity and therefore higher mortality risk (p < 0.05), and were considered independent factors of unfavorable prognosis(28). Likewise, extracorporeal membrane oxygenation (ECMO) has been observed to be an additional factor that increases the risk of death in patients with moderate or high SAPS II scores(29). SAPS II has also been found to be a good predictor of outcomes in difficult-to-access tracheostomized patients admitted to specialized rehabilitation units(30).
The third version of this score (SAPS III) was used in the management of critically ill patients with COVID-19; in diabetic and non-diabetic Austrian patients, its performance was inadequate because it predicted mortality inaccurately(31). Attempts to recalibrate it were unsuccessful, so it is suggested that it be used with caution, or not at all, in this population(31). SAPS may be the least used of these scores and the one most influenced by different factors, despite comprising relatively few categories.
The MPM0 score comprises 15 variables, which cover the timing of evaluation relative to admission, medical history and laboratory findings, clinical status, and treatment received. Like the scores described above, it was created in the 1980s and initially validated in a multicenter setting in the 1990s(32). The third version of this score (MPM0-III) is currently used. Studies such as that by Higgins et al.(32) have prospectively evaluated the performance of this version in more than 55,000 patients, finding 7,456 predicted deaths, which agreed closely with the 7,331 observed deaths (13.2%), for a standardized mortality ratio of 0.983 (95% CI, 0.963-1.001)(32). At that time, it was concluded that the model was robust and had optimal external validity in the American population, from which the primary data were obtained.
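The standardized mortality ratio reported for MPM0-III is simply observed deaths divided by the deaths the model predicted; values near 1 indicate good overall calibration. A minimal Python sketch reproducing the point estimate from the figures above (the function name is hypothetical, and the confidence interval is not reproduced because the method used to compute it is not described here):

```python
def standardized_mortality_ratio(observed_deaths, predicted_deaths):
    """Observed-to-predicted deaths; < 1 means fewer deaths than the model predicted."""
    if predicted_deaths <= 0:
        raise ValueError("predicted_deaths must be positive")
    return observed_deaths / predicted_deaths


# Figures reported for MPM0-III by Higgins et al.
smr = standardized_mortality_ratio(7331, 7456)
print(f"{smr:.3f}")  # 0.983
```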
Within the last 10 years, new parameters have been identified that influence the performance of this and other scores and must be taken into account when establishing prognosis in critical care. Ho et al.(33) conducted a cohort study in Australia that evaluated the impact of the anion gap and other similar laboratory markers on mortality in 6,878 individuals(33). Overall, 13.4% died (n = 924), and acid-base markers differed between survivors and non-survivors. The anion gap combined with lactate had an AUC of 0.631, whereas the anion gap alone had an AUC of 0.521. When the mortality predicted by MPM0-III was adjusted for these markers, arterial lactate remained associated with the variability in mortality, whereas the anion gap lost its predictive value(33). Furthermore, in low- and middle-income countries, there have been reports questioning the accuracy of this score, making it necessary to adapt it to local limitations, as in the case of the R-MPM score in Rwanda(34).
However, studies comparing APACHE, SAPS, and MPM0 have found that, although all three are well calibrated, APACHE IV is superior in predicting mortality (AUC 0.745), while SAPS and MPM0 have AUCs of 0.700 and 0.670, respectively(35). Chen et al.(36) compared eight different scores for the prediction of overall and 28-day mortality and found that APACHE III outperformed the rest for both outcomes, with an AUC of 0.817(36). Another study compared APACHE II, SAPS III, and MPM0-III in 9,549 patients, of whom 1,276 died(37). Although APACHE II had a higher AUC (0.845) than SAPS III (0.836) and MPM0-III (0.807), SAPS III showed better calibration and overall performance (calibration slope 1.03, R 0.297) than the other two(37). Taken together, of the three scores studied, APACHE performs best regardless of subgroup, with some variation in calibration according to certain parameters but a consistently adequate AUC, whereas MPM0 is less reliable and depends on many variable adjustments.
The SOFA score was designed at a consensus meeting in 1994 and comprises clinical and laboratory criteria that allow the presence of multi-organ dysfunction to be assessed. It has been closely associated with infection and sepsis, the diagnoses in which the most robust studies have been conducted to date(38-43). Retrospective analyses of more than 180,000 patients have shown discrimination for hospital mortality with an AUC of 0.753 (99% CI, 0.750-0.757), outperforming abbreviated versions and systemic inflammatory response criteria(38). Moreover, the prognostic estimate held across multiple sensitivity analyses for in-hospital mortality, which may reflect adequate external validity.
The broad applicability of this score, supported by internationally recognized critical care scientific societies, is currently under discussion. It has been proposed as a reference marker of efficacy in clinical trials because variations between studies, and relative to other parameters, are minimal(39). Liu et al.(40) evaluated 1,865 patients in China with sepsis admitted to the ICU, in whom serum lactate, SOFA, and quick SOFA (qSOFA) were measured to compare their predictive value for mortality. SOFA had a higher AUC (0.686; 95% CI, 0.661-0.710) than serum lactate (AUC 0.664; 95% CI, 0.639-0.689) and qSOFA (AUC 0.547; 95% CI, 0.521-0.574)(40). In acute decompensated heart failure, SOFA is significantly associated with overall and 30-day mortality, with AUCs of 0.765 (95% CI, 0.733-0.798) and 0.706 (95% CI, 0.676-0.736), respectively(41). In patients with suspected infection evaluated in the emergency department, a SOFA score ≥2 is independently associated with mortality up to 2 years later (HR 1.90; 1.83-1.98)(42). Similar accuracy has been reported in other studies with representative samples from various parts of the world(43).
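SOFA assigns 0 to 4 points to each of six organ systems (respiratory, coagulation, liver, cardiovascular, central nervous system, and renal), for a total of 0 to 24. The minimal Python sketch below encodes only two subscores, using the thresholds of the original SOFA description: platelet count, and serum creatinine alone for the renal item (the urine-output criterion is deliberately omitted). The other four subscores are assumed to be already computed. The `threshold=2` default mirrors the SOFA ≥2 cut-off discussed above, with a baseline of 0 assumed when no preexisting organ dysfunction is known, as in the Sepsis-3 convention. All function names and the example patient are hypothetical; this is an illustration, not a validated clinical calculator.

```python
from typing import Dict

# Six SOFA organ systems; each contributes 0-4 points (total 0-24).
ORGANS = ("respiratory", "coagulation", "liver", "cardiovascular", "cns", "renal")


def coagulation_subscore(platelets_k_per_ul: float) -> int:
    """Coagulation points from platelet count (x10^3 per microliter)."""
    if platelets_k_per_ul < 20:
        return 4
    if platelets_k_per_ul < 50:
        return 3
    if platelets_k_per_ul < 100:
        return 2
    if platelets_k_per_ul < 150:
        return 1
    return 0


def renal_subscore_from_creatinine(creatinine_mg_dl: float) -> int:
    """Renal points from serum creatinine (mg/dL) only.

    Simplification: the urine-output criteria of the renal item are omitted.
    """
    if creatinine_mg_dl >= 5.0:
        return 4
    if creatinine_mg_dl >= 3.5:
        return 3
    if creatinine_mg_dl >= 2.0:
        return 2
    if creatinine_mg_dl >= 1.2:
        return 1
    return 0


def sofa_total(subscores: Dict[str, int]) -> int:
    """Sum the six organ subscores after checking each is present and in 0-4."""
    for organ in ORGANS:
        points = subscores[organ]  # raises KeyError if an organ system is missing
        if not 0 <= points <= 4:
            raise ValueError(f"{organ} subscore out of range: {points}")
    return sum(subscores[organ] for organ in ORGANS)


def sofa_increase_at_least(total: int, baseline: int = 0, threshold: int = 2) -> bool:
    """True if the score rose by at least `threshold` points from `baseline`."""
    return total - baseline >= threshold


# Hypothetical patient: respiratory, liver, cardiovascular and CNS points
# are assumed to have been scored already.
patient = {
    "respiratory": 2,
    "coagulation": coagulation_subscore(90),  # 2 points
    "liver": 0,
    "cardiovascular": 1,
    "cns": 0,
    "renal": renal_subscore_from_creatinine(1.5),  # 1 point
}
total = sofa_total(patient)
print(total, sofa_increase_at_least(total))  # 6 True
```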
These results suggest that SOFA is probably the most accurate and widely used score globally today. However, because its calculation requires laboratory tests, timely prediction may be difficult in low- and middle-income countries, where specialized centers are scarce and dedicated resources are not available in every department or treatment unit.
SOFA performance over time and the dilemma of race
During the COVID-19 pandemic, because so much about the management of these patients was unknown, various analyses revealed discrepancies and correlations not previously studied in depth, which have the potential to modify the performance of critical care scores such as SOFA. This scoring system has been studied extensively since the early 2000s and has held up as a simple but effective predictor of mortality(44).
As new definitions appear, the performance of scores must be revalidated. In 2017, following the third international consensus definition of sepsis (Sepsis-3), Matics & Sanchez-Pinto(45) published the adaptation and validation of this system in the pediatric population through the evaluation of 8,711 events, with a mortality of 2.6% and an AUC of 0.94 (95% CI, 0.92-0.95)(45). Other studies, such as those by Pawar et al.(46) and Arakawa et al.(47), examined the variability of SOFA in different infectious states and in disseminated intravascular coagulation (DIC), respectively. The former showed that the AUC ranged between 0.59 (95% CI, 0.49-0.70) and 0.79 (95% CI, 0.69-0.90) for cases with endocarditis and isolated bacteremia(46). The latter developed the SOFACOMB and found that, in DIC, its AUC at 2, 4, and 7 days was significantly higher than that of the original version (p < 0.002)(47). In COVID-19 patients, SOFA was used to predict the development of a severe phenotype (AUC 0.908; 95% CI, 0.857-0.960) and overall and 60-day mortality (AUC 0.995; 95% CI, 0.985-1.000)(48). In octogenarians, accuracy varies with factors such as the day on which the score is evaluated, neurological failure, and polypharmacy on admission, yielding AUCs between 0.71 and 0.91(49). Accordingly, particularly as new definitions appear and new parameters are incorporated into their use, the variability of the scores must be reassessed by stratifying patients into subgroups(50,51).
A notable finding during the management of COVID-19 patients was the impact of race on general and specific outcomes and on the predictive value of SOFA(52-54). In early 2021, Rodriguez et al.(52) analyzed the American Heart Association COVID-19 cardiovascular disease registry, including 7,868 hospitalized patients. The study group was heterogeneous: Hispanic (33%), non-Hispanic Afro-descendant (25.5%), Asian (6.3%), and non-Hispanic Caucasian (35.2%). Overall mortality was 18%, and 53% of deaths were concentrated among Hispanic and Afro-descendant patients. After adjustment, these racial groups bore the greatest burden of morbidity and mortality, although Asian patients more frequently presented severe cardiopulmonary disease due to COVID-19(52).
Gershengorn et al.(53) conducted a two-center retrospective cohort study of 1,127 patients in the United States during the same period, also with some ethnic heterogeneity (63.1% Caucasian, 28.7% Afro-descendant; 54.2% of the total were Hispanic). Taking the Caucasian group as the reference, the authors found no association between race and variability in mortality prediction (Afro-descendant, IRR 1.00; 95% CI, 0.89-1.12; Asian, IRR 0.95; 95% CI, 0.62-1.45; multiracial, IRR 0.93; 95% CI, 0.72-1.19)(53). Ashana et al.(54) also evaluated this phenomenon in an analysis of more than 113,000 patients, 24.4% of whom were Afro-descendants. When SOFA was compared with the Laboratory-based Acute Physiology Score (LAPS2), the latter discriminated outcomes more accurately in Afro-descendant patients (AUC 0.76; 95% CI, 0.76-0.77 vs. AUC 0.68; 95% CI, 0.68-0.69). LAPS2 was also better calibrated in predicting mortality in both racial groups. When the creatinine item was removed from SOFA, miscalibration was reduced(54). The authors concluded that more equitable scores need to be developed; moreover, the existing gap is easier to observe in larger samples. It can therefore be inferred that in countries where this racial group predominates, the disparities associated with this score would be even greater.
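Miscalibration of the kind reported by race can be summarized, in its simplest form, as a per-subgroup ratio of observed deaths to the deaths expected from a score's predicted probabilities: a ratio above 1 means the score under-predicted deaths in that group, and below 1 means it over-predicted them. The minimal Python sketch below assumes each patient already has a predicted mortality probability; the function name, group labels, and data are hypothetical, and it is not presented as the method used in the cited studies.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def observed_to_expected_by_group(
    records: Iterable[Tuple[str, float, bool]],
) -> Dict[str, float]:
    """Observed deaths divided by the sum of predicted death probabilities, per group."""
    observed: Dict[str, float] = defaultdict(float)
    expected: Dict[str, float] = defaultdict(float)
    for group, predicted_probability, died in records:
        if not 0.0 <= predicted_probability <= 1.0:
            raise ValueError("predicted probability must lie in [0, 1]")
        observed[group] += 1.0 if died else 0.0
        expected[group] += predicted_probability
    # Skip groups whose expected deaths are zero to avoid division by zero.
    return {g: observed[g] / expected[g] for g in expected if expected[g] > 0}


# Hypothetical cohort: (group label, predicted mortality, died?)
records = (
    [("A", 0.25, False)] * 3 + [("A", 0.25, True)]
    + [("B", 0.5, False)] * 3 + [("B", 0.5, True)]
)
print(observed_to_expected_by_group(records))  # {'A': 1.0, 'B': 0.5}
```

In this toy example the score is well calibrated for group A but over-predicts deaths for group B by a factor of two, which is the kind of subgroup gap that a single pooled calibration measure can hide.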
Other authors have recently shown that disparities persist in ventilator allocation and care prioritization under crisis standards of care, which has resulted in excess preventable mortality (up to 43.9% among Afro-descendants)(55-57). Together with the underestimation of risk that can occur in certain clinical contexts, this can increase morbidity, mortality, or disability among racial groups, mainly Afro-descendants. It is therefore necessary to reevaluate existing scores or design new ones with greater precision, taking into account the issues discussed regarding race and the different clinical-pathological contexts.
Numerous scoring systems currently have useful predictive value for mortality and associated outcomes in critical care, and they should be used judiciously according to the clinical context. The SOFA score has stood out for its satisfactory performance over the years and has been adapted and validated for pediatric patients and in an abbreviated quick version (qSOFA). However, it appears to be affected by race, underestimating mortality mainly in Afro-descendant patients, which may result in excess preventable mortality.
Authorship contributions: All authors participated in the research through the development of the project, the collection and analysis of information, and the preparation of the manuscript.
Funding sources: Self-funded.
Conflicts of interest: The authors declare no conflicts of interest.
Received: June 28, 2022
Approved: August 16, 2022
Correspondence: Ivan David Lozada Martinez.
Address: Prometheus Group and Biomedicine Applied to Clinical Sciences, Faculty of Medicine, University of Cartagena, Cartagena, Colombia.
Article published by the Journal of the Faculty of Human Medicine of the Ricardo Palma University. It is an open access article, distributed under the terms of the Creative Commons license: Creative Commons Attribution 4.0 International, CC BY 4.0 (https://creativecommons.org/licenses/by/1.0/), that allows non-commercial use, distribution, and reproduction in any medium, provided that the original work is duly cited. For commercial use, please contact firstname.lastname@example.org.