Patient-reported outcome measures used in patients with primary sclerosing cholangitis: a systematic review

Background Primary Sclerosing Cholangitis (PSC) is a rare, chronic, cholestatic liver condition in which patients can experience a range of debilitating symptoms. Patient reported outcome measures (PROMs) could provide valuable insight into the impact of PSC on patient quality of life and symptoms. A previous review has been conducted on the quality of life instruments used in liver transplant recipients. However, to date there has been no comprehensive review evaluating PROM use or measurement properties in PSC patients. The aims of this systematic review were to: (a) identify and categorise which PROMs are currently being used in research involving the PSC population; and (b) investigate the measurement properties of PROMs used in PSC.
Methods A systematic review of Medline, EMBASE and CINAHL, from inception to February 2018, was undertaken. The methodological quality of included studies was assessed using the Consensus-based Standards for selection of health Measurement Instruments (COSMIN) checklist.
Results Thirty-seven studies were identified, which included 36 different PROMs. Seven PROMs were generic, 10 were disease-specific, 17 were symptom-specific and 2 were measures of dietary intake. The most common PROMs were the Short Form-36 (SF-36) (n = 15) and the Chronic Liver Disease Questionnaire (CLDQ) (n = 6). Only three studies evaluated measurement properties: two studies evaluated the National Institute of Diabetes Digestive and Kidney Diseases Liver Transplant (NIDDK-QA) and one study evaluated the PSC PRO. According to the COSMIN guidelines, methodological quality was poor for the NIDDK-QA studies and fair for the PSC PRO study.
Conclusion A wide variety of PROMs have been used to assess health-related quality of life and symptom burden in patients with PSC; however, only two measures (NIDDK-QA and PSC PRO) have been formally validated in this population. The newly developed PSC PRO requires further validation in PSC patients with diverse demographics, comorbidities and at different stages of disease; however, it is a promising new measure with which to assess the impact of PSC on patient quality of life and symptoms.
Electronic supplementary material The online version of this article (10.1186/s12955-018-0951-6) contains supplementary material, which is available to authorized users.


Background
Primary Sclerosing Cholangitis (PSC) is a chronic, cholestatic liver condition that results in inflammation and fibrosis that can involve the entire biliary tree [1]. PSC is a progressive disorder and can lead to cirrhosis, portal hypertension and liver failure [1].
Approximately 1 in 100,000 people in the general population is affected with PSC per year in Europe and the United States [2]. The disease occurs at any age, but is more prevalent in adults between the ages of 30-60 years and is more common in men than in women. Approximately 70-80% of patients with PSC have an associated inflammatory bowel disease (IBD) such as ulcerative colitis or Crohn's disease [3]. Currently, there is no known licensed medication to prevent the progression of PSC, which if left untreated can result in increasing disability and even death [4]. In patients with end-stage PSC liver disease, the only therapeutic option currently available is a liver transplant [4].
Although overall disease progression can be slow, patients with PSC can experience a range of debilitating symptoms. In the early stage of the disease, symptoms include tiredness or fatigue. In more advanced cases, symptoms include pruritus, jaundice, abdominal pain, weight loss, fevers, hyperpigmentation, vitamin deficiencies and metabolic bone disease [5]; all of which can have a significant impact on health-related quality of life (HRQOL) [6,7].
Increasingly in chronic diseases and terminal illness, it is recognised that maintaining HRQOL is an important consideration when the treatment is aimed at maintenance rather than a cure, or the treatment has a high level of toxicity [8]. Many of the current therapeutic interventions in PSC are aimed at managing symptoms. Measuring the impact of these interventions and preserving HRQOL is an important aspect of PSC care. This requires patient reported outcome measures (PROMs) that are sensitive enough to capture changes in HRQOL or symptoms over time.
Increasingly, PROM use has demonstrated a positive contribution to clinical practice and research [9]. In clinical practice, aggregate level PROM data can help us to understand the burden of chronic medical conditions, identify health inequalities [10] and determine new areas for therapeutic interventions. They can also play a key role in benchmarking and audit [11]. At an individual patient level, PROMs can be used to monitor the response, adverse effects and benefits of treatments in routine practice [12], facilitating communication between clinicians and patients regarding their HRQOL, symptom management and control [13][14][15].
A previous review investigating the quality of life (QOL) instruments used in liver transplant recipients has been conducted [16]. However, to date, no comprehensive review of PROMs used in PSC patients has been undertaken. There is a clear need to evaluate the measurement properties of the PROMs currently used in this population to determine the optimal measures for use in future research and routine care. Therefore the objectives of this systematic review were to: (a) identify and categorise PROMs currently used in research involving the PSC population; and (b) investigate their measurement properties, to help inform the selection of PROMs for use in future PSC research and routine practice.

Methods
The following guidelines were used, where applicable, to inform the conduct and reporting of this study: (i) the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [17] guidance (see Additional file 1 for the PRISMA checklist), (ii) COnsensus based Standards for the selection of health Measurement INstruments (COSMIN) guidance [18] and (iii) the updated method guidelines for systematic reviews in the Cochrane Collaboration Back Review Group [19]. The study was registered with PROSPERO (Registration Number: CRD42016036544).

Search strategy
A systematic search was conducted on the following electronic databases: Medline, EMBASE and CINAHL from inception to 15 February 2018. The search terms "Primary sclerosing cholangitis" and "Patient reported outcome measures" were used, alongside synonyms and related terms (see Additional file 2 for the full search strategy). These terms were combined with the COSMIN search filters developed by VU University Medical Centre Amsterdam and University of Oxford (available on the COSMIN website: http://www.cosmin.nl/). In addition, papers included in the full text review were subjected to a hand search of reference lists [20,21].

Inclusion criteria
Studies were eligible if: (a) they included PROMs meeting the FDA definition [22]; and (b) study participants were patients with PSC.
In addition: (c) studies that evaluated at least one measurement property (i.e. reliability, validity, responsiveness, interpretability) were included in the COSMIN quality review.
No restriction was placed on age or gender of participants or language, publication date or country of origin of the study.

Selection of studies
Two reviewers (FI/GT or GT/GK) independently screened studies according to their title and abstract to determine eligibility. Following this, the full text of potentially eligible studies was retrieved and screened independently by two reviewers (FI/GT or GT/GK). The protocol planned that discrepancies would be discussed with a third investigator (MG or DK or AS) to reach consensus; however, this was not required.

Data extraction
Two reviewers (GT plus FI, GK or AS) independently extracted the data from each study using a predefined form (including study design and patient level characteristics). Information regarding each PROM was extracted, including: constructs, therapeutic area, domains, number of items, scoring method, recall period, administration, completion time, data collection, cost/permission and measurement properties (reliability, validity, responsiveness, interpretability).

Content comparison of included PROMs
A summary of PROMs used in studies of PSC patients, including an overview of included domains and specific content, was prepared. The PROMs were categorised according to their domains to facilitate comparison of the measures that have been used in PSC studies to date.

Quality assessment
The COSMIN checklist [23] was used to assess the methodological quality of studies that reported on the measurement properties of the PROMs used. Two reviewers (FI/GT or GT/AW) independently completed the COSMIN checklist. The protocol planned that discrepancies would be discussed with a third reviewer; however, this was not required. Each measurement property was scored according to the quality of reporting in the publication, using a four-point rating scale: 'excellent', 'good', 'fair' and 'poor'. The methodological quality of each study was rated by taking the lowest score per domain (the 'worst score counts' method). For example, if any item in the reliability domain was scored 'poor', the overall methodological quality of reliability was rated as 'poor'.
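The 'worst score counts' rule described above can be illustrated with a minimal Python sketch. The function name and the example item ratings are hypothetical and are not taken from the COSMIN materials or from any included study; only the four-point scale and the lowest-score rule come from the text.

```python
# Four-point scale from the text, ordered from best to worst.
RATING_ORDER = ["excellent", "good", "fair", "poor"]


def worst_score(item_ratings):
    """Return the worst rating among the items of one measurement-property domain."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    # The rating with the highest index in RATING_ORDER is the worst one.
    # An unrecognised rating raises ValueError via list.index.
    return max(item_ratings, key=RATING_ORDER.index)


# Hypothetical item ratings for the reliability domain of a single study.
reliability_items = ["good", "excellent", "poor", "fair"]
print(worst_score(reliability_items))  # poor
```

A single 'poor' item is therefore enough to rate the whole domain as 'poor', regardless of how the remaining items were scored.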

Evidence synthesis
Synthesis of measurement property evidence was performed using standardised criteria developed by Terwee 2011 [23]. The summary of the overall evidence of measurement properties of the PROMs was determined by the number of studies, the methodological quality of the studies, and consistency of the findings. Based on these factors the overall rating of a measurement property per PROM was ranked as "+" positive, "?" indeterminate or "-" negative and combined with an assessment of the overall level of supporting evidence (strong, moderate, limited, conflicting, unknown) as proposed by the Cochrane Back Review Group [24].
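As a rough illustration of how the number of studies, their methodological quality and the consistency of findings might be combined into a level of evidence, the following Python sketch may help. The thresholds in it are a simplified assumption made for illustration only; the cited guidance [23,24] should be consulted for the exact rules. The function name is hypothetical.

```python
def level_of_evidence(study_qualities, consistent=True):
    """Map per-study methodological quality ratings to an overall level of evidence.

    Assumed, simplified rules (not quoted from the cited guidance):
      - only 'poor' studies                          -> 'unknown'
      - findings disagree across studies             -> 'conflicting'
      - one 'excellent' or several 'good' studies    -> 'strong'
      - one 'good' or several 'fair' studies         -> 'moderate'
      - a single 'fair' study                        -> 'limited'
    """
    # Studies rated 'poor' do not contribute to the level of evidence.
    usable = [q for q in study_qualities if q != "poor"]
    if not usable:
        return "unknown"
    if not consistent:
        return "conflicting"
    if "excellent" in usable or usable.count("good") >= 2:
        return "strong"
    if "good" in usable or usable.count("fair") >= 2:
        return "moderate"
    return "limited"


# Quality ratings as reported in this review for each PROM.
print(level_of_evidence(["poor", "poor"]))  # NIDDK-QA, two studies: unknown
print(level_of_evidence(["fair"]))          # PSC PRO, one study: limited
```

Under these assumed rules, a PROM supported only by studies of poor methodological quality cannot receive a positive or negative rating, which is why such properties are reported as indeterminate.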

Results

Study selection
In total, 8074 studies were identified, 5893 remained after duplicate removal and 150 remained after reviewing titles and abstracts (Fig. 1). Following review of the 150 full texts, 37 studies, containing 36 different PROMs, were included. Table 1 summarises the general characteristics of the included studies. The study designs included 17 cross-sectional studies, five randomised controlled trials (RCTs), four case-control studies, two validation studies, two pilot studies, two before and after studies, one cost-effectiveness study, one case matched study, one longitudinal study, one cohort study and one retrospective case series study.
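The study flow reported above implies the number of records excluded at each stage, and the study design counts should sum to the 37 included studies. The short Python sketch below makes that arithmetic explicit. The variable names are illustrative, and the derived exclusion counts are calculated from the reported totals rather than quoted from Fig. 1.

```python
# Totals reported in the text.
identified = 8074
after_duplicates = 5893
full_text_reviewed = 150
included = 37

# Derived counts (calculated here, not quoted from the flow diagram).
duplicates_removed = identified - after_duplicates
excluded_on_title_abstract = after_duplicates - full_text_reviewed
excluded_on_full_text = full_text_reviewed - included

# Study designs as listed in the text.
design_counts = {
    "cross-sectional": 17,
    "randomised controlled trial": 5,
    "case-control": 4,
    "validation": 2,
    "pilot": 2,
    "before and after": 2,
    "cost-effectiveness": 1,
    "case matched": 1,
    "longitudinal": 1,
    "cohort": 1,
    "retrospective case series": 1,
}

print(duplicates_removed)           # 2181
print(excluded_on_title_abstract)   # 5743
print(excluded_on_full_text)        # 113
print(sum(design_counts.values()))  # 37
```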
Twenty-seven of the 37 included studies used PROMs to examine the impact of PSC on patients and seven of these measured the effectiveness of treatments; one study evaluated the cost-effectiveness of liver transplantation, one study assessed health utilities and two were validation studies of the PROMs: the National Institute of Diabetes Digestive and Kidney Diseases Liver Transplant (NIDDK-QA) and the Primary Sclerosing Cholangitis Patient Reported Outcome (PSC PRO).
In total, 3742 patients with PSC were recruited to the included studies (sample size range n = 4-1000). All participants were adults, with the exception of one study [25] which included patients with a mean age of 11.6 years. Studies were heterogeneous in terms of population demographic characteristics. In the thirty-five studies that reported gender, the proportion of PSC patients who were male ranged from 15 to 97%. Five studies reported mean Mayo risk scores, which estimate patient survival in PSC; these spanned a relatively wide range (−0.1 to 2.87) [6,26-29]. Twenty-four studies described the proportion of PSC patients with IBD, which ranged from 7 to 100%. In 12 studies, the percentage of PSC patients who had received a liver transplant ranged from 12 to 100%.

Characteristics of PROMs
Characteristics of the 36 included PROMs are presented in Table 2. The most frequently used PROM was the Short Form 36 health survey (SF-36) (n = 15), followed by the Chronic Liver Disease Questionnaire (CLDQ) (n = 6) and the Primary Biliary Cirrhosis (PBC)-40 (n = 5). All other PROMs were used in ≤3 studies (Table 1).
Two other measures, the Lifetime Drinking History (LDH) and the Health Habits and History Questionnaire (HHHQ), focused on alcohol consumption and dietary intake.

Content comparison of included PROMs
The most frequent health domains (n = 6) included across the measures were: fatigue, pain, physical functioning, emotion, anxiety and general health. Generic PROMs measured symptoms such as pain, physical functioning, emotion, mental health and depression. The disease- and symptom-specific PROMs targeted aspects surrounding gastrointestinal symptoms, such as abdominal pain or gastroduodenal symptoms, as well as sexual problems, somatic symptoms, depression, mood disturbance and vegetative features (Additional file 3).

Quality assessment
Only three studies investigated measurement properties for PROMs: two studies evaluated the NIDDK-QA [26,28] and one study evaluated the PSC PRO [43].
For NIDDK-QA, one validation study [28] included 76 Primary Biliary Cirrhosis (PBC) and 17 PSC patients. A second study examined health status and QOL in patients with cholestatic disease before and after a liver transplant. In this study the NIDDK-QA questionnaire was administered to 65 Primary Biliary Cirrhosis and 92 PSC patients [26]. The PSC PRO validation study included 102 patients with PSC who completed the PSC PRO and four other questionnaires (SF-36, CLDQ, PBC-40 and 5-D Itch Scale) using an ePRO website [43]. The results of the validation studies are presented in Table 3 and summarised below.

Internal consistency
All the validation studies appropriately calculated Cronbach's alpha to estimate reliability and internal consistency. Reported Cronbach's alpha ranged from 0.87 to 0.94 for the NIDDK-QA and from 0.86 to 0.94 for the PSC PRO, which suggests good internal consistency. Under the COSMIN criteria, this measurement property was evaluated as 'poor' in methodological quality for the NIDDK-QA in both studies, primarily because of small sample sizes and a lack of information regarding the proportion of missing items and how missing items were managed. The PSC PRO was rated as 'fair' due to the lack of explicit reporting of missing items and of the sample size for the unidimensionality analysis. Validity assessed against a comparator scale (SF-36) was also evaluated as 'poor' in methodological quality, owing to the absence of details regarding the measurement properties of the comparator scale in this population, and issues with sample size and missing data. Kim et al. (2000) [28] also measured discriminant validity and reported information on the significant differences in the item- and domain-level scores of the NIDDK-QA. Again, this property was evaluated as 'poor' in methodological quality, owing to issues regarding sample size and the proportion and handling of missing data.
For the PSC PRO, 26 PSC patients took part in cognitive interviews to assess content validity, which was rated as 'excellent' according to the COSMIN checklist. An external validation cohort of 102 patients completed the PSC PRO along with the SF-36, CLDQ, PBC-40 and 5-D Itch Scale; all correlations were statistically significant. Structural validity was rated as 'fair' due to the sample size in relation to the number of items.
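For readers less familiar with Cronbach's alpha, the statistic reported by the validation studies, the following Python sketch shows the standard calculation: alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name and the example responses are hypothetical; no data from the NIDDK-QA or PSC PRO studies are reproduced here.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Compute Cronbach's alpha.

    item_scores: one list per item, each holding the scores of the same
    respondents in the same order.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("at least two items are required")
    n = len(item_scores[0])
    if n < 2 or any(len(item) != n for item in item_scores):
        raise ValueError("every item needs scores from the same 2+ respondents")

    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(item) for item in item_scores)

    # Sample variance of each respondent's total score across items.
    totals = [sum(scores) for scores in zip(*item_scores)]
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: three items answered by five respondents.
example_items = [
    [3, 4, 2, 5, 4],
    [3, 5, 2, 4, 4],
    [2, 4, 3, 5, 5],
]
print(round(cronbach_alpha(example_items), 2))
```

Alpha approaches 1 as items become more strongly correlated with one another. It is sensitive to the number of items and to how missing responses are handled, which is one reason the COSMIN checklist asks for explicit reporting of missing items.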

Evidence synthesis
Both NIDDK-QA studies reported limited information regarding internal consistency, reliability and validity (concurrent and discriminant). Using the COSMIN guidance, these properties were rated as indeterminate due to the poor methodological ratings of both studies (Tables 4 and 5; Additional file 4) [23]. The PSC PRO study [43] had higher methodological quality compared to the NIDDK-QA studies; however, as there was only one study, the level of evidence is limited.

Discussion
This review identified a total of 37 studies assessing 36 different PROMs used in patients with PSC; however, only one of these tools was specifically developed for the PSC population in accordance with FDA guidelines. The rationale for PROM utilization in the included studies varied. Most studies sought to measure the burden of the disease using constructs such as HRQOL and symptom severity; however, some studies examined the effectiveness of treatment, cost effectiveness and health utility. No studies investigated the use of real-time monitoring of PROMs to directly inform PSC patient care in a routine clinical setting. Only three studies evaluated the measurement properties of PROMs in PSC patients: two studies evaluated the NIDDK-QA [26,28] and one study evaluated the PSC PRO [43]. Currently, due to weaknesses in methodological quality, there is limited evidence to support the use of these PROMs in the PSC population; however, the PSC PRO is a promising new measure, designed with patient input, which requires further validation.
Clinicians or researchers wishing to use PROMs in PSC patients may consider use of both generic and disease specific measures. Selection of measures should be informed by consideration of psychometric properties and patient input [53]. Generic measures such as the SF-36, although not formally validated in PSC patients, are widely used and allow comparison of the burden of PSC with other chronic diseases, whilst the EQ-5D and SF-6D may be used to provide estimates of health utility to inform cost-effectiveness analysis [54]. Use of the PSC PRO will provide a more detailed assessment of symptoms and the impact of symptoms relevant to PSC patients and help identify patients with varying disease severity [43,55].
Although the PSC PRO has been developed with input from patients with and without IBD, questions focused on IBD symptoms appear fairly limited. This is important to note since 70-80% of PSC patients have co-existent IBD, most frequently ulcerative colitis [3]. This is a long term comorbidity and can occur even after a liver transplant [56]. The clinical course for patients with PSC and concomitant IBD can be different when compared to IBD or PSC alone [57]. PSC-IBD patients have higher incidence of rectal sparing, colorectal neoplasia, pouchitis following ileal pouch anal anastomosis (IPAA), pancolitis, and an overall poorer prognosis when compared to patients with IBD alone [57,58]. Thus, PSC-IBD patients have additional symptoms and burdens that impact on activities of daily living with the consequential impact on HRQOL [59]. Additional use of an IBD measure such as the IBS-QOL may therefore be warranted [60].
Following further validation, the PSC PRO has potential for use in a number of ways to inform PSC patient care. The PRO may be used in clinical trials to assess the impact of new treatments or be used at the individual patient level in routine clinical practice to facilitate shared decision making and tailor care to individual patient needs. This approach has been highly successful in other settings such as cancer where routine monitoring using ePROs reduced emergency room admissions by 7%, hospital admissions by 4%, helped patients stay on treatment longer, improved patient quality of life by 31% and increased survival on average by 5 months at low cost [61,62].

Strengths and limitations
This study is the first to undertake a systematic review of PROMs used in PSC, in accordance with the PRISMA [63] and COSMIN guidelines [64]. The use of COSMIN criteria has permitted a structured and comprehensive evaluation of the identified measures. However, the NIDDK-QA studies evaluated in this review were carried out before the COSMIN guidance was available, and the level and detail of reporting may have been deemed acceptable at the time of publication. Another important consideration for research studies or clinical trials in rare diseases such as PSC is the small size of study populations. When guidelines such as COSMIN judge the quality of the methodology on sample sizes, it can be more difficult to demonstrate sound methodological quality when there are only small numbers of patients available for recruitment and validation of PROs [65]. International multicentre studies may be one approach to overcome the small numbers available in future studies that aim to develop and evaluate PROs for use in PSC.

Conclusion
In conclusion, a wide variety of PROMs are used to assess HRQOL and symptom burden in patients with PSC, but none have undergone comprehensive and extensive validation in this patient group. The PSC PRO is a promising new measure to assess symptoms and symptom impact in PSC patients; however further validation work is required. Collection of PROs in PSC patients can provide valuable information in a research setting and routine clinical practice to improve PSC patient care.