Inflammatory bowel disease-specific health-related quality of life instruments: a systematic review of measurement properties

Background This review aims to critically appraise and compare the measurement properties of inflammatory bowel disease (IBD)-specific health-related quality of life instruments. Methods Medline, EMBASE and ISI Web of Knowledge were searched from their inception to May 2016. IBD-specific instruments for patients with Crohn's disease, ulcerative colitis or IBD were included. The basic characteristics and domains of the instruments were collected. The methodological quality of measurement properties and the measurement properties of the instruments were assessed. Results Fifteen IBD-specific instruments were included: twelve for adult IBD patients and three for paediatric IBD patients. All of the instruments were developed in North American and European countries. The following common domains were identified: IBD-related symptoms and physical, emotional and social domains. The methodological quality was satisfactory for content validity; fair for internal consistency, reliability, structural validity, hypotheses testing and criterion validity; and poor for measurement error, cross-cultural validity and responsiveness. For adult IBD patients, the IBDQ-32 and its short version (SIBDQ) had good measurement properties and were the most widely used worldwide. For paediatric IBD patients, the IMPACT-III had good measurement properties and more translated versions. Conclusions The methodological quality of most measurement properties should be improved, especially for measurement error, cross-cultural validity and responsiveness. The IBDQ-32 was the most widely used instrument with good reliability and validity, followed by the SIBDQ and IMPACT-III. Further validation studies are necessary to support the use of other instruments. Electronic supplementary material The online version of this article (10.1186/s12955-017-0753-2) contains supplementary material, which is available to authorized users.


Background
Inflammatory bowel disease (IBD), which encompasses Crohn's disease (CD) and ulcerative colitis (UC), is characterized by chronic, uncontrolled and relapsing inflammation of the gastrointestinal tract. Health-related quality of life (HRQoL) is defined as a broad, multidimensional concept comprising patients' physical health (including disease), psychological state, level of independence, social relationships, personal beliefs and relationship to their environment [1,2]. Evaluating the HRQoL of patients with IBD in clinical research and clinical practice improves understanding of the impact of the disease and of the effects of its treatments. HRQoL should therefore be recognized by patients and their clinicians as an important outcome indicator.
To date, a large number of IBD-specific HRQoL instruments have been developed and validated for IBD patients [3][4][5][6][7]. These instruments have been used in clinical practice and research to assess patients' understanding of IBD symptoms and their subjective perception of the illness [3,4]. They have also been used to compare the effects of treatment strategies and to provide evidence for health policy makers [3][4][5].
Several researchers have conducted reviews of instruments that measure the HRQoL of patients with IBD [3][4][5][6][7][8]. However, these reviews covered only some of the instruments, and other instruments were commonly overlooked. The measurement properties, and the methodological quality with which they were assessed, should be evaluated systematically for clinical practitioners and researchers. We therefore aimed to comprehensively collect all eligible IBD-specific HRQoL instruments and, in this systematic review, to critically appraise and compare their measurement properties to help clinicians and researchers select an appropriate instrument.

Inclusion and exclusion criteria
This study was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [9]. Articles were included if they fulfilled the following criteria: (1) Types of patients: patients diagnosed with CD, UC or IBD were included; patients with other diseases (infectious colitis, ischemic colitis, irritable bowel syndrome, etc.) were excluded. (2) Types of instruments: HRQoL instruments developed and validated for patients with CD, UC or IBD were eligible. HRQoL was defined as a broad, multidimensional concept comprising patients' physical health (including disease), psychological state, level of independence, social relationships, personal beliefs and relationship to their environment. Both self-administered and rater-administered instruments were included, for both child and adult patients. (3) Language: the full-text articles had to be published in English. Generic HRQoL instruments, such as the SF-36, were excluded. Disease-specific instruments not related or only partially related to IBD, such as the gastrointestinal quality of life index [10], were also excluded.

Literature search
The following electronic databases were searched for English-language articles: Medline (via PubMed) and EMBASE. The search period ran from the inception of the databases to 31 May 2016. The search strategy for Medline (see Additional file 1: Appendix S1) consisted of three groups of search terms covering (1) IBD, UC or CD; (2) HRQoL; and (3) measurement properties. The latter two filters were developed according to the syntax established by Kotecha et al. [11].
In addition, Google Scholar was used to search for relevant articles and literature. The citations of the reviews and the references of included articles were also checked. The patient-reported outcome and quality of life instruments database (website: https://eprovide.mapi-trust.org/) was searched for eligible instruments. Two review authors (XLC, FBL) independently performed the literature search. Disagreements between the two authors were resolved by discussion with another author (LHZ).

Literature extraction
A set of questions regarding the characteristics of the instruments was drafted: Which disease does the instrument assess (IBD, UC or CD)? How is it administered (self-administered or rater-administered)? How long does it take to complete (completion time)? Over what period does it measure the patients' HRQoL (recall period)? How many items does it contain? What response options do the items use (e.g., Likert or visual analogue scale [VAS])? What is the range of the scores? What domains does it contain? Are classical test theory and item response theory applied? Data on the first author, year of publication, full and abbreviated names of the instrument and country of origin (of the first version) were also collected.
The methodological quality of measurement properties was assessed with the consensus-based standards for the selection of health measurement instruments (COSMIN) checklist using a 4-point scale [12][13][14]. The COSMIN checklist covers the following measurement properties: internal consistency, reliability (test-retest reliability), measurement error, content validity, structural validity, hypotheses testing, cross-cultural validity, criterion validity and responsiveness. For each instrument, the methodological quality of each measurement property was rated as "poor", "fair", "good" or "excellent" based on predefined criteria [12][13][14]. The definitions of the measurement properties based on the COSMIN checklist are shown in Additional file 1: Appendix S2. The following measurement properties of the instruments were also evaluated: reliability (internal consistency, test-retest reliability), content validity (interviews/focus groups, pilot test), structural validity (convergent/divergent, discriminant), criterion validity and responsiveness.
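The review does not spell out how ratings of individual checklist items were combined into one rating per measurement property. The COSMIN 4-point scoring system is usually applied with the "worst score counts" principle: a box receives the lowest rating given to any of its items. The Python sketch below assumes that principle; the `Rating` enum, the `box_rating` function and the example ratings are hypothetical illustrations, not materials from this review.

```python
from enum import IntEnum


class Rating(IntEnum):
    """COSMIN 4-point ratings, ordered from worst (1) to best (4)."""

    POOR = 1
    FAIR = 2
    GOOD = 3
    EXCELLENT = 4


def box_rating(item_ratings: list[Rating]) -> Rating:
    """Rate one COSMIN box using the assumed "worst score counts" principle.

    The box receives the lowest rating given to any of its items.
    """
    if not item_ratings:
        raise ValueError("a COSMIN box needs at least one rated item")
    return min(item_ratings)


# Hypothetical item ratings for one study's internal-consistency box.
items = [Rating.EXCELLENT, Rating.GOOD, Rating.FAIR, Rating.GOOD]
print(box_rating(items).name)  # FAIR
```

Under this rule a single weak item (here, one rated "fair") caps the whole box, which is one reason older studies that omitted details can end up with low methodological quality ratings.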
The methodological quality of measurement properties was assessed on the original version of each instrument, except that cross-cultural validity was assessed on the translated versions. Two of the three review authors (XLC, LHZ or YW) independently selected the articles, screened and extracted the characteristics of the instruments and assessed the measurement properties. Disagreements between the two authors were resolved by discussion with another author (TWL or XYL).
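Internal consistency, the first measurement property in the COSMIN list, is most commonly reported as Cronbach's alpha: alpha = k/(k - 1) x (1 - sum of item variances / variance of the total score), where k is the number of items. The review does not state which statistic each included study used, so the following is only a minimal sketch of the standard formula; the function name and the response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    Sample variances (n - 1 denominator) are used throughout; the ratio is
    unchanged as long as every variance uses the same denominator.
    """
    if len(scores) < 2:
        raise ValueError("need at least two respondents")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("every respondent needs the same number (>= 2) of items")
    item_columns = list(zip(*scores))  # transpose: one tuple of scores per item
    sum_item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical 1-5 Likert responses: 4 respondents x 3 items.
responses = [[4, 3, 4], [2, 2, 3], [3, 3, 3], [1, 2, 1]]
print(round(cronbach_alpha(responses), 3))  # 0.897
```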

Results
The number of domains in the 15 instruments varied from 1 to 6 (Table 2). Among the instruments for paediatric IBD, the IMPACT series contained four domains: IBD-related symptoms, physical functioning, emotional functioning and social functioning. Among the instruments for adult IBD patients, some contained these four domains, whereas others contained only one or two. In total, 55 domains were obtained from all the instruments. … [31] and the total score of the IBDQ-9 (unidimensional) [25].
The methodological quality of measurement properties based on the COSMIN checklist with 4-point scale ratings is shown in Table 3. All of the instruments were developed and assessed based on classical test theory; item response theory was also used in the IBDQ-9 and CLIQ. (1) Most of the instruments scored "excellent" or "good" for content validity. Their items were derived mainly from patient interviews, literature reviews and professional experience, and pilot studies were used to ensure the applicability of the items in seven instruments. The domains of these instruments mainly covered IBD-related symptoms and physical, emotional and social functioning (Table 2); for example, the IBDQ-32 contained bowel symptoms, systemic symptoms, emotional and social domains [22]. (2) Most of the instruments scored "good" or "fair" for internal consistency, reliability, structural validity, hypotheses testing and criterion validity. For example, structural validity was rated in 12 instruments: two scored "excellent" [25,33], three "good" [21,26,31], five "fair" [19,20,22-24] and two "poor" [29,30]. (3) Most of the instruments scored "fair" or "poor" for measurement error, responsiveness and cross-cultural validity.
The reasons for responsiveness scoring "fair" or "poor" included that the magnitude of the correlations or differences was not stated and that the criterion for change could not be considered a reasonable gold standard. The reasons for cross-cultural validity scoring "poor" or "fair" included: whether the two translators worked independently was not reported; whether the items were translated forward and backward was not reported; how differences between the original and translated versions were resolved was not described in detail; the cultural relevance of the translation was not checked; and differential item functioning between language groups was not assessed.

Table 2 (partial). Domains of the instruments, with the number of items in parentheses
Instrument | IBD-related symptoms | Physical | Emotional | Social
For children
IMPACT | … (6), systemic impairment (2) | Body image (3) | Emotional impairment (11) | Functional/social impairment (11), treatments (3)
IMPACT-II | IBD symptoms (7), systemic symptoms (3) | Body image (3) | Emotional functioning (7) | Social functioning (12), treatment (3)
IMPACT-III | IBD symptoms (5) | Body image (4), energy (4) | Embarrassment (6), worries/concerns about IBD (13) | -
For adults
IBDQ-32 | Bowel symptoms (10), systemic symptoms (5) | - | Emotional functioning (12) | Social functioning (5)
SIBDQ | Bowel symptoms (3), systemic symptoms (2) | - | Emotional functioning (3) | Social functioning (2)
IBDQ-36 | Bowel symptoms (8), systemic symptoms (7) | Functional impairment (7) | Emotional functioning (8) | Social impairment (6)
RFIPC | Impact of disease (13), complications of disease (4) | Body stigma (2), sexual intimacy (3) | - | -
CCQIBD | Medical/symptoms (9) | Affect/life in general (11), functional/economic (12) | - | Social/recreational (15)
PIBDQL | Intestinal symptoms (8), systemic symptoms (7) | - | Emotional functioning (… | …
Notes: -: no domain. The IBDQ-9 had only one domain: total score. The CUCQ did not report the domain. *Information (2) in the EIBDQ did not belong to social functioning.

The measurement properties of the instruments are shown in Table 4. (1) The IMPACT series instruments (IMPACT, IMPACT-II and IMPACT-III) were used to assess the HRQoL of paediatric IBD patients. The IMPACT series, especially IMPACT-II and IMPACT-III, had good content validity and had been translated into other languages. They were easily administered and contained the main domains (symptoms, physical, emotional and social domains). (2) The IBDQ-32 was considered to have good measurement properties (notably content validity) and was proven to be valid, reliable and responsive. It contained the main domains: symptom, social and emotional domains. Furthermore, the IBDQ-32 was the most widely used instrument and had been translated and back-translated into a variety of languages.
(3) The rating form of IBD patient concerns (RFIPC) had good content validity and internal consistency and acceptable responsiveness. Although the original version did not report responsiveness, its responsiveness was confirmed in a translated version [39]. The RFIPC contained symptom and physical domains but did not contain emotional or social domains. (4) The SIBDQ, IBDQ-9, Cleveland global quality of life (CGQL), short health scale (SHS), Edinburgh inflammatory bowel disease questionnaire (EIBDQ) and Crohn's and ulcerative colitis questionnaire (CUCQ) were short instruments, all easily administered and quick to complete. The IBDQ-9, SIBDQ, CUCQ and SHS had good measurement properties. The SIBDQ and IBDQ-9 were short versions of the IBDQ-32 and IBDQ-36, respectively. The SIBDQ was used in the UK, the US, Germany and Spain [40][41][42][43] and contained symptom, emotional and social domains. The IBDQ-9, which contained only one domain (total score), was used in Spain and Iran [25,44]. The SHS contained symptom burden, general wellbeing, disease-related worry and social functioning, and was used in England, Norway and Sweden [45][46][47]. The CUCQ had been used only in the UK and should be further evaluated in other languages [32]. (5) For the IBDQ-36, the Cleveland clinic questionnaire for inflammatory bowel disease (CCQIBD) and the Padova inflammatory bowel disease quality of life (PIBDQL), limited evidence was available on their measurement properties.

Discussion
The present review provides an overview of 15 IBD-specific HRQoL instruments with respect to their measurement properties and the methodological quality of their assessment based on the COSMIN checklist.
According to the results of the COSMIN checklist, most of the instruments were not evaluated on all measurement properties with adequate methodological quality. Only content validity was assessed properly in most of the included instruments. Most of the instruments scored "good" or "fair" for internal consistency, reliability, structural validity, hypotheses testing and criterion validity. Information on measurement error, responsiveness and cross-cultural validity was limited or of poor methodological quality, either because the studies did not meet the required criteria or because they reported insufficient information. Our results were consistent with those for other instruments appraised by the COSMIN criteria, such as irritable bowel syndrome-specific QOL instruments [87]; rheumatoid arthritis-specific QOL instruments [88]; and QOL instruments for infants, children and adolescents with eczema [89].
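Measurement error, one of the properties with the least information, is commonly expressed as the standard error of measurement, SEM = SD x sqrt(1 - ICC), and the smallest detectable change for an individual patient, SDC = 1.96 x sqrt(2) x SEM. These are standard psychometric definitions rather than methods reported by the review; the function names and the SD and ICC values below are hypothetical.

```python
import math


def standard_error_of_measurement(sd: float, icc: float) -> float:
    """SEM = SD * sqrt(1 - ICC), derived from a test-retest reliability coefficient."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def smallest_detectable_change(sem: float, z: float = 1.96) -> float:
    """Individual-level SDC = z * sqrt(2) * SEM (95% confidence by default)."""
    return z * math.sqrt(2.0) * sem


# Hypothetical test-retest data: score SD of 20 points, ICC of 0.84.
sem = standard_error_of_measurement(sd=20.0, icc=0.84)
sdc = smallest_detectable_change(sem)
print(round(sem, 1), round(sdc, 1))  # 8.0 22.2
```

With these hypothetical inputs, a change of less than about 22 points in a single patient's score could not be distinguished from measurement error.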
Most of the IBD-specific instruments did not show adequate methodological quality. One reason is that most of the IBD-specific HRQoL instruments were developed before 2010, whereas the COSMIN guidelines were developed around 2010 [12][13][14]. Older articles therefore could not follow the COSMIN guidelines, and their ratings might be underestimated.
Based on the measurement properties and translated versions of the included instruments, some instruments had good psychometric characteristics and were widely used. (1) For paediatric IBD-specific instruments, most of the measurement properties were tested properly, especially for the IMPACT-III [21]. The IMPACT-III had the same items as the IMPACT-II; however, the IMPACT-III used a 0-4 Likert scale, which was easily understood by children, and it has at least four translated versions [51][52][53][54]. The IMPACT-III is therefore recommended for assessing the HRQoL of paediatric IBD patients. (2) For adult IBD patients, the IBDQ-32 and the SIBDQ (the short version of the IBDQ-32) had good measurement properties: both had excellent content validity, proved to be valid, reliable and responsive, and contained symptom, emotional and social domains. Both were widely used: the IBDQ-32 has been translated and validated in 93 languages, and the SIBDQ was used in the UK, the US, Germany and Spain [40][41][42][43]. The IBDQ-9, CGQL, SHS, EIBDQ and CUCQ were all short instruments with relatively high methodological quality, but they had fewer translated versions. The IBDQ-36, CCQIBD, PIBDQL, CGQL and EIBDQ had the weakest measurement properties. The PIBDQL and CGQL were developed and assessed in IBD patients undergoing surgery, and they were translated into other languages. The EIBDQ had not been translated into other languages, which limited its use.
Although there was no consensus on specific domains across the instruments, the following common domains were identified: IBD-related symptoms, physical functioning or general wellbeing, emotional functioning and social functioning. These domains were consistent with the concepts of common scales such as the WHOQOL and FACT-G [92][93][94]. Typical manifestations of IBD include diarrhea with blood, fever, abdominal pain and malnutrition. Because these symptoms occur most frequently, the corresponding symptom domains contribute the most important information to the IBD-specific instruments.
This study had the following limitations: (1) Non-English articles were excluded because of language restrictions, so relevant evidence, including negative findings, published in other languages may have been missed; (2) only articles on the original-language versions were used to assess the measurement properties of the included instruments, and articles on translated versions were not; and (3) some articles about clinical trials may have been excluded from this review, which limited our ability to examine responsiveness.

Conclusions
This review provides guidance on the use of IBD-specific HRQoL instruments and helps clinicians and researchers choose appropriate instruments. Some IBD-specific HRQoL instruments scored low on their measurement properties. Based on the characteristics, measurement properties and applications of the instruments, the IBDQ-32 was the most widely used and had the strongest evidence of being reliable, valid and responsive for adult IBD patients. As a short instrument, the SIBDQ also had good measurement properties and was widely used. The IMPACT-III had good measurement properties and was widely used for paediatric IBD patients. For new instruments to be used worldwide, they should be developed according to standard procedures (for example, COSMIN), and their measurement properties should achieve excellent or good ratings. New instruments for IBD should take into account IBD-related symptoms and physical, emotional and social domains.

Additional file
Additional file 1: Appendix S1 and Appendix S2.