Development and validation of the pulmonary tuberculosis scale of the system of Quality of Life Instruments for Chronic Diseases (QLICD-PT)

Background Generic assessments are less responsive to subtle changes due to specific diseases, making it challenging to fully understand the impact of pulmonary tuberculosis (TB) on patients' quality of life (QOL).
Methods We applied programmed decision procedures and theories on instrument development to develop the scale. Two hundred patients with pulmonary TB completed QOL measurements three times, before and after treatment. We assessed the validity, reliability, and responsiveness of QLICD-PT using correlation analysis, factor analysis, multi-trait scaling analysis, and randomized block analyses of variance with Least Significant Difference post-hoc tests.
Results The QLICD-PT comprises 3 domains (28 items) for general QOL and 1 pulmonary TB specific domain (12 items). Correlation and factor analyses confirmed good structural validity and criterion-related validity when the Chinese version of the Medical Outcomes Short-Form Health Survey (SF-36) was used as the criterion. The internal consistency α values were higher than 0.70. The score changes after treatment were statistically significant for the overall scale, physical domain, and specific domain, with effect sizes ranging from 0.32 to 0.72. No floor effects but small ceiling effects were observed at the domain level.
Conclusions As the first pulmonary TB-specific QOL scale developed by a modular approach in Chinese, QLICD-PT has an acceptable degree of validity, reliability, and responsiveness, and can be used to measure the quality of life of patients with pulmonary TB specifically and sufficiently.


Background
Pulmonary tuberculosis (TB) is a chronic pulmonary infection caused by Mycobacterium tuberculosis. As a major global public health challenge, TB remains one of the top 10 causes of death worldwide, leading to more deaths than HIV/AIDS [1]. According to the World Health Organization, around 9.6 million people were diagnosed with TB in 2014, and 1.2 million died from the disease [2]. With almost one million new cases in 2015, approximately 10% of global incident cases, TB continues to be a major public health problem in China, which ranks third among the high TB burden countries [3].
Compared with the general population, patients with TB report more deficits in their physical and mental well-being. TB patients face various forms of social rejection and isolation because TB has been stigmatized as a source of infection for healthy individuals [4][5][6], which may lead to work absenteeism and, in turn, substantial loss of productivity and reduced monthly income. Stigmatization and negative emotions resulting from the illness can cause long-term impairment of patients' psychosocial well-being [7]. Emerging evidence also suggests that the psychosocial burden among TB patients, even after microbiological cure, may have a greater impact on health-related quality of life (HRQOL) than clinical symptoms [5,8], and that the QOL of TB patients is substantially compromised [7][8][9].
The terms quality of life (QOL) and HRQOL have anchored a large body of health outcomes research over the past decades. The term HRQOL is often used to indicate QOL from the perspective of the health care or medical services people experience [10]; hence, in this study, the terms QOL and HRQOL are used interchangeably. QOL instruments are usually classified as either generic or disease-specific. Generic measures can be used in almost any population, irrespective of the underlying condition or disorder. Since generic measures apply to a wide variety of populations, they allow for broad comparisons of the relative impact of various diseases or interventions on QOL [11,12]. However, generic assessments are less responsive to subtle changes due to specific diseases or in specific populations. Disease-specific instruments have the advantage of assessing domains relevant to a specific disease and of being sensitive enough to capture small changes [11,12]. However, to the best of our knowledge, only one study has been published to assess QOL for patients with TB [13], making it challenging to fully understand the impact of TB on patients' QOL. Therefore, there is an urgent need to develop a tuberculosis-specific QOL scale in the Chinese cultural context. The Chinese QOL instruments, i.e., the Quality of Life Instruments for Chronic Diseases (QLICD), include both a generic module (QLICD-GM) [14] and various modules for specific diseases. Instruments have been developed and validated for coronary heart disease (QLICD-CHD) [15], irritable bowel syndrome (QLICD-IBS) [16], and hypertension (QLICD-HY) [17], but not for pulmonary TB, a leading cause of infectious disease mortality and morbidity among Chinese. Therefore, we set out to fill this gap by developing a QLICD instrument specifically for pulmonary TB (QLICD-PT). The aim of this paper is to describe the development and validation processes of QLICD-PT.

Methods
Development of the QLICD-PT
General principles and steps of developing QLICD-PT
In principle, our effort to develop QLICD-PT followed the general steps described in detail elsewhere for QLICD-GM [14]. In brief, the QLICD-PT was created from two sub-modules, i.e., the modified QLICD-GM (leftmost column of Fig. 1) and the newly created pulmonary TB specific module (rightmost column of Fig. 1). Both the modified QLICD-GM and the pulmonary TB specific module were developed through the efforts of two mutually independent groups. The nominal group consisted of 16 individuals, including six physicians, two nurses, one medical educator, and seven teachers/researchers (two in QOL/medical statistics, one in epidemiology, two in sociology, two in psychology), and was created to make suggestions about which items should be included. The focus group of 10 experts, including four physicians, one medical educator, and five teachers/researchers (two in QOL/medical statistics, one in epidemiology, one in sociology, one in psychology), was formed to use the programmed decision method to present the conceptual framework and select items proposed by the nominal group. Overall, the nominal group was responsible for item presentation, whereas the focus group dealt with item selection and organization. During the process of item selection, we applied not only qualitative methods such as group discussion, in-depth interviews, pilot tests, and pretests, but also quantitative statistical methods, namely variation analysis, correlation analysis, factor analysis, and cluster analysis.

Modifying QLICD-GM
Slight modifications were made to simplify the original version of QLICD-GM [14]. For example, sexual function was an independent facet in the original QLICD-GM but is included as part of the physiological functions in the modified QLICD-GM. The modified QLICD-GM consists of 28 items, classified into three domains and nine facets. The physical domain (PHD) includes 9 items (coded GPH1-GPH9) grouped into three facets: Basic Physiological Functions (BPF), Energy and Discomfort (EAD), and Independence (IND). The psychological domain (PSD) consists of 11 items (coded GPS1-GPS11), divided into three facets: Cognition (COG), Will and Personality (WIP), and Emotion (EMO). The social domain (SOD) comprises 8 items (coded GSO1-GSO8), categorized into three facets: Interpersonal Communication (INC), Social Support and Security (SSS), and Social Role (SOR) (Table 1).

Creating pulmonary TB specific module
Using a procedure similar to the one described above, we selected 21 items as the item pool of the pulmonary TB specific module based on literature reviews, nominal/focus group discussions, and patient interviews. A total of 4 facets with 12 items (coded PT1-PT12) were retained in the final module (Table 1).

Validation of the QLICD-PT
Data collection and scoring
The QLICD-PT, combining both the modified QLICD-GM and the pulmonary TB specific facets, was used to evaluate patients with pulmonary TB in a field survey for assessing its psychometric properties. The survey was carried out in ten Disease Control and Prevention Centers selected in Yunnan Province, China. The study population was limited to patients with pulmonary TB who were able to read and understand the questionnaire. The participating investigators included doctors, nurses, and medical postgraduates. The investigators explained the purpose of the study and the scale to the patients and obtained informed consent from patients who agreed to participate. The study protocol and informed consent form were approved by the Institutional Review Board of the investigators' institutions. Each respondent (n = 200) completed the questionnaire before receiving treatment as the 1st wave of assessment. After 2 months of treatment, respondents (n = 198) participated in the 2nd wave of assessment, and after 6 months of treatment, a total of 175 respondents participated in the 3rd wave of assessment to evaluate responsiveness. Because there is no agreed-upon gold standard for assessing QOL in pulmonary TB, we used the Chinese version of the Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36) [18], one of the most commonly used generic QOL scales, as the criterion for assessing the criterion-related validity, as well as the convergent and discriminant validity, of QLICD-PT. The SF-36 includes eight subscales: Physical Function (PF), Role-Physical (RP), Bodily Pain (BP), General Health (GH), Vitality (VT), Social Function (SF), Role-Emotional (RE), and Mental Health (MH).

Analytic steps and indicators used to measure the validity
Each item of QLICD-PT is rated on a five-level Likert scale, namely, not at all, a little bit, somewhat, quite a bit, and very much. The positively stated items were scored from one to five, while the negatively stated items were scored from five to one. By summing the item scores within each facet and domain, we obtained the raw scores for facets and domains. The overall score of the scale is the sum of all domain scores. For the purpose of comparison, all the domain scores were linearly converted into a standardized score (SS) ranging from 0 to 100 using the following equation: SS = (RS - Min) × 100/R, where RS, Min, and R represent the raw score, minimum score, and range of scores, respectively (see Table 1 for details). We assessed the psychometric properties of QLICD-PT in terms of validity (construct validity, content validity), reliability (internal consistency), and responsiveness, as recommended [13].
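The scoring rule can be illustrated with a minimal sketch. The responses and the positions of the negatively stated items below are hypothetical, and the sketch assumes that Min and R for a domain of n items scored 1 to 5 are n and 4n, respectively (the actual values are those listed in Table 1).

```python
def reverse_score(raw, low=1, high=5):
    """Reverse a negatively stated item so that a higher score always means better QOL."""
    return low + high - raw


def standardized_score(item_scores, low=1, high=5):
    """Apply SS = (RS - Min) * 100 / R to the scored items of one domain."""
    n_items = len(item_scores)
    raw_score = sum(item_scores)            # RS
    minimum = n_items * low                 # Min (assumed: every item at its lowest level)
    score_range = n_items * (high - low)    # R (assumed: maximum minus minimum raw score)
    return (raw_score - minimum) * 100 / score_range


# Hypothetical answers to a 9-item domain; indices 1 and 4 are assumed to be negatively stated.
responses = [4, 2, 5, 3, 1, 4, 4, 5, 3]
negative_items = {1, 4}
scored = [reverse_score(r) if i in negative_items else r
          for i, r in enumerate(responses)]
domain_ss = standardized_score(scored)
print(round(domain_ss, 1))
```

A respondent who chooses the least favorable level on every item receives 0, and one who chooses the most favorable level on every item receives 100, which is what makes domains with different numbers of items comparable.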

Construct validity
We calculated Pearson's correlation coefficients between the similar domains of QLICD-PT and SF-36 to assess convergent validity, one aspect of construct validity. Multi-trait scaling analysis [19] was applied to test item convergent and discriminant validity with the following two criteria: convergent validity is supported when the correlation between an item and its own domain/facet is 0.40 or above, and discriminant validity is shown when that correlation is higher than the item's correlations with the other domains/facets. We performed factor analysis with Varimax rotation to examine the agreement between the components extracted from the data and the theoretical construct of the instrument, and thereby to confirm the construct validity.
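The two multi-trait scaling criteria can be sketched as follows. This is an illustrative simplification with hypothetical item names and toy data: it correlates each item with the unadjusted sum score of each domain, whereas a full analysis would often correct the item-own-domain correlation for overlap, a detail not specified here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def multitrait_scaling(items, domains, threshold=0.40):
    """Return {item: (convergent_ok, discriminant_ok)}.

    items:   {item name: list of scores, one per respondent}
    domains: {domain name: list of item names belonging to it}
    """
    # Domain score per respondent = sum of that domain's item scores.
    totals = {d: [sum(vals) for vals in zip(*(items[i] for i in names))]
              for d, names in domains.items()}
    result = {}
    for own_domain, names in domains.items():
        for item in names:
            own_r = pearson(items[item], totals[own_domain])
            other_r = [pearson(items[item], totals[d])
                       for d in domains if d != own_domain]
            result[item] = (own_r >= threshold,
                            all(own_r > r for r in other_r))
    return result


# Toy data: 6 respondents, two hypothetical domains with two items each.
toy_items = {
    "A1": [1, 2, 3, 4, 5, 5], "A2": [2, 2, 3, 5, 4, 5],
    "B1": [5, 1, 4, 2, 3, 3], "B2": [4, 2, 5, 1, 3, 2],
}
toy_domains = {"A": ["A1", "A2"], "B": ["B1", "B2"]}
scaling = multitrait_scaling(toy_items, toy_domains)
print(scaling)
```

Each item is flagged on both criteria separately, so an item can pass the 0.40 threshold and still fail discriminant validity if it correlates even more strongly with another domain.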

Content validity
Floor and ceiling effects are characterized by scores being concentrated at the lowest and highest ends of the overall distribution, respectively. If floor or ceiling effects are present, it is likely that extreme items are missing at the lower or upper end of the scale, indicating limited content validity. As a result, patients with the lowest or highest score cannot be distinguished from each other, and reliability is thus reduced [20]. The floor and ceiling effects of each domain/facet were evaluated. Floor or ceiling effects were considered present if more than 15% of the patients reported the minimum or maximum possible score, respectively [20].
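The 15% rule can be written down as a short sketch. The scores below are hypothetical standardized scores on a 0 to 100 scale and do not come from the study data.

```python
def floor_ceiling(scores, minimum, maximum, cutoff=15.0):
    """Percentage of respondents at the minimum/maximum possible score,
    and whether each percentage exceeds the cutoff (15% by default)."""
    n = len(scores)
    floor_pct = 100 * sum(s == minimum for s in scores) / n
    ceiling_pct = 100 * sum(s == maximum for s in scores) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_effect": floor_pct > cutoff,
        "ceiling_effect": ceiling_pct > cutoff,
    }


# Hypothetical facet scores for 10 respondents.
facet_scores = [100, 100, 75, 50, 100, 25, 62.5, 87.5, 100, 37.5]
effects = floor_ceiling(facet_scores, minimum=0, maximum=100)
print(effects)
```

Short facets with only a few items have fewer possible score values, so a larger share of respondents can land exactly on the maximum, which is one reason ceiling effects tend to appear at the facet level before they appear at the domain level.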

Internal consistency (reliability)
Cronbach's alpha coefficient is commonly used in scale development to evaluate internal consistency reliability. A value between 0.70 and 0.95 has been suggested as evidence of adequate internal consistency [19]. To assess internal consistency, Cronbach's alpha coefficient was calculated separately for each domain/facet.
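For reference, Cronbach's alpha for a domain of k items is k/(k - 1) × (1 - sum of item variances / variance of the total score). A minimal sketch using the Python standard library follows; the item scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one domain/facet.

    items: list of k lists, each holding one item's scores across respondents.
    """
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]          # total score per respondent
    sum_item_var = sum(variance(col) for col in items)    # sample variances of the items
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


# Hypothetical 3-item facet answered by 5 respondents.
toy_facet = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 4],
    [1, 3, 3, 4, 5],
]
alpha = cronbach_alpha(toy_facet)
print(round(alpha, 3))
```

Because alpha depends on the number of items, facets with very few items can show low values even when the domain they belong to is internally consistent.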

Responsiveness
Responsiveness has been defined as the ability of a questionnaire to detect clinically important changes over time [20]. We measured responsiveness by comparing the mean differences between the pre- and post-treatment assessments, tested with randomized block analyses of variance and Least Significant Difference post-hoc tests. The standardized response mean (SRM), a measure of effect size, was also used as an indicator of responsiveness, with values of 0.20, 0.50, and 0.80 representing small, moderate, and large effects, respectively [21].
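The SRM is conventionally computed as the mean of the individual score changes divided by the standard deviation of those changes. The sketch below uses hypothetical pre- and post-treatment scores; the label for values under 0.20 is an assumption added for completeness, since only the small, moderate, and large thresholds are named above.

```python
from statistics import mean, stdev


def standardized_response_mean(pre, post):
    """SRM = mean of paired changes / sample standard deviation of paired changes."""
    changes = [after - before for before, after in zip(pre, post)]
    return mean(changes) / stdev(changes)


def effect_label(srm):
    """Classify |SRM| with the 0.20 / 0.50 / 0.80 thresholds."""
    size = abs(srm)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "moderate"
    if size >= 0.20:
        return "small"
    return "below small"  # assumed label; not defined in the text


# Hypothetical standardized domain scores for 6 patients.
pre_scores = [50, 55, 60, 45, 70, 65]
post_scores = [58, 60, 62, 55, 75, 66]
srm = standardized_response_mean(pre_scores, post_scores)
print(round(srm, 2), effect_label(srm))
```

Only patients assessed at both time points contribute a paired change, so the pre and post lists must be aligned by respondent before the SRM is computed.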

Results
Validity
Correlation analyses showed strong associations between items and their own domains/facets (most correlation coefficients were higher than 0.5), but weak relationships between items and other domains/facets, and between domains/facets (Table 2). For example, correlation coefficients between items GPH1-GPH9 and their own domain (in bold) are higher than those across domains. The Kaiser-Meyer-Olkin values for the general module and the specific module were 0.84 and 0.75, respectively, exceeding the recommended value of 0.60 and indicating suitability for factor analysis. Bartlett's tests of sphericity were statistically significant (P < 0.001), also supporting the factorability of the correlation matrix. Seven principal components (initial eigenvalues > 1) were extracted from the 28 items of the general module (Table 3).
Correlation coefficients among the domain scores of the QLICD-PT and SF-36 are presented in Table 5, showing that the correlations between the same and similar domains are generally higher than those between different and non-similar domains. For example, the coefficient between the physical domain of QLICD-PT and physical function of SF-36 was 0.56, higher than any other coefficient in this row. Similarly, the coefficient between the social domain of QLICD-PT and social function of SF-36 was 0.47, higher than any other coefficient in this row. These results confirmed the criterion-related validity to a reasonable degree and an acceptable level of convergent and divergent validity.
For content validity, no floor effects were detected, but small ceiling effects (≤2%) were identified in the domains and in the total scale. At the facet level, however, marked ceiling effects were found in three facets, i.e., IND (48.5%), WIP (17.5%), and DSE (53.0%) (Table 6).

Reliability and responsiveness
The domain-specific Cronbach's α values (for internal consistency) were higher than 0.70 for all domains. At the facet level, values of Cronbach's α ranged from 0.20 to 0.83 (Table 6). There were statistically significant differences between scores before treatment and after 2 months of treatment for the physical domain, specific domain, general module, and the overall scale (P < 0.05), with SRMs ranging from 0.23 to 0.65. There were statistically significant differences between scores before treatment and after 6 months of treatment for all domains, the general module, and the overall scale (P < 0.05), with SRMs ranging from 0.17 to 0.72. At the domain level, a significant difference between 2 and 6 months of treatment was observed only for the specific domain (P < 0.05) (Table 7).

Discussion
Based on the original version of QLICD-GM (a generic QOL evaluation instrument for chronic diseases), the modified QLICD-GM incorporates several improvements to increase comprehensibility and accessibility. For example, sexual function was adopted as one component of the facet BPF (Basic Physiological Functions) rather than listed as an independent facet, because sexuality is regarded as a private matter in China and the low response rate to the sex-related item among Chinese participants would otherwise affect the score of an entire facet. Moreover, several items were reworded to be more straightforward, and similar items were combined for simplicity. Furthermore, a TB-specific domain, rather than a generic domain, was developed and applied to measure the characteristics of pulmonary TB. By combining the modified general module QLICD-GM and the newly developed disease-specific module for pulmonary TB, we created a new QOL assessment scale, QLICD-PT, with psychometric strength. Generally, a practical QOL instrument should be validated in terms of at least three aspects: validity, reliability, and responsiveness [13]. In this study, correlation coefficients between the similar domains of QLICD-PT and SF-36 revealed reasonably good criterion-related validity, as well as convergent and divergent validity. Correlation analysis indicated strong associations between items and their own domains/facets, but weak correlations between items and other domains/facets. Factor analysis showed that components extracted from the data coincided with the theoretical constructs of the instrument, confirming the construct validity. The absence of floor effects and the very small ceiling effects (≤2%) in the domains indicate that the scale can detect both improvement and deterioration in QOL over time, if any occur, and that the item design of QLICD-PT is reasonable. Internal consistency was moderate.
Table 7 revealed that QOL score changes after treatment were statistically significant for the physical domain, specific domain, and overall scale, with SRMs ranging from 0.32 to 0.72. Given that no significant change is expected after treatment in the psychological and social domains, which reflect relatively stable traits, QLICD-PT can be considered moderately responsive.
Limited efforts have been made to develop specific instruments to assess the QOL of patients with tuberculosis (TB) [22]; these include DR-12 [23] and FACIT-TB (Functional Assessment of Chronic Illness Therapy-tuberculosis) [13]. DR-12 was developed in India and first published in 2003. It consists of 12 items, of which 7 cover TB symptoms and 5 relate to socio-psychological characteristics and exercise adaptation. However, its response options are presented on a 3-point scale, which largely reduces the variation of the data collected and compromises its discriminant validity. FACIT-TB was developed in Iraq and published in 2015. It consists of 27 items in the core questionnaire and a set of 20 items referring to disease symptoms [13]. With more items than DR-12, FACIT-TB is able to cover five domains. Being relatively brief and easy to administer and score, FACIT-TB is particularly suitable for use in clinical trials [22]. However, using the Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) checklist to assess the methodological quality of HRQOL measures in development studies, Khan et al. showed that most of the studies, including those of DR-12 and FACIT-TB, were rated as fair to poor, largely due to insufficient information collected. Therefore, it has been recommended to take advantage of the good sensitivity of generic scales and the excellent specificity of disease-specific scales by combining the two into a mixed scale to quantify the QOL of TB patients [24]. Our work follows this suggestion and produced a scale that is both feasible to administer and methodologically sound.

Strengths and limitations
A recent literature review failed to identify any non-English HRQOL measure developed in the TB population of non-English speaking countries [22]. The QLICD-PT is the first such measure developed in a non-English language for TB patients in a non-English speaking country. The QLICD-PT is nevertheless subject to several limitations. The TB patients who participated in the validation study were limited to individuals who were able to read and understand the questionnaire. Although the illiteracy rate in China is very low overall, studies have repeatedly demonstrated that populations with low educational attainment and low income are substantially more vulnerable to TB infection, and that TB prevalence is relatively high in poor communities with low education levels. Caution must therefore be exercised when generalizing our findings. The psychometric properties and external validity of QLICD-PT should be further evaluated among populations with low educational attainment, in whom proxy interviews are used. Cultural appropriateness should also be carefully assessed when QLICD-PT is translated into languages other than Chinese.

Conclusions
By combining a modified generic module and a TB-specific module, we used a rigorous method to develop a scale that better characterizes QOL in Chinese patients with pulmonary TB. The 40-item QLICD-PT is part of the QLICD instrument system and showed acceptable degrees of validity, reliability, and responsiveness. Although published evidence of reliability and validity indicates that FACIT-TB is the best QOL measurement tool and one of the most commonly used among TB patients [22], the QLICD-PT evaluated in the current study provides an alternative, at least among Chinese patients.