Psychometric evaluation, using Rasch analysis, of the WHOQOL-BREF in heroin-dependent people undergoing methadone maintenance treatment: further item validation

Background: The brief version of the World Health Organization Quality of Life assessment (WHOQOL-BREF), a useful outcome measure for clinical decision making, has been evaluated for its psychometric properties in heroin-dependent patients using classical test theory (CTT). However, CTT has a major disadvantage: summated raw scores may be invalid, a shortcoming that Rasch models can overcome. The purpose of this study was to use Rasch models to evaluate the psychometric properties of the WHOQOL-BREF in heroin-dependent patients; we hypothesized that each WHOQOL-BREF domain is unidimensional. Methods: Two hundred thirty-six participants (24 females; mean [SD] age = 38.07 [7.44] years; mean [SD] age at first heroin use = 26.13 [6.32] years) with a diagnosis of opioid dependence were recruited from a methadone maintenance treatment program. Each participant completed the WHOQOL-BREF. Parallel analysis (PA) and Rasch rating scale models were used for statistical analyses. Results: Based on the PA, the four domains of the WHOQOL-BREF were unidimensional. The Rasch analyses identified three negatively worded items (2 in the Physical domain and 1 in the Psychological domain) as misfits that may not contribute to their domains, and one positively worded item in the Physical domain as possibly redundant. All separation indices were above 2 except for the person separation index in the Physical domain (1.93). The Rasch analyses supported category functioning and item independence for all four WHOQOL-BREF domains, and 5 items showed differential item functioning (DIF) between participants with positive and negative human immunodeficiency virus (HIV) infection status. Conclusions: The WHOQOL-BREF is a valid outcome measure for assessing the general quality of life of substance abusers in terms of physical, psychological, social, and environmental factors. It can also be used as a treatment outcome measure to evaluate the effect of treatments for substance abusers.
However, the three misfitting negatively worded items should be used with caution because substance abusers may not fully understand their meaning. Future research may apply cognitive interviews to determine the cognitive functioning of substance abusers and their interpretation of negatively worded items.


Background
Substance abuse, such as heroin addiction, may impose economic burdens on individual users and on society as a whole [1,2]. Because of needle sharing, substance abusers are likely to have comorbid infectious diseases, such as human immunodeficiency virus (HIV), hepatitis B virus (HBV), and hepatitis C virus (HCV) infection [3]. Therefore, substance abusers may experience multiple negative physical and mental consequences, as well as consequences for their social relationships [4], and they have been reported to have a poorer quality of life (QoL) than people who are not substance-dependent [5].
QoL is an important outcome measure for healthcare decision-making and for evaluating intervention effects [6,7], such as the effect of medication [8], and several studies (e.g., [9,10]) have used QoL to examine the effect of maintenance treatment on substance abusers. Of the QoL instruments commonly used in maintenance treatment studies, the brief version of the World Health Organization Quality of Life assessment (WHOQOL-BREF) has been suggested as the most suitable for assessing global QoL in research on addictive behavior [6,11]. The WHOQOL-BREF has been translated into various languages, including the traditional Chinese used in Taiwan, and the Taiwan version of the WHOQOL-BREF has been reported to be valid and reliable (α = 0.70-0.77, comparative fit index [CFI] = 0.89) [12]. Many studies have established the psychometric properties of the WHOQOL-BREF in different populations (e.g., community-dwelling older people [13], people with schizophrenia [14], and people with depression [15]); however, almost no studies have examined substance abusers. To the best of our knowledge, only one recent study [6] has used classical test theory (CTT) methods to evaluate the psychometric properties of the WHOQOL-BREF in substance abusers.
However, CTT methods alone are insufficient for clinicians to understand the psychometric properties of the WHOQOL-BREF. Specifically, CTT methods treat raw scores and item responses on rating scales as interval data, which may yield invalid scores. There is therefore a trend toward using Rasch analysis, a modern statistical method that can transform the ordinal scores of polytomous items into interval scores for psychometric evaluation [16]. Although the main weakness of Rasch models is their complicated mathematical formulation, which is hard for clinicians to understand [17], they have the following strengths: (1) the validity of each item can be analyzed individually to detect redundancy that CTT may not detect; (2) item difficulty can be estimated; and (3) an ordinal-to-interval conversion table can be produced to help clinicians use the items to understand respondents' latent traits [18-20]. In addition, several ordered polytomous Rasch models, such as the partial credit model (PCM) and the rating scale model (RSM), have been applied to QoL instruments rated on a Likert scale [18].
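The ordinal-to-interval idea can be made concrete with a minimal sketch of the Andrich rating scale model (RSM). Here `theta` (person measure), `delta` (item difficulty), and the shared thresholds `taus` are in logits, and the function names and threshold values are hypothetical; the sketch computes model probabilities from given parameters rather than estimating them, so it is not the estimation procedure used in this study.

```python
import math

def rsm_category_probs(theta, delta, taus):
    """Category probabilities under the Andrich rating scale model.

    Categories are coded 0..len(taus); a 1-5 WHOQOL-BREF response would be
    recoded to 0-4 first (an assumption of this sketch).
    """
    # The exponent for category k is sum_{j<=k} (theta - delta - tau_j),
    # with an empty sum (0) for category 0.
    exponents = [0.0]
    for tau in taus:
        exponents.append(exponents[-1] + theta - delta - tau)
    # Subtract the largest exponent before exponentiating for numerical stability.
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def rsm_expected_score(theta, delta, taus):
    """Model-expected item score on the 0..len(taus) metric."""
    probs = rsm_category_probs(theta, delta, taus)
    return sum(k * p for k, p in enumerate(probs))

# Hypothetical thresholds (logits) shared by all items in one domain.
TAUS = [-1.5, -0.5, 0.5, 1.5]

if __name__ == "__main__":
    for theta in (-2.0, 0.0, 2.0):
        print(theta, round(rsm_expected_score(theta, 0.0, TAUS), 3))
```

With `theta` equal to `delta` and symmetric thresholds, the expected score sits at the midpoint (2) of the 0 to 4 range, and it rises smoothly as `theta` increases; this monotonic link between an interval-scaled measure and an ordinal raw score is what a conversion table tabulates.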
Previous psychometric evaluations of the WHOQOL-BREF in substance abusers have mainly used CTT methods. Because Rasch models can detect out-of-concept or redundant items and can precisely measure a heroin user's latent QoL using an ordinal-to-interval conversion table, the purpose of this study was to use Rasch models to examine the psychometric properties of the WHOQOL-BREF in a heroin-dependent sample in Taiwan.

Participants and procedures
The Hospital Ethics Committee of Jianan Psychiatric Center approved this study (IRB number: JMH9601).
All participants (n = 236) were recruited from a methadone maintenance treatment (MMT) program and were diagnosed with opioid dependence by qualified psychiatrists at the Jianan Psychiatric Center, based on the criteria of the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV). After expressing willingness to participate, each participant read and signed an informed consent form, completed a structured questionnaire covering demographic data, substance use, and the WHOQOL-BREF, and then underwent a series of laboratory tests, including HIV, HBV, and HCV tests.

Instrument: The WHOQOL-BREF
The WHOQOL-BREF Taiwan version contains 28 items: the 26 standard items of the original WHOQOL-BREF plus 2 Taiwanese national items [12]. Of the 28 items, 2 are generic items that assess overall QoL and general health ("How would you rate your quality of life?" and "How satisfied are you with your health?"); the remaining 26 belong to the Physical (7 items), Psychological (6 items), Social (4 items), and Environment (9 items) domains. All items are scored from 1 to 5, and 3 items (Ph1, Ph2, and Ps6; Table 1) are reverse coded. The 2 Taiwanese national items are in the Social (S4) and Environment (E9) domains, respectively. Domain score calculation has been reported elsewhere [13]; potential scores for each domain range from 4 to 20, with a higher score representing better QoL. In addition, satisfactory psychometric properties have been established for the WHOQOL-BREF Taiwan version [12].
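As an illustration of domain scoring, the sketch below assumes the standard WHOQOL-BREF convention: reverse-code Ph1, Ph2, and Ps6 as 6 minus the raw score, then multiply the mean item score of each domain by 4, which yields the 4 to 20 range described above. The item identifiers follow the labels used in this paper, the function name is hypothetical, and the sketch requires every item rather than applying any missing-data rule.

```python
# Reverse-coded items named in this paper.
REVERSE_CODED = {"Ph1", "Ph2", "Ps6"}

# Domain membership for the Taiwan version (7, 6, 4, and 9 items).
DOMAINS = {
    "Physical": [f"Ph{i}" for i in range(1, 8)],
    "Psychological": [f"Ps{i}" for i in range(1, 7)],
    "Social": [f"S{i}" for i in range(1, 5)],
    "Environment": [f"E{i}" for i in range(1, 10)],
}

def whoqol_domain_scores(responses):
    """Return 4-20 domain scores from a dict of item id -> raw score (1-5)."""
    scores = {}
    for domain, items in DOMAINS.items():
        values = []
        for item in items:
            raw = responses[item]  # KeyError if an item is missing (no imputation here)
            if raw not in (1, 2, 3, 4, 5):
                raise ValueError(f"{item} must be scored 1-5, got {raw!r}")
            # Reverse coding maps 1<->5 and 2<->4 so a higher value always means better QoL.
            values.append(6 - raw if item in REVERSE_CODED else raw)
        # Mean item score times 4 gives the 4-20 domain metric.
        scores[domain] = 4 * sum(values) / len(values)
    return scores
```

Note that these summated domain scores are exactly the kind of raw scores that CTT treats as interval data; the Rasch analyses in this study instead place items and persons on a logit scale.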

Data analysis
Descriptive statistics (means, standard deviations [SDs], and frequencies) were used to characterize the participants. A major assumption of the Rasch model is that each domain is unidimensional [17,18]. Therefore, before applying Rasch models to the WHOQOL-BREF, we used parallel analysis (PA) [21,22] to test the unidimensionality of each domain. PA generates eigenvalues from simulated random data sets of the same size as the observed data and compares them with the eigenvalues obtained from our participants' responses. The number of dimensions was determined by how many extracted factors had an eigenvalue greater than both the mean simulated eigenvalue and the simulated eigenvalue at the 95th percentile. For example, if the Physical domain had 2 extracted factors with eigenvalues greater than both the mean simulated eigenvalue and the 95th-percentile eigenvalue, the Physical domain would be considered 2-dimensional.
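A minimal parallel analysis can be sketched with the Python standard library alone. This sketch assumes respondents are rows and a domain's items are columns, draws uncorrelated normal data of the same size, extracts correlation-matrix eigenvalues with a cyclic Jacobi routine, and retains factors only while the observed eigenvalue exceeds both the mean and the 95th percentile of the simulated eigenvalues at the same rank. The function names, number of simulations, nearest-rank percentile, and seed are assumptions; established PA implementations may differ in details such as the simulated distribution.

```python
import math
import random
import statistics

def correlation_matrix(data):
    """Pearson correlations between columns of `data` (rows = respondents)."""
    n, p = len(data), len(data[0])
    centered = []
    for j in range(p):
        col = [row[j] for row in data]
        mean = sum(col) / n
        centered.append([x - mean for x in col])
    # A zero-variance column would make this divide by zero.
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]

def symmetric_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    p = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if a[i][j] == 0.0:
                    continue
                # Rotation chosen so that the (i, j) element becomes zero.
                phi = (a[j][j] - a[i][i]) / (2.0 * a[i][j])
                t = math.copysign(1.0, phi) / (abs(phi) + math.sqrt(phi * phi + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(p):  # A <- A J (columns i and j)
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki - s * akj, s * aki + c * akj
                for k in range(p):  # A <- J^T A (rows i and j)
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik - s * ajk, s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)

def parallel_analysis(data, n_sims=100, percentile=95, seed=2024):
    """Count factors whose observed eigenvalue beats both the mean and the
    `percentile`-th simulated eigenvalue at the same rank."""
    n, p = len(data), len(data[0])
    observed = symmetric_eigenvalues(correlation_matrix(data))
    rng = random.Random(seed)
    simulated = []
    for _ in range(n_sims):
        noise = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
        simulated.append(symmetric_eigenvalues(correlation_matrix(noise)))
    rank_idx = max(0, math.ceil(percentile * n_sims / 100) - 1)  # nearest-rank percentile
    n_factors = 0
    for k in range(p):
        at_rank = sorted(sim[k] for sim in simulated)
        if observed[k] > statistics.fmean(at_rank) and observed[k] > at_rank[rank_idx]:
            n_factors += 1
        else:
            break  # stop at the first factor that fails the rule
    return n_factors, observed
```

A return value of 1 for a domain corresponds to the unidimensional result sought here. Stopping at the first failing factor is a common convention, not something the original analysis specifies.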
Separate Rasch RSMs were used to test the fit of each item to its WHOQOL-BREF domain; that is, we examined the item properties of each domain separately. Therefore, the 2 generic items were not examined in this study. This is acceptable because the 2 generic items are often treated as criteria for validating each WHOQOL-BREF domain (e.g., [6,12]). Two fit indices were used: the information-weighted fit statistic (infit) mean square (MnSq) and the outlier-sensitive fit statistic (outfit) MnSq; an MnSq range of 0.6 to 1.4 suggests acceptable fit [23]. Specifically, an item with a fit statistic > 1.4 may not contribute to the same underlying construct as the other items in the same scale, whereas an item with a fit statistic < 0.6 may be redundant in the same scale [24]. The Rasch rating scale model reports standardized item difficulties with a mean of 0 and an SD of 1 log-odds unit (i.e., logit); a higher logit represents a more difficult item.
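The two item-fit statistics described above can be sketched for a single item as follows. The sketch assumes person measures (`thetas`), the item difficulty (`delta`), and RSM thresholds (`taus`) have already been estimated and that responses are recoded 0 to 4; outfit is the unweighted mean of squared standardized residuals, and infit weights squared residuals by the model variance. Function names and example values are hypothetical.

```python
import math

def rsm_mean_and_variance(theta, delta, taus):
    """Model-expected score and variance for one person-item pair (Andrich RSM)."""
    exponents = [0.0]
    for tau in taus:
        exponents.append(exponents[-1] + theta - delta - tau)
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(k * p for k, p in enumerate(probs))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, variance

def item_fit(responses, thetas, delta, taus):
    """Return (infit MnSq, outfit MnSq) for one item across respondents."""
    sq_residuals, variances = [], []
    for x, theta in zip(responses, thetas):
        expected, variance = rsm_mean_and_variance(theta, delta, taus)
        sq_residuals.append((x - expected) ** 2)
        variances.append(variance)
    # Outfit: plain average of squared standardized residuals (outlier-sensitive).
    outfit = sum(r / v for r, v in zip(sq_residuals, variances)) / len(sq_residuals)
    # Infit: squared residuals weighted by the model variance (information-weighted).
    infit = sum(sq_residuals) / sum(variances)
    return infit, outfit

def classify_fit(infit, outfit, lower=0.6, upper=1.4):
    """Apply the 0.6-1.4 MnSq rule; underfit is checked before overfit."""
    if infit > upper or outfit > upper:
        return "misfit"   # may not measure the same construct
    if infit < lower or outfit < lower:
        return "overfit"  # may be redundant
    return "acceptable"
```

Responses that always match the model expectation drive both statistics toward 0 (overfit, suggesting redundancy), whereas responses that jump between extreme categories for persons located at the item's difficulty inflate them well above 1.4.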
Item and person separation reliability, separation indices, category functioning, local dependency, and person fit statistics were also assessed using the Rasch RSM. Person separation reliability reflects how reproducible the ordering of respondents by ability would be if they answered another set of items measuring the same concept, whereas item separation reliability reflects how reproducible the item difficulty hierarchy would be if the same items were answered by another set of respondents with comparable ability. We used separation indices to examine how well the questionnaire discriminates among respondents (person separation index) and among items (item separation index) [18]. An acceptable value for person and item separation reliability is > 0.7 [24], and an acceptable value for person and item separation indices is > 2, indicating that the measure can separate respondents (person separation index) or items (item separation index) into more than 2 distinct groups [19].
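Separation indices and separation reliabilities express the same ratio of "true" spread to measurement error. The sketch below uses the standard Rasch relationships G = true SD / RMSE and R = G² / (1 + G²); it assumes the observed SD and the root-mean-square measurement error (RMSE) of the person or item measures are available from the Rasch output, and the function names are hypothetical.

```python
import math

def reliability_from_separation(g):
    """Separation reliability R = G^2 / (1 + G^2)."""
    return g * g / (1.0 + g * g)

def separation_from_reliability(r):
    """Separation index G = sqrt(R / (1 - R)), for 0 <= R < 1."""
    return math.sqrt(r / (1.0 - r))

def separation_from_measures(observed_sd, rmse):
    """G = 'true' SD / RMSE, where true variance = observed variance - error variance."""
    true_variance = max(observed_sd ** 2 - rmse ** 2, 0.0)
    return math.sqrt(true_variance) / rmse
```

For example, the Physical-domain person separation index of 1.93 corresponds to a separation reliability of about 0.79, which exceeds 0.7 even though the index itself falls just short of 2; the two criteria are therefore not interchangeable.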
Category functioning refers to whether the successive response categories of each item are located in their expected order; for example, on the item "How well are you able to concentrate?", the difficulty of the response "Not at all" should be lower than that of the response "Slightly". To examine category functioning, we used the average measure (the average ability estimate of the respondents who chose a particular category), the step measure (the thresholds, i.e., the boundaries between adjacent categories), and category fit statistics (infit and outfit MnSq). Both average and step measures are expected to increase monotonically across categories, and both fit statistics are recommended to fall between 0.6 and 1.4 [18,19].
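The ordering and fit criteria for category functioning can be expressed as a small check. This sketch assumes the average measures, step measures, and category fit statistics for one domain have already been read from Rasch output (the values in any call are hypothetical), and it tests only the criteria stated above: monotonic increases and MnSq values within 0.6 to 1.4.

```python
def strictly_increasing(values):
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))

def check_category_functioning(average_measures, step_measures, category_fits,
                               lower=0.6, upper=1.4):
    """Return a list of problems; an empty list means the criteria are met.

    average_measures: mean person measure (logits) per category, lowest first.
    step_measures: thresholds between adjacent categories (one fewer than categories).
    category_fits: (infit MnSq, outfit MnSq) for each category.
    """
    problems = []
    if not strictly_increasing(average_measures):
        problems.append("average measures do not increase monotonically")
    if not strictly_increasing(step_measures):
        problems.append("step measures (thresholds) are disordered")
    for category, (infit, outfit) in enumerate(category_fits, start=1):
        if not (lower <= infit <= upper and lower <= outfit <= upper):
            problems.append(f"category {category} fit outside {lower}-{upper}")
    return problems
```

For a 5-point item, this check expects 5 average measures and 4 step measures; a swapped pair of thresholds is reported as disordered even when the average measures remain ordered.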
Local dependency, meaning that some items remain correlated after the underlying concept has been taken into account (e.g., because of shared wording), was examined using the correlations (r) of the Rasch residuals between every pair of items. In the absence of local dependency, r would be 0; however, Wang et al. [25] (p. 5) noted that "…there is always some degree of local dependence in empirical data." Therefore, some degree of local dependence (e.g., r ≤ 0.4) is still acceptable [26]. Person fit was also examined using infit and outfit MnSq, with an infit or outfit MnSq value > 1.4 indicating a misfitting respondent. The person fit statistics allow person response validity to be examined in terms of logical hierarchical ordering [27]. Fisher et al. [28] suggested that the person response validity of children's school task-related measures can be established when < 5% of children are misfit. However, because our sample had a particular mental health condition, we did not adopt this criterion and simply reported the percentage of misfitting respondents. Finally, items were tested for differential item functioning (DIF) across educational level (junior high vs. senior high), gender (female vs. male), HBV status (positive vs. negative), HCV status (positive vs. negative), and HIV status (positive vs. negative).
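The local-dependency and person-fit checks reduce to simple computations once Rasch residuals and person fit statistics are available. The sketch assumes a matrix of standardized residuals (rows are respondents, columns are items) and a list of (infit, outfit) pairs per respondent; it flags item pairs whose residual correlation exceeds 0.4 and reports the percentage of respondents with an MnSq above 1.4. Function names and any toy data are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def flag_local_dependence(residuals, item_names, cutoff=0.4):
    """Item pairs whose residual correlation exceeds `cutoff`.

    residuals: rows = respondents, columns = items (standardized Rasch residuals).
    """
    columns = list(zip(*residuals))
    flagged = []
    for (i, col_i), (j, col_j) in combinations(enumerate(columns), 2):
        r = pearson(col_i, col_j)
        if r > cutoff:
            flagged.append((item_names[i], item_names[j], round(r, 3)))
    return flagged

def percent_misfit_persons(person_fits, upper=1.4):
    """Percentage of respondents with infit or outfit MnSq above `upper`."""
    misfits = sum(1 for infit, outfit in person_fits if infit > upper or outfit > upper)
    return 100.0 * misfits / len(person_fits)
```

Only positive residual correlations above the cutoff are flagged, matching the r ≤ 0.4 tolerance stated above; the misfit percentage is reported as is rather than compared with the 5% criterion.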

Results
The mean (SD) age of the participants was 38.07 (7.44) years, and the mean (SD) age at first heroin use was 26.13 (6.32) years. Their mean (SD) duration of heroin use was 8.05 (5.85) years. Most participants (n = 212, 89.8%) were male, and their mean (SD) duration of formal education was 9.43 (2.35) years. In addition, 19.9% were HIV-positive, 16.5% were HBV-positive, and 94.5% were HCV-positive. Moreover, 155 participants had used both methamphetamine and heroin (Table 2).
The PA showed that only the first factor extracted from each domain had an observed eigenvalue higher than the estimated eigenvalue at the 95th percentile. In addition, the second factor extracted from the Physical domain had the same eigenvalue (1.15) as the mean eigenvalue from repeated sampling, and the second factors extracted from the other three domains had eigenvalues lower than those from repeated sampling (Psychological: 0.84 vs. […] (Table 5). Moreover, the person fit statistics showed that about one-fifth of our participants were misfit (Table 6).
No DIF items were detected on the WHOQOL-BREF between HBV-positive and HBV-negative participants or between HCV-positive and HCV-negative participants. Items E9 (Eating) and S2 (Sexual activity) showed DIF for educational level and gender, respectively. Among participants with the same QoL, those with a junior high school education or less tended to score higher on item E9 than those with a senior high school education or more (DIF contrast = 0.50), and females tended to score lower than males on item S2 (DIF contrast = −0.86). Five items showed DIF between HIV-positive and HIV-negative participants. Among participants with the same QoL, HIV-positive carriers tended to score higher than HIV-negative participants on Ps3 (Think; DIF contrast = 0.63) and S2 (Sexual activity; DIF contrast = 0.64), and lower on Ps6 (Negative feelings; DIF contrast = −0.62), S1 (Personal relationship; DIF contrast = −0.71), and E7 (Health service; DIF contrast = −0.57) (Table 7).

Discussion
To the best of our knowledge, this is the first study to use Rasch models to examine the psychometric properties of the WHOQOL-BREF in a substance-addicted sample. The unidimensionality of the Social and Environment domains was supported by both PA and Rasch models; however, the Physical and Psychological domains contained misfit items. Three items (Ph1, Ph2, and Ps6) did not fit their underlying domains, and item Ph6 was redundant. The WHOQOL-BREF showed satisfactory reliability and separation indices (for both persons and items). No disordered thresholds were found among the 5 response categories in any of the four domains, and item dependence was acceptable. However, about one-fifth of the participants were misfit, possibly indicating the unstable nature of heroin users. The WHOQOL-BREF has been confirmed as an appropriate QoL instrument around the world (e.g., [6,13,29]). Our study extends the previously reported satisfactory reliability with acceptable separation indices; that is, the WHOQOL-BREF has enough items and is sensitive enough to distinguish participants with high and low QoL, and our sample was large enough to verify the item difficulty hierarchy [26].
Table notes: Participants with a maximum extreme score (n = 2) or a minimum extreme score (n = 1) were not included. c: Participants with a maximum extreme score (n = 1) were not included.
However, the misfit items we found in the Physical and Psychological domains contradict previous Rasch findings [13,16,29]. One possible reason is the difference in populations between our study and previous studies (substance abusers vs. the general population and community-dwelling older people). Substance abusers are often cognitively impaired [30] and thus may have difficulty understanding items that use indirect wording (e.g., negatively worded items). Because all three misfit items in this study were negatively worded, we tentatively concluded that substance abusers may lack sufficiently intact cognitive function to interpret these three items as intended. Negatively worded items have a wording effect that biases the extracted constructs of QoL instruments [31], especially among people with insufficient cognitive ability. Therefore, underlying constructs such as the Physical and Psychological domains may be affected by negatively worded items [32,33].
To strengthen our hypothesis (i.e., that negatively worded items have a wording effect among substance abusers), we used another statistical method, confirmatory factor analysis (CFA), to examine the results. Two CFA models (M1: a 4-factor QoL model; M2: correlated QoL traits with uncorrelated wording-method factors) were compared, and we hypothesized that M2 would outperform M1 because it models the wording effect. M2 substantially improved the data-model fit according to the χ² difference test (Δχ² = 149.81, Δdf = 23; P < 0.001), the expected cross-validation index (ECVI; M1: 3.894, M2: 3.166), and the Akaike information criterion (AIC; M1: 895.673, M2: 728.120). The CFA results therefore partly supported our hypothesis. However, because no cognitive tests were administered in this study, our hypothesis was supported only by indirect evidence (i.e., Rasch and CFA models). Future researchers may therefore want to verify it through direct investigation. For example, cognitive interviews could clarify whether substance abusers lack sufficient cognitive functioning to understand negatively worded items. In addition, DIF analyses comparing substance abusers and nonabusers on negatively worded items could also test our hypothesis.
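The χ² difference test used to compare M1 and M2 can be checked from the reported difference alone. The sketch below computes the chi-square upper-tail probability with the closed-form expressions available for integer degrees of freedom, using only Python's standard library; the function names are hypothetical, and because the model-level χ² values are not reported here, the usage calls `chi2_sf` directly with Δχ² = 149.81 and Δdf = 23.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2-1} (x/2)^r / r!
        term = total = 1.0
        for r in range(1, df // 2):
            term *= half / r
            total += term
        return math.exp(-half) * total
    # Odd df: 2*P(Z > chi) + 2*pdf(chi) * sum_{r=1}^{(df-1)/2} chi^(2r-1) / (1*3*...*(2r-1))
    chi = math.sqrt(x)
    normal_two_tail = math.erfc(chi / math.sqrt(2.0))
    normal_pdf = math.exp(-half) / math.sqrt(2.0 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return normal_two_tail + 2.0 * normal_pdf * total

def chi2_difference_p(chi2_restricted, df_restricted, chi2_general, df_general):
    """p-value for the difference test between nested models (restricted vs. general)."""
    return chi2_sf(chi2_restricted - chi2_general, df_restricted - df_general)
```

`chi2_sf(149.81, 23)` is many orders of magnitude below 0.001, consistent with the reported P < 0.001. For non-integer degrees of freedom, a general incomplete-gamma routine would be needed instead.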
Our results suggest that the WHOQOL-BREF exhibited the expected threshold ordering among the five response categories and low item dependence in a sample of heroin users. This may be due to a combination of the following factors: the excellent instruction documents originally provided by the WHOQOL team for developing the Taiwan version; sound leadership and cooperation among psychometricians, clinicians, and statisticians in Taiwan, who formed a focus group for the development; careful selection of descriptors for each item [34]; a standard translation procedure (i.e., forward translation, backward translation, and reconciliation); and active participation from 17 hospitals/clinics throughout Taiwan [12]. Future studies are warranted to corroborate our findings in people with other mental illnesses.
A substantial percentage (17.8% to 25.4%) of person misfit was found in our heroin-dependent patients, much higher than the 5% suggested by Fisher et al. [28]. However, these high percentages may be largely explained by the following reasons. First, all participants had a diagnosis of opioid dependence, which is frequently associated with mood problems and/or impaired cognition. Second, the criterion of less than 5% misfit was suggested for children without any mental health problems, in a setting where the objective ability to complete school tasks was being measured [28]. Because we assessed self-reported outcomes from patients, which are frequently affected by emotion and cognition [35,36], person misfit in less than one-quarter of the sample may indeed be acceptable. However, more studies are needed to corroborate this speculation.
Although the DIF analyses did not detect any DIF items for HBV or HCV infection, we did find 5 DIF items for HIV infection. This suggests that the WHOQOL-BREF should be measured and interpreted separately for opioid users with and without HIV infection, because 5 of the 26 items showed DIF. In addition, we found one DIF item for educational level and another for gender. Participants with a junior high school education or less seemed to overestimate their satisfaction with eating, or to be more easily satisfied than those with more education, and thus reported higher scores on item E9. Furthermore, females' lower scores on sexual activity (item S2) may reflect embarrassment or higher expectations.
Table notes: DIF contrasts were calculated as the logit of Group 1 − the logit of Group 2; a positive value indicates that a patient in Group 1 has a higher item score than a patient with the same QoL level in Group 2, and a negative value indicates that a patient in Group 1 has a lower item score than a patient with the same QoL level in Group 2. HIV: human immunodeficiency virus. Reversely coded items are in italics.
This study has three main limitations. First, all participants were recruited from the same MMT program in southern Taiwan, which prevents us from generalizing our results to the entire Taiwanese population. However, our results were comparable to those of another study [6] that tested the psychometric properties of the WHOQOL-BREF in substance abusers from northern Taiwan, so the generalizability issue may not be serious. Second, because the participants were recruited from an MMT program, our results may not apply to substance abusers who do not seek treatment for addiction. Third, all participants in this study used heroin, and only some used other substances (e.g., methamphetamine, ketamine). Therefore, our results may be more representative of the heroin-dependent population than of populations dependent on other substances.
In conclusion, the WHOQOL-BREF is suitable for evaluating the QoL of substance abusers. It can also be used as a treatment outcome measure to evaluate the effect of treatments for substance abusers. However, scores for people with and without HIV infection should be interpreted separately, and the three negatively worded items should be used with caution because cognitive problems may prevent substance abusers from fully understanding their meaning. Future research may apply cognitive interviews to determine the cognitive functioning of substance abusers and their interpretation of negatively worded items.