Viability of a MSQOL-54 general health-related quality of life score using bifactor model

Background MSQOL-54 is a widely used, multidimensional health-related quality of life (HRQOL) instrument specific to multiple sclerosis (MS). Findings from the validation study suggested that the two MSQOL-54 composite scores are correlated. Given this correlation, a single total HRQOL score could plausibly be calculated, which would give key stakeholders one overall HRQOL score. We aimed to assess how well the bifactor model could account for the MSQOL-54 structure, in order to verify whether a total HRQOL score can be calculated.
Methods A large international database (3669 MS patients) was used. By means of confirmatory factor analysis, we estimated a bifactor model in which every item loads onto both a general factor and a group factor. Fit of the bifactor model was compared with that of the single and two second-order factor models using the Akaike and Bayesian information criteria. Reliability of the total and subscale scores was evaluated with McDonald’s coefficients (omega and omega hierarchical).
Results The bifactor model outperformed the two second-order factor models on all statistics. All items loaded satisfactorily (≥ 0.40) on the general HRQOL factor, except the sexual function items. Coefficients for the total score were very satisfactory (omega 0.98, omega hierarchical 0.87). Omega hierarchical for the subscales ranged from 0.22 to 0.57, except for sexual function (0.70).
Conclusions The bifactor model is particularly useful when the aim is to acknowledge multidimensionality while also accounting for a single general construct, such as HRQOL related to MS. The total raw score can be used as an estimate of the general HRQOL latent score.
Supplementary Information The online version contains supplementary material available at 10.1186/s12955-021-01857-y.


Introduction
Over the last two decades, health-related quality of life (HRQOL) measures have increasingly been included in research studies of neurodegenerative disorders, including multiple sclerosis (MS) [1][2][3]. Importantly, HRQOL instruments can disclose aspects of disease that are not considered by standard clinical tools and that would otherwise go unrecognized. In addition, HRQOL instruments can help clinicians appreciate patient priorities, particularly in terms of treatment goals, facilitate physician-patient communication, and promote shared decision making [4].
The Multiple Sclerosis Quality of Life-54 items (MSQOL-54) inventory was designed to address the need for HRQOL measures to be used in quality of care and clinical effectiveness research. Thus, the MSQOL-54 comprehensively assesses the HRQOL of patients with MS, an unpredictable chronic neurological disorder which affects 2.8 million people worldwide [5,6].
Compared with other instruments, its main strength is that it combines a generic and a disease-targeted approach. In fact, the MSQOL-54 is a multidimensional, MS-specific HRQOL instrument, based on the generic SF-36 [7] supplemented with 18 MS-specific items [8]. This approach allows HRQOL in MS to be compared with that in other diseases and in the general population using the generic score, while also providing a sensitive measure for within-disease comparisons.
In the validation work of the MSQOL-54, Vickrey et al. [8] reported a fairly high correlation (r = 0.66) between the two composite scores. Given this correlation, it could be hypothesized that a single total HRQOL score may be calculated, giving patients, clinicians and researchers one overall HRQOL assessment with which, for example, to assess treatment response or modify the treatment plan. In this context, applying a bifactor model to the MSQOL-54 items could be particularly useful, as it is intended to acknowledge multidimensionality and, at the same time, take account of a single general construct [17], such as HRQOL related to MS. The bifactor model may constitute an alternative to the more widely used second-order or correlated-traits models [18]. By definition, in the bifactor model each item loads on a general factor and on only one group factor, and the general and group factors are all uncorrelated with each other [18]. For each item, the general factor captures what the item shares with all the other items, and the group factor reflects what the item shares with the other items belonging to the same subscale once the influence of the general factor has been removed. That is, all the covariation between items belonging to different subscales, and all the covariation between subscale scores, is captured by the general factor, a broad latent dimension made up of all the subscale contents. Bifactor modeling is generally used to test multifaceted constructs [17] and has so far been used mainly in intelligence research [19,20] and in the study of personality [21,22]. It has rarely been applied in neurology and MS research, with the exception of a few studies [23][24][25].
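The structure described above can be illustrated with a minimal sketch. It assumes six hypothetical items and two group factors; all loadings are invented for illustration and are not MSQOL-54 estimates. The sketch builds the bifactor loading pattern and derives the model-implied correlations under the assumption that all factors are orthogonal with unit variance.

```python
# Hypothetical standardized loadings for six items (not MSQOL-54 estimates).
GENERAL = [0.70, 0.60, 0.65, 0.50, 0.55, 0.60]   # loading on the general factor
GROUP = [0.40, 0.45, 0.35, 0.50, 0.40, 0.45]     # loading on the item's own group factor
MEMBERSHIP = [0, 0, 0, 1, 1, 1]                  # group factor each item belongs to


def loading_matrix(general, group, membership):
    """Rows are items; column 0 is the general factor, columns 1.. are group factors."""
    n_groups = max(membership) + 1
    rows = []
    for g, s, k in zip(general, group, membership):
        row = [0.0] * (1 + n_groups)
        row[0] = g          # every item loads on the general factor
        row[1 + k] = s      # and on exactly one group factor
        rows.append(row)
    return rows


def implied_correlations(lam):
    """Model-implied correlations when all factors are orthogonal with unit variance."""
    n = len(lam)
    corr = [[sum(a * b for a, b in zip(lam[i], lam[j])) for j in range(n)]
            for i in range(n)]
    for i in range(n):
        corr[i][i] = 1.0  # standardized items: common variance plus uniqueness is 1
    return corr


LAMBDA = loading_matrix(GENERAL, GROUP, MEMBERSHIP)
SIGMA = implied_correlations(LAMBDA)

# Items 0 and 1 share a group factor: general part plus group part.
print(round(SIGMA[0][1], 4))
# Items 0 and 3 belong to different group factors: general part only.
print(round(SIGMA[0][3], 4))
```

In this sketch, the correlation between two items from different subscales depends only on their general-factor loadings, whereas items from the same subscale receive an additional contribution from their shared group factor.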
In the present study, our primary aim was to apply the bifactor model to the MSQOL-54 items in order to verify whether a total HRQOL score could be calculated. Second, if the bifactor model fitted the data well, we aimed to evaluate the measurement invariance of MSQOL-54 items across age and gender.

Participants
To perform the present secondary analysis, we used data drawn from different datasets collected utilizing the MSQOL-54 within ongoing or completed projects conducted in Italy and Australia [26].
We obtained the data collected with the English version from the 'HOLISM study', an international observational study whose methods and results have been reported elsewhere [27,28]. Briefly, participants from Europe, Australasia, North America, and other countries were recruited in 2014 via online platforms (e.g. websites and forums involving MS patients, and social media). The study aimed to provide an overview of the risk-modifying behaviors and current lifestyle of a large international cohort of MS patients, in order to analyze the association between these variables and disease progression. Patients aged ≥ 18 years who could complete an English-language survey were included. In the present study, we used baseline data from English-speaking countries only: 840 (41%) from North America, 797 (39%) from Australasia, and 427 (20%) from the UK and Ireland.
We obtained the data collected with the Italian version from the datasets (i.e. baseline data for longitudinal studies/trials) of the following research projects: • The 'Care system project' [29,30] Lucia Foundation, Rome). Patients gave written or online informed consent to be included in the original projects. Additional consent was not required for this secondary analysis, for which patients' privacy and anonymity were guaranteed.
Records were included in the database if the following variables were available: MS diagnosed (according to any criteria, Italian sample) or disclosed by a physician (English-speaking sample); patient age ≥ 18 years; gender; level of disability (EDSS, Italian sample; PDDS [35], English-speaking sample), and disease duration.

Statistical analysis
The goodness of fit of the original second-order factor model comprising two factors, the novel second-order factor model comprising one factor, and the bifactor model was tested using confirmatory factor analysis (CFA).
According to the original factor structure of the MSQOL-54, the two second-order factor model hypothesized that 52 items loaded onto 12 first-order factors and two second-order factors, corresponding to the physical health composite (PHC) and the mental health composite (MHC) [8] (Additional file 1). The remaining two items (i.e. item 2 'Compared to one year ago, how would you rate your health in general now?', and item 50 'Overall, how satisfied were you with your sexual function during the past 4 weeks?') were not included in this model, or in the other models, because they are single items.
In the single second-order factor model, the first-order factors were the same as in the original model, and one second-order factor was imposed, called 'HRQOL' (Additional file 2).
In the bifactor model, it was hypothesized that 50 items loaded onto the general HRQOL factor and on their specific group factors, whereas the two items forming the overall QOL subscale (items 53 and 54) were loaded only onto the general factor, because the bifactor model needs each group factor to be composed of at least three items to be identified (Additional file 3).
Global fit of the models was evaluated with three approximate indices recommended by Kline [36], namely, the root mean square error of approximation (RMSEA), the comparative fit index (CFI), and the standardized root mean square residual (SRMR). As a rule of thumb, RMSEA under 0.08 represents good fit and values below 0.05 represent very good fit [37]; SRMR values under 0.08 indicate good fit, and values greater than 0.10 indicate poor fit [36]; concerning CFI, values above 0.95 are indicative of good fit [38], and, as for other incremental fit indices, values below 0.90 indicate that models "can usually be improved substantially" [39]. Akaike Information Criterion (AIC) [40] and Bayesian Information Criterion (BIC) [41,42] were used for model comparisons. The model with lower AIC and BIC values was chosen as the best model to fit the data.
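The cut-offs and information criteria described above can be expressed as a short sketch. The AIC and BIC formulas are the standard definitions; the labels "borderline" and "not good" for values falling between or beyond the quoted cut-offs, the function names, and the log-likelihoods and parameter counts in the example are assumptions introduced for illustration, not values from this study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate the preferred model."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; the penalty per parameter grows with n_obs."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def describe_fit(rmsea, cfi, srmr):
    """Label each index using the rule-of-thumb cut-offs quoted in the text."""
    if rmsea < 0.05:
        rmsea_label = "very good"
    elif rmsea < 0.08:
        rmsea_label = "good"
    else:
        rmsea_label = "not good"      # assumed label for values of 0.08 or more

    if srmr < 0.08:
        srmr_label = "good"
    elif srmr > 0.10:
        srmr_label = "poor"
    else:
        srmr_label = "borderline"     # assumed label for 0.08 to 0.10

    if cfi > 0.95:
        cfi_label = "good"
    elif cfi < 0.90:
        cfi_label = "improvable"
    else:
        cfi_label = "borderline"      # assumed label for 0.90 to 0.95

    return {"RMSEA": rmsea_label, "CFI": cfi_label, "SRMR": srmr_label}


# Hypothetical models: name -> (log-likelihood, number of free parameters).
MODELS = {"two second-order": (-5200.0, 40), "bifactor": (-5100.0, 60)}
N_OBS = 500  # hypothetical sample size

aic_values = {name: aic(ll, k) for name, (ll, k) in MODELS.items()}
bic_values = {name: bic(ll, k, N_OBS) for name, (ll, k) in MODELS.items()}
best_by_aic = min(aic_values, key=aic_values.get)
best_by_bic = min(bic_values, key=bic_values.get)
print(best_by_aic, best_by_bic)
print(describe_fit(rmsea=0.04, cfi=0.96, srmr=0.05))
```

Because BIC penalizes additional parameters more heavily than AIC in large samples, agreement between the two criteria gives somewhat stronger support for the more heavily parameterized model than either criterion alone.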
To evaluate the strength of the general HRQOL factor relative to the group factors, the magnitude of the loadings was considered (values ≥ 0.40 were considered satisfactory [43]), and the explained common variance (ECV) and percentage of uncontaminated correlations (PUC) were calculated [44]. A high ECV value, or a moderate ECV value supplemented with a high PUC value (> 0.90), indicated that the data were sufficiently "unidimensional" [45]. To judge the degree to which total raw scores reflected a single common factor, McDonald's coefficient omega hierarchical (ω H) was computed. High values meant that the total raw score was a reliable measure of the general factor. Further, to evaluate reliability considering all sources of common variance (general and group factors), McDonald's coefficient omega (ω) was calculated. Both coefficients were also calculated for each subscale, to evaluate how reliably subscale scores measured the corresponding specific latent variables once the items' common variance due to the general factor was removed (ω S), and how reliable they were considering all sources of common variance.
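As an illustration of how these indices relate to the loadings, the following sketch computes ECV, PUC, ω, ω H and ω S from standardized bifactor loadings using the formulas commonly given for these coefficients. It assumes orthogonal factors and item uniquenesses equal to one minus the squared loadings; the function name and all loadings are hypothetical, and the sketch is not necessarily the exact procedure used in this study.

```python
def bifactor_indices(general, group, membership):
    """Compute ECV, PUC, omega, omega hierarchical and omega-S per group factor.

    general    : standardized loadings on the general factor (one per item)
    group      : standardized loading on each item's own group factor
                 (0.0 for items that load on the general factor only)
    membership : group-factor index per item; -1 marks items with no group factor
    """
    membership = list(membership)
    groups = sorted({k for k in membership if k >= 0})
    uniqueness = [1.0 - g * g - s * s for g, s in zip(general, group)]

    # ECV: share of common variance attributable to the general factor.
    sum_g2 = sum(g * g for g in general)
    sum_s2 = sum(s * s for s in group)
    ecv = sum_g2 / (sum_g2 + sum_s2)

    # PUC: share of item pairs whose correlation reflects the general factor only,
    # i.e. pairs that do not belong to the same group factor.
    n_items = len(general)
    within_pairs = sum(membership.count(k) * (membership.count(k) - 1) // 2
                       for k in groups)
    puc = 1.0 - within_pairs / (n_items * (n_items - 1) / 2)

    # Omega and omega hierarchical for the total score.
    general_part = sum(general) ** 2
    group_sums = {k: sum(s for s, m in zip(group, membership) if m == k)
                  for k in groups}
    group_part = sum(v ** 2 for v in group_sums.values())
    total_variance = general_part + group_part + sum(uniqueness)
    omega = (general_part + group_part) / total_variance
    omega_h = general_part / total_variance

    # Omega-S: subscale reliability due to the group factor alone.
    omega_s = {}
    for k in groups:
        idx = [i for i, m in enumerate(membership) if m == k]
        specific = group_sums[k] ** 2
        subscale_variance = (sum(general[i] for i in idx) ** 2 + specific
                             + sum(uniqueness[i] for i in idx))
        omega_s[k] = specific / subscale_variance

    return {"ecv": ecv, "puc": puc, "omega": omega,
            "omega_h": omega_h, "omega_s": omega_s}


# Hypothetical example: two three-item subscales plus two items that load
# on the general factor only (membership -1).
RESULT = bifactor_indices(
    general=[0.70, 0.60, 0.65, 0.50, 0.55, 0.60, 0.75, 0.70],
    group=[0.40, 0.45, 0.35, 0.50, 0.40, 0.45, 0.0, 0.0],
    membership=[0, 0, 0, 1, 1, 1, -1, -1],
)
print(RESULT)
```

With many small subscales, most item pairs fall in different subscales, so PUC is high even when ECV is only moderate; this is the situation the combined ECV and PUC criterion is meant to cover.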
Finally, we used CFA to evaluate the measurement invariance of MSQOL-54 across gender (male [26%]; female [74%]) and age (using the median of 44 years as cut-off). Three increasingly constrained levels of measurement invariance (i.e. configural, metric, scalar) were assessed using multi-group CFA. We used the same criteria as above to assess model fit.
In line with Chen [45], a worsening of CFI exceeding the cut-off of 0.010, accompanied by a change of ≥ 0.030 in SRMR or a change of ≥ 0.015 in RMSEA, was deemed a signal of lack of metric invariance; for scalar invariance, the threshold values for RMSEA and CFI were identical to those used for metric invariance, whereas the threshold for SRMR was 0.010. The χ2 difference test was not used to compare the fit of two nested models, because it is sensitive to sample size and therefore usually gives significant results with large samples [45].
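The decision rule above can be written as a small function. This is a sketch under stated assumptions: each delta is the fit index of the more constrained model minus that of the less constrained model, "exceeding" is read as a strict inequality for CFI, and the function and constant names are introduced here for illustration only.

```python
CFI_CUTOFF = 0.010                                 # worsening (decrease) in CFI
RMSEA_CUTOFF = 0.015                               # increase in RMSEA
SRMR_CUTOFF = {"metric": 0.030, "scalar": 0.010}   # increase in SRMR, by level


def invariance_supported(delta_cfi, delta_rmsea, delta_srmr, level):
    """Return True unless the fit changes signal a lack of invariance.

    Each delta is (more constrained model) minus (less constrained model).
    Lack of invariance requires a CFI worsening beyond the cut-off together
    with a worsening in RMSEA or SRMR at or beyond their cut-offs.
    """
    cfi_worsened = -delta_cfi > CFI_CUTOFF
    other_worsened = (delta_rmsea >= RMSEA_CUTOFF
                      or delta_srmr >= SRMR_CUTOFF[level])
    return not (cfi_worsened and other_worsened)


# Metric step across gender, using the changes reported in the Results
# (the RMSEA change is reported only as < 0.001; 0.001 is used as an upper bound).
print(invariance_supported(delta_cfi=-0.006, delta_rmsea=0.001,
                           delta_srmr=0.008, level="metric"))
```

Under this rule, a drop in CFI on its own does not count against invariance; it has to be accompanied by a worsening in at least one of the other two indices.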
All models were estimated in Mplus 7.0 using maximum likelihood estimation with robust standard errors (MLR) [46].

Results
The database consisted of 3669 MS patients (mean age 43.8 years [range , 74% women, 54% with a mild level of disability (measured with the self-reported PDDS), and mean disease duration of 7.2 years [0-48]) (Table 1). Of these, 1605 (44%) were Italian (mean age 40.9 years, 62% women, 69% with a mild disability level) and 2064 were English-speaking (mean age 46.1 years, 83% women, 54% with a mild disability level). Compared with the Italian participants, the English-speaking participants were older, included a higher percentage of women, and had a longer disease duration (p < 0.001) (Table 1).
The goodness-of-fit statistics of the three alternative CFA models are reported in Table 2.
The (original) two second-order factor model fit- . Therefore, the bifactor 1 solution was inadmissible, and it was necessary to respecify a second bifactor model. In 'bifactor 2', the three items of the social function subscale (20, 33, and 51) loaded onto the general factor only and, to account for the group specificity of items 20 and 33, the residuals of these two items were allowed to correlate. This last model had satisfactory fit (RMSEA = 0.055; CFI = 0.892; SRMR = 0.062), and both AIC and BIC values were better than those of the one and two second-order factor models (AIC = 1,710,637; BIC = 1,711,910; Table 2). Standardized factor loadings for the revised bifactor model are shown in Table 3.
All items loaded satisfactorily on the general (HRQOL) factor (loading ≥ 0.40), except the sexual function items. The ECV value was 0.51 (indicating that 51% of the common variance was due to the general HRQOL factor) and PUC was 0.92, denoting that the data were sufficiently 'unidimensional'.
Omega value for the total raw score was 0.98, suggesting that the reliability considering all sources of common variance (general factor and group factors) was very high.
Moreover, the omega hierarchical value of the general factor was 0.87, indicating that the total raw score was a reliable measure of the general HRQOL factor.
As shown in Table 4, for the majority of the subscales the omega hierarchical value (ω S) was around 0.50. It was very low (≤ 0.35) for three subscales (i.e. energy, health perceptions, and health distress), meaning that summed scores of items belonging to these subscales were not a reliable measure of their respective domain latent variable once the general HRQOL was taken into account, and it was high (0.70) for the sexual function subscale. For the latter subscale, it seems that the specific group factor accounted for more variance than the general factor, indicating that items belonging to this subscale were more likely to reflect a specific domain of HRQOL (related to sexual function) than a common general construct of HRQOL.

Measurement invariance
First, the model was estimated to evaluate the measurement invariance of MSQOL-54 across gender (Table 5, upper part). Results showed that the model produced an acceptable fit for configural invariance (RMSEA = 0.055; CFI = 0.892; SRMR = 0.063). For the model in which loadings were constrained to be identical across gender, fit indices were satisfactory and the worsening relative to the unconstrained model was negligible (ΔRMSEA < 0.001; ΔCFI = −0.006; ΔSRMR = 0.008), providing evidence of metric invariance. With regard to scalar invariance (i.e. intercepts and loadings constrained to be invariant across groups), the model fitted the data well (RMSEA = 0.054; CFI = 0.885; SRMR = 0.063). Finally, the changes in fit indices relative to the metric invariance model met the cut-off values, supporting scalar invariance. Second, the model was estimated to evaluate the measurement invariance of MSQOL-54 across age (using the median of 44 years as cut-off) (Table 5). All the changes in fit indices across the models were satisfactory.

Discussion
As far as we know, this was the first study applying the bifactor model to the MSQOL-54 in a large international database of MS patients.
The bifactor model with one general HRQOL factor and 10 specific group factors achieved acceptable fit and outperformed both the original two second-order factor model and the single second-order factor model. Also, our findings supported measurement invariance of the questionnaire across age and gender, suggesting that it has the same meaning across these socio-demographic variables, and that patients having the same ratings on MSQOL-54 general or domain factors would attain the identical value on the observed variable, regardless of sub-group membership. Generally, the factor loadings were substantially high both on the general and the group factors, and the ECV was about 50%, indicating that MSQOL-54 items contribute to essentially the same extent to both the general HRQOL factor and to the group factors. Despite this, the data can be deemed sufficiently 'unidimensional', because the MSQOL-54 consists of several subscales composed of few items each, and this implies that the vast majority of correlations between items (PUC = 92%) reflect general factor variance only. Furthermore, the satisfactory value of the coefficient omega hierarchical indicated that the total raw score is a reliable measure of the general HRQOL latent variable. Taken together, all these results support the hypothesis that the MSQOL-54 has a sufficient 'unidimensional' structure, and thus it is appropriate to calculate a total HRQOL score.
Table 4 Omega statistics for the MSQOL-54 total and subscales scores. ω = scores reliability considering all sources of common variance (the general and the group factor); ω S (omega hierarchical subscale) = scores reliability considering only the common variance due to the group factor, that is the reliability of subscales scores, controlling for the effects of the general factor.
Among the 52 items analyzed in the study (items 2 and 50 were excluded from the analysis because they are single items), the weakest indicators of the general HRQOL dimension were the four items of the sexual function subscale. Considering the omega hierarchical value, the sexual function subscale is more likely to reflect a specific domain of HRQOL (namely, one related to sexual function) than a common general construct of HRQOL. In fact, this is the only subscale that showed an omega hierarchical value ≥ 0.70.
Another issue derives from the social function subscale. The three items of this subscale loaded onto the general factor only, because one of them (item 51, dealing with bowel or bladder) was not a good indicator of social functioning, and a group factor needs at least three items to be identified. Thus, it was not possible to evaluate the contribution of the corresponding group factor.
This study has important implications for clinical practice and research. For clinical practice, it could be crucial to provide health professionals and MS patients with feedback using a single HRQOL total score, which includes aspects of HRQOL not captured by the 10 group factors, as well as with subscale scores, to add granularity. The total HRQOL score could also be useful to identify patient subgroups (with different disease forms as well as levels of disability) in order to deliver personalized interventions addressing, for example, self-efficacy or resilience. For researchers, it could be easier to calculate and interpret a single total HRQOL score when using such a measure in clinical trials or other research studies. Moreover, the present results can be a stimulus for future research aimed at revising the MSQOL-54 questionnaire. Specifically, our findings highlight the need to increase the number of items in the social function subscale, because one of its three items was not a good indicator. Furthermore, we suggest revising the sexual function subscale items by broadening the content domain to include intimacy and sexual pleasure, as three of the four items in this subscale originated from the Medical Outcomes Study sexuality functioning scale, which focuses on performance indicators [47].
In the present study there were a number of limitations, some of which are reported elsewhere [26]. This secondary analysis was carried out in a large cross-sectional international MS database and should be confirmed in an independent sample, using a prospective longitudinal design. Stability of the factor structure was not established, as the data were not collected using longitudinal assessments. Further, criterion validity of the total HRQOL score should be assessed by correlating it with other pertinent questionnaires.

Conclusions
To conclude, this study adds new knowledge about the factorial structure of the MSQOL-54, in that a bifactor model fits the data well, outperforming the two second-order models. Therefore, it is appropriate to calculate a total HRQOL score that includes all the original subscales/domains. Based on these results, in future research, items should be calibrated using item response theory in order to assess whether a multidimensional computerized adaptive version of the MSQOL-54 is feasible. Further work to integrate or revise selected items is suggested.