The goal of the current paper was to replicate and expand on the original psychometric assessment of the IBS-QOL when applied to an IBS-d-specific patient set. Our results indicate that male and female IBS-d patients who are highly compliant with daily diary entry and who meet a minimal requirement for pain as well as explicit criteria for diarrhea as defined by the BSS share commonalities with a general population of non-subtyped IBS patients, but that the originally proposed subscale structure does not apply to our patient set as well as one might anticipate. The deviations observed from the original assessment could be attributed to the fact that we evaluated IBS-d patients or to the much larger sample size employed here. Without such large-scale data on other IBS subtypes, it is difficult to discern the cause of the departures from the original analyses, but whether one or both differences are influencing the current results, it is still clear that the IBS-QOL performs well for IBS-d patients.
The item reduction criteria applied to the 34-item version of the IBS-QOL showed that many items had high bivariate correlations, defined as r ≥ 0.7. One possible contributor to the high correlations between items is priming or order effects, i.e., responses on later items being influenced by earlier-answered items. Because the IBS-QOL is a static instrument with only one item order presented to patients, however, it is not possible to test whether priming influences patients’ responses to single items.
Alternatively, high correlations between items could suggest that the items are measuring a single latent trait. Items 6 (“I feel like I’m losing control of my life because of my bowel problems”), 7 (“I feel my life is less enjoyable because of my bowel problems”), 9 (“I feel depressed about my bowel problems”), and 10 (“I feel isolated from others because of my bowel problems”) all showed a fairly high level of correlation with one another. The α-value for the overall sum scale of the IBS-QOL is also very high, suggesting redundancies across these items.
Similarly, and as expected, Items 12 (“Because of my bowel problems, sexual activity is difficult for me”) and 20 (“My bowel problems reduce my sexual desire”) exhibited a high correlation with one another (r = 0.741). Both items make up the Sexual subscale, and while the language of the two items implies physical and psychological aspects of sexual activity, respectively, patient responses tended to suggest that one does not occur without the other.
There were several other pairs of items that exhibited high inter-item correlation values (cf. Table 1). Our results suggest that a possible future research path for the IBS-QOL is to explore whether a shortened version of the IBS-QOL targeted toward IBS-d could be constructed from the current items while maintaining its measurement properties and still being relevant to IBS-d patients. If items have redundancy, then one could conceive of an item pool that supplies items to each of several slightly different versions of the IBS-QOL. Alternatively, specific cognitive debriefing may also help isolate whether any of these items are truly redundant or whether the items all closely measure HRQOL in IBS-d and simply represent very closely related aspects of IBS-d-related QOL.
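The r ≥ 0.7 screening criterion discussed above can be illustrated with a minimal sketch. It assumes Pearson correlations computed on a patients-by-items response matrix; the function name and the synthetic data are hypothetical and are not the study data or the authors' code.

```python
import numpy as np

def high_correlation_pairs(responses, threshold=0.7):
    """Return (i, j, r) for every item pair whose Pearson r meets the threshold.

    responses: 2-D array, rows = patients, columns = items.
    Item indices in the result are 1-based to match questionnaire numbering.
    """
    r = np.corrcoef(responses, rowvar=False)
    n_items = r.shape[0]
    pairs = []
    for i in range(n_items):
        for j in range(i + 1, n_items):
            if r[i, j] >= threshold:
                pairs.append((i + 1, j + 1, float(r[i, j])))
    return pairs

# Synthetic illustration only (not study data): items 1 and 2 share a
# strong common component, item 3 is independent noise.
rng = np.random.default_rng(0)
common = rng.normal(size=500)
demo = np.column_stack([
    common + 0.3 * rng.normal(size=500),
    common + 0.3 * rng.normal(size=500),
    rng.normal(size=500),
])
flagged = high_correlation_pairs(demo)
```

Pairs flagged in this way are only candidates for redundancy; as noted above, cognitive debriefing would still be needed to judge whether the items are truly interchangeable.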
Conversely, in IBS-d patients, a departure from the original validation analyses was not surprising either. For example, 72.8% of IBS-d patients answered “Not at all” to Item 32 (“I fear I won’t be able to have a bowel movement”) at Baseline. This result fits, conceptually, with how patients should answer items that are not geared toward their IBS subtype. This item, therefore, could be taken out of a targeted IBS-d instrument or, perhaps, could simply be included with a binary “yes” versus “no” response instead of the 5-point graded response set.
While some of the results suggested that certain items in the IBS-QOL may be candidates to remove if a reduced-item version were to be sought for IBS-d patients, other results support that the full set of items is relevant and psychometrically sound, consistent with conclusions of previous validation studies of the IBS-QOL. This result is not surprising given the extremely high value of Cronbach’s Coefficient α (α = 0.963). This is consistent with the interpretation of the bi-factor and EFA models because the common interpretation of Coefficient α analyses is that the items are internally consistent and therefore represent a unidimensional latent construct.
This conclusion is reinforced by a high observed average item-to-total correlation of 0.642. However, one of the limitations here is that modern applications of α analysis stretch interpretation of the statistic beyond its original intent [28]. Coefficient α was intended as a substitute for alternate-forms reliability, in which two equivalent forms of the instrument are administered and the results correlated with one another. As most instrument developers do not have the resources to develop two instruments together, Coefficient α was devised as a means of assessing agreement between an instrument and a theoretical one of the same length, composed of items randomly drawn from all possible content-valid items. The coefficient, therefore, is laden with assumptions and is also, ostensibly, a lower bound for the theoretical true internal consistency of a measure. Many have criticized the use of α for this and other reasons [29–31]. Further, while an α assessment assumes a sum of item responses, the IBS-QOL standardizes responses to a 0-100 scale, so without further study, it is not clear how the scoring algorithm relates back to a simple sum score. Structural equation modeling techniques, e.g., extensions of the CFA models, offer the best alternatives to α and other individual indices as they are better equipped to handle multivariate item data [32, 33]. However, any positive or negative bias around the α-value of 0.96 would likely still yield acceptable levels of consistency.
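For readers less familiar with the statistic, the following is a minimal sketch of Coefficient α and of item-to-total correlations computed from a patients-by-items matrix. It uses the standard sum-score formula; the uncorrected item-to-total definition (each item correlated with the sum of all items, including itself) is an assumption, since the variant used in the analysis is not stated here, and the small data set is hypothetical.

```python
import numpy as np

def cronbach_alpha(responses):
    """alpha = k/(k-1) * (1 - sum(item variances) / variance(sum score))."""
    x = np.asarray(responses, dtype=float)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

def item_total_correlations(responses):
    """Correlation of each item with the sum of all items (uncorrected)."""
    x = np.asarray(responses, dtype=float)
    total = x.sum(axis=1)
    return np.array(
        [np.corrcoef(x[:, j], total)[0, 1] for j in range(x.shape[1])]
    )

# Hypothetical responses from five patients to three 5-point items.
demo = np.array([
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 5, 5],
    [5, 4, 5],
])
alpha = cronbach_alpha(demo)
mean_item_total = item_total_correlations(demo).mean()
```

Because α depends only on the item variances and the variance of the sum score, it summarizes internal consistency under the sum-score assumption discussed above and says nothing directly about dimensionality.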
In terms of how the items structurally relate to one another at the instrument level, the fact that the oblique bi-factor model fits the data best, and better than the orthogonal bi-factor model, suggests that the original factor structure is redundant to the total sum score because factors that are allowed to be correlated fit the data better than hypothetically independent subscales. We do note, however, that more complex CFA models tended to fit better by both standard fit indexes and the usual assessment of residuals and that, generally, increasing model complexity provides better fit in most statistical models. While the oblique bi-factor model accounted for a marginal amount of variance (GFI = 0.8616), an acceptable improvement in variance above a null model (CFI = 0.9073) was observed. The RMSEA index imposes a penalty for higher complexity models, thereby allowing us to judge whether the better fit of the bi-factor models on other indices simply reflects their greater complexity. The observed RMSEA value of 0.069, although moderate in size, comparatively supports the oblique bi-factor conceptualization of IBS-d, i.e., that the best-fitting of all CFA models is one with an overall latent factor supported by the original substructure whilst allowing the substructure factors to correlate with one another. The model fit may have room for improvement, as 10.5% of residuals are outside of the preferred limits, potentially indicating that some items may not fit well within the proposed structure.
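As a reminder of how two of the fit indices above are built from model chi-square statistics, a minimal sketch follows. The formulas are the common textbook point estimates (some software uses n rather than n - 1 in the RMSEA denominator), and the input values are hypothetical; they are not the chi-square values from the models fitted in this study.

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation (point estimate).

    Misfit beyond the degrees of freedom is divided by df, so a model
    must earn its extra parameters. Uses the (n - 1) convention.
    """
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative fit index: improvement over a null (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, d_model)
    if d_null == 0.0:
        return 1.0
    return 1.0 - d_model / d_null

# Hypothetical values for illustration only.
example_rmsea = rmsea(chi2=300.0, df=100, n=501)
example_cfi = cfi(chi2_model=300.0, df_model=100, chi2_null=2100.0, df_null=100)
```

In this formulation, two models with the same excess misfit (chi-square minus df) but different df yield different RMSEA values, which is the sense in which the index accounts for model complexity.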
The EFA model suggests that there may be pairs or subsets of items of the IBS-QOL that group together more closely with one another than with other items, an observation that is not surprising given the observed inter-correlations between items. Interestingly, though, the EFA fit did not produce a factor structure in line with the original substructure, suggesting that HRQOL may be qualitatively different for IBS-d as compared to non-subtyped IBS patients as a whole.
Despite the extraction of multiple factors from the analysis, the EFA fit actually further strengthens the interpretation that the IBS-QOL is unidimensional in IBS-d patients. This is because the EFA model has a large first eigenvalue (37.8) as compared to the second (3.8). Eigenvalues of extracted factors measure the amount of variance observed in the items making up that factor. Here the first extracted factor accounts for 79.5% of the total variance in the items. Further, a test of whether the items suggest a structure in which there is at least one common factor to all items was also significant [χ²(561) = 16,080.2, p < 0.0001], implying that any structure extracted after that first factor is residual information that could enhance interpretation of the first factor, but one factor would be adequate to interpret the construct under study. This observation indicates that imposing the original factor structure [6] is helping model fit, implying that the original subscale structure of the IBS-QOL seems to be beneficial in accounting for information above and beyond the total sum score. Furthermore, combined with the fit of the orthogonal bi-factor model results, one could conclude for IBS-d patients that the IBS-QOL may be measuring a unidimensional construct, both because of the need to allow factors to correlate and because the original substructure seems only approximately correct.
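The dominance of the first factor can be read directly from the eigenvalues: the reported first eigenvalue is roughly ten times the second (37.8 / 3.8 ≈ 9.9). The sketch below shows how such a comparison might be computed. It assumes eigenvalues of the unreduced item correlation matrix, which sum to the number of items, so its values are not on the same scale as those reported above, which come from the authors' own extraction method; the data are synthetic.

```python
import numpy as np

def eigen_summary(responses):
    """Eigenvalues of the item correlation matrix, largest first,
    together with each eigenvalue's share of their total."""
    r = np.corrcoef(responses, rowvar=False)
    eigvals = np.linalg.eigvalsh(r)[::-1]  # eigvalsh returns ascending order
    share = eigvals / eigvals.sum()
    return eigvals, share

# Synthetic one-factor data (not study data): six items driven by a
# single common trait plus independent noise.
rng = np.random.default_rng(1)
common = rng.normal(size=(1000, 1))
demo = common + 0.5 * rng.normal(size=(1000, 6))

eigvals, share = eigen_summary(demo)
first_to_second = eigvals[0] / eigvals[1]
```

A large first-to-second ratio of this kind is one informal indication that a single factor carries most of the shared variance.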
The CFA and EFA modeling, taken together, suggest that perhaps the best means of assessing the psychometric properties of the IBS-QOL would be to employ Item Response Theory (IRT) methods [34]. IRT approaches estimate a latent construct via a joint model of the individual items. IRT models can also help determine if individual items are performing as intended within the IBS-QOL because relationships between items and the latent trait under study are estimated directly.
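To make the idea of an item-level model concrete, the sketch below computes response-category probabilities for a single 5-point item under Samejima's graded response model. The choice of this particular IRT model is an assumption made for illustration, since no specific model is proposed above, and the discrimination and threshold values are hypothetical rather than estimated from IBS-QOL data.

```python
import numpy as np

def grm_category_probs(theta, a, thresholds):
    """Graded response model probabilities for one ordered item.

    theta: latent trait value (scalar).
    a: item discrimination.
    thresholds: increasing values b_1 < ... < b_{K-1} for K categories.
    Returns an array of K probabilities that sum to 1.
    """
    b = np.asarray(thresholds, dtype=float)
    # P(response in category k or higher), k = 1..K-1 (logistic curves).
    p_ge = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    # Pad with P(>= lowest category) = 1 and P(> highest category) = 0,
    # then take adjacent differences to get per-category probabilities.
    cum = np.concatenate(([1.0], p_ge, [0.0]))
    return cum[:-1] - cum[1:]

# Hypothetical item parameters for a 5-point response set.
probs_low = grm_category_probs(theta=-3.0, a=1.5, thresholds=[-1.5, -0.5, 0.5, 1.5])
probs_high = grm_category_probs(theta=3.0, a=1.5, thresholds=[-1.5, -0.5, 0.5, 1.5])
```

Under such a model, an item whose estimated discrimination is low, or whose thresholds leave some categories rarely used, would be a natural candidate for closer review.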
In terms of test-retest reliability, the current analyses demonstrated good levels for the IBS-QOL total score. Both RΛ and RT exceeded the traditionally accepted reliability threshold of around 0.7 and were comparable to the ICC calculated by the original validation study. Both reliability measures employed here are similar to ICCs with slightly different interpretations. RΛ is the multivariate reliability of the sequence of scores, while RT is the average reliability for the total score over any arbitrary number of administrations. Both will tend to increase for a consistent instrument with more administrations because additional information is being taken into account with each added administration. Therefore, with 3 post-Baseline administrations of the IBS-QOL, we have substantial evidence for good reliability of the total score. Even with less information, e.g., two administrations of the IBS-QOL, we would expect the reliability to still be approximately 0.75 by our estimates.
The analysis of IBS-QOL total scores with regard to responsiveness was consistent across effect size definitions for different paired comparisons, with moderate increases in effect sizes seen for higher doses of eluxadoline versus placebo. Interestingly, the pattern of effect size estimates suggests that the 100 mg dose of eluxadoline had the largest impact, the same conclusion as was reached by the analysis of clinical measures [5] as defined in FDA’s 2012 IBS Guidance [21]. This conclusion is bolstered by evaluating the cumulative proportions of change from Baseline to Week 12 scores for the IBS-QOL total score, with better improvements seen at higher dose levels, specifically 100 and 200 mg. We especially note that within a wide range of improvement levels, the proportion of patients in the eluxadoline 100 mg group meeting given improvements was dramatically higher than the proportion of patients receiving placebo. This indicates that the observed treatment effect in the IBS-QOL total score is consistent. Visually, this result is apparent from the wide gap between the placebo and 100 mg eluxadoline lines in Figure 1.
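The two responsiveness summaries described above can be sketched as follows. The effect size shown is a between-group standardized difference in change scores using a pooled standard deviation, which is only one of several possible definitions and is an assumption here; the cumulative proportion function returns, for each improvement cutoff, the share of patients who improved by at least that much. Function names and inputs are hypothetical.

```python
import numpy as np

def standardized_effect_size(change_treated, change_placebo):
    """Difference in mean change divided by the pooled SD of change."""
    t = np.asarray(change_treated, dtype=float)
    p = np.asarray(change_placebo, dtype=float)
    pooled_var = (
        (t.size - 1) * t.var(ddof=1) + (p.size - 1) * p.var(ddof=1)
    ) / (t.size + p.size - 2)
    return (t.mean() - p.mean()) / np.sqrt(pooled_var)

def proportion_improving(change, cutoffs):
    """For each cutoff, the proportion of patients whose change from
    Baseline is at least that large."""
    c = np.asarray(change, dtype=float)
    return np.array([(c >= cut).mean() for cut in cutoffs])

# Hypothetical change-from-Baseline scores, for illustration only.
es = standardized_effect_size([10, 20, 30], [0, 10, 20])
props = proportion_improving([0, 10, 20, 30], cutoffs=[0, 14, 30])
```

Plotting such proportions against the cutoff for each treatment group gives curves of the kind compared in Figure 1, where a persistent vertical gap between two groups indicates a treatment difference that does not depend on any single responder threshold.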
Of note, all treatment groups showed large increases in IBS-QOL total scores at Week 12 as compared to Baseline. Even the placebo group showed an approximately 17-point increase in total score, higher than the 14-point clinically significant difference found by Drossman et al. [10]. While further longitudinal study is warranted, we believe that the improvement may be due to natural cycling of disease or to potential Hawthorne effects, i.e., improvements by patients as a result of simply being observed. We do, however, also note that the treatment group differences approximate a dose response that peaks at 100 mg and plateaus with 200 mg. This pattern mimics that of the other outcome measures reported elsewhere [5].
Our analyses suggest that a reduced-form IBS-QOL specific for IBS-d sufferers may improve measurement of IBS-related QOL for these patients. However, further research is necessary to determine which of the items may be ideally suited for a reduced form. We suggest that a better characterization of item-level properties of the IBS-QOL via IRT methodology would be helpful in determining an optimal item configuration.