This study confirms the conceptual model of the WHODAS-2, which has shown good metric properties among patients with chronic conditions in Europe in the MHADIE project: very high reliability, good ability to discriminate among known groups, and adequate capacity to detect change over time. These results therefore support the adequacy of the WHODAS-2 for measuring disability in a wide range of physical and mental disorders.
The goodness-of-fit indices obtained with the CFA models, together with the high factor loadings, confirmed the 7-domain structure of the WHODAS-2 and the global score, as proposed by its developers. Some concerns should nonetheless be raised. The RMSEA was not below the recommended standard. CFA modification indices (data not shown) suggested that the structural model underlying the data might be improved if some items from the 'Participation in society' domain were relocated to other factors. Nonetheless, accepting the original structure proposed by the developers would improve comparability with past and ongoing WHODAS-2 studies. Therefore, we suggest using the structure of the WHODAS-2 as it is now known, taking into account the expert-based validity criteria originally applied and the fact that, despite the concerns described, our findings confirmed it in a heterogeneous sample. Moreover, the structure is quite consistent with previous results, both from specific populations[23, 24] and from the modified version.
The low proportion of missing values suggests that the WHODAS-2 is easy to complete for a wide range of patients, indicating its high feasibility. A high percentage of missing data was found only in the domain of activities at work or school (50.3%), which is clearly related to the proportion of respondents who were neither working nor studying. The moderate percentage of patients with the best possible score in several domains suggests that the WHODAS-2 may be unsuitable for differentiating among very low grades of disability. This may not be a limitation for measuring disability in patient samples, but one should be cautious when using it in other samples such as the general population, in which a very high ceiling effect has been shown previously. Nonetheless, the distribution of the 'Participation in society' score merits comment. In this domain, no patient had the worst possible score (floor effect) and the ceiling effect was the lowest (11%), indicating that this domain is able to characterize a wide range of scenarios and perhaps reflects the final common pathway in which disability is manifested in the societal context.
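As an illustration of how the feasibility figures discussed above can be derived, the following minimal sketch computes the percentage of missing values and the percentages of respondents at the best and worst possible scores for one domain. It assumes domain scores on a 0 to 100 scale where 0 is the best possible score and 100 the worst; the function name, the use of `None` for missing values, and the demo data are hypothetical and are not taken from the study.

```python
def score_distribution(scores, best=0.0, worst=100.0):
    """Summarize one domain: % missing, % at best score, % at worst score.

    `scores` is a list of domain scores, with None marking a missing value.
    The 0 (best) to 100 (worst) range is an assumption of this sketch.
    """
    if not scores:
        raise ValueError("no respondents")
    observed = [s for s in scores if s is not None]
    pct_missing = 100.0 * (len(scores) - len(observed)) / len(scores)
    if not observed:
        raise ValueError("no observed scores")
    # Percentages at the extremes are computed among observed scores only.
    pct_best = 100.0 * sum(1 for s in observed if s == best) / len(observed)
    pct_worst = 100.0 * sum(1 for s in observed if s == worst) / len(observed)
    return pct_missing, pct_best, pct_worst


# Made-up scores for ten respondents, one of them missing.
demo_scores = [0, 0, 12.5, 25, 50, None, 75, 0, 37.5, 62.5]
pct_missing, pct_best, pct_worst = score_distribution(demo_scores)
print(f"missing: {pct_missing:.1f}%, best: {pct_best:.1f}%, worst: {pct_worst:.1f}%")
```

Computing the extremes among observed scores only keeps a domain with many structurally missing answers, such as work or school activities, from appearing to have an artificially small share of respondents at the best score.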
The high internal consistency coefficients indicate good reliability. All of them were above the standard proposed for group comparisons (0.7), which is consistent with findings from previous studies[23, 15, 19, 21, 22, 24]. It is also remarkable that the internal consistency coefficient for the global score reaches the strictest standard recommended for individual comparisons, 0.95. Reproducibility was acceptable, with the exception of the 'Getting around' domain (ICC = 0.19). Because of the long test-retest period, patients' mobility may have improved or worsened over the 6 weeks, even though disease severity did not change substantially. The only study in which the stability of the WHODAS-2 has been assessed reported excellent ICCs (0.82-0.96) in patients with inflammatory arthritis.
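For readers less familiar with internal consistency, the sketch below shows one common way such a coefficient is computed. It assumes that the coefficient is Cronbach's alpha, which is not stated in this section, and it uses hypothetical item responses; the function name and the data are illustrative only.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete item responses.

    `rows` holds one list per respondent, each with the same k item scores.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    if any(len(row) != k for row in rows):
        raise ValueError("all respondents must answer the same items")
    columns = list(zip(*rows))  # one tuple of scores per item
    sum_item_var = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical responses: 5 respondents, 4 items scored 1 to 5.
demo_rows = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 5, 4, 4],
    [5, 4, 5, 5],
]
alpha = cronbach_alpha(demo_rows)
print(f"alpha = {alpha:.2f}; meets group standard (0.7): {alpha >= 0.7}")
```

The value obtained can then be compared with the 0.7 standard for group comparisons and the 0.95 standard for individual comparisons mentioned above.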
The WHODAS-2, designed to cover disability, measures restrictions on daily life activities and social participation, while the Short Form-36 Health Survey (SF-36) addresses patients' physical and mental health. The moderate magnitude of the associations between the two instruments reflects the fact that the WHODAS-2 and the SF-36 measure different aspects of related concepts (disability and HRQL, respectively). In fact, the coefficients found in previously published studies[23, 15–18, 20, 21] were fairly similar to ours. These findings support the validity of the WHODAS-2 for measuring disability and its use as an outcome that complements HRQL.
The WHODAS-2 is able to detect differences between clinical-severity groups. Patients classified as severe reported worse disability scores than those classified as mild, with a large difference for most of the health conditions (66%) and a moderate difference for 25% of them. Poor ability to discriminate among severity groups was found for only 3 of the WHODAS-2 domains ('Getting along with people', 'Life activities household' and 'Life activities work or school'). Besides this, the instrument detects differences between patients who were working at the time of the study and those who were not working because of their health condition. This is the first time that this ability has been evaluated for the WHODAS-2, and it is especially noteworthy in the context of disability, probably more so than the ability to differentiate among severity groups (which has also been shown in other studies[15, 16, 22, 23]).
Coefficients of change at 3 months were moderate or low for all domains. However, the sensitivity to change of the WHODAS-2 may be underestimated in our study because of the characteristics of the MHADIE patients and the study design, such as the chronic profile of the conditions and the fact that it was not an intervention study. Moreover, this pattern of low improvement was also shown by the SF-36 (no physical change and moderate mental improvement), an instrument which has extensively demonstrated good responsiveness[52, 21]; this indicates the lack of a real, substantial improvement in our sample rather than a problem of the WHODAS-2 in detecting change over time. In fact, a previous study has demonstrated that the WHODAS-2 is quite responsive (ES = 0.65) when change is measured after starting a treatment.
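To make the notion of a coefficient of change concrete, the following sketch computes an effect size as the mean change divided by the standard deviation at baseline. This definition is an assumption of the example; the exact coefficient used in the study is not restated in this section, and the function name and scores below are hypothetical.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Effect size of change: (mean baseline - mean follow-up) / SD baseline.

    Scores are paired per patient. With higher scores meaning more
    disability (an assumption here), a positive value indicates improvement.
    """
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must be paired")
    if len(baseline) < 2:
        raise ValueError("need at least 2 patients")
    sd_baseline = stdev(baseline)
    if sd_baseline == 0:
        raise ValueError("baseline scores have no variability")
    return (mean(baseline) - mean(follow_up)) / sd_baseline


# Hypothetical disability scores for three patients at baseline and 3 months.
demo_baseline = [40.0, 50.0, 60.0]
demo_follow_up = [35.0, 45.0, 55.0]
print(f"ES = {effect_size(demo_baseline, demo_follow_up):.2f}")
```

Under a widely used convention (values around 0.2 small, 0.5 moderate, 0.8 large), a sample with little real clinical change yields low coefficients regardless of how responsive the instrument is, which is the point made in the paragraph above.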
The results of this study should be interpreted taking into account some limitations. Firstly, the study was not specifically designed to evaluate responsiveness, since the optimum design for this purpose would include an intervention producing a clear improvement or an event closely related to deterioration. In the absence of a gold standard for disability change, patient improvement was measured indirectly, on the assumption that a change in severity would be accompanied by a change in self-perceived disability. Secondly, the interval for test-retest evaluation was longer than the standard period used to assess reproducibility. However, the selection strategy applied ensured the needed stability, and the ICCs showed agreement between evaluations. Moreover, it should be noted that different linguistic versions of the WHODAS-2 were administered according to the country setting but analyzed as a whole. To test the equivalence of these versions, differential item functioning (DIF) analysis would be required. However, this was not possible in our study because of the sample design, in which most of the health conditions were recruited in only one country, making it impossible to differentiate the effects of these two variables. Finally, other minor limitations are related to version differences. The SF-36 v2 was used for Spanish patients with psychiatric disorders but, as versions 1 and 2 of the SF-36 are quite similar, no impact on results was expected. In addition, the use of proxy versions for patients unable to respond was negligible.