
DEMQOL and DEMQOL-Proxy: a Rasch analysis

Abstract

Background

DEMQOL and DEMQOL-Proxy are widely used patient reported outcome measures (PROMs) of health related quality of life in people with dementia (PWD). Growing interest in routine use of PROMs in health care calls for more robust instruments that are potentially fit for reliable and valid comparisons at the micro-level (patients) and meso-level (clinics, hospitals, care homes).

Methods

We used modern psychometric methods (based on the Rasch model) to re-evaluate DEMQOL (1428 PWDs) and DEMQOL-Proxy (1022 carers) to ensure they are fit for purpose. We evaluated scale to sample targeting, ordering of item thresholds, item fit to the model, and differential item functioning (sex, age, relationship), local independence, unidimensionality and reliability on the full set of items and a smaller item set.

Results

For both DEMQOL and DEMQOL-Proxy the smaller item set performed better than the original item set. We developed revised scores using the items from the smaller set.

Conclusions

We have improved the scoring of DEMQOL and DEMQOL-Proxy using the Rasch measurement model. Future work should focus on the problems identified with content and response options.

Background

DEMQOL and DEMQOL-Proxy [1,2,3] are well known and widely used patient reported outcome measures (PROMs) for measuring health related quality of life (HRQL) in people with dementia (PWDs). Together, DEMQOL and DEMQOL-Proxy provide the means to assess HRQL at all stages of dementia severity. DEMQOL is self-reported by the PWD and is appropriate for use in mild to moderate dementia; DEMQOL-Proxy is reported by a family carer on behalf of the PWD and can be used at all stages of dementia. The two instruments are intended to be used together.

The original development of DEMQOL and DEMQOL-Proxy was grounded in strong methodology and robust psychometric principles [1, 3]. However, the use and application of PROMs is changing. In addition to their use in randomised controlled trials (RCTs) and other evaluative studies, there is a growing interest in the use of PROMs as part of routine monitoring of the quality of health and social care [4]. Routine use of PROMs provides an opportunity to help drive changes in how health and social care are organised and delivered [5] and to improve quality. Consequently, it is necessary to re-evaluate the measurement properties of PROMs to ensure that they are fit for these new purposes. To this end, in this paper we report the re-evaluation of the psychometric properties of DEMQOL and DEMQOL-Proxy.

Modern psychometric methods such as those based on Item Response Theory (IRT) [6, 7] and Rasch Measurement Theory (RMT) [8, 9] provide more stringent psychometric methods than traditional methods derived from Classical Test Theory (CTT). Since April 2009 PROMs data have been routinely collected for some elective surgical operations in England [4, 10] and similar use is under consideration for other conditions, including dementia [11, 12]. Methodological work has been undertaken to apply IRT or RMT to the measures for routine use [13,14,15,16,17,18], but for measures of HRQL in dementia this has been limited. Rasch methods have been used with DEMQOL and DEMQOL-Proxy as part of the development of a health state classification system for DEMQOL-U and DEMQOL-Proxy-U [19]. No work has yet used Rasch methods to evaluate the whole set of DEMQOL/DEMQOL-Proxy items in terms of the overall score.

The measurement of outcomes in dementia is challenging. Cognitive impairment can make it difficult for a PWD to provide a reliable self-report on their HRQL and it may be necessary to rely on a proxy report from a family member. Yet proxy reports are also methodologically challenging: proxies find it difficult to separate their own experience from that of the patient, and for more subjective constructs, such as HRQL, PWD-proxy agreement is likely to be lower [20]. These challenges mean it is important that we apply the best available methodological techniques to ensure that the dementia specific outcome measures used in health services research, health care monitoring and individual clinical management are of the highest quality.

Modern psychometric approaches such as IRT and RMT have four advantages over CTT [21, 22]. First, the scores obtained are invariant: person estimates are independent of the particular sample of items used, and item locations on the scale are independent of the distribution of the people in whom the scale is derived. Second, they generate individual (rather than group) standard errors that clarify the degree of confidence in each individual’s score. Third, because scores are invariant there is greater potential to measure clinically meaningful differences. Finally, missing data can be dealt with more efficiently. Both IRT and RMT use mathematical (logit) models to improve the measurement properties of scores derived from questionnaires, but they differ in their approach to data that do not fit the model: IRT tends to add parameters to the model, whereas RMT investigates the data to identify why the misfit occurred. We used RMT to evaluate DEMQOL and DEMQOL-Proxy because the Rasch paradigm allows us to achieve interval scales and to identify potential anomalies with items and response scales, while keeping central the conceptual framework on which the items are based. This is important to ensure content validity and to produce scores that are clinically meaningful. Anomalies that are identified within the Rasch paradigm can help us to understand which particular items and response options are candidates for improvement. The Rasch paradigm also allows us to begin to build an evidence base about the extent to which instruments achieve invariant comparison. For example, differential item functioning (DIF) helps us to understand whether any items are biased in favour of particular groups of the population. DEMQOL/DEMQOL-Proxy include a range of items about different aspects of daily life that arguably could also be affected by the aging process itself, by gender roles and expectations, and by the deteriorating nature of dementia, in which patients eventually lose insight into their condition. Our analyses therefore enable us to understand which (if any) items are responded to differently by people of different ages, gender and severity.

Methods

Sample

The data were collected within a large study investigating the impact of Memory Assessment Services (MAS) on HRQL of PWDs [23]. Each of 78 MASs, geographically spread across all regions of the country and representative of all MASs in England, recruited up to 25 consecutive patients with suspected dementia who were attending for a first referral (either at the clinic or at a home visit) and their family carers (if present). Patients or carers with insufficient English to understand the consent procedure or study materials were not eligible for inclusion in the study.

Instruments

DEMQOL consists of 28 questions and DEMQOL-Proxy consists of 31 questions, each assessed on a 4-point Likert-type response scale: a lot, quite a bit, a little, not at all. The questions were derived from five conceptual domains: health and well-being, cognitive functioning, daily activities, social relationships and self-concept [2]. With the exception of the emotion items, all have the stem “How worried have you been about…”. There is also an additional overall quality of life question, answered on a 4-point scale: very good, good, fair, poor. The items are scored according to a standard scoring algorithm [24] to produce an overall score where higher scores represent better HRQL. See Smith et al. [1,2,3] for details on the development and CTT-based validation of DEMQOL and DEMQOL-Proxy.

Data analysis

The use of modern psychometrics (IRT or Rasch methods) brings the opportunity to achieve more robust measurement by applying a mathematical approach to deriving scores based on a logit model. Modern psychometric methods are based on the relationship between a person’s location on the construct being measured (in this instance the level of their HRQL) and their probability of responding positively to each item. In contrast, traditional methods (such as Classical Test Theory) focus on the relationship between a person’s location on the construct and their observed total score on the scale. Modelling responses at the item level enables us to consider whether a measurement “ruler” has been successfully constructed. We evaluate this by considering whether i) response categories work as intended (threshold ordering); ii) the items map out a continuum that is relevant to the people being measured (targeting); iii) the items work together (item fit); iv) responses to one item bias responses to another item (response dependency); v) performance is stable across relevant groups (differential item functioning, DIF); and vi) the items in the instrument represent a reliable unidimensional construct. The unique position of the Rasch paradigm is that when the data do not fit the model, the data (as opposed to the model) are scrutinised to determine the reasons why and to identify ways in which the items and/or response scales can be improved. Rasch based methods therefore provide a powerful set of diagnostic techniques which, as well as generating more robust scores, can highlight ways to improve the instruments in the future.
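To make the logit relationship concrete, the sketch below gives the simplest case, the dichotomous Rasch model, in which the log-odds of a positive response equal the difference between the person location θ and the item location δ. It is an illustration only: DEMQOL items are polytomous, and the function name is our own rather than part of RUMM2030.

```python
import math


def rasch_probability(theta: float, delta: float) -> float:
    """Probability of a positive response under the dichotomous Rasch model.

    theta: person location in logits (here, the person's level of HRQL).
    delta: item location in logits.
    """
    # The log-odds of a positive response is simply theta - delta.
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


if __name__ == "__main__":
    # A person located exactly at the item's location has even odds.
    print(rasch_probability(0.5, 0.5))
    # One logit above the item the odds are e:1, a probability of about 0.73.
    print(round(rasch_probability(1.5, 0.5), 2))
```

Because the probability depends only on θ − δ, persons and items are placed on the same logit ruler, which is what allows targeting and threshold ordering to be inspected on a single map.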

We conducted a Rasch analysis using RUMM2030 software to identify potential anomalies in the data indicating aspects of the instruments that were not working as intended [25]. Although all the items have the same 4-point Likert type scale, the unrestricted (partial credit) model was used as this was a diagnostic analysis and we wanted to evaluate whether each response scale was actually used similarly to each of the others.
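Under the unrestricted (partial credit) model each item keeps its own set of thresholds, so a four-category item has three item-specific thresholds. The following is a minimal sketch of the standard partial credit category probabilities for one item; the function and parameter names are hypothetical and are not taken from RUMM2030.

```python
import math
from typing import List


def pcm_category_probabilities(theta: float, thresholds: List[float]) -> List[float]:
    """Category probabilities for one item under the partial credit model.

    theta: person location (logits).
    thresholds: this item's own threshold locations tau_1..tau_m (logits);
    a four-category item therefore has three thresholds.
    """
    # Category x has numerator exp(sum_{k<=x} (theta - tau_k)); category 0 uses an empty sum.
    log_numerators = [0.0]
    for tau in thresholds:
        log_numerators.append(log_numerators[-1] + (theta - tau))
    # Subtract the largest value before exponentiating, for numerical stability.
    top = max(log_numerators)
    weights = [math.exp(v - top) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Ordered thresholds: each category is the most likely one somewhere on the scale.
    print([round(p, 3) for p in pcm_category_probabilities(0.0, [-1.0, 0.0, 1.0])])
```

A rating scale model would instead force all items to share the same threshold spacing; leaving thresholds free is what lets the diagnostic analysis show whether each item's response scale is used in the same way.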

All of the analyses were initially conducted for all items (28 for DEMQOL and 31 for DEMQOL-Proxy) and subsequently for a slightly smaller set of items that excluded the positive emotion items (23 items remaining for DEMQOL and 26 items remaining for DEMQOL-Proxy), as our early analyses [3] and preliminary work on this dataset (including parallel factor analysis; see Appendix) indicated that these items were conceptually different (trait items) and therefore represented a distinct dimension from the other items. We did not consider other reduced item sets because our aim was not to derive a shorter version of the scale. Rather, we aimed to retain as many of the original scale items as possible and evaluate their performance. Because the sample was large, all estimates were based on the full sample; however, because fit statistics become significant for trivial misfit in large samples, the sample size was adjusted (N = 500) within the RUMM programme before calculating significance tests (p-values), to avoid Type I error.

Targeting

Scale-to-sample targeting concerns the match between the range of HRQL measured by the DEMQOL items (and DEMQOL-Proxy items) and the range of HRQL in the sample of PWDs. This was evaluated by comparing the spread of person and item (threshold) locations.
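The comparison of person and threshold locations can be summarised numerically as well as graphically. The sketch below, with hypothetical names, reports the mean and SD of both distributions on the shared logit scale and the share of persons located beyond the highest or below the lowest threshold, where the items provide little measurement information.

```python
from statistics import mean, stdev
from typing import Dict, List


def targeting_summary(person_locations: List[float],
                      threshold_locations: List[float]) -> Dict[str, float]:
    """Compare person and item-threshold distributions on the shared logit scale."""
    lowest, highest = min(threshold_locations), max(threshold_locations)
    n = len(person_locations)
    return {
        "person_mean": mean(person_locations),
        "person_sd": stdev(person_locations),
        "threshold_mean": mean(threshold_locations),
        "threshold_sd": stdev(threshold_locations),
        # Persons above the highest threshold are poorly measured at the top of the scale.
        "prop_above_thresholds": sum(p > highest for p in person_locations) / n,
        # Persons below the lowest threshold are poorly measured at the bottom.
        "prop_below_thresholds": sum(p < lowest for p in person_locations) / n,
    }
```

A sizeable proportion above the highest threshold would correspond to the pattern described in the Results, where item thresholds are lacking at the high end of the HRQL continuum.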

Ordering of item thresholds

We evaluated whether the response categories were working as intended by a visual inspection of the threshold map. As each item has four response categories, there are three thresholds per item, which should be ordered logically. Disordered thresholds can indicate where respondents have misunderstood or been unable to use response categories consistently. Collapsing (or re-scoring) the disordered thresholds can help to provide an indication of how response categories can be improved.
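The ordering check and the re-scoring step can be sketched as two small helpers (hypothetical names): one flags an item whose estimated thresholds do not increase with category order, and one collapses categories by mapping old codes to new ones. The mapping `[0, 0, 1, 1]` used in the example is one possible collapse that turns a four-category item into a dichotomous one; it is illustrative, not a prescription.

```python
from typing import List


def is_disordered(thresholds: List[float]) -> bool:
    """True if threshold locations do not increase strictly with category order."""
    return any(later <= earlier for earlier, later in zip(thresholds, thresholds[1:]))


def rescore(responses: List[int], mapping: List[int]) -> List[int]:
    """Collapse response categories: mapping[old_category] gives the new category."""
    return [mapping[r] for r in responses]


if __name__ == "__main__":
    print(is_disordered([-1.2, 0.4, 0.1]))       # second and third thresholds reversed
    print(rescore([0, 1, 2, 3], [0, 0, 1, 1]))   # four categories collapsed to two
```

After re-scoring, the item is re-estimated and its thresholds inspected again; collapsing is a temporary repair and does not explain why respondents did not use the categories as intended.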

Item fit

The overall fit to the model was evaluated using chi-square. The fit of each item to the Rasch model was evaluated both statistically – fit residual within +/−2.5, chi-square statistic (Bonferroni corrected significance level) – and graphically (visual inspection of the item characteristic curve (ICC)). No single piece of information can confirm the fit of an item to the model and it is important therefore to consider all the evidence together.
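The two statistical criteria above can be combined into a simple screening step, sketched below with hypothetical names. It assumes the fit residuals and chi-square p-values have already been exported from the Rasch software; the Bonferroni correction divides the nominal α by the number of items tested. The output only flags candidates: inspection of the ICCs is still needed before concluding that an item misfits.

```python
from typing import Dict, List


def flag_item_misfit(fit_residuals: Dict[str, float],
                     chi2_pvalues: Dict[str, float],
                     alpha: float = 0.05,
                     residual_limit: float = 2.5) -> Dict[str, List[str]]:
    """Return, for each flagged item, the statistical criteria it fails."""
    # Bonferroni: share the overall alpha across all items tested.
    bonferroni_alpha = alpha / len(chi2_pvalues)
    flags: Dict[str, List[str]] = {}
    for item, residual in fit_residuals.items():
        reasons = []
        if abs(residual) > residual_limit:
            reasons.append("fit residual")
        if chi2_pvalues[item] < bonferroni_alpha:
            reasons.append("chi-square")
        if reasons:
            flags[item] = reasons
    return flags
```

With 28 items and α = 0.05, for example, an item's chi-square p-value would need to fall below about 0.0018 to be flagged.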

Differential item functioning (DIF)

DIF occurs when groups within the sample respond differently to an item despite having the same level of the construct being measured. In this analysis for DEMQOL, groups were defined as follows: PWD sex, PWD age group (quartiles), and disease severity (≥ 24 versus <24 on the MMSE or equivalent, based on published cut-offs indicating dementia). For DEMQOL-Proxy we additionally defined groups according to the sex and age group (quartiles) of the carer and relationship to the PWD (spouse, son/daughter, other). We used ANOVA to evaluate both main effects for these groups (uniform DIF) and interactions between these groups and the class intervals (non-uniform DIF). The presence of uniform DIF can be corrected by calibrating problem items separately for each level of the group (known as “splitting” items). Items showing non-uniform DIF may need to be investigated and/or removed from the item set.
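The ANOVA described above also includes class intervals and their interaction with each person factor. As a simplified sketch of the main-effect (uniform DIF) part only, the code below computes a one-way ANOVA F statistic across the levels of a single person factor, assuming the inputs are standardised residuals for one item. The names are hypothetical and the p-value is left to a statistics library.

```python
from statistics import mean
from typing import Dict, List, Tuple


def one_way_anova_f(groups: Dict[str, List[float]]) -> Tuple[float, int, int]:
    """F statistic for differences in mean residual between levels of a person factor.

    groups maps each level (e.g. "female", "male") to the standardised residuals
    of one item for persons at that level. Returns (F, df_between, df_within).
    """
    group_means = {g: mean(v) for g, v in groups.items()}
    all_values = [x for v in groups.values() for x in v]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group variation: distance of each group's mean residual from the grand mean.
    ss_between = sum(len(v) * (group_means[g] - grand_mean) ** 2 for g, v in groups.items())
    # Within-group variation: spread of residuals around their own group mean.
    ss_within = sum((x - group_means[g]) ** 2 for g, v in groups.items() for x in v)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A large F indicates that one group's residuals for the item are systematically higher or lower, i.e. the item is easier or harder for that group at the same HRQL level.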

Local independence

The extent to which each item was independent of the others was evaluated by examining the residual correlation matrix. Pairs of items where the residuals were correlated >0.3 were flagged. In the short term, the presence of response dependence can be corrected by considering each pair of dependent items to identify which is conceptually higher order. The lower order item is then calibrated (or “split”) by each level of the higher order item [26]. This avoids the need to remove items and further compromise content validity.
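Flagging locally dependent pairs amounts to correlating the person-item residuals of every pair of items and applying the 0.3 cut-off. A minimal sketch follows, assuming the residuals are supplied as one list per item, aligned by person; the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def dependent_pairs(residuals: Dict[str, List[float]],
                    cutoff: float = 0.3) -> List[Tuple[str, str, float]]:
    """Item pairs whose residual correlation exceeds the cut-off."""
    flagged = []
    for item_a, item_b in combinations(residuals, 2):
        r = pearson(residuals[item_a], residuals[item_b])
        if r > cutoff:
            flagged.append((item_a, item_b, round(r, 2)))
    return flagged
```

Each flagged pair is then reviewed conceptually to decide which item is the higher order one before splitting the other.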

Unidimensionality

Item analysis by the Rasch model assumes unidimensional data. This was evaluated by prior factor analysis (Appendix) and principal components analysis (PCA) of the residuals to determine if there are any other identifiable dimensions in the data after the main “Rasch dimension” has been taken into account. If there is no interpretable pattern in the residuals then unidimensionality can be said to be supported [27]. Two subsets of four items were created from the highest and lowest loadings on the first principal component and a series of independent t-tests used to investigate whether the estimates for these two subsets differed significantly (percentage of individual t-tests outside the range ± 1.96). We computed Wilson 95% confidence intervals [28], as recommended by Brown, Cai, and DasGupta [29].
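The t-test protocol and the Wilson interval can be sketched as follows (hypothetical names). For each person, the locations estimated from the two item subsets are compared using their standard errors; the proportion of persons with |t| > 1.96 is then reported with a Wilson 95% confidence interval.

```python
import math
from typing import List, Tuple


def subset_t_values(theta_a: List[float], se_a: List[float],
                    theta_b: List[float], se_b: List[float]) -> List[float]:
    """Per-person t statistic comparing location estimates from two item subsets."""
    return [(a - b) / math.sqrt(sa ** 2 + sb ** 2)
            for a, sa, b, sb in zip(theta_a, se_a, theta_b, se_b)]


def proportion_significant(t_values: List[float], critical: float = 1.96) -> Tuple[int, int]:
    """Number of persons whose |t| exceeds the critical value, and the total."""
    return sum(abs(t) > critical for t in t_values), len(t_values)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width
```

As an illustration, assuming 176 of 1428 persons (about 12.3%) had |t| > 1.96, `wilson_interval(176, 1428)` gives approximately (0.107, 0.141). Unidimensionality is supported when the lower bound of the interval does not exceed 5%, or exceeds it only marginally in a large sample.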

Reliability

Reliability was evaluated using the Person Separation Index (PSI), which is similar to Cronbach’s alpha. A value >0.7 is considered adequate.
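The PSI is commonly computed as the proportion of the observed variance in person locations that is not attributable to measurement error. A minimal sketch under that definition follows; the choice of the sample variance, rather than the population variance, is our assumption and may differ slightly from RUMM2030's implementation.

```python
from statistics import mean, variance
from typing import List


def person_separation_index(locations: List[float], standard_errors: List[float]) -> float:
    """Proportion of observed variance in person locations not due to measurement error."""
    observed_var = variance(locations)                       # sample variance of person estimates
    error_var = mean([se ** 2 for se in standard_errors])    # mean squared standard error
    return (observed_var - error_var) / observed_var
```

For example, person locations spread from −2 to +2 logits (sample variance 2.5) with a standard error of 0.5 for every person give a PSI of (2.5 − 0.25) / 2.5 = 0.9.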

Rasch model based (logit) scores and their benefit

For both DEMQOL and DEMQOL-Proxy, we re-scored items with disordered thresholds (i.e. combining response categories as necessary). In addition, we resolved the items showing DIF (i.e. by splitting the relevant item and creating new items, one for each level of the person factor showing DIF) and/or local dependency (i.e. splitting the dependent item by the levels of the higher order item). We then generated Rasch model based scores (logits) for both resolved and unresolved versions. If the two versions were highly correlated, we retained the unresolved versions. The benefit of these scores over the raw scores was assessed by plotting them against the raw (original classically derived) scores. Where the Rasch model based scores differ from the raw scores, the plot tends to show an ogive (“S”-shaped) curve rather than a straight line.
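Why the plot of logits against raw scores is S-shaped can be seen from the raw-score-to-logit conversion itself. The sketch below uses the dichotomous Rasch model for simplicity (the DEMQOL scores use polytomous items) and finds the maximum likelihood person location for each non-extreme raw score by Newton-Raphson; the function name and item locations are hypothetical.

```python
import math
from typing import List


def raw_score_to_logit(raw: int, deltas: List[float],
                       tol: float = 1e-8, max_iter: int = 100) -> float:
    """ML person location for a raw score under the dichotomous Rasch model.

    deltas: item locations in logits. Extreme scores (0 or all items) have no
    finite ML estimate and are rejected.
    """
    n_items = len(deltas)
    if not 0 < raw < n_items:
        raise ValueError("extreme scores have no finite ML estimate")
    theta = math.log(raw / (n_items - raw))  # starting value
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-(theta - d))) for d in deltas]
        expected = sum(probs)                      # expected raw score at theta
        information = sum(p * (1 - p) for p in probs)
        step = (raw - expected) / information      # Newton-Raphson update
        theta += step
        if abs(step) < tol:
            break
    return theta


if __name__ == "__main__":
    items = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]
    logits = [raw_score_to_logit(r, items) for r in range(1, len(items))]
    # One raw-score point is worth more logits near the extremes than in the middle.
    print([round(b - a, 2) for a, b in zip(logits, logits[1:])])
```

Because the information (the denominator of the update) is lowest at the ends of the scale, equal raw-score steps translate into larger logit steps there, which produces the ogive.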

Results

Descriptive characteristics of the sample

DEMQOL was completed by 1428 people with suspected dementia: 52% female, age range 42–98 years (mean age = 77.9, SD = 8.5) and 95% White or White British. DEMQOL-Proxy was completed by 1022 accompanying carers: 69% female, age range 16–94 years (mean age = 65.9, SD = 13.6), and 95% White or White British. Carers were predominantly the spouse (61%) or son/daughter (29%) of the PWD. Details of the sample are presented in Table 1.

Table 1 Demographic characteristics of PWD and carer

Overall fit to the model

For both DEMQOL and DEMQOL-Proxy the overall chi square statistic was non-significant (p = 0.99 and p = 0.11 respectively) suggesting that for both scales the data fit the model.

Targeting

Original item sets (DEMQOL and DEMQOL-Proxy)

For both DEMQOL and DEMQOL-Proxy, targeting of persons to item threshold locations could be improved (see Fig. 1a and b, respectively). In both cases, the spread of person locations (DEMQOL: SD = 0.915, DEMQOL-Proxy: SD = 0.888) covered the spread of item threshold locations well, though there was a lack of item thresholds at the high end of the continuum.

Fig. 1

Person-item threshold location distribution for DEMQOL (a) and DEMQOL-Proxy (b)

Smaller item sets (DEMQOL and DEMQOL-Proxy)

For DEMQOL (23 items) (Fig. 2a) the range of item threshold locations is clearly smaller compared with the full set of items. For DEMQOL-Proxy (26 items) (Fig. 2b) the range of item threshold locations stayed almost the same because in contrast to DEMQOL, the highest located item thresholds included a wider range of items than just positive emotion items.

Fig. 2

Person-item threshold location distribution for DEMQOL (23 items) (a) and DEMQOL-Proxy (26 items) (b)

Ordering of item thresholds

Original item sets (DEMQOL and DEMQOL-Proxy)

Five DEMQOL items and four DEMQOL-Proxy items showed response options not working properly (disordered thresholds). For DEMQOL these were having been worried about: a) not having enough company, b) how you get on with people close to you, c) getting the affection that you want, d) getting help when you need it, and e) getting to the toilet in time. For DEMQOL-Proxy these were having been worried about: a) keeping him/herself clean (e.g. washing and bathing), b) keeping him/herself looking nice, c) using money to pay for things, and d) looking after his/her finances. For all of these items we found that the middle two categories (“quite a bit” and “a little”) were not used as intended.

Smaller item sets (DEMQOL and DEMQOL-Proxy)

For DEMQOL (23 items), the same five items as in the original item set showed disordered thresholds. For DEMQOL-Proxy (26 items) we found one item fewer than in the original item set: having been worried about looking after his/her finances was no longer flagged. This may be due to the slightly smaller sample size (N = 1021) available for this analysis.

Item fit

Original item sets (DEMQOL and DEMQOL-Proxy)

No DEMQOL or DEMQOL-Proxy items showed misfit to the model, considering the fit residuals, chi square values and the ICCs together (Table 2). However, four of the five DEMQOL positive emotion items (felt lively, full of energy, confident, cheerful, enjoying life) were among the items with the highest average threshold locations; the two highest (felt lively, full of energy) also showed large fit residuals (> +/− 2.5) and non-optimal fit to the ICC. We found this pattern largely replicated in DEMQOL-Proxy (Table 3), in particular for (felt) full of energy, lively and, to a lesser extent, cheerful.

Table 2 Diagnostic statistics for the original item set of DEMQOL (28 items)
Table 3 Diagnostic statistics for the original item set of DEMQOL-Proxy (31 items)

Smaller item sets (DEMQOL and DEMQOL-Proxy)

None of the 23 DEMQOL items nor the 26 DEMQOL-Proxy items showed misfit to the model, considering the fit residuals, chi square values and the ICCs together (Tables 4 and 5). However, items that showed large fit residuals (> +/− 2.5) in the original item sets now tended to fit slightly better for both DEMQOL (23 items) and DEMQOL-Proxy (26 items).

Table 4 Diagnostic statistics for the smaller item set of DEMQOL (23 items)
Table 5 Diagnostic statistics for the smaller item set of DEMQOL-Proxy (26 items)

Differential item functioning (DIF)

Original item sets (DEMQOL and DEMQOL-Proxy)

None of the DEMQOL items showed significant main effects (uniform DIF) for PWD sex, age group or severity. Three DEMQOL-Proxy items showed significant main effects. The item “feeling irritable” showed a significant main effect for patient age (carers of younger people report more irritability), patient sex (carers of men with dementia report more irritability) and relationship to the carer (spouse carers tending to report more irritability). The item “worried about forgetting what day it is” showed a significant main effect for severity (carers of people with MMSE scores <24 tending to report more worry about forgetting what day it is). The item “worried about not having enough company” showed a significant main effect for patient sex (carers of women with dementia reporting more worry about not having enough company), relationship to the carer (other carers tending to report more worry about not having enough company) and carer age (general trend for younger carers to report more worry about not having enough company). There were no significant interactions for any of the groups by class intervals.

Smaller item sets (DEMQOL and DEMQOL-Proxy)

None of the 23 DEMQOL items showed significant main effects for PWD sex, age group or severity. Three of the 26 DEMQOL-Proxy items showed significant main effects (uniform DIF). The item “feeling irritable” showed a significant main effect for patient sex (carers of men with dementia reporting more irritability) and patient age (carers of younger people reporting more irritability) and relationship to the carer (spouse carers tending to report more irritability). The item “worried about forgetting what day it is” showed significant main effects for severity (carers of people with MMSE scores <24 tending to report more worry about forgetting what day it is). The item “worried about not having enough company” showed significant main effects for patient sex (carers of women with dementia reporting more worry about not having enough company), carer age (younger carers tending to report more worry about not having enough company) and relationship to the carer (carers who are not a spouse reporting more worry about not having enough company). There were no significant interactions for any of the groups by class intervals.

Local independence

Original item sets (DEMQOL and DEMQOL-Proxy)

Four pairs of DEMQOL items showed local dependency; the correlations were 0.36 (felt cheerful/that you are enjoying life), 0.39 (felt lonely/worried about not having enough company), 0.46 (worried about how you get on with people close to you/getting the affection that you want) and 0.53 (felt full of energy/lively), respectively, see Table 2. Fourteen DEMQOL-Proxy items showed local dependency, with correlations ranging from 0.31 (e.g. felt frustrated/fed-up) to 0.66 (felt full of energy/lively), see Table 3.

Smaller item sets (DEMQOL and DEMQOL-Proxy)

In the smaller item set for DEMQOL two residual correlations >0.3 remained (Table 4): felt lonely/worried about not having enough company (0.40) and worried about how you get on with people close to you/getting the affection that you want (0.41). For DEMQOL-Proxy in the smaller item set we found 11 residual correlations >0.3 (Table 5). The largest ones were between felt sad/fed-up (0.42), having been worried about using money to pay for things/looking after his/her finances (0.47) and keeping him/herself clean/ looking nice (0.64); the large residual correlation between felt sad/fed-up was new.

Unidimensionality

Original item sets (DEMQOL and DEMQOL-Proxy)

Neither the 28 items in DEMQOL nor the 31 items in DEMQOL-Proxy formed a unidimensional scale. The PCA/t-test protocol showed that for DEMQOL the two subsets of measurements differed significantly for 12.3% [10.7; 14.1] of the cases at the 5% level and for 3.0% [2.0; 4.3] of the cases at the 1% level. For DEMQOL-Proxy they differed significantly for 12.0% [10.1; 14.1] at the 5% level and for 3.0% [1.9; 4.7] at the 1% level.

Smaller item sets (DEMQOL and DEMQOL-Proxy)

The smaller set of 23 items in DEMQOL formed an acceptably unidimensional scale though the smaller set of 26 items in DEMQOL-Proxy were still not unidimensional. For DEMQOL the two subsets of measurements differed significantly for 7.1% [5.9; 8.6] of the cases at the 5% level and for 1.1% [0.6; 2.1] of the cases at the 1% level. This is marginally more than can be expected by chance alone and is satisfactory, taking into account the large sample size [30]. For DEMQOL-Proxy the two subsets of measurements differed significantly for 11.9% [10.0; 14.0] of the cases at the 5% level and for 3.0% [1.9; 4.7] at the 1% level.

Reliability

Original item sets (DEMQOL and DEMQOL-Proxy)

For DEMQOL PSI = 0.90 and for DEMQOL-Proxy PSI = 0.91, suggesting that both instruments discriminate well among people in terms of their HRQL (i.e. high reliability).

Smaller item sets (DEMQOL and DEMQOL-Proxy)

The smaller item sets showed similar PSI statistics. For the smaller set of 23 DEMQOL items PSI = 0.87, and for the smaller set of 26 DEMQOL-Proxy items PSI = 0.91.

Rasch model based (logit) scores and their benefit

We derived Rasch model based scores for the smaller item sets (23 items for DEMQOL and 26 items for DEMQOL-Proxy) because of their generally better performance. For DEMQOL, we re-scored the five items with disordered thresholds. In addition, we resolved the two items that showed response dependency. Person location estimates with and without resolving for response dependency showed an intraclass correlation coefficient (ICC) of 0.99; therefore, we kept the original estimates.

For DEMQOL-Proxy, we re-scored three items with disordered thresholds. In addition, we resolved the 11 items that showed response dependency and split the three items that showed DIF. Person location estimates with and without resolving for these issues showed an ICC of 0.97; therefore, we kept the original estimates.
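The form of ICC used to compare the resolved and unresolved person estimates is not stated. The sketch below computes the two common single-measure forms from a two-way ANOVA (absolute agreement and consistency, as defined by McGraw and Wong), with one row per person and one column per version of the estimate; which form matches the reported values is an open assumption, and the function name is hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def icc_two_way(scores: List[Sequence[float]]) -> Tuple[float, float]:
    """Single-measure ICCs from a two-way ANOVA.

    scores: one row per person, one column per version of the estimate.
    Returns (absolute agreement ICC(A,1), consistency ICC(C,1)).
    """
    n, k = len(scores), len(scores[0])
    grand = mean([x for row in scores for x in row])
    row_means = [mean(row) for row in scores]
    col_means = [mean([row[j] for row in scores]) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between persons
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between versions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    agreement = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)
    consistency = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    return agreement, consistency
```

The two forms diverge when one version is shifted relative to the other: a constant offset leaves consistency at 1 but lowers absolute agreement, which matters if the resolved and unresolved logits are to be used interchangeably.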

The plots showing the benefit of the Rasch model based scores are shown in Fig. 3. The S-shaped curve clearly indicates that at the extremes of the distribution there is benefit from deriving the Rasch model based scores. For both DEMQOL (23 items) and DEMQOL-Proxy (26 items), a 10-point increase in terms of raw scores corresponds to a variable amount of increase in terms of logits, dependent on the person’s location on the raw score scale.

Fig. 3

Relationship between raw scores and measurements (logits) for DEMQOL (23 items) (a) and DEMQOL-Proxy (26 items) (b)

Discussion

We have improved the scoring of DEMQOL and DEMQOL-Proxy using RMT and developed scores that can provide more robust and meaningful estimates of change and, in addition, are potentially appropriate for use with individual patients as part of the clinical decision making process. Neither was possible with the original CTT based scores. We have also identified a set of items about positive emotion included in the original questionnaires that do not have strong measurement properties. These items need further qualitative investigation to understand how they could be written more appropriately. In addition, we have identified that the response options may not be as easy for respondents to use as was originally reported. This also needs further qualitative investigation. Nonetheless, using the new Rasch-based scores will potentially mean that evaluative studies will be able to report more precise estimates of change at the group level. Consequently, decisions based on these studies will be more robust and more easily justified. This matters because, while many researchers using CTT-based scores assume that points on the scale are equally spaced [31] (i.e. interval), their level of measurement is in fact merely ordinal. There is no information about the actual distances between points on the scale. Consequently, change scores derived from ordinal scores (e.g. at baseline and follow up) can be difficult to interpret, as the distance between points on the scale may be different at baseline compared with follow up.

We are not advocating that a shorter version of DEMQOL/DEMQOL-Proxy should be administered. DEMQOL and DEMQOL-Proxy are already widely used and should continue to be administered in the standard form (28 items for DEMQOL and 31 items for DEMQOL-Proxy). The improved scores derived here can be calculated for existing datasets or for new data collected using the standard questionnaires. The three available scores for DEMQOL/DEMQOL-Proxy (original classically developed scores, DEMQOL-U /DEMQOL-Proxy-U and the new Rasch developed scores reported here) are based on the same conceptual framework [2]. Each score is a trade-off between measurement for a particular purpose and content validity. Future users should choose the measure appropriate for their purpose. The removal of the positive items in the Rasch scores does not mean that they are unimportant for HRQL in dementia, merely that in their current form and when combined with the other items in the scale, these items do not work as they were intended. Future qualitative work should investigate how these items could be improved to enable them to be retained in the scores.

Future work should also evaluate the effect of Rasch scoring (as described here) on the evaluation of change using DEMQOL and DEMQOL-Proxy. This could be retrospective, using existing datasets, or prospective. The S-shaped curve in Fig. 3 suggests that most difference between the original scores and the new Rasch scores will be seen at the extremes of the distribution. In a normal sample distribution, the effect of the Rasch scores at the group level may therefore be small. The Rasch scores, however, provide added potential for use at the individual level.

Our removal of the positive emotion items means that item content identified as important to PWDs’ HRQL [2] is not represented in the new Rasch scores for DEMQOL and DEMQOL-Proxy. A similar issue also occurred in the development of the original DEMQOL and DEMQOL-Proxy scales [1, 3], in that the items representing the domain of self-concept were removed at the item reduction stage. Both of these are examples of the trade-off that can sometimes occur between content validity and measurement properties. Our recommendation is that future work should prioritise investigation of the wording of both the “positive emotion” and “self-concept” items to develop better ways of asking these questions within the questionnaire format. The targeting diagrams suggest that there are some parts of the continuum of HRQL not represented by items in the questionnaire, particularly at the “higher” end of the HRQL scale. Further qualitative work is needed to investigate these two issues. This further understanding of the construct of HRQL that underlies DEMQOL and DEMQOL-Proxy would also help to address the apparent lack of unidimensionality of the items in the Rasch based DEMQOL-Proxy score.

Secondly, for some items, response options appear not to work as intended (i.e. disordered thresholds). The category probability curves and item threshold locations suggest that this may be because respondents do not distinguish between the two categories at the extremes of the response scale (i.e. between “a lot” and “quite a bit” and between “a little” and “not at all”). Alternatively, the labels of the two middle categories (“quite a bit” and “a little”) may not be meaningful. We have temporarily resolved this issue by re-scoring the items as dichotomous items by collapsing the two categories at either end of the response scale, but future work should investigate why for some items the response categories are not working.

Although this analysis improves the scores of DEMQOL and DEMQOL-Proxy, the progressive severity of dementia presents additional measurement challenges. In particular, with increasing severity there is likely to be a point where self-report of HRQL is no longer possible. Using DEMQOL-Proxy partially solves this problem, but it is well known that agreement of self and proxy reports is relatively low for subjective, non-observable constructs such as HRQL [20]. One of the possible reasons for lack of agreement between self- and proxy- reports is that the two different reporters use different constructs to define what we call HRQL. Further analysis using the Rasch model could build on these results to address this problem by equating the Rasch scores reported here for DEMQOL and DEMQOL-Proxy to determine if they can be placed on a single scale. Equating would evaluate whether DEMQOL and DEMQOL-Proxy can be placed on a common metric and therefore whether the two instruments actually measure the same construct. If this were the case, then DEMQOL-Proxy scores could be used with confidence even when self-report was no longer possible.

The current analyses were conducted on a large, representative sample of people attending a first appointment at MAS for suspected dementia [23]. The benefits of the Rasch analysis reported here are therefore based on data from people with relatively mild cognitive impairment and their carers. Future work should investigate how including people with more severe cognitive impairment in the sample affects model fit (particularly for DEMQOL-Proxy). Further, because the questionnaires are standardised instruments developed in English, people without sufficient English to understand and complete them were excluded from the study. It was therefore not possible to investigate DIF by ethnic group, and we do not know whether, or to what extent, items within DEMQOL/DEMQOL-Proxy are affected by participants’ ethnicity.

Conclusion

We have established that DEMQOL and DEMQOL-Proxy can provide robust measurement of HRQL in dementia when scores are derived from analysis using the Rasch model. At the group level, estimates of change in evaluative studies will potentially be more precise than when using CTT-based scores, and the Rasch-based scores can now also be used at the individual level. This is an important improvement for making and justifying decisions. A number of limitations remain. Further research into the anomalies that we have identified may further improve the two instruments in terms of breadth of content and optimising answer categories. Furthermore, we need to investigate whether measurement properties are the same across ethnic groups and levels of dementia severity. In addition, in future work we will investigate whether DEMQOL and DEMQOL-Proxy can be placed on the same scale and, if so, whether a revised Rasch-model-based scoring algorithm can be produced. This would allow DEMQOL-Proxy to be used with confidence when self-report on DEMQOL is no longer possible. Such an algorithm would be appropriate for use in both existing and new datasets.

Abbreviations

CTT: Classical Test Theory
DIF: Differential item functioning
HRQL: Health related quality of life
ICC: Item characteristic curve
IRT: Item Response Theory
MAS: Memory Assessment Services
PCA: Principal components analysis
PROMs: Patient reported outcome measures
PSI: Person Separation Index
PWD: Person with dementia
RCTs: Randomised controlled trials
RMT: Rasch Measurement Theory
SD: Standard deviation

References

  1. Smith SC, Lamping DL, Banerjee S, Harwood R, Foley B, Smith P, et al. Measurement of health-related quality of life for people with dementia: development of a new instrument (DEMQOL) and an evaluation of current methodology. Health Technol Assess. 2005;9(10):1–93. doi:10.3310/hta9100.

  2. Smith SC, Murray J, Banerjee S, Foley B, Cook JC, Lamping DL, et al. What constitutes health-related quality of life in dementia? Development of a conceptual framework for people with dementia and their carers. Int J Geriatr Psych. 2005;20:889–95.

  3. Smith SC, Lamping DL, Banerjee S, Harwood RH, Foley B, Smith P, et al. Development of a new measure of health-related quality of life for people with dementia: DEMQOL. Psychol Med. 2007;37:737–46.

  4. Department of Health. Equity and excellence: liberating the NHS. London: Department of Health; 2010.

  5. Black N. Patient reported outcome measures could help transform healthcare. BMJ. 2013;346:19–21.

  6. Lord FM. Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum; 1980.

  7. Lord FM, Novick MR (with contributions by Birnbaum A). Statistical theories of mental test scores. Reading, MA: Addison-Wesley; 1968.

  8. Andrich D. A rating formulation for ordered response categories. Psychometrika. 1978;43:561–73.

  9. Rasch G. Probabilistic models for some intelligence and attainment tests. Copenhagen, Danish Institute for Educational Research. 1960. Expanded edition with foreword and afterword by BD Wright. Chicago: University of Chicago Press; 1980.

  10. Department of Health. Guidance on the routine collection of patient reported outcome measures (PROMs). London: Department of Health; 2008.

  11. Department of Health. The adult social care outcomes framework 2015/16. London: Department of Health; 2014. p. 37.

  12. Department of Health. Prime Minister’s challenge on dementia 2020. Implementation plan. London: Department of Health; 2016. p. 13.

  13. Conaghan PG, Emerton M, Tennant A. Internal construct validity of the Oxford knee scale: evidence from Rasch measurement. Arthritis Rheum. 2007;57:1363–7.

  14. Fitzpatrick R, Norquist JM, Dawson J, Jenkinson C. Rasch scoring of outcomes of total hip replacement. J Clin Epidemiol. 2003;56:68–74.

  15. Fitzpatrick R, Norquist JM, Jenkinson C, Reeves BC, Morris RW, Murray DW, Gregg PJ. A comparison of Rasch with Likert scoring to discriminate between patients’ evaluations of total hip replacement surgery. Qual Life Res. 2004;13:331–8.

  16. Ko Y, Lo N-N, Yeo S-J, Yang K-Y, Yeo W, Chong H-C, Thumboo J. Rasch analysis of the Oxford knee score. Osteoarthr Cartilage. 2009;17:1163–9.

  17. Ko Y, Lo N-N, Yeo S-J, Yang K-Y, Yeo W, Chong H-C, Thumboo J. Comparison of the responsiveness of the SF-36, the Oxford knee score, and the knee society clinical rating system in patients undergoing total knee replacement. Qual Life Res. 2013;22:2455–9.

  18. Norquist JM, Fitzpatrick R, Dawson J, Jenkinson C. Comparing alternative Rasch-based methods vs raw scores in measuring change in health. Med Care. 2004;42(Suppl 1):I-25–36.

  19. Mulhern B, Rowen D, Brazier J, Smith S, Romeo R, Tait R, et al. Development of DEMQOL-U and DEMQOL-Proxy U: generation of preference-based indices from DEMQOL and DEMQOL-Proxy for use in economic evaluation. Health Technol Assess. 2013;17(5):v–xv, 1–140. doi:10.3310/hta17050.

  20. Sneeuw KCA, Sprangers MAG, Aaronson NK. The role of health care providers and significant others in evaluating the quality of life of patients with chronic disease. J Clin Epidemiol. 2002;55:1130–43.

  21. Hobart J, Cano S. Improving the evaluation of therapeutic interventions in multiple sclerosis: the role of new psychometric methods. Health Technol Assess. 2009;13(12):1–200. doi:10.3310/hta13120.

  22. Pallant JF, Tennant A. An introduction to the Rasch measurement model: an example using the hospital anxiety and depression scale (HADS). Brit J Clin Psychol. 2007;46:1–18.

  23. Park MH, Smith SC, Neuburger J, Chrysanthaki T, Hendriks AAJ, Black N. Sociodemographic characteristics, cognitive function, and health-related quality of life of patients referred to memory assessment services in England. Alzheimer Dis Assoc Disord. 2016; doi:10.1097/WAD.0000000000000166.

  24. Banerjee S. DEMQOL: dementia. Quality of Life measure. Brighton and Sussex Medical School. http://www.bsms.ac.uk/research/our-researchers/sube-banerjee/demqol/. Accessed 01 Dec 2016.

  25. Andrich D, Sheridan B. RUMM2030. Perth, WA: RUMM Laboratory Pty Ltd; 1997–2016.

  26. Marais I, Andrich D. Formalizing dimension and response violations of local independence in the Unidimensional Rasch model. J Appl Meas. 2008;9(3):200–15.

  27. Tennant A, Pallant J. Unidimensionality matters! (a tale of two Smiths?). Rasch Meas Trans. 2006;20:1048–51.

  28. Wilson EB. Probable inference, the law of succession, and statistical inference. J Am Stat Assoc. 1927;22:209–12.

  29. Brown LD, Cai TT, DasGupta A. Interval estimation for a binomial proportion. Stat Sci. 2001;16:101–33.

  30. Hagell P. Testing rating scale unidimensionality using the principal component analysis (PCA)/t-test protocol with the Rasch model: the primacy of theory over statistics. Open J Stat. 2014;4:456–65. doi:10.4236/ojs.2014.46044.

  31. Nunnally JC. Psychometric theory. Second ed. New York: McGraw Hill; 1978. p. 28–33.

  32. Lorenzo-Seva U, Ferrando PJ. FACTOR 9.2: a comprehensive program for fitting exploratory and semi-confirmatory factor analysis and IRT models. Appl Psych Meas. 2013;37:497–8.

  33. Reise SP, Waller NG, Comrey AL. Factor analysis and scale revision. Psychol Assessment. 2000;12:287–97.

  34. Hendriks AAJ, Perugini M, Angleitner A, et al. The five-factor personality inventory: cross-cultural generalizability across 13 countries. Eur J Personality. 2003;17:347–73.

  35. Ten Berge JMF, Hofstee WKB. Coefficients alpha and reliabilities of unrotated and rotated components. Psychometrika. 1999;64:83–90.

  36. Hofstee WKB, De Raad B, Goldberg LR. Integration of the big five and circumplex approaches to trait structure. J Pers Soc Psychol. 1992;63:146–63.

Acknowledgements

We thank all people with dementia and their family carers who participated in this study.

Funding

This research was commissioned and funded by the Department of Health Policy Research Programme (Using Patient Reported Outcome Measures to Assess Quality of Life in Dementia, 0700071). The views expressed in this publication are those of the authors and not necessarily those of the Department of Health.

Availability of data and materials

The datasets generated and analysed during the current study are not publicly available because the study is still ongoing; after the end of the study they can be requested from the corresponding author.

Author information

Contributions

SCS, NB, and JH designed and planned the study. JH and SS were responsible for data analysis and interpretation with advice from SJC. The first draft of the paper was written by JH. All authors reviewed drafts of the paper and have seen and approved the final version.

Corresponding author

Correspondence to A. A. Jolijn Hendriks.

Ethics declarations

Ethics approval and consent to participate

Patients and carers provided written consent to take part. The study protocol was approved by the National Research Ethics Service Committee London (reference: 14/LO/1146) and the London School of Hygiene and Tropical Medicine (reference: 8418).

Consent for publication

Not applicable.

Competing interests

Dr. Sarah Smith is the first author of the original development of DEMQOL and DEMQOL-Proxy (Smith et al. Development of a new measure of health-related quality of life for people with dementia: DEMQOL. Psychol Med. 2007;37:737–46). The instrument is publicly available free of charge.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Results of the exploratory and parallel factor analysis

The factor analysis results for DEMQOL and DEMQOL-Proxy were highly comparable; therefore, we present only the results for DEMQOL. We performed an exploratory and parallel factor analysis based on principal components using the freely available software programme Factor 9.2 [32] to investigate the number of non-random components underlying the data. As recommended [32], the analysis was carried out on the polychoric correlation matrix because most DEMQOL items showed asymmetric univariate distributions with excess kurtosis. The results (Table 6) indicated at most [33] four content-related components (i.e. components explaining more variance than the parallel components extracted from 500 correlation matrices obtained from random permutations of the raw data). Only the first two components were sufficiently reliable (α ≥ 0.72 after rotation) to yield robust, replicable dimensions [34]. We computed α for the unrotated (α = 0.94 and α = 0.65) and varimax-rotated (α = 0.83 and α = 0.75) components using the formulae published by Ten Berge and Hofstee [35].
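The permutation-based parallel analysis described in this paragraph can be sketched in Python with NumPy. This is an illustrative re-implementation under stated assumptions, not Factor 9.2: it uses Pearson rather than polychoric correlations, compares each observed eigenvalue with the 95th percentile of eigenvalues from column-wise permuted data (the percentile is an assumption), and the function name and simulated data are hypothetical.

```python
import numpy as np


def parallel_analysis(data, n_perm=500, percentile=95, seed=0):
    """Count components whose eigenvalues exceed those from permuted data.

    data: (n_persons, n_items) array of item scores.
    Each permutation shuffles every item column independently, which keeps
    each item's distribution but destroys the correlations between items.
    """
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)

    def eigenvalues(x):
        # Eigenvalues of the Pearson correlation matrix, largest first
        # (eigvalsh returns them in ascending order).
        return np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))[::-1]

    observed = eigenvalues(data)
    random_eigs = np.array([
        eigenvalues(np.column_stack([rng.permutation(col) for col in data.T]))
        for _ in range(n_perm)
    ])
    reference = np.percentile(random_eigs, percentile, axis=0)

    # Retain components until the first one that does not beat its random counterpart.
    n_retain = 0
    for obs, ref in zip(observed, reference):
        if obs <= ref:
            break
        n_retain += 1
    return n_retain, observed, reference


# Illustrative use on simulated data with two clusters of six items each.
rng = np.random.default_rng(1)
factors = rng.standard_normal((500, 2))
simulated = np.hstack([
    factors[:, [0]] + 0.6 * rng.standard_normal((500, 6)),
    factors[:, [1]] + 0.6 * rng.standard_normal((500, 6)),
])
n_components, obs, ref = parallel_analysis(simulated, n_perm=100)
print(n_components)  # 2
```

Permuting each column separately is what makes the comparison "parallel": the random data share the observed item distributions, so any eigenvalue that exceeds its random counterpart reflects shared variance rather than the marginal shape of the items.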

Table 6 Exploratory and parallel factor analysis based on principal components

The first principal component was clearly dominant, explaining about four times as much of the variance (10.79/28 = 38.5%) as the second component (9.5%). Table 7 shows the rotated component matrix. Four of the five positive emotion items loaded exclusively on a “Feelings” factor defined by positive and negative emotions; their secondary loadings on the first component, defined by all other items (i.e. cognition, social relationships and negative emotions), were essentially zero. A plot of the factor loadings (Fig. 4) shows that the positive emotion items form a distinct, separate cluster.
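For an unrotated principal component, the standard formula for coefficient alpha (which Ten Berge and Hofstee [35] generalise to rotated components) is α = p/(p − 1) × (1 − 1/λ), where p is the number of items and λ is the component's eigenvalue. The short Python check below assumes p = 28 and recovers the second eigenvalue only approximately from its reported 9.5% share of variance; it reproduces the unrotated values of 0.94 and 0.65 quoted above. The rotated values require the full formulae in [35] and are not reproduced here.

```python
def component_alpha(eigenvalue, n_items):
    """Coefficient alpha of an unrotated principal component:
    alpha = p / (p - 1) * (1 - 1 / eigenvalue)."""
    return n_items / (n_items - 1) * (1 - 1 / eigenvalue)


N_ITEMS = 28
first_eigenvalue = 10.79
# Approximation: the second eigenvalue recovered from its reported 9.5% share.
second_eigenvalue = 0.095 * N_ITEMS

print(round(100 * first_eigenvalue / N_ITEMS, 1))             # 38.5
print(round(component_alpha(first_eigenvalue, N_ITEMS), 2))   # 0.94
print(round(component_alpha(second_eigenvalue, N_ITEMS), 2))  # 0.65
```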

Table 7 Rotated component matrix
Fig. 4 Plot of the factor loadings

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Hendriks, A.A.J., Smith, S.C., Chrysanthaki, T. et al. DEMQOL and DEMQOL-Proxy: a Rasch analysis. Health Qual Life Outcomes 15, 164 (2017). https://doi.org/10.1186/s12955-017-0733-6

  • DOI: https://doi.org/10.1186/s12955-017-0733-6

Keywords