The main focus of this paper is the significance of response shifts for assessing treatment outcomes. Disparities between objective clinical measures and patients' subjective assessments are common. Patients with the same condition respond differently, and even the same patient can respond differently over time. The QoL measures currently used in clinical research were not designed to account for response shifts; they assume that people respond consistently on measurement scales and that scales are directly comparable across individuals and over time. The classical approach has been to treat individual differences in response as sources of error. However, Schwartz and Rapkin have argued convincingly that individual differences in cognitive appraisal processes should be viewed not as sources of error in QoL research but as intrinsic to all QoL measurement.
In this study, we used an individualised measure of quality of life, the SEIQoL-DW, as we felt that, by focusing on the unique choices of patients, we would be in a position to detect more clearly any response shifts that might occur. SEIQoL index scores did not reveal a significant improvement in IQoL 3 months after patients received high quality conventional dentures. However, when baseline scores were derived from the then-test and compared with the post-test estimates, a significant improvement was seen. Response shifts had occurred in that patients had changed their criteria for assessing their quality of life between baseline and 3 months. It was only when this change was factored into the analysis that the improvement following treatment could be seen. The changes in the SEIQoL were highly complex, but it is possible to gain some insight into their nature by looking at the components of the measure, namely cues, weights and levels.
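The comparison described above can be made concrete with a minimal sketch. It assumes the usual SEIQoL-DW scoring, in which the index is the sum over the five cues of level multiplied by weight, with levels on a 0 to 100 scale and weights summing to 100. The function names and all patient values below are hypothetical and are not data from this study.

```python
def seiqol_index(levels, weights):
    """Return the index: sum of level x weight over cues, scaled to 0-100.

    Assumes each level is on a 0-100 scale and the weights sum to 100.
    """
    if len(levels) != len(weights):
        raise ValueError("levels and weights must have the same length")
    if abs(sum(weights) - 100) > 1e-6:
        raise ValueError("weights must sum to 100")
    return sum(l * w for l, w in zip(levels, weights)) / 100.0


def then_test_summary(pre, then, post):
    """Split observed change into a response-shift part and an adjusted part."""
    return {
        # Change seen when the original baseline is used.
        "conventional_change": post - pre,
        # Difference between the retrospective and the original baseline.
        "response_shift": then - pre,
        # Change seen when the then-test replaces the baseline.
        "adjusted_change": post - then,
    }


# Hypothetical single patient (illustrative numbers only).
pre = seiqol_index([70, 60, 80, 50, 65], [30, 20, 20, 15, 15])
then = seiqol_index([50, 45, 70, 40, 55], [25, 25, 20, 15, 15])
post = seiqol_index([72, 62, 80, 52, 66], [25, 25, 20, 15, 15])

summary = then_test_summary(pre, then, post)
print(summary)
```

In this invented example the conventional change is close to zero while the adjusted change is large, which mirrors the pattern reported above: the improvement only becomes visible once the retrospective baseline is used. By construction, the conventional change equals the sum of the response shift and the adjusted change.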
Four out of every 5 patients (81%) nominated at least one different QoL cue at 3 months compared to baseline. Therefore, the elements that they considered most important for their quality of life changed over the study period. This represents a form of re-conceptualisation, one with which clinicians will be familiar. Patients change and adapt with time and in response to changing circumstances. The domains that might have been important for one's QoL before treatment may not be as important on a subsequent occasion. The same phenomenon can be seen with disease progression. Some patients with severe chronic conditions report higher QoL than do healthy individuals. Significantly disabled or terminally ill patients sometimes report QoL similar to or higher than that of healthy controls. One limitation of the SEIQoL-DW in this context is that the respondent is allowed to select only 5 cues. If she chooses different cues on a subsequent occasion from those chosen previously, it could be argued (as we have done) that she has re-conceptualised what QoL means to her. But if she were allowed to select as many cues as she wished, and she included all of the cues previously chosen as well as any new ones, then this would be more likely to indicate re-prioritisation than re-conceptualisation. Patients may also have used different words at each evaluation to refer to essentially the same area. This can be controlled for by collecting detailed descriptions of the life areas chosen and by including questions assessing patients' own perception of change.
Patients were asked at 3 months to indicate retrospectively their level of functioning on each of the cues chosen at baseline. In general, patients retrospectively rated their level of functioning on most of the cues as lower than they had done at the time. If we assume that they completed both assessments at 3 months using a single internal frame of reference, it seems reasonable to label this as re-calibration. It may be that the superior function associated with the high quality dentures provided led patients, on reflection, to perceive their pre-treatment levels as worse.
Because the SEIQoL-DW weights are individualised, it is possible to measure changes in the relative importance of cues over time. Comparing pre-test and then-test weights, we found that on average some weights changed, indicating that re-prioritisation can occur. However, when comparing then-test and post-test weights we found no changes. This might be a true finding, or patients may simply have applied the same weights they were using at T2 to the cues at T1. The observed weight changes may also be partly an artefact of the SEIQoL-DW procedure, as the weights of all five cues selected by respondents are constrained to add to 100. Therefore, if the relative importance of one cue increases, the relative importance of at least one of the other four cues must diminish.
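The constraint on the weights can be illustrated with a short sketch. The proportional rescaling rule used here is an assumption made purely for illustration; in practice respondents set the weights themselves, and the function name and values are hypothetical.

```python
def rescale_after_increase(weights, cue, new_weight):
    """Set one cue's weight and shrink the others so the total stays 100.

    Proportional shrinking is an illustrative assumption, not a description
    of how respondents actually reallocate weight.
    """
    if not 0 <= new_weight <= 100:
        raise ValueError("new_weight must lie between 0 and 100")
    others_total = sum(w for i, w in enumerate(weights) if i != cue)
    if others_total == 0:
        raise ValueError("the other cues carry no weight to rescale")
    remaining = 100 - new_weight
    return [
        new_weight if i == cue else w * remaining / others_total
        for i, w in enumerate(weights)
    ]


# Hypothetical respondent who starts with five equally weighted cues
# and then doubles the weight given to the first cue.
before = [20, 20, 20, 20, 20]
after = rescale_after_increase(before, cue=0, new_weight=40)

# Per-cue changes; because both sets of weights sum to 100,
# the changes necessarily sum to zero.
changes = [a - b for a, b in zip(after, before)]
print(after, changes)
```

Because the changes always sum to zero, an apparent fall in the weight of one cue cannot by itself be read as a fall in its absolute importance; it may simply reflect a rise elsewhere.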
One of the major challenges in interpreting the results of this study is that 81% of the patients chose at least one different cue at 3 months compared to baseline. All patients were asked, in the then-test, to re-evaluate their baseline cues, whether or not they had chosen the same cues at 3 months. It seems likely that this process differs between those who chose the same cues and those who chose different cues at 3 months, and this is worthy of further research. The subgroup of patients who chose exactly the same cues at 3 months (19%) was too small to allow firm conclusions about the nature of response shift in this group.
Some studies have found that memory can influence the findings from the then-test. A limitation of our study is that we did not control for recall bias and we did not compare the changes with any criterion measure of change [18, 19]. However, receiving dentures is a significant and salient event and it seems likely that the influence of recall bias is minimal, especially given the number of judgements a patient had to make and the 3 month gap between assessments. One alternative explanation for our finding of a discrepancy between prospective and retrospective assessments is that subjects may have expected that receiving high quality dentures should improve their health, and they retrospectively rated their initial health as lower to reflect this expectation, a cognitive mechanism known as the implicit theory of change. Our interpretation of the results is based on the assumption that the retrospective then-test data provide a more valid indication of baseline IQoL for comparison with 3 month data than does the baseline assessment itself. If, however, we assume that the retrospective judgement is biased and that the concurrent baseline assessment is more valid, our results would be interpreted differently and there would be no treatment effect. To support the response shift theory, we would need to show that the new information available to patients after receiving their dentures led to more valid judgements of their baseline scores. However, it is as yet unclear how one would determine which theory is more valid for a particular situation. It would be important to distinguish patients whose situation had improved or deteriorated from those who had changed their mind about what it means to have the best or worst possible outcome.
Recently, Schwartz and Rapkin have proposed a new psychometric model, which posits that the "true" PRO score is contingent on aspects of the appraisal process. The appraisal of a construct like QoL may be related to culture, personality and situation and may vary across persons and over time [21–24]. Building on the response shift model, Schwartz and Rapkin have proposed using an Appraisal Profile. They suggest that "rather than simply asking people to re-rate their baseline status using 'today's criteria', we assess their appraisal processes to make those criteria explicit at each time in order to help characterise qualitative change". Improved knowledge about the ways in which patients appraise QoL might lead to more valid, reliable and responsive measures. Future studies need to disentangle the differing ways in which individuals appraise QoL, and researchers must acknowledge the dynamic nature of QoL by empirically testing for response shift phenomena.