Testing the construct validity of a health transition question using vignette-guided patient ratings of health
Health and Quality of Life Outcomes volume 16, Article number: 2 (2018)
A single-item transition question is often used to assess improvement or worsening in health, but its validity has not been tested extensively. The purpose of this study was to test the construct validity of a transition question by relating it to qualitative changes in patients’ self-ratings of health guided by clinical vignettes.
We studied 169 patients with active rheumatoid arthritis (RA) before and after treatment escalation. At both assessments, patients scored their current health on a rating scale after first rating three vignettes describing mild, moderate, or severe RA. We classified patients into one of these three RA categories using a nearest-neighbor match. We then related the change in these self-rated categories between visits to responses to a transition question on visit 2.
Sixty patients improved their RA vignette category after treatment, 86 remained in the same vignette category, and 23 worsened by at least one category. On the transition question, 101 patients reported improvement, 48 reported no change, and 20 reported worsening, representing a modest association with changes in RA vignette categories (polychoric correlation r = 0.19). The association was stronger when patients in the mild RA category at both visits were also classified as improved if their self-rating changed from below to above their mild vignette rating (r = 0.23), and when the importance of changes on the transition question was incorporated (r = 0.26).
Changes in health states, guided by clinical vignettes, support the construct validity of the transition question.
Transition questions, also known as global ratings of change, are judgments by patients of whether their health has improved, worsened, or is unchanged from a prior time or from the start of an intervention . Given its simplicity and face validity, the transition question is among the most commonly used measures in clinical practice and clinical investigations. In particular, transition questions are frequently used to validate other measures, and serve as anchors for estimating the minimal clinically important changes in other health status measures [2, 3].
Questions have arisen about the validity of transition questions . The cognitive processes that patients use to judge whether a qualitative change in health has occurred are complex and incompletely understood [5, 6]. Judgments of change have been shown to be influenced not only by the measured change in health but also by how well or poorly the patient feels when making the judgment [7,8,9,10]. The post-intervention state of health has often been found to be more influential than the pre-intervention state of health, which violates the premise that transition questions should be reciprocally associated with both the pre- and post-intervention states of health . Countering these concerns is evidence that despite some contribution from the post-intervention state, transition questions are strongly associated with measured changes in health, and responses to transition questions parallel associations with other anchors, such as physician ratings of improvement [12,13,14].
In a new approach to test the construct validity of transition questions, we used clinical vignettes as indicators of changes in patients’ health. Vignettes have been increasingly used in health status measurement because they may provide less biased ratings than other patient-reported measures, or at a minimum can identify the extent of intergroup bias [15,16,17,18]. We asked patients to rate three clinical vignettes that described different levels of health (mildly ill, moderately ill, and severely ill), and matched patients’ ratings of their current health to one of these three levels before and after a medication intervention. Using the monotrait-heteromethod framework for convergent validity, we reasoned that patients who experienced a major subjective change in health with treatment would endorse this change on both their vignette rating and the transition question, while those with no change in health based on their vignette ratings would not report a change on the transition question .
Subjects and study design
Data were from a short-term prospective longitudinal study whose main goal was to determine estimates of minimal clinically important improvement for measures of rheumatoid arthritis (RA) activity . We enrolled adults with active RA from two outpatient clinics who, in the judgment of their rheumatologist, required treatment escalation with either disease-modifying medications, biologicals, or prednisone. Subjects were evaluated twice, once at study entry and then either 1 month later (for those treated with prednisone) or 4 months later (all others). The timing of the follow-up visit differed by treatment group because clinical responses occur more rapidly with prednisone treatment. Subjects had a joint examination at each visit, and completed self-reported measures of health status and quality of life at both visits. We used a global transition question as the anchor for determining thresholds for clinically important improvement in RA activity measures . The study protocol was approved by the institutional review board, and all subjects provided written informed consent.
Of 262 subjects enrolled, 206 participated in a substudy that included the rating of clinical vignettes; lack of time and a primary language other than English were the main reasons for non-participation in this substudy. Of the 206 substudy participants, 195 completed the vignette ratings at both visits. We then excluded 26 subjects who rated the mild RA vignette as less healthy than the severe RA vignette, which indicated some confusion with the task. This left 169 subjects for analysis (82% of substudy participants).
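The exclusion rule and the participant flow described above can be checked with a short sketch. The function name `has_disordered_ratings` is hypothetical and is not taken from the study; the counts are those reported in the text.

```python
def has_disordered_ratings(mild_rating, severe_rating):
    """Return True when the mild vignette was rated as less healthy
    (lower on the 0 to 1 scale) than the severe vignette."""
    return mild_rating < severe_rating


# Participant flow as reported in the text.
substudy_participants = 206
completed_both_visits = 195
excluded_disordered = 26

analyzed = completed_both_visits - excluded_disordered
percent_of_substudy = round(100 * analyzed / substudy_participants)

print(analyzed)             # 169
print(percent_of_substudy)  # 82
```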
Clinical vignette ratings
We used a computer-administered questionnaire to obtain subjects’ ratings of their own current health and ratings of three hypothetical health states described in vignettes. Responses were recorded on a vertical visual analog scale with the top endpoint labeled “perfect health” and the bottom endpoint labeled “worst imaginable health,” which were set to 1.0 and 0, respectively. The scale was unnumbered. Subjects were asked to move a cursor with the mouse to the point on the scale that corresponded to their rating on the perfect-worst continuum, which was then recorded by the computer. First, subjects read a description of a person with mild RA and then rated this vignette. This was followed by descriptions and ratings of a person with moderate RA and then severe RA, in this order. Subjects were then asked to rate their current health on this scale, considering the endpoints and their vignette ratings . Specifically, the prompt was “Now think about how your arthritis affects you currently. Think about your ability to work and do things, your ability to take care of yourself, your ability to enjoy leisure activities, the amount of pain you have, your mood, and your outlook for the future.”
The three clinical vignettes were based on the McMaster Utility Measurement Questionnaire . The mild RA vignette read as follows: Think what it would be like to live in the following way: You are able to perform all of your daily activities, like work, shopping and driving. You are completely able to take care of your personal needs, like eating and bathing. You have some difficulty participating in leisure activities, like sports and hobbies. You have occasional pain. You normally do not have any worry or stress, but sometimes you are concerned about the future course of your arthritis. You have some mild stomach upset from some medication you take.
The moderate RA vignette was: Think what it would be like to live in the following way: On most days you are able to run errands and work around the house, but fatigue and joint pain prevent you from working. You are completely able to take care of your personal needs, such as eating and bathing. Joint pain is mild on most days, but is never gone and is sometimes quite severe. You rarely have enough energy for leisure activities. At times you are frustrated with dealing with your arthritis. The medication you take sometimes causes diarrhea.
The severe RA vignette read as follows: Think what it would be like to live in the following way: You are unable to work, shop, or drive. You have much difficulty getting around outside the house. Sometimes you need help to bathe. You are unable to participate in any leisure activities. You are depressed and frustrated. You have severe pain on most days. The medications you take cause you painful sores in your mouth and difficulty thinking.
Subjects performed new ratings at the second visit, without knowledge of their prior ratings.
Since the vignettes portrayed overall RA status, we used the global transition question for this analysis. At the second visit, subjects were asked to respond to the following on a written questionnaire: “Since the start of the study, overall my arthritis has: improved, stayed the same, or worsened.” Those who responded that they were improved or worsened were then asked to rate the importance of the change on a seven-level scale: almost none, hardly at all; a little important; somewhat important; moderately important; a good deal important; very important; extremely important .
We first categorized patients’ self-ratings on the visual analog scale into one of three states, based on the proximity of each self-rating to the same patient’s ratings of the mild, moderate, and severe RA vignettes, using the nearest neighbor technique . For example, consider a subject whose mild, moderate and severe vignette ratings were 0.88, 0.65, and 0.34, respectively. If their self-rating was 0.72, they would be classified as moderate, whereas if their self-rating was 0.10, they would be classified as severe. Classification was done independently for each visit. Subjects were only classified with respect to their personal ratings of the vignettes and not to group mean vignette ratings. In the rare event of a nearest-neighbor tie, we randomly assigned the subject to one of the tied categories.
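A minimal sketch of the nearest-neighbor match described above, using the worked example from the text. The function and variable names are illustrative assumptions, not the study's own code, and the random tie-break simply follows the rule stated in the text.

```python
import random

CATEGORIES = ("mild", "moderate", "severe")


def classify_self_rating(self_rating, vignette_ratings, rng=random):
    """Assign the category whose vignette rating is closest to the self-rating.

    vignette_ratings maps each category to the subject's own rating of that
    vignette on the 0 to 1 scale. Ties are broken at random.
    """
    distances = {c: abs(self_rating - vignette_ratings[c]) for c in CATEGORIES}
    smallest = min(distances.values())
    tied = [c for c in CATEGORIES if distances[c] == smallest]
    return tied[0] if len(tied) == 1 else rng.choice(tied)


# Worked example from the text.
ratings = {"mild": 0.88, "moderate": 0.65, "severe": 0.34}
print(classify_self_rating(0.72, ratings))  # moderate
print(classify_self_rating(0.10, ratings))  # severe
```

Because each subject is matched only against their own vignette ratings, the same self-rating can map to different categories for different subjects.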
We then compared each subject’s category at the first visit to that at the second visit. In the base analysis, those who moved from severe to either moderate or mild, or from moderate to mild, were classified as improved. Those who moved from mild to either moderate or severe, or from moderate to severe, were classified as worsened. These shifts identified subjects who no longer rated themselves as similar to the person described in one vignette and instead rated themselves as more like the person described in a different vignette, which we interpreted as a clinically meaningful change in the subject’s perception of their health status related to RA. The remainder were classified as unchanged. We then related changes in subjects’ health categories, guided by the clinical vignettes, to their responses on the transition question (improved, same, or worsened) using polychoric correlations. Polychoric correlations estimate the association between two ordinal variables, under the assumption that continuous latent traits underlie the ordered categories. Model fit was tested using the likelihood ratio chi-square test G².
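The base-analysis rule above amounts to comparing the ordered categories at the two visits. The following is a small sketch under that reading; the names `SEVERITY` and `classify_change` are hypothetical.

```python
# Order the categories from healthiest (0) to least healthy (2).
SEVERITY = {"mild": 0, "moderate": 1, "severe": 2}


def classify_change(visit1_category, visit2_category):
    """Classify the shift in vignette category between two visits."""
    before = SEVERITY[visit1_category]
    after = SEVERITY[visit2_category]
    if after < before:
        return "improved"
    if after > before:
        return "worsened"
    return "unchanged"


print(classify_change("severe", "mild"))        # improved
print(classify_change("moderate", "severe"))    # worsened
print(classify_change("moderate", "moderate"))  # unchanged
```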
We repeated the analysis using subjects’ ratings of the importance of changes on the transition question, creating a 15-point scale ranging from −7 for those who reported worsening that was extremely important to +7 for those who reported improvement that was extremely important, with 0 representing those who reported no change.
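One way to build this signed scale is sketched below. The text fixes only the endpoints (−7, +7) and the zero point; numbering the seven importance levels 1 to 7 in the order they were presented is an assumption made here for illustration, as is the function name `transition_score`.

```python
# Seven importance levels in the order listed in the questionnaire.
# Mapping them to 1..7 in this order is an assumption of this sketch.
IMPORTANCE_LEVELS = (
    "almost none, hardly at all",
    "a little important",
    "somewhat important",
    "moderately important",
    "a good deal important",
    "very important",
    "extremely important",
)


def transition_score(direction, importance=None):
    """Combine direction and importance into a score from -7 to +7."""
    if direction == "stayed the same":
        return 0
    magnitude = IMPORTANCE_LEVELS.index(importance) + 1
    if direction == "improved":
        return magnitude
    if direction == "worsened":
        return -magnitude
    raise ValueError(f"unknown direction: {direction!r}")


print(transition_score("improved", "extremely important"))  # 7
print(transition_score("worsened", "extremely important"))  # -7
print(transition_score("stayed the same"))                  # 0
```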
In the base analysis, subjects who were classified as mild on the first visit had no opportunity to register improvement, which may have limited the associations with the transition question. Therefore, we repeated the analysis with the modification that those subjects who were categorized as mild at both visits, but whose self-rating at visit 1 was less healthy than their rating of the mild vignette and whose self-rating at visit 2 was healthier than their mild vignette rating, were also classified as improved.
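The modified rule can be sketched as an extra check applied before the base comparison. This sketch assumes that each visit's self-rating is compared with the mild vignette rating from that same visit, which the text does not state explicitly; all names are hypothetical.

```python
# Order the categories from healthiest (0) to least healthy (2).
SEVERITY = {"mild": 0, "moderate": 1, "severe": 2}


def classify_change_modified(cat1, cat2, self1, self2, mild1, mild2):
    """Classify change between visits, allowing improvement within 'mild'.

    cat1, cat2:   vignette categories at visits 1 and 2
    self1, self2: self-ratings at visits 1 and 2 (0 to 1 scale)
    mild1, mild2: the subject's mild vignette ratings at visits 1 and 2
                  (using the same-visit rating is an assumption)
    """
    crossed_mild = self1 < mild1 and self2 > mild2
    if cat1 == "mild" and cat2 == "mild" and crossed_mild:
        return "improved"
    if SEVERITY[cat2] < SEVERITY[cat1]:
        return "improved"
    if SEVERITY[cat2] > SEVERITY[cat1]:
        return "worsened"
    return "unchanged"


# Mild at both visits, but self-rating moved from below to above the mild vignette.
print(classify_change_modified("mild", "mild", 0.80, 0.95, 0.88, 0.90))  # improved
```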
To our knowledge, there are no accepted criteria for interpreting the magnitude of associations based on polychoric correlations. Therefore, we adopted the convention used for point-biserial correlations (i.e. between one continuous variable and one dichotomous variable). Following Cohen, a medium effect size is represented by a point-biserial correlation of 0.243, which corresponds to a Cohen’s d of 0.5 [26, 27]. We considered polychoric correlations of 0.243 or greater between the vignette-guided self-ratings and the transition question responses, representing at least a medium effect, to support the construct validity of the transition question.
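The 0.243 threshold follows from the standard conversion between Cohen's d and the point-biserial correlation, r = d / sqrt(d² + 4), which assumes two groups of equal size. The short check below uses a hypothetical function name and is only a sketch of that arithmetic.

```python
import math


def cohens_d_to_point_biserial_r(d):
    """Convert Cohen's d to a point-biserial r, assuming equal group sizes."""
    return d / math.sqrt(d ** 2 + 4)


medium_r = cohens_d_to_point_biserial_r(0.5)
print(round(medium_r, 3))  # 0.243
```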
The study included 124 women (73%) and 45 men, with a mean (± standard deviation) age of 53.2 ± 13.7 years and a median (25th, 75th percentile) duration of RA of 6.5 (2.3, 16.0) years. Subjects had active RA at entry, with a mean Disease Activity Score-28 (a composite score of tender and swollen joint counts and serum C-reactive protein level; possible range 1–9.4 ) of 5.3 ± 1.0. At visit 2, the mean within-subject Disease Activity Score-28 change was −0.9 points, indicating improvement at the group level.
Clinical vignette ratings
Mean vignette ratings at each visit are shown in Table 1. There were no significant intra-subject differences in ratings of the mild vignette or moderate vignette between the two visits, while the severe vignette was rated as slightly higher on the second visit. In comparison, subjects’ self-ratings improved substantially. We did not observe any distress or annoyance among subjects during the vignette ratings.
Based on the nearest neighbor approach, 36 subjects had self-ratings that were closest to the severe vignette at visit 1, while 51 subjects were classified as moderate and 82 as mild (Table 2). At the second visit, 60 subjects had improved by at least one category using the nearest-neighbor match, but 23 had worsened by at least one category. Self-ratings increased by an average of 0.28 ± 0.21 among those who improved by at least one category, changed little among those who remained in the same vignette category (0.09 ± 0.17), and changed by an average of −0.11 ± 0.16 among those whose vignette category worsened.
Association of vignette-guided self-ratings with transition question responses
In the base analysis, 101 subjects reported improvement on the transition question, 48 subjects reported no change, and 20 subjects reported worsening (Table 3). These responses were only modestly associated with changes in categories of RA status based on self-ratings guided by the vignettes, with r = 0.19. Model fit was good (G² = 0.29; p = 0.96). In the analysis that accommodated improvement among subjects in the mild vignette category at visit 1, the association was somewhat stronger (r = 0.23), approaching the criterion of a medium effect (Table 3). This model also fit the data well (G² = 0.53; p = 0.91).
Association of vignette-guided self-ratings with importance of the transition question changes
Not all of the 15 possible importance ratings were used by subjects, resulting in 12 categories for the importance of the change in the transition question (Table 4). Vignette-guided self-ratings were more strongly associated with the importance ratings of the transition question than with the 3-category transition question. In the base analysis, r = 0.26 (model fit G² = 24.3; p = 0.33), while in the analysis that accommodated improvement among subjects in the mild category at visit 1, r = 0.28 (model fit G² = 15.8; p = 0.78). Both associations represent a medium effect, supporting the construct validity of the transition question.
The purpose of a transition question is to capture a patient’s explicit judgment about whether they have experienced a noticeable change in their health. One method to test the validity of these explicit judgments is to compare how well they relate to contemporaneous implicit judgments of important change. We used shifts in subjects’ ratings of their health relative to a set of three health states described in clinical vignettes as an indicator of a change in their implicit judgments. We found that changes in vignette-guided categories of health after treatment escalation were associated with similar changes in the transition question, with associations of medium effect sizes. These results support the construct validity of the global transition question.
Several previous studies that reported only weak or no associations between changes in health status measures and a transition question examined patients in usual care settings who were not necessarily very symptomatic at study entry or who did not receive specific or impactful treatment during the study [1, 4, 8,9,10]. Some studies did not report the magnitude of health changes observed [1, 7,8,9]. These naturalistic designs might have limited the ability of these studies to detect associations with the transition question, because few patients may have had notable changes in health. Other studies used health status measures of limited responsiveness . Studies that examined major categories of transition responses (i.e. improved, same, or worsened) were more likely to find associations with changes in health status measures than studies that used more finely-graded responses or that converted transition question responses to a semi-quantitative score based on the perceived importance of the change [12, 29, 30]. These semi-quantitative measures include two potential sources of error or imprecision: one related to the direction of change in the transition question and one related to the importance of that change. Associations with qualitative changes in the transition question may be obscured by imprecision in the importance ratings.
In contrast, we examined a group of patients with active disease who were all treated with known effective medications. The importance of studying patients with active disease is underscored by a study of highly active antiretroviral therapy that found associations between health measures and a transition question among patients with symptomatic human immunodeficiency virus disease but not among patients who were asymptomatic . The magnitude of associations with the transition question was stronger when ratings of the importance of the change were analyzed than when examining only the presence and direction of a change. These results suggest that information contributed by the subjective importance of the change may outweigh any associated measurement error.
Our study has some limitations. We chose the reference standard to which the transition question was compared to be a substantial and clinically meaningful change in health, denoted by a change in self-rating relative to the vignettes. We chose this approach, rather than using the rating scale response as a continuous measure, in order to emphasize major changes rather than incremental changes in health, even though mapping patients to the vignettes might have introduced some error. We do not know if subjects would have picked the same states to describe themselves if asked to do so explicitly. However, asking subjects to directly compare themselves to a vignette has been shown to induce agreement irrespective of the health state described in the vignette, increasing the likelihood of disordered responses compared to a sequential rating of vignettes followed by a self-rating . We could have included more vignettes, although with finer categories, the distinctions between health states would decrease and we might be less certain that a change in health state represented an important change. Random errors and disordered ratings also increase with the number of vignettes. Three vignettes have commonly been used in health studies, although one study used 10 vignettes [16, 17, 31, 32]. Thirteen percent of potential subjects were excluded because of disordered vignette ratings, which is similar to the frequency reported in other surveys . One reason for disordered responses may be that subjects keyed in on one specific feature of a particular vignette description. It is not clear if the vignettes served to improve recall of the subject’s prior health state; poor recall of the prior state is a well-recognized source of bias affecting transition questions [5, 6]. In this study, we used the vignettes, which required more time and respondent burden, to validate the simple single-item transition question.
Use of clinical vignettes assumes response consistency, that is, that subjects use the same criteria and standards to rate themselves as they use when rating the vignettes. Few studies have tried to test this assumption, with variable results [34,35,36]. Response consistency may be enhanced by using multidimensional vignettes rather than vignettes focused on a single health domain, and by reminding subjects to identify with the person described in the vignette . We employed both aspects in this study. A second major assumption is vignette equivalence, wherein all respondents view the vignettes as describing the same construct with the same unidimensional scale. Vignette equivalence is central to the use of vignettes in estimating differential item functioning among groups of patients, but is less relevant in our study because vignette ratings were not compared across subjects. We did not explore potential differences among patient subsets in the absence of evidence to support effect modification. Vignettes may also be a novel method to assess response shift, particularly recalibration . Our finding of small intra-subject differences in vignette ratings over time suggests the absence of major response shift.
Our results support the construct validity of a general health transition question. Replication in other conditions and settings would be important, with attention to the key design features of studying patients with active symptoms who experienced major changes in health.
Guyatt GH, Norman GR, Juniper EF, Griffith LE. A critical look at transition ratings. J Clin Epidemiol. 2002;55:900–8.
Guyatt GH, Osoba D, Wu AW, Wyrwich KW, Norman GR, Clinical Significance Consensus Meeting Group. Methods to explain the clinical significance of health status measures. Mayo Clin Proc. 2002;77:371–83.
Wyrwich KW, Bullinger M, Aaronson N, Hays RD, Patrick DL, Symonds T, et al. Estimating clinically significant differences in quality of life outcomes. Qual Life Res. 2005;14:285–95.
Rose AJ, Sacks NC, Deshpande AP, Griffin SY, Cabral HJ, Kazis LE. Single-change items did not measure change in quality of life. J Clin Epidemiol. 2008;61:603–8.
Wyrwich KW, Tardino VM. Understanding global transition assessments. Qual Life Res. 2006;15:995–1004.
Sprangers MAG, Schwartz CE. Integrating response shift into health-related quality of life research: a theoretical model. Soc Sci Med. 1999;48:1507–15.
Schmitt J, DiFabio RP. The validity of prospective and retrospective global change criterion measures. Arch Phys Med Rehabil. 2005;86:2270–6.
Metz SM, Wyrwich KW, Babu AN, Kroenke K, Tierney WM, Wolinsky FD. Validity of patient-reported health-related quality of life global ratings of change using structural equation modeling. Qual Life Res. 2007;16:1193–202.
Kamper SJ, Ostelo RWJG, Knol KL, Maher CG, de Vet HCW, Hancock MJ. Global perceived effect scales provided reliable assessments of health transition in people with musculoskeletal disorders, but ratings are strongly influenced by current status. J Clin Epidemiol. 2010;63:760–6.
Grøvle L, Haugen AJ, Hasvik E, Natvig B, Brox JI, Grotle M. Patients’ ratings of global perceived change during 2 years were strongly influenced by the current health status. J Clin Epidemiol. 2014;67:508–15.
Norman GR, Stratford P, Regehr G. Methodological problems in the retrospective computation of responsiveness to change: the lesson of Cronbach. J Clin Epidemiol. 1997;50:869–79.
Meyer T, Richter S, Raspe H. Agreement between pre-post measures of change and transition ratings as well as then-tests. BMC Med Res Methodol. 2013; https://doi.org/10.1186/1471-2288-13-52.
Lattig F, Grob D, Kleinstueck FD, Porchet F, Jeszenszky D, Bartanusz V, et al. Ratings of global outcome at the first post-operative assessment after spinal surgery: how often do the surgeon and patient agree? Eur Spine J. 2009;18:386–94.
Nieuwkerk PT, Tollenaar MS, Oort FJ, Sprangers MAG. Are retrospective measures of change in quality of life more valid than prospective measures? Med Care. 2007;45:199–205.
Korfage IJ, de Koning HJ, Essink-Bot ML. Response shift due to diagnosis and primary treatment of localized prostate cancer: a then-test and a vignette study. Qual Life Res. 2007;16:1627–34.
Dowd JB, Todd M. Does self-reported health bias the measurement of health inequalities in U.S. adults? Evidence using anchoring vignettes from the health and retirement study. J Gerontol B Psychol Sci Soc Sci. 2011;66:478–89.
Grol-Prokopczyk H, Freese J, Hauser RM. Using anchoring vignettes to assess group differences in general self-rated health. J Health Soc Behav. 2011;52:246–61.
Kapteyn A, Smith JP, Van Soest A. Vignettes and self-reports of work disability in the United States and the Netherlands. Am Econ Rev. 2007;97:461–73.
Bagozzi RP, Yi Y, Phillips LW. Assessing construct validity in organizational research. Adm Sci Q. 1991;36:421–58.
Ward MM, Guthrie LC, Alba MI. Clinically important changes in individual and composite measures of rheumatoid arthritis activity. Thresholds applicable in clinical trials. Ann Rheum Dis. 2015;74:1691–6.
Ward MM, Guthrie LC, Alba MI. Domain-specific transition questions demonstrated higher validity than global transition questions as anchors for clinically important improvement. J Clin Epidemiol. 2015;68:655–61.
Hopkins DJ, King G. Improving anchoring vignettes: designing surveys to correct interpersonal incomparability. Public Opin Q. 2010;74:201–22.
Bakker C, Rutten M, van Doorslaer E, Bennett K, van der Linden S. Feasibility of utility assessment by rating scale and standard gamble in patients with ankylosing spondylitis or fibromyalgia. J Rheumatol. 1994;21:269–74.
Jaeschke R, Singer J, Guyatt GH. Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials. 1989;10:407–15.
Cover T, Hart P. Nearest neighbor pattern classification. IEEE Trans Inf Theory. 1967;13:21–7.
Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale: Erlbaum; 1988.
Rice ME, Harris GT. Comparing effect sizes in follow-up studies: ROC area, Cohen's d, and r. Law Hum Behav. 2005;29:615–20.
Chang E, Abrahamowicz M, Ferland D, Fortin PR. Comparison of the responsiveness of lupus disease activity measures to changes in systemic lupus erythematosus activity relevant to patients and physicians. J Clin Epidemiol. 2002;55:488–97.
Williams NH, Wilkinson C, Russell IT. Extending the Aberdeen back pain scale to include the whole spine: a set of outcome measures for the neck, upper and lower back. Pain. 2001;94:261–74.
Locker D, Jokovic A, Clarke M. Assessing the responsiveness of measures of oral health-related quality of life. Community Dent Oral Epidemiol. 2004;32:10–8.
Peracchi F, Rossetti C. Heterogeneity in health responses and anchoring vignettes. Empir Econ. 2012;42:513–38.
Salomon JA, Tandon A, Murray CJL. Comparability of self rated health: cross sectional multi-country survey using anchoring vignettes. BMJ. 2004; https://doi.org/10.1136/bmj.37963.691632.44.
Murray CJL, Ozaltin E, Tandon A, Salomon J, Sadana R, Chatterji S. Empirical evaluation of the anchoring vignettes approach in health surveys. In: Murray CJL, Evans DB, editors. Health systems performance assessment: debates, methods and empiricism. Geneva: World Health Organization; 2003. p. 369–99.
Gupta ND, Kristensen N, Pozzoli D. External validation of the use of vignettes in cross-country health studies. Econ Model. 2010;27:854–65.
d’Uva TB, Lindeboom M, O’Donnell O, Van Doorslaer E. Slipping anchor? Testing the vignettes approach to identification and correction of reporting heterogeneity. J Hum Resour. 2011;46:875–906.
Au N, Lorgelly PK. Anchoring vignettes for health comparisons: an analysis of response consistency. Qual Life Res. 2014;23:1721–31.
This study was funded by the Intramural Research Program, National Institute of Arthritis and Musculoskeletal and Skin Diseases (ZIA-AR041153), National Institutes of Health, and U.S. Public Health Service grant RO1-AR45177.
Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Ethics approval and consent to participate
The study was approved by the National Institute of Diabetes and Digestive and Kidney Diseases/National Institute of Arthritis and Musculoskeletal and Skin Diseases Institutional Review Board (03-AR-0133). Written informed consent was obtained from all participants in the study.
Competing interests
The authors declare that they have no competing interests.