Detection of response shift in health-related quality of life studies: a systematic review

Background Response Shift (RS) refers to the idea that an individual may undergo changes in how they perceive their health-related quality of life (HRQOL). If internal standards, values, or the conceptualization of HRQOL change over time, then answers to the same items by the same individuals may not be comparable over time. Traditional measures to evaluate RS are prone to bias, and strong methodologies to study the existence of this phenomenon are required. The objective is to systematically identify, analyze, and synthesize the existing and recent evidence on statistical methods used for RS detection in HRQOL studies. Methods A systematic review of studies published between January 2010 and July 2020 was performed in the MEDLINE/PubMed, Scopus, Web of Science, PsycINFO and Google Scholar databases. The search strategy used the terms “Health-Related Quality of Life” and “Response Shift” with the filters “Humans”, “Journal Article”, “English” and “2010/01/01–2020/07/31”. The search was made in August 2020. Results After applying the inclusion and exclusion criteria, 107 (15.9%) of the 675 articles retrieved were included in the analysis. Of these, 76 (71.0%) detected the existence of RS and 86 (80.4%) used only one detection method. The most used methods were the Then Test (n = 41) and Oort’s Structural Equation Models (SEM) (n = 35). Other methods used were Multiple Linear Regression (n = 7), Mixed-Effects Regression (n = 6), Latent Trajectory Analysis (n = 6), Item Response Theory (n = 6), Logistic Regression (n = 5), Classification and Regression Trees (n = 4) and Relative Importance Method (n = 4). Most of these detected recalibration, including the Then Test (n = 27), followed by Oort’s SEM, which detected the widest combination of RS types: recalibration (n = 24), reprioritization (n = 13) and reconceptualization (n = 7). Conclusions There is continuing interest in studying RS detection. Oort’s SEM is the most versatile method in its capability to detect all types of RS. Despite the results of previous systematic reviews, the same methods have been used in recent years. We observed the need to explore alternative methods that offer the same detection capacity with a robust and highly precise methodology. RS detection and its types require more study; this creates new opportunities to continue addressing this phenomenon from a multidisciplinary perspective.
Supplementary Information The online version contains supplementary material available at 10.1186/s12955-022-01926-w.

Health-related quality of life (HRQOL) is established as a facet of an individual's state of life when measuring their well-being as a medical patient. It is analyzed as a functional health state and helps identify effective strategies to improve patients' conditions as a result of medical interventions [3,4].
After a medical treatment, patients may perceive and report different conditions over time, from the initial stages of the procedure up to many years after treatment has finished. A change in an individual's perception or internal standards of measurement is known as Response Shift (RS) [5,6]. From a clinical perspective, RS arises as a change in the meaning of a subject's self-administered assessment [7], which is used as a valid and sensitive mechanism to evaluate change at different moments in time [8].
The theoretical model of Sprangers and Schwartz [9] explains how RS may affect HRQOL as a result of changes in health state. As a baseline model, it presents five components: (1) catalysts, corresponding to an individual's health state or changes in it, whether or not they result from a treatment; (2) antecedents, referring to an individual's characteristics that influence catalysts or appraisal mechanisms; (3) mechanisms, explained by behavioral, cognitive, or affective processes that accommodate changes in catalysts; (4) response shift, representing changes in the meaning of an individual's self-evaluation of QOL resulting from changes in internal standards, values, or conceptualization; and (5) perceived QOL. Rapkin and Schwartz [10] propose that QOL appraisal processes must consider how individuals perceive their health status and respond to questionnaires about their QOL. Their model follows these processes: (1) induction of a frame of reference; (2) sampling based on the frame of reference; (3) judging against standards of comparison; and (4) applying a combinatory algorithm to formulate a response. The proposal allows dynamic feedback to explain how QOL scores can remain stable, accounting for inter-individual and temporal differences, despite changes in health status [10].
An individual's self-assessment may change in three ways: in the internal standards of the measurement scale (recalibration), indicating that the patient has a new scale for measuring their own state of HRQOL; in the scale of values (reprioritization), representing a change in the priority of the elements that influence the context of life; and in the definition of the target construct (reconceptualization), when a patient redefines their own concept of it [9].
The change processes of a patient must be appropriately measured when the effects of a disease or of medical treatments are evaluated [11,12]. The interpretation of HRQOL data represents a challenge because patients self-report their health conditions at a specific time, and these reports can also be influenced by psychological phenomena [13]. This suggests that HRQOL measurements must rely on the individual reporting their status at two or more moments in order to detect significant changes over time.
Two main approaches are proposed for the detection of change: methods based on specific study designs and secondary data analysis that includes statistical methods developed to test hypotheses that do not require specific designs [14].
The most commonly used methodology is the retrospective Then Test design [15], which evaluates the change in a patient's internal standard by comparing a retrospective rating with the scores at two other moments: pre-test and post-test [5,7]. However, this method is sensitive to bias and difficult to use in longitudinal secondary data analysis [15].
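As an illustration of the comparison described above, the following minimal sketch computes the quantities usually derived from a Then Test design. It assumes a common convention that the review itself does not spell out: the recalibration effect is taken as the then-test minus the pre-test, and the adjusted change as the post-test minus the then-test. The scores and the helper `paired_t` are hypothetical and are not taken from any of the reviewed studies.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (mean difference, paired t statistic) for second - first."""
    diffs = [s - f for f, s in zip(first, second)]
    n = len(diffs)
    d_mean = mean(diffs)
    # Paired t statistic: mean difference over its standard error.
    t_stat = d_mean / (stdev(diffs) / sqrt(n))
    return d_mean, t_stat


# Hypothetical HRQOL scores (0-100) for six patients; illustrative only.
pre_test = [60, 55, 70, 65, 50, 58]    # rated at baseline
then_test = [50, 48, 66, 55, 45, 50]   # baseline re-rated retrospectively at follow-up
post_test = [68, 60, 75, 70, 58, 66]   # rated at follow-up

# Assumed convention: recalibration = then-test minus pre-test.
recalibration, t_recal = paired_t(pre_test, then_test)
# Conventional (unadjusted) change = post-test minus pre-test.
observed_change, t_observed = paired_t(pre_test, post_test)
# Adjusted change = post-test minus then-test.
adjusted_change, t_adjusted = paired_t(then_test, post_test)

print(f"recalibration effect: {recalibration:.2f} (t = {t_recal:.2f})")
print(f"observed change:      {observed_change:.2f} (t = {t_observed:.2f})")
print(f"adjusted change:      {adjusted_change:.2f} (t = {t_adjusted:.2f})")
```

Under this convention the adjusted change equals the observed change minus the recalibration effect, which is why a sizeable then-test minus pre-test difference is read as a sign of recalibration. The t statistics would still need to be compared against a t distribution with n - 1 degrees of freedom, which is omitted here.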
The Structural Equation Models (SEM) of Schmidt [16] and Oort [17], the Item Response Theory (IRT) approaches of Anota et al. [18] and Guilleux et al. [19], the Relative Importance Method of Lix et al. [20], the Latent Trajectory Modelling of Ahmed et al. [21], and the Classification and Regression Trees of Li & Schwartz [22] are among the most widely used and proven methods for RS detection in both primary and secondary data.
This bibliographic review identified the methods traditionally applied in specific clinical studies related to HRQOL, or in the analysis of previously compiled databases, mainly from medical, academic or research institutions. Although other recent publications have carried out exhaustive systematic reviews to address this issue [15,23], it is necessary to continue exploring whether alternative statistical methods are emerging as mechanisms for RS detection. Consequently, the purpose of this investigation is to systematically identify, analyze, and synthesize the existing evidence on new statistical methods used for RS detection in HRQOL studies.

Methods
A systematic review is a structured methodology that allows the identification and integration of specific studies based on inclusion and exclusion criteria, and facilitates the selection of relevant publications [24]. Critical elements for capturing the largest number of eligible publications are the number and orientation of the search repositories and the established inclusion and exclusion criteria.
The PRISMA methodology [25] was applied to develop this systematic review. Through pre-specified eligibility criteria, it reduces bias in the identification, selection, synthesis, and summary of results from previously published studies [26]. The precision and reliability of this tool provide important benefits in health-related research [27]. This methodology has been adopted in recent studies on RS in different clinical areas: oncology [23,28], orthopedic rehabilitation [29], preinjury [30], as well as patient-reported outcomes (PRO) [31].

Data sources
The studies were selected through an organized review of the MEDLINE/PubMed, Scopus, Web of Science Core Collection (SSCI), PsycINFO and Google Scholar databases. The terms used in the MEDLINE/PubMed search were "Quality of Life", "Health-Related Quality of Life" and "Response Shift", as descriptors or keywords in the title and/or abstract. The following filters were used: "Humans", "Journal Articles", "English", "2010/01/01-2020/07/31". This search strategy was adopted for each of the databases consulted. The search was made in August 2020 and covered literature published between January 2010 and July 2020.
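The exact query string is not reported in the text. As a rough sketch only, the terms and filters listed above could be assembled into a PubMed-style boolean query as follows. The way the terms are combined (the two quality-of-life terms joined with OR, then AND with "Response Shift") is an assumption, as is the choice of field tags; the helper `build_pubmed_query` is hypothetical and does not reproduce the authors' actual search.

```python
def build_pubmed_query(topic_terms, focus_term, start_date, end_date):
    """Assemble a PubMed-style boolean query (hypothetical helper).

    Assumption: topic terms are alternatives (OR) and must co-occur (AND)
    with the focus term in the title or abstract.
    """
    topic = " OR ".join(f'"{term}"[Title/Abstract]' for term in topic_terms)
    parts = [
        f"({topic})",
        f'"{focus_term}"[Title/Abstract]',
        "humans[MeSH Terms]",                 # "Humans" filter
        "journal article[Publication Type]",  # "Journal Articles" filter
        "english[Language]",                  # "English" filter
        f'("{start_date}"[Date - Publication] : "{end_date}"[Date - Publication])',
    ]
    return " AND ".join(parts)


query = build_pubmed_query(
    topic_terms=["Quality of Life", "Health-Related Quality of Life"],
    focus_term="Response Shift",
    start_date="2010/01/01",
    end_date="2020/07/31",
)
print(query)
```

Other databases use different field syntax, so a string like this would have to be adapted for Scopus, Web of Science, PsycINFO and Google Scholar.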

Articles selection
Articles that qualified for the eligibility review were those in the English language that met the following criteria: they fit the objective of the search, namely to identify and synthesize statistical methods used for RS detection in HRQOL studies; they were published in peer-reviewed journals with retrievable full text; and they included the term "Response Shift" in the title, abstract and/or keywords. Excluded were studies in a language other than English, studies that did not include the use of statistical methods for RS detection, and studies whose main objective was not the detection of RS and its classification. Conference papers, editorial notes, systematic reviews, and conceptual evaluations were also excluded from the study.
Article selection was made independently by two authors (EOJ and PVG), who first reviewed titles, then abstracts, and finally read the full texts. For article inclusion, the concordance between the two authors was required to be greater than 80% (Kappa index). In case of discrepancies, the process was repeated until they were resolved by consensus among all the authors. Figure 1 illustrates the literature search and the publications selected through the PRISMA flow diagram.
Based on the year of publication, the largest number of studies was recorded in 2017 (15), followed by 2016 (13) and 2014 (12). Active interest in this research topic continued during the rest of the years studied (see Fig. 2). A total of 56 journals were identified in this systematic review; 31.8% of the articles (34) were published in Quality of Life Research, followed by Health and Quality of Life Outcomes (10), Journal of Clinical Epidemiology (4) and European Journal of Cancer Care (4).
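The concordance criterion used during article selection (Kappa index above 80%) is not detailed in the text. As a minimal sketch, assuming that the Kappa index refers to Cohen's kappa for two raters, the agreement between two reviewers' include/exclude decisions could be computed as follows. The decisions shown are hypothetical and do not come from the review.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)
    # Observed proportion of agreement.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(count_a) | set(count_b)
    expected = sum(count_a[label] * count_b[label] for label in labels) / n**2
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical screening decisions for ten articles.
reviewer_1 = ["include", "include", "exclude", "exclude", "include",
              "exclude", "exclude", "include", "exclude", "exclude"]
reviewer_2 = ["include", "include", "exclude", "exclude", "include",
              "exclude", "include", "include", "exclude", "exclude"]

kappa = cohen_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, so it is lower than the simple proportion of matching decisions. Under the criterion stated in the text, screening would be repeated whenever the value did not exceed 0.80.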
The analysis of the included articles determined that 69.2% of the studies were based on primary data, 29.0% on secondary data, and 1.8% did not specify. Different questionnaires were used to capture patients' information. Some studies concentrated on the use of one instrument, while others used several instruments to collect data (see Table 1). The main medical fields of the publications were Oncology (27), Neurology (11), Psychology/Psychiatry (10), Orthopedics (9), Oral health (7) and Cardiology (6).

About the methods for Response Shift detection
Of the 107 articles analyzed, 76 (71.0%) detected the existence of RS, 30 (28.0%) did not identify it, and one article did not indicate it (see Fig. 3). The review recorded the methods used for RS detection, the types of RS, and whether or not the studies detected its existence. A group of 86 (80.4%) articles used one method for RS detection, 19 (17.8%) articles used two or more methods, and two articles did not specify the method used. Table 2 describes the frequency of the methods used in these publications. Counting each application of a method separately, the Then Test was used in 41 (31.8%) cases, Oort's SEM in 35 (27.1%) and Schmidt's SEM in 2 (1.6%). Other models were Multiple Linear Regression (7), Mixed-Effects Regression (6), Latent Trajectory Analysis (6), Item Response Theory (6), Logistic Regression (5), Classification and Regression Trees (4) and Relative Importance Method (4). Other methods were used, but in a smaller number of studies.
Across the detection methods applied in all articles, 91 (70.5%) applications detected RS, while 36 (27.9%) indicated its absence. In the review, 2 (1.6%) articles did not report the presence or absence of this phenomenon. Of the studies that detected RS through different methods, 73 indicated the presence of recalibration, 34 indicated the existence of reprioritization, 16 highlighted reconceptualization, and 7 articles detected the existence of RS but did not specify the type (see Table 3). Of these, 41 articles detected one type of RS, 20 articles registered two types of RS, and only 7 studies identified the three types simultaneously. Seven articles did not indicate the type.
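The percentages in this section rest on two different denominators, which the following short check makes explicit. It is an inference from the reported counts rather than something stated in the text: article-level shares are consistent with the 107 included articles, while method-level shares are consistent with 129 method applications (91 + 36 + 2), since some articles used more than one method. The dictionary and variable names are illustrative.

```python
def shares(counts, total):
    """Percentage of `total` for each count, rounded to one decimal."""
    return {label: round(100 * value / total, 1) for label, value in counts.items()}


# Article-level counts reported in the text (denominator: 107 articles).
article_counts = {"detected RS": 76, "RS absent": 30, "not reported": 1}
total_articles = sum(article_counts.values())

# Method-level counts reported in the text (inferred denominator: 129 applications).
application_counts = {"detected RS": 91, "RS absent": 36, "not reported": 2}
total_applications = sum(application_counts.values())

# Method frequencies, expressed against the same inferred denominator.
method_counts = {"Then Test": 41, "Oort's SEM": 35, "Schmidt's SEM": 2}

article_shares = shares(article_counts, total_articles)
application_shares = shares(application_counts, total_applications)
method_shares = shares(method_counts, total_applications)

print(total_articles, article_shares)
print(total_applications, application_shares)
print(method_shares)
```

Reading the two sets of percentages against their own denominators explains why 41 uses of the Then Test correspond to 31.8% rather than to 41 out of 107 articles.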
Of the articles included (n = 107), the traditional RS detection method, the Then Test, registered the highest number of detections of recalibration (27), while Oort's SEM registered fewer studies detecting recalibration (24) but the highest counts of reprioritization (13) and reconceptualization (7) in the entire systematic review (see Table 3). Schmidt's SEM detected changes in internal standards in only two articles. The remaining methods were also used to detect the three types of RS, except for the Relative Importance Method, Mixed-Effects Regression, Random Forest Regression, and Latent Trajectory Analysis, which did not identify the presence of reconceptualization.
Beyond the effectiveness of established methods for detecting RS, this review shows a growing interest in exploring different statistical methods for RS studies (see Fig. 4). The most frequently used methods between 2010 and 2020 were the Then Test and Oort's SEM. However, the figure shows that alternative techniques have been considered in recent years for the assessment of this phenomenon.

Discussion
Following the systematic review, this study examined advances in HRQOL research through the different methods used for the detection of RS. It describes how the detection of this phenomenon has been evaluated in recent years, which methods are used the most, and which type of RS is most often identified. For physicians and researchers, systematic reviews are useful sources of evidence and scientific advances [32]; they provide elements with which clinical policymakers can evaluate risks, benefits and effects on health care, as well as new research initiatives [26,27].
The largest numbers of HRQOL studies applying RS detection methods were published in 2016 and 2017. One third of the articles included in the review studied cancer [33][34][35][36][37][38][39][40][41]. The Then Test was the most used method, mainly for primary data measuring recalibration [42][43][44][45][46][47][48]. Several studies remarked that the Then Test may also identify other RS types: reprioritization and reconceptualization [33,49,50]. Similar results were presented by Sajobi et al. [15] in a systematic review of RS detection methods, where 54.5% of the articles used the Then Test for this purpose. Likewise, in Ilie et al. [23], 60.0% of the cancer studies reported using this methodology, but the authors indicate that its interpretation must be cautious because it is prone to bias. The Then Test has the advantage of easy handling and analysis, but its disadvantages are its exposure to random error and/or confounding and its difficulty of interpretation. Therefore, this appraisal suggests the use of individualized methods with strong statistical rigor [51].
Oort's SEM [17] differs from the Then Test in that it is used for studies based on both primary and secondary data. This multivariate method is capable of detecting changes in all types of RS: recalibration, reprioritization, and reconceptualization. Several studies confirm Oort's SEM as an effective method to detect changes in patients, regardless of the type of RS or data used [52][53][54]. This detection capacity might be the reason for researchers' greater interest in using this method.
Schmidt's model was used in two studies, detecting only recalibration. In comparative studies with Oort's method [55,56], only this type of RS was identified, but the conclusions indicated that the two approaches use different parameters to identify recalibration: Schmidt defines recalibration as a change in factor variances or factor loadings over time, while Oort defines it as a change in intercepts.
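To make the parameters named above concrete, the following sketch writes a generic longitudinal common factor model in standard SEM notation. The notation is an assumption introduced here for illustration; it is not taken from the review or from the cited comparative studies.

```latex
% Generic two-occasion measurement model (notation assumed for illustration):
%   x_t          observed item or scale scores at occasion t
%   \tau_t       intercepts
%   \Lambda_t    factor loadings
%   \xi_t        common factors, with covariance matrix \Phi_t
%   \varepsilon_t residuals, with covariance matrix \Theta_t
\begin{aligned}
x_t &= \tau_t + \Lambda_t \xi_t + \varepsilon_t, \qquad t = 1, 2, \\
\operatorname{Cov}(x_t) &= \Lambda_t \Phi_t \Lambda_t^{\top} + \Theta_t .
\end{aligned}
```

In this notation, the distinction reported in the text reads as follows: recalibration in Oort's sense corresponds to $\tau_1 \neq \tau_2$, whereas recalibration in Schmidt's sense corresponds to a change in the factor variances on the diagonal of $\Phi_t$ or in the loadings $\Lambda_t$ between occasions.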
Although recalibration was the type most frequently detected, it is noteworthy that all three studies using the Relative Importance Method detected reprioritization [63][64][65]. The Mixed-Effects Regression method [57,58], Classification and Regression Trees [22,61] and Latent Trajectory Analysis [66] did not provide information on reconceptualization either.
During this study we observed that the sample size is related to the type of method used for RS detection. The Then Test is more flexible and functional in small samples or in specific studies, while methods based on stronger statistical techniques require larger samples. According to Schwartz et al. [67], the use of advanced multivariate methods capable of detecting RS requires a sufficient number of individuals participating in the study, as well as attention to a number of factors such as the clarity of the model, the factor loadings, the relationship between the items, the data distribution, and the processes for estimating these parameters.
The need to explore new multivariate methodologies for analyzing this phenomenon motivates an alternative proposal for studying HRQOL and RS detection through three-way data: the dual STATIS method [68,69]. The proposal presented by Vicente Galindo [70] centers on a procedure that integrates the dual STATIS method with Krzanowski's [71] comparison of subspaces, and examines factorial structures from multiple data sets in order to identify the existence of change.
The systematic reviews evaluated in this study [15,23,28,29,30,31] confirm the continued use of the same methods for detecting RS and its different types, along with their advantages and limitations. The previous proposal is an example of the existing opportunities to continue examining other strong statistical methodologies that deepen this line of investigation and reduce bias and ambiguity in the analysis.

Conclusion
The systematic review has proven to be an adequate and convenient methodology to identify and synthesize advances on specific topics. The results show a generalized interest in studying HRQOL and RS for different diagnostic groups. RS detection continues to attract the attention of researchers, who have consolidated a set of methods for its analysis: Then Test, Oort's SEM, Multiple Linear and Mixed-Effects Regressions, Item Response Theory and Latent Trajectory Analysis. Oort's SEM is the most versatile method in its capability to detect all types of RS.
The results demonstrated that not all the methods achieve RS detection in similar proportions: most are capable of identifying recalibration, some reprioritization, and few reconceptualization. At the same time, previous systematic reviews and the results of this updated research conclude that the same methods have been used in recent years, and no evidence was found of alternative statistical techniques proposed for detecting RS. Our study therefore points to a need to explore other methods with similar detection capacity and a robust and highly precise methodology, whether graphically oriented or based on simpler approaches.
Previous systematic reviews and the need to continue investigating this phenomenon motivated this updated search covering the last decade (2010-2020). Although those reviews used a larger number of search terms, including different RS types as well as data measurement characteristics, this systematic review intentionally generalized the terms to RS and HRQOL in order to focus the results on studies centered on this topic.
Since RS can be evaluated from different perspectives, and other disciplines such as statistics, psychology, humanities, and other related social sciences can provide significant contributions, it is recommended to continue