The Anti-Clot Treatment Scale (ACTS) in clinical trials: cross-cultural validation in venous thromboembolism patients

Background The Anti-Clot Treatment Scale (ACTS) is a 15-item patient-reported instrument of satisfaction with anticoagulant treatment. It includes a 12-item ACTS Burdens scale and a 3-item ACTS Benefits scale. Its role in clinical trials and other settings should be supported by evidence that it is both clinically meaningful and scientifically sound. The aim of the study was to evaluate the measurement performance of the ACTS (Dutch, Italian, French, German and English language versions) in patients with venous thromboembolism based on traditional psychometric methods.
Methods ACTS Burdens and Benefits scale data from a large clinical trial (EINSTEIN DVT) involving 1336 people with venous thromboembolism were analysed at both the scale and item level. Five key psychometric properties were examined using traditional psychometric methods: acceptability; scaling assumptions; reliability (including internal consistency reliability and test-retest reproducibility); validity (including known groups and discriminant validity); and responsiveness. These methods of examination underpin the US Food and Drug Administration recommendations for patient-reported outcome instrument evaluation.
Results Overall, the 12-item ACTS Burdens scale and 3-item ACTS Benefits scale met the psychometric criteria evaluated at both item and scale levels, with the exception of some relatively minor issues in the Dutch language version, which were just below reliability criteria (i.e. alpha = 0.72, test-retest intraclass correlation = 0.79). A consistent finding from item-level evaluations of aggregate endorsement frequencies and skewness suggested that response scales may be improved by reducing the number of response options from five to four.
Conclusions Both the ACTS Burdens and ACTS Benefits scales consistently satisfied traditional reliability and validity criteria across multiple language datasets, supporting the ACTS as a clinically useful patient-reported instrument of satisfaction with anticoagulant treatment in clinical trials.
Trial registration number NCT00440193


Background
Patient-reported outcome (PRO) instruments are rapidly becoming the primary or secondary outcome measures of choice in pivotal clinical trials, research and practice [1], which means that PRO data now have a key role in patient care, policy-making and prescribing. The quality of inferences made from clinical trials is dependent on the PRO instruments used, and thus they need to be scientifically robust and clinically meaningful [2]. This is increasingly acknowledged [3,4] and has led the US Food and Drug Administration (FDA) to produce guidelines [5] that specify minimum criteria for the scientific adequacy of scales in clinical trials.
Venous thromboembolism (VTE), encompassing deep vein thrombosis (DVT) and pulmonary embolism (PE), occurs with an incidence rate of 1 to 2 per 1000 persons per annum in Western countries, with two-thirds of cases presenting with DVT [6]. VTE can be idiopathic in nature or be associated with risk factors such as surgery, limb trauma or cancer [7]. Oral anticoagulant therapy with vitamin K antagonists (VKAs), alongside initial parenteral heparin, has proved effective in the secondary prevention of recurrent VTE [8,9]. However, VKA treatment involves regular monitoring and dose adjustment, owing to a narrow therapeutic window and an inherent variability arising from genetic and dietary factors. This can be challenging for the patient, with the potential to limit long-term persistence and adherence. In addition, bleeding is an important side-effect of anticoagulation. Therefore, as new anticoagulant therapies become available, it will be essential to measure not only their effectiveness and safety in improving clinical outcomes, but also their effectiveness in improving patient satisfaction [10,11].
The Anti-Clot Treatment Scale (ACTS) is a 15-item patient-reported instrument of satisfaction with anticoagulant treatment. It includes a 12-item ACTS Burdens scale and a 3-item ACTS Benefits scale. The ACTS also includes two additional global questions (see Appendix A). The ACTS was developed based on the original conceptual model of the Duke Anticoagulation Satisfaction Scale (DASS) following a literature review, interviews with experts and patients, and qualitative cognitive debriefing interviews [10][11][12]. The original DASS included 25 items covering the limitations, 'hassles' and positive impacts related to anticoagulant treatment. Modification of the DASS focused on making the instrument more applicable to a wider range of respondents, in particular patients with DVT and PE and those in different country settings. This was achieved through qualitative research involving patient interviews and consensus panels (further information is available from the authors). The key changes included simplification of the wording and structure of the original instrument, improving item stems, changing the response timeframe, reducing the response categories from 7 to 5, and selecting the most relevant items for patients undergoing the different types of anticoagulant treatment. The focus of the new instrument is to delineate the burdens and benefits associated with anticoagulation therapy, and it is designed to be used in patients receiving long-term anticoagulation irrespective of the underlying condition.
If the ACTS is to be considered suitable for future measurement of the burdens and benefits of anticoagulation therapy in patients with VTE, it should satisfy stringent criteria as a reliable and valid instrument. This study provides clinical researchers with a comprehensive evaluation of the reliability and validity of the ACTS using traditional psychometric methods in line with current guidelines.

Setting and participants
Bayer Pharma AG provided anonymised, blinded ACTS Burdens and Benefits scale datasets from EINSTEIN DVT, a large clinical trial involving patients with acute symptomatic DVT treated with rivaroxaban or enoxaparin/VKAs [13]. Patients were eligible if they were aged 18 years or older and had a diagnosis of acute symptomatic DVT without symptomatic PE. The EINSTEIN DVT study included data from 1336 patients across six time points (day 15, 1 month, 2 months, 3 months, 6 months and 12 months). The protocols associated with the EINSTEIN trial programme were approved by the institutional review board at each centre and written informed consent was obtained from all patients. For the psychometric analysis reported in this paper, the data from the earliest time point were analysed (i.e. the first time that patients completed the ACTS Burdens and Benefits scales; day 15).
Patients were asked to complete a questionnaire booklet containing the ACTS and Treatment Satisfaction Questionnaire for Medication version 2 (TSQM II) during follow-up visits. The measurement performance of the ACTS Burdens and Benefits scales was evaluated in the following languages: Dutch, Italian, French, German and English, and then a pooled dataset of all language versions. In this paper, the scale-level analyses for the separate study/language versions (acceptability, scaling assumptions and reliability, including internal consistency reliability, test-retest reproducibility) and both item and scale-level analysis for the pooled language versions datasets (acceptability, scaling assumptions, reliability [including internal consistency reliability]; validity [including known groups and discriminant validity]; and responsiveness) are presented. Further information is available from the authors.

Instruments
The ACTS is a 15-item, patient-reported measure of satisfaction with anticoagulant treatment. It includes 12 items that assess the burdens of anticoagulant treatment and three items that assess the benefits of anticoagulant treatment. Patients are asked to rate their experiences of anticoagulant treatment during the past 4 weeks on a 5-point scale of intensity (1 = not at all, 2 = a little, 3 = moderately, 4 = quite a bit, 5 = extremely). The ACTS Burdens total score ranges from 12 to 60, and the ACTS Benefits total score ranges from 3 to 15. When used in clinical research, it is recommended that the ACTS Burdens scores are reverse-scored so that higher ACTS Burdens and Benefits scores indicate greater satisfaction with treatment. For the purposes of this psychometric evaluation, however, the original raw score data were analysed. French, Dutch, Italian, German and English language versions of the ACTS were created previously in accordance with a standard protocol to achieve conceptual equivalence in the translation, including: forward/backward translation, reconciliation, review and pilot testing [14]. Further information about the translation process is available from the authors.
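The scoring rules described above can be illustrated with a minimal sketch. It assumes that reverse-scoring is applied at the item level as 6 minus the response, and that a total is left uncomputed when any item is missing; neither rule is specified in this paper, and the function names are hypothetical.

```python
from typing import Optional, Sequence

BURDENS_ITEMS = 12   # ACTS Burdens: 12 items, raw total 12-60
BENEFITS_ITEMS = 3   # ACTS Benefits: 3 items, raw total 3-15


def raw_total(responses: Sequence[Optional[int]], n_items: int) -> Optional[int]:
    """Sum 1-5 item responses into a raw scale total.

    Returns None if any item is missing (an assumed rule, not from the paper).
    """
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} responses, got {len(responses)}")
    if any(r is None for r in responses):
        return None
    if any(not 1 <= r <= 5 for r in responses):
        raise ValueError("responses must lie between 1 and 5")
    return sum(responses)


def reverse_scored_burdens(responses: Sequence[Optional[int]]) -> Optional[int]:
    """Reverse-score Burdens so that higher totals indicate greater satisfaction.

    Assumes each item is reversed as (6 - response), so the reversed total
    equals 6 * 12 - raw total and still ranges from 12 to 60.
    """
    total = raw_total(responses, BURDENS_ITEMS)
    return None if total is None else 6 * BURDENS_ITEMS - total


# A patient answering "not at all" (1) to every Burdens item.
print(raw_total([1] * 12, BURDENS_ITEMS))    # raw total
print(reverse_scored_burdens([1] * 12))      # reverse-scored total
```

Under these assumptions, a patient reporting no burden at all has the lowest raw Burdens total and the highest reverse-scored total.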
For validation purposes, the TSQM II was also included. This is an 11-item PRO instrument that assesses patient satisfaction with treatment. It includes four scales: two items that assess the effectiveness of treatment (TSQM II Effectiveness), three items that assess side-effects (TSQM II Side-effects), three items that assess convenience of treatment (TSQM II Convenience) and two items that assess global satisfaction (TSQM II Global) [15]. Patients are asked to rate their experiences of treatment between 'extremely dissatisfied' and 'extremely satisfied' on 5-point to 7-point scales. Higher TSQM II scores indicate higher satisfaction with treatment.

Data analysis
Psychometrics is a well-established scientific field that is concerned with the measurement of subjective judgements using numerical scales and the evaluation of the measurement properties of such scales (e.g. reliability, validity, responsiveness). The most widely used methods for evaluating measurement performance are known as 'traditional' psychometric methods [16]. Traditional psychometric methods form the basis for the recent FDA guidelines [1,2,5] that specify minimum criteria for the scientific adequacy of PRO instruments in clinical trials. The methods and criteria selected for evaluating the psychometric performance of the ACTS are grounded in current widely accepted guidelines [3,4,17-20], including the FDA guidance [5]. This methodology has been used extensively in previous research to develop and validate PRO instruments in other areas of medicine and surgery [21][22][23][24].
Based on data collected, the following psychometric properties of the ACTS were examined: acceptability (including data quality and targeting); scaling assumptions; internal consistency reliability and test-retest reproducibility; aspects of validity (including known groups and discriminant validity); and responsiveness. Table 1 summarises the psychometric methods and criteria used in this study to analyse and interpret results. Acceptability and reliability analyses were carried out on the separate language versions of the ACTS and the combined sample at baseline (N = 1336). Factor analysis and item convergent/discriminant validity analysis were conducted on the combined sample at baseline (N = 1336). Test-retest reproducibility and construct validity examinations comparing the TSQM II were conducted on ACTS data from a separate sub-sample of patients at 3 months (Burdens scale, n = 792; Benefits scale, n = 822). Responsiveness analysis was carried out on ACTS data from a separate sub-sample of patients who completed the ACTS at baseline and 3 months (Burdens scale, n = 1227; Benefits scale, n = 1257).

Sample
The EINSTEIN DVT ACTS validation dataset included 1336 patients (96% response rate) at day 15.

Psychometric properties: scale level by study/language version (Dutch, Italian, French, German, English)

Acceptability: data quality and targeting
For each language version, there was a low level of missing data for all item and scale scores for both the ACTS Burdens and ACTS Benefits (scale level <5%). This means that scale scores could be computed for >95% of patients. There was a reasonable distribution of ACTS Burdens scores (mean 60%, range 54-77%), and an excellent distribution of ACTS Benefits scores (mean 100%). Floor and ceiling effects were generally low for both scale scores (mean 5%, range 0-14%) and data skewness was slightly higher for the ACTS Burdens scores than for the ACTS Benefits scores (mean −1.02; Table 2).
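The targeting statistics reported here can be sketched as follows. This is an illustration only: the paper does not state which skewness estimator was used, so the moment-based (Fisher-Pearson) coefficient is assumed, and the function names and example scores are made up.

```python
from statistics import mean
from typing import Sequence, Tuple


def floor_ceiling(scores: Sequence[float], lowest: float, highest: float) -> Tuple[float, float]:
    """Percentage of respondents at the lowest and highest possible scale score."""
    n = len(scores)
    floor_pct = 100 * sum(s == lowest for s in scores) / n
    ceiling_pct = 100 * sum(s == highest for s in scores) / n
    return floor_pct, ceiling_pct


def skewness(scores: Sequence[float]) -> float:
    """Moment-based skewness (assumed estimator): m3 / m2 ** 1.5."""
    m = mean(scores)
    n = len(scores)
    m2 = sum((s - m) ** 2 for s in scores) / n
    m3 = sum((s - m) ** 3 for s in scores) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all scores are equal")
    return m3 / m2 ** 1.5


# Made-up raw ACTS Burdens totals (possible range 12-60).
example = [12, 12, 14, 15, 18, 22, 30, 41]
print(floor_ceiling(example, 12, 60))
print(skewness(example))  # compare against the -1 to +1 criterion
```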

Reliability: internal consistency, test-retest and homogeneity coefficients
Across EINSTEIN DVT datasets, Cronbach's alpha and test-retest intra-class correlations for both ACTS Burdens and ACTS Benefits scores were acceptable (>0.82), with the exception of the Dutch version (alpha = 0.79; test-retest = 0.72). The homogeneity coefficient mean ranged from 0.24 to 0.75 (Table 2).
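For readers less familiar with these coefficients, the sketch below computes Cronbach's alpha and the homogeneity coefficient (mean item-item correlation) from a respondents-by-items matrix using the standard formulas. The data are made up and the function names are hypothetical; the sketch is not the software used in this study.

```python
from itertools import combinations
from statistics import mean, variance
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(rows: List[Sequence[float]]) -> float:
    """alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)."""
    k = len(rows[0])
    items = list(zip(*rows))                      # one tuple per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def homogeneity_coefficient(rows: List[Sequence[float]]) -> float:
    """Mean of all pairwise item-item correlations."""
    items = list(zip(*rows))
    return mean(pearson(a, b) for a, b in combinations(items, 2))


# Made-up responses: five respondents (rows) by three items (columns).
data = [[1, 2, 1], [2, 2, 3], [3, 3, 3], [4, 5, 4], [5, 4, 5]]
print(round(cronbach_alpha(data), 2))
print(round(homogeneity_coefficient(data), 2))
```

The results would then be compared with the criteria in Table 1 (alpha ≥0.80; homogeneity coefficient ≥0.30).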
Psychometric properties: item and scale level by combined language versions (EINSTEIN DVT pooled language datasets)

Acceptability: data quality and targeting
There were minimal missing data for all item and scale scores (<4%). Therefore, scale scores could be computed for >96% of patients, which was slightly higher than the individual country analysis owing to the effect of pooling the datasets. At the scale level, there was a good distribution of ACTS Burdens scores (77%) and an excellent distribution of ACTS Benefits scores (100%). Scale-level floor and ceiling effects were generally low for both scale scores (range 0-11%). Data skewness was slightly higher for the ACTS Burdens scale scores than for the ACTS Benefits scale scores (sk = −1.08 and sk = −0.80, respectively). At the item level, the ACTS Burdens scale ceiling effects ranged from 37% to 77%. In both datasets, in relation to aggregate endorsement frequencies, for all items, three of five response categories met the >10% criterion, but two of five response categories were <10% (between response categories 4 and 5) and 10 of 12 items fell outside the skewness criterion (−1, +1). The ACTS Benefits scale had much lower ceiling effects, which ranged from 15% to 19%. In relation to aggregate endorsement frequencies, for all items, three of five response categories met the >10% criterion, but two of five response categories were <10% (between response categories 4 and 5) and all items passed the skewness criterion (Table 3).

Table 1 Summary of psychometric methods

Psychometric property
Definition/criteria for acceptability

Acceptability
Assessed by data quality and targeting. Data quality refers to the completeness of item- and scale-level data. Assessed by completeness of data; criterion for missing data <10% [20]. Targeting is the extent to which the range of the variable measured by a scale matches the range of that variable in the study sample. Assessed by: maximum endorsement frequencies <80% [17], aggregate endorsement frequencies >10% [17], and skewness statistic −1 to +1 [35][36][37], proximity of scale mean score to scale midpoint (no fixed criterion but closer matches indicated better targeting) [38], and acceptable distribution of ACTS Burdens scores (no fixed criterion but closer to 100% indicates better targeting) [39]

Scaling assumptions
Tests of scaling assumptions assess the extent to which it is legitimate to sum a set of items, without weighting or standardisation, to produce a single total score. This criterion is satisfied when items have adequate corrected-item total correlations ≥0.30 [38,40] and the proposed grouping of items in each subscale is correct. Assessed by using two complementary approaches: principal components analysis (factor loadings >0.30, cross-loadings <0.20) and item convergent and discriminant validity (item own-scale correlations >0.30, magnitude >2 standard errors than other scales)

Reliability
Reliability is the extent to which scale scores are not associated with random error

Internal consistency reliability
The precision of the scale based on the homogeneity (intercorrelations) of items at a single point in time. Assessed using Cronbach's alpha ≥0.80 [41,42], mean item-item correlations (known as the homogeneity coefficient) ≥0.30 [37] and item-total correlations ≥0.30 [42]

Test-retest reproducibility
This is based on the agreement between people's scores at screening and baseline, and estimates the ability of components and scales to produce stable scores [34]. For adequate test-retest reproducibility, scale-level intraclass correlation coefficients ≥0.80 [40] and item-level intraclass correlation coefficients ≥0.50 [43] should be achieved

Psychometric properties: scaling assumptions
Item groupings in the ACTS Burdens and ACTS Benefits scales passed tests for scaling assumptions. Corrected-item total correlations for both scales ranged from 0.39 to 0.80, satisfying the recommended criteria (>0.30). This indicated that items in each scale measured a common underlying construct and contained a similar proportion of information. In addition, principal components analysis factor loadings (>0.48) and tests of item convergent/discriminant validity (>0.39) supported this finding, thus further indicating that all items in each of the scales passed the criteria (Tables 3, 4 and 5).
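The corrected item-total correlation used in these tests correlates each item with the sum of the remaining items in its scale, so that the item does not inflate its own correlation. A minimal sketch with made-up data and hypothetical function names:

```python
from statistics import mean
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(rows: List[Sequence[float]]) -> List[float]:
    """Correlate each item with the total of all other items in the scale."""
    n_items = len(rows[0])
    result = []
    for i in range(n_items):
        item = [r[i] for r in rows]
        rest = [sum(r) - r[i] for r in rows]   # total score excluding item i
        result.append(pearson(item, rest))
    return result


# Made-up responses: five respondents (rows) by three items (columns).
data = [[1, 2, 1], [2, 2, 3], [3, 3, 3], [4, 5, 4], [5, 4, 5]]
for index, r in enumerate(corrected_item_total(data), start=1):
    print(f"item {index}: r = {r:.2f}, passes 0.30 criterion: {r >= 0.30}")
```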

Psychometric properties: validity
Overall, the correlations with the four TSQM II scale scores were consistent with predictions (4/4 correlations meeting predictions; Table 6). Known groups validity was supported for both the ACTS Burdens and ACTS Benefits scale scores on the global items (p < 0.0001; further information available from the authors). Discriminant validity correlations suggest no bias by age or sex (r < −0.16).

Psychometric properties: responsiveness
The pattern of mean scores over time suggested a trend to higher scores in the ACTS Burdens and ACTS Benefits scales over the six time points assessed (day 15, 1 month, 2 months, 3 months, 6 months, and 12 months). Responsiveness statistics comparing day 15 scores with all the other time points individually supported a trend of increasingly higher ACTS Burdens and Benefits scales scores over time, with low but increasing effect size statistics (range −0.14 to −0.37 and −0.03 to −0.33, respectively) (Table 7).
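The exact effect size formula and sign convention are not given in this section, so the sketch below should be read as an assumption: it shows two commonly used responsiveness statistics, the effect size (mean change divided by the standard deviation at baseline) and the standardised response mean (mean change divided by the standard deviation of change). Data and function names are made up.

```python
from statistics import mean, stdev
from typing import Sequence


def effect_size(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Mean change divided by the baseline standard deviation (assumed definition)."""
    return (mean(follow_up) - mean(baseline)) / stdev(baseline)


def standardised_response_mean(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Mean paired change divided by the standard deviation of the changes."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Made-up paired scale scores for four patients at day 15 and at a later visit.
day_15 = [10, 12, 14, 16]
later = [11, 14, 15, 18]
print(round(effect_size(day_15, later), 2))
print(round(standardised_response_mean(day_15, later), 2))
```

The sign of either statistic depends on the order of subtraction and on whether raw or reverse-scored totals are used.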

Discussion
Current PRO instrument guidelines [3][4][5] make it increasingly important for clinical researchers to understand the science behind the instruments used to try to capture the patient perspective. In this study, both the ACTS Burdens and ACTS Benefits scales satisfied traditional psychometric criteria for data quality, scaling assumptions, targeting, reliability, validity and responsiveness. In fact, the psychometric properties of the ACTS were found to be remarkably stable across different cultural groups, supporting pooling of data. This study, together with previous work on conceptual model development [10,11], provides an initial evidence base for its use in clinical trials and other settings (e.g. post-market surveillance, clinical research and in practice), in line with the current FDA guidelines (Table 8). The ACTS can be used to evaluate and compare different therapies in patients with DVT [25]; it is acceptable to patients, and has a simple checklist format that can be completed easily and quickly. Importantly, the ACTS measures aspects of treatment satisfaction, treatment adherence, relevance (e.g. burdens surrounding treatment regimens, impact on daily activities, and the possibility of bruising and bleeding) and important positive outcomes to patients (e.g. benefits surrounding assurance and confidence in treatment) [11]. Overall, the 12-item ACTS Burdens and 3-item ACTS Benefits scales met the psychometric criteria evaluated at both item and scale levels. Item-level targeting was adequate given the nature of the target construct (i.e. a scale that taps into aspects of treatment satisfaction would be expected to result in a degree of skew to the positive in score distributions) [26]. Scaling assumptions were also broadly supported, as were criteria for internal consistency reliability at item and scale level and scale-level test-retest reproducibility. Validity was also supported by assessments of discriminant validity and known-groups comparisons. Finally, responsiveness analyses supported increasing improvement over time in both treatment satisfaction scores.
Looking forward, three areas require further consideration: construct validity, response options and further exploration of responsiveness. First, the hypothesised correlations between the ACTS Burdens scale and the four scales of the TSQM II were supported in the pooled dataset, but the correlations were slightly lower than expected. This may reflect the fact that, although both measures focus on treatment satisfaction, the constructs captured by the ACTS and TSQM II are more distinct than would be first expected. On closer inspection, the TSQM II items that capture 'Effectiveness', 'Side-effects', 'Convenience' and 'Global satisfaction' are significantly different from the ACTS Burdens and Benefits items. Thus, despite some overlap between the two instruments, there are key differences. For example, one of the two TSQM II Effectiveness items addresses symptom alleviation, which is not directly affected by anticoagulation or measured by the ACTS. Furthermore, the TSQM II Side-effects items do not directly address the important anticoagulation-specific side-effects of bleeding or bruising, which are measured in ACTS Burdens items. In addition, the TSQM II is more narrowly targeted at the medicine, whereas the ACTS is more inclusive of the services and difficulties involved in undergoing anticoagulation therapy.
Thus, the findings from the analyses should be interpreted with these facts in mind.
The second issue that requires further exploration is that, across all language versions, findings from item-level tests of aggregate endorsement frequencies and skewness suggested that response scales may be improved by reducing the number of response options from five to four. This is also reflected in the findings from the scale-level targeting, which revealed slightly skewed distributions across the board. One potential cause for this is that there may simply have been too many response options for respondents to discriminate between, especially at the more satisfied extreme of the choices. This is not uncommon in PRO instruments [27] and it has been found previously that four categories work better than five [28]. Another possibility is that the response category labelling is problematic. The present findings uncovered a consistent issue in the way in which patients responded in the 'not at all' and 'a little' categories. Therefore, a reconsideration of wording in these response categories may also help to improve measurement performance. However, given that the current validation is limited to the five-option response scale, this is a matter for consideration in future development of the ACTS.
Third, the responsiveness analyses, which can be considered to be preliminary, revealed a modest but stepwise increase in ACTS Burdens and Benefits scale scores over the six time points. The associated responsiveness statistics were moderate but were in the range that would be expected clinically. This is because scale responsiveness and treatment effectiveness are inseparably linked [29,30]. The effect sizes computed on ACTS Burdens and Benefits scale scores from day 15 to all other time points are indicators of the ability of these scales to detect change. However, these are also an indicator of the size of the treatment effect. To put the present findings in context, it may be useful to consider the effect sizes of other interventions.
For example, effect sizes associated with hip arthroplasty have been shown to be very large (3.1) [31]. This would be expected given the dramatic impact of this surgical intervention on pain symptomatology. By contrast, the effect of carpal tunnel repair on grip strength is small (0.2) [32]. A degree of improved treatment satisfaction associated with anticoagulant treatment would be expected to occur over time, but would not be expected to be as marked as that seen with intensive interventions. However, the clinical meaning of the ACTS Burdens and Benefits scale change scores and specification of what constitutes an important difference based on these scores are matters for consideration in future development of the ACTS. Given some of the potential limitations of traditional responsiveness statistics [33], further evaluations would be desirable using more sophisticated modern rating scale analysis techniques [34] to further delineate the specific ability of the ACTS Burdens and Benefits scales to detect differences between patients and clinically meaningful change within patients.
Our study has two key limitations. First, although the scope of our psychometric evaluation of the ACTS in patients being treated for acute DVT was relatively comprehensive, there are further analyses that would aid our understanding of the measurement performance of the ACTS Burdens and ACTS Benefits scales. These would include further known groups and discriminant validity tests based on clinically sensible sub-grouping (based on predefined hypothesis-driven selection) and responsiveness analyses assessed against a priori clinically anchored hypotheses. The second limitation is that the

Responder definition
Used to identify responders in clinical trials for analysing differences in the proportion of responders between treatment arms. Change in score that would be clear evidence that an individual patient experienced a treatment benefit. Can be based on experience with the measure using a distribution-based approach, a clinical or non-clinical anchor, an empirical rule, or a combination of approaches

remembering to take your medicine at a certain time, taking the correct dose of your medicine, following a diet, limiting alcohol, etc.).
7 How much of a hassle (inconvenience) are the occasional aspects of anti-clot treatment? (e.g. the need for blood tests, going to or contacting the clinic/doctor, making arrangements for treatment while travelling, etc.).
Now I want to ask you about daily and occasional aspects of your anticoagulation therapy during the past 4 weeks
8 How difficult is it to follow your anti-clot treatment?
9 How time-consuming is your anti-clot treatment?
10 How much do you worry about your anti-clot treatment?
11 How frustrating is your anti-clot treatment?
12 How much of a burden is your anti-clot treatment?
13 Overall, how much of a negative impact has your anti-clot treatment had on your life?
14 How confident are you that your anti-clot treatment will protect your health? (e.g. prevent blood clots, stroke, heart attack, DVT, embolism)
15 How reassured do you feel because of your anti-clot treatment?
16 How satisfied are you with your anti-clot treatment?
17 Overall, how much of a positive impact has your anti-clot treatment had on your life?

Competing interests
LB is an employee of Bayer Pharma AG. SC, DL and SS were supported in part through a grant from Bayer Pharma AG.

Authors' contributions
SC conducted, analysed and interpreted the data and wrote the manuscript. DL was involved in guiding the study, including design and analysis of data. LB and SS were involved in providing additional input and reviewing drafts of this manuscript. All authors read and approved the final manuscript.

Copyright
Copyright of the ACTS instrument is held by Bayer AG, Germany (2006). All rights reserved. For information on or permission to use, please contact Mapi Research Trust; http://www.mapi-trust.org.