Preliminary Qualitative Study
As part of a preliminary qualitative study, cognitive interviews were conducted with 22 GAD patients (77.3% female), ranging in age from 21 to 59, to better understand the interpretation of the GA-VAS and the GA-VAS response process from the patient perspective. Patients were asked to think aloud while completing the GA-VAS so that the interviewer could hear how it was interpreted and how a response was selected.
Psychometric Study Design
After cognitive testing, the GA-VAS was included in a clinical trial assessing two approved pharmaceutical treatments for anxiety. Analyses were aimed at providing evidence of the reliability, responsiveness, validity, and utility of the GA-VAS.
Data were collected during a randomized, 4-week, double-blind, multi-center, fixed-dose, placebo-controlled, parallel-group clinical study conducted in the United States. Lorazepam was selected as a fast-acting benzodiazepine, and paroxetine, a selective serotonin reuptake inhibitor, was chosen as a slower-acting GAD pharmacotherapy. There were three treatment arms—lorazepam (1.5 mg TID), paroxetine (20 mg QD), and placebo—and three phases to the study: (1) a 1-week screening phase (Days -7 to -1) during which eligibility was determined; (2) a 4-week double-blind treatment phase (Day 1 or baseline through Week 4); and (3) a 5-day double-blind treatment phase (Week 5) during which therapy was down-titrated. Patients completed the GA-VAS during six clinic visits (screening, baseline, Weeks 1, 2, 4, and 5) and at home each night during the screening week and first week of treatment.
Participants
Otherwise healthy individuals aged 18 to 65 who had a primary diagnosis of GAD, as determined by a structured clinical interview, and a HAM-A total score ≥ 20 were eligible for inclusion. To ensure prominence of anxiety symptoms over depression symptoms, patients were required to have a Covi Anxiety Scale [19] score ≥ 9 and a Raskin Depression Scale [20] score ≤ 7. These psychiatric rating scales have long been used in clinical trials and have been shown to be valid tools for differentiating anxious and depressed patient subgroups [21, 22]. Subjects were excluded from study participation if they had significant suicidal risk, had failed treatment with lorazepam or paroxetine in the past, had required daily benzodiazepine use in the 3 months prior to study participation, or had most other concurrent DSM-IV mental disorders, including major depressive disorder, panic disorder with or without agoraphobia, acute stress disorder, obsessive compulsive disorder, dissociative disorder, posttraumatic stress disorder, social anxiety disorder, anorexia, bulimia, caffeine-induced anxiety disorder, alcohol or substance abuse or dependence, premenstrual dysphoric disorder, or antisocial or borderline personality disorder. Subjects with current or past diagnoses of schizophrenia, psychotic disorders, delirium, dementia, amnestic disorders, clinically significant cognitive disorders, bipolar or schizoaffective disorder, benzodiazepine abuse or dependence, or factitious disorder were also excluded. Patients were not permitted to use any psychotropic medications and could not have initiated any psychodynamic or behavioral psychotherapy for anxiety within the 3 months prior to the study.
Instruments
General Anxiety - Visual Analog Scale
The 100 mm GA-VAS, shown in Figure 1 (not to scale), was administered at all clinic visits and at home in a daily diary format. The distance from the left edge of the line to the mark placed by the patient is measured to the nearest millimeter and used in analyses as the patient GA-VAS score.
A number of additional measures were included in the present psychometric evaluation study to help assess the construct validity of the GA-VAS. Both the HAM-A [13] and the Hospital Anxiety and Depression Scale (HADS) [23, 24] were completed during clinic visits. The HAM-A is a clinician-reported measure of 14 items assessing both psychic or cognitive (anxious mood, fears, intellectual impairment, etc.) and somatic or physical symptoms of anxiety (muscular complaints, cardiovascular symptoms, gastrointestinal symptoms, etc.) on a 5-point severity scale (0 = "Not present" to 4 = "Very severe"). The HADS is a 14-item self-report measure designed to screen for mood disorders in medically ill patients. Seven HADS items assess anxiety and seven assess depression on a 0-to-3 response scale; anxiety and depression are scored separately. Like the GA-VAS, higher scores on the HAM-A and HADS reflect greater severity.
Two self-report instruments gathered generic information about patient quality-of-life: the 36-item Medical Outcomes Study Short Form - 36 (SF-36) [25, 26] and the 14-item General Activity subscale of the Quality of Life Enjoyment and Satisfaction Questionnaire (QLES-Q) [27]. For each item of the QLES-Q, the respondent uses a 5-point scale ranging from 1 = "Very poor" satisfaction to 5 = "Very good" satisfaction; higher scores indicate greater quality-of-life and satisfaction. The SF-36 assesses eight dimensions of health-related functioning and quality-of-life: Physical Functioning, Physical Role, Bodily Pain, Social Functioning, General Mental Health, Emotional Role, Vitality, and General Health Perceptions. Each subscale is scored from 0 to 100, with higher scores indicating better functioning and quality-of-life.
The Clinician Global Impression of Severity (CGIS) [28] is a single-item rating that asks the clinician to evaluate the severity of the patient's GAD symptoms on a 7-point scale (1 = "Not at all ill" to 7 = "Among the most extremely ill patients"): "Considering your total clinical experience, how severe are the patient's symptoms now, compared to your experience with other patients with the same diagnosis?" The Clinician Global Impression of Change (CGIC) and Patient Global Impression of Change (PGIC) are two additional items that address change in the severity of a patient's illness over a particular time interval, in the present context "since the start of the study." The CGIC, like the CGIS, is completed by the clinician, whereas the PGIC is patient-reported. Both items employ a 7-point response scale (1 = "Very Much Improved" to 4 = "No Change" to 7 = "Very Much Worse").
Statistical Methods
Reliability
At-home test-retest reliabilities were computed using stable patients, defined as those whose HAM-A change score from screening to baseline (randomization) was 1 point or less. Data from Day -6 were used as the initial or "test" administration and Day -5 as the "retest" administration; reliabilities were also calculated for Day -5 to -4, Day -4 to -3, Day -3 to -2, and Day -2 to -1. Intraclass correlation coefficients (ICCs) were computed using a two-way (subjects × time) random effects analysis of variance (ANOVA) model as recommended by Schuck [29] and Shrout and Fleiss [30].
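As an illustration of the reliability computation, the following minimal sketch derives an intraclass correlation from the mean squares of a two-way (subjects × time) ANOVA. It assumes the single-measure, absolute-agreement form, ICC(2,1) in the Shrout and Fleiss notation; the text does not state which ICC variant was reported. The function name and the sample scores are hypothetical and are not study data.

```python
def icc_2_1(scores):
    """ICC(2,1) from a two-way random effects ANOVA.

    scores: list of rows, one row per patient, one column per
    administration (e.g. [test, retest]), in millimeters.
    Assumption: single-measure, absolute-agreement ICC.
    """
    n = len(scores)          # number of patients (rows)
    k = len(scores[0])       # number of administrations (columns)
    grand = sum(sum(row) for row in scores) / (n * k)

    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for patients, time points, and residual error
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)                # between-patients mean square
    jms = ss_cols / (k - 1)                # between-times mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


if __name__ == "__main__":
    # Hypothetical Day -6 ("test") and Day -5 ("retest") GA-VAS scores
    example = [[62, 60], [45, 50], [80, 77], [30, 34], [55, 53]]
    print(round(icc_2_1(example), 3))
```

Each adjacent pair of days (Day -5 to -4, and so on) would be passed to the same function as a separate two-column table.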
Responsiveness
For utility in clinical trials, it is important that the GA-VAS be capable of detecting change over time, preferably at more than one time-point to understand the onset and durability of the effect. Guyatt's responsiveness statistic [31] is an effect size estimate recommended for use in the evaluation of responsiveness. We calculated Guyatt's statistics at Weeks 1, 2, and 4 in order to compare three different types of HAM-A responders to non-responders. Initial responders were defined as those patients who achieved ≥ 50% reduction in HAM-A scores at Week 1, regardless of their responder status at Weeks 2 and 4; partial responders were patients who achieved ≥ 30% reduction in HAM-A scores at Week 1 (again, regardless of responder status at Weeks 2 and 4); sustained responders were patients who achieved ≥ 30% reduction in HAM-A scores at Weeks 1 and 2, and ≥ 50% reduction in HAM-A scores at Week 4. It was anticipated that Week 1 responsiveness statistics comparing initial responders and non-responders would be greater than responsiveness statistics comparing partial responders and non-responders or sustained responders and non-responders, with the responsiveness statistics based on the latter two comparisons being very similar at Week 1. To the extent that GAD symptoms return at Weeks 2 and 4 in initial and partial responders, it was expected that those responsiveness statistics would become smaller in size. It was further expected that Guyatt's statistics involving sustained responders and non-responders would remain large over all three time-points.
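The three responder definitions can be expressed compactly in code. The sketch below is illustrative only: it assumes that percent reduction is computed relative to the Day 1 (baseline) HAM-A total score, and the function names and example scores are hypothetical.

```python
def percent_reduction(baseline, follow_up):
    """Percent reduction in HAM-A total score relative to baseline.

    Assumption: baseline is the Day 1 score; eligibility required
    HAM-A >= 20, so baseline is never zero in this setting.
    """
    return 100.0 * (baseline - follow_up) / baseline


def classify_responder(baseline, week1, week2, week4):
    """Return the three responder flags for one patient."""
    r1 = percent_reduction(baseline, week1)
    r2 = percent_reduction(baseline, week2)
    r4 = percent_reduction(baseline, week4)
    return {
        # >= 50% reduction at Week 1, regardless of later status
        "initial": r1 >= 50.0,
        # >= 30% reduction at Week 1, regardless of later status
        "partial": r1 >= 30.0,
        # >= 30% at Weeks 1 and 2, and >= 50% at Week 4
        "sustained": r1 >= 30.0 and r2 >= 30.0 and r4 >= 50.0,
    }


if __name__ == "__main__":
    # Hypothetical HAM-A totals at baseline and Weeks 1, 2, and 4
    print(classify_responder(24, 12, 14, 10))
```

Under these definitions the categories overlap: every initial responder also meets the partial responder criterion, so each comparison against non-responders is made separately.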
Computing change as the difference between Day 1 (baseline) and Week 1 (or Week 2 or Week 4), we calculated Guyatt's responsiveness statistics [31] for the three different responder definitions at three time-points:
The resulting value is a measure of the effect of treatment on GAD symptoms. Cohen [32] provides a general rule-of-thumb for the interpretation of such effect size estimates: effect sizes of about 0.20 represent small effects, those of about 0.50 represent moderate effects, and those greater than about 0.80 represent large effects.
It is also important to demonstrate that the GA-VAS is sensitive to differences between treatment groups. We computed Cohen's [32] effect size estimate at Weeks 1, 2, and 4 in order to compare each active treatment to placebo: (Mean_Treatment - Mean_Placebo) / SD_Pooled
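Both effect size estimates are simple to compute once change scores are in hand. The sketch below makes two assumptions that the text does not confirm: Guyatt's statistic is operationalized in one commonly used form, the mean change among responders divided by the standard deviation of change among non-responders, and the pooled standard deviation for Cohen's estimate weights each group's variance by its degrees of freedom. Function names and values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance


def guyatt_statistic(responder_changes, nonresponder_changes):
    """Assumed form: mean change in responders divided by the SD of
    change in non-responders (treated as the stable group)."""
    return mean(responder_changes) / stdev(nonresponder_changes)


def cohens_d(treatment, placebo):
    """(Mean_Treatment - Mean_Placebo) / SD_Pooled.

    Assumption: SD_Pooled is the square root of the
    degrees-of-freedom-weighted average of the two sample variances.
    """
    n_t, n_p = len(treatment), len(placebo)
    pooled_var = ((n_t - 1) * variance(treatment)
                  + (n_p - 1) * variance(placebo)) / (n_t + n_p - 2)
    return (mean(treatment) - mean(placebo)) / sqrt(pooled_var)


if __name__ == "__main__":
    # Hypothetical GA-VAS change scores (follow-up minus baseline, mm)
    responders = [-30, -20, -40]
    non_responders = [-5, 5, 0]
    print(guyatt_statistic(responders, non_responders))
    print(cohens_d([10, 20, 30], [0, 10, 20]))
```

The sign of each statistic depends on whether change is taken as follow-up minus baseline or the reverse; only the magnitude is compared against Cohen's 0.20, 0.50, and 0.80 benchmarks.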
Construct Validity
Construct validity describes the relationships among multiple indicators of a construct and the degree to which they follow predictable patterns [33]. Correlations between the GA-VAS and the HAM-A, HADS, QLES-Q, SF-36, and CGIS were computed using data collected during clinic visits at screening, baseline, and Weeks 1, 2, and 4. It was expected that the GA-VAS would correlate relatively highly with the other measures of anxiety—the HAM-A, HADS-Anxiety, and CGIS. As evidence for divergent validity, it was also anticipated that the GA-VAS would correlate more highly with the HADS-Anxiety score than with HADS-Depression and also more highly with the QLES-Q and the mental functioning subscales of the SF-36 (i.e., Emotional Role, Mental Health, Social Function, Vitality) compared to the SF-36 physical functioning subscales (i.e., Physical Function, Physical Role, Bodily Pain, General Health).
Minimum Important Differences (MIDs)
Another useful property of an outcome measure is the MID, the smallest change in a score from baseline that patients perceive as beneficial and that would be clinically significant. Several methods have been proposed to assess clinically meaningful change, for example, patient- and physician-based global judgments and statistical criteria. One relatively common approach is to examine the distribution of change scores on a measure in conjunction with patients' global ratings of change [34]. In the present analysis, both PGIC and CGIC data were used as anchors to produce MID estimates. A simple MID estimate is taken to be roughly equivalent to the mean GA-VAS change of patients who reported (PGIC) or were rated by the clinician (CGIC) as "Minimally Improved."
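The anchor-based estimate amounts to averaging GA-VAS change within one category of the global rating. The following sketch assumes that "Minimally Improved" is coded as 3 on the 7-point scale, which follows the conventional Clinical Global Impression coding but is not stated in the text; the function name and data are hypothetical.

```python
from statistics import mean

# Assumption: conventional CGI coding, where 3 = "Minimally Improved"
MINIMALLY_IMPROVED = 3


def anchor_based_mid(gavas_changes, global_ratings, anchor=MINIMALLY_IMPROVED):
    """Mean GA-VAS change among patients whose global impression of
    change (PGIC or CGIC) equals the anchor category."""
    selected = [change for change, rating in zip(gavas_changes, global_ratings)
                if rating == anchor]
    if not selected:
        raise ValueError("no patients in the anchor category")
    return mean(selected)


if __name__ == "__main__":
    # Hypothetical change scores (mm) paired with PGIC ratings
    changes = [-10, -14, -30, 0]
    ratings = [3, 3, 1, 4]
    print(anchor_based_mid(changes, ratings))
```

Running the function once with PGIC ratings and once with CGIC ratings gives the two anchor-based estimates.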
The standard error of measurement (SEM), as recommended by Wyrwich et al. [35], was also computed for the GA-VAS: SEM = SD × √(1 - r)
where SD is the standard deviation of the subscale score and r is the test-retest reliability estimate. This is a distribution-based MID estimate that also considers measurement precision and has been shown to be relatively stable across populations [36]. We also explored the use of a 0.5 standard deviation (half-SD) unit change in the GA-VAS [37] as a final estimate of MID.
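The two distribution-based criteria can be sketched as follows, assuming the conventional definition SEM = SD × √(1 - r) with the test-retest ICC used as r. The input values are illustrative and are not results from the study.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r), where r is the test-retest reliability."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * sqrt(1.0 - reliability)


def half_sd(sd):
    """Half-standard-deviation criterion for the MID."""
    return 0.5 * sd


if __name__ == "__main__":
    # Hypothetical GA-VAS standard deviation (mm) and reliability
    sd, r = 20.0, 0.75
    print(standard_error_of_measurement(sd, r))
    print(half_sd(sd))
```

The two criteria coincide when r = 0.75, since √(1 - 0.75) = 0.5; a higher reliability makes the SEM smaller than the half-SD value.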