Sensitivity as outcome measure of androgen replacement: the AMS scale

Background The capacity of the AMS scale as clinical utility and as outcome measure still needs validation. Methods An open post-marketing study was performed by office-based physicians in Germany in 2004. We analysed data of 1670 androgen-deficient males who were treated with testosterone gel. The AMS scale was applied prior to and after 3 months treatment. Results The improvement of complaints under treatment relative to the baseline score was 30.7% (total score), 27.3% (psychological domain), 30.5% (somatic domain), and 30.7% (sexual domain), respectively. Patients with little or no symptoms before therapy improved by 9%, those with mild complaints at entry by 24%, with moderate by 32%, and with severe symptoms by 39% – compared with the baseline score. We showed that the distribution of complaints of testosterone deficient men before therapy almost returned to norm values after 12 weeks of testosterone treatment. Age, BMI, and total testosterone level at baseline did not modify the positive effect of androgen therapy. We also demonstrated that the AMS results can predict the independent (physician's) opinion about the individual treatment effect. Both, sensitivity (correct prediction of a positive assessment by the physician) and specificity (correct prediction of a negative assessment by the physician) were over 70%, if about 22% improvement of the AMS total score was used as cut-off point. Conclusion The AMS scale showed a convincing ability to measure treatment effects on quality of life across the full range of severity of complaints. Effect modification by other variables at baseline was not observed. In addition, results of the scale can predict the subjective clinical expert opinion on the treatment efficiency.


Background
The Aging Males' Symptoms (AMS) scale was originally developed in Germany as a health-related quality of life scale (HRQoL) [1]. The scale was designed as self-administered scale (a) to assess symptoms of aging (independent from those which are disease-related) between groups of males under different conditions, (b) to evaluate the severity of symptoms over time, and (c) to measure changes pre-and post androgen therapy [1]. It was developed in response to the lack of fully standardized scales to measure the severity of aging symptoms and their impact on HRQoL in males, specifically [2,3]. It was recently demonstrated by a French research group that the AMS scale measures HRQoL similarly in younger (even 20-30 years old) and older persons [4]. Right from the beginning a possible screening potential of the AMS scale was controversially discussed. Therefore we compared the AMS scale with internationally well-known screening instruments for androgen deficiency in adult males (ADAM scale of Morley et al [5] and the Screener of Smith' et al [6]. We found that the AMS has obviously similar test characteristics as both screening instruments [7]. Later, the similarity of AMS and ADAM was confirmed in another study [8]. In addition, Kratzik et al [9] observed in a populationbased cross-sectional study in Vienna an impressive association between subscales of the AMS and free testosterone level when age and body mass index was multivariately taken into account. Recently, a Japanese research group under Itoh et al [10] and Soh et al [11] observed a correlation between the AMS scores and the testosterone level. A Polish research group found a similar but less clear result [12]. Other studies, however, could not find associations of the AMS scores with testosterone level [13,14].
Meanwhile, the AMS scale was internationally well accepted: it is now available in 21 languages [2,15,16], and can be down loaded from the internet http:// www.aging-males-symptom-scale.info.
The evaluation of the AMS scale is simple; the scheme has been published [4]. Norm values to compare with were determined [1,3]. Conventional psychometric requirements of test reliability and validity have been successfully achieved and published [17,18]. A point was reached to demonstrate the capacity of the scale to reliably measure the effect of androgen treatment or to predict the magnitude of the therapeutic effect subjectively perceived by the treating physician. To this end, many clinicians use the term "validity" and mean high utility for clinical work or research.
It was shown in a previous publication that the AMS scale meets the requirements of a clinical utility and outcomes sensitivity [7]. The focus of this paper however is to analyze if variables such as age, body mass index (BMI), severity of complaints, and testosterone level before treatment effect the outcome measured with the AMS scale.

Methods
An open post-marketing study was conducted together with office-based physicians in Germany in 2004. The study monitored the effect of androgen substitution on complaints as well as adverse reactions of a licensed testosterone gel product (Testogel JENAPHARM ® ) under routine conditions. The eligibility of male patients for androgen therapy was determined by the prescribing physician, i.e. following the recommendations of the International Society for The Study of the Aging Male (ISSAM) [19] for testosterone treatment in patients with androgen deficiency. No other inclusion or exclusion criteria were set up except the consent between the patient and his treating doctor. The needed sample size was not determined for this observational three months follow-up study.
The observation encompassed three visits which were documented in a short form: before treatment, after 4-6 weeks, and at the end of the observation period of 12 weeks. A short questionnaire was completed by the treating physician to characterize the patient at baseline, after 4-6 weeks androgen treatment, and at the end of the study (3 months). The physician subjectively assessed the treatment effect at each of the three visits, and listed also adverse reactions on a specific form.
We got for this methodological paper a database with the AMS scale completed before therapy and after 12 weeks, age, BMI and total testosterone (TT). However, only a subsample had TT values available.
The computerized data of this post-marketing study were analyzed with conventional statistics using the statistical package SAS 9.1 ® . n a number of men who had no missings in certain variables which were used as independent variables for analysis; n b proportion of the total of 1670 men who provided information on a certain parameter.

Results and discussion
Altogether, 1670 patients were available for analysis. A general description of the group analyzed in this paper is given in Table 1. The mean age was of 56.4 years, the mean BMI was 26.8, and the TT at baseline was 2.5 ng/ml. A great proportion of the participants had a medical history of one or more chronic diseases. In addition, alcohol consumption and smoking were quite frequent (see table  1).
The HRQoL improved after 12 weeks of testosterone-gel application as measured with the total score of the AMS scale. Relative to the scores at baseline, the total score, the scores for the psychological, somato-vegetative, and sexual scores improved 30.7%, 27.3%, 30.5%, and 30.7% compared with the baseline score, respectively. This is an almost identical relative improvement of the HRQoL as shown in an earlier study associated with injectable testosterone [7].
The higher severity of complaints at baseline the greater is the improvement as demonstrated for the AMS total score in Figure 1. This applies also for the three sub-scales (data not shown). This was expected but an important observation with impact on the methodological assessment (validity) of the AMS scale.
Compared with the distribution of complaints in the normal population (=norm values [1,3]), the markedly altered HRQoL in this androgen deficient males shifted towards the "norm" of the male population over 40 years after androgen treatment (Figure 2). Patients under 40 were excluded for this comparison (n = 137). It seems to be important to underscore: There was a positive effect Improvement of complaints under androgen therapy Figure 1 Improvement of complaints under androgen therapy. Difference between pre-and post-treatment AMS total score divided by pre-treatment score in percent (%). Stratification by four categories of severity of complaints at baseline. (taken numbers at face value) detectable even in males with mild or even little symptoms. Almost identical observations concerning improvement of complaints after androgen therapy were made for injectable testosterone in a previous study [17]. Thus, two independent observational studies (with the same methodology) found the same pattern, but there is still a need for confirmation in a randomised clinical trial.
To answer the question if the treatment-related improvement depends on age, BMI, and testosterone level at baseline, we run a stratified analysis (one-way analysis of variance). Table 2 shows the relative improvement after therapy for the AMS total score, and also for the three subscores. There is neither much difference in relative improvement among age, BMI or testosterone groups nor among subscales. All significant changes of HRQoL range around 30% improvement compared with baseline (before therapy). There is no clinically consistent and relevant impact on the magnitude of improvement of the three variables at baseline (age, BMI, TT) considering the size of the standard deviation (in brackets), although the Tuckey-Test showed a few significant effects of age and BMI but with contradictory direction of the trend of the effect, i.e. random findings due to multiple testing cannot be excluded. It cannot be excluded either that the study group is too homogeneous to find small effects.
Another question was to what extend the evaluation with the AMS scale could "predict" the opinion of the treating physician regarding "success" of the androgen therapy. The treating physician subjectively assessed the "success" without knowing the results of the AMS he had no access to. Sensitivity (correct prediction of a positive assessment

Percent (%) in population / patients aged 40 years and older
by the physician) and specificity (correct prediction of a negative assessment by the physician) are important characteristics for a test that intends to "diagnose" successful therapy -like AMS in this case (predictive validity).
As can be seen in Table 3, we plotted the sensitivity and specificity in a kind of ROC analysis against the degree of improvement of complaints found under androgen treatment (difference between the pre-and post-treatment score on AMS as percent of pre-treatment total score). More than 22% relative improvement of the total AMS score seems a suitable cut-off point for "diagnosing treatment success": both sensitivity and specificity were about 70% and thereby acceptably high. In other words, the AMS scale is able to assess the treatment success with sufficient good test characteristics, validated against the expert opinion of the treating physician. It could be an advantage of a "success diagnosis" with the AMS scale because it is directly based on patients' view, whereas the physician's assessment could be rather a varying mixture of theoretical expectation/experience and patient's report. Anyway, since the methodological characteristics of the AMS scale are pretty good it is worthwhile to get experience with this tool in clinical practice. It might be recommendable to apply a standardized "objective" scale like the AMS scale in clinical studies, and in addition -if necessary -the subjective, not standardizable judgment of a physician.
It seems important to underline that this paper has a sufficient basis to describe the validity of the scale, but there is no intention to discuss the efficacy of testosterone substitution. This would be the task of a double-blinded, placebo controlled trial. We could just demonstrate that the AMS as measure of health-related quality of life (HRQoL) is able to detect changes in quality of life following androgen substitution in androgen-deficient males -irrespective of the specific drug used as treatment or the specific underlying diagnosis in this observational follow up study. This confirms results of an earlier post-marketing study [7] that showed acceptably good validity.
The AMS scale can be applied in clinical trials both in young hypogonadel men as well as in late-onset hypogonadism according to French experience [4]: The scale measures the same phenomenon in different age groups, and no obvious differences exist in reference values. We recommend the use of the total score as outcome measure, but domain scores can be used as well. From our current experience we would conclude that the same cut-off point should be used as "treatment success" when plan-  ning a trial. However, we had no access to data from randomized clinical trials to really investigate this issue.

Conclusion
The AMS scale showed a convincing ability to measure treatment effects on quality of life across the full range of severity of complaints. Effect modification by other variables at baseline (age, BMI, and testosterone level) on the "treatment associated effect on AMS score" was not observed. In addition, results of the scale can predict the subjective clinical expert opinion on the treatment efficiency.