The MRS scale was developed (a) to assess symptoms of aging/menopause (independent of those that are disease-related) or HRQoL between groups of women under different conditions, (b) to evaluate the severity of symptoms over time, and (c) to measure changes before and after hormone replacement therapy. The aim of this paper was to demonstrate empirically that the last of these claims holds.
Reliability and validity are important to show the usefulness of the scale as a clinical tool for monitoring treatment effects, once all other methodological requirements have been successfully demonstrated. Reliability measures (internal consistency and test-retest stability) were found to be good across countries. Regarding validity, it was shown that the internal structure of the MRS was sufficiently similar across countries to conclude that the scale measures the same phenomenon.
The comparison with another scale for aging women (Kupperman), although not a validated HRQoL scale, showed sufficiently good correlations of the total score, which is compatible with good criterion-oriented validity. The same is true for the comparison with the generic quality-of-life scale SF-36, where high correlation coefficients have also been shown [3–5]. Another point in favor of the scale is that it has been translated into 10 languages so far [7–9].
With the above-mentioned psychometric data available, the point was reached to critically evaluate the capacity of the scale to reliably measure health-related effects of hormone treatment independent of the severity of complaints and, in addition, to compare treatment effects measured by the MRS scale with the subjective assessment by the treating physician. In this context, many clinicians use the term "validity" to mean high utility for clinical work or research.
The only hormone treatment study with the MRS scale as outcome measure in women during the menopausal transition for which we could obtain data for methodological analysis was the post-marketing study described above. We hope to repeat and confirm this analysis with data from a more stringently designed clinical trial. But even on the basis of a methodologically weak dataset, and in the absence of other data, we obtained reassuring methodological information about the MRS scale.
It is well established that women with menopausal complaints respond to hormone therapy with a marked improvement in HRQoL. This is what the MRS scale should be able to detect.
We saw that the elevated mean MRS total score at baseline (before treatment) decreased markedly after 6 months of treatment, indicating a significant improvement of complaints and HRQoL. This was also the case for the mean scores of the three subscales. These data cannot disentangle the effect of treatment from the "natural variation" of complaints over time. This, however, was not the point: it was not the intention of this paper to evaluate the effectiveness of hormone therapy in an uncontrolled post-marketing study.
The absolute improvement of symptoms during treatment was 9.3 points of the MRS total score on average. This is equivalent to 36% of the baseline score, and the relative improvement was similar for all three subscales. In other words, the MRS scale was shown to be successful in detecting treatment effects. The impressive magnitude of the therapy-related improvement of HRQoL should obviously be discussed in the context of the selection, by the participating gynaecologists, of women with complaints susceptible to this kind of treatment. Another critical remark is that we cannot comment on the extent to which the MRS scale measures true or placebo treatment effects. But this is more a question of efficacy, and the study cannot draw any conclusions in this regard because of its design.
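The relation between absolute and relative improvement can be made concrete with a minimal Python sketch. The patient scores below are invented for illustration and are not data from this study; the function name `relative_improvement` is likewise an assumption.

```python
from statistics import fmean


def relative_improvement(baseline, follow_up):
    """Return (absolute, relative) mean improvement for paired scores.

    Lower MRS scores mean fewer complaints, so improvement is the mean
    baseline score minus the mean follow-up score. The relative
    improvement is that difference as a fraction of the mean baseline.
    """
    if len(baseline) != len(follow_up):
        raise ValueError("paired scores must have equal length")
    base_mean = fmean(baseline)
    if base_mean == 0:
        raise ValueError("mean baseline score is zero")
    absolute = base_mean - fmean(follow_up)
    return absolute, absolute / base_mean


# Hypothetical MRS total scores for five patients (NOT study data).
baseline_scores = [20, 12, 30, 8, 25]
six_month_scores = [12, 9, 18, 7, 14]

abs_change, rel_change = relative_improvement(baseline_scores, six_month_scores)
print(f"absolute: {abs_change:.1f} points, relative: {rel_change:.0%}")
```

Reporting both figures is useful because the same absolute change corresponds to very different relative changes depending on the baseline level.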
To answer the question of whether the MRS scale is sensitive enough to detect treatment-related changes even in women with only little or mild symptoms as compared with severe ones, the analysis was stratified by severity of symptoms at baseline. An improvement of complaints/QoL was seen to an increasing degree in patients with little, mild, moderate and severe symptoms at baseline. The relative improvement increased with the degree of severity of symptoms at baseline, which is consistent with the general expectation. It seems important to underscore that the MRS scale appears to detect a positive treatment effect also in women with few complaints, although to a lesser degree.
Moreover, we showed the capacity of the MRS scale to determine therapeutic efficiency with another approach: a face-value comparison with norm values of the population [2, 3]. Before therapy, the level of complaints in patients was more severe than in the population (higher MRS total score). After 6 months of hormone treatment, the frequency distribution of patients by severity of complaints returned towards a distribution similar to that observed in the general population. The very high proportion of patients with no/little complaints after therapy should again be seen in the context of apparent patient selection (patients were treated not only because of their symptoms but also for other indications such as prevention) and/or effects of the interaction of patients with the treating physician (who also administered the MRS). Thus, we cannot exclude that such biases have inflated the impression of a "too positive therapy efficiency". But we do not intend to draw conclusions about therapeutic efficiency anyway; this is simply another way to look at therapeutic efficiency with the assistance of the MRS scale. Although this indicates at least that comparisons with norm values could be helpful for interpreting results of intervention studies, we do not recommend formal statistical testing of differences between patient groups and the reference values of the population: patients are usually too different from the general population, and this difference is hard to adjust for. It is just a visual comparison (as in Table 3) to get a crude idea for the interpretation of results.
The MRS scale was also tested for whether it predicts the therapeutic assessment of the treating physician. At face value, the efficiency of hormone treatment as individually assessed by the treating gynaecologists was comparable with the assessment by the MRS scale when a simple dichotomization of the treatment effect into "successful" and "not successful" was used for both the subjective opinion of the physician and the result of the MRS scale: the sensitivity (correct prediction of a positive assessment by the physician) was 70.8% and the specificity (correct prediction of a negative assessment by the physician) was 73.5%. In other words, the MRS scale agrees well with the subjective assessment of the treatment effect by the physician. However, conclusions have to be drawn very carefully because of a possibly inherent bias that may have inflated the positive result: the subjective assessment of "success" by the treating physician was obviously not as independent of the assessment by the MRS scale as desirable, because the physician applied the scale to the patient. Even without being able to recall the result of the MRS six months earlier or to calculate and compare the total scores of both administrations, the interaction with the patients is likely to have introduced this bias in the direction of a higher agreement between both assessments.
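How sensitivity and specificity follow from two dichotomous ratings of the same patients can be sketched in a few lines of Python. This is an illustration under stated assumptions: the ratings below are invented and do not reproduce the study data, and the function name `sensitivity_specificity` is hypothetical. The physician's rating is treated as the reference, as in the definitions given above.

```python
def sensitivity_specificity(physician_success, mrs_success):
    """Compare two dichotomous ratings of the same patients.

    physician_success: reference rating (True = "successful").
    mrs_success: rating derived from the MRS result (True = "successful").
    Returns (sensitivity, specificity) of the MRS-based rating.
    """
    if len(physician_success) != len(mrs_success):
        raise ValueError("ratings must be paired")
    pairs = list(zip(physician_success, mrs_success))
    tp = sum(1 for ref, test in pairs if ref and test)          # both positive
    fn = sum(1 for ref, test in pairs if ref and not test)      # MRS misses success
    tn = sum(1 for ref, test in pairs if not ref and not test)  # both negative
    fp = sum(1 for ref, test in pairs if not ref and test)      # MRS overcalls
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference rating")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical ratings for nine patients (NOT study data).
physician = [True, True, True, True, True, False, False, False, False]
mrs = [True, True, True, True, False, False, False, False, True]

sens, spec = sensitivity_specificity(physician, mrs)
print(f"sensitivity: {sens:.1%}, specificity: {spec:.1%}")
```

Both measures depend on where the MRS change is cut into "successful" and "not successful", so a different threshold would trade one against the other.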
Although the result may be too positive compared with a blinded, truly independent assessment, it permits the working hypothesis that the MRS scale predicts the therapeutic effect sufficiently well. This needs to be confirmed with better data, i.e. data from a blinded, independent comparison using the currently used, self-administered MRS scale.
The aim of this exercise was only to demonstrate that the MRS scale may well predict the clinical opinion about the efficiency of hormone therapy, which had not been shown empirically before. We recommend the MRS as a standardized/validated "objective" scale for use in clinical studies, although some aspects discussed above need confirmation in a new study. Moreover, since the scale is already widely used internationally, it is important to make users aware of missing information or weak evidence.
The limitations of this study should be briefly summarized. First, this study used a dataset in which an earlier version of the MRS scale was applied, i.e. the scale was not self-administered but completed in an interview between physician and patient. This could have influenced the magnitude of the absolute scores of the total scale and the subscales. As far as pre-/post-treatment changes are concerned, the magnitude of the absolute changes may have been influenced more than the relative changes of the HRQoL assessment of the patients as discussed in this paper. A related problem is that we had to transform the old coding system into the new one. This was done with a simple linear transformation and is not likely to have introduced any bias. Another limitation is that, to our knowledge, this is the first study to assess in this way the validity of the scale for measuring the effect of a therapeutic intervention.
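A linear recoding of the kind mentioned above can be sketched as follows. The text does not state the old or new coding ranges, so the ranges in this Python example are placeholders chosen only for illustration, and the function name `rescale_linear` is an assumption.

```python
def rescale_linear(value, old_min, old_max, new_min, new_max):
    """Map a score linearly from [old_min, old_max] to [new_min, new_max]."""
    if old_max == old_min:
        raise ValueError("old range must have non-zero width")
    fraction = (value - old_min) / (old_max - old_min)
    return new_min + fraction * (new_max - new_min)


# Placeholder ranges (assumptions, NOT the actual MRS codings).
OLD_RANGE = (0, 10)
NEW_RANGE = (0, 4)

old_item_scores = [0, 5, 10]
new_item_scores = [rescale_linear(s, *OLD_RANGE, *NEW_RANGE) for s in old_item_scores]
print(new_item_scores)
```

Because both placeholder ranges start at zero, the mapping is a pure scaling, so ratios such as a relative improvement are unchanged; with a non-zero offset this would no longer hold.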
It is not likely that the main conclusions of the study are materially biased. However, the results should be used cautiously (e.g., for planning clinical trials or outcomes studies) until they are confirmed with data obtained with the currently used self-administered MRS scale, without potential influence of the physician. It can be assumed that a new study with the currently recommended MRS scale, used as a "patient-reported outcome", would demonstrate positive results but to a lesser degree.