The German version of the Expanded Prostate Cancer Index Composite (EPIC): translation, validation and minimal important difference estimation

Background No official German translation exists for the 50-item Expanded Prostate Cancer Index Composite (EPIC), and no minimal important difference (MID) has been established yet. The aim of the study was to translate and validate a German version of the EPIC with cultural adaptation to the different German-speaking countries and to establish the MID.
Methods We translated and culturally adapted the EPIC into German. For validation, we included a consecutive subsample of 92 patients with localized prostate cancer undergoing radical prostatectomy who participated in the Prostate Cancer Outcomes Cohort. Baseline and follow-up assessments took place before and six weeks after prostatectomy in 2010 and 2011. We assessed the EPIC, EORTC QLQ-PR25, Feeling Thermometer, SF-36 and a global rating of health state change variable. We calculated the internal consistency, test-retest reliability, construct validity, responsiveness and MID.
Results For most EPIC domains and subscales, our a priori defined criteria for reliability were fulfilled (internal consistency: Cronbach’s alpha 0.7–0.9; test-retest reliability: intraclass correlation coefficient ≥ 0.7). Cross-sectional and longitudinal correlations between EPIC and EORTC QLQ-PR25 domains ranged from 0.14–0.79; correlations between EPIC and the Feeling Thermometer and SF-36 ranged from 0.06–0.5 and 0.08–0.72, respectively. We established MID values of 10, 4, 12, and 6 for the urinary, bowel, sexual and hormonal domains, respectively.
Conclusion The German version of the EPIC is reliable, responsive and valid for measuring HRQL in prostate cancer patients and is now available in German. The suggested MIDs help to interpret the extent to which changes in HRQL are clinically relevant for patients. Hence, study results are of interest beyond German-speaking countries.
Electronic supplementary material The online version of this article (10.1186/s12955-018-0859-1) contains supplementary material, which is available to authorized users.


Background
Prostate cancer is one of the most prevalent cancers in men. Owing to increasing long-term survival rates [1], maintenance of health-related quality of life (HRQL) is crucial for prostate cancer patients and should be carefully taken into account when planning individual treatment strategies [2]. Data on the effects and side effects [3] of different treatment modalities on HRQL are of great value in guiding physicians' advice and patients' decisions on the choice of treatment. Therefore, patient-reported outcome measures are required that appropriately assess relevant aspects of HRQL and that have well-established psychometric properties, i.e. the measures should be reliable, valid and able to detect patient-important changes in HRQL over time (minimal important difference, MID). A well-established MID of an instrument is particularly important for clinicians and researchers because it shows whether a specific change score derived from assessments before and after an intervention actually reflects a difference that is relevant for the patient.
Several generic and disease-specific HRQL instruments have been introduced and recommended for prostate cancer patients so far [4]. One of the most established and frequently used instruments focussing on disease-specific aspects of prostate cancer and its therapies is the 50-item Expanded Prostate Cancer Index Composite (EPIC) [5]. The EPIC was developed from the UCLA-Prostate Cancer Index (UCLA-PCI) [6] by an expert panel. Originally developed in English in the U.S., the EPIC has been translated into and validated in several other languages such as Korean [7], Japanese [8], Brazilian Portuguese [9] and Spanish [10]. In addition, two short form versions have been introduced, a 26-item (EPIC-26) [11] and a 16-item (EPIC-CP) [12] version. Although short versions of questionnaires are generally useful in clinical practice, a loss of precision in the assessment is inevitable when using them compared to the extensive versions. Therefore, the original and extensive EPIC 50-item version is valuable whenever detailed assessment is required. Compared to the other frequently used instrument, the European Organization for the Research and Treatment of Cancer Quality of Life Questionnaire-Prostate 25 (EORTC QLQ-PR25) [13], the EPIC seems to provide a better balanced performance and to assess the various side effects in more depth, independent of the treatment modality finally chosen.
To our knowledge, no MID of the original EPIC 50-item version has been established so far, and, except for a translation of the EPIC-26 [14], no validated German version exists. Since over 100 million people worldwide are native German speakers, a German version of this important instrument is of great value. The aims of this study were to translate and validate the German version of the EPIC with cultural adaptation to the different German-speaking countries Germany, Austria and Switzerland, and to establish the MID for each domain of the instrument.

Methods
The unabridged version of the Material and methods section is presented in Additional file 2.

The Expanded Prostate Cancer Index Composite (EPIC)
The EPIC consists of 50 items with Likert-type response options contributing to the four domains Urinary, Bowel, Sexual and Hormonal. Each domain is divided into the two subscales Function and Bother, which assess symptom severity and the extent of symptom-related HRQL impairment, respectively. The urinary domain is additionally divided into the two distinct Incontinence and Irritation/Obstruction subscales. Domains and subscales are presented on 0-100 scales, with higher scores representing better HRQL.

Translation and cultural adaptation
The translation and cultural adaptation process was based on the recommendations of the ISPOR Task Force international expert group [15] and followed a sequential forward and backward translation approach (Fig. 1). Two professional translators translated the original English EPIC version into German. In a consensus meeting, five experts assessed the consistency of the translations, judged their face validity and agreed on a first German version. This first version was pretested in cognitive debriefings with five prostate cancer patients, who agreed on a second German version. A professional translator translated this version back into English, and the back-translation was then presented to the author of the original English version [5]. The second German version was then pretested in the three German-speaking countries (Austria, Germany and Switzerland), in each of them with thirty patients, to assess the need for cultural adaptation. Finally, the experts agreed on a third and final German version of the EPIC (Additional file 2).

Study population and study design
For the validation study, we recruited a subsample of patients who participated in the Prostate Cancer Outcomes Cohort (proCOC) [16], had a diagnosis of localized prostate cancer and underwent robotic radical prostatectomy at the Department of Urology of the University Hospital of Zurich between November 2008 and December 2010. The local Ethical Committee of the Canton of Zurich approved the study.
Baseline assessments took place after the diagnosis and before radical prostatectomy and included assessments of the EPIC and the validation instruments (internal consistency and cross-sectional construct validity). To assess test-retest reliability, a subgroup of participants completed the EPIC a second time one to two weeks later, before initiation of treatment. Follow-up assessments were conducted six weeks after treatment (longitudinal construct validity, responsiveness and MID).

Validation instruments
The European Organization for the Research and Treatment of Cancer Quality of Life Questionnaire-Prostate 25 (EORTC QLQ-PR25) [13] is a prostate-specific supplementary module to the general quality of life instrument EORTC QLQ-C30 [17]. It includes 25 items with a 4-point Likert scale that contribute to the sexual activity and sexual functioning scales (scores 0-100; higher scores = higher level of functioning) and to the urinary symptoms, bowel symptoms and hormonal treatment-related symptoms scales (scores 0-100; higher scores = higher level of problems) [13]. For the EORTC QLQ-C30, a MID of 5-10 has been established [17,18]; no MID has been specifically reported so far for the prostate module.
The Feeling Thermometer assesses generic health status on an analogue scale presented as a thermometer with 100 marked intervals (0 = dead to 100 = perfect health). A MID of 5 has been reported [19][20][21].
The SF-36 version 2.0 is a generic quality of life instrument that consists of 36 items describing 8 domains (scores 0-100; higher scores = better perceived state of health) [22,23]. MIDs between 7 and 12 have been suggested specifically for prostate cancer survivors [24].
At follow-up, the patients also rated the global change of their health state since baseline (after treatment) on a 5-point Likert scale ranging from −2 (my health state worsened much) to +2 (my health state improved much).

Statistical analysis
Fig. 1 Translation process. The process of forward and backward translation into German, including two rounds of pilot testing and cultural adaptation
Table 1 Mean scores of the EPIC summary domains and subscales and the validation instruments at baseline and at 6 weeks follow-up after robotic assisted radical prostatectomy and changes from baseline to follow-up
We assessed internal consistency of the EPIC scores by Cronbach's alpha (adequate internal consistency a priori defined: 0.7-0.9) and test-retest reliability by intraclass correlation coefficients (ICC; adequate reliability a priori defined: ≥ 0.7 [25]). To assess cross-sectional and longitudinal construct validity, we used Pearson or Spearman's rank correlation coefficients between the EPIC domain scores and the validation measures at baseline and at 6 weeks follow-up, or between the respective change scores. We a priori expected strong correlations (≥ 0.5) between the EPIC and the corresponding EORTC QLQ-PR25 scores and moderate correlations (0.3-0.5) between the EPIC and the Feeling Thermometer and selected SF-36 domain scores.
To quantify responsiveness, we assessed the standardised response mean (SRM), defined as the mean change score divided by the SD of the change score (a priori, we expected stronger effect sizes in the urinary/sexual than in the bowel/hormonal domains). We established the MID by triangulation and used both anchor-based approaches (using EPIC domain change scores against the anchors "worsened" and "remained the same" of the global rating of health state change variable) and distribution-based approaches (standard error of measurement [SEM], Cohen's effect size, empirical rule effect size) [26]. Analyses were performed using Stata version 13 [27].
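As an illustration of the internal consistency criterion described above, the following is a minimal sketch of Cronbach's alpha in Python using only the standard library. It is not the analysis code of this study (which was run in Stata); the function name `cronbach_alpha` and the response data are hypothetical and serve only to show the calculation.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete data.

    rows: list of respondents, each a list of k item scores (no missing values).
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of the summed score across respondents
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("summed scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses of five patients to four Likert-type items
responses = [
    [4, 3, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
alpha = cronbach_alpha(responses)
# A priori boundaries used in this study: 0.7 to 0.9
acceptable = 0.7 <= alpha <= 0.9
```

The sketch assumes complete item responses; handling of missing items would have to follow the scoring rules of the instrument.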
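The responsiveness and distribution-based quantities named above can be sketched as follows. The formulas follow the definitions given in this article (SRM as mean change divided by SD of change; SEM, Cohen's effect size and empirical rule effect size as stated in the table footnotes); the function names, dictionary keys and example numbers are hypothetical and chosen for illustration only.

```python
from math import sqrt
from statistics import mean, stdev


def standardised_response_mean(baseline, follow_up):
    """SRM = mean change score / SD of change score (paired scores)."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def distribution_based_estimates(sd_baseline, sd_change, icc):
    """Distribution-based MID estimates as defined in the table footnotes."""
    return {
        # SEM = SD at baseline * sqrt(1 - ICC)
        "sem": sd_baseline * sqrt(1 - icc),
        # Cohen's effect size = 0.5 * SD of change score
        "cohen": 0.5 * sd_change,
        # Empirical rule effect size = 0.08 * 6 * SD of change score
        "empirical_rule": 0.08 * 6 * sd_change,
        # 0.5 * SD at baseline
        "half_sd_baseline": 0.5 * sd_baseline,
    }


# Hypothetical domain scores (0-100) of four patients before and after surgery
baseline = [80, 90, 70, 85]
follow_up = [60, 80, 40, 75]
srm = standardised_response_mean(baseline, follow_up)

# Hypothetical summary statistics
estimates = distribution_based_estimates(sd_baseline=10.0, sd_change=20.0, icc=0.75)
```

A negative SRM indicates a decrease in scores from baseline to follow-up, as reported for all four domains in this study.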

Results
Ninety-two consecutive participants of the proCOC study with localized prostate cancer and a mean age of 62.3 ± 7.1 years participated in the validation study and completed baseline assessments (before treatment) and 6-week follow-up assessments (after treatment). Mean scores of the EPIC summary domains and subscales and the validation instruments at baseline and at 6 weeks follow-up after robotic assisted radical prostatectomy and changes from baseline to follow-up are presented in Table 1. A subsample of 44 participants of proCOC with a mean age of 62.5 ± 7.4 years completed the EPIC a second time before treatment to assess test-retest reliability, with an average of 10.9 ± 13.3 days between the assessments.

Internal consistency and test-retest reliability
Characteristics of the EPIC domain and subscale scores and the results on internal consistency and reproducibility are presented in Table 2. In general, urinary and sexual domain scores of the patients decreased to a greater extent from baseline to 6 weeks follow-up than did bowel and hormonal domain scores. For the majority of the domain and subscale scores, Cronbach's alpha values were within our a priori defined boundaries and test-retest reliability was above the threshold; ICCs ranged from 0.69 to 0.87 for the domain scores and from 0.43 to 0.92 for the subscales.

Construct validity
Tables 3 and 4 show correlation coefficients between the EPIC domain scores and the other validation instruments according to whether they fulfilled our a priori assumptions regarding strength of correlation. Cross-sectional correlations (Table 3) between the EPIC and the EORTC QLQ-PR25 domains ranged from 0.14 to 0.79. Correlations between the EPIC domains and the Feeling Thermometer and the SF-36 domains were weaker and ranged from 0.06 to 0.50 and from 0.08 to 0.72, respectively. In most cases, correlations were stronger at 6 weeks follow-up than at baseline. Longitudinal correlations (Table 4) between the change scores of the EPIC and EORTC QLQ-PR25 domains were all > 0.5. Change score correlations between the EPIC domains and the other validation instruments were weaker and ranged from 0.07 to 0.50.

Responsiveness to change and MID
The SRMs of the EPIC change scores were −1.51 for the urinary domain, −0.62 for the bowel domain, −1.91 for the sexual domain and −0.26 for the hormonal domain. Table 5 shows the mean changes in the EPIC domains according to the global ratings of health state change. Table 6 summarises the anchor- and distribution-based estimates. We established the MID by triangulation and suggest a MID of 10 for the urinary domain, 4 for the bowel domain, 12 for the sexual domain and 6 for the hormonal domain. Additional file 1: Tables S1 and S2 show the results for the EPIC subscales; in summary, we established MIDs between 9 and 12 for the urinary subscales, of 4 and 5 for the bowel subscales, of 11 and 13 for the sexual subscales, and of 6 and 7 for the hormonal subscales.
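The anchor-based estimate that enters the triangulation can be sketched as the difference in mean EPIC change score between patients who rated their health state as "worsened" and those who rated it as "remained the same". The sketch below is an assumption-laden illustration: the coding of "worsened" as −1 and "remained the same" as 0 on the global rating scale, the function name and the data are all hypothetical.

```python
from statistics import mean

# Assumed coding of the 5-point global rating: -1 = "worsened", 0 = "remained the same"
WORSENED = -1
SAME = 0


def anchor_based_difference(change_scores, global_ratings):
    """Absolute difference in mean change score between the two anchor groups."""
    worsened = [c for c, g in zip(change_scores, global_ratings) if g == WORSENED]
    same = [c for c, g in zip(change_scores, global_ratings) if g == SAME]
    if not worsened or not same:
        raise ValueError("both anchor groups must contain at least one patient")
    return abs(mean(same) - mean(worsened))


# Hypothetical change scores in one EPIC domain and the matching global ratings
change_scores = [-25, -30, -12, -10, -28, -14]
global_ratings = [-1, -1, 0, 0, -1, 0]
anchor_estimate = anchor_based_difference(change_scores, global_ratings)
```

In a triangulation, such an anchor-based value would be considered alongside the distribution-based estimates rather than being used on its own.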

Discussion
With this study we provide a culturally adapted and thoroughly validated German version of the widely used EPIC 50-item questionnaire. The German EPIC showed good internal consistency, reproducibility and construct validity and was responsive to changes in patients with localized prostate cancer after radical prostatectomy. We established an MID of 10 for the urinary, 4 for the bowel, 12 for the sexual and 6 for the hormonal EPIC domain.
Compared to the participants of the original American development study [5] and some translation studies [7,9], the patients from our sample achieved slightly higher but similar EPIC scores at baseline. The exception was the sexual domain and subscales, for which our patients scored much higher, indicating a better HRQL in sexual aspects before treatment compared to the other populations.
All of the domain scores and most of the subscale scores fulfilled the a priori defined thresholds for internal consistency and test-retest reliability. The few exceptions were the urinary function subscale, which showed insufficient reproducibility and, together with the bowel and hormonal function subscales, low internal consistency. Interestingly, the test-retest reliability and internal consistency values of the EPIC domains and subscales in our study were very similar to those presented in the original development and validation study [5].
The usually moderate to strong correlations between the EPIC and the corresponding EORTC QLQ-PR25 domains suggest good convergent construct validity and confirm that the domains of the two questionnaires reflect very similar constructs of HRQL in prostate cancer. The weaker correlations with selected and more generic SF-36 domains and the generic Feeling Thermometer were expected. Interestingly, the EPIC hormonal domain showed mostly the highest correlations with the more generic scales.
As expected, the urinary and sexual domains were much more responsive to the treatment than the bowel and hormonal domains. Radical prostatectomy predominantly affects sexual and urinary aspects of HRQL, especially soon after surgery. In contrast, other therapies such as radiotherapy would more likely have affected the bowel and hormonal components.
To our knowledge, we are the first to propose MIDs for the EPIC 50-item version. It is surprising that none had been established before, since there is growing awareness that outcome measures need to be able to detect clinically relevant changes when used for evaluative purposes. MIDs have already been suggested for the two EPIC short form versions. The EPIC-26 [11] uses the same scoring system as the EPIC 50-item version and retains its domain structure, except that the urinary domain is dropped in favour of the two urinary subscales, urinary incontinence and urinary irritation/obstruction. Our suggested MIDs were mainly in the range of the established MIDs for the EPIC-26 (6-9 for urinary incontinence, 5-7 for urinary irritation/obstruction, 4-6 for the bowel, 10-12 for the sexual and 4-6 for the hormonal domain) [28]. The scores of the 16-item EPIC-CP [12] domains range from 0 to 12 and, therefore, the MIDs are not comparable. However, for this 16-item version too, the estimated MID was highest for the sexual domain (1.6) and lowest for the vitality/hormonal domain (1.0) (MID for other [...]) [29].
Strengths of our study include the rigorous adherence to the international ISPOR guidance for the translation and validation of patient-reported outcomes. The cultural adaptation also took differences in mentalities between the German-speaking countries into consideration and resulted in an instrument applicable in all of them. Furthermore, the assessments took place in a prospective cohort study with a priori defined hypotheses regarding the results.
One limitation of the validation part of our study is that we included only patients who underwent radical prostatectomy and focused on short-term changes (6 weeks after treatment) to assess responsiveness and MID of the EPIC.
As already stated and expected, the HRQL aspects of the urinary and sexual domains are more affected by this kind of treatment than those of the bowel and hormonal domains, which might challenge the generalizability of our results to prostate cancer patients who undergo other treatments, such as external radiation or hormonal deprivation therapy, and experience other side effects. It would therefore be interesting to replicate the analyses in prostate cancer patients undergoing other treatments, particularly the assessment of the MID in these populations. Another limitation is that we used the anchors "worsened" and "remained the same" of the global rating of health state change variable as the anchor-based approach to establish the MIDs. This implies that we assume the difference between "worsened" and "remained the same" to be minimally important for patients, which we cannot be sure about. Unfortunately, using the EORTC QLQ-PR25 as an anchor resulted in somewhat implausible values, probably due to the transformation of both instruments to 0-100 scales and the reverse scaling of some counterpart domains. An additional limitation is that the sample size of 92 patients is rather small for the method we used to test cross-sectional and longitudinal construct validity, i.e. correlation coefficients. However, the consequence of a smaller sample size is not different correlation coefficients (which depend more on how the population is selected) but less precise estimates, i.e. the confidence intervals around the correlation coefficients are wider than those based on larger sample sizes.
Table footnotes: Anchor-based estimate: averaged difference in mean change score in EPIC domains between those who rated their health state as "worsened" and those who rated it as "remained the same". SEM = SD at baseline × square root(1 − intraclass correlation coefficient); Cohen's effect size = 0.5 × SD of change score; empirical rule effect size = 0.08 × 6 × SD of change score; 0.5 × SD at baseline.
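The point made in the limitations about sample size and the precision of correlation coefficients can be illustrated with an approximate 95% confidence interval based on the Fisher z-transformation. This is a sketch under the assumption of a Pearson correlation and approximately bivariate normal data; it is not part of the reported analyses, and the function name and the example correlation of 0.5 are hypothetical.

```python
from math import atanh, sqrt, tanh


def correlation_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson correlation (Fisher z)."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    z = atanh(r)              # Fisher z-transformation of r
    se = 1 / sqrt(n - 3)      # approximate standard error of z
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


# Same hypothetical correlation, sample size of this study versus a larger sample
lo_92, hi_92 = correlation_ci(0.5, 92)
lo_500, hi_500 = correlation_ci(0.5, 500)
width_92 = hi_92 - lo_92
width_500 = hi_500 - lo_500
```

For a correlation of 0.5, the interval at n = 92 spans roughly 0.33 to 0.64, whereas at n = 500 it is considerably narrower; the point estimate itself is unchanged.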