Assessment of the quality of end-of-life care: translation and validation of the German version of the “Care of the Dying Evaluation” (CODE-GER) - a questionnaire for bereaved relatives

Background International studies indicate deficits in end-of-life care that can lead to distress for patients and their next-of-kin. The aim of the study was to translate and validate the “Care of the Dying Evaluation” (CODE) into German (CODE-GER). Methods Translation according to EORTC (European Organisation for Research and Treatment of Cancer) guidelines was followed by data collection to evaluate psychometric properties of CODE-GER. Participants were next-of-kin of patients who had died an expected death in two hospitals. They were invited to participate at least eight, but not later than 16 weeks after the patient’s death. To calculate construct validity, the Palliative care Outcome Scale (POS) was assessed. Difficulty and perceived strain of answering the questionnaire were assessed by a numeric scale (0–10). Results Out of 1137 next-of-kin eligible, 317 completed the questionnaire (response rate: 27.9%). Data from 237 main sample participants, 38 interraters and 55 next-of-kin who participated for repeated measurement were analysed. Overall internal consistency, α = 0.86, interrater reliability, ICC (1) = 0.79, and retest-reliability, ICC (1, 2) = 0.85, were good. Convergent validity between POS and CODE-GER, r = −.46, was satisfactory. A principal component analysis with varimax rotation showed a 7-factor solution. Difficulty, M = 2.2; SD ± 2.4, and perceived strain, M = 4.1; SD ± 3.0, of completing the questionnaire were rather low. Conclusion The results from the present study confirm CODE-GER as a reliable and valid instrument to assess the quality of care of the dying person. Moreover, our study adds value to the original questionnaire by proposing a deepened analysis of obtained data. The development of seven subscales increases its potential for further surveys and research. Trial registration This study was registered retrospectively on the 25th of January 2018 at the German Clinical Trials Register (DRKS00013916).


Background
According to Cicely Saunders, the founder of modern palliative care (PC), physical, psychological, social and spiritual needs have to be considered when caring for dying patients and their families [1]. Despite the clear need for PC, not all dying patients can be treated on specialized wards due to limited access or space [2,3]. Therefore, it is of great importance to extend the principles of PC to all wards where people die.
It is equally important to assess the current state of quality of care (QOC) on these wards, and to identify unmet needs of patients and their next-of-kin. While the patients themselves are often unable to provide information about the perceived quality of their care, their next-of-kin can evaluate the last days of their loved ones [4]. They are not only providers of support to the patients, but also recipients of PC themselves [5]. Therefore, an instrument which assesses the care given to the patient, but also to their next-of-kin, is crucial to represent holistic care at the end of life. To the best of our knowledge, the only instrument that assesses a similar construct in German is the "Quality of Death and Dying" (QoDD), which was validated by some of this study's authors [6]. However, QoDD surveys the quality of death and not the quality of care given to the dying patient.
A suitable instrument for this purpose, the "Care of the Dying Evaluation" (CODE™), was developed by selecting key indicators from the rather long "Evaluation Care and Health Outcomes for the Dying" (ECHO-D) [7]. CODE™ is a self-assessment questionnaire which retrospectively evaluates the QOC in the last 2 days of a patient's life by surveying next-of-kin. Twenty-eight core items cover different aspects of QOC (care received from healthcare team, symptom control, communication with the healthcare team, emotional and spiritual support, circumstances surrounding death). Verbal anchors represent a 5-point (0-4), 4-point (0-3) or 3-point (0-2) Likert scale. The higher the value, the better the QOC [7]. Three key composite scales, which are represented by 12 of the 28 core items, survey "Environment", "Care" and "Communication". The items were initially assigned to the scales based on theoretical assumptions. Furthermore, CODE™ captures the overall impression concerning treatment with respect and dignity by doctors and nurses as well as support of relatives. Ten items assess demographic or disease-related information [7]. CODE™ has so far been validated for the United Kingdom [7]. Internal consistencies of the key composite scales were good (α = 0.79-0.89). Test-retest-reliability was moderate to good [7]. A recent systematic review on tools measuring quality of death, dying and care completed after death identified CODE™ as an instrument with promising psychometric properties, which would benefit from further development and validation [8].

Methods
The aim of this study was to provide a German version of CODE™ (CODE-GER) and to evaluate its psychometric properties.

Translation process, pretesting and questionnaire adaption
Between 01/2013 and 04/2013, CODE™ was translated forward and backward according to EORTC guidelines [9]. To assess content validity, 'think aloud' interviews and verbal probing took place with 15 next-of-kin of deceased patients at 2 PC units (Mainz: n = 7; Erlangen: n = 8). Results from this pilot testing were discussed by an expert panel with expertise in PC. No items were evaluated as inappropriate, confusing or embarrassing. The questionnaire itself was rated as useful. Adaptions to the wording and formal structure were made. The 28 core items on QOC were maintained without modification. One item (recommendability of ward) was added to the overall section (originally three items). The 28 core items on QOC as well as the overall impression questions are shown in Table 1. Three items (type of ward, nationality of caregiver, number of days on the ward where the patient had died) were added to the demographic section (10 items in the original English version). The items of the resulting questionnaire CODE-GER used in this study are shown in Table 1.
After completing the CODE-GER, participants were additionally asked to state the time they had needed to fill in the questionnaires and to answer one question each on the difficulty and the perceived strain of completing the questionnaire. They used a scale from 0 (very easy/no strain) to 10 (very hard/high strain). Although these questions have not yet been validated formally, they have been used in previous studies [10-12].

Study population and data collection
The study was conducted at the two German university hospitals of Mainz (MZ) and Erlangen (E) on the following types of ward: intensive care, palliative care, internal medicine and neurology. A minimum of 200 next-of-kin was planned to be included, as recommended for psychometric testing by Lienert and Raatz [13]. All consecutive patients who had died on these wards between 04/2016 and 03/2017 were included according to the following eligibility criteria: (a) ≥18 years old, (b) stay of ≥3 days on the ward where death occurred, and (c) expected death, i.e. the physician judged that the patient was soon to die and the cause of death was not sudden.
To identify eligible patients, databases of all deaths on the predefined wards were electronically screened for criteria (a) and (b). Next, the responsible physicians were contacted personally to check for criterion (c). Next-of-kin data of patients were extracted from the electronic hospital information system. If more than one next-of-kin was registered, all of them were contacted to ensure an interrater population. Eligible next-of-kin were informed about the study and invited to take part by post at least eight, but not later than 16 weeks after the death. Next-of-kin were defined as family, friends or legal guardian. Through a postcard that was sent to them, next-of-kin were able to inform the study team whether they wished to receive study information. If the corresponding box was ticked, a trained researcher phoned the next-of-kin to provide them with further information and to check for the following exclusion criteria: under 18 years old; insufficient German language skills; no contact with patient in the last 2 days of life.
Eligible next-of-kin were asked whether they felt emotionally stable enough to participate in the study. After consent was given verbally over the phone, the study documents (detailed study information, informed consent form, CODE-GER, Palliative care Outcome Scale (POS) and a prepaid envelope) were sent to participants (T1). Participants were asked to tick a box on the informed consent form to indicate if they would be willing to repeat the survey (T2). Those who agreed received a second study pack 8 weeks after the first documents were completed. To determine interrater reliability, this first next-of-kin group was asked to provide contact details for additional relatives present during the last 2 days of the patient's life. Additional next-of-kin underwent the same recruitment process as the first next-of-kin group, although they were called directly if phone numbers were provided.

Description of questionnaires
All participants were asked to complete CODE-GER. In addition, participants completed the Palliative care Outcome Scale (POS) for families. As no German instrument with equivalent content is available, the POS, which is comparable in content and available as a validated German version, was chosen as an approximate external criterion for assessing the convergent validity of CODE-GER. POS is a 12-item self-assessment instrument that surveys symptoms, concerns and psychosocial needs of patient and family in the past 3 days of the patient's life. Answers are scored on a 0-4 Likert scale. Scores of items 1-10 can be summed into a Total-Score (0-40). Higher scores are associated with higher distress [10].

CODE-GER items and Total-score
Verbal anchors provided in the answering possibilities represent a 5-point (0-4), 4-point (0-3) or 3-point (0-2) Likert scale. In order to establish a unified rating scale from "0" to "4", the following rules were defined: the highest possible answer, indicating high quality, was coded with "4", while "0" was assigned to the lowest. Middle categories were represented by "2" (for more details see Table 1). For further analysis values of single items were summed up according to their respective subscales. Next, these values were added up to form a Total-Score (0-104). A high Total-Score corresponds to high quality end-of-life care.
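The recoding rule described above can be sketched in a few lines of Python. This is an illustration only, not the procedure run in SPSS; the function names are hypothetical. Because the text specifies only the endpoints and the middle category, the sketch does not guess how 4-point items are mapped and instead expects that mapping to be supplied from Table 1.

```python
# Illustrative sketch only; names are hypothetical and not taken from the study.

def recode_to_unified(raw, n_points, four_point_map=None):
    """Map a raw Likert answer (0 .. n_points - 1) onto the unified 0-4 scale."""
    if not 0 <= raw < n_points:
        raise ValueError("raw answer outside the item's scale")
    if n_points == 5:
        return raw                      # 0-4 is already the target scale
    if n_points == 3:
        return {0: 0, 1: 2, 2: 4}[raw]  # lowest -> 0, middle -> 2, highest -> 4
    if n_points == 4:
        # Assumption: the exact 4-point coding is given in Table 1, not here.
        if four_point_map is None:
            raise ValueError("4-point mapping must be taken from Table 1")
        return four_point_map[raw]
    raise ValueError("unsupported scale length")


def total_score(recoded_items):
    """Sum recoded item values into the Total-Score."""
    return sum(recoded_items)
```

With every item recoded to 0-4, the stated range of 0-104 corresponds to 26 scored items (26 × 4 = 104), which matches the 26-item solution reported in the Results.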
Items with more than 50% of missing values across all questionnaires were excluded from further analysis [14]. To minimize the effect of imputation, a maximum of 15% missing items per questionnaire was tolerated; missing values of interval variables were imputed by Expectation Maximization [15]. Questionnaires with more than 15% missing values were excluded from further analysis. Missing values for dichotomous variables were imputed by the mode of the corresponding item to ensure conformity with the rating scale.
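The exclusion thresholds and the mode imputation can be illustrated as follows. This is a hedged sketch with hypothetical function names; `None` marks a missing answer, and the order in which the two exclusion rules are applied is an assumption, since the text does not state it. Expectation Maximization imputation for interval variables is not reproduced here.

```python
# Illustrative sketch; None marks a missing answer. Names are hypothetical.
from collections import Counter


def item_missing_rate(rows, item):
    """Share of questionnaires in which the given item is missing."""
    return sum(row[item] is None for row in rows) / len(rows)


def drop_sparse_items(rows, max_rate=0.50):
    """Remove items missing in more than max_rate of all questionnaires."""
    keep = [i for i in rows[0] if item_missing_rate(rows, i) <= max_rate]
    return [{i: row[i] for i in keep} for row in rows]


def drop_sparse_questionnaires(rows, max_rate=0.15):
    """Remove questionnaires with more than max_rate of their items missing."""
    return [
        row for row in rows
        if sum(v is None for v in row.values()) / len(row) <= max_rate
    ]


def impute_mode(rows, item):
    """Fill missing values of a dichotomous item with its most frequent answer."""
    observed = [row[item] for row in rows if row[item] is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [{**row, item: mode if row[item] is None else row[item]} for row in rows]
```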

Data analysis

Psychometric properties
Since only 12 of the 28 core items of the original CODE™ questionnaire have been examined by factor analytic methods so far, we analyzed the 28 core items on QOC with an exploratory factor analysis. In order to explain as much variance as possible in the data, we conducted a principal component analysis (PCA) with varimax rotation. To test whether our data were suitable for PCA, Bartlett's test and the Kaiser-Meyer-Olkin Measure of Sampling Adequacy were carried out. The number of factors was determined by the Kaiser-Guttman criterion (eigenvalues > 1), analysis of the scree plot and conceptual fit [16,17].
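To make the eigenvalue criterion concrete, the following sketch counts the components of a correlation matrix with eigenvalues above 1. It is an illustration under stated assumptions, not the SPSS routine used in the study: the eigenvalues are obtained with a plain Jacobi rotation scheme chosen only to keep the example dependency-free, the function names are hypothetical, and the 4 × 4 matrix is invented toy data.

```python
# Illustrative sketch of the eigenvalue > 1 rule; toy data, hypothetical names.
import math


def jacobi_eigenvalues(matrix, tol=1e-18, max_sweeps=50):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):      # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):      # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_count(correlation_matrix):
    """Number of components with eigenvalue > 1."""
    return sum(ev > 1 for ev in jacobi_eigenvalues(correlation_matrix))


# Toy correlation matrix with two clearly separated item pairs.
toy = [
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]
n_components = kaiser_count(toy)
```

The eigenvalues of a correlation matrix sum to the number of items, so the share of variance explained by the retained components is their eigenvalue sum divided by the item count.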

Inclusion of items
Decisions on the assignment of items to a factor were based on the following criteria:
a) higher Cronbach's alpha if item was included (concerning the subscale)
b) item to total correlation ≥0.4 [18]
c) factor loading ≥0.3 [19]
d) items that only load on one factor
e) consistency between the item and the content of the factor
All criteria should be met. In doubtful cases criterion e) was pivotal.
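Criteria a) and b) can be illustrated with a short sketch. The function names are hypothetical, and two points are assumptions rather than statements from the study: sample variances are used for Cronbach's alpha, and the item to total correlation is computed in its corrected form (item against the sum of the remaining items of the subscale).

```python
# Illustrative sketch of criteria a) and b); names are hypothetical.
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation coefficient of two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(items):
    """items: list of item columns, each holding one score per respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def alpha_if_deleted(items, index):
    """Cronbach's alpha of the subscale without the item at `index`."""
    return cronbach_alpha(items[:index] + items[index + 1:])


def corrected_item_total(items, index):
    """Correlation of one item with the sum of the remaining items."""
    others = items[:index] + items[index + 1:]
    rest = [sum(scores) for scores in zip(*others)]
    return pearson(items[index], rest)
```

An item fails criterion a) when `alpha_if_deleted` exceeds `cronbach_alpha` for its subscale, and criterion b) when `corrected_item_total` falls below 0.4.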

Validity
Construct validity was assessed with convergent validity by a Pearson's correlation coefficient between the Total-Scores of CODE-GER and POS. As low values in POS are associated with low distress, a negative correlation was expected.

Items of overall impression
To examine if items of overall impression (Table 1) represented the Total-Score, Pearson's (r) or Spearman's rank (rs) correlations were calculated according to the rating scale of the item. Correlations ≤0.3, > 0.3, > 0.7 are regarded as low, moderate and high, respectively [25].
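The thresholds can be written as a small helper. This is a sketch with a hypothetical function name; applying the cut-offs to the absolute value of the coefficient is an assumption, made so that negative correlations are labelled by their strength.

```python
# Illustrative sketch; the function name is hypothetical.

def classify_correlation(r):
    """Label a correlation as low, moderate or high by its absolute size."""
    size = abs(r)  # assumption: the sign is ignored when grading strength
    if size > 0.7:
        return "high"
    if size > 0.3:
        return "moderate"
    return "low"
```

Under this reading, a coefficient of exactly 0.3 counts as low and one of exactly 0.7 as moderate.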

Recruitment and demographic and disease-related information
Data on the recruitment of the study population and their demographic and disease-related information as well as the time needed to answer the CODE-GER were analyzed using descriptive statistics and frequency analysis.
All statistical analyses were performed using IBM SPSS Statistics 23 for Windows [26].

Results

Study population and data collection
A total of 1714 patients died during the recruitment period. According to criteria a-c, 750 patients were excluded. Fifty patients and their next-of-kin dropped out before first contact. Eventually, 1137 next-of-kin were invited to participate in the study, comprising 914 next-of-kin initially contacted and 223 additional next-of-kin. Before phone screening, 704 dropped out. During the screening, 33 next-of-kin declined participation, and 14 were excluded. Eventually, 317 of 386 eligible and approachable next-of-kin returned the study documents (overall response rate: 27.9%). For statistical analysis, 42 cases were excluded. As a consequence of deleting cases of first measurement, seven cases of repeated measurement were excluded, leaving 55 out of 62 completed questionnaires for repeated measurement analysis. The main sample consisted of 237 participants, the interrater group of 38. Details of data collection including reasons for drop out and exclusion are shown in the flow chart of study participation (Fig. 1).
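The participant flow can be recomputed from the counts given in this paragraph. The sketch below uses only those reported numbers; the variable names are hypothetical, and it assumes that one next-of-kin was initially contacted per remaining patient, which is what the reported figure of 914 implies.

```python
# Participant flow recomputed from the counts reported in the text.
deceased = 1714
excluded_by_criteria = 750
dropped_before_contact = 50
# Assumption: one initially contacted next-of-kin per remaining patient.
initially_contacted = deceased - excluded_by_criteria - dropped_before_contact

additional_next_of_kin = 223
invited = initially_contacted + additional_next_of_kin

dropped_before_screening = 704
declined_at_screening = 33
excluded_at_screening = 14
eligible = invited - dropped_before_screening - declined_at_screening - excluded_at_screening

returned = 317
response_rate = round(100 * returned / invited, 1)  # percent of invited next-of-kin

analysed_first_measurement = returned - 42  # main sample plus interrater group
analysed_retest = 62 - 7
```

These values reproduce the reported figures: 914 initially contacted, 1137 invited, 386 eligible, a response rate of 27.9%, 275 analysed cases (237 + 38) and 55 retest questionnaires.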

Missing values
Missing rates for items "whether health care team met overall religious spiritual needs of patient" and "whether health care team met overall religious spiritual needs of next-of-kin" were about 10% each. Missing rates for the remaining items ranged between 0.4 and 7.6%.

Psychometric properties
Bartlett's test (χ²(378) = 2839.3; p < .001) and Kaiser-Meyer-Olkin Measure of Sampling Adequacy (0.8) indicated suitability for PCA. Therefore, a PCA with varimax rotation was conducted. As the scree plot did not show a definite "knee" point, a 7-factor solution (Table 3) based on eigenvalues and its best conceptual fit was chosen. This solution explained 61.8% of the variance and included all core items on QOC. Items with critical values (shown in bold in Table 3) were analysed according to inclusion of items criteria (see Methods section).

Inclusion of items
Items loading on two factors were allocated according to higher loadings (items "whether patient had retained respiratory tract secretions", "whether discussion of giving fluids through a 'drip' took place", and "sensitivity of health care team after death") or content-related conformity ("time of nurses to listen and discuss the patients' condition"). The item on "whether ward was clean" was omitted from further analysis since there was no obvious content-related conformity to its factor; internal consistency of the Spiritual and emotional support subscale increased to 0.86 if it was deleted. The item on "whether patient died in the right place" was dropped from further analysis because its correlation with the Environment subscale was rather low; alpha increased to 0.81 if it was deleted. Although items on "whether patient had retained respiratory tract secretions" and "whether discussion of giving fluids through a 'drip' took place" had four critical values, they were not excluded because the expert panel rated their content as consistent with their factors and considered both items indispensable components of QOC. Internal consistency for factors varied between α = 0.58 and α = 0.86 after the deletion of items "whether ward was clean" and "whether patient died in the right place". Table 4 shows the scale analysis based on the 26-item solution. Items "whether discussion of giving fluids through a 'drip' took place", "whether discussion giving fluids through a 'drip' would have been helpful" and "emotional support to next-of-kin" showed only marginal critical values and therefore were kept for the final solution.

Correlation between Total-score, subscales and items of overall impressions
Mean Total-Score was 85.69 (SD = 14.17; range = 25-104). Correlations between items of overall impression and Total-Score were weak to moderate (r/rs = 0.36-0.67; p < .01). Concerning the subscales, factor 1 (support and time of doctors and nurses) showed the highest correlation (r = 0.72; p < .01) with items of overall impression, factor 6 (presence of symptoms) the lowest (rs = −0.02) (Table 5).

Difficulty of questionnaire, strain caused by assessment and time for filling out
Difficulty of the questionnaire was rated rather low (M = 2.19; SD = 2.4; range = 0-10), and mean strain caused by the assessment was 4.05 (SD = 3.05; range = 0-10). Mean duration of the assessment was   Table 6.

Discussion
We performed translation, cultural adaptation and psychometric validation of the CODE questionnaire for the German setting. Participants of this study were next-of-kin, mostly husband/wife/partner or children of the deceased patient, similar to previous CODE™ or ECHO-D studies [7,27]. CODE-GER showed good psychometric properties. Content validity was achieved through the standardized translation process and cognitive interviews with next-of-kin, which led to minimal adaptions. Although overall internal consistency was relatively high, it varied between factors from satisfactory to good. However, as all factors cover meaningful contents, none of the factors were deleted from the final solution, as recommended by Schmitt [28]. Congruency of the Total-Score between two raters and over time was good; the same applies to convergent validity.

Items of overall impression
Although correlations between items of overall impression and Total-Score were moderate, none of them had consistent correlations with all factors. Thus, the sole use of these items is not recommended.

Comparison with previous data
The German and English versions of the CODE questionnaire are identical regarding the content and number of items referring to QOC, but they differ in the coding system of answering options, the number of items used for scale formation and the number of subscales. While the English version includes 12 of its 28 core items distributed on 3 key composite scales ("Environment", "Care" and "Communication"), the German version includes 26 of its 28 core items distributed on 7 subscales to form the Total-Score (Table 4). Consequently, the ranges of the Total-Score differ between the English and the German version. Furthermore, the only identical subscale between the two versions is the Environment subscale. Here, internal consistencies of the subscales are comparable (CODE-GER α = 0.81; CODE™ α = 0.89) [7].

Strengths of the study
High research quality was achieved by following strict translation and research guidelines. To date, CODE™ data had been analysed on the basis of a priori assumptions on the relationship between items. A strength of this study was the use of PCA to reveal the underlying structure of the questionnaire without a priori assumptions and to reduce data, ensuring that the most important items were displayed.
Although low, the response rate of this study was similar to that of previous ECHO-D or CODE™ studies using opt-out models [7,27,29,30], which indicates the feasibility of the opt-in model. The results for questionnaire difficulty and assessment strain were also comparable to a previous study [6]. These results point to the feasibility of the questionnaire and support previous study results showing that next-of-kin are capable of evaluating the care of the dying patient.
Table note: Range of numeric item scorings: 0 to 4 (see Table 1). Although scaling differs between items, higher values are always associated with higher quality. Values in bold were critical and were analysed individually in order to decide on item inclusion/exclusion; HCT = Health Care Team; a = items were deleted after analysis.
Table 4 Final scale analysis (after omission of items "whether ward was clean" and "whether patient died in the right place")

Limitations
That said, further limitations need to be considered when interpreting the results of the study. Information on characteristics of non-responders, such as socioeconomic status, was not available. Socioeconomic status might have an impact both on response and on difficulties and perceived strain. Non-responders may feel more burdened after their relative's death than the participating study population and may therefore have rated the questionnaire as more emotionally straining than the study sample. In addition, it is difficult to interpret test-retest reliability, as it is not clear whether the assessment of quality of care is a stable or unstable construct over an interval of 8 weeks. It is debatable whether the sample was representative of the hospital population. Most participants were German (89%) and, similar to previous findings, women with Christian affiliation [29]. As approximately 17% of patients in Germany have a migrant background and 5% of the German population are Muslim [31,32], the low participation rate of these groups might indicate a cultural obstacle, either in caring for these patients and their next-of-kin or in our recruitment method.
Further research is necessary to determine whether specific items are more essential for the Total-Score than others. Moreover, cut-off values which indicate poor, moderate or high quality of care would add practical value.
As not all significant decisions for the final CODE-GER version were exclusively based on statistical values, the factor solution needs to be examined in further studies. Therefore, we would recommend that future studies apply confirmatory factor analysis to quantify the goodness of fit of our factorial solution.