German translation, cross-cultural adaptation and validation of the whiplash disability questionnaire

Background: The Australian Whiplash Disability Questionnaire (WDQ) was translated, cross-culturally adapted, and tested for validity for use in German-speaking patients. The self-administered questionnaire evaluates current pain intensity, problems in personal care, role performance, sleep disturbances, tiredness, social and leisure activities, and emotional and concentration impairments with 13 questions, each rated on an 11-point scale from zero to ten.
Methods: In a first part, the Australian-based WDQ was forward and backward translated. In a consensus conference with all translators and health care professionals who were experts in the treatment of patients with a whiplash associated disorder (WAD), formulations were refined. The original authors were contacted for clarification and approval of the forward-backward translated version. The German version (WDQ-G) was evaluated for comprehensiveness and clarity in a pre-study patient survey by a random sample of German-speaking patients after WAD and by four healthy twelve- to thirteen-year-old teenagers. In a second part, the WDQ-G was evaluated in a patient validation study including patients affected by a WAD. Inpatients completed the WDQ-G, the North American Spine Society questionnaire (NASS cervical pain), and the Medical Outcomes Study 36-Item Short Form Health Survey (SF-36) at entry to the rehabilitation centre.
Results: In the pre-study patient survey (response rate 31%) patients rated clarity for title 9.6 ± 0.9, instruction 9.3 ± 1.4 and questions 9.6 ± 0.7, and comprehensiveness for title 9.6 ± 0.7, instruction 9.3 ± 1.4 and questions 9.8 ± 0.4. Time needed to fill in the questionnaire was 13.7 ± 9.0 minutes. In total, 70 patients (47 females, age = 43.4 ± 12.5 years, time since injury: 1.5 ± 2.6 years) were included in the validation study. The WDQ-G total score was 74.0 ± 21.3 points (range 15 to 117 points). Time needed to fill in the questionnaire was 6.7 ± 3.4 minutes, based on data from 22 patients. Internal consistency was confirmed with Cronbach's α = 0.89. Concurrent validity showed a highly significant correlation with the subscale pain and disability (NASS) at r = 0.74 and the subscale pain (SF-36) at r = 0.71.
Conclusions: The officially translated and adapted WDQ-G can be used in German-speaking patients affected by a WAD to evaluate patients' impairments in different domains. The WDQ-G is a self-administered outcome measure showing high internal consistency and good concurrent validity.


German abstract
Hintergrund: Der australische Whiplash Disability Questionnaire (WDQ) wurde für deutschsprachige Patienten übersetzt, angepasst und auf seine kriteriumsbezogene Validität getestet. Der WDQ ist ein selbstauszufüllender Fragebogen mit 13 Fragen bezüglich Schmerz, Einschränkungen in der persönlichen Pflege, Rollenerfüllung, Schlafstörungen, Erschöpfung, soziale und Freizeitaktivitäten, emotionale und Konzentrationsprobleme, die auf einer Skala von null bis zehn bewertet werden. Methoden: In einem ersten Teil wurde der aus Australien stammende WDQ vorwärts und rückwärts übersetzt. In einer Konsensuskonferenz mit allen Übersetzern, Therapeuten und Ärzten mit Erfahrung in der Behandlung von Patienten nach einem kraniozervikalen Beschleunigungstrauma (KZBT) wurden widersprüchliche Formulierungen angepasst und verfeinert. Danach wurden die Originalautoren für die Prüfung der vorwärts-rückwärts übersetzten Versionen und deren Einsatz am Patienten kontaktiert. In einer Vorstudie wurde die deutsche Version des WDQ (WDQ-G) auf Klarheit und Verständlichkeit in einer Zufallsstichprobe von 47 deutschsprachigen Patienten nach einem KZBT und vier zwölf- bis 13-jährigen Jugendlichen überprüft. In einem zweiten Teil (Validierungsstudie) füllten Patienten nach einem KZBT bei ihrem stationären Eintritt den WDQ, den North American Spine Society Fragebogen (NASS zervikaler Schmerz) und den Fragebogen der Medical Outcomes Study 36-Item Short Form Health Survey (SF-36) aus. Resultate: In der Vorstudie (Rücklaufquote 31%) bewerteten die Patienten die Klarheit für Titel 9.6 ± 0.9, Instruktion 9.3 ± 1.4 und Fragen 9.6 ± 0.7, sowie Verständlichkeit für Titel 9.6 ± 0.7, Instruktion 9.
Schlüsselwörter: Kraniozervikales Beschleunigungstrauma, Fragebogen, Schmerz, Einschränkungen, Aktivitäten des täglichen Lebens

Background
Neck pain can pose a substantial limitation in daily and professional life for affected individuals and their family members.
Globally, about 180 of 1000 people experience neck pain at least one day a year [1]. In 1995 the Québec Task Force on Whiplash Associated Disorders (WAD) defined the disorder as "an acceleration-deceleration mechanism of energy transferred to the neck that results in soft tissue injury that may lead to a variety of clinical symptoms" [2]. This mechanism occurs predominantly in motor vehicle accidents but also in injuries related to sport and work [3]. Holm and colleagues determined an annual incidence of at least 300 per 100,000 inhabitants for North America and Western Europe [3]. Guzman and colleagues proposed a conceptual model of neck pain based on a literature search covering 1980 to 2006 [4]. The model included five components describing risk factors for pain development, its recurring character, pain onset and course, pain management, and the impact of pain on life. In particular, the impact of pain on life should be evaluated with a specific questionnaire. This was addressed by the Neck Disability Index (NDI) and the Northwick Park Neck Pain Questionnaire (NPQ) [5,6]. In a cross-sectional comparison study the NDI and the NPQ were investigated in patients with WAD [7]. Participating patients identified seven categories that were evaluated by only one of the two questionnaires, either the NDI or the NPQ. Neither questionnaire covered all WAD-specific categories, e.g. emotional and social aspects. The results of Hoving et al.'s evaluation highlight the need for a disease-specific questionnaire. Therefore, the Whiplash Disability Questionnaire (WDQ) was developed specifically for individuals suffering from a WAD by Pinfold and colleagues and published in 2004 [8]. The development of the original WDQ comprised four steps: a) item generation based on the existing NDI items and semi-structured interviews with 83 patients from Hoving et al.'s study [7], b) preliminary clinical testing with 101 patients, and c) expert review of the developed questionnaire.
The WDQ is a disease-specific self-administered outcome measure to evaluate pain intensity and limitations due to a WAD in different domains: present pain levels, personal care, role performance, mobility, sleep disturbances, tiredness, social and leisure (sporting and non-sporting) activity, emotional and cognitive impairments.
Two systematic literature reviews on neck pain questionnaires critically appraised their quality and availability in different languages [9,10]. The authors found no publication on a German WAD-specific questionnaire, but they found two publications on a German version of the Neck Pain and Disability Scale for evaluating German-speaking patients with non-specific neck pain or neck pain related to fusion surgery (C1-C2) [11,12]. The translation process described in both publications was classified as fair to poor [9]. No information on the responsiveness of the German versions could be obtained. So far, no WAD-specific questionnaire exists for German-speaking individuals that is backed by a trustworthy translation procedure and established quality criteria.
Currently, the officially translated North American Spine Society questionnaire (NASS) and the Medical Outcomes Study 36-Item Short Form Health Survey (SF-36) are used to evaluate treatment effects in patients with neck pain [13-21]. Consequently, the aims of the present study were formulated as follows: 1) to establish a German version of the WDQ following recommended guidelines, 2) to test the concurrent validity with the subscale pain and disability of the German NASS for cervical spine and with the subscale bodily pain of the German SF-36, and 3) to examine the internal consistency of the German WDQ.
It was hypothesised that a German WDQ would correlate highly with the subscale pain and disability of the German version of the NASS for cervical spine and with the subscale bodily pain of the German version of the SF-36.

Methods
First part: translation and adaptation process
The translation and trans-cultural adaptation guidelines for self-administered outcome measures of Beaton et al. (2000) [22] were used as the basis for the procedure applied in this study. The guidelines include six stages:

Stage (1) Translation into the target language
Three German-speaking translators produced independent forward translations of the WDQ. The first was done by an officially recognised translator with a history of low back pain, the second by an English teacher who had experienced a WAD, and the third by a physiotherapist who is specialised in neurological rehabilitation and has worked in an English-speaking country for several years.

Stage (2) Synthesis of the forward translations
Forward translations were synthesised into one German version by the project leader.

Stage (3) Backward translations
The synthesised forward translated version was then backward translated into English by three independent translators: a bilingual physician, a bilingual financial analyst, and an English native-speaking housewife living in the German part of Switzerland for more than ten years. The backward translations were again synthesised by the project leader.

Stage (4) Consensus conference
In a two-hour consensus conference all forward and backward translators, two occupational therapists, an additional physiotherapist, an additional physician, and the project leader reviewed the synthesised forward translated German version and the backward translated English version. All healthcare professionals were experienced in the treatment of patients with a WAD. A consensus version was produced, representing the preliminary German version of the WDQ, termed WDQ-G.

Stage (5) Pre-study patient survey
A randomly selected sample of former inpatients (60 out of 1019 patients treated between 1999 and 2005 at a Swiss rehabilitation centre) diagnosed with WAD received the preliminary WDQ-G version by postal mail. Patients were asked to fill in the questionnaire and to rate the clarity and comprehensiveness of the title, questionnaire instruction, and questions on two eleven-point visual analogue scales (VAS), ranging from zero to ten (where ten indicated the highest level). Furthermore, all patients were asked to report the time needed to fill in the preliminary WDQ-G. In addition, the preliminary WDQ-G was rated for clarity and comprehensiveness by four healthy twelve- to thirteen-year-old teenagers.

Stage (6) Approval of original authors
The preliminary WDQ-G and all forward and backward translations were sent to the original authors, asking for approval to use the WDQ-G in a patient validation study.

Second part: patient validation study
Study design
After receiving permission from the original authors to use the pre-final WDQ-G, a patient validation study was carried out. Inpatients were asked to fill in the questionnaires at entry. The study was approved by the responsible ethics committee in Aarau (reference number: 2005/039) and carried out in accordance with the Declaration of Helsinki.

Outcome measures
Based on the synthesis report by Frinking et al., data on influential factors (e.g. time since injury, insurance status, employability, number of comorbidities, related treatments and impairments, and medication) and demographic data (e.g. age, gender) were collected from each patient [23].
The Whiplash Disability Questionnaire (WDQ)
The WDQ was developed based on items of the existing NDI and semi-structured interviews conducted with 83 patients by Hoving and colleagues [7]. Patients emphasised that an evaluation should not focus only on the present pain level but should also cover further domains that might be affected by a WAD, e.g. personal care, role performance, mobility, sleep disturbances, tiredness, social and leisure (sporting and non-sporting) activity, and emotional and cognitive impairments [8]. Each of the 13 questions is rated on an 11-point scale ranging from zero to ten. The total score can vary between zero and 130 points. A high total score indicates a high level of perceived impairment. It takes about five to ten minutes to fill in the WDQ, and no specific training is required [30]. In the present study, patients were asked to fill in the preliminary WDQ-G.
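The scoring rule described above (13 items, each rated from zero to ten, summed to a total between zero and 130) can be illustrated with a minimal sketch. The function name `wdq_total_score` is hypothetical, and requiring all 13 items to be answered is an assumption made for the example; the text does not specify how missing or crossed-out items are handled.

```python
from typing import Sequence

N_ITEMS = 13          # number of WDQ questions, as stated in the text
MIN_RATING = 0        # lower end of the 11-point scale
MAX_RATING = 10       # upper end of the 11-point scale


def wdq_total_score(ratings: Sequence[int]) -> int:
    """Sum the 13 item ratings into a total score between 0 and 130.

    Assumption for this sketch: every item must be answered. The source
    does not describe a rule for missing or crossed-out items.
    """
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(ratings)}")
    for rating in ratings:
        if not MIN_RATING <= rating <= MAX_RATING:
            raise ValueError(f"rating {rating} is outside 0 to 10")
    return sum(ratings)


# Usage: a respondent rating every item with 5 obtains a total of 65.
example_total = wdq_total_score([5] * N_ITEMS)
```

A higher total corresponds to a higher level of perceived impairment, so the two extremes are 0 (all items rated zero) and 130 (all items rated ten).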
The North American Spine Society Questionnaire (NASS) cervical spine
The NASS consists of two subscales: 1) pain and disability (11 questions) and 2) neurogenic symptoms (8 questions). The subscale pain and disability addresses perceived impairment in everyday life (e.g. during dressing, walking, sleeping), at work (e.g. during lifting, sitting, writing), or during leisure activities (e.g. while travelling) [29]. The subscale neurogenic symptoms addresses feelings of weakness, numbness, or pins and needles in the upper limb. All items refer to perception over the last seven days and are judged on a scale from one to six (e.g. level one: "I can perform without pain." to level six: "Due to my pain level I cannot perform at all."). A high score indicates a high degree of impairment [31]. A change of one point is considered to be clinically relevant [13]. The NASS has been officially translated into German, shows very good reliability for both subscales (0.90 and 0.89) [13,14], and has been used in WAD patients [15,16]. The total score of the subscale pain and disability (11 questions) was used to test the concurrent validity with the WDQ-G.
The Medical Outcomes Study (MOS) 36-item short form health survey (SF-36)
The SF-36 serves to determine perceived general health and quality of life [32]. Worldwide it is the most extensively used multidimensional questionnaire evaluating general health state. It contains 36 items clustered in two components: 1) physical health and 2) mental health, with four multi-item scales each. Physical health contains physical function (10 items), role physical (4 items), bodily pain (2 items), and general health (5 items). Mental health includes mental health (5 items), role emotional (3 items), social function (2 items), vitality (4 items) and change in health (1 item). Item scores for each dimension are coded, summed and transformed to a scale from 0 (worst possible health state measured by the questionnaire) to 100 (best possible health state). A higher value indicates a better evaluation of health. During the International Quality of Life Assessment (IQOLA) Project the SF-36 was translated according to international guidelines into more than 40 different languages [21]. It has been used in more than 30 different disease conditions including patients with migraine, pain in the upper or lower back, WAD, osteoarthritis or joint replacements [17-21,33-35]. The subscale bodily pain was used to further test the WDQ-G for concurrent validity. Both of its items ask about the extent of pain and its interference with the individual's work capability.
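The transformation of a summed scale score to the range 0 to 100 can be sketched as a linear rescaling. This is an illustration under assumptions: the text only states that item scores are coded, summed and transformed, so the linear form, the function name `transform_scale`, and the raw-score bounds in the usage example are not taken from the source.

```python
def transform_scale(raw_sum: float, lowest: float, highest: float) -> float:
    """Linearly rescale a summed raw scale score to the range 0 to 100.

    `lowest` and `highest` are the smallest and largest raw sums that the
    scale can produce after item coding (assumed inputs for this sketch).
    """
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    if not lowest <= raw_sum <= highest:
        raise ValueError("raw_sum lies outside the possible raw range")
    return (raw_sum - lowest) / (highest - lowest) * 100.0


# Usage with illustrative bounds: a two-item scale whose coded items each
# range from 1 to 6 has raw sums from 2 to 12; a raw sum of 7 maps to 50.
midpoint_score = transform_scale(7, lowest=2, highest=12)
```

With this rescaling, 0 corresponds to the worst and 100 to the best possible health state on the scale, which matches the direction described in the text.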
Additionally, participants were asked to rate their current subjective pain intensity on a Visual Analogue Scale (VAS). Pain is indicated on a horizontal 10-cm straight line anchored by two extremes of pain: "no pain" and "pain as bad as it could be" [31].
Anonymised and completed SF-36 and NASS questionnaires were scanned and uploaded by secure data transfer to an independent company (RehabNET AG, Zurich, Switzerland) for data assembly and subsequently returned for in-house analysis. Demographic and descriptive data as well as VAS and WDQ data were entered manually within the clinic using Microsoft Excel 2003. All data were eventually assembled for statistical examination with the Statistical Package for Social Sciences (SPSS).

Participants
Patients referred to inpatient rehabilitation were asked to participate if they fulfilled the following selection criteria: German-speaking females and males with an acceleration-deceleration event of the head with or without a mild traumatic brain injury (MTBI), older than 18 years, able to understand the aim and procedure of the study, and having given written informed consent. Patients were excluded if they had additional neurological or psychiatric diseases, if they needed supporting devices for walking, e.g. walking sticks, or if they had additional systemic diseases, e.g. fibromyalgia.

Statistical analyses
Descriptive data for patients and questionnaires were calculated as frequencies, means and standard deviations or confidence intervals. Concurrent validity was estimated by computing the Pearson Product-Moment Correlation Coefficient (r) between the WDQ-G total score and the subscale pain and disability of the NASS questionnaire, and between the WDQ-G total score and the subscale bodily pain of the SF-36. Internal consistency (Cronbach's α) was computed using data from the first measurement event at study entry. Additionally, the inter-item correlation matrix was inspected to detect very high correlations indicating item redundancy. All analyses were performed with SPSS version 16, 2007 (SPSS, Inc., Chicago Ill) with the significance level set at p ≤ 0.05.
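The two statistics named above can be illustrated with a short sketch using the textbook formulas. This is not the SPSS procedure used in the study; the function names `pearson_r` and `cronbach_alpha` and the toy data are assumptions made for illustration only.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals),
    using sample variances throughout.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data (hypothetical, not study data): 4 respondents, 3 items.
toy_items = [[2, 3, 3], [5, 5, 6], [7, 8, 7], [9, 9, 10]]
toy_totals = [sum(row) for row in toy_items]
toy_alpha = cronbach_alpha(toy_items)
toy_r = pearson_r(toy_totals, [10, 21, 30, 41])
```

For the questionnaire described in the text, each row would hold one patient's 13 item ratings, and the correlation would be computed between the total scores and the scores of the comparison subscale.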

Results
First part: translation and adaptation process
During translation stages 1 (forward) and 3 (backward), three different German and three different English versions were produced. In particular, the wording of the first parts of the questions "How much do your whiplash symptoms interfere..." or "How much pain/sadness/anger do you..." was not translated consistently. The synthesis (stage 2) was necessary to agree on a single German version providing the basis for the backward translations conducted by three independent translators. During the consensus conference (stage 4) the wording at the beginning of questions 1, 3, 6, 10, 11, and 12 was modified again to avoid the implication that patients must have pain after a whiplash injury and, if so, that it must be at a high level. For example, question 3 of the original WDQ asks: "How much do your whiplash symptoms interfere...". For the German WDQ this had to be adapted to: "To what extent do your whiplash symptoms impair...". Furthermore, the scale descriptions at the lower and upper ends were shortened to minimise ambiguity.
In agreement with the authors of the original Australian WDQ it was decided to modify the description on how to fill in the questionnaire for two reasons: 1) to emphasise the scale (zero to ten) and 2) to minimise the risk that patients miss out items.
Original Australian version: "Please circle a number in each section to indicate how you have been affected by the whiplash injury and symptoms. If one or more questions are not relevant to you, please leave that section blank." Agreed German version: "For each question please circle on a scale from 0 to 10 the number corresponding to the extent to which you are affected by your whiplash symptoms. If one or more questions are not relevant, please cross them out".

Pre-study patient survey
Only 47 of the 60 randomly selected patients could be reached by postal mail for the pre-study patient survey. The response rate was 31%, representing 16 patients (age 46.8 ± 10.5 years, time since injury 6.4 ± 2.6 years, 13 females) who filled in the preliminary WDQ-G. Fifteen patients rated clarity of title 9.6 (± 0.9), instructions 9.3 (± 1.4), and questions 9.6 (± 0.7), as well as comprehensiveness of title 9.6 (± 0.7), instructions 9.3 (± 1.4), and questions 9.8 (± 0.4). Time needed to fill in the WDQ-G was 13.7 (± 9.0) minutes ranging from 1.
Eventually, a consecutive sample of 70 WAD patients (47 females, mean age = 43.4 ± 12.5 years, ranging from 21 to 75 years) was recruited for the validation study. Average time since injury was 1.5 ± 2.6 years (median 31 weeks, range 3.0 weeks to 17.8 years). Table 1 provides an overview of all questionnaire mean values at entry. Figure 1 presents the number of responses for each WDQ-G question in the pre-study survey sample and the validation study sample. The WDQ-G mean total score was 69.4 (± 24.0) in the pre-study survey sample (16 patients) and 74.0 (± 21.3) in the inpatient validation study sample (67 patients). Time needed to fill in the WDQ-G was 13.7 (± 9.0) minutes in the pre-study survey sample, with data from 14 patients, and 6.7 (± 3.4) minutes in the validation study sample, with data from only 22 patients.

WDQ-G responses
Mean values for each of the 13 WDQ-G questions are presented in Table 2.

Concurrent validity
In relation to the second study aim, concurrent validity of the WDQ-G was determined with the subscale pain and disability of the NASS questionnaire (r = 0.74) and with the bodily pain subscale of the SF-36 (r = 0.71). Both correlations were highly significant (p < 0.01). Table 3 provides an overview of the correlations of the WDQ-G with all questionnaire subscales (NASS and SF-36).

Discussion
The study described a guideline-driven German translation, cross-cultural adaptation and validation process of a disease-specific questionnaire for WAD patients: the Whiplash Disability Questionnaire. In six predefined stages the Australian-based WDQ was forward and backward translated, approved by the original authors, evaluated by WAD patients, and tested for its quality criteria. As hypothesised, the WDQ-G showed highly significant correlations with the NASS subscale pain and disability and the SF-36 subscale bodily pain, indicating good concurrent validity. Furthermore, the WDQ-G shows high internal consistency. As a further development of the NDI, the WDQ covers specific aspects of impairment for WAD patients: role performance, tiredness, social and leisure (sporting and non-sporting) activity, and emotional and cognitive impairments, each evaluated on an eleven-point rating scale [8].
In clinical trials, treatment effects and calculated effect sizes as well as recommended treatment guidelines are based on subjective and objective outcome measures. Those outcome measures are vital elements of the trial methodology. Therefore, it is essential to translate outcome measures into different languages in a standardised way, so that the original construct is retained and the measure is adapted to the language, traditions, and customs of the target country. Furthermore, it is crucial to evaluate the quality criteria of the translated and adapted measurement [20,22]. The authors are confident that a rigorous process was applied to reach equivalence between the original WDQ and the resulting German version of the WDQ, providing an assessment for use in clinical practice and research. This is supported by the excellent Cronbach's α of 0.894.
The difference in the time needed to fill in the questionnaire between the pre-study sample (13.7 minutes) and the validation study sample (6.7 minutes) could be explained by the additional task assigned to the pre-study sample, who also evaluated the WDQ-G's clarity and comprehensiveness, whereas inpatients in the validation study sample only had to fill in the questionnaire.
The present patient validation sample showed a different gender distribution of 1.4:1 (female:male) compared to the patient sample in Pinfold et al.'s study with 4.3:1 [8], but a gender distribution similar to that indicated in the systematic efficacy review of Drescher et al., which ranged between 1:1 and 2:1 [36]. Generally, the consecutive patient sample of the present validation study covers an older patient age range (>65) but can be compared with previous studies evaluating the use of the WDQ [8,30,37] or WAD interventions [36,38]. Demographic variables are also comparable with other German-speaking Swiss WAD inpatients [15].
Patients in the validation study showed the highest score for item 8 (7.89, sporting activity) and the lowest for item 2 (2.30, personal care). The average score for all 13 questions was 74.4. Those scores are almost identical to the results from Pinfold et al. [8]. However, for the English version, item 8 was scored lower (6.1) and the mean WDQ score was 55.7. Both differences could be attributed to the shorter time since injury onset in the present study (20 months vs. 48 months on average in Pinfold's study). The scoring of item 2 and item 8 in the present study also suggests that there are no floor or ceiling effects.
The translation and cross-cultural adaptation process followed the guidelines proposed by Beaton et al. [22]. Stages 2 and 4 were essential to synthesise all produced forward and backward questionnaire translations. All variations produced in formulating the title, questionnaire items, and questionnaire and scale descriptions, as well as their meaning, had to be considered. The adaptations in the formulation at the beginning of questions 1, 3, 6, 10, 11, and 12 and of the scale descriptions emphasise the need for a standardised translation process. Adaptations were necessary to avoid the implication that patients must have pain after a whiplash injury and, if so, that it must be at a high level. The authors assume that the reformulation of the questions mentioned above does not influence the construct under investigation since the sense of the questions remained unchanged. This is supported by the calculated Cronbach's α (WDQ-G α = 0.89), which is only slightly lower than that of the original Australian version (WDQ α = 0.96) [8]. Furthermore, the adaptations were approved by all translators at the consensus conference and by the original authors of the Australian WDQ when reviewing all forward and backward translation documents.
To answer question four (driving or using public transport) patients differentiated between being the driver, the co-driver, or a passenger in a public transport vehicle. All three alternatives could be impaired at different severity levels after a whiplash injury. If a patient in the present validation study asked which aspect should be evaluated, they were asked to indicate the impairment level for the most unpleasant situation. It is assumed that this differentiation can occur in all patients filling in the WDQ, independent of language. Therefore, further research is needed to define a more precise patient instruction or to add further questions that evaluate all three alternatives separately.
For a trustworthy questionnaire use in clinical routine or research it is important to determine quality criteria of the instrument including validity, reliability, and responsiveness. The present study focussed on the standardised translation process and data collection to determine concurrent validity and internal consistency. Meanwhile the paper was published.

Study limitations
The present study aimed to produce a robust German version of the WDQ by following the strict guidelines published by Beaton et al. [22]. However, different recommendations exist on how to cross-culturally translate and adapt self-administered measurements. In the present study the team followed the forward-backward translation approach rather than the two-panel approach suggested by McKenna et al. [39]. The two-panel approach relies on expert and lay committee meetings and does not include a backward translation. In a randomised study of the two translation approaches applied to the Rheumatoid Arthritis Quality of Life (RAQoL) questionnaire for Sweden, bilinguals preferred neither of the translated questionnaires [40]. Reliability and validity characteristics were similar in both RAQoL versions. However, in the present six-stage WDQ translation and cross-cultural adaptation process, the backward translation and the consensus conference with multidisciplinary health professionals and language experts ensured a comprehensive and trustworthy German version.
In general, validity tests for self-administered questionnaires are difficult to implement and to compare with a gold standard, in particular as there is no gold standard for WAD. In the present investigation, concurrent validity was determined with the subscale pain and disability (NASS) and the subscale bodily pain (SF-36) in order to achieve a close conceptual association with established related questionnaires.
In the validation study, data from 70 patients were analysed to determine validity. It could be argued that the sample size was too small for a final validity analysis. However, in other publications on translation and validity studies sample sizes varied distinctly [19,41-43]. Recent publications provide suggested aids to the decision-making process on sample sizes for reliability and validity studies [44,45]. Hobart et al. suggest a sample of 20 for reliability studies and a sample of 80 or more for validation studies in neurology [44]. Javali et al. proposed a sample size of 50 to determine reliability for measures with a five-point Likert scale [45]. So far, no consensus has been reached on the ideal sample size. Apart from scientific reasoning, available financial and personnel resources have to be considered too.
Sample size was also the limiting factor for conducting a factor analysis. A case:item ratio of 10:1 is recommended, requiring at least 130 cases for a WDQ-G factor analysis [46]. In some circumstances a sample size of 100 cases might be sufficient; nevertheless, in the present study only a sample of 70 cases could be recruited [46]. The authors of the original Australian WDQ performed a factor analysis with 101 cases and confirmed the unifactorial structure of the WDQ [8]. For now it must be assumed that the rigorous cross-cultural translation and adaptation process based on international guidelines resulted in a German WDQ with good concurrent validity, internal consistency, and a questionnaire structure similar to that of the original Australian version.
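The sample size arithmetic behind the 10:1 case:item recommendation can be written out as a small sketch. The function name `required_cases` is hypothetical; the numbers (13 items, 10 cases per item, 70 recruited patients) are those given in the text.

```python
def required_cases(n_items: int, cases_per_item: int = 10) -> int:
    """Minimum number of cases implied by a cases-per-item rule of thumb."""
    if n_items < 1 or cases_per_item < 1:
        raise ValueError("both arguments must be positive")
    return n_items * cases_per_item


WDQ_ITEMS = 13     # number of WDQ-G questions
RECRUITED = 70     # patients in the validation study

needed = required_cases(WDQ_ITEMS)   # 13 items x 10 cases per item
shortfall = needed - RECRUITED       # cases missing for a factor analysis
```

Under this rule the validation sample falls 60 cases short of the 130 required, which is why no factor analysis was performed.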