This study described a guideline-driven process of German translation, cross-cultural adaptation, and validation of a disease-specific questionnaire for WAD patients: the Whiplash Disability Questionnaire (WDQ). In six predefined stages, the Australian WDQ was forward and backward translated, approved by the original authors, evaluated by WAD patients, and tested for its quality criteria. As hypothesised, the WDQ-G correlated highly significantly with the NASS subscale pain and disability and the SF-36 subscale bodily pain, indicating good concurrent validity. Furthermore, the WDQ-G showed high internal consistency. As a further development of the NDI, the WDQ covers specific aspects of impairment for WAD patients: role performance, tiredness, social and leisure (sporting and non-sporting) activity, and emotional and cognitive impairments, which can be evaluated on an eleven-point rating scale.
In clinical trials, treatment effects, calculated effect sizes, and recommended treatment guidelines are based on subjective and objective outcome measures. These outcome measures are vital elements of the trial methodology. It is therefore essential to translate outcome measures into other languages in a standardised way, so that the original construct is retained and the measure is adapted to the language, traditions, and customs of the target country. Furthermore, it is crucial to evaluate the quality criteria of the translated and adapted measurement [20, 22]. The authors are confident that a rigorous process was applied to reach equivalence between the original WDQ and the resulting German version, providing an assessment for use in clinical practice and research. This is supported by the excellent Cronbach’s α of 0.894.
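For readers less familiar with the internal consistency statistic cited here, the following is a minimal sketch of how Cronbach’s α is conventionally computed from item-level responses, α = k/(k − 1) · (1 − Σ item variances / variance of the total score). It is not the analysis code used in the study; the function name and the small example data set are hypothetical and serve only as illustration.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses the standard formula with sample variances:
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)).
    """
    n_respondents = len(scores)
    k = len(scores[0])
    if n_respondents < 2 or k < 2:
        raise ValueError("need at least two respondents and two items")
    if any(len(row) != k for row in scores):
        raise ValueError("every respondent must answer the same number of items")

    # Transpose rows (respondents) into columns (items).
    items = list(zip(*scores))
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses of five patients to three items rated 0-10.
example = [
    [2, 3, 2],
    [5, 6, 4],
    [7, 7, 8],
    [9, 8, 9],
    [4, 5, 5],
]
alpha = cronbach_alpha(example)
print(f"Cronbach's alpha: {alpha:.3f}")
```

For the WDQ-G, each row would hold one patient’s 13 item scores; values of α close to 1 indicate that the items vary together.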
The difference in the time needed to fill in the questionnaire between the pre-study sample (13.7 minutes) and the validation study sample (6.7 minutes) could be explained by the additional task assigned to the pre-study sample, who also evaluated the WDQ-G’s clarity and comprehensiveness, whereas inpatients in the validation study sample only had to fill in the questionnaire.
The present validation sample showed a gender distribution of 1.4:1 (female:male), which differs from the 4.3:1 in the patient sample of Pinfold et al. but is similar to the distributions reported in the systematic efficacy review of Drescher et al., which ranged from 1:1 to 2:1. Generally, the consecutive patient sample of the present validation study covers an older patient age range (>65) but can be compared with previous studies evaluating the use of the WDQ [8, 30, 37] or WAD interventions [36, 38]. Demographic variables are also comparable with those of other German-speaking Swiss WAD inpatients.
Patients in the validation study showed the highest score for item 8 (7.89, sporting activity) and the lowest for item 2 (2.30, personal care). The mean total score across all 13 questions was 74.4. Those scores are almost identical to the results from Pinfold et al. However, for the English version, item 8 was scored lower (6.1) and the mean WDQ score was 55.7. Both differences could be attributed to the shorter time since injury onset in the present study (20 months vs. 48 months on average in the study by Pinfold et al.). The scoring of item 2 and item 8 in the present study also suggests that there are no floor or ceiling effects.
The translation and cross-cultural adaptation process followed the guidelines proposed by Beaton et al. Stages 2 and 4 were essential to synthesise all forward and backward translations of the questionnaire. All variations produced in formulating the title, the questionnaire items, and the questionnaire and scale descriptions, together with their meaning, had to be considered. Adaptations in the formulation at the beginning of questions 1, 3, 6, 10, 11, and 12 of the scale descriptions emphasise the need for a standardised translation process. Adaptations were necessary to avoid the implication that patients must have pain after a whiplash injury and, if so, that it should be at a high level. The authors assume that the reformulation of the questions mentioned above does not influence the construct under investigation, since the sense of the questions remained unchanged. This is supported by the calculated Cronbach’s α (WDQ-G α = 0.89), which is only slightly lower than that of the original Australian version (WDQ α = 0.96). Furthermore, the adaptations were approved by all translators at the consensus conference and by the original authors of the Australian WDQ when reviewing all forward and backward translation documents.
To answer question four (driving or using public transport), patients differentiated between being the driver, the co-driver, or a passenger in a public transport vehicle. All three situations could be impaired to different degrees after a whiplash injury. If a patient in the present validation study asked which aspect should be evaluated, they were asked to indicate the impairment level for the most unpleasant situation. It is assumed that this differentiation can occur in all patients filling in the WDQ, regardless of language. Therefore, further research is needed to define a more precise patient instruction or to add further questions that evaluate all three alternatives separately.
For trustworthy use of a questionnaire in clinical routine or research, it is important to determine the quality criteria of the instrument, including validity, reliability, and responsiveness. The present study focussed on the standardised translation process and data collection to determine concurrent validity and internal consistency. Meanwhile the paper was published.
The present study aimed to produce a robust German version of the WDQ by following the strict guidelines published by Beaton et al. However, different recommendations exist on how to translate and cross-culturally adapt self-administered measurements. In the present study, the team followed the forward-backward translation approach rather than the two-panel approach suggested by McKenna et al. The two-panel approach relies on expert and lay committee meetings and does not include a backward translation. In a randomised study comparing the two translation approaches applied to the Rheumatoid Arthritis Quality of Life (RAQoL) questionnaire for Sweden, neither translated questionnaire was preferred by bilinguals. Reliability and validity characteristics were similar in both RAQoL versions. However, in the present six-stage WDQ translation and cross-cultural adaptation process, the backward translation and the consensus conference with multidisciplinary health professionals and language experts ensured a comprehensive and trustworthy German version.
In general, validity tests for self-administered questionnaires are difficult to implement and to compare with a gold standard, in particular because there is no gold standard for WAD. In the present investigation, concurrent validity was therefore determined against the NASS subscale pain and disability and the SF-36 subscale bodily pain, in order to achieve a close conceptual association with established related questionnaires.
In the validation study, data from 70 patients were analysed to determine validity. It could be argued that the sample size was too small for a final validity analysis. However, in other publications on translation and validity studies, sample sizes varied distinctly [19, 41–43]. Recent publications provide aids to the decision-making process on sample sizes for reliability and validity studies [44, 45]. Hobart et al. suggest a sample of 20 for reliability studies and a sample of 80 or more for validation studies in neurology. Javali et al. proposed a sample size of 50 to determine reliability for measures with a five-point Likert scale. So far, no consensus has been reached on the ideal sample size. Apart from scientific reasoning, the available financial and personnel resources also have to be considered.
Sample size was also the limiting factor for conducting a factor analysis. A case:item ratio of 10:1 is recommended, which requires at least 130 cases for a factor analysis of the 13-item WDQ-G. In some circumstances a sample size of 100 cases might be sufficient; nevertheless, in the present study only 70 cases could be recruited. The authors of the original Australian WDQ performed a factor analysis with 101 cases and confirmed the unifactorial structure of the WDQ. For now, it must be assumed that the rigorous cross-cultural translation and adaptation process based on international guidelines resulted in a German WDQ with good concurrent validity, internal consistency, and a questionnaire structure similar to that of the original Australian version.
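The sample size arithmetic behind the 10:1 case:item recommendation can be made explicit with a short sketch. The function and variable names are hypothetical; the numbers (13 items, 70 recruited cases) are those reported above.

```python
def cases_needed(n_items, cases_per_item=10):
    """Minimum number of cases implied by a case:item ratio rule of thumb."""
    return n_items * cases_per_item


WDQ_ITEMS = 13   # number of WDQ questions
RECRUITED = 70   # cases in the validation study sample

required = cases_needed(WDQ_ITEMS)               # 13 items x 10 cases per item
shortfall = max(0, required - RECRUITED)         # cases missing for the 10:1 rule
achieved_ratio = RECRUITED / WDQ_ITEMS           # cases actually available per item

print(f"required cases: {required}")
print(f"shortfall: {shortfall}")
print(f"achieved case:item ratio: {achieved_ratio:.1f}:1")
```

With 70 cases, the achieved ratio is roughly 5.4:1, about half of the recommended 10:1.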