Tinnitus assessment by means of standardized self-report questionnaires: Psychometric properties of the Tinnitus Questionnaire (TQ), the Tinnitus Handicap Inventory (THI), and their short versions in an international and multi-lingual sample

Background Tinnitus research in an international context requires standardized and validated questionnaires in different languages. The aim of the present set of analyses was the reassessment of basic psychometric properties according to classical test theory of self-report instruments that are being used within the multicentre Tinnitus Research Initiative (TRI) database project. Methods 1318 patients of the TRI Database were eligible for the analyses. The basic psychometric properties reliability, validity, and sensitivity of the Tinnitus Handicap Inventory (THI), the Tinnitus Questionnaire (TQ), and the Tinnitus Beeinträchtigungs Fragebogen (i.e., Tinnitus Impairment Questionnaire, TBF-12) were assessed by the use of Cronbach’s alpha, corrected item-total correlations, correlation coefficients, and standardized response means. Results Throughout the languages, all questionnaires showed high internal consistencies (Cronbach’s alpha > 0.79) and solid item-total correlations, as well as high correlations among themselves (around 0.8) and with the self-reported tinnitus severity. However, some paradoxical correlations between individual items of the TBF-12, constructed as a short form of the THI, and the corresponding THI items were seen. Standardized Response Means (SRM) were low if tinnitus did not change, and between 0.3 and 1.09 for improved or worsened tinnitus complaints, indicating the sensitivity of the measures. Conclusions All investigated instruments have high internal consistency, high convergent and discriminant validity, and good change sensitivity in an unselected large multinational clinical sample and thus appear appropriate to evaluate the effects of tinnitus treatments in a cross-cultural context.


Background
Tinnitus is an auditory sensation in the absence of any external acoustic stimuli. Tinnitus represents a frequent disorder with prevalence rates ranging between 2.4% and 20.1% [1], Chapter 5. This rather large range can be explained with the highly variable definitions used in clinical trials as well as with the different duration and nature of this condition [2].
Many people experience a "ringing in their ears" from time to time but, in the majority of cases, this phantom perception disappears spontaneously after a short time. Tinnitus symptoms are referred to as a chronic condition when lasting more than six months. Some people are able to ignore this phantom sound and do not feel severely impaired, while others suffer considerably from these symptoms. Tinnitus is also often related to depression, anxiety, and insomnia [2,3], causing severely impaired quality of life.
Several self-report instruments have been developed and validated for the assessment of tinnitus severity [4].
To promote the research and development of more efficient treatment strategies, the Tinnitus Research Initiative has set up an international database that allows the comparison of longitudinal data from patients undergoing specific, well-defined treatment interventions, either in clinical trials or in clinical routine [5]. The idea is to evaluate the effects of different treatment options (e.g. pharmacological, psychotherapeutic, auditory stimulation, brain stimulation) on an aggregate level and, eventually, to develop a decision-making tool that allows the specification of promising treatment options for patients with a given clinical condition. So far, eight study centers from five different countries have contributed data to this project.
The database, which is maintained at the Tinnitus Center in Regensburg, is open for clinical studies on tinnitus that contain a standardized data set. The data set includes three patient self-report forms for assessing tinnitus symptoms, i.e. the Tinnitus Handicap Inventory (THI) [6], the Tinnitus Impairment Questionnaire (TBF-12) [7], and the Tinnitus Questionnaire (TQ) as well as its short forms Mini-TQ and TQ-12 [8].
These instruments represent validated measures; they are available in different languages and used in clinical studies around the world. Psychometric properties of the assessment instruments have been documented in various reports [6][7][8][9][10][11][12][13][14][15]. However, data about change sensitivity are only available for the THI [16], the TQ [17] and the TBF-12 [15]. The different questionnaires have only been cross-validated to a very limited extent [1]. Thus, for example, even if the TBF-12 is considered a short version of the THI [15], cross-validation data between TBF-12 and THI are not available. Moreover, replication of psychometric criteria in follow-up studies is considered important since the validity and the usefulness of self-report instruments are empirical issues that depend on a growing body of evidence [18].
Furthermore, in the case of multicentre projects and questionnaires in different languages, it has to be established that such instruments work properly in every language (FDA guidance 2009). Only then are analyses of multicentre projects reliable, allowing unbiased conclusions about the value of therapeutic options for tinnitus symptoms.
The aim of the present set of analyses is therefore the reassessment of basic psychometric properties according to classical test theory [19,20] of self-report instruments used within the multicentre TRI database project.

Database
The data analysis was based on data of the TRI Database. These data stem from studies with a range of designs (randomized controlled, longitudinal, one-armed observational, cross-sectional baseline). All studies comply with a pre-specified standardized documentation set [21]. Collection of data for the TRI database was approved by the local ethics committee of the University of Regensburg, Germany.
The data set of 1 May 2011 contained a total of n=1636 patients, of which n=1318 were eligible for our analyses. The 318 excluded patients had neither baseline nor final visit values of any of the three analyzed questionnaires.

Assessments
The Case Report Form (CRF) of the TRI database contains different types of questionnaires (medical history, audiological examinations, tinnitus related questionnaires, depression, and quality of life) in different languages (German, Dutch, Portuguese, and Spanish); all questionnaires were collected in a standardized manner [21].
The most common of the above questionnaires is the THI [6], which has already been translated into and validated in different languages, such as Chinese [22], Portuguese [9], Danish [13], German [12], French [23], and Italian [24,25]. Its primary use is the stratification of patients with tinnitus according to the impact of tinnitus symptoms in daily life [11]. Moreover, the THI has been often used as a primary outcome measurement for testing the efficacy of therapeutic interventions, even if the questionnaire has not been primarily developed for this purpose.
The TBF-12, developed by Greimel et al. [7], is a short version of the THI that includes 12 instead of 25 questions. However, a few questions are formulated in a slightly different manner, and the responses are "never ─ sometimes ─ often". In contrast, responses of the THI are "yes ─ sometimes ─ no" in an opposite order. To further complicate matters, the TBF-12 has recently been renamed as THI-12 [15].
The TQ is a self-report measure of tinnitus-related distress as perceived by patients [8]. This questionnaire has been translated into and validated in different languages, including German [26], Dutch, French [10], and Chinese [27]. In the current study, the sum score was calculated according to the validation of the TQ in the German language, which was based on 40 out of all 52 items [28]; items nos. 5 and 20 were counted double [26]. The TQ, the most widely used questionnaire in German-speaking areas, has also been applied as a primary outcome measurement in various clinical trials [5]. Because 52 questions are a rather high number for patients to fill in at every visit, two shorter versions, the TQ-12 and the Mini-TQ, have been developed. The TQ-12, developed by Hiller and Goebel [29] according to an optimal combination of high item-total correlations, reliability, and sensitivity in assessing changes, consists of 12 items that correspond to items nos. 5, 9, 17, 24, 28, 34, 35, 36, 39, 43, 47, and 48 of the TQ. The Mini-TQ consists of 10 items, corresponding to items nos. 4, 11, 15, 17, 24, 34, 35, 39, 47, and 48. The two short versions are not a separate part of the TRI Case Report Form but can be easily extracted and calculated from the TQ responses.
Both the Tinnitus Severity Scale and the Patient Global Impression-Change (PGI-C) consist of only one question each, measuring the patients' subjective perception of their current tinnitus severity and of the change of tinnitus complaints over time, respectively. The Tinnitus Severity Scale asks patients how much of a problem tinnitus symptoms are at the time and provides the following response options: not a problem, a small problem, a moderate problem, a big problem, or a very big problem. A general review of self-rating scales can be found in Wewers et al. [30]. The PGI-C asks patients to rate the total abatement of their tinnitus complaints compared with the time before treatment and provides the following response options: very much better, much better, minimally better, no change, minimally worse, much worse, and very much worse.
For each questionnaire, all subscales, number of items, response scales, and score ranges are summarized in Table 1. Since patients from some of the centers did not fill in every questionnaire and some patients had only baseline data, the numbers of complete questionnaires for each language and each visit are shown in Table 2. The median time interval between baseline and final visit assessments is 3.0 months (IQR 2.4 to 4.5 months).
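The extraction of the two short versions from a completed TQ can be sketched as follows. The item numbers are those listed above; the response coding (0, 1, 2 per item), the unweighted sum, the handling of missing answers, and the function name `short_form_score` are assumptions made for illustration and are not taken from the TRI documentation.

```python
# Item numbers as listed in the text (1-based positions in the 52-item TQ).
TQ12_ITEMS = [5, 9, 17, 24, 28, 34, 35, 36, 39, 43, 47, 48]
MINI_TQ_ITEMS = [4, 11, 15, 17, 24, 34, 35, 39, 47, 48]


def short_form_score(tq_responses, item_numbers):
    """Sum the responses of the selected TQ items.

    tq_responses: list of 52 item responses (assumed coding 0, 1, 2),
                  with None marking an unanswered item.
    Returns None if any selected item is missing (an assumed rule).
    """
    if len(tq_responses) != 52:
        raise ValueError("expected responses to all 52 TQ items")
    selected = [tq_responses[n - 1] for n in item_numbers]
    if any(r is None for r in selected):
        return None
    return sum(selected)


# Hypothetical patient who answered 1 on every item.
patient = [1] * 52
print(short_form_score(patient, TQ12_ITEMS))
print(short_form_score(patient, MINI_TQ_ITEMS))
```

Scores derived this way come from patients who completed the full TQ, so they need not equal scores obtained from a separately administered short form.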

Statistical analysis
Patient characteristics are summarized as median values and interquartile ranges (first to third quartiles) for continuous variables as well as frequency counts and percentages for categorical data. Statistical analyses were done with IBM SPSS Statistics 19.0.

Reliability
To assess internal consistency, Cronbach's alpha and the corrected item-total correlations were calculated with baseline and final visit data.
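For illustration, both statistics can be computed from a patients-by-items score matrix as in the minimal sketch below. The analyses reported here were run in SPSS; this Python version, its function names, and the example scores are assumptions for demonstration only, and it expects complete cases.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """rows: one list of item scores per patient (complete cases only)."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(rows):
    """Correlate each item with the sum of all *other* items."""
    k = len(rows[0])
    result = []
    for j in range(k):
        item = [r[j] for r in rows]
        rest = [sum(r) - r[j] for r in rows]  # total score without item j
        result.append(pearson(item, rest))
    return result


# Hypothetical scores of five patients on three items.
scores = [[4, 2, 4], [2, 2, 0], [0, 0, 0], [4, 4, 4], [2, 0, 2]]
print(round(cronbach_alpha(scores), 3))
print([round(r, 3) for r in corrected_item_total(scores)])
```

The correction consists of removing the item from the total before correlating, which prevents the item from inflating its own correlation.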

Validity
Convergent validity is the degree to which all tinnitus-related questionnaires measure the same underlying construct. To assess convergent validity, Pearson's correlation was computed to measure the association between the total scores of all tinnitus-related questionnaires. To analyze relations between the TBF-12 and the corresponding questions of the THI, we used cross tabulations as well as Spearman's rank correlation coefficient for each question. Known group differences (discriminant validity) involve the assessment of a tinnitus questionnaire's ability to distinguish between different levels of tinnitus severity. In this analysis, scores of each questionnaire were compared to tinnitus-related severity (not a problem, a small problem, a moderate problem, a big problem, a very big problem) as rated by the patients themselves. To test whether higher scores on each questionnaire are monotonically related to a higher level of patient-self-reported severity, we computed Spearman's rank correlation coefficient.
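As an illustration of the known-groups analysis, the sketch below codes the five severity categories as ranks 0 to 4 and computes Spearman's rho as the Pearson correlation of average ranks. The numeric coding, the function names, and the example data are assumptions; the study itself used SPSS.

```python
from statistics import mean

SEVERITY_LEVELS = ["not a problem", "a small problem", "a moderate problem",
                   "a big problem", "a very big problem"]


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Rank values from 1 to n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical total scores and self-rated severity of six patients.
totals = [10, 24, 38, 56, 80, 30]
severity = ["not a problem", "a small problem", "a moderate problem",
            "a big problem", "a very big problem", "a moderate problem"]
codes = [SEVERITY_LEVELS.index(s) for s in severity]
print(round(spearman(totals, codes), 3))
```

Because the severity rating is ordinal with many ties, a rank-based coefficient with tie handling is the natural choice here; the Pearson helper alone covers the correlations between questionnaire total scores.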

Sensitivity
To assess change sensitivity, we used the PGI-C to identify whether patients changed over time. Because of the small sample sizes in some PGI-C categories, the data were pooled into the categories 'improved' (PGI-C values 1-3), 'unchanged' (PGI-C value 4), and 'worsened' (PGI-C values 5-7) to yield a sufficient number of cases in each category. To assess the magnitude of the difference in scores for patients who improved, remained unchanged, or worsened, we calculated standardized response means (SRMs) by dividing the mean score change by the standard deviation (SD) of the change [31]. We compared the SRMs against Cohen's rough rule of thumb for interpreting the magnitude of the mean differences of each questionnaire, which suggests that a value of 0.20 represents a small change, 0.50 a moderate change, and >0.80 a large change [31].
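The pooling and the SRM calculation described above can be sketched as follows. The sign convention (baseline minus final, so that a falling score counts as positive change), the function names, and the example records are assumptions for illustration only.

```python
from statistics import mean, stdev


def pool_pgic(value):
    """Pool the seven PGI-C response options into three categories."""
    if 1 <= value <= 3:
        return "improved"
    if value == 4:
        return "unchanged"
    if 5 <= value <= 7:
        return "worsened"
    raise ValueError(f"PGI-C value out of range: {value}")


def srm_by_group(records):
    """records: iterable of (baseline_score, final_score, pgic_value).

    Returns {category: SRM}, where SRM = mean(change) / SD(change).
    Assumed convention: change = baseline - final.
    """
    changes = {}
    for baseline, final, pgic in records:
        changes.setdefault(pool_pgic(pgic), []).append(baseline - final)
    return {
        group: mean(c) / stdev(c)
        for group, c in changes.items()
        if len(c) >= 2 and stdev(c) > 0  # SD needs at least 2 varying values
    }


# Hypothetical patients: (baseline score, final score, PGI-C value).
records = [(50, 40, 2), (60, 40, 1), (40, 40, 4),
           (42, 40, 4), (30, 40, 6), (30, 50, 7)]
for group, srm in srm_by_group(records).items():
    print(group, round(srm, 2))
```

Under this convention the SRM of a worsened group is negative; comparing its magnitude with Cohen's thresholds requires taking the absolute value.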

Patient characteristics
Out of n=1318 patients, 1149 were from Germany, 38 from Belgium, 98 from Brazil, and 33 from Argentina. Patients were aged between 10 and 89 years (median 53.2 years, interquartile range [IQR] 44.7 to 62.3 years). Tinnitus duration was between 1 month and 52 years (median 5.2 years, IQR 1.6 to 12.2 years). Baseline characteristics and other tinnitus-related information are presented in Table 3. To assess whether the patients' heterogeneity in age and tinnitus duration influences tinnitus severity and thus potentially confounds the results, we analyzed the correlations between both variables and all questionnaires. The correlation coefficients according to Spearman's rho were very low (ranging between −0.03 and 0.16); hence, in our sample, neither age nor tinnitus duration was appreciably associated with tinnitus severity.

Reliability
We calculated the internal consistencies of all questionnaires according to the different languages as well as the item-total correlation coefficients. Cronbach's alpha varied between 0.79 and 0.96, indicating a high internal consistency for all questionnaires in all languages (Table 4).
The corrected item-total correlations for the THI items varied widely from −0.19 to 0.85. Question no. 24 seemed to have a low (<0.3) overall correlation; however, some questions only had a low correlation in specific languages, for instance, questions nos. 8, 11, and 19 in Dutch, 2 and 8 in Portuguese, and 9, 13, and 17 in Spanish.
For the TBF-12, the corrected item-total correlations mainly ranged between 0.4 and 0.6, and only questions nos. 5, 7, and 8 showed correlations of <0.3 for the Spanish version, which might be due to the small sample size.
The corrected item-total correlations for the Tinnitus Questionnaire for baseline and last visit had a narrow range from 0.344 to 0.695, and none of the questions showed a correlation below 0.3.
Summaries of all corrected item-total correlation coefficients for all questionnaires can be found in Additional file 1, available in the online version of the journal.

Validity
Correlations between tinnitus-related questionnaires were very high (around 0.8) except for the Spanish version; THI vs. TBF12 at last visit was r=0.43, which was not significant because of the small sample size. All correlations of all questionnaires for each language are summarized in Table 5.
We evaluated the construct validity of the questionnaires by examining their respective correlations and mean values with the subjective ratings on tinnitus severity. Tables 6 and 7 summarize the THI, TBF12, TQ, Mini-TQ, and TQ-12 scores according to the categories of patient-perceived severity (not a problem, a small problem, a moderate problem, a big problem, a very big problem) at baseline and final visit. The THI correlated highly with the reported degree of severity in an interval between 0.57 and 0.70 for each language. The two outliers with r=0.92 and r=0.21 can be explained by the small number of patients in some categories of the tinnitus severity scale. The TBF12 is also correlated with tinnitus severity (0.45 ≤ r ≤ 0.62) and also showed two outliers because of small sample sizes. The TQ as well as the sub scores Mini-TQ and TQ-12 correlated highly between 0.65 and 0.68.

Cross-validation of THI and TBF-12
Because the TBF-12 consists of 12 items of the THI with a slightly different wording of some of the questions as well as of the answer options (TBF-12: never ─ sometimes ─ often vs. THI: yes ─ sometimes ─ no), we compared the given answers of both questionnaires. The calculated correlations over all languages of r=0.49 to r=0.85 were lower than expected given that the TBF-12 had been developed as a short version of the THI (Table 8). The cross tabulations showed that many patients (up to 50% for some questions) who had answered a question with "never" or "often" in the TBF-12 used "sometimes" for the same question in the THI and vice versa. A minority of patients (9 to 40) even gave opposing answers for the corresponding questions of the two questionnaires. The cross table for THI question no. 23 ("Do you feel that you can no longer cope with your tinnitus?") and the corresponding TBF-12 question no. 12 ("Do you feel that you can't cope with your tinnitus?") is shown as an example in Table 9. An analysis of those patients who gave opposing answers did not show any systematic error patterns, such as giving the same answers (e.g. only "yes") throughout the questionnaire or confusing "yes" and "no" (data not shown).
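A cross tabulation of this kind, for one pair of corresponding items, can be sketched as below. The reading of "opposing answers" as the pairs yes/never and no/often, the function names, and the example responses are assumptions made for illustration.

```python
from collections import Counter

THI_OPTIONS = ["yes", "sometimes", "no"]
TBF12_OPTIONS = ["never", "sometimes", "often"]


def cross_tabulate(thi_answers, tbf_answers):
    """Return a 3x3 table: rows follow THI_OPTIONS, columns TBF12_OPTIONS."""
    counts = Counter(zip(thi_answers, tbf_answers))
    return [[counts[(t, b)] for b in TBF12_OPTIONS] for t in THI_OPTIONS]


def count_opposing(thi_answers, tbf_answers):
    """Count patients whose two answers contradict each other
    (assumed definition: THI 'yes' with TBF-12 'never', or 'no' with 'often')."""
    opposing = {("yes", "never"), ("no", "often")}
    return sum(pair in opposing for pair in zip(thi_answers, tbf_answers))


# Hypothetical answers of six patients to one THI item and its TBF-12 counterpart.
thi = ["yes", "yes", "sometimes", "no", "no", "yes"]
tbf = ["often", "sometimes", "sometimes", "never", "often", "never"]

print("THI \\ TBF-12", TBF12_OPTIONS)
for option, row in zip(THI_OPTIONS, cross_tabulate(thi, tbf)):
    print(option, row)
print("opposing answers:", count_opposing(thi, tbf))
```

With consistent answering, the counts would concentrate on the anti-diagonal (yes/often, sometimes/sometimes, no/never), because the two response scales run in opposite directions.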

Sensitivity
The SRMs of patients whose tinnitus complaints improved according to the PGI-C ranged between 0.80 and 1.09 for the THI and between 0.59 and 1.00 for the TBF-12. For the German versions of the TQ, Mini-TQ, and TQ-12, the SRMs were 1.04, 0.95, and 0.94, respectively. Patients rating themselves as unchanged showed slightly positive mean score changes, and their SRMs were low, ranging between 0.06 and 0.26, except for the Dutch and Portuguese versions of the THI with SRMs of 0.45 and 1.05, respectively. The SRMs of patients who reported worsening of tinnitus complaints could only be calculated for the German versions of the questionnaires because of the small sample sizes of the Dutch, Portuguese, and Spanish versions. In this category, mean changes were negative, and the SRMs varied between 0.29 and 0.44. All SRMs are summarized in Table 10.

Discussion
Tinnitus questionnaires such as the THI have been translated into several different languages, including German, Dutch, Spanish, Portuguese, Turkish, Danish, and Chinese. Thus, a psychometric validation of these questionnaires for each language adaptation is important, as is a comparison of the results and an examination of any differences that may exist, either in the translation process or in cultural matters. In the present study, we compared the German, Dutch, Spanish, and Portuguese versions of the THI and the TBF-12 (a modified subset of the THI) and, additionally, the German version of the TQ with its subsets Mini-TQ and TQ-12. Data for the analyses were derived from the TRI Database, which stands for a highly standardized and controlled data collection of international clinical studies on tinnitus.
In the first step, we assessed the reliability of the questionnaires throughout all languages and then for each language separately. A well-known measure of internal consistency is Cronbach's alpha, which indicates the degree to which all test items measure the same construct. As a rough rule of thumb, values greater than 0.7 indicate good internal consistency and values greater than 0.9 very high internal consistency [32]. In the present study, we found high values between 0.79 and 0.95, indicating very good internal consistency throughout the languages for all questionnaires. Interestingly, the difference in Cronbach's alpha between the longer and shorter questionnaires is only about 0.07 or 0.08 points. This increase in alpha is not substantial and simply reflects an artificial inflation due to the higher number of items [33]. Thus, the longer versions are not more reliable than the shorter ones.
To check if each item is consistent with the average direction of the other items, an item-total correlation test was done for each questionnaire in each language. If the correlation (calculated with Pearson's correlation coefficient) between a single item and the total score without this item is low (values less than 0.3) [34], then the item does not measure the same construct as the others. For the THI questionnaire, item no. 2 showed relatively low item-total correlations at baseline, which is in accordance with previous findings reported by Newman et al. [6]. Interestingly, item-total correlations were considerably higher at final visit. Unexpectedly, item 24 showed low item-total correlations (<0.30) in the whole sample and also in the German and Dutch subsamples. Nevertheless, due to the high content validity of the item ("tinnitus gets worse under stress"), it should be retained (cf. Newman et al.). The lowest item-total correlations at baseline and at final visit (<0.2 and <−0.2) occurred with respect to item no. 19, but only in Dutch. At this stage it is unclear whether there is a problem with the translation or whether the finding is simply an artifact due to the relatively small sample size. These data indicate that worsening under stress (item 24) and hearing difficulties due to tinnitus loudness (item 2) were only weakly correlated with tinnitus handicap. The items of the TBF-12 as well as of the TQ generally showed good item-total correlations and thus seemed to measure the same construct.
Since all questionnaires had been designed to measure the strength of patients' tinnitus complaints, we calculated the correlations between the questionnaires. A low correlation would indicate differences between the constructs that the questionnaires measure. Although we found high correlations among all questionnaires, the correlation between THI and TBF-12 was surprisingly low, given that the TBF-12 is a modified subset of the THI.
To explore potential reasons for this relatively low correlation, we correlated all items of the TBF-12 with the corresponding items of the THI. Since the questions had the same content in corresponding items, high correlations (>0.8) were expected for this analysis. Instead, correlations ranged between 0.5 and 0.7 and some were even lower than 0.5 (see Table 8). The most likely explanation for these low correlations is the differences in the answer scales of the TBF-12 and the THI (TBF-12: "never ─ sometimes ─ often"; THI: "yes ─ sometimes ─ no") and the slightly different wording of some of the questions. Table 9 illustrates in detail how much the response to the same question varied in the two questionnaires. The different order of the answer options cannot entirely explain the variability, since no systematic error pattern could be found for patients giving opposing answers. Since all patients had to complete all questionnaires at each assessment, the inconsistencies could also be due to lack of concentration or inattention. Nevertheless, our findings indicate that even small changes in the structure of questionnaire items may have substantial consequences. Moreover, our results indicate relevant differences between the answers to the 12 items of the TBF-12 and the corresponding items of the THI. Therefore, the TBF-12 should be regarded as a separate questionnaire rather than as a short version of the THI.
Another important aspect of the validity of questionnaires is whether the total score distinguishes between known groups of patients, e.g. different grades of tinnitus severity. In this context, we could show a clear linear relation between a global estimation of tinnitus severity (the answer to the question: "How much of a problem is your tinnitus?") and the corresponding scores for all questionnaires in all languages except for the Spanish version. The low correlation in the Spanish version was related to the very small sample size in this group.
Although most tinnitus questionnaires were designed to measure tinnitus complaints at a given time rather than the change of tinnitus over time, they are often used for detecting treatment-related improvements and also serve as primary endpoints in clinical trials. First analyses of change sensitivity can be found for the THI [16], the TBF-12 [15], and the TQ [17], but not for the Mini-TQ and the TQ-12. For use as an outcome parameter in clinical trials, an important point in the analysis of questionnaires is change sensitivity, indicating the potential of a questionnaire to map the change of tinnitus over time. We could show that all questionnaires have very good sensitivity when tinnitus complaints decrease or increase. For unchanged tinnitus complaints, the scores at final visit and at baseline should ideally be the same, resulting in a very low SRM. In our analysis, the Dutch and the Portuguese versions of the THI had relatively high SRM values for unchanged tinnitus complaints, whereas all other questionnaires proved to be robust. Since the sample sizes for both the Dutch and the Portuguese versions are quite low in the unchanged group (n<20), the observed effects may be idiosyncratic. Further studies are needed to confirm or revise the calculated SRMs before further conclusions can be drawn. Surprisingly, the subsets Mini-TQ and TQ-12 of the TQ showed the best results despite the small number of items, indicating their usefulness as outcome measures. However, all our conclusions with respect to the Mini-TQ and the TQ-12 are preliminary since they were derived from an analysis of extracted items of the TQ. We cannot exclude the possibility that completion of the short versions as stand-alone questionnaires may yield different results.

Conclusion
The present analysis showed that the THI performs comparably well across several countries and languages. The TBF-12 performed equally well with respect to its summary score, although surprisingly low correlations were found between several individual items assessing the same content in both questionnaires. Therefore, caution is warranted when the TBF-12 is regarded as a short form of the THI. The TQ, which was only investigated in German, also showed satisfactory psychometric results, which were equally good in the long form and the short forms. In summary, all investigated instruments had high internal consistency, high convergent validity, and good change sensitivity, and discriminated between patients with different tinnitus severity in an unselected large multinational clinical sample. Thus, the questionnaires appear appropriate to evaluate the effects of tinnitus treatments in a cross-cultural context.

Additional file
Additional file 1: Corrected item/total correlation for THI. Corrected item/total correlation for TBF12. Corrected item/total correlation for TQ.

Competing interests
The authors declare that they have no competing interests.