Confirmatory factor analysis of the thyroid-related quality of life questionnaire ThyPRO


Background and aim

Thyroid diseases are prevalent and chronic. With treatment, quality of life is restored in most, but not all patients. Construct validity of the thyroid-related quality of life questionnaire, ThyPRO, has been established by multi-trait scaling, but not evaluated with more elaborate methods. The purpose of the present study was to evaluate dimensionality of the ThyPRO scales and to attempt to understand possible item misfit through structural equation modeling for categorical data.


The current 85-item version of ThyPRO consists of 13 scales, covering domains of physical (4 scales) and mental (2 scales) symptoms, function and well-being (3 scales) and participation/social function (4 scales). The data were collected from a cross-sectional sample of 907 thyroid patients. One-factor confirmatory models were fitted to each scale, and evaluated by model fit statistics (comparative fit index >0.95, root mean square error of approximation <0.08), magnitude of factor loadings, model residual correlations and modification indices (MI). Indications of multi-dimensionality were tested in bi-factor models. Possible item misfit was evaluated in a combined, investigational model.


Each ThyPRO scale was adequately represented by a unidimensional model after minor revisions. Eleven items were identified in the unidimensional models as potentially misfitting and were investigated further by multidimensional modeling.


Elaborate psychometric modeling supported the construct validity of the ThyPRO. However, 11 potentially misfitting items and 18 items with local dependence to other items are candidates for removal in future item reduction processes.


Thyroid diseases are diseases of the thyroid gland, an endocrine (i.e., hormone-producing) gland located in the front of the neck. Thyroid diseases are prevalent, affecting approximately 15% of individuals of all ages, with a 4 to 1 women/men ratio [1],[2]. The main disease groups comprise non-toxic goiter (enlargement of the gland), hyperthyroidism (either as toxic nodular goiter or as Graves' disease, with or without Graves' orbitopathy [GO; inflammation and protrusion of the eyes]) and autoimmune hypothyroidism. The symptomatology is often diffuse, sharing features with many other diseases (fatigue, palpitations, dry skin, depression, uneasiness, etc.) as well as with the non-pathological fluctuations of well-being and function in life. Therefore, thyroid diseases may go undiagnosed for many years in some patients, and at the time of diagnosis most patients have reduced quality of life [3],[4]. The diseases are chronic, but relevant treatment is available. In general, though, the treatment effect for thyroid diseases lags by up to several months, and population-based studies document excess morbidity and mortality, even among adequately treated patients [5],[6]. Eventually, the quality of life of the majority of patients is restored [4],[7]. However, studies indicate that a substantial minority do not regain their premorbid level of well-being and function [8],[9]. Valid and reliable measures of health-related quality of life are necessary in order to describe the patients' experiences of the diseases adequately and for intervention studies attempting to improve treatment efficacy. Therefore, there has been a growing interest within thyroidology in measuring patient-reported outcomes (PRO), leading to the development of a comprehensive PRO measuring thyroid-related quality of life, the ThyPRO. Because individual thyroid diseases often co-exist (e.g., goiter and hyperthyroidism) and treatment of one disease entity may lead to another (e.g., removal of a goiter leading to hypothyroidism), the ThyPRO was developed as a comprehensive thyroid-related measure, aimed at any benign thyroid disease.

The content of the ThyPRO addresses the impact of all benign thyroid diseases [10],[11]. The validation of the current version has included evaluation of clinical validity in terms of known-groups comparisons and reliability in terms of internal consistency and test-retest reliability [12],[13]. Further, the ThyPRO's dimensionality or construct validity has been established by multi-trait scaling [12]. However, within such a framework, it is not possible to test the overall fit of a model [14], nor can misfit of items be modeled specifically.

The growing interest in applying the ThyPRO in clinical studies [7],[15],[16] and even in daily clinical practice has motivated efforts to develop shorter versions of the instrument as well as versions applicable to ecological momentary assessments. Development of such versions can be informed by the application of item response theory (IRT) models, which also provide a more detailed description of measurement precision and can provide data for interpretability of the ThyPRO. However, IRT models require additional, more detailed examinations of the dimensionality of the ThyPRO scales.

Structural equation models provide a latent variable modeling framework that is useful in detailed examinations of dimensionality. The measurement part of structural equation models can be used to assess the dimensionality of measured variables such as questionnaire items, using confirmatory factor analysis (CFA) for categorical data. Structural equation modeling can also test relationships among modeled latent variables (i.e., structural part of the models) [17]-[21]. We will exploit the former in the detailed analyses of the dimensionality of the ThyPRO scales, including overall test of model fit. We will use the structural part of the modeling approach when attempting to understand, through investigative modeling, any possible item misfit identified during the CFA step.

Thus, the purpose of the present study was to evaluate dimensionality of the ThyPRO scales in a sample of patients with a broad spectrum of thyroid diseases and to attempt to understand possible item misfit through investigative structural equation modeling.


The ThyPRO questionnaire

The current 85-item version of ThyPRO measures quality of life in 13 scales, covering physical (4 scales) and mental (2 scales) symptoms, function and well-being (3 scales) and participation/social function (4 scales), plus a single item about overall quality of life. Content and scale structure were derived from a literature search [8] and from expert and patient interviews [10], and the development was conducted within a classical health-related quality of life theoretical framework [22]-[25]. Items are rated on a five-point scale from 0 = not at all to 4 = very much, with a reference period of 4 weeks. The 13 scales are scored by reversing positively worded items, rescaling item scores to range from 0 (best QoL; absence of symptoms) to 100 (worst QoL; maximum level of symptoms), and averaging across the items in the scale, i.e., standard summation and linear transformation.
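The scoring rule described above can be illustrated with a minimal Python sketch. The function name, the item identifiers and the requirement that every item is answered are assumptions made for illustration; this is not the official ThyPRO scoring algorithm, and handling of missing responses is not covered.

```python
from statistics import mean

MAX_ITEM_SCORE = 4  # items are rated 0 ("not at all") to 4 ("very much")


def score_scale(responses, positively_worded=()):
    """Return a 0-100 scale score, where higher means worse quality of life.

    responses: dict mapping a (hypothetical) item id to its raw 0-4 rating
    positively_worded: item ids whose ratings are reversed before scoring
    """
    if not responses:
        raise ValueError("at least one item response is required")
    rescaled = []
    for item, raw in responses.items():
        if not 0 <= raw <= MAX_ITEM_SCORE:
            raise ValueError(f"rating out of range for item {item}: {raw}")
        if item in positively_worded:
            raw = MAX_ITEM_SCORE - raw  # reverse so that high = worse
        # linear transformation of the 0-4 rating to the 0-100 metric
        rescaled.append(100 * raw / MAX_ITEM_SCORE)
    return mean(rescaled)


# Hypothetical three-item scale in which item "c" is positively worded.
example = score_scale({"a": 2, "b": 4, "c": 4}, positively_worded={"c"})
```

In the example, the reversed item "c" contributes 0, so the scale score is the mean of 50, 100 and 0.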

Patient population

The patient population comprised a cross-sectional sample of 907 patients attending two university hospital endocrine outpatient clinics during 2007 (Table 1; for further details, see reference [13]). At one center, all consecutive patients newly referred to the clinic were invited to participate; at the other center, all patients attending the clinic during a specified period of time were invited, regardless of their referral time. Thus, patients from the former were mainly newly diagnosed, whereas most patients from the latter were already receiving treatment. All common benign thyroid diagnoses were represented, as were various stages of disease and treatment. Clinical description of the patients included physical examination, ultrasonographic imaging and biochemical testing. The overall response rate was 69%. The project was approved by the local ethical committee (KF01 2006–1579) and the Danish Data Protection Agency and was registered at ClinicalTrials.gov (NCT00150033).

Table 1 Characteristics of the N = 907 patients

Statistical analyses

Prior to the statistical analyses described below, a content analysis of each scale was performed to identify items that might be less associated with the remaining items in the same scale, and item pairs that might remain closely related to one another after the scale factor had been accounted for (local item dependence). This was done to provide content-based guidance for model fitting.

Then a one-factor confirmatory model for ordinal data was fitted to each individual scale [26],[27], using Mplus (version 7.11) [28]. The ordinal items were regressed on the scale factor by probit regressions estimated with a robust weighted least squares estimator with mean and variance adjustment (WLSMV) [28],[29]. Appropriateness of the initial one-factor model for each scale was assessed by: 1) overall goodness-of-fit statistics, including the comparative fit index (CFI) and the root mean square error of approximation (RMSEA), where CFI > 0.95 and RMSEA < 0.08 were regarded as appropriate fit [30]-[34]; 2) magnitude of factor loadings; 3) model residual correlations (RC); and 4) modification indices (MI) [28],[35]. For the latter three criteria, magnitudes were evaluated relative to the other items in the scale and in an integrative manner, considering all three at once, so no strict thresholds were applied to any single criterion. In general, though, modification indices > 100 and absolute residual correlations > 0.10 were taken as indications of lack of fit (local dependence or lack of convergent validity). Smaller values could also prompt model revision if several indices pointed in the same direction, e.g., if an item had a modification index of 40 for a specific residual correlation (a "WITH" statement in Mplus) and also had model residual correlations with several items. Revisions to improve model fit were based on both confirmatory factor modeling and content analysis, and included specification of residual correlations among items, omission of poorly associated items from the models, and specification of sub-factors (for example, among positively worded items in a scale). For scales where secondary factors seemed plausible, a bifactor model was fitted to evaluate the dominance of the primary factor when secondary factors were modeled. A bifactor model specifies that each item is regressed on both a general and a group (secondary) factor, and that the general and group factors are uncorrelated with each other [34],[36]-[39]. The magnitudes of the loadings on the general and group factors were compared. The two-item scale on impaired sex life was not examined in this step, since a separate factor analysis of a two-item scale is not useful.
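As a notational sketch of the two model types described above (the symbols are introduced here for illustration and are not taken from the original analysis), let $y^{*}_{ij}$ denote the latent continuous response underlying the observed rating $y_{ij}$ of patient $i$ on item $j$, with thresholds $\tau_{j,1} < \dots < \tau_{j,4}$ for the five response categories:

```latex
% One-factor model for ordinal items (latent response formulation)
y^{*}_{ij} = \lambda_{j}\,\eta_{i} + \varepsilon_{ij},
\qquad
y_{ij} = c \iff \tau_{j,c} < y^{*}_{ij} \le \tau_{j,c+1},
\quad c = 0,\dots,4,
\quad \tau_{j,0} = -\infty,\ \tau_{j,5} = +\infty

% Local dependence between items j and k: a freed residual correlation
\operatorname{Corr}(\varepsilon_{ij}, \varepsilon_{ik}) \neq 0

% Bifactor model: general factor G plus an uncorrelated group factor S
y^{*}_{ij} = \lambda^{G}_{j}\,\eta^{G}_{i} + \lambda^{S}_{j}\,\eta^{S}_{i} + \varepsilon_{ij},
\qquad
\operatorname{Cov}\!\left(\eta^{G}_{i}, \eta^{S}_{i}\right) = 0
```

With normally distributed residuals, the first line corresponds to a probit regression of each item on the scale factor. In the bifactor model, $\lambda^{S}_{j}$ is fixed to zero for items outside the group, and dominance of the primary factor corresponds to $\lambda^{G}_{j}$ being large relative to $\lambda^{S}_{j}$.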

In an attempt to understand any possible item misfit identified through the individual scale analyses, hypotheses that could explain the misfit were sought. These hypotheses were evaluated in a combined, investigational multidimensional model, in which the individual scale factors were allowed to correlate freely. Items were also allowed to cross-load on multiple scale factors where this was needed to explore the source of the misfit. For example, if an item in a physical symptoms scale, e.g., "Palpitations", had a low loading on its own factor, it could be hypothesized that this was due to palpitations being influenced by mental health, e.g., as part of anxiety. A cross-loading of this item on the mental symptoms scales would then be specified and evaluated in the combined model.

In order to examine the stability of the model across various estimation techniques, the overall final model was compared with graded response multidimensional IRT models [40], fitted with the Mplus program [28]. For computational reasons, a 13-dimensional IRT model could not be estimated, so the model was broken down to four separate models, each containing scales with cross-loadings across scales. Stability was examined by comparing the estimated factor scores for each patient from the SEM vs. the IRT-model using intra-class correlations.
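The agreement analysis described above can be sketched in Python as follows. The text does not state which form of the intra-class correlation was used; the two-way, absolute-agreement, single-measure form, often written ICC(2,1), is assumed here purely for illustration, and the function name is hypothetical.

```python
def icc_absolute_agreement(scores_a, scores_b):
    """ICC(2,1) for two sets of scores on the same patients.

    scores_a, scores_b: equal-length sequences, e.g. factor scores from
    two estimation approaches. Assumes the scores are not all identical.
    """
    n, k = len(scores_a), 2  # n patients, k = 2 scoring methods
    if n != len(scores_b) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    rows = list(zip(scores_a, scores_b))
    grand = (sum(scores_a) + sum(scores_b)) / (n * k)
    row_means = [(a + b) / k for a, b in rows]
    col_means = [sum(scores_a) / n, sum(scores_b) / n]

    # Two-way ANOVA sums of squares: patients, methods, residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Because this form measures absolute agreement, a constant shift between the two sets of scores lowers the coefficient even when their ranking is identical. A consistency-type coefficient would ignore such a shift.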


Fitting unidimensional models to each individual ThyPRO scale

Table 2 shows the results of the content analyses and the confirmatory factor analyses of the ThyPRO scales in their current version. In general, loadings were high in all scales and CFI was also high for the vast majority of scales. In contrast, for most scales, RMSEA was not below the 0.08 threshold for appropriate fit. Model parameters indicative of item misfit are presented to the right in Table 2. The consequential remodeling resulted in the revised scales presented in Figure 1 and the remodeling as well as the overall goodness-of-fit statistics are described separately for each scale in the following text.
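The two overall fit indices can be computed from chi-square statistics with their textbook formulas, as in the minimal Python sketch below. This is an illustration under stated assumptions: the function names are hypothetical, the numbers in the example are invented, and software-reported values for WLSMV estimation rest on adjusted chi-square statistics and conventions (such as N versus N − 1) that may differ from this sketch.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (N - 1 convention)."""
    if df <= 0:
        return 0.0  # just-identified model: no misfit can be estimated
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index relative to the baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, d_model)
    if d_baseline == 0.0:
        return 1.0
    return 1.0 - d_model / d_baseline


def appropriate_fit(cfi_value, rmsea_value):
    """Thresholds used in this study: CFI > 0.95 and RMSEA < 0.08."""
    return cfi_value > 0.95 and rmsea_value < 0.08


# Invented values for illustration only (not results from this study):
# a strong baseline chi-square can yield a high CFI while RMSEA stays high.
example_cfi = cfi(100.0, 10, 5000.0, 15)
example_rmsea = rmsea(100.0, 10, 907)
example_ok = appropriate_fit(example_cfi, example_rmsea)
```

The invented example shows how a model can satisfy the CFI criterion while failing the RMSEA criterion, which is the pattern reported for most of the initial models.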

Table 2 Content analysis and confirmatory factor analyses of the individual ThyPRO scales
Figure 1 Parameter estimates of the unidimensional confirmatory factor analyses of the revised ThyPRO scales. Overall goodness-of-fit statistics of the models are provided in the text. Grayed-out items were omitted during model revision. The two-item Impaired Sex Life scale was not estimated.

Goiter Symptoms

Three items were problematic (2b Visible swelling in front of neck, 2e Throat pain felt in ears and 2l Hoarseness), with relatively low loadings and indications of local dependence with other items. Two of these items had been identified prior to the modeling as potentially less related to the concept. Two instances of local dependence among other items were identified (2c Pressure in throat vs. 2g Need to clear throat often, and 2h Discomfort swallowing vs. 2i Difficulty swallowing; Table 2). When the three items were omitted and the local dependencies were modeled, an appropriately fitting unidimensional model was reached (Figure 1, CFI = 0.99, RMSEA(90%CI) = 0.08(0.07-0.09)).

Hyperthyroid Symptoms

When the initial model was estimated, the modification index suggested local dependence for one pair of items (2n Increased sweating vs. 2q Sensitive to heat), and one item (2t Loose stools) had large negative residual correlations with other items. When the latter item was omitted and the local dependence was modeled, a unidimensional model obtained an appropriate fit to the data (Figure 1, CFI = 0.97, RMSEA(90%CI) = 0.06(0.05-0.08)).

Hypothyroid Symptoms

When the expected local dependence between the items concerning skin (2gg Dry skin vs. 2hh Itching skin) was modeled, an appropriate fit between an overall unidimensional model and the data was demonstrated for this scale (Figure 1, CFI = 1.0, RMSEA(90%CI) = 0.00(0.00-0.09)).

Eye Symptoms

With the specification of two local dependence pairs (2w Watery eyes vs. 2x Bags under eyes, and 2aa Pressure in eyes vs. 2cc Pain in eyes), an appropriate fit of a unidimensional model was found (Figure 1, CFI = 0.99, RMSEA(90%CI) = 0.06(0.04-0.07)).


Tiredness

Despite quite high factor loadings, overall goodness-of-fit was poor for this scale. To avoid floor problems, three items had been formulated positively for this scale. The positively worded items had high positive residual correlations and modification indices. A bi-factor model distinguishing positively from negatively worded items was therefore evaluated (Figure 2, Panel A). Although the positively worded items had high loadings on the positive factor (Vitality), their loadings on the general factor were higher. When the local dependence among positively worded items was modeled as residual correlations and the local dependence between 3a and 3b was also allowed for, the model had good fit (Figure 1, CFI = 1.0, RMSEA(90%CI) = 0.02(0.00-0.04)).

Figure 2 Bi-factor models for the Tiredness (Panel A) and the Emotional Susceptibility (Panel B) scales.

Cognitive Complaints

All items had high loadings in the initial model (Table 2). When two pairs of local dependence suggested by modification indices were specified (5a Problems remembering vs. 5d Been confused, and 5e Difficulty learning vs. 5f Difficulty concentrating), overall model fit was appropriate (Figure 1, CFI = 1.0, RMSEA(90%CI) = 0.07(0.05-0.09)).


Anxiety

According to the overall goodness-of-fit indices, the initial model did not obtain an appropriate fit to the data (Table 2). When the item identified as less related to the other items (6d Afraid being seriously ill) was excluded and two item pairs with local dependence were specified (6a Nervous vs. 6b Afraid or anxious, and 6e Uneasy vs. 6f Restless), appropriate fit was obtained (Figure 1, CFI = 1.0, RMSEA(90%CI) = 0.07(0.04-0.10)).


Depressivity

All items had high loadings (Table 2). However, an appropriate overall fit to the data was reached only after specification of two local dependence pairs (7e Crying easily vs. 7f Unhappy, and 7g Happy vs. 7i Self-confident) (Figure 1, CFI = 1.0, RMSEA(90%CI) = 0.07(0.05-0.09)).

Emotional Susceptibility

In contrast to most other concepts measured by ThyPRO, this scale measures a unique aspect of mental health identified through qualitative analysis of patient interviews. Thus, it is not classically described as a separate concept. It is, however, an important aspect according to the patients and a prominent feature particularly among patients with thyroid autoimmunity [10]. According to the overall fit indices, these items did not appropriately conform to a unidimensional model, despite high factor loadings (Table 2). Several items had high inter-item residual correlations, and an attempt was made to model them as a separate "Anger" sub-factor (Figure 2, Panel B). However, as shown in Figure 2, the sub-factor loadings were rather low. Four items had to be omitted in order to obtain appropriate fit between a unidimensional model and the data (Figure 1, CFI = 1.0, RMSEA(90%CI) = 0.08(0.05-0.11)). A local dependence (8c Easily stressed vs. 8i Felt in balance) was also modeled.

Impaired Social Life

Appropriate, albeit not good overall goodness-of-fit indices were found for the initial unidimensional model. Excluding the lowest-loading item (10d People lack understanding), which was also pre-specified as possibly less associated, resulted in a just-identified model, hence with perfect fit (Figure 1, CFI = 1.0 RMSEA(90%CI) = 0.00(0.00-0.00)).
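The perfect fit of a just-identified model follows from counting degrees of freedom. As a sketch, assuming a one-factor model for $p$ ordinal items with saturated thresholds, no residual correlations, and the factor variance fixed for identification, the $p(p-1)/2$ polychoric correlations are reproduced from $p$ free loadings:

```latex
\mathrm{df} \;=\; \frac{p(p-1)}{2} - p \;=\; \frac{p(p-3)}{2},
\qquad
p = 3 \;\Rightarrow\; \mathrm{df} = 0,
\qquad
p = 4 \;\Rightarrow\; \mathrm{df} = 2 .
```

With zero degrees of freedom the model reproduces the observed correlations exactly, so CFI = 1.0 and RMSEA = 0.00 hold by construction and carry no information about how well the model fits.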

Impaired Daily Life

With the specification of one local dependence (11d Difficulty getting around vs. 11e Everything takes longer), a unidimensional model fit the data appropriately (Figure 1, CFI = 1.0, RMSEA (90%CI) = 0.08(0.07-0.10)).

Cosmetic Complaints

The initial unidimensional model had almost appropriate goodness-of-fit indices (Table 2). When modeling one local dependence (13a Disease affect appearance vs. 13b Unsatisfied with appearance) and leaving out the very nonspecific item concerning feeling too fat (13g), a good fit between model and data was found (Figure 1, CFI = 1.0 RMSEA(90%CI) = 0.05(0.02-0.08)).

Investigative modeling of possible item misfit within one combined multidimensional model

This investigative model is presented in Table 3. The hypotheses concerning the reasons for misfit of the omitted items are presented in the second column of the table. In this model, the possible sub-factors tested in bifactor models (Figure 2) were specified as residual correlations among the involved items. The third column of Table 3 specifies how these hypotheses were modeled in the combined multidimensional model, where all the factors were evaluated simultaneously and were allowed to correlate freely. The results of this investigative modeling are described in the rightmost column of Table 3. Generally, items were more closely associated with their own scale in the multidimensional model (e.g., items 2e, 2t and 10e) than in the unidimensional model for each scale. For most items, the hypothesized explanations for the apparent misfit were confirmed. Thus, 2b Visible swelling on neck was indeed associated with Cosmetic Complaints (–0.23). Item 2l Hoarseness also loaded on the Hypothyroid Symptoms scale (0.22), 2t Loose stools was negatively associated with Hypothyroid Symptoms in particular (–0.55), and a negative association between 6d Afraid of being seriously ill and time since diagnosis was found. In contrast, no relationship between item 10e Other people lack understanding and the mental health scales was found. Item 13g Feeling too fat was associated with Hypothyroid Symptoms (–0.16), Anxiety (–0.22) and Depressivity (0.15), and had a low loading on its own factor (0.53).

Table 3 For each item which was omitted during the single-scale analyses, hypotheses regarding possible reasons for misfit were formulated, modeled and tested as specified

In analyses of concordance of results from SEM and the IRT-model, high intra-class correlation coefficients (0.94-0.99) were found for all 13 scales, when comparing factor scores derived by the SEM with IRT score estimates (Table 4).

Table 4 Comparison of individual factor-scores derived from the ordinal confirmatory factor analysis approach with the factor scores derived from the item response theory (IRT) approach


The purpose of the present study was to evaluate the dimensionality of the ThyPRO scales and to detect and understand potential item misfit. Since an established scale structure already exists for the ThyPRO, we used a combination of confirmatory factor analyses of the individual scales and a combined multidimensional model comprising all 13 ThyPRO scales. Where an individual scale showed misfit, we revised the model to achieve the best description of the data.

In general, items had high loadings on their own factors and the comparative fit indices were high, but for the majority of the scales, the root mean square error of approximation indicated that a simple unidimensional model did not fit the data sufficiently well. Based on prior expectations informed by content analyses, on modeling results (model inter-item correlations and model residual correlations) and on model modification indices, the models were adjusted in order to reduce the overall misfit. For all scales, an appropriate fit according to the overall goodness-of-fit indices could be reached. During this process, a total of 11 items were left out of the models and 18 residual correlations indicating local dependence were specified.

In most instances, the magnitude of the residual correlations representing local dependencies was small, and the loading on the relevant general factor was still high. Most of the residual correlations were among very similarly worded items. Such local dependencies are not problematic for the current scoring of the ThyPRO, but may lead researchers to overestimate the precision of the instrument, because locally dependent items provide less measurement precision than assumed by standard psychometric analyses [41]. Moreover, one item from each such pair would be a potential candidate for omission in future IRT modeling of the instrument and in the development of abbreviated versions of the ThyPRO.

However, such item reduction should be done with caution and should take clinical analyses and considerations into account.

Although positively worded items did tend to exhibit residual correlations, we found no consistent evidence of a method factor among the positively worded items. Similar studies with other outcome measures have previously found substantial influence of the value of the wording [36],[42]-[44], whereas other studies either did not identify such an effect [45] or the identified effect had only minor influence on the results regarding the substantive factor [46].

We attempted to model potential item misfit identified during the dimensionality analyses of the existing ThyPRO scales. This was done within a model including all scales, which were allowed to correlate, so that cross-loadings of items could be examined and so that we could evaluate whether misfit identified during individual scale analyses was due to interrelation with other factors. In doing so, the hypothesized reason for misfit was confirmed in five of seven items. Item 2b, about visibility of the goiter, cross-loaded on Cosmetic Complaints. Item 2t, Loose stools, had a large negative loading on Hypothyroid Symptoms, whereas 2l, Hoarseness, had a positive loading on that scale. Both constipation and hoarseness are indeed salient and classical features of hypothyroidism [47]. The rather non-specific item 13g, Feeling too fat, which is a common complaint among hypothyroid patients and among hyperthyroid patients after treatment, had cross-loadings on several other scales and a low loading on its own factor, also when modeled multidimensionally. Thus, these four items are very strong candidates for item reduction when developing abbreviated and focused versions of the scales or when fitting models where unidimensionality is a strong assumption, for example unidimensional IRT models.

A unique "duration of disease" effect was observed for one item. Item 6d, Afraid of being seriously ill, was negatively associated with time since diagnosis, indicating that the responses to this item reflect a relevant concern early in the disease course, for instance of a goiter being malignant, a concern that wanes as the diagnosis becomes more firmly established and malignancy is thus ruled out. It thus measures something different from the other items in the scale, which are more classical indicators of an anxious state.

As an analysis of the robustness and appropriateness of the ordinal confirmatory WLSMV factor analysis, an alternative multidimensional IRT-based analysis was performed. Individual factor scores derived from each of these approaches were very similar, as illustrated by very high intra-class correlation coefficients. This corroborates the current simple scoring approach and the results of the present analyses.

The use of theoretically driven analyses within a clinically well-described and relatively (for thyroid diseases) large sample was a strength of this study. However, the analyses were carried out in one sample and should ideally be confirmed in a new independent sample. Furthermore, although the present sample comprised patients in all stages of disease and treatment, stability of the factor structure across time could not be evaluated, since the data did not contain longitudinal measurements.

In conclusion, each of the ThyPRO scales could be appropriately represented by a unidimensional model after minor revisions. Eleven items were identified in the unidimensional models as potentially misfitting and were examined further by multidimensional modeling. Thus, overall, the previous initial examinations of the construct validity of the scales [12] were corroborated using a more elaborate technique. Further, advanced psychometric modeling such as IRT, with strong assumptions about dimensionality, can be applied to the reduced scales. Finally, the locally dependent items identified here are strong candidates for removal in future item reduction processes.


  1. Vanderpump MP: The epidemiology of thyroid disease. Br Med Bull 2011, 99: 39–51. 10.1093/bmb/ldr030

  2. Canaris GJ, Manowitz NR, Mayor G, Ridgway EC: The Colorado thyroid disease prevalence study. Arch Intern Med 2000, 160: 526–534. 10.1001/archinte.160.4.526

  3. Bianchi GP, Zaccheroni V, Solaroli E, Vescini F, Cerutti R, Zoli M, Marchesini G: Health-related quality of life in patients with thyroid disorders. Qual Life Res 2004, 13: 45–54. 10.1023/B:QURE.0000015315.35184.66

  4. Elberling TV, Rasmussen AK, Feldt-Rasmussen U, Hording M, Perrild H, Waldemar G: Impaired health-related quality of life in Graves' disease: a prospective study. Eur J Endocrinol 2004, 151: 549–555. 10.1530/eje.0.1510549

  5. Brandt F, Almind D, Christensen K, Green A, Brix TH, Hegedüs L: Excess mortality in hyperthyroidism: the influence of preexisting comorbidity and genetic confounding: a danish nationwide register-based cohort study of twins and singletons. J Clin Endocrinol Metab 2012, 97: 4123–4129. 10.1210/jc.2012-2268

  6. Thvilum M, Brandt F, Almind D, Christensen K, Hegedüs L, Brix TH: Excess mortality in patients diagnosed with hypothyroidism: a nationwide cohort study of singletons and twins. J Clin Endocrinol Metab 2013, 98: 1069–1075. 10.1210/jc.2012-3375

  7. Mishra A, Sabaretnam M, Chand G, Agarwal G, Agarwal A, Verma AK, Mishra SK: Quality of Life (QoL) in patients with benign thyroid goiters (Pre- and Post-Thyroidectomy): a prospective study. World J Surg 2013, 37: 2322–2329. 10.1007/s00268-013-2133-3

  8. Watt T, Groenvold M, Rasmussen AK, Bonnema SJ, Hegedüs L, Bjorner JB, Feldt-Rasmussen U: Quality of life in patients with benign thyroid disorders: a review. Eur J Endocrinol 2006, 154: 501–510. 10.1530/eje.1.02124

  9. Fahrenfort JJ, Wilterdink AM, van der Veen EA: Long-term residual complaints and psychosocial sequelae after remission of hyperthyroidism. Psychoneuroendocrinology 2000, 25: 201–211. 10.1016/S0306-4530(99)00050-5

  10. Watt T, Hegedüs L, Rasmussen AK, Groenvold M, Bonnema SJ, Bjorner JB, Feldt-Rasmussen U: Which domains of thyroid-related quality of life are most relevant? Patients and clinicians provide complementary perspectives. Thyroid 2007, 17: 647–654. 10.1089/thy.2007.0069

  11. Watt T, Rasmussen AK, Groenvold M, Bjorner JB, Watt SH, Bonnema SJ, Hegedüs L, Feldt-Rasmussen U: Improving a newly developed patient-reported outcome for thyroid patients, using cognitive interviewing. Qual Life Res 2008, 17: 1009–1017. 10.1007/s11136-008-9364-z

  12. Watt T, Bjorner JB, Groenvold M, Rasmussen AK, Bonnema SJ, Hegedüs L, Feldt-Rasmussen U: Establishing construct validity for the thyroid-specific patient reported outcome measure (ThyPRO): an initial examination. Qual Life Res 2009, 18: 483–496. 10.1007/s11136-009-9460-8

  13. Watt T, Hegedüs L, Groenvold M, Bjorner JB, Rasmussen AK, Bonnema SJ, Feldt-Rasmussen U: Validity and reliability of the novel thyroid-specific quality of life questionnaire, ThyPRO. Eur J Endocrinol 2010, 162: 161–167. 10.1530/EJE-09-0521

  14. Campbell DT, Fiske DW: Convergent and discriminant validation by the multitrait-multimethod matrix. Psychol Bull 1959, 56: 81–105. 10.1037/h0046016

  15. Watt T, Cramon P, Bjorner JB, Bonnema SJ, Feldt-Rasmussen U, Gluud C, Gram J, Hansen JL, Hegedüs L, Knudsen N, Bach-Mortensen P, Nolsoe R, Nygaard B, Pociot F, Skoog M, Winkel P, Rasmussen AK: Selenium supplementation for patients with Graves' hyperthyroidism (the GRASS trial): study protocol for a randomized controlled trial. Trials 2013, 14: 119. 10.1186/1745-6215-14-119

  16. Giusti M, Mortara L, Cecoli F, Pera G, Caorsi V, Minuto F: Evaluation of quality of life with the ThyPRO questionnaire in patients with disease-free differentiated thyroid carcinoma. Endocr Rev 2012, 33(3): MON-436.

  17. Van DV, Engels RC: Quality of life of adolescents with asthma: the role of personality, coping strategies, and symptom reporting. J Psychosom Res 2011, 71: 166–173. 10.1016/j.jpsychores.2011.03.002

  18. Annett RD, Turner C, Brody JL, Sedillo D, Dalen J: Using structural equation modeling to understand child and parent perceptions of asthma quality of life. J Pediatr Psychol 2010, 35: 870–882. 10.1093/jpepsy/jsp121

  19. Chen WJ, Chen CC, Ho CK, Chou FH, Lee MB, Lung F, Lin GG, Teng CY, Chung YT, Wang YC, Sun FC: The relationships between quality of life, psychiatric illness, and suicidal ideation in geriatric veterans living in a veterans' home: a structural equation modeling approach. Am J Geriatr Psychiatry 2011, 19: 597–601. 10.1097/JGP.0b013e3181faec0e

  20. King-Kallimanis BL, Oort FJ, Nolte S, Schwartz CE, Sprangers MA: Using structural equation modeling to detect response shift in performance and health-related quality of life scores of multiple sclerosis patients. Qual Life Res 2011, 20: 1527–1540. 10.1007/s11136-010-9844-9

  21. Oort FJ: Using structural equation modeling to detect response shifts and true change. Qual Life Res 2005, 14: 587–598. 10.1007/s11136-004-0830-y

  22. Wilson IB, Cleary PD: Linking clinical variables with health-related quality of life: a conceptual model of patient outcomes. JAMA 1995, 273: 59–65. 10.1001/jama.1995.03520250075037

  23. Ware JE Jr: Conceptualization and measurement of health-related quality of life: comments on an evolving field. Arch Phys Med Rehabil 2003, 84: S43-S51. 10.1053/apmr.2003.50246

  24. Watt T, Hegedüs L, Bjorner JB, Groenvold M, Bonnema SJ, Rasmussen AK, Feldt-Rasmussen U: Is thyroid autoimmunity per se a determinant of quality of life in patients with autoimmune hypothyroidism? Euro Thyroid J 2012, 1: 186–192. 10.1159/000342623

  25. Patrick DL, Chiang YP: Measurement of health outcomes in treatment effectiveness evaluations: conceptual and methodological challenges. Med Care 2000, 38: II14-II25.

  26. Muthen B: A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika 1984, 49: 115–132. 10.1007/BF02294210

  27. Muthen B: Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Psychometrika 1989, 54: 557–585. 10.1007/BF02296397

  28. Muthen B, Muthen L: Mplus User Guide. Muthen & Muthen, Los Angeles; 2010.

  29. Beauducel A, Herzberg PY: On the performance of maximum likelihood versus means and variance adjusted weighted least squares estimation in CFA. Struct Eq Model: Multidiscip J 2006, 13: 186–203. 10.1207/s15328007sem1302_2

  30. Browne MW, Cudeck R: Alternative ways of assessing model fit. In Testing Structural Equation Models. Edited by: Bollen K, Long J. Sage, Newbury Park, CA; 1993:136–162.

  31. Bentler PM: Comparative fit indexes in structural models. Psychol Bull 1990, 107: 238–246. 10.1037/0033-2909.107.2.238

  32. Steiger JH: Structural model evaluation and modification: an interval estimation approach. Multivar Behav Res 1990, 25: 173–180. 10.1207/s15327906mbr2502_4

  33. Hu LT, Bentler PM: Cutoff criteria for fit indices in covariance structure analysis: conventional criteria versus new alternatives. Struct Equ Model 1999, 6: 1–55. 10.1080/10705519909540118

  34. Reeve BB, Hays RD, Bjorner JB, Cook KF, Crane PK, Teresi JA, Thissen D, Revicki DA, Weiss DJ, Hambleton RK, Liu H, Gershon R, Reise SP, Lai JS, Cella D: Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Med Care 2007, 45: S22-S31. 10.1097/01.mlr.0000250483.85507.04

  35. Schreiber JB, Stage FK, King J, Nora A, Barlow EA: Reporting structural equation modeling and confirmatory factor analysis results: a review. J Educ Res 2006, 99: 323–337.

  36. Anatchkova MD, Ware JE Jr, Bjorner JB: Assessing the factor structure of a role functioning item bank. Qual Life Res 2011, 20: 745–758. 10.1007/s11136-010-9807-1

  37. McDonald RP: Test Theory. A Unified Treatment. Lawrence Erlbaum Associates, Mahwah; 1999.

  38. Gibbons RD, Hedeker D: Full-information item bi-factor analysis. Psychometrika 1992, 57: 423–436. 10.1007/BF02295430

  39. Reise SP, Morizot J, Hays RD: The role of the bifactor model in resolving dimensionality issues in health outcomes measures. Qual Life Res 2007, 16(Suppl 1):19–31. 10.1007/s11136-007-9183-7

  40. Forero CG, Maydeu-Olivares A: Estimation of IRT graded response models: limited versus full information methods. Psychol Methods 2009, 14: 275–299. 10.1037/a0015825

  41. Wainer H, Thissen D: How is reliability related to the quality of test scores? What is the effect of local dependence on reliability? Educ Meas Issues Pract 1996, 15: 22–29. 10.1111/j.1745-3992.1996.tb00803.x

  42. Brown TA: Confirmatory factor analysis of the Penn State worry questionnaire: multiple factors or method effects? Behav Res Ther 2003, 41: 1411–1426. 10.1016/S0005-7967(03)00059-7

  43. Horan PM, Di Stefano C, Motl RW: Wording effects in self-esteem scales: methodological artifact or response style? Struct Equ Model 2003, 10: 435–455. 10.1207/S15328007SEM1003_6

  44. Tomás JM, Oliver A: Rosenberg's self-esteem scale: two factors or method effects. Struct Eq Model: Multidiscip J 1999, 6: 84–98. 10.1080/10705519909540120

  45. Ryff CD, Keyes CLM: The structure of psychological well-being revisited. J Pers Soc Psychol 1995, 69: 719–727. 10.1037/0022-3514.69.4.719

  46. Di Stefano C, Motl RW: Self-esteem and method effects associated with negatively worded items: Investigating factorial invariance by sex. Struct Eq Model: Multidiscip J 2009, 16: 134–146. 10.1080/10705510802565403

  47. Chakera AJ, Pearce SH, Vaidya B: Treatment for primary hypothyroidism: current approaches and future possibilities. Drug Des Devel Ther 2012, 6: 1–11.


This study has been supported by grants from the Danish Medical Research Council, Agnes and Knut Mørk's Foundation, Aase and Ejnar Danielsen's Foundation, Else and Mogens Wedell-Wedellsborg's Foundation, the Genzyme Corporation, the Novo Nordisk Foundation, Arvid Nilsson's Fund and the Danish Thyroid Foundation.

*Researchers who want to use the ThyPRO may contact the first author.

Author information

Corresponding author

Correspondence to Torquil Watt.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

TW designed the study, conducted the CFA analyses and drafted the manuscript. MG, ND and BG were involved in the analysis strategy and provided substantial intellectual input to the manuscript. UFR, ÅKR, LH and SJB were involved in the design of the study and the inclusion of patients, and provided substantial intellectual input to the manuscript. JBB was involved in the analysis strategy, conducted the supplemental IRT analyses and provided substantial intellectual input to the manuscript. All authors read and approved the final manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Watt, T., Groenvold, M., Deng, N. et al. Confirmatory factor analysis of the thyroid-related quality of life questionnaire ThyPRO. Health Qual Life Outcomes 12, 126 (2014).
