
Item reduction and validation of the Chinese version of diabetes quality-of-life measure (DQOL)



The Diabetes Quality-of-Life (DQOL) Measure is a 46-item diabetes-specific quality of life instrument. The original English version of the DQOL has been translated into Chinese after cultural adaptation, and the Chinese DQOL has been validated in the Chinese diabetic patient population and used in diabetes-related studies. There are two recognized problems with the Chinese DQOL: 1) the instrument is too long, and 2) the non-response rate of certain items is relatively high. This study aimed to develop and validate a short version of the Chinese DQOL.


Item reduction was conducted based on classical test theory (CTT) and item response theory (IRT), each combined with exploratory factor analysis (EFA). Confirmatory factor analysis (CFA) and Spearman correlation coefficients were used to validate the short versions.


Both the study sample (n = 2,886) and the validation sample (n = 2,286) were from a longitudinal observation study of Chinese type 2 diabetic patients. The CTT kept 32 items, and the IRT kept 24 items from the original 46-item version. The two short versions were comparable in psychometric properties.


The 24-item IRT-based short version of the Chinese DQOL was selected as the preferred short version because it imposes a lower burden on patients without compromising the psychometric properties of the instrument.


The global prevalence of diabetes mellitus (DM) in adults was 9.1% (415 million people) in 2015, which makes DM one of the most common chronic diseases around the world [1]. Diabetes-related complications, receiving blood glucose control therapies, and dealing with hypoglycemic agents and/or insulin adverse reactions seriously affect patients’ (and their family members’) health-related quality of life (HRQoL) in both physical and psychological ways [2, 3]. Hence, diabetic patients’ HRQoL outcomes have been increasingly recognized as valuable and essential information to obtain in the fields of clinical research and diabetes management.

Diabetic patients’ HRQoL is measured by generic or diabetes-specific instruments [4]. Diabetes-specific instruments, being designed to focus on diabetes-specific conditions, are more sensitive than generic instruments to the impact of diabetes symptoms on life and quality of life [5]. The Diabetes Quality-of-Life Measure (DQOL) is one of the most commonly used diabetes-specific instruments [6, 7]. It was developed and validated to compare two treatment regimens for chronic complications in patients with diabetes in the Diabetes Control and Complications Trial (DCCT) [8, 9]. The DQOL contains a total of 46 items, each of which belongs to one of the following four domains: life satisfaction (15 items), diabetes impact (20 items), social/vocational related worries (7 items), and diabetes related worries (4 items). The DQOL adopts a 5-point Likert scale for its response options. The scores range from 1, labeled as “very satisfied,” to 5, labeled as “very dissatisfied,” for items in the life satisfaction domain; from 1, labeled as “never impacted,” to 5, labeled as “always impacted,” for items in the diabetes impact domain; and from 1, labeled as “never worried,” to 5, labeled as “always worried,” for the social/vocational related and diabetes related worries domains.

The DQOL has been translated into five languages, including Chinese [10]. This measure was first translated and adapted for Chinese-Canadians who lived in the Toronto area by Cheng et al. [11, 12]. They removed 10 privacy-related (e.g. sexual life) items from the original DQOL and added six items regarding diet, worrying about death and so on. However, there was not sufficient psychometric evidence to support the cultural adaptation in Cheng et al.’s study [11], and the translation and validation were conducted based on an immigrant population, which cannot necessarily be generalized to the entire Chinese diabetic patient population. Ding et al. translated and adapted the DQOL for the Chinese population based on a sample of Chinese patients with diabetes who lived in Mainland China [13], and validated the Chinese DQOL on a separate sample of Chinese patients with type 2 DM living in Mainland China [14]. The wording of seven items was changed in Ding et al.’s adaptation (see Additional file 1). The Chinese DQOL translated and adapted by Ding et al. has since been used in diabetes-related clinical studies in China [15,16,17]. Its application among the Chinese diabetic patient population has exposed some issues [18]. First, the non-response rate of certain privacy-related items was relatively high; and second, interviewees complained that the instrument was too long [19, 20]. To solve these issues, it is necessary to develop and validate a short version of the Chinese DQOL.

The classical test theory (CTT) and the item response theory (IRT) are two commonly used psychometric theories in conducting item selection and reduction for measures; however, these two theories rest on different assumptions and statistical approaches, and both have shortcomings [21, 22]. More specifically, the CTT assumes that each respondent has a true total score, T (latent variable), and that each item is representative of the score T, while the IRT assumes that the latent trait of a measure is unidimensional and that all items are conditionally independent of each other. Generally, the CTT tests the difficulty and discrimination at the item level and the reliability at the whole measure level, while the IRT uses a set of logistic regression models to estimate the “discrimination,” “location,” and “information” for each item [21, 22]. The CTT is limited by its sample and item/test dependence and by its assumption of equal measurement error across examinees [21, 22]. The IRT overcomes these shortcomings but requires large sample sizes for model fitting [21, 22]. There is no generally accepted approach or standard for item reduction. Researchers have used the IRT alone [23], the combination of the IRT and factor analyses [24, 25], or the combination of the CTT and factor analysis [26, 27] when selecting or reducing items.

Therefore, the present study aims to use both the CTT and IRT combined with factor analysis to derive and validate a short version of the Chinese DQOL, which can be rapidly administered in practice and can reduce response burden on patients.


Sample and data

We used the data from a Chinese community-based longitudinal survey of clinically diagnosed type 2 diabetic patients (T2DP) from five cities: Beijing, Chengdu, Guangzhou, Nanjing, and Shenyang. Patients were recruited and interviewed between December 2010 and October 2011, and followed every three months over a one-year study period. The Chinese DQOL and the EQ-5D-3L were administered at the baseline and at 12-months. Demographic, social-economic and diabetic-related information was also collected. We used the baseline data as the study sample for item reduction analysis, and the one-year end follow-up data as the validation sample to test the short versions of the Chinese DQOL reduced by CTT and IRT.

Reduction based on the classical test theory

Three steps were used to reduce the number of items based on the CTT. The first step tested each item at the individual item level, and the second and third steps examined the items at the whole measure or domain level. The following provides the details of the tests in each step and the corresponding item removal criteria.

Step 1. Item level tests

We tested three item level properties for each of the 46 items in this step, i.e., missing rate, item score mean, and item score standard deviation (SD).

Items that are unclear, ambiguous, or potentially embarrassing are more likely to have high non-response rates. Such items provide very limited useful information, and their results are hard to interpret [21]. The exclusion criterion for the missing rate was a rate higher than 5% [28].

In the CTT, item difficulty and discrimination are often evaluated in item level testing; however, most of the item difficulty and discrimination indexes are designed to test dichotomous items and can hardly be applied to test Likert items [29]. Norman has provided compelling evidence on the appropriateness of using descriptive statistics and parametric methods to test Likert items [30, 31]. The mean and SD of an item indicate whether the item can provide useful information [32]. For example, if the mean score is 4.7 for a 5-point Likert item (score range: 1 to 5), then the item is left-skewed and may not be able to provide the information it was designed to collect. In addition, if the SD of an item is low, then the item has low variability and may not be useful either. There are no generally accepted criteria for the item level test using the mean and SD, so we used the most lenient criteria reported in existing studies. We used the lowest score option plus 20% of the score range and the highest score option minus 20% of the score range as the cut points of the exclusion criterion for the item score mean [21, 33, 34]. The lowest and highest score options for each item are 1 and 5, respectively, and the score range for each item is 4. Thus, the exclusion criterion for the item score mean was a mean lower than 1.8 (1 + 0.2 × 4) or higher than 4.2 (5 − 0.2 × 4). The exclusion criterion for the item score SD was an SD smaller than one-sixth of the score range, i.e., 0.67 (4 × 1/6) [21, 33, 34, 35].

Any item that met any two or more of the three exclusion criteria was removed from the measure. In addition, any item with a missing rate higher than 10% was removed regardless of the results of the other two criteria.
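The Step 1 rule can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the authors' Stata code: the function name `screen_item`, the use of `None` for a missing response, and the choice of the sample SD (n − 1 denominator) are assumptions made for the sketch.

```python
from statistics import mean, stdev

SCORE_MIN, SCORE_MAX = 1, 5
SCORE_RANGE = SCORE_MAX - SCORE_MIN            # 4
MEAN_LOW = SCORE_MIN + 0.2 * SCORE_RANGE       # 1.8
MEAN_HIGH = SCORE_MAX - 0.2 * SCORE_RANGE      # 4.2
SD_MIN = SCORE_RANGE / 6                       # about 0.67
MISSING_SOFT, MISSING_HARD = 0.05, 0.10        # 5% criterion, 10% automatic removal


def screen_item(responses):
    """Return (remove, flags) for one item; None marks a missing response."""
    observed = [r for r in responses if r is not None]
    missing_rate = 1 - len(observed) / len(responses)
    item_mean = mean(observed)
    item_sd = stdev(observed)                  # sample SD (assumption)
    flags = {
        "missing": missing_rate > MISSING_SOFT,
        "mean": item_mean < MEAN_LOW or item_mean > MEAN_HIGH,
        "sd": item_sd < SD_MIN,
    }
    # Remove if the missing rate exceeds 10%, or if two or more criteria are met.
    remove = missing_rate > MISSING_HARD or sum(flags.values()) >= 2
    return remove, flags
```

For instance, a hypothetical item answered "1" by 90 of 100 respondents and "2" by the rest has a mean of 1.1 and a small SD, so it meets two criteria and would be removed.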

Step 2. Exploratory factor analysis

In this step, exploratory factor analysis (EFA) was employed on the remaining items to examine the underlying structure of the measure and remove items with low factor loadings on common factors.

More specifically, Bartlett’s test of sphericity [36] and the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy [37] were applied before the EFA. Since the training sample violated the assumption of multivariate normality, we employed the principal-factor extraction method [38]. A scree plot was used to identify the number of factors [39]. An oblique rotation method was used in the EFA since the DQOL items were not completely unrelated to each other [40]. In this step, any item with a factor loading less than 0.3 was removed [41].

Step 3. Internal consistency reliability

Internal consistency reliability was tested in terms of the corrected item-total correlation and Cronbach’s alpha [29]. Both tests were conducted at the factor level based on the results of the EFA in step 2.

Since there is no standard scoring method for the Chinese DQOL, we used the patients’ mean score of the items in each factor as the “factor score” when calculating the corrected item-total correlation. For each item, the corrected item-total correlation was calculated as the Pearson correlation coefficient between the item score and the mean score of the rest of the items in the factor this item belonged to. A larger corrected item-total correlation coefficient indicates better internal consistency reliability. The exclusion criterion was a correlation coefficient smaller than 0.3 [42]. For the Cronbach’s alpha, the exclusion criterion was that the Cronbach’s alpha of the factor increased after removing an item [43].

In this step, any item that met one or more of these two exclusion criteria was removed from the measure. An additional EFA was used to check if the factor structure changed after this step; if so, the new factor structure would be used as the final structure of the short version developed based on the CTT.
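A minimal Python sketch of the two Step 3 checks follows. It assumes complete-case data for one factor, stored as one list of scores per item, and uses sample variances; the function names are hypothetical and the code is not taken from the study.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of per-item score lists (same patients)."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def flag_items(items, r_min=0.3):
    """Indices of items meeting either exclusion criterion (needs >= 3 items)."""
    alpha_all = cronbach_alpha(items)
    flagged = []
    for i, item in enumerate(items):
        rest = items[:i] + items[i + 1:]
        # Corrected item-total correlation: item vs. mean of the other items.
        rest_mean = [mean(row) for row in zip(*rest)]
        r = pearson(item, rest_mean)
        # Alpha of the factor after removing this item.
        alpha_without = cronbach_alpha(rest)
        if r < r_min or alpha_without > alpha_all:
            flagged.append(i)
    return flagged
```

An item that is unrelated to the others tends to fail both checks at once: its corrected item-total correlation is low, and dropping it raises the alpha of the remaining items.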

Reduction based on the item response theory

One of the basic assumptions of the IRT is unidimensionality [44]; however, DQOL was designed to measure multiple aspects of burden that diabetes places on patients. In order to conduct the IRT analysis without violating the assumption of unidimensionality, we employed EFA in the first place to re-identify the potential dimensional structure of the original Chinese DQOL and then fitted the sets of IRT models for each individual dimension. Details of the two steps are as follows.

Step 1. Exploratory factor analysis

As in the EFA under the CTT reduction approach, Bartlett’s test and the KMO test were carried out to test sphericity and sampling adequacy, respectively, before implementing the EFA under the IRT reduction approach. The number of factors was identified by a scree plot generated based on the 46 Chinese DQOL items. Then the principal-factor extraction method and an oblique rotation method were employed to conduct the EFA. In this step, any item with a factor loading of less than 0.3 was removed.

Step 2. Item response theory analysis

The graded response model (GRM), which is a type of item response model for items with ordered response options [45], was employed in this step to analyze the remaining items within each factor identified in step 1. The GRM was first introduced by Samejima [45]. Using a logistic regression approach, it models each item with its own discrimination parameter and a set of parameters that identify the boundaries between the ordered options. The item information functions (IIFs) were built based on the fitted GRMs to evaluate the “information”, i.e., reliability, each item contributed to the factor. The more information an item provides, the better the item. The GRM and IIF formulas are presented in the Appendix.

In this step, any item that had an estimated discrimination parameter less than 1.0 [46] and provided item information less than 0.5 was removed from the measure [25]. An additional EFA was also conducted to check the factor structure; if the structure changed after this step, the new factor structure would be used as the final structure of the short version developed based on the IRT.
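To make the removal rule concrete, the following Python sketch evaluates the standard logistic form of Samejima's GRM and its item information function for one item. The parameter names (`a` for discrimination, `thresholds` for the boundary locations) are illustrative, and summarizing an item's information by its peak over a grid of latent trait values from −4 to 4 is an assumption of this sketch; the text does not state how the 0.5 cut-off was applied.

```python
import math


def _cumulative(theta, a, thresholds):
    """P(response >= category k) for k = 0..K, padded with 1 and 0."""
    inner = [1 / (1 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]


def grm_category_probs(theta, a, thresholds):
    """Probability of each ordered category at latent trait value theta."""
    cum = _cumulative(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def grm_item_information(theta, a, thresholds):
    """Item information: sum over categories of (dP/dtheta)^2 / P."""
    cum = _cumulative(theta, a, thresholds)
    info = 0.0
    for k in range(len(cum) - 1):
        p = cum[k] - cum[k + 1]
        dp = a * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        if p > 0:
            info += dp * dp / p
    return info


def should_remove(a, thresholds, a_min=1.0, info_min=0.5):
    """Apply both criteria; information is taken as the peak on a theta grid."""
    grid = [x / 10 for x in range(-40, 41)]     # theta from -4.0 to 4.0
    peak = max(grm_item_information(t, a, thresholds) for t in grid)
    return a < a_min and peak < info_min
```

A 5-point item has four boundary parameters, so a call looks like `should_remove(0.6, [-1.0, 0.0, 1.0, 2.0])`, where the numbers are made up for illustration.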

Validating and comparing the two short versions of the Chinese DQOL

We evaluated and compared the two short versions on three aspects, i.e., performance in the confirmatory factor analysis (CFA), correlation with the EQ-5D, and the magnitude of reduced response burden.

Confirmatory factor analysis

The CFA was employed to validate the structure of the two short versions of the Chinese DQOL. We specified that the domains were correlated with each other and employed maximum likelihood estimation in the CFA. Two statistics produced by the CFA were used to compare the performance of the two versions: standardized root mean squared residual (SRMR) and comparative fit index (CFI).

The SRMR is the square root of the mean squared standardized residual, i.e., the standardized difference between the sample covariance matrix and the covariance matrix implied by the proposed model. It ranges from 0 to 1, and a smaller value indicates a better fit [47]. The CFI compares the fit of the proposed model with that of a null model based on the assumption that all latent variables (factors) are uncorrelated. The CFI ranges from 0 to 1, and a larger value indicates a better fit [47]. Because fit indices vary in performance, we followed Hu and Bentler’s two-index presentation strategy [48] and adopted the SRMR as the fundamental fit index and the CFI as a supplementary index.
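As a rough illustration of the two indices, the sketch below computes them from quantities that a CFA routine would normally report. It uses one common textbook formulation of each index; statistical packages differ in detail, so this is not necessarily the exact computation performed by Stata. The function and argument names are hypothetical.

```python
import math


def srmr(sample_cov, implied_cov):
    """Root mean squared standardized residual over the lower triangle."""
    p = len(sample_cov)
    total = 0.0
    for i in range(p):
        for j in range(i + 1):
            scale = math.sqrt(sample_cov[i][i] * sample_cov[j][j])
            total += ((sample_cov[i][j] - implied_cov[i][j]) / scale) ** 2
    return math.sqrt(total / (p * (p + 1) / 2))


def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative fit index from model and null-model chi-square statistics."""
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, d_model)
    return 1.0 if d_null == 0 else 1 - d_model / d_null
```

With made-up inputs, a model chi-square of 150 on 50 degrees of freedom against a null-model chi-square of 1050 on 50 degrees of freedom gives a CFI of 0.9.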

Correlation with the EQ-5D

We tested the construct validity of the two reduced versions of the Chinese DQOL against the EQ-5D-3L index and EQ visual analogue scale (EQ-VAS).

The EQ-5D-3L is a widely used preference-based generic quality of life instrument with 5 questions that ask whether there are any problems in mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each question has three response levels, i.e., no problems, some (or moderate) problems, and extreme problems (or unable to). Patients’ EQ-5D-3L responses were converted into EQ-5D-3L values by using the Chinese EQ-5D-3L value set [49]. The EQ-VAS records the patient’s self-rated health on a vertical visual analogue scale which ranges from 0 (the worst imaginable health state) to 100 (the best imaginable health state) [50].

Spearman’s correlation coefficients between the EQ-5D-3L index and the mean score of each one of the two short versions of the Chinese DQOL were calculated respectively. The correlation coefficients between the EQ-VAS and the two short versions were also calculated individually. A larger correlation coefficient indicates a higher construct validity [28, 29].
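For readers who want to reproduce this kind of check, Spearman's coefficient is the Pearson correlation of the ranks, with tied values given their average rank. The sketch below implements it in plain Python; the data in the usage note are invented, not taken from the study.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1                  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)
```

Because higher DQOL scores indicate worse quality of life while higher EQ-5D-3L values indicate better health, the raw coefficient is expected to be negative, e.g. `spearman_rho([1.5, 2.0, 2.4, 3.1, 3.8], [0.95, 0.90, 0.85, 0.70, 0.60])` on invented data; comparing versions then relies on the absolute value.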

Final short version selection

The short version that performed better in the CFA and had a higher correlation with the EQ-5D was selected as the final short version of the Chinese DQOL. In the event of any conflict between the CFA and the correlation analysis results, we selected the short version that reduced response burden more as the final short version of the Chinese DQOL.

All statistical analyses were conducted with a two-tailed test at the significance level of 0.05 in STATA 14.2 (StataCorp LP, Texas, USA).



A total of 2886 patients were recruited and interviewed at the baseline. The mean age and diabetes duration of the study sample were 61.15 years and 7.94 years, respectively. Among all patients, 55.68% were female, 64.10% were retired, and 16.18% had used insulin in the last 6 months. The mean scores of the EQ-5D-3L index, VAS, and the Chinese DQOL (mean score of the 46 items) were 0.89, 72.71, and 2.07, respectively (Table 1). In the validation analyses, the CFA and the calculation of the EQ-5D-3L index only employed observations without missing data. Because of this, our validation sample only included patients with no missing responses to the 5 questions of the EQ-5D and to the DQOL items kept after the item reduction based on the CTT and IRT. Of the 2542 patients who completed the year-end follow-up, 2286 were included in the validation sample (Table 1). Compared to the study sample, the validation sample had a higher proportion of people who were older, retired, and used insulin (Table 1).

Table 1 Patients’ baseline demographic and diabetes-related information

Item reduction results

Tables 2 and 3 show the item reduction results based on the CTT and IRT, respectively. A total of 14 items and 22 items (see supplementary materials for details) were removed from the Chinese DQOL based on the CTT and IRT, respectively.

Table 2 Item reduction results based on the CTT
Table 3 Item reduction results based on the IRT

In step 1 of the reduction based on the CTT, two items, item #10 (satisfied with sex life) and item #25 (interferes with sex life), were removed from the measure because their missing rates were higher than 10%. Item #32 (being teased because of having diabetes), item #36 (worry about marriage), item #40 (worry about completing education), and item #41 (worry about unemployment) were removed because of their low mean scores (all < 1.8) and small SDs (all < 0.67). Item #35 (hide having an insulin reaction) was removed because of its high missing rate (8.07%), low mean score, and small SD. In step 2, the EFA identified two factors among the remaining items. Item #7 (satisfied with knowledge about diabetes), item #23 (feel good about yourself), item #26 (interfere with riding a bike or using a machine), item #29 (explain what it means to have diabetes), item #31 (tell others about your diabetes), and item #34 (eat something you shouldn’t rather than tell someone that you have diabetes) were removed due to low factor loadings (< 0.3). In step 3, item #38 (worry about whether you can get a job you want) was removed because of its low correlation with the mean score of the factor it belonged to. The factor structure identified in step 2 remained the same after removing item #38 in step 3.

In the reduction based on the IRT, the EFA identified 2 factors among the 46 DQOL items, and items #7, #23, #26, #29, #31, and #34 were removed because their factor loadings were all smaller than 0.3. In step 2, item #5 (satisfied with the flexibility of the diet), item #8 (satisfied with sleep), item #10, item #12 (satisfied with the appearance of your body), item #13 (satisfied with the time spent on exercising), item #18 (low blood sugar reactions), item #21 (bad night’s sleep), item #24 (feel restricted by diet), item #25, item #32, item #33 (feel that because of diabetes you go to the bathroom more than others), item #38, item #39 (worry about the pension), item #40, and item #41 were removed in the IRT analysis because their item discrimination was smaller than 1 and their item information was lower than 0.5 (Table 3). The factor structure identified in the EFA remained the same after the IRT analysis.

Validation results

Table 4 shows the validation results of the two short versions of the Chinese DQOL. In the CFA, the two short versions had similar SRMRs (0.078, after rounding, for both short versions), which were also similar to that of the original Chinese DQOL (SRMR = 0.077). The short version based on the IRT had a larger CFI (0.726) than the version reduced based on the CTT (CFI = 0.630). The CFI of each short version was larger than that of the original Chinese DQOL (CFI = 0.616).

Table 4 Validation results

The absolute Spearman’s correlation coefficient between the CTT reduced version of the DQOL and the EQ-5D-3L index scores was 0.298, which was higher than that (ρ = 0.288) between the IRT reduced version and the EQ-5D-3L index scores. Both reduced versions had a higher correlation with the EQ-5D-3L index scores than the original Chinese DQOL (ρ = 0.276). In terms of testing using the EQ-VAS, the CTT-based short version had a higher correlation (ρ = 0.288) than the original version (ρ = 0.273), and the IRT-based short version had a slightly lower correlation (ρ = 0.269) than the original version.


This study shortened the 46-item Chinese version of the DQOL based on two psychometric theories, the CTT and IRT, each combined with the EFA. The two short versions were validated using the CFA and Spearman correlation coefficients. The CTT yielded a short version of the Chinese DQOL with 32 items, and the IRT yielded a short version with 24 items. Among the 14 items removed based on the CTT, 13 were removed based on the IRT as well.

There are few published studies with which we can compare our results. Two items related to sexual life had high missing rates in our study, and were removed from the measure in the reduction processes based on both the CTT and IRT. This was consistent with the translation and cultural adaptation study conducted in 1999 among Chinese diabetic patients living in Canada [12]. The high missing rate of the sexual life items is also in line with the findings of translation and cultural adaptation studies of other disease-specific measures among the Chinese population published after 2015 [51]. Chinese people, especially those who are middle-aged and elderly, tend to be hesitant to talk about sex-related topics because of their relatively conservative cultural background [52].

Three working and education-related items, i.e., items #38, #40, and #41, had low mean scores (Table 2) and low discriminations (Table 3), and were removed based on both the CTT and IRT. This was because most patients (64.10%) in our training sample were retired, and were not worried about working and education-related issues. These items were also removed according to the expert advice in Cheng’s [11, 12] translation and cultural adaptation study.

The insulin reaction item (item #35) was removed based on both the CTT and IRT. This was because the majority of the patients in the study sample had not used insulin in the last 6 months. Similarly, the diet-related item (item #34) was also removed mainly because the majority of the patients in the study sample controlled their diet by eating healthy food and balancing their amount of food intake due to their diabetes.

In Ding’s [13] translation and cultural adaptation analysis, the wording of item #26, “How often does your diabetes keep you from driving a car or using a machine (e.g., a typewriter)?” was changed into “How often does your diabetes keep you from riding a bike or being a typist?” This item was removed because of its low factor loading in both reduction processes. Ding et al. changed “driving a car” into “riding a bike” because civilian vehicle ownership in China was relatively low in the 1990s, and bicycles were the main means of transportation for ordinary people. However, civilian vehicle ownership in 2012 had increased by 544% from 1999 [53], which may make this wording out-of-date. In addition, typewriters have long been replaced by laptops and other smart electronics, which are indispensable in contemporary Chinese people’s daily lives. Therefore, further studies are needed to examine the performance of a more up-to-date wording, for example, “How often does your diabetes keep you from driving a vehicle or using a computer or smart phone?”

There were 9 items that were removed in the IRT-based short version but kept in the CTT-based short version. All of these items were removed due to their low estimated discrimination and item information in the IRT analysis. One possible reason for this difference is that the reduction results were affected by the exclusion criteria we employed. Even though we used the most lenient exclusion criteria reported in existing studies for each approach, the item reduction results may still not be comparable because the two theories apply different statistical approaches.

Items #1 to #4 (satisfaction level of “the amount of time it takes to manage your diabetes,” “the amount of time you spend getting a checkup,” “the time it takes to determine your sugar level,” and “your current treatment”) were the only four treatment and diabetes management related items in the DQOL. These items loaded onto the same factor in our EFA. The remaining 28 items in the CTT-based short version and the remaining 20 items in the IRT-based short version belonged to the other factor. This differed from the original Chinese DQOL, which has four domains. The CFA and correlation coefficients showed that the structures of the two short versions were comparable to the original version. In addition, we did not emphasize the names of the factors identified in the short versions since the present study focused on reducing the number of items in the Chinese DQOL. Content and face validity of the short versions should be examined in further studies to optimize the structure and rename the factors of the short versions.

The often-used fit indexes in the CFA are the Chi-square test and the root mean square error of approximation (RMSEA) [47]. In the present study, we employed the SRMR and CFI instead of the Chi-square test and RMSEA. The Chi-square test result is affected by the number of parameters, complexity of the model, and the sample size [54]. Adding more parameters into the model can improve the RMSEA as well [55]. Our two short versions of the Chinese DQOL had different numbers of items; therefore, the Chi-square test and RMSEA were inappropriate to use for comparing the CFA results of these two short versions. The SRMR is not affected by the model complexity and the number of parameters. The CFI is affected by the number of parameters added into a model, but is relatively more stable than the Chi-square test and RMSEA.

Because the two short versions of the Chinese DQOL were comparable in the validation analysis, and we did not have a hierarchy between these two criteria, we selected the short version based on the IRT (24 items) as the preferred short version for two other reasons. First, this shorter version imposes a lower burden on patients without compromising its measurement properties [56]. Second, in theory, because the IRT is a model-based statistical approach, the parameters estimated from a set of item response functions (IRFs) can be generalized to the entire population from which the study sample comes; in contrast, because the CTT relies on person statistics, its test results are specific to the given study sample [57].

There are some limitations in our study. First, the training and validation samples were not independent. We did not have a truly external validation sample for our study. Second, our training sample only contained community-based patients, and most of them did not use insulin. This sample was healthier than diabetic patients who have more comorbidities, are hospitalized, or use insulin; therefore, our results cannot necessarily be generalized to the entire diabetic patient population. Third, at the validation stage of this study, the CFI values of both versions did not meet the generally accepted criterion for good fit, i.e., CFI > 0.90 [47]. Even though the CFI was used as a supplementary index to evaluate the model fit, this result still adds uncertainty to our conclusions. Other psychometric properties of the short version of the Chinese DQOL, such as test-retest reliability, need to be examined in future studies.


The version developed based on the IRT, which retained 24 items, was selected as our preferred short version of the 46-item Chinese DQOL. It can impose a lower response burden on patients in practice without compromising the psychometric properties. Further research validating the IRT-based short version of the Chinese DQOL is needed.



CFA: Confirmatory factor analysis

CFI: Comparative fit index

CTT: Classical test theory

DCCT: Diabetes control and complications trial

DM: Diabetes mellitus

DQOL: Diabetes quality-of-life measure

EFA: Exploratory factor analysis

GRM: Graded response model

HRQoL: Health-related quality of life

IIF: Item information function

IRF: Item response function

IRT: Item response theory

RMSEA: Root mean square error of approximation

SD: Standard deviation

SRMR: Standardized root mean squared residual

T2DP: Type 2 diabetic patients

VAS: Visual analogue scale


  1. International Diabetes Federation. IDF Diabetes Atlas. 7th ed. Brussels: International Diabetes Federation; 2015.

  2. Isla Pera P. Living with diabetes: quality of care and quality of life. Patient Prefer Adherence. 2011;5:65–72.

  3. Rubin RR. Diabetes and quality of life. Diabetes Spectrum. 2000;13:21.

  4. Rubin RR, Peyrot M. Quality of life and diabetes. Diabet Metab Res Rev. 1999;5:205–18.

  5. Ware JE Jr, Gandek B, Guyer R, Deng N. Health Qual Life Outcomes. 2016;14:84–99.

  6. El Achhab Y, Nejjari C, Chikri M, Lyoussi B. Disease-specific health-related quality of life instruments among adults diabetic: a systematic review. Diabetes Res Clin Pract. 2008;80:171–84.

  7. Watkins K, Connell CM. Measurement of health-related QOL in diabetes mellitus. PharmacoEconomics. 2004;22:1109–26.

  8. Jacobson AM, Barofsky I, Cleary P, Rand LL. Reliability and validity of a diabetes quality-of-life measure for the diabetes control and complications trial (DCCT). Diabet Care. 1988;11:725–32.

  9. Jacobson AM. Quality of life in patients with diabetes mellitus. Semin. Clin Neuropsychol. 1997;2:82–93.

  10. PROQOLID. Diabetes Quality of Life Measure (DQOL). Accessed 18 Dec 2016.

  11. Cheng AY, Tsui EY, Hanley AJ, Zinman B. Cultural adaptation of the diabetes quality-of-life measure for Chinese patients. Diabetes Care. 1999;22:1216–7.

  12. Cheng AY, Tsui EY, Hanley AJ, Zinman B. Developing a quality of life measure for Chinese patients with diabetes. Diabetes Res Clin Pract. 1999;46:259–67.

  13. Ding Y, Kong D, Ni Z, Deng H. Culture adaption and revision of diabetes-specific quality of life scale (DQOL). Chin J Behav Med Sci. 2004;13:102–3.

  14. Ding Y, Ni Z, Zhang J, Chen G, Feng H. The assessment on reliability and validity of adjusted diabetes quality of life (A-DQOL) scale. Chin J Prev Contr Chron Non-commun Dis. 2000;8:160–2.

  15. Liang M. Quality of life and its impact factors of community-based patients with type 2 diabetes in Beijing. Master degree thesis. Beijing: Beijing University of Chinese Medicine; 2014.

  16. Chen A, Su A, Bai H. Factors impact on the quality of life of type 2 diabetic patients. Chin Community Doctors. 2009;11:38.

  17. Hou Y, Yang Q. Current research status of health-related quality of life in patients with diabetes mellitus. Chin J Clinicians. 2016;10:433–6.

  18. Ren Z. Quality of life in the patients of diabetes mellitus with micro-albuminuria but no renal insufficiency. Master degree thesis. Hangzhou: Zhejiang University; 2010.

  19. Li D, Ma A, Li H. Systematic review of diabetes-specific quality of life measures in China. Chin J Pharm Econ. 2012;34:45–52.

  20. Qu L, Pan M. Research progress of diabetes-specific quality of life measures in China. Chin J Behav Med Sci. 2007;16:765–6.

  21. Streiner DL, Norman GR, Cairney J. Health measurement scales. A practical guide to their development and use. 5th ed. Oxford: Oxford University Press; 2015.

  22. Cappelleri JC, Lundy JJ, Hays RD. Overview of classical test theory and item response theory for quantitative assessment of items in developing patient-reported outcome measures. Clin Ther. 2014;36:648–62.

  23. ten Holt JC, van Duijn MAJ, Boomsma A. Scale construction and evaluation in practice: a review of factor analysis versus item response theory applications. Psychol Test Assess Model. 2010;52:272–97.

  24. Edelen MO, Reeve BB. Applying item response theory (IRT) modeling to questionnaire development, evaluation, and refinement. Qual Life Res. 2007;16:5–18.

  25. Weinhardt JM, Morse BJ, Chimeli J, Fisher J. An item response theory and factor analytic examination of two prominent maximizing tendency scales. Judgm Decis Mak. 2012;7:644–58.

  26. The WHOQOL Group. Development of the World Health Organization WHOQOL-BREF quality of life assessment. Psychol Med. 1998;28:551–8.

  27. Prieto L, Alonso J, Lamarca R. Classical test theory versus Rasch analysis for quality of life questionnaire reduction. Health Qual Life Outcomes. 2003;1:27.

  28. Fayers PM, Machin D. Quality of life: the assessment, analysis and reporting of patient-reported outcomes. 3rd ed. West Sussex: John Wiley & Sons; 2016.

  29. Kline TJB. Psychological testing: a practical approach to design and evaluation. Thousand Oaks, CA: Sage Publications Inc; 2005.

  30. Norman G. Likert scales, levels of measurement and the “laws” of statistics. Adv Health Sci Educ. 2010;15:625–32.

  31. Sullivan GM, Artino AR. Analyzing and interpreting data from Likert-type scales. J Grad Med Educ. 2013;5:541–2.

  32. Thompson N. Interpreting item statistics from classical test theory. 2015. Accessed 20 Apr 2018.

  33. Lester PE, Inman D, Inman Freitas DL, Bishop LK. Handbook of tests and measurement in education and the social sciences. Lanham: Rowman & Littlefield Publishing Group; 2014.

  34. Nunnally JC, Bernstein IH. Psychometric theory. 3rd ed. New York: McGraw Hill; 1994.

  35. Qiu H. Quantitative research and statistical analysis. Chongqing: Chongqing University Press; 2009.

  36. Bartlett MS. Tests of significance in factor analysis. Br J Stat Psychol. 1950;3:77–85.

  37. Kaiser HF. A second generation little jiffy. Psychometrika. 1970;35:401–15.

  38. Osborne JW. Best practices in exploratory factor analysis. Washington, US: CreateSpace Independent Publishing Platform; 2014.

  39. Ledesma RD, Valero-Mora P, Macbeth G. The scree test and the number of factors: a dynamic graphics approach. Span J Psychol. 2015;18:1–10.

  40. Osborne JW. What is rotating in exploratory factor analysis? PARE. 2015;20.

  41. Thompson B. Exploratory and confirmatory factor analysis: understanding concepts and applications. Washington, DC: American Psychological Association; 2004.

  42. Pallant J. SPSS survival manual: a step by step guide to data analysis using SPSS for Windows. 3rd ed. New York, USA: McGraw Hill; 2007.

  43. Gliem JA, Gliem RR. Calculating, interpreting, and reporting Cronbach’s alpha reliability coefficient for Likert-type scales. 2003. Accessed 26 Dec 2016.

  44. Yamamoto K. Hybrid model of IRT and latent class models. ETS Research Report Series. 1982. Accessed 20 Apr 2018.

  45. Samejima F. Estimation of latent ability using a response pattern of graded scores (Psychometric Monograph No. 17). Richmond, VA: Psychometric Society; 1969. Accessed 20 Apr 2018.

  46. Zickar MJ, Russell SS, Smith CS, Bohle P, Tilley AJ. Evaluating two morningness scales with item response theory. Pers Individ Dif. 2002;33:11–24.

  47. Hooper D, Coughlan J, Mullen MR. Structural equation modelling: guidelines for determining model fit. J Bus Res Meth. 2008;6:53–60.

  48. Hu L, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equ Modeling. 1999;6(1):1–55.

  49. Liu GG, Wu H, Li M, Gao C, Luo N. Chinese time trade-off values for EQ-5D health states. Value Health. 2014;17:597–604.

  50. EuroQol Group. EQ-5D-3L User Guide. Accessed 20 Apr 2018.

  51. Chen X, Qiu Z, Gu M, Su Y, Liu L, Liu Y, Mo C, Xu Q, Sun J, Li D. Translation and validation of the Chinese version of the Quality of Life Radiation Therapy Instrument and the Head & Neck Module (QOL-RTI/H&N). Health Qual Life Outcomes. 2014;12:51.

  52. Global Affairs Canada. Cultural Information – China. Accessed 16 Jan 2017.

  53. National Bureau of Statistics of China. China Statistical Yearbook 2013. 2013. Accessed 26 Dec 2016.

  54. Moss S. Fit indices for structural equation modeling. Accessed 26 Dec 2016.

  55. Kenny DA, McCoach DB. Effect of the number of variables on measures of fit in structural equation modeling. Struct Equ Modeling. 2003;10:333–51.

  56. Bogen K. The effect of questionnaire length on response rates—A review of the literature. Accessed 16 Apr 2018.

  57. Fan X. Item response theory and classical test theory: an empirical comparison of their item/person statistics. Educ Psychol Meas. 1998;58:357–81.

  58. StataCorp. Stata Item Response Theory Reference Manual (Release 14). College Station, TX: StataCorp LP. 2015. Accessed 26 Dec 2016.


This study used data from a survey sponsored by Guangzhou Zhongyi Pharmaceutical Co Ltd.

Availability of data and materials

The data that support the findings of this study are available from Gordon Liu upon reasonable request.

Author information


GL was the principal investigator (PI) of the survey. GL and HL designed the survey protocol. XJ, HG, and HL were interviewers during data collection. XJ designed the statistical analysis plan of the present study, analyzed the data, and wrote the manuscript under FX’s direction. HG, ML, KS and FX provided critical revisions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Feng Xie.

Ethics declarations

Ethics approval and consent to participate

All procedures performed in this research involving human participants were in accordance with the ethical standards of Peking University and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. Informed consent was obtained from all patients included in the study. Participants could withdraw at any time without any consequences.

Competing interests

The authors declare that they have no competing interests.

Additional file

Additional file 1:

Original Chinese DQOL and short versions based on the CTT and IRT. (DOCX 29 kb)



The GRM and IIF models

The probability that respondent j, with latent trait (ability) level θ_j, chooses response option k or higher (in our case, k = 0, 1, 2, 3, 4, 5) for item i is [45, 58]:

$$ \Pr\left(Y_{ij}\ge k\mid \theta_j\right)=\frac{\exp\left\{\alpha_i\left(\theta_j-b_{ik}\right)\right\}}{1+\exp\left\{\alpha_i\left(\theta_j-b_{ik}\right)\right\}},\qquad \theta_j\sim N\left(0,1\right) $$

where α_i represents the discrimination of item i, and b_ik is the cut point at the boundary between the kth and (k + 1)th options for item i, which can be considered the difficulty of choosing option k or higher for item i [45, 58].
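As a minimal numerical sketch of the GRM above, the following Python snippet evaluates the cumulative probabilities and converts them into the probability of each single response option by differencing adjacent cumulative probabilities. The function names, the discrimination of 1.5, and the four cut points are hypothetical values chosen for illustration, not estimates from this study.

```python
import math


def grm_cumulative(theta, alpha, cuts):
    """Pr(Y >= k | theta) for each cut point b_ik, using the logistic form of the GRM."""
    # exp(x) / (1 + exp(x)) is written as 1 / (1 + exp(-x)), which is algebraically identical.
    return [1.0 / (1.0 + math.exp(-alpha * (theta - b))) for b in cuts]


def grm_category_probs(theta, alpha, cuts):
    """Probability of each response option: Pr(Y = k) = Pr(Y >= k) - Pr(Y >= k + 1)."""
    # The lowest option is always reached (probability 1);
    # nothing lies above the highest option (probability 0).
    cum = [1.0] + grm_cumulative(theta, alpha, cuts) + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# Hypothetical item: discrimination 1.5 and four ordered cut points (five options).
alpha = 1.5
cuts = [-2.0, -0.5, 0.5, 2.0]
probs = grm_category_probs(0.0, alpha, cuts)
print([round(p, 4) for p in probs])
```

With cut points symmetric around zero and θ = 0, the option probabilities are symmetric and sum to one, which gives a quick sanity check on the parameterization.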

The information function I_i(θ) for item i is:

$$ I_i\left(\theta\right)=\sum_{k=1}^{K} I_{ik}\left(\theta\right)\,p_{ik}\left(\theta\right) $$

where I_ik(θ) is the information function for response option k of item i, defined as:

$$ I_{ik}\left(\theta\right)=-\frac{\partial^2 \log p_{ik}\left(\theta\right)}{\partial\theta^2} $$

where ∂ denotes the partial derivative, and p_ik(θ) is the probability that a respondent with latent trait level θ chooses response option k, which is determined by the GRM for item i.
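The item information function can be approximated numerically straight from the two definitions above. The Python sketch below uses a central finite difference for the second derivative of log p_ik(θ) and sums over all response options of the item. The function names, the step size h, and the item parameters are illustrative assumptions rather than values from this study.

```python
import math


def category_probs(theta, alpha, cuts):
    """GRM option probabilities as differences of adjacent cumulative probabilities."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-alpha * (theta - b))) for b in cuts] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def item_information(theta, alpha, cuts, h=1e-3):
    """I_i(theta) = sum over options of I_ik(theta) * p_ik(theta)."""
    p_mid = category_probs(theta, alpha, cuts)
    p_plus = category_probs(theta + h, alpha, cuts)
    p_minus = category_probs(theta - h, alpha, cuts)
    info = 0.0
    for pk, pp, pm in zip(p_mid, p_plus, p_minus):
        # I_ik = -d^2 log p_ik / d theta^2, approximated by a central difference.
        i_k = -(math.log(pp) - 2.0 * math.log(pk) + math.log(pm)) / (h * h)
        info += i_k * pk
    return info


# Hypothetical five-option item evaluated on a coarse grid of latent trait levels.
alpha = 1.5
cuts = [-2.0, -0.5, 0.5, 2.0]
for theta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(theta, round(item_information(theta, alpha, cuts), 4))
```

For an item with a single cut point, the result reduces to the familiar two-parameter logistic information α²P(1 − P), which equals α²/4 at θ = b and offers a simple check of the implementation.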

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Jin, X., Liu, G.G., Gerstein, H.C. et al. Item reduction and validation of the Chinese version of diabetes quality-of-life measure (DQOL). Health Qual Life Outcomes 16, 78 (2018).
