- Short report
- Open Access
Item and response-category functioning of the Persian version of the KIDSCREEN-27: Rasch partial credit model
Health and Quality of Life Outcomes volume 10, Article number: 127 (2012)
The purpose of the study was to determine whether the Persian version of the KIDSCREEN-27 has the optimal number of response categories to measure health-related quality of life (HRQoL) in children and adolescents. Moreover, we aimed to determine whether all the items contributed adequately to their own domain.
The Persian version of the KIDSCREEN-27 was completed by 1083 school children and 1070 of their parents. The Rasch partial credit model (PCM) was used to investigate item statistics and the ordering of response categories. The PCM showed that no item was misfitting. The PCM also revealed that successive response categories for all items were located in the expected order, except for category 1 in self- and proxy-reports.
Although Rasch analysis confirms that all the items belong to their own underlying construct, response categories should be reorganized and evaluated in further studies, especially in children with chronic conditions.
Classical test theory (CTT) and item response theory (IRT) are the two most common methods used to test the reliability and validity of quality of life instruments. The advantages of IRT models outnumber those of CTT methods [1, 2]. While the CTT approach allocates an equal weight to all the items in the instrument and focuses on assessing summated scale scores, IRT models can analyze the properties of items individually with respect to the amount of information they provide on the underlying construct.
However, researchers using IRT models face several problems. These models rest on two crucial assumptions, unidimensionality and local independence, which must hold for the model parameters to be estimated. Moreover, model fit indices depend on a variety of factors, including the number of response options and the spread of responses across categories. IRT models also need a large sample size to guarantee accurate item parameter estimates [1, 2, 4].
The KIDSCREEN is an international instrument for measuring HRQoL in children and adolescents, which has been simultaneously applied and evaluated in several European countries [5–7]. The structural validity of the KIDSCREEN-27 has been assessed in 13 European countries using CTT and IRT methods [5, 8]. Although these studies revealed that all the items fit the data well, none of them discussed the optimal number of response categories, except the handbook of the KIDSCREEN questionnaires. The main objective of the current study, therefore, was to determine whether the adjacent response categories for each item in the Persian version of the KIDSCREEN-27 were located in the expected order. In the current research, the PCM was used to report the item properties and rating scale structure of the KIDSCREEN-27.
The target population was Iranian school children aged 8–18 and their parents, who were selected by a two-stage cluster random sampling technique from the four educational districts of Shiraz, southern Iran. Written informed consent was obtained from the participants prior to enrollment in the study. The study was approved by the ethics committee of our institution, Shiraz University of Medical Sciences. The Persian version of the KIDSCREEN-27, which was previously translated by the KIDSCREEN group, was filled in by 1083 school children (55.4% boys, 44.6% girls) and 1070 of their parents. The mean (± standard deviation) age of boys and girls was 13.65±2.11 and 12.7±2.65 years, respectively. The KIDSCREEN-27 encompasses 27 items divided into five domains: physical well-being (5 items), psychological well-being (7 items), autonomy and parent relation (7 items), social support and peers (4 items), and school environment (4 items). The participants responded to the items on a 5-point Likert scale from 1=never to 5=always or from 1=not at all to 5=extremely. For ease of interpretation, rating scale categories of negatively worded items were reversed such that higher scores indicated better HRQoL.
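The reversal of negatively worded items described above can be sketched as follows. This is a minimal illustration, assuming the 1 to 5 coding stated in the text; the function name `reverse_score` is hypothetical and is not taken from the KIDSCREEN scoring materials.

```python
def reverse_score(score, low=1, high=5):
    """Reverse a rating so that a higher value indicates better HRQoL.

    Assumes integer ratings between `low` and `high` (1 to 5 here).
    """
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside {low}..{high}")
    # On a 1..5 scale this maps 1->5, 2->4, 3->3, 4->2, 5->1.
    return low + high - score


# Hypothetical responses to one negatively worded item.
raw = [1, 2, 3, 4, 5]
reversed_scores = [reverse_score(s) for s in raw]
```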
Internal consistency for each domain was assessed by Cronbach’s alpha coefficient.
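For readers unfamiliar with the coefficient, the standard formula for Cronbach's alpha can be sketched in a few lines. This is an illustrative sketch only, not the software used in the study; the function name and the example data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one domain.

    rows: one list of item scores per respondent, all of equal length k.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of sum score)
    """
    k = len(rows[0])
    item_columns = list(zip(*rows))            # one tuple of scores per item
    sum_item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical responses of four children to a four-item domain.
example = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
]
alpha = cronbach_alpha(example)
```

Sample variances are used for both the items and the sum score; using population variances throughout would give the same ratio.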
A correlation coefficient greater than 0.40 between an item and its own domain was considered adequate evidence of convergent validity. Discriminant validity was supported whenever the correlation between an item and its hypothesized domain was higher than its correlation with the other scales.
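The two criteria above can be illustrated with a short sketch. It assumes that the own-domain score is computed without the item itself (a common correction for overlap that the text does not state explicitly), and all names and data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_scaling(item, own_domain, other_domains, threshold=0.40):
    """Check convergent and discriminant validity for one item.

    item: scores on the item, one per respondent.
    own_domain: own-domain scores (assumed here to exclude the item).
    other_domains: dict mapping a domain name to its scores.
    """
    r_own = pearson(item, own_domain)
    convergent = r_own > threshold
    discriminant = all(r_own > pearson(item, s) for s in other_domains.values())
    return r_own, convergent, discriminant


# Hypothetical data for five respondents.
r_own, convergent, discriminant = item_scaling(
    item=[1, 2, 3, 4, 5],
    own_domain=[2, 4, 6, 8, 10],
    other_domains={"school environment": [3, 1, 4, 1, 5]},
)
```

The scaling success rate of a domain is then the percentage of its items for which the check passes.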
The essential assumption of IRT models, unidimensionality, was examined using the Rasch PCM. Moreover, the PCM was used to assess item statistics and response-category functioning [11, 12]. Parameters for this model were estimated using the program WINSTEPS. Two key indicators, the infit and outfit statistics, were used to evaluate whether all the items contributed effectively to their own domain. The range of acceptable values for both infit and outfit item statistics was 0.7 to 1.3, with values close to 1 being ideal. Items with fit statistics below this range were considered redundant, whereas fit statistics above it indicated that the items may not be sufficiently related to the rest of the scale and that unidimensionality may not hold [3, 11]. Average measures, step calibrations and fit statistics were used to test whether the response categories behaved sufficiently well [3, 13]. Categories were considered misfitting if infit or outfit statistics were greater than 1.5 or less than 0.5. For the five categories, there are four step calibrations corresponding to the locations on the domain at which participants become more likely to choose the higher than the lower of two adjacent responses (2 over 1, 3 over 2, 4 over 3, and 5 over 4). Average measures and step calibrations are expected to increase with increasing response categories. A violation of this pattern indicates that the response categories are disordered. In addition to average measure and step calibration estimates, category fit indices and category probability curves (CPC) provide additional information about the functioning of response categories. According to Linacre’s criteria, categories with an outfit of greater than 2 were considered to be misfitting.
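To make the quantities in this paragraph concrete, the sketch below computes PCM category probabilities and the infit and outfit mean squares for a single item. It is not the WINSTEPS estimation procedure: person measures and step calibrations are assumed to be already known, responses are recoded from 1..5 to 0..4, and all names and values are hypothetical.

```python
from math import exp


def pcm_probabilities(theta, steps):
    """Category probabilities under the partial credit model.

    theta: person measure in logits.
    steps: step calibrations of one item in logits (four for five categories).
    Returns one probability per category, lowest category first.
    """
    cumulative = [0.0]
    for delta in steps:
        cumulative.append(cumulative[-1] + (theta - delta))
    weights = [exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def item_fit(thetas, responses, steps):
    """Infit and outfit mean squares for one item.

    responses must be coded 0..m (subtract 1 from ratings coded 1..5).
    """
    sq_residuals, variances, z_squared = [], [], []
    for theta, x in zip(thetas, responses):
        p = pcm_probabilities(theta, steps)
        expected = sum(k * pk for k, pk in enumerate(p))
        var = sum((k - expected) ** 2 * pk for k, pk in enumerate(p))
        sq_residuals.append((x - expected) ** 2)
        variances.append(var)
        z_squared.append((x - expected) ** 2 / var)
    infit = sum(sq_residuals) / sum(variances)   # information-weighted
    outfit = sum(z_squared) / len(z_squared)     # unweighted mean
    return infit, outfit


# Hypothetical step calibrations and person measures.
steps = [-1.5, -0.5, 0.5, 1.5]
probs = pcm_probabilities(0.0, steps)
infit, outfit = item_fit([-1.0, 0.0, 1.0], [1, 2, 3], steps)
```

At a person measure equal to a step calibration, the two adjacent categories are equally probable, which is where neighbouring category probability curves intersect.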
Tables 1 and 2 present item difficulty, average measures, step calibrations, and item and category fit indices for self- and proxy-reports. All of the items in the KIDSCREEN-27 demonstrated acceptable infit and outfit statistics (0.7–1.3). Hence, all domains in both self- and proxy-reports can be considered sufficiently unidimensional. Item difficulty estimates ranged from −0.77 to 0.50 and from −0.55 to 0.55 for self- and proxy-reports, respectively. Items 1 and 4 in the social support and peers domain for child self-report, and items 2 and 4 in the autonomy and parent relation domain for parent proxy-report were the most and least difficult items, respectively. As shown in Tables 1 and 2, the infit and outfit statistics for all response categories, except for “never or not at all”, were within the acceptable range (0.5–1.5). In the child self-report, items 1 and 2 in the physical well-being, items 6 and 7 in the psychological well-being, items 3 and 4 in the autonomy and parent relation, item 3 in the social support and peers, and item 4 in the school environment domains had infit and/or outfit greater than 1.5. Moreover, in the parent proxy-report, items 1 and 2 in the physical well-being, items 3 and 6 in the psychological well-being, and item 7 in the autonomy and parent relation domains had infit and/or outfit greater than 1.5. Within each item, the average measures and step calibrations increased monotonically as the rating scales moved from lower to higher categories. These results correspond to the intersections in the CPC (Figure 1).
Table 3 shows that all the domains have adequate internal consistency (greater than 0.7). Moreover, scaling success rates for convergent and discriminant validity were 100% in all domains.
In the current study, Cronbach's alpha coefficients for all five domains conformed to those obtained in the combined sample from all European countries. The Rasch PCM analysis of the self- and proxy-reports showed that no item was misfitting. These findings are in line with those of the previous study conducted in 13 European countries, indicating that each of the test items measures the underlying construct adequately. Although average measures and step calibrations for all five response categories increased monotonically, 5 and 8 out of 27 items had category fit statistics greater than 1.5 in the self- and proxy-reports, respectively. According to Linacre, for a five-category scale, advances of at least 1.0 logits between step calibrations are needed in order to achieve the optimal number of response categories. As seen in Tables 1 and 2, the advance in step calibrations from a rating of 1 to 2 to a rating of 2 to 3 is less than 1.0 logits in almost all items. For example, in item 2 for child self-report, step calibrations advance from 1.52 to 1.05, a distance of 0.47. This is not sufficiently large to meet the criterion. These findings indicate that categories 1 (never or not at all) and 2 (seldom or slightly) should be combined in all items for self- and proxy-reports. Similar results were also observed in the Persian version of the PedsQL™ 4.0 Generic Core Scales.
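The step-advance criterion discussed above lends itself to a simple check. The sketch below assumes the 1.0-logit minimum quoted in the text for a five-category scale; the function name and the step calibrations are hypothetical and are not values from Tables 1 and 2.

```python
def check_step_calibrations(steps, min_advance=1.0):
    """Check ordering and spacing of one item's step calibrations.

    steps: step calibrations in logits, lowest step first.
    Returns (ordered, too_close), where too_close lists the pairs of
    adjacent steps (numbered from 1) that advance by less than min_advance.
    """
    advances = [b - a for a, b in zip(steps, steps[1:])]
    ordered = all(d > 0 for d in advances)
    too_close = [
        (i + 1, i + 2) for i, d in enumerate(advances) if d < min_advance
    ]
    return ordered, too_close


# Hypothetical item: steps 1 and 2 are only 0.5 logits apart.
ordered, too_close = check_step_calibrations([-1.5, -1.0, 0.3, 1.6])
```

In this hypothetical case the steps are ordered, but the first two lie too close together, which is the kind of pattern that would motivate combining the two lowest categories.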
As with the PedsQL™ 4.0 in Iranian children with chronic conditions [16, 17], this study showed that the Persian version of the KIDSCREEN-27 has good internal consistency and excellent convergent and discriminant validity. However, although the PCM showed that all the items contributed adequately to their own domain, Rasch analysis revealed that the number of response categories should be reduced from five to four in the Persian version of the KIDSCREEN-27. It is not clear whether this problem is due to the meaning of the response options in the Persian language or is an artifact of a sample of mostly healthy schoolchildren who did not use the full range of the response scale. Therefore, the response categories should be evaluated in further validation studies, especially in large samples of chronically ill children.
HRQoL: Health-related quality of life
PCM: Partial credit model
CTT: Classical test theory
IRT: Item response theory
CPC: Category probability curves
DIF: Differential item functioning
1. DeMars C: Item response theory. New York, Oxford; 2010.
2. Hays RD, Morales LS, Reise SP: Item response theory and health outcomes measurement in the 21st century. Med Care 2000, 38(9 Suppl):II28-II42.
3. Gothwal VK, Wright TA, Lamoureux EL, Pesudovs K: Rasch analysis of the quality of life and vision function questionnaire. Optom Vis Sci 2009, 86(7):E836-E844. 10.1097/OPX.0b013e3181ae1ec7
4. Embretson SE, Reise SP: Item Response Theory for Psychologists. Lawrence Erlbaum Associates, New Jersey; 2000.
5. Ravens-Sieberer U, Auquier P, Erhart M, Gosch A, Rajmil L, Bruil J, Power M, Duer W, Cloetta B, Czemy L, Mazur J, Czimbalmos A, Tountas Y, Hagquist C, Kilroe J, European KIDSCREEN Group: The KIDSCREEN-27 quality of life measure for children and adolescents: psychometric results from a cross-cultural survey in 13 European countries. Qual Life Res 2007, 16(8):1347–1356. 10.1007/s11136-007-9240-2
6. Erhart M, Ottova V, Gaspar T, Jericek H, Schnohr C, Alikasifoglu M, Morgan A, Ravens-Sieberer U, HBSC Positive Health Focus Group: Measuring mental health and well-being of school-children in 15 European countries using the KIDSCREEN-10 Index. Int J Public Health 2009, 54(Suppl 2):160–166.
7. Ravens-Sieberer U, Erhart M, Rajmil L, Herdman M, Auquier P, Bruil J, Power M, Duer W, Abel T, Czemy L, Mazur J, Czimbalmos A, Tountas Y, Hagquist C, Kilroe J, European KIDSCREEN Group: Reliability, construct and criterion validity of the KIDSCREEN-10 score: a short measure for children and adolescents’ well-being and health-related quality of life. Qual Life Res 2010, 19(10):1487–1500. 10.1007/s11136-010-9706-5
8. Robitail S, Ravens-Sieberer U, Simeoni MC, Rajmil L, Bruil J, Power M, Duer W, Cloetta B, Czemy L, Mazur J, Czimbalmos A, Tountas Y, Hagquist C, Kilroe J, Auquier P, KIDSCREEN Group: Testing the structural and cross-cultural validity of the KIDSCREEN-27 quality of life questionnaire. Qual Life Res 2007, 16(8):1335–1345. 10.1007/s11136-007-9241-1
9. Ravens-Sieberer U, the KIDSCREEN Group Europe: The KIDSCREEN questionnaires. Quality of life questionnaires for children and adolescents – handbook. Papst Science Publisher, Germany; 2006.
10. Fayers PM, Machin D: Quality of life: The assessment, analysis and interpretation of patient-reported outcomes. 2nd edition. John Wiley, Chichester; 2007.
11. Court H, Greenland K, Margrain TH: Measuring patient anxiety in primary care: Rasch analysis of the 6-item Spielberger State Anxiety Scale. Value Health 2010, 13(6):813–819. 10.1111/j.1524-4733.2010.00758.x
12. Luo X, Cappelleri JC, Cella D, Li JZ, Charbonneau C, Kim ST, Chen I, Motzer RJ: Using the Rasch model to validate and enhance the interpretation of the Functional Assessment of Cancer Therapy-Kidney Symptom Index-Disease-Related Symptoms scale. Value Health 2009, 12(4):580–586. 10.1111/j.1524-4733.2008.00473.x
13. Linacre JM: WINSTEPS® Rasch measurement computer program. Winsteps.com, Beaverton; 2011.
14. Linacre JM: Optimizing rating scale category effectiveness. J Appl Meas 2002, 3(1):85–106.
15. Jafari P, Bagheri Z, Ayatollahi SM, Soltani Z: Using Rasch rating scale model to reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales in school children. Health Qual Life Outcomes 2012, 10(1):27. 10.1186/1477-7525-10-27
16. Jafari P, Ghanizadeh A, Akhondzadeh S, Mohammadi MR: Health-related quality of life of Iranian children with attention deficit/hyperactivity disorder. Qual Life Res 2011, 20(1):31–36. 10.1007/s11136-010-9722-5
17. Jafari P, Forouzandeh E, Bagheri Z, Karamizadeh Z, Shalileh K: Health related quality of life of Iranian children with type 1 diabetes: reliability and validity of the Persian version of the PedsQL™ Generic Core Scales and Diabetes Module. Health Qual Life Outcomes 2011, 9:104. 10.1186/1477-7525-9-104
This work was supported by the grant number 90–5882 from Shiraz University of Medical Sciences Research Council. This article was extracted from Mozhgan Safe's Master of Science thesis. We are thankful to the referees for their invaluable comments. We would also like to thank Dr N. Shokrpoor and Dr MA. Mosleh Shirazi for editing this manuscript.
The authors declare that they have no competing interests.
PJ researched and analyzed the data and wrote the manuscript; ZB analyzed the data and wrote the manuscript; MS researched and analyzed the data. All authors read and approved the final manuscript.