Item and response-category functioning of the Persian version of the KIDSCREEN-27: Rasch partial credit model

Background: The purpose of the study was to determine whether the Persian version of the KIDSCREEN-27 has the optimal number of response categories to measure health-related quality of life (HRQoL) in children and adolescents. Moreover, we aimed to determine whether all the items contributed adequately to their own domain.
Findings: The Persian version of the KIDSCREEN-27 was completed by 1083 school children and 1070 of their parents. The Rasch partial credit model (PCM) was used to investigate item statistics and the ordering of response categories. The PCM showed that no item was misfitting. The PCM also revealed that successive response categories for all items were located in the expected order, except for category 1 in self- and proxy-reports.
Conclusions: Although Rasch analysis confirms that all the items belong to their own underlying construct, response categories should be reorganized and evaluated in further studies, especially in children with chronic conditions.


Findings
Classical test theory (CTT) and item response theory (IRT) are the two most common approaches used to test the reliability and validity of quality of life instruments. The advantages of IRT models outnumber those of CTT methods [1,2]. While the CTT approach allocates an equal weight to all the items in the instrument and focuses on assessing summated scale scores, IRT models are able to analyze the properties of items individually with respect to the amount of information they provide on the underlying construct [3].
However, researchers using IRT models face several difficulties. These models rely on two crucial assumptions, unidimensionality and local independence, to estimate the model parameters. Moreover, model fit indices depend on a variety of factors, including the number of response options and the spread of responses across categories. IRT models also need large sample sizes to guarantee accurate item parameter estimates [1,2,4].
The KIDSCREEN is an international instrument for measuring HRQoL in children and adolescents, which has been simultaneously applied and evaluated in several European countries [5-7]. The structural validity of the KIDSCREEN-27 has been assessed in 13 European countries using CTT and IRT methods [5,8]. Although these studies revealed that all the items fit the data well, none of them discussed the optimal number of response categories, except the handbook of the KIDSCREEN questionnaires [9]. The main objective of the current study, hence, was to determine whether the adjacent response categories for each item in the Persian version of the KIDSCREEN-27 were located in the expected order. The PCM was used to report the item properties and rating scale structure of the KIDSCREEN-27.

Methods
The target population was Iranian school children aged 8-18 and their parents, who were selected by a two-stage cluster random sampling technique from the  A correlation coefficient greater than 0.40 between an item and its own domain was considered adequate evidence of convergent validity. Discriminant validity was supported whenever the correlation between an item and its hypothesized domain was higher than its correlation with the other scales [10].
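The convergent and discriminant validity criteria above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function name `scaling_success`, the input layout, and the choice to exclude an item from its own domain score before correlating (a common overlap correction) are assumptions made here for illustration only.

```python
import numpy as np

def scaling_success(items, domain_of_item, threshold=0.40):
    """Check each item against the two criteria described in the text.

    items: (n_respondents, n_items) array of item scores.
    domain_of_item: one domain label per item (hypothetical layout).
    Assumes every domain contains at least two items.
    """
    items = np.asarray(items, dtype=float)
    labels = np.asarray(domain_of_item)
    domains = sorted(set(domain_of_item))
    results = {}
    for j in range(items.shape[1]):
        corr = {}
        for d in domains:
            members = np.flatnonzero(labels == d)
            # Assumption: drop the item itself from its own domain score
            members = members[members != j]
            score = items[:, members].sum(axis=1)
            corr[d] = np.corrcoef(items[:, j], score)[0, 1]
        own = corr[domain_of_item[j]]
        others = [r for d, r in corr.items() if d != domain_of_item[j]]
        results[j] = {
            # Convergent: item-own-domain correlation above the threshold
            "convergent": bool(own > threshold),
            # Discriminant: own-domain correlation exceeds all others
            "discriminant": bool(all(own > r for r in others)),
        }
    return results
```

The scaling success rate for a domain would then be the percentage of its items for which the relevant flag is true.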
The essential assumption of IRT models, unidimensionality, was examined using the Rasch PCM. Moreover, the PCM was used to assess item statistics and response-category functioning [11,12]. Parameters for this model were estimated using the WINSTEPS program [13]. Two key indicators, the infit and outfit statistics, were used to evaluate whether all the items contributed effectively to their own domain. The range of acceptable values for both infit and outfit item statistics was from 0.7 to 1.3, and values close to 1 were ideal [3]. Items with fit statistics below this range were considered redundant, whereas fit statistics above it indicated that the item may not be sufficiently related to the rest of the scale and that unidimensionality may not hold [3,11]. Average measures, step calibrations and fit statistics were used to test whether the response categories behaved sufficiently well [3,13]. Categories were considered misfitting if infit or outfit statistics were greater than 1.5 or less than 0.5 [13]. For the five categories, there are four step calibrations, corresponding to the locations on the domain at which participants become more likely to choose the higher rather than the lower of two adjacent responses (2 over 1, 3 over 2, 4 over 3, and 5 over 4). Average measures and step calibrations are expected to increase with increasing response categories; a violation of this pattern indicates that the response categories are disordered. In addition to average measure and step calibration estimates, category fit indices and category probability curves (CPC) provide information about the functioning of response categories. According to Linacre's criteria [14], categories with an outfit greater than 2 were considered to misfit.
[Figure panels: Item 2 and Item 6; child self-report and parent proxy-report]
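To make the role of the step calibrations concrete, the following Python sketch computes PCM category probabilities for a single item and checks whether its step difficulties are ordered. It uses the standard partial credit parameterization, in which the log-odds of choosing category k over category k-1 equal the person measure minus the k-th step difficulty. The function names and the example values are hypothetical, and the sketch does not reproduce the WINSTEPS estimation.

```python
import numpy as np

def pcm_category_probs(theta, step_difficulties):
    """Probabilities of each response category for one person and one item.

    theta: person measure in logits.
    step_difficulties: item step difficulties in logits, one per
    transition between adjacent categories (four for five categories).
    """
    steps = np.asarray(step_difficulties, dtype=float)
    # Cumulative sums of (theta - step); the lowest category has an empty sum of 0
    logits = np.concatenate(([0.0], np.cumsum(theta - steps)))
    logits -= logits.max()  # subtract the maximum for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def steps_are_ordered(step_difficulties):
    """True if the step difficulties increase strictly from step to step."""
    return bool(np.all(np.diff(step_difficulties) > 0))

# Hypothetical step difficulties for a five-category item (not study estimates)
example_steps = [-1.5, -0.5, 0.5, 1.5]
probs = pcm_category_probs(theta=0.0, step_difficulties=example_steps)
```

Evaluating `pcm_category_probs` over a grid of person measures gives the category probability curves; adjacent curves intersect where the person measure equals the corresponding step difficulty.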

Results
Tables 1 and 2 present item difficulty, average measures, step calibrations, and item and category fit indices for the self- and proxy-reports. All of the items in the KIDSCREEN-27 demonstrated acceptable infit and outfit statistics (0.7-1.3). Hence, all domains in both self- and proxy-reports can be considered sufficiently unidimensional. Item difficulty estimates ranged from −0.77 to 0.50 for the self-reports and from −0.55 to 0.55 for the proxy-reports. Items 1 and 4 in the social support and peers domain for child self-report, and items 2 and 4 in the autonomy and parent relation domain for parent proxy-report, were the most and least difficult items, respectively. As shown in Tables 1 and 2, the infit and outfit statistics for all response categories, except for "never or not at all", were within the acceptable range (0.5-1.5). In the child self-report, items 1 and 2 in the physical well-being, items 6 and 7 in the psychological well-being, items 3 and 4 in the autonomy and parent relation, item 3 in the social support and peers, and item 4 in the school environment domains had infit and/or outfit greater than 1.5 for this category. Moreover, in the parent proxy-report, items 1 and 2 in the physical well-being, items 3 and 6 in the psychological well-being, and item 7 in the autonomy and parent relation domains had infit and/or outfit greater than 1.5. Within each item, the average measures and step calibrations increased monotonically as the rating scale moved from lower to higher categories. These results correspond to the intersections in the CPC (Figure 1). Table 3 shows that all the domains have adequate internal consistency (greater than 0.7). Moreover, scaling success rates for convergent and discriminant validity were 100% in all domains.
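For readers unfamiliar with the fit statistics reported here, the sketch below shows how infit and outfit mean squares are conventionally computed for one item under the PCM: outfit is the unweighted mean of squared standardized residuals, and infit is the information-weighted version. This is an illustration under assumed inputs (known person measures and step difficulties, responses coded from 0), not the WINSTEPS implementation; all names are hypothetical.

```python
import numpy as np

def pcm_probs(theta, steps):
    """PCM category probabilities for one person and one item."""
    steps = np.asarray(steps, dtype=float)
    logits = np.concatenate(([0.0], np.cumsum(theta - steps)))
    p = np.exp(logits - logits.max())
    return p / p.sum()

def item_fit(responses, thetas, steps):
    """Infit and outfit mean squares for one item.

    responses: observed categories coded 0..M (assumed coding).
    thetas: person measures in logits, one per response.
    steps: the item's M step difficulties in logits.
    """
    cats = np.arange(len(steps) + 1)
    sq_resid, variances = [], []
    for x, theta in zip(responses, thetas):
        p = pcm_probs(theta, steps)
        expected = np.sum(cats * p)                    # model-expected score
        variance = np.sum((cats - expected) ** 2 * p)  # model variance
        sq_resid.append((x - expected) ** 2)
        variances.append(variance)
    sq_resid = np.array(sq_resid)
    variances = np.array(variances)
    outfit = np.mean(sq_resid / variances)     # unweighted mean square
    infit = sq_resid.sum() / variances.sum()   # information-weighted mean square
    return infit, outfit
```

When responses follow the model, both statistics are expected to be close to 1, which is why values near 1 are treated as ideal.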

Discussion
In the current study, Cronbach's alpha coefficients for all five domains were consistent with those obtained in the combined sample from all European countries [8]. The Rasch PCM analysis of the self- and proxy-reports showed that no item was misfitting. These findings are in line with those of the previous study conducted in 13 European countries, indicating that each of the test items measures the underlying construct adequately [8]. Although average measures and step calibrations for all five response categories increased monotonically, 8 and 5 out of 27 items had category fit statistics greater than 1.5 in the self- and proxy-reports, respectively. According to Linacre [14], for a five-category scale, advances of at least 1.0 logits between step calibrations are needed in order to achieve the optimal number of response categories. As seen in Tables 1 and 2, the advance in step calibrations from a rating of 1 to 2 to a rating of 2 to 3 is less than 1.0 logits in almost all items. For example, in item 2 for child self-report, step calibrations advance from 1.52 to 1.05, a distance of 0.47. This is not sufficiently large to meet the criterion. These findings indicate that categories 1 (never or not at all) and 2 (seldom or slightly) should be combined in all items for self- and proxy-reports. Similar results were also observed in the Persian version of the PedsQL™ 4.0 Generic Core Scales [15]. As was the case with the PedsQL™ 4.0 in Iranian children with chronic conditions [16,17], this study showed that the Persian version of the KIDSCREEN-27 has good internal consistency and excellent convergent and discriminant validity. However, although the PCM showed that all the items contributed adequately to their own domain, Rasch analysis revealed that the number of response categories should be reduced from five to four in the Persian version of the KIDSCREEN-27.
It is not clear whether this problem is due to the meaning of the response options in the Persian language or is an artifact of a sample of mostly healthy schoolchildren who did not use the full range of the response scale [15]. Therefore, the response categories should be evaluated in further validation studies, especially in large samples of chronically ill children.
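The step-advance guideline discussed above (advances of at least 1.0 logits between adjacent step calibrations for a five-category scale) can be checked mechanically. The following Python sketch flags adjacent steps whose advance falls below a chosen minimum; the function name and the example values are hypothetical and are not taken from Tables 1 and 2.

```python
def narrow_step_advances(step_calibrations, min_advance=1.0):
    """Flag adjacent step calibrations that advance by less than min_advance.

    Returns tuples (lower_step, upper_step, advance), with steps numbered
    from 1, so (1, 2, ...) compares the 1-to-2 step with the 2-to-3 step.
    """
    flagged = []
    for i in range(len(step_calibrations) - 1):
        advance = step_calibrations[i + 1] - step_calibrations[i]
        if advance < min_advance:
            flagged.append((i + 1, i + 2, round(advance, 2)))
    return flagged

# Hypothetical step calibrations for one five-category item
example_steps = [-1.5, -1.0, 0.2, 1.6]
print(narrow_step_advances(example_steps))  # [(1, 2, 0.5)]
```

A flagged pair (1, 2) would point to the two lowest categories as candidates for merging, which is the kind of pattern the discussion describes.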