Health and Quality of Life Outcomes BioMed Central

Background: To validate the Argentinean Spanish version of the PedsQL™ 4.0 Generic Core Scales in Argentinean children and adolescents with chronic conditions and to assess the impact of socio-demographic characteristics on the instrument's comprehensibility and acceptability. Reliability, known-groups validity, and convergent validity were tested. Methods: Consecutive sample of 287 children with chronic conditions and 105 healthy children, ages 2–18, and their parents. Chronically ill children (1) were attending outpatient clinics and (2) had one of the following diagnoses: stem cell transplant, chronic obstructive pulmonary disease, HIV/AIDS, cancer, end stage renal disease, complex congenital cardiopathy. Patients and adult proxies completed the PedsQL™ 4.0 and an overall health status assessment. Physicians were asked to rate the degree of health status impairment. Results: The PedsQL™ 4.0 was feasible (only 9 children, all 5 to 7 year-olds, could not complete the instrument), easy to administer, completed without, or with minimal, help by most children and parents, and required a brief administration time (average 5–6 minutes). People living below the poverty line and/or with low literacy needed more help to complete the instrument. Cronbach's alpha internal consistency values for the total and subscale scores exceeded 0.70 for self-reports of children over 8 years old and parent-reports of children over 5 years of age. Reliability of proxy-reports of 2–4 year-olds was low but improved when school items were excluded. Internal consistency for 5–7 year-olds was low (α range = 0.28–0.76). Construct validity was good.
Child self-report and parent proxy-report PedsQL™ 4.0 scores were moderately but significantly
Published: 7 August 2008
Health and Quality of Life Outcomes 2008, 6:59 doi:10.1186/1477-7525-6-59
Received: 27 September 2007
Accepted: 7 August 2008
This article is available from: http://www.hqlo.com/content/6/1/59
© 2008 Roizen et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Background
Health-related quality of life (HRQOL) measures should be based on patients' perceptions through self-assessment, use understandable and age-appropriate language, provide evidence of acceptable or good reliability and validity, assess multiple dimensions, and consist of a 'core' set of questions as well as a set of specific items for different conditions. In addition, HRQOL measures should be feasible; that is, they should be short so that they may be administered repeatedly, easy to score and analyze, acceptable to patients by being inoffensive, and usable in a busy clinical setting. Patients who are ill become tired after 15-20 minutes, and lengthy questionnaires increase the risk that respondents fail to complete them or skip items near the end [1].
The assessment of pediatric HRQOL is complicated by developmental considerations and by questions regarding the accuracy and acceptability of parent-proxy ratings of patients' quality of life. The Pediatric Quality of Life Inventory™ (PedsQL™) is a measure with demonstrated reliability and validity for child self-report and parent proxy-report. It has been developed to assess HRQOL in children and adolescents from 2 to 18 years of age. It is based on a modular approach with generic and disease-specific instruments. As a generic instrument, the PedsQL™ 4.0 Generic Core Scales are brief (23 items), practical (less than 4 minutes to complete), flexible (designed for use with community, school, and clinical pediatric populations), and multidimensional [2]. The PedsQL™ 4.0 Generic Core Scales cover physical, emotional, and social functioning, which are the core dimensions of health as delineated by the World Health Organization (WHO), as well as role (school) functioning.
The PedsQL™ 4.0 Generic Core Scales have previously demonstrated evidence of feasibility, reliability and validity as a school population health measure in a U.S. sample [3], as well as in numerous clinical populations [4-10]. These previous studies have demonstrated the reliability and validity of the PedsQL™ 4.0 Generic Core Scales using Classical Test Theory (CTT). However, one limitation of CTT is that it cannot estimate item difficulty and person ability characteristics separately. Another limitation is that it yields only a single reliability estimate and corresponding standard error of measurement, although the precision of measurement varies by ability level. Because of these limitations, the CTT method is less than ideal for applications that require item difficulty, person ability, and conditional standard error of measurement [11].
Although CTT has served test development well over several decades, Item Response Theory (IRT) has rapidly become mainstream as the theoretical basis for measurement [12]. IRT methods model the association between a respondent's underlying level on a characteristic (latent variable) and probability of a particular item response using a non-linear monotonic function [13]. The Rasch model [14], sometimes referred to as a one-parameter logistic model under IRT, provides a mathematical framework against which test developers can compare their data. The model is based on the idea that useful measurement involves examination of only one human attribute at a time (unidimensionality) on a hierarchical "more than/less than" line of inquiry. Person and item performance deviations from that line (fit) can be assessed, alerting the investigator to reconsider item wording and score interpretations from these data [15]. Additionally, the way each rating scale is constructed has great influence on the quality of data obtained from the scale [16], and a rating scale may not be used by respondents in the way it was intended by the developer of the scale [15]. Thus, the assumptions about both the quality of the measures and utility of the rating scale in facilitating interpretable measures should be tested empirically [15], which can be done utilizing the Rasch model [17].
The PedsQL™ 4.0 Generic Core Scales have been linguistically validated in many different languages. However, only local translations without linguistic validation have been available in Korea [18]. This study aimed to assess the psychometric properties of the Korean translation of the PedsQL™ Generic Core Scales for Korean school children and adolescents. The feasibility, reliability, construct validity, and agreement between child self-report and parent proxy-report were investigated based on previous PedsQL™ 4.0 CTT methods [3,6-10]. Additionally, the person and item reliability, item statistics and category functioning were assessed using the Rasch model [17].

Participants and settings
The Korean translations of PedsQL™ 4.0 Generic Core Scales were administered to schoolchildren ages 8-18 and their parents in 60 classes (28 elementary school classes, 16 middle school classes, and 16 high school classes) at 5 elementary schools, 5 middle schools, and 4 high schools within two small cities, two metropolitan cities, and a capital city. Classes at schools were randomly selected within grade. Trained research personnel visited each classroom and distributed the questionnaires and informed parent consent and child assent forms for students to take home to their parents. Parents signed the informed consent and completed the parent report surveys at home, then returned them to school via students. Parents were asked to return the surveys even if they chose not to consent to participate. The students completed their questionnaire after the parents gave informed consent. The consent rate of all classes was above 70%.

Measures
The Korean translations of the Pediatric Quality of Life Inventory™ Version 4.0 (PedsQL™ 4.0) Generic Core Scales
The 23-item PedsQL™ 4.0 Generic Core Scales encompass: (1) Physical functioning (8 items), (2) Emotional functioning (5 items), (3) Social functioning (5 items), and (4) School functioning (5 items). The PedsQL™ 4.0 Generic Core Scales are composed of parallel child self-report and parent proxy-report formats. Child self-report includes ages 5-7, 8-12, and 13-18. Parent proxy-report includes ages 2-4 (toddler), 5-7 (young child), 8-12 (child), and 13-18 (adolescent), and assesses the parent's perception of their child's HRQOL. The items for each of the forms are essentially identical, differing only in developmentally appropriate language and in first or third person wording. The instructions ask how much of a problem each item has been during the past 1 month. A 5-point response scale is utilized across child self-report for ages 8-18 and parent proxy-report (0 = never a problem; 1 = almost never a problem; 2 = sometimes a problem; 3 = often a problem; 4 = almost always a problem). Items are reverse-scored and linearly transformed to a 0-100 scale (0 = 100, 1 = 75, 2 = 50, 3 = 25, 4 = 0), so that higher scores indicate better HRQOL. Scale scores are computed as the sum of the items divided by the number of items answered (this accounts for missing data). The physical health summary score is the same as the physical functioning subscale. To create the psychosocial health summary score, the mean is computed as the sum of the items divided by the number of items answered in the emotional, social, and school functioning subscales. If more than 50% of the items in a scale are missing, the scale score is not computed [3,19].
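The scoring rules just described can be illustrated with a minimal Python sketch. It is not the official PedsQL™ scoring program; the function names are hypothetical, missing responses are assumed to be coded as None, and applying the 50% rule to the 15 pooled psychosocial items is an assumption of this sketch.

```python
from typing import Optional, Sequence

# Reverse-scoring map given in the text: 0 -> 100, 1 -> 75, 2 -> 50, 3 -> 25, 4 -> 0.
TRANSFORM = {0: 100.0, 1: 75.0, 2: 50.0, 3: 25.0, 4: 0.0}


def scale_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Mean of the transformed items over the items answered.

    Returns None when more than 50% of the items are missing (None).
    """
    answered = [TRANSFORM[r] for r in responses if r is not None]
    missing = len(responses) - len(answered)
    if not responses or missing > len(responses) / 2:
        return None
    return sum(answered) / len(answered)


def psychosocial_score(emotional, social, school) -> Optional[float]:
    """Mean over all answered emotional, social, and school items.

    Assumption: the 50% missing rule is applied to the pooled 15 items.
    """
    return scale_score(list(emotional) + list(social) + list(school))


if __name__ == "__main__":
    print(scale_score([0, 1, 2, 3, 4]))           # one item at each response level
    print(scale_score([0, None, None, None, 4]))  # 3 of 5 missing, not scored
```

Dividing by the number of items answered, rather than by the scale length, is what lets a respondent with one or two skipped items still receive a score on the same 0-100 metric.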
The PedsQL™ 4.0 Generic Core Scales were translated independently into Korean by a clinical psychologist and a social psychologist fluent in English and translated back into English by a bilingual English native speaker. After review and comments by the instrument author, the second Korean translations of the PedsQL™ 4.0 Generic Core Scales were tested on a panel of 13 school children with cognitive interviewing methods. The cognitive interviews were conducted by four certified clinical psychologists at the participants' homes, and the translation was revised to rectify the identified problems. Finally, the third versions were produced, proofread, and treated as final. The results of all phases were reported to the instrument author and Mapi Research Institute, who reviewed and accepted them.

The Korean translation of the PedsQL™ Family Information Form
The PedsQL™ Family Information Form [10] was completed by parents. The PedsQL™ Family Information Form contains demographic information including the child's date of birth, gender, race/ethnicity, and parental education and occupation information. One survey question asks the parent to report on the presence of a chronic health condition ("In the past 6 months, has your child had a chronic health condition?") defined as a physical or mental health condition that has lasted or is expected to last at least 6 months and interferes with the child's activities. If the parents check "Yes" to this question, they are asked to write in the name of the chronic health condition.
This form was also translated independently into Korean by two clinical psychologists fluent in English and translated back into English by a bilingual English native speaker. After review and comment by the instrument author, the Korean translation of the PedsQL™ Family Information Form was revised and accepted by the instrument author. The results of all phases were reported to the instrument author and Mapi Research Institute.

Statistical analysis
The feasibility of the PedsQL™ 4.0 Generic Core Scales as a school health measure was determined from the percentage of missing values for each item and distribution of item responses [20,21]. Range of measurement was further tested based on the percentage of scores at the extremes of the scaling range, that is, the maximum possible score (ceiling effect) and the minimum possible score (floor effect) [21]. Scale descriptives for child self-report and parent proxy-report were calculated using SPSS Version 13.0 for Windows.
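As a rough illustration of the feasibility and range-of-measurement checks described above, the following Python sketch computes the percentage of missing responses for an item and the percentages of scale scores at the floor (0) and ceiling (100). The function names and the None coding for missing values are assumptions of this sketch; the study itself used SPSS.

```python
from typing import Optional, Sequence, Tuple


def missing_percent(item_responses: Sequence[Optional[int]]) -> float:
    """Percentage of respondents who left this item unanswered (None)."""
    if not item_responses:
        raise ValueError("no responses supplied")
    n_missing = sum(1 for r in item_responses if r is None)
    return 100.0 * n_missing / len(item_responses)


def floor_ceiling(scores: Sequence[Optional[float]],
                  minimum: float = 0.0,
                  maximum: float = 100.0) -> Tuple[float, float]:
    """Percentages of computed scale scores at the minimum and maximum."""
    valid = [s for s in scores if s is not None]
    if not valid:
        raise ValueError("no computed scale scores supplied")
    floor = 100.0 * sum(1 for s in valid if s == minimum) / len(valid)
    ceiling = 100.0 * sum(1 for s in valid if s == maximum) / len(valid)
    return floor, ceiling


if __name__ == "__main__":
    print(missing_percent([1, None, 3, None]))
    print(floor_ceiling([100.0, 100.0, 50.0, 0.0]))
```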
The factor structure of the PedsQL™ 4.0 Generic Core Scales across age groups was examined by a confirmatory factor analysis (CFA) of items with missing data, using the software Mplus [22]. The missing data option in Mplus was implemented to avoid list-wise deletion. Factor indicators were specified as categorical variables because of ceiling effects, and the estimator was weighted least squares parameter estimation using a diagonal weight matrix with standard errors and a mean- and variance-adjusted chi-square test statistic (WLSMV). WLSMV is one of the estimators that are robust to non-normality and involves the analysis of a matrix of polychoric correlations. The PedsQL™ four-factor model was tested, which consisted of physical, emotional, social, and school functioning factors. Additionally, the PedsQL™ second-order factor model was tested, which consisted of physical health and psychosocial health factors. The psychosocial health factor was the second-order factor, which consisted of three first-order factors: emotional, social, and school functioning. The physical health factor is the same as the physical functioning scale.
The fit of the models was evaluated by the Chi-square statistic and by fit indices including the Comparative Fit Index (CFI) [23], Tucker-Lewis Index (TLI) [24], and Root Mean Square Error of Approximation (RMSEA) [25]. Chi-square is a test of exact fit. With large samples, there is considerable power to reject the null hypothesis, even though the model may fit the data well. Therefore, other goodness-of-fit indices should be considered. The CFI [23] and TLI [24] are both incremental fit indices, ranging from 0 (indicating poor fit) to 1.00 (indicating a perfect fit), and are derived from the comparison of a restricted model with a null model. For both indices, a value greater than .90 indicates a psychometrically acceptable fit to the data. More recent literature suggests that values greater than or equal to .95 indicate a good fit [26]. RMSEA is an absolute fit index and a measure of discrepancy between the observed and model-implied covariance matrices adjusted for degrees of freedom. RMSEA values of .05 or less indicate a close fit, values less than .08 indicate a fair or reasonable fit, values less than .10 indicate a mediocre fit, and values greater than .10 indicate an unacceptable fit [25].
Construct validity was further determined utilizing the known-groups method. The known-groups method compares scale scores across groups known to differ in the health construct being investigated. In this study, groups differing in health status (healthy vs. chronic health condition groups) were compared, using t-tests. In order to determine the magnitude of the differences between healthy children and children with chronic health conditions, effect sizes were calculated [27]. Effect size as utilized in these analyses was calculated by taking the difference between the healthy sample mean and the chronic health condition sample mean, divided by the healthy sample standard deviation.
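The effect size defined above can be written out as a short Python sketch. The function name is hypothetical, and the use of the sample standard deviation (n - 1 denominator) for the healthy group is an assumption, since the text does not state which form was used.

```python
from statistics import mean, stdev
from typing import Sequence


def known_groups_effect_size(healthy: Sequence[float],
                             chronic: Sequence[float]) -> float:
    """(healthy mean - chronic mean) / healthy standard deviation.

    Assumption: stdev() is the sample standard deviation (n - 1 denominator).
    """
    return (mean(healthy) - mean(chronic)) / stdev(healthy)


if __name__ == "__main__":
    # Illustrative scores only, not study data.
    print(known_groups_effect_size([80, 90, 100], [70, 80, 90]))
```

Dividing by the healthy group's standard deviation, rather than a pooled one, expresses the difference in units of the spread seen in the reference (healthy) population.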
The person and item reliability, item statistics, and category functioning were assessed by the Rasch rating scale model (RSM) [28], using WINSTEPS [29]. The Rasch RSM analyses were conducted on the four subscales of child self-report and parent proxy-report. The Rasch model [17] can be generalized to polytomous items with ordered categories. Extended Rasch models include the Partial Credit Model (PCM) [30] and the Rating Scale Model (RSM) [31]. Given that Likert scales can be modeled according to either a PCM or an RSM, it is necessary to determine which polytomous Rasch model and its respective set of estimated parameters would best explain the data. To choose an appropriate model, several estimates obtained from the PCM and RSM were compared on the scales. For this study, the more parsimonious model, the RSM, was chosen because the two models produced comparable person and item fit and reliability estimates.
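For readers unfamiliar with the rating scale model, the following Python sketch computes the category probabilities for one person and one item under the standard Andrich formulation, in which all items in a scale share a single set of thresholds. It only evaluates the model for given parameter values; it does not reproduce the WINSTEPS estimation used in this study, and the names theta, delta, and thresholds are chosen here for illustration.

```python
import math
from typing import List, Sequence


def rsm_probabilities(theta: float, delta: float,
                      thresholds: Sequence[float]) -> List[float]:
    """Category probabilities P(X = 0..M) under the rating scale model.

    theta: person measure; delta: item difficulty;
    thresholds: tau_1..tau_M, shared by every item in the scale.
    """
    # Log-numerator of category k is the sum over j <= k of (theta - delta - tau_j);
    # category 0 has log-numerator 0.
    log_num = [0.0]
    for tau in thresholds:
        log_num.append(log_num[-1] + (theta - delta - tau))
    peak = max(log_num)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Illustrative parameter values only, not estimates from the study.
    probs = rsm_probabilities(theta=0.5, delta=0.0, thresholds=[-1.5, -0.5, 0.5, 1.5])
    print([round(p, 3) for p in probs])
```

Under this formulation the probability curves of adjacent categories k - 1 and k cross where theta equals delta plus tau_k, which is why thresholds are read off the category probability curves.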
The person reliability indicates the replicability of person ordering we would expect if this sample of persons were to be given another set of items measuring the same construct [28]. Analogous to Cronbach's alpha, it is bounded by 0 and 1. Person separation index is an estimate of the spread or separation of persons on this measured variable. Item reliability index is the estimate of the replicability of item placement within a hierarchy of items along the measured variable if these same items were to be given to another sample of comparable ability. Analogous to Cronbach's alpha, it is bounded by 0 and 1. The item separation index is an estimate of the spread or separation of items on the measured variable. It is expressed in standard error units. The person and item separation should be at least 2, indicating that the measure separated persons, items, or both into at least two distinct groups [15].
To check if items fit the model's expectation, item fit mean square (MNSQ) statistics were computed using the RSM. MNSQ determines how well each item contributes to defining one common construct. Item MNSQ values of about 1.0 are ideal and values greater than 1.4 may indicate a lack of construct homogeneity with other items in a scale and item MNSQ values smaller than 0.6 may indicate item redundancy [32]. However, the cutoff values tend to vary depending on the purpose for which the ratings are used [33]. Typically, two MNSQ statistics are used: infit (weighted) and outfit (unweighted) statistics. Infit is more sensitive to misfitting responses to items near the person's ability level, while outfit is sensitive to misfitting items that are further away [34].
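The relation between separation and reliability can be sketched in Python as follows, using the usual Rasch definitions: separation is the error-adjusted ("true") standard deviation divided by the root mean square standard error, and reliability is true variance divided by observed variance. The function names are hypothetical, and the choice of the n - 1 denominator for the observed variance is an assumption that may differ from the software's convention.

```python
import math
from typing import Sequence, Tuple


def separation_and_reliability(measures: Sequence[float],
                               standard_errors: Sequence[float]) -> Tuple[float, float]:
    """Separation index and reliability from measures and their standard errors."""
    n = len(measures)
    if n < 2 or n != len(standard_errors):
        raise ValueError("need at least two measures, each with a standard error")
    center = sum(measures) / n
    observed_var = sum((m - center) ** 2 for m in measures) / (n - 1)
    error_var = sum(se ** 2 for se in standard_errors) / n  # mean square error
    if observed_var <= 0 or error_var <= 0:
        raise ValueError("variances must be positive")
    true_var = max(observed_var - error_var, 0.0)
    separation = math.sqrt(true_var / error_var)
    reliability = true_var / observed_var
    return separation, reliability


def reliability_from_separation(separation: float) -> float:
    """Equivalent form: R = G^2 / (1 + G^2)."""
    return separation ** 2 / (1.0 + separation ** 2)


if __name__ == "__main__":
    print(reliability_from_separation(2.0))
```

With these definitions, the separation criterion of 2 cited above corresponds to a reliability of 0.80.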
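The difference between the weighted and unweighted statistics can be made concrete with a small Python sketch. It assumes that the model-expected score and model variance of each response are already available from a fitted Rasch model; the function names are hypothetical, and the 0.6 and 1.4 cutoffs are the ones quoted above.

```python
from typing import Sequence, Tuple


def item_fit(observed: Sequence[float],
             expected: Sequence[float],
             variance: Sequence[float]) -> Tuple[float, float]:
    """Return (infit, outfit) mean squares for one item.

    observed: responses to the item, one per person;
    expected, variance: model expectation and model variance of each response.
    """
    squared_residuals = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / w for r, w in zip(squared_residuals, variance)) / len(squared_residuals)
    # Infit: squared residuals weighted by the model variance (information).
    infit = sum(squared_residuals) / sum(variance)
    return infit, outfit


def flag(mnsq: float, low: float = 0.6, high: float = 1.4) -> str:
    """Label a mean square using the cutoffs quoted in the text."""
    if mnsq > high:
        return "possible misfit"
    if mnsq < low:
        return "possible redundancy"
    return "acceptable"


if __name__ == "__main__":
    # Illustrative values only, not study data.
    print(item_fit([1, 1], [0.5, 0.9], [0.25, 0.09]))
```

Because outfit divides each squared residual by its own variance, unexpected responses from persons far from the item, where the variance is small, inflate it more than they inflate infit.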
It is often the case that respondents fail to react to a rating scale in the manner the test constructor intended [35]. Because it is always uncertain how a rating scale was used by a sample, an investigation of the functioning of the rating scale is always necessary [36] and can be done with the Rasch analysis. The rating scale diagnostics include category frequencies, average measures, threshold estimates, probabilities, and category fit. These diagnostics should be used in combination [15]. Average measures are defined as the average of the ability estimates for all persons in the sample who choose that particular response category, with the average calculated across all observations in that category [37]. They should increase monotonically, indicating that on average, those with higher abilities/stronger attitudes endorse the higher categories, whereas those with lower abilities/weaker attitudes endorse the lower categories [15]. Because observations in higher categories must be produced by higher measures, the average measures across categories must increase monotonically. Fit statistics provide another criterion for assessing the quality of rating scales. Outfit mean squares greater than 1.3 indicate more misinformation than information, meaning that the particular category is introducing noise into the measurement process. The step measures or thresholds define the boundaries between categories. Thresholds too should increase monotonically [38]. Thresholds not increasing monotonically across the rating scale are considered disordered [15].
Finally, agreement between child self-report and parent proxy-report was determined through two-way mixed effect model (absolute agreement, single measure) Intraclass Correlations (ICC) [39]. The ICC offers an index of absolute agreement given that it takes into account the ratio between subject variability and total variability [39,40]. Intraclass Correlations (ICC) are designated as ≤ 0.40 poor to fair agreement, 0.41-0.60 moderate agreement, 0.61-0.80 good agreement, and 0.81-1.00 excellent agreement [41]. Statistical analyses were conducted using SPSS Version 13.0 for Windows.
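A single-measure, absolute-agreement ICC of the kind described above can be computed from two-way ANOVA mean squares. The Python sketch below uses the standard formula for that coefficient; it is offered as an illustration under the assumption of complete data with one child score and one parent score per dyad, and is not the SPSS procedure itself. The function names are hypothetical, and treating values between 0.40 and 0.41 (and similar gaps) as belonging to the lower band is an assumption.

```python
from typing import Sequence


def icc_absolute_single(ratings: Sequence[Sequence[float]]) -> float:
    """Two-way, absolute-agreement, single-measure ICC.

    ratings: one row per subject (dyad), one column per rater (child, parent).
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters per subject
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def agreement_label(icc: float) -> str:
    """Bands quoted in the text [41]."""
    if icc <= 0.40:
        return "poor to fair"
    if icc <= 0.60:
        return "moderate"
    if icc <= 0.80:
        return "good"
    return "excellent"


if __name__ == "__main__":
    # Illustrative scores only: the parent rates 10 points higher in every dyad.
    dyads = [(50, 60), (70, 80), (90, 100)]
    print(icc_absolute_single(dyads), agreement_label(icc_absolute_single(dyads)))
```

In the illustrative data the rank order of the dyads is identical for both raters, yet the coefficient falls below 1 because absolute agreement also penalizes the systematic 10-point difference between raters.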

Sample characteristics
The overall response rate was 70.9%. The response rate for the elementary school survey (grades three through six) was 71.0%. The response rate for the middle and high schools was 70.8%. A total of 1453 parent-child dyads completed the Korean translations of PedsQL™ 4.0 Generic Core Scales and the Korean translations of PedsQL™ Family Information Form. Child self-reports for 1425 (98.1%) children were available after excluding 28 (1.9%) child self-reports with more than 50% missing items in the scale. Parent proxy-reports for 1431 (98.5%) parents were available after excluding 22 (1.5%) parent proxy-reports with more than 50% missing items in the scale. There were 633 (44.4%) child self-reports and 638 (44.6%) parent proxy-reports for ages 8-12. There were 792 (55.6%) adolescent self-reports and 793 (55.4%) parent proxy-reports for ages 13-18.

Feasibility
The percentage of missing item responses was less than 1.7% for child self-report and 1.4% for parent proxy-report.

Descriptive statistics
For child self-report and parent proxy-report, all items were negatively skewed and 12 items showed skewness greater than -2. Table 1 presents the Cronbach's alphas, means, standard deviations, ranges, and percentages of floor and ceiling effects of the PedsQL™ 4.0 Generic Core Scales for the total sample. Cronbach's alpha coefficients for child self-report and parent proxy-report all exceeded the minimum reliability standard of .70. The alpha values were higher for the total score and lower for the school functioning scale of child self-report and parent proxy-report. Scale means all were higher than those of the PedsQL™ school study [3]. The full range of 0-100 was used for the emotional functioning scale of child self-report. The range of 40-100 was used for the total score and psychosocial health scale of parent proxy-report. There were essentially no floor effects. However, moderate to high ceiling effects existed in the majority of scales, except for the total score of child self-report. In particular, notable ceiling effects were found in the social functioning scale of child self-report and parent proxy-report in this mostly healthy sample.
Table 2 shows the goodness-of-fit indices for the four-factor and second-order factor models of the PedsQL™ 4.0 Generic Core Scales. All Chi-square statistics were significant and indicated a poor fit. For child self-report and parent proxy-report, the CFI approximated or exceeded the .90 standard of acceptable model fit and the TLI exceeded the .95 value of good model fit. For parent proxy-report ages 13-18, the CFI exceeded the .95 value of good model fit and the RMSEA was less than .08, which indicates a fair fit. For other scales, the RMSEA values generally were greater than .08 but less than .09, which indicates a mediocre fit. Tables 3 and 4 show the factor loadings and covariances for the four-factor and the second-order factor models across age groups. As can be seen, all loadings are over .60, which indicates that the items and first-order factors fit well with their respective factors and their second-order factor. The covariances were relatively high, suggesting that all scales are correlated across age groups.
[3,10] with the PedsQL™ 4.0, healthy children scored significantly higher on the PedsQL™ 4.0 (better HRQOL) than children with a chronic health condition in the scales. The only exception was on the social functioning scale of child self-report.
Table 6 shows the reliability and separation index for persons and items across the four subscales. Person reliability and separation are low while item reliability and separation are high. These results indicate that the sample has a narrow spread and the sample size is large enough.
Table 7 shows item infit and outfit statistics on the four subscales. The majority of items showed mean square infit and outfit statistics within the 0.6 and 1.4 range, save for item 5 (Hard to take a bath or shower) of the physical health scale and item 3 (Teased) of the social functioning scale for child self-report.
Table 8 shows average measures, infit and outfit MNSQ, and step measures on the four subscales. The average measures in all scales of child self-report and parent proxy-report increase monotonically across the rating scale. They function as expected and indicate that, on average, persons with higher measures selected higher categories. Most infit and outfit values are close to 1.00 or a little below, except for category 4. The people who chose each category accord with the people we would expect to choose those categories. Somewhat problematic are the infits or the outfits for category 4 in the physical, social and school functioning scales of child self-report and in all subscales of parent proxy-report. This indicates that persons with low measures unexpectedly selected this high category.

Rating scale diagnostics
Step measures indicate the structure of the category probability curves in as sample-independent a manner as possible. They are advancing and show a structure of a "range of hills" in the physical, emotional, and school functioning scales of child self-report and parent proxy-report. However, step measures 3 and 4 are disordered in the social functioning scales of child self-report and parent proxy-report.
Numbers in parentheses are factor loadings of subscale on psychosocial health of second-order factor.
Numbers in parentheses are covariances between physical health factor and psychosocial health factor of second-order factor.
For child self-report and parent proxy-report, the RSM category probability curves are shown in Figures 1, 2, 3 and 4. There are 5 curves visible for each scale, starting from the left. In general, they depict the expected succession of "hills". However, the disordered step measures 3 and 4 in the social functioning scales of child self-report and parent proxy-report are also reflected in the probability curves. As shown in Figure 3, the cross-over between the curves for categories 3 and 4 is to the left of that for categories 2 and 3 in the social functioning scales of child self-report and parent proxy-report.
Table 9 shows the ICCs between PedsQL™ 4.0 child self-report and parent proxy-report. For the total sample, ICCs were higher for the total score and psychosocial health scales and lower for the physical health scale. For children ages 8-12, ICCs were higher for the school functioning scale and lower for the physical health scale and social functioning scale. For adolescents ages 13-18, ICCs were higher for the total score and psychosocial health scale and lower for the physical health scale and social functioning scale. However, the range of ICCs was between .47 and .61 across the ages. These results suggest moderate agreement. In particular, there was good agreement for the total score of ages 13-18. Furthermore, the results indicate a trend towards higher ICCs with increasing age, save for the school functioning scale.

Discussion
The purpose of this study was to assess the psychometric properties of the Korean translation of the PedsQL™ 4.0 Generic Core Scales in school children and adolescents ages 8-18. As in the school study with the original U.S. English instrument [3] and other translation studies [4,42,43], items on the PedsQL™ 4.0 had minimal missing responses. This suggests that children and parents are willing and able to provide good quality data regarding the child's HRQOL [3].
There were no floor effects but moderate to high ceiling effects, most notably for the social functioning scales. These findings might be expected for a healthy school-age population. Responsiveness is an important measurement property in a clinical trial, and one of the factors that can affect responsiveness is floor and ceiling effect [19]. However, detecting improving health among persons who are already quite well may prove difficult because of ceiling effects, and most school children are quite healthy [3]. The presence of ceiling effects may be expected in generic HRQOL instruments since they are designed to be applicable to a wide range of populations [44]. Thus, the findings can be a reflection of the sample characteristics, i.e., a healthy school population. Although most children are quite healthy, measuring HRQOL in large school populations has several distinct benefits. It can aid in identifying subgroups of children who are at risk for health problems, in determining the burden of a particular disease or disability, and in informing efforts aimed at prevention and intervention [45]. In addition, utilization of HRQOL measures may assist in the evaluation of the healthcare needs of a school district, and results can be used to inform public policy, including the development of strategic healthcare plans and school health clinics, identifying health disparities, promoting policies and legislation related to school health, and aiding in the allocation of health care resources [46].
On the other hand, it has been suggested that concepts and measures from the more positive end of the HRQOL continuum are needed for healthy populations [47] and that inclusion of emotional well-being, positive affect, vitality, and health perceptions aids in discriminating and measuring change in well populations [48]. Even though the items of the PedsQL™ 4.0 are reverse-scored and higher scores indicate better HRQOL, the instructions ask how much of a problem each item has been during the past 1 month. In other words, the interaction between sample characteristics and the focus on "problems" in the items and instructions of the PedsQL™ 4.0 might cause such ceiling effects in a healthy sample. Finally, in the Korean culture, individuals who have good interpersonal relationships tend to be regarded as having a good personality and virtue, which may lead to some social desirability responding on social functioning items, leading to notable ceiling effects. Compared with other translation studies [43,49], these potential cultural differences require further research using a wide range of the Korean population, including healthy and chronically ill children and adolescents, to more fully understand cultural differences.
The CFA on the PedsQL™ 4.0 Generic Core Scales supported a four-factor model and a second-order factor model. This provides statistical evidence that the PedsQL™ 4.0 Generic Core Scales cover the core dimensions of health as delineated by the WHO and have construct validity for the utilization of five summary and scale scores.
Children with chronic health conditions were reported to experience lower physical, emotional, and school functioning in comparison to healthy children. This indicates that the PedsQL™ 4.0 Generic Core Scales can differentiate HRQOL in healthy children as a group in comparison to children with chronic health conditions. However, there was no significant difference on the social functioning scale between healthy and unhealthy children in this study, even though the social functioning of the children with chronic health conditions was lower than that of the healthy children. In the previous PedsQL™ school study in the US [3], there was a statistically significant difference on the social functioning scale between healthy and unhealthy children. Compared with the mean scores of the other subscales in the present study and with those of the previous PedsQL™ school study [3], the mean scores on the social functioning scale of both healthy children and unhealthy children were very high. Therefore, the nonsignificant difference on the social functioning scale of child self-report should be further studied in Korean samples, especially in comparison with clinical populations with larger sample sizes of chronically ill children with physician-diagnosed chronic health conditions. This comparison is essential because the type and severity of chronic health conditions did not have a significant impact on the social functioning of the children who participated in the present study. In addition, it should be noted that the nonsignificant difference might be caused by social desirability and cultural differences in Korean populations.
Rasch RSM analysis of the four subscales of the PedsQL™ 4.0 Generic Core Scales shows that person reliability and separation are low, while item reliability and separation are high. As mentioned earlier, these results indicate that the sample has a narrow spread and that the sample size is large enough. Person reliability refers to the replicability of person placement across other items measuring the same construct. Item reliability refers to the replicability of item placement within the hierarchy across other samples [28]. The chief influences on person reliability are the sample "true" standard deviation, test length, the number of categories per item, and how well the test targets the sample [50]. In this study, the length of each subscale is adequate and the number of categories per item is sufficient. Person reliability is a characteristic of the person measures for the sample being tested. To increase person reliability, testing persons with more extreme abilities or attitudes and improving the test targeting may be slightly helpful. The PedsQL™ 4.0 Generic Core Scales were originally developed to target clinical samples. Considering the predominantly healthy characteristics of this study sample, most of the PedsQL™ 4.0 Generic Core Scales items might be too "severe" for healthy school populations in Korea. On the other hand, it should be noted that the internal consistency reliability alpha coefficients presented in Table 1 were between .72 and .90. However, raw-score based reliabilities (e.g., Cronbach's alpha) in general overstate the "true" reliability, while the Rasch reliabilities understate it [51]. Therefore, further studies on clinical samples are needed to find out what exactly caused the low person reliability in Korean samples. According to the item statistics, all items of the subscales were found to represent a homogeneous construct, which was also confirmed by the CFA.
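The dependence of person reliability on sample spread can be illustrated with the conventional Rasch definitions. The "true" variance is the observed variance of the person measures minus the mean squared standard error, reliability is true variance divided by observed variance, and separation is the "true" standard deviation divided by the root mean square error. The sketch below is a minimal illustration under those definitions. The person measures and standard errors are invented for the example, and the use of the population variance is an assumption, since software packages may differ in such details.

```python
from math import sqrt
from statistics import pvariance


def separation_and_reliability(measures, standard_errors):
    """Return (separation, reliability) from Rasch measures and their SEs."""
    observed_var = pvariance(measures)
    # Mean squared error of measurement across persons.
    mse = sum(se ** 2 for se in standard_errors) / len(standard_errors)
    # Error-adjusted ("true") variance cannot be negative.
    true_var = max(observed_var - mse, 0.0)
    reliability = true_var / observed_var if observed_var > 0 else 0.0
    separation = sqrt(true_var) / sqrt(mse)
    return separation, reliability


# Hypothetical logit measures with the same precision (SE = 0.5) in both samples.
errors = [0.5] * 5
narrow_sample = [1.8, 2.0, 2.2, 2.4, 2.6]   # mostly healthy, little spread
wide_sample = [-2.0, -1.0, 0.0, 1.0, 2.0]   # wider range of HRQOL

g_narrow, r_narrow = separation_and_reliability(narrow_sample, errors)
g_wide, r_wide = separation_and_reliability(wide_sample, errors)
print(g_narrow, r_narrow)
print(g_wide, r_wide)
```

With identical measurement precision, the narrow sample yields much lower reliability than the wide one. This matches the interpretation that low person reliability here reflects the homogeneity of a healthy school sample rather than a defect of the items.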
Rating scale diagnostics to identify the optimal categorization showed that category 4 is somewhat problematic and that step measures 3 and 4 are disordered in the social functioning scale of both child self-report and parent proxy-report. These results indicate a low probability of observing certain categories, i.e., category 4 (almost always a problem) seems not to work as intended for this healthy school sample in Korea.
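The effect of disordered step measures can be sketched with the category probabilities of the rating scale model, in which the log-odds of responding in category k rather than k-1 equal the person measure minus the item difficulty minus the k-th step measure. The sketch below scans a grid of person measures and reports the categories that are never the most probable response. The step values are hypothetical and are not the estimates obtained in this study.

```python
from math import exp


def category_probabilities(theta, thresholds, item_difficulty=0.0):
    """Rating scale model probabilities for categories 0..len(thresholds)."""
    # Category 0 has an empty sum; each further category adds one step term.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - item_difficulty - tau))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def never_modal_categories(thresholds, lo=-6.0, hi=6.0, steps=120):
    """Categories that are not the most probable response anywhere on the grid."""
    modal = set()
    for i in range(steps + 1):
        theta = lo + (hi - lo) * i / steps
        probs = category_probabilities(theta, thresholds)
        modal.add(probs.index(max(probs)))
    return [c for c in range(len(thresholds) + 1) if c not in modal]


ordered_steps = [-2.0, -0.5, 0.5, 2.0]      # hypothetical, increasing
disordered_steps = [-2.0, -0.5, 1.5, 0.5]   # hypothetical, steps 3 and 4 reversed

print(never_modal_categories(ordered_steps))
print(never_modal_categories(disordered_steps))
```

When steps 3 and 4 are reversed, one of the upper categories is never the most likely response at any level of the trait. This is the usual sign that a category is rarely used by the sample and might be collapsed with a neighbour.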
The pattern of parent-child correlation for the total sample, children ages 8-12, and adolescents ages 13-18 was different from those of the PedsQL™ 4.0 school population study [3] and the UK-English version study on the PedsQL™ 4.0 Generic Core Scales [49], where better correlation was found for physical than for psychological and social functioning. While it might be expected that the intercorrelations between child and parent report across the physical, emotional, social and school functioning scales would follow the conceptualization that more observable domains (i.e., physical functioning) would yield higher agreement, this has not necessarily been the case in the published literature with other HRQOL instruments. A comprehensive review [52] found mixed results in terms of higher intercorrelations between self and proxy reports of physical functioning across pediatric HRQOL instruments, with most studies demonstrating this effect, while some others did not. In addition, it was suggested that levels of agreement can be affected by child age, the domains investigated, and the parent's own QOL [40]. On the other hand, all the ICCs between PedsQL™ 4.0 child self-report and parent proxy-report showed moderate agreement and a general trend towards higher agreement with increasing age. The ICCs were consistently higher than those of the PedsQL™ 4.0 school population study [3], despite the fact that the ICC values of this study were derived using the absolute agreement type while the PedsQL™ 4.0 school population study used the consistency type. In situations where children are unable or unwilling to respond for themselves, measurement of QOL is often obtained by parent proxy-report [40]. Thus, these consistencies between child self-report and parent proxy-report suggest that parent proxy-report can be informative for measuring the HRQOL of children when they are not able to respond. The trend towards higher agreement with increasing age is consistent with the results of the PedsQL™ 4.0 school population study [3] and can be explained by the greater verbal communication skills typically manifested with increasing developmental age.
Figure 4. Response Functions for 5 categories: School Functioning (panels: child self-report and parent proxy-report).
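The distinction between the absolute agreement and consistency types of ICC can be illustrated with the two-way, single-measure formulas. The consistency type ignores systematic differences between raters, whereas the absolute agreement type adds the between-rater variance to the denominator. The sketch below is a minimal illustration with hypothetical scores. The exact ICC model used in either study (for example, single versus average measures) is not restated here and is therefore an assumption.

```python
def two_way_icc(ratings):
    """Return (consistency, absolute agreement) single-measure ICCs.

    ratings: one row per child, one column per rater (e.g., child, parent).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    consistency = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    absolute = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    return consistency, absolute


# Hypothetical total scores: each parent rates exactly 10 points above the child.
child_scores = [60.0, 70.0, 80.0, 90.0, 75.0]
pairs = [[c, c + 10.0] for c in child_scores]

icc_consistency, icc_absolute = two_way_icc(pairs)
print(icc_consistency, icc_absolute)
```

A constant offset between parent and child leaves the consistency ICC at its maximum but lowers the absolute agreement ICC. The absolute agreement type is therefore typically the more conservative of the two whenever raters differ systematically, which is why higher values under that type are notable.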
There are several limitations to this study. First, we were not able to collect data from a representative sample based on the Korean population census. However, it should be noted that we had a large enough sample size drawn from two small cities, two metropolitan cities, and the capital city. Second, we were not able to determine which children and adolescents did not understand the instructions of the PedsQL™ 4.0 due to cognitive dysfunction, even though no developmental disorders were noted in the parents' reports on the presence of a chronic health condition in their children. For this study, the PedsQL™ 4.0 Generic Core Scales were administered as a group test in schools, and thus there may be some covariates that were not accounted for. Furthermore, the sample of children with a chronic health condition was very small and may not be representative of chronically ill children in general or specifically in Korea. In particular, if the same factor structure is not confirmed in a less healthy population, their scores might not be comparable. Thus, further validation studies on Korean clinical samples are required. Finally, we applied the unidimensional Rasch model to analyze item responses in the PedsQL™ 4.0 Generic Core Scales. However, the unidimensional approach ignores the correlations between latent traits and yields imprecise measures when tests are short [53]. The PedsQL™ 4.0 Generic Core Scales can be analyzed as a whole, but that approach ignores the evidence for the subscale structure. In a further study, multidimensional item response models should be applied to take these correlations into account. Additionally, to assess the cross-cultural equivalence of the PedsQL™ 4.0 Generic Core Scales, an analysis of differential item functioning (DIF) is needed for both the Korean and the US samples.

Conclusion
The results demonstrate the feasibility, validity, item reliability, item fit, and agreement between child self-report and parent proxy-report of the Korean version of the PedsQL™ 4.0 Generic Core Scales for school population health research in Korea. However, use of the Korean version of the PedsQL™ 4.0 Generic Core Scales in healthy school populations must take into account low person reliability, ceiling effects, and cultural differences, and further validation studies on Korean clinical samples are required.