Psychometric validation of the French self and proxy versions of the PedsQL™ 4.0 generic health-related quality of life questionnaire for 8–12 year-old children

Background The Pediatric Quality of Life Inventory Version 4.0 (PedsQL™ 4.0) is a generic health-related quality of life (HRQoL) questionnaire, widely used in pediatric clinical trials but not yet validated in France. We performed the psychometric validation of the self and proxy PedsQL™ 4.0 generic questionnaires for French children aged 8–12 years. Methods This bicentric cross-sectional study included 123 children with congenital heart disease (CHD), 97 controls, and their parents. The psychometric validation method was based on the consensus-based standards for the selection of health measurement instruments (COSMIN). Reliability was tested using the intraclass correlation coefficient (ICC). To evaluate the validity of this scale, content, face, criterion, and construct validity were tested. Acceptability was studied in terms of questionnaire completion and the existence of a floor or ceiling effect. Results Test–retest intraclass correlation coefficients were in the moderate-to-good range (0.49–0.66). Face validity was very good among parents (0.85) and children (0.75). Content validity was good (0.70), despite misinterpretation of some items. In construct validity, each subscale had acceptable internal consistency reliability (Cronbach's α > 0.72 in self-reports, > 0.69 in proxy-reports). In the confirmatory factor analysis, the goodness-of-fit statistics rejected the original 4-factor structure. The exploratory factor analysis revealed an alternative two-factor structure corresponding to physical and psychological dimensions. Convergent validity was supported by moderate (≥ 0.41) to high (0.57) correlations between the PedsQL and Kidscreen questionnaires for the physical, emotion and school dimensions. The ability of the PedsQL to discriminate CHD severity was better with the physical, social and total scores, for both self-reports and proxy-reports.
Conclusions The PedsQL™ 4.0 generic self and proxy HRQoL questionnaires showed good psychometric properties with regard to acceptability, responsiveness, validity, and reliability. This instrument appeared easy to use and understand in the target population of children aged 8 to 12 years and their parents. Trial registration: This study was approved by the South-Mediterranean-IV Ethics Committee and registered on ClinicalTrials.gov (NCT01202916), https://clinicaltrials.gov/ct2/show/NCT01202916.


Background
Quality of life (QoL) assessment in pediatrics has received more attention in the past decade, although patient-reported outcomes (PROs) are not systematically quantified by caregivers and physicians, who primarily rely on clinical symptoms and disease complications [1,2].
Nowadays, most medicines agencies recommend measuring PROs in pediatric drug trials [3,4]. Quality of life is a general and subjective concept, which has been defined as "overall life satisfaction" [5]. However, clinical trials require a more operational definition and the use of validated instruments with good psychometric properties [5,6]. These instruments, called "health-related quality of life" (HRQoL) questionnaires, are multidimensional and usually include physical and psychosocial aspects [6,7].
Therefore, we aimed to perform the psychometric validation of the self and proxy PedsQL™ 4.0 generic questionnaires for French children aged 8–12 years, in a cohort of subjects recruited from the general population and from tertiary care pediatric CHD centers.

Study design
This cross-sectional validation study was carried out between April 2013 and April 2016 (36 months) in pediatric patients with congenital heart disease (CHD) and in children from the general population. Patients were prospectively recruited in two French tertiary care pediatric cardiology departments. The control children were recruited in 5 school classes (one class per grade level, from 3rd grade in elementary school to 7th grade in middle school), randomly selected in southern France (Occitanie Region) from the Education Ministry database.

Study population
Children aged 8–12 years with CHD were prospectively recruited in the two participating centers during a pediatric cardiology outpatient visit. Inclusion procedures were harmonized beforehand. We excluded children with any other severe chronic disease (neurodevelopmental disorder, chronic renal or respiratory failure) and children and/or families unable to understand the questionnaire. The pediatric CHD population was stratified into 4 severity groups described by Uzark et al. [12].
In the control group, all children aged 8–12 years in the 5 selected school classes, and their parents, were invited to participate in the study. The recruitment procedure was the same for each class and identical to the one used at the hospital.

QoL questionnaires
Originally, the measurement properties of the PedsQL were analysed by Varni et al., who found acceptable internal consistency reliability for group comparisons for the total scale score (α = 0.88 child, 0.90 proxy), the physical health summary score (α = 0.80 child, 0.88 proxy), and the psychosocial health summary score (α = 0.83 child, 0.86 proxy). The authors showed that the PedsQL could discriminate between healthy and ill children and correlated with morbidity and illness burden. Cross-cultural validation studies in several countries have found properties similar to those of the original instrument [18,20,22,26,27].
The 8–12-year-old self and proxy PedsQL™ 4.0 generic HRQoL questionnaires each have four multidimensional scales: physical (8 items), emotional (5 items), social (5 items), and school (5 items) functioning. The three summary scores are the total score (23 items), the physical health summary score (8 items), and the psychosocial health summary score (15 items). To create the psychosocial health summary score, the mean is computed as the sum of the items divided by the number of items answered in the emotional, social, and school functioning scales. The physical health summary score is the same as the physical functioning scale score [17].
Each item uses a 5-point Likert scale from 0 (never) to 4 (almost always). Items are reverse-scored and linearly transformed to a 0–100 scale, with higher scores indicating better HRQoL.
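The scoring rules above can be sketched in Python as follows. This is a minimal illustration, not the official PedsQL scoring program: the function names are hypothetical, and the mapping 0 → 100, 1 → 75, 2 → 50, 3 → 25, 4 → 0 is simply the reverse linear transformation of a 0–4 response onto 0–100. Missing items (`None`) are skipped, so each scale score is the mean over answered items, as described for the psychosocial summary score; any further missing-data rule is not implemented here.

```python
from typing import List, Optional, Sequence


def transform_item(response: Optional[int]) -> Optional[float]:
    """Reverse-score a 0-4 response and rescale it to 0-100 (higher = better HRQoL)."""
    if response is None:  # missing item: not substituted
        return None
    if response not in (0, 1, 2, 3, 4):
        raise ValueError(f"invalid Likert response: {response}")
    return 100.0 - 25.0 * response  # 0 -> 100, 1 -> 75, 2 -> 50, 3 -> 25, 4 -> 0


def scale_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Mean of the transformed items that were actually answered."""
    answered: List[float] = [s for s in map(transform_item, responses) if s is not None]
    if not answered:
        return None
    return sum(answered) / len(answered)


def psychosocial_summary(emotional, social, school) -> Optional[float]:
    """Sum of answered items over the number of answered items in the 3 psychosocial scales."""
    return scale_score(list(emotional) + list(social) + list(school))


# The physical health summary score equals the physical functioning scale score.
physical = scale_score([0, 0, 1, 0, None, 2, 0, 1])  # hypothetical answers to the 8 items
print(round(physical, 1))
```

Pooling the items of the three psychosocial scales before averaging, rather than averaging the three scale means, follows the wording of the text and gives each answered item equal weight.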
The logistical process for completing the questionnaires was similar in both groups, as described in our previous studies [28,29]. Children and parents also completed the Kidscreen questionnaire [30,31]. This European generic validated HRQoL instrument is designed for healthy and chronically ill children aged 8–18 years [30,32]. We previously published Kidscreen self- and parent-reported scores in CHD versus healthy children [28]. The dimensions of the Kidscreen and their correspondence with the PedsQL are shown in Fig. 1.

Statistical analysis
A sample size of 124 CHD children was previously calculated for the pilot study, which aimed to analyse the relationship between CHD severity and the Kidscreen physical dimension [28].
In the control group, assuming recruitment in 5 school classes with 30 children per class and a 60% participation rate, we expected to include 90 control children. The total sample size (CHD and control children) therefore provides about 9 subjects per item, which appears sufficient to perform factor analysis and validate the PedsQL [33].
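The sample-size arithmetic above can be checked in a few lines of Python (the function names are illustrative only):

```python
def expected_participants(n_classes: int, pupils_per_class: int, participation_rate: float) -> float:
    """Expected number of control children given class count, class size and participation."""
    return n_classes * pupils_per_class * participation_rate


def subjects_per_item(n_patients: float, n_controls: float, n_items: int) -> float:
    """Ratio of planned subjects to questionnaire items, used to judge factor-analysis feasibility."""
    return (n_patients + n_controls) / n_items


controls = expected_participants(5, 30, 0.60)  # 5 classes x 30 pupils x 60%
ratio = subjects_per_item(124, controls, 23)   # (124 + 90) / 23 PedsQL items
print(round(controls), round(ratio, 1))        # prints: 90 9.3
```

With the 220 children actually included, the same ratio is 220 / 23, or about 9.6 subjects per item.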
The study population was described with means and SD for quantitative variables and with frequencies and percentages for qualitative variables. Quantitative variables were compared with the parametric Student's t-test when the distribution was Gaussian, and with the Mann-Whitney test otherwise. Missing data were not substituted. Qualitative variables were compared with the chi-square test or Fisher's exact test. Data were analyzed using the SAS software version 9.1 (SAS Institute, Cary, NC). The two-sided significance level was 0.05.

Psychometric validation method
The psychometric validation method was based on the consensus-based standards for the selection of health measurement instruments (COSMIN) [34]. The COSMIN taxonomy of relationships between measurement properties is illustrated in Fig. 2.

Reliability
Two weeks after the first assessment, children and their parents completed the same PedsQL versions again at home and mailed them back to the study coordinator. In the CHD group, only patients whose clinical status remained stable during the interim period, as assessed by their pediatric cardiologist (e.g. no change in medical treatment and no hospitalization), were included in the test–retest procedure. The reliability of the PedsQL was estimated with the intraclass correlation coefficient (ICC) and its 95% confidence interval (CI).
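The text does not state which ICC form was used. As a hedged sketch, the Python code below computes the two-way random-effects, absolute-agreement, single-measurement ICC (Shrout and Fleiss ICC(2,1)) from the test and retest scores of one dimension, using the usual ANOVA mean squares. The choice of ICC form is an assumption, the 95% CI is omitted, and the scores are hypothetical.

```python
from typing import Sequence


def icc_2_1(test: Sequence[float], retest: Sequence[float]) -> float:
    """Shrout-Fleiss ICC(2,1): two-way random effects, absolute agreement, single measure."""
    n, k = len(test), 2  # n subjects measured on k = 2 occasions
    if n != len(retest) or n < 2:
        raise ValueError("need paired scores for at least 2 subjects")
    rows = list(zip(test, retest))
    grand = (sum(test) + sum(retest)) / (n * k)
    row_means = [(a + b) / k for a, b in rows]
    col_means = [sum(test) / n, sum(retest) / n]
    # Sums of squares for subjects (rows), occasions (columns), total, and residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


# Hypothetical physical-dimension scores at inclusion and two weeks later
first = [87.5, 62.5, 100.0, 75.0, 90.6]
second = [84.4, 68.8, 96.9, 78.1, 87.5]
print(round(icc_2_1(first, second), 2))
```

Because this form measures absolute agreement, a systematic shift between the two sessions lowers the ICC even when children keep the same rank order; a consistency-type ICC would ignore such a shift.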

Content validity
A group of five experts (a specialist nurse, a Ph.D. student in public health, an adult expert patient with CHD, and two pediatricians) assessed the simplicity and clarity of the questionnaire on a 4-point Likert scale (1–4) ranging from unfavorable to favorable opinion, evaluated whether the items assessed the defined content, and made recommendations to add or remove items. The content validity index (CVI) was defined as the number of experts who answered 3 or 4 divided by the total number of experts. A CVI < 0.4, between 0.4 and 0.75, or > 0.75 indicated poor, intermediate-to-good, or excellent relevance, respectively [35,36].
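A minimal sketch of the CVI computation and its interpretation bands, assuming ratings on the 1–4 scale described above. How the single reported value was aggregated across items is not stated; averaging the item-level CVIs (`scale_cvi`) is an assumption, and the ratings below are hypothetical.

```python
from typing import List, Sequence


def item_cvi(ratings: Sequence[int]) -> float:
    """Share of experts who rated the item 3 or 4 on the 1-4 scale."""
    return sum(1 for r in ratings if r >= 3) / len(ratings)


def scale_cvi(ratings_by_item: List[Sequence[int]]) -> float:
    """Assumption: scale-level CVI taken as the mean of the item-level CVIs."""
    values = [item_cvi(r) for r in ratings_by_item]
    return sum(values) / len(values)


def interpret_cvi(cvi: float) -> str:
    """Bands used in the text: < 0.4 poor, 0.4-0.75 intermediate-to-good, > 0.75 excellent."""
    if cvi < 0.4:
        return "poor"
    if cvi <= 0.75:
        return "intermediate-to-good"
    return "excellent"


# Hypothetical ratings from 5 experts for 2 items
ratings = [[4, 4, 3, 4, 4], [4, 3, 2, 2, 1]]
value = scale_cvi(ratings)
print(round(value, 2), interpret_cvi(value))
```

With five experts, an item-level CVI can only take the values 0, 0.2, 0.4, 0.6, 0.8 or 1, so a value such as 0.70 necessarily reflects some aggregation over items.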

Face validity
A sample of 10 children and 10 parents read, answered and discussed each item during a face-to-face interview with the principal investigator. They gave their opinion on the scale layout, the length and wording of the items, and the response options. The investigator wrote down their answers, and unclear items were reviewed [37,38]. Item clarity was rated 0 (not clear) or 1 (clear) [39]. The total number of points divided by the number of participants determined the face validity index.
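The face validity index can be sketched as below. Because the reported values (0.85, 0.75) lie between 0 and 1, this sketch assumes that the total points are divided by both the number of participants and the number of items. That normalization, the function name and the example ratings are assumptions.

```python
from typing import List


def face_validity_index(clarity: List[List[int]]) -> float:
    """clarity[p][i] = 1 if participant p judged item i clear, 0 otherwise.

    Assumption: points are divided by participants x items so the index lies in [0, 1].
    """
    n_participants = len(clarity)
    n_items = len(clarity[0])
    if any(len(row) != n_items for row in clarity):
        raise ValueError("every participant must rate every item")
    total_points = sum(sum(row) for row in clarity)
    return total_points / (n_participants * n_items)


# Hypothetical clarity ratings: 2 participants x 4 items
print(face_validity_index([[1, 1, 1, 0], [1, 1, 0, 0]]))  # prints: 0.625
```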

Criterion validity
Correlation analyses were performed between PedsQL and Kidscreen dimensions, for both self and parent reports, to assess concurrent validity.

Construct validity
Structural validity In the multitrait-multi-item analysis, five hypotheses were tested. (1) Redundancy between items was assessed by calculating inter-item correlations within each dimension. Items of a given dimension were considered non-redundant if inter-item correlations were < 0.7. (2) Item-internal consistency (IIC) was assessed by correlating each item with its corresponding scale. IIC was considered satisfactory if 90% of the possible item-scale correlations were > 0.4. (3) Item discriminant validity (IDV) was assessed by determining the extent to which items correlated more with the dimension they were supposed to reflect than with any other dimension. (4) The multitrait analysis also assumed that the coefficient of variation of each item was at least 20%. (5) Internal consistency reliability, reflecting the interrelations between PedsQL items, was assessed with Cronbach's α [40]. A value > 0.7 was considered acceptable.
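Checks (1), (2) and (4) can be sketched for a single dimension as follows. This sketch makes several assumptions. It uses Pearson correlations, since the correlation type is not stated. It correlates each item with the sum of the *other* items of its scale, a common correction for overlap that the text does not specify. The item data are hypothetical. Check (3) would additionally compare each item's correlation with its own scale against its correlations with the other scales, and is not shown.

```python
import math
from itertools import combinations
from statistics import mean, stdev
from typing import Dict, List, Tuple

Scale = Dict[str, List[float]]  # item name -> item scores (same respondents, same order)


def pearson(x: List[float], y: List[float]) -> float:
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def redundant_pairs(scale: Scale, threshold: float = 0.7) -> List[Tuple[str, str, float]]:
    """(1) Item pairs whose inter-item correlation is not below the threshold."""
    pairs = []
    for a, b in combinations(scale, 2):
        r = pearson(scale[a], scale[b])
        if r >= threshold:
            pairs.append((a, b, r))
    return pairs


def item_scale_correlations(scale: Scale) -> Dict[str, float]:
    """(2) Correlation of each item with the sum of the other items of its scale."""
    result = {}
    for name, scores in scale.items():
        others = [s for other, s in scale.items() if other != name]
        rest_total = [sum(values) for values in zip(*others)]
        result[name] = pearson(scores, rest_total)
    return result


def iic_satisfactory(correlations: Dict[str, float], cutoff: float = 0.4, share: float = 0.9) -> bool:
    """(2) At least 90% of the item-scale correlations above 0.4."""
    passing = sum(1 for r in correlations.values() if r > cutoff)
    return passing / len(correlations) >= share


def coefficient_of_variation(scores: List[float]) -> float:
    """(4) Sample SD divided by the mean, in percent (assumes a non-zero mean)."""
    return 100 * stdev(scores) / mean(scores)


# Hypothetical transformed (0-100) scores of 3 items from one dimension
dimension = {
    "a": [0, 25, 50, 75, 100],
    "b": [0, 25, 50, 75, 100],
    "c": [50, 0, 100, 25, 75],
}
print(redundant_pairs(dimension))
print(item_scale_correlations(dimension))
```

In this toy dimension, items "a" and "b" are flagged as redundant, and item "c" fails the 0.4 item-scale criterion, so the 90% rule is not met.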
We first performed a confirmatory factor analysis (CFA) with 4 factors to test the original 4-factor structure in our population. Then, as the CFA did not show a good fit for the original structure, an exploratory factor analysis (EFA) was performed to optimize the quality of the fit, and a 2-factor structure was tested. A CFA was then performed on the 2-factor structure.
In the CFA, we used structural equation modeling according to the 4 dimensions of the original PedsQL instrument, fixing the variance of the latent constructs (factors) to 1.0 and leaving the correlations between the latent constructs free. The following absolute fit indices were calculated: the p-value of the baseline model chi-square (pχ²), the adjusted goodness-of-fit index (AGFI, interpreted similarly to an R² estimate), the root mean square error of approximation (RMSEA; the closer to zero, the better the model fit), the standardized root mean square residual (SRMR; the closer to zero, the better the model fit) and the comparative fit index (CFI, preferably greater than 0.80). The model was considered well fitted if pχ² > 0.05, RMSEA and SRMR < 0.08, and AGFI and CFI ≥ 0.80. The EFA was performed using oblique rotation and polychoric correlations to identify the most appropriate factor structure. The number of factors was determined using the scree test and parallel analysis (with 100 simulations). An item was considered part of a factor when the absolute value of its loading exceeded one divided by the square root of the number of items. The variance explained by each factor (computed without taking the other factors into account) was calculated.
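The decision rules above are easy to encode. The sketch below uses hypothetical helper names and does three things: it applies the stated fit cut-offs; it computes an RMSEA point estimate from a model χ², its degrees of freedom and the sample size, using one common formula, √(max(χ² − df, 0) / (df (N − 1))); and it assigns items to factors with the 1/√(number of items) loading cut-off, which is about 0.21 for the 23 PedsQL items. Fitting the CFA or the polychoric EFA itself requires dedicated software and is not shown. The loadings in the example are invented.

```python
import math
from typing import Dict, List


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA point estimate; one common formula (some software uses n instead of n - 1)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def well_fitted(p_chi2: float, rmsea_value: float, srmr: float, agfi: float, cfi: float) -> bool:
    """Cut-offs stated in the text."""
    return (p_chi2 > 0.05 and rmsea_value < 0.08 and srmr < 0.08
            and agfi >= 0.80 and cfi >= 0.80)


def loading_cutoff(n_items: int) -> float:
    return 1 / math.sqrt(n_items)


def assign_items(loadings: Dict[str, List[float]], n_items: int) -> Dict[str, List[int]]:
    """Factors (1-based) on which each item's absolute loading exceeds 1/sqrt(n_items)."""
    cutoff = loading_cutoff(n_items)
    return {item: [f for f, value in enumerate(row, start=1) if abs(value) > cutoff]
            for item, row in loadings.items()}


# Invented two-factor loadings for three of the 23 items
example = {"walking": [0.62, 0.08], "sad": [0.05, -0.58], "playing": [0.35, 0.30]}
print(round(loading_cutoff(23), 3))
print(assign_items(example, 23))
```

Because the cut-off is low, an item can exceed it on both factors (as "playing" does here). This is one reason the tables also highlight the highest loading for each item.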
Hypothesis testing The Spearman correlation between the physical dimension of each instrument (Kidscreen and PedsQL) and the child's actual physical capacity, as assessed by maximum oxygen uptake (VO₂max) during an exercise test, was calculated.
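Spearman's correlation is the Pearson correlation of ranks, with tied values given the mean of the ranks they span. A minimal standard-library sketch follows. The score and VO₂max values are hypothetical, and the P value reported in the Results would come from a statistical package, which this sketch does not reproduce.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical PedsQL physical scores and VO2max values (mL/kg/min)
physical = [65.6, 78.1, 90.6, 71.9, 84.4, 96.9]
vo2max = [31.0, 35.5, 38.0, 36.2, 33.9, 42.1]
print(round(spearman(physical, vo2max), 2))
```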
Cross cultural validity The linguistic validation process from English to French was performed by the MAPI Institute, using a 4-step methodology: (1) forward translation by 2 professional translators (reconciliation, quality control and discussion → target language version 1); (2) backward translation by a professional translator (quality control and discussion → target language version 2); (3) adaptation (review and adaptation of the source language version to the context of the target country → target language version 3); (4) cognitive debriefing (with 3 parents of healthy children → final target language version) [41].

Acceptability and quality of items
The questionnaire completion rate was reported. A floor effect (i.e. responses clustering at the more negative health state end of the scale) or a ceiling effect (i.e. responses clustering at the more positive health state end of the scale) was determined by the percentage of children and parents scoring at the minimum (0) or maximum (100) value, respectively, for each item and dimension.
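The floor and ceiling percentages described above can be sketched as follows. The function name and scores are hypothetical. Missing answers are dropped rather than substituted, consistent with the statistical analysis section.

```python
from typing import Optional, Sequence, Tuple


def floor_ceiling(scores: Sequence[Optional[float]]) -> Tuple[float, float]:
    """Percentages of respondents at the minimum (0) and maximum (100) of a 0-100 score."""
    valid = [s for s in scores if s is not None]  # missing answers are not substituted
    if not valid:
        raise ValueError("no observed scores")
    n = len(valid)
    floor_pct = 100 * sum(1 for s in valid if s == 0) / n
    ceiling_pct = 100 * sum(1 for s in valid if s == 100) / n
    return floor_pct, ceiling_pct


# Hypothetical social-dimension scores (None = missing)
social = [100.0, 100.0, 85.0, 100.0, 60.0, 0.0, None, 95.0]
floor_pct, ceiling_pct = floor_ceiling(social)
print(f"floor {floor_pct:.1f}%, ceiling {ceiling_pct:.1f}%")
```

The Results section flags ceiling effects above 20% at the dimension level and at or above 80% at the item level, so the same function can be applied to dimension scores or to single transformed items.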

Discriminant validity
The PedsQL scores were compared between CHD and control populations, between girls and boys, and across the four levels of disease severity [12]. For pairwise comparisons between severity classes, Holm's correction was applied, and the two-sided Jonckheere trend test was used to investigate a trend across severity levels.
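Both procedures can be sketched with the standard library. `holm_adjust` returns Holm step-down adjusted p-values in the original order. `jonckheere_terpstra` counts, for every pair of groups in the hypothesized order, how often an observation of the earlier group is below one of the later group (ties count one half). It then uses the normal approximation without a tie correction of the variance. The missing tie correction, the helper names and the scores are assumptions; statistical packages usually adjust the variance for ties.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def holm_adjust(p_values: Sequence[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        value = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, value)  # keeps adjusted p-values monotone
        adjusted[idx] = running_max
    return adjusted


def jonckheere_terpstra(groups: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """JT statistic, z score and two-sided p (normal approximation, no tie correction)."""
    jt = 0.0
    for earlier, later in combinations(groups, 2):  # combinations keeps the group order
        for x in earlier:
            for y in later:
                jt += 1.0 if x < y else 0.5 if x == y else 0.0
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    mean_jt = (n ** 2 - sum(s ** 2 for s in sizes)) / 4
    var_jt = (n ** 2 * (2 * n + 3) - sum(s ** 2 * (2 * s + 3) for s in sizes)) / 72
    z = (jt - mean_jt) / math.sqrt(var_jt)
    return jt, z, math.erfc(abs(z) / math.sqrt(2))  # erfc(|z|/sqrt(2)) = 2 * (1 - Phi(|z|))


# Hypothetical total scores by CHD severity class (1 = least severe, 4 = most severe)
classes = [[85.0, 90.2, 78.3], [80.4, 76.1, 82.6], [70.7, 74.0, 79.3], [65.2, 72.8, 60.9]]
print(jonckheere_terpstra(classes))
print(holm_adjust([0.01, 0.04, 0.03]))
```

Since HRQoL is expected to decrease with severity, a negative z indicates a downward trend; the two-sided p-value detects a trend in either direction.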

Population
We included 220 children: 123 with CHD and 97 controls. Among them, 210 children (117 CHD and 93 controls) completed the PedsQL self-questionnaire, and 220 parents completed the PedsQL proxy-questionnaire.

Psychometric validation
Reliability
Test–retest analyses showed that ICCs, overall and in each dimension, for both self and proxy reports, ranged from 0.49 to 0.66, corresponding to moderate (0.41–0.6) to good (0.6–0.8) agreement (Table 1).

Validity
Face validity and content validity The face validity index was excellent in the parents' group (0.85) and very good in the children's group (0.75). However, many children did not fully understand the meaning of item 4 ("it is hard for me to lift something heavy"), as most of them understood the question from a general perspective and not as a limitation potentially related to their health condition. During the interview, most children reported that "yes, it is hard for a child to lift something heavy, as compared to an adult". Similarly, item 20 ("I forget things") was frequently misunderstood, and parents and children gave two possible meanings: forgetting concepts, lessons, or words during class, or forgetting to bring an object to school (notebook, pencil case).
The content validity index was good (0.7). However, the experts considered that item 1 ("It is hard for me to walk more than one block") was not suited to children living in the countryside. Moreover, item 4 ("It is hard for me to lift something heavy") was often misinterpreted, as mentioned above, and item 23 ("I miss school to go to the doctor or hospital") was usually understood from a general perspective: both healthy and CHD children miss school to go to the doctor, but sick children may miss school more often than healthy children, and the item was not always interpreted this way.
Criterion validity In terms of concurrent validity, corresponding PedsQL and Kidscreen dimensions correlated well for the physical (r = 0.57), emotion (r = 0.49 with psychological well-being, 0.50 with moods and emotions, and 0.48 with self-perception of the Kidscreen) and school (r = 0.41) dimensions in self-reports (Table 2). In parent reports, these correlations were good for the physical (r = 0.48), psychological (r = 0.57), and school (r = 0.49) dimensions. Thus, the highest correlations observed between the two instruments were those expected, except for the social dimension. For the PedsQL social dimension, only one of the three corresponding Kidscreen dimensions ("bullying") had a close-to-high correlation (r = 0.47). In parent reports, the PedsQL social dimension correlated better with the school dimension than with the two expected dimensions of the Kidscreen ("autonomy and parents relation" and "social support and peers").

Redundancy between items
In the PedsQL self-reports, no inter-item correlation within any dimension exceeded 0.70. In the proxy reports, strong correlations were found between items 2-running and 3-sports (r = 0.90, P < 0.001) and between items 19-attention and 21-schoolwork (r = 0.71, P < 0.001). All remaining inter-item correlations in the proxy reports were < 0.70.

Item-internal consistency (IIC)
Most correlations between items and their corresponding dimension were ≥ 0.4 (Additional file 1: Table S1). Lower correlations were found for items 5-bath, 6-chores and 7-aches of the physical dimension (self-reports only), for item 17-doing-things of the social dimension in controls (self and proxy reports), and for items 22-not-feeling-well and 23-doctor of the school dimension in both CHD and control children.

Item discriminant validity
In most cases, items correlated more with their own dimension than with other dimensions (Additional file 1: Table S1). However, a few items correlated better with other dimensions, albeit with rather close correlation coefficients.

Variability of items
Among CHD children, all items had a coefficient of variation above 20%, except item 5-bath in self-reports. Among control children, self-reports showed coefficients of variation < 20% for items 1-walking, 3-sports, 5-bath and 18-playing. Parent reports yielded coefficients of variation < 20% for items 17-doing-things, 22-not-feeling-well and 23-doctor.

Internal consistency
In all 4 dimensions, Cronbach's α coefficients were ≥ 0.69 (Table 1). The α coefficient of each dimension did not increase when any single item was removed.
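The internal consistency check above, including α recomputed with each item removed in turn, can be sketched with the standard library. This assumes complete cases; the handling of missing items in the α computation is not stated. The item scores are hypothetical.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(items: List[Sequence[float]]) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score).

    items[i][p] is the score of respondent p on item i (complete cases only).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least 2 items")
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def alpha_if_item_deleted(items: List[Sequence[float]]) -> List[float]:
    """Alpha recomputed with each item left out in turn (needs at least 3 items)."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Hypothetical transformed scores of 3 items answered by 5 respondents
items = [[0, 25, 50, 75, 100], [0, 25, 50, 75, 100], [50, 0, 100, 25, 75]]
print(round(cronbach_alpha(items), 3))
print([round(a, 3) for a in alpha_if_item_deleted(items)])
```

In this toy example, dropping the third item raises α. That is the situation the authors checked for and did not observe in any PedsQL dimension.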
In the PedsQL self-questionnaire, factor 1 included all items from the physical dimension except one (item 7-aches), as well as some items from the social dimension that children could interpret as physical actions (items 17-doing-things and 18-playing), and two items from the school dimension referring to somatic problems (items 22-not-feeling-well and 23-doctor). In the PedsQL proxy-questionnaire, factor 1 included all items of the physical dimension except two (items 7-aches and 8-energy), which were considered as belonging to the psychosocial domain. Factor 1 was therefore considered the "physical and health dimension". Factor 2, in both self and proxy questionnaires, included most items in a psychosocial domain grouping the psychological, emotion, social and school dimensions (Table 3), so it was considered the "emotional and psychosocial dimension". The confirmatory factor analysis (CFA) with these 2 factors showed slightly better goodness-of-fit statistics (Additional file 1: Table S3). In the 4-factor loadings analysis, most items of the self-questionnaire were grouped in factor 1 for the physical dimension, factor 2 for the emotional and social dimensions, and factor 4 for the school dimension. For the proxy-questionnaire, most items could be grouped in factor 1 for the physical dimension, factor 2 for the emotional dimension, factor 3 for the social dimension, and factor 4 for the school dimension (Additional file 1: Table S2).

Hypothesis testing
The original physical dimension of the PedsQL correlated weakly to moderately with physical capacity, as assessed by VO₂max, in both self-reports (r = 0.22, P = 0.08) and proxy reports (r = 0.35, P = 0.01). In the same patients, the correlations between the physical dimension of the Kidscreen and VO₂max were even lower in both self-reports (r = 0.19, P = 0.16) and proxy reports (r = 0.25, P = 0.05).

Interpretability
Acceptability and quality of items Among the 210 children who completed the PedsQL self-questionnaire, 98% had no missing items. For the PedsQL proxy-questionnaires, 213 of 220 parents (97%) had no missing items. Missing data were not related to any specific item. The ceiling effect exceeded 20% for the social dimension (self and proxy reports for CHD and control children) and the physical dimension (self-reports for CHD children, and self and proxy reports for controls) (Table 1). The floor effect was 0% for all dimensions in both groups. At the item level, a high ceiling effect (≥ 80%) was observed for items 1-walking and 5-bath of the physical dimension in CHD and control self- and parent-reports, and for item 18-playing of the social dimension in control self-reports only. No significant floor effect was observed.
Discriminant validity PedsQL self-reported scores were significantly lower in CHD children than in controls in all dimensions (Table 1). Effect sizes were medium for the school, physical, psychosocial and total scores, and small for the emotion and social scores. Parent-reported scores were lower for CHD patients in all dimensions except the social one, with small effect sizes. Differences in PedsQL scores by gender and CHD severity are reported in Table 4. Girls' self-reported HRQoL scores were lower than boys' for the emotional, physical, and total scores. No difference was observed between boys and girls in parent reports. PedsQL self-reports differed significantly by CHD severity for the physical, social, psychosocial and total scores. PedsQL proxy-reports differed significantly by CHD severity for the physical, social, and total scores. The ability of the PedsQL to discriminate CHD severity was mainly observed, for both self and proxy questionnaires, between the low severity class (class 1) and the three other severity classes (2, 3 and 4).
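The text reports small and medium effect sizes without naming the estimator. A common choice is Cohen's d, computed with a pooled standard deviation and interpreted with the conventional 0.2 / 0.5 / 0.8 cut-offs. The sketch below assumes that choice; the helper names and scores are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Standardized mean difference using the pooled sample SD."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * stdev(group_a) ** 2 + (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def magnitude(d: float) -> str:
    """Conventional cut-offs: 0.2 small, 0.5 medium, 0.8 large."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Hypothetical self-reported school scores
controls = [80.0, 85.0, 90.0, 75.0, 95.0]
chd = [70.0, 80.0, 85.0, 65.0, 90.0]
d = cohens_d(controls, chd)
print(round(d, 2), magnitude(d))
```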

Table 3 Two-factor loadings from the exploratory factor analysis (columns: item description, item keyword, self-reports, proxy-reports). Values in italics mark items contributing to each factor; values in bold mark the highest loading for each item.

Discussion
In this study, in a cohort of 220 children, we analysed the psychometric properties of the self and proxy PedsQL™ 4.0 generic questionnaires for French children aged 8–12 years.
In a standardized test–retest procedure, we found moderate to good reliability, overall and in each dimension, for both self and proxy reports. With an ICC of 0.66 for the self-reported total score, the reproducibility of the PedsQL can be considered good in this young pediatric population.
The face validity index was excellent in the parents' group (0.85) and very good in the children's group (0.75). However, two items had unexpected interpretations. For item 4 ("is it difficult to lift something heavy?"), we suggest adding "compared to other children of your age" to avoid any misunderstanding. Item 20 was understood in two different ways (forgetting to bring some objects to school, or forgetting lessons that had been learned); nevertheless, in both cases, the question is intended to assess some degree of cognitive disorder.
The content validity index was good (0.7). However, item 1 should be adapted to children who do not live in a city ("walk more than one block" could be replaced by "walk around the playground at school"), and item 23 ("I miss school to go to the doctor or the hospital") could be split into 2 questions ("do you miss school?" and "do you go to the doctor or hospital?").
In terms of criterion validity, PedsQL and Kidscreen corresponding dimensions correlated well in physical, emotion and school dimensions, for both self and proxy reports.
In terms of construct validity, most items were not redundant, except for items 2 (running) and 3 (sports) of the physical dimension. Nevertheless, we believe that both items are of interest, as they may have different meanings in children subject to some degree of sports restriction, such as in inherited cardiac arrhythmia: such children may be allowed to "run" in their everyday recreational physical activity but be restricted from competitive sports [42]. Item-internal consistency was acceptable; nevertheless, some items reflected autonomy more than physical well-being (items 5-bath, 6-chores), or were not appropriate in chronic diseases not associated with pain (item 7-aches). As a result, despite good overall item discriminant validity, these same items correlated better with another dimension than with their own (e.g. the social dimension instead of the physical dimension for "aches"). Item variability was good, except for the poorly understood items. Overall, we observed acceptable internal consistency for each dimension (Cronbach's α ≥ 0.69 in all dimensions). Interestingly, the confirmatory analysis did not support the original 4-factor structure of the PedsQL [17]. We therefore performed an exploratory analysis, which showed that a 2-factor structure seemed the most appropriate to summarize the information. However, these 2 factors did not fully correspond to the 2 original PedsQL summary scores (i.e. physical and psychosocial). Indeed, in self-reports, factor 1 included all physical items except item 7-aches, as well as some items that children considered physical (item 17-doing things, item 18-playing, item 22-not feeling well, and item 23-doctor). Cross-cultural comparisons with the factor structure obtained in the original PedsQL publication have shown heterogeneous results, from 2 factors [43] to 5 factors [27]. Such comparisons should be interpreted with caution, as the populations differ in terms of culture, clinical status and age range.
Nevertheless, such findings may be of interest when designing clinical trials that use dichotomized HRQoL scores to assess PROs as a primary or secondary outcome, or within composite scores. In such trials, the total score may therefore be more appropriate when using the PedsQL [16].
We observed a weak-to-moderate correlation between physical well-being assessed by the PedsQL and the child's actual physical capacity, in both CHD and control groups. Interestingly, similar results were observed with the Kidscreen instrument, and this correlation was stronger in parent reports with both instruments [29]. This is not surprising, as the concept of quality of life is much broader than what VO₂max represents in healthy children and in children with a cardiac disease.
The acceptability of the self and proxy PedsQL instruments was excellent, with only 2% and 3% of questionnaires, respectively, containing missing items. As in the original psychometric analysis of the PedsQL, no floor effect was observed [17]. However, a ceiling effect was observed in both CHD and control children, especially in the physical and social dimensions. A similar effect was observed in the psychometric validation of the PedsQL in a large cohort of school children [44].
The PedsQL showed good discriminant validity, as all scores were significantly lower in CHD children than in controls, overall, in each dimension, and in both self and proxy reports (except for the parent-reported social dimension). Moreover, the PedsQL could discriminate severe from non-severe CHD but was less effective at discriminating intermediate severity levels. Interestingly, gender differences were observed in self-reports, with girls' HRQoL scores lower than boys' in most dimensions, but not in parent reports. Gender differences have been commonly observed in pediatric HRQoL studies [21,28]. For example, in boys with CHD, the feeling of overall well-being is linked to the practice of physical activity, which is reflected in HRQoL scores [28,29,45]. Gender may have a confounding effect on HRQoL perception; however, the impact of differential item functioning with the PedsQL has been considered negligible [46,47].
As classically observed in pediatric HRQoL studies, our results highlighted the difference between self and parent reports, in both healthy and CHD children [24,28]. Usually, proxy reports provide lower scores than self-reports, and, in our experience, parents' reports, especially those of mothers of children with CHD, seem to better reflect the actual disease severity [15,25,29].

Study limitations
Response to clinical change was not assessed in this study. We previously found good responsiveness with the French proxy version of the PedsQL in a large cohort of mothers of children receiving oral anticoagulants [15]. Nevertheless, further studies using PedsQL self-reports to determine response to clinical change remain necessary. As it is our area of expertise, the CHD population was used to validate the PedsQL in this study, which made the use of exercise capacity outcomes relevant [45]. The lack of heterogeneity of the population may explain the moderate correlation of the PedsQL with disease severity [25,29].