Development of the Adult PedsQL™ Neurofibromatosis Type 1 Module: Initial Feasibility, Reliability and Validity

Background: Neurofibromatosis type 1 (NF1) is a common autosomal dominant genetic disorder with significant impact on health-related quality of life (HRQOL). Advances in understanding the pathogenetic mechanisms of neurofibroma development have led to new clinical trials of treatments for NF1. One of the most important outcomes of a trial is improvement in quality of life; however, no condition-specific HRQOL instrument for NF1 exists. The objective of this study was to develop an NF1 HRQOL instrument as a module of the PedsQL™ and to test its initial feasibility, internal consistency reliability and validity in adults with NF1. Methods: The NF1-specific HRQOL instrument was developed using the standard method of PedsQL™ module development: literature review, focus group/semi-structured interviews, cognitive interviews, experts' review of the initial draft, pilot testing and field testing. Field testing involved 134 adults with NF1. Feasibility was measured by the percentage of missing responses, internal consistency reliability by Cronbach's alpha, and validity by the known-groups method. Results: Feasibility, measured by the percentage of missing responses, was 4.8% for all subscales on the adult version of the NF1-specific instrument. Internal consistency reliability for the Total Score (alpha = 0.97) and subscale reliabilities ranging from 0.72 to 0.96 were acceptable for group comparisons. The PedsQL™ NF1 Module distinguished among NF1 adults with excellent to very good, good, and fair to poor health status. Conclusions: The results demonstrate the initial feasibility, reliability and validity of the PedsQL™ NF1 Module in adult patients. The PedsQL™ NF1 Module can be used to understand the multidimensional impact of NF1 on the HRQOL of patients with this disorder.


Background
Neurofibromatosis type 1 (NF1) is a common autosomal dominant genetic disorder with a prevalence of 1 in 3000 persons worldwide, independent of gender, race and ethnicity [1-4]. According to the National Institutes of Health criteria, NF1 is diagnosed by the presence of two or more of the following clinical features: 1) six or more café-au-lait spots (>5 mm in greatest diameter in prepubertal individuals and >15 mm in postpubertal individuals), 2) two or more neurofibromas of any type or one plexiform neurofibroma, 3) freckling in the axillary or inguinal regions, 4) optic glioma, 5) two or more Lisch nodules, 6) a distinctive osseous lesion (sphenoid dysplasia or thinning of the long bone cortex with or without pseudarthrosis), and 7) a first-degree relative with NF1 [5,6]. NF1, also known as von Recklinghausen's disease, is characterized by multiple cutaneous neurofibromas, café-au-lait spots, intertriginous freckling, and Lisch nodules (iris hamartomas) [7]. Almost half of affected individuals have learning disabilities [8]. Other common findings include optic gliomas, bony abnormalities, headache and hypertension. Significant but less common complications of NF1 include malignant peripheral nerve sheath tumors, brain tumors, vasculopathy, epilepsy, growth problems, neurological dysfunction and pruritus [1,7-9].
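The "two or more features" rule above is a simple counting rule. The Python sketch below encodes only the thresholds stated in the text; the `ClinicalFindings` record and all field and function names are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClinicalFindings:
    """Hypothetical record of the seven NIH features (illustrative field names)."""
    postpubertal: bool
    cafe_au_lait_diameters_mm: List[float] = field(default_factory=list)
    n_neurofibromas: int = 0
    has_plexiform_neurofibroma: bool = False
    axillary_or_inguinal_freckling: bool = False
    optic_glioma: bool = False
    n_lisch_nodules: int = 0
    distinctive_osseous_lesion: bool = False
    first_degree_relative_with_nf1: bool = False


def count_nih_features(f: ClinicalFindings) -> int:
    """Count how many of the seven listed features are present."""
    # The café-au-lait size threshold (greatest diameter) depends on pubertal status
    threshold_mm = 15.0 if f.postpubertal else 5.0
    n_large_spots = sum(1 for d in f.cafe_au_lait_diameters_mm if d > threshold_mm)
    features = [
        n_large_spots >= 6,                                       # 1) café-au-lait spots
        f.n_neurofibromas >= 2 or f.has_plexiform_neurofibroma,  # 2) neurofibromas
        f.axillary_or_inguinal_freckling,                        # 3) freckling
        f.optic_glioma,                                           # 4) optic glioma
        f.n_lisch_nodules >= 2,                                   # 5) Lisch nodules
        f.distinctive_osseous_lesion,                             # 6) osseous lesion
        f.first_degree_relative_with_nf1,                         # 7) family history
    ]
    return sum(features)  # True counts as 1


def meets_nih_criteria(f: ClinicalFindings) -> bool:
    """Two or more features satisfy the rule as stated in the text."""
    return count_nih_features(f) >= 2
```

For example, an adult with six 20 mm café-au-lait spots and axillary freckling has two features and meets the rule. The same spots in an adult, if only 10 mm across, would not count, because the postpubertal threshold is >15 mm.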
Neurofibromas, pathognomonic for NF1, are benign nerve sheath tumors that can be extraneural or intraneural. They are composed of Schwann cells, perineurial cells, fibroblasts, and mast cells [10,11]. They may remain asymptomatic or cause a wide variety of symptoms, including pain, pruritus, paresthesias (tingling, numbness) and local trauma. Extraneural neurofibromas cause cosmetic disfigurement, whereas internal neurofibromas impinge on neighboring organs and significantly increase morbidity and mortality [12,13]. Childhood through early adulthood is a critical period of accelerated neurofibroma growth [1,14]. Rapid tumor growth also occurs during pregnancy because of the associated hormonal changes [12,14]. The variable nature of neurofibromas and of other NF1 symptoms has a significant impact on the health-related quality of life (HRQOL) of individuals with this disorder [15,16].
NF1 is a lifelong, progressive, variable and unpredictable disorder [17]. The mainstay of treatment for NF1 is supportive or surgical. However, surgical removal of neurofibromas is unsatisfactory because these tumors often regrow and surgery does not address the underlying cause [14,18]. Progress in understanding the genetics and pathogenetic mechanisms of NF1 has led to new drugs for its treatment and to a number of clinical trials [11,19]. Participants in these trials need objective follow-up to monitor changes in clinical symptoms. Radiographic imaging (3-dimensional MRI) has been used with some success; however, defining treatment success from tumor mass calculated on imaging is difficult because neurofibromas have irregular shapes and may be fibrotic [14]. Thus, tumors may not shrink significantly with treatment even when patients report improvement in clinically significant symptoms [18]. Patients with little tumor shrinkage have anecdotally reported large improvements in functioning and well-being, improvements that could be measured with an NF1-specific HRQOL instrument.
HRQOL is arguably one of the most important measures for evaluating the effectiveness of clinical treatments [20,21]. HRQOL instruments used in previous studies of patients with NF1 have been generic; such instruments are useful for comparisons across different health conditions. Studies using generic instruments showed that NF1 had a significant impact on all domains of the Short Form 36 health survey (SF-36) compared with the normative population [15,16]. However, generic quality of life instruments have limitations when applied to patients with specific illnesses [22]. They do not measure disease-specific aspects of HRQOL, for instance, skin paresthesias in individuals with NF1. In contrast, disease-specific instruments measure the impact of specific symptoms and are more sensitive for detecting and quantifying small changes over time [22]. A significant gap in the current empirical literature is the lack of a validated NF1-specific HRQOL instrument. Consequently, the objective of this study was to develop an NF1-specific HRQOL instrument (as a Module of the PedsQL™) and to test its initial feasibility, internal consistency reliability and validity. We hypothesized that HRQOL as measured by the PedsQL™ NF1 Module domains would be associated with self-reported health status.

Methods
This study was reviewed and approved by the Indiana University Institutional Review Board and was conducted in compliance with the Declaration of Helsinki. In accordance with HRQOL instrument development protocols, the PedsQL™ NF1 Module was designed in five phases: 1) literature review; 2) outline of the instrument; 3) pilot instrument development, comprising a) focus group/semi-structured interviews, b) cognitive interviews and c) experts' review; 4) pilot testing; and 5) field testing.

Phase 2: outline of the instrument
Clinicians taking care of patients with NF1 at Indiana University Hospitals, Indianapolis, IN, were interviewed to learn about their experiences with NF1. An initial outline of the instrument was developed based on the literature review and the clinicians' experiences. Pertinent questions were drawn from the existing PedsQL™ Arthritis, Cancer, Cerebral Palsy and Family Impact Modules [36-39]. Instrument domains were designed to address HRQOL issues specific to NF1.

Phase 3: pilot instrument development
The initial instrument was modified after a focus group and semi-structured interviews, cognitive interviews and experts' review. a) Focus group/semi-structured interviews: The focus group was advertised in the NF clinic at Indiana University Hospitals. Individuals were enrolled if they had NF1 and were willing to talk about their disorder and its effect on their health and well-being. Written informed consent was obtained from all participants. We conducted one focus group with three individuals and two semi-structured interviews with two adults with NF1. Participants were encouraged to speak about how NF1 affected their health and well-being. All interviews were digitally recorded, transcribed and deidentified for research purposes. Using interpretative phenomenological analysis [40,41], we identified new domains from the focus group/semi-structured interviews, specifically Skin Irritation, Sensation, Movement and Balance, and Sexual Functioning. b) Cognitive interviews: An initial draft instrument was administered at the NF clinic. Participants were cognitively interviewed to identify problems with item wording and interpretation of instructions, and to estimate the time required to complete the survey. The items were revised based on the participants' feedback. c) Experts' review: The modified instrument was further reviewed by NF1 researchers and clinicians. After the cognitive interviews and expert reviews, additional changes were made, including rewording "sensitive skin" to "rough skin" in the Skin Irritation domain and adding "not applicable" as a response option in the Sexual Functioning domain. The resulting pilot instrument had 16 domains and 74 items.
The 74-item PedsQL™ NF1 Module: Adult self-report instrument comprises 16 domains/subscales: 1) Physical Functioning (8 items), … The NF1 HRQOL format, instructions, and Likert response scale are similar to those of the PedsQL™ 4.0 Generic Core Scales and other PedsQL™ Disease-Specific Modules. Although originally developed for children, the PedsQL™ format has been extended to adults with a generic instrument as well as several disease-specific instruments [35]. The instructions ask how much of a problem each item has been during the past month. A 5-point response scale is used for all items (0 = never a problem, 1 = almost never a problem, 2 = sometimes a problem, 3 = often a problem, 4 = almost always a problem). Items are reverse scored and linearly transformed to a 0-100 scale, as in the PedsQL™ 4.0 Generic Core Scales (0 = 100, 1 = 75, 2 = 50, 3 = 25, 4 = 0) [35]. Hence, higher scores indicate better HRQOL [42,43] and fewer symptoms or problems. The Total Score is computed as the sum of all items on the PedsQL™ NF1 Module divided by the number of items answered (this accounts for missing data). Subscale scores are computed as the sum of the items divided by the number of items answered in that subscale. If more than 50% of the items in a subscale are missing, the subscale score is not computed [44]. Apart from the participant's age, the instrument does not collect demographic information. In addition, participants were asked to rate their health status on a Likert scale as excellent, very good, good, fair, or poor.
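The scoring rules described above can be expressed as a short Python sketch. It follows the stated reverse-scoring map and the 50% missing-data rule for subscales. The function names and the convention of using `None` for a missing response are illustrative choices, not part of any official PedsQL™ scoring material. Because the text states the 50% rule only for subscales, the Total Score here is simply the mean of all answered items.

```python
from statistics import mean
from typing import Dict, List, Optional

# Reverse scoring: raw 0-4 response -> 0-100 (higher = better HRQOL)
REVERSE_SCORE = {0: 100.0, 1: 75.0, 2: 50.0, 3: 25.0, 4: 0.0}

# A response is an int 0-4, or None when the item was left blank
Response = Optional[int]


def subscale_score(responses: List[Response]) -> Optional[float]:
    """Mean transformed score of answered items; None if more than 50% are missing."""
    answered = [REVERSE_SCORE[r] for r in responses if r is not None]
    n_missing = len(responses) - len(answered)
    if not answered or n_missing > 0.5 * len(responses):
        return None
    return mean(answered)


def total_score(subscales: Dict[str, List[Response]]) -> Optional[float]:
    """Sum of all answered items divided by the number of items answered."""
    answered = [
        REVERSE_SCORE[r]
        for responses in subscales.values()
        for r in responses
        if r is not None
    ]
    return mean(answered) if answered else None


# Made-up responses for illustration (not study data)
example = {"Physical Functioning": [0, 1, None, 2], "Worry": [None, None, 3]}
print(subscale_score(example["Physical Functioning"]))  # 75.0
print(subscale_score(example["Worry"]))                 # None (2 of 3 missing)
print(total_score(example))                             # 62.5
```

Dividing by the number of items answered, rather than by the full item count, is what lets a respondent who skips an item still receive a score on the same 0-100 metric.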

Phase 4: pilot testing
The PedsQL™ NF1 pilot instrument was tested at the Children's Tumor Foundation-sponsored NF Forum in Minnesota in July 2011 to check its initial feasibility and internal consistency reliability. All participants with NF1 who had not been involved in prior phases of instrument development were encouraged to complete the survey. A sample of 10 adults with NF1 completed the surveys. The mean age of the participants was 40.5 years (range 20 to 62 years). Feasibility was measured by the average time taken to complete the survey and by the percentage of missing responses [44]. Internal consistency reliability was measured by computing Cronbach's coefficient alpha [45]. On average, participants took 6 minutes to complete the survey (range 4 to 8 minutes). The missing response rate was 2.3% (n = 17), with the Sexual Functioning domain having the most missing responses (n = 10). Two participants were each missing 1 response in Physical Functioning, and one participant was missing 3 responses in Perceived Physical Appearance. In the Sexual Functioning domain, one participant was missing 1 response and three participants were missing all 3 responses. The Worry and Movement & Balance domains each had 1 missing response. Total scale internal consistency reliability was 0.82, showing adequate initial internal consistency. Pilot testing thus provided initial support for the feasibility and internal consistency of the instrument, justifying the next phase of instrument development.
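The reported pilot missing-response rate can be checked from the per-domain counts given above: 17 missing responses out of 10 respondents × 74 items. The arithmetic below uses only figures stated in the text.

```python
# Pilot test as reported: 10 respondents, each answering 74 items
n_respondents, n_items = 10, 74

# Missing responses per domain, written as (participants x items missing each)
missing_by_domain = {
    "Physical Functioning": 2 * 1,
    "Perceived Physical Appearance": 1 * 3,
    "Sexual Functioning": 1 * 1 + 3 * 3,
    "Worry": 1,
    "Movement & Balance": 1,
}

n_missing = sum(missing_by_domain.values())
missing_rate_pct = 100 * n_missing / (n_respondents * n_items)
print(n_missing, round(missing_rate_pct, 1))  # 17 2.3
```

The per-domain counts sum to the reported 17, and 17/740 rounds to the reported 2.3%.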

Phase 5: field testing
This final phase used a larger, nationwide NF1 sample to determine the feasibility, reliability and validity of the PedsQL™ NF1 Module scales.

Sample & setting
Participants were recruited from the NF clinic at Indiana University Hospitals and at national NF1 conferences from July 2011 to February 2012. In addition, the NF1 module for adults was placed online, and web links were advertised by NF organizations (NF Midwest and the Texas NF Foundation), which published information about the study in their newsletters and on their websites. A sample of 134 adults completed the surveys. The pilot surveys (n = 10) were included in this sample because their mean subscale scores did not differ significantly from those of the other participants (independent samples t-test, p > 0.05). The mean age of the participants was 40.2 years (range 20 to 71 years).
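The decision to pool the pilot surveys rests on an independent-samples t-test. Below is a minimal sketch of Student's pooled-variance form. The text does not say whether the pooled or the Welch form was used. Converting t to a p-value also requires the t distribution with the returned degrees of freedom (for example, from a statistics package), and this sketch leaves that step out.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's independent-samples t statistic (equal variances assumed) and its df."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance combines the two sample variances (n - 1 denominators)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df
```

In this study the comparison would be run once per subscale, with `a` holding the pilot respondents' subscale scores and `b` the field-test respondents' scores.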

Statistical analysis
Scale internal consistency reliability was determined by calculating Cronbach's coefficient alpha [45]. For each subscale, "Cronbach's alpha if item deleted" was examined to see whether subscale reliability would improve with removal of the item. Subscale reliabilities of 0.70 or more are recommended for comparing patient groups, whereas a reliability of 0.90 is recommended for analyzing individual patient scale scores [46,47]. Because of the small sample size, the Sexual Functioning domain, which had the most missing responses, was excluded from these analyses. Exploratory factor analysis with Promax rotation was conducted on the remaining 71 items. Item loadings were assessed using a cut-off of 0.30 to determine whether items loaded highly on one and only one factor (i.e., 'simple structure') and whether the items loading highly on each factor formed a conceptually relevant subscale.
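Cronbach's alpha and "alpha if item deleted" can be sketched as follows, using the standard formula alpha = k/(k-1) × (1 − Σ item variances / variance of the summed scale). The sketch assumes complete data (one row per respondent, no missing items); the text does not state how missing responses were handled in this step. The function names are illustrative.

```python
from statistics import variance
from typing import List


def cronbach_alpha(rows: List[List[float]]) -> float:
    """rows: one list of item scores per respondent (complete data assumed)."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(rows: List[List[float]]) -> List[float]:
    """Alpha recomputed with each item dropped in turn (needs at least 3 items)."""
    k = len(rows[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(row) if j != drop] for row in rows])
        for drop in range(k)
    ]
```

An item whose deletion raises alpha above the full-scale value is a candidate for removal, which is how this statistic fed into the item reduction step.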
Feasibility was measured by the percentage of missing values [44]. Multitrait scaling analysis was performed to determine the extent to which individual items correlated with their hypothesized subscale construct rather than with other subscales [48]. We also examined the item-hypothesized subscale correlations (corrected for overlap by removing the item from its subscale score), using a cutoff of 0.40 or higher to indicate good item discrimination [44,45]. Multitrait scaling analyses were summarized as tests of individual item scaling success, where a success was defined as an item correlating more highly with its hypothesized subscale than with another subscale by ≥ 2 standard errors [44]. The percentage of item scaling successes relative to the total number of item scaling tests was calculated for each subscale [44,49].
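The corrected item-subscale correlation and the scaling-success count can be sketched in Python. This sketch makes several assumptions: complete data; each item is correlated with the sum of the *other* items in its own subscale and with the full sum of every other subscale; and the standard error of a correlation is approximated as 1/sqrt(n), a common simplification in multitrait scaling that the text does not confirm. All function names are illustrative.

```python
from math import sqrt
from statistics import mean
from typing import Dict, List, Optional, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def scale_sum(rows: List[List[float]], items: List[int], exclude: Optional[int] = None) -> List[float]:
    """Per-respondent sum over the given item columns, optionally leaving one out."""
    return [sum(row[j] for j in items if j != exclude) for row in rows]


def corrected_item_r(rows: List[List[float]], i: int, items: List[int]) -> float:
    """Item i vs. the rest of its own subscale (compare with the 0.40 cutoff)."""
    return pearson_r([row[i] for row in rows], scale_sum(rows, items, exclude=i))


def scaling_success(rows: List[List[float]], subscales: Dict[str, List[int]]) -> Dict[str, float]:
    """Percent of tests where r(own, corrected) exceeds r(other) by >= 2 SE."""
    se = 1 / sqrt(len(rows))  # assumed SE approximation for a correlation
    result = {}
    for name, items in subscales.items():
        successes = tests = 0
        for i in items:
            item = [row[i] for row in rows]
            r_own = corrected_item_r(rows, i, items)
            for other, other_items in subscales.items():
                if other == name:
                    continue
                tests += 1
                if r_own - pearson_r(item, scale_sum(rows, other_items)) >= 2 * se:
                    successes += 1
        result[name] = 100 * successes / tests
    return result
```

Each subscale contributes (number of its items) × (number of other subscales) tests. This is why a subscale with only a few items, such as Social Functioning, has few tests, and a handful of failures moves its percentage sharply.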
Construct validity of the instrument was determined using the known-groups method, which compares subscale scores across groups known to differ in the health construct being investigated [44,50]. NF1 participants were divided into 3 groups based on their self-reported health status: 'excellent to very good' (n = 47), 'good' (n = 46), and 'fair to poor' (n = 41). Mean subscale scores were compared among the 3 groups using one-way ANOVA. Effect sizes were calculated for the subscale scores to estimate the magnitude of differences, with values of 0.20, 0.50 and 0.80 designated as small, medium and large, respectively [51]. Statistical analyses were performed using SPSS version 18 for Windows (SPSS Inc., Chicago, IL, USA) and SAS version 9.3 for Windows (SAS Institute Inc., Cary, NC, USA).
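The known-groups comparison combines a one-way ANOVA F statistic with an effect size. The sketch below computes F from between- and within-group sums of squares. It also computes Cohen's d with a pooled standard deviation, whose conventional 0.20/0.50/0.80 thresholds match those cited. The text does not specify which pair of groups the effect sizes contrast or which effect-size formula was used, so the pairwise d here is an assumption. A p-value would additionally require the F distribution with (k − 1, N − k) degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence


def one_way_anova_f(groups: List[Sequence[float]]) -> float:
    """F = between-group mean square / within-group mean square."""
    all_values = [x for g in groups for x in g]
    k, n_total = len(groups), len(all_values)
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def effect_label(d: float) -> str:
    """Conventional thresholds: 0.20 small, 0.50 medium, 0.80 large."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "below small"


# Toy scores for three groups (not study data)
excellent, good, poor = [7, 8, 9], [4, 5, 6], [1, 2, 3]
print(one_way_anova_f([excellent, good, poor]))  # 27.0
print(cohens_d(excellent, poor))                 # 6.0
```

Under these thresholds, the reported range of 0.22 to 0.63 spans small to medium effects.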

Item reduction
Based on the combined results of the item-test statistics from the reliability analysis and the exploratory factor analysis, 4 items were deleted: "having headaches" (Physical Functioning), "feeling isolated from others" (Social Functioning), "worry about keeping or doing a job" (Worry) and "managing my NF1" (Treatment Anxiety). The final instrument reported in this manuscript has 16 subscales and 70 items (including the Sexual Functioning domain), as shown in Appendix A.

Item-internal consistency
Item-subscale correlations showed that all items on the adult version of the PedsQL™ NF1 Module exceeded our criterion of 0.40 for item discrimination, except for one item on the Worry subscale (worry about future or the risk of having children with NF1), with a correlation of 0.34. We nevertheless retained this item in the final version of the PedsQL™ NF1 Module because it was deemed important by adults with NF1 during the focus group/semi-structured interviews and by NF1 experts.

Item scaling tests
The results of scaling tests for the adult version of the PedsQL™ NF1 Module are shown in Table 1. Scaling success was highest for Cognitive Functioning (100%) and lowest for Social Functioning (42.86%). The mean and median scaling success across all subscales of the adult version of the PedsQL™ NF1 Module were 73% and 71.4%, respectively.

Internal consistency reliability
Table 2 shows internal consistency reliability coefficients for all subscales of the Adult PedsQL™ NF1 Module. Subscale reliabilities ranged from 0.72 to 0.96, with all subscales exceeding the minimum reliability criterion of 0.70 required for group comparisons. The Total Score reliability was 0.97, which exceeded the criterion of 0.90 recommended for analyzing individual patient scores.
Table 3 compares mean subscale scores and effect sizes for the three groups of NF1 participants defined by self-reported health status. Total Scores differed significantly among the three groups, with the lowest scores among NF1 participants with 'fair to poor' health status. All subscale scores differed significantly among the three groups, supporting the initial discriminant validity of the PedsQL™ NF1 Module. Effect sizes ranged from 0.22 to 0.63, with the largest effect size for the Pain and Hurt subscale and the smallest for the Perceived Physical Appearance subscale. Most effect sizes were in the medium range, supporting the discriminant validity of the individual subscales.

Discussion
The present study provides support for the initial feasibility, reliability and validity of the PedsQL™ NF1 Adult Version in a general population of adults with NF1. The adult version of the PedsQL™ NF1 Module could be completed in 6 minutes and had minimal missing values, supporting the feasibility of the instrument. Most missing responses occurred in the Sexual Functioning subscale, which was not included in the statistical analyses. We nevertheless retained this domain in the instrument (Appendix A) because it reflected concerns expressed by the majority of patients during the focus group/semi-structured interviews and may be an important area of improvement with newer therapies. Internal consistency reliability for the Adult PedsQL™ NF1 Module Total Score exceeded the minimum criterion of 0.90 for individual patient analysis, which supports the use of the Total Score as a measure of HRQOL in adults with NF1 and the use of this instrument to follow improvement or deterioration over time in individuals. Subscale reliabilities ranged from 0.72 to 0.96, which suggests that each subscale can be used to examine a specific domain of the PedsQL™ NF1 Module, alongside the Total Score for an overall assessment of NF1-specific HRQOL. The adult version of the PedsQL™ NF1 Module was able to differentiate among patients with varying overall health status. These findings support the initial discriminant validity of the Adult PedsQL™ NF1 Module. Consistent with our hypothesis, lower scores on the Adult PedsQL™ NF1 Module domains were associated with 'fair' to 'poor' self-reported health status. Adult patients with 'excellent' to 'very good' health status had higher HRQOL scores than the other two health status groups across all subscales. The study sample included participants from both clinical and general NF1 populations.
The greatest difference in mean subscale scores was in the Pain and Hurt subscale (50.84), which suggests that the clinic population reports more pain than the rest of the participants. The Cognitive Functioning subscale showed minimal differences among the groups (14.65). Although cognitive impairment is a frequent finding in NF1 [52], a likely explanation is that the clinic group presented mainly with physical symptoms.
Our study has several strengths, including the diversity of the sample, nationwide representation of participants (clinic populations from Indiana and surrounding states, and a general NF1 population reached by advertising the study at national conferences and through NF organizations), and the broad age range (20-71 years) of participants in the field test. The Adult PedsQL™ NF1 Module can be self-administered, is easy to read (designed at a sixth-grade reading level) and can be completed quickly.
Although the Module appears lengthy, with 16 domains and 70 items, participants took an average of 6 minutes to complete it in pilot testing.
Our study also has some limitations. First, the sample size was somewhat small for a field test, which limits the precision of the factor analysis used to reduce the number of items. Second, for known-groups validity we used participants' self-reported health status, which may overestimate or underestimate actual disease severity. Third, the Social Functioning subscale of the Adult PedsQL™ NF1 Module was problematic, with low scaling success. We originally had 3 items in this subscale, but one item was dropped to improve the subscale's internal consistency. In future versions of this instrument, we recommend testing additional Social Functioning items as well as the Sexual Functioning items.
Currently, we are developing teen report and parent proxy report versions of the NF1 instrument. In the future, we plan to follow the strict methodology for PedsQL™ instrument development to validate child, teen and parent versions of the instrument.

Conclusions
In summary, the adult version of the PedsQL™ NF1 Module can be used to understand the multidimensional impact of NF1 on the HRQOL of patients with this disorder and may assist in medical decision making. The instrument demonstrates initial feasibility, reliability, and discriminant validity.
Appendix A PedsQL™ NF1 Module-Adult report