This study was reviewed and approved by the Indiana University Institutional Review Board and was conducted in compliance with the Helsinki Declaration. In accordance with HRQOL instrument development protocols, the PedsQL™ NF1 Module was designed in five phases: 1) literature review; 2) outline of the instrument; 3) pilot instrument development, comprising a) focus group/semi-structured interviews, b) cognitive interviews and c) experts’ review; 4) pilot testing; and 5) field testing.
Phase 1 – literature review
We conducted an extensive literature search of the PubMed database for symptoms and signs of NF1 [5–9, 23–34] (complete search methods and table of references available on request). The HRQOL literature was reviewed for measures pertinent to NF1 [35–39].
Phase 2 – outline of the instrument
Clinicians caring for patients with NF1 at Indiana University Hospitals, Indianapolis, IN, were interviewed about their experiences with NF1. An initial outline of the instrument was developed from the literature review and the clinicians’ experiences. Pertinent questions were drawn from the existing PedsQL™ Arthritis, Cancer, Cerebral Palsy and Family Impact Modules [36–39]. Instrument domains were designed to address HRQOL issues specific to NF1.
Phase 3 – pilot instrument development
The initial instrument was modified after conducting a focus group or semi-structured interview, cognitive interviews and experts’ review.
a) Focus group/semi-structured interview: Information about the focus group was advertised in the NF clinic at Indiana University Hospitals. Individuals were enrolled if they had NF1 and were willing to talk about their disorder and its effect on their health and well-being. Written informed consent was obtained from all participants. We conducted one focus group of three individuals and two semi-structured interviews with two adults with NF1. Participants were encouraged to speak about how NF1 affected their health and well-being. All interviews were digitally recorded, transcribed and de-identified for research purposes. Using interpretative phenomenological analysis [40, 41], we identified new domains from the focus group and semi-structured interviews, specifically Skin Irritation, Sensation, Movement and Balance, and Sexual Functioning.
b) Cognitive interviews: An initial draft instrument was administered at the NF clinic. Participants were cognitively interviewed to identify problems with the wording of items and the interpretation of instructions, and to estimate the time required to complete the surveys. The items were revised in light of the participants’ feedback.
c) Experts’ review: The modified instrument was further reviewed by NF1 researchers and clinicians. After the cognitive interviews and expert reviews, additional changes were made, including rewording “sensitive skin” to “rough skin” in the Skin Irritation domain and adding “not applicable” as a choice in the Sexual Functioning domain. The resulting pilot instrument has 16 domains and 74 items.
The 74-item PedsQL™ NF1 Module: Adult self-report instrument comprises 16 domains/subscales: 1) Physical Functioning (8 items), 2) Emotional Functioning (5 items), 3) Social Functioning (3 items), 4) Cognitive Functioning (5 items), 5) Communication (3 items), 6) Worry (7 items), 7) Perceived Physical Appearance (3 items), 8) Pain and Hurt (3 items), 9) Paresthesias (2 items), 10) Skin Irritation (5 items), 11) Sensation (4 items), 12) Movement and Balance (4 items), 13) Daily Activities (12 items), 14) Fatigue (3 items), 15) Treatment Anxiety (4 items) and 16) Sexual Functioning (3 items).
The NF1 HRQOL format, instructions, and Likert response scale are similar to those of the PedsQL™ 4.0 Generic Core Scales and other PedsQL™ Disease-Specific Modules. Although originally developed for use in children, the PedsQL™ format has been extended to adults with a generic instrument as well as several disease-specific instruments. The instructions ask how much of a problem each item has been during the past month. A 5-point response scale is used for all items (0 = never a problem, 1 = almost never a problem, 2 = sometimes a problem, 3 = often a problem, 4 = almost always a problem). Items are reverse scored and linearly transformed to a scale of 0–100, as in the PedsQL™ 4.0 Generic Core Scales (0 = 100, 1 = 75, 2 = 50, 3 = 25, 4 = 0). Hence, higher scores signify better HRQOL [42, 43] and fewer symptoms or problems. The Total Score is computed as the sum of all items on the PedsQL™ NF1 Module divided by the number of items answered (this accounts for missing data). Subscale scores are computed as the sum of the items divided by the number of items answered in that subscale. If more than 50% of the items in a subscale are missing, the subscale score is not computed. The instrument collects no demographic information other than the participant’s age. In addition, participants were asked to rate their health status on a Likert scale as excellent, very good, good, fair, or poor.
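The scoring rules described above can be illustrated with a short Python sketch. It is only an illustration of the stated rules, not the official PedsQL™ scoring program; the function names are hypothetical, and missing answers are assumed to be represented as `None`. The text states the 50% rule for subscales only, so the sketch does not apply it to the Total Score.

```python
from typing import Optional, Sequence


def transform_item(raw: Optional[int]) -> Optional[float]:
    """Reverse-score one 0-4 response onto 0-100 (0 -> 100, 1 -> 75, ..., 4 -> 0)."""
    if raw is None:  # None marks a missing answer (an assumption of this sketch)
        return None
    if raw not in (0, 1, 2, 3, 4):
        raise ValueError(f"response must be an integer from 0 to 4, got {raw!r}")
    return (4 - raw) * 25.0


def subscale_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Mean of the transformed answered items; None if more than 50% are missing."""
    answered = [transform_item(r) for r in responses if r is not None]
    n_missing = len(responses) - len(answered)
    if not answered or n_missing > 0.5 * len(responses):
        return None
    return sum(answered) / len(answered)


def total_score(all_responses: Sequence[Optional[int]]) -> Optional[float]:
    """Sum of all transformed items divided by the number of items answered."""
    answered = [transform_item(r) for r in all_responses if r is not None]
    if not answered:
        return None
    return sum(answered) / len(answered)
```

For example, a three-item subscale answered as 0, 1 and missing has one of three items missing (under 50%), so it is scored as the mean of 100 and 75.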
Phase 4 – pilot testing
The PedsQL™ NF1 pilot instrument was tested at the Children’s Tumor Foundation sponsored NF forum in July 2011 in Minnesota to check initial feasibility and internal consistency reliability. All participants with NF1 who had not been involved in prior phases of instrument development were encouraged to fill out the surveys. A sample of 10 adults with NF1 completed the surveys. The mean age of the participants was 40.5 y (range 20 to 62 y). Feasibility was measured by the average time taken to complete the survey and by the percentage of missing responses. Internal consistency reliability was measured by computing Cronbach’s coefficient alpha. On average, participants took 6 minutes to complete the survey (range 4 to 8 min). The missing response rate was 2.3% (n = 17), with the Sexual Functioning domain having the highest number of missing responses (n = 10). Two participants were each missing 1 response in Physical Functioning, and one participant was missing 3 responses in Perceived Physical Appearance. In the Sexual Functioning domain, one participant was missing 1 response and three participants were missing all 3 responses. The Worry and Movement and Balance domains each had 1 missing response. Total scale internal consistency reliability was 0.82, indicating adequate initial internal consistency of the instrument. Pilot testing thus showed adequate feasibility and internal consistency to proceed to the next phase of instrument development.
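As a quick arithmetic check, the per-domain counts reported above can be tallied in a few lines of Python. The sketch assumes that the denominator of the missing response rate is the number of participants multiplied by the number of items (10 × 74 = 740), which the text does not state explicitly but which reproduces the reported 2.3%.

```python
# Missing responses per domain, as reported for the pilot sample.
missing_by_domain = {
    "Physical Functioning": 2 * 1,           # two participants, 1 response each
    "Perceived Physical Appearance": 1 * 3,  # one participant, 3 responses
    "Sexual Functioning": 1 + 3 * 3,         # one missing 1, three missing all 3
    "Worry": 1,
    "Movement and Balance": 1,
}

N_PARTICIPANTS = 10
N_ITEMS = 74

total_missing = sum(missing_by_domain.values())
# Assumed denominator: every participant could answer every item.
possible_responses = N_PARTICIPANTS * N_ITEMS
missing_rate_pct = 100.0 * total_missing / possible_responses

print(f"{total_missing} of {possible_responses} responses missing "
      f"({missing_rate_pct:.1f}%)")
# prints: 17 of 740 responses missing (2.3%)
```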
Phase 5 – field testing
This final phase involved a larger NF1 sample recruited from across the country to determine the feasibility, reliability and validity of the PedsQL™ NF1 Module scales.
Sample & setting
Participants were recruited from the NF clinic at Indiana University Hospitals and at national NF1 conferences from July 2011 to February 2012. In addition, the NF1 module for adults was placed online, and web links were advertised through NF organizations (NF Midwest and Texas NF Foundation), which published information about the study in their newsletters and on their websites. A sample of 134 adults completed the surveys. The pilot surveys (n = 10) are included in this sample because their mean subscale scores did not differ significantly from those of the remaining participants by independent samples t-test (p > 0.05). The mean age of the participants was 40.2 years (range 20 to 71 years).
Scale internal consistency reliability was determined by calculating Cronbach’s coefficient alpha. For each subscale, “Cronbach’s alpha if item deleted” was determined to see whether subscale reliability would improve if the item were removed. Subscale reliabilities of 0.70 or more are recommended for comparing patient groups, whereas a reliability of 0.90 is recommended for analyzing individual patient scale scores [46, 47]. Given the small sample size, the Sexual Functioning domain was excluded from the analyses because it had the highest number of missing responses. Exploratory factor analysis using Promax rotation was conducted for the remaining 71 items. Item loadings were assessed using a cut-off value of 0.30 to see whether items loaded high on one and only one factor (i.e., ‘simple structure’) and whether the collection of items that loaded high on each factor formed a conceptually relevant subscale.
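For readers unfamiliar with these statistics, the following Python sketch shows how Cronbach’s coefficient alpha and “alpha if item deleted” can be computed. It is an illustration only and is not the SPSS or SAS procedure used in the study; the function names are hypothetical, and the sketch assumes complete data (one row per respondent, one column per item, no missing values).

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for complete data; rows are respondents, columns are items."""
    if len(rows) < 2 or len(rows[0]) < 2:
        raise ValueError("need at least 2 respondents and 2 items")
    k = len(rows[0])
    # Sample variance of each item and of the summed score.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("summed scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(rows: Sequence[Sequence[float]]) -> List[float]:
    """Alpha recomputed with each item dropped in turn (needs at least 3 items)."""
    k = len(rows[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(row) if j != drop] for row in rows])
        for drop in range(k)
    ]
```

An item whose “alpha if item deleted” exceeds the alpha of the full subscale is a candidate for removal, since the subscale is more internally consistent without it.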
Feasibility was measured by the percentage of missing values. Multitrait scaling analysis was performed to determine the extent to which individual items correlated with their hypothesized subscale construct rather than with other subscales. We also examined the item-hypothesized subscale correlations (corrected by removing the item from the Total Score), using a cutoff of 0.40 or higher to indicate good item discrimination [44, 45]. Multitrait scaling analyses were summarized via tests of individual item scaling success, defined as the number of times an item correlated higher with its hypothesized subscale construct than with another subscale by ≥ 2 standard errors, which provided an approximation of scaling success. The percentage of item scaling successes relative to the total number of item scaling tests was calculated for each subscale [44, 49].
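The item scaling tests described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis program used in the study: the function names and data layout are hypothetical, the overlap correction removes the item from its own subscale sum, the standard error of a correlation is approximated as 1/√n, and the data are assumed to be complete.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def item_scaling_tests(data: Dict[str, List[List[float]]], subscale: str,
                       item: int, min_r: float = 0.40) -> dict:
    """Scaling tests for one item.

    `data` maps a subscale name to its item columns; each column holds one
    value per respondent. The own subscale needs at least two items.
    """
    cols = data[subscale]
    x = cols[item]
    n = len(x)
    # Corrected for overlap: sum of the other items in the same subscale.
    rest = [sum(c[i] for j, c in enumerate(cols) if j != item) for i in range(n)]
    own_r = pearson(x, rest)
    se = 1.0 / math.sqrt(n)  # assumed approximation of the standard error
    successes = tests = 0
    for other, other_cols in data.items():
        if other == subscale:
            continue
        other_sum = [sum(c[i] for c in other_cols) for i in range(n)]
        tests += 1
        if own_r - pearson(x, other_sum) >= 2 * se:
            successes += 1
    return {"corrected_r": own_r, "discriminates": own_r >= min_r,
            "successes": successes, "tests": tests}
```

Summing `successes` and `tests` over all items of a subscale gives the percentage of scaling successes reported for that subscale.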
Construct validity of the instrument was determined using the known-groups method, which compares subscale scores across groups known to differ in the health construct being investigated [44, 50]. NF1 participants were divided into 3 groups based on their self-reported health status: ‘excellent to very good’ (n = 47), ‘good’ (n = 46), and ‘fair to poor’ (n = 41). Mean subscale scores were compared among these 3 groups using one-way ANOVA. Effect sizes were calculated for the subscale scores to estimate the magnitude of the differences. Effect sizes are designated as small (.20), medium (.50) and large (.80) in magnitude. Statistical analyses were performed using SPSS version 18 for Windows (SPSS Inc., Chicago, IL, USA) and SAS version 9.3 for Windows (SAS Institute Inc., Cary, NC, USA).
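A minimal Python sketch of the known-groups comparison is given below. It is illustrative only and does not reproduce the SPSS or SAS output: the function names are hypothetical, the F statistic is returned without a p-value (which would require the F distribution), and the effect size is assumed to be a standardized mean difference (Cohen’s d with a pooled standard deviation), since the text does not specify the formula used.

```python
import math
from typing import Sequence, Tuple


def one_way_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    if ss_within == 0:
        raise ValueError("no within-group variance; F is undefined")
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean difference (a minus b) divided by the pooled standard deviation."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    ss_a = sum((v - ma) ** 2 for v in a)
    ss_b = sum((v - mb) ** 2 for v in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    return (ma - mb) / math.sqrt(pooled_var)
```

With subscale scores grouped by self-reported health status, `one_way_anova` tests whether the three group means differ, and `cohens_d` applied to a pair of groups gives a magnitude that can be read against the .20, .50 and .80 benchmarks.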