Instrument completion and validation of the patient-reported apnea questionnaire (PRAQ)

Background: We previously developed the preliminary version of the Patient-Reported Apnea Questionnaire (PRAQ), a questionnaire measuring health-related quality of life in patients with (suspected) obstructive sleep apnea (OSA). This questionnaire was developed for clinical practice, where it can potentially serve two goals: use on an individual patient level to improve patient care, and use on an aggregate level to measure outcomes for quality improvement at a sleep center. In this study we aim to finalize the PRAQ, make a subselection of items and domains specifically for outcome measurement, and assess the validity, reliability and responsiveness of the PRAQ.
Methods: Patients with suspected OSA were included and asked to complete the PRAQ and additional questionnaires one or more times. The collected data were used to perform the final item selection for clinical practice and for outcome measurement, to create the domains for outcome measurement, and to assess the measurement properties internal consistency, test-retest reliability, convergent validity and responsiveness.
Results: A total of 180 patients were included in the study. The final version of the PRAQ for use in clinical practice contains 40 items and 10 domains. A subselection of 33 items in 5 domains was selected for optimal outcome measurement with the PRAQ. The results for the outcome measurement domains were: Cronbach’s alpha 0.88–0.95, ICC 0.81–0.88, and > 75% of hypotheses correct for convergent validity and responsiveness.
Conclusions: The PRAQ shows good measurement properties in patients with (suspected) OSA.
Electronic supplementary material: The online version of this article (10.1186/s12955-018-0988-6) contains supplementary material, which is available to authorized users.

Patient-reported outcome measures (PROMs) are questionnaires for patients about symptoms or daily functioning. Most PROMs have been developed for use in clinical trials, but interest in their use in daily practice is growing [10,11]. There, PROM scores can be used on an individual patient level to help bring patients' problems to the forefront during consultations and to monitor treatment response, or on an aggregate level across groups of patients for quality improvement purposes [12]. Use of a PROM on an individual patient level may be especially relevant when patient symptoms are multiple and complex. We therefore believe that it would be beneficial to employ a PROM for patients with OSA in daily clinical practice.
In a recently published article, we described the item generation and preliminary item selection of a PROM for patients with (suspected) OSA: the Patient-Reported Apnea Questionnaire (PRAQ) [13]. We used the input of patients with OSA and healthcare professionals to select topics and items important for measuring quality of life for this patient group, which are also useful to discuss during an intake or follow-up consultation.
There are two ways in which the preliminary version of the PRAQ requires further development. First, the item reduction for the topic "sleepiness" has not yet taken place. During the item selection process of the PRAQ, patients indicated that the number of items on the topic of sleepiness could be reduced. Since the patients had no preference for which items to exclude, we decided to perform the final item reduction after studying the psychometric properties of the items. Second, the factor structure of the PRAQ has not yet been studied, and we wanted to find the optimal way to group (a subset of) the items of the PRAQ into domains for the purpose of outcome measurement.
Our aim is for the PRAQ to be employed in the following way: patients complete all items of the PRAQ before their consultation, the results of which can be discussed with a healthcare professional; and the aggregate outcomes of groups of patients can then be studied by making use of a subset of the completed items. This is beneficial for patients, who get feedback from clinicians on their results; for physicians, who get a quick insight into their patients' main problems; and for sleep centers that wish to collect outcome data for quality improvement, because it ensures a steady stream of data due to integration in clinical practice.
In this article we describe the further development of the preliminary PRAQ. In addition, we aim to determine the reliability, validity and responsiveness of the PRAQ, with a focus on the domains that will be used for outcome measurement.

Population & method of completion of the PROMs
Baseline measurement
Patients referred to the sleep center of the Albert Schweitzer Hospital in Dordrecht, The Netherlands, for suspected OSA received an email invitation to complete the PRAQ and additional PROMs 2-3 weeks before their intake consultation. They were informed that the results of the PRAQ would be discussed during their intake consultation. A reminder was sent one week later if the PROMs had not yet been completed by then. Patients who had not completed the PRAQ at home were offered the option of completing it at the sleep center before their consultation.

Retest measurement
To assess test-retest reliability, patients who had completed the baseline measurement at home were asked to complete it again immediately before their intake consultation, on a computer in a private area of the sleep center. Only patients who completed the retest no less than 7 and no more than 21 days after the baseline measurement were included in the assessment of test-retest reliability.
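The retest inclusion window can be expressed as a simple date check. The sketch below is illustrative only: it assumes both bounds are inclusive (as "no less than 7, and no more than 21 days" suggests), and the function name `eligible_for_retest` and all dates are hypothetical.

```python
from datetime import date

# Window from the text: retest no less than 7 and no more than 21 days
# after baseline. Both bounds are treated as inclusive (assumption).
MIN_RETEST_DAYS = 7
MAX_RETEST_DAYS = 21


def eligible_for_retest(baseline: date, retest: date) -> bool:
    """Return True if the retest was completed within the 7-21 day window."""
    interval_days = (retest - baseline).days
    return MIN_RETEST_DAYS <= interval_days <= MAX_RETEST_DAYS


# Hypothetical completion dates (not study data).
pairs = [
    (date(2017, 3, 1), date(2017, 3, 15)),  # 14 days: kept
    (date(2017, 3, 1), date(2017, 3, 5)),   # 4 days: dropped
    (date(2017, 3, 1), date(2017, 3, 25)),  # 24 days: dropped
]
kept = [pair for pair in pairs if eligible_for_retest(*pair)]
print(f"{len(kept)} of {len(pairs)} retests fall inside the window")
```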

Follow-up measurement
A common measure of the number of (partial) breathing stops experienced during sleep is the Apnea-Hypopnea Index (AHI). We measured the responsiveness of the PRAQ in patients with an AHI ≥ 15, which indicates moderate to severe sleep apnea [14], who were prescribed continuous positive airway pressure (CPAP) after their intake consultation. CPAP is the preferred treatment for OSA [15]. Patients who were still using CPAP at the time of the first follow-up consultation (6-8 weeks after the start of CPAP) were included in the responsiveness analysis. They were asked to complete the PRAQ and the additional PROMs immediately before their follow-up consultation at the sleep center. Ideally, responsiveness should be determined in a patient group in which CPAP therapy is successful, so that a substantial change in the patient's symptoms can be expected. CPAP therapy is generally considered successful when compliance is ≥4 h nightly [16].
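A minimal sketch of the responsiveness inclusion criteria as stated (AHI ≥ 15, CPAP prescribed after intake, still using CPAP at the first follow-up). The `Patient` record and its field names are hypothetical. CPAP compliance (≥4 h nightly) is deliberately not encoded, because the text does not state that it was used as an inclusion criterion.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical record; field names are illustrative, not from the study."""
    ahi: float                     # Apnea-Hypopnea Index, events per hour of sleep
    cpap_prescribed: bool          # CPAP prescribed after the intake consultation
    using_cpap_at_follow_up: bool  # still on CPAP at the 6-8 week follow-up


def included_for_responsiveness(p: Patient) -> bool:
    """Criteria as described in the text: AHI >= 15, CPAP prescribed, still using it."""
    return p.ahi >= 15 and p.cpap_prescribed and p.using_cpap_at_follow_up


# Hypothetical patients (not study data).
cohort = [
    Patient(ahi=32.0, cpap_prescribed=True, using_cpap_at_follow_up=True),
    Patient(ahi=9.5, cpap_prescribed=False, using_cpap_at_follow_up=False),
    Patient(ahi=18.0, cpap_prescribed=True, using_cpap_at_follow_up=False),
]
print(sum(included_for_responsiveness(p) for p in cohort))
```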
A secure website was used for the completion of the PROMs. For any of the measurements, patients who were unable or unwilling to use a computer were offered the option of completing a paper copy of the PROMs.

Final stage of PRAQ development
The development article of the preliminary PRAQ [13] shows how the initial 43 items were selected based on their relevance for clinical practice and were sorted into preliminary domains: symptoms at night (6 items), sleepiness (8 items), tiredness (3 items), daily activities (5 items), unsafe situations (2 items), memory and concentration (2 items), quality of sleep (2 items), emotions (6 items), social interactions (8 items), and health concerns (1 item) (Appendix 1). All items are scored on a 7-point Likert scale (higher scores indicate worse problems), and the average item scores in a domain form its domain score.
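Domain scoring as described (the average of the item scores in a domain) could be sketched as follows. This assumes item scores are stored as integers from 1 to 7 and that both missing and "not applicable" answers are stored as `None` and skipped. The text describes this averaging only for one missing paper item, so treating "not applicable" the same way is an assumption; `domain_score` is a hypothetical helper.

```python
from statistics import mean
from typing import Optional, Sequence

# Assumption: missing and "not applicable" answers are both stored as None.
Score = Optional[int]


def domain_score(item_scores: Sequence[Score]) -> Optional[float]:
    """Mean of the answered items in one domain (7-point scale, higher = worse)."""
    answered = [s for s in item_scores if s is not None]
    for s in answered:
        if not 1 <= s <= 7:
            raise ValueError(f"item score {s} is outside the 1-7 range")
    if not answered:
        return None  # no answered items: the domain score is undefined
    return mean(answered)


# Hypothetical answers for a four-item domain with one missing item.
print(domain_score([3, 4, None, 5]))  # averages the three answered items
```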
First, we performed item reduction on the sleepiness domain, as patients considered 8 items too many. Then, we examined how the PRAQ could best be used for outcome measurement. For this purpose, every item has to fit into a domain that is coherent either in terms of clinical relevance or (preferably) in terms of its covariance structure as determined by principal component analysis (PCA). Our aim was therefore to identify which items of the PRAQ, once completed and used for the individual patient, can also be grouped into domains for outcome measurement. Below we describe how we first reduced the number of items in the domain 'sleepiness', and then how a subset of the remaining items was selected for outcome measurement.

Item reduction of the sleepiness domain
During the development of the PRAQ, patients indicated that they felt that the number of items on the topic of sleepiness could be reduced. Because they had no preference for which items should be excluded, we took a statistical approach. We first looked for items with a high inter-item correlation (> 0.9), indicating that one of these items can be removed without a substantial loss of information [17]. As a second step, we used exploratory factor analysis to identify potential items with lower factor loadings (< 0.5), indicating that they do not cover the construct as well as the other items and are therefore more suitable for removal [17,18].
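The first step, flagging item pairs with an inter-item correlation above 0.9, is sketched below in plain Python. It assumes Pearson correlations on the raw item scores (the text does not name the correlation type), and `redundant_pairs` and the item data are hypothetical. The second step (factor loadings < 0.5) requires a factor analysis and is not reproduced here.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def redundant_pairs(items, threshold=0.9):
    """Item pairs whose correlation exceeds the threshold (one may be dropped)."""
    flagged = []
    for a, b in combinations(items, 2):
        r = pearson(items[a], items[b])
        if r > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical scores of five respondents on three sleepiness items.
items = {
    "s1": [1, 2, 3, 4, 5],
    "s2": [1, 2, 3, 4, 6],
    "s3": [5, 1, 4, 2, 3],
}
print(redundant_pairs(items))
```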

Creating domains for the PRAQ-outcome
We considered two of our preliminary domains, 'symptoms at night' and 'social interactions', to be formative rather than reflective: their items do not aim to measure aspects of the same latent construct, but are grouped together based on clinical relevance. Grouping items in this way can be considered a "clinimetric" approach, as opposed to a "psychometric" approach, which uses statistical methods to determine the dimensionality of a PROM [19]. We wanted to keep these items together irrespective of their covariance, because for content reasons we did not consider it desirable to combine them with any of the other (potential) domains. We therefore excluded them from the PCA and kept these domains as they were.
We performed a PCA with oblique rotation (because correlations between the different patient complaints were expected) on the 26 items of the other preliminary domains. Items that did not load on any factor with a loading of at least 0.5, or that had a loading > 0.3 on more than one factor [17], were then removed from the analysis one at a time, starting with those items that, for content reasons, did not seem to fit well with the items they were grouped with in the PCA. Additionally, since domains should ideally consist of at least three items, we used this as a requirement for the PRAQ-outcome domains [17]. The one-dimensional domains identified by the analysis were added to the two clinimetric domains; together, these domains form the subset of the PRAQ that can be used for outcome measurement.
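The retention rules can be applied to a loading matrix as in the following sketch. It does not perform the PCA or the oblique rotation; it only screens a given matrix of rotated loadings, compares absolute loadings with the thresholds (an assumption), and checks the three-item minimum. In the study, items were removed one at a time with content considerations deciding the order, so a real workflow would re-run the PCA after each removal. Helper names and loadings are hypothetical.

```python
from collections import Counter


def screen_items(loadings, primary=0.5, cross=0.3):
    """Flag items breaking the retention rules.

    loadings maps item -> list of rotated loadings, one per factor.
    Assumption: absolute loadings are compared with the thresholds.
    """
    flagged = {}
    for item, row in loadings.items():
        absolute = [abs(v) for v in row]
        if max(absolute) < primary:
            flagged[item] = "no loading >= 0.5"
        elif sum(v > cross for v in absolute) > 1:
            flagged[item] = "loading > 0.3 on more than one factor"
    return flagged


def undersized_factors(loadings, min_items=3):
    """Factors holding fewer than min_items items when each item goes to its highest loading."""
    best = Counter(
        max(range(len(row)), key=lambda f: abs(row[f])) for row in loadings.values()
    )
    return sorted(f for f, count in best.items() if count < min_items)


# Hypothetical rotated loadings on three factors (not study data).
loadings = {
    "a": [0.81, 0.05, 0.10],
    "b": [0.77, 0.12, 0.00],
    "c": [0.69, 0.20, 0.15],
    "d": [0.10, 0.72, 0.05],
    "e": [0.05, 0.66, 0.18],
    "f": [0.45, 0.25, 0.10],  # too weak on every factor
    "g": [0.55, 0.38, 0.02],  # cross-loads
}
flagged = screen_items(loadings)
kept = {k: v for k, v in loadings.items() if k not in flagged}
print(flagged)
print(undersized_factors(kept))  # factor 1 keeps only two items
```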

Assessment of measurement properties
We studied the distribution of the individual items and the PRAQ domain scores at baseline to check for floor and ceiling effects, which were considered present if more than 15% of the respondents achieved the lowest or highest possible score [20]. We assessed the reliability, validity and responsiveness of the PRAQ following the taxonomy of measurement properties constructed by the COSMIN panel [21].
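A sketch of the floor/ceiling check, assuming domain scores on the 1-7 scale and the > 15% criterion. The optional `band` parameter is a hypothetical addition so that, for example, scores of 1-1.5 can be counted as the lowest category, as in the results reported for the sleepiness domain; the data are hypothetical.

```python
def floor_ceiling(scores, lowest=1.0, highest=7.0, band=0.0, threshold=15.0):
    """Percentage of respondents at each end of the scale and whether it exceeds 15%.

    band widens the extremes: band=0.5 counts scores of 1-1.5 as 'lowest'.
    """
    n = len(scores)
    floor_pct = 100 * sum(s <= lowest + band for s in scores) / n
    ceiling_pct = 100 * sum(s >= highest - band for s in scores) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_effect": floor_pct > threshold,
        "ceiling_effect": ceiling_pct > threshold,
    }


# Hypothetical domain scores of 20 respondents.
scores = [1.0] * 2 + [1.5] * 2 + [3.0] * 16
print(floor_ceiling(scores))            # only exact minimum counted
print(floor_ceiling(scores, band=0.5))  # scores of 1-1.5 counted as floor
```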
We calculated Cronbach's alpha as the internal consistency parameter; it should have a value between 0.70 and 0.95 [20]. We assessed test-retest reliability by calculating the intraclass correlation coefficient (ICC consistency) for each PRAQ domain. ICC values of 0.7 are considered acceptable, but values ≥ 0.8 are preferred [17]. Additionally, we calculated the standard error of measurement (SEM).
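The three reliability statistics can be computed as in the sketch below. Cronbach's alpha uses the standard formula k/(k-1) × (1 - sum of item variances / variance of the total score). For the ICC we assume the two-way, single-measures consistency form, (MS_rows - MS_error) / (MS_rows + (k - 1) MS_error), since the text only says "ICC consistency". The SEM is assumed to be the square root of the error mean square, one common choice consistent with this ICC (another is SD × √(1 - ICC)). Function names and data are hypothetical.

```python
from math import sqrt
from statistics import variance


def cronbach_alpha(items):
    """items: list of item columns, each holding the scores of the same respondents."""
    k = len(items)
    totals = [sum(answers) for answers in zip(*items)]  # total score per respondent
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def icc_consistency(*occasions):
    """Two-way, single-measures ICC for consistency, plus SEM = sqrt(MS_error).

    Each argument is one measurement occasion (e.g. test, retest) for n patients.
    """
    k = len(occasions)
    n = len(occasions[0])
    rows = list(zip(*occasions))  # one row per patient
    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(c) / n for c in occasions]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    # Residuals after removing patient and occasion effects.
    ss_err = sum(
        (x - row_means[i] - col_means[j] + grand) ** 2
        for i, r in enumerate(rows)
        for j, x in enumerate(r)
    )
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    icc = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    return icc, sqrt(ms_err)


# Hypothetical data: three items answered by four patients,
# and test/retest domain scores of four patients.
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 3, 4, 5], [1, 3, 3, 5]])
icc, sem = icc_consistency([1, 2, 3, 4], [2, 2, 4, 4])
print(f"alpha={alpha:.2f} ICC={icc:.2f} SEM={sem:.2f}")
```

Because the consistency form ignores systematic differences between occasions, a retest that is shifted by a constant amount yields an ICC of 1.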
We used hypothesis testing to assess convergent validity, which involves studying the correlations of the scores of the PROM under study with the scores of other PROMs. We formulated hypotheses on the size and direction of the (Spearman's) correlations of the PRAQ domains with the (subscales of) PROMs measuring similar constructs (Appendix 2). We also hypothesized which PROMs should have a lower correlation with each PRAQ domain. Convergent validity is considered good when at least 75% of the hypotheses are confirmed [20]. We used the following (subscales of) PROMs for convergent validity, in their official Dutch translations. The Epworth Sleepiness Scale (ESS) [22] measures daytime sleep propensity: for eight situations, the patient indicates the likelihood of falling asleep in that situation. The measurement properties of the ESS have been studied in a sleep apnea population [23]. The "vitality" domain of the RAND-36 [24]: the (freely available) RAND-36, the predecessor of the well-known SF-36, measures general quality of life in several domains. Its vitality domain contains 4 items about a patient's perceived energy level. These items are identical to those of the vitality domain of the SF-36, and the domain's measurement properties have been studied in a sleep apnea population in that context [23]. Finally, the following short forms from the Patient-Reported Outcomes Measurement Information System (PROMIS) item banks [25][26][27]: sleep disturbance (5 items), sleep-related impairment (6 items), fatigue, satisfaction with participation in social roles, ability to participate in social roles, anger, anxiety and depression (the last six each contained 4 items) [28][29][30][31]. For "sleep disturbance" and "anger", these were custom short forms with fewer items than the standard short forms, to reduce the number of items that patients had to complete for this study.
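Spearman's correlation and the 75% rule can be sketched without external libraries. The sketch computes rho as the Pearson correlation of average ranks (which handles ties), and counts a range hypothesis as met when the observed rho lies inside its expected interval, such as 0.6-0.9 for the PRAQ "emotions" domain versus PROMIS anger (Appendix 2). Hypotheses of the "lower than" type for dissimilar constructs would need a separate comparison and are not covered; the observed values below are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for idx in order[start:end + 1]:
            ranks[idx] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def share_of_hypotheses_met(results):
    """results: (observed rho, expected low, expected high) per range hypothesis."""
    met = sum(low <= rho <= high for rho, low, high in results)
    return met / len(results)


# Hypothetical observed correlations against an expected range of 0.6-0.9.
results = [(0.72, 0.6, 0.9), (0.65, 0.6, 0.9), (0.58, 0.6, 0.9), (0.81, 0.6, 0.9)]
share = share_of_hypotheses_met(results)
print(f"{share:.0%} met; good convergent validity: {share >= 0.75}")
```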
To assess responsiveness, we formulated hypotheses about the correlations between the change scores of the PRAQ domains and the change scores of the same instruments that were used for hypothesis testing of construct validity (Appendix 2).

Population characteristics
The baseline population consisted of 180 patients with suspected OSA who completed the baseline measurement. Of these patients, 105 completed the retest within 7 to 21 days (average 14 days), and 53 completed the follow-up measurement after 6-8 weeks of CPAP treatment. Characteristics of these (sub)populations are given in Table 1.

Missing data
Patients completing the online PRAQ could not leave any items unanswered. Eleven patients completed the PRAQ on paper one or more times; in one of these paper questionnaires (the follow-up after CPAP), item 33 (Appendix 1) of the domain "social interactions" was missing. For this patient, we computed the domain score as the average of the remaining items.
Seven items allowed the response option "not applicable" (see Appendix 1). Between 19 and 46% of respondents selected this option for the respective items.

Final stage of PRAQ development
Finishing the item selection of the sleepiness domain
None of the inter-item correlations in the preliminary 'sleepiness' domain was higher than 0.9. Principal component analysis showed that the lowest factor loading was 0.65, well above 0.5. We therefore based the item elimination on practical considerations: we removed the two items with a "not applicable" option (about sleepiness while reading and while driving a car), as well as an item about napping in the afternoon that had a different answering scale from the other items. This makes the domain more homogeneous for patients. The final version of the PRAQ consists of 10 domains and 40 items (Appendix 1).

Identification and grouping of items for outcome measurement
The results of the final PCA can be found in Table 2. The items of the PRAQ domains "memory & concentration", "sleep quality", and "concerns about health" were removed because they did not load sufficiently on any of the factors found in the PCA, or because they loaded on more than one factor. The items of the PRAQ domains "tiredness" and "daily activities" loaded on a single factor rather than on two separate factors; for outcome measurement, these items were therefore combined into one domain called "energy & daily activities". Both items of the PRAQ domain "unsafe situations" loaded on a separate factor. However, since this factor contained only two items, it was not added to the PRAQ-outcome.
The 19 remaining items in the PCA form three one-dimensional domains (sleepiness, energy & daily activities, and emotions), which together explain 73% of the variance. The correlations between these domains in the PCA ranged from 0.36 to 0.57. These domains were added to the two formative domains "symptoms at night" and "social interactions", resulting in a subset of 33 items in five domains. Figure 1 illustrates how the items and domains of the PRAQ result in the subselection of PRAQ items for outcome measurement. The domains that are present in both the full 40-item PRAQ and the 33-item outcome subset largely overlap.
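The item counts reported across the development steps can be cross-checked with simple arithmetic. The tally below uses only counts stated in the text (including the three items removed from the sleepiness domain) and reproduces the 40-item PRAQ, the 26 items entered into the PCA, the 19 retained items, and the 33-item outcome subset. The dictionary names are illustrative.

```python
# Item counts per domain of the final 40-item PRAQ, as stated in the text
# (the sleepiness domain went from 8 to 5 items after removing 3 items).
full_praq = {
    "symptoms at night": 6,
    "sleepiness": 8 - 3,
    "tiredness": 3,
    "daily activities": 5,
    "unsafe situations": 2,
    "memory and concentration": 2,
    "quality of sleep": 2,
    "emotions": 6,
    "social interactions": 8,
    "health concerns": 1,
}
formative = ["symptoms at night", "social interactions"]

# Items entered into the PCA: everything except the two formative domains.
pca_pool = sum(full_praq.values()) - sum(full_praq[d] for d in formative)

# Factors retained by the PCA (tiredness and daily activities merged).
retained = {
    "sleepiness": full_praq["sleepiness"],
    "energy & daily activities": full_praq["tiredness"] + full_praq["daily activities"],
    "emotions": full_praq["emotions"],
}
outcome = {**{d: full_praq[d] for d in formative}, **retained}

print(sum(full_praq.values()), pca_pool, sum(retained.values()), sum(outcome.values()))
```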

Measurement properties
In this section we describe the measurement properties of the domains that are used for outcome measurement; the results for the domains of the 40-item version can be found in Additional file 1. The average baseline scores, standard deviations, and percentages of lowest and highest scores of the five outcome domains can be found in Table 3. No floor or ceiling effects were found, except for a floor effect in the 'sleepiness' domain (20% of subjects scored 1-1.5). The results of the different aspects of reliability (internal consistency with Cronbach's alpha, test-retest reliability with ICC, SEM) are also shown in Table 3. The values of Cronbach's alpha and the ICC values are all above 0.8, indicating that these measurement properties are of good quality.
The correlations of the outcome domains with comparator instruments, which were used to determine convergent validity, are presented in Table 4. The correlations with the (somewhat) similar constructs were all within the ranges that we hypothesized (n = 14 hypotheses), and the correlations of selected PRAQ-domains with the dissimilar constructs were all lower than those with the similar constructs (n = 3 hypotheses), as expected.
The absolute change scores of the PRAQ outcome domains after patients were treated with CPAP ranged from 0.76 (domain "emotions") to 1.96 (domain "energy & daily activities") (Appendix 3). The correlations of the change scores of the PRAQ with the change scores of the comparator instruments (Table 5) were generally in agreement with our hypotheses (n = 17 hypotheses). The exception was the "emotions" domain of the PRAQ, whose change scores did not correlate as strongly with the change scores of the PROMIS domains about emotions (anger, anxiety, depression; r = 0.26-0.43) as we had expected. When a hypothesis is not met, it is important to identify why the results differ from expectations [32]. To gain more insight into these unexpected results, we ran an additional analysis of the correlation between the PRAQ scores and the PROMIS scores at the follow-up measurement, which yielded r = 0.62-0.71. This indicates that the discrepancy lies in the change scores themselves and not in the absolute scores at follow-up.

Discussion
In this article we present the finalized Patient-Reported Apnea Questionnaire (PRAQ). The PRAQ has a unique approach with regard to the integration of use on an individual patient level and for aggregate outcome measurement: patients complete all items of the PRAQ before their consultation, the results of which can be discussed with a healthcare professional; and the aggregate outcomes of groups of patients can then be studied by making use of a subset of the completed items. The PRAQ contains all topics and items that patients and healthcare providers consider important to discuss in practice, and for this purpose includes 40 items in 10 domains. For outcome measurement, a subset of 33 items of the PRAQ was selected, divided into two formative domains (items grouped together based on what makes sense clinically) and three one-dimensional subscales. These five outcome domains generally have good measurement properties in terms of internal consistency, test-retest reliability, convergent validity and responsiveness.
Table 2 Results of the final PCA
Item | Factor loading(s)
Falling asleep at inappropriate times or places? | .802
Feeling very tired? | .785
Lacking energy? | .856
Still feeling tired when you wake up in the morning? | .790
In the past 4 weeks: How difficult was it for you to do your most important daily activity? (such as your job, studying, caring for the children, housework) | .841
How often did you use all your energy on only your most important, daily activity? (such as your job, studying, caring for the children, housework) | .940
How often did you use all your energy to accomplish only your most important daily activity? (such as your job, studying, caring for the children, housework) | .825
How much difficulty did you have finding energy for your hobbies? | .770
How difficult was it for you to get your chores done? | .849
How often did you feel depressed or hopeless? | .266, .677
How often did you feel anxious? | .793
How often did you lose your temper? | .803
How often did you feel that you could not cope with everyday life? | .746
How often did you feel irritated? | .889
How often did you have a strong emotional reaction to everyday events? | .875
a. The bold font numbers indicate the highest factor loading for that item
b. Absolute factor loadings < 0.2 are not shown in the table

Fig. 1 The subselection of items and domains of the PRAQ for outcome measurement (recoverable label: Energy & daily activities, 8 items)

PCA showed that the items of the PRAQ domains "tiredness" and "daily activities" load on the same factor, which is why the items of these preliminary domains are combined into one domain for the purpose of outcome measurement. For use on an individual patient level, however, we decided to keep the two domains separate. Even though we acknowledge that feeling tired (a symptom) and the extent to which daily activities can be performed normally (a consequence of that symptom) are closely related concepts, they may be relevant to discuss separately with an individual patient in clinical practice. We will test this assumption in future research, in which the PRAQ will be employed and studied empirically.
The domains that are used for outcome measurement show good responsiveness. The one exception is the domain "emotions", whose change score showed a much weaker correlation than expected with the change scores of PROMs with similar constructs. We hypothesize that this discrepancy between expectation and results is caused by the low scores of this domain at baseline (average 2.89) and the relatively small improvement achieved after treatment with CPAP (average 0.76). We do not doubt the construct validity of the domain, because the comparator PROMs show the same pattern of low scores and small change scores, and because the correlation of the absolute scores after treatment with CPAP shows good convergent validity. However, because the change scores are small, measurement error likely plays a relatively large role in the change scores of both the PRAQ domain and the comparator instruments, reducing the accuracy of the change scores and thereby attenuating their correlations.
This means that, in terms of outcome/quality measurement, emotional problems appear to be less important than the topics of the other domains, and more difficult to measure accurately, because relatively few people with (suspected) OSA experience severe emotional problems.

Subselection of domains/items for outcome measurement
Surprisingly, 20% of the study population had low scores (1-1.5) on the domain 'sleepiness', even though sleepiness is one of the main complaints of OSA. We think that this is due to the relatively high difficulty of the sleepiness items of the PRAQ (such as falling asleep during a conversation) in combination with generally low sleepiness in this population (average ESS < 10). The reason for the low sleepiness in this population is probably twofold. First, the main complaint of some patients who were referred for suspected OSA in this study is probably (socially problematic) snoring rather than sleepiness or tiredness during the day. OSA treatment will reduce snoring and is reimbursed by healthcare insurers, making it beneficial for these patients to visit the sleep center and get an OSA diagnosis. Second, for logistical reasons some patients with suspected severe OSA were not included in the study. These patients followed a fast-track procedure that bypassed the sleep center's waiting list, which meant that in practice they were not always asked to join the study. This is a limitation of the study. The current results suggest that the sleepiness domain of the PRAQ is more useful for detecting cases of severe sleepiness, which clearly require treatment, than for distinguishing mild from moderate sleepiness. However, future research should study how the sleepiness domain performs in a more representative patient group.
The PRAQ is designed for use in clinical practice, to help focus consultations on the problems that individual patients encounter. When a PROM is used for this purpose, the ICC should preferably be very high (0.9-0.95 at the individual level vs. 0.8 or higher at the group level for aggregate outcome measurement [17]). The ICC values of the PRAQ are lower (0.81-0.88). However, the PRAQ is meant to open the conversation about a patient's symptoms and functioning, not to serve as a "cut-off" score. Any elevated score can therefore prompt a conversation about the topic, and we believe that the PRAQ can serve its purpose despite the slightly lower ICCs.

Methodological considerations
The domains for outcome measurement were created with a combination of the "clinimetric" approach, in which items are grouped together based on clinical relevance, and the "psychometric" approach, which groups items together based on PCA [33][34][35]. This combination of approaches is uncommon. We believe that scores of psychometric domains, with a clear one-dimensional construct, are more meaningful than those of formative domains because they have a clear interpretation. However, this approach is not always feasible when items have been selected for a quality-of-life or symptom questionnaire based on their importance as judged by the target population [36]. Items that cover symptoms of the same disease or treatment will often share covariance and thus appear to measure the same latent construct, even when grouping them makes no apparent sense in terms of content (e.g. lack of appetite and decreased sexual interest in patients undergoing cancer treatment [36]). We therefore considered it the best approach to group together, without subjecting them to PCA, the different symptoms patients experience at night, as well as the various ways in which sleepiness, tiredness and emotions might influence a patient's social life.
To aid the use of the PRAQ in clinical practice, we developed a patient-friendly digital report (the PRAQ-report) together with patients and healthcare professionals (Fig. 2). When using the PRAQ in clinical practice, it can be useful to look at individual item scores as well as the domain scores, especially in the formative domains in which item scores will generally differ more from each other. Therefore, both domain and individual item scores are shown in the report.

Conclusions
In conclusion, we have shown that the PRAQ-practice and PRAQ-outcome generally have acceptable measurement properties and appear to be suitable PROMs for their respective purposes. However, further validation research is needed in patients with higher levels of sleepiness to study the validity of the sleepiness domain. The applicability of a PROM for use in clinical practice and for measuring outcomes on aggregate level,
Fig. 2 The first page of the PRAQ-report
a. The shaded items of the "sleepiness" domain were removed from this domain in the final version of the PRAQ
b. These items had an additional response option "not applicable" or (for item 39) "no answer"
The ESS asks about current daytime sleep propensity, whereas the PRAQ domain asks patients to look back on the past month and indicate how much of a problem sleepiness or falling asleep was. The two do not cover exactly the same construct, but they are relatively similar. We expect a moderately strong correlation (h1). "Sleep-related impairment" is a domain with questions covering both sleepiness and tiredness. We expect a moderate to strong correlation with the PRAQ domain "sleepiness" (h2), because of the overlapping items on sleepiness, but also because sleepiness itself correlates with tiredness and with how tiredness affects daily activities in our study population (a correlation of 0.57 in our principal component analysis).

Dissimilar constructs
We expect the correlations with the PROMIS fatigue and RAND-36 vitality domains to be lower than the correlations with the similar constructs of the PROMs mentioned above (h3).

Dissimilar constructs
We expect the correlations with the ESS and the "PROMIS satisfaction with social roles and activities" domain to be lower than the correlations with the similar constructs of the PROMs mentioned above (h6).
Emotions: PROMIS anger, expected correlation 0.6-0.9 (+)
The three PROMIS domains contain items asking how often certain emotions are felt, which is the same approach as that of the PRAQ domain. The PRAQ domain also covers all three of these types of emotions. We therefore expect a strong correlation with each of these domains (h1-3).

Dissimilar constructs
We expect the correlations of the PRAQ "emotions" domain with the ESS, the RAND-36, and the PROMIS domains "ability to participate in social roles and activities" and "satisfaction with social roles and activities" to be lower than the correlations with the similar constructs of the PROMs mentioned above (h4).
The most similar domain that we included for the PRAQ domain "symptoms at night" is the PROMIS "sleep disturbance" domain. This domain contains items about whether patients are sleeping well. Even though most of the items in the PRAQ domain "symptoms at night" will affect the quality of sleep, the content of the two domains is very different. We therefore did not formulate a precise hypothesis, and expect at least a low to moderate correlation (h1).
a. A positive change score stands for a reduction in symptoms
b. All change scores are significant, p-value < 0.00