Psychometric properties of the itch numeric rating scale, skin pain numeric rating scale, and atopic dermatitis sleep scale in adult patients with moderate-to-severe atopic dermatitis

Abstract

Background

The Itch Numeric Rating Scale (NRS), Skin Pain NRS, and Atopic Dermatitis Sleep Scale (ADSS) are self-administered patient-reported outcome (PRO) instruments developed to assess symptoms in patients with atopic dermatitis (AD). The objective of this study was to evaluate the psychometric properties (reliability, validity, and responsiveness) and interpretability thresholds of these PROs using data from three pivotal Phase 3 studies in adults.

Methods

BREEZE-AD1, BREEZE-AD2, and BREEZE-AD5 evaluated the safety and efficacy of baricitinib in adults with moderate-to-severe AD. Clinician-reported outcomes and other PROs commonly assessed in patients with AD were used to estimate meaningful changes and evaluate test–retest reliability, convergent and divergent validity, known-groups validity, responsiveness, and meaningful change thresholds (MCTs) of the Itch NRS, Skin Pain NRS, and ADSS.

Results

The test–retest reliability of the Itch NRS, Skin Pain NRS, and ADSS was evidenced by generally large intraclass correlation coefficients (> 0.7) in stable groups of patients between baseline and Week 1 and between Weeks 4 and 8. Moderate-to-large correlations (r > 0.4) at baseline and Week 16 were generally observed between each measure and other PROs measuring the same concept, supporting convergent validity. Small-to-moderate correlations with clinician-reported outcomes demonstrated divergent validity. Each instrument was able to distinguish between known groups of disease severity as assessed using other indicators of AD severity. The responsiveness of the Itch NRS, Skin Pain NRS, and ADSS was demonstrated through significant differences in their change scores from baseline to Week 16 between categories of change in another PRO over the same period. Thresholds for interpreting meaningful change were estimated as − 4.0 for the 0–10 Itch NRS and Skin Pain NRS items, − 1.25 for the 0–4 ADSS Items 1 and 3, and − 1.50 for the 0–29 ADSS Item 2; these thresholds are equivalent to moderate degrees of change.

Conclusions

Results of this study demonstrate that the psychometric properties of the Itch NRS, Skin Pain NRS, and ADSS are good to excellent. These findings support the use of these instruments in daily assessment of AD symptoms in adults with moderate-to-severe AD.

Trial registration ClinicalTrials.gov numbers: NCT03334396, NCT03334422, and NCT03435081.

Background

Patients with moderate-to-severe atopic dermatitis (AD) experience a heavy disease burden that substantially impacts both physical and mental functioning. Intense itch, skin pain, and related sleep disturbance are highly prevalent symptoms that patients with AD report as significantly affecting their quality of life (QoL) [1, 2]. The most commonly used instruments to assess the severity of AD include the Investigator Global Assessment (IGA) and the Eczema Area and Severity Index (EASI) [3,4,5]. These instruments are based on a physician’s visual assessment of clinical signs, and thus fail to capture the patient-experienced symptoms of itch, skin pain, and their impact on sleep. Though itch, skin pain, and sleep disturbance are important to patients with AD, measurement of these burdensome symptoms in clinical trials has so far been limited. Specific patient-reported outcome (PRO) measures may be useful to understand the burden from these symptoms better.

The Itch Numeric Rating Scale (NRS), Skin Pain NRS, and Atopic Dermatitis Sleep Scale (ADSS) are PROs designed to specifically measure the severity of a patient’s itch and skin pain, and assess impact of itch on sleep, respectively. These tools were developed according to the Food and Drug Administration (FDA) PRO guidelines [6], as simple, self-administered assessments in daily electronic diaries used in AD clinical trials. Previous studies found that the Itch NRS, Skin Pain NRS [7], and ADSS had good content validity, i.e. represent aspects of disease that are meaningful to patients. However, the psychometric properties of each measure were not assessed. Instruments can assess clinically relevant information, but not have sufficient validity, reliability, or interpretability to be used in clinical trials or practice. These psychometric properties are needed to support the use of these measures in clinical trials. The objective of this study was to determine the reliability, validity, responsiveness, and meaningful change of the Itch NRS, Skin Pain NRS, and ADSS in patients with moderate-to-severe AD using data from three Phase 3 clinical trials.

Methods

Study population

BREEZE-AD1 (AD1), BREEZE-AD2 (AD2), and BREEZE-AD5 (AD5) were three multicenter, randomized, double-blind, placebo-controlled, parallel-group Phase 3 clinical trials that evaluated the safety and efficacy of once-daily oral baricitinib 1 mg, 2 mg, and 4 mg (4 mg in AD1 and AD2 only) versus placebo in adult patients with moderate-to-severe AD. In each trial, patients were ≥ 18 years old and intolerant or inadequate responders to topical therapy. At screening and baseline, patients were required to have an EASI score ≥ 16, a validated Investigator Global Assessment for Atopic Dermatitis (vIGA-AD™) score ≥ 3, and a body surface area (BSA) involvement ≥ 10%. Full details of each study, including the primary efficacy and safety outcomes, have been reported previously [8, 9]. Each study was conducted with informed consent, under institutional review board approval, and in accordance with the Declaration of Helsinki (ClinicalTrials.gov numbers: NCT03334396 (AD1), NCT03334422 (AD2), and NCT03435081 (AD5)).

Instruments used in the psychometric analyses

Itch NRS, Skin Pain NRS, ADSS

The Itch NRS is a single item designed to capture information on self-reported severity of worst itching each day. Patients were asked to rate itching severity based on the worst level of itching in the past 24 h using an 11-point scale from 0 (“no itch”) to 10 (“worst itch imaginable”). The single-item Skin Pain NRS assesses self-reported severity of worst skin pain each day. For this, patients were asked to select a number from 0 (“no pain”) to 10 (“worst pain imaginable”) that best described the worst level of skin pain in the past 24 h. The three-item ADSS captures the self-reported impact of itch on sleep each day, including: difficulty falling asleep (Item 1); number of night-time awakenings (Item 2); and difficulty falling back asleep after waking (Item 3) during the previous night. Each ADSS item was scored individually. For Items 1 and 3, patients were asked to select a score ranging from 0 (“not at all”) to 4 (“very difficult”). For Item 2, patients selected the number of times they woke up each night, ranging from 0 to 29 times. Patients only answered Item 3 if their answer to Item 2 was greater than 0. These three PROs were self-assessed using a daily electronic diary, starting at screening through Week 16. Information was entered into the electronic diary at the end of each patient’s day. For each measure, weekly mean scores using the previous 7 days were calculated if at least 4 of the 7 diary values were non-missing. Weekly averages were calculated at baseline (Week 0) and Weeks 1, 2, 4, 8, 12, and 16.
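
The weekly scoring rule described above can be sketched as follows. This is a minimal illustration rather than the study's SAS code; the function name `weekly_mean` and the use of `None` to represent a missing diary entry are assumptions made for the example.

```python
from statistics import mean
from typing import Optional, Sequence

MIN_VALID_DAYS = 4  # at least 4 of the 7 diary values must be non-missing


def weekly_mean(diary: Sequence[Optional[float]]) -> Optional[float]:
    """Mean of the non-missing entries in a 7-day window, or None if too few.

    `diary` holds one value per day; None marks a missed diary entry
    (a representation assumed for this sketch).
    """
    if len(diary) != 7:
        raise ValueError("expected exactly 7 daily diary values")
    observed = [value for value in diary if value is not None]
    if len(observed) < MIN_VALID_DAYS:
        return None  # the weekly score is treated as missing
    return mean(observed)
```

For example, a week with four completed Itch NRS entries yields a weekly mean, whereas a week with only two completed entries yields a missing weekly score.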

Other scales

The PROs used to evaluate the psychometric properties of the Itch NRS, Skin Pain NRS, and ADSS included: (1) the Dermatology Life Quality Index (DLQI) [10], a self-reported measure of the impact of AD on QoL; (2) the Patient Oriented Eczema Measure (POEM) [11], a self-assessed disease severity score; and (3) the Patient Global Impression of Severity-Atopic Dermatitis (PGI-S-AD). More specifically, the PGI-S-AD is a single item asking patients to rate their overall AD symptoms over the last 24 h, ranging from “no symptoms” to “severe.” The PGI-S-AD measure was collected in the daily diary along with the Itch NRS, Skin Pain NRS, and ADSS items; the other PROs (DLQI and POEM) were assessed during clinic visits. In addition, the clinician-completed EASI, an evaluation of disease extent and clinical signs, was used in the psychometric validation.

Statistical analyses

The following psychometric evaluation methods used in this study are in accordance with the published FDA guidance for assessing the measurement properties of PROs [6] and recent psychometric consensus discussions and presentations [12]. Unless otherwise stated, all analyses were conducted on eligible patients from the intent-to-treat (ITT) population who had weekly mean scores for the Itch NRS, Skin Pain NRS, or ADSS items at baseline. Analysis at visits following baseline includes all patients who had data at baseline and at the respective follow-up days or visits. All analyses were conducted using SAS Version 9.3 or higher (SAS Version 9. 2013. Cary, NC, SAS Institute Inc.).

Test–retest reliability

Test–retest reliability, which measures if instrument scores are reproducible across time, was assessed in a stable patient population during the interval between Week 0 and Week 1 as well as between Weeks 4 and 8. Stable patients were defined as those in the ITT population with weekly mean PGI-S-AD scores between − 0.50 and + 0.50 during each time interval. Intra-class correlation coefficients (ICCs) were calculated between the initial and retest periods. An ICC of ≥ 0.70 was considered acceptable agreement [13,14,15].
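
The stable-patient filter and the ICC calculation can be illustrated with the sketch below. The text does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-measurement form (often written ICC(2,1)) is an assumption here, as are the function names, the inclusive ± 0.50 bounds, and the reading of the stability criterion as a change in weekly mean PGI-S-AD.

```python
from typing import List, Sequence, Tuple


def stable_pairs(test: Sequence[float], retest: Sequence[float],
                 pgi_change: Sequence[float],
                 limit: float = 0.5) -> List[Tuple[float, float]]:
    """Keep (test, retest) pairs whose PGI-S-AD change lies within +/- limit."""
    return [(t, r) for t, r, c in zip(test, retest, pgi_change)
            if abs(c) <= limit]


def icc_2_1(pairs: Sequence[Tuple[float, float]]) -> float:
    """ICC(2,1) from a two-way ANOVA decomposition with k = 2 occasions."""
    n, k = len(pairs), 2
    if n < 2:
        raise ValueError("need at least two patients")
    grand = sum(t + r for t, r in pairs) / (n * k)
    row_means = [(t + r) / k for t, r in pairs]          # per-patient means
    col_means = [sum(t for t, _ in pairs) / n,            # test occasion mean
                 sum(r for _, r in pairs) / n]            # retest occasion mean
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for pair in pairs for v in pair)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                # between-patient mean square
    msc = ss_cols / (k - 1)                # between-occasion mean square
    mse = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Under this form, a constant shift between test and retest lowers the coefficient even when the ordering of patients is preserved, which is why an absolute-agreement ICC is a conservative choice for test–retest work.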

Construct validity (convergent and divergent validity)

Construct validity refers to the degree to which scores from one measure are theoretically consistent with those of another measure. Convergent and divergent validity were assessed using Spearman’s correlations between each of the Itch NRS, Skin Pain NRS, and ADSS items, and the scores of the PGI-S-AD, DLQI, POEM, and EASI. All analyses were conducted at Weeks 0 and 16. The strength of correlations was interpreted using Cohen’s conventions, where > 0.70 is large, 0.40–0.70 is moderate, and < 0.40 is small [12,13,14, 16, 17].
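
As a sketch of this step, the code below computes Spearman's correlation as the Pearson correlation of average ranks and labels its strength with the thresholds quoted above. The function names are hypothetical, and applying the thresholds to the absolute value of the coefficient is an assumption of the example.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def correlation_strength(r: float) -> str:
    """> 0.70 large, 0.40-0.70 moderate, < 0.40 small (on the absolute value)."""
    size = abs(r)
    if size > 0.70:
        return "large"
    if size >= 0.40:
        return "moderate"
    return "small"
```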

It was hypothesized that convergent validity, evidenced by moderate or large correlations, would be demonstrated at Weeks 0 and 16 between each of the Itch NRS, Skin Pain NRS, and ADSS items with the other PROs related to AD symptoms (POEM, DLQI, and PGI-S-AD), and that divergent validity, evidenced by small-to-moderate correlations, would be demonstrated between each of the instruments of interest with the more distally related clinician-completed assessment (EASI).

Known-groups validity (discriminant validity)

Known-groups validity was assessed by exploring the ability of each instrument to discriminate between subgroups of patients with different underlying disease severity. Based on the evaluation of construct validity, measures correlating with the Itch NRS, Skin Pain NRS, or ADSS above the 0.35 criterion for acceptable correlations [18, 19] were considered in the analyses of known-groups validity.

Patients were stratified into severity groups based on baseline scores of PGI-S-AD (weekly mean score of < 3 “no symptoms to mild symptoms” and ≥ 3 “moderate-to-severe symptoms”) and POEM (scores 0–7 “clear to mild,” scores 8–16 “moderate,” and scores 17–28 “severe to very severe”) [11]. The weekly average scores on the Itch NRS, Skin Pain NRS, and ADSS items were compared between these groups using independent samples t-tests (2 groups) and analysis of covariance (ANCOVA) controlling for the effects of age, race, and gender (> 2 groups). When ANCOVA was used, post hoc t-tests compared the mean weekly score between consecutive severity groups. Any severity group with < 20 patients was omitted from the analysis to ensure sufficient data for interpretation.
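
The PGI-S-AD stratification and the minimum group size rule can be sketched as below. The function names and the `(PGI-S-AD weekly mean, outcome score)` tuple layout are assumptions for illustration; the statistical tests themselves (t-tests and ANCOVA) are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

MIN_GROUP_SIZE = 20  # groups with fewer than 20 patients are omitted


def pgi_s_ad_severity_group(weekly_mean_pgi: float) -> str:
    """Baseline PGI-S-AD weekly mean < 3 versus >= 3, as described in the text."""
    if weekly_mean_pgi < 3:
        return "no symptoms to mild symptoms"
    return "moderate-to-severe symptoms"


def outcome_by_severity_group(
    patients: Iterable[Tuple[float, float]],
    min_size: int = MIN_GROUP_SIZE,
) -> Dict[str, List[float]]:
    """Collect outcome scores per severity group, dropping undersized groups.

    Each patient is a (PGI-S-AD weekly mean, outcome weekly mean) tuple,
    where the outcome could be, for example, the Itch NRS.
    """
    groups: Dict[str, List[float]] = defaultdict(list)
    for pgi, outcome in patients:
        groups[pgi_s_ad_severity_group(pgi)].append(outcome)
    return {label: scores for label, scores in groups.items()
            if len(scores) >= min_size}
```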

Responsiveness

Responsiveness, the ability of the measure to detect change when change in the construct of relevance has occurred, was evaluated using ANCOVAs and post-hoc paired t-tests to assess significant differences in mean changes in the Itch NRS, Skin Pain NRS, and ADSS items from Week 0 to Week 4 and Week 0 to Week 16 between groups of patients with different degrees of change in the construct of relevance. The standardized response mean (SRM) [19] was used to interpret the magnitude of responsiveness of each measure; based on Cohen’s recommendations [19], SRMs of 0.20, 0.50, and 0.80 represent small, moderate, and large changes, respectively [20].
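
For reference, the SRM is conventionally the mean change score divided by the standard deviation of the change scores. The sketch below applies that definition and the benchmarks quoted above; the function names are hypothetical, and treating the benchmarks as lower bounds (with a "below small" label under 0.20) is an assumption of the example.

```python
from statistics import mean, stdev
from typing import Sequence


def standardized_response_mean(changes: Sequence[float]) -> float:
    """Mean change divided by the sample SD of the change scores."""
    if len(changes) < 2:
        raise ValueError("need at least two change scores")
    spread = stdev(changes)
    if spread == 0:
        raise ValueError("SRM is undefined when all changes are identical")
    return mean(changes) / spread


def srm_magnitude(srm: float) -> str:
    """Label |SRM| using 0.20, 0.50, and 0.80 as lower bounds (an assumption)."""
    size = abs(srm)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "moderate"
    if size >= 0.20:
        return "small"
    return "below small"
```

Because improvement on these instruments is a decrease in score, a responsive measure produces a negative SRM in improved groups; the magnitude label is therefore based on the absolute value.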

Mean changes were assessed within 4 change categories of the POEM: (1) “much improved” patients who moved more than one health category to a better health category (> 1 category improvement); (2) “improved” patients who moved by one health category to a better health category (1 category improvement); (3) “stable” patients who remained in the same health category (no category change); and (4) “declined” patients who moved to a worse health category (≥ 1 category worsening). These categories were based on changes from baseline to the respective time point in the POEM severity category (scores 0–7 “clear to mild,” scores 8–16 “moderate,” and scores 17–28 “severe to very severe”) [11]. It was hypothesized that statistically significant differences in the Itch NRS, Skin Pain NRS, and ADSS items would be observed between POEM change categories [11]. Differences in change scores between groups were tested using ANCOVA, controlling for age, gender, and race [21]. Post hoc t-tests and SRMs between consecutive change groups were also conducted.
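
The four POEM change categories follow directly from the shift between severity bands, as the sketch below illustrates. The function names are hypothetical; the band limits and category labels are taken from the text.

```python
def poem_band_index(score: int) -> int:
    """Return 0, 1, or 2 for POEM bands 0-7, 8-16, and 17-28."""
    if not 0 <= score <= 28:
        raise ValueError("POEM total score must lie between 0 and 28")
    if score <= 7:
        return 0  # clear to mild
    if score <= 16:
        return 1  # moderate
    return 2      # severe to very severe


def poem_change_category(baseline: int, follow_up: int) -> str:
    """Classify a patient by the number of severity bands moved."""
    # Positive shift means the patient moved to a better (lower) band.
    shift = poem_band_index(baseline) - poem_band_index(follow_up)
    if shift > 1:
        return "much improved"
    if shift == 1:
        return "improved"
    if shift == 0:
        return "stable"
    return "declined"
```

With three severity bands, "much improved" is only reachable for patients who start in the "severe to very severe" band and finish in the "clear to mild" band.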

Meaningful change estimation

Meaningful change refers to the individual-patient level of differences in scores in the domain of relevance which patients perceive as meaningful [6].

Anchor-based assessment

An anchor-based analysis, with weekly mean PGI-S-AD serving as the anchor variable, was the primary method used to derive clinical interpretations of the Itch NRS, Skin Pain NRS, and ADSS items. Spearman’s correlations were evaluated between the PGI-S-AD weekly average score and each measure at baseline, Week 4, and Week 16. Spearman’s correlations were also used to compare the change in the PGI-S-AD weekly average with each measure’s weekly average from baseline to Week 4 and Week 16.

To determine within patient meaningful change thresholds (MCTs), patients were classified into response groups based on their level of change in the PGI-S-AD between baseline and Weeks 4 and 16. These groups included “very marked improvement” (≤ −2.5 weekly average score change), “marked improvement” (> −2.5 and ≤ −1.5), “minimal improvement” (> −1.5 and ≤ −0.5), “no change” (> −0.5 and < 0.5), “minimal worsening” (≥ 0.5 and < 1.5), and “marked worsening” (≥ 1.5). MCTs on the Itch NRS, Skin Pain NRS, and ADSS items were based on change from baseline to Week 16 (primary analysis) and baseline to Week 4 (sensitivity analysis) within PGI-S-AD severity groups. A range of MCT estimates (minimal, moderate, and large) were computed for changes in each measure based on observed changes in the minimal, marked, and very marked PGI-S-AD improvement groups. A final MCT estimate for each measure was taken as the MCT equivalent to a moderate degree of change.
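
The response-group cut points above translate into a simple classifier, sketched below together with a per-group summary of the change in a target measure. The function names are hypothetical, and summarizing each group by its mean change is an assumption, since the text does not specify the summary statistic used to derive the MCT estimates.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def pgi_s_ad_response_group(change: float) -> str:
    """Map a change in weekly mean PGI-S-AD to the response group in the text."""
    if change <= -2.5:
        return "very marked improvement"
    if change <= -1.5:
        return "marked improvement"
    if change <= -0.5:
        return "minimal improvement"
    if change < 0.5:
        return "no change"
    if change < 1.5:
        return "minimal worsening"
    return "marked worsening"


def mean_change_by_group(
    patients: Iterable[Tuple[float, float]],
) -> Dict[str, float]:
    """Mean change in the target measure within each PGI-S-AD response group.

    Each patient is a (PGI-S-AD change, target-measure change) tuple.
    """
    groups: Dict[str, List[float]] = defaultdict(list)
    for pgi_change, measure_change in patients:
        groups[pgi_s_ad_response_group(pgi_change)].append(measure_change)
    return {label: mean(values) for label, values in groups.items()}
```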

Distribution-based methods

Meaningful change analyses were also supported by distribution-based methods, which identify the raw score change on a measure that will produce a prespecified effect size and which identify a change which is beyond measurement error [22]. Distribution-based estimates were derived using weekly averages of the Itch NRS, Skin Pain NRS, and ADSS items at baseline. MCT estimates equivalent to 0.2, 0.5, and 0.8 pooled SDs were calculated. The Standard Error of Measurement (SEM) was calculated using the ICC from the test–retest analysis.
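
A minimal sketch of the distribution-based quantities follows. The SEM formula used here, SD × √(1 − ICC), is the conventional one and is assumed rather than quoted from the study; the function names are hypothetical, and the baseline SD is passed in directly rather than pooled across arms.

```python
from math import sqrt
from typing import Dict


def sd_based_thresholds(baseline_sd: float) -> Dict[float, float]:
    """Score changes equivalent to 0.2, 0.5, and 0.8 baseline SDs."""
    return {fraction: fraction * baseline_sd for fraction in (0.2, 0.5, 0.8)}


def standard_error_of_measurement(baseline_sd: float, icc: float) -> float:
    """SEM = SD * sqrt(1 - ICC), using the test-retest ICC as reliability."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    return baseline_sd * sqrt(1.0 - icc)
```

For instance, with a hypothetical baseline SD of 2.0 and an ICC of 0.75, the SEM under this formula is 1.0, so an anchor-based threshold larger than 1.0 in magnitude would lie beyond measurement error.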

Handling of missing data

For Weeks 1, 2, 4, 8, and 12, weekly mean scores for Itch NRS, Skin Pain NRS, and ADSS items were set to missing if there were fewer than 4 non-missing values in the 7-day period before the respective clinic visit. For Week 0 and Week 16 analyses, if there were fewer than 4 non-missing assessments during the week prior to the visit, the 7-day window was extended by 1 day at a time (up to a maximum of 7 additional days) until there were at least 4 non-missing values.
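
The window-extension rule for Week 0 and Week 16 can be sketched as below. The text does not state in which direction the window was extended or which values entered the mean after extension; this example assumes the window grows backward in time and that all non-missing values in the extended window are averaged. The function name and the most-recent-first ordering of the input are also assumptions.

```python
from statistics import mean
from typing import Optional, Sequence


def weekly_mean_with_extension(
    diary_most_recent_first: Sequence[Optional[float]],
    min_valid: int = 4,
    base_days: int = 7,
    max_extra_days: int = 7,
) -> Optional[float]:
    """Weekly mean with the window widened one day at a time if needed.

    The input lists daily values before the visit, most recent day first;
    None marks a missed diary entry.
    """
    for window in range(base_days, base_days + max_extra_days + 1):
        observed = [value for value in diary_most_recent_first[:window]
                    if value is not None]
        if len(observed) >= min_valid:
            return mean(observed)
    return None  # still fewer than min_valid values after 14 days
```

For Weeks 1, 2, 4, 8, and 12, the same function with `max_extra_days=0` reduces to the fixed 7-day rule.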

Results

A total of 624 patients in AD1, 615 patients in AD2, and 440 patients in AD5 were included. Patients’ baseline demographics and scores for the instruments of interest and other assessments are listed in Table 1.

Table 1 Descriptive analysis of baseline demographic characteristics for BREEZE-AD1, BREEZE-AD2, and BREEZE-AD5

Test–retest reliability

The results of the test–retest analysis for each instrument in each study are provided in Table 2. Across all studies, the ICCs ranged from 0.770 to 0.875 for the weekly average Itch NRS and from 0.753 to 0.845 for the weekly average Skin Pain NRS; this indicated acceptable agreement among stable patients using both 1-week and 4-week intervals. For ADSS Items 1, 2, and 3, the ICCs for the weekly average score ranged from 0.754 to 0.843, 0.585 to 0.921, and 0.671 to 0.784, respectively, indicating generally acceptable agreement using both 1- and 4-week assessment intervals. These levels of agreement indicated that the measures generally had good test–retest reliability.

Table 2 Test–retest reliability assessment of itch NRS, skin pain NRS and ADSS for BREEZE-AD1, BREEZE-AD2, and BREEZE-AD5

Construct validity (convergent and divergent validity)

Results supporting convergent and divergent validity of the Itch NRS, Skin Pain NRS, and ADSS items are shown in Table 3. Moderate-to-large correlations between the reference PRO assessments of AD symptoms and the Itch NRS (r range: 0.483–0.762 at baseline and 0.586–0.834 at Week 16) and the Skin Pain NRS (r range: 0.474–0.727 at baseline and 0.549–0.768 at Week 16) supported convergent validity. Similarly, moderate correlations, supporting convergent validity, were generally observed between the PRO assessments and ADSS Item 1 (r range: 0.499–0.651 at baseline and 0.508–0.670 at Week 16), Item 2 (r range: 0.368–0.468 at baseline and 0.424–0.516 at Week 16), and Item 3 (r range: 0.403–0.639 at baseline and 0.466–0.809 at Week 16).

Table 3 Correlations between the Itch NRS, skin pain NRS, and ADSS with other instruments for BREEZE-AD1, BREEZE-AD2, and BREEZE-AD5 at baseline and week 16

Small-to-moderate correlations, supporting divergent validity, were observed between the clinical assessment and the following: Itch NRS (r range: 0.223–0.229 at baseline and 0.398–0.505 at Week 16); Skin Pain NRS (r range: 0.222–0.251 at baseline and 0.338–0.455 at Week 16); ADSS Item 1 (r range: 0.140–0.281 at baseline and 0.363–0.403 at Week 16); ADSS Item 2 (r range: 0.131–0.245 at baseline and 0.254–0.357 at Week 16); and ADSS Item 3 (r range: 0.152–0.298 at baseline and 0.237–0.394 at Week 16).

Known-groups validity

Table 4 reports the findings of known-groups validity analysis of each instrument using PGI-S-AD and POEM subgroups to define AD severity. At baseline, in all 3 studies, compared with patients in the moderate categories, patients in the severe categories of the PGI-S-AD and POEM had significantly more itching (p < 0.0001), skin pain (p < 0.0001), sleep disturbance (p < 0.0001), night-time awakenings (p < 0.01), and difficulty falling back asleep after waking (p < 0.0001) as demonstrated by higher mean scores on Itch NRS, Skin Pain NRS, ADSS Items 1, 2, and 3, respectively. These findings suggest that the Itch NRS, Skin Pain NRS, and ADSS items are able to distinguish between known groups based on disease severity.

Table 4 Known-groups validity of the itch NRS, skin pain NRS, and ADSS using PGI-S-AD and POEM subgroups at baseline for BREEZE-AD1, BREEZE-AD2, and BREEZE-AD5

Responsiveness

The responsiveness of the Itch NRS, Skin Pain NRS, and ADSS items between Weeks 0 and 16 and between Weeks 0 and 4 is shown in Tables 5 and 6, respectively. In all three studies, the magnitude of improvement in each instrument increased with greater improvement in the POEM, supporting the ability of each measure to detect change in the construct of relevance where change has occurred. For the Itch NRS and Skin Pain NRS, in each study at Weeks 4 and 16, the “much improved” group differed statistically significantly from the “improved” group (p < 0.001 for Itch NRS, p < 0.05 for Skin Pain NRS), and the “improved” group differed statistically significantly from the “stable” group (p < 0.0001 for both). In each study, at Week 16, the scores of each ADSS item increased with each improvement category; however, not all comparisons between consecutive improvement categories were statistically significant (Table 5).

Table 5 Within group mean and median change scores for responsiveness of the itch NRS, skin pain NRS, and ADSS to change on the POEM between baseline and week 16 for BREEZE-AD1, BREEZE-AD2, and BREEZE-AD5
Table 6 Within group mean and median change scores for responsiveness of the itch NRS, skin pain NRS, and ADSS to change on the POEM between baseline and week 4 for BREEZE-AD1, BREEZE-AD2, and BREEZE-AD5

Meaningful change estimation

Anchor-based

Anchor-based estimates of the MCTs (minimal, moderate, and large) for each measure are listed in Table 7. For the 0–10 Itch NRS, the final estimate of meaningful change was − 4.0; that is, a reduction of 4 points on the instrument was consistent with a moderate degree of change. Similarly, the final MCT for the 0–10 Skin Pain NRS was taken as − 4.0, also equivalent to a moderate degree of change. The final MCTs for ADSS Items 1, 2, and 3 were − 1.25, − 1.50, and − 1.25, respectively; these are the smallest changes in the weekly averages that are consistent with at least a moderate degree of improvement.

Table 7 Anchor-based estimates of MCTs for the itch NRS, skin pain NRS, and ADSS items in BREEZE-AD1, BREEZE-AD2, and BREEZE-AD5

Distribution-based

Distribution-based MCTs are listed in Table 8. Compared with anchor-based thresholds, SD and SEM estimates were smaller for all measures but the ADSS Item 2; this indicated that the anchor-based estimates are generally above measurement error and thus that improvements in these measures reflect a true improvement in condition severity. The larger distribution-based estimates for ADSS Item 2 reflected the large variability and skewness of this measure at baseline.

Table 8 Distribution-based estimates of MCTs for the Itch NRS, Skin Pain NRS, and ADSS items in BREEZE-AD1, BREEZE-AD2, and BREEZE-AD5

Discussion

This study evaluated the psychometric properties of the Itch NRS, Skin Pain NRS, and ADSS using data from three clinical trials of patients with moderate-to-severe AD. For each measure, assessment of test–retest reliability found high levels of agreement in stable groups of patients across all three studies for both 1-week and 4-week comparisons, indicating reliability of each instrument when no change would be expected. As hypothesized, the construct validity of each measure was also demonstrated, with moderate-to-large correlations with other PROs (POEM, DLQI, and PGI-S-AD) supporting convergent validity and smaller correlations with the more distally-related provider assessment (EASI) supporting divergent validity. These findings suggest that the Itch NRS, Skin Pain NRS and ADSS measure the underlying concept of AD symptomatology and, moreover, encapsulate unique information regarding disease symptoms, which can complement clinician-reported assessments in clinical trials. In addition, comparisons of the Itch NRS, Skin Pain NRS, and each ADSS item between PGI-S-AD and POEM severity categories demonstrated each measure’s ability to distinguish between known groups based on disease severity. Responsiveness was established through the ability of each instrument to discriminate significantly between subgroups of patients based on four change categories of the POEM (“much improved,” “improved,” “stable” and “declined”). Overall, the Itch NRS, Skin Pain NRS, and ADSS were determined to be highly reliable, valid, and responsive, supporting the use of these PRO instruments in daily assessment of AD symptoms in adults with moderate-to-severe AD.

Using anchor- and distribution-based analyses, thresholds for interpreting change of each measure were derived as criteria to assess treatment benefits in patients with AD. Four-point changes in the Itch NRS and Skin Pain NRS were found to demonstrate clinically meaningful responses in itch and skin pain severity, respectively. This 4-point change in the Itch NRS is consistent with minimal clinically important differences reported for similar itch scales [23, 24]. Changes of 1.25 points in ADSS Items 1 and 3 and 1.5 points in ADSS Item 2 were found to optimally demonstrate clinically meaningful improvements in sleep disturbance. These findings further confirm previous psychometric validation data of itch NRS in AD and psoriasis [23, 24].

The potential importance of these measures in clinical practice is indicated by the fact that patients with AD have identified itch, skin pain, and sleep disturbance as bothersome and distressing symptoms of their disease [25], but these are difficult or impossible for clinicians to assess using conventional tools. There is thus an unmet need for measures which can assess these patient-perceived symptoms. For example, EASI or BSA instruments assess important signs of disease, but these do not capture the impacts of itch, skin pain and sleep disturbance from AD as perceived by patients. Existing PROs of AD, such as the POEM, and Scoring Atopic Dermatitis or SCORAD include sleep items, but these items are included as part of a total score and do not assess the full impact of itch on sleep disturbance [11, 26]. These existing instruments are thus limited in their ability to accurately evaluate the impact of treatments on specific patient-reported symptoms in clinical trials. The implementation of the Itch NRS, Skin Pain NRS, and ADSS in AD clinical trials may therefore address this unmet need. Further, given the increasing use of electronic diaries in clinical settings, these low burden, simple, and specific PRO measures of symptoms may be useful in guiding treatment decisions in practice.

Though this study demonstrated strong evidence for the reliability, validity, and responsiveness of the Itch NRS, Skin Pain NRS, and ADSS, the data used in this psychometric validation are from a clinical trial and hence may not be generalizable to clinical practice. In addition, the inclusion and exclusion criteria of the three underlying studies limit this validation to adult patients with moderate-to-severe AD. Only a few patients were available in the mild group for assessing known-groups validity of each instrument using PGI-S-AD and POEM subgroups to define AD severity. The results of this study are also limited to a subset of patients who fluently spoke a language into which the assessment tool had been translated. The FDA recommends daily assessment of symptoms by patients as a shorter recall period allows for more reliable interpretation of symptom data [6]. However, while averaging scores over a 7-day period accounts for day-to-day variation in this analysis, this reduced variability may artificially increase the correlations with other measures [24]. Additionally, a similar study of itch severity measurement suggested a 7-day recall may be more clinically relevant [27]. Nevertheless, future studies are warranted to assess correlations between the Itch NRS, Skin Pain NRS and ADSS, which may further support the use of the three separate instruments in clinical practice.

Conclusions

The results of this study demonstrate that the Itch NRS, Skin Pain NRS, and ADSS are highly reliable, valid, and responsive measures of symptoms that are important to patients with AD. In addition, each PRO is able to measure clinically important symptom changes in these patients. These findings support the use of these PRO instruments in clinical trials of patients with moderate-to-severe AD.

Availability of data and materials

The datasets generated and/or analyzed during the current study are not publicly available due to individual data privacy but may be available from the corresponding author on reasonable request. Use of the three measures can be requested from copyright@lilly.com.

Abbreviations

AD:

Atopic dermatitis

ADSS:

Atopic dermatitis sleep scale

ANCOVA:

Analysis of covariance

ANOVA:

Analysis of variance

BSA:

Body surface area

DLQI:

Dermatology life quality index

EASI:

Eczema area and severity index

FDA:

Food and drug administration

ICC:

Intra-class correlation coefficients

IGA:

Investigator global assessment

ITT:

Intent-to-treat

MCT:

Meaningful change thresholds

NRS:

Numeric rating scale

PGI-S-AD:

Patient global impression of severity–atopic dermatitis

POEM:

Patient oriented eczema measure

PRO:

Patient-reported outcome

QoL:

Quality of life

SEM:

Standard error of measurement

SD:

Standard deviation

SRM:

Standardized response mean

vIGA-AD:

Validated investigator global assessment for atopic dermatitis

References

  1. Lifschitz C. The impact of atopic dermatitis on quality of life. Ann Nutr Metab. 2015;66(Suppl 1):34–40.

  2. Vakharia PP, Chopra R, Sacotte R, Patel KR, Singam V, Patel N, et al. Burden of skin pain in atopic dermatitis. Ann Allergy Asthma Immunol. 2017;119(6):548–52.

  3. Hanifin JM, Thurston M, Omoto M, Cherill R, Tofte SJ, Graeber M. The eczema area and severity index (EASI): assessment of reliability in atopic dermatitis. EASI Evaluator Group. Exp Dermatol. 2001;10(1):11–8.

  4. Futamura M, Leshem YA, Thomas KS, Nankervis H, Williams HC, Simpson EL. A systematic review of Investigator Global Assessment (IGA) in atopic dermatitis (AD) trials: Many options, no standards. J Am Acad Dermatol. 2016;74(2):288–94.

  5. Simpson E, Bissonnette R, Eichenfield LF, Guttman-Yassky E, King B, Silverberg JI, et al. The Validated Investigator Global Assessment for Atopic Dermatitis (vIGA-AD): The development and reliability testing of a novel clinical outcome measurement instrument for the severity of atopic dermatitis. J Am Acad Dermatol. 2020;83(3):839–46.

  6. US-FDA. Guidance for Industry: Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims 2009 [updated December 2009]. https://www.fda.gov/downloads/drugs/guidances/ucm193282.pdf.

  7. Newton L, DeLozier AM, Griffiths PC, Hill JN, Hudgens S, Symonds T, et al. Exploring content and psychometric validity of newly developed assessment tools for itch and skin pain in atopic dermatitis. J Patient Rep Outcomes. 2019;3(1):42.

  8. Simpson EL, Lacour JP, Spelman L, Galimberti R, Eichenfield LF, Bissonnette R, et al. Baricitinib in patients with moderate-to-severe atopic dermatitis and inadequate response to topical corticosteroids: results from two randomized monotherapy phase III trials. Br J Dermatol. 2020;183(2):242–55.

  9. Simpson EFS, Silverberg J, Zirwas E, Han G, Guttman-Yassky E, Marnell D, Bissonnette R, Waibel J, Nunes F, DeLozier A, Angle R, Holzwarth K, Goldblum O, Zhong J, Papp K. Efficacy and safety of baricitinib in moderate-to-severe atopic dermatitis: Results from a randomized, double-blinded, placebo-controlled phase 3 clinical trial (BREEZE-AD5). Revolutionizing Atopic Dermatitis, 5 April 2020. Br J Dermatol. 2020;183(4):e94–121.

  10. 10.

    Finlay AY, Khan GK. Dermatology Life Quality Index (DLQI)—a simple practical measure for routine clinical use. Clin Exp Dermatol. 1994;19(3):210–6.

  11.

    Charman CR, Venn AJ, Ravenscroft JC, Williams HC. Translating Patient-Oriented Eczema Measure (POEM) scores into clinical practice by suggesting severity strata derived using anchor-based methods. Br J Dermatol. 2013;169(6):1326–32.

  12.

    Outcomes and Psychometric Summit. Clinical Outcomes Solutions, C-Path PRO Consortium partner led Meeting Tucson, Arizona 2015.

  13.

    Litwin M. How to measure survey reliability and validity. 7th ed. Thousand Oaks: Sage Publications; 1995.

  14.

    Nunnally J. The assessment of reliability. In: Bernstein I, editor. Psychometric theory. New York: McGraw Hill; 1994. p. 248–92.

  15.

    Vaz S, Falkmer T, Passmore AE, Parsons R, Andreou P. The case for using the repeatability coefficient when calculating test-retest reliability. PLoS ONE. 2013;8(9):e73990.

  16.

    Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. New York: Lawrence Erlbaum Associates; 1988.

  17.

    Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika. 1951;16(3):297–334.

  18.

    Coon CD, Cook KF. Moving from significance to real-world meaning: methods for interpreting change in clinical outcome assessment scores. Qual Life Res. 2018;27(1):33–40.

  19.

    Revicki D, Hays RD, Cella D, Sloan J. Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. J Clin Epidemiol. 2008;61(2):102–9.

  20.

    Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care. 1989;27(3 Suppl):S178–89.

  21.

    Heeren T, D’Agostino R. Robustness of the two independent samples t-test when applied to ordinal scaled data. Stat Med. 1987;6(1):79–90.

  22.

    McLeod LD, Coon CD, Martin SA, Fehnel SE, Hays RD. Interpreting patient-reported outcome results: US FDA guidance and emerging methods. Expert Rev Pharmacoecon Outcomes Res. 2011;11(2):163–9.

  23.

    Kimball AB, Naegeli AN, Edson-Heredia E, Lin CY, Gaich C, Nikai E, et al. Psychometric properties of the Itch Numeric Rating Scale in patients with moderate-to-severe plaque psoriasis. Br J Dermatol. 2016;175(1):157–62.

  24.

    Yosipovitch G, Reaney M, Mastey V, Eckert L, Abbe A, Nelson L, et al. Peak Pruritus Numerical Rating Scale: psychometric validation and responder definition for assessing itch in moderate-to-severe atopic dermatitis. Br J Dermatol. 2019;181(4):761–9.

  25.

    Silverberg JI. Associations between atopic dermatitis and other disorders. F1000Res. 2018;7:303.

  26.

    Kunz B, Oranje AP, Labrèze L, Stalder JF, Ring J, Taïeb A. Clinical Validation and Guidelines for the SCORAD Index: Consensus Report of the European Task Force on Atopic Dermatitis. Dermatology. 1997;195(1):10–9.

  27.

    Silverberg JI, Lai JS, Patel KR, Singam V, Vakharia PP, Chopra R, et al. Measurement properties of the Patient-Reported Outcomes Information System (PROMIS®) Itch Questionnaire: itch severity assessments in adults with atopic dermatitis. Br J Dermatol. 2020;183:891–8.

Acknowledgements

Medical writing and editorial support were provided by Amy Ellinwood, MPH, Ph.D., and Santanu Bhadra, Ph.D., of Eli Lilly and Company.

Funding

This study was funded by Eli Lilly and Company.

Author information

Contributions

JIS, AD, LS, JPT, BK, GY, FPN, PCG, HAD and LFE made substantial contributions to conception and design, or acquisition of data, or analysis and interpretation of data. All authors were involved in drafting the manuscript or revising it critically for important intellectual content and gave final approval of the version to be published.

Corresponding author

Correspondence to Amy DeLozier.

Ethics declarations

Ethics approval and consent to participate

Each study was conducted with informed consent, under institutional review board approval, and in accordance with the Declaration of Helsinki (ClinicalTrials.gov numbers NCT03334396 [AD1], NCT03334422 [AD2], and NCT03435081 [AD5]).

Competing interests

JIS has received honoraria as a consultant and/or advisory board member for Abbvie, Afyx, Arena, Asana, Bluefin, Boehringer-Ingelheim, Celgene, Dermavant, Dermira, Eli Lilly, Galderma, GlaxoSmithKline, Incyte, Kiniksa, Leo, Luna, Menlo, Novartis, Pfizer, RAPT, Regeneron, Sanofi; speaker for Regeneron, Sanofi; institution received grants from Galderma. AD, LS and FN are employees of Eli Lilly and Company and may hold stock and/or stock options in the company. JPT reports personal fees from Pfizer, personal fees from Eli Lilly & Co, personal fees from Abbvie, personal fees from LEO Pharma, grants and personal fees from Regeneron, grants and personal fees from Sanofi-Genzyme, outside the submitted work. GY has been on advisory boards for and received honoraria from Sanofi and Regeneron Pharmaceuticals, Inc., TREVI, Pfizer, Novartis, Eli Lilly, Kiniksa, LEO, Galderma, and GSK, and his research has been funded by Pfizer, Galderma, Novartis, LEO, Kiniksa, Sanofi Regeneron and Sun Pharma. LFE has received honoraria for his work as a consultant for Abbvie, Dermavant, Dermira, Leo, Eli Lilly, Novartis, Regeneron, Sanofi-Genzyme and Ortho Dermatology, and has been an investigator/received grants for Abbvie, Galderma Laboratories, Ortho Dermatology and Pfizer. BK reports personal fees from AbbVie, personal fees from Almirall, personal fees from Boehringer Ingelheim, grants and personal fees from Cara Therapeutics, personal fees from AstraZeneca, personal fees from Menlo Therapeutics, personal fees from Regeneron, personal fees from Sanofi Genzyme, grants and personal fees from LEO Pharma, personal fees from Trevi Therapeutics, personal fees from Daewoong, personal fees from OM Pharma, personal fees from Incyte, personal fees from Amagma, personal fees from Maruho, outside the submitted work. In addition, BK has a patent pending on JAK inhibitors for chronic itch. PCG and HAD report no conflict of interest. Fabio P. Nunes was an employee of Eli Lilly and Company, Indianapolis, Indiana, USA at the time of conducting this study. Currently he is an employee of Janssen Pharmaceutical Companies of Johnson & Johnson, Raritan, New Jersey, USA.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Silverberg, J.I., DeLozier, A., Sun, L. et al. Psychometric properties of the itch numeric rating scale, skin pain numeric rating scale, and atopic dermatitis sleep scale in adult patients with moderate-to-severe atopic dermatitis. Health Qual Life Outcomes 19, 247 (2021). https://doi.org/10.1186/s12955-021-01877-8

Keywords

  • Atopic dermatitis
  • Atopic dermatitis sleep scale
  • Convergent-divergent validity
  • Itch NRS
  • Numeric rating scale
  • Patient-reported outcome
  • Psychometric
  • Reliability
  • Responsiveness
  • Skin pain NRS
  • Validity