Psychometric properties of computerized adaptive testing for chronic obstructive pulmonary disease patient-reported outcome measurement

Abstract

Background

Computerized adaptive testing (CAT) is an effective way to reduce testing time, item redundancy, and response burden, and has been used to measure outcomes in many diseases. This study aimed to develop and validate a comprehensive disease-specific CAT for chronic obstructive pulmonary disease (COPD) patient-reported outcome measurement.

Methods

The discrimination and difficulty of the items from the modified patient-reported outcome scale for COPD (mCOPD-PRO) were analyzed using item response theory. The initial item, item selection method, ability estimation method, and stopping criteria were then configured on the Concerto platform to form the CAT. Finally, the reliability and validity of the CAT were evaluated.

Results

The item discrimination ranged from 1.05 to 2.71, and the item difficulty ranged from − 3.08 to 3.65. The measurement reliability of the CAT ranged from 0.910 to 0.922 using the random method and from 0.910 to 0.924 using the maximum Fisher information (MFI) method. The content validity was good. The correlation coefficients between the theta of the CAT and the COPD assessment test and modified Medical Research Council dyspnea scale scores were 0.628 and 0.540 (both P < 0.001) using the random method, and 0.347 and 0.328 (P = 0.007; P = 0.010) using the MFI method. On average, about 11 items (a 59.3% reduction) were administered using the random method and about seven items (a 74.1% reduction) using the MFI method. The correlation coefficient between the theta of the CAT and the mCOPD-PRO total score was 0.919 (P < 0.001) using the random method and 0.760 (P < 0.001) using the MFI method.

Conclusions

The comprehensive disease-specific CAT for COPD patient-reported outcome measurement was successfully developed and has good psychometric properties, providing an efficient, accurate, and user-friendly measure of patient-reported outcomes in COPD.

Background

Chronic obstructive pulmonary disease (COPD), characterized by chronic respiratory symptoms and persistent (often progressive) airflow limitation, is a leading cause of morbidity and mortality worldwide, inducing an economic and social burden that is both substantial and increasing [1]. The prevalence of COPD among people aged 40 years or older is 12.64% worldwide and 13.7% in China [2, 3]. COPD was the third leading cause of death in 2019, responsible for approximately 6% of all deaths worldwide [4]. Health status in patients with COPD declines over time [5]. A patient-reported outcome (PRO) is any report of a patient’s health status derived directly from the patient [6]. It has been increasingly recognized that PRO instruments play an important role in assessing the health status and treatment outcomes of COPD [7, 8]. Previously, our team developed and validated a 27-item comprehensive disease-specific PRO measure, the modified Patient-reported Outcome Scale for COPD (mCOPD-PRO), a 5-point Likert scale covering physiological, psychological, and environmental domains, with lower scores indicating better health status [9]. During validation of the instrument, the measurement properties assessed included internal consistency reliability, content validity, construct validity, criterion validity, known-groups validity, and feasibility [9]. The mCOPD-PRO showed good psychometric properties, with a Cronbach’s alpha of 0.954 [9]. As COPD is a common chronic disease worldwide, patients usually require frequent outcome measurement, so the response burden of measurement tools cannot be ignored. Although the median completion time of the mCOPD-PRO is only 5 min, given the number of items, the response burden still needs to be considered [9].

Computerized adaptive testing (CAT), based on item response theory (IRT), is a form of testing that uses a computer to automatically select appropriate items for the examinee [10]. Generally speaking, CAT selects from an item bank an item that is appropriate to the examinee’s theta, an index of the latent trait in IRT (here, health status), and then updates the examinee’s theta according to the response to that item [10]. This process is repeated until the examinee’s theta is estimated accurately [10]. CAT is an effective way to reduce testing time, item redundancy, and response burden, and has been increasingly used for psychological and health measurement in an efficient, accurate, and user-friendly manner [11,12,13]. Recently, CAT has also been used for outcome measurement in COPD [14,15,16,17,18,19,20,21,22]. Norweg A et al. developed a modified and expanded item bank of the Dyspnea Management Questionnaire, measuring dyspnea, one of the most prominent symptoms of COPD, to construct a CAT using an internally developed CAT program, and subsequently conducted real-data CAT simulations to estimate the CAT’s accuracy, precision, and validity [14]. Choi SW et al. and Yount SE et al. also focused on dyspnea in COPD and developed a measure of dyspnea and related functional limitations, with post-hoc CAT simulations conducted to determine the number of items required to achieve high precision [15, 16]. Additionally, Yount SE et al. administered the Patient-Reported Outcomes Measurement Information System (PROMIS®) measures using CAT, followed by administration of any remaining short-form items not yet administered by the CAT, to examine their responsiveness to changes associated with COPD exacerbation recovery [17]. Paap MCS et al. performed simulations using empirical data from patients with COPD to assess the incremental value of multidimensional CAT compared with unidimensional CAT, and investigated item usage for a multidimensional CAT drawing items from three PROMIS domains (fatigue, physical function, and ability to participate in social roles and activities) and a COPD-specific item bank [18, 19]. O’Hoski S et al. estimated the test-retest reliability, construct validity, and responsiveness of the CAT version of the Late Life Disability Instrument in patients with COPD, a participation measure not specifically developed for COPD [20, 21]. Ho EH et al. used plausible values to account for measurement error and analyze the probability of true within-individual change in a sample of patients with COPD completing two PROMIS domains (physical function and fatigue), and indicated that CAT has a better ability to detect change than short forms [22]. However, although two of these measures were designed for COPD, they are not comprehensive. The others are multidimensional, but they were not designed for COPD or include only some disease-specific items. A comprehensive disease-specific CAT for PRO measurement of COPD is therefore urgently needed. Accordingly, the current study aimed to develop and validate a comprehensive disease-specific CAT for PRO measurement of COPD based on the paper-and-pencil version of the mCOPD-PRO.

Methods

Developing the CAT

The CAT was developed based on Concerto, an open-source online adaptive testing platform developed and maintained by the University of Cambridge Psychometrics Centre (available from: https://concertoplatform.com/about) [23]. The essential elements of the CAT were item parameter calibration, the initial item, the item selection method, the ability estimation method, and the stopping criteria [24].

Item parameters calibration

The item bank of the CAT was the same as that of the paper-and-pencil version of the mCOPD-PRO. Item parameters, including discrimination (a) and difficulty (b), were estimated using the graded response model of IRT. Before modelling, the core assumptions (unidimensionality, local independence, and monotonicity) were evaluated. The unidimensionality assumption was assessed using both exploratory and confirmatory factor analysis. For the former, a ratio of the eigenvalue of the first factor to that of the second factor in excess of four was taken as supporting the unidimensionality assumption [25]. For the latter, the recommended criteria for fit indices were as follows: (1) the comparative fit index and non-normed fit index were close to 0.90; (2) the incremental fit index was close to 0.95; and (3) the standardized root-mean-square residual and root-mean-square error of approximation (RMSEA) were close to 0.08 [26, 27]. A Chen and Thissen’s index > 0.30 implied possible local dependence [28]. Monotonicity was considered acceptable if the scalability coefficients (Hi) of the items were > 0.30 [29]. The data came from 366 patients with COPD in the validation phase of the paper-and-pencil version of the mCOPD-PRO [9]. The mean age was 66 years; 279 patients (76.2%) were male and 87 (23.8%) were female.
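The eigenvalue-ratio check for unidimensionality can be sketched as follows. This is an illustrative Python version (the original analysis used SPSS), demonstrated on synthetic data driven by a single latent factor; the function name and data are hypothetical.

```python
import numpy as np

def eigenvalue_ratio(responses):
    """Ratio of the first to the second eigenvalue of the inter-item
    correlation matrix; a ratio > 4 is taken to support unidimensionality."""
    corr = np.corrcoef(np.asarray(responses, dtype=float), rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]  # descending order
    return eigvals[0] / eigvals[1]

# Synthetic check: six items all loading on one common factor
rng = np.random.default_rng(0)
factor = rng.normal(size=500)
data = np.column_stack([factor + 0.5 * rng.normal(size=500) for _ in range(6)])
print(round(eigenvalue_ratio(data), 2))  # comfortably above the threshold of 4
```

With genuinely multidimensional data, the second eigenvalue would be of comparable size to the first and the ratio would fall below the threshold.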

Initial item and item selection method

In CAT, item selection depends on the examinee’s responses to previously administered items. The random and maximum Fisher information (MFI) methods were adopted to select items. The former is simple and effective, selecting items at random, while the latter selects the item providing the most Fisher information at the current theta estimate. Accordingly, the initial item under the random method is chosen at random, while under the MFI method it is the item with the maximum information.
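MFI selection under the graded response model can be sketched as below. This is a minimal illustration, not the Concerto implementation; the item bank is assumed to be a list of (a, b) tuples, where b holds the ordered threshold parameters, and the function names are hypothetical.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Category probabilities of a graded-response-model item at theta.
    a: discrimination; b: ordered difficulty (threshold) parameters."""
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - np.asarray(b, dtype=float))))
    cum = np.concatenate(([1.0], p_star, [0.0]))  # P*_0 = 1, P*_K = 0
    return cum[:-1] - cum[1:]                     # P_k = P*_k - P*_(k+1)

def item_information(theta, a, b):
    """Fisher information of a GRM item at theta: sum_k (P_k')^2 / P_k."""
    b = np.asarray(b, dtype=float)
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    d_star = a * p_star * (1.0 - p_star)            # d P*_k / d theta
    d_cum = np.concatenate(([0.0], d_star, [0.0]))  # boundary curves are constant
    dP = d_cum[:-1] - d_cum[1:]                     # derivatives of category probs
    P = grm_category_probs(theta, a, b)
    return float(np.sum(dP ** 2 / P))

def next_item_mfi(theta, bank, administered):
    """MFI rule: the unadministered item with maximum information at theta."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))

# Two items sharing thresholds; the more discriminating one carries more information
bank = [(2.71, [-1.0, 0.0, 1.0, 2.0]), (1.05, [-1.0, 0.0, 1.0, 2.0])]
print(next_item_mfi(0.0, bank, administered=set()))  # 0
```

Under the random method, the next item would instead be drawn uniformly from `candidates`; the information machinery is needed only for MFI.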

Ability estimation method

As a key component of CAT, the ability estimation method affects not only the accuracy of ability estimation but also the efficiency of item selection and the determination of the stopping rule. In this study, maximum likelihood estimation, one of the most widely used ability estimation methods, was used.
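The maximum likelihood step can be sketched as a grid search over theta (production CAT engines typically use Newton-type optimization instead; the function names here are illustrative). Note that the MLE is undefined for all-lowest or all-highest response patterns, so real systems fall back to a provisional estimate in that case.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Category probabilities of a graded-response-model item at theta."""
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - np.asarray(b, dtype=float))))
    cum = np.concatenate(([1.0], p_star, [0.0]))
    return cum[:-1] - cum[1:]

def mle_theta(responses, bank, grid=None):
    """Grid-search maximum likelihood estimate of theta.
    responses: list of (item_index, observed_category) pairs.
    bank: list of (a, b) item-parameter tuples."""
    if grid is None:
        grid = np.linspace(-4.0, 4.0, 1601)  # step of 0.005
    log_lik = np.zeros_like(grid)
    for item, cat in responses:
        a, b = bank[item]
        log_lik += np.log([grm_category_probs(t, a, b)[cat] for t in grid])
    return float(grid[np.argmax(log_lik)])

# One 3-category item with thresholds at -1 and 0: answering the middle
# category is most likely for theta halfway between them, near -0.5
bank = [(2.0, [-1.0, 0.0])]
print(mle_theta([(0, 1)], bank))
```

Each new response adds its log-likelihood term, and the argmax over the grid is the updated theta, mirroring the estimate-then-select loop described above.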

Stopping criteria

Generally speaking, the stopping criteria of CAT involve a fixed test length and a standard error of measurement (SEM) threshold. Measurement accuracy may vary among subjects under a fixed length, whereas a fixed SEM best reflects the core idea of CAT. In this study, an SEM of ≤ 0.30 was set as the stopping criterion: the test terminated when the pre-specified SEM was reached or the item bank was exhausted [10].
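Under maximum likelihood estimation, the SEM is approximately the reciprocal square root of the accumulated test information, so the stopping rule can be sketched as follows (an illustrative fragment with hypothetical function names, not the Concerto code):

```python
import math

def sem_of_mle(total_test_information):
    """Asymptotic SEM of the MLE: 1 / sqrt(sum of administered item information)."""
    return 1.0 / math.sqrt(total_test_information)

def should_stop(total_test_information, n_remaining, sem_target=0.30):
    """Terminate when the SEM target is met or the item bank is exhausted."""
    return n_remaining == 0 or sem_of_mle(total_test_information) <= sem_target

# Reaching SEM <= 0.30 requires test information >= 1 / 0.30^2 ~ 11.1
print(should_stop(12.0, n_remaining=16))  # True (SEM ~ 0.289)
print(should_stop(9.0, n_remaining=16))   # False (SEM ~ 0.333)
```

This shows why highly discriminating items shorten the test: each contributes more information per response, so the information target is reached with fewer items.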

Simulation test

The CAT simulation was performed to determine the appropriate sample size for the validation of the CAT, based on the real response data from the validation phase of the paper-and-pencil version of the mCOPD-PRO [9]. R commands for the simulated data of different sample sizes (60, 100, 300, 500, 1000, 3000, and 5000 cases) were generated using Firestar version 1.5.1 and then run in R version 3.5.1. The CAT simulation settings in Firestar version 1.5.1 were as follows: (1) IRT model: the graded response model; (2) item selection method: the random and MFI methods, respectively; (3) stopping criterion: a maximum SEM of 0.30; and (4) simulated data: a mean of zero and a standard deviation of one.

Validating the CAT

There is no consensus regarding the validation of CAT for PRO instruments. In our study, the reliability was estimated using an IRT-based method, while the content and criterion validity were evaluated using classical test theory methods. It was assumed that the response to each item in the CAT was consistent with that of the paper-and-pencil version of the mCOPD-PRO in the validation phase [9]. However, the number of items selected by the CAT could be smaller than that of the paper-and-pencil version of the mCOPD-PRO.

Reliability

The measurement reliability (r) was calculated from the SEM of the CAT. Assuming that the subjects’ abilities have a mean of zero and a standard deviation of one, the reliability decreases as the SEM increases, according to the formula SEM = (1 − r)^(1/2); that is, the measurement reliability r = 1 − SEM² [10, 24].
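The relationship can be checked numerically; this short sketch (not part of the original analysis pipeline) also shows why the SEM ≤ 0.30 stopping rule guarantees a reliability of at least 1 − 0.30² = 0.91, matching the reported reliability floor of 0.910.

```python
def reliability_from_sem(sem):
    """r = 1 - SEM^2, assuming theta is scaled to mean 0 and SD 1."""
    return 1.0 - sem ** 2

def sem_from_reliability(r):
    """Inverse relation: SEM = (1 - r) ** 0.5."""
    return (1.0 - r) ** 0.5

# The stopping rule SEM <= 0.30 implies r >= 0.91
print(round(reliability_from_sem(0.30), 3))  # 0.91
```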

Content validity

The content validity of the CAT was assessed based on the content validity of the paper-and-pencil version of mCOPD-PRO in the phase of validation [9].

Criterion validity

The COPD assessment test and the modified Medical Research Council dyspnea scale (mMRC), both recommended by the Global Strategy for the Diagnosis, Management, and Prevention of Chronic Obstructive Pulmonary Disease, were used as gold standards [1, 30, 31]. The criterion validity was evaluated using the correlation coefficients between the theta estimated by the CAT and the COPD assessment test and mMRC scores. A correlation coefficient ≥ 0.40 was considered acceptable [9].

Statistical analysis

Continuous data were expressed as the mean ± standard deviation or median (interquartile range), while categorical data were presented as frequencies (percentages). The confirmatory factor analysis was conducted using LISREL version 8.70 (Scientific Software International, Inc., Chicago, IL, USA). The local independence and monotonicity were tested using “TestAnaAPP” and “mokken” packages in R (The R Foundation, Vienna, Austria), respectively. The IRT analysis was performed using MULTILOG version 7.03 (Scientific Software International, Inc., Chicago, IL, USA), and the CAT simulation was conducted using Firestar version 1.5.1 (Northwestern University Feinberg School of Medicine, Chicago, IL, USA) and R version 3.5.1 (The R Foundation, Vienna, Austria). Additionally, descriptive statistics, exploratory factor analysis, calculation of measurement reliability (r), correlation analysis, and independent sample t-test were conducted using SPSS version 22.0 (IBM Corporation, Armonk, NY, USA).

Results

Calibration of item parameters

The exploratory factor analysis showed that the ratio of the eigenvalue of the first factor (12.551) to that of the second factor (2.015) was 6.229, surpassing the threshold of 4. Moreover, the confirmatory factor analysis showed that the comparative fit index, incremental fit index, non-normed fit index, standardized root-mean-square residual, and RMSEA were 0.91, 0.91, 0.90, 0.11, and 0.16, respectively. The fit indices were close to the thresholds, except for the RMSEA, which was less satisfactory. Thirty-two of 351 item pairs (9.1%) showed a Chen and Thissen’s index above the threshold of 0.30, with a maximum of 0.526, and 25 of these 32 item pairs (78.1%) ranged from 0.30 to 0.40. Most of the locally dependent items were related to respiratory symptoms of COPD, such as cough, sputum, chest tightness, panting, and shortness of breath. Given that multiple respiratory symptoms often coexist and that one symptom may involve more than one item (for example, the items “Did you cough?”, “Was your cough aggravated by daily activities?”, and “Was your cough aggravated by mood swings?”), the local independence results were considered acceptable. The scalability coefficients (Hi) of all items ranged from 0.36 to 0.54, exceeding the threshold of 0.30. Overall, all three assumptions for the IRT analysis can be considered met. The item discrimination (a) of the mCOPD-PRO ranged from 1.05 to 2.71, and the item difficulty (b) ranged from − 3.08 to 3.65 (Table 1). Moreover, the difficulty (b) of 24 items (88.9%) was between − 3.0 and 3.0. The first and fifth category characteristic curves showed monotonic changes, and the second, third, and fourth category characteristic curves were approximately bell-shaped, except for individual items (Additional file 1). The maximum value of the total information was 34.224 (Additional file 2).

Table 1 The item discrimination (a) and difficulty (b) of the modified patient-reported outcome scale for chronic obstructive pulmonary disease (mCOPD-PRO)

Establishment of CAT model

The CAT model established on the Concerto platform included item bank, algorithm, test system, score report, and management system modules.

Formation of the CAT

The CAT was then assembled. At least five steps (Login, Run, Test, Feedback, and End) were needed to complete the whole test. Taking a test using the random item selection method as an example, a total of 11 items were administered. The test procedure is presented in Additional file 3.

Simulation test

The results of the CAT simulations using the random and MFI methods are described in Table 2, Figs. 1 and 2, and Additional files 4–9. Despite the different sample sizes of the simulated data (60, 100, 300, 500, 1000, 3000, and 5000 cases), the items administered in the different simulation tests using the random method were relatively dispersed, and the average number administered was ten in all cases. The correlation coefficient between simulated and true theta estimates ranged from 0.970 to 0.976, and the average SEM was 0.290 or 0.291. By comparison, the items administered in the different simulation tests using the MFI method were relatively concentrated, and the average number administered was seven in all cases. The correlation coefficient between simulated and true theta estimates ranged from 0.968 to 0.979, and the average SEM ranged from 0.289 to 0.292.

Table 2 The average SEM and number of items administered, and correlation coefficient between simulated and true theta estimates in different simulation tests
Fig. 1
figure 1

Number of items administered using random method in different simulation tests

Fig. 2
figure 2

Number of items administered using maximum Fisher information method in different simulation tests

Validation of the CAT

According to the above CAT simulation results, the sample size for the validation of the CAT was set at 60 cases. Therefore, the real response data from 60 patients with COPD in the validation phase of the paper-and-pencil version of the mCOPD-PRO were used to evaluate the reliability and validity of the CAT. The mean age was 65 years; 41 patients (68.3%) were male and 19 (31.7%) were female.

Reliability

The SEM ranged from 0.266 to 0.300 using the random method and from 0.276 to 0.300 using the MFI method (Table 3). On this basis, the calculated measurement reliability (r) ranged from 0.910 to 0.929 using the random method and from 0.910 to 0.924 using the MFI method (Table 3). The correlation coefficients of the SEM and the measurement reliability (r) between the two methods were both 0.267 (P = 0.040; P = 0.040), and independent-sample t-tests showed no significant differences in SEM or measurement reliability (r) between the two methods (t = −0.533, P = 0.594; t = −0.472, P = 0.637).

Table 3 The SEM and measurement reliability (r) using random and MFI methods

Content validity

As was reported in our previous publication, the paper-and-pencil version of mCOPD-PRO had good content validity [9]. The CAT was developed based on the 27 items of the paper-and-pencil version of mCOPD-PRO, and therefore, was considered to have good content validity.

Criterion validity

The theta ranged from − 2.331 to 1.226 using the random method and from − 2.336 to 1.102 using the MFI method (Table 3). The correlation coefficient of theta between the two methods was 0.753 (P < 0.001), and an independent-sample t-test showed no significant difference in theta between the two methods (t = −0.514, P = 0.609). The correlation coefficients between theta and the COPD assessment test and mMRC scores were 0.628 and 0.540 (both P < 0.001) using the random method, and 0.347 and 0.328 (P = 0.007; P = 0.010) using the MFI method.

Comparisons between the CAT and the paper-and-pencil version of mCOPD-PRO

The initial item of the paper-and-pencil version of the mCOPD-PRO was always the same (“Did you cough?”), whereas the initial item of the CAT was random using the random method and always the same (“Was your chest tightness aggravated by daily activities?”) using the MFI method. All 27 items were administered for the paper-and-pencil version, whereas about 11 items (a 59.3% reduction) were administered on average for the CAT using the random method, with a minimum of eight items (four cases) and a maximum of 26 items (one case), and about seven items (a 74.1% reduction) on average using the MFI method, with a minimum of six items (36 cases) and a maximum of 20 items (one case) (Additional file 10). The correlation coefficients between the theta of the CAT and the total scores of the paper-and-pencil version of the mCOPD-PRO were 0.919 using the random method and 0.760 using the MFI method (both P < 0.001). The correlation coefficients between theta and the physiological, psychological, and environmental domain scores using the random and MFI methods are presented in Table 4.

Table 4 The correlation coefficient between theta of the CAT and domain scores of the paper-and-pencil version of mCOPD-PRO

Discussion

To our knowledge, this is the first study to develop and validate a comprehensive disease-specific CAT for PRO measurement of COPD using the Concerto platform. Our study showed that the CAT is efficient, accurate, and user-friendly, with good reliability and validity. Compared with the paper-and-pencil version of the mCOPD-PRO, the average number of items administered by the CAT was reduced by more than 50%, indicating a substantial reduction in response burden, while the measurement accuracy remained high.

As an open-source online adaptive testing platform, Concerto has been used for neuropsychological testing and PRO measurement, and has been considered capable of harnessing the power of CAT and machine learning for developing and administering advanced PRO measures [23, 32,33,34]. Therefore, we chose this platform to build the CAT. A high-quality item bank is the basis for ensuring the scientific rigor of CAT and maximizing its advantages. In the current study, the item discrimination (a) and difficulty (b) were calibrated using IRT analysis. Our results suggested that the item parameters were generally satisfactory, which supports the robustness of the CAT. The sample size for validating a CAT is important, but it is difficult to determine the appropriate sample size through repeated clinical tests. Therefore, CAT simulations with different sample sizes were performed based on the real response data from the validation phase of the paper-and-pencil version of the mCOPD-PRO. The simulation results were stable and reliable, and a sample size of 60 cases was sufficient. Therefore, the real response data from 60 participants were used for analysis in this study.

At present, research on the psychometric properties of CAT for PRO instruments is scarce. In our study, the reliability and validity were evaluated with reference to relevant research in psychological and educational measurement. The measurement reliability (r) was calculated from the SEM obtained from the CAT. The measurement reliability (r) using both the random and MFI methods was greater than 0.9, indicating good reliability of the CAT. For criterion validity, the selection of standard instruments is crucial. Both the COPD assessment test and the mMRC, recommended by the Global Strategy for the Diagnosis, Management, and Prevention of Chronic Obstructive Pulmonary Disease, were selected as our gold standards [1, 30, 31]. The test results of the CAT showed moderate correlations with the two standard instruments using the random method, indicating good criterion validity. By comparison, the test results showed weak correlations with the two standard instruments using the MFI method. This may be due to the concentration of administered items in the psychological domain, and the result needs to be verified further.

Although the item bank of the CAT and the paper-and-pencil version of the mCOPD-PRO was the same, the order and number of items administered differed. The initial item of the paper-and-pencil version was the first item (“Did you cough?”), whereas that of the CAT was random (random method) or always “Was your chest tightness aggravated by daily activities?” (MFI method). A number of studies have assessed the effect of CAT on the length and accuracy of PRO measurement, robustly demonstrating that CAT can reduce the length of measurement by more than 50% while keeping excellent agreement between fixed-length measurement and CAT [34]. A CAT simulation study showed that when the stopping rule was matched to the reliability of the published World Health Organization Quality of Life (WHOQOL) instruments, the item bank produced measurements as reliable as the paper-and-pencil versions of the WHOQOL-BREF and WHOQOL-100 with 43% and 75% fewer items, respectively [35]. Our study showed that the average number of items administered by the CAT was about 11 using the random method and seven using the MFI method, which is almost consistent with our simulation tests. More importantly, compared with the paper-and-pencil version of the mCOPD-PRO, the average number of items administered by the CAT was reduced by 59.3% and 74.1%, respectively, findings similar to the results of the CAT simulation study mentioned above [35]. Accordingly, the response burden of the mCOPD-PRO was significantly reduced; that is, the CAT improved testing efficiency without reducing measurement accuracy. In addition, the test results of the CAT showed strong correlations with the paper-and-pencil version of the mCOPD-PRO, which supports the reliability of the CAT and is in line with the published literature [34]. Moreover, the strong correlation of the test results between the two methods indicates the robustness of our CAT model.

This study has limitations. First, the CAT was developed and validated on the Concerto platform, which may introduce platform-specific limitations. Second, because of the different scoring rules and dimensions of the CAT and the paper-and-pencil version of the mCOPD-PRO, this study did not directly compare the test results of the two versions. Third, given the lack of consensus on evaluating the psychometric properties of CAT for PRO instruments, further studies are needed to verify our findings. In addition, tests based on different item selection methods, ability estimation methods, or stopping criteria remain to be explored. Furthermore, the accuracy of parameter estimation would benefit from surveys with larger sample sizes in the future.

Conclusions

In conclusion, the comprehensive disease-specific CAT for PRO measurement of COPD was successfully developed and has good reliability, content validity, and criterion validity. Compared with the paper-and-pencil version of the mCOPD-PRO, the CAT provides an efficient, accurate, and user-friendly measurement of PROs in COPD. Further studies are warranted.

Data availability

The datasets analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

COPD:

Chronic Obstructive Pulmonary Disease

PRO:

Patient-reported Outcome

mCOPD-PRO:

Modified Patient-reported Outcome Scale for Chronic Obstructive Pulmonary Disease

CAT:

Computerized Adaptive Testing

IRT:

Item Response Theory

RMSEA:

Root-mean-square Error of Approximation

MFI:

Maximum Fisher Information

SEM:

Standard Error of Measurement

mMRC:

Modified Medical Research Council Dyspnea Scale

WHOQOL:

World Health Organization Quality of Life

References

  1. Agustí A, Celli BR, Criner GJ, Halpin D, Anzueto A, Barnes P, et al. Global Initiative for Chronic Obstructive Lung Disease 2023 Report: GOLD Executive Summary. Eur Respir J. 2023;61:2300239.

  2. Al Wachami N, Guennouni M, Iderdar Y, Boumendil K, Arraji M, Mourajid Y, et al. Estimating the global prevalence of chronic obstructive pulmonary disease (COPD): a systematic review and meta-analysis. BMC Public Health. 2024;24:297.

  3. Wang C, Xu J, Yang L, Xu Y, Zhang X, Bai C, et al. Prevalence and risk factors of chronic obstructive pulmonary disease in China (the China Pulmonary Health [CPH] study): a national cross-sectional study. Lancet. 2018;391:1706–17.

  4. World Health Organization. The top 10 causes of death. 2020. https://www.who.int/news-room/fact-sheets/detail/the-top-10-causes-of-death. Accessed 1 Feb 2024.

  5. Jones PW. Health status and the spiral of decline. COPD. 2009;6:59–63.

  6. U.S. Food and Drug Administration. Guidance for Industry Patient-reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims. 2009. www.fda.gov/regulatory-information/search-fda-guidance-documents/patient-reported-outcome-measures-use-medical-product-development-support-labeling-claims. Accessed 1 Feb 2024.

  7. Cazzola M, MacNee W, Martinez FJ, Rabe KF, Franciosi LG, Barnes PJ, et al. Outcomes for COPD pharmacological trials: from lung function to biomarkers. Eur Respir J. 2008;31:416–69.

  8. Weldam SW, Schuurmans MJ, Liu R, Lammers JW. Evaluation of quality of Life instruments for use in COPD care and research: a systematic review. Int J Nurs Stud. 2013;50:688–707.

  9. Li J, Wang J, Xie Y, Feng Z. Development and validation of the modified patient-reported Outcome Scale for Chronic Obstructive Pulmonary Disease (mCOPD-PRO). Int J Chron Obstruct Pulmon Dis. 2020;15:661–9.

  10. Tan Q, Cai Y, Li Q, Zhang Y, Tu D. Development and validation of an Item Bank for Depression Screening in the Chinese Population using computer adaptive testing: a Simulation Study. Front Psychol. 2018;9:1225.

  11. Banerjee S, Plummer O, Abboud JA, Deirmengian GK, Levicoff EA, Courtney PM. Accuracy and Validity of Computer Adaptive Testing for Outcome Assessment in patients undergoing total hip arthroplasty. J Arthroplasty. 2020;35:756–61.

  12. Gibbons RD, Weiss DJ, Frank E, Kupfer D. Computerized adaptive diagnosis and testing of Mental Health disorders. Annu Rev Clin Psychol. 2016;12:83–104.

  13. Rebollo P, Castejón I, Cuervo J, Villa G, García-Cueto E, Díaz-Cuervo H, et al. Validation of a computer-adaptive test to evaluate generic health-related quality of life. Health Qual Life Outcomes. 2010;8:147.

  14. Norweg A, Ni P, Garshick E, O’Connor G, Wilke K, Jette AM. A multidimensional computer adaptive test approach to dyspnea assessment. Arch Phys Med Rehabil. 2011;92:1561–9.

  15. Choi SW, Victorson DE, Yount S, Anton S, Cella D. Development of a conceptual framework and calibrated item banks to measure patient-reported dyspnea severity and related functional limitations. Value Health. 2011;14:291–306.

  16. Yount SE, Choi SW, Victorson D, Ruo B, Cella D, Anton S, et al. Brief, valid measures of dyspnea and related functional limitations in chronic obstructive pulmonary disease (COPD). Value Health. 2011;14:307–15.

  17. Yount SE, Atwood C, Donohue J, Hays RD, Irwin D, Leidy NK, et al. Responsiveness of PROMIS® to change in chronic obstructive pulmonary disease. J Patient Rep Outcomes. 2019;3(1):65.

  18. Paap MCS, Kroeze KA, Glas CAW, Terwee CB, van der Palen J, Veldkamp BP. Measuring patient-reported outcomes adaptively: multidimensionality matters! Appl Psychol Meas. 2018;42:327–42.

  19. Paap MCS, Kroeze KA, Terwee CB, van der Palen J, Veldkamp BP. Item usage in a multidimensional computerized adaptive test (MCAT) measuring health-related quality of life. Qual Life Res. 2017;26:2909–18.

  20. O’Hoski S, Richardson J, Kuspinar A, Wald J, Goldstein R, Beauchamp MK. A brief measure of life participation for people with COPD: validation of the computer adaptive test version of the late life disability instrument. COPD. 2021;18:385–92.

  21. O’Hoski S, Kuspinar A, Richardson J, Wald J, Goldstein R, Beauchamp MK. Responsiveness of the late life disability instrument to pulmonary rehabilitation in people with COPD. Respir Med. 2023;207:107113.

  22. Ho EH, Verkuilen J, Fischer F. Measuring individual true change with PROMIS using IRT-based plausible values. Qual Life Res. 2023;32:1369–79.

  23. Scalise K, Allen DD. Use of open-source software for adaptive measurement: Concerto as an R-based computer adaptive development and delivery platform. Br J Math Stat Psychol. 2015;68:478–96.

  24. Tu DB, Zheng CJ, Dai BY, Wang WY. Computerized adaptive testing: theory and method. Beijing: Beijing Normal University; 2017.

  25. Reeve BB, Hays RD, Bjorner JB, Cook KF, Crane PK, Teresi JA, et al. Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Med Care. 2007;45:S22–31.

  26. Hu LT, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equ Model. 1999;6:1–55.

  27. Wen ZL, Hau KT, Marsh HW. Structural equation model testing: cutoff criteria for goodness of fit indices and chi-square test. Acta Psychol Sin. 2004;36:186–94.

  28. Chen WH, Thissen D. Local dependence indexes for item pairs using item response theory. J Educ Behav Stat. 1997;22:265–89.

  29. Van der Ark LA. New developments in Mokken scale analysis in R. J Stat Softw. 2011;48:1–27.

  30. Jones PW, Harding G, Berry P, Wiklund I, Chen WH, Kline Leidy N. Development and first validation of the COPD Assessment Test. Eur Respir J. 2009;34:648–54.

  31. Perez T, Burgel PR, Paillasseur JL, Caillaud D, Deslée G, Chanez P, et al. Modified Medical Research Council scale vs baseline Dyspnea Index to evaluate dyspnea in chronic obstructive pulmonary disease. Int J Chron Obstruct Pulmon Dis. 2015;10:1663–72.

  32. Yudien MA, Moore TM, Port AM, Ruparel K, Gur RE, Gur RC. Development and public release of the Penn Reading Assessment Computerized Adaptive Test (PRA-CAT) for premorbid IQ. Psychol Assess. 2019;31:1168–73.

  33. Harrison C, Trickett R, Wormald J, Dobbs T, Lis P, Popov V, et al. Remote symptom monitoring with ecological momentary computerized adaptive testing: pilot cohort study of a platform for frequent, low-burden, and personalized patient-reported outcome measures. J Med Internet Res. 2023;25:e47179.

  34. Harrison C, Loe BS, Lis P, Sidey-Gibbons C. Maximizing the potential of patient-reported assessments by using the open-source Concerto platform with computerized adaptive testing and machine learning. J Med Internet Res. 2020;22:e20950.

  35. Gibbons C, Bower P, Lovell K, Valderas J, Skevington S. Electronic quality of life assessment using computer-adaptive testing. J Med Internet Res. 2016;18:e240.

Acknowledgements

The authors thank all the experts and patients involved in this study for their kind help.

Funding

This study was supported by National Natural Science Foundation of China (81473648, 81830116), Qi-Huang Chief Scientist Project of National Administration of Traditional Chinese Medicine ([2020]219), and Central Plains Thousand People Program (ZYQR201810159).

Author information

Contributions

JJW was responsible for the modeling and analysis of data, and drafted the manuscript. YX and ZZF were responsible for the interpretation of data. YX and JSL conceived this study and revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jiansheng Li.

Ethics declarations

Ethics approval and consent to participate

This study was approved by the Institutional Review Board of the First Affiliated Hospital of Henan University of Chinese Medicine (2015HL-048). Written informed consent was obtained from all participants included in this study.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1: Additional file 1 The matrix plot of item characteristic curves of the modified patient-reported outcome scale for chronic obstructive pulmonary disease (mCOPD-PRO)

Supplementary Material 2: Additional file 2 The test information and standard error of measurement of the modified patient-reported outcome scale for chronic obstructive pulmonary disease (mCOPD-PRO)

Supplementary Material 3: Additional file 3 An example of the computerized adaptive testing procedure

Supplementary Material 4: Additional file 4 The percentage of item usage using random method in different simulation tests

Supplementary Material 5: Additional file 5 The correlation between simulated and true theta estimates using random method in different simulation tests

Supplementary Material 6: Additional file 6 The test information and standard error of measurement using random method in different simulation tests

Supplementary Material 7: Additional file 7 The percentage of item usage using maximum Fisher information method in different simulation tests

Supplementary Material 8: Additional file 8 The correlation between simulated and true theta estimates using maximum Fisher information method in different simulation tests

Supplementary Material 9: Additional file 9 The test information and standard error of measurement using maximum Fisher information method in different simulation tests

Supplementary Material 10: Additional file 10 The number of items tested for the computerized adaptive testing

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Wang, J., Xie, Y., Feng, Z. et al. Psychometric properties of computerized adaptive testing for chronic obstructive pulmonary disease patient-reported outcome measurement. Health Qual Life Outcomes 22, 73 (2024). https://doi.org/10.1186/s12955-024-02291-6

Keywords