
Development and validation of the patient-reported outcome for older people living with HIV/AIDS in China (PROHIV-OLD)



The inclusion of quality of life as the UNAIDS “fourth 90” target for monitoring the global HIV response has highlighted the need for patient-reported outcome (PRO) measures that help address the holistic needs of people living with HIV/AIDS (PLWHA) beyond viral suppression. This study developed a new patient-reported outcome measure (PROHIV-OLD), designed specifically to capture the influence of HIV on patients aged 50 and older in China, and tested its preliminary measurement properties.


Ninety-three older people living with HIV/AIDS (PLWHA) were interviewed to solicit items, and two rounds of patient cognitive interviews were conducted to modify the content and wording of the initial items. A validation study was then conducted to refine the initial instrument and evaluate its measurement properties. Patients were recruited between February 2021 and November 2021 and followed up six months after the baseline investigation. Classical test theory (CTT) and item response theory (IRT) were used to select items using the baseline data. The follow-up data were used to evaluate the measurement properties of the final instrument.


A total of 600 patients were recruited at the baseline. Of the 485 patients who completed the follow-up investigation, 483 were included in the validation sample. The final scale of PROHIV-OLD contained 25 items describing five dimensions (physical symptoms, mental status, illness perception, family relationship, and treatment). All the PROHIV-OLD dimensions had satisfactory reliability with Cronbach’s alpha coefficient, McDonald’s ω, and composite reliability of each dimension being all higher than 0.85. Most dimensions met the test-retest reliability standard except for the physical symptoms dimension (ICC = 0.64). Confirmatory factor analysis supported the structural validity of the final scale, and the model fit index satisfied the criterion. The correlations between dimensions of PROHIV-OLD and MOS-HIV met hypotheses in general. Significant differences on scores of the PROHIV-OLD were found between demographic and clinical subgroups, supporting known-groups validity.


The PROHIV-OLD was found to have good feasibility, reliability and validity for evaluating health outcome of Chinese older PLWHA. Other measurement properties such as responsiveness and interpretability will be further examined.


HIV has infected a total of 84.2 million people and claimed 36.3 million lives worldwide since the start of the epidemic [1]. Today, HIV remains a major global public health issue. An estimated 40.1 million people were living with HIV/AIDS worldwide at the end of 2021 [2]. Given the large population of China, the influence of HIV in China should not be underestimated despite the relatively low prevalence. By the end of 2020, China had 1.05 million people living with HIV/AIDS and 351,000 cumulative reported deaths [3].

The widespread use of highly active antiretroviral therapy (HAART) has made HIV infection a manageable chronic health condition, enabling people living with HIV/AIDS (PLWHA) to live longer. At the same time, HIV infection and antiretroviral treatment could accelerate the aging process of PLWHA [4]. The World Health Organization suggested the age of 50 as a cut-off to identify older subjects among HIV-infected people [5]. As of the end of 2019, there were about 7.5 million PLWHA aged 50 and over worldwide, making up one fifth of PLWHA [6]. As a result of increasing access to effective HIV diagnosis and treatment, China has also witnessed an increasing number of older PLWHA in recent years [7]. In 2011, the proportion of older PLWHA aged between 50 and 64 in China reached 13.6%, up from 1.6% in 2000 [8].

However, longer life expectancy does not necessarily mean better well-being. Alongside physical discomforts, PLWHA also struggle with depression, anxiety, financial stress, and HIV-related discrimination [9]. To fully understand the health status of PLWHA and address their holistic needs beyond viral suppression, patient-reported outcome (PRO) measures should be developed and validated to complement biomarkers to depict patients’ experience with the disease and treatment [10].

Among previous studies assessing the health outcomes of PLWHA, generic instruments have been most widely used because they facilitate comparison between different disease or treatment groups. However, they were not originally designed to identify disease-specific issues and may therefore fail to capture important impacts of HIV [11]. As for specific PRO instruments established for PLWHA, quite a number were developed before the wide application of HAART, which reduces their validity in evaluating treatment effectiveness [12, 13]. In addition, PRO instruments for PLWHA introduced from other countries should be used with caution, as they might be culturally inappropriate [14]. Another major problem is that few PRO instruments exist for older PLWHA. Aging is accompanied by a decline in physical function and a transition of social roles, which further worsens and complicates the physical, psychological and social consequences of HIV for older patients. Measuring how older adults perceive their overall health is gaining increasing attention, and both generic and disease-specific PRO instruments have already developed modules specific to older adults [15, 16].

Most of the instruments mentioned above were developed using classical test theory (CTT), which does not allow test items to be divided up and reorganized to meet different testing needs without compromising the instrument’s reliability. An alternative to CTT is item response theory (IRT), which postulates that the probability of a given response to an item can be modelled as a function of the item’s difficulty and discrimination and of the participant’s level on the trait being measured [17]. Unlike CTT statistics, which depend on the sample from which they are derived, IRT can provide stable estimates of an item’s difficulty, discrimination and guessing probability that do not vary with changes in sample, item order and test conditions [18]. This characteristic makes it well suited to developing adaptable yet rigorous instruments.

PRO measures for HIV/AIDS are expanding, but no gold standard exists yet. Advances in treatment, the unique needs of older PLWHA, the culture-dependent nature of PROs, and progress in psychometrics all call for new measures that accommodate different situations. This study aimed to use both CTT and IRT to develop a disease-specific PRO instrument for older Chinese PLWHA (PROHIV-OLD), in the hope that the PRO data collected could better describe life with HIV/AIDS among older people in China and thereby improve treatment and care for this population. This article reports the iterative process of item selection and the initial evaluation of the instrument’s reliability and validity.

Preliminary work

A literature review and focus group interviews with health care professionals were conducted first, from which an initial conceptual framework covering physical, emotional, social, and treatment domains was generated. Guided by this framework, a total of 93 patients were interviewed face-to-face and videotaped. At numerous points in the interview, participants were encouraged to spontaneously add any comments or areas related to the disease that they deemed appropriate and important. Once completed, the videotapes were transcribed. Transcriptions were then compared against the original videotapes by a second set of research assistants. The transcripts of the interviews were reviewed and coded by two researchers, and items were generated and categorized. A draft item pool of 56 items was then presented to patients who had not participated in the initial interviews to evaluate the relevance, importance, comprehensibility, and potential redundancy of the items, during which one item was discarded because of overlap with other items. The remaining 55 items comprised the preliminary PROHIV-OLD instrument tested here. Items were scored on a 7-point Likert scale with anchor points labeled from “not at all” to “very much”. The recall period was set at one month.


Design and subjects

From February 2021 to November 2021, participants were recruited from six designated hospitals in three cities of Zhejiang Province, China, with varying socioeconomic status as measured by GDP per capita. Participants were followed up six months after the baseline investigation. PLWHA aged 50 and over who were receiving ongoing antiviral therapy were eligible to participate, while those who had cognitive issues, could not understand Mandarin Chinese, or were at the terminal stage of AIDS were excluded.

The PROHIV-OLD and a validated outcome measure, the Medical Outcomes Study HIV Health Survey (MOS-HIV) [19], were administered at baseline and at the 6-month follow-up. Demographic and HIV-related information was also collected. The baseline data were used as the study sample for item reduction analyses (Phase I), and the follow-up data as the validation sample to test the final instrument (Phase II).

This study was approved by the Institutional Review Board of Zhejiang University (approval number: ZGL202007-03), and written informed consent was obtained from all participants.

Phase I: item reduction

Item reduction based on the CTT

The distribution of scores for each item was analyzed. An item was to be removed if its floor or ceiling effect exceeded 20% [20]. Items with a standard deviation (SD) lower than 1 or a coefficient of variation (CV) lower than 0.3 were deemed to have low variability and were also to be removed [21].
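The screening rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `screen_item` and the example responses are hypothetical, items are assumed to be scored 0 to 6 (as in the reported floor and ceiling scores), and the sample SD with an n − 1 denominator is assumed because the text does not specify one.

```python
from statistics import mean, stdev


def screen_item(scores, min_score=0, max_score=6,
                max_extreme=0.20, min_sd=1.0, min_cv=0.3):
    """Summarize one item's responses and list any CTT screening flags."""
    n = len(scores)
    floor = sum(1 for s in scores if s == min_score) / n
    ceiling = sum(1 for s in scores if s == max_score) / n
    sd = stdev(scores)  # sample SD (n - 1 denominator), an assumption
    avg = mean(scores)
    cv = sd / avg if avg else float("inf")

    flags = []
    if floor > max_extreme:
        flags.append("floor effect")
    if ceiling > max_extreme:
        flags.append("ceiling effect")
    if sd < min_sd:
        flags.append("low SD")
    if cv < min_cv:
        flags.append("low CV")
    return {"floor": floor, "ceiling": ceiling, "sd": sd, "cv": cv, "flags": flags}


# Hypothetical responses, for illustration only.
spread = screen_item([0, 1, 2, 3, 4, 5, 6, 3, 3, 3])
skewed = screen_item([6, 6, 6, 6, 6, 6, 6, 6, 5, 4])
print(spread["flags"])
print(skewed["flags"])
```

An item with responses piled up at the top category, like the second example, would be flagged on all three variability criteria, while a well-spread item passes.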

Exploratory factor analysis (EFA) aided in item reduction and exploration of the factor structure. Exploratory structural equation modeling (ESEM) was also employed to analyze the factor structure. ESEM can be seen as a compromise between the flexibility of EFA and the rigor of SEM [22]. It has been used when factor structures are not yet well established, as it allows for a more detailed assessment of model fit [23, 24]. Principal axis factoring with an oblique rotation was employed to extract factors. The scree plot [25], Horn’s parallel analysis (PA) [26] and Velicer’s minimum average partial (MAP) test [27] were adopted to determine the number of factors to be extracted. Proposed models were compared by ESEM using the following fit indices: chi-square divided by degrees of freedom (χ2/df), Tucker-Lewis index (TLI), standardized root mean square residual (SRMR), root mean square error of approximation (RMSEA), and Bayesian information criterion (BIC). Satisfactory model fit requires χ2/df < 3, TLI ≥ 0.9, SRMR < 0.08, RMSEA < 0.08, and a lower BIC [28, 29]. The ESEM fit indices, conceptual clarity and model simplicity were taken into account to select the optimal factor structure [30, 31]. Items with low factor loadings were dropped one by one in ascending order of loading until every remaining item had a loading of 0.35 or higher on exactly one factor [32, 33].
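The loading rule at the end of this step can be illustrated with a simplified sketch. It classifies items from a single, already rotated loading matrix, whereas the procedure described above re-estimates the model after each deletion. The function name, the use of absolute loadings, and the example loadings are all assumptions made for illustration.

```python
def classify_items(loadings, threshold=0.35):
    """Classify items by how many factors they load on at or above the threshold.

    loadings: dict mapping an item label to a list of loadings, one per factor.
    Returns a dict mapping each item to a (decision, reason) pair.
    """
    decisions = {}
    for item, row in loadings.items():
        # Absolute values are used so that strong negative loadings also count.
        salient = [i for i, value in enumerate(row) if abs(value) >= threshold]
        if not salient:
            decisions[item] = ("drop", "low loading")
        elif len(salient) > 1:
            decisions[item] = ("drop", "cross-loading")
        else:
            decisions[item] = ("retain", f"factor {salient[0] + 1}")
    return decisions


# Hypothetical loadings on five factors, for illustration only.
example = {
    "item A": [0.72, 0.10, 0.05, 0.02, 0.08],
    "item B": [0.20, 0.31, 0.12, 0.09, 0.15],
    "item C": [0.41, 0.38, 0.03, 0.11, 0.07],
}
for item, decision in classify_items(example).items():
    print(item, decision)
```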

Once the factors had been determined, the internal consistency of the items was evaluated using Cronbach’s alpha if item deleted (CAID). If removing an item increased the Cronbach’s alpha of its dimension, the item was to be removed because it contributed poorly to internal consistency [34].
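As a minimal sketch of the CAID check, the following computes Cronbach’s alpha for a set of items and again with each item left out in turn. The function names and the toy data are hypothetical, and sample variances (n − 1 denominator) are assumed.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of columns, one list of scores per item."""
    k = len(items)
    totals = [sum(responses) for responses in zip(*items)]
    sum_item_variances = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


def alpha_if_item_deleted(items):
    """Alpha recomputed with each item removed in turn (needs at least 3 items)."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Hypothetical scores from five respondents; the fourth item is unrelated noise.
items = [
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    [1, 2, 3, 4, 5],
    [5, 1, 4, 2, 3],
]
overall = cronbach_alpha(items)
for index, caid in enumerate(alpha_if_item_deleted(items), start=1):
    verdict = "candidate for removal" if caid > overall else "keep"
    print(f"item {index}: alpha if deleted = {caid:.3f} ({verdict})")
```

Only the noise item raises alpha when deleted, which is the pattern the rule is designed to catch.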

Item reduction based on the IRT

Given the ordered categorical nature of the response categories, the graded response model (GRM) was employed in this step to analyze the items within each dimension [35].

The assumptions of unidimensionality and monotonicity were checked before estimating item parameters and latent trait levels. PA was used to check unidimensionality, which requires that a single latent trait underlies a set of test items [36]. Monotonicity was verified by the graphical ascent of the item characteristic curve (ICC) [37].

Discrimination and difficulty are the two parameters of interest in IRT. Item discrimination (α) represents the ability of an item to distinguish between respondents with similar latent trait levels. Discrimination values between 0.4 and 4.0 are deemed acceptable [38]. Item difficulty (βi) is defined by the latent trait levels indicating the thresholds between response options. There should be a graded monotonic relationship between the respondents’ trait level and the item response options, such that respondents with low trait levels endorse low response options. Disordered thresholds occur when this monotonic relationship does not exist on the category characteristic curves (CCCs). A polytomous item with 7 response categories has six difficulty parameters (denoted β1, β2, β3, β4, β5, β6). The six difficulty values should range from −3.0 to 3.0 and be in ascending order [21, 39, 40].
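To make the roles of α and the β thresholds concrete, the sketch below evaluates the category probabilities of the graded response model in its standard logistic form, where the probability of responding in category k or higher is 1 / (1 + exp(−α(θ − βk))), and then applies the acceptance ranges stated above. The function names and the parameter values are hypothetical; parameters would in practice be estimated by IRT software.

```python
import math


def grm_category_probs(theta, a, b):
    """Category probabilities under the graded response model.

    theta: latent trait level; a: discrimination; b: ordered thresholds.
    Returns len(b) + 1 probabilities, one per response category.
    """
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b]
    cumulative.append(0.0)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(b) + 1)]


def check_item_parameters(a, b, a_range=(0.4, 4.0), b_range=(-3.0, 3.0)):
    """Return the list of criteria an item fails, using the ranges in the text."""
    flags = []
    if not a_range[0] <= a <= a_range[1]:
        flags.append("discrimination out of range")
    if any(not b_range[0] <= bk <= b_range[1] for bk in b):
        flags.append("difficulty out of range")
    if any(later <= earlier for earlier, later in zip(b, b[1:])):
        flags.append("disordered thresholds")
    return flags


# Hypothetical parameters for a 7-category item, for illustration only.
thresholds = [-2.0, -1.2, -0.4, 0.3, 1.1, 2.0]
probs = grm_category_probs(theta=0.0, a=1.5, b=thresholds)
print([round(p, 3) for p in probs])
print(check_item_parameters(1.5, thresholds))
print(check_item_parameters(4.99, thresholds))
```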

Examining differential item functioning (DIF) is important in the investigation of the stability of an item’s measurement properties across subgroups differing in background characteristics [41]. The presence of DIF, whether uniform or non-uniform, was evaluated by logistic regression analysis. Items were flagged for possible DIF when the probability associated with the χ2 test was < 0.01 and the effect size measure (McFadden’s pseudo R2) was > 0.13 [42, 43]. Variables used to test DIF in this study were gender (male vs. female), place of residence (city vs. village), and household monthly income per capita (≤ 600 RMB vs. > 600 RMB).
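The flagging rule can be sketched as follows, assuming the common nested-model framework for logistic regression DIF: a trait-only model, a model adding the group variable (uniform DIF), and a model adding the group-by-trait interaction (non-uniform DIF), each compared with the previous one by a 1-df likelihood-ratio χ2 test and by the change in McFadden’s pseudo R2. The text does not spell out this framework, so it is an assumption here, as are the function names and the log-likelihood values, which would normally come from fitted ordinal logistic models.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))


def dif_flags(ll_null, ll_trait, ll_group, ll_interaction,
              alpha=0.01, r2_cut=0.13):
    """Apply the significance and effect-size rules to nested model log-likelihoods."""
    def mcfadden(ll):
        return 1.0 - ll / ll_null

    comparisons = (
        ("uniform", ll_trait, ll_group),
        ("non_uniform", ll_group, ll_interaction),
    )
    results = {}
    for name, ll_small, ll_big in comparisons:
        lr = 2.0 * (ll_big - ll_small)  # likelihood-ratio statistic
        p = chi2_sf_df1(lr)
        delta_r2 = mcfadden(ll_big) - mcfadden(ll_small)
        results[name] = {
            "lr": lr,
            "p": p,
            "delta_r2": delta_r2,
            "significant": p < alpha,
            "salient": p < alpha and delta_r2 > r2_cut,
        }
    return results


# Hypothetical log-likelihoods, for illustration only.
res = dif_flags(ll_null=-1000.0, ll_trait=-800.0,
                ll_group=-795.0, ll_interaction=-794.9)
for kind, stats in res.items():
    print(kind, round(stats["p"], 4), round(stats["delta_r2"], 4), stats["salient"])
```

In this made-up example the uniform DIF test is statistically significant but the change in pseudo R2 is far below 0.13, so the item would be reported as showing DIF of negligible magnitude.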

Phase II: scale validation


Internal consistency reliability was determined by calculating Cronbach’s alpha coefficient, McDonald’s ω, and composite reliability (CR). Values of 0.7 or above were considered appropriate [31, 44].
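For composite reliability, a widely used formula based on standardized factor loadings λ is CR = (Σλ)² / ((Σλ)² + Σ(1 − λ²)), which assumes uncorrelated measurement errors. The text does not state which formula was used, so the sketch below is an assumption, and the loadings are hypothetical.

```python
def composite_reliability(loadings):
    """Composite reliability from standardized loadings, assuming uncorrelated errors."""
    sum_loadings_sq = sum(loadings) ** 2
    error_variance = sum(1.0 - value ** 2 for value in loadings)
    return sum_loadings_sq / (sum_loadings_sq + error_variance)


# Hypothetical standardized loadings for one dimension, for illustration only.
cr = composite_reliability([0.8, 0.8, 0.8])
print(round(cr, 3), "meets 0.7 criterion" if cr >= 0.7 else "below 0.7 criterion")
```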

Test-retest reliability was assessed over a two-week interval in a group of 60 patients with stable disease, using intraclass correlation coefficients (ICCs) from a two-way mixed effects model. Generally, ICCs ≥ 0.7 were considered acceptable [45].
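A two-way mixed effects ICC can take several forms. The sketch below computes the single-measure, consistency version, often written ICC(3,1) = (MSR − MSE) / (MSR + (k − 1)MSE), where MSR is the between-subject mean square and MSE the residual mean square. Whether the study used this form or an absolute-agreement or average-measure form is not stated, so the choice is an assumption, and the scores are hypothetical.

```python
def icc_3_1(data):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    data: list of rows, one per subject, each holding k repeated measurements.
    """
    n = len(data)
    k = len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical test and retest scores for five patients, for illustration only.
scores = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
icc = icc_3_1(scores)
print(round(icc, 2), "acceptable" if icc >= 0.7 else "below 0.7")
```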


Confirmatory factor analysis (CFA) was implemented to examine structural validity. A measurement model with χ2/df < 3, CFI ≥ 0.9, TLI ≥ 0.9, SRMR < 0.08 and RMSEA < 0.08 was considered to fit well [28].
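The fit criteria can be written as a small checklist. This is only a convenience sketch: the dictionary keys and function name are hypothetical, and the example uses the CFA fit values reported for the final five-factor model in the validation results.

```python
CFA_CRITERIA = {
    "chi2_df": lambda value: value < 3,
    "cfi": lambda value: value >= 0.9,
    "tli": lambda value: value >= 0.9,
    "srmr": lambda value: value < 0.08,
    "rmsea": lambda value: value < 0.08,
}


def failed_criteria(fit):
    """Return the names of the supplied fit indices that miss their cut-off."""
    return [name for name, passes in CFA_CRITERIA.items()
            if name in fit and not passes(fit[name])]


# Fit indices reported for the final CFA model in this article.
final_model = {"chi2_df": 2.54, "cfi": 0.94, "tli": 0.93, "srmr": 0.06, "rmsea": 0.06}
print(failed_criteria(final_model))
```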

Convergent and discriminant validity was assessed through correlation analyses between the PROHIV-OLD and the MOS-HIV. Correlations between comparable dimensions are expected to be larger than those between less comparable dimensions [46]. Spearman’s correlation coefficients of 0.50 or above were regarded as strong, 0.30–0.49 as moderate, and lower than 0.30 as weak [47].
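As an illustration of this step, the sketch below computes Spearman’s coefficient as the Pearson correlation of average ranks (so that tied scores are handled) and labels its strength using the cut-offs above. The function names and the paired scores are hypothetical; the two labelled coefficients are the example values reported in the validation results.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def strength(rho):
    size = abs(rho)
    if size >= 0.50:
        return "strong"
    if size >= 0.30:
        return "moderate"
    return "weak"


# Hypothetical paired dimension scores, for illustration only.
rho = spearman([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(rho, 3), strength(rho))
print(strength(0.65), strength(0.26))
```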

Known-groups validity examines how well the instrument can discriminate among participants with different demographic backgrounds and clinical conditions. Previous studies have found that the health outcomes of PLWHA were poorer for females and for those with a heavy financial burden, a high plasma HIV-1 RNA level, low CD4+T cell counts, or terminal-stage AIDS [48–50]. In addition, we hypothesized that patients with comorbidity or abnormal liver or renal function would have worse quality of life. One-way ANOVA was performed to assess group differences.
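The one-way ANOVA behind these comparisons reduces to the ratio of between-group to within-group mean squares. The sketch below computes the F statistic and its degrees of freedom with the standard library only; obtaining the p value additionally requires the F distribution, which statistical packages provide. The function name and the subgroup scores are hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    ss_between = sum(len(group) * (mean(group) - grand_mean) ** 2 for group in groups)
    ss_within = sum((score - mean(group)) ** 2 for group in groups for score in group)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical dimension scores in three CD4+T cell count subgroups.
f_stat, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```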

Data analysis software

EFA, IRT-based item reduction, and the calculation of McDonald’s ω were conducted in R (Version 1.3.959, macOS). ESEM and CFA were conducted in Mplus (Version 8.6, macOS). All other analyses were performed using SPSS (Version 24.0, macOS). A p value below 0.05 was considered statistically significant for all analyses except DIF, for which the threshold was p < 0.01.


Sample characteristics

Of the 600 patients recruited at baseline, 82.17% were male. The average age of the study sample was 61.31 years (SD = 8.01). Most participants were married (71.17%), had a middle school education or below (76.50%), and had been infected through heterosexual contact (69.78%). A total of 180 participants (30.00%) reported comorbidity, and 57.50% of patients were asymptomatic HIV carriers. Respondents with a CD4+T cell count above 200 cells/μl accounted for 81.90%, and 87.00% of participants had a baseline plasma HIV-1 RNA level below the level of quantification (20 copies/ml). Of the 485 patients who completed the follow-up investigation, 483 were included in the validation sample (Table 1).

Table 1 Sample demographic and disease-related information (n = 600)

Item reduction results

The percentage of responses at the floor (score = 0) ranged from 7.00 to 16.17%, and the percentage of responses at the ceiling (score = 6) ranged from 4.33 to 19.17% (Table 2). Each item showed acceptable variability, with SD ranging from 1.67 to 1.99 and CV ranging from 0.55 to 0.74 (see Additional file 1).

Table 2 Percentage of each option for all items

In determining the number of factors to be extracted, PA and MAP suggested extracting 4 and 5 factors, respectively. The scree plot showed that a total of 9 factors had eigenvalues greater than 1, but factors 7, 8 and 9 were discarded as they were difficult to interpret. The hypothesized conceptual framework of the PROHIV-OLD proposed a four-factor structure. Therefore, three EFA models with four to six factors were proposed, and ESEM was conducted to compare their fit (Table 3). The χ2/df, TLI, SRMR and RMSEA appeared more satisfactory as more factors were retained, but the BIC of the five-factor model was the smallest. Considering the interpretability and simplicity of the model structure, the five-factor solution was finally selected as the most theoretically sensible.

Table 3 Comparison of the three models by their fit indices

Factors were then extracted by principal axis factoring with oblique rotation, and the items were sorted in descending order of factor loading on each factor. Items 39, 37, 52, 36, 30, 41, 40, 8, 9, 29, 5, 33 and 42 were dropped because their factor loadings were lower than 0.35, and items 43, 48, 51, 2, 4, 14, 17, 18 and 24, which had loadings of 0.35 or higher on multiple factors, were also discarded. In total, 33 items were retained after EFA, accounting for 52.24% of the total variance (Table 4).

Table 4 Exploratory factor analysis for the PROHIV-OLD five-factor model

The five-factor structure with 33 items showed acceptable fit in ESEM, although TLI was marginally below the 0.9 criterion (χ2/df = 2.91, TLI = 0.89, SRMR = 0.027, RMSEA = 0.056). Each remaining item was closely related to its own dimension (all r > 0.6, p < 0.05), and deleting any item did not increase Cronbach’s alpha (see Additional file 1). Therefore, no further items were removed in the CTT-based reduction.

In the IRT-based reduction, the model assumptions were examined first. PA suggested that each of the five factors established by CTT was unidimensional (Table 5), as only the first eigenvalue generated from the raw data was greater than that expected from random data (simulations based on normal distributions). All the ICCs rose monotonically (see Additional file 2), verifying monotonicity.

Table 5 Test results of unidimensionality by PA

All items showed acceptable discrimination except item 35 (α = 4.99) and item 38 (α = 5.09) on Factor 4. Item 38 was deleted first because its discrimination parameter was slightly higher, after which item 31 exhibited an extremely large discrimination value (α = 51.45) and was consequently deleted. The discrimination and difficulty of the remaining 3 items on this factor were re-estimated, and the results were acceptable. Items 6, 22 and 25 were deleted because of disordered thresholds. Figs. 1, 2 and 3 show the CCCs for these three items; the CCCs of the other items can be found in Additional file 2. Significant uniform DIF was detected for items 7, 10, 46, 49 and 50, and non-uniform DIF by registration or household monthly income per capita was detected for items 7, 22, 27 and 46. The R2 coefficients were all lower than 0.13, indicating that the impact of DIF on the assessment was small. Items with non-uniform DIF, i.e. items 7, 22, 27 and 46, were finally deleted. The item reduction process therefore resulted in a final version comprising 25 items within 5 dimensions (Table 6). Based on the content of the grouped items, the five dimensions were named physical symptoms, mental status, illness perception, family relationship, and treatment (Table 7).

Fig. 1 Category characteristic curves of item 6

Fig. 2 Category characteristic curves of item 22

Fig. 3 Category characteristic curves of item 25

Table 6 Item reduction based on IRT
Table 7 Bank of 25 items in the final PROHIV-OLD

Validation results


Cronbach’s alpha, McDonald’s ω and CR were all excellent (> 0.85), supporting the internal consistency reliability of the PROHIV-OLD instrument. The ICC of the physical symptoms dimension was slightly lower than 0.7, while all other dimensions had ICCs higher than 0.7, indicating that the PROHIV-OLD had acceptable test-retest reliability overall (Table 8).

Table 8 Internal consistency reliability and test-retest reliability of the PROHIV-OLD instrument


CFA was conducted on the final PROHIV-OLD instrument to test structural validity. The five-factor model achieved a good fit (χ2/df = 2.54, CFI = 0.94, TLI = 0.93, SRMR = 0.06, RMSEA = 0.06), with the factor loadings of all 25 items ranging from 0.47 to 0.90, indicating good structural validity.

The Spearman correlations between the PROHIV-OLD and the MOS-HIV were stronger for more comparable dimensions (e.g., 0.65 between the PROHIV-OLD physical symptoms dimension and the MOS-HIV physical functioning scale) than for less comparable dimensions (e.g., 0.26 between the PROHIV-OLD family relationship dimension and the MOS-HIV physical functioning scale). Overall, the convergent and discriminant validity of the PROHIV-OLD was considered satisfactory (Table 9).

Table 9 Correlations between the PROHIV-OLD and the MOS-HIV (n = 483)

Table 10 shows the mean PROHIV-OLD dimension scores by subgroup. Female participants had significantly lower scores on the physical symptoms and mental status dimensions than males. Household monthly income per capita was positively related to physical symptoms scores. HIV-1 RNA level had no significant effect on any of the five dimensions. The CD4+T cell count at the latest blood test was positively associated with physical symptoms and treatment scores. Patients with CD4+T cell counts higher than 500 cells/μl scored highest on the mental status and family relationship dimensions, while those with CD4+T cell counts lower than 200 cells/μl scored lowest on the illness perception dimension. PLWHA who had progressed to the AIDS stage performed worse on the treatment dimension. Comorbidity and dyslipidemia were significantly related to lower scores on the physical symptoms dimension, while patients with abnormal liver or kidney function did not report more physical symptoms.

Table 10 Validity of the PROHIV-OLD instrument assessed by the known-groups method (n = 483)


As an increasing number of PLWHA are now living into older age, more attention should be paid to overall quality of life during these extended years. Older PLWHA were once rarely involved in the development, validation and application of PRO instruments because few of them lived to older age, so the validity of existing patient-reported measures for older PLWHA can be questioned. Therefore, this study developed and validated an instrument to understand how HIV affects older Chinese patients.

In scale development and psychometric evaluation, CTT is the most frequently used method, as it is easier to understand and implement. However, results based on CTT statistics can be insufficiently reliable because of certain disadvantages of these methods, such as dependence on the item sample and a lack of information on respondents’ ability [51]. IRT methods, in contrast, are independent of sample characteristics and afford a more accurate examination of each item [52], which has made IRT popular in item selection [53]. Nevertheless, few HIV/AIDS-specific instruments have been developed using IRT to date. This study used both CTT and IRT to select items in the item reduction phase, in the hope of further improving the performance of this instrument.

In item selection by EFA, determining the appropriate number of factors is an important yet controversial issue, as no single procedure among the many rules of thumb and statistical indices for addressing dimensionality seems entirely satisfactory [54, 55]. The more common Kaiser’s criterion [54] and the more accurate PA and MAP methods [30, 42] were employed in this study to identify the number of latent factors needed to account for the common variance among the items. ESEM, which offers the advantage of providing overall tests of model fit [56], was then conducted to compare the fit of the competing models and determine the optimal factor structure. A five-factor structure was finally selected, and as many as 22 items were deleted after factor rotation. The strict requirements of EFA regarding the number of and correlations among variables, as well as sample size and distribution, could explain the large number of items deleted at this stage [57]. Previous studies have also reported a considerable number of items being removed by EFA [58, 59].

In item reduction using IRT, 2 items failed to meet the discrimination criterion and were deleted first. Disordered thresholds were detected for 3 items, indicating that respondents may have had difficulty distinguishing between the response options, and these 3 items were consequently removed. Uniform DIF was observed for 5 items, and 4 items exhibited non-uniform DIF. No consensus has been reached on how to handle items with DIF. Items with non-uniform DIF are generally required to be deleted, while appropriate weightings can be applied to items with uniform DIF [60, 61]. Some studies have suggested determining the salience of DIF by testing its magnitude in addition to its significance, so that items exhibiting DIF of large magnitude, whether uniform or non-uniform, are deleted [37, 42, 43]. This study also examined the magnitude of DIF. Because the DIF observed had no substantial influence, only items with non-uniform DIF were finally removed.

The reliability and validity of the final instrument have been rigorously tested. Internal consistency reliability of the PROHIV-OLD was supported by the high Cronbach’s alpha coefficients. McDonald’s ω and CR, which are deemed more suitable for evaluating the reliability of multidimensional instruments [62], further confirmed the reliability of each dimension. All dimensions demonstrated good test-retest reliability, except that the ICC of the physical symptoms dimension was slightly below 0.7. Apart from disease- and treatment-related symptoms, the physical symptoms dimension also contains items less specifically related to HIV infection, such as energy and sleep quality, which might be responsible for the lower test-retest reliability of this dimension.

Regarding the structural validity of the PROHIV-OLD, the poor fit of the one-factor model confirmed that the PROHIV-OLD is multidimensional in nature, and the final structure of the instrument was supported by CFA. Correlations between comparable PROHIV-OLD and MOS-HIV dimensions were stronger than those between less comparable dimensions. The correlations of the MOS-HIV role functioning scale with all five dimensions of the PROHIV-OLD were weak. The two items in the MOS-HIV role functioning scale concern the ability to do certain kinds or amounts of work, housework, or schoolwork, which are no longer the main content of older adults’ social life. Instead, their social relationships and interactions become more confined to the family [63, 64], which possibly resulted in the stronger correlation between the MOS-HIV role functioning scale and the PROHIV-OLD family relationship dimension. This also reflects the uniqueness of older patients’ experience and of the conceptual framework of the PROHIV-OLD.

Known-groups validity was examined across a range of demographic and clinically relevant factors. Consistent with existing studies, gender [48] and income [49] differences in dimension scores were detected. For clinical factors, all five dimensions of the PROHIV-OLD distinguished well between patients with different CD4+T cell counts, while no significant associations were found between any dimension of the PROHIV-OLD and HIV-1 RNA level. The proportion of patients with an abnormal plasma HIV-1 RNA level (11.47%) might have been too small to detect its effect on patients’ perceived health status. Dyslipidemia was associated with poorer performance on the physical symptoms dimension, whereas patients with abnormal liver or kidney function did not report more physical symptoms. One possible reason is that liver and kidney function could only be roughly determined from limited medical information. Future studies could employ more precise medical examinations and include respondents’ self-perceived condition.

Several potential limitations of this study should be stated. First, the generalizability of this study might be limited, given that only patients in Zhejiang Province were included. In addition, COVID-19 control policies prevented us from interviewing hospitalized patients, who are more likely to experience serious opportunistic infections or other adverse events, which further limited the representativeness of the study sample. Second, for older PLWHA with poor vision, investigators helped them complete the survey by reading the items verbatim, which might have introduced selection and social desirability bias. Third, because the primary aim was instrument development and validation, this study only detected the presence and salience of DIF; the underlying mechanisms of DIF remain to be identified in future qualitative and quantitative studies. Fourth, although the reliability and validity shown in this study seem satisfactory, the instrument’s ability to detect change over time remains to be examined to further support its psychometric properties. Nevertheless, this large multi-site study, with rigorous instrument development and validation methods, provides a strong foundation for health outcome assessment and promotion for the ever-increasing population of older PLWHA.


The PROHIV-OLD instrument demonstrated acceptable reliability and validity, suggesting that it can be implemented in clinical research and practice to provide further valuable information on health outcome of older PLWHA in China. Other measurement properties such as responsiveness and interpretability will be further examined.

Data availability

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.


  1. UNAIDS. Global HIV & AIDS statistics: Fact sheet. Accessed 9 April 2023.

  2. World Health Organization. HIV and AIDS. Accessed 10 January 2023.

  3. Annals of information on comprehensive prevention and treatment for AIDS, STD and Hepatitis C. Beijing: National Center for AIDS & STD Control and Prevention, The Chinese Center for Disease Control and Prevention (CDC); 2020.

  4. High KP, Brennan-Ing M, Clifford DB, Cohen MH, Currier J, Deeks SG, et al. HIV and aging: state of knowledge and areas of critical need for research. A report to the NIH Office of AIDS Research by the HIV and Aging Working Group. J Acquir Immune Defic Syndr. 2012;60(Suppl 1):S1–18.

    Article  CAS  PubMed  Google Scholar 

  5. World Health Organization. Impact of AIDS on older people in Africa: Zimbabwe case study. Switzerland: World Health Organization; 2002.

    Google Scholar 

  6. UNAIDS Data 2019. The Joint United Nations Programme on HIV and AIDS. Accessed 12 January 2023.

  7. Wei H, Li B, Lan G. Research progress on AIDS epidemic characteristics of Elderly Population in China. Appl Prev Med. 2021;27(2):189–93.

    Google Scholar 

  8. Wang L, Qin Q, Ge L, Ding Z, Cai C, Guo W, et al. Characteristics of HIV infections among over 50-year-olds population in China. Chin J Epidemiol. 2016;37(2):222–6.

    Google Scholar 

  9. Lazarus JV, Safreed-Harmon K, Barton SE, Costagliola D, Dedes N, Del Amo Valero J, et al. Beyond viral suppression of HIV - the new quality of life frontier. BMC Med. 2016;14(1):94.

    Article  PubMed  PubMed Central  Google Scholar 

  10. Kall M, Marcellin F, Harding R, Lazarus JV, Carrieri P. Patient-reported outcomes to enhance person-centred HIV care. Lancet Hiv. 2020;7(1):E59–68.

    Article  PubMed  Google Scholar 

  11. O’Brien N, Chi YL, Krause KR. Measuring Health outcomes in HIV: Time to bring in the patient experience. Ann Glob Health. 2021;87(1):2.

    Article  PubMed  PubMed Central  Google Scholar 

  12. Brown G, Mikołajczak G, Lyons A, Power J, Drummond F, Cogle A, et al. Development and validation of PozQoL: a scale to assess quality of life of PLHIV. BMC Public Health. 2018;18(1):527.

    Article  PubMed  PubMed Central  Google Scholar 

  13. Holmes WC, Shea JA. Two approaches to measuring quality of life in the HIV/AIDS population: HAT-QoL and MOS-HIV. Qual Life Res. 1999;8(6):515–27.

    Article  CAS  PubMed  Google Scholar 

  14. Maimaiti R, Yuexin Z, Kejun P, Wubili M, Lalanne C, Duracinsky M, et al. Assessment of Health-related quality of life among people living with HIV in Xinjiang, West China. J Int Assoc Provid AIDS Care. 2017;16(6):588–94.

    Article  PubMed  Google Scholar 

  15. Liu R, Wu S, Hao Y, Gu J, Fang J, Cai N, et al. The Chinese version of the world health organization quality of life instrument-older adults module (WHOQOL-OLD): psychometric evaluation. Health Qual Life Outcomes. 2013;11:156.

    Article  PubMed  PubMed Central  Google Scholar 

  16. Wheelwright S, Darlington AS, Fitzsimmons D, Fayers P, Arraras JI, Bonnetain F, et al. International validation of the EORTC QLQ-ELD14 questionnaire for assessment of health-related quality of life elderly patients with cancer. Br J Cancer. 2013;109(4):852–8.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. De Ayala RJ. The theory and practice of item response theory. 1st ed. New York: Guildord; 2009.

    Google Scholar 

  18. Barlow PB, Skolits G, Heidel RE, Metheny W, Smith TL. Development of the Biostatistics and clinical epidemiology skills (BACES) assessment for medical residents. Postgrad Med J. 2015;91(1078):423–30.

    Article  PubMed  Google Scholar 

  19. Wu AW, Revicki DA, Jacobson D, Malitz FE. Evidence for reliability, validity and usefulness of the Medical outcomes Study HIV Health Survey (MOS-HIV). Qual life Research: Int J Qual life Aspects Treat care Rehabilitation. 1997;6(6):481–93.

    Article  CAS  Google Scholar 

  20. Raat H, Botterweck AM, Landgraf JM, Hoogeveen WC, Essink-Bot ML. Reliability and validity of the short form of the child health questionnaire for parents (CHQ-PF28) in large random school based and general population samples. J Epidemiol Community Health. 2005;59(1):75–82.

    Article  PubMed  PubMed Central  Google Scholar 

  21. Pang H, Kang X, Li Z, Zhang J, Lv R, Jiang J. An application of Item Response Theory in Item Selection of Chinese Self-management of heart failure instrument. Chin J Health Stat. 2014;31(1):57–60.

    Google Scholar 

  22. Marsh HW, Morin AJS, Parker PD, Kaur G. Exploratory Structural Equation Modeling: An Integration of the Best Features of Exploratory and Confirmatory Factor Analysis. In: Cannon TD, Widiger T, editors. Annual Review of Clinical Psychology, Vol 10. Annual Review of Clinical Psychology. 102014. p. 85-+.

  23. Morin AJS, Maiano C. Cross-validation of the short form of the physical self-inventory (PSI-S) using exploratory structural equation modeling (ESEM). Psychol Sport Exerc. 2011;12(5):540–54.

    Article  Google Scholar 

  24. Maiano C, Morin AJS, Lanfranchi MC, Therme P. The Eating attitudes Test-26 revisited using exploratory structural equation modeling. J Abnorm Child Psychol. 2013;41(5):775–88.

    Article  PubMed  Google Scholar 

  25. Cattell RB. Handbook of multivariate experimental psychology. 2nd ed. Chicago: Rand McNally; 1966.

    Google Scholar 

  26. Horn JL. A rationale and test for the number of factors in factor analysis. Psychometrika. 1965;30(2):179–85.

    Article  CAS  PubMed  Google Scholar 

  27. Velicer WF, Eaton CA, Fava JL. In: Goffin RD, Helmes E, editors. Construct explication through factor or component analysis: a review and evaluation of alternative procedures for determining the number of factors or components. Boston, MA: Springer; 2000.

    Google Scholar 

  28. Yu M. Scale Preparation and Development: application of the Rasch Measurement Model. Xinbei, China: Psychological; 2020.

    Google Scholar 

  29. Posada D, Buckley TR. Model selection and model averaging in phylogenetics: advantages of akaike information criterion and bayesian approaches over likelihood ratio tests. Syst Biol. 2004;53(5):793–808.

    Article  PubMed  Google Scholar 

  30. Fabrigar LR, Wegener DT, MacCallum RC, Strahan EJ. Evaluating the use of exploratory factor analysis in psychological research. Psychol Methods. 1999;4(3):272–99.

    Article  Google Scholar 

  31. Cousi C, Igier V, Quintard B. French cross-cultural adaptation and validation of the Quality of Life-Alzheimer’s Disease scale in Nursing Homes (QOL-AD NH). Health Qual Life Outcomes. 2021;19(1).

  32. Sekiguchi M, Wakita T, Otani K, Onishi Y, Fukuhara S, Kikuchi S, et al. Development and validation of a Symptom scale for lumbar spinal stenosis. Spine. 2012;37(3):232–9.

    Article  PubMed  Google Scholar 

  33. Lamash L, Josman N. Full-information factor analysis of the Daily Routine and Autonomy (DRA) questionnaire among adolescents with autism spectrum disorder. J Adolesc. 2020;79:221–31.

    Article  PubMed  Google Scholar 

  34. Luo YH, Yang J, Zhang YB. Development and validation of a patient-reported outcome measure for stroke patients. Health Qual Life Outcomes. 2015;13.

  35. Samejima F. Estimation of latent ability using a response pattern of graded scores. Psychometrika. 1969;34(1):1–97.

    Article  Google Scholar 

  36. Liu B. Measurement of patient-reported outcomes: principles, methods, and applications. Beijing, China: People’s Medical Publishing House; 2011.

    Google Scholar 

  37. Pinto MNFC, Pinto RMC, Mendonca TMS, Souza CG, da Silva CHM. Validation and calibration of the patient-reported outcomes measurement information system: Pediatric PROMIS®Emotional distress domain item banks, Portuguese version (Brazil/Portugal). Qual Life Res. 2020;29(7):1987–97.

    Article  PubMed  Google Scholar 

  38. Hu X, Zhao Z, Zhang S-K, Luo Y, Yu H, Zhang Y. CA-PROM: validation of a general patient-reported outcomes measure for Chinese patients with cancer. Cancer Epidemiol. 2020;67.

  39. Wang W, Zhou Y. Application status and Prospect of item response theory in Health-Related scales. Chin J Health Stat. 2018;35(4):633–6.

    Google Scholar 

  40. Baker FB. The basics of Item Response Theory. 2nd ed. ERIC Clearinghouse on Assessment and Evaluation; 2001.

  41. Crane PK, Gibbons LE, Jolley L, van Belle G. Differential item functioning analysis with ordinal logistic regression techniques - DIFdetect and difwithpar. Med Care. 2006;44(11):S115–23.

    Article  PubMed  Google Scholar 

  42. Fayers PM, Machin D. Quality of life: the assessment, analysis and reporting of patient-reported outcomes. 3rd ed. Chichester: Wiley; 2016.

    Google Scholar 

  43. Jodoin MG, Gierl MJ. Evaluating type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Appl Measur Educ. 2001;14(4):329–49.

    Article  Google Scholar 

  44. Mokhtaryan-Gilani T, Ozgoli G, Kariman N, Nia HS, Doulabi MA, Nasiri M. Psychometric properties of the Persian translation of maternal postpartum quality of life questionnaire (MAPP-QOL). Health Qual Life Outcomes. 2021;19(1).

  45. Terwee CB, Bot SDM, de Boer MR, van der Windt DAWM, Knol DL, Dekker J, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60(1):34–42.

    Article  PubMed  Google Scholar 

  46. Hays RD, Hayashi T. Beyond internal consistency reliability: Rationale and user’s guide for Multitrait Analysis Program on the microcomputer. Behav Res Methods Instruments Computers. 1990;22(2):167–75.

    Article  Google Scholar 

  47. Cohen J. A power primer. Psychol Bull. 1992;112(1):155–9.

    Article  CAS  PubMed  Google Scholar 

  48. Huang Y, Yin Y, Yang B, Tian C, Yu J, Liu C, et al. Study on the quality of life and related influencing factors among HIV/AIDS patients over 50 yearsold. Chin J Aids STD. 2021;27(5):490–3.

    CAS  Google Scholar 

  49. Xie F, Zheng H, Huang L, Yuan Z, Lu Y. Social Capital Associated with Quality of Life among People Living with HIV/AIDS in Nanchang, China. Int J Environ Res Public Health. 2019;16(2).

  50. Guaraldi G, Orlando G, Zona S, Menozzi M, Carli F, Garlassi E, et al. Premature age-related comorbidities among HIV-Infected persons compared with the General Population. Clin Infect Dis. 2011;53(11):1120–6.

    Article  PubMed  Google Scholar 

  51. Liu N, Lv J, Liu JC, Zhang YB. The PU-PROM: a patient-reported outcome measure for peptic ulcer disease. Health Expect. 2017;20(6):1350–66.

    Article  PubMed  PubMed Central  Google Scholar 

  52. Hagman BT, Kuerbis AN, Morgenstern J, Bux DA, Parsons JT, Heidinger BE. An item response theory (IRT) analysis of the short inventory of problems-alcohol and drugs (SIP-AD) among non-treatment seeking men-who-have-sex-with-men: evidence for a shortened 10-item SIP-AD. Addict Behav. 2009;34(11):948–54.

    Article  PubMed  PubMed Central  Google Scholar 

  53. Cappelleri JC, Lundy JJ, Hays RD. Overview of classical test theory and item response theory for the quantitative Assessment of items in developing patient-reported outcomes measures comment. Clin Ther. 2014;36(5):648–62.

    Article  PubMed  PubMed Central  Google Scholar 

  54. Reise SP, Waller NG, Comrey AL. Factor analysis and scale revision. Psychol Assess. 2000;12(3):287–97.

    Article  CAS  PubMed  Google Scholar 

  55. Gorsuch RL. Exploratory factor analysis: its role in item analysis. J Pers Assess. 1997;68(3):532–60.

    Article  CAS  PubMed  Google Scholar 

  56. Asparouhov T, Muthen B. Exploratory structural equation modeling. Struct Equation Modeling-a Multidisciplinary J. 2009;16(3):397–438.

    Article  Google Scholar 

  57. Kahn JH. Factor analysis in counseling psychology research, training, and practice: principles, advances, and applications. Couns Psychol. 2006;34(5):684–718.

    Article  Google Scholar 

  58. Lv J, Xue J, Luo Y, Zhang Y. Item screening about PRO Scale of Chronic Heart failure. Chin J Health Stat. 2014;31(3):379–82.

    Google Scholar 

  59. Zhu L, Kong J, Zheng Y, Song M, Cheng X, Zhang L, et al. Development and initial validation of the chronic hepatitis B quality of life instrument (CHBQOL) among Chinese patients. Qual Life Res. 2019;28(11):3071–81.

    Article  PubMed  Google Scholar 

  60. Lutomski JE, Krabbe PFM, den Elzen WPJ, Olde-Rikkert MGM, Steyerberg EW, Muntinga ME, et al. Rasch analysis reveals comparative analyses of activities of daily living/instrumental activities of daily living summary scores from different residential settings is inappropriate. J Clin Epidemiol. 2016;74:207–17.

    Article  PubMed  Google Scholar 

  61. Pallant JF, Miller RL, Tennant A. Evaluation of the Edinburgh Post Natal Depression Scale using Rasch analysis. BMC Psychiatry. 2006;6.

  62. Flora DB. Your coefficient alpha is probably wrong, but which Coefficient omega is right? A tutorial on using R to obtain better reliability estimates. Adv Methods Practices Psychol Sci. 2020;3(3):484–501.

    Article  Google Scholar 

  63. Moor N, de Graaf PM, Komter A. Family, welfare state generosity and the vulnerability of older adults: a cross-national study. J Aging Stud. 2013;27(4):347–57.

    Article  PubMed  Google Scholar 

  64. Huang Y. Family relations and life satisfaction of older people: a comparative study between two different hukous in China. Ageing Soc. 2012;32:19–40.

    Article  Google Scholar 

Download references


We would like to thank the patients for their participation. We are also thankful to physicians and nurses who participated in this study for their kind help during data collection.


This study was supported by the National Natural Science Foundation of China, Grant Number 72174177.

Author information

Authors and Affiliations



HMW, DLP, and TCE conceived and designed the study. YJZ contributed to research design, data collection and statistical analysis. RZ assisted in data collection and drafted the manuscript. JYY, JZ, RJG, BJW, and BHM participated in data collection. HMW, DLP, TCE, RZ, YJZ, and BJW finalized the manuscript. All authors have read and agreed to the final version of the manuscript.

Corresponding author

Correspondence to Hong-Mei Wang.

Ethics declarations

Ethics approval and consent to participate

This study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Institutional Review Board of Zhejiang University (approval number: ZGL202007-03). Informed consent was obtained from all individual participants included in the study.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Rights and permissions

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Zhou, R., Zheng, YJ., Wang, BJ. et al. Development and validation of the patient-reported outcome for older people living with HIV/AIDS in China (PROHIV-OLD). Health Qual Life Outcomes 22, 30 (2024).
