Development and calibration of a novel social relationship item bank to measure health-related quality of life (HRQoL) in Singapore

Background: Social relationships (SR) are an important domain of health-related quality of life. We developed and calibrated a novel item bank to measure SR in Singapore, a multi-ethnic city in Southeast Asia.
Methods: We developed an initial candidate pool of 51 items from focus groups, individual in-depth interviews and existing instruments that had been developed and/or validated for use in Singapore. We administered all items in English to a multi-stage sample of subjects, stratified for age and gender, with and without medical conditions, recruited from community and hospital settings. We calibrated their responses using Samejima’s Graded Response Model (SGRM). We evaluated a final 30-item bank with respect to Item Response Theory (IRT) model assumptions, model fit, differential item functioning (DIF), and concurrent and known-groups validity.
Results: Among 503 participants (47.7% male, 41.4% above 50 years old, 34.0% Chinese, 33.6% Malay and 32.4% Indian), bi-factor model analyses supported essential unidimensionality: explained common variance of the general factor was 0.805 and omega hierarchical was 0.98. Local independence was deemed acceptable: the average absolute residual correlations were < 0.06 and 1.8% of the total item-pair residuals were flagged for local dependence. The overall SGRM model fit was adequate (p = 0.146). Five items exhibited DIF with respect to age, ethnicity and education, but were retained without modification of scores because they measured important aspects of SR. The SR scores correlated in the hypothesized direction with a self-reported measure of global health (Spearman’s rho = −0.28, p < 0.001).
Conclusion: The 30-item SR item bank has shown acceptable psychometric properties. Future studies are needed to evaluate the validity of SR scores when items are administered adaptively.
Electronic supplementary material The online version of this article (10.1186/s12955-019-1150-9) contains supplementary material, which is available to authorized users.


Introduction
The World Health Organization (WHO) states that health is a state of complete physical, mental and social well-being, and not merely the absence of disease or infirmity. [1] Social relationships (SR) are defined as having deep and meaningful human connections; in other words, having good relationships with family, friends and others. [2,3] SR has been found to be an important determinant of health-related quality of life (HRQoL). [4] Although there are static instruments such as the Lubben Social Network Scale (LSNS) to measure SR, there is no item bank to measure SR in the adult population. [5] Item banks have previously been developed to measure social-related constructs such as social health. [6,7] One example, an item bank measuring social health in the adult general population, was developed on a very diverse latent construct that involved social role participation, social network quality, social integration and interpersonal communication. [6] Such an item bank may not be optimal for measuring social relationships meaningfully. Being able to measure how deep and meaningful an individual's social relationships are will facilitate the development or refinement of interventions to improve SR. [8] To address this gap, we developed a comprehensive and culturally sensitive SR item bank to measure SR in Singapore. The aim of this study was to calibrate an SR item bank that includes important and culturally appropriate items and that can be used across different age, gender and ethnic groups. A successfully calibrated item bank will allow us to develop computerized adaptive tests (CATs) or short static instruments to measure SR in Singapore, whose multi-ethnic, English-speaking population is in some ways a microcosm of Asia.

Methods
This institutional review board-approved study (Ref 2014/916/A) consisted of the following sequential steps: development of a candidate item bank; administration of this candidate item bank via a community and hospital-based survey; and item bank calibration, which comprised assessing the assumptions of item response theory (IRT), fitting the responses to an IRT model, testing for differential item functioning (DIF) and testing the SR scores of the item bank against a priori hypotheses. In this manuscript, we describe the details of the SR item bank calibration. The development of the candidate item bank has been described separately and is briefly summarised below. [3,9,10]

Development of a candidate item bank
Methodological details for developing candidate items have been reported separately. [3,9,10] In brief, we adapted the PROMIS Qualitative Item Review (QIR) protocol [11], with input and endorsement from expert panels (comprising patients, members of the general public, and experts in psychology, social work and psychometrics). Items were generated from thematic analyses of focus groups and in-depth interviews, and from a literature search to identify studies that developed or validated a health-related quality of life instrument among adults in Singapore. Items from these sources were "binned" and "winnowed" (as detailed in the PROMIS QIR protocol) by two independent reviewers, blinded to the source of the items, who harmonized their selections to generate a list of candidate items (each item representing a subdomain). An expert panel reviewed and refined the face and content validity of these candidate items.

A community and hospital-based survey
We recruited English- and Mandarin-speaking Singapore citizens or permanent residents from the community and from the specialist outpatient clinics of Singapore General Hospital and National Heart Centre Singapore to sample subjects with and without illnesses, who would be expected to have a wider spectrum of social relationships. Within each language sampling frame, a purposive sample of participants was drawn based on age, gender, ethnicity and presence or absence of chronic illnesses. The list of chronic illnesses was based on the Singapore Burden of Disease Study [12] and is detailed in Additional file 1: Table S1. The presence or absence of a chronic illness was based on a participant's self-report of having been diagnosed with an illness by a physician. Participants were categorized as well, mildly unwell or unwell, according to the number and severity of chronic illnesses. We excluded individuals who had impairments that precluded a meaningful exchange of ideas or other conditions that prevented them from completing a normal interview, such as severe mental illness and cognitive impairment. In order to include participants with a wide spectrum of health, we predefined the proportions of participants to be recruited in each health category as 35% well, 15% mildly unwell and 50% unwell.
Participants from the community were sampled using a proprietary sampling frame of public housing, which accounts for 82% of Singapore's residential households [13]. The primary sampling units were plots of land with approximately equal numbers of households, stratified according to geographic location and dwelling type. Households in each primary sampling unit were selected based on fixed route rules and skip patterns based on pre-specified ethnic and age quotas. Only one respondent per household was selected for a face-to-face interview. Three call attempts to each household were made at different times of the day, with at least one visit on a non-work day (Saturday or Sunday). This residential-household-based sampling method has been used in the Singapore National Health Survey since 2004 [14,15]. The response rate of the study was computed using the standard set by the Council of American Survey Research Organizations [16], generally defined as the number of completed interviews divided by the number of eligible reporting units in the sample. We engaged a research company to conduct the standardized surveys on behalf of the study team.
We recruited participants to test responses to all three of our item banks (Physical Function, Positive Mindset and Social Relationship). Each recruited participant was administered the items for only one of the three domains, in either English or Mandarin. The survey was administered by trained interviewers. We chose an interviewer-administered rather than a self-administered survey so that illiterate subjects (who form 20% of Singapore's population) could be included and the resulting item bank could be applied to all English and Mandarin speakers in Singapore. [17] Participants were presented with 51 candidate items with 5-level item response options adapted from PROMIS. The response options were "Never", "Seldom", "Sometimes", "Usually" and "Always" for items on frequency and "Not at all", "Mildly", "Moderately", "Quite a lot" and "Extremely" for items on intensity. We collected demographic information including age, gender, ethnicity, education and current marital status. We also collected a single-item, participant-reported assessment of global health for comparison.

Item bank calibration
We adapted the methodology published by PROMIS to calibrate the SR item bank. To assess IRT model assumptions, we performed the following. For unidimensionality, we used exploratory factor analysis (EFA), confirmatory factor analysis (CFA) and exploratory bifactor analyses with orthogonal rotation. If EFA and CFA indicated secondary dimensions, we provided details of the latter. In the bifactor analyses, we used (1) the average relative parameter bias (ARPB), which is the mean of the absolute differences between item loadings on the unidimensional model and item loadings on the bifactor model's general factor [18], (2) the explained common variance (ECV) of the general factor, (3) omega hierarchical (omegaH) and (4) item ECVs (IECVs) to judge whether the item bank could still be interpreted as predominantly unidimensional despite any secondary dimensions. For local independence, we examined the residual correlation matrix from the single-factor CFA and, where applicable, the residual correlation matrices from the bifactor analyses. We state the criteria and thresholds for appraising IRT model assumptions in Table 2. We used Mplus Version 8.0 software to verify unidimensionality and local independence [19].
We adopted Samejima's graded response model (GRM) and estimated parameters via marginal maximum likelihood using the Xcalibre 4.2 IRT software (Assessment Systems Corporation, USA). We checked the adequacy of the overall model fit and individual item fits using a chi-square-based fit statistic. We examined differential item functioning (DIF) across the following subgroups: age (< 50 versus ≥ 50 years), gender (male versus female), ethnicity (Chinese versus non-Chinese) and education (completers versus non-completers of secondary education), by means of likelihood ratio chi-square statistics from nested ordinal logistic regression models, assessing the incremental contribution of subgroup membership at a 5% level of significance. We assessed both uniform and non-uniform DIF using a specially written syntax for IBM SPSS Statistics Version 23.0 (http://www-01.ibm.com/support/docview.wss?uid=swg21572191, downloaded on 18 December 2017).
We evaluated the 30 SR items for concurrent validity using a self-reported measure of global health (1 = Excellent health, 2 = Very good, 3 = Good, 4 = Fair, 5 = Poor), positing a moderate negative correlation (Spearman's rho < −0.25) between SR theta scores and the global health self-report. We also verified that adjusted means across the ordered global health categories showed a decreasing trend. Adjustment was made for participant's age (20-35, 36-49, 50 and above), gender (male/female), completion of secondary education (yes/no) and current marital status (single, married, divorced/widowed/separated). We used a 5% significance level. Evaluations of concurrent validity were implemented in IBM SPSS Statistics Version 25.0 software.
Fig. 1 The theta range for the SR item bank
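To make the calibration model concrete, the following is a minimal sketch of how Samejima's GRM turns an item's discrimination and ordered thresholds into probabilities for the five response categories. The function name and the parameter values are illustrative assumptions, not estimates from this study or the Xcalibre implementation.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Samejima GRM: probability of each ordered response category.

    theta: latent trait level; a: item discrimination;
    thresholds: increasing list of K-1 boundary locations for K categories.
    """
    # Cumulative curves P(X >= k) for k = 1..K-1, each a 2PL logistic function.
    cum = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    # Pad with P(X >= 0) = 1 and P(X >= K) = 0, then difference adjacent terms.
    bounds = [1.0] + cum + [0.0]
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]

# Hypothetical 5-category item (values are illustrative, not from Table 4).
probs = grm_category_probs(theta=0.0, a=1.8, thresholds=[-2.5, -1.5, -0.5, 0.7])
print([round(p, 3) for p in probs])
```

Because the thresholds are ordered, the cumulative curves are ordered too, so every category probability is positive and the five probabilities sum to one.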

Results
Of 8027 contacted subjects, 4918 were eligible (see Additional file 1: Figure S1 for details). We implemented a quota system for eligible subjects, as a result of which 41.2% (2034/4918) of eligible subjects were surveyed, while 2851 eligible subjects were excluded as their quotas had been met. All quotas set for sociodemographic categories were met to within 5%. Thus a total of 2034 Singapore citizens or permanent residents (consisting of 1170 subjects from hospital-based specialist outpatient clinics and 864 subjects from the community) completed one of three item banks (SR, physical functioning and positive mindset), of whom 679 subjects completed the SR item bank survey in English (n = 503) or Chinese (n = 173). This paper focuses on the analysis and calibration of the English SR item bank. Characteristics of the study participants are shown in Table 1. The full range of theta of the SR item bank is presented in Fig. 1.

Item analyses
Thirty of 51 candidate items were retained in the final SR item bank after reviewing initial IRT model fits and adequacy checks and consulting with the expert panel. The 30 items showed a very high inter-item consistency with a Cronbach's alpha of 0.96. Item means varied from 2.76 to 4.58 with a mean of 4.24 and standard deviation of 0.36. The mean item-to-total score correlation was 0.65 (SD = 0.12). The percentage of non-response did not exceed 0.2%.
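As an illustration of the inter-item consistency statistic reported above, Cronbach's alpha can be sketched as follows. The function name and the toy responses are assumptions for illustration only; they are not the study data, and the sketch assumes complete responses.

```python
from statistics import pvariance

def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (no missing values)."""
    k = len(rows[0])  # number of items
    # Variance of each item across respondents.
    item_vars = [pvariance([r[j] for r in rows]) for j in range(k)]
    # Variance of the summed score across respondents.
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)

# Hypothetical responses from five people to three 5-level items.
toy = [[4, 5, 4], [3, 4, 4], [5, 5, 5], [2, 3, 2], [4, 4, 5]]
print(round(cronbach_alpha(toy), 3))
```

Alpha rises towards 1 as the items covary more strongly relative to their individual variances; two perfectly correlated items give exactly 1.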

IRT assumptions of unidimensionality and local independence
Unidimensionality was evaluated with EFA, CFA and bifactor analyses. In the EFA, the first factor accounted for 18.1% of the variance and the ratio of the first to the second eigenvalue was 8.01 (Table 2). In the CFA, the results indicated the presence of secondary dimensions based on a Comparative Fit Index (CFI) < 0.95, Tucker-Lewis Index (TLI) < 0.95 and Root Mean Square Error of Approximation (RMSEA) > 0.06 (Table 2). In light of the EFA and CFA results, we pursued exploratory bifactor analyses specified with two, three and four specific factors. The results showed that the presence of secondary dimensions did not impede the interpretation of the item bank as being predominantly unidimensional: the ARPB was < 10% [18], and the minimum ECV and omegaH values were 0.80 and 0.98 respectively, much higher than the criteria suggested by Reise et al. (ECV > 0.60 and omegaH > 0.70) [20]. Therefore, the item bank can be regarded as essentially unidimensional. This interpretation was reinforced by the mean item ECVs, which were mostly above 0.80 (Table 3). Inspection of the single-factor CFA residual correlation matrix revealed little local dependence: the mean of the residual correlations was < 0.07, below the 0.10 threshold. The proportion of item pairs having problematic residual correlations (i.e., greater than 0.20) was 1.8% (8 of 435). Items 1 ("I have a good relationship with my family") and 16 ("I keep in touch with my friends") accounted for 4 of the 8 problematic residual correlations. Examination of the bifactor residual correlation matrices (across models with two, three and four specific factors) showed that the maximum mean residual correlation was 0.026, which is less than the threshold of 0.10. In all three bifactor models, the percentage of problematic residual correlations was < 0.1%. We thus judged the degree of local dependence to be so slight as not to bias the accuracy of IRT parameter estimation.
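For readers unfamiliar with the bifactor indices used here, the sketch below shows how ECV and omegaH are commonly computed from standardized bifactor loadings. It assumes orthogonal factors and standardized loadings; the function name and the loading values are hypothetical and are not taken from Table 3.

```python
def ecv_and_omega_h(general, specifics):
    """general: general-factor loading for each item.
    specifics: one list per specific factor, giving each item's loading
    (0.0 where an item does not load on that factor)."""
    g_sq = sum(l * l for l in general)
    s_sq = sum(l * l for factor in specifics for l in factor)
    # ECV: share of common variance attributable to the general factor.
    ecv = g_sq / (g_sq + s_sq)
    # Unique variance per item = 1 - communality (standardized loadings).
    uniq = [
        1.0 - general[i] ** 2 - sum(f[i] ** 2 for f in specifics)
        for i in range(len(general))
    ]
    g_part = sum(general) ** 2
    s_part = sum(sum(f) ** 2 for f in specifics)
    # omegaH: share of total-score variance due to the general factor.
    omega_h = g_part / (g_part + s_part + sum(uniq))
    return ecv, omega_h

# Hypothetical 4-item example with two specific factors.
ecv, omega_h = ecv_and_omega_h(
    general=[0.8, 0.8, 0.8, 0.8],
    specifics=[[0.3, 0.3, 0.0, 0.0], [0.0, 0.0, 0.3, 0.3]],
)
print(round(ecv, 3), round(omega_h, 3))
```

Values above the cited criteria (ECV > 0.60 and omegaH > 0.70) indicate that the general factor dominates, which is the sense in which an item bank is called essentially unidimensional.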

IRT calibration and fit
SR items were summed so that higher scores reflected better social relationships. The overall fit of the GRM was found to be adequate (chi-square = 1710.53, df = 1650, p = 0.146). The items and parameter estimates are given in Table 4. Setting the level of significance at 0.01 for GRM item fit, the model did not fit well for three items: Items 34 ("I know that I have someone to help me when I have financial difficulties."), 20 ("I spend time with my friends.") and 50 ("Overall, I am satisfied with the support I give to others."). For all other items, p values ranged from 0.03 to 1.00 with a mean of 0.55.
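The reported overall fit p value can be checked approximately from the chi-square statistic and its degrees of freedom. The sketch below uses the Wilson-Hilferty normal approximation, which is an assumption of this illustration (chosen because it needs only the standard library and is accurate for large degrees of freedom); it is not the method used by Xcalibre.

```python
import math

def chi2_sf_wilson_hilferty(x, df):
    """Approximate upper-tail p-value of a chi-square statistic (large df)."""
    # Cube-root transform of x/df is approximately normal.
    mean = 1.0 - 2.0 / (9.0 * df)
    sd = math.sqrt(2.0 / (9.0 * df))
    z = ((x / df) ** (1.0 / 3.0) - mean) / sd
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Overall GRM fit reported above: chi-square = 1710.53, df = 1650.
p = chi2_sf_wilson_hilferty(1710.53, 1650)
print(round(p, 3))
```

Under this approximation the result agrees closely with the reported p = 0.146, so the statistic, degrees of freedom and p value are mutually consistent.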

Differential item functioning detection
At the 1% level of significance, none of the items had gender-related DIF but five items were found to have significant DIF in age, ethnicity and education. The two items with non-uniform age-related DIF were Items 10 ("I take care of my family.") and 51 ("Overall, I am satisfied with how well I communicate with others."). In ethnicity, Items 18 ("I have gatherings with my circle of friends.") and 8 ("My family is willing to give me information when I need it.") were respectively found to have uniform and non-uniform DIF. In education, both Items 10 and 16 ("I keep in touch with my circle of friends.") displayed non-uniform DIF.
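The nested-model logic behind uniform and non-uniform DIF can be sketched as below. This assumes the commonly used three-model layout for a binary grouping variable (trait only; trait plus group; trait plus group plus trait-by-group interaction), so each comparison has one degree of freedom. The log-likelihood values are hypothetical, and the model fitting itself (done in SPSS in this study) is not shown.

```python
import math

def lr_test_df1(loglik_reduced, loglik_full):
    """Likelihood-ratio chi-square and p-value for nested models
    that differ by exactly one parameter."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    # Upper tail of chi-square with 1 df: P(Z^2 > stat) = erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p

# Hypothetical log-likelihoods from three nested ordinal logistic models.
ll_theta_only = -612.4        # item response ~ theta
ll_plus_group = -611.9        # item response ~ theta + group
ll_plus_interaction = -607.2  # item response ~ theta + group + theta x group

uniform = lr_test_df1(ll_theta_only, ll_plus_group)
non_uniform = lr_test_df1(ll_plus_group, ll_plus_interaction)
print("uniform DIF:", [round(v, 4) for v in uniform])
print("non-uniform DIF:", [round(v, 4) for v in non_uniform])
```

A significant gain from adding the group term suggests uniform DIF, while a significant gain from adding the interaction term suggests non-uniform DIF.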

Concurrent validity evaluation
The Spearman correlation between SR scores and self-reported global health was r = −0.28 (95% CI: −0.359 to −0.196), supporting the hypothesis of a moderate correlation between the two measures. After accounting for age, gender, completion of secondary education and current marital status, the adjusted means of the ordered categories of global health likewise showed a decreasing trend (Table 5). Both findings supported the concurrent validity of the SR item bank.
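Because global health is a 5-level ordinal rating, ties are unavoidable, and Spearman's correlation is then the Pearson correlation of average ranks. The following is a minimal sketch of that calculation; the function names and the toy values are assumptions for illustration and are not the study data.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical SR theta scores and global health (1 = Excellent ... 5 = Poor).
theta = [1.2, 0.4, -0.3, 0.9, -1.1, 0.1, -0.6, 0.7]
health = [2, 3, 4, 1, 5, 3, 3, 2]
print(round(spearman_rho(theta, health), 3))
```

With health coded so that higher numbers mean poorer health, a negative rho indicates that better social relationships go with better self-rated health, which is the hypothesized direction.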

Discussion
This study describes the calibration of a culturally sensitive item bank for SR. Items in this SR item bank were derived from (1) qualitative research to identify and incorporate perspectives from subjects in the population, representing a wide spectrum of healthy and ill subjects (with chronic diseases) and (2) items from existing static instruments measuring related concepts in the same population. The item bank we developed thus has high content validity. The calibration processes aligned with the approach espoused by the PROMIS group [20-25]. The findings of this successful calibration indicate that this social relationship item bank is a promising tool for measuring SR. The analyses of the IRT assumptions show that the assumptions of essential unidimensionality and local independence are met. The bifactor model results exceeded the recommended thresholds. [21] DIF tests for age, ethnicity and education identified five items; however, the impact of DIF was modest. In item bank development, statistical methods are used to inform, and not to decide, item selection. [26] Therefore, these items were retained because of their importance and the modest impact of DIF [27,28].
SR is a novel construct which has a wide-ranging impact on health, and its measurement is thus important for improving health. For example, high SR has been shown to improve social support and ameliorate the impact of diseases on overall health. [29] High SR has also been shown to be associated with low mortality, improved immune function and delayed development of cardiovascular disease. [30] Given this, the SR item bank has several potential uses, for example as an outcome measure for individual- or family-based cohort studies or interventional trials in community or hospital-based settings. [31] This study also supports the concurrent construct validity of the SR item bank. Our hypothesis testing showed a moderate correlation between the SR scores and self-reported global health. Good social relationships may contribute to better health status through stronger social support. [32] Another possible use of the SR item bank may therefore be to screen for people with poor social support and intervene as appropriate. However, further studies are needed to validate the SR item bank as a screening tool.
We recognize several limitations of this study. First, a significant number of eligible subjects were excluded because the quota for these subjects had been met. However, partly because of the use of quota sampling, the demographics of our sample are comparable to those of the population in Singapore. [33] Second, the SR item bank may have poorer coverage of higher SR trait levels but better coverage of lower SR trait levels. The SR item bank will therefore be most useful for identifying people at risk of impaired social relationships or people who are in need of social support. [32]

Conclusions
We developed and calibrated a 30-item bank for SR that is relevant to the Singaporean population and applicable to healthy adults and those having chronic illnesses. This item bank shows promise and will subsequently be used to develop relevant short-form tests or CATs to facilitate routine clinical use.

Additional file
Additional file 1: Table S1. Chronic Illnesses qualifying for patient recruitment. Figure S1