Development and calibration of a novel social relationship item bank to measure health-related quality of life (HRQoL) in Singapore
Health and Quality of Life Outcomes volume 17, Article number: 82 (2019)
Social relationships (SR) are an important domain of health-related quality of life. We developed and calibrated a novel item bank to measure SR in Singapore, a multi-ethnic city in Southeast Asia.
We developed an initial candidate pool of 51 items from focus groups, individual in-depth interviews and existing instruments that had been developed and/or validated for use in Singapore. We administered all items in English to a multi-stage sample of subjects, stratified for age and gender, with and without medical conditions, recruited from community and hospital settings. We calibrated their responses using Samejima’s Graded Response Model (SGRM). We evaluated a final 30-item bank with respect to Item Response Theory (IRT) model assumptions, model fit, differential item functioning (DIF), and concurrent and known-groups validity.
Among 503 participants (47.7% male, 41.4% above 50 years old, 34.0% Chinese, 33.6% Malay and 32.4% Indian), bi-factor model analyses supported essential unidimensionality: explained common variance of the general factor was 0.805 and omega hierarchical was 0.98. Local independence was deemed acceptable: the average absolute residual correlations were < 0.06 and 1.8% of the total item-pair residuals were flagged for local dependence. The overall SGRM model fit was adequate (p = 0.146). Five items exhibited DIF with respect to age, ethnicity and education, but were retained without modification of scores because they measured important aspects of SR. The SR scores correlated in the hypothesized direction with a self-reported measure of global health (Spearman’s rho = − 0.28, p < 0.001).
The 30-item SR item bank has shown acceptable psychometric properties. Future studies to evaluate the validity of SR scores when items are administered adaptively are needed.
The World Health Organization (WHO) defines health as a state of complete physical, mental and social well-being, and not merely the absence of disease or infirmity. SR is defined as having deep and meaningful human connections – in other words, having good relationships with family, friends and others. [2, 3] The literature identifies SR as an important determinant of health-related quality of life (HRQoL). Although static instruments such as the Lubben Social Network Scale (LSNS) exist to measure SR, there is no item bank to measure SR in the adult population.
Item banks have previously been developed to measure social-related constructs such as social health. [6, 7] One example is an item bank measuring social health in the adult general population, which was built on a very diverse latent construct spanning social role participation, social network quality, social integration and interpersonal communication. Such an item bank may not be optimal for meaningfully measuring social relationships. Being able to measure how deep and meaningful an individual’s social relationships are will facilitate the development or refinement of interventions to improve SR.
To address the gap, we developed a comprehensive and culturally sensitive SR item bank to measure SR in Singapore. The aim of this study was to calibrate an item bank of SR that includes important and culturally appropriate items measuring SR that can be used across different age, gender and ethnic groups. A successfully calibrated item bank will allow us to develop computerised adaptive tests (CATs) or short static instruments to measure SR in Singapore, whose multi-ethnic, English-speaking population is in some ways a microcosm of Asia.
This institutional review board-approved study (Ref 2014/916/A) consisted of the following sequential steps: development of a candidate item bank, administration of this candidate item bank via a community and hospital-based survey, and item bank calibration through assessing the assumptions of item response theory (IRT), fitting the responses to an IRT model, testing for differential item functioning (DIF) and testing the SR scores of the item bank using a priori hypotheses. In this manuscript, we describe the details of the SR item bank calibration. The development of the candidate item bank has been described separately and is briefly summarised below. [3, 9, 10]
Development of a candidate item bank
Methodological details for developing candidate items have been reported separately. [3, 9, 10] In brief, we adapted the PROMIS Qualitative Item Review (QIR) protocol, with input and endorsement from expert panels (comprising patients, members of the general public, and experts in psychology, social work and psychometrics). Items were generated from thematic analyses of focus groups and in-depth interviews, and from a literature search to identify studies that developed or validated a health-related quality of life instrument among adults in Singapore. Items from these sources were “binned” and “winnowed” (as detailed in the PROMIS QIR protocol) by two independent reviewers, blinded to the source of the items, who harmonized their selections to generate a list of candidate items (each item representing a sub-domain). An expert panel reviewed and refined the face and content validity of these candidate items.
A community and hospital-based survey
We recruited English- and Mandarin-speaking Singapore citizens or permanent residents from the community and from the specialist outpatient clinics of Singapore General Hospital and National Heart Centre Singapore to sample subjects with and without illnesses, who would be expected to have a wider spectrum of social relationships. Within each language sampling frame, a purposive sample of participants was drawn based on age, gender, ethnicity and presence or absence of chronic illnesses. The list of chronic illnesses was based on the Singapore Burden of Disease Study and is detailed in Additional file 1: Table S1. The presence or absence of a chronic illness was based on a participant’s self-report of having been diagnosed with an illness by a physician. Participants were categorized into well, mildly unwell, and unwell, according to the number and severity of chronic illnesses. We excluded individuals who had impairments that precluded a meaningful exchange of ideas or other conditions that prohibited them from carrying out a normal interview, such as severe mental illness and cognitive impairment. In order to include participants with a wide spectrum of health, we predefined the proportion of participant recruitment in health categories to be 35% well, 15% mildly unwell, and 50% unwell.
Participants from the community were sampled using a proprietary sampling frame of public housing, which accounts for 82% of Singapore residential households. The primary sampling units were plots of land with approximately equal numbers of households, stratified according to geographic location and dwelling type. Households in each primary sampling unit were selected based on fixed route rules and skip patterns based on pre-specified ethnic and age quotas. Only one respondent per household was selected for a face-to-face interview. Three call attempts to each household were made at different times of the day, with at least 1 visit on a non-work day (Saturday or Sunday). This residential-household-based sampling method has been used in the Singapore National Health Survey since 2004 [14, 15]. The response rate of the study was computed using the standard set by the Council of American Survey Research Organizations, generally defined as the number of completed interviews divided by the number of eligible reporting units in the sample. We engaged a research company to conduct the standardized surveys on behalf of the study team.
We recruited participants to test responses to all 3 of our item banks (Physical Function, Positive Mindset and Social Relationship). Each recruited participant was administered the items for only one of the three domains, in either English or Mandarin. The survey was administered by trained interviewers. We chose to have the survey interviewer-administered rather than self-administered so that illiterate subjects (who form 20% of Singapore’s population) could be included and the resulting item bank could be applied to all English and Mandarin speakers in Singapore. The 51 candidate items were presented to the participants with 5-level item response options adapted from PROMIS. The response options were “Never”, “Seldom”, “Sometimes”, “Usually” and “Always” for items on frequency and “Not at all”, “Mildly”, “Moderately”, “Quite a lot” and “Extremely” for items on intensity. We collected demographic information including age, gender, ethnicity, education, and current marital status. We collected a single-item, participant-reported assessment of global health for comparison.
Item bank calibration
We adapted the methodology published by PROMIS to calibrate the SR item bank. To assess Item Response Theory (IRT) model assumptions, we performed the following. For unidimensionality, we used exploratory factor analysis (EFA), confirmatory factor analysis (CFA) and exploratory bifactor analyses with orthogonal rotation. If EFA and CFA indicated secondary dimensions, we provided details of the bifactor analyses. In the bifactor analyses, we used (1) the average relative parameter bias (ARPB), which is the mean of the absolute relative differences between item loadings in the unidimensional model and item loadings on the bifactor model’s general factor, (2) the explained common variance (ECV) of the general factor, (3) omega hierarchical (omegaH) and (4) item ECVs (IECVs) to judge whether secondary dimensions prevented the instrument from being interpreted as predominantly unidimensional. For local independence, we examined the residual correlation matrix from the single-factor CFA and, where applicable, the residual correlation matrices from the bifactor analyses. The criteria and thresholds for appraising IRT model assumptions are stated in Table 2. We used Mplus Version 8.0 software to verify unidimensionality and local independence. We adopted Samejima’s graded response model (GRM) and estimated parameters via marginal maximum likelihood using the Xcalibre 4.2 IRT software (Assessment Systems Corporation, USA). We checked the adequacy of the overall model fit and individual item fits using a chi-square-based fit statistic. We examined differential item functioning (DIF) across the following subgroups: age (< 50 versus ≥ 50 years), gender (male versus female), ethnicity (Chinese versus non-Chinese) and education (completers versus non-completers of secondary education). DIF was tested by means of likelihood-ratio chi-square statistics from nested ordinal logistic regression models, assessing the incremental contribution of subgroup membership at a 5% level of significance.
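The nested-model logic of the DIF test can be illustrated with a short sketch. It assumes the usual three-model sequence for ordinal logistic regression DIF (trait only; trait plus group for uniform DIF; trait plus group plus their interaction for non-uniform DIF) and a dichotomous grouping variable, so that each step adds one parameter. The log-likelihood values and function names are hypothetical and serve only to show the arithmetic; they are not results from this study.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def lr_test(loglik_reduced, loglik_full):
    """Likelihood-ratio statistic and p-value for nested models differing by one parameter."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf_1df(max(stat, 0.0))

# Hypothetical maximised log-likelihoods for a single item (illustrative only).
ll_m1 = -612.40  # item response ~ trait
ll_m2 = -611.90  # item response ~ trait + group            (uniform DIF term)
ll_m3 = -607.10  # item response ~ trait + group + trait*group (non-uniform DIF term)

uniform_stat, uniform_p = lr_test(ll_m1, ll_m2)
nonuniform_stat, nonuniform_p = lr_test(ll_m2, ll_m3)

alpha = 0.05  # significance level stated in the methods
print(f"uniform DIF:     LR = {uniform_stat:.2f}, p = {uniform_p:.4f}, flagged = {uniform_p < alpha}")
print(f"non-uniform DIF: LR = {nonuniform_stat:.2f}, p = {nonuniform_p:.4f}, flagged = {nonuniform_p < alpha}")
```

In this made-up example the group main effect adds little to the fit, whereas the interaction term does, so the item would be flagged for non-uniform DIF only.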
We assessed both uniform and non-uniform DIF using a specially written syntax for IBM SPSS Statistics Version 23.0 (http://www-01.ibm.com/support/docview.wss?uid=swg21572191, downloaded on 18 December 2017). We evaluated the 30 SR items for concurrent validity using a self-reported measure of global health (1 = Excellent health, 2 = Very good, 3 = Good, 4 = Fair, 5 = Poor), positing a moderate negative correlation (Spearman’s rho < − 0.25) between SR theta scores and the global health self-report. We also verified that adjusted means of global health categories showed a decreasing trend. Adjustment was made for participants’ age (20–35, 36–49, 50 and above), gender (Male/Female), completion of secondary education (Yes/No) and current marital status (Single, Married, Divorced/Widowed/Separated). We used a 5% significance level. Evaluations of concurrent validity were implemented in IBM SPSS Statistics Version 25.0 software.
Of 8027 contacted subjects, 4918 were eligible (see Additional file 1: Figure S1 for details). We implemented a quota system for eligible subjects, as a result of which 41.2% (2034/4918) of eligible subjects were surveyed, while 2851 eligible subjects were excluded as their quotas had been met. All quotas set for sociodemographic categories were achieved to within 5%. Thus a total of 2034 Singapore citizens or permanent residents (consisting of 1170 subjects from hospital-based specialist outpatient clinics and 864 subjects from the community) completed one of 3 item banks (SR, physical functioning, and positive mindset), of whom 679 subjects completed the SR item bank survey in English (n = 503) or Chinese (n = 173). This paper focuses on the analysis and calibration of the English SR item bank. Characteristics of the study participants are shown in Table 1. The full range of theta of the SR item bank is presented in Fig. 1.
Thirty of 51 candidate items were retained in the final SR item bank after reviewing initial IRT model fits and adequacy checks and consulting with the expert panel. The 30 items showed a very high inter-item consistency with a Cronbach’s alpha of 0.96. Item means varied from 2.76 to 4.58 with a mean of 4.24 and standard deviation of 0.36. The mean item-to-total score correlation was 0.65 (SD = 0.12). The percentage of non-response did not exceed 0.2%.
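For readers less familiar with the internal-consistency statistic reported above, the following minimal sketch computes Cronbach’s alpha from a respondent-by-item score matrix. The function name and the small data set (five respondents, four items) are hypothetical and are not taken from the study; complete data are assumed.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of the summed score.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)

# Hypothetical 5-level responses (rows = respondents, columns = items).
example = [
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 4],
]
alpha_example = cronbach_alpha(example)
print(f"Cronbach's alpha = {alpha_example:.3f}")
```

Alpha rises with both the number of items and the average inter-item correlation, so a high value for a 30-item bank does not by itself establish unidimensionality.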
IRT assumptions of unidimensionality and local independence
Unidimensionality was evaluated with EFA, CFA and bifactor analyses. In the EFA, the first factor accounted for 18.1% of the variance and the ratio of the first to the second eigenvalue was 8.01 (Table 2). In the CFA, the results indicated the presence of secondary dimensions based on a Comparative Fit Index (CFI) < 0.95, Tucker-Lewis Index (TLI) < 0.95 and Root Mean Square Error of Approximation (RMSEA) > 0.06 (Table 2). In the light of the EFA and CFA results, we pursued exploratory bifactor analyses specified with two, three and four specific factors. The results showed that the presence of secondary dimensions did not impede the interpretation of the item bank as being predominantly unidimensional: the ARPB was < 10%, and the minimum ECV and omegaH values were 0.80 and 0.98 respectively, well above Reise et al’s suggested criteria (ECV > 0.60 and omegaH > 0.70). Therefore, the item bank can be regarded as essentially unidimensional. This interpretation was reinforced by mean item ECVs, which were mostly above 0.80 (Table 3). Inspection of the single-factor CFA residual correlation matrix revealed little local dependence: the mean of the residual correlations was < 0.07, below the 0.10 threshold. The proportion of item pairs with problematic residual correlations (i.e., greater than 0.20) was 1.8% (8 of 435). Items 1 (“I have a good relationship with my family”) and 16 (“I keep in touch with my friends”) accounted for 4 of the 8 problematic residual correlations. Examination of the bifactor residual correlation matrices (across models with two, three and four specific factors) showed a maximum mean residual correlation of 0.026, which is less than the threshold of 0.10. In all three bifactor models, the percentage of problematic residual correlations was < 0.1%. We thus judged the degree of local dependence to be slight enough not to bias IRT parameter estimation.
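The bifactor indices discussed above can be computed directly from standardized loadings. The sketch below is illustrative only: the loadings are hypothetical rather than the study’s estimates, the function names are assumptions, and the formulas follow the commonly used definitions for an orthogonal bifactor model with standardized loadings (item uniqueness taken as one minus the sum of squared loadings), which may differ in detail from the software output used by the authors.

```python
def bifactor_indices(general, specifics):
    """Return (ECV, omegaH, item ECVs) from standardized bifactor loadings.

    general:   general-factor loading for each item.
    specifics: one list per specific factor, with 0.0 where an item does not load.
    """
    n_items = len(general)
    general_sq = [g ** 2 for g in general]
    # Squared specific loadings summed within each item.
    specific_sq = [sum(factor[i] ** 2 for factor in specifics) for i in range(n_items)]

    ecv = sum(general_sq) / (sum(general_sq) + sum(specific_sq))
    iecv = [g / (g + s) for g, s in zip(general_sq, specific_sq)]

    uniqueness = sum(1.0 - g - s for g, s in zip(general_sq, specific_sq))
    general_part = sum(general) ** 2
    specific_part = sum(sum(factor) ** 2 for factor in specifics)
    omega_h = general_part / (general_part + specific_part + uniqueness)
    return ecv, omega_h, iecv

def average_relative_parameter_bias(unidimensional, general):
    """Mean absolute relative difference between unidimensional and general-factor loadings."""
    return sum(abs(u - g) / abs(g) for u, g in zip(unidimensional, general)) / len(general)

# Hypothetical six-item example with two specific factors.
general = [0.7] * 6
specifics = [[0.3, 0.3, 0.3, 0.0, 0.0, 0.0],
             [0.0, 0.0, 0.0, 0.3, 0.3, 0.3]]
unidimensional = [0.72] * 6

ecv, omega_h, iecv = bifactor_indices(general, specifics)
arpb = average_relative_parameter_bias(unidimensional, general)
print(f"ECV = {ecv:.3f}, omegaH = {omega_h:.3f}, ARPB = {arpb:.1%}")
```

With these made-up loadings, ECV and omegaH both exceed the criteria cited in the text (0.60 and 0.70) and the ARPB is below 10%, which is the pattern taken to indicate essential unidimensionality.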
IRT calibration and fit
SR items were summed so that higher scores reflected better social relationships. The overall fit of the GRM was found to be adequate (chi-square = 1710.53, df = 1650, p = 0.146). The items and parameter estimates are given in Table 4. Setting the level of significance at 0.01 for GRM item fit, the model did not fit well for three items: Items 34 (“I know that I have someone to help me when I have financial difficulties.”), 20 (“I spend time with my friends.”) and 50 (“Overall, I am satisfied with the support I give to others.”). For all other items, p values ranged from 0.03 to 1.00 with a mean of 0.55. The median of item discrimination parameters was 1.22 (mean = 1.24, median = 1.43).
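To make the calibrated model concrete, the following sketch evaluates category response probabilities under Samejima’s graded response model for a single five-category item. It assumes the logistic metric without the 1.7 scaling constant, and the discrimination and threshold values are hypothetical; they are not parameters from Table 4.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under the graded response model.

    theta:      latent trait value.
    a:          item discrimination.
    thresholds: ordered category thresholds b_1 < ... < b_m (m + 1 categories).
    """
    # P*(X >= k): 1 for the lowest category, logistic curves for each threshold, 0 beyond the top.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative += [0.0]
    # P(X = k) is the difference between adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical item: discrimination 1.2, four thresholds for five response options.
a = 1.2
thresholds = [-3.0, -2.0, -1.0, 0.5]

probs = grm_category_probs(0.0, a, thresholds)
for label, p in zip(["Never", "Seldom", "Sometimes", "Usually", "Always"], probs):
    print(f"{label:<10} {p:.3f}")
```

Because the hypothetical thresholds sit mostly below zero, a respondent of average trait level is most likely to choose one of the two highest categories, which is the kind of pattern that limits measurement precision at the upper end of the trait.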
Differential item functioning detection
At the 1% level of significance, none of the items had gender-related DIF, but five items were found to have significant DIF with respect to age, ethnicity and education. The two items with non-uniform age-related DIF were Items 10 (“I take care of my family.”) and 51 (“Overall, I am satisfied with how well I communicate with others.”). For ethnicity, Items 18 (“I have gatherings with my circle of friends.”) and 8 (“My family is willing to give me information when I need it.”) were found to have uniform and non-uniform DIF, respectively. For education, both Items 10 and 16 (“I keep in touch with my circle of friends.”) displayed non-uniform DIF.
Concurrent validity evaluation
The Spearman correlation between SR scores and self-reported global health was rho = − 0.28 (95% CI: − 0.359 to − 0.196), supporting the hypothesis of a moderate correlation between the two measures. After accounting for age, gender, completion of secondary education and current marital status, the adjusted means of the ordered categories of global health likewise showed a decreasing trend (Table 5). Both these findings supported the concurrent validity of the SR item bank.
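The method used to obtain the confidence interval is not stated. As a plausibility check only, the sketch below applies the Fisher z-transformation with the simple standard error 1/sqrt(n − 3), which is an assumption and an approximation for rank correlations, to the reported rho and the English-language sample size. The function name is hypothetical.

```python
import math

def fisher_z_interval(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a correlation via the Fisher z-transformation."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Values taken from the text: rho = -0.28, n = 503 English-language respondents.
lower, upper = fisher_z_interval(-0.28, 503)
print(f"approximate 95% CI: {lower:.3f} to {upper:.3f}")
```

The result lies within a few thousandths of the reported interval; small differences are expected from rounding of rho and from alternative standard-error formulas for Spearman’s coefficient.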
This study describes the calibration of a culturally sensitive item bank for SR. Items in this SR item bank were derived from (1) qualitative research to identify and incorporate perspectives from subjects in the population, representing a wide spectrum of healthy and ill subjects (with chronic diseases), and (2) items from existing static instruments measuring related concepts in the same population. The item bank we developed thus has high content validity. The calibration process aligned with the approach espoused by the PROMIS group [20,21,22,23,24,25]. The findings of this successful calibration indicate that this social relationship item bank is a promising tool for measuring SR.
The analyses of the IRT assumptions show that the assumptions of essential unidimensionality and local independence are met. The bifactor model results exceeded the recommended thresholds. DIF tests for age, ethnicity and education identified five items; however, the impact of DIF was modest. In item bank development, statistical methods were used to inform, and not to decide, item selection. Therefore, items were retained because of their importance and the modest impact of DIF [27, 28].
SR is a novel construct which has wide-ranging impact on health, and its measurement is thus important to improve health. For example, high SR has been shown to improve social support and ameliorate the impact of diseases on overall health. High SR has also been shown to be associated with low mortality, improved immune function and delayed development of cardiovascular disease. Given this, the SR item bank has several potential uses – for example as an outcome measure for individual- or family-based cohort studies or interventional trials in community or hospital-based settings.
This study also supports the concurrent construct validity of the SR item bank. Our hypothesis testing showed a moderate correlation between the SR scores and self-reported global health. Good social relationships may contribute to better health status due to stronger social support. Another possible use of the SR item bank may therefore be to screen for people with poor social support and intervene as appropriate. However, further studies are needed to validate the SR item bank as a screening tool.
We recognize several limitations of this study. First, a significant number of eligible subjects were excluded because the quota for these subjects had been met. However, partly because of the use of quota sampling, the demographics of our sample are comparable to those of the population in Singapore. Second, the SR item bank may have poorer coverage of the higher SR trait range but better coverage of the lower range. The SR item bank will therefore be most useful for identifying people at risk of impaired social relationships or people who are in need of social support.
We developed and calibrated a 30-item bank for SR that is relevant to the Singaporean population and applicable to healthy adults and those having chronic illnesses. This item bank shows promise and will subsequently be used to develop relevant short-form tests or CATs to facilitate routine clinical use.
ANOVA: Analysis of Variance
CAT: Computerised Adaptive Testing
CFA: Confirmatory Factor Analysis
CFI: Comparative Fit Index
DIF: Differential Item Functioning
ECV: Explained Common Variance
EFA: Exploratory Factor Analysis
GRM: Graded Response Model
HRQoL: Health-Related Quality of Life
IECV: Item Explained Common Variance
IRT: Item Response Theory
PROMIS: Patient Reported Outcome Measurement Information System
QIR: Qualitative Item Review
RMSEA: Root Mean Square Error of Approximation
TLI: Tucker Lewis Index
WHO: World Health Organization
WHOQOL: World Health Organization Quality of Life Scale
Kuhn S, Rieger UM. Health is a state of complete physical, mental and social well-being and not merely absence of disease or infirmity. Surg Obes Relat Dis. 2017;13:887.
Yacoub YI, Amine B, Laatiris A, Hajjaj-Hassouni N. Spinsterhood and its impact on disease features in women with rheumatoid arthritis. Health Qual Life Outcomes. 2011;9:58.
Thumboo J, Ow MY, Uy EJB, Xin X, Chan ZYC, Sung SC, Bautista DC, Cheung YB. Developing a comprehensive, culturally sensitive conceptual framework of health domains in Singapore; 2018.
Thumboo J, Ow MYL, Uy EJB, Xin X, Chan ZYC, Sung SC, Bautista DC, Cheung YB. Developing a comprehensive, culturally sensitive conceptual framework of health domains in Singapore. PLoS One. 2018;13:e0199881.
Lubben J, Blozik E, Gillmann G, Iliffe S, von Renteln Kruse W, Beck JC, Stuck AE. Performance of an abbreviated version of the Lubben social network scale among three European community-dwelling older adult populations. Gerontologist. 2006;46:503–13.
Hahn EA, Devellis RF, Bode RK, Garcia SF, Castel LD, Eisen SV, Bosworth HB, Heinemann AW, Rothrock N, Cella D. Measuring social health in the patient-reported outcomes measurement information system (PROMIS): item bank development and testing. Qual Life Res. 2010;19:1035–44.
DeWalt DA, Thissen D, Stucky BD, Langer MM, DeWitt EM, Irwin DE, Lai J-S, Yeatts KB, Gross HE, Taylor O, Varni JW. PROMIS pediatric peer relationships scale: development of a peer relationships item Bank as part of social health measurement. Health psychology : official journal of the Division of Health Psychology, American Psychological Association. 2013;32. https://doi.org/10.1037/a0032670.
Deatrick JA. Where is “family” in the social determinants of health? Implications for family nursing practice, research, education, and policy. J Fam Nurs. 2017;23:423–33.
Uy EJB, Bautista DC, Xin X, Cheung YB, Thio ST, Thumboo J. Using best-worst scaling choice experiments to elicit the most important domains of health for health-related quality of life in Singapore. PLoS One. 2018;13:e0189687.
Uy EJ, Xiao Y, Xin X, Yeo PTJ, Pua Y-H, Lee GL, Kwan YH, Vaingankar JA, Subramaniam M, Chan MF, et al: Developing item banks to measure 3 important domains of health-related quality of life (HRQoL) in Singapore. 2018.
Cella D, Yount S, Rothrock N, Gershon R, Cook K, Reeve B, Ader D, Fries JF, Bruce B, Rose M. The patient-reported outcomes measurement information system (PROMIS): Progress of an NIH roadmap cooperative group during its first two years. Med Care. 2007;45:S3–S11.
Singapore Ministry of Health. Singapore burden of disease study, vol. 2014; 2010.
Singapore Ministry of Trade and Industry. Census of population 2010. 2011.
Singapore Ministry of Health. National Health Survey 2004. 2005.
Singapore Ministry of Health. National Health Survey 2007. 2009.
Council of American Survey Research Organizations (CASRO). On the definition of response rates. 1982.
Singapore Department of Statistics. Census of population 2010: demographic characteristics, education, language and religion. Singapore: Singapore Department of Statistics; 2010.
Muthén B, Kaplan D, Hollis M. On structural equation modeling with data that are not missing completely at random. Psychometrika. 1987;52:431–62.
Muthén LK, Muthén BO. Mplus user's guide. 6th ed. Los Angeles, CA; 2010.
Reise SP, Scheines R, Widaman KF, Haviland MG. Multidimensionality and structural coefficient bias in structural equation modeling: a bifactor perspective. Educ Psychol Meas. 2012;73:5–26.
PROMIS instrument development and validation scientific standards version 2.0. 2013.
Reise SP, Bonifay WE, Haviland MG. Scoring and modeling psychological measures in the presence of multidimensionality. J Pers Assess. 2013;95:129–40.
Amtmann D, Cook KF, Jensen MP, Chen WH, Choi S, Revicki D, Cella D, Rothrock N, Keefe F, Callahan L, Lai JS. Development of a PROMIS item bank to measure pain interference. Pain. 2010;150:173–82.
Stucky BD, Thissen D, Edelen MO. Using logistic approximations of marginal trace lines to develop short assessments. Appl Psychol Meas. 2012;37:41–57.
Stucky BD, Edelen MO. Using hierarchical IRT models to create unidimensional measures from multidimensional data. In: Handbook of item response theory modeling. Routledge; 2014.
Coste J, Guillemin F, Pouchot J, Fermanian J. Methodological approaches to shortening composite measurement scales. J Clin Epidemiol. 1997;50:247–52.
Crins MHP, Terwee CB, Klausch T, Smits N, de Vet HCW, Westhovens R, Cella D, Cook KF, Revicki DA, van Leeuwen J, et al. The Dutch-Flemish PROMIS physical function item bank exhibited strong psychometric properties in patients with chronic pain. J Clin Epidemiol. 2017;87:47–58.
Haley SM, Fragala-Pinkham MA, Dumas HM, Ni P, Gorton GE, Watson K, Montpetit K, Bilodeau N, Hambleton RK, Tucker CA. Evaluation of an item bank for a computerized adaptive test of activity in children with cerebral palsy. Phys Ther. 2009;89:589–600.
Hakulinen C, Pulkki-Raback L, Virtanen M, Jokela M, Kivimaki M, Elovainio M. Social isolation and loneliness as risk factors for myocardial infarction, stroke and mortality: UK biobank cohort study of 479 054 men and women. Heart. 2018.
Umberson D, Montez JK. Social relationships and health: a flashpoint for health policy. J Health Soc Behav. 2010;51:S54–66.
Coop Gordon K, Roberson PNE, Hughes JA, Khaddouma AM, Swamy GK, Noonan D, Gonzalez AM, Fish L, Pollak KI. The effects of a couples-based health behavior intervention during pregnancy on Latino Couples' dyadic satisfaction postpartum. Fam Process. 2018.
Turner RJ, Turner JB, Hale WB. Social relationships and social support. In: Johnson RJ, Turner RJ, Link BG, editors. Sociology of mental health: selected topics from forty years 1970s–2010s. Cham: Springer International Publishing; 2014. p. 1–20.
Sow WT, Wee HL, Wu Y, Tai ES, Gandek B, Lee J, Ma S, Heng D, Thumboo J. Normative data for the Singapore English and Chinese SF-36 version 2 health survey. Ann Acad Med Singap. 2014;43:15–23.
We thank Prof Angelique Wei-Ming Chan and Ms. Grace Teck Cheng Kwek who served on the expert panel to review candidate items, and all subjects who participated in this research.
This study was funded by grant HSRG/0034/2013 from the National Medical Research Council of Singapore.
Availability of data and materials
Data can be requested from the corresponding author.
Ethics approval and consent to participate
The SingHealth Centralised Institutional Review Board approved this research (Ref No: 2014/916/A) and written informed consent was provided before recruitment into the study.
Consent for publication
All authors disclose there is no conflict of interest in production of this manuscript.
Table S1. Chronic Illnesses qualifying for patient recruitment. Figure S1 Flow chart describing the response rate. (DOCX 35 kb)