Longitudinal measurement invariance and explanatory IRT models for adolescents’ oral health-related quality of life

Background Longitudinal invariance is a prerequisite for a valid comparison of oral health-related quality of life (OHRQoL) scores over time. Item response theory (IRT) models can assess measurement invariance and allow better estimation of the associations between predictors and the latent construct. By extending IRT models, this study aimed to investigate the longitudinal invariance of the two 8-item short forms of the Child Perceptions Questionnaire (CPQ11–14), the regression short form (RSF:8) and the item-impact short form (ISF:8), and to identify factors associated with adolescents’ OHRQoL and its change.
Methods All students from S1 and S2 (equivalent to US grades 6 and 7) who were born in April 1997 and May 1997 (aged 12 at baseline) from 45 randomly selected secondary schools were invited to participate in this study and were followed up after 3 years. Data on the CPQ11–14 RSF:8 and CPQ11–14 ISF:8, demographics, oral health behavior and oral health status were collected. Explanatory graded response models were fitted to the data from both short forms of the CPQ11–14 to assess longitudinal invariance and factors associated with OHRQoL. Bayesian estimation by Markov chain Monte Carlo (MCMC) with Gibbs sampling was adopted for parameter estimation, and credible intervals were used for inference.
Results Data from 649 children aged 12 at baseline and 415 children aged 15 at follow-up were analyzed. For the 12-year-old children, healthier oral health behavior, better gum status and having both parents employed were associated with better OHRQoL, and parents’ education level was also associated with OHRQoL. Four items among the 2 short forms lacked longitudinal invariance. After statistical adjustment for the lack of longitudinal invariance, OHRQoL was found to have improved in general over the 3 years, but no predictor was associated with OHRQoL at follow-up. OHRQoL worsened over the 3 years for those whose family income had decreased.
Conclusions IRT explanatory analysis enables a more valid identification of the factors associated with OHRQoL and its changes over time. It provides important information to oral healthcare researchers and policymakers. Electronic supplementary material The online version of this article (10.1186/s12955-018-0879-x) contains supplementary material, which is available to authorized users.


Background
There has been growing interest in measuring an individual's oral health-related quality of life (OHRQoL) in health research over the past two decades. OHRQoL is an abstract concept that cannot be directly observed, and different measurement instruments have been developed over the years. Assessing the psychometric properties of an instrument is an essential prerequisite before it can be adopted and used appropriately in research.
In studying OHRQoL, rating scales on the frequency or severity of the conditions described by the items are often used, and a number is assigned to each response option, for example "Never" = 0, "Once/twice" = 1, "Sometimes" = 2, etc. The total score is typically used as the participant's OHRQoL score. This scoring method assumes that each item relates equally to the overall OHRQoL level and that the response options are on an interval scale, assumptions that may not hold in reality. Alternatively, item response theory (IRT) can be used to analyze individuals' OHRQoL without the above assumptions [1]. IRT models relate the probability of choosing a given response option of each item to (1) the respondent's level of the latent construct (i.e. OHRQoL) and (2) the measurement properties of each item (reflected by its discriminatory and threshold parameters). IRT has been commonly applied to evaluate the measurement properties of instruments, including OHRQoL instruments [2,3].
The Child Perceptions Questionnaire (CPQ 11-14) regression short form (RSF:8) and item-impact short form (ISF:8) are two 8-item questionnaires commonly used to measure OHRQoL in children around the onset of puberty [4]. Both 8-item forms of the CPQ 11-14 (RSF:8 and ISF:8) have been extensively validated in Hong Kong using not only conventional approaches but also factor analysis and IRT [2,5,6]. Children at this stage are characterized by the increasing centrality of peers and a preoccupation with others' views of the self [7]. With this rapid psychosocial development, their perception of the CPQ 11-14 items may change as they grow up. For example, a rating of "2" in the follow-up questionnaire may be interpreted in the same way as a rating of "1" at baseline [8]. Therefore, statistical adjustment is needed to ensure that CPQ 11-14 questionnaires administered at different ages are comparable (i.e. the concept of longitudinal invariance). Establishing longitudinal invariance helps ensure that any change in the score is attributable only to true change in OHRQoL rather than to a change in conceptualization, interpretation or recalibration of the measurement scale over time [9].
Besides assessing the psychometric properties of an instrument, researchers are often interested in investigating the factors that explain OHRQoL using explanatory analysis. This is commonly performed with the sum score of the OHRQoL items as the outcome in multiple linear regression. However, this approach again ignores the ordinal nature of the response options [1]. Some researchers have estimated the latent score from an IRT model (step 1) and subsequently used that score in a regression analysis (step 2). However, this 2-step approach may yield biased standard errors of the regression coefficients because the homoscedasticity assumption of linear regression may be violated [10]. Extending the IRT model by including predictors (explanatory item response analysis) helps (1) quantify the heteroskedasticity and (2) secure more reliable inference on the parameters by simultaneously estimating the item parameters, latent OHRQoL scores and regression coefficients [11,12].
Explanatory IRT analysis allows adjustment for any lack of longitudinal invariance and provides more reliable inference on regression coefficients [13]. This study aimed to explore factors contributing to the OHRQoL of adolescents at ages 12 and 15, and to its change, with statistical adjustment for the lack of longitudinal invariance under the framework of IRT. This article has 3 objectives: (1) investigating factors associated with OHRQoL using the 2 short forms of the CPQ 11-14 (RSF:8 and ISF:8) at 12 years old with an explanatory item response model; (2) investigating the longitudinal invariance of the 2 short forms under the framework of IRT; and (3) assessing the effect of demographic, behavioral and clinical factors on the OHRQoL of adolescents aged 12 and 15, and on its change, using three explanatory item response models while adjusting for any lack of longitudinal invariance.

Sample
The participants in this study were secondary school students recruited for an observational survey of the association between dental caries and adiposity status [14]. Baseline data were collected from January to April 2010, and the participants were followed up 3 years later, from January to April 2013. The primary sampling unit was the Hong Kong secondary school and the sampling frame was the list of local secondary schools in Hong Kong. Forty-five local secondary schools (about 10% of local secondary schools) were randomly drawn. Within each school, all students from S1 and S2 (equivalent to US grades 6 and 7) who were born in April 1997 and May 1997 were invited to participate in this study. This ensured that the participants were 12 and 15 years old at baseline and follow-up, respectively. The study protocol was approved by the Institutional Review Board of the University of Hong Kong/Hospital Authority Hong Kong West Cluster (WU09-435) and written parental consent was obtained.

Measure
Participants were asked to complete a questionnaire in Chinese consisting of both the CPQ 11-14 RSF:8 and ISF:8, and questions concerning their global self-health ratings, demographics (place of birth and years of residency in Hong Kong) and oral health behaviors (snacking frequency between meals, brushing frequency, use of fluoridated toothpaste, and previous participation in the School Dental Care Service (SDCS)). Parents were asked about the family's socioeconomic status (SES) (parents' place of birth, employment status, family income, education level and whether they had lived in Hong Kong for 7 years or more). Participants and their parents completed the questionnaires in a self-administered manner.
For each item of the CPQ 11-14, participants were asked "In the past 3 months, how often have you … (had/been)… because of your teeth/mouth?". The five-point response scale was: "Never" = 0; "Once/twice" = 1; "Sometimes" = 2; "Often" = 3; "Every day/almost every day" = 4 [15]. The possible total score of both the CPQ 11-14 RSF:8 and ISF:8 ranges from 0 to 32, with a lower score indicating better OHRQoL. Missing responses at baseline and follow-up (1.4% of all responses) were imputed as "Never" = 0 because it was presumed that children who did not answer an item had probably not encountered the situation described. This imputation method has previously been used to handle another OHRQoL questionnaire with a "Don't know" option [3]. Respondents with more than 2 missing items were excluded in order to avoid over-imputation.
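The scoring and imputation rules above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `impute_and_score` is hypothetical, and the text does not state whether the "more than 2 missing items" rule was applied per short form or across all items, so the sketch simply applies it to whatever list of responses it receives.

```python
from typing import Optional, Sequence

MAX_MISSING = 2  # respondents with more than 2 missing items are excluded


def impute_and_score(responses: Sequence[Optional[int]]) -> Optional[int]:
    """Return the total score of a set of CPQ items, or None if excluded.

    Each response is coded 0..4 ("Never" .. "Every day/almost every day"),
    or None when the item was left unanswered. (Hypothetical helper.)
    """
    n_missing = sum(r is None for r in responses)
    if n_missing > MAX_MISSING:
        return None  # excluded to avoid over-imputation
    # Missing answers are imputed as "Never" = 0, as described in the text.
    filled = [0 if r is None else r for r in responses]
    if any(r < 0 or r > 4 for r in filled):
        raise ValueError("responses must be coded 0..4")
    return sum(filled)


# Illustrative 8-item short form with two unanswered items (made-up data).
example = [0, 1, 2, 3, 4, None, None, 0]
print(impute_and_score(example))
```

For an 8-item short form the returned score lies between 0 and 32, with lower values indicating better OHRQoL.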

Clinical examination
Each participant received an oral examination for assessment of dental caries experience and periodontal condition following the World Health Organization's recommendation [16]. Dental caries experiences were measured by the number of decayed, missing and filled teeth (DMFT). Periodontal conditions were measured by the community periodontal index (CPI) with 3 levels, namely "Healthy gum/ absence of bleeding or calculus"=0, "Bleeding on probing"=1 and "Calculus deposits" = 2 on 6 index teeth (16, 11, 26, 36, 31 and 46). The highest CPI score was used as a summary measure of periodontal health status. All oral examinations were performed by trained and calibrated dentists in the school premises on a portable dental chair. The oral examinations were performed using intra-oral disposable mouth mirrors with LED light and CPI probes by one examiner at baseline and another one in the follow-up study. About 10% of the participants were re-examined to evaluate the intra examiner reliability using Intra-class correlation (ICC) and Kappa Statistics, which indicated the consistency in measuring DMFT and CPI respectively (DMFT: ICC = 0.99 at age 12, ICC = 0.82 at age 15; CPI: Kappa = 0.74 at age 12, Kappa = 0.84 at age 15). Anthropometric assessment was also conducted but the information was not used in this study.

Statistical models

Explanatory GRM
The graded response model (GRM) was extended by incorporating variables (listed in Tables 1 and 2) to explain the latent construct, OHRQoL. The explanatory GRM with p explanatory variables, J items and K response options is formulated, in its standard logistic form, as

$$\operatorname{logit}\left(P^{+}_{jk}\right) = a_j\left(\theta - b_{jk}\right), \qquad \theta = x_1\gamma_1 + x_2\gamma_2 + \dots + x_p\gamma_p + \varepsilon,$$

where $a_j$ is the item discriminatory parameter for the $j$th item; $b_{jk}$ is the item threshold parameter for the $k$th response option of the $j$th item; $j = 1, 2, 3, \dots, J$; $k = 1, 2, 3, \dots, K-1$; and $P^{+}_{jk}$ is the probability of choosing the $(k+1)$th or a higher response option of the $j$th item. The latent score ($\theta$) consists of a linear combination of explanatory variables and regression coefficients ($x_1\gamma_1 + x_2\gamma_2 + \dots + x_p\gamma_p$) plus an error term ($\varepsilon$) [17]. Fourteen items were fitted in the GRM (8 items in RSF:8 plus 8 items in ISF:8 minus 2 items shared by the 2 short forms).
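As a minimal sketch of how the quantities defined above combine, the following computes the response-option probabilities of a single item under a logistic GRM. It assumes the common parameterization in which the cumulative probability is a logistic function of a_j(θ − b_jk); the function names and all numeric values are illustrative and are not taken from the study.

```python
import math
from typing import List, Sequence


def logistic(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def latent_score(x: Sequence[float], gamma: Sequence[float], eps: float = 0.0) -> float:
    """theta = x_1*gamma_1 + ... + x_p*gamma_p + eps."""
    if len(x) != len(gamma):
        raise ValueError("x and gamma must have the same length")
    return sum(xi * gi for xi, gi in zip(x, gamma)) + eps


def grm_category_probs(theta: float, a: float, b: Sequence[float]) -> List[float]:
    """Probabilities of the K ordered response options of one item.

    a: discriminatory parameter (assumed positive);
    b: K-1 strictly increasing threshold parameters.
    """
    if a <= 0:
        raise ValueError("discriminatory parameter must be positive")
    if any(b[i] >= b[i + 1] for i in range(len(b) - 1)):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities of choosing option k+1 or higher, padded
    # with 1 (lowest option or higher) and 0 (above the highest option).
    cum = [1.0] + [logistic(a * (theta - bk)) for bk in b] + [0.0]
    # Each option's probability is the difference of adjacent cumulatives.
    return [cum[k] - cum[k + 1] for k in range(len(b) + 1)]


# Illustrative values only: two dummy-coded predictors and a 5-option item.
theta = latent_score(x=[1.0, 0.0], gamma=[0.3, -0.2])
probs = grm_category_probs(theta, a=1.5, b=[-1.0, 0.0, 1.0, 2.0])
print([round(p, 3) for p in probs])
```

In the joint (one-step) estimation described in the text, the item parameters, the regression coefficients and the latent scores are estimated together; the sketch only evaluates the model for fixed parameter values.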
The explanatory GRM was used to investigate the baseline factors associated with OHRQoL at 12 years old. The explanatory variables included demographic variables (place of birth and year of residency in Hong Kong), oral health behaviors (snacking frequency between meals, brushing frequencies, use of fluoridated toothpaste, and previous participation of the School Dental Care Service (SDCS)), the family social economic status (parents' place of birth, employment status, family income, education level and whether they have lived in Hong Kong for 7 years or more), DMFT and CPI. Each categorical variable was recoded as dummy variables and included into the models.
Bayesian estimation by Markov chain Monte Carlo (MCMC) with Gibbs sampling was adopted for parameter estimation and implemented in WinBUGS (Additional file 1) [18]. Non-informative priors were used, and the posterior distribution was constructed from 12,000 simulations after 8000 burn-in iterations. Parameter estimates obtained with non-informative priors can resemble the maximum likelihood estimates of the classical (frequentist) approach [19]. The 95% credible intervals of the parameters (intervals that contain the true parameter with 95% posterior probability) were obtained from the 12,000 simulations. An association of an explanatory variable with OHRQoL was established when the 95% credible interval of its regression coefficient excluded 0, or when the 95% credible interval of the difference between a pair of regression coefficients within a factor excluded 0.
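The inference rule above can be illustrated with a short sketch that derives an equal-tailed 95% credible interval from posterior draws and checks whether it excludes 0. The draws here are simulated from normal distributions purely for illustration; they are not output from the study's WinBUGS models, and the helper names are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple


def credible_interval(draws: Sequence[float], level: float = 0.95) -> Tuple[float, float]:
    """Equal-tailed credible interval from posterior draws (order statistics)."""
    if not draws:
        raise ValueError("no posterior draws supplied")
    s = sorted(draws)
    n = len(s)
    tail = (1.0 - level) / 2.0
    lower = s[int(math.floor(tail * (n - 1)))]
    upper = s[int(math.ceil((1.0 - tail) * (n - 1)))]
    return lower, upper


def excludes_zero(interval: Tuple[float, float]) -> bool:
    """True when the whole interval lies on one side of 0."""
    lower, upper = interval
    return lower > 0.0 or upper < 0.0


# Simulated draws for two dummy-coded levels of one factor (assumed values).
random.seed(1)
gamma_level_a = [random.gauss(0.40, 0.10) for _ in range(12000)]
gamma_level_b = [random.gauss(0.35, 0.10) for _ in range(12000)]

# Level A against the reference category.
ci_a = credible_interval(gamma_level_a)
# Level A against level B: take the difference draw by draw, then summarize.
ci_diff = credible_interval([a - b for a, b in zip(gamma_level_a, gamma_level_b)])

print("A vs reference:", ci_a, excludes_zero(ci_a))
print("A vs B:", ci_diff, excludes_zero(ci_diff))
```

Computing the difference within each draw, rather than comparing two separate intervals, keeps any posterior correlation between the two coefficients in the comparison.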

GRM for assessing longitudinal invariance with external anchor items
Longitudinal invariance of the questionnaire (Table 3) was investigated by a GRM whose discriminatory ($a_{jt}$) and threshold ($b_{jkt}$) parameters were allowed to differ between ages 12 and 15. The model is formulated, in its standard logistic form, as:

$$\operatorname{logit}\left(P^{+}_{jkt}\right) = a_{jt}\left(\theta_t - b_{jkt}\right),$$

where t = 1 (at 12 years old) or 2 (at 15 years old). The scores ($\theta_t$) of the same respondents at the 2 time points were allowed to be correlated. Since too few respondents chose "Every day/almost every day" = 4 in some CPQ 11-14 items, the response options "Often" = 3 and "Every day/almost every day" = 4 were combined in the longitudinal invariance analysis.
In order to separate true change in OHRQoL from change arising from the interpretation of the items, 5 external anchor items (global self-rating items of OHRQoL and of its perceived impact), which reflect the respondents' own perceptions of OHRQoL, were used to equate the questionnaires administered at the two time points onto the same scale [20]. The discriminatory and threshold parameters of the anchor items were set equal at both time points ($a_{j1} = a_{j2}$; $b_{jk1} = b_{jk2}$). Using WinBUGS, the differences in the parameters between the 2 time points and their corresponding credible intervals were computed. Significant parameter drift was inferred when the 95% credible interval of the change in an item parameter excluded 0. Items with significant parameter drift were considered biased (lacking longitudinal invariance).
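The drift criterion can be sketched as below: for each item parameter, the difference between the follow-up and baseline values is taken draw by draw, and drift is flagged when the 95% interval of that difference excludes 0. The function names, parameter labels and input values are all hypothetical; in the study these differences were computed inside WinBUGS.

```python
import statistics
from typing import Dict, Sequence, Tuple


def drift_interval(draws_t1: Sequence[float],
                   draws_t2: Sequence[float]) -> Tuple[float, float, float]:
    """Median and 95% interval of (follow-up - baseline) for one parameter."""
    if len(draws_t1) != len(draws_t2):
        raise ValueError("draws must come from the same MCMC iterations")
    diffs = [v2 - v1 for v1, v2 in zip(draws_t1, draws_t2)]
    # With n=40, the first and last cut points are the 2.5th and 97.5th percentiles.
    cuts = statistics.quantiles(diffs, n=40)
    return statistics.median(diffs), cuts[0], cuts[-1]


def flag_drift(params: Dict[str, Tuple[Sequence[float], Sequence[float]]]) -> Dict[str, bool]:
    """Map each parameter label to True when its drift interval excludes 0."""
    flags = {}
    for label, (t1, t2) in params.items():
        _, lower, upper = drift_interval(t1, t2)
        flags[label] = lower > 0.0 or upper < 0.0
    return flags


# Made-up draws: one threshold that shifts by about 0.5, one that does not shift.
base = [i / 100 for i in range(100)]
shifted = [v + 0.5 for v in base]
noisy = [v + (0.1 if i % 2 else -0.1) for i, v in enumerate(base)]

print(flag_drift({"b_item1_k1": (base, shifted), "b_item2_k1": (base, noisy)}))
```

Applied to the discriminatory parameters, the same check addresses non-uniform drift; applied to the thresholds, it addresses uniform drift.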

Longitudinal explanatory GRM
For modeling the change in OHRQoL over the 3 years, $\delta$ is defined as the difference between OHRQoL at 15 years old ($\theta_2$) and at 12 years old ($\theta_1$), i.e. $\delta = \theta_2 - \theta_1$. Three models relating the explanatory variables to OHRQoL in a longitudinal context were adopted:

Model 1. Presuming the change in OHRQoL is attributed to the respondents' status at baseline: baseline explanatory variables (listed in Table 4) were used to explain the change in OHRQoL, i.e. $\delta = x_{11}\gamma_{11} + x_{21}\gamma_{21} + \dots + x_{p1}\gamma_{p1} + \varepsilon$.
Model 2. Presuming the change in OHRQoL is attributed to the change in status over 3 years: changes in the explanatory variables (listed in Table 5) were used to explain the change in OHRQoL, i.e. $\delta = x_{1d}\gamma_1 + x_{2d}\gamma_2 + \dots + x_{pd}\gamma_p + \varepsilon$, where $x_{pd}$ represents the change in the explanatory variable $x_p$ over the 3 years.
Model 3. Presuming OHRQoL at each occasion is attributed to the status at that occasion: the explanatory variables measured at 12 and at 15 years old (listed in Table 6) were used to explain baseline and follow-up OHRQoL respectively, i.e. $\theta_t = x_{1t}\gamma_{1t} + x_{2t}\gamma_{2t} + \dots + x_{pt}\gamma_{pt} + \varepsilon_t$, where t = 1, 2.
Again, WinBUGS was used to estimate the credible intervals (Additional file 1). Association of explanatory variables with OHRQoL at respective occasions were established when 95% credible intervals excluded 0 or differences in any pair of γ's within a factor excluded 0.
To rule out the change in OHRQoL score due to change in interpretation of the items over 3 years, discriminatory and threshold parameters of items that were not invariant were allowed to vary across baseline and follow-up.

Descriptive statistics
Six hundred and sixty-eight sets of parental and student questionnaires were collected from 12-year-old children; 436 (65.3%) students were followed up successfully after 3 years. At baseline, 19 cases with more than 2 missing items in the CPQ were excluded, leaving 649 children for analysis; data from 415 children were analyzed at follow-up. The median class of monthly family income was $10,000-$20,000 at both baseline and follow-up. Regarding oral health behaviors, more than a quarter of the sample brushed their teeth less than twice a day and fewer than a quarter did not eat snacks between meals. For oral health status, only about one in ten had healthy gums, while the majority had bleeding on probing or calculus. The mean DMFT was 0.55 at baseline and had increased to 1.70 at follow-up. The frequency distributions of the other factors considered at baseline and follow-up are shown in Table 1.
Table 2 shows the estimated medians and 95% credible intervals of the regression coefficients, and of the difference between each pair of levels within each factor, for the 12-year-old children. Respondents who brushed their teeth more frequently, consumed fewer snacks between meals and had both parents employed showed better OHRQoL, whereas those with worse gum status had worse OHRQoL. Regarding parental education, significantly better OHRQoL was found for those whose fathers had attained secondary rather than below-secondary education, while significantly worse OHRQoL was found for those whose mothers had attained secondary rather than below-secondary education. No significant differences were found in the other comparisons of father's or mother's education.

GRM for assessing longitudinal invariance with external anchor items
A longitudinal GRM allowing parameter drift was fitted with the 5 global rating anchor items. Results on longitudinal invariance are shown in Table 3. No significant change in the discriminatory parameters of RSF:8 or ISF:8 was found, which indicated the absence of non-uniform parameter drift. Two items from CPQ 11-14 RSF:8 and 2 items from ISF:8 had significant drift in their threshold parameters. In CPQ 11-14 RSF:8, the items were "Argued with other children or your family" and "Teased/called names by other children". In CPQ 11-14 ISF:8, the items were "Difficult to drink or eat hot or cold foods" and "Irritable/frustrated". For the item "Argued with other children or your family", children at follow-up were more prone to choose "Never" = 0 than at baseline even if their OHRQoL level had not changed. For the item "Teased/called names by other children", children at follow-up were more likely to respond "Never" = 0 or "Once/twice" = 1. For the item "Difficult to drink or eat hot or cold foods", children at follow-up were more prone to choose "Once/twice" = 1 or a higher response option. For the item "Irritable/frustrated", children at follow-up were more likely to choose "Sometimes" = 2 or a higher response option. Of the 4 items lacking longitudinal invariance, 3 came from the domains of emotional well-being and social well-being while the other came from the domain of functional limitations. OHRQoL had improved significantly over the 3 years: the change in the mean latent score was −0.29 and its 95% credible interval (−0.44, −0.15) excluded 0. The standard deviation of OHRQoL increased from 1.01 at baseline to 1.35 at the 3-year follow-up. A significant positive correlation was also found between the baseline and follow-up latent OHRQoL scores (r = 0.43), with a credible interval (0.30, 0.58) that excluded 0.
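To illustrate what a threshold drift with an unchanged discriminatory parameter means in practice, the sketch below compares the probability of answering "Never" at the same latent OHRQoL level under two different first thresholds. It assumes a logistic GRM; the parameter values are hypothetical and are not the estimates reported in Table 3.

```python
import math


def prob_never(theta: float, a: float, b1: float) -> float:
    """P(response = "Never") = 1 - P(option "Once/twice" or higher)."""
    p_plus_1 = 1.0 / (1.0 + math.exp(-a * (theta - b1)))
    return 1.0 - p_plus_1


theta = 0.0                  # same latent OHRQoL level at both ages (assumed)
a = 1.5                      # discriminatory parameter, unchanged over time (assumed)
b1_age12, b1_age15 = 0.2, 0.8  # first threshold drifts upward (assumed)

p12 = prob_never(theta, a, b1_age12)
p15 = prob_never(theta, a, b1_age15)
print(f"P(Never) at 12: {p12:.3f}; at 15: {p15:.3f}")
```

With a higher first threshold at follow-up, a respondent with an unchanged latent level is more likely to answer "Never", so an unadjusted score would suggest an improvement that reflects the item rather than the respondent.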

Longitudinal explanatory GRM
Tables 4, 5 and 6 present the results of the longitudinal models, adjusted for the lack of longitudinal invariance by allowing the threshold parameters of the 4 items with significant parameter drift to vary. In Model 1, baseline explanatory variables were used to predict the change over 3 years (Table 4). Respondents who consumed more snacks at baseline had significantly more improvement in OHRQoL over 3 years. Table 5 presents the results of a similar longitudinal model (Model 2) in which, instead of baseline predictors, the changes in demographic variables and in oral health behavior/status over the 3 years were used to explain the change in OHRQoL. For participants whose family income had decreased, OHRQoL worsened over the 3 years.
Table 6 presents the results of Model 3, which used the conditions at 12 and at 15 years old to explain baseline and follow-up OHRQoL respectively, while adjusting for the lack of longitudinal invariance. The variables significantly associated with OHRQoL at 12 years old were similar to those in the cross-sectional analysis. More frequent tooth brushing, a lower mother's education level (lower secondary vs. primary) and a higher father's education level (lower secondary vs. primary, upper secondary vs. primary) were associated with better OHRQoL. Having both parents employed, consuming fewer snacks between meals and better gum condition also predicted OHRQoL at 12 years old. At 15 years old, no variable was significantly associated with OHRQoL.

Longitudinal invariance
Three of the four longitudinally biased items were from the domains of emotional and social well-being, namely "Argued with other children or your family", "Teased/called names by other children" and "Irritable/frustrated". In short, children tended to choose lower response options for items concerning emotional and social well-being in the follow-up study. One way to adjust for a lack of longitudinal invariance is to remove the problematic items. This may, however, result in an instrument with questionable content validity. Allowing the parameters of non-invariant items to vary provides an alternative adjustment method that preserves content validity.
The investigation also highlighted the importance of testing longitudinal invariance, especially when items concern emotions or perceptions that may change over time. Moreover, adjusting for the lack of longitudinal invariance allows the questionnaire to remain appropriate even when the participants are one year older than the CPQ's target age range of 11-14 years.

Oral health status and OHRQoL
Dental caries experience was not significantly associated with OHRQoL in the present study. This was consistent with findings from only a few countries. Table 7 shows studies of dental caries status and OHRQoL in adolescents in different countries. It reveals that most studies reporting a significant association had a higher mean DMFT or caries prevalence than the present study, whereas studies finding no significant association had a mean DMFT or prevalence comparable to that in the present study, for instance a study in Dunedin and a previous study in Hong Kong. Although the hypothesis that dental caries predicts OHRQoL only in higher-impact populations would require a meta-analysis to confirm, the studies in Table 7 and the present study suggest that a higher disease level may be required to detect the impact of dental caries on OHRQoL, because mild dental caries, even when untreated, may not be painful and may hardly affect the person. Other studies that found a significant association between oral health status and OHRQoL were often conducted in large samples or diseased groups [21].

SES and OHRQoL
Many studies have attempted to relate socioeconomic status to OHRQoL, but mixed results have been found. The inconsistency may be attributed to whether the analysis was adjusted for confounding factors. Also, different studies have employed different variables as proxies of SES: parents' education levels, occupations, parents' age, parents' attitude, family income, type of flat, household crowding, number of siblings and so on [22]. In this study, parents' education level, family income and whether both parents were working were used to measure SES. There were significant differences in OHRQoL across some subgroups of parents' education level, but no clear direction of association between OHRQoL and higher or lower education levels was observed. Socioeconomic status often affects OHRQoL through limited access to dental services due to material deprivation [23]. This is unlikely to be the case in Hong Kong because the government's SDCS provides primary school children (equivalent to US grades 1 to 6) with an annual dental checkup. In our sample, 98% of the school children participated in the scheme or had private dental insurance, so almost all children had received adequate dental care regardless of their family income. This may explain the weak association between income and OHRQoL in Hong Kong. However, the SDCS is no longer provided to adolescents after 12 years old. Adolescents whose family income increased between the ages of 12 and 15 had only a small improvement in OHRQoL over these 3 years, whereas OHRQoL worsened for those whose family income decreased. This concurs with the hypothesis that lower economic status leads to inadequate access to dental services and thus to poorer OHRQoL. It suggests that a specific scheme for children from low-income families is needed, rather than general measures such as extending the SDCS to older ages.

Weakened associations at 15 years old follow-up
All predictors of OHRQoL at 12 years old (oral health behaviors, gum status and some socioeconomic variables) became non-significant at the 15-year-old follow-up after adjustment for the lack of longitudinal invariance. This may be related to the improvement in OHRQoL over the 3 years. When an already low-impact population improves further, OHRQoL becomes better in a homogeneous manner. In addition, the baseline characteristics (shown in Table 1) of the children lost to follow-up were compared with those of the remaining respondents. No significant differences were found in any variable except daily brushing frequency (twice or more: 75.9% in those followed up vs. 65.0% in those lost to follow-up), CPI (calculus deposits: 62.2% vs. 69.7%) and DMFT (mean DMFT: 0.47 vs. 0.71). Together, these make the effects of factors even harder to detect, and subsequently result in a failure to identify factors significantly associated with the OHRQoL of adolescents at 15 years old.
The weakened association at 15 years old may also be due to "regression to the mean", which describes the tendency of significant associations to become non-significant when the variables are measured a second time because of random error [24]. Respondents who consumed more snacks had significantly greater improvement in OHRQoL over 3 years, which appears counter-intuitive. This finding may reflect regression to the mean, especially as snacking frequency is highly susceptible to random error owing to the difficulty of recalling snack intake exactly and because the amount and types of snack consumed were not considered.

Limitation
Relying on anchor items to establish longitudinal invariance is sometimes criticized because the invariance of the anchor items themselves may be doubted. This study addressed this concern by choosing a relatively large number of global self-rating items reflecting the respondents' own perception of OHRQoL and its impact on daily life. Many studies have suggested that psychological characteristics play an important role in coping with unfavorable conditions, for instance poor dental aesthetics [25]. No data were collected on sense of coherence, self-esteem or confidence, which have been shown to be potential predictors of OHRQoL [22,26,27]. Moreover, malocclusion status is directly related to facial appearance and thus may affect OHRQoL directly. An association between OHRQoL and malocclusion status has been shown both in the general population and in studies confined to disease-specific groups [28]. Unfortunately, malocclusion data were not available for this study.
As part of the larger cohort study "Children of 1997", the sample is representative of adolescents in Hong Kong in general. It should be noted that this is a study of a population with a low impact of dental caries and of OHRQoL. Also, because the respondents lost to follow-up had slightly worse oral health status, this dataset may fail to capture their OHRQoL development, which might have yielded more actionable insight for oral health policies. When generalizing the results, one should be aware of cultural differences and of the relevant local dental policies, which vary across countries.

Conclusion
In conclusion, the IRT model was extended to fit longitudinal data and to incorporate explanatory variables of OHRQoL. This study also illustrated the use of IRT in detecting and adjusting for a lack of longitudinal invariance. OHRQoL at age 12 was associated with demographic background variables, oral health behavior and oral health status; however, these associations were not observed in 15-year-old children after adjustment for the lack of longitudinal invariance. Children whose family income had decreased had worsened OHRQoL. The results provide important information to oral healthcare researchers and policymakers.