Item analysis of the Eating Assessment Tool (EAT-10) by the Rasch model: a secondary analysis of cross-sectional survey data obtained among community-dwelling elders



The Eating Assessment Tool (EAT-10) is increasingly used to screen for self-perceived oropharyngeal dysphagia (OD) in community-dwelling elders. The summated EAT-10 total score ranges from 0 to 40, with a score ≥ 3 indicative of OD. When cut-points of a summated score are used, important requirements for the measurements are specific objectivity, validity, and reliability. Analysis by the Rasch model allows investigation of whether scales like EAT-10 satisfy these requirements. To date, a few studies have found that EAT-10 responses from clinical populations with OD do not adequately fit the Rasch model.


The aim of this study was to determine whether measurements by EAT-10 fit the Rasch model when applied in screening self-perceived OD in non-clinical populations.


Secondary analysis was conducted on data from a cross-sectional survey of community-dwelling elders living in a municipal district of Tokyo, Japan, in which 1875 respondents completed the Japanese version of EAT-10 (J-EAT-10). Data were cleaned and recoded for the purpose of the analysis in this study, which resulted in inclusion of J-EAT-10 responses from 1144 respondents. Data were analyzed using RUMM2030 and included overall model fit, reliability, unidimensionality, threshold ordering, individual item and person fits, differential item functioning, local item dependency, and targeting.


The analysis identified that the response categories from zero to four were not used as intended and did not display monotonicity, which necessitated reducing the five categories to three. A considerable floor effect was demonstrated, and there was an inappropriate match between item and respondent estimates. The person separation reliability (PSI = 0.65) was inadequate, indicating that it is not possible to differentiate between different levels of OD. Several items displayed misfit with the Rasch model, and there were local item dependency and several redundant items.


J-EAT-10 performed less than optimally, exhibiting a substantial floor effect, low reliability, a rating scale not working as intended, and several redundant items. Different improvement strategies failed to resolve the identified problems. Use of J-EAT-10 in population-based surveys can therefore not be recommended. For such purposes, alternative screening tools for self-perceived OD should be chosen, or a new one should be developed and validated.


Oropharyngeal dysphagia (OD), which impairs swallowing efficiency and safety, is common in old age as a result of several underlying processes and diseases [1,2,3,4]. OD increases the risk of malnutrition and dehydration [5], aspiration pneumonia [6], depression and anxiety [7], and decreased quality of life [8], as well as increasing health care expenditure and utilization [9, 10]. It is recognized that community-dwelling elders are at risk of developing OD, with an estimated mean prevalence of 15% across high quality studies included in a recent meta-analysis [4]. With an aging population, OD is an important and serious current and future health issue necessitating identification of elders at risk [1, 9, 10]. To take a proactive approach for avoiding the health-related and economic consequences of OD, systematic screening among community-dwelling elders is recommended [1,2,3, 10].

The Eating Assessment Tool (EAT-10) [11], a patient-reported outcome measure (PROM) of self-perceived symptoms of OD, is recommended as an easy-to-use and quick screening tool for OD [2, 3]. EAT-10 was developed and validated for use in estimating initial OD severity and changes in response to therapy [11]. Supplemental file 1 shows the content of EAT-10, which comprises ten items to be rated on a 5-point response scale (0–4) with labels at the extremes of ‘0 = No problem’ and ‘4 = Severe problem’, resulting in a range of 0–40 [11]. EAT-10 has been translated into several different language versions published by the Nestle Nutrition Institute [12], and it is increasingly used as a screening tool for OD in clinical populations [3, 13,14,15,16,17,18] as well as in non-clinical populations of community-dwelling elders [19,20,21,22,23]. The diagnostic efficiency [24] of EAT-10 in terms of sensitivity (i.e., correctly identifying persons with OD) and specificity (i.e., correctly identifying persons without OD) has been quantified for different cut-off points. For example, it is suggested that an EAT-10 total score ≥ 2 [25] or ≥ 3 is indicative of OD [11], and that a total score > 15 is indicative of aspiration risk [26]. When quantifying the diagnostic efficiency of a scale, the summated score must accurately reflect what is being measured [24]. In the case of EAT-10, we obtain a measure of self-perceived OD severity, which is not directly observed and is therefore regarded as a latent variable. This is in contrast to a manifest variable, which can be directly measured or observed [27], such as findings from a videofluoroscopic swallowing evaluation [28]. When using a summated score of EAT-10 responses, it is therefore necessary to determine whether the items contribute to one single dimension of lower or higher OD severity.
Hence, an important requirement of EAT-10 is that it should have specific objectivity [29], which implies invariance: the comparison between any two persons is independent of the rating scale items used, and vice versa [27, 29, 30].

Within modern item response theory, the Rasch model has been considered the gold standard against which scales summarizing item responses can be tested [27, 29, 30]. Analysis by the Rasch model is a statistical method that provides detailed information on the performance of a set of item responses as a measure of a latent variable [27]. The Rasch model expresses the association between observed (actual) item performance and unobserved underlying ability, that is, a latent variable (OD severity in the case of EAT-10). Hence, the set of items in EAT-10 must satisfy certain requirements to fit the Rasch model before it can be considered to measure a latent variable on a continuum from less to more [27, 29, 30], namely:

  • Unidimensionality: the items of a scale should measure only one latent variable (i.e., all EAT-10 items measure aspects of OD severity).

  • Monotonicity: the scale items function hierarchically from easy to difficult, and the probability of a high item score should increase with increasing values of the latent variable (i.e., the probability of giving a score that reflects a swallowing problem increases with high EAT-10 total scores).

  • Homogeneity: The rank order of the items from easy to difficult should be the same for all respondents, regardless of their level for the latent variable (i.e., the order of EAT-10 items according to the severity of the problem they express is the same for all respondents, regardless of their level on the scale, as reflected in the EAT-10 total score; the easiest problem to have is easiest for all respondents and vice versa).

  • Local independency: the items of a scale must be conditionally independent given the latent variable (i.e., the rating of any one problem should depend only on the level of the scale, as reflected in the EAT-10 total score, and not on the rating of any other item).

  • Absence of differential item function (DIF): the items should be conditionally independent of exogenous variables given the latent variable (i.e., the EAT-10 items should function equally for subgroups of respondents, for example male and female).

If these requirements are met by the items in EAT-10, the obtained measurement is assumed to be reliable and construct valid [27, 29, 30], and will provide ideal measurement of OD severity. Accordingly, the raw score can be regarded as a sufficient statistic for the estimated person parameter, and measurement by the scale is considered specifically objective [27, 29, 30].

Since the publication of EAT-10 [11], studies have found that, when used in clinical populations, EAT-10 does not fit the Rasch model sufficiently [16,17,18] and demonstrates low reliability, with several items not contributing adequately to a latent unidimensional variable [16,17,18], DIF by OD severity [16, 17], gender and different language versions [16], lack of monotonicity of the response scale [16, 18], and substantial floor effects (i.e., no problems) of 23% [16] and 57% [18]. If EAT-10 is applied in population-based screening among community-dwelling elders, larger floor effects might occur, since OD prevalence is lower in non-clinical compared to clinical populations [4]. It is worth noting that the performance of a screening test such as EAT-10 is dependent on the prevalence of the condition in question [24]. EAT-10 was developed and validated to document initial OD severity and monitor response to treatment in symptomatic patients [11]; it was not designed for population-based screening in the wider community. With the increased use of EAT-10 in the wider community [19,20,21,22,23], analysis by the Rasch model of EAT-10 responses obtained from non-clinical populations is needed. The aim of this study was therefore to evaluate whether measurements by EAT-10 are reliable and valid and uphold specific objectivity when applied as a screening tool for detecting OD among community-dwelling elders.


Analysis by the Rasch model was performed as a secondary data analysis of an existing dataset, available as a supporting information file (Excel format) for a cross-sectional survey on OD prevalence among elders living in a municipal district of Tokyo, Japan, by Igarashi et al. [22]. That survey is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. The survey by Igarashi et al. included the Japanese version of EAT-10 (J-EAT-10) [31], was conducted with formal approval, and is described in detail in Igarashi et al. [22]. Since the current study involves a secondary analysis of freely available data, formal ethical approval was not needed.

Source of data and data cleaning

Supplemental file 2 presents the codes for the original and current datasets. For the purpose of the analysis by the Rasch model, the dataset included: gender, age in years stratified by quartiles, functional level stratified into independent and dependent respondents, and item responses on J-EAT-10. In total, the original Excel file included data from 1875 anonymized respondents [32]. Of these, 731 respondents were removed from the current dataset due to incomplete responses on J-EAT-10 (N = 378) or values lacking codes for the variable “functional level” (N = 353). Accordingly, 1144 responses were included in the analysis by the Rasch model.

Analysis by the Rasch model

The analysis by the Rasch model was performed using the RUMM2030 software [33], which integrates a conditional pairwise maximum likelihood algorithm for the parameter estimations [30, 34]. In the case of J-EAT-10, the Rasch model specifies that the probability of a response of 0, 1, 2, 3 or 4 is a logistic function of the difference between the respondent’s level for the measured variable (i.e., severity of OD) and the level represented by the item. Logits (log-odds units) are the unit of measurement for reporting the relative differences between the estimates of a person’s level and item difficulties, and constitute an equal-interval level of measurement. Persons (i.e., respondents) and items are located on the same measurement scale, with the mean item location set at zero logits. Accordingly, the ordinal scores from the J-EAT-10 items are expressed as linear measures, where negative values reflect easy items and a lower degree of OD severity and positive values reflect difficult items and a higher degree of OD severity [27, 29, 30].
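As an illustrative sketch (not the RUMM2030 implementation), the category probabilities of the polytomous Rasch model can be written as a logistic function of the differences between the person location and the item thresholds:

```python
import math

def pcm_probabilities(person, thresholds):
    """Partial Credit Model category probabilities for one item.

    `person` is the respondent location in logits; `thresholds` are the
    item's threshold locations in logits. Returns P(score = 0..k).
    Hypothetical illustration only -- RUMM2030 estimates these
    parameters with its own algorithm.
    """
    # Cumulative numerators: psi_0 = 0, psi_k = sum_{j<=k} (person - tau_j)
    cumulative = [0.0]
    for tau in thresholds:
        cumulative.append(cumulative[-1] + (person - tau))
    numerators = [math.exp(c) for c in cumulative]
    total = sum(numerators)
    return [n / total for n in numerators]

# A five-category item (scores 0-4) has four thresholds
probs = pcm_probabilities(person=0.0, thresholds=[-2.0, -0.5, 0.5, 2.0])
```

At a person location equal to a threshold, the two adjacent response categories are equally probable, which is exactly how thresholds are defined below.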

The analysis followed recommended procedures [27, 35,36,37] and was carried out on responses from the full sample as well as separately for the independent and dependent respondents. Model fit was examined statistically and graphically, and carried out for items, persons, and different ability groups (i.e., class intervals) according to their locations on the measured variable. Ideally, the class intervals should be approximately equally distributed with at least 50 persons in each [34].

Person and item level fit to the Rasch model

Statistically, model fit was examined using standardized fit residual values, which express the differences between observed responses to the J-EAT-10 items and those expected by the model, and by analyzing them by means of chi-squared (χ2) statistics and analysis of variance (ANOVA) of the residuals across class intervals. Fit residual values between ±2.5 for persons and items indicate model fit [27, 30, 34, 36, 37]. High item fit residuals signify under discrimination and might reflect multidimensionality, while low fit residuals signify over discrimination and might reflect potential redundancy or item dependency within the item set [27, 30]. Chi-squared statistics and ANOVA should reflect non-significant (Bonferroni adjusted) deviations from model expectations [27, 30, 34]. Item fit was also examined via visual inspections using graphs of observed item responses for each class interval plotted against the model expectations, which are displayed as an item characteristic curve (ICC) [27, 30, 34].
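The standardized residual for a single response can be sketched as the difference between the observed score and the model-expected score, divided by the model standard deviation. This is a simplified illustration under Partial Credit Model assumptions; RUMM2030 additionally aggregates and transforms these quantities across responses to form its fit residual statistics:

```python
import math

def category_probs(person, thresholds):
    """Partial Credit Model category probabilities (illustrative)."""
    cumulative = [0.0]
    for tau in thresholds:
        cumulative.append(cumulative[-1] + (person - tau))
    numerators = [math.exp(c) for c in cumulative]
    total = sum(numerators)
    return [n / total for n in numerators]

def standardized_residual(observed, person, thresholds):
    """(observed - expected) / sqrt(model variance) for one item response."""
    probs = category_probs(person, thresholds)
    expected = sum(k * p for k, p in enumerate(probs))
    variance = sum((k - expected) ** 2 * p for k, p in enumerate(probs))
    return (observed - expected) / math.sqrt(variance)
```

A positive residual means the respondent scored higher than the model predicts; a negative residual means lower.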

Local independence was investigated using a residual correlation matrix of the items. Local item dependence (LID) was indicated by item residual correlations more than 0.2 above the average residual correlation, reflecting that the correlation between the items is not entirely captured by the latent variable [38]. This might happen when the content of a previous item affects responses to a subsequent (dependent) item [27, 30].
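The LID criterion described above (residual correlations more than 0.2 above the average) can be sketched as follows; the function name and matrix layout are illustrative assumptions, not part of RUMM2030:

```python
import numpy as np

def flag_local_dependence(residuals, margin=0.2):
    """Flag item pairs whose residual correlation exceeds the average
    off-diagonal residual correlation by more than `margin`.

    `residuals` is a persons x items matrix of standardized residuals.
    Returns a list of (item_i, item_j) index pairs.
    """
    corr = np.corrcoef(np.asarray(residuals, dtype=float), rowvar=False)
    n_items = corr.shape[0]
    # Average correlation over the off-diagonal entries only
    off_diagonal = corr[~np.eye(n_items, dtype=bool)]
    cutoff = off_diagonal.mean() + margin
    return [(i, j) for i in range(n_items) for j in range(i + 1, n_items)
            if corr[i, j] > cutoff]
```

Flagged pairs are candidates for grouping into a super-item, as described under improvement strategies.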

Differential item functioning (DIF) refers to item bias that occurs when subgroups with a similar level for the measured variable have a different response pattern to an item [30, 35]. DIF was examined by gender, age, and functional level. For the analysis, a two-way ANOVA on the residuals for each item across the subgroups and across the class intervals was applied. DIF can occur as uniform DIF, where item responses differ uniformly across the measured variable (i.e., a main effect), or as non-uniform DIF, where differences in item responses between subgroups vary across the measured variable (i.e., an interaction effect). The Bonferroni correction was used to adjust for multiple testing, keeping the type I error to 5% [34].
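The distinction between uniform and non-uniform DIF can be illustrated by tabulating mean item residuals per subgroup within each class interval; a roughly constant gap between subgroup profiles suggests a main effect (uniform DIF), while a gap that changes across class intervals suggests an interaction (non-uniform DIF). This sketch only computes the profiles; RUMM2030 formalizes the comparison with a Bonferroni-adjusted two-way ANOVA:

```python
import numpy as np

def dif_profile(residuals, groups, class_intervals):
    """Mean item residuals per subgroup within each class interval.

    `residuals`: one standardized residual per person for one item.
    `groups`: subgroup label per person (e.g., 'm'/'f').
    `class_intervals`: class-interval label per person.
    Returns {group: [mean residual per class interval]}.
    """
    residuals = np.asarray(residuals, dtype=float)
    profile = {}
    for g in sorted(set(groups)):
        row = []
        for c in sorted(set(class_intervals)):
            mask = [(gi == g) and (ci == c)
                    for gi, ci in zip(groups, class_intervals)]
            row.append(residuals[mask].mean())
        profile[g] = row
    return profile
```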

The scoring structures

J-EAT-10 consists of polytomous items with five response categories ordered to reflect an increasing degree of OD [11]. The boundaries between adjacent categories are called thresholds. As the number of thresholds is one less than the number of response categories, there are four thresholds for each item, which reflect positions on the latent variable where either of the adjacent responses is equally probable [27, 30, 34]. For fit to the Rasch model, monotonicity by means of ordered thresholds is expected, which implies that the transition from one score to the next is consistent with the increase in the latent variable [27]. Monotonicity was examined using the item threshold parameters, a threshold map, and category probability curves. In addition, further analysis was performed by examining the category response frequencies. Before performing analysis by the Rasch model of polytomous data, a choice between two different parameterization methods must be made [27, 34], namely the Rating Scale Model (RSM) [39] or the Partial Credit Model (PCM) [40]. In the RSM, only one set of thresholds across all items is estimated, while in the PCM thresholds are estimated for each item [27]. Accordingly, the PCM is a more complex model containing more information, because additional parameters are estimated compared to the RSM [27]. In RUMM2030, Fisher’s likelihood ratio test is available to assess the efficiency of the two different parameterizations. If the test is significant, it indicates that the PCM should be adopted [27, 34].
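Under the PCM, disordered thresholds manifest as response categories that are never the most probable anywhere on the latent continuum. The following hypothetical sketch scans a grid of person locations and reports such categories (the grid bounds are an assumption for illustration):

```python
import math

def category_probs(person, thresholds):
    """Partial Credit Model category probabilities (illustrative)."""
    cumulative = [0.0]
    for tau in thresholds:
        cumulative.append(cumulative[-1] + (person - tau))
    numerators = [math.exp(c) for c in cumulative]
    total = sum(numerators)
    return [n / total for n in numerators]

def never_modal(thresholds, grid_step=0.1, span=6.0):
    """Categories that are never the most probable response anywhere
    on a grid over the latent continuum -- a symptom of disordered
    thresholds."""
    modal = set()
    steps = int(2 * span / grid_step) + 1
    for i in range(steps):
        person = -span + i * grid_step
        probs = category_probs(person, thresholds)
        modal.add(max(range(len(probs)), key=probs.__getitem__))
    return sorted(set(range(len(thresholds) + 1)) - modal)
```

With ordered thresholds every category is modal somewhere; with disordered thresholds (e.g., a second threshold located below the first) at least one intermediate category never is.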

Overall fit to the Rasch model

Overall model fit is provided in RUMM2030 by summary fit residual statistics for items and persons, which should approach a standardized mean value of zero and an SD of 1.0, and by a summary item χ2 statistic, which should be non-significant (p > 0.05) reflecting homogeneity of the items across the different class intervals [27, 30, 34]. In addition, reliability and unidimensionality of the scale are reported.

Reliability was examined using Cronbach’s alpha (α) and the Person Separation Index (PSI), the Rasch equivalent of Cronbach’s α, except that it is calculated from the logit-scale person estimates [27, 30, 34]. It is suggested that α/PSI ≥ 0.90 = excellent, 0.90 > α/PSI ≥ 0.80 = good, 0.80 > α/PSI ≥ 0.70 = acceptable, 0.70 > α/PSI ≥ 0.60 = questionable, 0.60 > α/PSI ≥ 0.50 = poor, and α/PSI < 0.50 = unacceptable [41, 42]. The PSI indicates the power of the latent variable to discriminate among persons and reflects the power of the fit statistics, which RUMM2030 displays as excellent, good, reasonable, low, or too low. If the PSI is not acceptable, the top measure cannot be statistically distinguished from the bottom measure with any confidence, and the obtained fit statistics may not be reliable because of too large an error variance [27, 34].
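The two reliability indices can be sketched with their standard textbook formulas (a simplified illustration; RUMM2030 computes the PSI from its own person estimates and standard errors):

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha from a persons x items matrix of raw scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

def person_separation_index(estimates, standard_errors):
    """PSI sketch: proportion of the variance in the logit person
    estimates that is not measurement error."""
    estimates = np.asarray(estimates, dtype=float)
    error_variance = np.mean(np.asarray(standard_errors, dtype=float) ** 2)
    observed_variance = estimates.var(ddof=1)
    return (observed_variance - error_variance) / observed_variance
```

Because α is computed from raw scores and the PSI from logit estimates, the two diverge when targeting is poor or floor effects are large, as discussed later.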

Unidimensionality is defined as the absence of any meaningful pattern in the residuals, which was assessed by Principal Component Analysis [27, 30, 34]. Based on the loading between items and the first residual factor, two subsets of items consisting of items with positive and negative loadings were identified. The differences in location estimates for each person from these two subsets of items were investigated using a series of t-tests. Unidimensionality was confirmed if less than 5% of the sample showed a significant difference in location estimates [27, 30, 34].
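The subset comparison can be sketched as a per-person t-like statistic on the two location estimates (an illustration of the principle; RUMM2030 runs the full procedure, including the person standard errors and the 5% criterion):

```python
import numpy as np

def percent_significant(loc_a, se_a, loc_b, se_b):
    """Percentage of persons whose location estimates from two item
    subsets differ significantly (|t| > 1.96). A value below 5%
    supports unidimensionality."""
    loc_a = np.asarray(loc_a, dtype=float)
    loc_b = np.asarray(loc_b, dtype=float)
    se = np.sqrt(np.asarray(se_a, dtype=float) ** 2
                 + np.asarray(se_b, dtype=float) ** 2)
    t = (loc_a - loc_b) / se
    return 100.0 * np.mean(np.abs(t) > 1.96)
```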


Targeting is defined as the extent to which the range of the item locations matches the range of the person locations in the study sample. To be considered a well-targeted rating scale, J-EAT-10 should have item and person mean locations of around zero and have enough items of varied degrees of OD, matching the spread of scores among respondents [27, 30, 34]. Targeting was examined using a person-item thresholds distribution map, which visually depicts person locations against item-threshold locations [34]. If J-EAT-10 is poorly targeted, respondents may report having no problems (floor effect) or severe problems (ceiling effect) [27].

Improvement strategies

RUMM2030 provides opportunities to apply improvement strategies to achieve fit to the Rasch model [34]. Before deciding which strategies to employ, the overall model fit statistics, the item level fit statistics, and visual inspections of the ICCs as well as the category response frequencies and threshold ordering were taken together. Disordered thresholds may be resolved by combining adjacent categories [27, 34], misfitting items or persons can be removed [27, 30, 34], and uniform DIF can be addressed by splitting the item into group-specific items. Items with non-uniform DIF are usually removed, as non-uniform DIF reflects misfit to the model [30, 34, 35]. LID can be addressed by grouping locally dependent items into a “super-item” to absorb the impact of LID [27, 30].
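Two of these strategies, collapsing adjacent categories according to a scoring-structure string (as used in the results, e.g. 00122) and summing locally dependent items into a super-item, can be sketched as simple data transformations (function names are illustrative):

```python
def recode(responses, structure):
    """Collapse the 0-4 response categories by a scoring-structure
    string, e.g. '00122' maps 0->0, 1->0, 2->1, 3->2, 4->2."""
    mapping = {old: int(new) for old, new in enumerate(structure)}
    return [[mapping[score] for score in row] for row in responses]

def super_item(responses, item_indices):
    """Sum locally dependent items into one super-item score per person."""
    return [sum(row[i] for i in item_indices) for row in responses]

collapsed = recode([[0, 1, 2, 3, 4]], "00122")   # -> [[0, 0, 1, 2, 2]]
combined = super_item([[0, 1, 2, 3, 4]], [0, 1])  # -> [1]
```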

Sample size

For a well-targeted rating scale, a sample size of around 250–500 usually provides accurate and stable person and item estimates as well as a good balance for statistical interpretation of the fit statistics [43, 44]. Since the current dataset comprises a sample size of 1144, there is a risk of type I error associated with the fit statistics, and a post-hoc downward sample size adjustment might be needed [44]. However, the reported floor effect, and thus the high percentages of respondents with a minimum EAT-10 total score of 0 from clinical populations [16, 18], ought to be considered. In Rasch modeling, such total scores are regarded as extreme person scores, which contain no information for rank ordering of persons and items or for estimating the threshold parameters. In RUMM2030, extreme persons are by default omitted from the estimation of the item locations and the test-of-fit statistics due to the lack of precision involved with the parameter estimates [27, 34]. Thus, the effective sample size for Rasch modeling will always be smaller than the original sample size [27]. Since the current analysis included responses obtained in a non-clinical population, the presence of extreme person scores was expected, and their magnitude was assessed before deciding whether it was necessary to adjust the sample size.
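The identification of extreme person scores, which reduces the effective sample size as described above, can be sketched as follows (a hypothetical helper, mirroring RUMM2030's default behavior of excluding all-minimum and all-maximum totals):

```python
def split_extreme_scores(responses, max_category=4):
    """Separate extreme person scores (all-minimum or all-maximum
    totals), which carry no information for ordering persons and
    items or for estimating thresholds."""
    non_extreme, extreme = [], []
    for row in responses:
        total = sum(row)
        if total == 0 or total == max_category * len(row):
            extreme.append(row)
        else:
            non_extreme.append(row)
    return non_extreme, extreme
```

Applied to ten-item EAT-10 rows, a respondent scoring 0 on every item (floor) or 4 on every item (ceiling) would be set aside, leaving the effective sample.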


Verification of model and sample size

The likelihood ratio test was significant (χ2 (df) = 317.26 (26), p < 0.001), indicating that the PCM should be adopted. The initial analysis of the full sample (N = 1144) found 483 respondents with an EAT-10 total score of 0, resulting in 42% extreme scores. Hence, an effective sample size of 661 respondents was included without downward adjustment.

Overall fit to the Rasch model

Table 1 shows the overall fit statistics. The initial analysis of the full sample (Table 1, analysis 1) showed significant item-trait interaction (χ2 (df) = 485.48 (40), p < 0.001) and a fit residual mean value (SD) for items of − 0.66 (4.16), both indicating misfit of the responses to the Rasch model. The fit residual mean (SD) for persons was − 0.30 (1.08), indicating no serious misfit. The t-tests suggested unidimensionality, with only 2.27% statistically significantly different person estimates based on the two most divergent subsets of items within the J-EAT-10 scale. The PSI without extreme scores was 0.65, the power of the analysis of fit was good, and Cronbach’s α was 0.85, indicating good reliability. As shown in Table 1, the overall fit statistics continued to indicate model misfit when separately analyzing the data from the independent respondents (analysis 2) and the dependent respondents (analysis 3). Extreme scores were present for 53% of the independent respondents and 30% of the dependent respondents.

Table 1 Analysis by the Rasch model - overall fit statistics for J-EAT-10

Item level fit to the Rasch model

Table 2 shows the fit statistics at item level. The analysis of the full sample showed that items 1 and 9 displayed significant positive fit residuals > 2.5, which indicates multidimensionality, as illustrated in Fig. 1a) for item 1. Items 3, 4, 6 and 10 showed significant negative fit residuals, indicating redundancy or dependency within the item set. This is illustrated in Fig. 1b) for item 10. As shown in Table 3, no items displayed uniform or non-uniform DIF by gender, age, or functional level. LID was found for item pair 5 and 6 (residual correlation: r = 0.29). When analyzing the independent and dependent respondents separately, the item level fit statistics approached the findings for the full sample (Table 2).

Table 2 Individual item location and fit statistics for J-EAT-10
Fig. 1
figure 1

Item characteristic curves (ICC) of two misfitting items of J-EAT-10. ICC plot for two items. Based on the sample size, persons (respondents) are divided into six ability groups (class intervals with at least 50 persons in each). The curved line represents the expected scores for the item, and the dots represent the observed scores for the class intervals at the different levels of the measured variable (self-perceived OD severity). a The ICC plot for item 1 with a high positive and significant fit residual of 5.9. The observed scores form a flatter curve than the expected scores, which indicates that this item is under discriminating and might reflect multidimensionality. b The ICC plot for item 10 with a high negative and significant fit residual of − 6.3. The observed scores form a steeper curve than the expected scores, which indicates that this item is over discriminating and might reflect potential redundancy or dependency within the item set

Table 3 Summary of differential item function (DIF) by gender, age, and functional level for J-EAT-10

The scoring structures

Table 4 shows that most items obtained scores of 0 or 1, and they displayed disordered thresholds during analysis of the full sample as well as of the independent and dependent respondents. Figure 2 illustrates category probability curves for item 9 with ordered thresholds and for item 3 with disordered thresholds.

Table 4 Category frequencies and item threshold parameters for each item of J-EAT-10
Fig. 2
figure 2

Category probability curves for two items of J-EAT-10. The y-axis represents the probability of observing each category of the five response options on J-EAT-10 at each level of item difficulty on the x-axis. Each colour corresponds to a different response option: 0 = blue, 1 = red, 2 = green, 3 = purple, 4 = pink. The intersections of adjacent curves are the thresholds. a The category probability curves for item 9 (Cough) with ordered thresholds. The responses to this item are distributed in a logical progressive order, and as a respondent’s OD severity increases, so the probability of achieving the next score increases. b The category probability curves for item 3 (Liquids effort) with disordered thresholds. The responses to this item are not distributed in a logical progressive order, and a score of 1 or 3 is never the most probable response


The J-EAT-10 scale presented poor targeting, with an insufficient match between the overall spread of items and the spread of respondents, as illustrated in Fig. 3. There are many gaps on the item-thresholds continuum, indicating that the scale is not able to detect small changes in respondents across the whole continuum of OD severity. Some item thresholds are in the same place. For example, around the location of − 1 logits, five thresholds coincide, belonging to item 3 (liquids effort), item 4 (solids effort), item 6 (painful), item 7 (pleasure eat), and item 10 (stressful). This indicates that these items duplicate the ability to discriminate at that level of difficulty. Figure 3a shows that the 42% extreme scores relate to respondents giving a score of 0 (no problem) across all ten items of J-EAT-10 (i.e., floor effects). No respondents gave a score of 4 (severe problems) to all items (i.e., no ceiling effects). Figure 3b displays the mean (SD) location for the dependent and independent respondents, which illustrates that the dependent respondents reported higher degrees of OD severity and are slightly more spread across the continuum, though still poorly aligned with the item spread.

Fig. 3
figure 3

Person-item threshold distribution of the J-EAT-10 responses. The x-axes display location of item thresholds (lower half) and location of respondents’ summated OD severity on J-EAT-10 (upper half). The y-axes display the frequencies of item thresholds (lower half) and respondents (upper half). High scores imply higher OD severity and low scores imply lower OD severity. a The J-EAT-10 responses of the full sample and b the J-EAT-10 responses grouped as dependent and independent respondents. For both graphs, the item thresholds spread over about 7 logits, with evidence of floor effects (a high percentage of respondents achieved the lowest possible score of zero), but not ceiling effects. Some item-thresholds are in the same place, which indicates that they are duplicating the ability to discriminate at that level of difficulty. Some areas along the logit scale are not represented by item thresholds

Improvement strategies

The improvement strategies were applied to the responses from the full sample. The fact that few items had ordered thresholds argued for changing the response categories consistently for all items. Cordier et al. [16] suggest that the response scale should be changed from 5 to 3 points by combining scores 0 and 1 as well as scores 3 and 4, resulting in the scoring structure 00122. As seen in Table 1 (analysis 4), this produced more respondents at the extremes (65%), decreased the reliability and power of fit, and did not provide overall model or item level fit. Since the pattern of the category probability curves could argue for a three-score category solution, additional scoring structures were analyzed. None of these provided overall model fit (Table 1, analyses 5–8), and only two (analyses 5 and 6) did not produce more respondents at the extremes and maintained good power of fit. Further improvement strategies did not provide overall model fit for any of the proposed scoring structures. For illustrative purposes, Table 5 presents one of the attempts, based on the scoring structure 01122. The summary fit residuals for items and persons improved during a stepwise removal of the most misfitting items. However, the item-trait interaction remained significant, and the reliability and power of fit decreased markedly. Although the five retained items (items 1, 2, 5, 8 and 9) obtained acceptable fit residuals, the fit statistics remained significant.

Table 5 Fit statistics during removal of misfit items from J-EAT-10 with scoring structure 01122


The current study presents a secondary analysis of existing data using the Rasch model. The aim was to evaluate whether measurements by J-EAT-10 are reliable and valid and uphold specific objectivity when applied in OD screening in a non-clinical population of community-dwelling elders. Overall, the results align with the findings from clinical populations [16, 18] in terms of a substantial floor effect and inappropriate targeting, disordered thresholds, several misfitting items, and unacceptable reliability by means of the PSI but acceptable reliability by means of Cronbach’s α. However, the PSI should be used for the interpretation of reliability, since these two reliability indices will diverge in the event of poor targeting and floor effects [27].

J-EAT-10 displayed inappropriate targeting and did not cover a high percentage of the sample, which on average presented a higher ability level than the average of the scale items. Although low physical performance and dependency are associated with OD [19, 21, 22], the inappropriate targeting was also present for the dependent respondents. The targeting problem and low PSI indicate that it is not possible to differentiate between different levels of OD when using J-EAT-10 as a screening tool in a non-clinical population in the wider community [27, 42]. In addition, the analyses revealed that the responses to most items were not consistent with the metric estimate of the latent variable, resulting in disordered thresholds. This suggests that the J-EAT-10 response structure does not function as intended when applied in a population-based survey. The improvement strategies for the response categories proposed by Cordier et al. [16] produced further extreme person scores, likely due to the frequent use of score categories 0 and 1. Although fit to the Rasch model was not achieved, the best solution appeared to be a three-category scoring structure with the pattern 01112 or 01122, indicating that meaningful differentiation of OD severity might be achievable with three response categories. It is worth noting that, besides too many response options, disordered thresholds may also arise from unclear or irrelevant item content and category descriptions, or from multidimensionality [27], which can occur in inadequately translated versions of PROMs [30]. In fact, Cordier et al. identified DIF by language for four translated versions of EAT-10 [16]; it therefore cannot be excluded that DIF by language exists for J-EAT-10. In the current study, there was no evidence of DIF by gender, age, or functional level. However, age was stratified into four groups from a continuous variable, and functional level was determined by care time in minutes [22], which, unlike reliable and valid functional assessments [45], does not describe the actual functional performance of older adults. Accordingly, further DIF analyses by language, age, and functional level might be needed.
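The three-category rescoring discussed above amounts to a lookup-table recode of the raw 0–4 responses. The sketch below illustrates the two patterns (the array layout is an assumption; RUMM2030 applies such rescoring internally):

```python
import numpy as np

# Rescoring patterns for the five original EAT-10 categories (0-4).
# "01112" maps categories 0,1,2,3,4 -> 0,1,1,1,2; "01122" -> 0,1,1,2,2.
PATTERNS = {
    "01112": np.array([0, 1, 1, 1, 2]),
    "01122": np.array([0, 1, 1, 2, 2]),
}

def rescore(responses: np.ndarray, pattern: str) -> np.ndarray:
    """Collapse 0-4 item responses into three categories by lookup."""
    return PATTERNS[pattern][responses]

raw = np.array([[0, 4, 2, 1],
                [3, 0, 1, 2]])
print(rescore(raw, "01112"))
```

Integer array indexing maps every cell through the pattern in one step, so the same call handles a full person-by-item matrix.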

The t-tests indicated unidimensionality, even though all the fit statistics indicated model misfit of J-EAT-10. Item 1 (lose weight) and item 9 (cough) displayed multidimensionality, while item 3 (liquids effort), item 4 (solids effort), item 6 (painful), and item 10 (stressful) showed high negative fit residuals, indicating redundancy, which was also reflected in the clustering of item thresholds on the logit scale, as illustrated in Fig. 3. Although unidimensionality is a matter of degree and some level of item misfit may be unavoidable [27, 44], a proportion of 60% misfitting items suggests that further examination of J-EAT-10 is needed. In search of improvement strategies, the misfitting items were removed, resulting in a scale comprising item 1 (lose weight), item 2 (go out for meals), item 5 (pills effort), item 8 (stick throat), and item 9 (cough). Although the item fit residuals improved, the item-trait interaction remained significant, indicating a lack of homogeneity. In addition, the PSI became too low and four items continued to display significant fit statistics. Accordingly, summarizing the item responses of J-EAT-10 into a total score cannot be recommended when the tool is applied to a non-clinical population.

It is worth noting that misfitting items should not be removed from a scale purely for statistical reasons without theoretical considerations, as this might distort the content validity of the measurement [27, 30]. Content validity is an important property of a PROM and refers to the degree to which the content of an instrument is relevant, comprehensive, and comprehensible with respect to the variable of interest and the target population [46]. The decision as to whether a scale is sufficiently unidimensional should therefore ultimately come from a synthesis of statistical analysis, the purpose of measurement, and clinical and theoretical considerations [27, 30]. Unfortunately, content validity has not been established for either the original version of EAT-10 [47] or J-EAT-10 [31], which restricted nuanced decisions about improvement strategies.

Methodological considerations

Application of secondary analysis to an existing dataset had an advantage and some disadvantages [48]. The advantage was that it was possible, at relatively low cost, to contribute to the knowledge base on the psychometric properties of J-EAT-10 using analysis by the Rasch model, which requires a relatively large dataset [43, 44]. The disadvantages were that the data were not collected for the purpose of analysis by the Rasch model, and that not being involved in the data collection procedure might have meant that some study-specific aspects were concealed. Since the variables in the dataset were given, DIF analysis of important variables, such as disease state and a manifest diagnosis of OD, was not possible. In addition, the codebook for the dataset did not contain information on some of the tabulated values for the variable ‘functional level’; accordingly, we decided not to include these in the analysis. Furthermore, the sample was skewed toward low levels of OD as measured with J-EAT-10, resulting in a high percentage of extreme person scores and poor targeting, which reduced the effective sample size [27, 43]. Although an effective sample size of N = 661 (full sample) is regarded as sufficiently large [44], the data were still skewed toward low levels of OD. Considering an OD prevalence of 15% among community-dwelling elders [4], this is not surprising. Accordingly, it could be argued that analysis by the Rasch model should not be performed on item responses from non-clinical populations answering a PROM designed for a clinical population [30]. However, since EAT-10 is promoted as a quick and easy OD screening method [12] and routine screening of community-dwelling elders using EAT-10 is recommended [1,2,3, 10, 19,20,21,22,23], it was important to undertake the current analysis by the Rasch model.
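The link between a floor effect and effective sample size can be made concrete: respondents with extreme total scores (all items at the minimum, or all at the maximum) provide no information for Rasch item-parameter estimation and are effectively dropped. A hypothetical sketch:

```python
import numpy as np

def effective_sample_size(responses: np.ndarray, max_score: int) -> int:
    """Count respondents whose total score is non-extreme.

    Persons scoring the minimum (0) or the maximum on every item have
    extreme totals and do not contribute to Rasch item estimation.
    """
    totals = responses.sum(axis=1)
    k = responses.shape[1]
    non_extreme = (totals > 0) & (totals < k * max_score)
    return int(non_extreme.sum())

# Hypothetical 10-item responses scored 0-4: a floor-heavy sample in
# which some respondents answer 0 to every item.
sample = np.array([[0] * 10, [0] * 9 + [1], [2] * 10, [4] * 10])
print(effective_sample_size(sample, max_score=4))  # 2 of 4 respondents remain
```

The more the sample piles up at zero, the larger the gap between the nominal N and the effective N that the fit statistics are actually based on.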


Conclusion

The study adds to the evidence on the psychometric properties of a translated version of EAT-10. When J-EAT-10 was applied to detect OD in community-dwelling elders with low OD prevalence rates, it performed less than optimally. The main problems were a substantial floor effect, low reliability, a rating scale that did not work as intended, and several redundant items. Different improvement strategies could not resolve these problems. Use of J-EAT-10 in population-based surveys can therefore not be recommended. For this purpose, alternative screening tools for self-perceived OD should be chosen, or a new one should be developed and validated.

Availability of data and materials

The dataset analyzed during the current study is available as supporting information (S1 File. data set) in Igarashi K, Kikutani T, Tamura F [32].



Abbreviations

ANOVA: Analysis of variance
CI: Confidence interval
df: Degrees of freedom
DIF: Differential item functioning
EAT-10: Eating Assessment Tool
FitRes: Fit residual
J-EAT-10: Japanese version of EAT-10
LID: Local item dependence
OD: Oropharyngeal dysphagia
PCM: Partial Credit Model
PSI: Person Separation Index
PROM: Patient reported outcome measure
RSM: Rating Scale Model
SD: Standard deviation
SE: Standard error


  1. Baijens LW, Clavé P, Cras P, Ekberg O, Forster A, Kolb GF, et al. European Society for Swallowing Disorders – European Union geriatric medicine society white paper: oropharyngeal dysphagia as a geriatric syndrome. Clin Interv Aging. 2016;11:1403–28.

  2. Azzolino D, Damanti S, Bertagnoli L, Lucchi T, Cesari M. Sarcopenia and swallowing disorders in older people. Aging Clin Exp Res. 2019;31(6):799–805.

  3. Zhao WT, Yang M, Wu HM, Yang L, Zhang X, Huang Y. Systematic review and meta-analysis of the association between sarcopenia and dysphagia. J Nutr Health Aging. 2018;22:1003–9.

  4. Madhavan A, LaGorio LA, Crary MA, Dahl WJ, Carnaby GD. Prevalence of and risk factors for dysphagia in the community dwelling elderly: a systematic review. J Nutr Health Aging. 2016;20(8):806–15.

  5. Hägglund P, Fält A, Hägg M, Wester P, Levring JE. Swallowing dysfunction as risk factor for undernutrition in older people admitted to Swedish short-term care: a cross-sectional study. Aging Clin Exp Res. 2019;31(1):85–94.

  6. Palacios-Ceña D, Hernández-Barrera V, López-de-Andrés A, Fernández-de-las-Peñas C, Palacios-Ceña M, de Miguel-Díez J, et al. Time trends in incidence and outcomes of hospitalizations for aspiration pneumonia among elderly people in Spain (2003−2013). Eur J Intern Med. 2017;38:61–7.

  7. Verdonschot RJ, Baijens LW, Vanbelle S, van de Kolk I, Kremer B, Leue C. Affective symptoms in patients with oropharyngeal dysphagia: a systematic review. J Psychosom Res. 2017;97:102–10.

  8. Swan K, Speyer R, Heijnen BJ, Wagg B, Cordier R. Living with oropharyngeal dysphagia: effects of bolus modification on health-related quality of life-a systematic review. Qual Life Res. 2015;24(10):2447–56.

  9. Attrill S, White S, Murray J, Hammond S, Doeltgen S. Impact of oropharyngeal dysphagia on healthcare cost and length of stay in hospital: a systematic review. BMC Health Serv Res. 2018;18(1):1–8.

  10. Westmark S, Melgaard D, Rethmeier LO, Ehlers LH. The cost of dysphagia in geriatric patients. Clinicoecon Outcomes Res. 2018;10:321–6.

  11. Belafsky PC, Mouadeb DA, Rees CJ, Pryor JC, Postma GN, Allen J, Leonard RJ. Validity and reliability of the eating assessment tool (EAT-10). Ann Otol Rhinol Laryngol. 2008;117(12):919–24.

  12. Nestlé Nutrition Institute. Swallowing screening tool [Internet]. Available from: Cited 12-04-2019.

  13. Matsuo H, Yoshimura Y, Ishizaki N, Ueno T. Dysphagia is associated with functional decline during acute-care hospitalization of older patients. Geriatr Gerontol Int. 2017;17(10):1610–6.

  14. Popman A, Richter M, Allen J, Wham C. High nutrition risk is associated with higher risk of dysphagia in advanced age adults newly admitted to hospital. Nutr Diet. 2018;75(1):52–8.

  15. Chatindiara I, Allen J, Popman A, Patel D, Richter M, Kruger M, et al. Dysphagia risk, low muscle strength and poor cognition predict malnutrition risk in older adults at hospital admission. BMC Geriatr. 2018;18(1):78.

  16. Cordier R, Joosten A, Clave P, Schindler A, Bulow M, Demir N, Arslan SS, Speyer R. Evaluating the psychometric properties of the eating assessment tool (EAT-10) using Rasch analysis. Dysphagia. 2017;32(2):250–60.

  17. Wilmskoetter J, Bonilha H, Hong I, Hazelwood RJ, Martin-Harris B, Velozo C. Construct validity of the eating assessment tool (EAT-10). Disabil Rehabil. 2019;41(5):549–59.

  18. Kean J, Brodke DS, Biber J, Gross P. An introduction to Item Response Theory and Rasch Analysis of the Eating Assessment Tool (EAT-10). Brain Impair. 2018;19(Spec Iss 1):91–102.

  19. Bahat G, Yilmaz O, Durmazoglu S, Kilic C, Tascioglu C, Karan MA. Association between dysphagia and frailty in community dwelling older adults. J Nutr Health Aging. 2019;23(6):571–7.

  20. Nyemchek B, Quigley L, Molfenter S, Woolf K. A cross-sectional evaluation of wellness in New York city community-dwelling seniors (P01–035-19). Curr Dev Nutr. 2019;3(Suppl 1):64.

  21. Chatindiara I, Williams V, Sycamore E, Richter M, Allen J, Wham C. Associations between nutrition risk status, body composition and physical performance among community-dwelling older adults. Aust N Z J Public Health. 2019;43(1):56–62.

  22. Igarashi K, Kikutani T, Tamura F. Survey of suspected dysphagia prevalence in home-dwelling older people using the 10-item eating assessment tool (EAT-10). PLoS One. 2019;14(1):e0211040.

  23. Wham C, Fraser E, Buhs-Catterall J, Watkin R, Gammon C, Allen J. Malnutrition risk of older people across district health board community, hospital and residential care settings in New Zealand. Australas J Ageing. 2017;36(3):205–11.

  24. Bossuyt PM, Reitsma JB, Linnet K, Moons KG. Beyond diagnostic accuracy: the clinical utility of diagnostic tests. Clin Chem. 2012;58(12):1636–43.

  25. Rofes L, Arreola V, Mukherjee R, Clavé P. Sensitivity and specificity of the eating assessment tool and the volume-viscosity swallow test for clinical evaluation of oropharyngeal dysphagia. Neurogastroenterol Motil. 2014;26(9):1256–65.

  26. Cheney DM, Siddiqui MT, Litts JK, Kuhn MA, Belafsky PC. The ability of the 10-item eating assessment tool (EAT-10) to predict aspiration risk in persons with dysphagia. Ann Otol Rhinol Laryngol. 2015;124(5):351–4.

  27. Andrich D, Marais I. A Course in Rasch Measurement Theory. In: Measuring in the Educational, Social and Health Sciences. Singapore: Springer; 2019.

  28. Kendall KA, Ellerston J, Heller A, Houtz DR, Zhang C, Presson AP. Objective measures of swallowing function applied to the dysphagia population: a one year experience. Dysphagia. 2016;31(4):538–46.

  29. Kreiner S. Validity and objectivity: reflections on the role and nature of Rasch models. Nordic Psychol. 2007;59(3):268–98.

  30. Christensen KB, Kreiner S, Mesbar M. Rasch models in health. Hoboken: Wiley; 2013.

  31. Wakabayashi H, Kayashita J. Translation, reliability, and validity of the Japanese version of the 10-item eating assessment tool (EAT-10) for the screening of dysphagia. JJSPEN. 2014;29(3):871–6.

  32. Igarashi K, Kikutani T, Tamura F. Survey of suspected dysphagia prevalence in home-dwelling older people using the 10-item Eating Assessment Tool (EAT-10). PLoS One. 2019;14(1). S1 File. data set.

  33. Andrich D, Lyne A, Sheridan B, Luo G. RUMM2030: A Windows Program for the Analysis of Data According to Rasch Unidimensional Models for Measurement. 7th ed. Hoboken: RUMM Laboratory Pty Ltd; 2012.

  34. Andrich D, Sheridan B. RUMM2030 manual. Perth, Australia: RUMM Laboratory; 2009.

  35. Hagquist C, Andrich D. Recent advances in analysis of differential item functioning in health research using the Rasch model. Health Qual Life Outcomes. 2017;15(1):1–8.

  36. Hagquist C, Bruce M, Gustavsson JP. Using the Rasch model in nursing research: an introduction and illustrative example. Int J Nurs Stud. 2009;46(3):380–93.

  37. Pallant JF, Tennant A. An introduction to the Rasch measurement model: an example using the hospital anxiety and depression scale (HADS). Br J Clin Psychol. 2007;46(Pt 1):1–18.

  38. Christensen KB, Makransky G, Horton M. Critical values for Yen’s Q3: identification of local dependence in the Rasch model using residual correlations. Appl Psychol Meas. 2017;41(3):178–94.

  39. Andrich D. A rating scale formulation for ordered response categories. Psychometrika. 1978;43(4):561–74.

  40. Masters G. A rasch model for partial credit scoring. Psychometrika. 1982;47(2):149–74.

  41. Sharma B. A focus on reliability in developmental research through Cronbach's alpha among medical, dental and paramedical professionals. Asian Pac J Health Sci. 2016;3(4):271–8.

  42. Fisher WP. Reliability statistics. Rasch Meas Trans. 1992;6:238.

  43. Chen WH, Lenderking W, Jin Y, Wyrwich W, Gelhorn H, Revicki DA. Is Rasch model analysis applicable in small sample size pilot studies for assessing item characteristics? An example using PROMIS pain behavior item bank data. Qual Life Res. 2014;23(2):485–93.

  44. Hagell P, Westergren A. Sample size and statistical conclusions from tests of fit to the Rasch model according to the Rasch Unidimensional measurement model (RUMM) program in health outcome measurement. J Appl Meas. 2016;17(4):416–31.

  45. Wales K, Clemson L, Lannin N, Cameron I. Functional assessments used by occupational therapists with older adults at risk of activity and participation limitations: a systematic review. PLoS One. 2016;11(2):e0147980.

  46. Terwee CB, Prinsen CAC, Chiarotto A, Westerman MJ, Patrick DL, Alonso J, et al. COSMIN methodology for evaluating the content validity of patient-reported outcome measures: a Delphi study. Qual Life Res. 2018;27(5):1159–70.

  47. Speyer R, Cordier R, Kertscher B, Heijnen BJ. Psychometric properties of questionnaires on functional health status in oropharyngeal dysphagia: a systematic literature review. Biomed Res Int. 2014;2014:1–11.

  48. Cheng HG, Phillips MR. Secondary analysis of existing data: opportunities and implementation. Shanghai Arch Psychiatry. 2014;26(6):371–5.


Acknowledgements

We are grateful to and commend Kumi Igarashi, Takeshi Kikutani and Fumiyo Tamura [22] for making their dataset available for reuse by other research groups.


Funding

No financial support was received for this study.

Author information

Authors and Affiliations



Contributions

TH and AK conceptualized the study. TH proposed the statistical analysis approach, prepared and analyzed the data, and wrote the initial draft of the manuscript. AK critically reviewed the manuscript. Both authors approved the final manuscript.

Corresponding author

Correspondence to Tina Hansen.

Ethics declarations

Ethics approval and consent to participate

Since the current study involved a secondary analysis of freely available data, formal ethical approval was not needed. The primary data were collected upon approval by the ethics committee at Nippon Dental University School of Life Dentistry (Approval No. NDU-T2015–46) as described in Igarashi et al. [22].

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1.

English version of EAT-10.

Additional file 2.

Data codes of the existing data set and recodes for the analysis by the Rasch model.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Hansen, T., Kjaersgaard, A. Item analysis of the Eating Assessment Tool (EAT-10) by the Rasch model: a secondary analysis of cross-sectional survey data obtained among community-dwelling elders. Health Qual Life Outcomes 18, 139 (2020).
