An in-depth psychometric analysis of the Connor-Davidson Resilience Scale: calibration with Rasch-Andrich model
Health and Quality of Life Outcomes volume 13, Article number: 154 (2015)
Abstract
Background
The Connor-Davidson Resilience Scale (CD-RISC) is inarguably one of the best-known instruments in the field of resilience assessment. However, the criteria for the psychometric quality of the instrument were based only on classical test theory.
Method
This paper calibrates the CD-RISC with a non-clinical sample of 444 adults using the Rasch-Andrich Rating Scale Model, in order to clarify its structure and analyze its psychometric properties at the item level.
Results
Two items showed misfit to the model and were eliminated. The remaining 22 items form an essentially unidimensional scale. The CD-RISC has good psychometric properties. The fit of both the items and the persons to the Rasch model was good, and the response categories functioned properly. Two of the items showed differential item functioning.
Conclusions
The CD-RISC has an obvious ceiling effect, which suggests that more difficult items should be included in future versions of the scale.
Background
Possibly in reaction to models of psychopathology and illness, research on resilience has been gradually increasing over the last 20 years with respect to theory, assessment and implementation [1–3]. Currently, there is little doubt about its potential in healthcare, personal wellbeing and individual quality of life over the life cycle [4, 5].
The assessment of interventions and programs designed to promote and enhance resilience certainly requires measures with adequate evidence of validity and reliability. Otherwise, it would be impossible to determine the success of a program or to determine who is resilient and to what extent (hence the large differences found in prevalence studies of rates of resilience across levels of risk, revealed among others by Haskett et al. [3] and Vanderbilt-Adriance and Shaw [6]). The development of instruments to assess resilience depends on the complexity of the construct and, therefore, on the difficulty of achieving consensus on an operational definition that enjoys sufficient evidence of validity. Although the resilience literature has become remarkably extensive [7], many points of uncertainty still persist and need to be resolved, especially those related to its definition and measurement.
Based on an extensive review of the literature, Windle [8, 9] defined resilience as the process of negotiation, management and adaptation to significant sources of stress or trauma. Protective factors and resources within the individual, as well as the individual’s life and environment, facilitate this ability to adapt to and emerge from adversity. On the other hand, the experience of resilience may be subject to changes over the life cycle. In recent years, several scales and questionnaires have been developed to assess resilience in children, adolescents and adults. The best-known instruments that are applied in practice are the Resilience Scale (RS) [10], the ER 89 [11], the Ego Resiliency scale [12], the Resilience Attitudes and Skills Profile [13], the Connor-Davidson Resilience Scale (CD-RISC) [14], the Adolescent Resilience Scale [15], the Resilience Scale for Adults [16, 17], the Dispositional Resilience Scale [18], the 10-item Connor-Davidson Resilience Scale [19], the Youth Resiliency: Assessing Developmental Strengths (YR: ADS) scale [20], the Resilience Scale of the California Healthy Kids Survey [21], the Brief Resilience Scale [22], the Child and Youth Resilience Measure (CYRM) [23] and the Psychological Resilience scale [24]. All of these scales use a self-report format, and most consist of several factors (with only three of the above scales being unidimensional). Some have been developed for use in clinical settings, whereas others aim to assess resilience in different relational, community and cultural contexts. Windle et al. [7] conducted a review of the psychometric properties of the above scales, assessing their content validity, internal consistency, criterion validity, construct validity, reproducibility (i.e., absolute and relative error measures), responsiveness, presence of floor or ceiling effects and interpretability.
The authors awarded a score of 2, 1 or 0 points to each criterion according to whether it was perfectly fulfilled, doubtful or not met. They then prepared a ranking, in which the Resilience Scale for Adults [16, 17] and the CD-RISC [14] obtained the highest scores. However, the authors indicated that none of the scales exceeded a moderate level of psychometric quality based on a study of the original papers.
Despite the importance of the study by Windle et al. [7], we note that the criteria for the psychometric quality of the instruments were based on classical test theory (CTT). This choice of criteria is understandable, insofar as all of the instruments were developed under CTT postulates. In our opinion, however, the authors’ reliance on such criteria, exemplified by the use of Cronbach’s alpha to assess the internal consistency and even the overall reliability of resilience tests, is a limitation of the study. Historically, Cronbach’s alpha has been the most commonly used reliability statistic in psychological research, but it has been seriously challenged in recent years on the grounds that it is not related to the internal structure of the test, given the covariance matrix of items and the typical assumptions about measurement error. It cannot be said, therefore, that Cronbach’s alpha truly measures test reliability (more than internal consistency) or unidimensionality (see, e.g., Sijtsma [25], Sočan [26] or Ten Berge and Sočan [27]); more powerful alternatives, such as the greatest lower bound (glb), have been proposed. Other alternatives are the composite reliability computed from the loadings and measurement errors of a confirmatory factor analysis of the data, the ordinal alpha coefficient (if a factor analysis model is assumed), the ordinal theta coefficient (if a principal component analysis model is assumed), or stratified alpha computed from the polychoric correlations to correct the underestimation bias of the coefficient when correlations between variables are high.
The purpose of this paper is to calibrate the CD-RISC [14] using the Rasch-Andrich rating scale model, which will be described below. The CD-RISC is inarguably one of the best-known instruments in the field of resilience assessment. It consists of 25 items with five response categories (0 to 4) grouped into five factors. The first factor (8 items) reflects the notion of personal competence, high standards and tenacity. The second factor (7 items) has to do with trust in one’s intuition, tolerance of negative affect, and the strengthening effects of stress. The third factor (5 items) reflects positive acceptance of change and secure relationships. The fourth factor (3 items) reflects control. The fifth factor (2 items) reflects spiritual influences. The scale was validated using different samples (five in clinical settings and one in the community). The authors reported that the scale has high internal consistency, good test-retest reliability and adequate convergent and discriminant validity. However, the CD-RISC has some weaknesses, which are described in the following paragraphs.
First, the original factor structure could not be replicated by exploratory or confirmatory factor analysis [19, 28–34]. Second, the existence of the fourth and fifth factors, with three and two items, respectively, is highly questionable. Third, the decision of the scale’s authors [14] to use the Kaiser-Guttman criterion to decide the number of factors to retain is questionable because this approach often leads to overfactorization [35]. Moreover, the authors showed a preference for an orthogonal rotation method to analyze scale structure when it would be reasonable to contemplate the possibility that hypothetical dimensions of resilience were correlated, as often occurs with other latent constructs. Fourth, the names of the first three factors are confusing because they include dissimilar concepts. Finally, the multidimensional structure of the scale also seems questionable: Campbell-Sills and Stein [19], after removing items with low or inconsistent loadings or those with overlapping contents, concluded that the structure of the scale is unidimensional, retaining only 10 items of the original 25. Subsequent studies have found similar results (e.g., Notario-Pacheco et al. [36], Burns and Anstey [32] and Gucciardi et al. [31]). In the case of the Burns and Anstey study, the authors obtained a one-factor solution of 22 items, with adequate fit indices, comparable to those of the reduced 10-item version by Campbell-Sills and Stein [19].
In view of the ambiguities and inconsistencies mentioned and the evidence supporting the unidimensionality of the scale, this study calibrates the CD-RISC using the Rasch Rating Scale Model (RSM), assuming that its structure is unidimensional. Besides being suitable for the study of construct dimensionality, the methods framed in item response theory (IRT) have obvious advantages over CTT with regard to analysing the psychometric properties of a psychological measurement. These advantages have been widely discussed [37–44]. The most relevant of them include the following: (a) psychometric information does not depend on the sample used; (b) the effectiveness of the scale can be evaluated at each level of the trait or latent variable; (c) it is possible to estimate the precision with which each test (and each individual item) measures different levels of ability/latent traits of the examined participants; and (d) the standard error takes different values throughout the continuum of the latent variable.
To the best of our knowledge, no study has evaluated the psychometric properties of the CD-RISC using IRT despite its advantages. Having precise measures is of utmost importance for numerous reasons, among which we highlight the following. First, results from studies testing structural models are as reliable and valid as those from models explaining how latent variables are measured and, by extension, as reliable and valid as the evaluation instruments used. Second, from the perspective of the clinical application of these evaluations, IRT models provide a considerably deeper knowledge of items that ideally provide precise and valid instruments for the diagnosis, classification, and evaluation of intervention effectiveness.
Methods
Participants
A convenience sample of 444 adults (24.5 % men and 75.5 % women) with a mean age of 36.18 years (SD = 19.5) was used. The mean age of the women was 36.08 years (SD = 19.2); of the men, 36.58 years (SD = 20.40). The high dispersion of the sample was due to its bimodal distribution (viz., 279 participants were college students, with the remainder (N = 165) being older participants who had been educated at the University of Experience Program). The analysis of the gender and age variables did not yield significant differences from the expected frequencies (t_{(442)} = 0.248, p = 0.804), so we retained the equiprobability hypothesis (i.e., independence between the two variables). The participants in this study completed the scale after signing the corresponding consent form, in accordance with the Code of Practice for Research at the University of Valladolid (Spain, last update of the Governing Council of 31 January 2013).
Measures
The Spanish version of the Connor-Davidson Resilience Scale (CD-RISC) was used [14, 45]. The properties and theoretical structure of the scale have been described in the introduction section.
Data analysis
In the first phase of data analysis, exploratory factor analysis was performed, implementing the FACTOR 9.2 program [46] to determine the dimensional structure of the scale.
In a second phase, the Rasch Rating Scale Model (RSM) [47–51] was used, as implemented in the WINSTEPS v. 3.73 program [52, 53]. The RSM specifies the probability, P_{nij}, that a person n with ability level β_{n} is observed in category j of a rating scale applied to item i with difficulty level (or ‘endorsability’) δ_{i}, as opposed to the probability P_{ni(j−1)} of the person’s being observed in category (j − 1). Thus, on a Likert scale, j could be strong, whereas j − 1 might be moderate.
ln(P_{nij} / P_{ni(j−1)}) = β_{n} − δ_{i} − τ_{j}    (1)
In equation (1), τ_{j} is the Rasch-Andrich threshold, also called the step calibration or step difficulty. The model is appropriate for estimating the latent variable (resilience) and the item endorsability for responses scored in two or more categories. The model also assumes that the distance between the threshold parameters is constant across all items [38, 52].
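The model can be illustrated with a minimal sketch. It assumes the usual adjacent-category form of the rating scale model, ln(P_{nij} / P_{ni(j−1)}) = β_{n} − δ_{i} − τ_{j}, and uses hypothetical parameter values chosen only for illustration; the function name is ours and is not part of WINSTEPS.

```python
import math

def rsm_category_probs(beta, delta, taus):
    """Category probabilities for one person-item pair under the rating scale model.

    beta  : person measure (logits)
    delta : item difficulty / endorsability (logits)
    taus  : Rasch-Andrich thresholds tau_1..tau_m (logits), shared by all items
    Returns m + 1 probabilities, for categories 0..m.
    """
    # Cumulative sums of (beta - delta - tau_k); category 0 has an empty sum (0).
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (beta - delta - tau))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical values (not estimates from this study).
probs = rsm_category_probs(beta=0.5, delta=-0.2, taus=[-1.5, -0.5, 0.5, 1.5])

# The log-odds of adjacent categories recover beta - delta - tau_j.
adjacent_log_odds = [math.log(probs[j] / probs[j - 1]) for j in range(1, len(probs))]
```

Because the same thresholds are passed for every item, the sketch also reflects the assumption that the threshold structure is common to all items.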
Results
Exploratory factor analysis
To determine the factor structure of the CD-RISC, the unweighted least squares (ULS) method was used as the extraction procedure; an oblique rotation (direct oblimin, delta = 0) was selected because previous research had shown that resilience factors have moderate-to-high correlations (e.g., Burns and Anstey [32]). The analysis was performed on the matrix of polychoric correlations, reflecting the ordinal nature of the input data. The adequacy of the data was confirmed by Bartlett’s test of sphericity (df = 276, p < 0.001), the Kaiser-Meyer-Olkin (KMO) index (.94) and the determinant of the matrix (p < 0.001).
Initially, item 3 (‘Sometimes fate and God can help’) was removed for having a corrected homogeneity index < .20. To determine the number of factors to be retained, the Kaiser-Guttman rule (eigenvalues > 1.00) was used, resulting in the retention of four factors, which explained 37.2 % of the total variance. However, the fourth factor had an eigenvalue of 1.02, explaining 4.2 % of variance, and it was composed of only two items (2 and 7). Moreover, cross-loadings of < 0.4 on all factors for items 2, 12, 13 and 22 were observed. Because the retention of four factors was not statistically justified in this case, other criteria were used to guide decision-making. First, residual correlations were analyzed by the root mean square residual (RMSR) statistic; the values obtained were .046, .039, .033 and .031 for models with one, two, three and four factors, respectively. Because RMSR indices < .08 are considered indicative of a good-fitting model [54], following the principle of parsimony, this analysis led to the retention of a single factor. Second, the optimized parallel analysis [55] was used as a criterion for retention, comparing the eigenvalues obtained from the analysis with those obtained from 1000 randomly generated polychoric correlation matrices. In our case, randomly generated eigenvalues obtained from the first factor exceeded those obtained from the analysis, which again suggested a one-factor solution to be the most appropriate. Finally, the minimum average partial (MAP) method [56] was used. It was observed that the lowest average value of partial correlations corresponded to a one-factor solution. In light of the above results, it was decided to perform subsequent analyses assuming a one-factor model.
Under conventional factor analysis, the criterion for item retention was that the item-factor loading should be at least .40. As shown in Table 1, two items did not meet the criterion: item 9 (‘Things happen for a reason’) and 20 (‘Have to act on a hunch’). Thus, these items were removed, leaving a final version of the scale composed of 22 items with a unidimensional structure. This single factor accounted for 38.6 % of the variance and has been identified in other studies, such as Burns and Anstey [32], Campbell-Sills and Stein [19], Gucciardi et al. [31] and Notario-Pacheco et al. [36]. The decision to remove items 3, 9 and 20 was supported by the results of the Rasch analysis described in the following sections. Table 1 shows item-factor loadings for the one-factor model. Items with factor loadings < .40 are highlighted.
Assessment of the fit of items and persons
Table 2 shows the results of the Rasch analysis (viz., the locations of the items, standard errors, infit and outfit values, point-biserial coefficients and estimated discriminations of the items), once items 3 (‘Sometimes fate and God can help’), 9 (‘Things happen for a reason’) and 20 (‘Have to act on a hunch’), whose infit and outfit mean square (MNSQ) values were > 1.50, were deleted.
Item polarities indicated that all point-biserial correlations were positive and greater than the recommended value of 0.20, falling in a range of 0.38 to 0.67. Therefore, all items met the critical requirement in the Rasch analysis for being aligned in the same direction on the latent variable.
The item separation index was 7.58, demonstrating that the items discriminated between different levels of resilience between subjects. The overall reliability or item separation reliability (0.98) indicated that the items formed a well-defined variable and that the reliability of the location of the items on the scale was good; it also provided evidence for the tenability of the local independence assumption. Low reliability would mean that the sample was not large enough to accurately locate the items on the latent variable. With this sample, item difficulties were estimated very accurately.
Estimates of the subjects were reliable. To assess the extent to which the test was able to discriminate among levels in the sample to a degree sufficient for our purpose, the person separation index was calculated; its value was 3.28. This value is roughly equivalent to a Kuder-Richardson 20 (KR-20) or Cronbach’s alpha value of 0.91 and indicated that the CD-RISC, for the sample studied, discriminated among at least three levels (i.e., subjects with low, medium and high resilience). The person separation reliability index was appropriate (0.91).
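The link between the separation indices and the reliabilities reported above can be checked with a short sketch. It assumes the conventional Rasch definitions, reliability R = G² / (1 + G²) for a separation index G, and the Wright and Masters strata formula (4G + 1) / 3; the function names are illustrative.

```python
def separation_to_reliability(g):
    """Separation reliability implied by a separation index g."""
    return g ** 2 / (1 + g ** 2)

def reliability_to_separation(r):
    """Inverse relation: separation index implied by a reliability r (0 <= r < 1)."""
    return (r / (1 - r)) ** 0.5

def strata(g):
    """Number of statistically distinct levels the measure can separate."""
    return (4 * g + 1) / 3

person_reliability = separation_to_reliability(3.28)  # person separation from the text
item_reliability = separation_to_reliability(7.58)    # item separation from the text
person_strata = strata(3.28)
```

Under these definitions the reported separations of 3.28 and 7.58 round to reliabilities of 0.91 and 0.98, in line with the values given in the text, and a person separation of 3.28 corresponds to more than three distinguishable strata.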
The raw score-to-measure correlation (RSMC) values are Pearson’s correlations between raw scores and the entire measure, including extreme scores. It is expected that, when the data are complete, they will be near 1.0 for persons and 1.0 for items (which is true in our case: 0.99 and 1.00 were obtained, respectively).
The average fit and standard deviations of the items were suitable (infit =1.00, SD =0.17; outfit =1.01, SD =0.18). The average fit and standard deviations of the persons were also suitable (infit =1.00, SD =0.23; outfit =1.01, SD =0.24). These results suggested that this set of items satisfied, in principle, the requirements needed to identify the construct of resilience.
A graphical representation of fit by infit and outfit MNSQ is provided in Fig. 1. The two easiest items (i.e., those most likely to be endorsed) were 25 (‘Pride in your achievements’) and 2 (‘Close and secure relationships’), whereas the most difficult to endorse were numbers 14 (‘Under pressure, focus and think clearly’) and 19 (‘Can handle unpleasant feelings’). All standard errors were even and reasonably low, as shown by the diameters of the bubbles representing each. Finally, all items were located in the area of 0.5 to 1.5 (areas of acceptable fit), showing their usefulness for measurement [57]. With regard to the subjects, proper model fit was also found: the average infit was 1.01 (SD = 0.23), and the average outfit was 1.01 (SD = 0.24). It should be noted that only 70 subjects (15.7 %) had values greater than 1.50 for infit or outfit MNSQ. Accordingly, the proportion of persons with good fit amounted to 84.3 % of the sample.
Specific objectivity
Specific objectivity analysis was carried out by dividing the original sample into two random subsamples of 222 subjects each, followed by the performance of a simple linear regression analysis between the item difficulty parameters obtained from each of them [58]. The correlation between both sets of parameters was .990, with an intercept of 0.001, a slope of 1.115 and a coefficient of determination 0.980. Because the values that would represent perfect fit between the data and the model are 1, 0, 1 and 1, respectively, we concluded that the requirement of invariance of the item parameters was met and that the data showed a good overall fit to the model.
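The invariance check described above can be sketched as an ordinary least-squares regression of one set of item difficulties on the other. This is a minimal illustration in plain Python; the two lists of difficulties are hypothetical and are not the estimates obtained in this study.

```python
def invariance_check(x, y):
    """Regress y on x and return correlation, intercept, slope and R squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return {"r": r, "intercept": intercept, "slope": slope, "r2": r * r}

# Hypothetical item difficulties (logits) from two random subsamples.
subsample_a = [-0.80, -0.35, 0.00, 0.30, 0.90]
subsample_b = [-0.75, -0.40, 0.05, 0.25, 0.95]

result = invariance_check(subsample_a, subsample_b)
perfect = invariance_check(subsample_a, subsample_a)  # identical estimates
```

With identical estimates the function returns the ideal values (r = 1, intercept = 0, slope = 1, R² = 1), which is the benchmark against which the reported values are compared.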
Appropriateness of the item difficulty level for the sample
The maps of persons and items (Fig. 2), also known as ‘Wright maps’, vividly illustrate how the items on progressively higher difficulty levels overlap with the levels of those persons assessed on the latent trait (resilience). Because the Rasch model uses the same measure (logit), both metrics can be compared to determine whether the item difficulty is appropriate for the sample of persons. If the sample is appropriate, there should be considerable overlap in the map between the item difficulty parameters and the levels of the latent trait of persons. This alignment between items and people is called targeting in the Rasch analysis jargon.
Figure 2 shows the complete map of items and persons ordered from the highest to the lowest levels. Consequently, persons with high levels of resilience, as well as items most difficult to endorse, are at the top of the map. We can see how the range of the item difficulty parameters partially overlaps the range of the latent trait parameters of persons, indicating that the 22 items assessed subjects with different levels of resilience. However, the following considerations need to be taken into account.
First, the mean for persons (M = 0.75, SD = 0.93) was significantly higher than that for the items (M = 0.00, SD = 0.48). Second, the range for persons (−1.73 to 3.81 logits) was far wider than that for the items (−0.82 to 0.92 logits). Third, a total of 188 subjects (42.7 %) scored above the range of item difficulty, whereas only 16 (3.0 %) did so below that range. The targeting area between the item difficulty and the presence of the latent trait in subjects grouped 240 people (49 %). Fourth, no differences were found between the number of men and women in each of the three areas mentioned. An analysis of the standardized Pearson residuals led to the conclusion that the distributions by gender in different areas (upper and lower targeting areas) were not significantly different from those expected under the equiprobability assumption (χ^{2}_{(2)} = 1.59, p = 0.447). This lack of difference was confirmed by an analysis of variance. The means of men and women were 0.72 (SD = 1.02) and 0.78 (SD = 0.98), respectively (F_{(443)} = 0.531, p = 0.467, d = 0.025), which denoted a negligible effect size [59]. Moreover, no significant differences were found between the younger subjects and those older than 40 years (F_{(443)} = 1.10, p = 0.293, d = 0.026).
Dimensionality assessment
One of the underlying assumptions of the Rasch model is that the scale is unidimensional. We checked this requirement using exploratory factor analysis (whose results are detailed in a previous section), infit and outfit statistics, and principal component analysis (PCA) of the Rasch standardized residuals.
When analyzing Rasch models, lack of unidimensionality is reflected in poor fit indices. As described above, two fit indices are commonly used: infit MNSQ and outfit MNSQ, both with a theoretical range from 0 to +∞, which determine to what extent each of the items represents a single underlying dimension. Whereas infit is affected by unexpected response patterns of subjects located near the item position on the scale, outfit is more sensitive to unexpected response patterns of subjects located far from the item location. Because MNSQ is calculated by dividing the value of χ^{2} by the degrees of freedom, MNSQ values of 1 are ideal, suggesting that the observed variance is equal to the expected variance. Outfit and infit values of 1 + x indicate (100*x)% more variation between the observed patterns and those predicted by the model than would be expected if the data and model were to fit perfectly. For example, the infit MNSQ value of 1.12 obtained in item 12 (‘When things look hopeless, I don’t give up’) indicated that this item had 12 % more variation in the observed data than was predicted by the model. It is assumed [60] that items with MNSQ values > 1.00 show underfit (suggesting the presence of unmodeled noise or other sources of variance in the data). An item with a large fit statistic generally indicates that it does not belong to the single construct being measured. In contrast, items with MNSQ values < 1.00 show overfit (which suggests that there is less variation in the observed data than the model predicts and, therefore, that the model predicts the data too well, causing inflated summary statistics). As shown in Table 2, infit MNSQ values ranged from 0.88 (‘You can achieve your goals’) to 1.30 (‘Under pressure, focus and think clearly’).
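The two fit statistics can be sketched as follows, assuming the standard definitions: outfit is the unweighted mean of the squared standardized residuals, and infit is the information-weighted version (sum of squared residuals divided by the sum of model variances). The observed scores, expected scores and variances below are hypothetical; in practice the expected values and variances come from the fitted model.

```python
def fit_mean_squares(observed, expected, variances):
    """Return (infit, outfit) mean squares for one item across persons.

    observed  : observed category scores
    expected  : model-expected scores for the same persons
    variances : model variances of those scores (all > 0)
    """
    squared_residuals = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: mean of squared standardized residuals (sensitive to outliers).
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(observed)
    # Infit: residuals weighted by information (sensitive to on-target responses).
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit

def excess_variation_pct(mnsq):
    """Percentage of variation above (or below, if negative) model expectation."""
    return (mnsq - 1.0) * 100.0

# Hypothetical data for four persons on one item.
infit, outfit = fit_mean_squares(
    observed=[3, 2, 4, 1],
    expected=[2.5, 2.0, 3.5, 1.5],
    variances=[0.5, 1.0, 0.25, 0.5],
)
item12_excess = excess_variation_pct(1.12)  # infit value reported for item 12
```

Both values in this toy example fall below 1, which would be read as overfit; the helper reproduces the 12 % figure quoted for item 12.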
Discrimination indices are shown in the last column of Table 2. Indices < 1.00 indicate underdiscrimination, which suggests weak differentiation from one level to the next [60]. In our case, only items 7 and 14 had discrimination indices < 0.70.
In addition to reviewing the MNSQ values described in the preceding paragraphs, we conducted a principal components analysis of the Rasch residuals to determine the unidimensionality of the scale. Principal components analysis decomposes the correlation matrix between items based on standardized residuals (i.e., differences between observed values and those predicted by the Rasch model) to determine whether there are other potential dimensions. The first factor in the analysis corresponds to the Rasch dimension. A variance ≥ 60 % is considered suitable. The second dimension (or first contrast of the residuals) indicates whether there are patterns in the differences of the residuals sufficiently large to suggest the likely existence of more than one dimension. If the variance of the Rasch dimension is low, while being significant in successive contrasts, the scale may be multidimensional. The frequently adopted rule is that the second dimension must include at least 3 items (according to the eigenvalue) to be considered as a possible second dimension and should represent at least 5 % of the unexplained variance [60].
Principal components analysis of the CD-RISC showed that 39.8 % of the variance was explained by the data. This percentage is almost identical to the variance explained by the model (39.7 %). The first contrast had an eigenvalue of 1.9 (lower than the value of 3.0 required to consider a second dimension), which indicated that it contained fewer than 3 items and explained 4.9 % of the variance of the data that was not modeled. Consistent with the above, the scale could be considered unidimensional (or ‘sufficiently unidimensional,’ in more precise terms), as shown in Fig. 3, which does not reveal distinct clusters that would lead one to suspect multidimensionality. Figure 3 shows the scatter plot of the measures against loadings in the first contrast to visually illustrate the factorial structure of the CD-RISC. In the chart, the Rasch dimension (on the abscissa) is plotted against the first contrast factor (on the ordinate). If the items were to appear in separate groups, one might think that the scale did not meet the requirement of unidimensionality.
Role of the response categories
The rating scale was subsequently reviewed to determine whether the response categories functioned as expected. First, all frequencies of the five categories used (not true at all, rarely true, sometimes true, often true, true nearly all the time) exceeded the minimum of 10 recommended by Linacre [61]. The most common was often true (n = 3450), followed by sometimes true (n = 3346), true nearly all the time (n = 1767), rarely true (n = 1354) and not true at all (n = 295).
The infit MNSQ values were close to the expected value of 1.00 in all categories (1.25, 0.97, 0.96, 0.98 and 0.94, respectively). The outfit MNSQ values were also close to 1.00 in the five categories (1.30, 1.00, 0.98, 0.97 and 0.95, respectively), indicating that each category provided more information (i.e., systematic variance) than noise (i.e., error variance) in the measurement process [61].
Secondly, it was confirmed that average measures for all categories advanced monotonically and that there was no particularly noisy category. Thus, the average measures (−0.46, −0.19, 0.38, 1.01 and 1.85) and threshold estimates (−1.95, −0.79, 0.67 and 2.06) increased in parallel with the category labels, suggesting that the categorization of the rating scale was successful (Fig. 4). The sequence was therefore τ_{1} < τ_{2} < τ_{3} < τ_{4}. This sequence of values indicated that the Rasch-Andrich threshold parameters were ordered. Therefore, from the not true at all category, the most likely transition is to the rarely true category, and so on. The standard errors of the item steps are shown together with the values of these threshold parameters; they are relatively low (0.06, 0.03, 0.02 and 0.03).
Looking at the chart of the response category characteristic curves (RCCC), the most likely response category along the continuum can be observed more clearly. This curve relates the probability of item response with its level in the construct measured with the test and is useful in the assessment of item properties. As shown in Fig. 4, the points of intersection between the response categories match the measurement threshold parameters (τ). In turn, these points define regions of most likely responses in the continuum.
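How ordered thresholds partition the continuum into regions of most likely response can be sketched in a few lines. The sketch takes the threshold estimates as −1.95, −0.79, 0.67 and 2.06 logits; the negative signs on the first two are an assumption, chosen to be consistent with the reported ordering τ_{1} < τ_{2} < τ_{3} < τ_{4}. With ordered thresholds, the most likely category at a location β − δ is simply the number of thresholds lying at or below that location.

```python
from bisect import bisect_right

LABELS = [
    "not true at all",
    "rarely true",
    "sometimes true",
    "often true",
    "true nearly all the time",
]

# Threshold estimates in logits (signs of the first two are assumed, see lead-in).
TAUS = [-1.95, -0.79, 0.67, 2.06]

def most_likely_category(beta, delta, taus=TAUS):
    """Index of the modal response category for a person at beta on an item at delta."""
    if list(taus) != sorted(taus):
        # With disordered thresholds some category is never modal and this
        # shortcut no longer applies.
        raise ValueError("thresholds must be ordered")
    return bisect_right(taus, beta - delta)

# A person located exactly at the item difficulty.
on_target = LABELS[most_likely_category(beta=0.0, delta=0.0)]
# A person well above the item difficulty.
high = LABELS[most_likely_category(beta=3.0, delta=0.0)]
```

For a person located at the item difficulty, the modal response under these thresholds is the middle category, and each threshold marks the point at which the next category becomes more probable than the previous one.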
Differential Item Functioning (DIF) analysis
Uniform DIF analysis revealed that there was an item in the scale with a certain DIF risk (in this sample) according to the gender of participants. This was item 2 (‘Close and secure relationships’), and it was 0.71 logits more difficult for men (t_{(249)} = 3.43, p <0.001; MH χ^{2} = 9.44, p <0.001). DIF size in the other contrasts did not reach 0.50 [52, 62]. With respect to age groups, the only item at risk of DIF was 25 (‘Pride in your achievements’), which was 1.05 logits more difficult for older people (t _{(390)} = 5.00; p <0.001; MH χ^{2} = 25.97; p <0.001).
Item accuracy
With regard to the accuracy of the scores provided by the test items, the item information functions and the information function of the whole test were estimated. Test information was highest for values of θ between θ = −1.0 and θ = 0.5; hence, it was in this region of the continuum that the test measured with the greatest accuracy (Fig. 5). The largest standard errors of measurement occurred at the extremes of the continuum (6.39 and 6.51).
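The relation between information and standard error can be sketched as follows. The sketch assumes that, for a Rasch-family polytomous item, the information at θ equals the model variance of the category score, that test information is the sum over items, and that the standard error is the reciprocal of its square root. The item difficulties and thresholds are hypothetical and are not the estimates from this study.

```python
import math

def category_probs(theta, delta, taus):
    """Rating scale model category probabilities for categories 0..len(taus)."""
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + theta - delta - tau)
    peak = max(logits)  # stabilise the exponentials
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

def item_information(theta, delta, taus):
    """Item information at theta: variance of the category score."""
    probs = category_probs(theta, delta, taus)
    mean = sum(k * p for k, p in enumerate(probs))
    return sum((k - mean) ** 2 * p for k, p in enumerate(probs))

def standard_error(theta, deltas, taus):
    """Standard error of measurement at theta for a set of items."""
    test_information = sum(item_information(theta, d, taus) for d in deltas)
    return 1.0 / math.sqrt(test_information)

# Hypothetical item difficulties and thresholds (logits).
deltas = [-0.8, -0.3, 0.0, 0.4, 0.9]
taus = [-2.0, -1.0, 1.0, 2.0]

se_center = standard_error(0.0, deltas, taus)   # where the items are located
se_extreme = standard_error(6.0, deltas, taus)  # far above all items
```

The standard error is smallest where the items are concentrated and grows toward the extremes, which is the pattern described for the test as a whole.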
With regard to the ability of each item to accurately measure different regions of the underlying variable, the results were as follows.
1. Items with maximum information values in the lower-middle region of the resilience continuum. In this region were those items that had the highest values of the information function between θ = −1.5 and θ = −0.5. The items that corresponded to this category were 1, 2, 8, 24 and 25.
2. Items with maximum information values in the middle region of the resilience continuum. This region was bounded by values of θ = −1.0 and θ = 0.0. Items that had their highest values in this range were 5, 6, 10, 11, 13, 15, 17, 21, 22 and 23.
3. Items with maximum information values in the upper-middle region of the resilience continuum. This region was bounded by the values of θ = −0.5 and θ = 0.5. Items that were included in this region were 4, 7, 12 and 16.
4. Items with maximum information values in the upper region of the resilience continuum. Items 14, 18 and 19 were situated in this region, which had maximum information function values between θ = 0.0 and θ = 1.0.
Test rates
Finally, the standardized CD-RISC rates (norms) were built (see Appendix). They show the direct test scores (22 to 110) and measures (i.e., Rasch parameter estimates), along with the standard error for each score, standard scores with their standard errors, absolute and cumulative frequencies, and percentiles. For example, a raw score of 90 on the CD-RISC corresponded to a Rasch measure of 1.35, a standard error of 0.27, a standard score of 595 (SD = 29) and a percentile of 77.
Discussion
In this study, we calibrated the CD-RISC using the Rasch-Andrich Rating Scale Model (RSM). To our knowledge, the CD-RISC had not previously been calibrated and validated with item response theory methods. This methodological framework offers a number of advantages over the traditional analysis methodology (i.e., classical test theory), such as parameter invariance, estimates of the accuracy of each item and of the test, estimates that do not depend on the particular test used, the joint measurement of items and persons, specific objectivity, range properties, a standard error of measurement specific to each score, and the ability to customize tests. We consider this approach appropriate because the psychometric quality of the original scale was insufficiently substantiated [7] and because analysis with Item Response Theory (IRT) models allows a considerably deeper understanding of the psychometric properties of the items and the scale.
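For reference, the category probabilities of the RSM can be written as follows. The notation is an assumption of this sketch and follows common usage rather than equations given in this section: θ_n is the measure of person n, δ_i the location of item i, τ_j the j-th threshold (shared by all items), and m + 1 the number of response categories. The empty sum for k = 0 is taken to be zero.

```latex
P(X_{ni} = k \mid \theta_n)
  = \frac{\exp\left[ k(\theta_n - \delta_i) - \sum_{j=1}^{k} \tau_j \right]}
         {\sum_{h=0}^{m} \exp\left[ h(\theta_n - \delta_i) - \sum_{j=1}^{h} \tau_j \right]},
  \qquad k = 0, 1, \dots, m
```

Because the thresholds τ_j carry no item index, all items share the same category structure and differ only in their location δ_i, which is what allows items and persons to be placed on a single continuum.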
The results showed that the data, taken together, met the requirements of the RSM fit statistics. A good overall fit of persons and items to the model was found: the items of the CD-RISC allow the identification of a relatively wide range of behaviors evaluating resilience. In addition, both the average reliability indexes of the items and persons and the overall reliability index were acceptable. Item fit was examined in two phases. First, three items with unacceptable infit or outfit values were removed. Second, the fit of the remaining 22 items of the CD-RISC was studied. The results revealed that all of them fitted the expectations of the model. Therefore, we consider that the data collected with the 22 items can be adequately explained by the RSM.
The fit for persons showed that, for 84.23 % of individuals (considering jointly the misfits identified by infit and outfit MNSQ), the application of the RSM to all items of the CD-RISC could adequately explain the response patterns. Therefore, the scale can be considered useful for measuring resilience in the population to which it was administered. For 70 subjects (15.77 %), the response patterns did not conform to what the model expected. Because this was a small percentage, we retained the interpretation that the model adequately explained the response patterns given by persons to all items. Moreover, the application of the RSM to all items allowed the investigation of other item properties, such as measurement error, RCCC and item position on the resilience continuum.
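The person-fit statistics referred to above can be sketched as follows. This is an illustrative implementation of the standard infit and outfit mean squares, not the authors' software: outfit is the unweighted mean of the squared standardized residuals, and infit is the sum of squared residuals divided by the sum of the model variances. The function names, the item locations and the thresholds are hypothetical.

```python
import math

def rsm_moments(theta, delta, taus):
    """Model-expected score and variance for one item (categories 0..m)."""
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - delta - tau))
    peak = max(logits)  # numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var

def person_fit(responses, theta, deltas, taus):
    """Return (infit MNSQ, outfit MNSQ) for one person's response vector."""
    sq_resid_sum = 0.0   # sum of squared raw residuals
    var_sum = 0.0        # sum of model variances
    z2_sum = 0.0         # sum of squared standardized residuals
    for x, delta in zip(responses, deltas):
        expected, var = rsm_moments(theta, delta, taus)
        sq_resid = (x - expected) ** 2
        sq_resid_sum += sq_resid
        var_sum += var
        z2_sum += sq_resid / var
    infit = sq_resid_sum / var_sum          # information-weighted
    outfit = z2_sum / len(responses)        # unweighted, outlier-sensitive
    return infit, outfit

# Hypothetical parameters (NOT the calibrated CD-RISC values).
deltas = [0.0, 0.0, 0.0, 0.0]
taus = [-1.5, -0.5, 0.5, 1.5]

if __name__ == "__main__":
    print(person_fit([2, 2, 3, 1], 0.0, deltas, taus))  # unremarkable pattern
    print(person_fit([0, 4, 0, 4], 0.0, deltas, taus))  # erratic pattern
```

Values near 1 indicate responses about as variable as the model predicts; the erratic pattern in the usage example yields mean squares well above 1. Which cutoff marks a person as misfitting is a separate decision that this sketch does not make.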
With respect to whether the items were homogeneously and hierarchically ordered on the latent variable, the scale items were distributed along the continuum without excessive distance between them, so that, in principle, it would not be necessary to revise the instrument by adding items to fill information gaps. The results thus indicate that the items are distributed in a hierarchical manner and with proper scaling.
The response categories and their information functions operated suitably. As illustrated, the RCCC showed that the response categories were ordered on all items, as required by the model [51].
The item difficulty range of the CD-RISC roughly matched the resilience levels of more than half of the subjects in the sample. Just over one-third of the subjects were above the range of item difficulty and only 3 % were below it. In other words, the test had a clear ceiling effect in this sample. Given the distribution of item difficulty, the scale seems adequate to measure the middle and lower ranges of the latent variable. In this sense, the data support, in part, one of the objectives with which it was built: i.e., assessing resilience in clinical settings and vulnerable persons [14]. The obvious ceiling effect and the absence of items able to discriminate adequately at high levels of the latent variable make the scale inadequate for contexts in which the detection of persons with a high level of resilience is of interest (for example, in selection processes for certain professions). An interesting theme for future research would be to generate a bank of items suitable for measuring resilient behavior at high levels of the variable.
In relation to the analysis of differential item functioning, we found that, in terms of gender, one of the items (‘I have stable and close relationships with persons’) showed suspected differential functioning in this sample. Another item (‘I’m proud of my accomplishments’) emerged as being at risk of DIF in relation to age. It should therefore be checked whether DIF in these items also appears in other samples.
With respect to the accuracy of the scores provided by the test items, the information functions of the items and of the overall test were estimated. The highest test information values were observed for values of θ between θ = −1.0 and θ = 0.5; hence, the test measures this region of the continuum with the greatest accuracy. The largest standard errors of measurement were at the extremes of the continuum (6.39 and 6.51). In future applications of the test, and when considering the creation of an item bank, knowing the location of the items on the resilience continuum and where each item provides maximum information would allow the creation of tests targeted at the desired levels of resilient behavior. In this sense, it would be necessary to prepare items expected to have different endorsability values for each subcategory of the operational definition of the resilience construct. When the aim is to evaluate persons who most likely have high levels of resilience, items should be selected whose maximum information lies in the upper positions of the continuum. In contrast, when persons are expected to score low on the resilience latent variable, items could be selected whose maximum information lies at the bottom of the continuum. Therefore, apart from developing custom tests for accurate diagnosis, having an appropriate item bank would allow one to produce other comprehensive tests consisting of items that evenly measure all manifestations of the latent variable.
The following should be noted as limitations of the study. First, because the subjects were selected by convenience sampling, the results cannot be generalized to the population. It would be desirable to use probability sampling in future studies to alleviate this limitation. It would also be desirable to use clinical samples to determine whether they differ significantly from nonclinical samples. Second, the results showed an apparent CD-RISC ceiling effect, meaning that it is not a reliable test for assessing or detecting high resilience levels. Third, the amount of variance explained by the Rasch dimension was somewhat limited. We believe that this is due to the reduced dispersion in item difficulty, because the explained variance depends jointly on the dispersion of persons and items. Finally, some overlap should be noted between items in terms of their difficulty (i.e., the difference in logits between some of them was very small). However, we think that they should be retained in the scale because their contents refer to clearly distinct concepts while nevertheless relating to the meaning of the latent variable being evaluated.
Conclusions
The current study reports the results of a psychometric analysis of the performance of the CD-RISC items. A good overall fit of persons and items to the model was found: the items of the CD-RISC allow the identification of a relatively wide range of behaviors evaluating resilience, and the average reliability of the scale was acceptable.
The CD-RISC presents an essentially unidimensional structure, so that people can be evaluated with a single overall score of resilience. Three of its items showed poor fit to the model, which may mean that they are not associated with the same latent construct as the other items.
On the other hand, a clear ceiling effect was observed. Thus, the scale seems adequate to measure the middle and lower ranges of the latent variable. This means that this scale, at least in this sample, is not reliable for assessing high levels of resilience. In order to improve the measurement quality of the CD-RISC, and the understanding of resilience and its relationship with other outcomes, it would be necessary to develop a set of items suitable for measuring resilient behavior at low, middle and high levels of the trait.
Abbreviations
RSM: Rating scale model
ULS: Unweighted least squares
RMSR: Root mean square residual
MAP: Minimum average partial
RSMC: Raw score-to-measure correlation
PCA: Principal component analysis
MNSQ: Mean square
DIF: Differential item functioning
RCCC: Response category curves
References
Friedli L. Mental Health, Resilience and Inequalities. Denmark: World Health Organization; 2009 [http://www.euro.who.int/document/e92227.pdf]
Fergus S, Zimmerman MA. Adolescent resilience: a framework for understanding healthy development in the face of risk. Annu Rev Public Health. 2005;26:399–419.
Haskett ME, Nears K, Ward CS, McPherson AV. Diversity in adjustment of maltreated children: factors associated with resilient functioning. Clin Psychol Rev. 2006;26:796–812.
Evans R, Pinnock K. Promoting resilience and protective factors in the children’s fund. J Child Poverty. 2007;13:21–36.
Olsson CA, Bond L, Burns JM, Vella-Brodrick DA, Sawyer SM. Adolescent resilience: a concept analysis. J Adolesc. 2003;26:1–11.
Vanderbilt-Adriance E, Shaw DS. Conceptualizing and re-evaluating resilience across levels of risk, time, and domains of competence. Clin Child Fam Psychol Rev. 2008;11:30–58.
Windle G, Bennett KM, Noyes J. A methodological review of resilience measurement scales. Health Qual Life Outcomes. 2011;9:8.
Windle G. What is resilience? A systematic review and concept analysis. Rev Clin Gerontol. 2010;21:1–18.
Windle G. What is resilience? A review and concept analysis. Rev Clin Gerontol. 2011;21:151–69.
Wagnild GM, Young HM. Development and psychometric evaluation of the resilience scale. J Nurs Meas. 1993;1:165–78.
Block J, Kremen AM. IQ and ego-resiliency: Conceptual and empirical connections and separateness. J Pers Soc Psychol. 1996;70:349–61.
Klohnen EC. Conceptual analysis and measurement of the construct of ego-resiliency. J Pers Soc Psychol. 1996;70:1067–79.
Hurtes KP, Allen LR. Measuring resiliency in youth: the resiliency attitudes and skills profile. Ther Recreat J. 2001;35(4):333–47.
Connor KM, Davidson JRT. Development of a new resilience scale: the Connor-Davidson Resilience Scale (CD-RISC). Depress Anxiety. 2003;18:76–82.
Oshio A, Kaneko H, Nagamine S, Nakaya M. Construct validity of the adolescent resilience scale. Psychol Rep. 2003;93:1217–22.
Friborg O, Barlaug D, Martinussen M, Rosenvinge JH, Hjemdal O. Resilience in relation to personality and intelligence. Int J Methods Psychiatr Res. 2005;14:29–42.
Friborg O, Hjemdal O, Rosenvinge JH, Martinussen M. A new rating scale for adult resilience: what are the central protective resources behind healthy adjustment? Int J Methods Psychiatr Res. 2003;12:65–76.
Bartone P. Test-retest reliability of the Dispositional Resilience Scale-15, a brief hardiness scale. Psychol Rep. 2007;101:943–4.
Campbell-Sills L, Stein MB. Psychometric analysis and refinement of the Connor-Davidson Resilience Scale (CD-RISC): validation of a 10-item measure of resilience. J Trauma Stress. 2007;20:1019–28.
Donnon T, Hammond W. A psychometric assessment of the self-reported youth resiliency assessing developmental strengths questionnaire. Psychol Rep. 2007;100:963–78.
Sun J, Stewart D. Development of population-based resilience measures in the primary school setting. Health Educ. 2007;7:575–99.
Smith BW, Dalen J, Wiggins K, Tooley E, Christopher P, Bernard J. The brief resilience scale: assessing the ability to bounce back. Int J Behav Med. 2008;15:194–200.
Ungar M, Liebenberg L, Boothroyd R, Kwong WM, Lee TY, Leblanc J, et al. The study of youth resilience across cultures: Lessons from a pilot study of measurement development. Res Hum Dev. 2008;5:166–80.
Windle G, Markland DA, Woods B. Examination of a theoretical model of psychological resilience in older age. Aging Ment Health. 2008;12:285–92.
Sijtsma K. On the use, misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika. 2009;74:107–20.
Sočan G. Assessment of reliability when test items are not essentially τ-equivalent. In: Ferligoj A, Mrvar A, editors. Developments in Survey Methodology. Metodološki zvezki, 15. Ljubljana: FDV; 2000. p. 23–35.
Ten Berge JMF, Sočan G. The greatest lower bound to the reliability of a test and the hypothesis of unidimensionality. Psychometrika. 2004;69:613–25.
Campbell-Sills L, Cohan SL, Stein MB. Relationship of resilience to personality, coping, and psychiatric symptoms in young adults. Behav Res Ther. 2006;44:585–99.
Menezes VA, Fernández B, Hernández L, Ramos F, Contador I. Resiliencia y el modelo Burnout-Engagement en cuidadores formales de ancianos. Psicothema. 2006;18(4):791–6.
Singh K, Yu X. Psychometric evaluation of the Connor-Davidson Resilience Scale (CD-RISC) in a sample of Indian students. J Adolesc. 2010;1:23–30.
Gucciardi DF, Jackson B, Coulter TJ, Mallet CJ. The Connor-Davidson Resilience Scale (CD-RISC): dimensionality and age-related measurement invariance with Australian cricketers. Psychol Sport Exerc. 2011;12:423–33.
Burns RA, Anstey KJ. The Connor-Davidson Resilience Scale (CD-RISC): testing the invariance of a unidimensional resilience measure that is independent of positive and negative affect. Pers Individ Dif. 2010;48:527–31.
Yu X, Lau JTF, Mak WWS, Zhang J, Lui WWS, Zhang J. Factor structure and psychometric properties of the Connor-Davidson Resilience Scale among Chinese adolescents. Compr Psychiatry. 2011;52:218–24.
Lamond AJ, Depp CA, Allison M, Langer R, Reichstadt J, Moore DJ, et al. Measurement and predictors of resilience among community-dwelling older women. J Psychiatr Res. 2009;43:148–54.
Preacher KJ, MacCallum RC. Repairing Tom Swift’s electric factor analysis machine. Underst Stat. 2003;2:13–43.
Notario-Pacheco B, Solera-Martínez M, Serrano-Parra MD, Bartolomé-Gutiérrez R, García-Campayo J, Martínez-Vizcaíno V. Reliability and validity of the Spanish version of the 10-item Connor-Davidson Resilience Scale (10-item CD-RISC) in young adults. Health Qual Life Outcomes. 2010;9:63.
Andrich D. Rasch models for measurement. Newbury Park, CA: Sage Publications; 1988.
Ayala RJ. The theory and practice of item response theory. New York, NY: The Guilford Press; 2009.
Bond TG, Fox CM. Applying the Rasch model: Fundamental measurement in the human sciences. Mahwah NJ: LEA; 2001.
Crocker L, Algina J. Introduction to classical and modern test theory. Mason, OH: Cengage Learning; 2008.
Embretson SE, Hershberger SL. The new rules of measurement. Mahwah NJ: LEA; 1999.
Embretson SE, Reise SP. Item response theory for psychologists. Mahwah NJ: LEA; 2000.
Van der Linden WJ, Hambleton RK. Handbook of modern item response theory. New York: Springer; 1997.
Wright BD, Stone MH. Best test design. Rasch measurement. Chicago: MESA Press; 1979.
Bobes J, Bascaran MT, García-Portilla MP, Bousoño M, Sáiz PA, Wallance DH. Banco de instrumentos básicos de psiquiatría clínica. Barcelona: Psiquiatría ED; 2008.
Lorenzo-Seva U, Ferrando PJ. FACTOR: a computer program to fit the exploratory factor analysis model. Behav Res Methods. 2006;38:88–91.
Andrich D. Rating formulation for ordered response categories. Psychometrika. 1978;43:561–73.
Andrich D. Rasch Models for Measurement. Sage University Paper Series on Quantitative Measurement in the Social sciences. Newbury Park, CA: Sage Publications; 1988.
Rasch G. Probabilistic Models for Some Intelligence and Attainment tests. Chicago: MESA Press; 1960.
Rasch G. On specific objectivity: an attempt at formalizing the request for generality and validity of scientific statements. In: Blegvad M, editor. The Danish Yearbook of Philosophy. Copenhagen: Munksgaard Publishing House; 1977. p. 58–94.
Wright BD, Masters GN. Rating Scale Analysis. Chicago: University of Chicago, MESA Press; 1982.
Linacre JM. A user’s guide to WINSTEPS [Computer Manual]. Chicago: Winsteps; 2011.
Linacre JM, Wright BD. WINSTEPS: Multiple Choice, Rating Scale, and Partial Credit Rasch Analysis [Computer Program]. Chicago: MESA; 1999.
Hu L, Bentler PM. Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification. Psychol Methods. 1998;3:424–53.
Timmerman ME, Lorenzo-Seva U. Dimensionality assessment of ordered polytomous items with parallel analysis. Psychol Methods. 2011;16:209–20.
Velicer WF. Determining the number of components from the matrix of partial correlations. Psychometrika. 1976;41:321–7.
Linacre JM. A user’s guide to WINSTEPS-Ministep: Rasch-Model computer programs. Chicago: MESA; 2008.
Prieto G, Delgado AR. Análisis de un test mediante el modelo de Rasch. Psicothema. 2003;15:94–100.
Cohen J. Statistical Power for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Erlbaum Associates; 1988.
Linacre JM. WINSTEPS: Rasch measurement computer program. 2005. [http://www.winsteps.com/winman/index.htm]
Linacre JM. Investigating rating scale category utility. J Outcome Meas. 1999;3:103–22.
Zwick R, Thayer DT, Lewis C. An empirical Bayes approach to Mantel-Haenszel DIF analysis. J Educ Meas. 1999;36:1–28.
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
All listed authors participated meaningfully in the study. VAG and MTCS designed the study and performed the data collection; VAG, MTCS, BAG, AMM and FPP, analyzed the data, and wrote the manuscript. All authors made critical revision of the manuscript for important intellectual content. All authors read and approved the final manuscript.
Appendix
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Arias González, V.B., Crespo Sierra, M.T., Arias Martínez, B. et al. An in-depth psychometric analysis of the Connor-Davidson Resilience Scale: calibration with Rasch-Andrich model. Health Qual Life Outcomes 13, 154 (2015). https://doi.org/10.1186/s12955-015-0345-y
DOI: https://doi.org/10.1186/s12955-015-0345-y
Keywords
 Differential Item Functioning
 Item Response Theory
 Latent Trait
 Item Difficulty
 Item Bank