The Patient Assessment of Chronic Illness Care produces measurements along a single dimension: results from a Mokken analysis

Background As the worldwide prevalence of chronic illness increases so too does the demand for novel treatments to improve chronic illness care. Quantifying improvement in chronic illness care from the patient perspective relies on the use of validated patient-reported outcome measures. In this analysis we examine the psychometric and scaling properties of the Patient Assessment of Chronic Illness Care (PACIC) questionnaire for use in the United Kingdom by applying scale data to the non-parametric Mokken double monotonicity model. Methods Data from 1849 patients with long-term conditions in the UK who completed the 20-item PACIC were analysed using Mokken analysis. A three-stage analysis examined the questionnaire’s scalability, monotonicity and item ordering. An automated item selection procedure was used to assess the factor structure of the scale. Analysis was conducted in an ‘evaluation’ dataset (n = 956) and results were confirmed using an independent ‘validation’ dataset (n = 890). Results Automated item selection procedures suggested that the 20 items represented a single underlying trait representing “patient assessment of chronic illness care”: this contrasts with the multiple domains originally proposed. Seven items violated invariant item ordering and were removed. The final 13-item scale had no further issues in either the evaluation or validation samples, including excellent scalability (Ho = .50) and reliability (Rho = .88). Conclusions Following some modification, 13 items of the PACIC were successfully fitted to the non-parametric Mokken model. These items are psychometrically robust and produce a single ordinal summary score. This score will be useful for clinicians or researchers to assess the quality of chronic illness care from the patient's perspective.

Background Improving the quality of care for long-term conditions including arthritis, diabetes and coronary heart disease is a global healthcare priority. The increasing prevalence of multimorbidity (the co-existence of multiple long-term conditions in the same individual) adds additional pressures to individuals and healthcare systems alike [1].
The Patient Assessment of Chronic Illness Care (PACIC) is a relatively brief 20-item questionnaire designed to assess the extent to which care is aligned with the Chronic Care Model [2,3]. The chronic care model (CCM) has been widely accepted as a suitable framework for improving the care of patients with long-term ('chronic') conditions such as diabetes or arthritis.
The PACIC has been widely used in both validation studies and as an endpoint in outcomes research [4][5][6][7]. A short version for cardiovascular disease patients has been developed using factor analysis [8,9] but, despite the scale's popularity, no analysis has been performed using modern test theories, including either parametric or non-parametric item response theory [10].
Previous studies using confirmatory factor analyses failed to find support for the hypothesised 5-factor structure of the PACIC [9,11], though other studies using exploratory factoring methods found better support for the original structure [12]. Disparities in findings related to the factorial structure leave some uncertainty as to how the scale may be best applied to measure a patient's assessment of their own care. The current study addresses this uncertainty by examining the scaling structure of the PACIC using modern psychometric methods [13], avoiding some of the known issues with illusory factors in factor analyses, which may be driving the uncertainty about the scale's structure in the literature [14].
The current study conducted a psychometric analysis of the PACIC scale using Mokken analysis. Mokken analysis is analogous to non-parametric item response theory, and may be used to arrange ordinal questionnaire items into scales and to assess if the assumptions of non-parametric item response theory (including unidimensionality and monotonicity) are met by the scale (4). By successfully fitting data to the Mokken model, the suitability of using ordinal scale sum scores is confirmed (Table 1).

Methods
Data for the analyses described here were originally collected as part of a wider cohort study designed to assess the impact of care planning on patient outcomes [7]. The current analyses use the baseline data from the cohort study. The same sample has previously been used to investigate the factor structure of PACIC and is described elsewhere [11]. Ethical approval was granted for the original data collection by Northwest 3 REC -Liverpool East (REC Ref no: 10/H1002/41).
Analyses in the current paper were all conducted within the R statistical computing environment [15] using the 'base' and 'mokken' packages [16,17].

Mokken analysis
Mokken models are a non-parametric extension of the simple deterministic Guttman scaling model [18]. Guttman models unrealistically assume that data are error free and Mokken models introduce a probabilistic framework which allows researchers to account for measurement error [19]. The major advantage of employing a non-parametric item response theory (NIRT) technique over other modern test theories, including the Rasch models [20], is the relatively relaxed assumptions within NIRT [21] whilst affirming important psychometric assumptions of unidimensionality and scalability [19].
Two Mokken models of interest are the monotone homogeneity model (MH model) and the double monotonicity model (DM model). In the MH model, items are allowed to differ in their discrimination parameter (the slope of their item characteristic curve). The DM model is a more restrictive version of the MH model where item discrimination parameters are fixed, much in the same way as the Rasch or 1 parameter item response theory (IRT) model. Within the MH model it is possible that some items have a weaker or stronger relationship than others to the underlying trait, which may indicate redundancy [19]. Fitting the DM model is essential in order to ensure that scores for polytomous questionnaires are correctly ordered [22].
Following suggestions in Mokken analysis teaching papers [16,23] a three-stage analysis was conducted. These three stages of analysis ensure that four assumptions of NIRT are met. Both the assumptions of NIRT and the stages of a Mokken analysis are described below.

Unidimensionality
The assumption of unidimensionality states that all items must measure the same underlying latent trait. This assumption can be expressed both logically (that all items measure one construct) as well as mathematically (that only one latent variable is necessary to account for the inter-item associations within the data) [21].

Local independence of items
The assumption of local independence simply states that an individual's response to an item is reliant solely on their level of the underlying trait being measured and not influenced by their responses to other items on the same questionnaire.
Local dependence may occur where item content is too conceptually similar between items meaning that the response to one item is conditional on the response to another.
However, whilst sophisticated methods for assessing local independence of items have been reported and used under parametric IRT paradigms [24,25], tests to assess local dependency under the NIRT paradigm are not, as far as the authors are aware, yet widely available in accessible psychometric packages [26].

Monotonicity
The assumption of monotonicity states that the probability of affirming an item is a non-decreasing function of the level of the underlying latent trait. For example, on a given item a person with a high level of the underlying trait (theta) must always have at least as great a chance of affirming the item as a person with a lesser level of the underlying trait.
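Because theta is not observed directly, monotonicity is usually inspected through an observable proxy. The sketch below is a simplified illustration of that idea, not the procedure implemented in the 'mokken' package: it groups respondents by their rest score (the total over all other items) and counts pairs of groups in which a lower group has a higher mean item score than a higher group. The function names, the toy data and the 0.03 tolerance are assumptions made for illustration, and no minimum group size is enforced.

```python
from collections import defaultdict
from itertools import combinations


def rest_score_means(data, item):
    """Mean score on `item` within each rest-score group, ordered by rest score.

    `data` is a list of respondents, each a list of integer item scores.
    The rest score is the respondent's total over all items except `item`.
    """
    groups = defaultdict(list)
    for row in data:
        groups[sum(row) - row[item]].append(row[item])
    return [sum(scores) / len(scores) for _, scores in sorted(groups.items())]


def count_violations(data, item, tolerance=0.03):
    """Count group pairs where the lower rest-score group has the higher mean.

    `tolerance` (an illustrative value) ignores very small decreases.
    """
    means = rest_score_means(data, item)
    return sum(1 for low, high in combinations(means, 2) if low - high > tolerance)


# Toy data (hypothetical): a perfect Guttman pattern shows no violations.
guttman_rows = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
# Toy data (hypothetical): item 0 is affirmed only by low scorers.
reversed_rows = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 1]]
```

In practice sparse rest-score groups would be merged before comparison; that step is omitted here to keep the example short.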

Non-intersection
An additional assumption of non-intersection is added in order to satisfy the demands of the more restrictive DM model. Non-intersection is confirmed by invariant item ordering, which ensures that the ordering of each item (in terms of its 'difficulty') is the same for each individual responding to the scale. Invariant item ordering (IIO) is violated when the item characteristic curves intersect across the scale, which does not occur where slope parameters are uniform across the scale. Figure 1 gives an example of non-intersecting item characteristic curves and Fig. 2 shows item characteristic curves that intersect.
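The link between uniform slopes and non-intersection can be illustrated numerically. The sketch below uses logistic item characteristic curves purely as a convenient stand-in (Mokken models themselves make no parametric assumption about curve shape); the slope and location values, the function names and the theta grid are all hypothetical.

```python
import math


def icc(theta, slope, location):
    """Logistic item characteristic curve (illustrative parametric form)."""
    return 1.0 / (1.0 + math.exp(-slope * (theta - location)))


def ordering_is_invariant(item_a, item_b, grid):
    """True if one item is at least as likely to be affirmed at every theta.

    Each item is a (slope, location) pair; `grid` is a list of theta values.
    """
    diffs = [icc(t, *item_a) - icc(t, *item_b) for t in grid]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)


grid = [x / 10 for x in range(-40, 41)]  # theta from -4.0 to 4.0

# Same slope, different locations: the curves never cross.
parallel = ordering_is_invariant((1.0, -0.5), (1.0, 0.5), grid)

# Different slopes: the curves cross at theta = 0, so the ordering flips.
crossing = ordering_is_invariant((0.5, 0.0), (2.0, 0.0), grid)
```

Here `parallel` evaluates to True and `crossing` to False, mirroring the contrast between the two figures described above.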

Stage one
In Stage One the scalability of both the individual item and scale total is evaluated using Loevinger's H coefficient, where a higher value indicates higher scalability. The Mokken 'automated item selection procedure' is also used at this stage to assess the number and structure of meaningful factors within the data.
This stage of a Mokken analysis is analogous to an exploratory factor analysis [17].
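As a rough illustration of what the scale-level H coefficient expresses, the sketch below computes it as the sum of observed inter-item covariances divided by the sum of the maximum covariances attainable given each item's marginal score distribution. This is a from-scratch teaching example rather than the 'mokken' package implementation; the function names and toy data are assumptions, and standard errors are not computed.

```python
from itertools import combinations


def covariance(x, y):
    """Population covariance of two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


def max_covariance(x, y):
    """Largest covariance attainable with the same marginal distributions.

    Pairing the sorted scores maximises the cross-product sum
    (rearrangement inequality), so it also maximises the covariance.
    """
    return covariance(sorted(x), sorted(y))


def loevinger_h(items):
    """Scale H for a list of item columns (one list of scores per item)."""
    observed = 0.0
    maximum = 0.0
    for x, y in combinations(items, 2):
        observed += covariance(x, y)
        maximum += max_covariance(x, y)
    return observed / maximum


# Toy data (hypothetical): a perfect Guttman pattern, four respondents.
guttman_items = [
    [1, 1, 1, 0],  # easiest item
    [1, 1, 0, 0],
    [1, 0, 0, 0],  # hardest item
]
# Toy data (hypothetical): two unrelated items.
unrelated_items = [[1, 1, 0, 0], [1, 0, 1, 0]]
```

A perfect Guttman pattern yields H = 1, while unrelated items yield H = 0; real questionnaire data fall between these extremes.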

Stage two
In Stage Two the assumption of monotonicity (higher scores indicate a high level of the trait or characteristic being measured) between item pairs within the sample is assessed. The 'mokken' package evaluates the number and severity of monotonicity violations. Items that violate the assumption of monotonicity should be removed to improve the scale.

Stage three
In Stage Three the final assumption, non-intersection, is checked using the manifest invariant item ordering protocol in the 'mokken' package. Invariant item ordering occurs when the ordering of the items is the same for each participant [27]. Items that violate this assumption may be removed from the scale one at a time following an iterative process. In the event that two items violate the assumption, the item with the lowest scalability is removed, before analysing the rest of the items again.
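The iterative removal described above can be sketched as a simple loop. This is a schematic under stated assumptions: `find_violators` and `item_scalability` are hypothetical callables standing in for whatever IIO check and item H computation are used, and the rule shown (drop the violating item with the lowest item H, then re-analyse) is a literal reading of the description rather than a reproduction of the authors' code.

```python
def backward_iio_selection(items, find_violators, item_scalability):
    """Remove violating items one at a time until none remain.

    items: list of item labels.
    find_violators: hypothetical callable, list of items -> set of items
        flagged as violating invariant item ordering.
    item_scalability: hypothetical callable, list of items -> dict mapping
        each item to its item H value.
    Returns (remaining_items, removed_items_in_order).
    """
    remaining = list(items)
    removed = []
    while len(remaining) > 2:
        violators = find_violators(remaining)
        if not violators:
            break  # no violations left: the scale is accepted as it stands
        h_values = item_scalability(remaining)
        # Among the violating items, drop the one with the lowest item H.
        worst = min(violators, key=lambda item: h_values[item])
        remaining.remove(worst)
        removed.append(worst)
    return remaining, removed
```

Re-running both checks on every pass matters because item H values and violations can change once an item has been dropped.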
After the completion of all three stages, the final scale can be said to demonstrably meet all of the assumptions of non-parametric item response theory.

Local independence
As no formal test of local independence exists under the Mokken NIRT paradigm, the final items of the PACIC will be analysed for local independence by conceptual comparison of wording and item themes. Local dependence may also be indirectly indicated by Loevinger's H and Rho values that are exceptionally high.

Reliability
Scale reliability will be calculated using the Molenaar Sijtsma statistic (Rho) [28]. The Rho statistic calculates the probability of obtaining the same score twice by extrapolating on the basis of the proportion of respondents who give positive responses to item pairs [13].

Evaluation and validation sampling
To ensure that the findings in the current study would be robust across multiple different samples the sample was split randomly into an evaluation and validation sample. The analysis described above was then first run on the evaluation sample and confirmed by application to the validation sample.
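A random split of this kind can be sketched as follows. The source does not report how the randomisation was performed, so the seed, the function name and the case identifiers below are all assumptions; a fixed seed is used only so that the split can be reproduced.

```python
import random


def split_sample(case_ids, n_evaluation, seed=2024):
    """Shuffle case identifiers and split them into two disjoint samples.

    case_ids: iterable of respondent identifiers.
    n_evaluation: number of cases to place in the evaluation sample.
    seed: arbitrary illustrative seed so the split is reproducible.
    Returns (evaluation_ids, validation_ids).
    """
    rng = random.Random(seed)
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_evaluation], shuffled[n_evaluation:]


# Hypothetical usage with 100 made-up case identifiers.
evaluation_ids, validation_ids = split_sample(range(100), n_evaluation=50)
```

Keeping the two samples disjoint is what allows the validation analysis to act as an independent check on the evaluation findings.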

Data
The 1849 cases were split randomly into evaluation (n = 956) and validation (n = 890) samples.

Stage one
The Mokken automated item selection procedure (AISP) indicated that a single meaningful factor was present, which included all of the items within the dataset. Scalability coefficients (Item H) are given in Table 2. In its 20-item form, the scale displayed an acceptable overall H value of .50 (SE = .01).

Stage two
Tests of monotonicity returned no violation of monotonicity for any item (see Tables 3 and 4).
Note: item numbers are based on the original order in which they were listed in the PACIC.

Stage three
Assessment of IIO suggested that the 20-item scale did not have IIO properties and a process of backwards step-wise deletion was conducted, iteratively removing seven items over eight steps, illustrated in Table 5.
The final "patient assessment of chronic illness care" scale consisted of 13 items that fully met all NIRT assumptions of dimensionality, scalability, monotonicity and invariant item ordering. The final scale H was .48 (SE = .01) indicating very good scalability.

Validation analysis
To confirm the findings in the evaluation analysis the final 13-item scale was assessed in the validation half of the original dataset. The final 13-item solution demonstrated good scalability and monotonicity, and did not violate the IIO assumption.

Reliability
The Molenaar Sijtsma statistic (Rho) indicated very good reliability in the final 13-item scale (Rho = .88).

Discussion
Non-parametric Mokken analysis indicated that the items of the PACIC questionnaire measure a single unidimensional trait representing patients' assessment of their chronic illness care, rather than the previously hypothesised five-factor structure. Within this single dimension, the 20 items of the PACIC displayed good scalability and monotonicity; however, seven items violated invariant item ordering, an assumption of the double monotonicity model. Upon removing these seven items the resultant 13-item questionnaire displayed excellent scalability and reliability across a single dimension.
Three of the seven items which were removed from the analysis were originally placed in the 'Problem Solving' domain (Items 13, 14 and 15), two in the 'Follow-up' domain (Items 18 and 19), one in the 'Goal Setting' domain (Item 10) and one in the 'Patient Activation' domain (Item 3). The removal of these items may relate to inconsistencies in the implementation of different elements of the CCM in the United Kingdom. Items 18 and 19 both assess activities carried out by other medical practitioners; these items appear to rely on the assumption that seeing another medical professional (e.g., dietician) is appropriate for all respondents.
Whilst these items remain in the questionnaire, the maximum score could not be attained by any patient with a chronic condition who did not need to see other clinical staff such as a medical educator or 'eye doctor', which may have caused undue bias between patients who require care from multiple professionals and those who do not.

Fig. 2 Intersecting item characteristic curves
It is important that items which are meant to assess satisfaction with aspects of healthcare that may not be universally implemented are worded carefully to reduce confusion and facilitate accurate measurement [29].
We recommend that researchers and clinicians who wish to measure the views of patients relating to the quality of their chronic illness care in the UK do so using the 13-item solution presented here, rather than the original scale across five dimensions, for which we found no support in the current study. The scale has the advantage of being shorter, and is thus less burdensome.
The present study is limited insofar as it was not possible to assess local independence of items using the tools available. Local dependency can result in inflated covariance between items which may, in turn, lead to higher H-coefficients and the risk that items with local dependency are spuriously included in the scale. However, in the absence of a quantitative analysis, some confidence can be gained from assessing the item wording for items which have clear conceptual overlap. It appears that the final 13 items do cover a broad range of topics and do not have obvious conceptual overlap, which would be indicative of local dependency.
Further research may usefully be conducted on this scale that assesses the PACIC using parametric item response theory, which may include other analyses including local independence of items and differential item functioning [29]. Parametric item response theory also leads to the possibility of employing computer adaptive testing, which can improve the efficiency and accuracy of assessments [30].
The current study was conducted exclusively in the United Kingdom, and significant heterogeneity in the way in which chronic care is organised and experienced globally suggests that the final 13-item solution may not hold for populations in the United States of America, for example. Another study, which used factor analyses to assess the psychometric performance of the PACIC in diabetic populations in the USA, reduced the number of items in the final scale to 11; the disparity between findings may be attributable to differing experiences of patients in the UK and the USA [10]. Given these differences, the recommendations made in this paper should not be applied to the PACIC when it is deployed within a US population, for which it was originally developed. Work which derived a set of items which functioned well across populations would be tremendously useful to enable comparison of global models of chronic healthcare from the patient perspective.
Table 4 Backwards step-wise removal of items violating IIO (Steps 1 to 8)

Conclusions
The original PACIC scale was found to be unidimensional and, following the process of Mokken analysis, 13 items met the assumptions of scalability and unidimensionality, which are necessary for producing reliable, ordinal measurements from questionnaire scales. The removal of superfluous items that do not contribute positively to accurate unidimensional measurement has produced a 13-item version of the PACIC, which we recommend for use in the UK.