LC-PROM: Validation of a patient reported outcomes measure for liver cirrhosis patients

Zhang, Ying; Yang, Yuanyuan; Lv, Jing; Zhang, Yanbo

doi:10.1186/s12955-016-0482-y

Research
Open access
Published: 10 May 2016

Health and Quality of Life Outcomes volume 14, Article number: 75 (2016)

Abstract

Background

The aim of the study is to develop a specific patient-reported scale of liver cirrhosis according to the Patient Reported Outcome guidelines of the Food and Drug Administration (FDA), and to examine its capacity to fill gaps in this field.

Methods

A conceptual framework and a preliminary item pool were developed through literature review and interviews of 10 patients with liver cirrhosis. With the preliminary items, we performed a pilot survey that included a cognitive test with patients and interviews with experts; the focus was on the content and language of the scale. In the item selection stage, seven statistical methods (discrete trends method, discrimination analysis, exploratory factor analysis, Cronbach’s α coefficient, correlation coefficient, test-retest reliability, and Item Response Theory) were applied to survey data from 200 subjects (150 liver cirrhosis patients and 50 controls). This produced the preliminary Liver Cirrhosis Patient-reported Outcome Measure (LC-PROM). In the next stage, we conducted the survey with 620 subjects (500 patients and 120 controls) to validate the reliability, validity and acceptability of this scale.

Results

The 55 items and 13 dimensions addressed four domains: physical, psychological, social, and therapeutic. Cronbach’s α coefficient was 0.921 for the total scale; confirmatory factor analysis, t-tests and ANOVA supported scale validity; and the model fit indices, namely the Root Mean Square Error of Approximation (RMSEA), Root Mean Square Residual (RMR), Normed Fit Index (NFI), Non-Normed Fit Index (NNFI), Comparative Fit Index (CFI) and Incremental Fit Index (IFI), generally met the criteria. The acceptance ratio and response rate indicated good feasibility.

Conclusions

This study developed an accurate and stable patient-reported outcome scale for liver cirrhosis, which is able to evaluate clinical effects effectively, helps patients recognize their health condition, and contributes to clinical decision making for both patients and physicians. Additionally, the LC-PROM can serve as an ultimate assessment of medical and health care effects and can inform clinical trials of new drugs for liver cirrhosis.

Background

Liver cirrhosis (LC) is a potential consequence of the progression of various kinds of liver disease, and the high incidence of hepatitis means that a large number of patients will suffer from liver cirrhosis. LC is characterized by fatigue, digestive disorders, bleeding and anemia, endocrine disorder, hypoproteinemia, portal hypertension and other serious symptoms that cause patients great physical pain and affect their daily social life. As an irreversible, chronic, progressive disease, LC cannot be cured completely at the present stage. Particularly for weak patients, the common treatments used in clinical practice can cause secondary damage in addition to the harm caused by the disease itself.

At present, patients’ health status and treatment effects are evaluated by hepatic function tests and serological markers, or reflected by hospital stays and symptom improvement over time. However, with the continued development of a biopsychosocial medical model, the use of scales to assess patients’ fitness has been widely accepted and applied internationally; that is, patients’ personally reported data, dubbed patient-reported outcomes (PROs), are used to measure clinical results. One of the arguments for using questionnaires to ask patients to judge their own health-related quality of life (HRQoL) is that physicians have been shown to be generally unable to make accurate judgments of patients’ HRQoL. Physicians’ judgments not only deviate from those of patients, they also differ from one another. This latter variability makes it particularly difficult to obtain ‘objective’ judgments of HRQoL [1].

The PRO Harmonization Group, which consists of the Food and Drug Administration (FDA), International Society For Pharmacoeconomics and Outcomes Research (ISPOR), the European Regulatory Issues on Quality of Life Assessment Group (ERIQA), and the International Society for Quality of Life Studies (ISQOL), proposes that evaluation of clinical curative effects should contain data from physicians’ reports, physiological measures, caregivers’ reports, and PROs, which come solely from the patient. In the course of a disease, there are some symptoms that can only be experienced by patients; i.e., these symptoms cannot be reflected by physical measures. In this case, the normal reference values of medicine do not equal true health; additionally, physician report data are always processed through the physician’s subjective consciousness and may only include contents related to the physician’s concerns. Moreover, such reports are limited by physicians’ knowledge and experience. Therefore, PROs play an important role in clinical practice, and this method is now generally accepted by experts and patients alike. Since the publication of the draft guide for new drug development and curative effect evaluation in February 2006 [2], PROs have become increasingly important in the assessment of treatment outcomes and in new drug registration.

A PRO instrument specific to LC could provide several benefits: it could help improve the evidence base through research assessing effectiveness of LC therapies; facilitate clinician-patient communication and shared decision making; help prioritize patient problems and preferences; monitor changes or outcomes of treatment; measure the performance of healthcare providers and services; and be incorporated in clinical audits [3–5].

In short, the aim of this study is to develop such a PRO scale that meets the following criteria: (I) specific to liver cirrhosis; (II) addresses all physical symptoms, psychological feelings, daily activities, and therapeutic status related to LC; (III) comprises items that are founded on the patients’ own perspective; (IV) has good internal consistency, a reasonable theoretical framework and can distinguish different severities of the disease; and (V) is of appropriate length and has strong feasibility.

Methods

The Medical Ethics Committee of Shanxi Medical University provided ethics approval, and all participants signed informed consent to participate.

Step 1 item generation

Literature review

We conducted literature searches on databases and network resources for PRO instruments. From the searches, we formed the conceptual framework of the new instrument, called the Liver Cirrhosis Patient Reported Outcome Measure (LC-PROM).

Patient interviews

We conducted semi-structured interviews with ten liver cirrhosis patients (five males and five females; average age 53 years). In the interview, patients were encouraged to talk about their main disease symptoms, physical feelings and symptoms that they most desired to improve, psychological conditions after diagnosis and participation in social activities since diagnosis, adherence to therapy and satisfaction with their status. In addition, patients could speak freely on other relevant topics. Throughout the process, researchers wrote down the interviewees’ original words as far as possible, and audio recordings were made. After the interview, all information was sorted and then an initial list of items was developed.

Cognitive debriefing and discussion with experts

Another ten patients (five males and five females, average age 52 years) were selected to undertake cognitive debriefing. These patients were asked to flag items that were ambiguously worded or difficult to understand, and to suggest items that needed to be added or deleted.

Seven experienced experts including three chief physicians of gastroenterology, one infectious diseases physician, one psychologist, one sociologist, and one ethics expert were invited to discuss whether the initial structural framework was reasonable and whether the items covered all areas of disease evaluation. The correlation of items with their respective dimensions and linguistic issues were considered. We modified the item pool according to the experts’ advice, and the preliminary scales were formed.

Step 2 item selection

Sampling survey

Two hundred subjects were sampled from inpatients of eight different hospitals and from communities in Shanxi Province. There were 150 LC patients and 50 healthy controls.

Patients who were diagnosed with definite LC, who were between 18 and 72 years old, and who were fully able and willing to participate in this study as volunteers were included.

Patients were excluded if they had an uncertain diagnosis, suffered mental illness or disorders of consciousness, were unable to understand questions because of dysgnosia, or were unable to complete the test.

Healthy controls were volunteers from communities who had not been diagnosed with any disease by a physician and whose age distribution was similar to that of the LC patients. Healthy controls also provided informed consent and received a reward.

The survey was administered by trained investigators. Before beginning, subjects were informed of the survey objective and signed the informed consent form. Next, the participants independently completed the preliminary scale. During the survey, investigators were present to respond to questions. If participants were elderly or had a low education level, investigators read the items to them and wrote down their answers. After the survey, any incomplete scales were filled in by the subjects under the guidance of the investigators.

Scale scoring

Scores were calculated using a five-point Likert scale to reflect frequency of occurrence over the past 2 weeks of the issue presented in each item. The responses were 0 = never, 1 = occasionally, 2 = about half of the time, 3 = often, and 4 = almost every day. The positively-toned items were scored as the original score plus one, and the negatively-toned items were scored as 5 minus the original score. Thus every item score ranged from 1 to 5, with higher scores denoting more positive outcomes.
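The recoding rule above can be sketched in a few lines. This is a minimal illustration only; the function name `score_item` and the flag `positively_toned` are hypothetical and do not come from the study.

```python
def score_item(raw: int, positively_toned: bool) -> int:
    """Convert a raw 0-4 Likert response into the 1-5 item score.

    Positively toned items: raw + 1. Negatively toned items: 5 - raw.
    Higher scores therefore always denote a more positive outcome.
    """
    if raw not in range(5):
        raise ValueError("raw response must be an integer from 0 to 4")
    return raw + 1 if positively_toned else 5 - raw


# All five possible responses for one positive and one negative item.
print([score_item(r, True) for r in range(5)])   # [1, 2, 3, 4, 5]
print([score_item(r, False) for r in range(5)])  # [5, 4, 3, 2, 1]
```

Both branches map the raw range 0 to 4 onto 1 to 5, so item scores can be summed within a dimension without further adjustment.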

Statistical methods for item selection

Item reduction was based on both Classical Test Theory (CTT) and Item Response Theory (IRT). This study employed six methods of CTT followed by IRT.

Discrete trend

A low degree of dispersion means that subjects were inclined to select the same answer; that is, the item has a low capacity to detect differences. Assuming that scores are approximately normally distributed, the standard deviation of every item was calculated, and items with a low standard deviation (<1.0) were deleted.
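As a sketch of this screening step, the snippet below flags items whose standard deviation falls below 1.0. The item names and responses are invented for illustration, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import stdev


def low_dispersion_items(scores_by_item, threshold=1.0):
    """Return the names of items whose sample SD is below the threshold."""
    return [
        name
        for name, scores in scores_by_item.items()
        if stdev(scores) < threshold
    ]


# Hypothetical 1-5 item scores from eight respondents.
pilot = {
    "item_a": [1, 2, 3, 4, 5, 1, 5, 3],  # answers spread across the scale
    "item_b": [4, 4, 4, 5, 4, 4, 5, 4],  # nearly everyone gives the same answer
}
print(low_dispersion_items(pilot))  # ['item_b']
```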

Discrimination analysis

Items that do not reflect different characteristics of subjects should not remain in the scale. We compared the scores on every item using independent two-sample t-tests (α = 0.05), and items whose scores did not differ significantly were deleted.

Exploratory factor analysis (EFA)

Taking the small sample size into consideration, we did EFA in each domain (physical, psychological, social, and therapeutic) separately, then rotated the solution. According to the eigenvalue and the variance contribution ratio, the number of factors was determined. Items with low factor loading (<0.4) and cross-loading on two or more dimensions were removed.

Cronbach’s α if item deleted (CAID)

Internal consistency was evaluated with CAID and the Corrected Item Total Correlation (CITC). If the α coefficient increased greatly when an item was deleted, the item was reducing the internal consistency of its own dimension. CITC < 0.4 indicates an item poorly contributing to the construct of the scale; therefore such items were deleted.
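The two statistics can be computed as in the following sketch. It assumes the standard formula for Cronbach's α with sample variances and defines CITC as the Pearson correlation between an item and the sum of the remaining items in its dimension; the function names and data are hypothetical, and this is not the SPSS routine used in the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: list of equal-length score lists, one list per item."""
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_diagnostics(items):
    """Return CAID and CITC for every item (needs at least three items)."""
    results = []
    for i, col in enumerate(items):
        rest = items[:i] + items[i + 1:]
        rest_total = [sum(resp) for resp in zip(*rest)]
        results.append({
            "caid": cronbach_alpha(rest),      # alpha if this item is deleted
            "citc": pearson(col, rest_total),  # corrected item-total correlation
        })
    return results


# Hypothetical dimension with four items answered by six respondents.
dimension = [
    [1, 2, 3, 4, 5, 3],
    [2, 2, 3, 5, 5, 3],
    [1, 3, 3, 4, 4, 2],
    [3, 1, 4, 2, 5, 3],
]
print(round(cronbach_alpha(dimension), 3))
for row in item_diagnostics(dimension):
    print({key: round(value, 3) for key, value in row.items()})
```

An item would be a deletion candidate under the rule above when its `citc` is below 0.4 or when its `caid` is clearly higher than the α of the full dimension.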

Correlation coefficient

The representativeness of an item was measured by its correlation coefficient with the dimension to which the item belonged. When the value was less than 0.6, the item was not retained.

Retest reliability

This method considered item stability. Thirty subjects were selected from the sample to take a retest 2 weeks after the first test. Among these, the 20 cases whose data were error-free in both tests were used to calculate the retest correlation coefficient. The criterion for reliability was 0.7.

Item response theory (IRT)

IRT is part of modern measurement theory and was put forward to overcome defects of CTT [6]. It is also called latent trait theory, and has advantages for item selection and test construction. It claims that there is a functional relationship between subjects’ abilities and their responses to an item. How to define this relationship is the basic idea and the starting point. In brief, IRT can be viewed as a probabilistic method for discussing the relationship between subjects’ potential traits and their responses to items.

If θ represents a subject’s ability, P(θ) is the probability of the subject’s responding to an item correctly; their functional relationship can be reflected by a curve called the item characteristic curve (ICC). Two important parameters on the curve are used in this study: a reflects discriminant degree and b shows item difficulty. On the ICC whose X,Y axes are θ and P(θ), b is the value of θ corresponding to P(θ) = 0.50; this value ranges from −3 to 3. a is the function of the tangent line’s slope at point b; its value ranges between 0.3 and 2, with larger values representing higher degrees of discrimination.
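To make the roles of a and b concrete, the relation can be written out under the assumption of a two-parameter logistic ICC with a scaling constant D. That functional form is an assumption at this point in the text, not something this paragraph states.

```latex
P(\theta) = \frac{1}{1 + \exp\left[-D a (\theta - b)\right]}, \qquad
P(b) = \frac{1}{1 + e^{0}} = \frac{1}{2}, \qquad
\left.\frac{dP}{d\theta}\right|_{\theta = b}
  = D a \, P(b)\,\bigl(1 - P(b)\bigr) = \frac{D a}{4}.
```

Under this form, b is the ability at which the response probability equals 0.50, and the slope of the tangent at that point is proportional to a, so a larger a gives a steeper curve and sharper discrimination around b.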

Because a five-point Likert scale was used, a Graded Response model was constructed, which is appropriate for hierarchical and continuous data, extending a unidimensional model to a multidimensional one [7]. The basic idea of the model [8] is as follows. Assume that the full score of item $j$ is $f_j$, so that the item has $f_j + 1$ possible scores, $0, 1, 2, \ldots, f_j$. Let $P^*_{ajt}$ be the probability that the score on item $j$ is $t$ or higher when the ability value is $\theta_a$; then $P^*_{aj0} = 1$ and $P^*_{aj,f_j+1} = 0$. If $P_{ajt}$ is the probability that the score on item $j$ is exactly $t$ [9], then

$$P_{ajt} = P^*_{ajt} - P^*_{aj,t+1} \quad (t = 0, 1, 2, \ldots, f_j),$$

where

$$P^*_{ajt} = \frac{1}{1 + \exp\left[-D a_j (\theta_a - b_{jt})\right]},$$

in which $D = 1.7$, $a_j$ is the discriminant degree of item $j$, $b_{jt}$ is the difficulty when the score of item $j$ is $t$, and the difficulty level of item $j$ is monotonically increasing; that is, $b_{j1} < b_{j2} < \ldots < b_{j,f_j}$. $P^*_{ajt}$, which corresponds to an ICC, is called the item grade characteristic function in the Graded Response model.

Five parameters can be estimated in our study, namely $a, b_1, b_2, b_3, b_4$, where $b_1$ is the difficulty parameter between answer 1 and answer 2, and so on, with $b_1 < b_2 < b_3 < b_4$. Here $a > 0.60$ is required, and each $b$ must lie between $-3$ and $3$.
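The Graded Response model described above can be sketched as follows for a single five-category item. The parameter values are hypothetical and chosen only for illustration; the study estimated its parameters with MULTILOG, which this snippet does not reproduce.

```python
import math

D = 1.7  # scaling constant given in the text


def boundary_prob(theta, a, b):
    """P*: probability of responding at or above a category boundary."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def category_probs(theta, a, thresholds):
    """Probability of each ordered response category for one item.

    thresholds: strictly increasing difficulties b_1 < ... < b_m,
    which define m + 1 categories.
    """
    if any(lo >= hi for lo, hi in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative curve: 1 for the lowest category, 0 beyond the highest.
    cum = [1.0] + [boundary_prob(theta, a, b) for b in thresholds] + [0.0]
    # Each category probability is the difference of adjacent boundaries.
    return [cum[t] - cum[t + 1] for t in range(len(thresholds) + 1)]


# Hypothetical item: a = 1.2 and four thresholds inside the [-3, 3] range.
probs = category_probs(theta=0.0, a=1.2, thresholds=[-1.5, -0.5, 0.5, 1.5])
print([round(p, 3) for p in probs])
print(round(sum(probs), 6))
```

Because the category probabilities are differences of adjacent boundary curves, they sum to one for any ability value, and requiring increasing thresholds keeps every probability non-negative.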

Items supported by at least five methods were retained in the final LC-PROM.
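This retention rule amounts to a simple vote across the seven methods. A minimal sketch, with hypothetical item names and outcomes:

```python
def retained_items(votes, min_support=5):
    """votes maps an item name to seven booleans, one per selection method
    (True means the method supports keeping the item)."""
    return [item for item, v in votes.items() if sum(v) >= min_support]


# Hypothetical outcomes for three items across the seven methods.
votes = {
    "PHD_01": [True, True, True, True, True, True, False],    # 6 of 7
    "PHD_02": [True, False, True, False, True, False, True],  # 4 of 7
    "PSD_01": [True, True, True, True, False, True, False],   # 5 of 7
}
print(retained_items(votes))  # ['PHD_01', 'PSD_01']
```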

Step 3 validation of the scale

Second Sampling Survey

Six hundred twenty subjects were selected in the second survey, of which 120 were controls. Inclusion and exclusion criteria did not change, nor did the survey process.

Reliability analysis

Reliability reflects the stability and consistency of a scale. In our study, Cronbach’s α coefficients for the total scale and for each domain were calculated to evaluate the average consistency of the items. The higher the value, the better the reliability, but an α that is too high suggests that the items are not simply related but overlap considerably. In the extreme case where α = 1, we should consider whether some items are redundant and could be eliminated. Here we chose 0.80 as the critical value; i.e., the measured results can be considered stable when α exceeds 0.80.

Validity analysis

Validity, also called accuracy, is the other arm of validation of a scale, and reflects the extent to which a scale measures what it sets out to measure. Validity includes subtypes of content validity, criterion validity, construct validity, and discriminant validity. In this article, we chose to measure the latter two.

Construct validity

This index shows whether the scale constructs match those in the initial framework. A scale with good construct validity is able to target true potential traits for measurement. Factor analysis is a major method for construct validity analysis and includes Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA). When an item collection is not based on theoretical guidance, EFA has the ability to explore the fields and dimensions belonging to a scale. However, before this study, we had reviewed the literature to formulate a scale framework, and EFA had been applied during the process of item selection, so at this stage CFA was suitable. Factor loading for every item and fit index for every domain were calculated.

Discriminant validity

This is an index of a scale’s ability to discriminate between populations with different traits by comparing the test results of selected subjects. The statistical method was a simple independent two-sample t-test. The total scores on the LC-PROM and on each domain were compared between cases and controls to judge whether the LC-PROM could distinguish these two groups. In addition, we stratified the time that patients had been sick as less than 1 year, 1 to 3 years, 3 to 5 years, and more than 5 years. ANOVA was then applied to infer the relationship between disease course and scale score. The scale was considered to have good discriminant validity when p ≤ 0.05.
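For the stratified comparison, the one-way ANOVA F statistic can be computed by hand as in the sketch below. The group scores are invented, and the snippet stops at the F statistic and its degrees of freedom; the p-value would come from the F distribution, which is not evaluated here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_stat, k - 1, n - k


# Hypothetical total scores for three disease-duration strata.
strata = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(one_way_anova_f(strata))  # (3.0, 2, 6)
```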

Feasibility analysis

When a scale can be understood and completed by subjects easily, the scale is said to have strong feasibility. This property is assessed with reference to acceptance ratio, response rate, and completion time.

Statistical software

The data analysis was conducted with SPSS 16.0, MULTILOG 7.03 and LISREL 8.70.

The entire study flow diagram is presented in Fig. 1.

Results

Generation of item pool

Literature review and patient interviews

Database searches revealed some liver disease-specific scales, such as the Hepatitis Quality Of Life Questionnaire (HQLQ) [10, 11], the Liver Disease Quality Of Life (LDQOL) [10, 11], the Chronic Liver Disease Questionnaire (CLDQ) [10–13], and several related questionnaires such as the WHOQOL-BREF [11], the SF-36 [10, 11], the SCL-90 [12, 13] and the Hospital Anxiety and Depression Scale (HADS) [12, 13].

The LC-PROM focused on 4 domains: Physical (PHD), Psychological (PSD), Social (SOD), and Therapeutic (TRD). This idea is based on the definition of PRO and all the specific scales for liver disease. Meanwhile, taking the Social Avoidance and Distress Scale (SAD) and the Beck Hopelessness Scale (BHS) into consideration, the LC-PROM was divided into a further 13 dimensions, and the initial item pool included 72 items (see Appendix 1). The instrument’s conceptual framework is shown in Table 1.

Table 1 Preconceived conceptual framework for the LC-PROM

Cognitive debriefing and expert discussion

The LC-PROM was regarded as clear and concise, easy to understand and easy for the patients in the cognitive debriefing to complete. Completion time was 10 min on average. Considering patients’ suggestions, we made some modifications to the instrument. Six items in PHD that described atypical symptoms and overlapped with each other were deleted. Symptoms in deleted items included, for example, oliguria, dry eyes, pale skin and mucosa, among others. We also replaced the words “hepatic region” with “right upper abdomen,” to make this text easier to understand. Similarly, two items were reduced in PSD, one item was reduced in TRD, and one item was added in SOD.

Experts agreed that the LC-PROM was reasonable in its construction framework and item attributions, and that it was comprehensive in its content. However, because this was a self-rating scale, it was determined that the items should be expressed in the first person, so a full revision was made by the research group accordingly. This second draft of the preliminary LC-PROM included 64 items, 13 dimensions and four fields (see Appendix 2).

Item reduction

Participant characteristics

We sampled 200 participants in this survey; 189 responded, for an acceptance rate of 94.50 %. There were 179 subjects, including 132 patients and 47 controls, whose data were available, for a final response rate of 94.71 %. Baseline data of participants are shown in Table 2. The average length of time since liver cirrhosis diagnosis was approximately 3.02 years.
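The two rates follow directly from the counts reported above, as this short check shows (the variable names are illustrative only):

```python
sampled, responded, usable = 200, 189, 179

acceptance_rate = responded / sampled  # share of sampled subjects who responded
response_rate = usable / responded     # share of responders with usable data

print(f"{acceptance_rate:.2%}")  # 94.50%
print(f"{response_rate:.2%}")    # 94.71%
```

Note that the response rate is taken relative to the 189 responders, not to the 200 subjects originally sampled.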

Table 2 Baseline data for participants in pilot survey

Item selection based on CTT and IRT

When CAID was used, we calculated the initial Cronbach’s α coefficient with all 64 items retained; this did not result in the deletion of any items, and the detailed results are not shown here.

In the IRT analysis, a number of items were suggested for deletion according to parameters a and b: fourteen in PHD, four in PSD, and seven in TRD; in SOD, only one item was retained. Fig. 2 shows the ICC matrix.

Fifteen items were to be deleted based on the statistical results, but considering the value of disease-specific symptom information and the contributions of certain items to each dimension, six of these items were retained in the final version of the LC-PROM. The final version comprised 55 items within 13 dimensions belonging to 4 domains (see Appendix 2). The detailed screening process is presented in Table 3, and the final construction frame can be seen in Table 4.
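The item counts reported across the development stages can be reconciled with simple bookkeeping. The variable names are illustrative; the numbers are those given in the text.

```python
initial_pool = 72

# Cognitive debriefing: 6 items removed in PHD, 2 in PSD, 1 in TRD; 1 added in SOD.
second_draft = initial_pool - 6 - 2 - 1 + 1

# Item selection: 15 items flagged, of which 6 were kept on clinical grounds.
flagged, kept_despite_flag = 15, 6
final_items = second_draft - (flagged - kept_despite_flag)

print(second_draft, final_items)  # 64 55
```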

Table 3 Item selection outcome based on CTT and IRT

Table 4 Construction frame of the final LC-PROM

Validation of LC-PROM

Demographic characteristics

Another 620 subjects (500 cases and 120 controls) were sampled for the validation. Of the 598 who responded, 576 produced valid data for analysis (464 cases and 112 controls). Participant characteristics are presented in Table 5.

Table 5 Demographic characteristics of 464 patients and 112 controls in LC-PROM validation

As Table 5 shows, males were more numerous than females, and subjects’ average age was 50–55 years. There were no statistically significant differences in the distributions of gender, age, or height between the two groups. LC patients had a higher proportion of smoking and drinking, and lower weight. These characteristics are consistent with risk factors for LC. Among the patients, 269 had been sick for 1 to 5 years, 97 for less than 1 year, and 98 for more than 5 years; the average disease duration was 3.70 years.