Development and preliminary evaluation of the QUALIKO: an observational quality of life instrument for patients with Korsakoff’s syndrome

Background To develop a Korsakoff-specific measure of quality of life (QoL), to be rated by professional caregivers, and to field-test its psychometric properties in a sample of patients with Korsakoff’s syndrome (KS) living in a specialized nursing home. Methods A research version of the QUALIKO was developed based on an existing instrument for dementia (the QUALIDEM), literature review and two rounds of surveys among expert professionals involved in the care for patients with KS. Next, QoL was independently rated using the preliminary QUALIKO for 77 patients with KS by two primary caregivers. Results The research QUALIKO consisted of 48 items describing observable behaviors across ten aspects of QoL relevant to patients with KS. Six items demonstrated poor scalability in the field test. The remaining 42 items all formed subscales with moderate to strong scalability according to Mokken scale analysis. Reliability was acceptable to good across both raters for all subscales (Mokken rho’s = 0.70–0.90), except for the two 2-item subscales of negative affect and positive self-image (Mokken rho’s = 0.47–0.71). Inter-observer agreement was excellent for five subscales (ICCs = 0.75–0.89) and fair to moderate for the other five subscales (ICCs = 0.59–0.72). The multidimensional internal structure was confirmed and all subscales were significantly correlated with primary caregivers’ global ratings of QoL except for positive self-image. Missing item values were low and floor and ceiling effects acceptable for most subscales. Conclusions The QUALIKO holds promise as a feasible, reliable, and valid measure of QoL in residential KS patients. Future research in larger samples is needed to confirm the psychometric dimensionality of the instrument, to gather normative data and to examine its test-retest reliability.


Background
Korsakoff's syndrome (KS) is a largely irreversible residual syndrome, typically resulting from severe nutritional (thiamine) depletion and occurring after incomplete recovery from a Wernicke encephalopathy [1]. In most cases, KS occurs in patients with chronic Alcohol-Use Disorder and malnutrition. It is characterized by disproportionate learning and memory impairments [2,3], executive dysfunction, flattened affect, apathy, lack of illness insight, and confabulations [1]. Due to these severe cognitive impairments and neuropsychiatric symptoms, most patients with KS are in need of lifelong specialized care. In the Netherlands, many KS patients reside in long-term care facilities from a relatively young age (mean age at admission: 56.7 years [4]).
Given the severity of the symptoms associated with KS and the major impact these symptoms have on the long-term functioning and daily lives of patients with KS, gaining insight into the quality of life (QoL) of individual residential patients with KS is essential. In addition, due to a lack of illness insight, patients with KS may become frustrated, suspicious, angry, and aggressive in long-term care settings, as they may not fully comprehend why they are unable to live independently in their own homes [5]. However, assessment of QoL of patients with KS living in long-term care facilities has to date received limited attention, and instruments are lacking.
QoL is by definition subjective in nature and is therefore preferably assessed using self-report questionnaires [6]. However, KS severely affects the cognitive functions of patients, resulting in a lack of insight into oneself and one's disease [7][8][9]. Because of this lack of illness insight and memory dysfunction, QoL of patients with KS is probably more reliably and validly assessed indirectly by using observational proxy instruments [6]. To date, no validated instruments for assessing QoL of patients with KS living in residential settings are available. For lack of a better-fitting instrument, previous studies examining QoL in patients with KS used the QUALIDEM [10][11][12][13][14]. This is an observation scale developed for objectifying QoL in patients with dementia in nursing homes. The scale contains 37 questions that can be divided into nine subscales: care relationship, positive affect, negative affect, restless behavior, positive self-image, social relations, social isolation, feeling at home, and having something to do. For instance, Oudman and Zwart [12] compared the QoL of patients with KS and patients with dementia, both living in long-term care facilities, using the QUALIDEM. Overall, they found that QoL was higher in patients with KS than in patients with dementia. However, the mean QoL score in the patients with KS could be considered moderate. The lowest scores were found on the subscales "care relationship" and "having something to do". Furthermore, patients with KS felt less at home in a nursing home than patients with dementia. These QoL scores remained relatively stable over a 20-month period [13].
Although patients with (Alzheimer's) dementia and KS both display cognitive and memory deficits [15], the concept of QoL and the domains affected might be quite different, for example because patients with KS are generally younger and (when stimulated by caregivers) more active than patients with dementia [5]. Development of a feasible observation scale tailored to this patient group might encourage future studies to focus on QoL in this patient group. Therefore, the aims of the current study were to develop an observational KS-specific measure for assessing QoL and to explore its scalability, reliability and construct validity.

Method
The QUALIKO was developed in two phases. In phase 1, relevant domains and items were identified and formulated based on literature review and two surveys among an expert panel of professionals involved in the care for patients with KS. In phase 2, the preliminary version of the instrument was field tested for psychometric properties.

Phase 1: development and selection of the dimensions and items
For developing a KS-specific QoL measure, it is important to identify those dimensions of QoL that are particularly relevant to this target group. A review of the literature revealed only three studies examining QoL in patients with KS in long-term care facilities [12][13][14]. All three studies used the QUALIDEM. No other studies were available that focused on the assessment or important components of QoL in KS. Therefore, potentially relevant dimensions additional to those from the QUALIDEM were identified from the literature on the assessment of QoL in general [16][17][18], in elderly populations [19,20], and in specific populations with severe cognitive impairments [21][22][23]. Categorization of these dimensions resulted in 20 dimensions potentially relevant for patients with KS (see Table 1).
A short survey including brief explanations and examples of behaviors representative of each dimension was performed among an expert panel of 19 professional caregivers experienced in working with patients with KS. All caregivers were employees of Krönnenzommer, ZorgAccent (Hellendoorn, the Netherlands), a nursing home specialized in care for patients with KS. The experts were asked to select the five dimensions from the list of 20 they considered most important for the QoL of patients with KS. Five dimensions that were selected by the majority (over two-thirds) of the experts were identified for inclusion in the research version of the instrument: Care relationship, Feeling at Home, Meaningful activity, Autonomy, and Positive Self-Image (see Table 1). Positive Affect and Restless Tense Behavior, selected by 42% and 32% of the experts respectively, were also considered sufficiently relevant for inclusion. Although Social Relations / Social Support was selected by only one expert, this dimension was also included because social wellbeing, next to physical and psychological wellbeing, is a fundamental aspect of overall health-related QoL [16,17] and appeared to be underrepresented in the other dimensions. Finally, as in the QUALIDEM, a separate dimension for Negative Affect was added.
Where possible, subscales and items of the QUALIDEM covering these dimensions were selected for inclusion in a preliminary QUALIKO instrument. Eight subscales and corresponding items of the QUALIDEM directly matched the selected dimensions: Care Relationship (7 items), Positive Affect (6 items), Negative Affect (3 items), Restless Tense Behavior (3 items), Positive Self-Image (3 items), Social Relations (6 items), Social Isolation (3 items), and Feeling at Home (4 items). Neither Autonomy nor Meaningful activity was included in the QUALIDEM, but both were considered important dimensions by the KS experts. However, the QUALIDEM does contain a subscale 'Having Something to Do', consisting of 2 items ('Finds things to do without help from others' and 'Enjoys helping with chores on the ward'). These items were also considered relevant for Meaningful activity of KS patients. Five additional items were formulated to constitute a new Meaningful activity subscale. For Autonomy, 6 potential new items were formulated. Care was taken to formulate the items in such a way that they were applicable to patients with KS living in nursing homes.
Next, the resulting preliminary version of the QUALIKO was evaluated by the same expert panel. Each member was presented with the observation scale and was asked to judge the items based on 1) relevance for QoL, 2) formulation, 3) observability and 4) applicability to patients with KS. Experts were asked to pay particular attention to the newly developed items of the Meaningful activity and Autonomy subscales. Based on the comments and issues raised by the experts, several items were reformulated, one of the new items and 2 of the original QUALIDEM items ('Cries', 'Calls out') were removed altogether as they were not considered relevant for patients with KS, and 3 new items were added ('Indicates to want more independence than he or she can handle', 'Feels safe', and 'Indicates to miss contact with family').

Phase 2: field testing
Participants and procedure
The 48-item QUALIKO version resulting from phase 1 was subsequently field tested for psychometric properties among patients with KS living in the Krönnenzommer nursing home. All patients fulfilled the diagnostic criteria for KS [24] and Alcohol-Induced Persisting Amnestic Disorder according to the DSM-IV-TR [25]. For each patient, the QUALIKO instrument was completed independently by two primary professional caregivers who were responsible for the daily care of that patient. The caregivers received the questionnaires on the same day and were instructed to complete them after a one-week observation period. All questionnaires were completed within a period of 2 weeks. Caregivers were additionally instructed to complete the instrument independently, and not to consult one another when in doubt. In total, 17 primary caregivers completed the observation scale. Primary caregivers received a brief verbal instruction on using the instrument.
In total, the QUALIKO was completed by two primary caregivers for 77 patients with KS. The majority of the patients (N = 61; 79.2%) were men. The mean age of the patients was 60.4 (SD = 6.9) and their average duration of stay in the nursing home was 7.2 (SD = 5.6) years. Most patients were divorced (n = 43; 55.8%), 20 patients (20.6%) were never married, 8 patients (10.4%) were married or living together, and 5 patients (6.6%) were widowed.

Instruments
The QUALIKO instrument for field testing consisted of 48 items describing observable behaviors, printed in random order. Each QUALIKO form contained a separate short written instruction on how to complete the instrument. The instruction stated that the items should be scored over the past week of observing the patient. The four response options for each item were never (0), rarely (1), sometimes (2), and often (3). An explanation was provided for the different response options: never (never in the past week), rarely (at most once in the past week), sometimes (a few times in the past week), and often (almost all days). Twenty-five items were positively formulated (e.g., 'Appreciates help that he or she receives') and 23 items were negatively formulated (e.g., 'Rejects help from nursing assistant'). Responses for the 23 negatively formulated QUALIKO items were recoded so that higher scores indicated better QoL. Besides the QUALIKO, two primary caregivers and their respective team supervisor (e.g., the head nurse) judged the global QoL of each patient on a 10-point numerical rating scale ranging from "very poor" (1) to "very good" (10).
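The recoding and summation described above can be sketched as follows. This is a minimal illustration that assumes the 0 to 3 coding of the response options; the function names and the example scores are hypothetical and are not part of the QUALIKO materials.

```python
MAX_SCORE = 3  # response options are coded 0 (never) to 3 (often)


def recode_negative(score):
    """Reverse a negatively formulated item so that higher means better QoL."""
    if score not in range(MAX_SCORE + 1):
        raise ValueError(f"score must be an integer from 0 to {MAX_SCORE}")
    return MAX_SCORE - score


def subscale_score(scores, negative_flags):
    """Sum item scores after reversing the negatively formulated items."""
    return sum(
        recode_negative(s) if is_negative else s
        for s, is_negative in zip(scores, negative_flags)
    )


# Hypothetical three-item subscale in which the second item is negatively formulated.
example = subscale_score([3, 0, 2], [False, True, False])
print(example)
```

After recoding, every item points in the same direction, so a simple sum per subscale can be read as "higher is better".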

Analysis
Feasibility or practicality of completing the QUALIKO was determined from the percentage of missing values for individual items. Scalability of the assumed subscales of the QUALIKO was examined separately for both observers using Mokken scale analysis [26]. Mokken scale analysis is a nonparametric item response theory-based method for constructing unidimensional sets of items and is ideally suited when the intention is to score an underlying latent trait by simple summation of the item response values [27]. The evaluation of a scale using Mokken scaling results in Loevinger's coefficient H as an indicator for the scalability of each item and subscale. The scalability of a single item in relation to the other items in the scale or an item set is expressed by the value $H_i$. For an item to be coherent enough to be included in a unidimensional subscale, its $H_i$ value should be > 0.30 [27,28]. Therefore, items with $H_i$ values ≤ 0.30 in either or both observers were considered not scalable and iteratively deleted from the subscale, starting with the item with the lowest $H_i$ coefficient. The scalability of the total subscale is expressed by $H_T$, summarizing the accuracy of item ordering within a scale. Total subscales should have an $H_T$ value of at least 0.30 to form a weak scale. Values of $H_T$ between 0.40 and 0.50 indicate moderate scalability and $H_T$ values of 0.50 and above indicate strong scalability [29].
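To make the scalability coefficients more concrete, the sketch below computes Loevinger's H in its covariance-ratio form: the observed covariance between two items divided by the largest covariance that their observed score distributions allow. It is an illustrative, standard-library-only sketch and not the implementation in the R package used in this study; the function names and the example scores are hypothetical.

```python
from itertools import combinations


def covariance(x, y):
    """Population covariance of two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


def max_covariance(x, y):
    """Largest covariance attainable given the two observed score distributions.

    Sorting both lists pairs the lowest scores with the lowest and the
    highest with the highest, which maximizes the covariance.
    """
    return covariance(sorted(x), sorted(y))


def scalability(items):
    """Return (list of item coefficients H_i, total coefficient H)."""
    k = len(items)
    cov = [[0.0] * k for _ in range(k)]
    cmax = [[0.0] * k for _ in range(k)]
    for i, j in combinations(range(k), 2):
        cov[i][j] = cov[j][i] = covariance(items[i], items[j])
        cmax[i][j] = cmax[j][i] = max_covariance(items[i], items[j])
    # Diagonals stay zero, so each row sum runs over the other items only.
    h_items = [sum(cov[i]) / sum(cmax[i]) for i in range(k)]
    h_total = sum(map(sum, cov)) / sum(map(sum, cmax))
    return h_items, h_total


# Hypothetical scores (0-3) of six patients on three items of one subscale.
items = [
    [0, 1, 2, 2, 3, 3],
    [0, 0, 1, 2, 2, 3],
    [1, 0, 1, 3, 2, 3],
]
h_items, h_total = scalability(items)
print([round(h, 2) for h in h_items], round(h_total, 2))
```

An item with no variance would make a denominator zero; a fuller implementation would need to handle that case, as well as standard errors, which this sketch omits.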
The reliability of each subscale was estimated for both observers by calculating the Molenaar-Sijtsma reliability coefficient, Mokken's rho (ρ). Previous comparisons of Mokken's ρ with the classical reliability estimate Cronbach's alpha (α) showed that Mokken's ρ mostly led to only slightly biased approximations of the true reliability and that these were always less biased than coefficient α [30,31]. Coefficients ≥ 0.70, ≥ 0.80 and ≥ 0.90 are generally considered to indicate acceptable, good and excellent reliability, respectively [32]. For comparison purposes, we also calculated Cronbach's α internal consistency coefficients for each subscale. To examine whether scores can be generalized across observers, interobserver reliability of the subscales was calculated by computing intraclass correlation coefficients (ICCs) with 95% confidence intervals (CIs) using two-way random effects models with absolute agreement for single measurements (type A,1) [33]. According to Fleiss, ICCs < 0.40 indicate poor agreement, values between 0.40 and 0.75 fair to good agreement, and values > 0.75 excellent agreement [34]. Additionally, quadratic weighted kappa ($\kappa_w$) coefficients with 95% asymptotic CIs for categorical data were computed to estimate the interobserver reliability of the individual items [35]. $\kappa_w$ values for individual items between 0.21 and 0.40 were considered to indicate a fair strength of agreement, between 0.41 and 0.60 moderate agreement, and between 0.61 and 0.80 substantial agreement [36].
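The two interobserver statistics can be illustrated with a short sketch. It assumes the usual mean-square formula for the two-way random effects, absolute agreement, single-measurement ICC and Cohen's kappa with quadratic disagreement weights; confidence intervals are omitted, and the function names and example ratings are hypothetical rather than taken from the study.

```python
def icc_a1(ratings):
    """ICC(A,1) from a list of rows, one row per patient, one score per rater."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)                      # between patients
    ms_cols = ss_cols / (k - 1)                      # between raters
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def quadratic_weighted_kappa(rater1, rater2, n_categories=4):
    """Weighted kappa for item scores coded 0 .. n_categories - 1."""
    n = len(rater1)

    def weight(i, j):
        # Quadratic disagreement weight: 0 on the diagonal, 1 at the extremes.
        return (i - j) ** 2 / (n_categories - 1) ** 2

    observed = sum(weight(a, b) for a, b in zip(rater1, rater2)) / n
    p1 = [rater1.count(c) / n for c in range(n_categories)]
    p2 = [rater2.count(c) / n for c in range(n_categories)]
    expected = sum(
        weight(i, j) * p1[i] * p2[j]
        for i in range(n_categories)
        for j in range(n_categories)
    )
    return 1 - observed / expected


# Hypothetical subscale scores of five patients rated by two caregivers.
subscale = [[9, 8], [4, 5], [7, 7], [2, 4], [6, 5]]
print(round(icc_a1(subscale), 2))

# Hypothetical item scores (0-3) given by the two caregivers.
print(round(quadratic_weighted_kappa([0, 1, 2, 3, 3, 2], [0, 2, 2, 3, 2, 2]), 2))
```

Because the absolute-agreement form includes the between-rater mean square in the denominator, a rater who scores systematically higher than the other lowers the ICC, even when both rank the patients identically.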
For examining the construct validity of the QUALIKO, summed subscale scores for the remaining items were computed for both observers and averaged by taking the arithmetic mean. To assess the internal construct validity of the QUALIKO, Pearson correlation coefficients (r's) were computed between the subscale scores. As QoL is considered to be a multidimensional construct, it was hypothesized that correlations between the subscales would be at most moderate (≤ 0.70) [37]. High correlations (> 0.70) were considered undesirable because this would question the distinctiveness of the subscales. For external construct validity (convergent validity), Pearson correlations were computed with the global ratings of QoL made by the supervisors and primary caregivers. All subscale scores were expected to demonstrate significant weak to moderate (r = 0.10 to 0.69) positive associations with global ratings of QoL.
For exploring the interpretability, mean scores and standard deviations (SDs) were computed for the subscales. Potential associations of QUALIKO subscale scores with age and differences in scores between sexes were explored using Pearson correlation coefficients and independent-sample t-tests, respectively. Additionally, floor effects (the proportion of patients scoring the lowest, worst possible score) and ceiling effects (the proportion of patients scoring the highest, best possible score) of the subscale scores were examined. If a large proportion of patients scores the minimum or maximum possible value on a subscale, this may point to limited content validity and result in reduced reliability and responsiveness to either deteriorations or improvements in QoL [38,39].
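Floor and ceiling effects as defined above are simple proportions. The sketch below shows the computation for one subscale; the function name, the score range and the example scores are hypothetical.

```python
def floor_ceiling(scores, min_score, max_score):
    """Return the proportions of patients at the lowest and highest possible score."""
    n = len(scores)
    floor = sum(s == min_score for s in scores) / n
    ceiling = sum(s == max_score for s in scores) / n
    return floor, ceiling


# Hypothetical three-item subscale (possible range 0-9) scored for ten patients.
scores = [9, 7, 9, 0, 5, 8, 9, 6, 4, 9]
floor, ceiling = floor_ceiling(scores, min_score=0, max_score=9)
print(f"floor: {floor:.0%}, ceiling: {ceiling:.0%}")
```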
Mokken scale analysis was performed with the Mokken package version 2.8.11 in R [40]. All other analyses were performed using SPSS version 25.

Results
Missing value analysis
For the 48 initial QUALIKO items, only 31 responses (0.4%) of a maximum of 7392 possible responses were missing across the two observers (17 responses for primary caregiver 1 vs. 14 for primary caregiver 2), suggesting that completing the instrument was feasible for the primary caregivers. Across the two observer groups, the proportion of missing values ranged from 0% for 36 items to 3.9% for 2 items ('Is productive at day care' and 'Complains about day care'). Missing item responses were imputed using the expectation maximization algorithm [41], which has been shown to produce more accurate estimates than other methods such as deletion of missing cases or mean substitution [42], and rounded to the nearest integer for subsequent analyses.
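The overall missing-response rate follows directly from the counts reported above, as this short worked check shows (variable names are illustrative only).

```python
n_items, n_patients, n_observers = 48, 77, 2

# Every observer rates every patient on every item.
possible = n_items * n_patients * n_observers

# Missing responses reported for primary caregiver 1 and primary caregiver 2.
missing = 17 + 14

rate = 100 * missing / possible
print(possible, missing, f"{rate:.1f}%")
```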

Item analysis
Individual item-response distributions tended to be negatively skewed, with more patients scoring at a higher level of QoL, confirming the applicability of non-parametric scaling analysis. For twelve items, not all item-response options were used by either or both observer groups. In all these cases, the most negative response option ('never' for positively formulated items or 'often' for negatively formulated items) was not used for any patient.

Scalability
Item scalability coefficients were generally comparable between observer groups. Six items from 4 subscales had scalability coefficients below 0.30 and were deleted from their respective subscale for subsequent analysis. Five of these items demonstrated poor scalability in both observers, while 1 item demonstrated poor scalability in one observer group and borderline scalability in the other. Two items from the subscale Autonomy were deleted, including 'Is capable of taking care of themselves'. The scalability coefficients of the remaining 42 items are presented in Table 2. Except for Positive self-image, total $H_T$ values indicated at least moderate scalability of the subscales in both observer groups. Four subscales (Care relationship, Autonomy, Positive affect, Social relations) showed strong scalability in both observer groups. Restless tense behavior, Social isolation, Feeling at home and Meaningful activity showed strong scalability in one observer group and moderate scalability in the other. Negative affect, with only two remaining items, demonstrated moderate scalability in both observer groups. Finally, the other 2-item subscale, Positive self-image, showed strong scalability in one observer group, but weak scalability in the other.

Reliability
Reliability coefficients ρ indicated acceptable (≥0.70) to good (≥0.80) reliability for all subscales in both observer groups, except for the two 2-item subscales of Negative affect and Positive self-image. As would be expected, Cronbach's α lower bound estimates tended to be slightly lower than Molenaar-Sijtsma ρ values. The results are presented in Table 2.
Inter-observer agreement was excellent (ICC > 0.75) for five subscales (Care relationship, Autonomy, Positive affect, Social relations, and Meaningful activity) and fair to moderate (ICC = 0.40 to 0.75) for the other five subscales (Negative affect, Restless tense behavior, Positive self-image, Social isolation, and Feeling at home). Interobserver agreement for individual items was substantial (33 items) or moderate (13 items) for all except two items ('Follows directions of nursing assistant' and 'Mood can be influenced in a positive sense').

Construct validity
Table 3 presents the Pearson correlations between the ten subscales of the QUALIKO. Correlations between subscales were mostly negligible to moderate, confirming the multidimensional structure of the QUALIKO instrument. However, two subscale intercorrelations exceeded the selected threshold for strong correlation (Care relationship vs. Autonomy and Autonomy vs. Feeling at home), indicating a high degree of overlap between these subscales. With the exception of Positive self-image, all subscales of the QUALIKO were significantly and positively correlated with the primary caregivers' global ratings of QoL (Table 4). Correlation coefficients were all in the range of weak to moderate and were strongest for Positive affect, Negative affect and Meaningful activity. Interestingly, global QoL ratings by the respective team supervisors were not significantly correlated with Autonomy, Positive self-image, Social isolation, and Feeling at home.

Discussion
The QUALIDEM, which was originally developed for use in patients with dementia [10,11], turned out to provide a good basis for assessing QoL in patients with KS. In the final QUALIKO instrument, 30 items from the original 40-item QUALIDEM were retained or only slightly adapted and 8 out of its 9 subscales were retained.
However, the QUALIKO also contains 12 new items and 2 additional dimensions of QoL (Autonomy and Meaningful activity) that were considered important by specialized caregivers for patients with KS. This is not surprising, as patients with KS are generally younger and more active than patients with dementia. As such, the QUALIKO may have better content validity for specific use in residential patients with KS living in nursing homes.
The field test showed that 42 items of the QUALIKO formed subscales with, in general, moderate to strong scalability. Scalability and reliability coefficients of the subscales were quite similar to those found for the QUALIDEM in different samples of patients with dementia in residential settings [43], with especially the Social relations and Social isolation subscales showing notably better scalability and reliability in the current study. Scalability and reliability were, however, lower for Negative affect and Positive self-image, possibly because only two items were retained for these subscales. Interobserver agreement was excellent for 5 subscales and fair to moderate for the other 5 subscales, suggesting that scores are sufficiently comparable between observers.
The multidimensional internal structure of the QUALIKO was confirmed by low to moderate intercorrelations between most of the subscales and, except for positive self-image, all subscales were significantly correlated with primary caregivers' global ratings of QoL, supporting the construct validity of the instrument. Also, missing values for individual items were minimal, and comparable to those found in previous studies with the QUALIDEM (range of missing item responses: 0.2-0.4%) [43], supporting the feasibility of QUALIKO administration in the nursing home setting. Finally, floor and ceiling effects were acceptable for most subscales, suggesting that the QUALIKO is potentially responsive to changes in QoL [38,39]. However, the items of the Positive self-image dimension, which was considered an important aspect of QoL of residents with KS by the specialized caregivers, may need to be reformulated. This subscale demonstrated the lowest scalability and reliability and a notable lack of correlation with most of the other subscales and with global ratings of QoL. The content of this dimension may need to be aimed more at assessing a patient's self-worth and self-acceptance. Lack of insight into their condition is typical of this patient group and patients themselves may believe that nothing is wrong with them [7]. The need for living in long-term care settings, and additionally losing independence and autonomy, is particularly frustrating for these patients, and might subsequently have negative effects on a patient's self-image.
Table 4 Pearson correlations of the QUALIKO subscales with global ratings of QoL made by the team supervisor and primary caregivers
Although the current study demonstrated promising preliminary psychometric properties of the QUALIKO, future studies are needed to investigate other important measurement properties of the instrument. First of all, reproducibility (test-retest reliability) of QUALIKO scores needs to be examined in a stable sample of KS patients. Also, relevant normative data derived from a large and representative sample of KS patients should be established to increase the usability and interpretability of QUALIKO scores for research and daily clinical care of individual KS patients. The finding that 5 out of 10 subscale scores were significantly different between male and female patients suggests that sex-specific norms may be needed.
The major limitation of the psychometric evaluation of the QUALIKO in this study is the relatively small sample size of the field study, especially for dimensionality and scalability analysis. For instance, the sample size was considered too small for exploratory factor analysis of the dimensional structure of the instrument, which as a very general rule of thumb requires at least 5 to 10 subjects per item [44]. Although our sample of 77 KS patients is substantial compared to most published research in this patient group (due to its relatively low prevalence), much larger samples are required for thoroughly confirming item monotonicity and the absence of local response dependence and differential item functioning. Although the non-parametric Mokken analysis used in this study provides a preliminary indication of the unidimensionality and scalability of the subscales of the QUALIKO, future studies need to confirm these psychometric properties in larger samples using appropriate techniques such as confirmatory factor analysis or parametric item response theory analysis.
On the other hand, the sample size was considered sufficiently large for both reliability and construct validity analysis. For instance, the sample size of 77 observations allowed accurate estimation of excellent inter-observer agreement values (ICC > 0.75) for two raters with a small 95% confidence interval width of at most 0.2 (i.e., between 0.65 and 0.85) around the estimate and a slightly wider confidence interval width of 0.4 for fair agreement (ICC = 0.40) [45]. Furthermore, it provided around 80% power to detect even weak bivariate correlations of at least r = 0.30 as statistically significant at a two-tailed α = 0.05 level for the construct validity analyses.
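The order of magnitude of the power figure for correlations can be checked with the Fisher z approximation. The sketch below is an approximation under a bivariate normal assumption and is not necessarily the method the authors used, so it need not reproduce their figure exactly; the function name is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(r, n, alpha=0.05):
    """Approximate two-tailed power to detect a correlation r with n observations."""
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    # Fisher z-transformed correlation divided by its standard error 1/sqrt(n - 3).
    shift = atanh(r) * sqrt(n - 3)
    # Probability of exceeding the critical value in either tail.
    return normal.cdf(shift - z_crit) + normal.cdf(-shift - z_crit)


power = correlation_power(0.30, 77)
print(f"{power:.2f}")
```

Under this approximation, power rises quickly with the size of the correlation, so associations clearly above r = 0.30 would be detected with high probability in a sample of 77.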
Two subscales in the final QUALIKO (Negative affect and Positive self-image) ended up consisting of only two items. Although Cronbach's α values were reported for these subscales, several researchers have indicated that Spearman-Brown split-half reliability is a more appropriate reliability coefficient for two-item scales [46,47]. We decided to report only Cronbach's α estimates for these two subscales as well, since additional analyses (not reported) showed that these were almost identical to Spearman-Brown split-half reliability estimates for both subscales in both observer groups, suggesting that the measurement error variances are equivalent for the two items in both scales [47].
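The relation between the two coefficients for a two-item scale can be illustrated as follows. The sketch uses the standard formulas for Cronbach's α and the Spearman-Brown coefficient; the function names and the item scores are hypothetical. The two coefficients coincide when both items have the same variance, and α falls below the Spearman-Brown value when the variances differ.

```python
from math import sqrt


def _cov(x, y):
    """Population covariance of two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


def cronbach_alpha_two_items(x, y):
    """Cronbach's alpha for a scale consisting of the two items x and y."""
    var_x, var_y, cov_xy = _cov(x, x), _cov(y, y), _cov(x, y)
    total_variance = var_x + var_y + 2 * cov_xy
    return 2 * (1 - (var_x + var_y) / total_variance)


def spearman_brown_two_items(x, y):
    """Spearman-Brown coefficient based on the inter-item correlation."""
    r = _cov(x, y) / sqrt(_cov(x, x) * _cov(y, y))
    return 2 * r / (1 + r)


# Hypothetical item pair with equal variances: both coefficients coincide.
equal_x, equal_y = [0, 1, 2, 3, 2], [1, 0, 2, 3, 2]
print(cronbach_alpha_two_items(equal_x, equal_y),
      spearman_brown_two_items(equal_x, equal_y))

# Hypothetical item pair with unequal variances: alpha is the lower of the two.
unequal_x, unequal_y = [0, 1, 2, 3], [0, 0, 1, 1]
print(cronbach_alpha_two_items(unequal_x, unequal_y),
      spearman_brown_two_items(unequal_x, unequal_y))
```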

Conclusions
This study described the development and preliminary evaluation of the first disease-specific observation scale that can be used by professional caregivers to measure QoL in patients with KS residing in nursing homes. Development of this feasible observation scale might encourage future studies to focus on QoL in this patient group. The QUALIKO showed promising measurement qualities, but more studies in larger groups of KS patients are needed to confirm the psychometric properties of the instrument, to collect normative data and to establish its test-retest reliability.

[...] patient in the last week. Please answer every question. If you are not sure which response option is best, please select the option that most closely represents your observations. An answer is never incorrect; always select the option that resembles reality most closely from your perspective. Do not overthink your answers; the first option that comes to mind is usually the best.
Never = Never in the past week
Rarely = Once at most in the last week
Sometimes = A few times a week
Frequently = Almost daily