Validation of three-item Fatigue Severity Scale for patients with substance use disorder: a cohort study from Norway for the period 2016-2020

Little attention has been paid to customising fatigue questionnaires for patients with substance use disorders (SUDs). The present study aims to validate and shorten the nine-item Fatigue Severity Scale (FSS-9) and the Visual Analogue Fatigue Scale (VAFS) for use with this population. We used data from a nested cohort with annual health assessments that included responses on the FSS-9 and VAFS. During the period 2016–2020, 917 health assessments were collected from 655 patients with SUDs in Bergen and Stavanger, Norway. A total of 225 patients answered the health assessment at least twice. We defined baseline as the first annual health assessment when the health assessments were sorted chronologically per patient. We checked internal consistency and used longitudinal confirmatory factor analysis and linear mixed model analysis to validate and shorten the FSS-9 and VAFS. The internal consistency of the FSS-9 was high, with a Cronbach's α of 0.94 at baseline and 0.93 at the second annual health assessment. When the FSS-9 was shortened to a three-item FSS (FSS-3, items 5–7), the Cronbach's α was 0.87 at baseline and 0.84 at the second health assessment. The internal consistency was not [...] when the [...] was [...] and [...]. The longitudinal confirmatory factor analysis model between baseline and the second health assessment showed a well-fitting model for the FSS-3 (χ² = 13.33, degrees of freedom = 8, P = 0.101), with a correlation of r = 0.52 (P < 0.001) between the annual health assessments. The linear mixed model analysis showed equal linear changes at the individual level for the FSS-3 (slope: 0.00, P > 0.05) and FSS-9 (slope: 0.01, P > 0.05) between the health assessments.


Background
Fatigue is a common subjective health complaint that significantly affects patients with chronic diseases [1][2][3]. The nine-item Fatigue Severity Scale (FSS-9) and the Visual Analog Fatigue Scale (VAFS) are well-known questionnaires for quantifying fatigue and have been validated for a broad spectrum of chronic diseases [1][2][3][4][5][6]. For patients with substance use disorders (SUDs), however, these fatigue questionnaires have not yet been validated or customised, and reliable measurements of fatigue in this population are lacking. Patients with SUDs are harder to reach than other patients with chronic diseases. Their complex medical and psychosocial conditions also carry a high risk of poor compliance and biased responses [7][8][9]. Therefore, measuring fatigue in the SUD population with instruments that are validated for this group and contain few items could facilitate reliable and valid measurement of fatigue in this population.
The FSS-9 shows high validity and reliability across a wide range of chronic diseases, such as chronic hepatitis C virus (HCV) infection [1], Parkinson's disease [6], myasthenia gravis [2], systemic lupus erythematosus [4], and stroke [3]. For the VAFS, good validity and reliability have been shown in post-stroke patients [10], as well as when the VAFS was used alongside the FSS-9 to evaluate patients' general feelings of fatigue [11,12]. Furthermore, excellent validity and reliability were achieved when the FSS-9 was shortened to a seven-item version [13,14], whereas uncertain results were reported when a five-item version was validated [15]. The FSS-9 performs well across different chronic diseases and in its shortened seven-item version. This makes the FSS-9 and VAFS natural candidates for further validation and shortening for patients with SUDs.
Validating and shortening fatigue questionnaires is essential for improving knowledge of fatigue, for building a framework for individualised treatment, and for developing clinical guidelines for the SUD population. However, patients with SUDs often live in chaotic situations with extensive medical and psychosocial health challenges. These include polysubstance use, substance intoxication, psychiatric comorbidities (e.g. attention deficit hyperactivity disorder, psychotic disorders, or personality disorders), chronic viral hepatitis, and temporary housing. Such challenges might complicate data collection and might affect whether and how patients respond to fatigue questionnaires [7][8][9][16][17][18]. In these cases, using simple wording and phrasing and avoiding subtle nuances between items might be essential to ensuring reliable results.
Thus, the present cohort study aims to validate and shorten the nine-item Fatigue Severity Scale (FSS-9) and the Visual Analog Fatigue Scale (VAFS) for use with patients with substance use disorders (SUDs).

Data source
We drew data from a nested cohort from the INTRO-HCV trial that collected data on patients with SUDs in Bergen and Stavanger, Norway [19]. We recruited patients receiving opioid agonist therapy in Bergen and Stavanger and patients with SUDs receiving healthcare from the primary health clinics in the municipality of Bergen. This study included all patients in the cohort who had answered the FSS-9 and/or VAFS in the study period from May 2016 to January 2020.

Data collection
All included patients were invited to an annual health assessment, including FSS-9 and VAFS measurements and a survey of their current sociodemographic situation. We collected all data in a health register using data collection software (Checkware®) under the supervision of research nurses.

Study population
We conducted 917 health assessments of 655 patients, comprising 916 FSS-9 measurements and 915 VAFS measurements during the study period. We defined a measurement as a health assessment in which at least one item of the FSS-9 or the VAFS was answered. Baseline was defined as the first health assessment including measurements of the FSS-9 or VAFS when the health assessments were listed chronologically. The FSS-9 and VAFS were both completely answered in 914 health assessments. The remaining three health assessments were incomplete:

- one included only the VAFS, without the FSS-9;
- one included only five of the nine FSS-9 items, without the VAFS;
- one included the complete FSS-9, without the VAFS.

Of the 655 included patients, 188 completed two health assessments, while 37 patients completed three health assessments. The time intervals between the annual health assessments varied, with a mean of 12 months (standard deviation (SD): 4 months) (Additional File 1). Because relatively few patients had three health assessments, we used two health assessments when estimating internal consistency and performing confirmatory factor analyses. For patients with three annual health assessments, we included only the first (baseline) and second health assessments in these analyses.
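The counts reported above are mutually consistent, and the short Python script below checks this. It uses only figures stated in this section and the abstract. The variable names are illustrative, and the number of patients with a single assessment is derived by subtraction rather than reported directly.

```python
# Figures reported in the Study population section.
TOTAL_PATIENTS = 655
patients_by_number_of_assessments = {2: 188, 3: 37}

# Derived by subtraction: patients who contributed only a baseline assessment.
baseline_only = TOTAL_PATIENTS - sum(patients_by_number_of_assessments.values())
followed_up = TOTAL_PATIENTS - baseline_only
total_assessments = baseline_only + sum(
    k * n for k, n in patients_by_number_of_assessments.items()
)

# Breakdown of the assessments by which instruments were answered.
both_complete = 914
vafs_without_fss9 = 1
partial_fss9_without_vafs = 1  # five of the nine FSS-9 items
full_fss9_without_vafs = 1

fss9_measurements = both_complete + partial_fss9_without_vafs + full_fss9_without_vafs
vafs_measurements = both_complete + vafs_without_fss9
```

The script yields:

- 430 patients with only a baseline assessment;
- 225 followed-up patients, matching the abstract;
- 430 + 2 × 188 + 3 × 37 = 917 assessments in total.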

Measuring fatigue
We used the FSS-9 and VAFS to measure the level of fatigue. The FSS-9 measures fatigue during the past week. Its items cover mental and physical functioning, motivation, exercise, carrying out certain duties, and interference with work, family, or social life. The VAFS measures the patient's general experience of fatigue and is correlated with the FSS-9 in various chronic diseases [3], including chronic HCV infection [12]. The FSS-9 items were answered on a Likert scale from 1 (no fatigue) to 7 (worst fatigue). The VAFS was answered by placing a mark representing the fatigue level on a line from 0 (no fatigue) to 10 (worst fatigue). To minimise missing data, the data collection software accepted only valid responses to each question and prompted for unanswered questions before submission. The FSS-9 has been translated and back-translated from English into Norwegian by qualified native Norwegian-speaking translators (Additional File 2) [20].
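As a minimal sketch of how responses on these scales could be scored, the Python functions below do two things. One computes FSS sum scores from the nine item responses (1–7); the other checks that a VAFS mark lies within 0–10. The sketch rests on a few assumptions:

- The FSS-3 consists of FSS-9 items 5–7, as stated in the abstract.
- Sum scores are used, as in the linear mixed model analysis.
- The function and constant names are hypothetical.
- Incomplete item sets are simply rejected here. This includes the one assessment with five of nine items. The study instead handled missing data with full information maximum likelihood.

```python
from typing import Iterable, Sequence

# Hypothetical constants: the FSS-3 keeps FSS-9 items 5-7 (1-based), per the abstract.
FSS9_ITEMS = tuple(range(1, 10))
FSS3_ITEMS = (5, 6, 7)


def fss_sum(responses: Sequence[int], items: Iterable[int] = FSS9_ITEMS) -> int:
    """Sum score over the chosen FSS-9 items; responses[i - 1] answers item i."""
    if len(responses) != 9:
        raise ValueError("expected answers to all nine FSS-9 items")
    if any(not 1 <= r <= 7 for r in responses):
        raise ValueError("FSS-9 answers must lie on the 1-7 Likert scale")
    return sum(responses[i - 1] for i in items)


def check_vafs(mark: float) -> float:
    """Return the VAFS mark if it lies on the 0-10 line, otherwise raise."""
    if not 0 <= mark <= 10:
        raise ValueError("VAFS marks must lie between 0 and 10")
    return mark
```

With this scoring, FSS-9 sum scores range from 9 to 63 and FSS-3 sum scores from 3 to 21.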

Statistical analysis
We used Stata/SE 16.0 (StataCorp, TX, USA) for descriptive analysis. We used IBM SPSS version 24.0 and Mplus version 8.4 for three further analyses:

- reliability analysis (Cronbach's α if item deleted and item-total correlation);
- confirmatory factor analysis;
- linear mixed model analysis (Mplus: TwoLevel analysis).

The threshold for statistical significance was set to P < 0.05 for all analyses unless otherwise stated.
2.5.1. Internal consistency of the FSS-9 and the three-item FSS (FSS-3) with added VAFS
We calculated the internal consistency of the FSS-9 and VAFS (10 items) at baseline and at the second health assessment.
A Cronbach's α above 0.70 was considered to indicate good internal consistency [21,22]. First, Cronbach's α of the FSS-9 items was calculated at baseline. We then shortened the FSS-9 by deleting the item whose removal resulted in the highest Cronbach's α for the remaining items (alpha-if-item-deleted analysis). The Cronbach's α coefficients of the remaining items were recalculated, and the next item was deleted. If the remaining scale showed almost equal Cronbach's α values after removing one item or another, clinical experience guided which item to remove. We deleted items that were less applicable to patients with SUDs. Examples are items about employment (unemployment is common in this population) and items with complex phrasing and wording. Such items might be difficult to understand for patients who are intoxicated or going through substance withdrawal. Furthermore, we calculated Cronbach's α for the VAFS plus the FSS-9 and for the VAFS plus the shortened version of the FSS-9 (FSS-3), at baseline and at the second health assessment.
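The alpha-if-item-deleted procedure described above can be sketched in Python as follows. Cronbach's α is computed on complete cases with the standard formula α = k/(k − 1) · (1 − Σ var(item) / var(total)). The helper names and the toy data in the check are hypothetical. Ties are broken mechanically by taking the first item, whereas the authors used clinical judgement when values were almost equal.

```python
from statistics import variance
from typing import Dict, List, Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """items[j] holds every respondent's score on item j (complete cases only)."""
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent sum score
    item_var_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def backward_eliminate(items: Dict[str, List[float]], keep: int) -> List[str]:
    """Repeatedly drop the item whose deletion gives the highest alpha."""
    if keep < 2:
        raise ValueError("keep at least two items so alpha stays defined")
    remaining = dict(items)
    dropped: List[str] = []
    while len(remaining) > keep:
        alpha_if_deleted = {
            name: cronbach_alpha([s for other, s in remaining.items() if other != name])
            for name in remaining
        }
        # max() keeps the first item on ties; the authors used clinical judgement instead.
        to_drop = max(alpha_if_deleted, key=alpha_if_deleted.get)
        dropped.append(to_drop)
        del remaining[to_drop]
    return dropped
```

In a toy data set with three strongly related items (A, B, C) and one unrelated item (D), the procedure drops D first, as intended.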
2.5.2. Longitudinal confirmatory factor analysis for evaluating the fit of the FSS-9 and the FSS-3 with added VAFS
We used confirmatory factor analysis to test the structure of the FSS-9 items in order to evaluate the relationships between the items and their underlying latent factors. In addition, construct overlap was evaluated through the relationships between the FSS-9 factors and the VAFS. We analysed the confirmatory factor analysis models for the FSS-9 and the FSS-3 at baseline and at the second health assessment. Moreover, the FSS-9 and FSS-3 factor models were combined with the covariance of the VAFS at baseline and at the second health assessment, to explore how the VAFS was related to the models. Further, using longitudinal data from baseline and the second health assessment, we created four longitudinal confirmatory factor analysis models for the FSS-3:

- First, we estimated a free longitudinal confirmatory factor analysis model in which all parameter values were unique.
- Second, we tested for constraints in the model by setting the factor loadings of each item equal at baseline and at the second health assessment.
- Third, we tested whether the residuals at baseline and the second health assessment were equal within each indicator.
- The last model constrained the intercept values for the indicators to be equal at baseline and the second health assessment.

We used the Wald test to compare model restrictions.
All confirmatory factor analysis models were evaluated with standard fit measures: χ², degrees of freedom, P values, the Comparative Fit Index, the Tucker Lewis Index, the Root Mean Square Error of Approximation with 90% confidence interval, and the probability of close fit. A well-fitting model should have a statistically non-significant χ², Comparative Fit Index and Tucker Lewis Index values above 0.95, and a Root Mean Square Error of Approximation preferably below 0.05 (close fit) [23]. A Root Mean Square Error of Approximation above 0.10 is considered to indicate a poorly fitting model [23]. If the goodness-of-fit measures indicated a poorly fitting model, we used modification indices (χ² difference test) to explore model improvements. We analysed all variables as continuous because of the relatively high number of categories in the ordinal variables (FSS-9 items ranged from 1 to 7, and the VAFS ranged from 0 to 10). The confirmatory factor analyses were run using the Robust Maximum Likelihood estimator.
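The fit criteria listed above can be expressed as a small Python sketch. It has three parts:

- the common textbook formulas for RMSEA, CFI, and TLI, where the baseline (independence) model χ² and df are inputs the reader must supply;
- an exact chi-square tail probability that covers only even degrees of freedom;
- the thresholds stated in this section.

The function names and the "borderline" label are assumptions for illustration. Software such as Mplus may differ in details, for example by using N rather than N − 1 in the RMSEA, or by using scaled χ² values under the robust estimator.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square(df); exact for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term = total = 1.0
    # P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA point estimate; some programs use n instead of n - 1 here."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative Fit Index of model m relative to baseline model b."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def tli(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Tucker Lewis Index; values may exceed 1 for very well-fitting models."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)


def classify_fit(p: float, cfi_value: float, tli_value: float, rmsea_value: float) -> str:
    """Apply the thresholds stated in the text; the labels are illustrative."""
    if rmsea_value > 0.10:
        return "poor"
    if p >= 0.05 and cfi_value > 0.95 and tli_value > 0.95 and rmsea_value < 0.05:
        return "good"
    return "borderline"
```

The abstract reports χ² = 13.33 and df = 8 for the longitudinal FSS-3 model. For these values, `chi2_sf_even_df(13.33, 8)` evaluates to about 0.101, matching the reported P value.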

Linear mixed model analysis for evaluating changes in the FSS-9 and FSS-3 sum scores with added VAFS between the annual health assessments
We used a linear mixed model analysis (Mplus multilevel modelling: TwoLevel) to evaluate linear changes from baseline in the sum scores of the FSS-9 and the FSS-3, as well as in the scores of the nine separate FSS items and the VAFS. We included all 917 health assessments. First, we estimated a full random intercept, random slope model. This model gave us the mean and individual variance of both level and change, together with the relationship between level and change [24]. We re-estimated the model as a random intercept, fixed slope model if the slope variance and the covariance between the intercept and the slope were statistically non-significant. We used the Mplus Maximum Likelihood Robust estimator to correct standard errors for potential deviations from normality [25]. In addition, intraclass correlations were estimated. We used full information maximum likelihood in order to use all available measurements; this approach assumes that data are 'missing at random' [26].
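The intraclass correlation reported with the mixed models is defined in the Table 4 legend as between-person variance relative to total variance. A minimal Python sketch follows, assuming a random intercept model y_ti = γ0 + u_i + γ1·time_ti + e_ti. In this model the between-person variance is var(u_i) and the within-person (residual) variance is var(e_ti). The function names are illustrative, and the variance values in the check are hypothetical, not estimates from the study.

```python
def icc(between_var: float, within_var: float) -> float:
    """Intraclass correlation: between-person variance over total variance."""
    if between_var < 0 or within_var < 0:
        raise ValueError("variance components must be non-negative")
    total = between_var + within_var
    if total == 0:
        raise ValueError("total variance must be positive")
    return between_var / total


def within_share(between_var: float, within_var: float) -> float:
    """Share of total variance due to change within the same person over time."""
    return 1.0 - icc(between_var, within_var)
```

Read this way, an ICC of 0.18 means that about 82% of the variance lies within persons across assessments. An ICC of 0.52 means that roughly half of the variance reflects stable differences between persons.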

Ethics approval and consent to participate
The study was reviewed and approved by the Regional Ethical Committee for Health Research (REC) West, Norway (reference number: 2017/51/REK Vest, dated 29.03.2017/20.04.2017). Each patient provided written informed consent prior to enrolling in the study.

Patient characteristics at baseline
Seventy-one percent of the patients were male, and the mean age was 43 years (Table 1). Half had education beyond primary school as their highest level of education. Eighty-three percent received opioid agonist therapy, and 42% had injected substances in the 30 days before the health assessment. The mean values of the FSS-9 items ranged from 4.43 to 5.38 at baseline (Table 2), and the mean VAFS value was 5.19. The FSS-9 and VAFS variables were slightly left-skewed (skewness ranged from -1.14 to -0.29) and tended towards a flattened distribution (kurtosis ranged from -1.39 to -0.09).
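The skewness and kurtosis values above describe the shape of the item distributions. The Python sketch below uses simple moment-based estimators: g1 for skewness and g2 for excess kurtosis, where a normal distribution gives 0 for both. This is an assumption for illustration. Statistical packages differ in their small-sample adjustments and in whether they subtract 3 from the kurtosis. The example response vectors are hypothetical.

```python
from typing import Sequence


def _central_moment(values: Sequence[float], order: int) -> float:
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values: Sequence[float]) -> float:
    """Moment-based skewness g1; negative values indicate a left (negative) skew."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def excess_kurtosis(values: Sequence[float]) -> float:
    """Moment-based excess kurtosis g2; negative values indicate a flatter-than-normal shape."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3.0
```

Two examples illustrate the direction of each statistic:

- A response pattern with mostly high scores and a few low ones, such as [1, 5, 5, 5, 5], gives a skewness of -1.5.
- Evenly spread responses such as [1, 2, 3, 4, 5] give an excess kurtosis of -1.3.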

Internal consistency of the FSS-9 and the FSS-3 with added VAFS
Cronbach's α for the FSS-9 was 0.94 at baseline and 0.93 at the second health assessment (Additional File 3).

Longitudinal confirmatory factor analysis for evaluating the fit of the FSS-9 and FSS-3 with added VAFS
Table 3 shows the results of the confirmatory factor analyses comparing the fit of the items in the FSS-9 and FSS-3 with added VAFS, at baseline and at the second health assessment. At baseline, the unidimensional model for the FSS-9 with unique factor loadings, residuals, and intercept values showed borderline fit. The Comparative Fit Index and Tucker Lewis Index were below the suggested levels, and the Root Mean Square Error of Approximation point estimate was just at the threshold for poor fit. The modification indices showed that estimating the residual covariance between items 2 and 3 gave the largest improvement in the model (Δχ² = 80.1, degrees of freedom (df) = 1, P < .001). The FSS-3 showed a well-fitting model. The FSS-9 and FSS-3 were highly correlated (r = 0.95, P < 0.001), giving 90% explained variance after estimating factor scores. The models including the VAFS together with the FSS-9 and the FSS-3, respectively, gave identical results. The correlations with the VAFS were r = 0.70, P < .001 (FSS-9) and r = 0.68, P < .001 (FSS-3). We obtained similar results for the factor models at the second health assessment for both the FSS-9 and FSS-3, with and without the VAFS. The longitudinal confirmatory factor analysis model based on the FSS-3 supported time-invariant (equal) factor loadings and residuals between baseline and the second health assessment (Figure 1). The correlation between the factors at baseline and at the second health assessment was r = 0.52, P < 0.001. Constraining the intercept values of each measured item to be equal between health assessments led to a small reduction in model fit. However, this simpler model still fitted well.
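Two of the statistics reported above can be reproduced by hand:

- The share of variance explained by a correlation is r².
- A Δχ² with 1 df can be converted to a P value, because a chi-square variable with one degree of freedom is the square of a standard normal variable.

The Python sketch below uses only the standard library, and the function names are illustrative.

```python
import math


def explained_variance(r: float) -> float:
    """Proportion of variance two measures share, given their correlation r."""
    return r * r


def chi2_sf_df1(x: float) -> float:
    """P(X > x) for X ~ chi-square(1), using X = Z**2 with Z standard normal."""
    if x < 0:
        raise ValueError("chi-square statistics are non-negative")
    # P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x) / sqrt(2))
    return math.erfc(math.sqrt(x / 2.0))
```

The sketch reproduces the reported values:

- For r = 0.95, r² = 0.9025, the roughly 90% explained variance reported above.
- For the VAFS correlations, 0.70² = 0.49 and 0.68² ≈ 0.46.
- A Δχ² of 80.1 on 1 df corresponds to a P value far below .001.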

Linear mixed model analysis for evaluating changes in the FSS-9 and the FSS-3 sum scores with added VAFS between the annual health assessments
The linear mixed model analysis showed considerable intra-individual clustering for the FSS-9 and FSS-3 with added VAFS, and for the separate items (Table 4). However, the intraclass correlation coefficients ranged from 0.18 to 0.52, indicating that some variables varied more over time than others. The random slope and the covariance between the intercept and the slope were statistically non-significant for all models, indicating an equal linear change from baseline at the individual level. The re-estimated random intercept, fixed slope models showed a small increase in items 1–4, whereas no similar change was found in the other items. A mean change was also found for the VAFS.

Discussion
The present study shows that the FSS-9 plus VAFS can be shortened to the FSS-3 while retaining most of the variance and validity. Questionnaires that are easily understood and have few items are essential to ensure high completion rates among patients with SUDs. Our findings showed that the full FSS-9 had excellent reliability in terms of internal consistency. The FSS-3 did not substantially reduce the internal consistency compared with the FSS-9, with or without the VAFS. The longitudinal confirmatory factor analysis of the FSS-3 showed a well-fitting unidimensional model with equal factor loadings and equal residuals between baseline and the second health assessment. The FSS-9 and FSS-3 were almost perfectly correlated, which was expected given the homogeneity of the FSS-9 scale. The factor analysis supported longitudinal stability in the indicators and confirmed longitudinal measurement invariance. Although the fatigue level varied, the mixed model analysis showed that the scoring structure remained stable over time from baseline.
The reliability analyses showed high internal consistency for the FSS-9 and FSS-3. The internal consistency decreased only from 0.94 to 0.87 when the number of items was reduced from ten (FSS-9 + VAFS) to three (FSS-3). Homogeneous and similarly high internal consistency was also found in studies evaluating the FSS-9 in other chronic diseases, such as stroke [3], HCV infection [1], and multiple sclerosis [15]. The same was found in studies that shortened the FSS from nine to seven items [13,14,27]. Unlike other populations, patients with SUDs may have a broad spectrum of mental and physical diseases that could interfere with their experience of fatigue and with how they respond to fatigue scales [28,29]. In the present study, most patients were polysubstance users, and 42% had injected substances during the past 30 days. Such circumstances may cause substantial changes in the medical and psychosocial factors affecting the health assessment. Examples include being under the influence of substances, living temporarily on the street, or lacking income. Given these medical and psychosocial conditions, nuances in the wording and phrasing of the 10 items (FSS-9 plus VAFS) could be missed or misunderstood. This makes the highly reliable FSS-3 clinically useful in future fatigue surveys.
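The Spearman-Brown prophecy formula gives a rough benchmark for the drop in internal consistency. It predicts the reliability of a shortened scale, assuming the items are parallel (equally reliable and interchangeable), which the FSS items are not strictly. Taking the FSS-9 α of 0.94 reported at baseline and shortening from nine to three items gives about 0.84. The observed FSS-3 α of 0.87 at baseline sits slightly above this value. That would be consistent with the retained items being somewhat more internally consistent than an average item, although the comparison is only indicative. The Python sketch below is a hypothetical illustration, not an analysis performed in the study.

```python
def spearman_brown(reliability: float, k_old: int, k_new: int) -> float:
    """Predicted reliability after changing test length from k_old to k_new parallel items."""
    n = k_new / k_old  # length ratio; below 1 when a scale is shortened
    return n * reliability / (1 + (n - 1) * reliability)


# FSS-9 alpha at baseline, shortened from nine to three items.
predicted_fss3 = spearman_brown(0.94, 9, 3)  # roughly 0.84
```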
The confirmatory factor analyses supported unidimensional models for the FSS-9 at baseline and at the second health assessment. The fit improved when the residual covariance between items 2 and 3 was added, which means that the single-factor model did not fully capture the shared variance of these two items. This might be explained by the similar phrasing and wording of items 2 and 3 and by their adjacent position in the questionnaire. Both features might affect patients' perception and interpretation and cause confusion. Moreover, the unidimensional FSS-9 factor model was in line with models reported in other studies that have validated the FSS-9 [2,20,27]. In those studies, however, small study populations have been a persistent issue, contributing to a potential risk of overlooking underlying multidimensional structures [2,27]. In a relatively large cohort of patients with SUDs, the present study showed that the unidimensional factor models of the FSS-9 and FSS-3 were maintained. This suggests that unidimensional factor models fit the FSS well across studies of different sizes.
The linear mixed model analysis showed that items 5–7, which make up the FSS-3, did not change between the annual health assessments, whereas items 1–4 of the FSS-9 showed a small increase. This result may further support the validity of the FSS-3, which appears to be less sensitive to fluctuations than the FSS-9 in the SUD population. The FSS-3 might therefore be preferred when evaluating changes in fatigue over time in the SUD population.
The present study showed that the FSS-9 could be shortened to the FSS-3, without the VAFS, among patients with SUDs. However, compared with the FSS-9, the FSS-3 might increase the risk of common method bias, because three items are more easily recalled and more accessible in short-term memory than nine items [30]. Previous studies have not evaluated the impact of common method bias. However, high reliability was achieved when the FSS-9 was shortened to seven items in various study populations [13,31]. Nevertheless, these studies detected cross-sample differences in items 3, 5, 6, and 9, two of which (items 5 and 6) are included in the FSS-3. This highlights the need for further validation and shortening studies on the FSS-3 when adapting it to other populations.

Strengths and limitations
This study has some strengths. We collected data from patients who are difficult to reach in both research and health care. Of these, 225 patients were followed up with two or three annual health assessments, making longitudinal analyses possible. The study also had some limitations:

- Patients answered the FSS-9 and VAFS questionnaires under differing mental and physical health conditions, which might limit the generalisability of the results.
- Most patients were recruited from opioid agonist therapy. The results are therefore most transferable to other opioid agonist therapy populations and potentially less so to other SUD populations.
- The time intervals between the baseline and the second health assessment, and between the second and the third health assessment, varied. This could have affected how patients scored the FSS-9 and VAFS.

Conclusion
The present study demonstrates that the FSS-9 plus the VAFS can be shortened to just the FSS-3 among patients with SUDs. We found that the FSS-3 was more consistent in the structure of changes in fatigue levels compared to the FSS-9 plus the VAFS. The FSS-3 might thus be useful for measuring fatigue in this population.

Declarations

Consent for publication
Participants have consented to publication.

Availability of data and material
No additional data are available due to data protection requirements.

Competing interests
Not applicable.

Table 1: Baseline characteristics of patients (numbers (n) and percentages (%)), N = 655.

Table 2: Descriptive information on the FSS-9 and VAFS at baseline. The FSS-9 was answered on a Likert scale from 1 (no fatigue) to 7 (worst fatigue). The VAFS was answered by placing a mark on a line from 0 (no fatigue) to 10 (worst fatigue). The items in bold represent the FSS-3 items.

Table 4: Linear mixed model analyses, including the fixed effects, the random effects, and the intraclass correlation coefficient (ICC) of the FSS-3, the FSS-9, the VAFS, and the single items of the FSS-9. The random effects show the residual, the slope, and the covariance between the intercept and slope.

Legend:
- Cov i,s and σ² i,s: covariance between intercept and slope
- ICC: intraclass correlation coefficient (between-person variance relative to total variance)
- I: intercept (baseline)
- S: slope (change)
- σ²: variance (residual, intercept, and slope)
- a FSS-9: Nine-item Fatigue Severity Scale
- b FSS-3: Three-item Fatigue Severity Scale
- sv: items included in the FSS-3
- *P < .05, **P < .01, ***P < .001