Validation and Shortening Questionnaires Used in Surveys and Trials on Populations with Substance use Disorders: A Method using the Fatigue Severity Scale


Background: Precise and valid measurement of health outcomes and exposures among people with substance use disorders (SUDs) is essential to improve health services and health within this group. Unfortunately, many standardized questionnaires are validated on other populations, and are often too comprehensive and insufficiently adapted to the group. This may limit the number of aspects that can be assessed and introduce bias due to research participation fatigue. New methods are needed to validate shortened and adapted questionnaires for this population. This study aims to present a method for shortening validated questionnaires while ensuring construct validity when they are applied to SUD populations.
Methods: We used data from a nested cohort with responses to the nine-item Fatigue Severity Scale (FSS-9) and the Visual Analog Fatigue Scale (VAFS) collected from 655 people with SUD in Bergen and Stavanger, Norway, in the period 2016-2020. A total of 225 people filled out FSS-9 and VAFS at least twice. We defined baseline as the first measurement of FSS and VAFS when the measurements were sorted chronologically per participant. A three-step method was used for validation and shortening: checking of internal consistency, longitudinal confirmatory factor analysis, and linear mixed model analysis.
Results: The internal consistency of FSS-9 was excellent, with a Cronbach's coefficient α of 0.94 at baseline and 0.92 at the second measurement. When FSS-9 was shortened to a three-item FSS (FSS-3) (items 5-7), the Cronbach's coefficient α was 0.87 at baseline and 0.85 at the second measurement. The internal consistency was not affected when VAFS was added to FSS-9 and FSS-3, respectively. The longitudinal confirmatory factor analysis model between baseline and second measurement showed a well-fitted model for FSS-3 (χ² = 13.33, degrees of freedom = 8, P = 0.101), with a correlation of r = 0.52, P < 0.001, between the measurements. The linear mixed model analysis showed equal linear changes at the individual level for FSS-3 (slope: 0.00, P > 0.05) and FSS-9 (slope: 0.01, P > 0.05) between the measurements.
Conclusion: The ten items could be shortened to a three-item version with excellent validity and reliability. This method could be useful for validating and shortening other questionnaires among patients with SUD and other populations.


Background
Substance use disorder (SUD) is a chronic relapsing disorder affecting two percent of the population worldwide [1]. Extensive medical and psychosocial health challenges, including temporary living situations, injecting drug use, unemployment, and comorbid psychiatric and physical diseases, typically cause reduced life expectancy, great suffering, and a high burden of disease in this population [1,2]. These challenges may give rise to subjective health complaints that impair quality of life [1][2][3][4]. Therefore, high-quality research on these marginalized people is essential to obtain more precise evidence on SUD epidemiology and treatment effects. Unfortunately, the vast majority of standardized questionnaires are validated on other populations. Additionally, the questionnaires are often too comprehensive, particularly when combined with other questionnaires assessing other health facets, causing discontinuations and biased responses [5]. This may reduce the quality of studies conducted on populations with SUD. To obtain validated and shorter questionnaires for this population, standardized and systematic methods are needed to ensure well-fitted questionnaires and facilitate reliable results from people with SUD.
Fatigue is a common subjective health complaint involving a persistent and overwhelming feeling of exhaustion and loss of energy, and it is frequently reported in people with chronic diseases [6][7][8][9][10]. Different questionnaires with high validity and reliability have been developed to measure fatigue. The nine-item Fatigue Severity Scale (FSS-9) and the visual analog fatigue scale (VAFS) are two of the most commonly used questionnaires [11,12]. The FSS items are graded on a Likert scale from one to seven points to quantify the level of fatigue [12]. Initially, the FSS was developed to evaluate fatigue in people with multiple sclerosis and systemic lupus erythematosus [11]. Later, the FSS-9 was shown to have excellent reliability and validity also for people with other chronic diseases such as hepatitis C virus infection [13], Parkinson's disease [14], stroke [15], and myasthenia gravis [16]. For these chronic diseases, the scores were homogeneous across items when the FSS-9 was validated, and the level of fatigue did not change substantially from one item to another through the questionnaire. Furthermore, VAFS is a 10 cm horizontal line ranging from zero to 10 that estimates the level of fatigue during the last week, and it has frequently been found to correlate with the level of fatigue reported by FSS [15,17,18].
The FSS-9 has recently been shortened in validation studies because of its homogeneous internal consistency across different study populations [19,20]. For instance, a seven-item FSS was validated among people with human immunodeficiency virus (HIV) and stroke, showing better psychometric properties than FSS-9 [19,21]. Because high completion rates, high construct validity, and good collaboration are needed among people with SUD, the FSS is a suitable candidate for validation and shortening in this population.
Thus, this cohort study aims to present a method for shortening validated questionnaires and assessing their longitudinal construct validity in populations that need shortened versions of measurement scales. In this case, a systematic three-step method including internal consistency, confirmatory factor analysis, and linear mixed model analysis for shortening and validating the nine-item Fatigue Severity Scale (FSS-9) and the Visual Analog Fatigue Scale (VAFS) is presented using data from people with SUD.

Data source
We drew data from a nested cohort from the INTRO-HCV trial collecting data on people with SUDs in Bergen and Stavanger, Norway [22]. We initiated data collection in May 2017 and recruited people receiving opioid agonist treatment in Bergen and Stavanger or people with SUDs receiving healthcare from the primary health clinics in the municipality of Bergen. This study included all people in the cohort who had filled out VAFS or FSS in the study period from May 2017 to January 2020.

Data collections
All included people were assessed yearly with a questionnaire that covered fatigue symptoms and the sociodemographic situation. We collected all data in the health register using electronic data collection software (Checkware®) under the supervision of research nurses.

Study population
We included 917 measurements of FSS-9 or VAFS from 655 people in the study period. We defined a measurement as an annual health assessment in which at least one of the items in the questionnaires was filled out. Of the measurements, 916 had both FSS-9 and VAFS completely filled out, while one participant completed only VAFS. Of the 655 people included, 188 filled out FSS-9 and VAFS twice and 37 filled them out three times. For the 37 people with three measurements, we included whichever of the second or third measurement was closest to the date one year after the baseline date, to create a substantially similar time interval between measurements for each participant when calculating internal consistency and performing confirmatory factor analysis (mean: 12 months (standard deviation: 4 months), see Additional file 1).
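The selection rule for participants with three measurements can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pick_follow_up`, the use of 365 days as "one year", and the example dates are hypothetical and not taken from the study data.

```python
from datetime import date, timedelta


def pick_follow_up(baseline, later_dates, target_days=365):
    """Return the later measurement date closest to baseline + target_days.

    Assumption: "one year" is approximated as 365 days. If two dates are
    equally close, the one listed first is returned.
    """
    target = baseline + timedelta(days=target_days)
    return min(later_dates, key=lambda d: abs((d - target).days))


# Hypothetical participant with a baseline and two later measurements.
baseline = date(2017, 5, 1)
later = [date(2018, 3, 1), date(2019, 4, 15)]
print(pick_follow_up(baseline, later))
```

With these made-up dates, the first follow-up is about two months before the one-year target and the second almost a year after it, so the first one is selected.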

Measuring fatigue
We used the nine-item FSS and VAFS to measure the level of fatigue in the annual health assessment. The FSS measured the extent of fatigue during the last week, including mental and physical functioning, motivation, carrying out duties, and interference with work, family, or social life. VAFS measured people's general experience of fatigue. VAFS is correlated with FSS-9 in various chronic diseases [15], including chronic hepatitis C virus infection [18]. VAFS is simpler to fill out than FSS-9 and may be used as a supplement to FSS-9 if it shows high reliability and validity. A high score on FSS items or VAFS indicated a high level of fatigue. The data collection software only allowed valid responses to each question and prompted for unanswered questions before submission of the questionnaire to minimize missing data. People who filled out only one of FSS or VAFS were also included. The FSS has previously been translated and back-translated from the US-English version into Norwegian by a qualified native Norwegian-speaking translator (Additional Table 2) [23].

Statistical analysis
We used Stata 16.0 for descriptive analysis (mean, standard deviation, and frequency analysis), and IBM SPSS version 24.0 and Mplus version 8.4 for reliability analysis (Cronbach's α and α if item deleted), confirmatory factor analysis, and linear mixed model analysis (Mplus: TwoLevel analysis). The threshold for statistical significance was set to P < 0.05 for all analyses unless stated otherwise. The shortening and validation analyses of FSS and VAFS were conducted in three steps.

Step 1: Internal consistency of FSS-9 and a shortened FSS-9 version with added VAFS
We calculated internal consistency for all items in the FSS-9 and VAFS. A shortened version of FSS-9 was constructed by removing items from FSS-9 based on clinical judgment and internal consistency. The purpose of including clinical judgment was to emphasize simplicity and intelligibility and to remove abstract or hard-to-read items that could be difficult to comprehend for people with SUD. Internal consistency was evaluated with Cronbach's α, recalculated after removing one item at a time from the scale. Internal consistency was considered very good if Cronbach's α exceeded the accepted cut-off of 0.80 [24].
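To make Step 1 concrete, the following minimal Python sketch computes Cronbach's α and "α if item deleted" from complete item responses using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the total score). It is not the SPSS procedure used in the study; the function names and the small response matrix are hypothetical and chosen only for illustration.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete data; rows[i][j] is person i, item j."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(rows):
    """Alpha recomputed with each item left out in turn."""
    k = len(rows[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(r) if j != drop] for r in rows])
        for drop in range(k)
    ]


# Made-up responses on a 1-7 scale: items 1 and 2 agree, item 3 is noisy.
responses = [[1, 1, 7], [2, 2, 1], [3, 3, 4], [4, 4, 2], [5, 5, 6]]
print(cronbach_alpha(responses))
print(alpha_if_item_deleted(responses))
```

In this toy example, dropping the noisy third item raises α above the 0.80 cut-off, which is the kind of pattern that "α if item deleted" is meant to reveal. In the study itself, item removal was also guided by clinical judgment.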

Step 2: Longitudinal confirmatory factor analysis for evaluating the fit of FSS-9 and the shortened FSS-9 version with added VAFS
We used confirmatory factor analysis to test the structure of the items in FSS-9 and to evaluate the relationships between the items and their underlying latent factors. In addition, the relationships between the FSS-9 factors and VAFS were evaluated regarding construct overlap. The confirmatory factor analysis models were based on FSS-9 and the shortened version of FSS-9 alone at baseline and at the second measurement. In addition, the FSS factor models were combined with the covariance with VAFS to explore how VAFS affected the model. Further, we fitted four longitudinal confirmatory factor analysis models for the shortened version of FSS-9 using longitudinal data at baseline and the second measurement. First, we estimated a free longitudinal confirmatory factor analysis with all unique parameter values. Second, we tested for constraints in the model by setting the factor loadings within each item equal at baseline and second measurement. Third, we constructed a model by setting the residuals at baseline and second measurement equal within each indicator. The last model constrained the intercept values in the same indicators at baseline and second measurement. We used the Wald test to compare model restrictions. All confirmatory factor analysis models were evaluated with standard fit measures: χ², degrees of freedom, p-values, Comparative Fit Index, Tucker Lewis Index, Root Mean Square Error of Approximation with 90% confidence interval, and the probability of close fit. A well-fitted model should have a statistically non-significant χ², values of the Comparative Fit Index and Tucker Lewis Index above 0.95, and a Root Mean Square Error of Approximation preferably below 0.05 (close fit) [25]. A Root Mean Square Error of Approximation above 0.10 is considered to indicate a poorly fitted model. We used the modification index to explore model improvements if the goodness-of-fit measures indicated a poorly fitted model. The modification index is the chi-square difference obtained if a constrained parameter is freely estimated; the model is thus improved at the cost of one degree of freedom. We analyzed all variables as continuous variables due to the relatively high number of categories (FSS items ranged from one to seven, VAFS ranged from zero to 10) in the ordinal variables.

[...] [26]. If the covariance between the intercept and the slope, and the slope variance, were found to be statistically non-significant, these were constrained, and the model was re-estimated as a random intercept fixed slope model. We used the Mplus Maximum Likelihood Robust estimator, correcting standard errors for potential deviation from normality [27]. In addition, intraclass correlations were estimated. We used full information maximum likelihood in order to use all available measurements. Full information maximum likelihood assumes missing at random [28].
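As a rough illustration of two of the fit measures listed for Step 2, the sketch below computes the χ² p-value and a textbook point estimate of the Root Mean Square Error of Approximation in plain Python. It is not the Mplus implementation: the function names are hypothetical, the RMSEA uses the common formula sqrt(max(χ² − df, 0) / (df · (N − 1))), and the scaling corrections of the robust estimator are ignored. The values passed to `rmsea` are made-up numbers, not results from this study.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees
    of freedom, via the series expansion of the regularized lower incomplete
    gamma function (adequate for the moderate values typical of fit tests)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    log_prefactor = a * math.log(z) - z - math.lgamma(a + 1.0)
    return 1.0 - math.exp(log_prefactor) * total


def rmsea(chi2, df, n):
    """Textbook RMSEA point estimate (assumption: N - 1 in the denominator)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Chi-square and degrees of freedom reported for the longitudinal FSS-3 model.
print(round(chi2_sf(13.33, 8), 3))  # 0.101, matching the reported P value

# Hypothetical model: chi-square 20 on 10 degrees of freedom, 101 respondents.
print(rmsea(20.0, 10, 101))
```

A non-significant χ² (P above 0.05) and a small RMSEA point in the same direction, but the criteria in the text also rely on the Comparative Fit Index and Tucker Lewis Index, which require the baseline model and are not covered by this sketch.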


Participant characteristics at baseline
Seventy percent were male, and the mean age was 43 years (Table 1). Half had primary school as their highest level of education. Eighty-two percent received opioid agonist treatment, while 42 percent had injected drugs in the past 30 days. The mean values of FSS-9 items varied from 4.43 to 5.38 at baseline (Table 2). For VAFS, the mean value was 5.19 at baseline. Variables were slightly left-skewed (skewness ranged from −1.14 to −0.31) and tended towards a flattened distribution (kurtosis ranged from −1.24 to −0.09).

[...] loadings and equal residuals between the baseline and second measurement (Fig. 1). The correlation between the models at baseline and the second measurement was r = 0.52, P < 0.001. A small reduction in model fit was seen if the intercepts were constrained to be equal within each measured item between measurements, but this model was also well fitted.

The table displays the confirmatory factor analysis results, presenting explained variance and model fit for different fatigue score versions: 9-item (FSS-9), 10-item (FSS-9 + VAFS), 3-item (FSS-3), and 4-item (FSS-3 + VAFS). Each version is presented at baseline and at a second measurement with a median interval of 11 months. In addition, the longitudinal confirmatory factor analysis of FSS-3, including two measuring points (Time 1-2: baseline and second measurement), is presented.

Step 3: Linear mixed model analysis for evaluating changes in FSS-9 and FSS-3 with added VAFS
The linear mixed model analysis showed considerable within-individual clustering for the FSS-9 and FSS-3 with added VAFS, and for separate items (Table 4). However, the intraclass correlation coefficient varied from 0.18 to 0.52, showing more change from baseline over the second and third measurements in some variables than in others. The random slope and the covariance between intercept and slope were statistically non-significant for all models. This indicated an equal linear change between the measurement occasions at the individual level. The re-estimated linear random intercept and fixed slope models showed a small increase in items one to four, while no similar change was found in the other items. A mean change was also found in the VAFS variable, but this variable did not show individual differences in change either.

Table 4 Linear mixed model analysis results of the level and linear change in the Fatigue Severity Scale (FSS-9, FSS-3, and single items) and the Visual Analog Fatigue Scale (VAFS). The table displays a linear mixed model analysis of FSS-3, FSS-9, VAFS, and single items, including fixed and random effects and ICC among people with substance use disorder. The random effects present the slope and the covariance between intercept and slope (level and change).
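The intraclass correlation coefficient describes how much of the total variance lies between individuals rather than within them. As a simplified illustration, the Python sketch below uses the one-way ANOVA estimator for a balanced design, ICC = (MSB − MSW) / (MSB + (k − 1) · MSW), where MSB and MSW are the between- and within-person mean squares and k is the number of measurements per person. This is a stand-in for the maximum likelihood two-level estimates from Mplus, not a reproduction of them; the function name and the scores are hypothetical.

```python
def icc_oneway(groups):
    """One-way random-effects ICC for balanced data.

    groups: one list of repeated scores per person, all of equal length k >= 2.
    """
    n, k = len(groups), len(groups[0])
    if any(len(g) != k for g in groups):
        raise ValueError("balanced data required: same number of scores per person")
    grand = sum(sum(g) for g in groups) / (n * k)
    means = [sum(g) / k for g in groups]
    # Between-person and within-person mean squares.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Made-up scores for three people measured twice.
scores = [[4, 6], [2, 4], [6, 6]]
print(icc_oneway(scores))
```

A value near 1 means that repeated scores from the same person are very similar, while a lower value, such as the lower end of the reported range, means that scores change more between occasions.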

Discussion
This study has shown how a three-step model consisting of internal consistency, longitudinal confirmatory factor analysis, and mixed model analysis may be useful for shortening validated questionnaires while retaining most of the included variance and validity when longitudinal data are available. Questionnaires that are easy to comprehend and have few items are essential to ensure high completion rates among people with SUDs. Our findings showed that the full-scale FSS-9 had excellent internal reliability when tested empirically using internal consistency. However, a three-item FSS did not substantially reduce the internal consistency compared to the FSS-9 with or without VAFS. The results from the longitudinal confirmatory factor analysis of FSS-3 showed a well-fitted unidimensional model with equal factor loadings and equal residuals between baseline and second measurement. The FSS-9 and FSS-3 were almost perfectly correlated. The mixed model analysis showed that even though the level of fatigue varied, the scoring structure was substantially stable and equal between the measurements.
The reliability analyses showed high internal consistency of FSS-9 and FSS-3, and the internal consistency dropped minimally from 0.94 to 0.87 when reducing the number of items from ten (FSS-9 + VAFS) to three (FSS-3). A homogeneous and substantially equal internal consistency was also found in studies evaluating FSS-9 in other chronic diseases such as stroke [15], hepatitis C virus infection [13], and multiple sclerosis [29]. A shortened seven-item FSS has also been validated in patients with systemic lupus erythematosus, HIV, or stroke, with minimal change in reliability compared to the FSS-9 [19,21,30]. Unlike other populations, people with SUDs may have a broad spectrum of mental and physical diseases that could interfere with their experience of fatigue from one measurement to another [31,32]. The majority of people included in this study were marginalized and had more than one SUD, and more than 40% had injected drugs in the past 30 days. Consequently, people could be affected by centrally acting stimulants or sedatives taken immediately before the questionnaires were filled out, or some people could have had substantial changes in their social conditions, such as having to move from housing to living on the street or changes in their income, which may have affected how they responded to the fatigue items. Given these psychosocial and medical conditions, the FSS-3 was appropriate for obtaining credible results from items that were well comprehended by participants and that captured whether their level of functioning or self-perceived sense of fatigue was affected.
The confirmatory factor analyses demonstrated a unidimensional model at baseline and the second measurement for FSS-9 that was improved by adding residual covariance between items two and three. This indicated that the responses to these two items were not adequately captured by the one-factor model. Possible explanations are that the Norwegian version had substantially similar wording of items two and three (Supplementary table 2), and that the order of items in the questionnaire affected people's perception of them. Unidimensional factor models are frequently reported when the FSS is validated in other populations, including the general population [16,23,30]. However, previous studies have noted that multidimensional models in confirmatory factor analysis could be missed due to small study populations [16,30]. Our findings in a large population of people with SUDs confirmed that one single factor for analyzing FSS-3 matched the observations adequately. Nevertheless, residual covariance between items two and three was important for maintaining a well-fitted unidimensional model in FSS-9.
The linear mixed model analysis showed that four of the items in FSS-9 and the VAFS changed differently between measurements compared with the other items. This further supports the better validity of the shortened version, which is presumably less sensitive to fluctuations than the remaining items in FSS-9.
The nine-item FSS may be shortened to FSS-3 without VAFS when evaluating fatigue among people with SUDs. However, FSS-3 may increase the risk of common method bias compared to FSS-9 if the responses to items in a shorter questionnaire are more likely to be accessible in short-term memory and recalled when responding to other items [33]. Previous studies have validated a shortened seven-item FSS across people with multiple sclerosis, stroke, and human immunodeficiency virus (HIV)/acquired immune deficiency syndrome (AIDS), showing high reliability but also cross-sample differences in items three, five, six, and nine [19,20]. This suggests that the items may have varying importance in different populations. Therefore, the construct validity of the FSS-3 may be limited to people with SUDs, pending validation studies that confirm similar results in other populations.

Strengths and limitations
This study has several strengths. People who were included from the cohort are hard to reach for both research and health care.
The majority of participants had an extensive history of injecting drug use with several drug addictions, and a substantial proportion had ongoing injecting drug use. We entered all collected information directly into the software to reduce the risk of data entry errors and to ensure nearly complete data at each measuring point. The data within each interview were also mostly complete, with few missing items.
The study has some limitations. People recruited to this study filled out the FSS and VAFS questionnaires in different mental and physical health conditions. Changes in substance use and in medical and psychosocial challenges may cause significant variations within and between people. Further, many participants were recruited from people with opioid dependence who received opioid agonist treatment, while others, to a more substantial degree, used street drugs. In addition, there was some variation in time between the baseline and the second and third measurements. To adjust for this, for those who completed FSS and VAFS three times, only the one of the two last measurements that was closest to the date one year after the baseline date was included in the reliability analysis and confirmatory factor analyses. Further, common method bias could affect the shortening from nine to three items in FSS. Responses in a shorter questionnaire are more likely to be accessible in short-term memory and recalled when responding to other items [33].

Conclusion
This three-step procedure, including internal consistency, longitudinal confirmatory factor analysis, and linear mixed model analysis, might be useful for shortening validated questionnaires while retaining the included variance and validity in settings with repeated measurements. We found that a three-item questionnaire for fatigue (FSS-3) was more consistent in the structure of changes in fatigue levels compared to a ten-item version (FSS-9 + VAFS) among people with SUDs. This method can be essential in the validation and shortening of homogeneous questionnaires for use among people with SUD and other populations.

Declarations
Ethics approval and consent to participate
The study was reviewed and approved by the Regional Ethical Committee for Health Research (REC) West, Norway (reference number: 2017/51/REK Vest, dated 29.03.2017/20.04.2017). Each patient provided written informed consent prior to enrolling in the study.

Consent for publication
Participants have consented to publication.

Availability of data and material
No additional data are available due to data protection requirements.
Competing interests