Dimensionality of the Pittsburgh Sleep Quality Index: a systematic review

Background The dimensionality of the Pittsburgh Sleep Quality Index (PSQI) is much debated, and the PSQI has the greatest number of reported factor structures among sleep diagnostic tools. Therefore, this review appraised the methodologies of studies investigating the factor structure of the PSQI. Material and methods MEDLINE, PsycInfo, AJOL, BASE, Cochrane Library, Directory of Open Access Journals (Lund University), CINAHL, and Embase were searched systematically to include articles published up to 23 March 2018. Articles with the objective of factor analysis of the PSQI (20 articles) or with a major section on the same subject (25 articles) were included. There were no restrictions on participant characteristics. A descriptive analysis of the articles was performed for measures of the suitability of the data for factor analysis, details of the exploratory factor analysis (EFA), and details of the confirmatory factor analysis (CFA). Results The analyses used by the majority of studies did not follow the parsimony principle, i.e., the simplest scheme for interpreting the observed data. Other shortcomings included under- or non-reporting of sample adequacy measures (11 out of 45 articles), non-use of EFA (20 out of 45 articles), use of EFA without relevant details, non-use of CFA (11 out of 45 articles), and use of CFA without relevant details. Overall, 31 out of 45 articles used only one of EFA or CFA rather than both. Conclusion We conclude that the various PSQI factor structures for standard sleep assessment in research and clinical settings may need further validation. Trial registration Not applicable because this was a review of existing literature.


Background
Population-based epidemiological studies have confirmed that sleep disorders occur frequently in almost every country [1-3]. Complaints of disturbed or poor quality sleep are also exceedingly common among patients presenting to all specialties of medicine [4-6]. The most common sleep disorders are insomnia, circadian rhythm sleep disorders, obstructive sleep apnea, sleep-disordered breathing, hypersomnia, daytime sleepiness, parasomnias, and restless legs syndrome [4-7]. Untreated sleep disorders may lead to potentially life-threatening symptoms. It is now recognized that far from being only a consequence of medical illnesses, sleep disorders are often primary drivers of other illnesses. Sleep disturbance is linked to neurocognitive dysfunctions, including attention deficits, impaired cognitive performance, depression, anxiety, stress, and poor impulse control. These disturbances are in turn linked to sympathetic activity changes and an increased risk of cardiovascular and cerebrovascular diseases [4,5,8]. These impairments have wider consequences in patients' lives. Poor sleep severely impairs daytime performance, both socially and at work, and increases the risk of occupational and automobile accidents, poor quality of life, and poor overall health [4,5,9-11].

Role of subjective measurement
The ever-increasing list of problems known to be caused by sleep dysfunction has led to recognition that poor sleep has a complex relationship with overall health. It is now appreciated that disturbed sleep interacts bi-directionally with numerous neurological, physiological, psychological, and behavioral factors [4,12-14]. The central role of sleep in overall health has thus underscored the need for both reliable, validated subjective tools and objective polysomnographic (PSG) assessment in modern medical practice. While these represent very different diagnostic approaches, they are nevertheless complementary inasmuch as subjective tools account for psychological and behavioral manifestations not assessed by PSG. Self-rating questionnaires such as the Pittsburgh Sleep Quality Index (PSQI) have an important role in sleep health assessment in both clinical and research settings [4,15,16]. These questionnaires have the advantages of cost effectiveness, high patient compliance, and ease of administration. Perhaps more importantly, since such questionnaires are self-explanatory and do not require supervision, they reduce demand on medical specialists' time [5]. Given the important diagnostic role of rating scale questionnaires, it is essential that their reliability and validity be established beyond doubt. A key element of this quality assurance is psychometric confirmation of the questionnaires' dimensionality, i.e., whether the questionnaire's items are all correlated and representative of factors affecting sleep quality [4,15]. This review critically appraises the evidence for dimensionality of one of the most widely used self-rating instruments of sleep quality, the PSQI [4,15,17].

Pittsburgh Sleep Quality Index
The PSQI is the most widely used sleep health assessment tool in both clinical and non-clinical populations. As of 26-06-2015, the original 1989 article describing the Index had 1545, 7863, 4962, and 4554 citations on PMC, Google Scholar, ResearchGate, and Web of Science, respectively. It is also possibly the most widely translated sleep questionnaire. The PSQI consists of 24 questions or items (20 items are rated 0-3, while 4 items are open-ended), 19 of which are self-reported and 5 of which require secondary feedback from a room or bed partner. Only the self-reported items (15 rated 0-3 and 4 open-ended) are used for quantitative evaluation of sleep quality as perceived by the patient. The open-ended items are ultimately also scored as structured categorical values (0-3) according to the range of values the patient reports for them. These 19 self-reported items are used to generate categorical scores representing the PSQI's 7 components. Each component score assesses a specific feature of sleep. Finally, the component scores are summed to give a total score, also termed the global score (range: 0 to 21). This score provides an efficient summary of the respondent's sleep experience and quality for the past month [12].
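The final scoring step described above (7 component scores, each 0-3, summed to a global score of 0-21) can be sketched in a few lines of Python. This is a minimal illustration only: the function name `psqi_global_score` is hypothetical, and the derivation of each component score from the 19 self-reported items is not covered here.

```python
from typing import Sequence

N_COMPONENTS = 7       # the PSQI yields 7 component scores
MIN_SCORE, MAX_SCORE = 0, 3  # each component is scored 0-3


def psqi_global_score(components: Sequence[int]) -> int:
    """Sum the 7 component scores into the global score (range 0-21).

    Hypothetical helper for illustration; it assumes the component
    scores have already been derived from the questionnaire items.
    """
    if len(components) != N_COMPONENTS:
        raise ValueError(f"expected {N_COMPONENTS} component scores")
    for score in components:
        if not MIN_SCORE <= score <= MAX_SCORE:
            raise ValueError(f"component score {score} is outside 0-3")
    return sum(components)
```

The range check mirrors the structured categorical nature of the component scores, which matters later when choosing a correlation matrix for factor analysis.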

Validation and reliability measures of the Pittsburgh Sleep Quality Index
The PSQI is possibly the most rigorously validated tool used in sleep diagnostics [4,5,15-17]. Of the many psychometric studies carried out on the PSQI, 75% have reported an internal consistency in the ideal range for within- and between-group comparisons but not for comparisons made between questionnaires for individual patients [4]. Mollayeva et al. [4] performed a meta-analysis and found strong evidence for the PSQI's reliability and validity. Further, the meta-analysis revealed moderately positive evidence for the questionnaire's structural validity across a variety of samples. The PSQI was found to have known-group validity, and, while some studies showed methodological weaknesses in this regard, its convergent and divergent validity were generally confirmed.

Factor analysis
A tool's dimensionality is evaluated by factor analysis. Factor analysis attempts to discover patterns in a set of variables based on shared variance [18]. A key goal of this analysis is identifying the simplest and most parsimonious means for interpreting and representing observed data [19]. More specifically, the procedure seeks to use measured variables to infer the smallest number of unobserved or latent variables that can still account for the observable variables [20]. The mathematical operations are broadly categorized into 2 sub-groups: exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). EFA aims to find the smallest number of common latent factors that can account for the correlations [21]. CFA is then employed to test the relationship between the observed variables and their underlying latent factors [15]. Factor analysis is useful for studies involving many variables that can be reduced to a smaller set, such as questionnaire items or a battery of tests. The goal of this process is identifying the concepts underlying a phenomenon and thus facilitating interpretations.

Dimensionality of the PSQI
The PSQI's dimensionality is much debated, with many studies supporting multiple factors and some supporting unidimensionality [4,15-17]. Among sleep diagnostic tools, the PSQI has the greatest number of reported factor structures [15]. The intensity of discussion around the dimensionality of the PSQI is reflected in the publication of 45 articles on the subject since 2006 [15]. Because the PSQI components are structured categorical variables scored 0-3, factor analysis should ideally begin with a polychoric correlation matrix. However, most programs use a Pearson correlation matrix, which may be one of the reasons for the discrepancy among studies. Some evidence suggests that some studies may have over-factored the PSQI [15]. Several reviews have concluded that many previous efforts to investigate the PSQI's factor structure have suffered from nonparsimonious methodologies [4,15,17]. Given a choice between close fit and parsimony (i.e., a model with fewer latent factors), the latter may be preferred [47]. Manzar et al. [15] used an innovative strategy of performing comparative CFA of all the documented PSQI models on a discrete sample to disprove the questionnaire's oft-reported multidimensionality and heterogeneity. However, the study had the important limitation of being unable to address inter-software, inter-sample, and inter-model differences [15]. Mollayeva et al. [4] mentioned procedural discrepancies in the studies investigating the PSQI factor structures without providing further details. Approximately 30 distinct PSQI models have been proposed in the literature. Of these models, 7 were 1-factor, 17 were 2-factor, 4 were 3-factor, 1 was 4-factor and 2 were second order models [15-17, 22-46, 48-53]. The current state of the literature, with its broad range of suggested factor structure models, represents an impediment to an efficient consideration of the PSQI's use.
There evidently exists a need for a thorough appraisal of the procedural details and application of standard practices in the previous methodological studies of the PSQI. Such an investigation is indispensable for streamlining the debate about the PSQI's heterogeneity.
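The concern raised above about using a Pearson correlation matrix on component scores rated 0-3 can be illustrated with a small simulation. The sketch below is not taken from any of the reviewed studies: the latent correlation, sample size, and cut-points are all assumed values chosen for illustration. It draws two correlated continuous latent variables, cuts them into four ordered categories, and compares the Pearson correlation before and after categorization. It does not compute a polychoric correlation; it only shows the attenuation that a polychoric estimate is meant to address.

```python
import numpy as np


def simulate_attenuation(rho=0.6, n=20000, thresholds=(0.0, 1.0, 1.8), seed=0):
    """Compare Pearson r on latent scores with Pearson r on 0-3 categories.

    rho, n, thresholds and seed are illustrative assumptions, not values
    reported in the reviewed literature.
    """
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    # Continuous latent variables with true correlation rho
    latent = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    # Cut each latent variable into ordered categories 0, 1, 2, 3
    ordinal = np.digitize(latent, thresholds)
    r_latent = np.corrcoef(latent[:, 0], latent[:, 1])[0, 1]
    r_ordinal = np.corrcoef(ordinal[:, 0], ordinal[:, 1])[0, 1]
    return r_latent, r_ordinal


if __name__ == "__main__":
    r_latent, r_ordinal = simulate_attenuation()
    print(f"Pearson r on latent scores:  {r_latent:.3f}")
    print(f"Pearson r on 0-3 categories: {r_ordinal:.3f}")
```

Under these assumptions the correlation computed on the categorized scores is smaller than the latent correlation, and such attenuation may feed into the extracted factor structure.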

Practical implication of the heterogeneity of the Pittsburgh Sleep Quality Index
One consequence of the PSQI's presumed heterogeneity is the possible attenuation of its practical application in clinical diagnostics [15]. A questionnaire's dimensionality directly affects the reporting of its intended measures. Currently, however, very few efforts have been undertaken to validate the PSQI's disparate models in either research or clinical settings. This is possibly related to the choice of the appropriate PSQI model for a particular sample. Previous attempts by Hancock and Larner [54] and Yurcheshen et al. [55] to test the disparate PSQI models did not adequately address the reason(s) for the specific model's selection. In fact, both studies used a 3-factor PSQI model initially reported to be valid in a different population [22,54-56]. Such reports using unrelated PSQI models will complicate inter-study comparisons for the PSQI-based measures. The goal of the present systematic review is to help develop strategies for managing the methodological discrepancies in the PSQI factor analysis and reporting of the PSQI-based sleep assessment. An additional goal is to provide possible guidelines for factor analyses of questionnaires in general and sleep inventories in particular.

Literature search scheme
All articles available online on 23-03-2018 were included. The comprehensive search strategy was planned in consultation with epidemiological experts, information technologists, and sleep scientists. We searched 8 electronic databases: CINAHL, Bielefeld University (BASE: Bielefeld Academic Search Engine), Cochrane Library, Directory of Open Access Journals (Lund University), Embase, MEDLINE, PsycInfo, and African Journals On Line (AJOL).
To minimize inclusion of irrelevant articles, we searched for a combination of 2 keywords (Pittsburgh Sleep Quality Index/PSQI with dimensionality/dimension/factor model/factor analysis/factor structure/domain/Exploratory factor analysis/EFA/Confirmatory factor analysis/CFA). Seventy-eight articles were initially identified (Fig. 1). Thirty-four articles were removed: 30 duplicates, 3 in which factor analysis details were missing, and 1 for which the full-length article was unavailable.

Selection criteria
Forty-five full-length peer-reviewed articles were used. Forty-three articles were in English, and 1 each was in Spanish and Chinese. We e-mailed the lead author of the Spanish article an English translation of the section covering factor analysis. It was included after gaining the author's approval. The lead author of the Chinese article provided a translation of the factor analysis section; therefore, it was also included. The articles' reference lists were thoroughly reviewed for other relevant publications. There were no restrictions on the type or age range of the population covered. We only included articles that had a primary objective of exploring and/or confirming dimensionality (20 studies) and articles that reported multiple indices of psychometric properties with a substantial section devoted to factor analysis (25 studies) (Fig. 1).

Data extraction
The measures used to present the factor analysis findings were grouped into three broad categories: measures of the suitability of the data for factor analysis (Table 1), summary of the exploratory factor analysis conditions (Table 2), and summary of the confirmatory factor analysis conditions (Table 3). A descriptive analysis of the articles was performed for the measures in these three categories. Meta-analysis was not conducted as the included studies were heterogeneous in the methods and statistical analyses used.

Discussion
Sample description, sample size, and measures of the suitability of the data for factor analysis
The gradual development of a heterogeneous multiple factor structure of the PSQI has often been defended by the complexities of sleep problems among diverse samples. However, there is no consensus that the complexities of sleep problems in diverse samples must result in a multiple factor structure [15]. Moreover, this speculative presumption fails to explain why the measured variables, i.e., the individual items of the PSQI and the PSQI component scores, cannot account for this complexity.
The appropriate sample size for factor analysis is a frequently debated topic among statisticians, and the guidelines are disparate [72-74]. There are also different opinions on such issues as the sample to variable ratio (N:p ratio) criteria [72,75], the factorability of the correlation matrix [76,77], use of the Kaiser-Meyer-Olkin (KMO) measure and Bartlett's Test of Sphericity [76,78], and use of the determinant of the matrix and the diagonal elements of the anti-image correlation matrix [72]. Data suitable for factor analysis and replicable factor extraction may require large samples and the satisfaction of a number of conditions as determined by such measures as the KMO, Bartlett's test, the determinant of the matrix, the anti-image correlation matrix, and inter-component correlations [79]. A non-zero determinant of the matrix indicates the absence of multi-collinearity, meaning that linear combinations of items can form factors [80,81].
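Several of the suitability measures named above can be computed directly from a correlation matrix. The following is a minimal sketch, assuming NumPy is available and using the standard textbook formulas: the N:p ratio, the determinant, the Bartlett chi-square statistic $-(n - 1 - (2p + 5)/6)\ln|R|$ with $p(p-1)/2$ degrees of freedom, and the overall KMO from squared correlations and squared partial (anti-image) correlations. The function name `suitability_measures` is hypothetical. A Pearson matrix is used here only for brevity; for 0-3 component scores a polychoric matrix may be more appropriate.

```python
import numpy as np


def suitability_measures(data: np.ndarray) -> dict:
    """Factorability checks for an (n_respondents, n_variables) array.

    Illustrative sketch only; assumes a non-singular Pearson
    correlation matrix (np.linalg.inv raises an error otherwise).
    """
    n, p = data.shape
    R = np.corrcoef(data, rowvar=False)

    # Determinant of the correlation matrix (via log-determinant for stability)
    sign, logdet = np.linalg.slogdet(R)
    determinant = sign * np.exp(logdet)

    # Bartlett's test of sphericity: chi-square statistic and degrees of freedom
    bartlett_chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    bartlett_df = p * (p - 1) // 2

    # Partial correlations from the inverse correlation matrix
    R_inv = np.linalg.inv(R)
    scale = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(scale, scale)

    # Overall KMO: off-diagonal squared correlations vs squared partials
    off_diag = ~np.eye(p, dtype=bool)
    r2 = np.sum(R[off_diag] ** 2)
    a2 = np.sum(partial[off_diag] ** 2)
    kmo = r2 / (r2 + a2)

    return {
        "n_to_p_ratio": n / p,
        "determinant": determinant,
        "bartlett_chi2": bartlett_chi2,
        "bartlett_df": bartlett_df,
        "kmo": kmo,
    }
```

Reporting all of these values together, rather than one in isolation, would give readers a fuller picture of whether a given sample supports factor extraction.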

Exploratory factor analysis
The non-reporting of EFA results is fundamentally contrary to recommended norms for factor analysis, a deficiency that is particularly important considering the debate about the number and patterns of common factors for the PSQI [4,15,17,82]. Although the choice of extraction method for performing EFA is much debated, some prefer the use of principal axes for initial solutions [72]. The choice of the extraction method (principal axis or principal factors) may depend on the underlying data and the assumptions [60]. Many studies failed to report the final extraction method used in the EFA (Table 2). Four studies reported using maximum likelihood estimation (MLE) for the final extraction [17,22,24,41], but 3 of these did not report the normality and/or skewness of the distribution of the data being analyzed [22,24,41]. The applicability of the extracted factors therefore seems unclear, because MLE assumes multivariate normality [83]. Two studies reported using the principal factors method and principal component factor analysis; the authors might have meant the principal axis method and principal component analysis (PCA), respectively [43,46]. Most of the studies explained neither the type of extraction used nor the choice of rotation.
Factor rotation increases interpretability by optimizing a simple structure with a distinct cluster of interrelated variables loading on the least number of latent variables [80]. Oblique rotations are better suited to accounting for the inter-relationships in the clinical data. They can be used even when the factors are not significantly correlated [81]. However, the use of rotation methods in the PSQI factor analysis studies is inconsistent. Of the studies reporting rotation methods, similar numbers used orthogonal and oblique rotations (Table 2). Some of the studies using orthogonal rotation did report the correlation value of the extracted factors [28,30,41,43,44]. The reported factor correlations were in the range of 0.1-0.9 [39,41,44]. Therefore, the factor correlation values of the various PSQI models do not seem to support the choice of orthogonal rotation methods.
There are many criteria for determining the number of factors to be retained from EFA. These include Cattell's Scree test, the Kaiser criterion of eigenvalues greater than one, the percentage of cumulative variance explained, and robust measures such as Horn's Parallel analysis, the Broken-Stick (B-S) criterion, and the minimum average partial (MAP) test [72,84]. All of these tests have limitations, particularly the first three. Therefore, the consensus opinion is to employ multiple criteria [72,84]. It is perhaps concerning that only approximately one-third of the PSQI factor analysis studies used multiple criteria, and none used multiple robust measures (Table 2) [84]. The B-S criterion and MAP test were not used by any of the studies exploring the PSQI's factor structure. The communality of a variable is the proportion of its variance accounted for by the common factors. Because factor analysis aims to explain variance through common factors, variables with communalities less than 0.2 are removed [80]. However, communality criteria were frequently under-reported in the studies investigating the PSQI's factor structure (Table 2). These inconsistencies and discrepancies might explain the variation in the number of factors retained after EFA (Table 2) [4,15,17].
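Two of the retention criteria discussed above can be compared on the same data with very little code. The sketch below assumes NumPy and implements the Kaiser criterion and a common simplified variant of Horn's parallel analysis: eigenvalues of the full Pearson correlation matrix are compared against the 95th percentile of eigenvalues from random normal data of the same shape. The function names, number of simulations, and percentile are illustrative assumptions, and other variants of parallel analysis exist.

```python
import numpy as np


def kaiser_count(R: np.ndarray) -> int:
    """Number of eigenvalues of the correlation matrix R greater than one."""
    eigenvalues = np.linalg.eigvalsh(R)
    return int(np.sum(eigenvalues > 1.0))


def parallel_analysis_count(data: np.ndarray, n_sims: int = 200,
                            percentile: float = 95.0, seed: int = 0) -> int:
    """Simplified Horn's parallel analysis on correlation-matrix eigenvalues.

    Retains leading factors whose observed eigenvalue exceeds the chosen
    percentile of eigenvalues obtained from random normal data.
    """
    n, p = data.shape
    rng = np.random.default_rng(seed)

    # Observed eigenvalues, sorted from largest to smallest
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

    # Eigenvalues of correlation matrices of uncorrelated random data
    simulated = np.empty((n_sims, p))
    for i in range(n_sims):
        noise = rng.standard_normal((n, p))
        eig = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))
        simulated[i] = np.sort(eig)[::-1]
    threshold = np.percentile(simulated, percentile, axis=0)

    # Count leading eigenvalues above threshold, stopping at the first failure
    exceeds = observed > threshold
    if exceeds.all():
        return p
    return int(np.argmin(exceeds))
```

Running both functions on the same component-score matrix and reporting any disagreement is one simple way to follow the recommendation to employ multiple criteria.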

Confirmatory factor analysis
For finding prospective models and validating the dimensionality of a questionnaire in discrete populations, it is recommended that factor analysis studies use both EFA and CFA [80]. More than 68% of the studies investigating the PSQI's factor structure (31 out of 45) employed only one of EFA or CFA. Some of the PSQI models are based only on EFA [25,27,28,31,39,43], while some are based only on CFA [15, 23, 26, 29, 32, 34-36, 38, 42], neither of which is the recommended practice for performing factor analysis [85].
Another issue is the influence of user software. The software packages used to perform CFA (LISREL, Mplus, SAS, STATA, Amos, and EQS) differ with regard to estimation; path diagrams; availability of standard errors for standardized estimates, factor covariance, and factor correlations; availability of modification indices; and ability to handle different types (i.e., continuous and categorical) of measured and latent variables [68]. However, the fact that studies investigating the PSQI's factor structure used different software for CFA should not affect the results, as there are only slight differences in the statistics reported by the various programs, and the solutions are comparable [15]. LISREL, Mplus, SAS and STATA can handle the PSQI component scores, which are ordered categorical variables, using diagonally weighted least squares estimation methods. Amos cannot accurately estimate models because it treats the PSQI component scores as measured variables. This is especially true if the PSQI component scores' distributions are characterized by skewness and kurtosis [17,68]. However, Amos allows model estimation using MLE with bootstrapping to smooth non-normality with standardized estimates of factor loading [86]. Non-reporting of distribution characteristics is a common problem with the PSQI factor analysis studies. Further, some studies using SPSS with Amos did not describe their extraction and bootstrapping methods [24,40]. More than a quarter of the studies (i.e., 13 out of 34 studies that used CFA) failed to report their extraction methods (Table 3). It is therefore difficult to reach a conclusion about the applicability of these studies' results. Modification indices should be used discreetly to avoid over-capitalization on sample-specific variations. It may be better to validate the models incorporating modification indices on unrelated samples [87]. Few studies reported using the modification index, and those that did failed to explain the choice of the type of modification index [23,29,32,36,42].
Inter-factor correlations of 0.85 and above indicate multicollinearity and poor discriminant validity [88]. The reported correlation coefficients between CFA model factors were as high as 0.89, 0.9, and 1.0 [33,41]. This is technically undesirable because correlation coefficients greater than 0.9 suggest that the 2 correlated factors might not be practically distinct. Instead, the items loading on them might load on a common factor [17]. Jomeen and Martin [26] did not report inter-factor correlations in their final model. Moreover, they failed to report the factor loadings (Table 3). It is therefore difficult to reach a conclusion about their model's parsimony.
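The discriminant validity check described above is easy to automate once inter-factor correlations are reported. The following minimal sketch flags factor pairs whose absolute correlation reaches the 0.85 cut-off cited in the text. The function name, the factor labels, and the example matrix in the usage note are hypothetical and do not come from any reviewed study.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def flag_high_factor_correlations(
    corr: Sequence[Sequence[float]],
    names: Sequence[str],
    threshold: float = 0.85,
) -> List[Tuple[str, str, float]]:
    """Return factor pairs whose absolute correlation is at or above threshold.

    corr is a symmetric inter-factor correlation matrix; names labels
    its rows and columns. Both are supplied by the caller.
    """
    flagged = []
    for i, j in combinations(range(len(names)), 2):
        r = corr[i][j]
        if abs(r) >= threshold:
            flagged.append((names[i], names[j], r))
    return flagged
```

For example, with hypothetical factors `F1`, `F2`, `F3` and correlations of 0.89 (F1-F2), 0.30 (F1-F3), and 0.50 (F2-F3), only the F1-F2 pair would be flagged, suggesting that those two factors might be merged in a more parsimonious model.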
Low loadings for some of the PSQI components' scores (i.e., medicine component and sleep quality component) in some studies might reflect a reduced sensitivity of the questionnaire items measuring them [23,36,38,40]. Tomfohr et al. [36] reported only the intercomponent correlations as a sample size adequacy measure, did not use EFA, and did not provide details regarding the modification index. Among all the studies, Dudysova et al. [66] had the smallest sample size at 105. They did not report their EFA findings, nor did they provide information regarding suitability of the data for factor analysis, such as the KMO test, Bartlett's test, determinant score, or anti-image matrix. Similarly, Skouteris et al. [23] did not report their findings regarding EFA or sample size adequacy measures. They also did not report the CFA extraction method. Lequerica et al. [40] did not report any sample size measures or the CFA extraction method. The study used Amos without reporting normality conditions or bootstrapping.
Methodological discrepancies between these studies might have affected their results and the reliability of their findings. The model fit indices were streamlined with regard to number, types, and limit values (Table 3). Almost all the reviewed studies used model fit indices from multiple classes, which is consistent with generally accepted guidelines for factor analysis [89]. Gelaye et al. reported using 4 model fit indices in their study but mentioned the cut-off criteria of only 3. A model fit was presented for a 2-factor solution, though the EFA supported a 3-factor model [17,44]. It is also concerning that in almost all the studies, the basic parsimony requirements for factor analysis were not upheld [15,17]. It is worth noting that the recommended practice for factor analysis gives preference to parsimonious models over multidimensional models if differences are irreconcilable [47]. Therefore, the non-application of parsimony, together with other procedural discrepancies, has made it difficult to endorse the applicability of the various PSQI factor structures, even in similar samples.

Practice points for future
Studies investigating the factor structure of a questionnaire should employ both EFA and CFA. Details of the sample's suitability for factor analysis should preferably be reported, as they provide supporting evidence about distribution, levels of multicollinearity, singularity, and shared variance among measured variables. EFA details, such as extraction methods, rotation, and factor retention, should be reported along with their justification. CFA details, such as extraction methods and modification indices, should preferably be reported along with their justification. It is preferable to employ multiple goodness of fit indices from different categories.

Limitations
This review has some limitations. We did not perform a meta-analysis because the discrepancies among the studies made it almost impractical. The studies' methodological qualities were not assessed, but such approaches have their own demerits [1]. We mostly reviewed articles published in the English language, with only 2 non-English articles included after their authors approved or provided a translation of the factor analysis sections [27]. Some authors did not respond to queries regarding details of the factor analysis in their study. The authors of the other included articles were not contacted. Model fit indices were not discussed in detail because the studies were methodologically sound in this regard. Interested readers are referred to Cheung and Rensvold [90].

Conclusion
The results of this review do not permit an optimistic conclusion regarding the applicability of factor analysis studies on this widely used questionnaire. The generalizations from the majority of studies are severely limited by issues including non-application of parsimony, non-use of EFA or non-reporting of relevant details, and non-use of CFA or non-reporting of relevant details. Generalizations from studies using small sample sizes may be difficult. Furthermore, under- or non-reporting of sample adequacy measures and non-reporting of relevant details make the diversity of factor structures difficult to interpret. In summary, the factor analysis may not be replicable across different methodologies. The structured categorical data of the PSQI may be sensitive to the specific model (method of extraction) being applied.
Therefore, the applicability of the various PSQI factor structures even in related samples seems doubtful.