A randomised comparison of a four- and a five-point scale version of the Norwegian Function Assessment Scale
Health and Quality of Life Outcomes, volume 6, Article number: 14 (2008)
There is variation in the number of response alternatives used within health-related questionnaires. This study compared a four- and a five-point scale version of the Norwegian Function Assessment Scale (NFAS) by evaluating data quality, internal consistency and validity.
All inhabitants in seven birth cohorts in the Ullensaker municipality of Norway were approached by means of a postal questionnaire. The NFAS was included as part of The Ullensaker Study 2004. The instrument comprises 39 items derived from the activities/participation component in the International Classification for Functioning, Disabilities and Health (ICF). The sample was computer-randomised to either the four-point or the five-point scale version.
Both versions of the NFAS had acceptable response rates and good data quality and internal consistency. The five-point scale version had better data quality in terms of missing data, end effects at the item and scale level, as well as higher levels of internal consistency. Construct validity was acceptable for both versions, demonstrated by correlations with instruments assessing similar aspects of health and comparisons with groups of individuals known to differ in their functioning according to existing evidence.
Data quality, internal consistency and discriminative validity suggest that the five-point scale version should be used in future applications.
The measurement of functional ability is important in many contexts. While there often seems to be agreement as to the content of instruments for evaluation of function, there is relatively less consensus about the scaling of items. Item scaling varies in the number of response categories, the wording of category options and the use of all-point (where all categories are defined) or end-point (where only end-points are defined) scales [1, 2]. The majority of health status and patient-reported outcome measures use all-point defined scales with between two and seven categories, the most popular being five-point scales including the agree/disagree Likert format. The generic Short Form 36-item (SF-36) Health Survey uses five-point scales for seven of the eight health scales it includes. Other generic instruments such as the Nottingham Health Profile (NHP) and EuroQol EQ-5D use two- and three-point scales respectively. In the WHO Health and Work Performance Questionnaire, functional status is reported using different scales with between four and 11 points.
It has been argued that seven is the maximum number of response categories that individuals are able to process, and some authors have advocated seven-point scales. However, such scales are not widely used, possibly because of the difficulty of finding suitable adjectives when every category of a seven-point scale must be defined. Seven categories are also harder to fit across a page of A4 with a reasonably sized typeface. However, if the number of alternatives is less than the rater's ability to discriminate, the result may be a loss of information [2, 9]. There is evidence that the reduction in reliability from ten to seven categories is quite small, but the use of five categories reduces the reliability by about 12 percent. Hence it is argued that the minimum number of categories should be in the region of five to seven. One review concluded that seven plus or minus two appears to be a reasonable range for the optimal number of response alternatives. More recently, it was found that respondents' preferences were highest for a ten-point scale followed by seven-point and nine-point scales. The respondents rated scales with five, seven and ten response categories as relatively easy to use. Scales with two, three or four response categories were rated as relatively quick to use, but were unfavourable in terms of the extent to which they allowed the respondents to express their feelings adequately. If a scale does not allow respondents to express themselves, they may become frustrated or demotivated and the quality of their responses may decrease.
Previous research has shown that the greater the number of response options, the more reliable the scale is likely to be. Simulations of categorization error have consistently shown that the correlation between true values and scale scores increases with the number of response options. Scales with relatively few response alternatives tend to generate scores with comparatively little variance, thereby limiting the magnitude of correlations with other scales [13, 14]. The reduction in reliability is most severe for scales with four categories or fewer, but tends to level off once seven or more options are available. However, there is often a trade-off between scale reliability and ease of administration. One study using the NHP indicated that the psychometric performance and patient acceptability were improved by using a five-point scale instead of the original shorter response format.
Following a recent systematic review, it was recommended that future research designs should allocate respondents to different versions of a questionnaire to compare approaches to item scaling. Our study considered two different all-point defined scales using four and five response alternatives. The Norwegian Function Assessment Scale (NFAS) was included in a large Norwegian population study on musculoskeletal pain, The Ullensaker Study 2004, to obtain self-reported levels of functional ability. Eligible persons were randomised to receive the NFAS with the original four-point scale or a five-point scale.
The aim of this study was to compare the original four-point with the new five-point scale version by evaluating validity of the NFAS in a population. This will determine which version should be used in future applications.
Study setting and sample
Ullensaker is a rural community which had 23,700 inhabitants in 2004. There are no major differences between the population of Ullensaker and the general population of Norway with respect to demographic characteristics . In 2004, postal questionnaires, which included the NFAS along with questions relating to musculoskeletal pain, were sent to all 6108 inhabitants in Ullensaker municipality in the birth cohorts 1918–20, 1928–30, 1938–40, 1948–50, 1958–60, 1968–70 and 1978–80. Reminders were sent at eight weeks.
The sample was computer-randomised by an external company to either the four-point or the five-point scale version, herein referred to as the NFAS-4 and the NFAS-5. The Ullensaker Study questionnaire also included the Dartmouth COOP Functional Health Assessment Charts/WONCA (COOP/WONCA), General Health Questionnaire-20 (GHQ-20), Standardized Nordic Questionnaire, work ability, sickness absenteeism, and occupation.
The Regional Committee for Medical Research Ethics and The Norwegian Data Inspectorate approved the study.
The Norwegian Function Assessment Scale (NFAS)
The Norwegian Function Assessment Scale (NFAS) is a self-report instrument developed by an expert group in social insurance in 2000. It is designed to assess the need for rehabilitation and for adjustment of work demands among sick-listed persons, as well as the rights to social security benefits. The scale comprises 39 items derived directly from the activities/participation dimension in the International Classification of Functioning, Disability and Health (ICF). The items are relevant for assessing physical and mental functioning in working life, some relating to activities of daily living. The NFAS starts with the question "Have you had difficulty doing the following activities during the last week?" and respondents report 39 activities using a four-point scale: no difficulty, some difficulty, much difficulty, could not do it. The five-point all-point defined scale was developed to be more congruent with the qualifiers in the activities/participation dimension of the ICF: no difficulty, mild difficulty, moderate difficulty, much difficulty and could not do it.
Based on the results of principal component analysis from the previous study with sick-listed persons, the items form seven domains: Walking/standing (7 items), Holding/picking up things (8 items), Lifting/carrying (6 items), Sitting (3 items), Managing (7 items), Cooperation/communication (6 items), Senses (2 items). These domains have evidence for validity in sick-listed persons. The main application of the NFAS is likely to be social insurance. Hence it was decided to keep the domains from the earlier study with sick-listed persons. It should, however, be anticipated that principal component analysis based on data from the general population in Ullensaker will yield somewhat different results. The first four and the last three domains are intuitively grouped into physical and mental domains respectively. Domain scores are calculated by adding the item scores and dividing by the number of items completed. NFAS total scores are calculated by adding all 39 item scores and dividing by the number of items completed. Low scores indicate good functional ability.
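The scoring rule described above can be illustrated with a minimal sketch. It is not the authors' software. The numeric item coding (1 = no difficulty, rising with difficulty), the assumption that items are stored domain by domain in the order listed, and the names DOMAIN_SIZES, mean_of_completed and nfas_scores are all hypothetical; only the domain sizes and the "sum divided by number of items completed" rule come from the text.

```python
from typing import Dict, Optional, Sequence

# Item counts per domain as listed in the text (39 items in total).
# Assumption: items are stored domain by domain in this order; the real
# questionnaire order may differ.
DOMAIN_SIZES: Dict[str, int] = {
    "Walking/standing": 7,
    "Holding/picking up things": 8,
    "Lifting/carrying": 6,
    "Sitting": 3,
    "Managing": 7,
    "Cooperation/communication": 6,
    "Senses": 2,
}


def mean_of_completed(scores: Sequence[Optional[int]]) -> Optional[float]:
    """Sum of the completed item scores divided by the number completed."""
    completed = [s for s in scores if s is not None]
    if not completed:
        return None  # nothing answered, so no score can be formed
    return sum(completed) / len(completed)


def nfas_scores(responses: Sequence[Optional[int]]) -> Dict[str, Optional[float]]:
    """Return the seven domain scores plus the total score.

    Missing items are represented by None and are left out of both the
    sum and the divisor.
    """
    if len(responses) != sum(DOMAIN_SIZES.values()):
        raise ValueError("expected 39 item responses")
    result: Dict[str, Optional[float]] = {}
    start = 0
    for domain, size in DOMAIN_SIZES.items():
        result[domain] = mean_of_completed(responses[start:start + size])
        start += size
    result["Total"] = mean_of_completed(responses)
    return result


# Hypothetical respondent: no difficulty everywhere (assumed code 1),
# except one harder item and one unanswered item in the first domain.
example = [1] * 39
example[0] = 3
example[1] = None
scores = nfas_scores(example)
```

Because the divisor is the number of completed items, a respondent who skips an item still receives a domain score on the same scale as everyone else, which is why the rule tolerates the low levels of item non-response reported later.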
COOP/WONCA  is a generic health status measure, where functional status is self-reported with a time frame of the previous two weeks. It comprises six charts: Physical fitness, Feelings, Daily activities, Social activities, Overall health and Change in health. Each chart has five response alternatives with pictorial representations. The present study used an optional Pain chart in place of the Change in health chart.
General Health Questionnaire (GHQ-20)
Psychological distress during the last two weeks was measured by the GHQ-20, a widely used screening instrument for measuring non-psychotic psychiatric illness in a general population. Items are scored as the original GHQ score in a bi-modal fashion (0-0-1-1).
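As an illustration of the bi-modal (0-0-1-1) scoring, the sketch below collapses four response options per item into 0 or 1 and sums them. It assumes that the four options are coded 0 to 3 in order of increasing distress, which the text does not state. The function names are hypothetical, and the cut-off of 4 is taken from the known-groups comparison described in the analysis section of this paper.

```python
from typing import Sequence

# Assumption: each GHQ item has four response options coded 0, 1, 2, 3.
# Bi-modal scoring maps the two lower options to 0 and the two upper to 1.
BIMODAL_MAP = {0: 0, 1: 0, 2: 1, 3: 1}


def ghq_bimodal_score(responses: Sequence[int]) -> int:
    """Sum of item scores after the 0-0-1-1 recoding."""
    total = 0
    for r in responses:
        if r not in BIMODAL_MAP:
            raise ValueError(f"unexpected response code: {r!r}")
        total += BIMODAL_MAP[r]
    return total


def has_mental_distress(score: int, cutoff: int = 4) -> bool:
    """Flag distress when the bi-modal score reaches the cut-off."""
    return score >= cutoff
```

With 20 items the bi-modal score ranges from 0 to 20, so the cut-off corresponds to at least four items answered in one of the two upper categories.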
Work ability was assessed by one question "To what degree is your ability to perform your ordinary work reduced today: hardly reduced at all, not much reduced, moderately reduced, much reduced and very much reduced" . Respondents were asked to report whether they had experienced any pain or discomfort in ten different body regions during the previous week . Sickness absenteeism was assessed by asking the respondents if they had been sick-listed during the previous year: no, less than 1 week, between 1–8 weeks, more than 8 weeks. Occupation was assessed with the categories: employed, housekeeping/full-time household work, unemployed, medical rehabilitation, disability pension, retired or student.
The two versions of the NFAS were compared for levels of missing data, and floor and ceiling effects, which were expressed as percentages.
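A minimal sketch of how such percentages might be computed for a single item is given below. It is an assumption of this sketch, not something stated in the text, that missing data are expressed as a percentage of all respondents while the two end effects are expressed as a percentage of those who answered the item. The function name and the dictionary keys are hypothetical.

```python
from typing import Dict, Optional, Sequence


def item_data_quality(
    responses: Sequence[Optional[int]], lowest: int, highest: int
) -> Dict[str, Optional[float]]:
    """Percent missing, and percent at each end of the response scale.

    responses holds one value per respondent for a single item, with
    None marking a missing answer. lowest and highest are the codes of
    the two end categories (for example 1 and 4, or 1 and 5).
    """
    n = len(responses)
    if n == 0:
        raise ValueError("no respondents")
    completed = [r for r in responses if r is not None]
    missing_pct = 100 * (n - len(completed)) / n
    if not completed:
        return {"missing_pct": missing_pct, "lowest_pct": None, "highest_pct": None}
    lowest_pct = 100 * sum(r == lowest for r in completed) / len(completed)
    highest_pct = 100 * sum(r == highest for r in completed) / len(completed)
    return {
        "missing_pct": missing_pct,
        "lowest_pct": lowest_pct,
        "highest_pct": highest_pct,
    }


# Hypothetical four-point item answered by five people, one of whom skipped it.
quality = item_data_quality([1, 1, 2, 4, None], lowest=1, highest=4)
```

Running the same function with highest=5 on the five-point version gives directly comparable percentages for the two end categories.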
Tests of scaling assumptions
Internal consistency was assessed by item-total correlation and Cronbach's alpha. Item-total correlation coefficients should meet the 0.40 standard. Cronbach's alpha was considered acceptable for group comparisons when the coefficient exceeded 0.70. Item discriminant validity was assessed by analyzing correlations between the items and their domains (item-total) and between the items and the other domains (item-other) to see if the former was at least two standard errors higher than the latter, thereby indicating definite scaling success.
We hypothesised that scores from conceptually related domains of NFAS would correlate higher than scores of unrelated domains. We also hypothesised that NFAS scores would correlate higher with conceptually corresponding aspects of the COOP/WONCA, GHQ and Work Ability than with non-corresponding aspects. Correlation coefficients among measures of the same attribute should fall in the midrange of 0.40 – 0.80 .
It was hypothesised that those having a disability pension or rehabilitation benefit due to disease and those reporting being sick-listed in the previous year would report lower functional ability. We also compared domain scores between those reporting musculoskeletal pain last week without mental distress (original GHQ score <4) and those with mental distress (original GHQ score ≥ 4) but no musculoskeletal pain. It was hypothesised that females, older persons and persons with shorter education would report lower functional ability than males, younger persons and persons with longer education. Since data are categorical, non-parametric tests for independent samples were used to compare subgroups.
Of the 6108 questionnaires posted, 3325 (54.4%) were returned. The response rate was lower for males (p < 0.001) and young or very old persons (p < 0.001) (Table 1). The response rates for the two versions were 54.0% for NFAS-4 and 54.8% for NFAS-5. Fifty-five participants in birth cohort 1968–70 randomised to the NFAS-4 were erroneously mailed the NFAS-5 version. Hence, the subsamples differed significantly regarding age (p < 0.05), but not on any other background variables. Excluding the birth cohort 1968–70 did not affect the results.
For respondents to the NFAS-4 and NFAS-5, there were no missing data for 78.5% and 82.4% respectively. All items had more missing data for the NFAS-4 than NFAS-5 (Table 2). The mean levels of missing data for individual items in the NFAS-4 and NFAS-5 were 3.3% and 2.6% respectively, which was statistically significant (p < 0.01). The same items within both versions had the highest percentage of missing values.
Item responses were skewed towards no difficulty for both versions (Table 2). The percentage of respondents reporting no difficulty for all 39 items was 33.1% in the NFAS-4 and 30.6% in the NFAS-5. In general, the NFAS-4 items had larger floor and ceiling effects than NFAS-5 items; some differences were statistically significant (p < 0.05) (Table 2). The third response alternative in NFAS-4 and the fourth in NFAS-5 had exactly the same wording, "much difficulty", but the percentage response was lower in NFAS-5 than in NFAS-4 for 24 items.
All items in both versions met the 0.40 criterion for item-total correlation with the exception of the two items in the "senses" domain in NFAS-4 (Table 3). In all domains, item-total correlation coefficients were higher within the NFAS-5 than within NFAS-4, and this difference was significant for 35 items.
All items, except four in the NFAS-4 and one in the NFAS-5, met the item-discriminant validity criterion. Cronbach's alpha for two of the NFAS-4 and one of the NFAS-5 domains just failed to meet the 0.70 criterion (Table 3). Cronbach's alphas were significantly higher for NFAS-5 across the first six domains and the total score.
For both versions, scores from conceptually related domains of NFAS correlated higher than scores of unrelated domains (Table 4). The NFAS-5 produced the largest correlations between domains and between domains and total scores, which was significant (p < 0.05) for 15 items and four domains.
NFAS scores correlated higher with conceptually corresponding aspects of the COOP/WONCA, GHQ and Work Ability than with non-corresponding aspects for both versions (Table 4). The Sitting and Senses domains had relatively low correlations with these items or scales. The correlation coefficients were similar for the two versions. With only one exception, all the correlations hypothesized as being high were over 0.40, indicating that the same construct was being measured by the NFAS and the external standard.
Both versions discriminated between persons anticipated to report different levels of functional ability, including persons with disability pension or medical rehabilitation, persons reporting sickness absence, and persons with physical versus mental symptoms (Table 5).
For both versions, a decline in physical functional ability was significantly associated with increasing age (p < 0.05). With one exception, males reported significantly better functional ability (p < 0.001) for both versions. With the exception of the Senses domain for the NFAS-4, a significant education gradient was found for both versions (p < 0.001).
Applying age-stratified analyses, the results for data quality, scaling assumptions and construct validity remained stable.
Both versions demonstrated low levels of missing data and skewed response distribution, but the NFAS-4 had more missing values and larger end effects than NFAS-5. The NFAS-5 demonstrated better internal consistency and item-discriminant validity than the NFAS-4, although the results were acceptable for both versions. All a priori hypotheses were met, which strongly supports the construct validity of the scale for both versions. Both versions discriminated similarly well between groups with different levels of health status and between known groups in the population.
The response rates and the low levels of missing data show that both versions of the NFAS are acceptable to the population. A few items had a high percentage of missing values, which is probably because there was no "not applicable" option. Significantly less missing data for the NFAS-5 than the NFAS-4 is some indication that the respondents found it easier to choose a suitable response from the five-point scale. This finding is supported by Nagata et al., who compared feasibility of health measurement response scales using four, five and seven categories and a visual analog scale. The level of missing data was lowest, and the responder preference highest, for the five-point scale version.
Since the NFAS data are skewed towards higher levels of functioning, the larger end effects for NFAS-4 have to be considered when the instrument is used to discriminate between different levels of functioning or to assess changes in functioning over time. It is likely that NFAS-4 will not be as responsive to changes in functioning, simply because it has fewer response options that individuals can use to indicate that their functioning has changed.
It might be anticipated that the response alternative, "much difficulty", along with the two end categories would show similar percentages in the two versions. This was not found. Hence, the responses did not seem to be affected by the wording or anchoring of the response alternatives.
Internal consistency and validity
The internal consistency values were similar to widely used instruments including the SF-36 [28–33] and the NHP. Our item-other domain correlation coefficients were comparable with results from other studies using the SF-36, one including rheumatoid arthritis patients and one a population study.
Regarding construct validity, different time perspectives in the questioning for the different scales could influence possible associations since Work Ability concerns today, NFAS last week, COOP/WONCA and GHQ the last two weeks. However, all a priori hypotheses correlation coefficients met the 0.4 – 0.8 standard. Other studies have obtained similar correlation coefficients between NHP and SF-36 scales [15, 34] or between SF-36 scale scores and comparable item or domain scores from other questionnaires [32, 35]. Regarding the ability to discriminate between groups with different levels of health status, comparable results were found for the SF-36 [30–33, 35]. A gender difference was found in several studies [28, 30–32, 35–37], but not all [33, 38]. The finding of a physical age gradient is supported by several studies [28, 32, 33, 35–38], and an education gradient has also been found in previous research [28, 30, 31, 35, 38].
The NFAS-5 demonstrated somewhat higher internal consistency and item-discriminant validity values compared to the NFAS-4. The majority of this difference could probably be attributed to the fact that the correlation between true values and scale scores increases with the number of response options, but it is not known whether this explains the whole difference in correlation coefficient values.
Future applications of the NFAS
The items in the NFAS are derived directly from the activities/participation dimension in the ICF. The ICF uses a five-point scale for its qualifiers and the clinical checklists. This supports the use of the NFAS-5. The NFAS-5 had lower levels of missing data than the NFAS-4, which may indicate higher responder acceptability. The NFAS-5 generally performed better than the NFAS-4 in relation to the psychometric tests. Therefore the five-point scale is recommended in future applications of the NFAS. The main drawback in changing to a new response format is that it precludes direct comparisons between previous and new research. However, following our study results, we believe that the evidence supports changing the NFAS response format to a five-point scale.
Strengths and limitations
This study's strengths include the randomised design, the large study sample, the good data quality and the thorough testing of validity against other standards. The moderate response rate and the fact that all data are self-reported represent study limitations. An external, unrelated variable would have strengthened validity assessment. With the present study design it was not possible to ask the respondents about their preferences or to determine the sensitivity to change, that is, the responsiveness of the scale. However, the low mean missing values may indicate acceptability among respondents.
The data quality of NFAS is high with acceptable internal consistency and good construct validity. In choosing between the four-point and the five-point scale, it should be noted that while construct validity and discriminative ability are comparable, data quality, internal consistency and discriminative validity suggest that the five-point scale is to be preferred in future applications of the NFAS.
GHQ-20: The General Health Questionnaire-20 items
ICF: The International Classification of Functioning, Disability and Health
NFAS: The Norwegian Function Assessment Scale
SF-36: The generic Short Form 36-item Health Survey
McColl E, Jacoby A, Thomas L, Soutter J, Bamford C, Steen N, Thomas R, Harvey E, Garratt A, Bond J: Design and use of questionnaires: a review of best practice applicable to surveys of health service staff and patients. Health Technol Assess 2001, 5: 1–256.
Streiner DL, Norman GR: Health measurement scales: a practical guide to their development and use. Third edition. Oxford, Oxford University Press; 2003.
Ware JE: SF-36 Health Survey Manual and Interpretation Guide. Boston, The Health Institute New England Medical Center; 1993.
Hunt SM, McKenna SP, McEwen J, Backett EM, Williams J, Papp E: A quantitative approach to perceived health status: a validation study. J Epidemiol Community Health 1980, 34: 281–286.
Group EQL: EuroQol--a new facility for the measurement of health-related quality of life. The EuroQol Group. Health Policy 1990, 16: 199–208. 10.1016/0168-8510(90)90421-9
Kessler RC, Barber C, Beck A, Berglund P, Cleary PD, McKenas D, Pronk N, Simon G, Stang P, Ustun TB, Wang P: The World Health Organization Health and Work Performance Questionnaire (HPQ). J Occup Environ Med 2003, 45: 156–174. 10.1097/01.jom.0000052967.43131.51
Miller GA: The magical number seven plus or minus two: some limits on our capacity for processing information. Psychol Rev 1956, 63: 81–97. 10.1037/h0043158
Guyatt GH, Townsend M, Berman LB, Keller JL: A comparison of Likert and visual analogue scales for measuring change in function. J Chronic Dis 1987, 40: 1129–1133. 10.1016/0021-9681(87)90080-4
Cox EP: The Optimal Number of Response Alternatives for a Scale: A Review. J Marketing Research 1980, 17: 407–422. 10.2307/3150495
Preston CC, Colman AM: Optimal number of response categories in rating scales: reliability, validity, discriminating power, and respondent preferences. Acta Psychol (Amst) 2000, 104: 1–15. 10.1016/S0001-6918(99)00050-5
Avis NE, Smith KW: Conceptual and methodological issues in selecting and developing quality of life measures. In: Advances in medical sociology (Fitzpatrick, R, editor). London, JAI Press Inc.; 2006:255–80.
Nishisato S, Torii Y: Effects of categorizing continuous normal variables on product-moment correlation. Japanese Psychological Research 1970, 13: 45–49.
Martin WS: Effects of Scaling on Correlation Coefficient - Test of Validity. Journal of Marketing Research 1973, 10: 316–318. 10.2307/3149702
Chang L: A Psychometric Evaluation of 4-Point and 6-Point Likert-Type Scales in Relation to Reliability and Validity. Applied Psychological Measurement 1994, 18: 205–215. 10.1177/014662169401800302
Cleopas A, Kolly V, Perneger TV: Longer response scales improved the acceptability and performance of the Nottingham Health Profile. J Clin Epidemiol 2006, 59: 1183–1190. 10.1016/j.jclinepi.2006.02.014
Statistics Norway: StatBank Norway. 2006. [http://www.ssb.no]
Brage S, Fleten N, Knudsrod OG, Reiso H, Ryen A: [Norwegian Functional Scale--a new instrument in sickness certification and disability assessments]. Tidsskr Nor Laegeforen 2004, 124: 2472–2474.
World Health Organization: ICF-International Classification of Functioning, Disability, and Health. Geneva, World Health Organization; 2001.
World Health Organization: ICF Checklist. Version 2.1a, Clinical Form for International Classification of Functioning, Disability and Health. 2007. [http://www.who.int/classifications/icf/site/checklist/icf-checklist.pdf]
Nelson E, Wasson J, Kirk J, Keller A, Clark D, Dietrich A, Stewart A, Zubkoff M: Assessment of function in routine clinical practice: description of the COOP Chart method and preliminary findings. J Chronic Dis 1987, 40 Suppl 1: 55S-69S.
Goldberg DP: Manual of the General Health Questionnaire. Windsor, NFER-Nelson; 1978.
McDowell I: Measuring Health. A Guide to Rating Scales and Questionnaires. Third edition. Oxford, Oxford University Press; 2006.
Reiso H, Nygard JF, Brage S, Gulbrandsen P, Tellnes G: Work ability assessed by patients and their GPs in new episodes of sickness certification. Fam Pract 2000, 17(2):139–144. 10.1093/fampra/17.2.139
Kuorinka I, Jonsson B, Kilbom A, Vinterberg H, Biering-Sorensen F, Andersson G, Jorgensen K: Standardised Nordic questionnaires for the analysis of musculoskeletal symptoms. Appl Ergon 1987, 18: 233–237. 10.1016/0003-6870(87)90010-X
Nunnally JC, Bernstein IH: Psychometric theory. Third edition. New York, McGraw-Hill; 1994.
Kaasa S, Bjordal K, Aaronson N, Moum T, Wist E, Hagen S, Kvikstad A: The EORTC core quality of life questionnaire (QLQ-C30): validity and reliability when analysed with patients treated with palliative radiotherapy. Eur J Cancer 1995, 31A: 2260–2263. 10.1016/0959-8049(95)00296-0
Nagata C, Ido M, Shimizu H, Misao A, Matsuura H: Choice of response scale for health measurement: comparison of 4, 5, and 7-point scales and visual analog scale. J Epidemiol 1996, 6: 192–197.
Loge JH, Kaasa S: Short form 36 (SF-36) health survey: normative data from the general Norwegian population. Scand J Soc Med 1998, 26: 250–258.
Sullivan M, Karlsson J, Ware JE Jr.: The Swedish SF-36 Health Survey--I. Evaluation of data quality, scaling assumptions, reliability and construct validity across general populations in Sweden. Soc Sci Med 1995, 41: 1349–1358. 10.1016/0277-9536(95)00125-Q
Jenkinson C, Coulter A, Wright L: Short form 36 (SF36) health survey questionnaire: normative data for adults of working age. BMJ 1993, 306: 1437–1440.
Jenkinson C, Stewart-Brown S, Petersen S, Paice C: Assessment of the SF-36 version 2 in the United Kingdom. J Epidemiol Community Health 1999, 53: 46–50.
Brazier JE, Harper R, Jones NM, O'Cathain A, Thomas KJ, Usherwood T, Westlake L: Validating the SF-36 health survey questionnaire: new outcome measure for primary care. BMJ 1992, 305: 160–164.
Blake C, Codd MB, O'Meara YM: The Short Form 36 (SF-36) Health Survey: normative data for the Irish population. Ir J Med Sci 2000, 169: 195–200.
Loge JH, Kaasa S, Hjermstad MJ, Kvien TK: Translation and performance of the Norwegian SF-36 Health Survey in patients with rheumatoid arthritis. I. Data quality, scaling assumptions, reliability, and construct validity. J Clin Epidemiol 1998, 51: 1069–1076. 10.1016/S0895-4356(98)00098-5
Sullivan M, Karlsson J: The Swedish SF-36 Health Survey III. Evaluation of criterion-based validity: results from normative population. J Clin Epidemiol 1998, 51: 1105–1113. 10.1016/S0895-4356(98)00102-4
Hopman WM, Towheed T, Anastassiades T, Tenenhouse A, Poliquin S, Berger C, Joseph L, Brown JP, Murray TM, Adachi JD, Hanley DA, Papadimitropoulos E: Canadian normative data for the SF-36 health survey. Canadian Multicentre Osteoporosis Study Research Group. CMAJ 2000, 163: 265–271.
Bruusgaard D, Nessioy I, Rutle O, Furuseth K, Natvig B: Measuring functional status in a population survey. The Dartmouth COOP functional health assessment charts/WONCA used in an epidemiological study. Fam Pract 1993, 10: 212–218. 10.1093/fampra/10.2.212
Grammenos S: Illness, disability and social inclusion. Dublin, European Foundation for the Improvement of Living and Working Conditions; 2003.
The study is part of The Functional Assessments Project financed by The Ministry of Labour and Social Inclusion. It was carried out in collaboration with The Ullensaker Study 2004 (financed by the University of Oslo and the Trygve Gythfeldt Fund).
The author(s) declare that they have no competing interests.
NØ planned and designed the study, performed some of the statistical analysis, drafted the manuscript and coordinated the study. PG participated in the planning and design of the study, interpretation of the results and in drafting the manuscript. AG helped in the interpretation of the results and participated in drafting the manuscript. JSB performed most statistical analysis and reviewed the manuscript. FAD assisted statistical analysis and reviewed the manuscript. BN participated in planning and designing the study, collected the data and participated in drafting the manuscript. SB planned and designed the study, participated in the interpretation of results and in drafting and revising the manuscript. All authors read and approved the final manuscript.