A new questionnaire for measuring quality of life - the Stark QoL



The Stark questionnaire measures health-related quality of life (QoL) using pictures almost exclusively. It is supplemented by a minimum of words. It comprises a mental and a physical health component.


A German sample of n = 500 subjects, stratified by age and gender, filled out the Stark QoL questionnaire along with various other questionnaires via the internet.


The physical component shows good reliability (Cronbach's alpha = McDonald's omega = greatest lower bound = .93); the reliability of the mental component can be improved (Cronbach's alpha = .63, McDonald's omega = .72, greatest lower bound = .77). Confirmatory factor analysis shows a good fit (Bentler's CFI = .97). Construct validity was proven.


The Stark QoL is a promising new development in measuring QoL: it is a short and easy-to-apply questionnaire. Additionally, it is particularly promising for international research.


Measurement of quality of life (QoL; see footnote 1) has become increasingly important in medicine over the past three decades. A search in PubMed revealed an exponential growth of published articles containing the term "Quality of Life OR QoL" since the 1990s, exceeding 20,000 articles per year since 2013. Randomised controlled trials as well as observational studies increasingly include QoL measures, usually as a secondary endpoint, e.g. [1–3]. Additionally, there are studies utilizing measures of QoL as predictors, for example for death [4]. Inclusion of QoL measures in studies is no longer restricted to highly developed western countries, but now includes countries from all over the world, e.g. [5, 6].

The Stark QoL comprises a total of 16 pictures representing different mood states, energy, social contact, and various physical activities. As far as possible, the content of the items was transferred into the pictures, leaving only very short text elements in between. Fully avoiding text proved to be impossible: a respondent needs to know whether a certain picture displays something (s)he is able to do, something (s)he would like to be able to do, or something (s)he would like to do. Respondents either tick the picture in a series that best describes them, or choose a symbol ("--", "-", "0", "+", "++") near the picture describing how well they are able to perform a certain task. The idea of the questionnaire is based on the Dartmouth COOP Charts [7, 8], where some items similar to those in the Stark QoL were utilized. The name Dartmouth COOP Charts refers to a questionnaire used in the Dartmouth-Northern New England Primary Care Cooperative Information Project; it comprises various dimensions which we would label as aspects of QoL today [9].

When developing the questionnaire, it was important that it could be easily translated into other languages. Since about 10 % of the world’s population are still illiterate [10], it was an additional requirement that respondents who had never learned to read would be able to fill it out, naturally after receiving verbal instructions. The Stark QoL is short: it fits on two standard pages. Basically, two components of QoL can be analysed, a mental and a physical one. The questionnaire is called Stark QoL because the pictures were drawn by a German artist named H.P. Stark. A first study on 445 students, which focused on the items individually, revealed good reliability of the instrument [9].

The aim of the present paper is to present the two dimensions of the questionnaire with various indicators of validity. To this end, the subscales of the Short Form 36 (SF-36) [11] and the Symptom Checklist 27 plus (SCL-27-plus) [12] were compared to the mental and physical components of the Stark QoL.



A sample of 500 individuals, stratified by age and gender, filled out the Stark QoL along with several other questionnaires via the internet. Registered individuals received an email asking them to fill out a questionnaire set containing about 280 items. Participants received compensation of about € 4.30 for filling out the questionnaire. The survey window was closed automatically after the 500th subject had filled out the questionnaire. The ethics commission of the State Chamber of Physicians, Rhineland-Palatinate (Landesärztekammer Rheinland-Pfalz) approved the project (837.185.07). Table 1 displays the sample characteristics. The sample has been described in detail by Hardt et al. [13].

Table 1 Sample description (n = 500)

The sample size was determined by the plan to perform a confirmatory factor analysis. Muthén and Muthén [14] performed a simulation study using two scales with five items each, having factor loadings of .80. For non-normal data, they recommend a sample size of at least n = 265. Hence, a sample size of n = 500 should suffice even though some factor loadings were lower here and the mental component has only three items. For all other statistics, the given sample size is fully sufficient. A recent review of 114 patient-reported outcome measures found a median sample size of n = 207 [15].


The Stark QoL: The first item measures mood and consists of five smileys; at one end is a very happy face, at the other end a very sad one. Respondents were asked to check the one that best applies to them. The second item measures energy and presents two pictures of a person walking: on the left-hand side the walker is full of energy, and on the right he seems to be walking almost as if depressed. The third item measures social contact and displays three pictures, each showing a group of five persons, one white and four grey. The white person symbolizes the respondent, the grey ones a possible peer group. At one end, the white person is standing in the middle of the group, at the other end alone. Together, these three items constitute the mental component. All items are displayed on one page and are to be answered by making a cross under the picture that best applies to one’s own situation.

On the second page, six items measuring physical functioning are presented. The pictures show activities like carrying a shopping basket, moving a table, tying shoes, etc. Next to each picture, a five-point Likert scale is displayed. The text reads "I can", and "++" stands for "very well", "+" for "well", "0" for "fairly", "-" for "poorly" and "--" for "very poorly". Respondents are asked to indicate how easily they can perform the activity displayed in each picture. These items constitute the physical component. The whole questionnaire is displayed in Additional file 1.

The Short Form 36 is a self-rating questionnaire consisting of eight subscales: "Physical Functioning", "Role limitations due to Physical problems", "Bodily Pain", "General Health Perceptions", "Vitality", "Social Functioning", "Role limitations due to Emotional problems", and "Mental Health" [11, 16]. The SF-36 has been translated into more than 40 languages [17].

The symptom checklist 27 plus is a six-scale questionnaire. It measures depressive symptoms (current and lifetime), symptoms of social anxiety and agoraphobia, vegetative symptoms and symptoms of pain. The scales (except lifetime depression) assess a time frame of two weeks. They comprise between four and six items each, and have good internal consistencies in population as well as patient samples e.g. [12, 18, 19].



All items of the present analysis except age and gender were coded between 0 and 100. Scales were calculated as the mean of the items; high values stand for good QoL. The SF-36 was coded according to the manual. The SCL-27-plus scales were coded inversely, i.e. high values stand for many and/or severe symptoms. There were no missing data in the survey because the program prompted respondents to answer any item left blank before moving to the next screen. Since Cronbach’s α [20] as a single measure of reliability is no longer regarded as optimal, even by Cronbach himself [21–23], Cronbach’s α, McDonald’s ωt [24] and the greatest lower bound (glb) [25] were used to estimate reliability.
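As an illustration of the scoring rule and of Cronbach's α, the following minimal sketch rescales the five answer symbols of the physical items to the 0–100 range, averages the items of a scale, and computes α from the item variances and the variance of the sum score. The equal-step mapping of the symbols to 0, 25, 50, 75 and 100, the function names and the toy data are assumptions made for illustration; they are not taken from the questionnaire itself, and ωt and the glb are not covered here.

```python
from statistics import mean, variance

# Assumed equal-step coding of the five answer symbols (hypothetical mapping).
SYMBOL_TO_SCORE = {"--": 0, "-": 25, "0": 50, "+": 75, "++": 100}


def scale_score(answers):
    """Scale score of one respondent: mean of the 0-100 item scores."""
    return mean(SYMBOL_TO_SCORE[a] for a in answers)


def cronbach_alpha(rows):
    """Cronbach's alpha; rows are respondents, columns are items."""
    k = len(rows[0])
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data (hypothetical): five respondents, three items coded 0-100.
toy = [[100, 75, 100], [50, 50, 25], [75, 75, 75], [25, 50, 25], [100, 100, 75]]
print(scale_score(["++", "+", "0"]))
print(round(cronbach_alpha(toy), 3))  # 0.927
```

Because α is a ratio of variances, it does not depend on whether the items are coded 1–5 or 0–100.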

Confirmatory factor analysis

Bentler’s Comparative Fit Index (CFI) [26], the Goodness of Fit Index (GFI), the root mean square error of approximation (RMSEA), the adjusted Goodness of Fit Index (AGFI), the root mean squared residual (RMR), and the chi square degree of freedom ratio (χ2/df) were reported to assess the overall fit of the confirmatory factor analysis.

Construct validity

Pearson correlations were utilized to assess construct validity. In addition to linear effects as represented by the correlation coefficients, all associations between continuous variables were tested for curvilinearity by including a quadratic term in a regression model. Results are reported as curves in Additional files 2, 3 and 4 if the quadratic term was significant at p < .01 (two-tailed), and as a linear regression line otherwise. Statistics were performed with STATA [27], AMOS [28] and the package "psych" in R [29].
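The curvilinearity check described above can be sketched as follows: fit y = b0 + b1·x + b2·x² by least squares and test b2 against zero. This is a minimal pure-Python sketch, not the STATA code used in the study; the function names and the toy data are hypothetical, and the p-value uses a normal approximation to the t distribution, which is an assumption that is reasonable only for large samples.

```python
from statistics import NormalDist, mean


def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def quadratic_term_test(x, y):
    """Return (b2, z, p) for the quadratic term of y = b0 + b1*x + b2*x^2."""
    x_mean = mean(x)
    xc = [v - x_mean for v in x]  # centring leaves b2 unchanged
    design = [[1.0, v, v * v] for v in xc]
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(3)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(design, y)]
    sigma2 = sum(e * e for e in resid) / (len(y) - 3)
    # Var(b2) is sigma2 times the last diagonal element of the inverse of X'X.
    inverse_last_column = solve(xtx, [0.0, 0.0, 1.0])
    se_b2 = (sigma2 * inverse_last_column[2]) ** 0.5
    z = beta[2] / se_b2
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed, normal approximation
    return beta[2], z, p


# Toy data (hypothetical): a U-shaped and a purely linear relation with age.
ages = list(range(20, 80))
noise = [(2 * i) % 5 - 2 for i in range(len(ages))]
u_shaped = [68 + 0.015 * (a - 50) ** 2 + e for a, e in zip(ages, noise)]
linear = [100 - 0.5 * a + e for a, e in zip(ages, noise)]

for label, outcome in (("u-shaped", u_shaped), ("linear", linear)):
    b2, z, p = quadratic_term_test(ages, outcome)
    print(label, "curve" if p < 0.01 else "straight line")
```

The decision rule in the last line mirrors the reporting rule of the paper: a curve is shown only if the quadratic term reaches p < .01.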


Item and score distributions

Figure 1a and b display the distributions of two sample items of the mental component, and Fig. 1c the score of the mental component. Similarly, Fig. 1d and e display the distributions of two sample items of the physical component, and Fig. 1f the score. The lower half of Table 1 shows the descriptive statistics of all items and scores of the Stark QoL. The mental component has a mean of \( \overline{x} \) = 71, the physical component one of \( \overline{x} \) = 82. The means of all single items are well above the theoretical midpoint of the scale (50). The easiest item is picking a glass (\( \overline{x} \) = 90), the most difficult lifting a box (\( \overline{x} \) = 71). All items and scales are left skewed, i.e. most subjects are at the positive end of the scale.

Fig. 1 Distributions of items and scores


Item 9, "lifting a heavy box", did not contribute meaningfully to the physical component. Hence, it was not utilized and the score was calculated over items 4–8. The reliability of the mental component was α = .63, ωt = .72 and glb = .77. For the physical component, it was α = ωt = glb = .93. The items of the mental component had item-rest correlations .40 ≤ r ≤ .62. The items of the physical component had item-rest correlations .77 ≤ r ≤ .85. All items correlate less strongly with the other scale than with their own.
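For readers unfamiliar with the term, an item-rest correlation is the Pearson correlation between one item and the sum of the remaining items of the same scale, so that the item does not correlate with itself. The following is a minimal sketch with hypothetical function names and toy data; it does not reproduce the values reported above.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_rest_correlations(rows):
    """Correlate each item with the sum of the other items of its scale."""
    columns = list(zip(*rows))
    result = []
    for i, item in enumerate(columns):
        rest = [sum(row) - row[i] for row in rows]
        result.append(pearson(item, rest))
    return result


# Toy data (hypothetical): five respondents, three items coded 0-100.
toy_items = [[100, 75, 100], [50, 50, 25], [75, 75, 75], [25, 50, 25], [100, 100, 75]]
print([round(r, 2) for r in item_rest_correlations(toy_items)])
```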

Confirmatory factor analysis

In the confirmatory factor analysis, too, item 9, "lifting a heavy box", did not perform well. The drawing is too similar to item 2, "energy". Hence, its exclusion was confirmed. Additionally, there was a correlation between the errors of item 4, "shopping", and item 8, "moving a table". When the two were allowed to correlate, the fit was CFI = .97, GFI = .95, RMSEA = 0.094, AGFI = .90, RMR = .034 and χ2/df ratio = 5.40. When the two were not allowed to correlate, the fit was CFI = .94, GFI = .92, RMSEA = 0.124, AGFI = .85, RMR = .037 and χ2/df ratio = 8.64. Standardized factor loadings varied between .55 and .85 in the mental component and between .82 and .87 in the physical component (Fig. 2).
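The reported RMSEA values can be cross-checked against the χ2/df ratios. A common point estimate is RMSEA = sqrt(max(χ2 − df, 0) / (df · (N − 1))), which simplifies to sqrt(max(χ2/df − 1, 0) / (N − 1)). The sketch below applies it with N = 500; that AMOS used exactly this variant of the formula is an assumption, and the function name is illustrative.

```python
from math import sqrt


def rmsea_from_ratio(chi2_df_ratio, n):
    """Point estimate of the RMSEA from the chi-square/df ratio and sample size."""
    return sqrt(max(chi2_df_ratio - 1.0, 0.0) / (n - 1))


print(round(rmsea_from_ratio(5.40, 500), 3))  # 0.094, model with correlated errors
print(round(rmsea_from_ratio(8.64, 500), 3))  # 0.124, restricted model
```

Both results agree with the values given in the text, so the two fit indices are mutually consistent.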

Fig. 2 Graphical representation of the Stark QoL with standardized parameters, i.e. regression coefficients at the single-sided arrows, correlation coefficients at the double-sided arrows, intercepts in the boxes and residual variances close to the circles

Correlation between the mental and the physical component

The linear estimate of the association between the mental and the physical component is r = .34, p < .001 (see Table 2). However, when tested for non-linearity, the association between the mental and the physical component is zero in the lower range and much stronger than r = .34 in the higher range (p < .001 for the quadratic term). Additional file 3: Figure S1 displays the estimated curve with its 95 % confidence interval. The latter is narrow in the upper range, where many observations are, and opens up wide in the lower range. Light blue bubbles display the observed values; big bubbles stand for more subjects than small ones.

Table 2 Correlations (Pearson’s r)

Validity – SF-36

Table 2 displays the correlations of the two components with the eight scales of the SF-36. Correlations are consistently positive. The mental component of the Stark QoL has two high correlations, one to "Vitality" (r = .66) and one to "Mental Health" (r = .63), and three medium correlations, to "Social Functioning" (r = .54), to "General Health Perceptions" (r = .52) and to "Role-Emotional" (r = .45). All other correlations are r ≤ .40. The physical component has one strong correlation to "Physical Functioning" (r = .71), followed by three medium ones to "Bodily Pain" (r = .57), to "Role-Physical" (r = .54) and to "General Health Perceptions" (r = .52). All other correlations are r ≤ .41. Again, some associations were non-linear, as displayed in Additional files 2 and 3.

Validity – SCL-27-plus

Correlations between the Stark components and the SCL-27-plus are consistently negative and smaller in magnitude. There is a single medium correlation, between "current depressive symptoms" and the mental component (r = −.47); all others are |r| ≤ .40. The significant non-linear association between "agoraphobic symptoms" and the mental component as well as all linear associations are displayed in Additional files 2 and 4.


The correlation of the mental component with age is close to zero, but there is a significant non-linear effect explaining about 1.7 % of the variance of the mental component (see Additional file 4: Figure S2). Young and old participants reported good QoL, middle-aged participants a relatively poor one (p < .003 for the quadratic effect). The physical component has a strong negative correlation with age (r = −.33). The effect is linear; adding a quadratic term does not contribute significantly (p < .812, see Additional file 4: Figure S2). Gender effects are relatively small, with point-biserial correlations of r = .08 for the mental and r = .10 for the physical component, both non-significant.


Item and score distributions

In this mainly healthy sample, the answers to all items of the Stark QoL lie towards the side of a high quality of life, leading to left-skewed distributions of the scores. This should not necessarily be regarded as negative; the values of the SF-36 show a similar pattern. In research on health-related issues, QoL questionnaires are usually designed in this way, so that they can capture the QoL of severely impaired patients.


The Stark QoL demonstrates good reliability for its physical component, but the estimates vary strongly for its mental component: the value for α was poor, for ωt acceptable and for glb good. The reason for the poor α probably lies in the fact that the mental component has only three items. Adding more items would probably improve the scale.
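How much additional items could help can be gauged with the Spearman-Brown prophecy formula, which predicts the reliability of a scale lengthened by a factor k from its current reliability ρ as kρ / (1 + (k − 1)ρ). The following sketch applies it to the observed α = .63 of the three-item mental component. It assumes that any added items behave like the existing ones (parallel items), which is an idealisation and not a result of this study; the function name is illustrative.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after lengthening a scale by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


alpha_three_items = 0.63
for n_items in (4, 5, 6):
    predicted = spearman_brown(alpha_three_items, n_items / 3)
    print(n_items, round(predicted, 2))  # 4 -> 0.69, 5 -> 0.74, 6 -> 0.77
```

Under this assumption, one or two additional items would lift α to roughly .70–.75.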

The overall fit of the confirmatory factor analysis was good after freeing one covariance in the physical component, and it was still acceptable for the restricted model. Although Hu and Bentler [30] suggested a cut-off of .95 for the CFI, practice has shown that this criterion cannot always be reached. A second important criterion for evaluating a test was clearly satisfied here: most factor loadings were high. The reason why the two items "shopping" and "moving a table" have a correlated error is unclear and should be examined in further research.


Construct validity of the Stark QoL was good. The mental component showed the largest correlations to the SF-36 subscales "Vitality" and "Mental Health", two aspects which are explicitly displayed in the Stark QoL. The third item of the mental component assesses social contact; the score had only a moderate correlation to the SF-36 scale "Social Functioning". The physical component of the Stark QoL clearly has its highest correlation to the SF-36 scale "Physical Functioning". Hence, the physical component displays a high specificity. Such an effect was to be expected, because the pictures displaying physical activity partially capture precisely what is asked using words in the items of the SF-36.


Construct validity regarding the SCL-27-plus is also good. All correlations were negative, on average much smaller than those with the SF-36, and some were close to zero in the physical component. Such a pattern was expected, because the SCL-27-plus does not assess QoL, but psychological complaints. When the correlations of the mental component with current and with lifetime depressive symptoms were compared, the one with current depressive symptoms was significantly higher. Hence, one can draw the conclusion that the Stark QoL measures a state rather than a trait.


The effect of age on the physical component is strong, but linear. The expected value for an 80-year-old is almost 30 points lower than the one for a 20-year-old (exactly 28.59 points). The fact that physical fitness decreases with age is well known. However, it was surprising that the effect was linear here; usually the decline becomes larger in old age. It can be speculated that this is a consequence of the internet sample which constituted the basis here: possibly only elderly people in good health voluntarily participate in such an internet survey. The estimated value of the mental component is also dependent on age, but here the minimum is at an age of about 50, where the estimated value of the mental component is 68. In comparison, both a 20-year-old and a 75-year-old would receive an estimated value of 80. Gender effects were small and non-significant at p > .01. However, in a further study with the Stark QoL, one should consider these differences.
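Under the linear model, the reported difference in the physical component translates into a constant loss per year of age. The following short sketch of this arithmetic uses only the 28.59-point difference between ages 20 and 80 given above; the names are illustrative.

```python
DIFFERENCE_20_TO_80 = 28.59  # points on the 0-100 scale, as reported above
slope_per_year = -DIFFERENCE_20_TO_80 / (80 - 20)


def expected_difference(age_a, age_b):
    """Expected physical-component difference (age_b minus age_a), linear model."""
    return slope_per_year * (age_b - age_a)


print(round(slope_per_year, 2))               # -0.48 points per year
print(round(expected_difference(40, 50), 1))  # -4.8 points per decade
```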

The present study has the following limitations: (1) Data rely on a sample examined via the internet. It is not representative of the German population. It is known that about 88 % of Germans and 68 % of Poles have access to the internet [31], a number which shows that the bias due to the sampling procedure should not be too large. On the other hand, the distribution of the variable "years of formal education" displays a pattern which differs clearly from the general population. (2) This paper reports the construct validity of the questionnaire examined in a relatively healthy sample. It is necessary to conduct studies with patient groups in the future. (3) The Stark QoL was designed for research across various countries, including developing ones. Further research on more diverse samples is needed.


The Stark QoL constitutes an alternative to questionnaires assessing quality of life via worded items. The partly low reliability of the mental component is clearly critical. In a previous study, it showed a somewhat better reliability [9]. There are short QoL measures in use, e.g. [32, 33]. Even the widely used SF-36 has scales with two and three items [11]. Some authors were satisfied with such short scales, others were more critical. We rather belong to the latter. One or two more items should be included in the mental component of the Stark QoL. We are currently considering adding a visual analogue scale for assessing pain and a sort of thermometer with a large heart at the top and a small one at the bottom for assessing happiness.

On the other hand, the Stark QoL is a short and efficient measure for two widely assessed dimensions of quality of life and the pictures may make a questionnaire set a bit livelier than one relying solely on worded items. We placed it at the end of the questionnaire. Additionally, translation into many languages should be easy, and international comparisons could be facilitated with the Stark QoL.


  1. There are various forms of QoL. Since this article focuses only on health related QoL, the term QoL stands for health related QoL throughout.



α: Cronbach's α
AGFI: Adjusted goodness of fit index
CFI: Comparative fit index
GFI: Goodness of fit index
glb: Greatest lower bound
HRQoL: Health related quality of life
RMR: Root mean squared residual
RMSEA: Root mean square error of approximation
QoL: Quality of life
SD: Standard deviation
ωt: McDonald's omega
χ2/df: Chi square degree of freedom ratio


  1. Fimland MS, Vasseljen O, Gismervik S, Rise MB, Halsteinli V, Jacobsen HB, et al. Occupational rehabilitation programs for musculoskeletal pain and common mental health disorders: study protocol of a randomized controlled trial. BMC Public Health. 2014;14:368.

  2. Myleus A, Petersen S, Carlsson A, Hammarroth S, Hogberg L, Ivarsson A. Health-related quality of life is not impaired in children with undetected as well as diagnosed celiac disease: a large population based cross-sectional study. BMC Public Health. 2014;14:425.

  3. Maddison R, Rawstorn JC, Rolleston A, Whittaker R, Stewart R, Benatar J, et al. The remote exercise monitoring trial for exercise-based cardiac rehabilitation (REMOTE-CR): a randomised controlled trial protocol. BMC Public Health. 2014;14:1236.

  4. Ul-Haq Z, Mackay DF, Pell JP. Association between physical and mental health-related quality of life and adverse outcomes; a retrospective cohort study of 5,272 Scottish adults. BMC Public Health. 2014;14:1197.

  5. Nglazi MD, West SJ, Dave JA, Levitt NS, Lambert EV. Quality of life in individuals living with HIV/AIDS attending a public sector antiretroviral service in Cape Town, South Africa. BMC Public Health. 2014;14:676.

  6. Zhou J, Ru X, Hearst N. Individual and household-level predictors of health related quality of life among middle-aged people in rural Mid-east China: a cross-sectional study. BMC Public Health. 2014;14:660.

  7. Nelson EC, Landgraf JM, Hays RD, Wasson JH, Kirk JW. The functional status of patients. How can it be measured in physicians' offices? Med Care. 1990;28(12):1111–26.

  8. Landgraf JM, Nelson EC. Summary of the WONCA/COOP International Health Assessment Field Trial. The Dartmouth COOP Primary Care Network. Aust Fam Physician. 1992;21(3):255–7, 260–2, 266–9.

  9. Hardt J. The Stark QoL - measuring Quality of Life somewhat differently. Research. 2014; Accessed Oct 2015.

  10. UNESCO. Weltbildungsbericht. 2013/2014. Accessed Oct 2015.

  11. Ware Jr JE, Snow KK, Kosinski M, Gandek B. SF-36 health survey: manual and interpretation guide. Boston: New England Medical Center, The Health Institute; 1993.

  12. Hardt J. The Symptom-Check-List-27-plus (SCL-27-plus): a modern conceptualization of a traditional screening instrument. Ger Med Sci Psychosoc Med. 2008;5:doc08. Accessed Oct 2015.

  13. Hardt J, Dragan M, Schultz S, Engfer A. Parent–child Relationships in Poland and Germany: A Retrospective Study. Psychology. 2011;2:502–9.

  14. Muthén LK, Muthén BO. How to use a Monte Carlo study to decide on sample size and determine power. Struct Equ Model. 2002;9:599–620.

  15. Anthoine E, Moret L, Regnault A, Sebille V, Hardouin JB. Sample size used to validate a scale: a review of publications on newly-developed patient reported outcomes measures. Health Qual Life Outcomes. 2014;12:176.

  16. Bullinger M, Kirchberger I. SF-36 Fragebogen zum Gesundheitszustand. Göttingen: Hogrefe; 1998.

  17. Garratt AM, Ruta DA, Abdalla MI, Buckingham JK, Russell IT. The SF-36 health survey questionnaire: an outcome measure suitable for routine use within the NHS? BMJ. 1993;306:1440–4.

  18. Hardt J. Die Symptom-Checkliste-27-plus: Kreuzvalidierung eines neuen Screening-Instrumentes. Zeitschrift für Psychologische Medizin. 2011;22:11–20.

  19. Henn L, Braks K, Brian T, Herke M, Hardt J. Die Symptomcheckliste 27 plus bei Patienten mit Essstörungen. Psychother Psychosom Med Psychol. 2013;63:201–3. doi:10.1055/s-033-1337910.

  20. Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika. 1951;16:297–334.

  21. Cronbach LJ, Shavelson RJ. My current thoughts on coefficient alpha and successor procedures. Educ Psychol Meas. 2004;64:391–418.

  22. Sijtsma K. On the Use, the Misuse, and the Very Limited Usefulness of Cronbach's Alpha. Psychometrika. 2009;74(1):107–20.

  23. Revelle W, Zinbarg RE. Coefficients Alpha, Beta, Omega, and the glb: Comments on Sijtsma. Psychometrika. 2009;74(1):145–54.

  24. McDonald RP. Test Theory: A Unified Treatment. Mahwah: Erlbaum; 1999.

  25. Sijtsma K. Reliability Beyond Theory and Into Practice. Psychometrika. 2009;74(1):169–73.

  26. Bentler PM. Comparative fit indexes in structural models. Psychol Bull. 1990;107:238–46.

  27. StataCorp. Stata Statistical Software: Release 12. 12 ed. College Station, Texas: StataCorp LP; 2011.

  28. Arbuckle JL, Wothke W. AMOS (Version 4). Chicago: Smallwaters; 1999.

  29. R Development Core Team. R: a language and environment for statistical computing. 3.1.3 ed. Vienna: R Foundation for Statistical Computing; 2015.

  30. Hu L, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Structural Equation Modeling. 1999;6(1):1–55.

  31. European Community. Internet World Stats. 2015. Accessed Oct 2015.

  32. Hermann K, Kraus K, Herrmann K, Joos S. A brief patient-reported outcome instrument for primary care: German translation and validation of the Measure Yourself Medical Outcome Profile (MYMOP). Health Qual Life Outcomes. 2014;12:112.

  33. Thomson WM, Foster Page LA, Malden PE, Gaynor WN, Nordin N. Comparison of the ECOHIS and short-form P-CPQ and FIS scales. Health Qual Life Outcomes. 2014;12:36.



This work was funded, in part, by the Heigl-Stiftung Duesseldorf and the Koehler-Stiftung, Essen.

Author information


Corresponding author

Correspondence to Jochen Hardt.

Additional information

Competing interests

The author declares no conflict of interest.

Additional files

Additional file 1:

The Stark QoL version 1 with nine items and answering options. (DOC 1221 kb)

Additional file 2: Table S1.

Tests for nonlinear associations (unstandardized regression coefficients). (DOC 54 kb)

Additional file 3: Figure S1.

Results for nonlinear associations, Stark QoL and SF-36. (DOC 120 kb)

Additional file 4: Figure S2.

Results for nonlinear associations, Stark QoL and SCL-27-plus. (DOC 94 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Hardt, J. A new questionnaire for measuring quality of life - the Stark QoL. Health Qual Life Outcomes 13, 174 (2015).
