
Swahili translation and validation of the Warwick Edinburgh Mental Wellbeing Scale (WEMWBS) in adolescents and adults taking part in the girls’ education challenge fund project in Tanzania



The Warwick Edinburgh Mental Wellbeing Scale (WEMWBS) is validated for measuring mental wellbeing in populations aged 11+ and has been translated into 30+ languages. The aims of this study were a) to translate and validate WEMWBS for use in Swahili-speaking populations to facilitate measurement and understanding of wellbeing, evaluation of policy and practice, and enable international comparisons; and b) to examine sociodemographic characteristics associated with higher and lower mental wellbeing in participants in the Girls’ Education Challenge (GEC) project in Tanzania.


A short questionnaire including WEMWBS and similar scales for comparison, socio-demographic information, and self-reported health was translated into Swahili using gold standard methodology. This questionnaire was used to collect data from secondary school students, learner guides, teacher mentors and teachers taking part in the GEC project in Tanzania. Focus groups were used to assess acceptability and comprehensibility of WEMWBS and conceptual understanding of mental wellbeing. These were audio-taped, transcribed and analysed thematically. Internal consistency of WEMWBS, correlation with comparator scales and confirmatory factor analysis were completed as quantitative validation. Finally, multivariable logistic regression was used to explore associations between individual characteristics and ‘high’ and ‘low’ mental wellbeing, defined as the highest and lowest quartile of WEMWBS scores.


3052 students and 574 adults were recruited into the study. Participants reported that WEMWBS was understandable and relevant to their lives. Both WEMWBS and its short form met quantitative standards of reliability and validity, were correlated with comparator scales and met the criteria to determine a single factor structure. For students in the GEC-supported government schools, mental wellbeing was higher in students in the final two ‘forms’ of school compared with the first two. In addition, being male, urban residence, the absence of markers of social marginality and better self-reported health were all significantly associated with better mental wellbeing. For adults, urban residence and better self-reported health were associated with better mental wellbeing.


The Swahili translation of WEMWBS is available for use. Further work to explore how to intervene to increase mental wellbeing in vulnerable GEC participants is needed.


Understanding and measuring wellbeing is a policy priority. In addition to its very important intrinsic value, wellbeing has instrumental value because it drives population health as well as associated health care costs, and social and economic progress, with impacts on employment, productivity, criminal activity, prosocial behaviours and education [1].

The Warwick Edinburgh Mental Wellbeing Scale (WEMWBS) is validated for measuring mental wellbeing in populations aged 13 and above in the UK [2], with the short-form validated in populations aged 11 and above [3]. It has been translated into more than 30 languages, and many translations and validations have been published [4]. It is collected and reported as part of UK national statistics and has been used extensively to evaluate interventions [5,6,7].

The primary aim of this study was to translate and validate WEMWBS and its short form for use in adolescent and adult Swahili-speaking populations. This will facilitate measurement and understanding of wellbeing in Swahili speaking populations, evaluation of policy and practice where the target populations are Swahili speaking or include Swahili speaking people, and enable international comparisons. The secondary aim of this study was to describe mental wellbeing in the participants of the Tanzanian Girls’ Education Challenge (GEC) fund project, run by the Campaign for Female Education (CAMFED) and funded through the UK’s Foreign, Commonwealth and Development Office. Specifically, we aimed to examine sociodemographic characteristics associated with higher and lower wellbeing in this population.


This was a mixed methods study including qualitative and quantitative data collection to support the primary aim of the study, and quantitative data collection to answer the secondary aim of the study.


Translations followed gold standard methods. Firstly, two translators, both with Swahili as their mother-tongue, independently translated WEMWBS from English into Swahili. Secondly, the translators shared their independent translations and worked together to produce one common consensus version of WEMWBS in Swahili. Notes were taken to document issues addressed and how they were resolved. The Swahili version of WEMWBS was sent to two translators who had never seen the original WEMWBS and they independently translated the Swahili version back into English. Although ideally these would have been people with English as their mother-tongue, our translators were native Swahili speakers with fluent English. The two new English versions and associated notes were sent to Sarah Stewart-Brown (SSB) (one of the WEMWBS developers), who compared these against the original WEMWBS, with any minor points for further discussion noted at this point. Further discussion with a native Swahili speaker (not one of the previous translators), alongside the translations and notes, further refined the translated version into the final Swahili WEMWBS.

Sample size, setting and participants

There is a lack of consensus about how to calculate sample size for validation studies for scales, with recommendations varying from a participant to item ratio of between 2 and 20 participants per item, to absolute recommendations, e.g., Comrey and Lee state 100 = poor, 200 = fair, 300 = good, 500 = very good, ≥ 1000 = excellent [8]. We considered both the sample required for the validation study and our secondary aim (to investigate the determinants and distribution of wellbeing in the population examined). In order to sample enough adults to have an appropriate sample size for the validation in adults, we planned to visit 100 schools, and for this reason we were able to collect data from far more students than necessary for the validation studies alone at little extra expense. These data allowed us to examine the associations between the WEMWBS and socio-demographic characteristics of participants.

CAMFED Tanzania operates within 28 districts (15 rural and 13 peri-urban districts); we purposively selected 5 of these districts based on criteria to allow a cross section of district characteristics (rural, peri-urban; coastal, inland; higher and lower scores on national exams) and bearing in mind practical considerations (for example, journey times for fieldworkers, budgetary constraints). We drew up lists of GEC schools in each of the 5 selected districts and randomly selected schools in each for our sample according to the number of supported schools in the district (between 10–34 selected schools per district = 90 schools in total).

At each school a target sample of 34 students were randomly selected from the school student roll (in a few schools there were fewer than 34 students in the specified forms), additionally 2 teachers were randomly selected from the staff list. Each school has a single teacher mentor and usually 3 learner guides engaged in delivering the GEC fund project all of whom were invited to take part in the study. These approximately 40 individuals per school were invited to participate. The fieldworker visited the school and described the project, giving the opportunity for prospective participants to ask questions, and provided the prospective participants with participant information sheets and consent forms (including second copies of these for parents and guardians if the participant was a student). They were given a minimum of 24 h to consider the information and return the consent form. No student or teacher declined consent although we planned to randomly select a further participant from the appropriate group if that situation arose (including a teacher to replace a teacher mentor in the situation that a teacher mentor declined participation).

Eight schools were purposively chosen for focus groups with students, and either older (forms 3 and 4, usually aged 17+) or younger (forms 1 and 2, usually aged 15+) adolescents from that school were invited to participate (four focus groups of younger or two of older adolescents respectively). Eight different schools were purposively chosen for focus groups with adults and all teachers and learner guides taking part in quantitative data collection at those schools were invited to participate. For the adult focus groups, teachers or learner guides came together from across participating schools for the four focus groups. The schools were chosen in order to reflect varying school characteristics. Equal numbers of urban and rural schools participated in qualitative data collection.

Data collection

A questionnaire to collect quantitative data was designed by the research team comprising OO, SSB, LB, DK and LW, based on questions that have been used in existing established survey questionnaires. It was designed to take approximately 15 min to complete.

The quantitative data set included: the WEMWBS, comparator questions (The World Health Organisation- Five Well-being Index (WHO5), the 12-item General Health Questionnaire (GHQ-12), Office for National Statistics-4 (ONS-4) and self-reported health), socio-demographic variables (age, sex, ‘marginality indicators’ (a set of 20 questions based on Tanzania’s national guidelines for the Care and Support of the Most Vulnerable Children)) and questions relating to GEC exposure, collected directly from participants via questionnaires on tablets using the Open Data Kit application. Please note that the short form of WEMWBS (SWEMWBS) uses 7 of the 14 WEMWBS items.

Data collection was completed by a team of 20 fieldworkers from the Tanzanian chapter of CAMA, the pan-African network of educated young African women (graduates of the CAMFED programmes), after training by the CAMFED Monitoring and Evaluation team and OO, and with continuous supervision by DK, the Head of Monitoring and Evaluation at CAMFED.

At the end of every day of data collection, the data were submitted to the CAMFED server for secure storage. These data were checked for quality, specifically that they matched what was expected (for example, the number of participants) and that there were no duplicates.

Qualitative data were collected in 12 focus groups run after quantitative data collection in a subset of the schools, as part of the validation, in order to ask about participants’ experiences of completing the questionnaire, particularly their thoughts on the different wellbeing questions asked. In addition, the topic guide included questions on the concept of mental wellbeing more generally. Focus groups were recorded in Swahili on an encrypted audio recording device then transcribed by the CAMA fieldworkers who collected the data. Data were anonymised during transcription. Translation was then completed by CAMA fieldworkers and some English teachers based within GEC secondary schools, known to CAMFED and previously commissioned for similar work. Twelve focus groups were used, based on pragmatic considerations and with the expectation that data saturation would be reached (it was).

Qualitative analysis

Both inductive and deductive qualitative analytical techniques were used to analyse the content of the transcripts in English. This analysis was part of the validation and explored the acceptability and comprehensibility of the WEMWBS tools (and comparator tools) as well as what the concept of wellbeing means to the participants and whether WEMWBS captured this (with the deductive analysis specifically testing whether concepts relating to WEMWBS were present in the testimony). Coding was completed by one researcher (BW), with reflection with a second researcher (OO) throughout the process.

Confirmatory factor analysis

Confirmatory factor analysis (CFA) was performed for validation of the Swahili (S)WEMWBS in three steps: (1) specification of the theoretical model, (2) modification based on potentially misspecified parameters, and (3) assessment of the global fit of the modified model and of the relative difference (RD) of parameters between the theoretical and the modified model, \(\mathrm{RD} = \frac{\theta_{\text{modified}} - \theta_{\text{theoretical}}}{\theta_{\text{modified}}}\). The modified model was considered appropriate when (i) no parameters were severely misspecified, (ii) RD in loadings compared to the theoretical model was negligible (max. RD < 10%), and (iii) global fit indices fell within pre-defined ranges of acceptability. We used thresholds for good fit (Comparative Fit Index (CFI) and Tucker-Lewis Index (TLI) > 0.95, Root Mean Square Error of Approximation (RMSEA) and Standardised Root Mean Square Residual (SRMR) < 0.06) and acceptable fit (CFI and TLI > 0.90, RMSEA and SRMR < 0.08). This three-step CFA allowed us to model sources of error that could be substantively irrelevant (i.e., produced negligible parameter bias or RD) but improved the precision of reliability indices (diminishing the risk of overestimating scores' reliability).
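As an illustration of criteria (ii) and (iii), the following minimal Python sketch computes RD for paired loadings and labels a set of global fit indices using the thresholds stated above. It is not the code used in the study: the function names are hypothetical, and taking the absolute value of RD before comparing it with the 10% criterion is our assumption.

```python
from typing import Sequence


def relative_difference(theta_modified: float, theta_theoretical: float) -> float:
    """RD = (theta_modified - theta_theoretical) / theta_modified."""
    return (theta_modified - theta_theoretical) / theta_modified


def max_abs_rd(modified: Sequence[float], theoretical: Sequence[float]) -> float:
    """Largest RD magnitude across paired parameters (e.g. loadings).

    Assumption: the 10% criterion is applied to the magnitude of RD.
    """
    return max(abs(relative_difference(m, t)) for m, t in zip(modified, theoretical))


def classify_global_fit(cfi: float, tli: float, rmsea: float, srmr: float) -> str:
    """Label global fit using the thresholds given in the text."""
    if cfi > 0.95 and tli > 0.95 and rmsea < 0.06 and srmr < 0.06:
        return "good"
    if cfi > 0.90 and tli > 0.90 and rmsea < 0.08 and srmr < 0.08:
        return "acceptable"
    return "outside acceptable range"
```

For example, `max_abs_rd([0.60, 0.50], [0.59, 0.48]) < 0.10` would indicate that the modification left the loadings substantively unchanged (the loading values here are invented for illustration).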

We used McDonald's ω as a reliability index because it accounts for factor structure (including, for example, correlated residuals) and is more appropriate when the loadings vary; α is reported for discussion. Two sets of cut-off values have been considered for ω. The first is a traditional, scaled set of cut-off values: excellent (> 0.90), good (> 0.80), acceptable (> 0.70), questionable (> 0.60) and poor (> 0.50). The second is practical: adequate for general purposes or research (> 0.70) and best for high-stakes decisions (> 0.90). We also evaluated factor score determinacy (FD: acceptable > 0.80, good > 0.90; Grice, 2001) for use of loading-based scores and construct replicability (H: acceptable > 0.70, good > 0.85; Hancock & Mueller [9]) for specifying SEM.
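To make the role of correlated residuals concrete, the sketch below computes ω and H for a unidimensional model from standardised loadings, using the textbook formulas \(\omega = (\sum\lambda)^2 / [(\sum\lambda)^2 + \sum\theta_{ii} + 2\sum\theta_{ij}]\) and \(H = [1 + (\sum \lambda^2/(1-\lambda^2))^{-1}]^{-1}\). It is a simplified illustration under stated assumptions, not the study's procedure: residual variances are taken as \(1-\lambda^2\), the function names are hypothetical, and no adjustment for ordinal indicators (which dedicated software may apply) is attempted.

```python
from typing import Sequence


def omega_unidimensional(
    loadings: Sequence[float],
    residual_covariances: Sequence[float] = (),
) -> float:
    """McDonald's omega for a one-factor model with standardised loadings.

    Assumption: each residual variance equals 1 - loading**2.
    Residual covariances (one value per correlated pair) enter the
    denominator twice, so positive covariances lower omega.
    """
    common = sum(loadings) ** 2
    unique = sum(1.0 - lam * lam for lam in loadings)
    correlated = 2.0 * sum(residual_covariances)
    return common / (common + unique + correlated)


def construct_replicability_h(loadings: Sequence[float]) -> float:
    """Hancock and Mueller's H from standardised loadings."""
    ratio_sum = sum(lam * lam / (1.0 - lam * lam) for lam in loadings)
    return 1.0 / (1.0 + 1.0 / ratio_sum)
```

With four invented loadings of 0.5, `omega_unidimensional([0.5] * 4)` is 4/7, and adding one residual covariance of 0.1 lowers it to 4/7.2. This is the sense in which ignoring positively correlated residuals risks overestimating reliability.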

Measurement invariance was examined following Wu and Estabrook’s [10] recommendations for models with ordinal indicators (such as WEMWBS), assessing changes in the same fit indices used for global fit evaluation. Following Rutkowski and Svetina [11], changes in CFI of up to -0.02 and in RMSEA of up to 0.03 were considered appropriate for tests of metric/weak invariance, while ΔCFI ≥ -0.01 and ΔRMSEA ≤ 0.01 were considered appropriate for scalar/strong invariance tests. We also considered Chen's [12] recommendation of a change in SRMR ≤ 0.030 (for metric invariance) or ≤ 0.015 (for scalar or residual invariance).
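These decision rules can be written as a small check. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes each Δ is computed as the more constrained model's index minus the less constrained model's index.

```python
# Cut-offs from the text: (minimum delta CFI, maximum delta RMSEA, maximum delta SRMR)
INVARIANCE_THRESHOLDS = {
    "metric": (-0.02, 0.03, 0.030),
    "scalar": (-0.01, 0.01, 0.015),
}


def meets_invariance(
    delta_cfi: float, delta_rmsea: float, delta_srmr: float, level: str
) -> bool:
    """Return True when all three fit-index changes are within the cut-offs.

    Assumption: deltas are (constrained model) - (less constrained model).
    `level` is "metric" (weak) or "scalar" (strong).
    """
    min_dcfi, max_drmsea, max_dsrmr = INVARIANCE_THRESHOLDS[level]
    return (
        delta_cfi >= min_dcfi
        and delta_rmsea <= max_drmsea
        and delta_srmr <= max_dsrmr
    )
```

For instance, a hypothetical comparison with ΔCFI = -0.015, ΔRMSEA = 0.004 and ΔSRMR = 0.010 would pass the metric criteria but fail the scalar ones.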

The data for indicators in all instruments were skewed due to increased concentration in the higher (better) values, with endorsement of the highest response category over 40% in some items. All instruments presented sparse data, with more than half of their items having < 0.5% of endorsement in the lowest (i.e., worst scoring) response category. Therefore, following the guidelines of DiStefano et al. [13], we evaluated each model both with and without collapsed response categories. Models were fitted using the unweighted least squares estimator with mean and variance adjustment (scale-shifted approach, ULSMV) since it is best suited for ordered data with skewed distributions.

Details and justification about cut-off values, specification of the models, assessment, and modification of misspecified parameters, and chosen estimator are given in Supplemental Materials.

Multivariable analysis

To address the secondary aim of this study, we defined two groups within the population: those with low mental wellbeing, scoring below the lowest quartile in our data, and those with high mental wellbeing, scoring above the highest quartile. Logistic regression models were used to generate the odds of low mental wellbeing compared to the rest of the population and the odds of high mental wellbeing compared to the rest of the population, in two sets of models. Unadjusted models included each characteristic in turn (gender, age group, location (urban vs rural), self-reported health, form (adolescents only), marginality (adolescents only), education (adults only), role (adults only), main occupation (adults only)). Fully adjusted multivariable models included a subset of characteristics of interest, selected based on theory to reduce multicollinearity (sex, location, self-reported health, form (adolescents only), marginality (adolescents only), role (adults only) and education (adults only)).
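The grouping step and an unadjusted odds ratio can be sketched as follows. This Python example is a simplified illustration, not the Stata analysis used in the study: the function names and the example counts are hypothetical, the quartile cut points come from the standard library's default interpolation (which may differ from other software), and the odds ratio is computed from a 2×2 table for a single binary characteristic rather than from a fitted regression model.

```python
from statistics import quantiles
from typing import List, Sequence


def wellbeing_groups(scores: Sequence[float]) -> List[str]:
    """Label each score 'low' (below Q1), 'high' (above Q3) or 'middle'."""
    q1, _median, q3 = quantiles(scores, n=4)
    return ["low" if s < q1 else "high" if s > q3 else "middle" for s in scores]


def odds_ratio(
    exposed_cases: int,
    exposed_noncases: int,
    unexposed_cases: int,
    unexposed_noncases: int,
) -> float:
    """Unadjusted odds ratio from a 2x2 table.

    'Cases' are participants in the outcome group (e.g. low wellbeing);
    'exposed' are those with the characteristic of interest (e.g. urban).
    """
    return (exposed_cases / exposed_noncases) / (unexposed_cases / unexposed_noncases)
```

With invented counts, `odds_ratio(20, 80, 40, 60)` gives 0.375, meaning the exposed group has lower odds of the outcome. Adjusting for several characteristics at once, as in the multivariable models, requires fitting a logistic regression in statistical software.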


CFA and the correlation among test scores were modelled within the R language and environment (R Core Team, 2020) using the lavaan package [14] and related helper functions within the semTools package [15] such as reliability (for α and ω) and miPowerFit (for EPC and MI analysis), as well as the BifactorIndicesCalculator [16] for H and FD indices. Qualitative analysis was completed using Microsoft Word. Descriptive statistics, univariable and multivariable regression analyses were conducted in StataIC version 16.1.


3052 students and 574 adults were recruited into the study and provided data to the quantitative analyses. The characteristics of the included participants are shown in Table 1 (students) and Table 2 (adults). The 96 qualitative participants were a subset of these (32 adults in 4 focus groups: 16 learner guides, 16 teachers; and 64 students in 8 focus groups: 32 aged 15–16, 32 aged 17 +).

Table 1 Characteristics of Student Survey Participants
Table 2 Characteristics of Adult Survey Participants

Qualitative validation

Overall WEMWBS was understood by the participants, and participants felt the way they answered would give someone a good indication of their wellbeing. The participants reported finding it easy to complete and answering honestly with frequent use of words such as “self-explanatory” and commenting on the relevance of the questions to their lives.

Learner-guide: I was free and I answered honestly from the heart because it is something I deal with every day

Some specific WEMWBS items (confidence, feeling cheerful, dealing with problems well and feeling relaxed) were directly referenced when participants discussed what they considered to be important indicators of good wellbeing.

Learner-guide: If you are in good wellbeing you will have peace and you will also be happy and enjoy your life.

Participants also presented the idea that mental wellbeing includes making a positive contribution to society, and considered that personal wellbeing might be related to community wellbeing. One student also talked about obedience, which might be considered a way of relating positively to the community.

Teacher: Good wellbeing means a lot to me and to someone else. First I grow confident, I have good relationships with other people, thirdly it brings me positive development, because I grow up confident, I am healthy because good wellbeing plays a big part in the health and life of the community around you, so I see it leads to development even in your community.

Student: Good wellbeing is the one that leads a person to do something that he has agreed to for the whole community

Student: Wellbeing is being disciplined and obedient all the time.

When asked to describe poor mental wellbeing participants across the focus groups suggested criteria relating to problem behaviour and its consequences, as well as negative feelings and perspectives; some responses included:

Student: He can also be a person all the time knowing that citizens hate him because of the behaviour he goes with which is not good, he can also be lonely due to his own situation.

Student: He can also have negative thoughts and decide to do something wrong. Sometimes he can be harsh eg he can rob, or hurt others.

Student: actually it’s not easy to identify them but through different researchers these people have pain, feels lonely and abandoned. Secondly these people don’t trust themselves when they’re in front of people also feel unloved and separate themselves from people, also they feel scared and don’t trust themselves.

Student: sometimes they can use marijuana.

Teacher: most of the time he will do things that are against the school and the community for instance if a person is a drunkard or a thief in that community they will not like him.

All groups liked being asked positively framed questions and several commented on not liking the negatively framed questions included in the survey. Certainly the testimony indicated that WEMWBS was understood, generally applicable to the participants’ lives and captured the main elements of mental wellbeing as they described it. There were no obvious differences in conceptualisation of mental wellbeing between the students and the adults who participated.

Quantitative validation

Sparse data and collapsed response categories

The dataset presented sparse data for WEMWBS, i.e., response categories with < 10% endorsement in several items. Because this can produce significant differences in parameter estimates, standard errors, chi-square values, and chi-square-based fit indices, we calculated CFA models with both sparse and collapsed data (merging low-endorsement response categories with adjacent values) before assessing their psychometric qualities. Using collapsed response categories in the presence of sparse data has been shown to improve the precision of parameter estimates and standard errors with the ULSMV estimator and is a better option than using robust maximum likelihood estimators while treating the data as interval numeric [13]. Since the parameter estimates with collapsed data presented low RD (only three items among all instruments with RD > 10%), we reported the results with non-collapsed data to avoid overestimating fit or reliability indices. In particular, (S)WEMWBS models presented negligible parameter RD when response categories were collapsed (max RD = 2.76%).

Only for measurement invariance did we use collapsed versions of WEMWBS' items 1, 6, 7, and 12. We merged the lowest (worst) response category with the second-lowest because at least one group (e.g., adults, males) presented zero observations in the lowest one.
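The collapsing step described above can be illustrated with a short Python sketch. It is not the study's code: the function names are hypothetical, and the example assumes items coded 1 to 5 with 1 as the lowest (worst) category.

```python
from collections import Counter
from typing import Dict, List, Sequence


def endorsement(responses: Sequence[int]) -> Dict[int, float]:
    """Proportion of responses falling in each observed category."""
    counts = Counter(responses)
    total = len(responses)
    return {category: counts[category] / total for category in sorted(counts)}


def collapse_lowest(responses: Sequence[int], lowest: int = 1) -> List[int]:
    """Merge the lowest category into the second-lowest one.

    Assumption: categories are consecutive integers, so the adjacent
    category is `lowest + 1`.
    """
    return [lowest + 1 if r == lowest else r for r in responses]
```

Inspecting `endorsement(item)` for each group shows whether a category is empty or sparse in that group; `collapse_lowest(item)` then yields an item with one fewer category that can be compared across groups.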


The theoretical model for WEMWBS (unidimensional, no correlated residuals) presented acceptable global fit indices (Table 3). Following the analysis of MI and EPC, we included three correlated residuals in the model: items 1–2 (first two items of the instrument), 4–13 (both about feeling "interested"), and items 8–14 (concerned with feeling "good"/"cheerful"). This model presented optimal global fit indices. The RD of factor loadings between the theoretical and modified models was negligible (mean = 2.1%, max = 4.9%), indicating that the correlations of residuals were substantively irrelevant, affecting the reliability of the scale but not the meaning of the measured construct.

Table 3 Global fit and reliability indices for theoretical, modified and collapsed-category models of (S)WEMWBS

For SWEMWBS, the initial model showed optimal fit except for RMSEA. As with WEMWBS, correlation of residuals for items 1–2 was modelled, resulting in acceptable RD in loading parameters (mean = 3.6%, max = 9.0%) and optimal global fit indices (Table 3).

Based on the models with correlated residuals, unit-weighted composite scores for WEMWBS and SWEMWBS presented practically equal and acceptable levels of reliability (ω = 0.71 and 0.70). Nonetheless, the factor scores for WEMWBS offered higher determinacy (FD = 0.94 vs 0.89) and replicability (H = 0.88 vs 0.79), positioning it in the optimal range for both (FD ≥ 0.90, H ≥ 0.85) while SWEMWBS fell in the acceptable range (FD ≥ 0.80, H ≥ 0.70).

Factor loadings were mostly good (≥ 0.50) for both sets of items, with only item 4 falling significantly below 0.50 (0.35; 95%CI = 0.32–0.38) in the WEMWBS and item 13 also being < 0.50, though not significantly so (0.48; 95%CI = 0.45–0.51) (Table 4). Neither item is present in the short WEMWBS. WEMWBS and SWEMWBS appeared to measure the same construct since the RDs of the loadings of the common items were low (mean = 3.6%, max = 9.0%).

Table 4 Factorial loadings for theoretical, modified and collapsed-category models of (S)WEMWBS

Models with collapsed categories presented negligibly better fit indices than the correlated-residuals models, and practically identical reliability coefficients or factorial loadings (maximum difference < 0.02 for either loadings or reliability).

Other instruments’ CFA

The same rationale was applied to the other instruments we included in the survey questionnaire for the purposes of validation. WHO-5 did not present any significantly misspecified parameter. ONS-4 presented severely misspecified parameters, though releasing constraints was not feasible due to the low number of items and, therefore, lack of degrees of freedom.

Among these instruments, only WHO-5's unit-weighted composite scores were acceptable and adequate for general purposes such as research (ω ≥ 0.70), though not for high-stakes decisions (ω < 0.90). On the other hand, the factor scores of all instruments except ONS-4 presented acceptable (FD ≥ 0.80, H ≥ 0.70) to good (FD ≥ 0.90, H ≥ 0.85) determinacy and replicability. GHQ-12 offered an interesting example of the relevance of accounting for factorial structure when assessing the reliability of scores. Although α (not informed by factorial structure) was acceptable (≥ 0.70) or good (≥ 0.90), and was the same regardless of the presence of method factors, ω was acceptable only in the severely misspecified unidimensional model and in one of the three correlated factors. In the models with method factors accounting for method-implied correlated residuals, ω was either questionable (< 0.70) or poor (< 0.50). This implies that GHQ-12's unit-weighted composite scores are unsuited for practical use.

Measurement invariance

Both WEMWBS and SWEMWBS met all criteria for strong measurement invariance (ΔCFI ≥ -0.01, ΔRMSEA ≤ 0.01, ΔSRMR ≤ 0.015) across gender, rural/urban districts, and age groups (Table 5, only WEMWBS shown). All instruments were invariant across rural/urban districts. WHO-5 also presented strong invariance across age and gender. ONS-4 showed strong invariance across gender, and partial invariance across age groups (releasing the intercept of item 2). GHQ-12 was partially invariant across age (released intercept of item 8) and gender (released intercepts of items 3, 4, 5, and 9).

Table 5 WEMWBS’s measurement invariance across sex, rural/urban districts, and age group

We also compared the reliability of the scores by group. Both unit-weighted composites and factor scores for the adult group were more reliable for the WEMWBS (ω = 0.79 vs 0.71; H = 0.92 vs 0.88), SWEMWBS (ω = 0.78 vs 0.69; H = 0.85 vs 0.78), and WHO-5 (ω = 0.80 vs 0.73; H = 0.88 vs 0.80) than for the other instruments examined. In conjunction with the analysis of measurement equivalence through CFA, this means that scores in the teenager population carry more "noise", despite measuring the same construct with substantively equal meaning given a specific value. Differences in reliability across gender or rural/urban districts were ≤ 0.02 in ω, H, and FD coefficients.

Criterion validity

Variables correlated as expected, though with weaker coefficients than anticipated (Table 6). When analysing these correlations within groups, we observed that the correlations in the adult sub-sample were closer to the expected values based on previous studies, while in the student sample some correlations were significantly weaker. For example, in adults WEMWBS and SWEMWBS presented higher correlations with WHO-5 (Δρ = 0.11 and 0.15 respectively), ONS-4 (Δρ = 0.11 and 0.13), and self-reported health (Δρ = 0.19 and 0.22).

Table 6 Correlation of (S)WEMWBS scores with scores of related instruments and their differences between students and adults

No significant differences in correlation coefficients were observed across gender or rural/urban districts.

Secondary aim: Distribution and factors associated with mental wellbeing

In the student sample, the mean WEMWBS score was 53.5, with a standard deviation of 9.04. Univariable models found that being male and living in an urban area were associated with better mental wellbeing. The odds of low wellbeing decreased with age (with no corresponding increase in the odds of high wellbeing with age). Adolescents in higher forms (more years of education) had better mental wellbeing than those in lower forms. Adolescents with better self-reported health had better wellbeing than those with poor self-reported health. As expected, those classified as socially ‘marginalised’ had poorer mental wellbeing than those who were not marginalised, and the more marginality indicators an adolescent reported as applying to them, the poorer their mental wellbeing was likely to be. Those selecting the third option “other” as their gender appeared to have the poorest mental wellbeing (Table 7).

Table 7 Univariable associations between mental wellbeing (WEMWBS) and individual characteristics in the student sample

In the adult sample, the mean WEMWBS score was 55.9 with a standard deviation of 8.34. In the univariable analyses, the only significant association with high mental wellbeing was with high self-reported health. There were more significant associations between individual characteristics and low mental wellbeing: increasing age was associated with lower odds of low mental wellbeing; compared with learner guides, teacher mentors and teachers were less likely to have low wellbeing; urban adults were less likely to have low wellbeing than rural adults; more educated adults were less likely to have low wellbeing than less educated adults; and skilled manual workers were more likely to have low wellbeing than teachers (Table 8).

Table 8 Univariable associations between mental wellbeing (WEMWBS) and individual characteristics in the adult sample

In the student multivariable model, males had 0.6 (95% CI 0.51–0.72) times the odds of low wellbeing and 1.41 (95% CI 1.20–1.66) times the odds of high wellbeing compared to females. The number of students selecting “other” for their sex was too small to draw firm conclusions. Students in forms 3 and 4 had significantly lower odds of low mental wellbeing than students in form 1 (0.63 (95% CI 0.50–0.80) and 0.75 (95% CI 0.57–0.99) respectively). Students in form 3 also had significantly higher odds of high mental wellbeing compared with students in form 1 (1.46 (95% CI 1.20–1.66)). Urban students had lower odds of low mental wellbeing and higher odds of high mental wellbeing compared with rural students (0.61 (95% CI 0.49–0.76) and 1.25 (95% CI 1.03–1.53) respectively). Students categorised as marginalised had higher odds of low mental wellbeing and lower odds of high mental wellbeing (1.76 (95% CI 1.16–2.67) and 0.56 (95% CI 0.41–0.75) respectively). Finally, there were increasing odds of low mental wellbeing and decreasing odds of high mental wellbeing for those self-reporting poorer health (Table 9).

Table 9 Student sample multivariable model results

In the adult multivariable model, the only significant associations were for location (with urban adults less likely to have low mental wellbeing OR 0.51 (0.31–0.83)) and for self-reported health with both low and high mental wellbeing (Table 10).

Table 10 Adult multivariable analysis


Overall, with respect to our primary aim: the Swahili translation of WEMWBS and its short form were applicable, understood, and relevant to GEC participants which was demonstrated through the qualitative data and through the 100% completion rate of the survey. Additionally, they met quantitative tests of reliability and validity i.e.: they were correlated with comparator scales and met the criteria to determine a single factor structure. This Swahili translation of WEMWBS is now available for use ( With respect to the secondary aim of the study: for students in the GEC supported government schools mental wellbeing is higher in students in the final two ‘forms’ of school compared with the first two. In addition, being male, urban residence, the absence of markers of marginality and better self-reported health were all significantly associated with better mental wellbeing. For adults, urban residence and better self-reported health were associated with better mental wellbeing.

In our study, both unit-weighted composites and factor scores for the adult group were more reliable for WEMWBS, SWEMWBS, and WHO-5 than for ONS-4 and GHQ-12. This implies that, for this adult sample, the reliability of (S)WEMWBS and WHO-5 is distinctly higher than that of the other instruments we collected. WHO-5’s scores were the only ones with acceptable reliability (omega > 0.70) besides those of (S)WEMWBS. On the one hand, this makes the correlation between (S)WEMWBS and WHO-5 scores the most interesting for assessing criterion validity, since the other scores are not as reliable. On the other hand, it seems sensible to recommend the use of either version of (S)WEMWBS in further wellbeing studies when the interest is in mental wellbeing specifically, and not physical wellbeing, since WHO-5 conflates both into a general wellbeing measurement. If such general wellbeing were the exclusive interest, WHO-5 might produce more reliable scores than ONS-4. Although WEMWBS and GHQ-12 have slightly different foci (mental wellbeing vs ill-health), the unsatisfactory reliability of the observed GHQ-12 scores in the population studied in this validation might imply that WEMWBS is a better option for studying mental health in general populations in Tanzania. This would be all the more relevant if the length of the questionnaire were of concern, since SWEMWBS offers an economical alternative with no loss of reliability when the intended use is unit-weighted composite scores. Of course, these observations on the adequacy of structural models and the reliability of scores of each instrument need to be studied further in other Swahili-speaking populations.

It was interesting that one aspect of mental wellbeing alluded to by focus group participants was the contribution of individual wellbeing to community wellbeing. Those with higher wellbeing were thought to make a positive contribution to the wider community. This attribute of mental wellbeing is not captured by WEMWBS, other than through the item ‘I feel useful’, or by any of the other tools included in our study. Amongst other available wellbeing measures, the mental health continuum short form may have items that capture this element of mental wellbeing and could be further explored for use in Tanzania. For example, under the mental health continuum question “During the past month, how often have you felt the following?”, items include “Part of your community” and “important to your society” [17].

Average scores in the population of this study were higher than in other populations [18,19,20], and there was a tendency for sparse data in the lowest response categories for some items, particularly among men. This general uplift in average score may be related to some stigmatizing attitudes towards poor mental health expressed in the focus groups, and an increased likelihood of social desirability bias. As a self-completion questionnaire, WEMWBS might partially mitigate social desirability bias, and its positively framed approach to mental health may be more acceptable to this population. Despite this, it is possible that mean population norms from this population are inflated, which could have implications when comparing with other populations. Men may have better wellbeing, or may have greater score inflation, perhaps because gender norms influence the degree of social desirability bias. There is evidence of this from other cultural contexts [21], and this could be explored further. Measurement invariance analysis would be valuable to mitigate this possible effect when comparing different populations' scores.

In terms of the epidemiology of mental wellbeing in our GEC population, one of the striking findings in adolescents is the improvement in mental wellbeing in Form 3. Students sit exams at the end of their Form 2 year, and it may be that exam stress undermines wellbeing as adolescents approach these exams, or there may be survivor bias (the students who ‘survive’ the exam and continue with their education in Form 3 are the most resilient). Better wellbeing among male students compared with female students, and compared with those who did not wish to indicate their sex (although those numbers were small), is in line with literature that suggests greater rates of minor mental illness among females compared with males [22] and poor wellbeing among transgender people, who often face stigma, discrimination and exclusion [23]. Better wellbeing among those with fewer marginality indicators was also expected, as these marginality indicators all suggest disadvantage for the students. The well-evidenced bidirectional link between physical and mental health is likely responsible for the association between better wellbeing and better self-reported health [24]. The urban advantage is likely due to better access to resources and opportunities in urban areas. It was unexpected that there was no gender difference in mental wellbeing for adults. We hypothesise that this is because our adult sample is not representative and may include particularly resilient women (secondary school completers and teachers).

Strengths and limitations

This was a well-powered study, using gold standard translation to develop a version of WEMWBS in Swahili. The methodological decisions for CFA aimed for an analysis that meets the standards of the literature better than is often seen in similar practical applications. This allowed a more rigorous and informative assessment of the strengths and limitations of WEMWBS. For example, examining the data as ordinal with an adequate estimator allowed us to examine the implications of sparse data in the lowest (worst) response categories, thereby tackling a plausible limitation of the data and exploring potential explanations for general tendencies, such as the high scores when contrasted with previous international studies.
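To make the issue of sparse low response categories concrete, the sketch below shows one option discussed in the methodological literature [13]: merging sparse low categories of an ordinal item into the next category up. It is an illustration in Python with made-up responses and a hypothetical helper name and threshold, and it is not a description of the procedure applied in this study.

```python
from collections import Counter


def collapse_sparse_low(responses, min_count):
    """Merge the lowest ordinal categories upward until the bottom one is not sparse.

    The new floor is the lowest category whose cumulative count (itself plus all
    categories below it) reaches min_count; lower responses are recoded to it.
    """
    counts = Counter(responses)
    cumulative = 0
    floor = max(counts)  # fallback if min_count is never reached
    for category in sorted(counts):
        cumulative += counts[category]
        if cumulative >= min_count:
            floor = category
            break
    return [max(r, floor) for r in responses]


# Made-up 5-point item with very few responses in categories 1 and 2.
item = [1] * 2 + [2] * 3 + [3] * 40 + [4] * 80 + [5] * 75
collapsed = collapse_sparse_low(item, min_count=10)
print(sorted(Counter(collapsed).items()))
```

Here categories 1 and 2 are folded into category 3, so the item is treated as having three ordered levels rather than five.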

Another example of the high-quality quantitative analysis was the use of omega instead of alpha, which allowed us to observe that the scores of WEMWBS and SWEMWBS are similarly reliable (0.71 and 0.70 respectively) when used as unit-weighted composites, whereas an assessment based on alpha would have indicated that WEMWBS gave more reliable scores (0.88 vs 0.79). We further took advantage of this through the systematic evaluation and reporting of the correlated residuals included in the models, alongside their effects, or lack thereof, on the substantive meaning of the scores, since these correlations were taken into account by omega and were further assessed by examining their RD against the theoretical model’s loadings. Reporting H and FD also informed us that, although the scores of the Swahili versions of both SWEMWBS and WEMWBS are acceptable for calculating factor scores or fitting measurement models in an SEM framework, the full WEMWBS is better for these purposes. This information might guide decisions on which version to use, depending on the resources available and the goals of the implementation. We also see advantages in the use of CFA to assess the scores of the other instruments intended for examining criterion validity, and in the resulting practical recommendations for measuring wellbeing in the target population.
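The gap between alpha and omega can be illustrated with a small numerical sketch. The Python code below uses made-up standardised loadings and a single correlated residual for a one-factor model with continuous indicators; it is a simplification, not the ordinal analysis reported in this study, and the function names are hypothetical. The point it shows is that a residual covariance is counted as shared variance by alpha but as error by an omega computed from the model.

```python
import numpy as np


def cronbach_alpha(cov):
    """Cronbach's alpha from an item covariance (or correlation) matrix."""
    k = cov.shape[0]
    return k / (k - 1) * (1 - np.trace(cov) / cov.sum())


def omega_with_residual_cov(loadings, theta):
    """Omega for a single-factor model with factor variance fixed to 1.

    theta is the full residual covariance matrix, so off-diagonal
    (correlated residual) terms are counted as error, not true score.
    """
    common = loadings.sum() ** 2
    return common / (common + theta.sum())


# Illustrative values only: 7 items, equal standardised loadings of 0.6,
# and one pair of items with a residual covariance of 0.2.
loadings = np.full(7, 0.6)
theta = np.diag(1 - loadings ** 2)
theta[0, 1] = theta[1, 0] = 0.2

implied_cov = np.outer(loadings, loadings) + theta  # model-implied covariance
alpha = cronbach_alpha(implied_cov)
omega = omega_with_residual_cov(loadings, theta)
print(f"alpha = {alpha:.3f}, omega = {omega:.3f}")
```

With the residual covariance set to zero, the two coefficients coincide in this equal-loading example; adding the correlated residual raises alpha and lowers omega.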

By using a mixed methods approach, we were able to examine the acceptability of the translated (S)WEMWBS for measuring mental wellbeing for both researchers and participants. This also allowed us to examine whether qualitative participants discussed anything challenging or unexpected about the items that quantitative analyses picked up as less consistent (for example, WEMWBS items 4 and 13); in this case the qualitative findings were reassuring.

It is worth noting that neither the student nor the adult sample is representative of the general population in Tanzania: the student sample is particularly deprived, being drawn from schools identified for extra support through the Girls’ Education Challenge fund. Meanwhile, the adult sample is dominated by teachers and is likely to be privileged compared with an adult sample from the general population. We completed data collection in August 2020, during the COVID-19 pandemic. Students had recently returned to schools, but everyone was in the midst of change and uncertainty. This means it is unlikely that we can generalise the findings from this study to students and adults in Tanzania in general.


Swahili WEMWBS has been validated for use in populations aged 15+ years. It can now be accessed and used to measure mental wellbeing in relevant groups. When making international comparisons, it is important to consider measurement invariance analysis, given the evidence presented here of higher mean scores that may be rooted in social desirability bias.

Our analysis is being used by CAMFED to develop their life skills programme and teacher training contents, in order to improve mental wellbeing throughout the GEC funded schools to promote better educational outcomes for their students.

Availability of data and materials

The datasets generated and analysed during the current study are available from the corresponding author on reasonable request.



Abbreviations

CAMFED: Campaign for Female Education
CFA: Confirmatory Factor Analysis
CFI: Comparative Fit Index
GEC: Girls’ Education Challenge
GHQ-12: The 12-item General Health Questionnaire
ONS-4: Office for National Statistics-4
RD: Relative Difference
RMSEA: Root Mean Square Error of Approximation
SRMR: Standardized Root Mean Square Residual
SWEMWBS: Short Warwick Edinburgh Mental Wellbeing Scale
TLI: Tucker-Lewis Index
WEMWBS: Warwick Edinburgh Mental Wellbeing Scale
WHO-5: The World Health Organisation-Five Well-being Index


  1. Maccagnan A, Wren-Lewis S, Brown H, Taylor T. Wellbeing and Society: Towards Quantification of the Co-benefits of Wellbeing. Social Indicators Research. 2018;141:217–43.

  2. Clarke A, Friede T, Putz R, Ashdown J, Martin S, Blake A, et al. Warwick-Edinburgh Mental Well-being Scale (WEMWBS): Validated for teenage school students in England and Scotland. A mixed methods assessment. BMC Public Health. 2011;11(1):1–9.

  3. Melendez-Torres GJ, Hewitt G, Hallingberg B, Anthony R, Collishaw S, Hall J, et al. Measurement invariance properties and external construct validity of the short Warwick-Edinburgh mental wellbeing scale in a large national sample of secondary school students in Wales. Health Qual Life Outcomes. 2019;17(1):1–9.

  4. Dong A, Chen X, Zhu L, Shi L, Cai Y, Shi B, Shao L, Guo W. Translation and validation of a Chinese version of the Warwick-Edinburgh Mental Well-being Scale with undergraduate nursing trainees. J Psychiatr Ment Health Nurs. 2016;23(9-10):554-60.

  5. Morris S, Earl K. Health Survey for England 2016: Well-being and mental health. 2017.

  6. University of Essex. Understanding Society – The UK Household Longitudinal Study [Internet]. 2021 [cited 2021 Jul 23]. Available from:

  7. Deighton J, Lereya ST, Morgan E, Breedvelt H, Martin K, Feltham A, Antha D, Hagel A, Fonagy P, Humphrey N, Dalzell K. Measuring and monitoring children and young people’s mental wellbeing: A toolkit for schools and colleges. Public Health England and the Evidence Based Practice Unit. 2016.

  8. Anthoine E, Moret L, Regnault A, et al. Sample size used to validate a scale: a review of publications on newly-developed patient reported outcomes measures. Health Qual Life Outcomes. 2014;12(2).

  9. Hancock GR, Mueller RO. Rethinking construct reliability within latent variable systems. In: Cudeck R, Jöreskog KG, Sörbom D, Du Toit S, editors. Structural equation modeling: Present and future – a festschrift in honor of Karl Joreskog. Scientific Software International; 2001. p. 195–216.

  10. Wu H, Estabrook R. Identification of Confirmatory Factor Analysis Models of Different Levels of Invariance for Ordered Categorical Outcomes. Psychometrika. 2016;81(4):1014–45.

  11. Rutkowski L, Svetina D. Assessing the Hypothesis of Measurement Invariance in the Context of Large-Scale International Surveys. 2014;74(1):31–57.

  12. Chen FF. Sensitivity of Goodness of Fit Indexes to Lack of Measurement Invariance. 2007;14(3):464–504.

  13. DiStefano C, Shi D, Morgan GB. Collapsing Categories is Often More Advantageous than Modeling Sparse Data: Investigations in the CFA Framework. 2020;28(2):237–49.

  14. Rosseel Y. lavaan: An R Package for Structural Equation Modeling. J Stat Softw. 2012;48(1):1–36.

  15. Jorgensen TD, Pornprasertmanit S, Schoemann AM, Rosseel Y, Miller P, Quick C, Garnier-Villarreal M, Selig J, Boulton A, Preacher K, Coffman D. Package ‘semTools’. 2016.

  16. Dueber D. BifactorIndicesCalculator: Bifactor indices calculator. R language and environment for statistical computing; 2020.

  17. Żemojtel-Piotrowska M, Piotrowski JP, Osin EN, Cieciuch J, Adams BG, Ardi R, et al. The mental health continuum-short form: The structure and application for cross-cultural studies–A 38 nation study. J Clin Psychol. 2018;74(6):1034–52.

  18. Clarke A, Friede T, Putz R, Ashdown J, Martin S, Blake A, et al. Warwick-Edinburgh Mental Well-being Scale (WEMWBS): Validated for teenage school students in England and Scotland. A mixed methods assessment. BMC Public Health. 2011;11(1):1–9.

  19. Koushede V, Lasgaard M, Hinrichsen C, Meilstrup C, Nielsen L, Rayce SB, et al. Measuring mental well-being in Denmark: Validation of the original and short version of the Warwick-Edinburgh mental well-being scale (WEMWBS and SWEMWBS) and cross-cultural comparison across four European settings. Psychiatry Res. 2019;271:502–9.

  20. WEMWBS Population Norms in Health Survey for England data 2011 [Internet]. 2011 [cited 2021 Sep 16]. Available from:

  21. Sigmon ST, Pells JJ, Boulard NE, Whitcomb-Smith S, Edenfield TM, Hermann BA, LaMattina SM, Schartel JG, Kubik E. Gender differences in self-reports of depression: The response bias hypothesis revisited. Sex Roles. 2005;53(5-6):401-11.

  22. Kuehner C. Gender differences in unipolar depression: an update of epidemiological findings and possible explanations. Acta Psychiatr Scand [Internet]. 2003 Sep 1 [cited 2023 Feb 9];108(3):163–74. Available from:

  23. Winter S, Diamond M, Green J, Karasic D, Reed T, Whittle S, et al. Transgender people: health at the margins of society. The Lancet. 2016;388(10042):390–400.

  24. Doherty AM, Gaughran F. The interface of physical and mental health. Soc Psychiatry Psychiatr Epidemiol [Internet]. 2014 Feb 22 [cited 2023 Feb 9];49(5):673–82. Available from:


We would like to thank Manoj Nanji, Lilian Msuya, Hamisi Babu, Festo Mboya and Samwel Stanley for their support with translation of WEMWBS. Thank you to Alice Coffey for help with formatting the tables and references for the manuscript.


This project was funded by a Warwick Global Challenges Research Fund accelerator grant.

Author information



OO, SSB and LB conceived of the project. OO and DK managed the day-to-day running of the project (ethical application, training field-workers, data collection and management) with input from SSB, LB and LW. MT-S completed the CFA, EB-M completed quantitative epidemiological analyses, RW completed the qualitative analysis. NS completed a literature review which contributed to the introduction and discussion sections of the manuscript. OO and MT-S wrote the main manuscript text. All authors reviewed the manuscript. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Oyinlola Oyebode.

Ethics declarations

Ethics approval and consent to participate

Ethical approval for this study was obtained from the National Institute for Medical Research, Tanzania, reference: NIMR/HQ/R.8a/Vol.IX/3472 and from the Biomedical and Scientific Research Ethics Committee of the University of Warwick, reference: BSREC 75/19–20. All participants gave their informed consent to take part in the study, and for those aged under 16, parental consent was also obtained. The consent procedures are detailed in the methods section of this study. All methods were performed in accordance with the relevant guidelines and regulations.

Consent for publication

Not applicable.

Competing interests

Sarah Stewart-Brown developed the WEMWBS and has been facilitating its use across global populations for 15 years. The potential of WEMWBS as a change agent is currently being explored in the commercial sector, where a charge is made for the use of the scale, from which she could benefit financially. The other authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Oyebode, O., Torres-Sahli, M., Kapinga, D. et al. Swahili translation and validation of the Warwick Edinburgh Mental Wellbeing Scale (WEMWBS) in adolescents and adults taking part in the girls’ education challenge fund project in Tanzania. Health Qual Life Outcomes 21, 43 (2023).
