Study design and setting
This cross-sectional study, which took place between May 2007 and October 2009, used data from a randomized controlled trial (RCT) in Nairobi, Kenya (N = 538) (ClinicalTrials.gov number, NCT00830622) [8]. Baseline data were collected prior to initiating ART or receiving the intervention. Data from participants in both trial arms were pooled to conduct these analyses. This multi-site trial involved three HIV clinics located in demographically and ethnographically diverse settings [8].
Participants
Inclusion criteria were being ART naïve, being aged 18 years or above, having access to a mobile phone, and being able to text message or having somebody who could text message on the participant's behalf. Individuals who met the inclusion criteria and consented to participate were randomized to receive either a cell phone-based adherence intervention or standard care only. The study protocol was approved by the University of Manitoba and Kenyatta National Hospital ethics review boards [8]. The sample size calculation was based on primary trial outcomes, including viral load and adherence. While the trial was not specifically powered to measure the secondary HRQoL outcomes, a post-hoc sample size calculation indicated that the sample was adequate to detect differences the size of the minimal clinically important difference (MCID).
Data and measures
The variables were defined at study entry, which took place at ART initiation. Individuals had been receiving care but were ART naïve at the time of data collection. A translated and adapted SF-12 (Version 1) survey was administered to participants at baseline, along with a survey that collected data on gender, age, income and rural/urban residence. The SF-12 was administered on the same day that the World Health Organization (WHO) clinical stage, CD4 count and viral load measures were taken. CD4 count was measured (FACScan, Becton Dickinson, Sunnyvale, CA, USA) as part of routine clinical care, and viral load (Amplicor, Roche Diagnostics, Mannheim, Germany) was assessed as part of the trial protocol [8]. Research clinicians administering the baseline survey assessed the WHO clinical stage of HIV infection [8].
Theoretical foundation
A longer form of the SF-12, the SF-36, has been translated and adapted for use in 40 countries as part of the International Quality of Life Assessment (IQOLA) project [9]. Kiswahili, the primary language in many East African nations, was not among the original IQOLA project translations. However, two subsequent studies (Wagner et al. and Wyss et al.) translated the SF-36 into Kiswahili and evaluated the translated survey [10, 11]. Wagner et al. evaluated the content, quality and scaling of the translated survey in a general Kenyan population, demonstrating that the SF-36 survey performed comparably to the UK counterpart [10]. Wyss et al. extended this work by assessing the validity of the SF-36 using known group validation [11]. They demonstrated that the SF-36 could discriminate health status between groups with known differences in health based on theory or evidence. The discriminative ability of an HRQoL survey is an important validation step to ensure that the survey can adequately capture outcomes of interest [12]. The SF-36 is cumbersome to administer in research settings, so the briefer SF-12 was created [3]. The SF-12 has been shown to retain much of the descriptive ability and validity of the SF-36, but has not been validated in East Africa.
Translation and adaptation process
An international team of healthcare professionals and researchers translated the English SF-12 (Version 1) into Kiswahili based on IQOLA recommendations. The survey was reviewed by a multidisciplinary focus group of English- and Kiswahili-speaking healthcare providers and researchers for relevance, ease of understanding, and cultural appropriateness. Where necessary, items and response options were slightly modified and culturally adapted to make the questionnaire relevant and appropriate for use in a Kenyan context. Literature reviews and expert opinion were used to inform changes to the survey. For example, ‘climbing stairs’ in the original SF-12 was changed to ‘climbing a hill’, based on a previous study using the SF-36 in Tanzania [10, 11]. After translation into Kiswahili, the survey was back-translated into English and assessed by a focus group of English-speaking healthcare researchers to ensure consistency. The survey was pre-tested on a sample of 20 Kenyan individuals and healthcare staff to evaluate cultural appropriateness and understanding.
Validation
We investigated the construct validity of the survey using known group validation [11]. This method involves demonstrating that the PCS, MCS or SF-6D scores are able to discriminate between groups known a priori to differ in health status. We used three established criteria to classify HIV severity: CD4 cell count, viral load, and WHO clinical stage of HIV infection.
We hypothesized that HRQoL and HSUV would be lower in more advanced HIV disease stages, independently of how severity was defined. Further, since HIV is predominantly a physical disease, we hypothesized that physical scores would show greater differences than mental health scores. Our specific hypotheses were: (1) MCS, PCS and SF-6D scores would be lower in individuals with a CD4 count <200 cells/mm3; (2) MCS, PCS and SF-6D scores would be lower in individuals with a viral load >55,000 copies/ml; and (3) MCS, PCS and SF-6D scores would be lower in individuals in WHO stages 2, 3 and 4 compared to individuals in WHO stage 1. Since WHO stage 1 individuals are asymptomatic, we expected a larger difference in HRQoL and HSUV between these individuals and more symptomatic individuals [13].
Severity threshold definitions
We used the United States (US) Centers for Disease Control and Prevention (CDC) severity stages, based on CD4 cell count, as our first definition of disease severity [14]. Stage 1 includes individuals with a CD4 count ≥500 cells/mm3; stage 2 includes individuals with a CD4 count between 200 and 499 cells/mm3; and stage 3 includes individuals with a CD4 count <200 cells/mm3. The vast majority of individuals initiating ART have a CD4 count near or below 350 cells/mm3, as that was the threshold in the ART treatment guidelines in Kenya at the time. Further, presentation to care with advanced HIV disease has been defined as having a CD4 count below 200 cells/mm3 [15]. To maintain an adequate sample in both groups, we dichotomized individuals above and below a CD4 count of 200 cells/mm3, reflecting a comparison of individuals with advanced HIV infection to those without advanced HIV infection.
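The CD4-based grouping described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's analysis code: the function names and example counts are hypothetical, and placing a count of exactly 200 cells/mm3 in the non-advanced group is an assumption that follows the CDC stage 3 definition (<200 cells/mm3).

```python
def cdc_stage(cd4_count: float) -> int:
    """Return the CDC severity stage (1-3) for a CD4 count in cells/mm3."""
    if cd4_count >= 500:
        return 1
    if cd4_count >= 200:
        return 2
    return 3


def has_advanced_hiv(cd4_count: float) -> bool:
    """Dichotomize at 200 cells/mm3: True means advanced infection (CD4 < 200).

    Assumption: a count of exactly 200 falls in the non-advanced group.
    """
    return cd4_count < 200


# Hypothetical CD4 counts, for illustration only (not study data).
example_counts = [650, 499, 200, 199, 35]
stages = [cdc_stage(c) for c in example_counts]
advanced = [has_advanced_hiv(c) for c in example_counts]
```

With this rule, the advanced group coincides with CDC stage 3, and the non-advanced group combines stages 1 and 2.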
Our second definition of severity was based on a previous US study that used a viral load threshold to classify individuals [12]. Viral load is associated with disease progression: an increased viral load indicates advanced disease and predicts progression to AIDS or death [16]. We classified individuals as above or below 55,000 copies/ml to assess differences in the scores and draw descriptive comparisons to the previous US sample [12].
Our third definition of severity was the WHO HIV clinical staging system, which is based on physical symptoms. The WHO clinical stages are particularly useful in limited-resource settings, as CD4 cell counts are not always available. Symptoms are grouped into four stages. Stage 1 individuals are asymptomatic; stage 2 individuals have mild symptoms such as rash or upper respiratory tract infections; stage 3 individuals have moderate to severe symptoms such as unexplained chronic diarrhea for more than 1 month; and stage 4 individuals have severe to life-threatening symptoms such as extreme weight loss or opportunistic infections.
Based on our three definitions of severity, we categorized our sample into two groups according to the CD4 count threshold, two groups according to the viral load threshold, and four groups according to WHO clinical stage. We assessed the PCS, MCS and SF-6D, compared scores between groups, and determined the discriminative ability of the scores.
Statistical analysis
We conducted a descriptive analysis of the baseline characteristics of the study population, and stratified the results by the severity groups we defined. We calculated individual PCS and MCS scores using correlated weights from the US and SF-6D scores based on UK weights [1, 3, 17, 18]. The SF-12 was designed to give a population mean MCS and PCS of 50 with a standard deviation of 10 in a disease-free US population [3]. The minimum clinically significant difference (MCID) for both PCS and MCS scores has been suggested to be in the range 3–5 points; however, MCID for HRQoL scores are not well-established [19]. We used a change of 3 to interpret the clinical significance of differences that we observed, but caution is suggested in interpreting the MCID since a 1-point change can be meaningful if it came at no additional cost [19]. The MCID for the SF6D has been suggested to be 0.033 (95% CI 0.029 to 0.037) [20].
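As a minimal sketch of how these thresholds might be applied when reading a between-group difference, the comparison can be written as follows. The helper name, the use of "greater than or equal to" at the threshold, and the example means are all assumptions made for illustration; none of the numbers are study results.

```python
PCS_MCS_MCID = 3.0   # points; lower end of the 3-5 point range quoted above
SF6D_MCID = 0.033    # point estimate quoted above


def exceeds_mcid(mean_a: float, mean_b: float, mcid: float) -> bool:
    """Return True if the absolute difference in group means reaches the MCID.

    Assumption: a difference exactly equal to the MCID counts as reaching it.
    """
    return abs(mean_a - mean_b) >= mcid


# Hypothetical group means, for illustration only (not study results).
pcs_large_gap = exceeds_mcid(48.0, 44.0, PCS_MCS_MCID)   # 4.0-point gap
mcs_small_gap = exceeds_mcid(45.0, 43.5, PCS_MCS_MCID)   # 1.5-point gap
sf6d_gap = exceeds_mcid(0.70, 0.66, SF6D_MCID)           # gap of about 0.04
```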
We calculated mean PCS, MCS and SF-6D scores in each of the severity categories. For the CD4 and viral load threshold analyses, t-tests were used to test for statistical differences between the two groups. For the WHO clinical stage analysis, we used analysis of variance (ANOVA) with a post-hoc analysis to test for differences in scores between the four groups. Participants with missing CD4 count, viral load or WHO stage were excluded from the respective analysis.
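The test statistics behind these comparisons can be sketched with the Python standard library alone. The text does not state which t-test variant or post-hoc procedure was used, so the pooled-variance (Student) t statistic below is an assumption, the post-hoc step is omitted, and p-values (which require the t and F distributions) are left to a statistics package. Function names and example scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error, df


def one_way_anova_f(groups):
    """One-way ANOVA F statistic with between- and within-group degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores, for illustration only (not study data).
low_group = [1.0, 2.0, 3.0, 4.0, 5.0]
high_group = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, t_df = pooled_t(low_group, high_group)

three_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat, f_df_between, f_df_within = one_way_anova_f(three_groups)
```

With exactly two groups, the ANOVA F statistic equals the square of the pooled t statistic, which is a convenient consistency check between the two functions.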
We used receiver operating characteristic (ROC) curves as a second test of the discriminative ability of the instruments [12, 21]. Traditionally, an ROC curve plots the sensitivity of a diagnostic test against 1 − specificity and helps to determine the ability of the test to discriminate between a diseased and a non-diseased population. ROC curves have also previously been used to determine the construct validity of an instrument by evaluating whether the instrument can correctly discriminate between two groups known to have differing HRQoL [12]. We used ROC curves to assess whether the scores could correctly categorize a participant into a severity group using different threshold scores as cut-offs. The area under the ROC curve (AUC) is a measure of the signal-to-noise ratio of an instrument [21]. An AUC of 1 indicates perfect discriminatory ability; an AUC between 0.80 and 1 shows good to excellent ability to discriminate; an AUC between 0.70 and 0.80 shows fair discriminative ability; an AUC between 0.60 and 0.70 shows weak ability to discriminate; an AUC below 0.60 indicates a failure to discriminate between groups; and an AUC of 0.50 suggests that the instrument is no more useful for predicting the group to which an individual belongs than flipping a coin [21].
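For intuition, the AUC can be computed without drawing the curve: it equals the probability that a randomly chosen member of one group scores on the expected side of a randomly chosen member of the other group, with ties counted as one half. The sketch below uses that pairwise formulation under the hypothesis that lower scores indicate more severe disease. The function names, the example scores, and the choice to treat each lower band boundary as inclusive are assumptions for illustration, not the study's procedure.

```python
def auc_lower_is_severe(severe_scores, non_severe_scores):
    """AUC as the share of (severe, non-severe) pairs in which the severe
    participant has the lower score; tied pairs count as one half."""
    wins = 0.0
    for s in severe_scores:
        for n in non_severe_scores:
            if s < n:
                wins += 1.0
            elif s == n:
                wins += 0.5
    return wins / (len(severe_scores) * len(non_severe_scores))


def interpret_auc(auc):
    """Map an AUC to the descriptive bands listed above.

    Assumption: each band includes its lower boundary.
    """
    if auc >= 0.8:
        return "good to excellent"
    if auc >= 0.7:
        return "fair"
    if auc >= 0.6:
        return "weak"
    return "fails to discriminate"


# Hypothetical scores, for illustration only (not study data).
severe = [30, 35, 40, 45]
non_severe = [38, 44, 50, 55]
example_auc = auc_lower_is_severe(severe, non_severe)  # 13 of 16 pairs
example_band = interpret_auc(example_auc)
```

The pairwise loop is quadratic in sample size, which is acceptable for a sample of a few hundred participants; a rank-based calculation gives the same value more efficiently.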