Validation of the controlled ovarian stimulation impact measure (COSI): assessing the patient perspective

Background Controlled Ovarian Stimulation (COS) is the first step for in vitro fertilization (IVF) treatment, a treatment often described and experienced as stressful to patients and their partners. COS also requires concerted efforts by the patients in administering medication and general compliance to treatment protocols. Little is known about the impacts on patients that may be specific to this important first step in treatment. The absence of a conceptually sound and well-validated measure assessing patient experience and functioning during ovarian stimulation has been an obstacle to understanding the impacts of ovarian stimulation on women pursuing IVF. To address this gap, the Controlled Ovarian Stimulation Impact Measure (COSI) was developed based upon accepted methods for designing patient reported outcome (PRO) measures. The purpose of this study was to psychometrically validate the COSI. Methods 267 patients from three countries (Ireland, United Kingdom, United States) were administered the COSI. Psychometric validation was conducted according to an a priori statistical analysis plan. Results The final 28-item COSI was found to have robust scale structure with four domains: Interference in Daily Life (Work and Home), Injection Burden, Psychological Health and Compliance Worry. Internal consistency of all domains was adequate (between 0.80 to 0.87) as was test-retest reliability (between 0.72-0.87). All a-priori hypotheses for convergent and known-groups validity tests were met. Conclusions There is a measurable impact of COS on patient functioning and well-being. The COSI is a well-developed and validated PRO measure of this impact. Future work should include examination of responsiveness and confirmation of concepts in non-western countries.


Background
In vitro fertilization (IVF) has been documented as a potentially difficult treatment process with impacts on patients and their partners. Researchers have observed increased anxiety levels in patients at several junctures within the IVF cycle [1]. These points of increased impact include prior to the beginning of a cycle [2][3][4][5][6]; before oocyte retrieval and prior to embryo transfer [7][8][9][10]; before administering a pregnancy test [9][10][11]; waiting for the results of IVF [9,11]; and following an unsuccessful IVF [2][3][4][12][13][14]. Furthermore, there is some evidence that infertility-related stress, anxiety, and depressive symptoms may negatively affect infertility treatment outcome [15][16][17][18][19], however there are conflicting results in this research area [20]. Despite these reports, little attention has been given to discrete procedures within the IVF treatment process and impacts on patients that may be specific to these procedures and treatment phases.
IVF treatment involves five phases: (i) ovarian stimulation; (ii) oocyte retrieval; (iii) fertilization followed by embryo culture; (iv) embryo transfer; and (v) the luteal phase. Controlled Ovarian Stimulation (COS) is the first step in IVF treatment. COS also requires concerted efforts by the patients, and perhaps their partners, in administering medication and general compliance to treatment protocols. A recent study by the first author, consisting of patient interviews and focus groups in three countries, revealed that COS can place significant burdens on patients as a discrete phase in the IVF process, impacts so great that they could be recalled up to a year after the COS [21]. Furthermore, the impacts were sustained regardless of the outcome of IVF, whether or not pregnancy was achieved [21]. This study demonstrated that the frequent injection and monitoring associated with COS contributed to the burden specific to COS, impacting psychological and daily functioning. Despite this one study, little is known about the specific extent and burden of COS impacts on women pursuing IVF as currently the only validated measure of the impact of fertility problems on quality of life is the FertiQoL. However, the FertiQoL does not specifically focus on the impact of COS and no COS treatment specific patient reported outcome measure (PRO) of the impacts has been available.
To address this gap, the Controlled Ovarian Stimulation Impact Measure (COSI) was developed in order to provide a way of assessing and understanding the impacts of controlled ovarian stimulation on functioning and well-being for patients undergoing IVF. The development process followed accepted principles of measure development for patient reported outcomes [22]. To ensure content validity of the COSI, the concept elicitation phase included literature review, expert interviews, and direct input from 47 women who have experienced COS [21]. Based on these findings, a conceptual model was developed and 46 possible items were generated from the model using the patient/participant voices. The purpose of this study was to psychometrically validate the COSI by developing a formal measurement model that identifies domains and subscales from the initial item pool, and explore the reliability and validity of the COSI using a pre-specified analysis plan [23,24].

Subjects
This multicenter, non-interventional study was conducted in 10 clinics located in Ireland (1 site), the United Kingdom (3 sites), and the United States (6 sites). Women receiving Controlled Ovarian Stimulation (COS) for IVF and/or intracytoplasmic sperm injection (ICSI) treatment were enrolled in the study. Ethics approval was received in all 3 countries (US: Independent Review Consulting # 3239-001, UK: reference #07/H1306/161, Ireland: Rotunda Hospital, no number provided).
Eligibility criteria included women who were scheduled to undergo COS for IVF/ICSI within the next month, were between the ages of 18-39 years of age at the time of signing informed consent, able to read and speak English, and willing and able to sign informed consent. Women known to have either psychiatric illness or alcohol/drug abuse within the previous 12 months were not eligible to participate.
All eligible women who had appointments to proceed to human chorionic gonadotrophin (hCG) administration (the final COS injection) as part of their treatment were consecutively asked to participate in the study. Study participants who did not proceed to hCG administration and had a cancelled cycle were not allowed to continue in the study. These participants were dropped from the study and replaced by another patient.
The routine medical procedures and/or medications of the participants were not changed in any way by participation in the study. This study was reviewed and approved by an Institutional Review Board and all patients signed an informed consent.

Procedure
Participants were asked to complete a battery of questionnaires at the day of hCG or the day after. Women were given the battery to take home with them at their final monitoring visit prior to hCG administration, when the decision was made to proceed to hCG injection. If the decision to proceed to hCG was expected to be made when the woman was not in the clinic, the battery was given at the participant's monitoring visit on stimulation day 7 or 8 of their COS treatment. Women were informed that they should complete all questionnaires in the validation battery either the day of or the day after their hCG injection, without interference or discussion from another person, and return the completed battery to the clinic at the oocyte retrieval appointment.
At the embryo transfer appointment, a subsample of 178 women received the COSI retest with a pre-stamped pre-addressed envelope. They were instructed to complete the measure on day 8 following embryo transfer.

Measures
The survey took approximately 35 minutes to complete. In order to conduct the psychometric analyses required to validate the measure, additional measures were also included in the survey battery along with the COSI. Table 1 presents which measures were employed for each of the psychometric analyses. These measures include: Diabetes fear of injecting and self-testing questionnaire (D-FISQ) -fear of self injecting (FSI) subscale A 15-item quality-of-life subscale measuring fear of self injecting in adult diabetics. Subjects rate the items on a 4-point scale as 0 ("almost never"), 1 ("sometimes"), 2 ("often"), or 3 ("almost always") over the past month. FSI subscale scores are calculated by summation to detect and quantify the degree of emotional, cognitive, behavioral and physiological level of fear of self-injecting [25].

Quality of life enjoyment and satisfaction questionnaire (Q-LES-Q) (short form)
A 16-item questionnaire assessing the degree of enjoyment and satisfaction experienced in eight areas (physical health, subjective feelings of well-being, work, household duties, school, leisure, social relationships, and general life quality). Each item is rated on a 5-point Likert scale that indicates the degree of enjoyment or satisfaction achieved during the past week (1 = "very poor"; 5 = "very good"). Scores are aggregated, with higher scores indicative of greater enjoyment or satisfaction in each domain [26].

Endicott work productivity scale (EWPS)
A 25-item questionnaire measuring the degree to which a medical condition affects subjects' work functioning. Subjects rate the items on a 5-point Likert scale measuring frequency from 0 ("never") to 4 ("almost always") over the past week. Scores are aggregated; the maximum score possible (worst) is 100 and the best possible score is 0. Total score is based on the degree to which behaviors and subjective feelings or attitudes that are likely to reduce productivity and efficiency in work activities characterize the subject during the week before evaluation. The number of hours of work expected, the number worked, and the reason(s) why the subject worked less than usual are also collected [27].
Frequency, intensity, and burden of side effects rating (FIBSER) A 3-item questionnaire measuring medication side effect impact over the past week using 3 domains: frequency, intensity, and global burden (degree interfered with day-to-day functions). Each domain is rated on a 7-point scale (Frequency: ranging from "no side effects" to "present all of the time"; Intensity: ranging from "no side effects" to "intolerable"; Burden: ranging from "no impairment" to "unable to function due to side effects"). Each specific domain is assessed as a total value and compared to develop a picture of overall effect of medication side effects, e.g., some side effects may occur infrequently, but be both intense and very burdensome to the patient or other side effects may be highly frequent, but be only of modest intensity and minimal burden [28].

Hot flash related daily interference scale (HFRDIS)
A 10-item questionnaire, adapted so that that it measured the degree to which ovarian stimulation interferes with nine daily activities. Subjects rate the degree of interference with each item during the previous week using a 0 ("do not interfere") to 10 ("completely interfere") scale. A total scale is computed by summing items. Higher scores indicate higher interference and thus, greater impact on quality of life [29].

Insulin treatment satisfaction questionnaire: convenience domain (ITSQ)
A 5-item domain of a 22 item questionnaire assessing treatment satisfaction for diabetic patients on insulin over the past month, adapted to assess the impact of the convenience of ovarian stimulation. All items are rated on a 7-point Likert scale and scored by transforming all items to a scale of 0-100 with the higher score (for the overall score and for each subscale) indicating better treatment satisfaction. ITSQ subscale scores are calculated by imputing the missing values based on the mean of the non-missing items [30].

Treatment satisfaction questionnaire for medication (TSQM)
A 14-item questionnaire measuring a patient's satisfaction with medication. The time frame is 2-3 weeks, or since the last medication use. Items are rated on a 5-or 7-point scale according to patients' experience with the Table 1 Hypotheses for the tests of validity

Convergent Validity
Total score There will be a strong correlation with life satisfaction on the Q-LES-Q.
Psychological Impact Women with greater mental health on the PGWBI will have lower domain scores.

Interference in Daily Life
Women who have higher scores on the HFRDIS will have greater domain scores.

Injection Burden
Women with greater scores on the TSQM or the ITSQ Convenience subscale will have greater domain scores.

Work
Women with greater negative scores on the Work Productivity Scale (EWPS) will have greater domain scores and women who self report a negative work impact will have lower domain scores.
Compliance Worry Women with greater Compliance Worry will have greater treatment inconvenience.

Discriminate validity
Total score Greater fear of injection as assessed by the D-FISQ will be associated with a greater COSI score.
Psychological Impact Women who have more supportive spouses will have greater domain scores.

Daily Life Interference
Women who work will have greater domain scores than women who do not work.

Injection Burden
Women who mix their COS medication and/or who have previous experience with self-injections will have greater Injection Burden.

Work
Women who self-report a supportive work environment will have lower domain scores.
Compliance Worry Women who feel they were better trained to administer injections will have less Compliance Worry.
medication in terms of satisfaction, bother/interference with side effects, ease of use and confidence [31].

Psychological general well-being (PGWB) index
A 22-item questionnaire measuring self-representations of intrapersonal affective or emotional states reflecting a sense of subjective well-being or distress during the past month. The PGWB includes indicators of positive and negative affective states. Questionnaire subscales are: anxiety, depressed mood, positive well-being, self control, general health and vitality. Items are rated on a 6-point scale, according to the intensity or frequency of the affective experience. Total scores on the PGWB range from 0 to 100 and are expressed as a summary score; a higher score equals better quality of life. For each item, the response option that indicates the greatest distress is scored zero; the most positive option is scored five [32].

Statistical analysis Validation strategy
Data was analyzed according to an a-priori statistical analysis plan to determine item and factor structure, assess internal and test-retest reliability, construct validity using a nomological network approach, divergent and convergent validity, discriminant validity, and to examine the minimally important difference. Validation analyses were conducted for the total COSI score as well as for each (derived) domain score. All batteries and COSI retests were double data entered into a centralized SPSS database. All analyses were performed using SAS © (SAS institute, Cary).

Item reduction and measurement model
The preliminary version of the COSI consisted of 42 core items with 6 additional items for working women, assessing 4 core domains (Psychological, Interference in Daily Life, Convenience and Side Effects) and 1domain for Impact on Work Life. Items were rated on a 5-point Likert scale which range from either "Not at all/Never" to "Extremely/Always" with intervals ("rarely/a little", "sometimes/somewhat", "often/a lot"). Analytic guidelines of the decision criteria were used to guide the process of item reduction of the preliminary 48 items. These included criteria for examination of missing data (any items having 5% or greater missing data were assessed for removal from the scale), ceiling and floor effects (if the frequency of either response extremes is greater than 50%, the item is considered to be demonstrating a ceiling or floor effect), item-rest correlations (Pearson's correlation of .70 or greater is acceptable) and item-to-item correlations (correlation coefficients greater than 0.80 may indicate a redundancy between the items). Additionally, content validity (conceptual relevance) of items was also considered in final decisions as to whether or not an item should be deleted. This was done to ensure that important patient reported key concepts remained in the measure. Thus, final item reduction decisions were treated as an iterative process between these psychometric methods and the conceptual framework developed in the measure development process.
Factor analysis using VARIMAX rotation was performed on the correlation matrices derived from the core items comprising the COSI. In order to include all respondents, work related items were not included in the factor analysis. The final solution for factors was also rotated using oblique rotation (OBLIMIN) in order to establish the degree of correlation between domains. The most appropriate number of factors to be extracted was determined using primarily (A) residual analysis, i.e., evaluation of the ability of the factor solution to represent the correlation structure, and (B) clinical and theoretical interpretability of the solution. A scree plot of the principal component solution was used as a guide to the number of factors that were needed.

Reliability
Internal Consistency was assessed by Cronbach's alpha [33]. For scales or domains that are relatively unilateral, reliability was expected to be high (>0.80); for scales or domains that are more multifaceted, a lower reliability of at least 0.70 was considered acceptable.
Test-retest reliability was analyzed for those subjects who completed the re-test approximately 12 days after completing the COS. Given that the women had experienced both oocyte retrieval and embryo transplant in between administrations, it was expected that the retest would not only reveal the degree of error variation that is a characteristic of the test, but would also be sensitive to changes in perception of the original COSI and thus, not be as strong a correlation as generally expected for test-retest reliability.

Validity
In order to assess convergent validity, hypotheses regarding the relationship between the measure assessing the similar concept to the COSI score were developed. Pearson's correlations were computed to measure the association between the total and/or subscale scores on the COSI measure for which the significant relationship was expected.
In order to assess discriminate validity, hypotheses regarding the relationship between known groups and the COSI score being tested were developed. The scores of the groups on the COSI domains were compared using descriptive statistics, and parametric tests, typically oneway ANOVA with groups as a fixed factor (or a t-test in case of two groups and appropriate contrasts comparing groups when several groups were involved) to test for significance between the known groups and the COSI score. Table 1 presents the hypotheses for the tests of validity.

Exploratory analysis
The relationship between the COSI and other patient data (e.g. age, ethnicity, cause of infertility), key treatment factors (e.g. number of injections), and the women's relationship to their fertility clinic (e.g. ease of contact) was examined using regression analysis for continuous variables and one way ANOVA for categorical variables.

Sample description
The COSI validation sample consisted of 267 women (United Kingdom: 80: Ireland: 45; United States: 142) who completed the validation battery. The aggregate sample had a mean age of 35.7 years. 85.0% of the participants reported excellent or very good health. 79.4% of the sample was self-identified as White/Caucasian. The largest non-White/Caucasian group represented in the sample was Asian-American/Pacific Islander (7.9%) followed by mixed race (5.2%) or another ethnic group not listed (4.1%), Latino/Hispanic (1.9%), and Black/ African American (1.5%). Additionally, a majority of the sample was employed 30 or greater hours per week (72.7%), and a majority earned equal to or greater than $50,000 annually (92.1%) with over half of the sample earning more than $100,000 annually (60.3%). 73.4% of the sample had no children. The full demographic and infertility factors description is shown in Table 2. On average there were no marked differences in sociodemographic factors across the three countries included.

Item reduction and measurement model
Item reduction resulted in a final 28-item COSI. Four factors were identified: Interference in Daily Life (with 2 subdomains of Work and Home), Injection Burden, Psychological Health and Compliance Worry. These domains varied slightly from the a-priori hypothesized domains, which also included Side Effects and a separate domain for Work Interference, and did not include Injection Worry. As a result, the a-priori concurrent and knowngroups validity hypotheses for the Side Effects domain were not tested, and post-hoc hypotheses for the new Compliance Worry were formulated based. A-priori hypotheses for the Work Interference domain were used to support the Work Interference subdomain of the Daily Life Interference domain. Incomplete data was minimal and did not affect item reduction.
The final factor structure of the core COSI items (excluding work items) is shown in Table 3.

Reliability
Cronbach Alpha's for the internal consistency of all domains, and the total score were between 0.80 to 0.92 thus meeting the criteria of acceptable internal consistency. Test-retest correlations were also acceptable (between 0.72 to 0.87) for all domains and the total score. Reliability ICC statistics on the COSI are shown in Table 4.

Validity
All a-priori hypotheses for convergent and discriminant validity were met. Additionally, the post-hoc convergent validity hypotheses for the Injection Burden domain were also met.

Final measure
The final COSI consists of 28 items in 4 domains (Interference in Daily Life: Home 5 items, Work 4 items; Psychological Health: 10 items; Injection Burden: 4 items; Compliance Worry: 5 items).
The conceptual framework of items per domain of the COSI is as shown in Figure 1.

Exploratory analyses
There was no significant relationship between COSI Total score or any domains and major demographic factors: age, marital status, ethnicity (with exception of Injection Burden: p = 0.007, higher impact for Asian and mixed race), employment status, education, treatment protocol (with exception of Interference in Daily Life -Work: p = 0.019, lower impact for agonist and antagonist short flare-up protocols), income (with exception of Interference in Daily Life: p = 0.024, higher impact for income above $100.000) or previous injection experience. The overall impact of COS was lower (COSI total score: p = 0.025) as well as the Interference in Daily Life (home: p = 0.005 work p = 0.037) and Injection Burden (p = 0.031) for those who were paying either all or some of the costs of IVF themselves.
With regard to treatment factors, women who felt they were better trained to self administer injections also had lower overall impact (p = 0.002) as well as lower Interference in Daily Life (home: p = 0.002, work: p = 0.05) and Injection Burden (p = 0.032). Also, women who injected themselves reported lower Injection Burden then when this was done by the husband or shared (p = 0.002). Further, women reported lower overall impact (p = 0.013), Psychological Impact (p = 0.047), Interference in Daily Life at home (p = 0.04), and less Injection Burden (p = 0.017) when the cause of infertility was related to a male factor. Women who had a greater number of injections had greater Injection Burden (p = 0.009), and depending on the number of monitoring visits (categorized as ≤3, 4-6, 7-9, ≥10) did not experience equal Overall Impact (p < 0.001), Psychological Impact (p = 0.002), Interference in Daily Home Life (p = 0.044), and Compliance Worry (p = 0.044), with categories corresponding to lower number of visits corresponding to improved scores. The relationship the women had with their fertility clinic also played a role on the impact of treatment. Women who felt they had a better relationship with their clinics, or that the clinics were easy to contact and had fewer monitoring visits, had significantly better outcomes in both overall impact as well as for most domains. Additionally, significant relationships (p < 0.001) were found  between the COSI Total Score and for how stressful women felt the infertility treatment was, the degree of supportiveness of spouse and friends and family, their need to rearrange work schedule and the degree of supportiveness and ease of contacting their fertility clinic. Regarding the Daily Life Interference domain , women who worked, and who needed to rearrange their work schedule due to their COS treatment, did have greater Daily Life Work Interference (p < 0.001), and women with a greater number of monitoring visits had greater Daily Life Home Interference (p < 0.001) and Daily Life Work Interference if they worked (p = 0.003).

Discussion
These findings suggest that the 28-item COSI can be considered a well-developed and valid PRO measure with well-defined domains (Interference in Daily Life (with 2 subdomains of Work and Home), Injection Burden, Psychological Health and Compliance Worry) which assess the impact of COS on women's functioning and wellbeing. Further, each of these domains has been found to be acceptable psychometrically and can be considered appropriate for use as individual concepts. IVF has been found to create anxiety in patients [21] and this anxiety may be related to treatment outcomes [15][16][17][18][19]. Further, discontinuation of fertility treatment has been attributed to the burden of treatment [34]. It has been suggested that further evaluation of the efficacy of treatments and interventions that reduce burden is needed [34]. The COSI has been rigorously developed following the Guidance for industry: patient-reported outcome measures [22], as a measure of the impact of COS on patient functioning and well-being. This study has shown that the COSI is relevant to patients, clinically meaningful, valid and reliable. As such, the COSI should be helpful to clinicians to assess the multiple impacts of COS and the effectiveness of interventions geared to reduce this anxiety.
The impact of a treatment can often be mitigated by many factors such as patient characteristics, level of family  Figure 1 Conceptual framework. support, health process factors and characteristics of the treatment [21,35,36]. In this study, we also found, as shown in the exploratory analyses that the impact of COS on women is not insignificant and that this impact can be mitigated by factors such as where the procedure is conducted, the type of treatment, number of visits and injections as well as family, clinic and work support structures. A greater focus on these factors by clinicians should help to improve care and reduce the burden of infertility. Some limitations and future considerations should be mentioned. Unfortunately, due to ethics requirements, we were not able to collect any information on women who choose not to participate in the validation study, thus the extent to which women who did participate were similar to women who did not participate is unknown. However, the characteristics of the women who did participate appear to be representative of the general population of women undergoing COS. Further, the study was conducted in Western countries and the sample was limited in ethnic diversity. Although the exploratory analyses did not find an effect of ethnicity on the impact of COS, future research is needed to confirm this finding. The exploratory analyses also suggested that site characteristics may impact outcomes and this also should be an area for future research in order to more clearly define site characteristics that help rather than hinder women in coping with COS. Finally, as this was a validation study, translation issues have not yet been addressed. It will be important that any translations of the COSI take into consideration cultural equivalency in the translation process.
Validation is an iterative process and should be continually ongoing as PRO measures are initially developed and used in clinical practice. Further work on the COSI should examine responsiveness and interpretability of the COSI scores in future studies. Further, as this study was conducted in western cultures, additional validation work in non-western cultures would be warranted.

Conclusions
The study findings suggest that the 28-item COSI can be considered a well-developed and valid PRO measure with well-defined domains which assess the impact of COS on women's functioning and well-being.