Psychometric properties of the FACT-M questionnaire in patients with Merkel cell carcinoma

Background: No validated disease-specific questionnaires exist to capture health-related quality of life (HRQoL) in patients with Merkel cell carcinoma (MCC). The Functional Assessment of Cancer Therapy – Melanoma (FACT-M) is validated in patients with melanoma, which shares many similarities with MCC. This paper reports the psychometric properties of the FACT-M in the metastatic MCC population.
Methods: Data were collected as part of a single-arm, open-label, multicenter trial involving patients with metastatic MCC who had failed at least one previous line of chemotherapy. FACT-M and EQ-5D were administered at baseline, Week 7, Week 13, and Week 25. An optional interview was administered at the same time points. MCC-specific FACT-M scores were derived following a combined quantitative and qualitative approach. Reliability and construct validity of original and additional MCC-specific FACT-M scores were assessed at baseline. Capacity to detect change in tumor size was assessed from baseline to Week 7. Minimally important differences (MIDs) were computed using distribution and anchor-based methods.
Results: Baseline assessments were available in 70 patients (mean age: 70 years; 74.3% male); 19 patients were interviewed at baseline. Additional MCC-specific scores were as follows: Physical Function score (six items), Psychological Impact score (six items), and MCC summary score (12 items). FACT-M original and additional MCC-specific scores both demonstrated acceptable psychometric properties: high reliability (Cronbach’s alpha: 0.81–0.96), good convergent validity (correlations above 0.4 observed for 88% of items of the Melanoma surgery scale, 75% of items of the Melanoma scale, and 100% of items of the other FACT-M domains). Some evidence of floor/ceiling effects and poor discriminant ability was found. Higher scores (better HRQoL) on all FACT-M domains were observed in patients with better functioning (assessed by ECOG performance score), supporting clinical validity. Despite the small sample for responsiveness analysis (n = 37), the majority of FACT-M scores showed sensitivity to changes in tumor size at Week 7 with small to moderate effect sizes. MIDs were consistent with previously reported values in the literature for FACT-M domains.
Conclusions: FACT-M is suitable to capture HRQoL in patients with metastatic MCC, thus making it a potential candidate for assessing HRQoL in MCC trials.
Trial registration: This study is a post-hoc analysis conducted on data collected in Part A of the JAVELIN Merkel 200 trial. This trial was registered on 2 June 2014 with ClinicalTrials.gov as NCT02155647.


Background
Merkel cell carcinoma (MCC) is a rare and aggressive skin cancer associated with Merkel cell polyomavirus, exposure to ultraviolet irradiation, immunosuppression, and old age [1,2]. MCC occurs with an incidence of 0.2-0.4 cases per 100,000 people per year in Europe, 0.8 cases per 100,000 people per year in the United States of America, and 1.6 cases per 100,000 people per year in Australia [3][4][5]. The 5-year overall survival rate with metastatic MCC ranges from 0% to 18% based on retrospective analyses [6][7][8][9].
MCC is challenging to treat in metastatic stages due to limited treatment options and lack of standard therapeutic procedures. Avelumab is an anti-PD-L1 monoclonal antibody recently approved by the Food and Drug Administration (FDA) to treat patients 12 years and older with metastatic MCC (mMCC). Approval was based on data from an open-label, single-arm, multicenter clinical trial (JAVELIN Merkel 200 trial) including a cohort of patients who had previously progressed after chemotherapy for distant metastatic disease (Part A) as well as early data from patients naïve to systemic therapy in the metastatic setting (Part B) demonstrating a clinically meaningful and durable overall response rate [10][11][12].
The assessment of how patients function and feel, from their own perspective, is recognized in the literature as an important clinical endpoint. Its relevance has also been highlighted by both the FDA and the European Medicines Agency [13,14]. Recently, the FDA emphasized the importance of patient-reported evaluation of disease-related symptoms, treatment-related symptoms, and physical functioning [15]. Questionnaires available to collect patient-reported outcomes (PROs) in patients with non-melanoma skin cancer and malignant melanoma were identified in a systematic literature review conducted in 2012 [16]. Nine cancer and skin cancer-specific PRO measures were identified for which adequate evidence of psychometric properties was available. Of these, the Functional Assessment of Cancer Therapy – General (FACT-G) and Functional Assessment of Cancer Therapy – Melanoma (FACT-M) provided evidence of acceptable psychometric properties. The FACT-G was only evaluated in patients with non-melanoma skin cancers. The FACT-M had more promising characteristics for patients with malignant melanomas, especially those with advanced disease, with good internal consistency of all scales, high reproducibility, and good sensitivity [17,18]. In addition, significant correlations were reported between the FACT-M and other questionnaires measuring similar constructs (European Organisation for Research and Treatment of Cancer Quality of Life questionnaire [EORTC-QLQ] Melanoma; Profile of Mood States). Finally, the FACT-M was shown to distinguish between disease stages, with significantly lower scores in patients with advanced (stages III or IV) melanoma than in patients with early-stage melanoma [19].
No validated disease-specific tools exist to capture health-related quality of life (HRQoL) in patients with MCC. The FDA encourages the use of adapted questionnaires in oncology, provided that the adaptation and validation follow a rigorous, scientific and sound approach [20]. As melanoma and MCC share many similarities, both being aggressive skin cancers, the FACT-M was considered as a potentially adequate tool to assess HRQoL in the MCC population. However, FACT-M psychometric properties have yet to be confirmed in patients with MCC.
This study was conducted to assess the reliability and validity of the FACT-M questionnaire in patients with mMCC.

Study design
A specific statistical analysis investigating psychometric properties of the FACT-M questionnaire in patients with mMCC was conducted on data collected in the avelumab clinical trial JAVELIN Merkel 200 (NCT02155647, [10]). This single-arm, open-label, multicenter trial was conducted in the United States of America, Europe, Australia, and Japan. Enrolled patients were male and female adults with histologically proven mMCC and an Eastern Cooperative Oncology Group performance status (ECOG PS) of 0 or 1 at trial entry, who had failed at least one line of chemotherapy. During the trial, patients received avelumab at a dose of 10 mg/kg as a 1-h intravenous infusion once every 2 weeks until significant clinical deterioration, unacceptable toxicity, or any criterion for withdrawal from the trial or trial drug was fulfilled. The primary analysis of the trial was performed in patients with a minimum of 6 months follow-up (date of data cutoff March 3, 2016), and the results have been published in Kaufman et al. (2016) [10,11].

Data collected
HRQoL was assessed during the JAVELIN Merkel 200 trial using a generic questionnaire (EuroQol-5 Dimensions [EQ-5D]), a melanoma-specific questionnaire (FACT-M), and optional qualitative patient interviews. FACT-M and EQ-5D were collected at sites using electronic tablets at baseline, throughout the treatment period (at Week 7 and then every 6 weeks), and at the End-of-Treatment visit. Optional qualitative patient interviews were conducted via telephone at baseline, Week 13, and Week 25.
Data used to assess the FACT-M psychometric properties were cut when all patients had at least 6 months of treatment follow-up; as such, PRO data were available up to Week 25. Data collected during the baseline qualitative patient interviews were used to explore the content validity of the FACT-M in the MCC population.

FACT-M questionnaire
The FACT-M includes 51 items grouped into nine multi-item scores [17,18]: six subscale scores and three summary scores. The six subscales consist of four subscales from the FACT-G (physical well-being [PWB], social well-being [SWB], emotional well-being [EWB], functional well-being [FWB]), one Melanoma scale, and one Melanoma surgery scale. The three summary scores include the FACT-M Trial Outcome Index (TOI), the FACT-G total score, and the FACT-M total score. The FACT-M administration guideline instructed patients to answer all items and select "Not at all" if they felt that the item was not applicable to them.
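To make the relationship between the subscales and the three summary scores concrete, here is a minimal Python sketch. The composition used (FACT-G total as the sum of the four well-being subscales, TOI as PWB + FWB + Melanoma scale, FACT-M total as FACT-G total + Melanoma scale) is an assumption based on the usual FACIT scoring convention and is not stated in this paper; the function name and the example values are hypothetical, and item-level steps such as reverse scoring are not shown.

```python
def fact_m_summary_scores(pwb, swb, ewb, fwb, melanoma_scale):
    """Combine already-computed subscale scores into summary scores.

    Assumed composition (usual FACIT convention, not taken from this paper):
      FACT-G total = PWB + SWB + EWB + FWB
      FACT-M TOI   = PWB + FWB + Melanoma scale
      FACT-M total = FACT-G total + Melanoma scale
    The Melanoma surgery scale is assumed to be reported separately.
    """
    fact_g_total = pwb + swb + ewb + fwb
    return {
        "FACT-G total": fact_g_total,
        "FACT-M TOI": pwb + fwb + melanoma_scale,
        "FACT-M total": fact_g_total + melanoma_scale,
    }


# Hypothetical subscale scores for one patient.
scores = fact_m_summary_scores(pwb=20, swb=22, ewb=18, fwb=16, melanoma_scale=50)
print(scores)
```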

EQ-5D questionnaire
The EQ-5D is a self-administered, generic, utility questionnaire developed by the EuroQol Group in 1990 [21]. It includes five single-item dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression) and a vertical visual analogue scale (VAS) for the patients to rate their current health state. Patients must choose between five levels of difficulty in accomplishing tasks in each dimension (EQ-5D-5L). The responses to the five dimensions are used to create a utility index. In this study, utility values were calculated using country-specific value sets from the United States of America [22]. The VAS ranges from 0 (worst imaginable health state) to 100 (best imaginable health state).

Patient interviews
Interviews with patients were conducted to gather information regarding the impact of MCC and its treatments (radiotherapy or chemotherapy) on patients' everyday lives, to assess patients' experience of avelumab during the trial, and to document the evolution of these experiences over the course of the trial. Qualitative data obtained from these interviews were used to understand HRQoL concepts and symptoms in the MCC population, and these were compared with the concepts covered in the FACT-M in order to identify the items most relevant to mMCC patients.

Statistical analyses
Statistical analyses were conducted on the PRO analysis set (PAS), which included all patients from the trial intent-to-treat population (i.e. all patients who received at least one dose of trial treatment) who completed at least one item of each PRO (FACT-M and EQ-5D) at baseline.
Identified concepts of interest in the MCC population included Physical Function, Visual Lesion Impact, and Psychological Impact. These concepts were identified following qualitative research in MCC and regulatory guidance.
MCC-specific FACT-M scores were derived by first selecting only those FACT-M items that matched to at least some extent concepts identified from the qualitative interviews, then by selecting FACT-M items that matched concepts of interest, and finally by conducting a statistical analysis assessing psychometric properties, including a data-reduction technique (principal component analysis).
Reliability and validity of both the original and the additional MCC-specific FACT-M scores were assessed. Reliability is the degree to which an instrument is free from measurement error. Internal consistency reliability (the extent to which items within a domain are consistent with each other and measure a single underlying concept) was assessed at baseline using Cronbach's alpha [23]. Validity is defined as the accuracy with which a measurement tool measures the concept it is intended to measure. Construct validity, i.e. confirmation of the scaling structure, clinical validity, and concurrent validity, was assessed at baseline. Scaling structure was confirmed using multitrait analysis assessing item convergent and discriminant validity. Clinical validity was assessed by comparing the FACT-M mean scale scores between groups of patients categorized by ECOG PS [24]; the hypothesis was that patients with a better level of functioning should have better HRQoL. Concurrent validity was assessed at baseline by calculating the Pearson correlation coefficient between the FACT-M scores and the EQ-5D VAS and index scores; the hypothesis was that domains measuring related concepts should have high correlation levels while domains measuring different concepts should have low correlations. The ability of the FACT-M scales to detect change over time was assessed from baseline to Week 7 by comparing change in FACT-M scale scores according to percentage change in tumor size (classified into three groups: reduction ≥30%, reduction between 0% and 30%, increase >0%) using paired t-tests, effect size (ES), standardized response mean (SRM), and Guyatt's statistic [25,26]. A tumor size reduction of at least 30% is consistent with the RECIST 1.1 criteria for determining partial response, a commonly used clinical criterion in the evaluation of tumor burden [27].
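As an illustration of the reliability and responsiveness statistics named above, the following minimal Python sketch computes Cronbach's alpha from item-level responses, and the ES and SRM from paired baseline and follow-up scores. It is not the SAS code used in the study: the function names and toy data are hypothetical, sample (n − 1) variances are assumed, and ES is taken here as mean change divided by the baseline standard deviation, which is one common definition.

```python
from statistics import mean, stdev, variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of responses per item,
    with patients in the same order in every list."""
    k = len(items)
    totals = [sum(responses) for responses in zip(*items)]  # per-patient sum
    sum_item_variances = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))


def effect_size(baseline, follow_up):
    """ES (assumed definition): mean change / SD of baseline scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """SRM: mean change / SD of the change scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Toy data: two items answered by four hypothetical patients.
toy_items = [[1, 2, 3, 4], [1, 2, 3, 4]]
alpha = cronbach_alpha(toy_items)

# Toy paired scores at baseline and Week 7.
toy_baseline = [10.0, 12.0, 14.0, 16.0]
toy_week7 = [11.0, 14.0, 15.0, 20.0]
es = effect_size(toy_baseline, toy_week7)
srm = standardized_response_mean(toy_baseline, toy_week7)
print(alpha, es, srm)
```

Guyatt's statistic follows the same pattern, dividing the mean change in the group expected to change by the standard deviation of change in a stable group.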
Finally, minimally important differences (MIDs), defined as the smallest difference in score in the PRO domain that is perceived as meaningful and beneficial for the patient [28], were computed using distribution-based and anchor-based methods. Anchor-based MID thresholds were explored using the percentage change in tumor size at Week 7. The responder threshold was defined for each FACT-M score as the mean change from baseline to Week 7 in patients whose tumor size decreased by at least 30%. The distribution-based method included the use of ES and standard error of measurement (SEM). Two responder thresholds were calculated as $0.2 \times SD_{BL}$ and $0.5 \times SD_{BL}$, with $SD_{BL}$ being the standard deviation of the score at baseline. The MID threshold using SEM was calculated as $SD_{BL} \times \sqrt{1 - r}$, where $r$ is the reliability coefficient. Recommended MID ranges for FACT-M scores in the mMCC population were identified based on the maximum and minimum MID thresholds obtained using the different methods.
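The MID thresholds described above can be sketched in a few lines of Python. This is a hedged illustration rather than the study's SAS implementation: the function names, the dictionary keys, and the toy data are hypothetical, the sample (n − 1) standard deviation is assumed, and the anchor group is taken as patients with a tumor size change of −30% or lower.

```python
import math
from statistics import mean, stdev


def distribution_based_mids(baseline_scores, reliability):
    """Distribution-based thresholds: 0.2 x SD, 0.5 x SD, and the SEM,
    where SEM = SD at baseline x sqrt(1 - reliability)."""
    sd_bl = stdev(baseline_scores)  # sample SD (n - 1), an assumption
    return {
        "0.2 x SD": 0.2 * sd_bl,
        "0.5 x SD": 0.5 * sd_bl,
        "SEM": sd_bl * math.sqrt(1.0 - reliability),
    }


def anchor_based_mid(score_changes, tumor_pct_changes, cutoff=-30.0):
    """Anchor-based threshold: mean score change among patients whose
    tumor size changed by `cutoff` percent or less (i.e. shrank >= 30%)."""
    anchored = [c for c, t in zip(score_changes, tumor_pct_changes) if t <= cutoff]
    if not anchored:
        raise ValueError("no patient meets the tumor shrinkage cutoff")
    return mean(anchored)


# Toy data for four hypothetical patients.
baseline = [10.0, 12.0, 14.0, 16.0]
mids = distribution_based_mids(baseline, reliability=0.75)
anchor_mid = anchor_based_mid(
    score_changes=[4.0, 2.0, -1.0, 0.0],
    tumor_pct_changes=[-45.0, -30.0, -10.0, 20.0],
)
print(mids, anchor_mid)
```

A recommended MID range for a given score can then be read off as the minimum and maximum of these thresholds.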
Comparison of quantitative variables between groups of patients was assessed using t-test when comparing two groups of patients or ANOVA when comparing three groups of patients or more. Statistical significance threshold was set to 5% for each two-sided test and is provided to aid interpretation. No adjustments were made to account for multiplicity. Statistical analyses were performed using SAS software for Windows (Version 9.4, SAS Institute Inc., Cary, NC, USA).

Patient population
Among the 88 enrolled patients who received at least one dose of trial treatment, 70 patients completed at least one item of both EQ-5D and FACT-M questionnaires at baseline and were included in the PAS. The number of patients in the PAS was 49 at Week 7, 38 at Week 13, and 27 at Week 25. The optional interview was conducted with 19 patients.
Socio-demographic and clinical characteristics of patients in the PAS and of interviewed patients are presented in Table 1. Overall, patients were mostly male (74.3%), with a mean age of 70.2 years. Patients were mainly from the USA (40 patients; 57.1%), followed by Europe (22 patients; 31.4%), and the rest of the world, i.e. Japan and Australia (8 patients; 11.4%). Socio-demographic and clinical characteristics of interviewed patients were very similar to those of the PAS, except that interviewed patients were mainly from the United States of America and none were from Asia.

Development of the additional MCC-specific FACT-M scores
Additional MCC-specific FACT-M scores were developed to obtain constructs of interest specific to the mMCC population with specific focus items. Forty-three items from the FACT-M questionnaire were initially selected, corresponding to the core FACT-G and the Melanoma Subscale; the Melanoma surgery scale was not deemed relevant to the disease and objective of the trial. The iterative, combined qualitative and quantitative process yielded three additional MCC-specific scores: Physical Function score (six items), Psychological Impact score (six items), and MCC summary score (sum of Physical Function and Psychological Impact scores; 12 items) (Table 2).

Description of original FACT-M scale and additional MCC-specific scale scores over time
Original FACT-M scores and additional MCC-specific scores were rather steady over time, with a slight tendency to increase (better HRQoL) (Table 3). The FACT-M questionnaire presented good concurrent validity, as high correlation coefficients were observed between the FACT-M scores and EQ-5D for items that represented the same underlying concept, and lower coefficients were observed when different concepts were assessed by the two instruments. In particular, higher coefficients were found between the EQ-5D index   (Table 5).

Ability to detect change and MID
Measurable tumor shrinkage greater than 30% (i.e. indicating partial/complete response) was associated with an improvement in HRQoL scores, whereas tumor growth was associated with a decrease in HRQoL scores. The magnitude of these change scores was in the region of small to moderate effect sizes for both the tumor shrinkage and tumor increase groups (Table 5). Clear differentiation of change scores between groups was observed for FWB (p = 0.005), Melanoma surgery scale (p = 0.036), and TOI (p = 0.038). A similar trend was observed for the other scales and subscales with the exception of EWB, where a decrease was observed for the tumor shrinkage group.
Anchor-based calculations of the MIDs were generally consistent with distribution-based calculations; MID ranges (min-max) based on the different calculation methods were generally in the range of $0.2 \times SD_{BL}$ (lowest) to $0.5 \times SD_{BL}$ (highest) (Table 5). The percentage of responders varied from 18% to 39% when response was defined as a change in score above SEM-calculated MIDs and from 22% to 53% when response was defined as a change in score above anchor-calculated MIDs (Table 6).
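Responder percentages of this kind follow directly from a list of change scores and an MID threshold. The sketch below is illustrative only: the function name and data are hypothetical, a positive change is assumed to mean improvement, and a patient is counted as a responder when the change from baseline is greater than or equal to the MID (an assumption about how the boundary case is handled).

```python
def responder_rate(score_changes, mid):
    """Percentage of patients whose change from baseline is >= `mid`."""
    if not score_changes:
        raise ValueError("score_changes must not be empty")
    responders = sum(1 for change in score_changes if change >= mid)
    return 100.0 * responders / len(score_changes)


# Toy change scores (Week 7 minus baseline) for five hypothetical patients.
example_changes = [5.0, -2.0, 3.5, 0.0, 8.0]
rate = responder_rate(example_changes, mid=3.5)
print(rate)
```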

Discussion
The FACT-M is a validated questionnaire to assess HRQoL in patients with melanoma, a disease sharing similarities with MCC. As there is currently no validated disease-specific PRO questionnaire to capture HRQoL in patients with MCC, the FACT-M questionnaire was a candidate questionnaire to capture disease-specific HRQoL outcomes in mMCC.
Overall, the psychometric properties of the FACT-M questionnaire in the mMCC population were acceptable, as the questionnaire demonstrated good item convergent validity, very good internal consistency reliability, clinical validity, and notable ability to detect change in tumor size, given the small sample size. Very good internal consistency ensured that items within a domain reflect a single underlying concept and that responses to these items are consistent with each other. However, the FACT-M questionnaire demonstrated insufficient discriminant validity, especially for items of the Melanoma Subscale. A similar finding has been previously reported in the original FACT-M validation work [17] and is likely due to the addition of disease-specific module items that correlate highly with core items but not necessarily with each other. Therefore, results from the Melanoma Subscale should be interpreted cautiously not only in patients with melanoma but also in patients with MCC. Altogether, these results showed good construct validity of both original and additional MCC-specific FACT-M scores.
In an attempt to create scores with better psychometric properties for the mMCC population, additional scores were derived from concepts arising from patient interviews. Contrary to our expectation, these    [29,30] as expected considering the similarities between the two diseases. The magnitude of these change scores was consistent with distribution-based methods of MID calculation, which supports the choice of anchor for detecting differences in HRQoL domains. One unexplained finding was the small reduction in social/family well-being in the tumor shrinkage group, which requires further investigation. Ranges of MIDs for future studies involving the FACT-M in the mMCC population exclude this observed negative association and are as follows:

Limitations and future directions
Data used in this study were collected in a clinical trial setting, which may have biased the study results: e.g. restrictive patient eligibility criteria may result in a study population not entirely representative of the mMCC population. Another limitation of the study results may be the small sample sizes. However, an adequate quantity of analyzable data was retrieved from the PROs administered during the trial. This was likely due to protocol-specific procedures such as training and reminders, electronic administration of PROs (missing items were not permitted by the electronic questionnaires), and instructions provided by the FACT-M questionnaire. Indeed, patients were required to answer all items and select "Not at all" if they felt that the item was not applicable to them.
Further work could include performing specific cognitive debriefing interviews (i.e. an in-depth, item-by-item review of the questionnaire by patients) that could provide additional insights into the relevance of the items selected for the study-specific scores.

Conclusion
In conclusion, assessment of FACT-M psychometric properties demonstrated that FACT-M is suitable to capture HRQoL in patients with mMCC, thus making it a potential candidate for assessing HRQoL in mMCC trials. HRQoL improvements were observed in patients with relevant tumor shrinkage after 7 weeks of avelumab treatment for the majority of FACT-M scores, except the Social/Family well-being domain. This link between HRQoL and clinically relevant endpoints will be valuable to assess benefits of novel treatments for mMCC. Furthermore, as psychometric properties of the additional MCC-specific FACT-M scores were similar to the original ones with no evidence of superior measurement properties, the original FACT-M scores are the ones recommended to capture HRQoL in patients with mMCC.
Table footnotes: [a] A responder is defined as a patient whose score had changed relative to baseline by an amount greater than or equal to the MID; [b] SEM was calculated using Cronbach's alpha for internal consistency reliability; [c] Reduction in tumor size ≥30% was used as the anchor for MID thresholds.