Update on the psychometric properties and minimal important difference (MID) thresholds of the FACT-M questionnaire for use in treatment-naïve and previously treated patients with metastatic Merkel cell carcinoma

Objectives: For valid and reliable assessment of patients’ health-related quality of life (HRQoL), it is crucial to use psychometrically robust instruments. In the context of rare diseases such as Merkel cell carcinoma (MCC), validated disease-specific instruments are often not available. The Functional Assessment of Cancer Therapy – Melanoma (FACT-M) was originally developed for use in melanoma. Its psychometric performance in MCC and minimal important difference (MID) thresholds have previously been reported based on a cohort of metastatic MCC patients who had disease progression following one or more prior lines of chemotherapy (NCT02155647 Part A; n = 70). Since then, new data from treatment-naïve patients in the phase II JAVELIN Merkel 200 trial have become available (NCT02155647 Part B; n = 102). This study aims to increase the accuracy and precision of the previously established psychometric properties and MID thresholds of the FACT-M in metastatic MCC patients.
Methods: Published qualitative research suggests that patients with metastatic MCC had similar experiences and described similar concepts associated with their disease independent of whether they were treatment naïve or had prior treatment. Therefore, it was deemed appropriate to pool FACT-M data from the Part A (previously treated) and Part B (treatment-naïve) cohorts for this study. Construct validity was assessed by evaluating item-factor correlations (convergent validity) and known-groups validity using ECOG performance status 0 versus 1. Concurrent validity was assessed using EQ-5D items. Internal consistency reliability was assessed using Cronbach’s α. Anchor- and distribution-based approaches were used to derive MID thresholds.
Results: Overall, psychometric tests based on various validity (convergent, known-groups, concurrent) and reliability (Cronbach’s α) analyses confirmed previous findings that the FACT-M performs well in MCC patients. MID thresholds derived from this study are largely in line with previously established thresholds, with some minor adjustments.
Conclusions: In the context of rare diseases, which often have limited data available for psychometric testing, a reasonably large MCC patient sample was available for this study, enhancing the accuracy and precision of previously established FACT-M psychometric properties and MID thresholds for use in metastatic MCC patients, with only small deviations. Results suggest that the FACT-M is suitable for Merkel cell carcinoma regardless of patients’ treatment status.
Trial registration: This study is a pre-planned post-hoc analysis conducted on data collected in Part A and Part B of the JAVELIN Merkel 200 trial. This trial was registered on 2 June 2014 with ClinicalTrials.gov as NCT02155647.


Background
The importance of including the patient's voice in clinical trials is well established [1,2]. The most common approach to incorporating the patient perspective is the collection of patient-reported outcomes (PRO) data. An important prerequisite for obtaining high-quality self-report data from patients, and for drawing valid inferences from these data [3], is the use of psychometrically robust PRO instruments. However, in the context of a rare disease such as Merkel cell carcinoma (MCC), disease-specific PRO instruments are often not available. As a result, PRO instruments have to be developed de novo, or well-established PRO instruments have to be used from disease areas that are reasonably comparable to the disease of interest.
The phase II, single-arm JAVELIN Merkel 200 trial (NCT02155647) includes metastatic MCC patients who had disease progression following one or more prior lines of chemotherapy (Part A) [4] or who were treatment naïve at study inclusion (Part B) [5]. As part of this trial, a range of PRO data was collected. In light of the lack of well-validated MCC-specific PRO instruments, the melanoma-specific Functional Assessment of Cancer Therapy – Melanoma (FACT-M) and EQ-5D-5L questionnaires were used to assess patients' self-reported health-related quality of life (HRQoL) while receiving avelumab. To ensure the suitability of the FACT-M for use in MCC, it is crucial to test its psychometric performance in this patient population. A first publication exploring the psychometric performance of the FACT-M in MCC provided evidence for its suitability for use in MCC patients [6]. These analyses had been based on patients who had already received second-line or later treatment (Part A). Since the publication of these first results, new PRO data obtained from treatment-naïve patients (Part B) have become available. As the suitability of the FACT-M for use in MCC needs to be established for both MCC patient groups, it is crucial to repeat the psychometric analyses on Part B patients. For this, it was deemed justified and advantageous to pool the two samples for several reasons. First, the combined sample size is substantially larger than the individual samples, ensuring more sensitive analyses and more robust results [7]. Second, qualitative interviews with patients from both study parts indicated similar experiences related to their MCC diagnosis and its management, and regarding perceived benefits and clinical changes experienced during the trial [8,9]. Third, it is crucial to establish that the FACT-M is suitable for application in MCC in general, irrespective of treatment status at study inclusion. By including a greater range of MCC patients through pooling of the two samples, validity evidence can be extended to a more heterogeneous MCC patient population. Finally, for the definition of minimal important difference (MID) thresholds, it is important to establish thresholds that can be applied to the entire MCC population. This ensures comparability of results obtained from different patient groups.
Hence, this study aims at confirming previously reported psychometric properties and MID thresholds of the FACT-M [6] in patients with MCC. By using pooled Part A and B trial data, the sample size could be increased substantially compared to the previous publication [6], enhancing accuracy and precision of psychometric tests and MID thresholds. This new set of analyses is intended to complement/replace the Part A results by providing a more robust piece of evidence applicable to a broader patient population consisting of previously treated (Part A) and treatment-naïve (Part B) MCC patients.

Study design
The JAVELIN Merkel 200 trial is a single-arm, open-label, multi-center, international phase II study consisting of two parts. For inclusion in either of the two parts, eligible patients had histologically confirmed metastatic MCC (stage IV), were at least 18 years of age, and had an Eastern Cooperative Oncology Group (ECOG) performance score of 0 or 1. Patients were excluded if they had autoimmune or various other conditions [4,5]. For inclusion in the first part (Part A), patients had already received and failed one line or more of chemotherapy treatment for metastatic MCC. The planned sample size for Part A was 84 patients, giving the study 87% power to assess clinical activity [4]. For inclusion in the second part (Part B), patients had to be treatment naïve to systemic therapy [5]. Further details of the study design as relevant to the present study are reported elsewhere [6].

Study population
For the purpose of substantiating previously reported psychometric performance and MID thresholds of the FACT-M [6], the intention-to-treat trial populations of Part A (n = 88) and Part B (n = 116) were pooled, leading to a combined sample size of n = 204. As not all patients provided baseline data, a PRO analysis set (PAS) was defined consisting of n = 172 patients (Part A: n = 70; Part B: n = 102). Analyses of the ability of the FACT-M to detect change and the derivation of MID thresholds are based on data collected at week 7 (n = 121). Week 7 was chosen as the most suitable time point to measure responsiveness of the FACT-M, as the main tumor response is expected at that time. The pooled sample is based on the Part A and Part B data with a cut-off date of 14 September 2018.

Patient-reported outcome assessments
The FACT-M and EQ-5D instruments were used to capture PRO data in the JAVELIN Merkel 200 trial.
The FACT-M questionnaire includes 51 items grouped into nine scores, including six subscale and three summary scores [10,11]. Three additional MCC-specific FACT-M scores have been established previously for use in MCC [6]. The recall period of all FACT-M items is 7 days and items are scored on a 5-point scale, ranging from 0 = 'not at all' to 4 = 'very much'. For all subscale, summary and MCC-specific FACT-M scores, a higher score indicates higher well-being. For the purpose of this study, the psychometric properties of the FACT-M and its various subscale and summary scores, including the MCC-specific FACT-M scores [6], are documented. The latter include the MCC-specific subscales Physical Function (PF; six FACT-M items) and Psychological Impact (PI; six FACT-M items) and the MCC summary score (PF + PI). While the previous publication established and tested the psychometric performance of the newly defined MCC-specific scores on a subset of MCC patients [6], the present study aims not only to substantiate their psychometric properties but also to establish MID thresholds for the PF, PI and MCC summary scores.
The EQ-5D-5L questionnaire includes five single-item dimensions (i.e., mobility, self-care, usual activities, pain/discomfort, anxiety/depression) with five response levels each (5L), and a vertical visual analogue scale (VAS, i.e., EQ VAS). There is no recall period in the EQ-5D items, i.e., the items ask patients to assess their health status on the day they complete the questionnaire. For both the EQ VAS and the EQ-5D index score, a higher score indicates better health status, and a positive change reflects an improvement [12].

Statistical analysis
Sociodemographic and clinical characteristics are described (mean, median and range for quantitative variables; percentages for qualitative variables) and compared across study Part A and Part B (t-tests for continuous variables; Chi-square tests for categorical variables).
To confirm the psychometric properties of the FACT-M in the MCC population, previous analyses based on the Part A sample [6] were largely repeated using pooled data. First, using baseline data, internal consistency of all FACT-M scales was explored using Cronbach's alpha [13]. In addition, and new to the analyses of the pooled data presented herein, McDonald's (1999) [14] coefficient omega was calculated as an alternative to alpha to assess the reliability of the six FACT-M subscales and the two MCC-specific subscales. Omega is calculated as the ratio of the common (i.e., true-score) variance to the total variance (i.e., common plus error variance). It has been shown to overcome deficiencies of alpha and has been strongly recommended as a more robust estimate of reliability than alpha [15,16]. In this article, omega is based on one-factor models [16] fitted by confirmatory factor analysis (CFA), with a separate model for each of the eight subscales.
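The two reliability coefficients described above can be illustrated with a minimal sketch. It assumes complete item responses and, for omega, that loadings and error variances have already been estimated by a one-factor CFA; the function names, example responses and loadings are hypothetical and are not taken from the trial data.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha; rows = patients, columns = items of one subscale."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the sum score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def omega_one_factor(loadings, error_vars):
    """Omega = common variance / (common + error variance), one-factor model."""
    common = np.sum(loadings) ** 2
    return common / (common + np.sum(error_vars))

# Hypothetical responses of five patients to three items (0-4 scale)
scores = np.array([[0, 1, 1], [1, 2, 1], [2, 2, 3], [3, 3, 3], [4, 4, 3]])
alpha = cronbach_alpha(scores)

# Hypothetical standardized loadings; error variance assumed to be 1 - loading**2
lam = np.array([0.7, 0.8, 0.6])
omega = omega_one_factor(lam, 1 - lam ** 2)
print(round(alpha, 3), round(omega, 3))
```

In practice, the loadings would come from a fitted CFA model rather than being supplied by hand.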
For construct validity, baseline data were used to test for item convergent and divergent validity, i.e., multi-trait scaling analyses of item-to-scale correlations (r), where individual items are expected to correlate highly with their own domain (r ≥ 0.4; convergent validity) and to correlate more highly with their own domain than with other domains (divergent validity). To substantiate the construct validity of both the FACT-M and the two newly developed MCC-specific subscales for use in MCC, CFAs were carried out. In addition to a six-factor FACT-M model, a four-factor model was specified containing the four FACT-G subscales, as well as a two-factor model containing the melanoma subscale and the melanoma surgery scale. The two MCC-specific subscales were run as a separate two-factor model. Clinical validity was assessed using subgroups defined by ECOG performance status (PS) 0 (= fully active) versus 1 (= restricted in physically strenuous activity). Criterion (concurrent) validity was assessed using the EQ VAS and EQ-5D index score, and additionally the five EQ-5D single items, which had not been tested in the previous publication [6]. The ability of the FACT-M to detect change over time was assessed using the variable 'change in tumor size' to define groups, comparing baseline with data assessed at week 7. These analyses were repeated using the EQ VAS to explore differences across the categories 'improved', 'stable' and 'worsened'. The latter analysis had not been carried out in the previous publication [6] but was deemed an important addition, given that a patient-reported variable such as the EQ VAS was expected to categorize patients into more patient-relevant groups than the variable 'change in tumor size'. Change is expressed as means as well as effect sizes.
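The item convergent and divergent criteria described above can be sketched as follows. This is an illustration only: correlating each item with its own scale after removing that item (correction for overlap) is an assumption, since the text does not state how own-scale correlations were computed, and the function name, scale labels and simulated data are hypothetical.

```python
import numpy as np

def multitrait_scaling(data, scales, threshold=0.4):
    """data: patients x items; scales: dict of scale name -> column indices.
    Returns, per scale, the share of items meeting (convergent, divergent)."""
    data = np.asarray(data, dtype=float)
    results = {}
    for name, cols in scales.items():
        convergent, divergent = 0, 0
        for item in cols:
            # Own-scale score excludes the item itself (assumed overlap correction)
            own = data[:, [c for c in cols if c != item]].sum(axis=1)
            r_own = np.corrcoef(data[:, item], own)[0, 1]
            r_other = [np.corrcoef(data[:, item], data[:, oc].sum(axis=1))[0, 1]
                       for other, oc in scales.items() if other != name]
            convergent += int(r_own >= threshold)
            divergent += int(all(r_own > r for r in r_other))
        results[name] = (convergent / len(cols), divergent / len(cols))
    return results

# Simulated data: two independent hypothetical traits, three items each
rng = np.random.default_rng(0)
traits = rng.normal(size=(200, 2))
noise = 0.3 * rng.normal(size=(200, 6))
items = np.repeat(traits, 3, axis=1) + noise   # columns 0-2: trait 1, 3-5: trait 2
result = multitrait_scaling(items, {"scale_1": [0, 1, 2], "scale_2": [3, 4, 5]})
print(result)
```

Because the correlations are computed on complete rows, a patient with any missing item would have to be dropped beforehand, which is why excluding a frequently missing item can enlarge the usable sample.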
Closely following the Part A analyses [6], MID thresholds to define 'meaningful improvement' and 'meaningful worsening' on the FACT-M were derived from the pooled week-7 data. Thresholds were calculated for each FACT-M score using a combination of anchor- and distribution-based methods, a common approach to deriving responder definitions and minimally important difference thresholds [17–19]. For the anchor-based approach, the initial analyses using Part A data applied the variable 'change in tumor size' as an anchor [6]. In line with the rationale behind the FACT-M responsiveness analyses, however, it was decided to again use a patient-reported anchor [20], i.e., 'change in EQ VAS', as the preferred anchor (MID = 7 points), given that FACT-M scores correlated more strongly with the EQ VAS than with 'change in tumor size'. A correlation of reasonable size between target instrument and anchor is a prerequisite for a suitable anchor [21]. According to recommended anchor selection criteria, a PRO instrument is also the anchor of choice [22]. The remaining methods for MID definition replicated the already published analyses on Part A data [6].
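As an illustration of how anchor- and distribution-based estimates are typically obtained, the sketch below computes two widely used distribution-based quantities (0.5 SD and one standard error of measurement) and a simple anchor-based estimate using a 7-point change on the anchor. The specific distribution-based methods and the exact grouping rule used in this study are reported in [6] and are not reproduced here; the choices below, the function names and the example values are assumptions.

```python
import numpy as np

def distribution_based_mid(baseline_scores, reliability):
    """Return (0.5 SD, 1 SEM), with SEM = SD * sqrt(1 - reliability)."""
    sd = np.std(baseline_scores, ddof=1)
    return 0.5 * sd, sd * np.sqrt(1 - reliability)

def anchor_based_mid(score_change, anchor_change, anchor_mid=7):
    """Mean score change among patients who improved or worsened on the
    anchor by at least anchor_mid points (assumed grouping rule)."""
    score_change = np.asarray(score_change, dtype=float)
    anchor_change = np.asarray(anchor_change, dtype=float)
    improved = score_change[anchor_change >= anchor_mid]
    worsened = score_change[anchor_change <= -anchor_mid]
    return improved.mean(), worsened.mean()

# Hypothetical baseline scores and reliability estimate
half_sd, sem = distribution_based_mid([10, 12, 14, 16, 18], reliability=0.84)

# Hypothetical week-7 changes in a FACT-M score and in the EQ VAS anchor
mid_improve, mid_worsen = anchor_based_mid([3, 2, 0, -1, -4], [10, 8, 0, -3, -9])
print(half_sd, sem, mid_improve, mid_worsen)
```

Combining the estimates from both approaches yields a range, which is consistent with reporting a minimum and a maximum threshold per score.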

Study population
Baseline socio-demographic and clinical characteristics of patients included in the PAS of the FACT-M (n = 172) are presented in Table 1. A majority of patients were male (70.3%) with a mean age of 71.6 years (SD = 10.4). The median time since diagnosis was 2 years. The ECOG score indicated that over half of the total population (60.5%) were fully active (ECOG PS = 0), while the remaining 39.5% were restricted in physically strenuous activity (ECOG PS = 1). Mean baseline tumor size was 88.6 mm.
When comparing the Part A and Part B samples, some differences were apparent. Mean tumor size was larger for Part A than for Part B patients (103.7 mm [SD = 79.7] versus 79.5 mm [SD = 58.5]), and the median time since first metastatic disease was 9.5 months for Part A and 2.3 months for Part B (5.7 months for Part A and B pooled).

Internal consistency reliability
As shown in Table 2, Cronbach's alpha coefficients were all above the recommended threshold of 0.7, supporting the internal consistency of all FACT-M scores. The estimates of coefficient omega were either identical to or slightly above alpha for all eight subscales, ranging between 0.80 and 0.89.

Convergent and divergent validity using multi-trait analysis
Multi-trait analysis, which is based on inter-item correlations, requires all items to be non-missing: as soon as one item is missing, the patient is excluded from the analysis. The optional item GS7, part of the Social well-being scale, asks patients about their satisfaction with their sex life. As this item exhibited high missingness, multi-trait analysis was conducted twice: once with all FACT-M items (51 items) and once excluding item GS7 (50 items). Exclusion of GS7 doubled the sample size available for these analyses. As the two sets of analyses led to similar results, results based on 50 items are presented in Table 2, as this analysis provided a more robust sample size.
Convergent validity was generally good with 100% of items meeting the item-convergent validity criterion for three of the four FACT-G subscales and 83% for Emotional well-being. The two melanoma-specific subscales showed lower levels of correlations, i.e. 50 and 75%, respectively. The percentage of items that met the divergent validity criterion was highest for Social well-being, with all items (100%) meeting the divergent validity criterion, and lowest in the Melanoma subscale, with 38% of items meeting the divergent validity criterion.
The scaling structure of the two proposed MCC-specific FACT-M subscales (PF and PI), involving six items each, was tested by multi-trait analysis in a simple model using the selected 12 items only. Results indicated perfect item convergent and divergent validity (Table 2).

Construct validity using confirmatory factor analysis
We ran into model convergence issues when specifying the six-factor FACT-M model, which were likely due to a combination of the sample size, which was rather small for CFA, and the size of the model (six factors with 51 items, or 50 items when excluding item GS7).
In contrast, the four-factor FACT-G model converged (based on n = 170, excluding item GS7) with overall satisfactory fit indices, with a root mean square error of approximation (RMSEA) of 0.077 (90% confidence interval [CI], 0.068-0.086), a standardized root mean square residual (SRMR) of 0.076 and a comparative fit index (CFI) of 0.874. All factor loadings except one in the Emotional well-being subscale were at least 0.5, with most being well above 0.6. The two-factor model containing the melanoma subscale and the melanoma surgery scale converged as well, with fit indices suggesting a worse model fit compared with the four-factor model, with RMSEA of 0.094 (90% CI, 0.085-0.103), SRMR of 0.083 and CFI of 0.773. The melanoma subscale in particular showed some very small factor loadings, with five being below 0.4 and two being below 0.3. Finally, the two MCC-specific subscales showed excellent model fit, with RMSEA of 0.072 (90% CI, 0.049-0.094), SRMR of 0.057 and CFI of 0.957. All factor loadings were above 0.6. Of note, none of these models allowed for any correlated errors or other model adjustments.

Clinical/known groups validity
As shown in Table 2, mean FACT-M scores were larger for the group of patients who were fully active (ECOG PS = 0) compared to those restricted in physically strenuous activity (ECOG PS = 1) across all but two FACT-M scores, reflecting the better functioning of the former group. The difference between the two groups led to p-values below 0.05 for all scores except for Social well-being and the MCC-specific PI score.

Concurrent validity
As shown in Table 3, correlation coefficients between FACT-M subscale, summary and MCC-specific scores and EQ-5D items were generally medium to large (i.e., > 0.4), except for FACT-M Social well-being, which showed small correlations at best. Apart from this subscale, all other FACT-M scores were also highly correlated with the two summary EQ-5D scores (VAS and Index). Finally, correlations between FACT-M scores and EQ-5D items were highest in absolute value for scores assessing similar or associated concepts and lower for those measuring different concepts, e.g., Emotional well-being correlated highest with EQ-5D anxiety/depression.

Ability of the FACT-M scores to detect change over time
To explore the ability of the FACT-M scores to detect change over time, 'change in tumor size' was used as an anchor, replicating the psychometric analyses carried out on Part A data [6]. As information on the percentage change in tumor size was only available for 105 patients, this first set of analyses is based on a sample size of n = 105. The FACT-M showed a generally logical pattern: improvement in FACT-M scores for the improved group (except for Physical well-being and PF, which were negative but close to 0), worsening in FACT-M scores for the worsened group (except for Emotional well-being and PI, which were positive but small in magnitude) and score changes that were generally negative but close to 0 for the stable group (except for Emotional well-being and PI). The main departure from a clearly monotonic pattern was seen in the two melanoma-specific subscales, where the stable group showed the largest decrease (worsening) in scores. When using the EQ VAS as an additional anchor to differentiate between groups of change (n = 121), a logical pattern of FACT-M scores was again observed. That is, improvement was observed for the improved group, worsening in FACT-M scores for the worsened group and score changes that were generally negative (except for Emotional well-being) but close to 0 for the stable group. The main departure from a clearly monotonic pattern was seen in the Social well-being scale, where the stable group showed the largest decrease (worsening) in scores. The same patterns were observed when expressing change in the form of effect sizes for the different groups (Table 4).
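Expressing change as an effect size per change group can be sketched as below. The text does not specify the effect size formula; dividing the mean change by the standard deviation of all baseline scores is one common choice and is an assumption here, as are the function name and the example values.

```python
import numpy as np

def effect_size_by_group(baseline, week7, groups):
    """Mean change (week 7 minus baseline) per group, divided by the
    SD of all baseline scores (assumed standardizer)."""
    baseline = np.asarray(baseline, dtype=float)
    change = np.asarray(week7, dtype=float) - baseline
    groups = np.asarray(groups)
    sd_baseline = baseline.std(ddof=1)
    return {str(g): change[groups == g].mean() / sd_baseline
            for g in np.unique(groups)}

# Hypothetical scores for five patients classified by an external anchor
baseline = [10, 12, 14, 16, 18]
week7 = [14, 15, 14, 15, 14]
groups = ["improved", "improved", "stable", "stable", "worsened"]
es = effect_size_by_group(baseline, week7, groups)
print(es)
```

A monotonic pattern corresponds to effect sizes ordered as improved > stable > worsened, with the stable group close to 0.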

MID thresholds
The minimum and maximum MID thresholds for the various FACT-M subscale and summary scores were derived using a combination of anchor- and distribution-based approaches. As shown in Table 5, results were largely in line with those derived from Part A data [6]. For Functional well-being, the Melanoma surgery scale and the FACT-M Trial Outcome Index (TOI), the minimum threshold was smaller when derived from the pooled data (each by one point), whereas for the FACT-G Total score, the maximum threshold was larger by one point compared with the threshold derived from Part A data. All remaining thresholds were identical to those reported previously [6].

Discussion
The main objective of this study was to confirm the psychometric properties of the FACT-M and MID thresholds for use in MCC patients, which had been previously obtained from Part A of the trial [6]. By using pooled Part A and B trial data, the sample size could be increased substantially compared to the previous publication [6], enhancing accuracy and precision of psychometric tests and MID thresholds as well as extending the applicability of the results to a broader patient population consisting of previously treated (Part A) and treatment-naïve (Part B) MCC patients. In addition to the FACT-M scores, preceding analyses on Part A data had resulted in the development of three additional scores to capture concepts most relevant to MCC patients. Item selection to generate these MCC-specific scores had taken into account results from psychometric analyses [6] and qualitative research to ensure that key concepts elicited from MCC patients were included [8,9]. As these MCC-specific scores are new, the present study also aimed to generate additional validity evidence and derive MID thresholds for these scores.
Table 3 Pearson correlation coefficients between FACT-M and EQ-5D scores at baseline (n = 172). Columns: EQ-5D mobility, EQ-5D self-care, EQ-5D usual activity, EQ-5D pain/discomfort, EQ-5D anxiety/depression, EQ VAS, EQ-5D Index.
Results from the psychometric analyses of the present study are generally supportive of the construct validity of the FACT-M in MCC patients. In particular, the MCC-specific scores showed strong psychometric properties in both the multi-trait analysis and the CFA, suggesting that these subscales may be particularly suitable for this patient population. In contrast, the Melanoma subscale showed low item convergent and divergent validity in the multi-trait analysis. On closer inspection, these results seem to reflect that this subscale includes disease-specific symptoms that are not necessarily associated with each other (e.g., changes in skin, fevers, shortness of breath, headaches, aches and pains in bones, blood in stools) and that are also partly specific to melanoma. Further, this subscale covers aspects related to physical, emotional or functional well-being, leading to higher correlations with these subscales. Therefore, suboptimal psychometric properties would be expected. The suboptimal performance of the melanoma subscale, in combination with the melanoma surgery scale, was further confirmed by the CFA. Even when several poor-performing items were removed from these subscales, as proposed by Swartz et al. (2012) [23], model fit could not be improved substantially, with RMSEA being slightly worse and SRMR and CFI slightly better compared with the two-factor model containing all original items of the two scales. Following from these analyses, both the FACT-G and the MCC-specific scores seem to perform best in MCC and should be the focus when interpreting HRQoL of MCC patients, while both the melanoma subscale and the melanoma surgery scale should be used with caution. In the context of CFA, however, we want to stress that these analyses may not be robust given the rather small sample size for CFA. Therefore, replication of the CFA in a larger sample of MCC patients is highly recommended.
In addition to construct validity, strong clinical validity evidence of the FACT-M was found, with patients who were fully active (ECOG PS = 0) showing higher scores than those who were restricted in physically strenuous activity (ECOG PS = 1). Correlations between FACT-M and EQ-5D scores were, as expected, highest in absolute value for scores assessing similar or associated concepts, supporting the concurrent validity of the FACT-M. The ability of the FACT-M to detect change was demonstrated, specifically when using change in EQ VAS as the anchor. Results of the analyses conducted with the variable 'change in tumor size' were also supportive but less conclusive than those using the EQ VAS, providing support for the choice of a patient-reported anchor, as also recommended in the literature [22]. Finally, the internal consistency reliability of all FACT-M scores was supported by Cronbach's alpha values > 0.7. In summary, this study provides strong support for the suitability of the FACT-M for use in MCC patients.
Finally, the derived MIDs were generally consistent with those obtained in the preliminary analyses conducted on Part A data [6], although slight variations were seen. We recommend using the newly derived thresholds for interpreting change in MCC as measured by the FACT-M.

Conclusion
The FACT-M was originally developed for use in melanoma. To justify its use in MCC, it was important to demonstrate satisfactory psychometric performance of the FACT-M when used in Merkel cell carcinoma patients. Results from qualitative research work support the FACT-M content validity, while the present quantitative analyses support its reliability, validity and ability to detect change in MCC patients. Therefore, the application of the FACT-M in MCC is deemed appropriate. While all FACT-M scores may be used, a shorter version of 12 items (the MCC-specific scores) may be considered, as these presented the strongest psychometric properties of the FACT-M in MCC. Finally, the MID thresholds established as part of this study can serve as a guide for interpreting change scores in other research and trials in Merkel cell carcinoma.