Validation of the Japanese version of the EORTC hepatocellular carcinoma-specific quality of life questionnaire module (QLQ-HCC18)

Background This study examined the measurement properties of the Japanese version of the European Organisation for Research and Treatment of Cancer (EORTC) Hepatocellular Carcinoma-Specific Quality of Life Questionnaire (QLQ-HCC18). Methods EORTC quality of life (QOL) translation guidelines were followed to create a Japanese version of the EORTC QLQ-HCC18. This was then administered to 192 patients with hepatocellular carcinoma along with the EORTC QLQ-C30 and FACT-Hep questionnaires. Tests for reliability and validity were conducted, including comparison of scores between the EORTC and FACT questionnaires and detailed assessment of the new scales and items in clinically distinct groups of patients. Results Multi-trait scaling analysis confirmed three putative scales in the QLQ-HCC18: fatigue, fever and nutrition. Cronbach’s alpha coefficients for these scales were between 0.68 and 0.78. The QLQ-HCC18 scales correlated with scales measuring similar items in the FACT-Hep, and the questionnaire was stable over time, with an intra-class correlation coefficient of at least 0.70 for almost all scales. The questionnaire was able to distinguish between patients with different Karnofsky Performance Status and Child-Pugh liver function class. Conclusions The Japanese version of EORTC QLQ-HCC18 is a reliable supplementary measure to use with EORTC QLQ-C30 to measure QOL in Japanese patients with hepatocellular carcinoma.


Background
Hepatocellular carcinoma (HCC) is one of the most common malignancies in the world, accounting for more than half a million new cases annually [1,2]. The highest incidence rates are in eastern and south-eastern Asia and in western and central Africa [2]. The incidence is low in most developed countries; however, Japan has a very high prevalence of HCC, and 70% of cases are caused by hepatitis C virus [3]. Although 5-year survival rates of up to 60 to 70% can be achieved in well-selected patients, the recurrence rate remains very high [4,5]. The 5-year recurrence rate after potentially curative liver resection is up to 80% [4-6]. In countries such as Japan, where cadaveric donor organs are scarce, application of liver transplantation is limited [7,8]. Thus, most patients with HCC undergo repeated nontransplant treatments such as surgical resection, percutaneous radiofrequency ablation and embolization. Although survival data and information about the side effects of treatment are widely available, much less is known about how treatment for HCC affects patients' quality of life (QOL). Given the time course of the disease and the burden of repeated treatment, there are increasing concerns about QOL associated with HCC. When deciding upon treatment, consideration of QOL outcomes could be as important as survival. However, there are no HCC-specific QOL questionnaires in Japan.
At present, there are two disease-specific QOL questionnaires for evaluating the QOL of patients with HCC. One is the European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Group questionnaire, the QLQ-HCC18, and the other is the Functional Assessment of Cancer Therapy (FACT) Hepatobiliary (FACT-Hep) questionnaire [9,10]. As they are disease-specific, they are combined with generic questionnaires, the QLQ-C30 and the FACT generic questionnaire, respectively, to produce a generic and a specific QOL assessment [11,12]. The major difference between FACT-Hep and EORTC QLQ-HCC18 is that FACT-Hep targets not only patients with HCC but also patients with pancreatic, biliary and metastatic liver cancer, whereas the QLQ-HCC18 is designed specifically for patients with HCC. Currently there is a lack of published data demonstrating the measurement properties of EORTC QLQ-HCC18.
The objective of this study, therefore, was to develop a Japanese version of EORTC QLQ-HCC18, and to validate its measurement properties in patients with HCC.

Translation of the Japanese version of EORTC QLQ-HCC18
The EORTC guidelines for translation of the QLQ-HCC18 were followed, and the translation was authorized by the EORTC [13]. This included a forward/backward translation of EORTC QLQ-HCC18. The original English version was translated into Japanese by two independent translators who were native Japanese speakers with proficiency in English. The research coordinator compared the two forward translations and checked them for any discrepancies. The discrepancies between the two translations were discussed with the translators until we agreed on one provisional forward translation. This forward translation was then back translated into English by two independent translators who were native speakers of English with proficiency in Japanese. The English back translations and the original English version were compared to ensure that there were no differences in the meaning of the questions in the questionnaires. The provisional Japanese version was pilot tested on 10 patients diagnosed with HCC who satisfied the following eligibility criteria: (1) age > 20 years; (2) ability to communicate in Japanese; (3) ability to participate in this study, as judged by an attending doctor; (4) confirmation of medical diagnosis; (5) no other concurrent malignancy; and (6) consent to participate in this study. The pilot test was conducted according to the manual provided by EORTC [13] as of June 2008. The average time necessary for completing the QLQ-HCC18 was less than 5 minutes, and the questionnaire was easily understood by and acceptable to most patients. Results of the translation and the pilot study were reviewed by the EORTC translation coordinator and the original author of QLQ-HCC18 to ensure that the content and applicability were maintained, and the EORTC QLQ-HCC18 Japanese version was authorized by the EORTC Quality of Life Group.
The Japanese version of EORTC QLQ-HCC18 was used in this validation study.

Data collection
This study recruited 200 patients diagnosed with HCC at The University of Tokyo Hospital, one of the largest referral centers for treatment of HCC in Japan, and written consent was obtained. Patients were recruited between July 2008 and November 2008. The eligibility criteria were the same as for pilot testing. Patients completed each of the three questionnaires: EORTC QLQ-C30, QLQ-HCC18, and FACT-Hep, and a questionnaire about demographic characteristics. To confirm test-retest reliability of the Japanese version of QLQ-HCC18, patients with stable disease were invited to complete QLQ-HCC18 for a second time after two weeks. Medical data were collected by review of medical care records. The researcher checked for absent responses after receiving the questionnaire and wherever possible asked the patients to respond to the missing items. This study was conducted with the approval of the ethics committee of The University of Tokyo.

Measurements
The EORTC QLQ-C30 core questionnaire (version 3.0) is a generic QOL measure for cancer patients, and comprises a global health status/QOL scale, five multi-item functional scales, three multi-item symptom scales and single items for the assessment of symptoms and the financial impact of disease and treatment [11]. The reliability and validity of the Japanese version of the EORTC QLQ-C30 have been demonstrated [14].
EORTC QLQ-HCC18 is an 18-item HCC-specific supplemental module developed to augment QLQ-C30 and to enhance sensitivity and specificity for HCC-related QOL issues [9]. EORTC QLQ-HCC18 was developed in four phases on the basis of the EORTC guidelines for scale development [9]. Briefly, items were created during phase one after conducting a literature review and interviewing 32 patients with HCC from four different countries as well as 10 health professionals. In phase two, a preliminary questionnaire was constructed using the EORTC item bank as a reference. In phase three, a pretest was administered to 158 patients with HCC from three countries to examine receptivity and relevance. The original questionnaire is the version produced at the end of phase three. The hypothesized scale structure and single items address aspects of chronic liver disease (nutrition, jaundice, fever, abdominal swelling), as well as QOL issues specific to the primary tumor and its treatment (fatigue, body image, pain).
The original English version contains six multi-item scales addressing fatigue, body image, jaundice, nutrition, pain and fever, as well as two single items addressing sexual life and abdominal swelling. The scales and items are linearly transformed to a 0 to 100 score, where 100 represents the worst status. An international field test (the phase 4 part of questionnaire development) is currently being conducted to examine the validity and reliability of the scores in several countries.
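The linear transformation to a 0 to 100 score can be illustrated with a minimal sketch. It assumes that each item is answered on a 1 to 4 response scale and that the raw scale score is the mean of its items; neither detail is stated in this section, and the function name `symptom_scale_score` is hypothetical.

```python
from statistics import mean

ITEM_MIN = 1  # assumed lowest response category ("not at all")
ITEM_MAX = 4  # assumed highest response category ("very much")


def symptom_scale_score(item_responses):
    """Linearly map the mean item response onto 0-100 (100 = worst status)."""
    if not item_responses:
        raise ValueError("at least one item response is required")
    raw_score = mean(item_responses)
    return (raw_score - ITEM_MIN) / (ITEM_MAX - ITEM_MIN) * 100


print(symptom_scale_score([1, 1, 1]))  # every item "not at all" -> 0.0
print(symptom_scale_score([4, 4]))     # every item "very much" -> 100.0
print(symptom_scale_score([2]))        # a single item is transformed the same way
```

Under these assumptions a single item and a multi-item scale share the same 0 to 100 range, which is what allows scales and single items to be reported side by side.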
The reliability and validity of the original version of FACT-Hep, another hepatobiliary cancer-specific scale, have been demonstrated [10]. FACT-Hep is a 45-item self-report instrument that comprises 27 FACT General (FACT-G) items and an 18-item hepatobiliary subscale. The Japanese version of the 18-item hepatobiliary subscale was used in this study as a comparison instrument. All items are scored from 0 to 4, with higher scores indicating better QOL.

Data analysis
Multi-trait scaling analyses [15] were used to evaluate the scale structure of QLQ-HCC18. This technique tests item convergent and discriminant validity and is based on the examination of item-scale correlations. The Pearson correlations of each item with its own scale (corrected for overlap) and with the other scales were calculated. Evidence of item convergent validity was defined as a correlation above 0.40 with its own scale. Evidence of item discriminant validity was based on a comparison of the correlation of an item with its own scale and its correlations with the other scales. Scaling success for a scale was defined as the number of item-scale comparisons in which the convergent correlation coefficient was significantly higher than the discriminant correlation coefficient, divided by the total number of comparisons. The mean scale and item scores were also calculated, and a frequency analysis was performed.
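The item-scale logic of multi-trait scaling can be sketched as follows. This is an illustration under stated assumptions, not the SAS procedure used in the study: scale scores are taken as item means, the data are synthetic, the helper names are hypothetical, and a comparison counts as a success when the own-scale correlation is simply higher than the other-scale correlation, without the significance test described above.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def scale_score(rows, items):
    """Mean of the given item columns for every patient (row)."""
    return [sum(row[i] for i in items) / len(items) for row in rows]


def multitrait_scaling(rows, scales, cutoff=0.40):
    """Summarise item convergent and discriminant validity per scale.

    rows: one list of item scores per patient.
    scales: dict of scale name -> list of column indices (2+ items each).
    """
    summary = {}
    for name, items in scales.items():
        own_rs, wins, comparisons = [], 0, 0
        for item in items:
            item_scores = [row[item] for row in rows]
            # Overlap correction: drop the item from its own scale score.
            rest = [i for i in items if i != item]
            r_own = pearson(item_scores, scale_score(rows, rest))
            own_rs.append(r_own)
            for other, other_items in scales.items():
                if other != name:
                    r_other = pearson(item_scores, scale_score(rows, other_items))
                    comparisons += 1
                    # Simplification: "higher", not "significantly higher".
                    wins += r_own > r_other
        summary[name] = {
            "convergent_r": own_rs,
            "convergent_ok": all(r > cutoff for r in own_rs),
            "scaling_success": wins / comparisons,
        }
    return summary


# Synthetic data only: two unrelated latent traits, three noisy items each.
rng = random.Random(0)
rows = []
for _ in range(200):
    a, b = rng.gauss(0, 1), rng.gauss(0, 1)
    rows.append([a + rng.gauss(0, 0.5) for _ in range(3)]
                + [b + rng.gauss(0, 0.5) for _ in range(3)])

result = multitrait_scaling(rows, {"scale_a": [0, 1, 2], "scale_b": [3, 4, 5]})
for name, stats in result.items():
    print(name, stats["convergent_ok"], stats["scaling_success"])
```

The overlap correction matters most for short scales: without it, a two-item scale would correlate each item with a score that is half made up of the item itself.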
The following psychometric aspects were assessed: reliability (internal consistency and test-retest reliability) and validity (known group comparison and correlation analyses with the FACT-Hep).
The internal consistency reliability of the multi-item questionnaire scales was assessed by Cronbach's alpha coefficient. Preferable reliability was indicated by a coefficient greater than 0.70. The test-retest reliability of the scales and single items was assessed by the intra-class correlation coefficient. Scale discriminant validity (clinical validity) was tested by known group comparisons, using the Student t-test to assess whether the questionnaire scores were able to discriminate between subgroups of patients differing in clinical status. The Karnofsky Performance Status (KPS) and Child-Pugh grade were used as clinical parameters to form mutually exclusive patient subgroups. Higher KPS scores signify better performance status. Liver function worsens in alphabetical order of Child-Pugh grade (A, B, C). We hypothesized that QLQ-HCC18 scores would be lower in patients with better performance status (KPS 80-100) and better liver function (Child-Pugh class A). Convergent validity was tested first by multi-trait analyses and then by correlation analyses with FACT-Hep. Pearson's correlation coefficient was used to examine the correlation between similar items in FACT-Hep and QLQ-HCC18. Scales were considered conceptually related if the Pearson's correlation coefficient between them was more than 0.40. P < 0.05 was considered statistically significant. Statistical analyses were performed using SAS software (SAS for Windows, release 9.1; SAS Institute Inc., Cary, NC, USA).
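The two reliability statistics named above can be sketched as follows. The intra-class correlation form is not specified in this section, so the sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement). The function names are hypothetical and the data are toy values, not study data.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per patient."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_two_way(pairs):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    pairs: one (test, retest) tuple of scale scores per patient.
    """
    n, k = len(pairs), len(pairs[0])
    grand = mean(v for pair in pairs for v in pair)
    row_means = [mean(pair) for pair in pairs]
    col_means = [mean(pair[j] for pair in pairs) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for pair in pairs for v in pair)
    ms_rows = ss_rows / (n - 1)          # between-patient mean square
    ms_cols = ss_cols / (k - 1)          # between-occasion mean square
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Toy data for illustration only (not study data).
item_rows = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 4], [2, 3, 2]]
retest_pairs = [(25.0, 33.3), (50.0, 41.7), (66.7, 75.0), (8.3, 8.3)]
print(round(cronbach_alpha(item_rows), 2))
print(round(icc_two_way(retest_pairs), 2))
```

An absolute-agreement form was chosen for the sketch because it penalizes a systematic shift between the first and second administration; a consistency form would not, and the two can give different values for the same data.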

Participants
Responses were obtained from 192 patients (eight did not respond), and 139 completed the test-retest questionnaire two weeks after the first assessment.
Socio-demographic and clinical characteristics at baseline are shown in Table 1. Most patients were male (64.1%), had good performance status (86.5%) and had good liver function (66.2%).

Psychometric testing
We initially performed multi-trait scaling analyses for the putative scale structure, and the results showed that the original two-item scales for body image and jaundice had low convergent and discriminant validity. After discussion with the original author of QLQ-HCC18 (JMB), we decided to split these scales into single items. The tests were then performed on the remaining scales and four single items.
Results of the multi-trait scaling analyses are shown in Table 2. A summary of the multi-trait scaling analysis and internal consistency is shown in Table 3. The convergent correlation coefficients of the scales for fatigue, nutrition and fever varied from 0.23 to 0.75, and the scaling success rate ranged from 87% to 100%. Cronbach's alpha coefficients of these scales were close to or above the preferred threshold of 0.70, ranging from 0.68 to 0.78. The convergent correlation coefficient of the scale for pain was 0.25, and the scaling success rate was 50%. Cronbach's alpha coefficient of this scale was 0.37. The descriptive statistics of the putative scales/single items and the test-retest reliability of the questionnaire are shown in Table 4. The intra-class correlation coefficients of the scales varied between 0.67 and 0.88. Ninety-four percent of the patients answered 'not at all' to item 36, which asked patients whether they were concerned by their skin or eyes being yellow. Responses to item 48, which asked about sexual function, were missing for seven patients (3.6%).
Results of the known group comparisons are shown in Table 5. Patients with poorer performance status (KPS of 70 or lower) reported significantly higher (worse) scores for all scales except for abdominal swelling and sexual interest than those with better performance status (KPS of 80-100). Patients with worse liver disease (Child-Pugh classes B and C) reported significantly higher (worse) scores for all scales except for body image and pain than those with better liver function (Child-Pugh class A).
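The known group comparison rests on the Student t-test for two independent groups. A minimal sketch of the pooled-variance statistic is given below; the function name `student_t` is hypothetical, and the scores are synthetic values chosen only to illustrate a comparison between a poorer and a better performance status group.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group_a, group_b):
    """Pooled-variance (Student) t statistic and degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Synthetic 0-100 scale scores (higher = worse); not study data.
poorer_status = [50.0, 66.7, 58.3, 75.0]       # e.g. KPS of 70 or lower
better_status = [16.7, 25.0, 8.3, 33.3, 16.7]  # e.g. KPS of 80-100

t_stat, dof = student_t(poorer_status, better_status)
print(f"t = {t_stat:.2f}, df = {dof}")
```

A positive statistic here means the poorer-status group scored higher (worse). A two-sided P value would then be read from the t distribution with `dof` degrees of freedom, for example with `scipy.stats.ttest_ind`, which is not imported in this sketch.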
Results of convergent validity are shown in Table 6. The QLQ-HCC18 Japanese version scales had an acceptable correlation (coefficient value over 0.40) with similar items in FACT-Hep except for items of weight loss, appetite and activity.

Discussion
This study describes psychometric testing of the Japanese version of the QLQ-HCC18 questionnaire, which is an HCC-specific module of EORTC QLQ-C30. The overall results show that this questionnaire is reliable and has acceptable measurement properties for use with the QLQ-C30 to assess health-related QOL in Japanese patients with HCC.
Assessment of QOL in cancer patients is optimally performed with a combination of a generic questionnaire and a disease-specific questionnaire to ensure that common problems are uniformly detected and reported as well as specific issues related to disease site and treatment. This framework for QOL assessment has been adopted and popularized by the EORTC Quality of Life Group and the Functional Assessment of Chronic Illness Therapy (FACIT) Organization. For patients with primary and secondary liver tumors, cholangiocarcinoma or pancreatic cancer, the FACIT system has developed a single hepatobiliary-pancreatic module [10]. The EORTC QOL Group has, however, focused in more depth on the specific clinical experiences within each disease site and therefore developed separate modules for pancreatic, primary and secondary liver cancer. The separate modules may be clinically more sensitive than a single questionnaire, although this has not yet been formally examined. A second advantage of the EORTC QLQ-HCC18 is that it provides subscale scores for different domains of functioning. FACT-Hep generates only a total score, which may obscure findings in particular problem areas. EORTC QLQ-HCC18 provides a multi-dimensional QOL assessment that may be more useful for clinicians in directing therapy. A final advantage of the EORTC QLQ module is that it was specifically developed for use in international trials; a large database will soon be available to facilitate comparisons across studies, and there is some assurance of cross-cultural suitability.
In this study, we tested the reliability and validity of the Japanese version of QLQ-HCC18, including internal consistency reliability, test-retest reliability, convergent and discriminant validity, and known group comparison. In the descriptive statistics and frequency analyses, the item assessing problems related to jaundice showed low scores. This was because few patients were jaundiced at the time of the data collection. In addition, jaundice may be harder to notice in Japanese patients because of their skin tone. The multi-trait scaling analyses (convergent and discriminant validity) showed good scaling success rates and acceptable Cronbach's alpha coefficients (internal consistency reliability), except for the pain scale, which had a low scaling success rate and a low Cronbach's alpha. One reason may be that shoulder pain and abdominal pain are not necessarily related symptoms that occur simultaneously. Furthermore, although the pain scale was created in anticipation of pain caused by cancer treatment and progression, few patients had advanced cancer. The nutrition scale had a high scaling success rate. However, the convergent validity of the item termed "concern about low weight" was below the standard value. The nutrition scale was assumed to involve problems caused by impaired liver function, but items regarding weight loss may also have been affected by cancer progression. Patients included in the original article and patients in this study had almost identical liver function, but the extent of cancer progression differed, and many of our patients had cancer that was detected at an earlier stage. The results of test-retest reliability showed good intra-class correlation coefficients for most scales. Results of known group comparisons showed that the module was able to detect differences between groups with different clinical characteristics in almost all of the scales, indicating that the module has clinical validity.
We confirmed good correlations between most scales/single items in the two questionnaires (QLQ-HCC18 and FACT-Hep). However, correlations between the items for weight loss, appetite and activity in FACT-Hep and the corresponding scales in QLQ-HCC18 were low. This may have occurred because of the reverse scoring used in the appetite and activity items in FACT-Hep, which may have led to confusion. While the results show that the Japanese version of EORTC QLQ-HCC18 is a reliable instrument, some caution is necessary. First, these results on the QLQ-HCC18 are preliminary: this study was performed in a single institution using the Japanese version, few patients with severe cirrhosis or advanced disease were recruited, and no patient had undergone liver transplantation, which may limit the generalizability of the findings. Second, this study did not address longitudinal construct validity and responsiveness for clinical validity. In future work, the Japanese version of EORTC QLQ-HCC18 should be tested in multicenter studies to confirm the generalizability of the findings and to include liver transplant recipients and more severely ill patients. Furthermore, testing the sensitivity of the instrument to changes over time is needed to evaluate treatment effects.
There are currently a variety of treatment options for patients with HCC. Molecular targeted therapy for HCC has recently been introduced [16], and this will lead to increased demand for evaluating QOL in more detail. In addition, Japanese patients with HCC are older than those in other countries, which makes the Japanese version of QLQ-HCC18 particularly valuable because treatment effects on QOL are more important in older patients.

Conclusion
This study provided evidence supporting the measurement properties of the Japanese version of the EORTC QLQ-HCC18. These results suggest that it is a reliable instrument for measuring QOL in patients with HCC in Japan.