This study reports on the psychometric assessment of the relationship between the generic EQ-5D and SF-6D and the condition-specific DHP-18 for use with Type 2 diabetes. The study provided supporting evidence for the construct validity of all three measures, as the measures discriminated between groups with differing levels of health problems and diabetes-specific issues. This is in line with previous findings regarding their psychometric properties in diabetes samples [8, 10, 11]. However, the results need to be interpreted with caution because of the indicators used: the GPBMs may be sensitive to the co-morbid problems being reported rather than to diabetes-related HRQL factors per se. It is also interesting to note that the DHP-18 discriminates between groups defined by the presence or absence of non-diabetes-specific co-morbid conditions. This could be linked to the progressive nature of diabetes, where co-morbid health problems are more likely to be present when the impacts of diabetes are more severe. There was also evidence that the instruments measure overlapping constructs relevant in Type 2 diabetes to some extent, but there is still clear divergence and evidence of disagreement between the GPBMs and the DHP-18 across the severity scale. Further evidence about the responsiveness of the measures is required.
The results support the use of the condition-specific DHP-18 alongside the EQ-5D and SF-6D in studies requiring the assessment of HRQL and psychosocial functioning in diabetes, and there is evidence that using both a generic and a condition-specific measure will provide a more holistic assessment of the HRQL impacts of diabetes and related treatments. This is because the measures have some level of sensitivity to diabetes-specific health concerns, and the results suggest some overlap in the constructs measured that are of relevance to people with diabetes. However, there is also clear divergence at the dimension level, where a range of areas of HRQL are assessed. Therefore, using the measures alongside each other may increase the accuracy of outcomes assessment in Type 2 diabetes by enabling the measurement of generic health concerns alongside diabetes-specific indicators, as the GPBMs may allow for a wider assessment of HRQL.
With regard to responsiveness, both the EQ-5D and SF-6D perform better in the groups who self-report health change, although all three measures had low SRMs, indicating a generally low level of responsiveness. This low level of sensitivity could be problematic in the assessment of change in QALYs before and after interventions. However, this finding could be due to the study design and sample used: the study was not testing a specific intervention, but was a population survey testing a change in service structure, where health may not be expected to change for all respondents between baseline and follow-up. In addition, the measure of change used was a generic self-report question, which may not have a strong relationship with changes on generic or diabetes-specific PROMs. It may be important to investigate responsiveness in more detail using diabetes-specific indicators of health change. Recently, a five-level version of EQ-5D (EQ-5D-5L) has been developed, and this may increase the sensitivity of the instrument to change over time. However, direct utility values for EQ-5D-5L are not yet available.
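For readers less familiar with the statistic, the SRM is conventionally computed as the mean change score divided by the standard deviation of the change scores. The following is a minimal sketch of that conventional calculation, not the analysis code used in this study; the function name and the paired-list input format are assumptions made for illustration.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """Return the SRM: mean change divided by the SD of the change scores.

    `baseline` and `follow_up` are hypothetical paired lists of scores
    (for example, utility values) for the same respondents in the same order.
    """
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be the same length")
    # Change score for each respondent (follow-up minus baseline).
    changes = [f - b for b, f in zip(baseline, follow_up)]
    if len(changes) < 2:
        raise ValueError("at least two respondents are needed")
    sd_change = stdev(changes)  # sample standard deviation (n - 1)
    if sd_change == 0:
        raise ValueError("SRM is undefined when all change scores are equal")
    return mean(changes) / sd_change
```

Under this definition, an SRM close to zero arises either when the average change is small or when change scores vary widely across respondents, which is consistent with the interpretation that a population survey with mixed health trajectories would be expected to produce low values.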
Another key finding of this work is the strong relationship between the EQ-5D and SF-6D, which has been found for diabetes but is not consistently found across other health conditions. The utility values derived from the measures were similar, but because of differences in the range of the utility scale (where SF-6D has a much smaller range) the spread of values differed. This affects agreement at the more severe end of the utility scale, where fewer SF-6D values are available, and this has been found elsewhere using similar methods. The utility scales were well correlated and, at the dimension level, the correlations across similar dimensions indicate overlap in the constructs being measured. Both GPBMs also displayed evidence of distinguishing between clinical and severity groups. This means that both measures have a level of validity for use in Type 2 diabetes, and the values from both instruments could be used in the estimation of QALYs with some confidence. The overlap between the measures means that there is no requirement to include both in surveys, and each has advantages and disadvantages. EQ-5D is short and easy to complete, and is accepted by NICE for use in the economic evaluation of interventions. The SF-6D is derived from the SF-36 or SF-12, and therefore requires one of these to be included, but these measures also provide detailed information about the HRQL of patient samples.
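To illustrate how agreement between two utility scales can differ from correlation, the sketch below computes the mean difference and 95% limits of agreement in the style of a Bland-Altman analysis. This is an assumption chosen for illustration: the text above does not specify the agreement method, and the function name and input format are hypothetical.

```python
from statistics import mean, stdev


def limits_of_agreement(eq5d_values, sf6d_values):
    """Return (mean difference, lower limit, upper limit) for paired utilities.

    Inputs are hypothetical paired lists of EQ-5D and SF-6D utility values
    for the same respondents. Limits are mean difference +/- 1.96 SD.
    """
    if len(eq5d_values) != len(sf6d_values):
        raise ValueError("both lists must be the same length")
    if len(eq5d_values) < 2:
        raise ValueError("at least two respondents are needed")
    # Per-respondent difference between the two instruments.
    diffs = [e - s for e, s in zip(eq5d_values, sf6d_values)]
    bias = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation (n - 1)
    return bias, bias - 1.96 * sd_diff, bias + 1.96 * sd_diff
```

Two scales can be highly correlated yet show wide limits of agreement if one has a compressed range, which is the pattern described above for the SF-6D at the more severe end of the utility scale.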
There are a number of limitations to this study which should be considered when interpreting the findings. Firstly, psychometric validity is difficult to prove as there is no gold standard for the measurement of outcomes against which to compare the measures. Therefore, validity can only be inferred against other indicators and across the instruments. Secondly, the findings are limited to the sample used, which has specific characteristics that may affect the findings, particularly in relation to the level of responsiveness that should be expected in a population survey. Further work should be done to test the validity and responsiveness of EQ-5D, SF-6D and DHP-18 in relation to other diabetes-specific PROMs and clinical indicators using a range of patient samples (including clinical trials to assess responsiveness in more detail). This strategy has been used in the assessment of the EQ-5D and SF-6D across mental health conditions. Psychometric evidence is one method of assessing validity, and should be considered alongside other evidence to build up a picture of the measures' performance. This study complements an earlier systematic review that found support for the construct validity of EQ-5D. Qualitative work could also be used to assess whether all of the HRQL issues of importance to people with diabetes are assessed by the PROMs that are used for the condition (see, for example, Brazier et al., who used this approach in mental health). Finally, the results are limited to Type 2 diabetes, and further assessment of the GPBMs and Type 1 diabetes-specific PROMs is warranted.