Cross-sectional validity and specificity of comprehensive measurement in lymphedema and lipedema of the lower extremity: a comparison of five outcome instruments

Background Literature on the validity of outcome measurement in lymphedema and lipedema is very sparse. This study aimed to examine the convergent, divergent and discriminant validity of a set of 5 instruments in both conditions. Methods Cross-sectional outcome was measured by the generic Short Form 36 (SF-36), the lymphedema-specific Freiburg Quality of Life Assessment for lymphatic disorders, Short Version (FLQA-lk), the knee-specific Knee Outcome Survey Activities of Daily Living Scale (KOS-ADL), the Symptom Checklist-90-revised (SCL-90R), and the Six-Minute Walk Test (6 MWT). Construct convergent/divergent validity was quantified by bivariate correlations and multivariate factor analysis, and discriminant validity by standardized mean differences (SMDs). Results Health was consistently better in lymphedema (n = 107) than in lipedema (n = 96). The highest construct convergence was found for physical health between the SF-36 and KOS-ADL (bivariate correlations up to 0.78, factor loads up to 0.85, explained variance up to 56.8%). The second most important factor was mental health (bivariate correlations up to 0.79, factor loads up to 0.86, explained variance up to 13.3%). Discriminant validity was greatest for the FLQA-lk Physical complaints (adjusted SMD = 0.93) followed by the SF-36 Bodily pain (adjusted SMD = 0.83), KOS-ADL Function (adjusted SMD = 0.47) and SF-36 Vitality (adjusted SMD = 0.39). Conclusions All five instruments have specific strengths and can be implemented according to the scope and aim of the outcome examination. A minimum measurement set should comprise: the SF-36 Bodily pain, SF-36 Vitality, FLQA-lk Physical complaints, FLQA-lk Social life, FLQA-lk Emotional well-being, FLQA-lk Health state, KOS-ADL Symptoms, KOS-ADL Function, and the SCL-90R Interpersonal sensitivity.


Introduction
Lymphedema and lipedema of the leg are burdensome chronic diseases, for which no curative treatment has yet been found [1][2][3][4][5][6][7]. Primary lymphedema is characterized by intrauterine malformation or genetic deformity with impaired lymphatic transport [1,2]. Secondary lymphedema is caused by an ineffective lymphatic flow, most frequently the result of traumatic or iatrogenic lymphatic vessel interruption [1,2]. As a consequence of reduced lymphatic transport, interstitial fluid accumulates, with chronic swelling of the drainage region involved. For the diagnosis of lymphedema the patient's case history and physical status are usually sufficient and specific enough [1][2][3]. There are no specific diagnostic tests for lymphedema.
Lipedema is characterized by the abnormal, disproportional deposition of subcutaneous fat in the extremities, leading to a disproportionate enlargement of the legs, and, in some cases, also the arms [4,5]. Lipedema is always linked with daily pain, i.e. allodynia, ranging from disturbing heavy legs and pain on contact to pain that is permanent and disabling [4][5][6][7]. Lipedema almost exclusively affects women and is probably due to hormonal stimulation and/or a genetic predisposition. It is not necessarily linked to obesity, but it may be induced and further aggravated by weight gain. Diagnosis is based on case history and clinical signs [5][6][7]. As for lymphedema, no technical or objective tests have yet been developed to confirm the diagnosis of lipedema. A differential diagnosis can therefore be challenging with the potential for misdiagnosis as lymphedema, obesity, or rheumatic diseases, such as fibromyalgia [4,7].
Outcome data regarding the health and quality of life of patients with lymph- or lipedema of the lower extremities are scarce; some are limited to special syndromes [8][9][10]. In 2015, for example, we published a cross-sectional outcome report on primary and secondary lymphedema of the lower extremity [11]. That initial study measured health and quality of life, including social functioning and specific psychosocial and mental health dimensions, using a combination of comprehensive and condition-specific scales. We found that the health of primary lymphedema patients was unaffected when compared to specific population norms, whereas secondary lymphedema patients reported limited physical function and physical and emotional role performance compared to the norm. Our earlier research did not examine lipedema. A recent study presented cross-sectional data collected with the Short Form 36 (SF-36) in n = 18 patients with lymphedema together with cholestasis, a very rare syndrome [9]. The SF-36 findings of that study were similar to our own. A recent validation study by Van den Pas et al. of the Lymphedema Quality-of-Life Questionnaire (LYMQOL) showed only correlations and presented no descriptive data from the SF-36 [8].
A comparative understanding of the psychometric properties of different instruments' scales is essential in order to further the aim of elaborated outcome measurement in any health condition [12]. In this context, validity plays the major role [12,13]. To the best of our knowledge, for lipedema of the lower extremities no detailed data has yet been published on the measurement properties and, in particular, on the validity of comprehensive and specific outcome assessments. In lymphedema, the report by Van den Pas et al. 2016 mentioned earlier is the only comparable study providing an analysis of validity [8].
Our analysis aimed 1) to examine and to compare the cross-sectional validity of the measurement scales of different outcome instruments covering specific constructs of health dimensions (especially convergent construct validity) and 2) to determine their ability to specify and to differentiate between lymphedema and lipedema (discriminant construct validity).

Patients and data sampling
Patients were consecutively referred by their family physician, internist or angiologist to the angiology department of the Rehabilitation Clinic "RehaClinic", Bad Zurzach, Switzerland for outpatient consultation or inpatient treatment. The examination of outpatients aimed to establish a plan for the future management of their condition by their family doctors and relevant therapists outside our clinic. Inpatient treatment consisted in intensive complex decongestive lymphatic therapy and comprehensive rehabilitation, mainly through aquatic and land-based physiotherapy. The Swiss health insurance companies reimbursed inpatient rehabilitation, on condition that patients were still suffering from symptoms needing further treatment, despite having received correct outpatient physiotherapy and compression therapy. The study was approved by the ethics committee of Aarau, Canton Aargau, Switzerland (EK AG 2008/026) and written informed consent was obtained from all study participants.
The inclusion criteria were as follows: age of 18 years or older, and a confirmed diagnosis of 1) lymphedema of the leg stage II-III, or 2) lipedema stage I-III, or 3) combined lip-/lymphedema, denoting advanced lipedema of the lower extremity (with secondary lymphedema characterizing the course) in accordance with the guidelines for the two syndromes [1,5]. All diagnoses were made or confirmed by the head of the angiology department (SW). A diagnosis of secondary lower limb lymphedema required that the patient's case history include one of the following explanatory causal events: trauma, surgery, neoplasm or its subsequent treatment [14,15].
The exclusion criteria were the following: 1) Edema combined with a predominantly non-lymphatic or non-lipedema component, especially edema caused by venous insufficiency (>stage C2 according to the CEAP classification), cardiac or renal failure, or liver insufficiency [16]. 2) A body mass index (BMI) > 50.0 reflecting severe obesity, which has a major impact on the levels of health dimensions in contrast to the lipedema alone. 3) Mixed edema of unknown origin, and/or edema that could not be classified as lymphedema or lipedema. 4) Assessment impossible due to the patient's insufficient knowledge of the German language, insufficient psycho-intellectual abilities, or severe somatic illness.

Measures
Sociodemographic and disease-relevant data were recorded using a standardized questionnaire that has proved its worth in several of our previous studies [17]. All necessary medical records were obtained to enable confirmation of the diagnosis and evaluation of the inclusion and exclusion criteria and the number of comorbid conditions.
The following five instruments were used. The Short Form 36 (SF-36) is the questionnaire most widely used for the self-assessment of generic health and quality of life [18][19][20]. We used the revised version, SF-36 version 2 [19,20]. The instrument's 36 items build eight dimensions, namely, physical functioning, role physical, bodily pain, general health, vitality, social functioning, role emotional and mental health. A complex linear combination of the eight scales (each of which represents a content and construct dimension) forms two component summaries (physical and mental). The SF-36 is used world-wide, which facilitates good comparability among various health conditions [18].
The Freiburg Quality of Life Assessment for lymphatic disorders, Short Version (FLQA-lk) is comprised of 33 items composing six scales: physical symptoms, daily and professional life, social life, mental health, therapy of the lymphatic disorders and satisfaction [21]. The sum of the 33 items gives the total score. The FLQA-lk is, to our knowledge, the only validated disease-specific instrument for lymphatic disorders in the German language [11].
The Knee Outcome Survey Activities of Daily Living Scale (KOS-ADL) measures symptoms and quantifies the level of impairment due to knee pathologies (pain, swelling, stiffness, etc.) and the resulting functional restrictions affecting activities of daily living (stair-climbing, kneeling, etc.) [22][23][24]. The questionnaire is short (14 items) and has good psychometric properties [22]. The KOS-ADL was chosen because swelling of the lower extremity may lead to impaired knee function and symptoms. The subscale symptoms (items) and functions are summed up to give the total score. A numeric rating scale (0-100) quantifies the change of function in activities of daily living compared to the function pre-edema (0 = no function, 100 = function as before).
From the Symptom CheckList-90-Revised (SCL-90R), two further scales, measuring interpersonal sensitivity (nine items) and obsessive/compulsive (10 items), were selected. The SCL-90R is one of the best established tools for assessing psychiatric syndromes [25,26]. Both syndromes mentioned may affect young women with lipedema or lymphedema, chronic conditions having a potential impact on body image and demanding strict therapy adherence. An altered body image due to swollen legs may affect interpersonal sensitivity and the demands of strict therapy adherence may lead to obsessive/compulsive signs. The above two scales measure constructs not covered by the mental health dimensions of the most commonly used instruments.
Finally, we applied the Six-minute Walking Test (6 MWT), one of the most frequently used and responsive functional performance tests [27][28][29]. All the instruments were implemented in their validated German versions.

Analysis
The cross-sectional assessment of outpatients took place on the day of the first consultation and that of inpatients on the day of admission for therapy (before therapy). The instruments' missing rules had to be fulfilled in order to determine valid scales for the analysis (program criteria). Thus, for each scale of the SF-36 patients had to complete more than 50% of the items [19,20]. Since no missing rules were defined in the original descriptions and manuals of the other questionnaires used, the requirement was set at more than 66.7% (two thirds) completed items. This was the rule as originally applied in a similar study assessing outcome after total shoulder arthroplasty [30].
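The missing rules described above can be illustrated by a minimal sketch. It assumes, purely for illustration, that a scale score is the mean of the answered items; the function name `scale_score` and the two threshold constants are hypothetical and are not taken from the instruments' manuals.

```python
from fractions import Fraction
from typing import Optional, Sequence

# Thresholds as stated in the text: a scale is valid only if MORE than
# this fraction of its items was completed.
SF36_MIN = Fraction(1, 2)    # SF-36: more than 50% of the items
OTHER_MIN = Fraction(2, 3)   # other questionnaires: more than two thirds


def scale_score(items: Sequence[Optional[float]],
                min_fraction: Fraction) -> Optional[float]:
    """Return the mean of the answered items, or None if the scale is invalid.

    Unanswered items are passed as None. Scoring by the mean of the answered
    items is an assumption made for this sketch only.
    """
    answered = [x for x in items if x is not None]
    if not items:
        return None
    # Exact rational comparison avoids floating-point edge cases at 2/3.
    if Fraction(len(answered), len(items)) <= min_fraction:
        return None
    return sum(answered) / len(answered)
```

For example, a nine-item scale with six answered items (exactly two thirds) would be treated as missing under the stricter rule, whereas seven answered items would yield a valid score.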
All scores were scaled from 0 = worst health, maximum symptoms/disability to 100 = best health, no symptoms, full function, as originally described for the SF-36. The one exception was the 6 MWT, where the walking distance was quantified in meters (m). Condition-stratified descriptive data included floor and ceiling effects (percentages) to quantify an instrument's ability to depict the whole spectrum of the disease and to specify symptom or impairment levels.
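As an illustration of the scaling and of the floor and ceiling rates, the following sketch shows one possible computation. The linear transformation and the helper names `rescale_0_100` and `floor_ceiling` are assumptions for this example; the individual instruments may prescribe their own scoring algorithms.

```python
from typing import Sequence, Tuple


def rescale_0_100(raw: float, worst: float, best: float) -> float:
    """Linearly map a raw score so that `worst` becomes 0 and `best` becomes 100.

    Works for scales on which high raw values mean poor health as well,
    because `worst` may be larger than `best`.
    """
    return 100.0 * (raw - worst) / (best - worst)


def floor_ceiling(scores: Sequence[float]) -> Tuple[float, float]:
    """Percentage of subjects at the scale minimum (0) and maximum (100)."""
    n = len(scores)
    floor_pct = 100.0 * sum(1 for s in scores if s == 0) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == 100) / n
    return floor_pct, ceiling_pct
```

A symptom scale scored from 1 (no symptoms) to 5 (maximum symptoms) would thus be passed with `worst=5, best=1`, so that a raw value of 3 maps to 50.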
In validity testing, the terms "content", "criterion" and "construct" validity describe different focuses of testing and show overlapping aims and contents, which add and correlate together to provide an overall picture of measurement precision [12,13,31]. Content validity and reliability have been tested in the manuals and first presentations of the original questionnaires and their translations [18][19][20][21][22][23][24][25][26][27][28][29]. The SF-36, which has been used for more than three decades in thousands of studies and settings, serves as the "gold" standard for the examination of concurrent criterion validity, for example, to quantify the convergent validity of the complex construct of pain [19,20,31]. The content, criterion and construct validity of the SF-36 Mental health for the measurement of depression has recently been exhaustively demonstrated [32].
Correlation analysis (parametric product moment according to Pearson) and factor analysis were used to examine the construct convergence and divergence (or discriminant validity) of the scales [12,13,31]. The two resulting 24 × 24 half matrices of bivariate correlations for lymphedema and lipedema, i.e. 576 single correlation coefficients, are difficult to summarize and are shown in full in two separate appendices.
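The construction of a half matrix of Pearson correlations can be sketched as follows. The scale names and data layout (a dictionary mapping each scale name to the list of patient scores) are hypothetical; the sketch only shows that each unordered pair of scales is correlated once.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Product-moment correlation coefficient of two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def half_matrix(scales: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Correlate every unordered pair of scales once (matrix without diagonal)."""
    return {(a, b): pearson(scales[a], scales[b])
            for a, b in combinations(scales, 2)}
```

High coefficients between scales of different instruments would then be read as construct convergence, low coefficients as divergence.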
Factor analysis is a multivariate correlation analysis designed to reduce the number of dimensions and to specify common constructs [31]. The factor loads of different instruments' scales quantify the convergent and divergent/discriminant validity of complex constructs, for example pain as a syndromic dimension [13]. Principal component factor analysis with varimax rotation and parallel analysis to determine the number of valid factors was used to provide the explorative characteristics for this purpose [31]. This technique determines common vectors that summarize the direction of several dimensions/scales. The orthogonal projection (cosines) of each scale on each vector (=factor), the "factor load", reflects the construct convergence of the scale to the common dimension of the factor. Low correlations and factor loads reflect divergent construct validity [12,13]. All the instruments' summary and total scores were excluded from the factor analysis, because their constructs are already comprised in the single scores. The 6 MWT was also excluded, because walking distance data were available only for the inpatients (n = 89 lymphedema, n = 64 lipedema). Inclusion of those data would have restricted the analysis to the inpatients. Missing values were replaced by mean values of the subjects with completed scales. We used Velicer's minimum average partial (MAP) test and parallel analysis as criteria to determine the number of factors to be retained [31]. Both criteria are upgraded methods of the somewhat outdated "eigenvalue > 1.0" criterion [31].
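The main steps of such an explorative factor analysis can be sketched with NumPy as follows. This is a minimal illustration under several assumptions, not the software procedure actually used in the study: the function names are hypothetical, the varimax rotation uses the common SVD-based iteration, parallel analysis compares observed eigenvalues with the 95th percentile of eigenvalues from random normal data, and the MAP test is not implemented.

```python
import numpy as np


def impute_column_means(data: np.ndarray) -> np.ndarray:
    """Replace missing values (NaN) by the mean of the observed values per scale."""
    filled = data.astype(float).copy()
    col_means = np.nanmean(filled, axis=0)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled


def pca_loadings(data: np.ndarray, n_factors: int) -> np.ndarray:
    """Unrotated principal component loadings from the correlation matrix."""
    corr = np.corrcoef(data, rowvar=False)
    eigval, eigvec = np.linalg.eigh(corr)           # ascending eigenvalues
    order = np.argsort(eigval)[::-1][:n_factors]    # largest first
    return eigvec[:, order] * np.sqrt(eigval[order])


def varimax(loadings: np.ndarray, max_iter: int = 100, tol: float = 1e-8) -> np.ndarray:
    """Orthogonal varimax rotation of a (scales x factors) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):   # no further improvement
            break
        criterion = new_criterion
    return loadings @ rotation


def parallel_analysis(data: np.ndarray, n_sim: int = 200,
                      percentile: float = 95, seed: int = 0) -> int:
    """Number of leading factors whose eigenvalue exceeds that of random data."""
    n, p = data.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    rng = np.random.default_rng(seed)
    simulated = np.empty((n_sim, p))
    for i in range(n_sim):
        noise = rng.standard_normal((n, p))
        simulated[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    threshold = np.percentile(simulated, percentile, axis=0)
    exceeds = observed > threshold
    # Index of the first eigenvalue that fails the comparison = factors retained.
    return p if exceeds.all() else int(np.argmin(exceeds))
```

In this sketch the rotated loadings would be obtained as `varimax(pca_loadings(filled, parallel_analysis(filled)))`, where `filled` is the mean-imputed patient-by-scale matrix. Because the rotation is orthogonal, it redistributes variance between the factors without changing the communality of any scale.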
The bivariate comparison of the scores for lipedema and lymphedema was performed using standardized mean differences (SMDs), in order to quantify the instruments' ability to specify and to differentiate between the two conditions (discriminant construct validity or "known groups" validity, a component of construct validity) [12,13,31]. SMDs are well-known measures of effect differences between verum (active study drug) and placebo and widely used in randomized controlled trials. In this study, SMDs were used as the standardized differences between two scores (lymphedema and lipedema), in line with the original description of their application [33].
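A standardized mean difference between two groups can be computed as in the following sketch. Standardizing by the pooled standard deviation of the two groups is an assumption made here; the text does not state which standardizer was used, and the function name `smd` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def smd(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Mean difference (a minus b) divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)
```

With all scores scaled so that 100 is best health, a positive value of `smd(lymphedema_scores, lipedema_scores)` would indicate better health in lymphedema.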
Multivariate, adjusted SMDs were calculated by multivariate regression analysis, modeling the score difference (between lymphedema and lipedema) as dependent variable by the potential confounders age, education level, number of comorbidities, and in-/outpatient status as independent variables. Score differences that are independent of those covariates can thus be obtained. Further substantial confounders are sex and BMI (being overweight). These were not included in the regression modeling, because they are defining characteristics of lipedema and related to its diagnosis.
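One way to obtain a covariate-adjusted difference is sketched below. The exact model specification is not given in the text, so the following are assumptions: an ordinary least squares regression of the score on a group indicator plus the covariates, with the group coefficient taken as the adjusted difference and divided by the unadjusted pooled standard deviation. The function name `adjusted_smd` is hypothetical.

```python
import numpy as np


def adjusted_smd(score, is_lymphedema, covariates) -> float:
    """Covariate-adjusted group difference in units of the pooled SD.

    score:         outcome per patient (0-100)
    is_lymphedema: 1 for lymphedema, 0 for lipedema
    covariates:    one column per confounder (e.g. age, education level,
                   number of comorbidities, in-/outpatient status)
    """
    score = np.asarray(score, dtype=float)
    group = np.asarray(is_lymphedema, dtype=float)
    covariates = np.asarray(covariates, dtype=float)

    # Design matrix: intercept, group indicator, confounders.
    design = np.column_stack([np.ones_like(score), group, covariates])
    coef, *_ = np.linalg.lstsq(design, score, rcond=None)
    adjusted_diff = coef[1]   # group effect with the covariates held constant

    # Assumption: standardize by the unadjusted pooled standard deviation.
    a, b = score[group == 1], score[group == 0]
    pooled_var = ((len(a) - 1) * a.var(ddof=1)
                  + (len(b) - 1) * b.var(ddof=1)) / (len(a) + len(b) - 2)
    return float(adjusted_diff / np.sqrt(pooled_var))
```

If one group is older on average and age itself affects the score, the adjusted value will differ from the unadjusted SMD, which is the purpose of the adjustment.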
In determining the sample size, the level of the difference in outcome between lymphedema and lipedema was an important consideration. In order to reach statistical significance for an SMD = 0.30, the sample size for each condition should be n ≥ 87 (minimal degrees of freedom = 87 + 87-2 = 172) [34]. This constitutes the lower limit (0.30) of the range of 0.30-0.50 currently considered to indicate minimum clinically important differences (MCIDs) [34]. In other words, above that level, differences become subjectively perceptible on the group level. Our sample sizes met this requirement. A doubling of the sampling effort and costs to increase the total sample size to 2*2*87 = 348 would narrow the upper and lower limits of the 95% confidence interval by only 0.075 each [33,34]. Furthermore, in order to be sufficiently determined, every factor analysis should comprise at least 5 cases per variable, i.e. in our analysis, 5*19(scales) = 95 patients per diagnostic group with complete data in all scales [31].
Table 1 presents the basic data of n = 107 patients with lymphedema and n = 96 with lipedema. Compared to lipedema, the lymphedema group was, on average, 9.7 years older, slimmer (BMI 6.7 lower) and also comprised men (29%). In-/outpatient status, educational level and number of comorbidities were similarly distributed.
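The sample size consideration described in the Analysis section (n ≥ 87 per condition for an SMD of 0.30) can be approximated with the following sketch. It assumes equal group sizes, a normal approximation, and the usual large-sample variance of an SMD; whether reference [34] uses exactly this formula is not stated in the text, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(d: float, n_per_group: int, level: float = 0.95) -> float:
    """Half-width of the confidence interval of an SMD `d` with equal group sizes.

    Large-sample variance with n1 = n2 = n: 2/n + d^2 / (4n).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    var = 2 / n_per_group + d ** 2 / (4 * n_per_group)
    return z * sqrt(var)


def min_n_per_group(d: float, level: float = 0.95) -> int:
    """Smallest group size for which the confidence interval of `d` excludes zero."""
    n = 2
    while ci_half_width(d, n, level) >= d:
        n += 1
    return n


if __name__ == "__main__":
    n = min_n_per_group(0.30)
    print(n, 2 * n - 2)   # group size and degrees of freedom
```

Under these assumptions the smallest group size for an SMD of 0.30 is 87, with 172 degrees of freedom, in agreement with the figures given in the text.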

Descriptive data
The score data on all instruments are shown in Table 2. There were few or no floor effects (maximum 6% on the SF-36 Bodily pain scale in lipedema). In lymphedema, ceiling effects were moderate (10-30%) on the SF-36 Role physical and Bodily pain scales, the FLQA-lk Social life and Treatment scales, the KOS-ADL Function and both SCL-90R scales; but they were high on the SF-36 Social functioning (34%) and Role emotional (36%). In lipedema, the ceiling rates were moderate on the SF-36 Role physical, Social functioning, and Role emotional scales. A high ceiling was observed on FLQA-lk Treatment (41%), reflecting that patients did not experience treatment of their condition as a burden.

Ability of the scales to specify between lymphedema and lipedema
The outcome of lymphedema was consistently better on all scales compared to lipedema (Table 2).

Bivariate correlations
Appendices 1 (lymphedema) and 2 (lipedema) present the bivariate correlations. The abbreviations for the scores can be found in Table 2. [...] Everyday life, KOS-ADL Function and Total score. The total scores of the FLQA-lk and the KOS-ADL correlated with each other. In lipedema most corresponding bivariate correlations were consistently lower than in lymphedema. Across all instruments, the physical scales showed greater divergence from the psychosocial scales than in lymphedema. For example, the correlation of the SF-36 Physical functioning with the SF-36 MCS was 0.42 in lymphedema and 0.05 in lipedema. Between instruments, moderate to high correlations were found between the SF-36 Physical functioning and the KOS-ADL Function and Total score; between both the SF-36 Role physical and PCS and the FLQA-lk Everyday life, the KOS-ADL Symptoms, Function and Total score; and between the SF-36 Bodily pain and the FLQA-lk Total score. In lipedema, unlike lymphedema, the 6 MWT correlated well with the SF-36 Physical functioning, the SF-36 PCS (highest correlation 0.80), the KOS-ADL Symptoms and the KOS-ADL Total score. In lipedema all psychosocial scales showed low correlations of < 0.70.

Factor analysis
Explorative factor analysis extracted two major health dimensions, physical health and mental health, for both conditions by means of the MAP test and the parallel analysis (Table 3). Both solutions fitted well, explaining total variances of 67.1% in lymphedema and 61.0% in lipedema. In both conditions, physical health was by far the more important factor. In lymphedema, the SF-36 Physical functioning, Role physical and Bodily pain and the FLQA-lk Physical complaints and Everyday life showed high factor loads on physical health (≥0.70) as did all three KOS-ADL scores (56.8% explained variance). Regarding the mental health factor, the high factor loads were on the SF-36 Vitality and Mental health, the FLQA-lk Social Life and Emotional well-being, and both SCL-90R scales (10.3% explained variance).
In lipedema the factor solutions in mental health were very similar (13.3% explained variance) but were slightly different in physical health (47.7% explained variance). The FLQA-lk Physical complaints loaded much lower.
Overall, the greatest convergence (within the factors) occurred between the SF-36 and the KOS-ADL on physical health and between the SF-36 and the SCL-90R on mental health. The greatest divergence (between the factors) emerged between the KOS-ADL and the SCL-90R. Consistent with the bivariate analysis, the divergence between physical and mental/psychosocial dimensions was greater in lipedema than in lymphedema: for example, the loading of SF-36 Physical functioning on mental health was 0.28 in lymphedema but 0.02 in lipedema.

Discussion
To our knowledge this is the first study to examine and to compare a comprehensive set of generic and condition-specific (edema and leg) outcome measurement scales and to analyze their cross-sectional validity within specific construct dimensions in lymphedema and lipedema. Our detailed findings on the validity of the various scales may help in the process of matching measurement instruments to the aims and scope of the outcome examination. Overall, the generic SF-36 scale showed high content and construct convergence with the condition-specific tools in both physical and mental health dimensions (convergent validity) and good ability to differentiate between lymphedema and lipedema (discriminant validity). As mentioned earlier, since the validity of the SF-36 is best proven, it serves also as criterion for validity for the other instruments. As expected, the ability to specify between lymphedema and lipedema was greatest in pain.
Since, by definition, pain characterizes lipedema but not lymphedema, the discriminant validity of the FLQA-lk Physical complaints (which includes pain) and the SF-36 Bodily pain represents a "known-groups" validity, which also has the characteristic of criterion validity [13]. The "gold standard" criterion is the diagnosis lymphedema versus lipedema, characterized by the presence or absence of pain. The ability of the two above-mentioned scales to specify and to discriminate between the two conditions remained stable even after multivariate adjustment for confounders. Construct divergence emerged between the physical and the mental health scales, especially so for the SF-36, the KOS-ADL and the SCL-90R, and to a slightly lesser degree for the FLQA-lk (see factor analysis).
The FLQA-lk, designed specifically for lymphedema, aligned well with the other tools in measuring that condition. When measuring outcome in lipedema, however, it represented a slightly different construct compared to the other tools (see factor analysis). The FLQA-lk's ability to specify/differentiate between the two conditions was high on four of the six scales and on the Total score. In both lymphedema and lipedema, the KOS-ADL was confirmed as a pure knee-specific tool and the SCL-90R as a pure mental health measure. The KOS-ADL was the most powerful instrument for the specification of lipedema from lymphedema, especially on function, whereas only the Interpersonal sensitivity scale of the SCL-90R differentiated satisfactorily between the two diagnoses. There was a much higher construct overlap between the 6 MWT and the function scales of the SF-36 and the KOS-ADL Physical complaints and Total score in lipedema than in lymphedema. The 6 MWT's potential to differentiate between the two conditions was moderate but as a result of the smaller inpatient sample sizes, the SMDs did not reach statistical significance.
The comparable study by Van den Pas, which was confined to the correlations between the LYMQOL and the two SF-36 component summary scales (PCS and MCS) in lymphedema of the leg (n = 67), found correlations similar to our own for physical health [8].
Our study has several strengths. It is the first study to test both comprehensive and specific outcome measurement using five standardized, validated instruments in lymphedema and lipedema. The results provide a basis for recommendations on the establishment of assessment sets. In comparison to other studies, our sample sizes were large enough to ensure that minimum clinically important differences reached statistical significance and to fulfil the requirements for valid factor analysis. Beyond bivariate correlation analysis, multivariate factor analysis explored the most important dimensions of convergent and divergent constructs and provided dimension reduction to facilitate interpretation. Discriminant validity was examined by adjusted differences between the two diseases, giving results that are independent of unequally distributed confounding characteristics and more readily generalizable.
The limitations of our study were its cross-sectional design, which excludes the dimension of outcome changes over time, and the fact that the 6 MWT was only available for the inpatient sub-sample. Our findings on cross-sectional validity need to be completed by validity data from longitudinal studies and especially by analysis of the sensitivity to change (responsiveness).

Conclusions
This study provides unique elaborated data on the validity of the instruments used for the comprehensive and specific outcome measurement of lipedema and lymphedema. We would recommend that a core assessment set should include the following nine scales: SF-36 Bodily pain, SF-36 Vitality, FLQA-lk Physical complaints, FLQA-lk Social life, FLQA-lk Emotional well-being, FLQA-lk Health state, KOS-ADL Symptoms, KOS-ADL Function, and SCL-90R Interpersonal sensitivity. Further measurement needs can be met by complementing this basic scale set with additional constructs, for example the SF-36 Role physical (dimension of role performance), the FLQA-lk Treatment (for distress due to treatment), the SCL-90R Obsessive/compulsive, or the 6 MWT (examiner-based function). For an exhaustive assessment, including the total scores, the full set comprising all five complete instruments can be used.