Cultural adaptation and pilot study
The cultural adaptation of the ASES-p questionnaire from English to Spanish followed the recommendations of the International Quality of Life Assessment (IQOLA) project. The IQOLA protocol is considered a reference standard for translating health status instruments [20, 21]. Two native Spanish speakers, an orthopaedic surgeon and a professional translator (not familiar with shoulder-related pathologies), independently translated the English version into Spanish. After discussing the conceptual equivalence of the two translations and resolving discrepancies, a consensus was reached on the first Spanish version of the ASES-p questionnaire. In a second phase, two professional translators whose first language was English back-translated the first Spanish ASES-p version into English. Discrepancies were again discussed and resolved. The participating translators compared the back-translated English version with the original ASES-p version. Differences were discussed and corresponding changes were made in the Spanish version. One of the principal investigators (KV) participated in the discussions between the parties at all translation stages. A committee of two orthopaedic surgeons (DG, FS), one health professional (AE), a professional translator and KV accepted the pre-final translated version of ASES-p. To assess its comprehensibility, this version was administered to a sample of n = 10 randomly chosen shoulder pathology patients. They were asked to fill in the scale and comment on how understandable and relevant its items were. None of the pilot study patients were included in the validation study.
Patient recruitment and data collection
Participants were recruited by the orthopaedic surgeons of five public hospitals located in the Basque Country (Spain). Included patients were ≥18 years old, had a shoulder pathology, were going to receive surgical or conservative treatment of the affected shoulder, and were able to speak and write in Spanish. Patients who had previously undergone surgery on the affected shoulder and those with cognitive impairment were excluded from the study.
Upon recruitment, functional assessment of the affected shoulder was performed by the orthopaedic surgeons with the CMS instrument. Information on age, sex, marital status, daily lifestyle habits, medication consumption, additional pathologies and other questions of interest was provided by the participants at home. The questionnaires on socio-demographic and clinical variables were sent by postal mail and replies were received in the same way. A reminder letter was sent to those not responding within two weeks, followed by phone calls if necessary. If, despite all efforts, no reply was obtained, the patients were considered drop-outs. All assessments were performed twice: at recruitment and after the treatment. Conservatively treated and operated patients were assessed at 3 and 6 months respectively. Only one shoulder per patient was considered in the validation analyses.
The ASES-p questionnaire
The ASES-p scale is composed of 11 items, divided into 2 subscales: pain (1 item) and function (10 items). The pain item evaluates current pain level on a 10 cm VAS with minimum and maximum values “0 = no pain at all” and “10 = pain as bad as it can be” respectively. The 10 function items evaluate the ability to perform certain daily life activities, and are answered on a 4-point Likert scale from “0 = unable to do” to “3 = not difficult”. Each subscale is assigned from 0 to 50 points, with higher values indicating better health status. Points are calculated as (10 − VAS) × 5 for the pain subscale and (5/3) × (sum of the 10 function items) for the function subscale. The total ASES-p score is the sum of the two subscales, with possible values ranging between 0 and 100 points. The originally published form of the scale was implemented in this study [10].
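The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the software used in the study: the function name `ases_scores` and its input checks are assumptions, and the pain formula is read as (10 − VAS) × 5, which is the reading that yields the stated 0 to 50 range.

```python
def ases_scores(vas_pain, function_items):
    """Return (pain, function, total) ASES-p scores.

    vas_pain: VAS pain rating, 0 (no pain at all) to 10 (pain as bad as it can be).
    function_items: 10 Likert replies, each 0 (unable to do) to 3 (not difficult).
    The name and the validation below are hypothetical, for illustration.
    """
    if not 0 <= vas_pain <= 10:
        raise ValueError("VAS pain must lie between 0 and 10")
    if len(function_items) != 10:
        raise ValueError("exactly 10 function items are expected")
    if any(item not in (0, 1, 2, 3) for item in function_items):
        raise ValueError("function items must be scored 0, 1, 2 or 3")

    pain = (10 - vas_pain) * 5              # 0 to 50, higher is better
    function = 5 * sum(function_items) / 3  # (5/3) x sum, 0 to 50
    return pain, function, pain + function
```

For example, a patient with a VAS of 4 who answers "2" to every function item would obtain 30 pain points, about 33.3 function points and a total of about 63.3 points.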
For analyses where individual item replies were considered, the pain item was reversed. In order to ease interpretation, this item’s replies were transformed to run in the same direction as the function items (i.e. more points, better health). Maximum pain and no pain at all were thus given 0 and 10 points respectively. For certain analyses, the pain score was further categorized into four groups, approximating the 4-point Likert responses. In those cases, values 0–1 were considered as “3 = no pain”; values 2–5 as “2 = some pain”; values 6–8 as “1 = a lot of pain”; and values 9–10 as “0 = maximum pain”. This categorization was decided by examining this item’s responses in relation to the pain and general health items of SF-36. All analyses are based on available data. No imputations have been performed in this study.
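The reversal and the four-group categorization of the pain item can be illustrated as follows. This is a minimal sketch under assumptions: the function names are hypothetical, and the categorization accepts whole-number VAS values only, because the text does not state how intermediate values (for example 1.5) were assigned to a group.

```python
def reverse_pain(vas_pain):
    """Reverse the VAS so that more points mean better health (10 = no pain)."""
    if not 0 <= vas_pain <= 10:
        raise ValueError("VAS pain must lie between 0 and 10")
    return 10 - vas_pain


def categorize_pain(vas_pain):
    """Map an original (non-reversed) VAS value to the 4-category score.

    Assumption: only whole-number values 0..10 are handled here.
    """
    if vas_pain not in range(11):
        raise ValueError("this sketch expects a whole-number VAS from 0 to 10")
    if vas_pain <= 1:
        return 3  # no pain
    if vas_pain <= 5:
        return 2  # some pain
    if vas_pain <= 8:
        return 1  # a lot of pain
    return 0      # maximum pain
```

Both transformations point in the same direction as the function items, so the categorized pain item can be analysed alongside the 0 to 3 function replies.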
Other measures
The Constant-Murley score (CMS) was published in 1987 and was approved by the executive committee of the European Society for Surgery of the Shoulder and the Elbow (ESSSE) [9, 22]. It assesses pain level, activities of daily living (ADL), range of movement (ROM) and strength, based on 2, 4, 4 and 1 item respectively. It is the most widely used questionnaire for the functional assessment of the shoulder and requires a specialist’s assessment [8]. In this study the ESSSE [23] CMS version was implemented. The strength component of CMS was assessed with the use of adjustable weights, while goniometers were used for the ROM items of flexion and abduction. The CMS questionnaire was filled in by the participating orthopaedic surgeons, all of whom were experienced in its administration. The Original CMS (CMSO) score assigns 15, 20, 40 and 25 points to its four components respectively, and its total score ranges between 0 and 100 points, with higher scores indicating better shoulder function [9]. The relative CMS score adjusted for age and sex (CMSR), as suggested by the original CMS author [24], and an additional CMS excluding the strength component (CMSNS) [25] were also implemented in this study.
The 36-Item Short Form Health Survey (SF-36) [26] is a generic instrument composed of 36 items assessing eight dimensions: physical functioning, role physical, bodily pain, general health, vitality, social functioning, role emotional and mental health. The 8 dimensions are further grouped into two summary components: a physical (PCS) and a mental (MCS) one. Dimension and component summary scores range from 0 to 100 points. The SF-36 has been translated into Spanish and validated [27]. In this study the SF-36 v2 version was implemented and scores were derived using the Quality Metric Software (QualityMetric Health Outcomes™ Scoring Software 4.5.1). The Barthel Index [28, 29] was used for evaluating basic activities of daily living (BADL). The scale is composed of 10 items and its total score ranges from 0 to 100 points, indicating completely dependent and independent individuals respectively [30].
For surgically treated patients, data related to hospital admission and intervention of the affected side was gathered from the clinical history files. Patients not receiving any treatment during the study course were also assessed at the end of follow-up.
Statistical analysis
Categorical data are presented with frequencies and percentages. Continuous data are presented with means and standard deviations (SD) when normally distributed, or medians and interquartile range (IQR) when skewed. Between-group comparisons were performed with the chi-square test for categorical data, and with Student’s t-test or the Wilcoxon rank-sum test for continuous data. Pre-post comparisons of the ASES-p scores were performed with the paired t-test.
Reliability
Reliability was estimated with Cronbach’s alpha [31]. Item-item and item-total correlations were estimated with Spearman’s correlation coefficient (rS). Values ≥0.70 for Cronbach’s alpha and ≥0.30 for the correlations were considered acceptable [32]. Item-total correlations were corrected for overlap, as total scores excluded the respective item (i.e. the “rest” total score was used). Cronbach’s alpha was estimated both for all ASES-p items (with pain as categorical) and for the 10 function scale items only.
Validity
Construct validity was studied with two separate methods: confirmatory factor analysis (CFA) [33] and the Rasch model [34]. Even though the ASES-p is considered as having two components (pain and function), the fact that pain is evaluated with a single item allows neither a two-factor CFA model to be fitted nor Rasch unidimensionality to be tested per component. For this reason, in the present study 10-item (function only) and 11-item (pain and function jointly) models with one latent factor were fitted. CFA was performed with the unweighted least squares (ULSMV) estimation method, recommended for ordinal or continuous non-normal indicators and samples <200 subjects [35]. In these models pain was implemented in its reversed form. Factor loadings ≥0.40 were considered acceptable [33]. The goodness-of-fit indexes examined were the root mean square error of approximation (RMSEA), with acceptable values <0.08, and the Tucker-Lewis Index (TLI) and comparative fit index (CFI), with acceptable values >0.90 [33]. Residual values and modification indexes (MI) were examined. MI values ≥10 were considered for possible model modifications. The unidimensionality of the scale was additionally tested using the Rasch model, with pain as a categorical item. Difficulty estimations (logit) were derived and the infit and outfit mean square (MNSQ) statistics were explored. Desirable values for the latter lie between 0.6 and 1.4 [34]. Values <0.5 are less productive and those between 1.5 and 2.0 are unproductive for measurement construction, but neither range is degrading, while values >2.0 distort the measurement system [36]. Person and item reliability indexes, as well as separation statistics, were also examined, with desired values being >0.80 and >2.0 respectively [34]. Point-measure correlations and average category measures were studied. Unidimensionality was further assessed via principal component analysis (PCA) of the Rasch model residuals. Lack of contrast eigenvalues ≥3 supports the unidimensionality of the scale. The Rasch-Andrich Rating Scale Model for polytomous items was used [34, 37].
Convergent and divergent validity was explored via correlations with other scales. The ASES-p total score was correlated with the CMS, the SF-36 scale, and the Barthel index. It was hypothesized that ASES-p would present higher correlations (rS) with the CMS, the physical dimensions and physical component of SF-36, and Barthel. Lower correlations were expected with the mental dimensions and component of SF-36. Known-group validity was studied by examining the ASES-p score values against CMS, PCS and Barthel after transforming them to categorical variables. Comparisons were performed with the Jonckheere–Terpstra test [38], which tests for a trend among ordered categories, and with Student’s t-test when two categories were compared.
Responsiveness to change
At follow-up, patients were asked to evaluate whether their ROM and capacity to perform their ADL had improved, compared to baseline. Those with a positive reply to both questions were considered as improved. We hypothesized that these patients would present higher ASES-p pre-post score differences, compared to the rest. Three effect sizes were calculated: the standardized effect size (SES), the standardized response mean (SRM) and the SRM adjusted for paired observations (SRMAdj). The SES is the mean difference between baseline and follow-up scores divided by the SD of the baseline score [39]. The SRM is the mean difference between baseline and follow-up divided by the SD of the difference [39], while the SRMAdj was calculated considering the pooled effect size and the correlation of the respective pre-post observations [40]. Cohen’s definition of the magnitude of effect sizes was considered, with values of 0.20, 0.50 and 0.80 perceived as small, medium and large respectively [41]. Effect sizes were additionally studied considering three treatment groups: surgical intervention, infiltration and other.
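The SES and SRM definitions given above translate directly into code. The sketch below is illustrative only: the function name and the example scores are hypothetical, and the SRMAdj is deliberately left out because its exact formula is not given in the text (see [40]).

```python
from statistics import mean, stdev


def effect_sizes(baseline, follow_up):
    """Return (SES, SRM) for paired baseline and follow-up scores.

    SES = mean change / SD of the baseline scores
    SRM = mean change / SD of the individual changes
    Sample SDs (n - 1 denominator) are an assumption of this sketch.
    """
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must be paired")
    changes = [f - b for b, f in zip(baseline, follow_up)]
    mean_change = mean(changes)
    ses = mean_change / stdev(baseline)
    srm = mean_change / stdev(changes)
    return ses, srm


# Hypothetical ASES-p total scores for three patients (not study data).
ses, srm = effect_sizes([40, 50, 60], [55, 60, 80])
```

In this made-up example the mean change is 15 points, the baseline SD is 10 and the SD of the changes is 5, so the SES is 1.5 and the SRM is 3.0; both would count as large by Cohen’s thresholds.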
Sample size
Around 10 subjects per item are recommended for scale validations [33]. Based on previous experience with this kind of study, it was estimated that 30% of the recruited subjects would eventually not participate. Given that the ASES-p consists of 11 items, a minimum of N = 160 subjects had to be recruited in the study.
Results with p-values ≤ 0.05 were considered statistically significant. Analyses were performed with SAS (version 9.3; SAS Institute, Cary, NC), Mplus (version 7.4; Muthén et al., 1998-2015) and Winsteps (version 3.91.0.0; John M. Linacre, Chicago).
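The arithmetic behind this recruitment target can be reconstructed as follows. This is one plausible reading rather than the authors’ stated calculation: the function name is hypothetical, and it is assumed that the number of completers needed is inflated by dividing by the expected participation rate.

```python
import math


def required_recruitment(n_items, subjects_per_item=10, non_participation=0.30):
    """Return (completers needed, subjects to recruit)."""
    completers = n_items * subjects_per_item
    # Inflate so that the expected number of participants still reaches the target.
    recruited = math.ceil(completers / (1 - non_participation))
    return completers, recruited


completers, recruited = required_recruitment(11)
```

With 11 items this gives 110 completers and at least 158 recruited subjects, which is consistent with the reported minimum of N = 160 if that figure was rounded up.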