Validation and reliability of the translated Malay version of the psychosocial impact of dental aesthetics questionnaire for adolescents

Background This paper describes the cross-cultural adaptation of the Psychosocial Impact of Dental Aesthetics Questionnaire (PIDAQ) into Malay (the Malay PIDAQ), an orthodontic-specific oral health-related quality of life (OHRQoL) instrument, for Malaysian adolescents aged 12 to 17 years. Methods The PIDAQ was cross-culturally adapted into Malay through forward and backward translation, followed by psychometric validation. After an initial investigation of the conceptual suitability of the measure for the Malaysian population, the PIDAQ was translated into Malay, pilot tested and back translated into English. Psychometric properties were examined in two age groups (319 subjects aged 12–14 years and 271 subjects aged 15–17 years) for factor structure, internal consistency, reproducibility, discriminant, construct and criterion validity, and floor and ceiling effects. Results Confirmatory factor analysis showed good fit statistics (comparative fit index = 0.936, root-mean-square error of approximation = 0.064) and invariance across age groups. Internal consistency and reproducibility were satisfactory (Cronbach's α = 0.71–0.91; intra-class correlations = 0.72–0.89). Malay PIDAQ mean scores differed significantly between subjects with severe malocclusion and those with slight malocclusion, based on both a self-rated and an investigator-rated malocclusion index, for all subscales and all age groups (p < 0.05). For construct validity, the Malay PIDAQ subscales were significantly associated with self-rated dental appearance (excellent to poor) and with self-perceived need for braces in all age groups (p < 0.05). For criterion validity, Malay PIDAQ scores differed significantly between subjects with and without impacts on daily activities attributed to malocclusion.
No ceiling effects were detected, but floor effects were detected for the Aesthetic Concern subscale. Conclusion The study has provided initial evidence for the validity and reliability of the Malay PIDAQ for assessing the impact of malocclusion on the OHRQoL of Malaysian adolescents aged 12–17 years.


Background
Dental malocclusions comprise a wide spectrum of dental arrangements perceived as aesthetically poor, such as protruding, crowded or crooked teeth, and spacing. These are often the reasons adolescents seek orthodontic treatment, regardless of the severity of the malocclusion. In resource-limited health care systems, priority is given to those with greater need for treatment. Indices such as the Index of Orthodontic Treatment Need (IOTN) provide an objective measure for classifying treatment need [1]. In Malaysia, the IOTN is often used to recommend treatment under the national healthcare system. Clinical impression suggests, however, that even patients with lower IOTN grades who are advised against treatment still seek it at private practices. This indicates that other factors, such as the psychosocial impact of malocclusion on oral health-related quality of life (OHRQoL), may influence the desire for treatment. Such impacts cannot be measured with clinical indices like the IOTN, which were developed by expert consensus on health, functional and aesthetic grounds.
In Malaysia, a large population study of 5,112 school children aged 12-13 years found a high prevalence of orthodontic treatment need, with 47.9% of the children having IOTN Dental Health Component (IOTN-DHC) scores of 4 and above [2]. In a more recent study of 837 Malaysian school children, 51.4% of 12-year-olds and 56.4% of 16-year-olds had IOTN-DHC scores of 4 and above [3]. This implies that the percentage of Malaysian school children who require orthodontic treatment remained high over the decade between the studies. Despite similar proportions needing orthodontic treatment, Zreaqat et al. [3] found a significantly higher demand for orthodontic correction among the older school children, with 42.7% of 16-year-olds desiring treatment compared with only 22% of the younger age group (p < 0.001). This discrepancy in demand suggests a need to determine whether patients' desire for treatment is related to a negative impact of malocclusion on their OHRQoL and, if so, whether this gives added support for prioritising these patients for treatment.
OHRQoL measures have gained wide interest among clinicians and researchers as instruments for evaluating patients' subjective interpretation of the impact of oral health on their quality of life. Malocclusion is a distinctive aspect of oral health: patients more often seek treatment to improve unsatisfactory dental aesthetics, and hence their well-being, than for functional need, and failure to treat does not necessarily cause physical pain, disability or handicap. Few instruments that include condition-specific impacts have been developed to assess need for orthodontic treatment [4][5][6]. Furthermore, any instrument used should be age-appropriate, as measures developed for adults may not be suitable for adolescents, who make up the majority of orthodontic patients. Instruments also need to be validated for the specific population. Although the Oral Health Impact Profile (OHIP) [7,8] and the Child Oral Impacts on Daily Performances (Child-OIDP) index [9] have been cross-culturally adapted and validated for the Malaysian population, neither was specifically designed to evaluate subjective impacts due to malocclusion; both were modified to allow assessment relevant to orthodontics. Thus, there is a need for a condition-specific OHRQoL measure of malocclusion for the Malaysian population.
The Psychosocial Impact of Dental Aesthetics Questionnaire (PIDAQ) focuses solely on the impact of dental appearance and arrangement on OHRQoL, since appearance is the most common reason for seeking orthodontic treatment. It comprises 23 items in four domains: the Dental Self-Confidence (DSC) domain measures positive dental self-concept with 6 items on dental appearance; the Social Impact (SI) domain assesses interpersonal sensitivity with 8 items on anxiety about other people's reactions to the appearance of the subject's teeth; the Psychological Impact (PI) domain comprises 6 items on negative emotions towards one's dental appearance; and the Aesthetic Concern (AC) domain contains 3 items on disapproval of the image of one's exposed dentition [10]. Three of the PIDAQ subscales were derived from scales able to discriminate between subjects with excellent dental aesthetics and those with only minor irregularities as determined by the IOTN Aesthetic Component (IOTN-AC) [6]: the DSC was adapted from the Self-Confidence Scale [11,12], the SI from the Social Aspect Scale of the Orthodontic Quality of Life Questionnaire (OQLQ), and the AC from the Aesthetic Scale of the OQLQ [4,5]. The PI items were developed separately from the other domains [6]. The PIDAQ was specifically developed to assess perceived need for orthodontic treatment, with potential uses in assessing changes in patients' well-being during treatment, distinguishing patients' and providers' perspectives and values, and documenting the impact of orthodontic treatment for health policy discussions and the setting of clinical guidelines [6]. Originally developed in German for young adults aged 18 to 30 years [6], it has shown good psychometric properties across cultures [13][14][15][16][17]. More recently, it has been adapted for younger adolescents and into various languages [18,19]. Klages et al. [10] have also demonstrated that the version adapted for German adolescents showed good psychometric properties across ages 11 to 17.
Until this study, the PIDAQ had been neither translated nor validated for the Malaysian adolescent population. This study therefore aimed to cross-culturally adapt the PIDAQ into Malay and to assess the psychometric properties of the Malay PIDAQ among Malaysian adolescents.

Methods
The cross-cultural adaptation of the PIDAQ into Malay comprised two parts: linguistic and psychometric validation. The linguistic validation comprised the initial translation of the PIDAQ into Malay, a pre-test of the Malay version, back translation and a final review of the draft Malay PIDAQ. The psychometric validation assessed the Malay PIDAQ's internal consistency, reproducibility, construct, criterion and discriminant validity, and floor and ceiling effects.

Translation
The 23-item PIDAQ comprises a positive subscale measuring the DSC domain (6 items) and three negative subscales measuring the SI (8 items), PI (6 items) and AC (3 items) domains. It was originally written in German but has been published in English. In this study, both the published original and adolescent versions were reviewed. First, the adolescent version of the PIDAQ was translated from English into Malay by six Malay-English bilingual individuals, all of Malay ethnicity and proficient in both languages. The translators comprised an expert in OHRQoL measures and questionnaire validation (ZYMY), two orthodontists with expertise in aesthetic dentistry (WNWH, MZMM), two undergraduate students representing the youth perspective (SSS, SFMA) and a secondary school teacher of the Malay language (SY). The initial translations were done independently.
After the translation process, two authors (WNWH and ZYMY) met to compare and analyze all translations in terms of content and wording, paying attention to conceptual and item equivalence between the original index and its Malay version. At this meeting, a single consensus translation, the draft Malay PIDAQ, was agreed. All authors agreed to add a "not relevant" response option to the item "Don't like own teeth on video", as it was felt that not all Malaysian school children have access to a video recorder (including smartphones). This was later confirmed during the pilot test.
Next, the draft Malay PIDAQ was pilot tested on 7 school children aged 12-14 years and 15 school children aged 15-17 years from the orthodontic waiting list. The pilot test was conducted by SSS and SFMA under the observation of WNWH in separate sessions for each age group. The 22 school children were asked to complete the questionnaire, and the time taken was noted to test the practicality of administering the questionnaire under fieldwork conditions. After the test, the researchers held a discussion with the schoolchildren to assess their understanding of the questionnaire's instructions, content, answer options and wording. Ambiguous words were highlighted, discussed and replaced with words of similar but clearer meaning. For example, the participants found the Malay translation of "self-conscious" in the item "Shy because of own teeth" hard to understand. The corresponding item in the adult version was compared, and the resulting Malay item was agreed to be acceptable and understood by the school children. Following the pilot test, the researchers (SSS, SFMA, WNWH and ZYMM) met to discuss its outcomes before changes to the draft Malay PIDAQ were agreed.
The second step involved back translation of the draft Malay PIDAQ into English, carried out independently by the Malaysia Institute of Translation & Books. WNWH and ZYMY then thoroughly discussed the back translation and compared it with the original PIDAQ. After minor modifications, the back translation of the Malay PIDAQ was verified and the draft Malay PIDAQ was finalized.

Psychometric validation
The psychometric properties of the Malay PIDAQ were subsequently tested on a non-randomly selected sample of 12- to 17-year-olds who had not been involved in the pilot study. The sample size calculation, performed with the A-priori Sample Size Calculator for Structural Equation Models [20], took into account the detection of mis-specified factor loadings between the two age groups. For 4 PIDAQ subscales with 23 items, the recommended sample size for each age group was 166 at a power of 0.80 and a probability level of 0.05 for the model structure [21,22]. This agreed with the rule-of-thumb guideline of 4 to 10 subjects per variable [23].
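As a worked check of the rule of thumb, assuming each of the 23 PIDAQ items counts as one "variable", 4 to 10 subjects per variable gives a range that contains the calculator's recommendation of 166 per age group:

```latex
\[
n_{\min} = 4 \times 23 = 92, \qquad
n_{\max} = 10 \times 23 = 230, \qquad
92 \le 166 \le 230 .
\]
```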
The questionnaire was self-administered. Participants were classified into two age groups: 12-14 years old and 15-17 years old. In each age group, participants completed the questionnaire either in a classroom or in the orthodontic clinic waiting area. Adolescents who were undergoing or had undergone orthodontic treatment were excluded to avoid confusion when they were assessed using the Aesthetic Component of the Index of Orthodontic Treatment Need (IOTN-AC). Those with craniofacial deformities were excluded to avoid the possibility that any psychosocial impact was due to craniofacial deformity rather than to dental aesthetics.
Participants indicated their agreement with each of the 23 items on a five-point Likert scale (not at all, a little, somewhat, strongly and very strongly), coded from 1 to 5 [10]. Each subscale score was calculated as the sum of its item scores. The impact attributed to dental aesthetics was evaluated using the total PIDAQ score, with the items of the positive DSC domain reverse scored. Thus, higher total scores indicate a greater negative psychosocial impact and poorer OHRQoL attributable to dental aesthetics [24].
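The scoring rule above can be sketched as follows. This is a minimal illustration, not the authors' software: the function names are hypothetical, and it assumes complete responses coded 1 to 5. Subscale scores are returned unreversed (so DSC is higher for more confident respondents), and only the total reverses the DSC items.

```python
LIKERT_MIN, LIKERT_MAX = 1, 5  # 1 = not at all ... 5 = very strongly


def reverse_score(code: int) -> int:
    """Reverse a 1-5 Likert code so that 1 <-> 5, 2 <-> 4 and 3 stays 3."""
    return LIKERT_MIN + LIKERT_MAX - code


def score_pidaq(responses: dict[str, list[int]]) -> dict[str, int]:
    """Return subscale sums plus a total impact score (hypothetical helper).

    `responses` maps subscale names ("DSC", "SI", "PI", "AC") to item codes.
    The positive DSC subscale is reverse-scored only when forming the total,
    so a higher total means a more negative psychosocial impact.
    """
    scores: dict[str, int] = {}
    total = 0
    for name, codes in responses.items():
        if any(not LIKERT_MIN <= c <= LIKERT_MAX for c in codes):
            raise ValueError(f"{name}: item codes must be between 1 and 5")
        scores[name] = sum(codes)
        if name == "DSC":
            total += sum(reverse_score(c) for c in codes)
        else:
            total += scores[name]
    scores["total"] = total
    return scores
```

For the 22-item version analysed later (DSC 6, SI 8, PI 6 and AC 2 items), the lowest possible total is 22 and the highest is 110, which matches the score range reported in the Results.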
Data were analyzed using IBM SPSS AMOS v.20 and IBM SPSS Statistics v.20. The chi-squared test and Fisher's exact test were used to compare demographic proportions between age groups, and the independent t-test was used to compare the total PIDAQ scores of the two age groups. Factor structure was assessed by confirmatory factor analysis (CFA) and internal consistency by Cronbach's α for each subscale. CFA models were estimated by maximum likelihood. Goodness of fit of the observed data to the model was judged by a comparative fit index (CFI) ≥ 0.90 and a root-mean-square error of approximation (RMSEA) < 0.08 [25]. Measurement invariance between the two age groups was tested by multiple-group comparison, comparing the measurement weights and structural covariance models with the baseline unconstrained configural model. In the measurement weights model, factor loadings were constrained to be equal across groups; in the structural covariance model, factor loadings as well as factor variances and covariances were constrained to be equal across groups [25]. Measurement invariance was judged by the difference in CFI with a cut-off of 0.01 (i.e. ΔCFI > 0.01 indicates non-invariance) [25,26]. Subscales with Cronbach's α between 0.70 and 0.95 were considered to have good internal consistency [27].
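The two decision rules in this paragraph can be sketched as below, assuming complete item responses and the standard formula α = k/(k - 1) × (1 - Σ item variances / total-score variance). The function names are hypothetical and do not reproduce SPSS or AMOS output; the ΔCFI values themselves would come from the fitted CFA models.

```python
from statistics import pvariance


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha for one subscale.

    rows[i][j] is respondent i's score on item j (no missing values assumed).
    Population variances are used throughout; the n factor cancels in the ratio,
    so the result equals the sample-variance version.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*rows))
    sum_item_var = sum(pvariance(col) for col in item_columns)
    total_var = pvariance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


def alpha_acceptable(alpha: float, low: float = 0.70, high: float = 0.95) -> bool:
    """Criterion used in the text: 0.70 <= alpha <= 0.95."""
    return low <= alpha <= high


def invariant(cfi_configural: float, cfi_constrained: float, cutoff: float = 0.01) -> bool:
    """Invariance holds unless constraining the model drops CFI by more than 0.01."""
    return (cfi_configural - cfi_constrained) <= cutoff
```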
The PIDAQ was developed to assess treatment need in patients requesting orthodontic treatment [6] and measures orthodontic-specific quality of life outcomes [10]. Following a previous study [10], the discriminant validity of the Malay PIDAQ was tested by examining its relationship with self-rated and investigator-rated measures of orthodontic treatment need: the IOTN-AC and the awareness component of the Perception of Occlusion Scale (POS). The IOTN-AC was rated on a 10-point scale of black-and-white photographs showing teeth with increasing severity of malocclusion [28]. The POS component comprised 6 items on malocclusion traits [29], each rated by participants on a 5-point Likert scale from not at all to very strongly. The self-rated and investigator-rated Malocclusion Indices (MI-S and MI-D, respectively) were adapted from Klages et al. [10] to analyse the severity of malocclusion: the IOTN-AC score and the total POS score were standardized, summed and divided by 2 to give an index with a mean of 0. Four investigators (WNWH, SSZS, SFMA and MZMM) were calibrated for the MI-D using 40 sets of study models, each assessed twice two weeks apart. The inter-operator intra-class correlation (ICC) for the MI-D score at T1 was 0.96 (p < 0.001; 95% CI = 0.93 to 0.97). Intra-operator ICCs for the MI-D were also excellent, all above 0.75 (p < 0.001) [30]: 0.95 (95% CI = 0.90 to 0.97), 0.85 (95% CI = 0.71 to 0.92), 0.91 (95% CI = 0.83 to 0.95) and 0.91 (95% CI = 0.83 to 0.95). Independent t-tests were used to compare PIDAQ subscale scores between participants with no or slight malocclusion (lower quartile of the MI-S or MI-D) and those with severe malocclusion (upper quartile).
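The malocclusion index construction can be sketched as follows. This is a hypothetical helper, not the authors' code: it assumes z-scores computed with the sample SD (n - 1) and assumes that the quartile groups are participants at or below Q1 and at or above Q3, using Python's default "exclusive" quantile method; the paper does not state how ties at the cut points were handled.

```python
from statistics import mean, quantiles, stdev


def standardize(values: list[float]) -> list[float]:
    """z-scores using the sample SD (n - 1); the SD choice is an assumption."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def malocclusion_index(iotn_ac: list[float], pos_total: list[float]) -> list[float]:
    """MI = (z(IOTN-AC) + z(POS total)) / 2, which has a mean of 0."""
    return [(a + b) / 2 for a, b in zip(standardize(iotn_ac), standardize(pos_total))]


def quartile_groups(mi: list[float]) -> tuple[list[int], list[int]]:
    """Indices of participants in the lower (<= Q1) and upper (>= Q3) quartiles."""
    q1, _, q3 = quantiles(mi, n=4)
    lower = [i for i, v in enumerate(mi) if v <= q1]
    upper = [i for i, v in enumerate(mi) if v >= q3]
    return lower, upper
```

The PIDAQ subscale scores of the two index groups would then be compared with an independent t-test.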
Construct validity of the Malay PIDAQ was tested by comparing it with measures of related constructs, namely perceived dental appearance rank and perceived need for braces. Participants rated their dental appearance as excellent, good, average or poor, and were asked whether they needed braces to correct their teeth (yes, no or don't know). Criterion validity was tested against the condition-specific scores of the Child Oral Impacts on Daily Performances (CS-OIDP) index [31], which measures the impact of teeth on daily activities. Impacts were attributed to malocclusion when participants related them to "Spaces between teeth" or "Position of teeth" [32]. Although "Deformity of the mouth and face" is also a condition-specific cause, no participant reported an impact from it, since the study excluded those with craniofacial deformities. The CS-OIDP performance score was calculated by summing, across the 8 daily activities (cleaning teeth, eating, emotional stability, smiling, speaking, relaxing, doing schoolwork and socializing), the product of the frequency (scale 1 to 3) and the severity (scale 1 to 3) of each impact. Only impacts attributed to malocclusion were scored, and a score of 0 was recorded if malocclusion caused no impact on any of the 8 activities. The Kruskal-Wallis test was used to assess the relationship between PIDAQ scores and perceived dental appearance rank. The Mann-Whitney test was used to assess the relationship between PIDAQ scores and perceived need for braces, and between PIDAQ scores and the presence or absence of CS-OIDP impacts. Pearson's correlation coefficient was calculated to assess the relationship between PIDAQ scores and CS-OIDP total performance scores on the eight daily activities.
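The CS-OIDP performance score described above can be sketched as below. The input structure and the cause labels are hypothetical; the only logic taken from the text is that frequency (1 to 3) is multiplied by severity (1 to 3) for each activity with a malocclusion-attributed impact, and the products are summed.

```python
ACTIVITIES = frozenset({
    "cleaning teeth", "eating", "emotional stability", "smiling",
    "speaking", "relaxing", "doing schoolwork", "socializing",
})
# Hypothetical labels for the causes counted as malocclusion.
MALOCCLUSION_CAUSES = frozenset({"spaces between teeth", "position of teeth"})


def cs_oidp_score(impacts: dict[str, tuple[int, int, set[str]]]) -> int:
    """Sum of frequency x severity over activities with a malocclusion-attributed impact.

    `impacts` maps an activity to (frequency 1-3, severity 1-3, causes reported).
    Activities with no impact are omitted and therefore contribute 0.
    """
    total = 0
    for activity, (frequency, severity, causes) in impacts.items():
        if activity not in ACTIVITIES:
            raise ValueError(f"unknown activity: {activity}")
        if not (1 <= frequency <= 3 and 1 <= severity <= 3):
            raise ValueError(f"{activity}: frequency and severity must be 1-3")
        if causes & MALOCCLUSION_CAUSES:  # count only impacts due to malocclusion
            total += frequency * severity
    return total
```

Under this reading, the score ranges from 0 (no malocclusion-related impact) to 72 (8 activities × 3 × 3).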
The reproducibility of the Malay PIDAQ was tested by asking approximately 30% of the subjects to complete the questionnaire again approximately 2 weeks later. The standard error of measurement (SEM) was calculated as the square root of the residual variance from the ANOVA, and the smallest detectable change (SDC) as 1.96 × √2 × SEM [27,33]. A paired t-test was used to detect any significant change in PIDAQ subscale scores between the first and second administrations. The limits of agreement were calculated as the mean change ± 1.96 × the standard deviation of the changes [34]. The ICC for absolute agreement was calculated using a two-way random effects model. A weighted kappa coefficient was not determined, since a quadratically weighted kappa is considered identical to the ICC for agreement [27].
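The reproducibility statistics in this paragraph can be sketched for one subscale as follows. This is an illustration under stated assumptions, not the authors' SPSS procedure: the "ANOVA" is assumed to be a two-way subjects × occasions ANOVA, the ICC is the single-measure absolute-agreement form ICC(A,1), and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def retest_agreement(t1: list[float], t2: list[float]) -> dict:
    """Test-retest statistics for one subscale (two administrations, no missing data)."""
    n, k = len(t1), 2
    rows = list(zip(t1, t2))
    grand = mean(list(t1) + list(t2))
    # Two-way ANOVA (subjects x occasions) sums of squares
    ss_subjects = k * sum((mean(r) - grand) ** 2 for r in rows)
    ss_occasions = n * sum((m - grand) ** 2 for m in (mean(t1), mean(t2)))
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_subjects - ss_occasions
    ms_subjects = ss_subjects / (n - 1)
    ms_occasions = ss_occasions / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # ICC(A,1): two-way random effects, absolute agreement, single measure
    icc = (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_occasions - ms_error) / n
    )
    sem = sqrt(ms_error)            # SEM = square root of the residual variance
    sdc = 1.96 * sqrt(2) * sem      # smallest detectable change
    # Bland-Altman limits of agreement on the retest changes
    diffs = [b - a for a, b in zip(t1, t2)]
    mean_diff, sd_diff = mean(diffs), stdev(diffs)
    loa = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)
    within = sum(loa[0] <= d <= loa[1] for d in diffs) / n
    return {"icc": icc, "sem": sem, "sdc": sdc, "loa": loa, "pct_within_loa": 100 * within}
```

With exactly two administrations, the residual mean square equals half the variance of the changes, so the SDC reduces to 1.96 × the SD of the changes, the same as the half-width of the limits of agreement.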
Floor and ceiling effects for each subscale were calculated as the percentage of participants achieving the lowest and highest possible scores, respectively. A floor or ceiling effect was considered present if more than 15% of participants achieved the lowest or highest possible score [27].
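A minimal sketch of this check follows (the function name is hypothetical). It assumes the lowest and highest possible scores follow from the number of items × 1 and × 5; for example, the 2-item AC subscale of the shortened version would range from 2 to 10.

```python
def floor_ceiling(scores: list[int], lowest: int, highest: int,
                  cutoff_pct: float = 15.0) -> dict:
    """Percentage of participants at the minimum and maximum possible subscale score."""
    n = len(scores)
    floor_pct = 100 * sum(s == lowest for s in scores) / n
    ceiling_pct = 100 * sum(s == highest for s in scores) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_effect": floor_pct > cutoff_pct,    # present if more than 15%
        "ceiling_effect": ceiling_pct > cutoff_pct,
    }
```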

Results
The participants comprised 319 subjects in the younger (12-14 years old) and 271 in the older (15-17 years old) age group. The two age groups did not differ significantly (p > 0.05) in gender, source of recruitment, ethnicity or severity of malocclusion (Table 1).
Initial analysis showed that a relatively large proportion of participants (14.2%; n = 84) chose "not relevant" for the item "Don't like own teeth on video", indicating that the item did not apply to their situation. Therefore, following recommendations for handling items with a large proportion of responses that participants cannot relate to [35], and after discussion among the authors, the item "Don't like own teeth on video" was removed from the AC subscale. Subsequent psychometric analyses were thus based on a shortened 22-item version of the Malay PIDAQ.
Histograms for the younger, older and overall age groups showed a normal distribution of total PIDAQ scores. Overall, the mean score was 58.0 (SD = 17.8; range = 22-110). The mean score was 58.3 (SD = 18.32; range = 22-110) in the younger age group and 57.4 (SD = 18.0; range = 22-104) in the older age group. An independent t-test showed that the difference between the mean PIDAQ scores of the two age groups was not statistically significant (p > 0.05).
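As an illustrative recomputation from the rounded summary statistics reported above (not the authors' raw-data analysis), a pooled-variance Student t-test can be sketched as follows. The equal-variance form is an assumption, and the two-sided p-value uses a normal approximation, which is close to the t distribution at 588 degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist


def pooled_t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Student's t from group means, SDs and sizes (equal variances assumed)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (m1 - m2) / se
    # Normal approximation to the two-sided p-value (adequate for large df).
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


t, df, p = pooled_t_from_summary(58.3, 18.32, 319, 57.4, 18.0, 271)
print(f"t({df}) = {t:.2f}, approximate two-sided p = {p:.2f}")
```

The resulting t is about 0.6, well below the 5% critical value of roughly 1.96, which is consistent with the non-significant difference reported.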
The goodness-of-fit measures showed a good fit to the data: for both models A and B, CFI values were above 0.90 and RMSEA values were below 0.08, with narrow confidence intervals (Table 2). The factor loadings of Model A and of the model constrained for age groups (Model B) were within the acceptable range, except for the item "Wish to look better", whose factor loadings were below 0.50. The multi-group CFA comparison of the constrained models with the baseline configural model showed invariance across the age groups (ΔCFI < 0.01): the measurement weights model had a CFI of 0.926 (ΔCFI = 0.002) and the structural covariance model a CFI of 0.921 (ΔCFI = 0.007).
Table 3 shows the internal consistency, scale statistics and inter-item correlations of the subscales. All subscales achieved satisfactory Cronbach's α values between 0.70 and 0.95. No inter-item correlation was ≥ 0.90 in any subscale or ≤ 0.30 in the DSC, SI and AC subscales. In the PI subscale, inter-item correlations below 0.30 were found between "Wish to look better" and "Distressed because of others' nice teeth" (12-14 years = 0.25; all ages = 0.29), "Unhappy about own teeth" (12-14 years = 0.23; all ages = 0.29) and "Feel bad about own teeth" (12-14 years = 0.27; all ages = 0.29). No item-total correlation was below 0.30.
Table 4 shows statistically significant differences in Malay PIDAQ mean scores between adolescents who rated themselves (MI-S) as having no or slight malocclusion and those with severe malocclusion, for all subscales in all age groups (p < 0.01). Similarly, mean scores differed significantly between adolescents rated by the investigators (MI-D) as having no or slight malocclusion and those with severe malocclusion, for all subscales in all age groups (p < 0.01).
In all three age groups, for both the MI-S and the MI-D, DSC mean scores decreased with increasing severity of malocclusion, whereas SI, PI and AC subscale mean scores increased.
Self-rated dental appearance was significantly associated with all Malay PIDAQ subscales (p < 0.01) in all age groups (Table 5). DSC mean scores decreased progressively as participants rated their teeth from excellent to poor, whereas SI, PI and AC mean scores increased; both trends were statistically significant in all age groups.
The associations between self-perceived need for braces and the PIDAQ subscales were statistically significant in all age groups (Table 6). Those who perceived that they needed braces had significantly lower DSC and significantly higher SI, PI and AC subscale mean scores than those who perceived that they did not. Table 7 shows the association between the presence of CS-OIDP impacts and the Malay PIDAQ subscales. Those with CS-OIDP impacts had significantly lower DSC and significantly higher SI, PI and AC subscale mean scores than those without, in all age groups (p < 0.01).
Pearson correlation analysis showed that CS-OIDP performance scores had a weak, statistically significant negative correlation with DSC subscale scores in all age groups (Table 7), and a weak to moderate, statistically significant positive correlation with SI, PI and AC subscale scores in all age groups.
Table 8 shows the results of the Malay PIDAQ reproducibility test. ICC values were above 0.70 (p < 0.05) for all subscales in all age groups. There were no statistically significant differences between the first and second administrations, except for the PI subscale in the younger age group. The smallest detectable change (SDC) appeared to be higher in the younger age group for all subscales. Bland-Altman analysis showed that more than 90% of the repeated measurements fell within the limits of agreement, except for the AC subscale in the younger age group (78.6%).
No floor or ceiling effect exceeded the 15% cut-off for any subscale in any age group, except the floor effect of the AC subscale in the younger and overall age groups (Table 9). The AC subscale had the highest prevalence of floor effects, ranging from 14.8% to 16.3% across age groups.

Discussion
The cross-cultural adaptation of the Malay PIDAQ followed the protocols of previous studies [9,10,19] and OHRQoL expert advice and recommendations [27,36]. Herdman et al. [36] outlined six types of equivalence for the cross-cultural adaptation of an index: conceptual, item, semantic, operational, measurement and functional. In brief, conceptual equivalence ensures that the answers to each question reflect the same concept, so that they are meaningful in both cultures and languages concerned. Item equivalence requires that each item estimates the same parameters of the latent trait being measured. Semantic equivalence establishes that the meaning of each item is maintained after translation. Operational equivalence concerns the possibility of using a similar format, instructions, mode of administration and measurement methods, while measurement equivalence refers to achieving acceptably similar psychometric properties. Functional equivalence is the extent to which the instrument does what it is supposed to do equally well in two or more cultures, and is considered achieved when all other types of equivalence in the model have been achieved.
The sample age range covered the expected ages of Malaysian secondary school children across the five years from Form 1 (12-13 years old) to Form 5 (16-17 years old). This range is similar to that of patients who commonly request orthodontic treatment at Malaysia Ministry of Health orthodontic clinics. Unless early interceptive treatment is required, dental officers refer patients to orthodontic specialists only after all permanent teeth have erupted, usually at around 12 years of age. At these government-sponsored institutions, treatment is free of charge for children whose parents are in government service and available at minimal fees to those whose parents do not work for the government. The considerably lower cost of treatment at these clinics compared with private practice is a major reason for the high demand at these institutions. Owing to limited resources, treatment at these institutions is offered only to schoolchildren and to adults who require multidisciplinary management. This study therefore focused on this adolescent age group, as evidence of impacts on their OHRQoL may interest national health policy makers in determining the services offered to them. Klages et al. [10] advocated controlling the quality of validity testing of the PIDAQ for adolescents by narrowing the age range of the study groups. The division of the age range in this study was influenced by the Ministry of Education's policy that does not permit data collection from students taking national examinations, that is, the Form 3 and Form 5 classes, which are usually composed of 15- and 17-year-old schoolchildren, respectively.
Participants from schools were therefore recruited mainly from Forms 1 and 2 to represent the younger age group and from Form 4 to represent the older age group, while participants from orthodontic clinics included all target ages attending to request orthodontic treatment. Conceptual, item and semantic equivalence of the Malay PIDAQ were established through discussion and advice from the OHRQoL expert throughout the linguistic validation process. Conceptual equivalence was based on the literature on the impact of malocclusion on the OHRQoL of Malaysian youths from studies that used the OHIP-14 [37] and CS-OIDP [38], and on discussions between experts and a sample of the target population. In addition to the local professional literature, discussions with orthodontists and pilot study participants provided appropriate conceptual evidence of the impact of dental aesthetics on the local population. Item equivalence was established by confirming that all items were relevant to the population apart from one item ("Don't like own teeth on video"), which was initially retained in the scale with an amended response format. Semantic equivalence was first examined during forward translation to identify words that were difficult to translate or for schoolchildren to understand, such as "self-conscious" and "upset". In such cases, the original and adolescent versions were compared, and the most appropriate words were chosen by discussion and consensus with experts and confirmed by pilot testing. This was followed by back translation by an independent professional translator not involved in the study, to avoid bias. Operational equivalence in terms of questionnaire format, response options and method of administration was also established.
Pilot testing showed the format to be acceptable to the participants. In terms of measurement methods, as expected given schoolchildren's limited access to video recording devices, the item "Don't like own teeth on video" was not universally relevant. Participants in the pilot test and a relatively large proportion (14.2%) of those in the psychometric validation confirmed that this item was irrelevant to them. Although adding a "not relevant" answer option was initially considered as a way to retain the item in the Malay PIDAQ, a large proportion of "not relevant" responses may have implications for future studies. Such responses may not be a concern in longitudinal studies comparing within-subject change, but may be an issue in studies comparing differences between groups [35]. Jokovic et al. [35] suggested three methods of handling such situations: (1) excluding cases with such responses, which may lose valuable information from participants who merely lacked the opportunity related to one item of the instrument; (2) modifying the scores, although imputing "not relevant" responses by various methods can affect the precision and accuracy of an instrument; or (3) removing an item that has a large proportion of responses not applicable to the participants. Jokovic et al. did not specify this proportion, although another study has recommended that items with more than 10% of responses in such a category may not be suitable for inclusion in an item bank [39]. Thus, after discussion among the authors, the item was removed from the Malay PIDAQ.
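The 10% screening rule cited above can be sketched as follows. The response code "NR" and the function name are hypothetical, and the example assumes that the reported 84 "not relevant" responses are counted against all 590 participants (319 + 271).

```python
def items_to_drop(responses: dict[str, list], not_relevant: str = "NR",
                  threshold: float = 0.10) -> dict[str, float]:
    """Return {item: share} for items whose share of 'not relevant' answers exceeds threshold."""
    flagged = {}
    for item, answers in responses.items():
        share = sum(a == not_relevant for a in answers) / len(answers)
        if share > threshold:
            flagged[item] = share
    return flagged


# 84 "not relevant" answers out of 590 participants for the video item (assumed denominator).
data = {"video": ["NR"] * 84 + [1] * 506, "smile": [2] * 590}
print(items_to_drop(data))
```

Under these assumptions only the video item is flagged, with a share of about 14.2%, above the 10% threshold.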
In terms of response options, the 5-point Likert scale was generally a suitable method of eliciting responses, since the entire spectrum of response options was used across all items, with varying frequency. Modes of administration other than the self-administered questionnaire were limited to the pilot test, in which each item was discussed one by one to determine participants' understanding of it. In designing this study, the questionnaire was intended for use in large, linguistically mixed population studies, or for any young Malaysian to complete in the waiting room of a busy orthodontic clinic, since the literacy rate for basic Malay is high (95.2%) among secondary schoolchildren in Malaysia [40]. Telephone surveys were considered impractical, since telephones and mobile phones are considered luxury items and ownership of them is limited among adolescents in Malaysia. In addition, face-to-face interviews and mail surveys among Malaysians have shown poor response rates (56.8% and 48.1%, respectively) [8].
Measurement equivalence was assessed through psychometric validation, comprising tests of reliability and internal consistency, reproducibility and construct validity [36], and by comparing the results with those of a previous study of German adolescents [10]. Responsiveness was not tested; testing it is recommended for longitudinal validation of this measure.
The multidimensional structure of the Malay PIDAQ for Malaysian adolescents was supported by a good data fit and was invariant across age groups. Measurement invariance across age groups was judged by ΔCFI rather than the commonly used Δχ², since ΔCFI, unlike Δχ², is not affected by sample size [26]. Unlike a previous study that used a combination of ΔCFI and ΔRMSEA [10], this study did not use ΔRMSEA, because the sample did not meet the criteria for this statistic, which requires more than 300 subjects per group and equal group sizes [41]. The Cronbach's α values for all subscales fell within the recommended range of 0.70 to 0.95 [27] and were generally slightly higher than those of the previous study [10].
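The two decision rules above, Cronbach's α checked against the 0.70 to 0.95 range [27] and invariance judged by the change in CFI, can be sketched as follows. This is a minimal sketch under stated assumptions: α uses the standard formula α = k/(k − 1) × (1 − Σ item variances / total-score variance), and the ΔCFI cut-off of 0.01 is a widely used convention assumed here, since the exact threshold is not given in this section. Function names and data are hypothetical.

```python
from statistics import variance
from typing import Sequence, Tuple

ALPHA_RANGE: Tuple[float, float] = (0.70, 0.95)  # recommended range, as cited from [27]
MAX_CFI_DROP = 0.01  # ASSUMED Delta-CFI cut-off; not stated in this section


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for one subscale.

    `items` holds one list of scores per item, each with one score per participant.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("all items need the same number of respondents")
    totals = [sum(col[i] for col in items) for i in range(n)]  # subscale total per participant
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def alpha_acceptable(alpha: float, bounds: Tuple[float, float] = ALPHA_RANGE) -> bool:
    """True if alpha lies inside the recommended range (inclusive)."""
    low, high = bounds
    return low <= alpha <= high


def invariant_by_delta_cfi(cfi_free: float, cfi_constrained: float,
                           max_drop: float = MAX_CFI_DROP) -> bool:
    """True if constraining parameters across groups lowers CFI by no more than max_drop."""
    return (cfi_free - cfi_constrained) <= max_drop


if __name__ == "__main__":
    # Toy subscale: 3 items x 6 respondents (hypothetical scores, not study data).
    subscale = [
        [1, 2, 3, 4, 2, 3],
        [1, 3, 3, 4, 2, 2],
        [2, 2, 4, 4, 1, 3],
    ]
    a = cronbach_alpha(subscale)
    print(round(a, 3), alpha_acceptable(a))
    # Hypothetical CFIs for a less and a more constrained multigroup model.
    print(invariant_by_delta_cfi(0.950, 0.944))  # True (drop of 0.006)
```

ΔCFI compares two already-fitted models, so the check needs only their CFI values; fitting the models themselves would require a structural equation modelling package.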
Discriminant validity, assessed by unpaired t-tests of the Malay PIDAQ scores grouped by self-rated (MI-S) and investigator-rated (MI-D) malocclusion, showed statistically significant differences between those with no or slight malocclusion and those with severe malocclusion for all subscales, regardless of age. Those with severe malocclusion had lower DSC subscale mean scores and higher SI, PI and AC subscale mean scores than those with slight malocclusion, as reflected in the positive and negative signs of the effect sizes. This concurs with the study by Klages et al. [10]. As in that study [10], the effect sizes based on the MI-S were all above 0.80, indicating strong effects, whereas those based on the MI-D ranged from medium (0.50) to strong (≥0.80) for each subscale.
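The signed effect sizes and unpaired t statistic described above can be illustrated with a short sketch. It assumes Cohen's d with a pooled standard deviation and an equal-variance (Student) t statistic; the effect-size formula actually used in the study is not stated in this section, and the scores below are hypothetical. The medium (0.50) and strong (0.80) labels follow the text.

```python
import math
from statistics import mean, variance
from typing import Sequence


def pooled_sd(a: Sequence[float], b: Sequence[float]) -> float:
    """Pooled standard deviation of two independent groups."""
    na, nb = len(a), len(b)
    return math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))


def cohens_d(severe: Sequence[float], slight: Sequence[float]) -> float:
    """Standardised mean difference, severe minus slight.

    Positive where severe malocclusion raises the score (SI, PI, AC),
    negative where it lowers the score (DSC).
    """
    return (mean(severe) - mean(slight)) / pooled_sd(severe, slight)


def student_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Equal-variance unpaired t statistic with df = len(a) + len(b) - 2."""
    se = pooled_sd(a, b) * math.sqrt(1 / len(a) + 1 / len(b))
    return (mean(a) - mean(b)) / se


def magnitude(d: float) -> str:
    """Label |d| with the 0.50 (medium) and 0.80 (strong) cut-offs used in the text."""
    size = abs(d)
    if size >= 0.80:
        return "strong"
    if size >= 0.50:
        return "medium"
    return "below medium"


if __name__ == "__main__":
    # Hypothetical SI subscale scores (not study data).
    severe, slight = [20, 22, 24], [10, 12, 14]
    d = cohens_d(severe, slight)
    print(d, magnitude(d))                 # 5.0 strong
    print(round(student_t(severe, slight), 3))
```

Keeping the sign of d, rather than reporting only its magnitude, is what lets the DSC subscale show a negative effect while the SI, PI and AC subscales show positive ones.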
Construct validity of the Malay PIDAQ was further assessed against the ranking of perceived dental attractiveness and self-assessed need for dental correction, while criterion validity was assessed against the impact of malocclusion on daily activities, as measured by the CS-OIDP. The rationale was as follows. Since the PIDAQ is concerned with the impact of dental attractiveness on OHRQoL, it was reasonable to test its properties against participants' self-perceived dental attractiveness. Those reporting an impact would likely also feel that they needed to have their teeth corrected, and if the impact affected their OHRQoL, an impact caused by malocclusion would most likely also affect their daily activities. The results concurred with these expectations: all subscales showed statistically significant associations with perceived dental attractiveness rank, self-assessed need for dental correction and CS-OIDP, regardless of age. Those who rated their dental attractiveness as excellent had higher DSC subscale mean scores and lower SI, PI and AC subscale mean scores than those with lower self-rated dental attractiveness. Those who felt that they needed braces had lower DSC subscale mean scores and higher SI, PI and AC subscale mean scores than those who felt they did not need braces. Similarly, those with a CS-OIDP impact attributed to malocclusion had lower DSC subscale mean scores and higher SI, PI and AC subscale mean scores than those without such an impact. This was further supported by a statistically significant negative correlation of the DSC subscale, and positive correlations of the SI, PI and AC subscales, with the CS-OIDP performance scores.
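The direction of the subscale correlations with CS-OIDP performance scores can be checked with a rank correlation. This section does not state which correlation coefficient was used, so the sketch below assumes Spearman's rho (Pearson's correlation on average ranks), a common choice for skewed or ordinal scores. The function names and data are hypothetical.

```python
import math
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical DSC subscale scores and CS-OIDP performance scores (not study data).
    dsc = [14, 10, 12, 6, 8, 4]
    oidp = [0, 5, 2, 20, 10, 25]
    print(spearman(dsc, oidp))  # negative, matching the expected DSC direction
```

Average ranks keep tied scores, which are frequent on short Likert-based subscales, from being ordered arbitrarily.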
In terms of test-retest reliability, the ICCs for all subscales (0.71 to 0.89) were generally slightly lower than those in the previous study (0.82 to 0.96) [10], but all exceeded the recommended minimum standard of 0.70 for reliability [27]. A statistically significant test-retest difference was detected in the PI subscale of the younger age group, but the difference was below the smallest detectable change (SDC). In the study by Klages et al. [10], a statistically significant difference was detected in the AC subscale of the younger age group, but the differences were also below the SDC. The SDC reflects the smallest within-person change in score that can be interpreted as a real change, provided that the difference is significant [27]. In this study, the SDC values were higher than in the study by Klages et al. [10], indicating that larger score differences are needed in this population before they can be interpreted as true changes.
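The relation between the ICC and the SDC described above can be sketched as follows. This section does not specify the ICC model or the SEM formula, so the sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single measure), SEM = SD × √(1 − ICC) and SDC = 1.96 × √2 × SEM. All names and data are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1) from a participants x occasions table (e.g. [test, retest] per row)."""
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def sdc(sd: float, icc: float) -> float:
    """Smallest detectable change: 1.96 * sqrt(2) * SEM, with SEM = SD * sqrt(1 - ICC)."""
    sem = sd * math.sqrt(1 - icc)
    return 1.96 * math.sqrt(2) * sem


def exceeds_sdc(mean_change: float, sd: float, icc: float) -> bool:
    """True if a mean test-retest change is larger than the SDC."""
    return abs(mean_change) > sdc(sd, icc)


if __name__ == "__main__":
    # Hypothetical subscale scores at test and retest (not study data).
    table = [[10, 11], [14, 13], [8, 9], [20, 19], [12, 14]]
    icc = icc_2_1(table)
    sd = stdev(row[0] for row in table)  # SD of the first administration
    change = mean(row[1] - row[0] for row in table)
    print(round(icc, 3), round(sdc(sd, icc), 2), exceeds_sdc(change, sd, icc))
```

Because the SDC grows with both the score SD and 1 − ICC, a sample with more spread-out scores or a somewhat lower ICC will need a larger score difference before a change is interpreted as real.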
The presence of floor or ceiling effects reduces reliability, since respondents with the lowest or highest scores cannot be distinguished from one another. Furthermore, these effects limit content validity, since items at the extreme lower or upper end of the scale may be missing [27]. This study found no ceiling effects and no floor effects except in the AC subscale, as the proportions of participants with the lowest and highest possible scores were otherwise below the recommended maximum of 15% [27]. The floor effects in the AC subscale were just above this value, at 15.6% (all participants aged 12-17 years) and 16.3% (12-14 years old age group). In Klages et al. [10], floor effects were present for the SI, PI and AC subscales. This suggests that the items of this instrument were largely sufficient to distinguish Malaysian adolescents at the lower and upper ends of the spectrum of dental aesthetic impact.
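The 15% floor and ceiling criterion [27] amounts to counting the share of respondents at the minimum and maximum possible subscale scores. A minimal sketch follows; the score range passed in and the toy scores are hypothetical, and a subscale is flagged only when the share strictly exceeds 15%.

```python
from typing import Dict, Sequence, Union

MAX_EXTREME_SHARE = 0.15  # recommended maximum frequency at either extreme, as cited from [27]


def floor_ceiling(scores: Sequence[int], min_score: int, max_score: int,
                  threshold: float = MAX_EXTREME_SHARE) -> Dict[str, Union[float, bool]]:
    """Share of respondents at the lowest and highest possible scores, with flags."""
    if not scores:
        raise ValueError("no scores supplied")
    n = len(scores)
    floor = sum(s == min_score for s in scores) / n
    ceiling = sum(s == max_score for s in scores) / n
    return {
        "floor": floor,
        "ceiling": ceiling,
        "floor_effect": floor > threshold,
        "ceiling_effect": ceiling > threshold,
    }


if __name__ == "__main__":
    # Hypothetical subscale scored 0-12 (toy data, not study data).
    scores = [0] * 4 + [3] * 10 + [12] * 2
    print(floor_ceiling(scores, min_score=0, max_score=12))
    # {'floor': 0.25, 'ceiling': 0.125, 'floor_effect': True, 'ceiling_effect': False}
```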

Conclusion
Overall, the Malay PIDAQ satisfactorily achieved conceptual, item, semantic, operational and measurement equivalence with the original PIDAQ. Although small modifications to the scale were required for this population, the Malay PIDAQ showed adequate validity and reliability for assessing the impact of malocclusion on the OHRQoL of Malaysian adolescents aged 12-17 years.