Utility of Clinical Tests to Diagnose MRI-Confirmed Gluteal Tendinopathy in Patients Presenting With Lateral Hip Pain
Abstract and Introduction
Purpose Gluteal tendinopathy (GT) is a source of lateral hip pain, yet common clinical diagnostic tests have limited validity. Patients with GT are often misdiagnosed, resulting in inappropriate management, including surgery. This study determined the diagnostic utility of clinical tests for GT, using MRI as the reference standard.
Methods 65 participants with lateral hip pain were examined to evaluate the ability of clinical tests to detect MRI-determined GT (an increase in intratendinous signal intensity on T2-weighted images). Palpation of the greater trochanter and several clinical pain provocation tests applying compressive and tensile loads on the gluteal tendons were investigated. MRI of the painful hip was examined by a radiologist, blind to clinical findings.
Results Pain reported within 30 s of standing on the affected limb conclusively moves a (nominal) 50% pretest probability of GT presence on MRI to a post-test probability of 98% (specificity 100%, positive likelihood ratio ~12), whereas no pain on palpation (80% sensitivity) would rule out its presence. Twenty participants (31%) had GT on MRI but were clinically negative (ie, not positive on palpation and another test).
Conclusions Keeping in mind that the sample size was small (ie, possibly underpowered for indices of diagnostic utility with low precision), the results of this study indicate that a patient who reports lateral hip pain within 30 s of single-leg-standing is very likely to have GT. Patients with lateral hip pain who are not palpably tender over the greater trochanter are unlikely to have MRI-detected GT.
Lateral hip pain can present a diagnostic dilemma for healthcare professionals as soft tissue pathologies around the greater trochanter may coexist with or mimic other conditions. Surgical, histological and imaging studies have demonstrated that pain and local tenderness over the greater trochanter can most commonly be attributed to tendinopathy of the gluteus medius (GMed) and/or minimus (GMin) tendons (gluteal tendinopathy (GT)), with or without bursal distension.[2–4] Lateral hip pain can be a challenge to manage if it is misdiagnosed clinically as lumbar spine referral[5–7] or hip joint osteoarthritis, resulting in delayed or inappropriate intervention.
A meta-analysis (14 studies on diagnostic tests for hip conditions) pooled data from 3 large studies assessing the accuracy of 4 clinical tests used to diagnose GT, and reported that no single clinical test could convincingly predict findings from MRI. The test with most promise was the resisted external de-rotation test from a hip flexed and externally rotated position. This test is designed to stress the gluteal tendons by longitudinal (tensile) and compressive loads. Limitations of previous studies assessing diagnostic utility of clinical tests were identified in Reiman et al’s systematic review. The primary issues were low participant numbers and the use of healthy controls without hip pain, who would not ordinarily present to a clinic or be considered for differential diagnosis of hip pain. The aim of this study was to determine the diagnostic utility of clinical tests in determining whether those presenting with lateral hip pain have evidence of GT on MRI.
We conducted a diagnostic utility study of people with lateral hip pain in which practitioners who were not aware of each other’s findings performed the clinical and MRI examinations.
Participants were recruited through advertisements (local print and social media) from Brisbane and Melbourne metropolitan regions. A preliminary phone screening was undertaken to confirm the presence of lateral hip pain and eligibility for the study. Participants were included if they were aged between 35 and 70 years, and had been experiencing pain in the lateral hip area, of an intensity of ≥4/10 on an 11-point Numeric Rating Scale on most days within at least the past 3 months. Potential participants were excluded if they had received a corticosteroid injection in the hip region in the past 12 months, had any hip or lower back surgery, had any known neurological disorders, any other relevant medical conditions (eg, fibromyalgia) or any contraindications to MRI (eg, cardiac pacemaker, metal implants). Eligible participants were then booked in for clinical examination at a university clinical research laboratory and MRI examination at a radiology practice within 2 weeks of each other. The Institutional Medical Research Ethics Committee approved the study and all participants gave written informed consent.
A physiotherapist performed a series of standardised tests during a clinical examination for GT. Diagnostic clinical tests were employed to apply loads/stresses that are pathognomonic of GT. Tensile (longitudinal) and compressive (transverse) loads have been implicated in the development of tendinopathy,[10–13] which in the gluteal tendons involves positions of hip adduction and contraction of the GMed and GMin muscles.[10,14] The following six clinical tests employing either or both tensile and compressive loading, followed by palpation, were performed sequentially as described below, and rated positive if pain was reproduced in the region of the greater trochanter to a score of >2 on an 11-point Numeric Rating Scale (with 0=no pain, 10=most pain imaginable):
- Hip Flexion, Adduction, External Rotation (FADER): with the participant lying supine, the hip was passively flexed to 90°, adducted and externally rotated to end of range (figure 1). This test seeks to increase both tensile and compressive load on the GMed and GMin tendons at the greater trochanter through positioning of the hip and the overlying iliotibial band.
- Hip FADER with resisted isometric internal rotation at end of range (FADER-R): in the FADER position, the participant performed resisted isometric internal rotation against a force applied by the physiotherapist (figure 1). The test is designed to increase tensile and compressive loading of the gluteal tendons through active contraction of the GMed and GMin muscles, which in this position are internal rotators. This is a modification of the resisted external de-rotation test described by Lequesne et al and reported by Reiman et al to have pooled sensitivity of 71% (95% CI 51% to 87%) and specificity of 84% (95% CI 71% to 93%; 2 studies; n=79). Hip adduction was added to the flexion/external rotation position to add further tensile loading on the tendons, as well as increasing the compressive loading element exerted by the overlying iliotibial band at the greater trochanter.
- Hip Flexion, Abduction, External Rotation (FABER): the lateral malleolus of the test leg was placed above the patella of the contralateral leg, the pelvis stabilised via the opposite anterior superior iliac spine and the knee passively lowered so the hip moves into abduction and external rotation. This test is expected to place the anterior portions of the GMed and GMin on tensile load due to the fact that these portions of the muscles have an internal rotation function. Lateral hip pain with a FABER test has been shown to have high sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV; 82.9%, 90%, 94.4% and 72%, respectively) for differentiating between greater trochanteric pain syndrome and hip osteoarthritis.
- Passive Hip Adduction in Side Lying (ADD): the participant was placed diagonally across the bed in side-lying, with the underneath hip and knee flexed 80–90°, and the uppermost leg supported by the examiner with the knee extended, in neutral rotation, and the femur in line with the trunk (0° hip extension). The anterior superior iliac spines were maintained perpendicular to the examination table. The examiner passively moved the hip through a pure frontal plane motion into end range hip adduction with overpressure, while stabilising the pelvis with the other hand (figure 2). This test places the lateral insertions of the gluteal tendons under tensile and compressive load.
- ADD with resisted isometric abduction (ADD-R): in the ADD position, the participant was asked to push the thigh up, against the resistance of the examiner’s hand at the lateral knee. This adds an active tension component to the passive tensile and compressive loads imposed on the tendons of GMed and GMin at end range hip adduction.
- Single Leg Stance (SLS): the participant stood side-on to a wall with the affected limb furthest from the wall. A finger of the unaffected side touched the wall at shoulder height for balance. The foot nearest the wall was then raised so that the hip remained in neutral with the knee flexed to 90°. This SLS position was then maintained for up to 30 s. A positive test is reproduction of the patient’s lateral hip pain within this 30 s period. The SLS test has been shown to discriminate participants with MRI-diagnosed GT/trochanteric bursitis from those with normal non-symptomatic hips (100% sensitivity and 97.3% specificity).
- Palpation: the participant was positioned in side-lying with the symptomatic side uppermost, hips flexed ~60° and knees together. The lateral hip region was systematically palpated for tenderness as per Falvey et al. A positive test was pain with palpation of the GMed and/or GMin tendon insertions (anterior, lateral or posterosuperior facets of the greater trochanter).
Figure 1. FADER and FADER-R test position. FADER, Flexion, Adduction, External Rotation; FADER-R, Flexion, Adduction, External Rotation with resisted isometric hip internal rotation.
Figure 2. ADD and ADD-R test position. ADD, adduction; ADD-R, adduction with resisted isometric hip abduction.
Participants underwent an MRI examination of their gluteal tendons and hip at a participating radiology clinic. The MRI images were acquired on a MAGNETOM Espree 1.5 T scanner (Siemens AG), and the imaging protocol included the following sequences: axial PD fat sat (TR 3130, TE 23, 23×3.5 mm slices, Base res 384); coronal PD fat sat (TR 3050, TE 42, 19×3.5 mm slices, Base res 384); coronal T1 (TR 459, TE 12, 19×3.5 mm slices, Base res 384); sagittal PD fat sat (TR 3560, TE 39, 23×3.5 mm slices, Base res 384); sagittal PD (TR 2470, TE 31, 23×3.5 mm slices, Base res 384), all with a 0.7 mm×0.5 mm matrix.
The MRIs were examined by an experienced radiologist who was blinded to the clinical examination findings. The radiologist looked for the presence of GT, bursitis, osteoarthritis, labral pathology or any other observed pathology. In particular, each participant’s images were evaluated for any area of T2 hyperintensity (representing oedema or fluid) around the greater trochanter. If T2 hyperintensity was present, the size, shape and location of the hyperintensity were described. Intratendinous T2 hyperintensity was required to fulfil the diagnostic criteria for GT. Partial or full thickness tears of the GMed and GMin tendons were also identified when present. A partial thickness tear was diagnosed if the tendon was irregular, thinned or focally discontinuous on T1-weighted images, with hyperintensity in the corresponding area on T2-weighted images. A complete tear was diagnosed when discontinuity and/or retraction of the torn tendon was seen on T1-weighted images, with a marked increase in signal on T2-weighted images.
A positive clinical diagnosis of GT from the clinical examination was defined as positive to palpation over the greater trochanter, plus at least one other positive result from the remaining clinical tests described above. From the MRI reports, the presence of GMed or GMin tendinopathy was defined as an intratendinous increase in signal intensity on T2-weighted images, based on the classification system from a previous study.
Statistical analysis was performed using IBM SPSS Statistics, V.22 (SPSS, Chicago, Illinois, USA). To determine the diagnostic utility of the clinical tests (defined as the accuracy with which GT can be detected on clinical examination), we compared the physical examination results with the MRI diagnosis. A series of 2×2 contingency tables was generated, using the MRI results (positive or negative for GT) as the reference standard. Sensitivity (the proportion of true positives that are correctly identified by a diagnostic test, ie, how good the test is at detecting the condition), specificity (the proportion of true negatives correctly identified by a diagnostic test, ie, how good the test is at ruling out a condition that is not present), PPV (probability that the disease is present when the test is positive), NPV (probability that the disease is not present when the test is negative) and their 95% CIs were calculated for each of the clinical tests as well as for the positive clinical diagnosis (ie, palpation plus at least one other positive test). Accuracy (the proportion of true results, either positive or negative) was also calculated. While these are indicators of diagnostic utility, they do not provide an indication of the shift in probability of the condition being present should the test be positive or negative. Likelihood ratios (LRs) combine sensitivity and specificity to provide such an indicator (point estimate of effect). LRs can be used to calculate post-test probabilities of a condition being present (ie, the probability that the condition is present after the test is applied), based on pretest probabilities (the probability that the condition is present based on information the clinician has at hand before applying the test). A clinical example will be used to illustrate the clinical application of findings from this study.
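The indices defined above can be sketched in a few lines of Python. This is an illustrative sketch only, not the SPSS procedure used in the study: the class name `TwoByTwo` and its method are hypothetical, and the counts are those reported in the results for the clinical diagnosis (30 true positives, 4 false positives, 20 false negatives, 11 true negatives).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of a clinical test against the MRI reference standard (hypothetical helper)."""
    tp: int  # test positive, GT on MRI
    fp: int  # test positive, no GT on MRI
    fn: int  # test negative, GT on MRI
    tn: int  # test negative, no GT on MRI

    def indices(self) -> dict[str, float]:
        sensitivity = self.tp / (self.tp + self.fn)
        specificity = self.tn / (self.tn + self.fp)
        total = self.tp + self.fp + self.fn + self.tn
        return {
            "sensitivity": sensitivity,
            "specificity": specificity,
            "ppv": self.tp / (self.tp + self.fp),
            "npv": self.tn / (self.tn + self.fn),
            "accuracy": (self.tp + self.tn) / total,
            # LR+ is undefined when specificity is exactly 1 (no false positives).
            "lr_positive": sensitivity / (1 - specificity),
            "lr_negative": (1 - sensitivity) / specificity,
        }


# Clinical diagnosis (palpation plus at least one other positive test) vs MRI.
clinical_diagnosis = TwoByTwo(tp=30, fp=4, fn=20, tn=11)
results = clinical_diagnosis.indices()
for name, value in results.items():
    print(f"{name:>12}: {value:.2f}")
```

With these counts the sketch returns the sensitivity (60%), specificity (73%), PPV (88%) and accuracy (63%) quoted in the results for the clinical diagnosis.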
LR values range from 0 to infinity, with a value of 1 indicating no change in the probability of the condition being present. Almost conclusive shifts in post-test probability of the condition being present or not are likely to occur for positive LR (LR+) ≥10 or negative LR (LR−) ≤0.1, respectively. An LR was considered statistically significant (p<0.05) when its 95% CI did not contain 1.
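The significance rule for an LR (95% CI excluding 1) can be illustrated with a short sketch. The source does not state how the CIs were computed, so the commonly used normal approximation on the log of the LR is assumed here; the function name is hypothetical, and the counts are those reported for the clinical diagnosis in the results.

```python
import math


def lr_positive_with_ci(tp: int, fp: int, fn: int, tn: int, z: float = 1.96):
    """Return (LR+, lower, upper) using a normal approximation on ln(LR+).

    Assumed method (not confirmed by the source); requires tp > 0 and fp > 0.
    """
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    lr = sensitivity / false_positive_rate
    # Standard error of ln(LR+) for two independent binomial proportions.
    se = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    lower = math.exp(math.log(lr) - z * se)
    upper = math.exp(math.log(lr) + z * se)
    return lr, lower, upper


# Clinical diagnosis vs MRI: 30 TP, 4 FP, 20 FN, 11 TN.
lr, lower, upper = lr_positive_with_ci(tp=30, fp=4, fn=20, tn=11)
significant = not (lower <= 1 <= upper)
print(f"LR+ = {lr:.2f}, 95% CI {lower:.2f} to {upper:.2f}, significant: {significant}")
```

Under this assumed method the interval for the clinical diagnosis spans 1, which is in line with the later remark that the LR+ CIs in this study contain 1.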
Sixty-five participants (45 (69%) females) with a mean age of 54.9±9 years were recruited for this study (Table 1). Figure 3 outlines the flow of participants throughout the study. All participants completed both clinical and MRI assessments without adverse effect.
Figure 3. Participant flow chart. CSI, corticosteroid injection; DNA, did not attend; FP, false positive; FN, false negative; LBP, low back pain; LHP, lateral hip pain; TN, true negative; TP, true positive; UTA, unable to attend; UTC, unable to contact.
Thirty-four participants (52%) had a positive clinical diagnosis (ie, positive palpation plus ≥1 other clinical test), whereas 50 (77%) of the MRI reports identified the presence of GT. Of the clinical diagnoses, there were 30 (46.2%) true positives, 11 (16.9%) true negatives, 4 (6.2%) false positives and 20 (30.8%) false negatives (Table 2). The clinical examination had moderate sensitivity and specificity (60% and 73%, respectively), a high PPV (88%) and a reasonable accuracy rate of 63%.
Palpation over the greater trochanter was the primary clinical diagnostic criterion and it had the highest sensitivity (80%) and accuracy rate (72%), but the lowest specificity (47%). Of the 65 participants, there were 40 true-positive palpation tests (positive on palpation and GT present on MRI) and only 8 false-positive diagnoses (ie, 8 tested positive on palpation, yet GT was not present on MRI). Palpation had the lowest negative LR (0.43), indicating that if palpation was negative on clinical testing, it was the single best of all the individual tests used in this study for ruling out the presence of GT.
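The palpation figures can be checked by completing the 2×2 table from its margins. The sketch below is illustrative: the true-positive and false-positive counts and the number of MRI-positive participants (50 of 65) come from the text, while the false-negative and true-negative counts are derived from those margins rather than quoted directly.

```python
n_participants = 65
mri_positive = 50                           # GT present on MRI
mri_negative = n_participants - mri_positive

tp, fp = 40, 8                              # palpation positive, with and without GT on MRI
fn = mri_positive - tp                      # derived: palpation negative, GT on MRI
tn = mri_negative - fp                      # derived: palpation negative, no GT on MRI

sensitivity = tp / mri_positive
specificity = tn / mri_negative
ppv = tp / (tp + fp)
accuracy = (tp + tn) / n_participants
lr_negative = (1 - sensitivity) / specificity

print(f"FN = {fn}, TN = {tn}")
print(f"sensitivity {sensitivity:.0%}, specificity {specificity:.2%}, PPV {ppv:.0%}")
print(f"accuracy {accuracy:.0%}, LR- {lr_negative:.2f}")
```

The derived table gives the 80% sensitivity, 46.67% specificity, 72% accuracy and LR− of 0.43 quoted for palpation.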
In contrast to palpation, each of the other individual clinical examination tests showed high specificity (SLS 100% to FABER 80%). The SLS test had a specificity of 100%, sensitivity of 38% and PPV of 100%, indicating that if the test was positive there was an extremely high likelihood that GT was present on MRI. That is, all participants who tested positive on SLS had GT on MRI, but there were 31 participants who were negative on the test who had GT identified on MRI.
The FADER and ADD tests both had high specificity (86.67%) and high PPV (88.24% and 83.33%, respectively). Of particular interest is that when resisted muscle contraction is added to each of these tests (ie, FADER-R and ADD-R), the sensitivity, specificity, PPV and NPV of both of these tests improved. The positive LR of the FADER test (2.25) increased to 6.6 for the FADER-R test, and the positive LR of the ADD test (1.5) increased to 5.7 for the ADD-R test. This tends to suggest that if manual resistance is added to these tests, which effectively adds an active tension component (muscle tension) to the passive tension and compressive component of each test (positioning of the tendon), then diagnostic accuracy is increased substantially.
All clinical tests except palpation had a high number of false-negative results (28 (43%) to 40 (62%)), far outnumbering each of the other outcomes (true positives, false positives and true negatives each ranged from 0% to 34%). That is, MRI revealed the presence of GT, yet the clinical test did not detect it.
This study provides guidance for the clinician regarding which tests might be most useful in diagnosing GT as observed on MRI in a population presenting with lateral hip pain. The clinical tests chosen for this study were ones that have been considered to be positive for GT, so it is not surprising that the PPVs were high and the NPVs low (Table 2). The data indicate that combining palpation, with its high sensitivity and accuracy, with any of the other individual clinical examination tests, which have higher specificity and PPVs, might be best to identify those patients who have GT on MRI. Of all the physical tests, SLS when positive (ie, pain within 30 s of standing on one limb) is the one that significantly increases the probability that there is GT on MRI. As is the case with many other conditions, MRI detects substantially more abnormalities (in this case tendinopathy) than is identifiable on clinical examination. To make a diagnosis of symptomatic GT, clinicians should rely on MRI, as well as include palpation of the greater trochanter and at least one positive physical test of the clinical examination studied here.
Our data imply that if palpation is positive, there is a high likelihood of MRI-identified GT (ie, PPV of 83%), and if it is negative the patient is unlikely to have a positive MRI because negative results on tests with high sensitivity (ie, 80% sensitivity for palpation) are useful in ruling out pathology. This was supported by an LR− of 0.43, which signifies that the odds of GT being present are more than halved when palpation is pain-free. While palpation had the highest combined true positives and negatives (72%), it had the highest number of positive results when there was no GT on MRI and the lowest specificity (46.67%). Even in Lequesne et al’s pain-free control group, specificity of palpation was only 66%, reflecting a significant number of asymptomatic yet tender greater trochanters. This reinforces that pairing palpation with one of the other tests with greater specificity will have greater utility in identifying those with GT on MRI compared with palpation alone.
Of all the physical tests studied, those that involve an active muscle contraction component, such as FADER-R, ADD-R and SLS, appear more useful than FADER, FABER and ADD for identifying GT on MRI. The FADER-R and ADD-R tests use active muscle contraction to intensify the tensile loading component from positions in which the gluteal tendons have been exposed to maximal compression and passive tension. Our results agree with those of Lequesne et al that when positive, both SLS and resisted internal rotation in a position of relative tendon compression (resisted external de-rotation test and FADER-R) are useful tests in the diagnosis of GT. Specificity of these tests was high for both studies, but the true negatives returned by the current study were gathered from a population with lateral hip pain rather than the completely pain-free controls of the previous study. This new finding provides greater confidence in the specificity of such tests. Regarding sensitivity, our findings are in contrast with those of Lequesne et al, who reported high sensitivity for SLS (100%) and the resisted external de-rotation test (88%). This might have been due to the use of a pain-free control group as the comparison and to the GT group being included on the basis of positive palpation of the greater trochanter and hip external rotation at 90° of hip flexion.
Trendelenburg-like SLS assessments that test the ability of the hip abductor muscles to control pelvic posture rather than testing pain provocation have been shown to be specific but not sensitive to MRI-detected GT and reasonably sensitive and specific for detection of gluteal tendon tears (72.7%, 76.9%, respectively). We found SLS to have high specificity but low sensitivity, possibly indicating that observation of postural control of the pelvis during SLS is not required in detecting GT on MRI. In contrast, identification of gluteal tendon tears on MRI might require both pain reproduction and observation of postural control on SLS.
Stronger conclusions may have been possible with higher participant numbers, even though with 65 participants this is the largest study of its kind to date. Increasing sample size in future studies might narrow the CIs around the indices of diagnostic utility, thereby providing greater confidence in conclusions from this study. Another limitation of the study design is that inherent in using MRI as a reference standard, as MRI findings tend to be poorly correlated with symptomatic presentation. Blankenbaker et al reported that all patients in their cohort with trochanteric pain and 88% of patients without trochanteric pain had peritrochanteric abnormalities on MRI. Woodley et al also discussed the limitations of MRI as a diagnostic tool for GT. The fact that the presence of GT on MRI does not always correlate with the presence of significant pain and dysfunction is also supported somewhat by the large proportion of false-negative findings in this study, both for the clinical examination and for each individual test. With respect to clinical implications, the low sensitivity and accuracy of many of our clinical tests reflect that MRI detects the presence of pathology within the gluteal tendons but does not determine well whether this pathology is a source of pain/symptoms. Forming a diagnosis of GT would optimally require considering both clinical examination and imaging findings.
To improve the relevance of our data to clinical practice, we present an example with reference to figure 4. Envisage a middle-aged overweight female who presents with lateral hip pain. The possible source of this lateral hip pain could be GT, but it could also be hip joint, pelvic or lumbar spine pathology. Let us assume that the probability of this lateral hip pain being associated with GT is 50%. If the clinician applied the tests studied here, some would increase the probability of GT being present when positive, while others would lower the probability of GT when negative. For example, figure 4 shows that FADER-R, when positive, increases the probability of the patient having MRI-identified GT by ~37 percentage points (from 50% to 87%), indicating that this test is helpful in confirming a diagnosis of GT. In contrast, when the test is negative, it does not rule out GT, as the shift in probability is much smaller at 12 percentage points. So a test might well be good only for helping to confirm a provisional diagnosis and not to refute it. This probably underpins the tendency of clinicians to use several tests, both those that confirm and those that refute a diagnosis, in coming to a more definitive working diagnosis. In this regard, if FADER-R was not positive, then palpation might be helpful for the clinician to rule out the presence of GT, because when palpation is negative it reduces the probability of GT by about 20 percentage points.
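The arithmetic behind this example can be sketched by converting the pretest probability to odds, multiplying by the LR and converting back. The function name is hypothetical; the LR values (LR+ of 6.6 for FADER-R and LR− of 0.43 for palpation) and the 50% pretest probability are those quoted in the text.

```python
def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


pretest = 0.50
fader_r_positive = post_test_probability(pretest, 6.6)     # positive FADER-R (LR+)
palpation_negative = post_test_probability(pretest, 0.43)  # negative palpation (LR-)

print(f"positive FADER-R:   {pretest:.0%} -> {fader_r_positive:.0%}")
print(f"negative palpation: {pretest:.0%} -> {palpation_negative:.0%}")
```

A positive FADER-R moves the probability up by roughly 37 percentage points, whereas a negative palpation moves it down by roughly 20 percentage points, matching the shifts described in the example.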
Figure 4. Interpreting LRs of some of the studied tests and their implications: an example with 50% pretest probability. *Clinical diagnosis is palpation plus at least one of the other clinical tests. #FADER-R: although the lower bound of the CI just includes 1 and the result is thus not statistically significant, it is used here as an example because the 95% cut-off is an arbitrary convention. ^LR+ is an estimate, which was derived by adding 0.5 to each cell in the contingency table, so as to provide a denominator other than 0 and enable an LR+ to be estimated for the purpose of this example.[21] ~Post-test probability is derived from post-test odds, where post-test odds = pretest odds × LR. FADER-R, Flexion, Adduction, External Rotation with resisted isometric hip internal rotation; GT, gluteal tendinopathy; LR, likelihood ratio; LR+, positive likelihood ratio; SLS, Single Leg Stance.
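The 0.5 correction mentioned in the figure legend can be illustrated for SLS, where a specificity of 100% leaves no false positives and the uncorrected LR+ is undefined. The sketch below is hedged: the function name is hypothetical, and the SLS cell counts (19 true positives, 0 false positives, 31 false negatives, 15 true negatives) are derived from the reported 38% sensitivity, 100% specificity, 31 false negatives and 50 MRI-positive participants out of 65, not quoted directly from a table.

```python
def lr_positive(tp: float, fp: float, fn: float, tn: float, correction: float = 0.0) -> float:
    """LR+ from a 2x2 table, optionally adding a constant to every cell."""
    tp, fp, fn, tn = (cell + correction for cell in (tp, fp, fn, tn))
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    return sensitivity / false_positive_rate


sls = dict(tp=19, fp=0, fn=31, tn=15)  # derived counts (see lead-in)

try:
    uncorrected = lr_positive(**sls)
except ZeroDivisionError:
    uncorrected = None                 # no false positives: LR+ cannot be computed

corrected = lr_positive(**sls, correction=0.5)
print(f"uncorrected LR+: {uncorrected}")
print(f"LR+ with 0.5 added to each cell: {corrected:.1f}")
```

With 0.5 added to each cell the estimate is close to the LR+ of ~12 quoted for SLS in the abstract.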
An important consideration for the clinician in applying the LR findings from this study in practice is the notion that as further information becomes available during the clinical examination, the probability of a diagnosis being present or not varies. In the preceding example, we assumed that the (pretest) probability that the clinician assumes/determines on current knowledge of the patient before applying the test was 50%. A salient matter to understand is that the amount by which the probability changes when applying an LR (either positive or negative) is dependent on the assumed/predicted pretest probability. For example, if the pretest probability of MRI-identified tendinopathy was 30% rather than 50%, then a positive FADER-R would shift the probability by 44 percentage points (30% to 74%, compared with 50% to 87%). If the pretest probability was higher at 75%, then the shift in probability would be smaller (20 percentage points), but by that stage of the clinical examination there would be a 95% probability that MRI-identified tendinopathy was present. This continuous nature of LRs and their effect on shifting the probability of a condition being present or not can be graphically illustrated on a nomogram (figure 5), which could also be used in clinic.
Figure 5. Example of the use of a nomogram to predict post-test probability from pretest probability and likelihood ratio;[24] for the FADER-R test, where there is a positive likelihood ratio of 6.6, the post-test probability would be 74% (line), 87% (dashed line) and 95% (dot-dashed line) for pretest probability of 30%, 50% and 75%, respectively. FADER-R, Flexion, Adduction, External Rotation with resisted isometric hip internal rotation.
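A nomogram of this kind works because, on a log-odds scale, applying an LR is a simple addition: logit(post-test) = logit(pretest) + ln(LR), so a straight line connects the three axes. The following sketch, with hypothetical function names, uses that form to reproduce the post-test probabilities quoted in the text for a positive FADER-R (LR+ of 6.6) at pretest probabilities of 30%, 50% and 75%.

```python
import math


def logit(probability: float) -> float:
    """Log-odds of a probability."""
    return math.log(probability / (1 - probability))


def inverse_logit(log_odds: float) -> float:
    """Probability corresponding to a log-odds value."""
    return 1 / (1 + math.exp(-log_odds))


def post_test_probability(pretest: float, likelihood_ratio: float) -> float:
    # Adding ln(LR) to the pretest log-odds is what the nomogram line does.
    return inverse_logit(logit(pretest) + math.log(likelihood_ratio))


LR_POSITIVE_FADER_R = 6.6
pretests = (0.30, 0.50, 0.75)
posts = [post_test_probability(p, LR_POSITIVE_FADER_R) for p in pretests]

for pretest, post in zip(pretests, posts):
    print(f"pretest {pretest:.0%} -> post-test {post:.0%} (shift {post - pretest:+.0%})")
```

The same LR therefore produces the largest shift at intermediate pretest probabilities and progressively smaller shifts as the pretest probability approaches 0% or 100%.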
Somewhat surprisingly, we found that the combination of palpation and any one of the other tests, defined as the clinical diagnosis test in this study, at best only shifts the likelihood of the condition being present or not by about 15–20%. For the purposes of this study, we a priori defined the clinical diagnosis, which might not be optimal in terms of clinical relevance. Another a priori decision that needs to be considered when interpreting the data from this study is the fixed order of testing, which might have influenced pain provocation. Future studies should be designed and powered to develop clinical prediction rules that elucidate the best combination of a range of clinical tests and their order of testing in a clinical examination.
In conclusion, for the clinician assessing a patient with lateral hip pain, the combination of the most specific tests involving compression and active tension (SLS, FADER-R, ADD-R) and the most sensitive test, palpation, is likely to have greatest diagnostic utility. If a patient is not tender on palpation over the greater trochanter, it is highly advisable to consider other pathologies in the differential diagnosis of a patient with lateral hip pain. Furthermore, the use of a number of tests is advisable because a critical point stemming from this study is that the indices of diagnostic utility are at best modest and lack the conventionally accepted precision (ie, LR+ CIs contain 1). In the event that there is a positive finding of GT on MRI, a positive palpation with confirmation on at least one of the active clinical tests described should be considered necessary prior to the clinician diagnosing GT as the source of the patient’s lateral hip pain.