The NICE Low Back Pain Guidelines: A Big Misunderstanding

Mel Koppelman

Mel Hopper Koppelman is the Executive Vice President of the Acupuncture Now Foundation. She is an acupuncturist and Functional Medicine practitioner based in Leicester, UK. She received her BSc from NYU in 2005, her MSc from the Northern College of Acupuncture, York in 2012 and a second MSc from the University of Western States in Nutrition and Functional Medicine in 2015. She is a reviewer for a number of peer-reviewed medical journals.

This is a syndicated post, which originally appeared at A Better Way To Health.

On 24 March, the National Institute for Health and Care Excellence (NICE), a UK-based organisation tasked with providing evidence-based clinical guidelines for the National Health Service, released a draft of its updated guidelines on the management and treatment of low back pain with and without sciatica. The biggest news was that acupuncture, recommended in the previous version based on overwhelming evidence, was no longer being recommended, apparently due to poor performance compared to minimal/sham acupuncture, according to the guideline developers.

This news has been unwelcome to many, with thousands of acupuncturists, patients and GPs expressing their displeasure with the new recommendations. But having had the opportunity to familiarise myself with the recommendations laid out in a series of meaty PDFs and the research on which they are presumably based, I think it’s fair to say that this is probably all just a huge misunderstanding. Below I lay out some of the points on which the guideline development group (GDG) seem to be confused and hopefully shed some light that will lead them to a more reasonable set of conclusions.

Point of Confusion 1: They Didn’t Understand the Research Question

The GDG were asked to find out if acupuncture was clinically effective for the treatment of low back pain with or without sciatica. After assessing the published research, they found that acupuncture demonstrated “clinically important benefits in terms of improvements in quality of life . . . Benefit was also observed in pain and function ≤4 months, identified from a large body of evidence.” P494

Simply put, they found that the answer to the question “is acupuncture effective for treating low back pain?” based on a large body of research was a resounding “yes.”

And yet, they seem to have gotten confused and recommended against it. Why on earth did they do that?

They write: “The GDG first discussed the necessity of a body of evidence to show specific intervention effects, that is, over and above any contextual or placebo effects. It was therefore agreed that if placebo-controlled evidence (or sham acupuncture) is available, this should inform decision making in preference to contextual effects, but that the effect sizes compared with usual care would be important to consider if effectiveness relative to placebo, or sham, has been demonstrated.” 1

But was that what they were supposed to do?

Effectiveness vs Efficacy

It is really easy to get confused about the difference between effectiveness and efficacy. For a start, both words begin with the letter “e” and both have four syllables. They also both have to do with measuring whether or not a treatment “works.” But understanding the difference isn’t just important for impressing your mates during a pub quiz.

Essentially, when we ask “which treatment is more effective?” we’re asking the question that we ask when we have a real life clinical issue like low back pain. Of all the treatments on offer, which option is most likely to help me get better with the least risk? When we ask if a treatment outperforms a sham treatment, a choice we’re never faced with in real life, we’re asking the question: “is this treatment efficacious?” It’s a hypothetical, academic question. In a highly controlled set up that has limited application to the real world (known in research lingo as “poor external generalisability”), which performs better?

For the development of clinical guidelines that will inform treatment options for real people in the real world in real pain, any five-year-old, gerbil or well-looked-after ficus plant could tell you that the relevant question is that of effectiveness, not efficacy. But fortunately for the GDG, there is an even more authoritative view. Because NICE itself has actually explained what the GDG is supposed to be looking at when making clinical recommendations: “NICE prefers data from head to head RCTs to compare the effectiveness of interventions. . . . Effectiveness [is] the extent to which an intervention produces an overall benefit under usual or everyday conditions.”

Well that seems clear enough. But is there still a chance, however slight, that it’s me, and not the GDG, that has the wrong end of the effectiveness vs efficacy stick?

It doesn’t look that way. In November 2014, BAcC CEO Nick Pahl contacted NICE for clarification on this very issue. In a letter dated 17 July 2015, he received a reply from Professor Mark Baker, NICE’s Director of the Centre for Clinical Practice, who is responsible for designing the methods for producing NICE’s clinical guidelines. Regarding these very guidelines, Professor Baker writes:

“I can confirm that our evidence reviews for this topic are looking for evidence that:
•    is not limited to RCTs
•    Is not limited to placebo comparisons
•    Focuses on effectiveness rather than efficacy” – Professor Mark Baker, NICE CCP

Ok, so that’s pret-ty darn clear. According to NICE’s clinical guideline development director, the evidence review for the guideline on low back pain is meant to focus on “effectiveness rather than efficacy.” The GDG most definitely got the wrong end of this particular stick and has no legitimate grounds for recommending against an effective (safe, and cost-effective) treatment on the basis of presumed lack of efficacy.

I’m really glad that we’ve cleared that up.

Point of Confusion 2: Sham Acupuncture is not a placebo control and the difference between acupuncture and sham is not a measure of non-specific effects

So, confused about what they were actually supposed to be doing, first, the ‘GDG decided to ascertain if the intervention has treatment-specific effects over and above the contextual or placebo effects, and the best comparator to prove this would be a placebo or sham.’ 2 This line of reasoning sort of makes sense if you don’t think about it too hard. In research, we use sham surgery to see if surgery works and we can use sham exercise and sham therapy to evaluate the real deal. So, sham acupuncture must be a good placebo control to test efficacy, right?

There’s a small problem with this line of reasoning: it’s not evidence (or reality) based. A sham or placebo control only works if it’s biologically inert and not providing any of the same physiological effects as the treatment itself. The nature and suitability of sham acupuncture as a control for non-specific effects enjoys a robust literature and it is ubiquitously homogenous in its conclusions: sham acupuncture is not biologically inert.

In addition to successfully controlling for placebo and non-specific effects, it also controls for part of the needling effect, which is to say, part of the acupuncture treatment itself. So the difference between acupuncture and sham acupuncture is not a matter of treatment effects over and above placebo. It is a matter of the difference between a more robust, thorough, internally consistent approach to acupuncture and a less specific, less robust acupuncture treatment. It’s like comparing full-strength Aspirin to Baby Aspirin but pretending the Baby Aspirin is a sugar pill when you interpret the results, and that’s just plain silly.

In order for a control to be suitable in measuring efficacy, it has to not do any of the specific, physical things that the active treatment does. Sham acupuncture spectacularly fails at achieving this.

Not convinced? Let’s take a look at the basic science

The insertion of an acupuncture needle provokes a number of physiological changes. It activates fine nerves (groups I-IV afferents, depending on location and technique). So does ‘sham’ acupuncture, just less so. 3

The insertion of an acupuncture needle triggers tissue fibroblasts to release adenosine, an anti-inflammatory and neurotransmitter with both local and systemic effects. Adenosine is responsible for some of acupuncture’s local painkilling effects. Sham acupuncture does this too, but less so. 4

A 2009 study looked at the brains of fibromyalgia patients before and after treatment with acupuncture or sham acupuncture. It has long been understood that placebo treatments can activate our ability to produce endogenous opioids, our body’s own natural painkillers. But the study found that only true acupuncture, not sham acupuncture, caused an increase in the binding of opioid receptors in the short- and long-term. 5 In other words, both treatments increased natural painkillers but only true acupuncture improved the body’s ability to use them.

As a final example, a recent study on mice looked at changes in neuroprotective gene expression following acupuncture at a point on the arm called San Jiao 5. Compared to both handling control and control with a sham point, the study found a significant increase in gene expression for two neuroprotective proteins in the cerebellum, a part of the brain involved in coordinating muscle movement. But one of these proteins also increased at the sham point, just not as much as at the acupuncture point. Sham acupuncture had a direct physiological neuro-protective effect in the brain; true acupuncture also did but to a much greater extent. 6

Interpreting sham-controlled acupuncture trials as if they were only controlling for placebo and non-specific effects is fundamentally flawed and I’m personally unaware of any published evidence that suggests the contrary. If the GDG is aware of any, I’d be happy to read it. Sham/minimal acupuncture is a physiologically active intervention. Thus, NICE did not actually find evidence that acupuncture works solely through non-specific effects as they indicate in their draft.

But I can see why they were confused.

What does this mean for “clinical significance”?

“Ok, Mel, I hear ya, sham acupuncture is definitely not a physiologically inert placebo control. That I’m totally clear on. But didn’t the NICE committee find that acupuncture and sham acupuncture performed equally well? And doesn’t that mean that both acupuncture and sham acupuncture work better than usual care, but it doesn’t matter where you stick the needles?” I hear you ask . . .

Excellent questions. But before we answer them, we need to understand how the GDG defined ‘efficacy.’

In trials comparing drugs to sugar pills, we don’t just want a drug to do better than placebo (which would be a statistically significant difference). We want it to do better enough that we’d actually write home about it – this is known as achieving ‘clinical significance.’ A treatment achieves a clinically significant result when it outperforms placebo by at least a certain amount, known as the ‘minimal important difference’ or MID. In the case of the NICE guidelines, the minimal important difference was agreed upon by a show of hands.
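For anyone who prefers to see the two hurdles side by side, here is a minimal sketch of the distinction. It assumes the outcome is reported as a mean difference with a 95% confidence interval, where positive numbers favour the treatment. The `Comparison` class, the `classify` function, the MID of 1.0 and all three example rows are hypothetical illustrations, not figures from the NICE draft.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    label: str
    mean_difference: float  # treatment minus control; positive favours treatment
    ci_low: float           # lower bound of the 95% confidence interval
    ci_high: float          # upper bound of the 95% confidence interval


def classify(c: Comparison, mid: float) -> str:
    """Apply the two separate tests described in the text.

    Statistical significance: the confidence interval excludes zero.
    Clinical significance: the point estimate reaches the MID.
    """
    statistically_significant = c.ci_low > 0 or c.ci_high < 0
    reaches_mid = abs(c.mean_difference) >= mid

    if statistically_significant and reaches_mid:
        return "statistically significant, reaches MID"
    if statistically_significant:
        return "statistically significant, below MID"
    return "not statistically significant"


# Hypothetical MID and hypothetical results, purely for illustration.
HYPOTHETICAL_MID = 1.0
examples = [
    Comparison("example A", 1.4, 0.9, 1.9),
    Comparison("example B", 0.6, 0.2, 1.0),
    Comparison("example C", 0.3, -0.2, 0.8),
]

for c in examples:
    print(f"{c.label}: {classify(c, HYPOTHETICAL_MID)}")
```

Example B is the interesting case: the treatment reliably beats the comparator, yet it is labelled as lacking a "clinically important" effect because the gap is smaller than the agreed threshold.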

Ok, so MIDs are important when comparing an active treatment to an inert control like a sugar pill, but what about when a treatment is compared to an active comparator like sham acupuncture? In this case, clinically important differences have no clinical or logical meaning. If what we want to know is whether or not acupuncture works better than pseudo-sham acupuncture, we want to know if the results are statistically significant. In other words, we want to know whether people getting acupuncture have better results than those getting sham acupuncture and whether or not these results are robust or are likely due to chance.

Acupuncture was compared to sham acupuncture for 32 outcomes. The GDG found that acupuncture outperformed sham with statistically significant results in 11 of these, and that sham outperformed acupuncture in 1 outcome looking at psychological distress. However, the one negative result was due to another source of confusion: a typo. In reality, acupuncture outperformed sham for 12 outcomes; sham outperformed acupuncture in none. 7

For the 21 short-term comparisons, acupuncture significantly outperformed minimal acupuncture in 8. Again, there was no evidence of equivalence between acupuncture and sham as sham never outperformed acupuncture. Acupuncture achieved NICE’s arbitrary and somewhat nonsensical definition of ‘clinical significance’ for ‘all but one of the individual domain scores of SF-36 quality of life . . . Composite physical score . . . Depression as measured by HADS.’

“The GDG noted that although comparison of acupuncture with usual care demonstrated improvements in pain, function and quality of life in the short term, comparison with sham acupuncture showed no consistent clinically important effect, leading to the conclusion that the effects of acupuncture were probably the result of non-specific contextual effects.” 8 But since sham is a biologically active control, was that a reasonable, logical conclusion based on the data in front of them?

They note that acupuncture did not achieve a so-called clinically important effect over sham for pain reduction, but this is the result of yet more confusion! As Dr Mike Cummings so astutely points out, some of the numbers in their analysis do not seem to correspond to the numbers in the actual studies. When we use the real numbers, we do find that acupuncture’s superiority over minimal acupuncture is not just statistically significant but also meets NICE’s definition of clinical significance. A lot of their justification for not recommending acupuncture rested on the confusion arising from this error. So I’m really glad that we’ve cleared that up.

Point of Confusion 3: Errors of Omission

There were a couple of relevant results for acupuncture that the guidelines did not include without giving a reason for doing so. Perhaps they missed them? Forgot them? Didn’t like the font? It’s hard to say.

The first one has to do with something called ‘response rate.’ Most of the results look at symptom improvement, such as a reduction of pain, on a continuous scale (such as being asked to rate pain on a scale of 1 to 10). But a response rate measures the percentage of people in each group that have a certain level of improvement. In this case, NICE has defined that as 30% improvement or greater in either pain or function.

For some reason, the data from Haake 2007 was left out. This study reported the number of patients in each arm who successfully achieved a 33% improvement or better in pain reduction. Each arm had 387 patients. Verum acupuncture achieved success in 229 versus sham acupuncture’s 197. So if you were in the real acupuncture group you were significantly more likely to have successful pain reduction than if you had sham.
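If you'd like to check that arithmetic yourself, here is a minimal sketch using only the counts quoted above (229 and 197 responders out of 387 per arm). It runs a standard two-proportion z-test with a pooled standard error. That choice of test is my assumption for illustration; it is not necessarily the analysis the study authors used, and the function name is made up for this example.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Compare two response rates with a pooled two-proportion z-test.

    Returns both rates, the z statistic and the two-sided p-value.
    """
    rate_a = success_a / n_a
    rate_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# Counts as quoted in the paragraph above.
verum_rate, sham_rate, z, p_value = two_proportion_z_test(229, 387, 197, 387)

print(f"verum response rate: {verum_rate:.1%}")
print(f"sham response rate:  {sham_rate:.1%}")
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

On these counts the response rates come out at roughly 59% versus 51%, with a z statistic of about 2.3, which sits below the conventional 0.05 threshold for the p-value.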

In looking at whether acupuncture treatment resulted in a decrease in healthcare utilisation, they also missed out some relevant findings from Brinkhaus 2006. This study found that compared to the sham acupuncture group, those in the verum acupuncture group took less than half as many pain meds. This detail invites circumspection in interpreting the lack of significant difference between the two groups in terms of pain reduction. Perhaps the difference in pain reduction wasn’t statistically significant, but unlike the real acupuncture group, the sham acupuncture group was still popping painkillers like Pez!! 9

Point of Confusion 4: Acupuncture treatment leads to reduced self-efficacy, activity, exercise and an overall decline in healthy behaviour

“The GDG discussed whether the passive nature of acupuncture treatment might promote dependence on the procedure or possibly discourage self-management or participation in activity and exercise.” 10

Oh no, not dependence on acupuncture! Oh the humanity! Let’s offer them opioids instead . . .

I can see why you might want to protect patients from the slippery slope of becoming dependent on acupuncture if there’s a risk that acupuncture treatment is associated with a decline in self-management, activity and exercise. In fact, before my first acupuncture experience, I was worried that the practitioner would discourage me from looking after myself or participating in healthy activities so that I would get hooked on her effective, but non-specific treatments.

But fortunately a UK-based 2015 study showed quite the opposite! Compared to usual care, those randomised to acupuncture for chronic neck pain had a significant improvement in self-efficacy at both 6 months and 12 months. This sort of makes sense since traditional acupuncture treatment includes lifestyle advice, including diet, exercise and rest, as part of its package of care. Incidentally, the study also found that, compared to usual care, acupuncture treatment resulted in a significant and long-lasting reduction in pain. 11

Point of Confusion 5: Acupuncture must prove its efficacy, other recommended treatments not so much

It’s clear that, due to a number of misunderstandings, the GDG thought that by not recommending acupuncture, they were protecting patients from pain reduction and improved quality of life from non-specific effects, a fate they deemed worse than not getting better at all.

But what about the other treatments that NICE recommended, like exercise, psychological interventions, or a bunch of different treatments wrapped into a pretty little package? If acupuncture didn’t get the nod because of the vanishingly remote and scientifically disproved view that its effectiveness comes from a heady cocktail of empathy and placebo, what of the treatments that NICE did recommend?

Exercise therapy – Let’s use the real numbers, shall we?

The current draft recommends exercise therapy for low-back pain, so let’s see what an efficacious treatment really looks like, according to NICE.

There were two placebo/sham controlled comparisons for exercise: pain reduction in the short-term and pain reduction in the long-term. In the short-term, exercise had a clinically meaningful benefit over sham exercise but not in the long-term. I’m not sure this classifies as the ‘consistent clinically important effect’ compared to sham that they hold acupuncture to but here we find another problem. And yes, it is another data error.

The data used to assess exercise vs sham for pain reduction in the short-term seem to be a figment of some NICE committee member’s imagination, as they bear too little resemblance to the actual numbers to constitute a simple typo. Using the real numbers from the study, we find that exercise therapy has no clinically important benefit over sham in either the short- or long-term.

So it looks like exercise therapy now fails the ‘acupuncture efficacy hurdle.’ Now that’s confusing, does that mean that the GDG needs to update its recommendation of exercise?

Psychological interventions – predefined critical outcomes? Or make it up as we go along?

The GDG recommends psychological therapies as part of a multi-modal treatment package. Surely, unlike exercise, this treatment had some of that all important consistent demonstration of efficacy?

For cognitive behavioural approaches, the GDG writes: “No clinical benefit was observed for people with low back pain with / without sciatica when cognitive behavioural approaches was compared to sham or usual care or waiting list controls for the majority of reported outcomes.” 12

<<Hold on a sec, I just snarfed my merlot>>

The GDG has indicated that a treatment cannot be recommended if the evidence does not strongly support a specific treatment effect. And now they’re saying here, not only did this approach not outperform sham, but it didn’t outperform usual care or waitlist? They’re recommending it even though it’s not effective or efficacious?

And wait a sec, I remember reading about waitlist control in the acupuncture section. “It was noted that 4 of the included studies had a ‘waiting list’ group as their usual care comparison. It was considered that this may over-estimate the effects of treatment as people may become disheartened in the comparison group whilst waiting to start active treatment . . It is noted this applies to all reviews with usual care comparators and has been taken into account equally across interventions reviewed in this guideline.” 13 Noted.

Ok, I know the GDG was confused, but how did they justify recommending a treatment that according to the evidence they looked at, didn’t work at all?

They start off by saying that “The GDG agreed that health related quality of life, pain severity, function and psychological distress were the outcomes that were critical for decision making” for psychological interventions.14 And then if we skip forward a few paragraphs, including the ones where they say that the treatment wasn’t effective for any of their outcomes we find the following: “The primary aim of a cognitive behavioural approach is not to directly improve pain and function, but reduce the fear of pain, thus increasing people’s confidence in undertaking physical rehabilitation and therefore the GDG considered it unsurprising that meaningful effects were not seen in these outcomes.” 15 They then went on to recommend it.

So to paraphrase, 1) the GDG had a predetermined set of agreed upon criteria for recommending a treatment. 2) It found that the treatment in question did not meet these criteria. 3) It went on to recommend it anyways. That is really confusing.

It’s all well and good to say that cognitive approaches work by reducing fear of pain, but if that’s the case, then shouldn’t that have been one of the critical outcomes for decision making? And sorry, did I mention that cognitive therapy didn’t even outperform waitlist control in reducing psychological distress?!!!!!

Surely you agree that if cognitive therapy is doing anything at all for low back pain sufferers, it should at least be able to reduce psychological distress better than giving people the ‘disheartening’ and psychologically distressing situation of waiting for treatment without actually providing one.

Point of Confusion 6: It’s ok to recommend a placebo, as long as it’s not recommended on its own

Brown paper Backages tied up with string

“But Mel,” you quickly point out. “Sure, there’s no rational way that they could recommend against acupuncture given how well it performed and yet recommend a treatment like cognitive therapy that couldn’t even outperform a psychologically distressing control. But they aren’t recommending it on its own, they’re recommending it as part of a package of care.”

The NICE Guidelines evaluated a heterogeneous collection of combination therapies called “Multidisciplinary biopsychosocial rehabilitation programmes,” known as MBR to its mates. The section on MBR starts out with a very heartfelt description of the plight of the population with whose care the committee has been entrusted.

“Non-specific low back pain, with or without sciatica is a complex, poorly understood, multi-factorial phenomenon which impacts on people’s ability to undertake normal activities of daily living, social function and affects their mood and confidence. People are often given broad descriptions for their symptoms, rather than a definitive diagnosis. This makes it difficult to define a clear treatment plan, causing further stress.”16

Ok, according to the GDG psychological interventions aren’t meant to work on their own, so how do they work when combined with a physical intervention? According to NICE’s evidence summary, a 2-element MBR programme consisting of physical and psychological elements showed no clinical difference for pain or psychological distress outcomes compared to wait-list control (yep, that’s the control that according to NICE actually over-estimates the effectiveness of the treatment).

Recommending treatments that don’t outperform sham as part of a package seems to be a bit of a theme of the guidelines. We see this again with recommendations around paracetamol and opioids. Each of these, studied on its own, was shown to be no more effective than a sugar pill for pain relief, which by the GDG’s own logic means that they work via non-specific effects. While the guidelines recommend against each of these on their own, they do recommend them in combination.

But wait a sec. While the performance of opioid + paracetamol vs placebo managed to eke out one clinically meaningful result for change in pain severity (but not for perceptible or meaningful pain relief, quality of life or function, all considered to be ‘critical outcomes for decision making’), since neither of these seems to help on its own, isn’t it most likely that the little they work as a combo is through placebo effects? I mean, there were no studies comparing opioid + paracetamol vs opioid + placebo or vice versa.

Indeed the only clinically impressive results for the painkiller duo were in adverse effects, with a relative risk of 3.48, so patients are more than three times as likely to be harmed or injured by the treatment than by placebo, and yet the odds of having meaningful pain relief over sugar pill are teeny. In research lingo, this treatment would be described as having a flippin’ awful benefit to risk ratio.
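To get a feel for what a relative risk of 3.48 means in absolute terms, here is a rough sketch. The relative risk is the figure quoted above; the placebo-arm (baseline) adverse event rates are hypothetical values I have picked purely for illustration, since the actual rates are not given here. The function name `harm_summary` is also just an illustrative choice.

```python
RELATIVE_RISK = 3.48  # adverse-effect relative risk quoted in the text


def harm_summary(baseline_risk, relative_risk=RELATIVE_RISK):
    """Translate a relative risk into absolute terms for a given baseline.

    Returns the risk in the treated group and the number needed to harm
    (NNH): how many patients are treated, on average, per extra person
    harmed compared with placebo.
    """
    treated_risk = baseline_risk * relative_risk
    if treated_risk > 1:
        raise ValueError("baseline risk too high for this relative risk")
    excess_risk = treated_risk - baseline_risk
    number_needed_to_harm = 1 / excess_risk
    return treated_risk, number_needed_to_harm


# Hypothetical placebo-arm adverse event rates, for illustration only.
for baseline in (0.05, 0.10, 0.20):
    treated, nnh = harm_summary(baseline)
    print(f"placebo {baseline:.0%} -> treatment {treated:.1%}, NNH about {nnh:.1f}")
```

The point of the exercise is that the same relative risk can mean very different things depending on how common side effects are to begin with, so the baseline rate in the trials matters when weighing harm against that teeny benefit.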

So I guess when the GDG said that recommending treatments that work via non-specific effects was to be avoided, even if they’re effective, the main caveat is that you can recommend two un-efficacious treatments together?

At this point, you might be asking yourself “what in the name of Britney Spears’ left tit is going on here? Who the heck are these people?”

Yes, who is the GDG?

Point of Confusion 7: Conflict of Interest Policy? What Conflict of Interest Policy? (It must have gone into Spam)

Declare and Participate

Due to accusations that NICE has become the UK-based marketing and distribution arm of the pharmaceutical industry, policies surrounding conflict of interest have become very stringent in order to ensure that recommendations are based on the best interest of patients, not the personal gain of guideline committee members.

NICE’s Conflict of Interest Policy explains:

“1. NICE is expected to achieve and maintain high standards of probity in the way it conducts its business. These standards include impartiality, objectivity and integrity, and the effective stewardship of public funds. Managing potential conflicts of interest is an important part of this process.

2. The effective management of conflicts of interests is an essential element in the development of the guidance and advice that NICE publishes. Without it, professionals and the public will lose confidence in our work.

3. This policy provides guidance on what interests need to be declared, who needs to declare them and when, and what action should be taken to avoid conflicts of interest influencing the conduct of NICE’s business. Everyone referred to in this policy should ensure that they and those for whom they have responsibility understand their obligations to disclose all relevant interests.”

Makes sense. So what is their specific advice?

First, they address the role of conflicts of interest with committee chairs. “The Chairs of advisory committees are in a special position in relation to the work of their committee and so may not have any specific financial or non-financial personal, non-personal or family interests.”17

So who is the chair of this committee?

He is one Stephen Ward and in Appendix B of the guidelines, he indicates no fewer than six conflicts of interest, including a “personal pecuniary interest.” Until March of this year, he was the director of a private pain management service called Back@Work Ltd.

Ok, now I’m the one who’s confused. How can this Stephen Ward guy be the chair of the GDG committee when he has conflicts all over the place and committee chairs aren’t allowed to have conflicts of interest?

But then it goes from confusing to chaotic when we note that it’s the role of the unbiased, unconflicted and impartial chair to enforce the CoI policy for the rest of the members, which doesn’t seem to have happened. For example, all of the group members receiving payments from the pharmaceutical industry were asked to leave the room when discussing pharmaceutical treatments. But explain to me how that financial conflict of interest does not extend to voting on a treatment like acupuncture that reduces patients’ intake of meds by over 50%? When the Director of Back@Work Ltd asked the group to provide a show of hands to decide whether or not to recommend acupuncture, did the Pfizer guys really have no personal vested interest in the outcome?

Misunderstandings: A Summary

Taking all of the above into consideration, I’d like to conclude with a short video by Dr Ian Berstein where he summarises the GDG’s misunderstandings and confusion around how they are meant to make clinical recommendations and what the data actually showed. Dr Berstein is not only a member of the guidelines development group but shares the honour of the most declared personal pecuniary conflicts of interest (ten in total). Apparently unaware that the conflict of interest policy precludes him from discussion and recommendations, he indicates that he chose to “Declare and Participate.” (Appendices A-G, pp18-21)

Let’s see what he has to say:

Regarding acupuncture, he explains:

“The evidence showed that there is no clinically important difference between acupuncture and sham acupuncture for its effects on pain. The guidelines development group thought that acupuncture was unlikely to have a specific biological treatment effect but was acting through contextual mechanisms, such as seeing a caring empathic healthcare professional or the laying on of hands.”

(Sounds like the acupuncturists need to teach the therapists how to be more caring and empathic!! Hoho)

“The guidelines development group considered that other treatments reviewed in this guideline had specific and clinically important treatment effects beyond contextual effects and that these should be prioritised for the use of healthcare resources.”

Gosh, that does sound great but I simply could not find NICE’s evidence of ‘clinically important treatment effects beyond contextual effects’ for a bunch of the treatments they recommended.

  • Paracetamol – effective above placebo or sham? No. Recommended? Yes, recommended in some scenarios along with opioid, but no specific efficacy demonstrated.
  • Opioids – effective above placebo or sham? No. Recommended? Yes, recommended along with paracetamol, but no specific efficacy was demonstrated.
  • Exercise – effective above placebo or sham? No. Recommended? Yes.
  • Cognitive therapy – efficacy above sham? No. Recommended? Yes, recommended as part of a package of care.
  • Acupuncture – efficacy above sham? Yes, “clinically important” improvements above sham/minimal acupuncture for pain reduction and quality of life (including physical function, physical role limitation, vitality). Recommended by the guidelines? No.

Hopefully this post has cleared up some of the confusion around the guideline updates and the GDG will figure out how to update their recommendations to be more in line with (accurate) data, more consistent throughout and more in line with the best interests of patients with low back pain and the National Health Service.
