“Because of limitations in the evidence base, we did not have high confidence in any of the findings from this review.”
The reports from ME/CFS advocates are in. The AHRQ report is “appallingly bad”, “scientifically indefensible and irresponsible”, and is “fundamentally and irredeemably flawed”. It contains “multitudes of mistakes,” has “serious and sometimes insurmountable flaws”. In short, it’s a dangerous document put together by a bunch of numbskulls – a document that every responsible ME/CFS patient should do their best to see gutted.
I see it differently. I believe it’s the single most effective tool for change that advocates have been handed in decades.
“The Agency for Healthcare Research and Quality (AHRQ), through its Evidence-based Practice Centers (EPCs), sponsors the development of systematic reviews …..provide comprehensive, science-based information on common, costly medical conditions, and new health care technologies and strategies.”
The Agency for Healthcare Research and Quality (AHRQ) was handed the task of developing an evidence-based assessment of diagnostic and treatment efficacy in ME/CFS for the P2P project. That is what they do. They’re a busy, well-respected organization that is often used to do analyses of medical issues.
They’ve done hundreds of “evidence-based” reports over time. In September they published five evidence-based reports on different disorders, three technology assessments, and four other reports.
Yes, they don’t “know” ME/CFS, but they’re not experts in any of the other disorders they analyze either. In this case, it’s not needed or even desired. The AHRQ’s clients are paying for a purely objective analysis of the evidence. I assume that they know what they’re doing.
Let’s see what they reported.
“The limitations in applicability, as well as the limitations of the evidence base, make it difficult to draw firm conclusions with implications for clinical practice.”
Let’s just cut to the chase. The AHRQ essentially reported that this field is such a mess that, thirty years after this disorder burst onto the scene, the AHRQ group couldn’t come to any firm conclusions about such basic factors as how to define or diagnose this disorder or how to treat it.
There’s a lot of upset over the many studies that didn’t make it into the analysis. That was indeed shocking and a future blog will look at those studies. The real takeaway message from this report, though, is that this field – at least with regard to diagnosis and treatment – is a mess.
“Intervention studies were scarce and most were either fair- or poor-quality and measured outcomes using heterogeneous methods making it difficult to compare results across studies.”
Is that any surprise to anyone? Do you feel well-taken care of at your local doctor’s office? Do you know of any validated treatments? Are you going from one practitioner to the next hoping someone has a clue? Of course, this field isn’t up to snuff. If it were, we of all people would know it.
But then again, how could a complex disorder like ME/CFS be in good shape on a $5 million/year budget at the NIH? How could treatment be in good shape with just one drug in thirty years making it through the FDA – only to fail? How could it not fail to meet standards of scientific rigor with a variety of definitions vying for prominence? The field is so unformed that some researchers (Wyller in Norway) feel comfortable creating their own definitions! Relative to most of the medical field, with regard to diagnosis and treatment, Chronic Fatigue Syndrome is an under-funded, under-researched mess. That’s what this report essentially says – and it’s right.
Welcome to the big leagues, ME/CFS. This is the kind of rigorous, independent review that ME/CFS has not gotten in decades, if ever. It’s an eye-opener.
On to the report:
This section outlines some of the criteria they looked at. You might want to skip through it and go directly to the results. I don’t understand much of it. The point is that this was a very rigorous analysis.
The investigators – all PhDs – extracted data on study design, setting, inclusion and exclusion criteria, population characteristics (including sex, age, race, and co-morbidities), sample size, duration of follow-up, attrition, intervention characteristics, case definition used for diagnosis, duration of illness, and results.
Two investigators independently assessed the “quality” or “internal validity” of each study. If discrepancies occurred, a third independent investigator was brought in.
Two kinds of study quality criteria were used: diagnostic study quality and clinical quality.
Diagnostic Study Quality – The quality of diagnostic studies was assessed using questions adapted from the AHRQ Methods Guide for Medical Test Reviews for Chronic Fatigue Syndrome. They assessed whether a representative sample was used, whether the study used a credible reference standard, whether thresholds were pre-specified, and more.
For diagnostic accuracy studies they extracted relative measures of risk (relative risk [RR], odds ratio [OR], hazard ratio [HR]), ROC, and AUC – when available. (If you understand what these are, you understand more than I do.) They wanted to quantitatively pool the results, but differences in methods, case definitions, and outcomes prevented them from doing so.
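For readers who want a feel for two of these measures, here is a minimal sketch of how a relative risk and an odds ratio are computed from a simple two-by-two table. The counts are invented purely for illustration and do not come from the AHRQ report or any ME/CFS study.

```python
# Hypothetical 2x2 table. All counts are made up for illustration only.
#                    outcome   no outcome
# group A (exposed)     a          b
# group B (unexposed)   c          d

def relative_risk(a, b, c, d):
    """Risk of the outcome in group A divided by the risk in group B."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds of the outcome in group A divided by the odds in group B."""
    return (a * d) / (b * c)

a, b, c, d = 30, 70, 15, 85  # hypothetical counts
rr = relative_risk(a, b, c, d)
or_ = odds_ratio(a, b, c, d)
print(f"RR = {rr:.2f}, OR = {or_:.2f}")
```

A relative risk of 1 would mean the two groups have the same risk; values further from 1 indicate a bigger difference between the groups.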
Clinical Trials Quality – This was assessed based on the presence of such factors as the similarity of the groups assessed (healthy controls vs patients), reporting of dropouts, crossover, adherence, the use of intent to treat analyses, and more.
The “internal validity” and study design, patient populations, interventions, and outcomes determined whether they could do meta-analyses. They calculated “pooled relative risks” or “pooled weighted mean differences” depending on the study type and “heterogeneity of effects between studies”. They used “subgroup analysis”. They did a lot of statistical stuff, most of which is gibberish to me.
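To take some of the mystery out of "pooled relative risks" and "heterogeneity", the sketch below shows one common way of pooling results across trials: inverse-variance (fixed-effect) pooling on the log scale, with Cochran's Q and the I² statistic as heterogeneity measures. This is an assumption about the general technique, not a reconstruction of the AHRQ's actual analysis, and the trial counts are hypothetical.

```python
import math

def log_rr_and_variance(events_t, n_t, events_c, n_c):
    """Log relative risk and its approximate variance for one trial."""
    log_rr = math.log((events_t / n_t) / (events_c / n_c))
    variance = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return log_rr, variance

def pool_fixed_effect(trials):
    """Inverse-variance pooled RR, Cochran's Q, and I^2 (as a fraction)."""
    estimates = [log_rr_and_variance(*t) for t in trials]
    weights = [1 / var for _, var in estimates]
    pooled_log = sum(w * y for w, (y, _) in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled_log) ** 2 for w, (y, _) in zip(weights, estimates))
    df = len(trials) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return math.exp(pooled_log), q, i2

# Hypothetical trials: (events in treatment arm, treatment n, events in control arm, control n)
trials = [(30, 100, 20, 100), (45, 150, 25, 150), (12, 60, 11, 60)]
pooled_rr, q, i2 = pool_fixed_effect(trials)
print(f"pooled RR = {pooled_rr:.2f}, Q = {q:.2f}, I^2 = {i2:.0%}")
```

Larger trials get more weight because their estimates are less noisy. A high I² suggests the trials disagree with each other more than chance alone would explain, which is the kind of heterogeneity that can make pooling inadvisable.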
They defined what it meant to be a good or fair or poor quality study.
- Good-quality studies clearly describe the population, setting, interventions, and comparison groups; clearly report dropouts and have low dropout rates; are non-biased using blinding; and appropriately measure outcomes and fully report results. They are presumed to be valid.
- Fair quality studies have enough issues that it’s difficult to tell if they’re valid or not. Some are probably valid while others are not.
- Poor quality studies have serious enough flaws that the results found are probably more a result of the flaw than anything else.
“Toto – I’ve a feeling we’re not in Kansas any more.”
They started out with a broad sweep, doing a full-text review of no fewer than 914 articles and then excluding a stunning 90%+ of them from further analysis. The reasons, with the number of studies excluded for each, were:
- Did not address a key question or meet inclusion criteria – 301
- Wrong population – 76
- Wrong intervention – 9
- Wrong outcomes – 84
- Wrong study design – 131
- Wrong publication type (presumably letters, opinions?) – 157
- Inadequate duration – 57
- Systematic review not meeting requirements – 27
That’s a pretty astonishing filter. Some bloggers are crying foul, but the reductions appear to be simply the result of strict inclusion factors, one of which was this:
“Articles that attempted to define an etiology on the basis of a biochemical marker or a particular physiologic test were not included in this review because the intent of these was to identify an etiology rather than understand how the specific test could distinguish patients that would respond to treatment.”
I don’t really understand what that means, but I assume that they do and that it’s a standard exclusionary factor in these kinds of analyses.
They whittled the number of studies in the final analysis down to 28 diagnostic and 36 treatment studies.
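The size of that filter can be checked with a quick tally of the exclusion counts listed above. This is only arithmetic on the numbers quoted in this post; the short labels are shorthand, not the report's wording.

```python
# Exclusion counts as quoted in this post (labels abbreviated).
exclusions = {
    "no key question / inclusion criteria": 301,
    "wrong population": 76,
    "wrong intervention": 9,
    "wrong outcomes": 84,
    "wrong study design": 131,
    "wrong publication type": 157,
    "inadequate duration": 57,
    "systematic review not meeting requirements": 27,
}

full_text_reviewed = 914
excluded = sum(exclusions.values())
remaining_articles = full_text_reviewed - excluded
excluded_share = excluded / full_text_reviewed
included_studies = 28 + 36  # diagnostic + treatment studies in the final analysis

print(f"excluded: {excluded} of {full_text_reviewed} ({excluded_share:.1%})")
print(f"articles remaining: {remaining_articles}, studies analyzed: {included_studies}")
```

The tally leaves slightly more articles than the number of studies in the final analysis. The figures quoted here do not explain that gap, so the sketch simply surfaces it rather than guessing at the reason.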
“No studies evaluated a diagnostic test for ME/CFS using an adequate size and spectrum of patients and no studies demonstrated an accurate and reliable method for identifying patients or subgroup of patients with ME/CFS.”
Not one diagnostic study met their standard of quality. Most of the studies assessed were of fair or poor quality.
They noted that including ME/CFS patients and healthy controls – as almost all ME/CFS studies do – is just the first step. Studies next need to distinguish between patients who have symptoms similar to ME/CFS but don’t actually have the disorder. You could probably count the number of studies that have done that on the fingers of one hand.
Self-Rating Scales Fail
The AHRQ judged that even the SF-36, a primary fatigue scale used in much ME/CFS research, and generally proposed as having been validated, has not been validated at all. They asserted that the effectiveness of other scales was impossible to determine because the studies they were used in were too small, too poorly designed, etc.
If you were looking for diagnostic biomarkers – laboratory tests that define ME/CFS – four studies (just four!) made the grade for analysis. They included two cortisol tests, pro-inflammatory cytokines in response to stress, and RNase L. The AHRQ must have dismissed dozens of studies in this category.
This was the most disappointing section of the report for me and a cause for concern. In a follow up blog I’ll look at which diagnostic biomarker studies were excluded and why.
That didn’t mean these four were quality studies. All were judged too small to provide trustworthy evidence, and none included groups with “diagnostic uncertainty”. (The CDC is the only research group that I’m aware of that regularly includes fatigued patients who do not meet the criteria for ME/CFS.)
With RNase L – a controversial factor if there ever was one in ME/CFS – making the cut, it’s hard to argue that bias against certain types of biomarkers was involved. [This can be said for the treatment trials as well. With a homeopathy study making the cut, it’s hard to argue that the reviewers weren’t willing to look at any kind of treatment. The knife probably cut equally – on every side.]
Studies that we would have thought at least preliminarily identified subgroups using exercise testing, cerebral blood flow, heart rate variability, acid accumulations, and, as the report noted, “many others”, simply did not include the diagnostic testing outcomes the AHRQ required (ROC/AUC, sensitivity, specificity, or concordance) and therefore were not included.
When diagnostic studies on serum parameters and cardiopulmonary functioning did meet those requirements, they failed in other ways. For one, they failed to include a broad enough spectrum of patients to ensure that the test was actually diagnostic for ME/CFS.
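To see why both the required outcomes (sensitivity, specificity, ROC/AUC) and the spectrum of patients matter, consider the sketch below. All test scores are invented for illustration. The point is that a test can look perfect against healthy controls and much weaker against people who have similar symptoms but not the disorder.

```python
def sensitivity_specificity(patients, comparison, threshold):
    """Treat a score at or above the threshold as a positive test."""
    true_pos = sum(score >= threshold for score in patients)
    true_neg = sum(score < threshold for score in comparison)
    return true_pos / len(patients), true_neg / len(comparison)

def auc(patients, comparison):
    """Chance that a random patient scores higher than a random comparison
    subject (ties count as half). This equals the area under the ROC curve."""
    wins = 0.0
    for p in patients:
        for c in comparison:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patients) * len(comparison))

# Hypothetical scores on some lab test (not real data).
patients = [7, 8, 6, 9, 5]
healthy_controls = [2, 3, 1, 4, 3]
similar_symptoms = [5, 6, 4, 7, 6]  # e.g. other fatiguing conditions

for label, group in [("healthy controls", healthy_controls),
                     ("similar symptoms", similar_symptoms)]:
    sens, spec = sensitivity_specificity(patients, group, threshold=5)
    print(f"vs {label}: sensitivity={sens:.2f}, "
          f"specificity={spec:.2f}, AUC={auc(patients, group):.2f}")
```

In this made-up example the test separates patients from healthy controls cleanly, but against the look-alike group it flags most non-patients as positive, which is exactly the situation a doctor faces in the office.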
Welcome again to the big leagues. This is not “pick on ME/CFS” time. This is what the AHRQ does, day in and day out. They’re engaged to clear out the cobwebs and bring clarity to complex issues such as how effective treatments really are. They appear to be highly trusted by the medical profession to do that.
“Most of the evidence available surrounding treatment is insufficient to draw conclusions. Because of limitations in the evidence base, we did not have high confidence in any of the findings from this review, and only had moderate confidence in the benefit of CBT (fatigue and global improvement) and GET (function and global improvement).”
In the treatment section they assessed 9 drug, 14 CBT, 7 CAM (complementary and alternative medicine), 6 exercise, and 5 trials that combined more than one treatment. Twenty of the thirty-six trials (55%) assessed were either CBT or exercise trials, but this is no surprise, and surely reflects the fact that these trials tend to be larger and better funded and more of them have been done; i.e., they were more likely to make it into a report of this type.
Only seven of the thirty-six studies were judged to be of “high quality”. That’s one high-quality ME/CFS treatment trial occurring about every four years!
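The proportions above are easy to verify. The sketch below uses only the counts quoted in this post and assumes a thirty-year span of research, as used elsewhere in this piece.

```python
# Trial counts by type, as quoted in this post.
by_type = {"drug": 9, "CBT": 14, "CAM": 7, "exercise": 6, "combined": 5}

total_trials = 36    # treatment trials in the final analysis
high_quality = 7     # trials judged to be of high quality
years_of_research = 30  # assumption: roughly thirty years of ME/CFS research

cbt_exercise_share = (by_type["CBT"] + by_type["exercise"]) / total_trials
high_quality_share = high_quality / total_trials
years_per_high_quality = years_of_research / high_quality

print(f"CBT + exercise share: {cbt_exercise_share:.1%}")
print(f"high-quality share:   {high_quality_share:.1%}")
print(f"one high-quality trial every {years_per_high_quality:.1f} years")
print(f"sum of per-type counts: {sum(by_type.values())}")
```

Note that the per-type counts add up to more than thirty-six. The figures quoted here do not say why; trials being counted under more than one heading would be one possible explanation, but that is an assumption.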
Definition Heterogeneity Singled Out
“Acceptance of a single case definition and development of a core outcome set would aid in better studying the interventions to allow for more meaningful guidance for clinicians, policy makers, and patients.”
One of the reasons for the AHRQ’s inability to provide firm answers was the muddying effect caused by the different case definitions used. The AHRQ panel essentially didn’t trust any of the definitions. Nor did they suggest one was better than the other. The lack of data present for the Canadian Consensus Criteria surely made it impossible for the panel to quantitatively assess how effective it is.
The evidentiary base needed to separately assess the impact of the definitions on treatment efficacy was simply not present. (The exception to that may be the use of the Oxford definition in behavioral studies.) The definition problem is a huge one and it’s one place this report may have a real impact.
The authors essentially stated that until a single definition and (validated) core outcomes are produced and used, it’s going to be very difficult for ME/CFS treatment trials to produce “meaningful guidance” for anybody – patients, clinicians, or policy makers.
Methodological Problems Exclude Drug and Alternative Treatment Trials
“Across all intervention trials, heterogeneity in the population samples (different case definitions used for inclusion), outcomes evaluated, and tools used to measure these outcomes limited the ability to synthesize data.”
All the drug and alternative medicine trials except one provided “insufficient” evidence on their effects on ME/CFS for the panel to say anything about them. Almost all were single studies, and a single study means nothing in the medical world. It must be validated.
Note the disparity in the number of attempts to validate treatment efficacy between behavioral and all other trials. In contrast to the 14 CBT and 6 GET trials, only one other intervention (Ampligen/rintatolimod) had more than one study devoted to it.
Two trials of Ampligen (rintatolimod) and one of valganciclovir suggested improvement, but the studies were not definitive and were limited by inconsistencies in methods and findings, small numbers of participants, methodological shortcomings, and lack of long-term follow-up.
Several other trials (IVIG, Isoprinosine, galantamine, hydrocortisone) simply didn’t provide evidence for significant improvement.
Numerous other trials did not make the cut. We’ll be looking at those in an upcoming blog.
Both CBT and GET provided “some benefit” – hardly a ringing endorsement of them – primarily to fatigue and functioning. The report did not state, as one blogger has reported, that both were “effective” treatments for ME/CFS. These studies had their own problems (multiple methods of evaluating their outcomes, and mixed results on the same measure when using different tools).
In the end the AHRQ only had “moderate” confidence that CBT is able to reduce fatigue and provide “global improvement” in ME/CFS. Based on the evidence presented, they had low confidence CBT is able to improve overall functioning, enhance quality of life, increase working hours, or reduce work impairment.
The AHRQ also acknowledged the dangers associated with exercise and GET. They noted that harms were not well reported in GET studies and several factors indicated they could be significant. Those factors included the high degree of harms reported in one GET trial, high dropout rates in another, and higher withdrawal rates whenever the arms of trials included exercise.
AHRQ also reported that several studies find that exercise worsens symptoms, and that an ME Association survey reported that GET had higher rates of symptom exacerbation than other treatments. They also noted that one study indicated that ME/CFS patients who remained within their energy envelope and avoided overexertion significantly reduced their fatigue levels.
“The main limitation of the evidence base in our review was poor study quality.”
Rigorous research studies always include a shortcomings section which spells out possible limitations to their findings and the AHRQ did as well. This part of the report indicated how applicable the AHRQ authors believe their findings are to the Chronic Fatigue Syndrome community (patients and doctors). How well, in other words, do they reflect what doctors encounter in their office?
Several factors suggested the report was accurate. The use of all case definitions allowed them to cover a broad swath of patients seeing doctors for Chronic Fatigue Syndrome. They felt that the interventions they looked at probably covered those “commonly” used in medical practices. [Their report did not reflect many of the treatments used in ME/CFS experts’ practices, primarily because many of these treatments have not been studied in ME/CFS.]
Multiple Definitions Limited Applicability
“This (the Oxford definition) has the potential of inappropriately including patients that would not otherwise be diagnosed with ME/CFS and may provide misleading results.”
The use of multiple definitions was a major limitation. In very clear terms, they warned that studies containing the Oxford definition, “in particular”, might not contain any ME/CFS patients. They also noted that “ME” and “ME/CFS” definitions selected out a more severely ill set of patients. (Unfortunately they did not note how many of the CBT/GET studies used the Oxford definition, nor did they break down the success rates of those studies relative to that definition).
Some other limitations also suggested the report may not apply to the entire ME/CFS community. Small study sizes, inadequate methodologies, lack of study replication, etc. rounded out some of their concerns regarding how applicable their findings are.
Expert Bias Warning – They asserted that, given the lack of a diagnostic standard, ME/CFS is at inherent risk of something called “expert bias”. “Expert bias” is apparently a known factor in disorders that do not have validated tools.
They singled out the post-exertional malaise symptom as a possible example of “expert bias”. Until methods of testing, comparing, and monitoring this symptom are developed and used, PEM is and has to be suspect. That’s not a dig at ME/CFS or ME/CFS experts, and they’re not saying it may not be valid. It’s a warranted conclusion and caution based on past medical findings that later found the “experts” were wrong. (They also suggested that future studies should include findings on PEM, neurocognitive status, and autonomic functioning.)
It also shows why the attempt to institute the Canadian Consensus Criteria as the definition for ME/CFS was doomed to failure – at least at this level. Consensus definitions are simply not rigorous enough to “make the grade.”
Importantly, they noted that the many problems with ME/CFS diagnostic procedures and studies make it difficult for clinicians to draw any firm conclusions from the report. Some advocates will scoff at that conclusion and say the report was a waste of taxpayer dollars, but I disagree. This report simply documents how poorly ME/CFS treatment, definition, and diagnostic studies meet the standards of excellence in the medical community. What carries weight in any scientific community are rigorous peer-reviewed studies, and the AHRQ is representative of the “peers” in the medical community.
That may be painful to hear, but if you ask yourself why ME/CFS is so poorly funded and has such a poor reputation, this is one reason. (It’s also an inevitable outcome of very poor funding over time.) Some mistakes were surely made in a review of this size, but I think the report is probably right on in its finding that there’s not much “there” there with regard to defining, diagnosing, and treating ME/CFS.
Negative reviews by AHRQ like what occurred here with ME/CFS may not be particularly unusual either. A September AHRQ review of Chronic Urinary Retention (CUR) treatments found:
“Evidence was insufficient due to risk of bias and imprecision, and we were not able to evaluate consistency of results across studies. Further research should address conceptual issues in studying CUR as well as strengthening the evidence base with adequately powered controlled trials or prospective cohort studies for populations and interventions common in practice.”
A review of Imaging Efficacy in Colorectal Cancer that was more positive indicated how far the AHRQ can go when it has the material to work with.
“Low-strength evidence suggests ERUS is more accurate than CT… and MRI is similar in accuracy to ERUS. Moderate-strength evidence suggests MRI is more likely to detect colorectal liver metastases than CT. Insufficient evidence was available to allow us reach any evidence-based conclusions about the use of PET/CT. Low-strength evidence suggests that CT, MRI, and ERUS are comparable … but all are limited in sensitivity … . Long-range harm from radiation exposure with repeat examinations is particularly of concern with PET/CT.”
Again, welcome to the big leagues! We said we wanted to be treated like other major disorders are treated, and now we are. It’s not going to be pretty at times. Growing pains are to be expected.
Guidelines and Gaps
The most important part of the report may be its outline of the gaps the ME/CFS field needs to fill with regards to diagnosis, definitions, and treatments, and the guidelines it can use to fill them.
Future Diagnostic studies:
- Should include a broad range of people, including people with similar symptoms such as Fibromyalgia and depression.
- Should use concordance and net reclassification index data to determine how well a particular test distinguishes ME/CFS from other disorders.
- Should use comparative groups rather than healthy controls in diagnostic biomarker studies.
- Should settle on one definition.
- Should include multiple treatments (to mirror what doctors do), have larger sample sizes based on power calculations, and provide more rigorous adherence to methodological standards for clinical trials.
- Follow-up periods of greater than a year would be preferred to capture the cyclical nature of ME/CFS.
- Develop and use a set of core outcomes, including more patient-centered outcomes such as quality of life, employment, and time spent supine versus time active.
- Report about co-interventions, the timing of interventions in relation to other interventions, and adherence to interventions.
- Stratify findings based on patient characteristics (e.g., baseline severity, comorbidities, demographics).
- In particular, future studies should report findings that reflect the cardinal features of ME/CFS such as PEM, neurocognitive status, and autonomic function.
- Harms should be more clearly reported, particularly for exercise studies.
Some positive outcomes were:
- They searched for myalgic encephalomyelitis. (That would certainly not have happened before Dennis Mangan came to town).
- They acknowledged the Oxford criteria may select out people who don’t have CFS.
- They acknowledged that ME/CFS and ME criteria select out patients with more severe symptoms.
- They acknowledged considerable harms may be occurring in GET studies.
- They had only moderate confidence in the ability of CBT to improve fatigue.
- They had low confidence CBT was able to improve overall functioning, enhance quality of life, increase working hours, or reduce work impairment. Nowhere was CBT/GET suggested to be a cure.
More importantly, any report that concludes that almost 30 years of research and study have failed to provide the evidentiary basis to provide any firm conclusions regarding diagnosis, diagnostic biomarkers, and treatment efficacy says something about the state of ME/CFS in our medical system. That AHRQ could find only 64 studies that fit its parameters says something very important.
To say almost 30 years later that you cannot say anything firm about the effectiveness of treatments is a damning indictment of the medical research establishment’s support of the Chronic Fatigue Syndrome community. To have only 36 treatment trials qualify for inclusion in this report – most of which were of poor to fair quality – from almost 30 years of study is astonishing. To have only one drug or alternative treatment provide other than “insufficient” evidence of its effectiveness in ME/CFS is tragic in a disorder affecting a million people in the U.S. and millions more across the world.
Read between the lines and this report says that ME/CFS has been spectacularly poorly studied, that it lacks fundamental tools that other disorders have, and until it has those tools not much effective work is going to be done in the realm of diagnosis and treatment. A definition that works must be developed (and validated), validated outcome measures must be developed and used consistently, study sizes need to be larger and rigorous study designs must be used.
That’s the real message from this report and that’s how this report will make a difference. For all the hubbub about the report’s failings, advocates will be able to use this government-sanctioned report as a powerful cudgel to slam the NIH, CDC and FDA for their neglect of this disease. It’s going to be one of the most powerful tools we have for insisting on adequate research funding, for support in building up the infrastructure ME/CFS needs to be researched properly, and for more flexibility at the FDA regarding treatment trials.
The very fact that it’s an independent analysis – something some advocates railed against – will enable them to use its conclusions – that decades of research have largely been useless with regard to diagnosis and treatment – that much more effectively.
Pathways To Prevention
The Pathways to Prevention (P2P) program is an NIH-based program that hosts workshops which identify research gaps and methodological and scientific weaknesses in a field using “unbiased, evidence-based assessments”. The research gaps the AHRQ report outlines are plain – no validated definitions, few properly designed diagnostic biomarker studies, little basis for determining effective treatments, etc.
The federal government often requires overviews be done before it begins major funding efforts. The Neuroimmune Conference in the early 2000s provided the basis for what was probably the first and only major grant effort by the NIH for ME/CFS. The P2P report should provide the evidentiary basis the federal government needs to embark on a major funding effort to assist with defining, diagnosing, and treating ME/CFS. (Why else would they do a report like this?)
The AHRQ report provides the most comprehensive overview of those facets of ME/CFS that the P2P panel will receive. Given its rigorous standards and its officially sanctioned nature, they will rely heavily on it in making their recommendations. Hopefully, some of the recommendations should lay the groundwork for a major grant designed to solve these problems.
Comments to the report are allowed until Oct. 20th here.
I say don’t gut the report. Don’t argue that ‘x’ treatment approach works. Any federal report that says we still don’t know how to diagnose or treat ME/CFS is perfect as it is. Leave it as it is.
For me, I’m going to look into the Oxford definition and behavioral studies and if most of them use it, I’m going to request a warning that the results may not apply to many people with ME/CFS. I’m also going to ask the panel to reconsider their assessment of the PACE trial given the many questions raised about it.
The main focus of my comments, though, will be on how the report’s findings reflect a tragic and decades-long lack of support, in particular from the federal government, for a million chronically ill people in the U.S. My comments will also reflect the enormous disparity between the number of clinical trials assessing CBT and GET and any other treatment approach.