One of the most common questions I get is along the lines of, “How should I go about reading and critically analyzing a research article?” Many clinicians and students are starting a journal club and want to make sure that they don’t fall victim to the traps of “bad science”. After replying to enough of these, I have decided to just put it into a blog post. If you just want to know how to stay current with the literature, that is a separate post that I wrote several years ago.
Why do we need to critically analyze research papers? Why can’t we just take articles at their face value? Well, a study by the Center for Open Science that attempted to replicate 100 previously conducted studies showed that we had a problem. 97% of the original studies showed an effect. When the same studies were run a second time EXACTLY as before, only 36% showed an effect – and those remaining effects were much smaller than originally described. Why is this? We’ll get to that. For now just know that the evidence you thought you had supporting what you do may not be “real”.
A couple disclaimers I need to highlight here before I go any further:
First, this post assumes that you have little training in research and are not really wanting to get into the hardcore details of statistics, etc. This is aimed at the average clinician just trying to “do better”. Think big picture. I could write whole posts just on one type of design flaw (like this one). This does not mean that seasoned researchers should not follow this same review process – THEY SHOULD AT LEAST START HERE – but they will take things further than this review that I am describing.
Second, I’m going to be discussing clinical trials here. This is the most cited type of literature in our profession, the one most often misunderstood, and the one with the most room for error in methodology. Things like systematic reviews would require another post entirely (start with PRISMA).
Ok, so you’ve got yourself an article to review. Being a clinical trial, it should be pre-trial registered at some place like ClinicalTrials.gov before it was conducted. Look for the NCT number in the abstract for reference. If it isn’t pre-trial registered, you will see that it makes an honest review much more difficult. I’m assuming that you do not have your hands on something that went through the registered report process so I’m not going to talk about that (even though I think it is truly awesome and hope to see it become the standard in the near future).
Head on over to the pre-trial registration page and enter that NCT number in the search box. Download and/or print EVERYTHING from that pre-trial registration (it’s free and open). You will use just about everything there. Review this before you review the article itself.
Pre-Trial Registration Review
Here the authors lay out their reason for conducting the study. Now, we are going to stay away from accusing them of ulterior motives and intentional deception – not our place to make that judgement. But what we can do is ask questions about the statements made in this section.
Are these assumptions validated? A lot of times authors assume that certain concepts are “true” and base the study on those. Now, this doesn’t invalidate the entire study, but it needs to be considered.
Are there other ways to explain what they are talking about here? This will help generate your discussion points around this paper. We will come back to this later.
As I said before, we aren’t getting into deep and detailed statistics here. I am assuming that the reader is not a researcher themselves. Let’s keep it simple.
How many subjects? Look for something “respectable”. Not sure I can draw much of a conclusion with just 15 subjects. Over 100 makes me pretty confident in the primary outcome assessment. In order to draw strong conclusions from secondary outcomes those numbers would need to grow considerably (more on primary and secondary outcomes in a minute).
What are the study start, estimated completion, and actual completion dates? Glance over at the final published article and you will see the submission date and the acceptance date. Is this entire timeline reasonable? For example, it would be odd if a study with only 30 subjects and a 6-month follow up took 10 years to complete. It would also be odd to see a huge gap between study completion and submission for publication, or between submission and acceptance. These gaps could be completely innocent, but they are also opportunities for hidden manipulations and should raise an eyebrow.
The study start date can give you an opportunity to consider “What we knew at that time”. In other words, don’t expect the paper to consider a concept that was developed after that start date. Not good or bad, just something to keep in mind.
Arms and Interventions
This gives a snapshot of what they are comparing. Remember a good study keeps it simple. The more arms you have, the more subjects you need.
If an effect were to be found in one of these arms, how else could you potentially explain those effects? For example, one intervention may require more provider attention than the other, and the effects could be due to that attention and not the intervention itself.
Skip on down to the eligibility criteria and see how narrowly they defined their population. Sure, there are important considerations there but I’m getting bored and just really want to talk about…
If you are going to jump straight to a section, this would be the one to jump to. Understanding primary and secondary outcomes is extremely important. First you need to know the difference between the two and why they matter.
Primary Outcome Measure
The primary outcome is THE outcome that the study is designed to assess. Like the Highlander, there can be only one!
[Insert Highlander GIF that no one will find funny]
Well, you can have more than one if, and only if, your study is powered to do so. Unless your study has thousands of subjects, it isn’t powered to do so.
Secondary Outcome Measures
These are all the other things that the authors would like to consider and track at the same time. What about pain? What about return to work? What about changes in function? What about a difference in later healthcare utilization? Etc, etc, etc.
As you can see, these are all very legitimate questions and any one of them could be their own primary outcome. The important thing to note here is that they cannot ALL be primary outcomes. Why they can’t be is very important to understand. Follow me for a minute.
Ok, just a little statistics here. I promise to keep it simple with only one little bit of math and one cartoon. That math will be around the idea of a p-value of 0.05 – Hey! Stop screaming and banging your head against the wall! This is very simple. Let’s look at that number in a different light.
- 0.05 = 5%
- 5% = 1/20
- 1/20 = 1 in 20 chance
- 1 in 20 chance = For every 20 things you look at where there is no real effect, you should expect about 1 false positive on average
Remember that a false positive is the appearance of an effect, but what you are seeing is actually just random noise. One in 20 sounds like a low likelihood, which it is, if you are only looking at one thing. But what if you are powered to look at one thing, but instead you look at 20 things?
The webcomic xkcd does a great job demonstrating how this works below:
How many times do you think they looked at jelly beans in that comic? You guessed it! 20 times. Now that doesn’t mean you have to look at 20 things to get a false positive. Every time you look at another variable within the same data set, your likelihood of hitting a false positive goes up. If you look at 20 independent things and none of them has a real effect, the chance of at least one false positive is about 64% (1 − 0.95^20), way higher than 5%. I have seen pre-trial registrations with more than 40 secondary outcomes which is…interesting.
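For anyone who wants to check how quickly that chance climbs, here is a minimal sketch in Python. It assumes that every outcome is tested at a threshold of 0.05, that the outcomes are independent of each other, and that none of them has a real effect. Real outcomes in a trial are usually correlated, so treat the numbers as a rough guide. The function name is made up for illustration.

```python
def chance_of_false_positive(num_outcomes, alpha=0.05):
    """Chance of at least one false positive across independent tests.

    Assumes no outcome has a real effect, so each test has an
    alpha chance of looking "significant" purely by luck.
    """
    # Chance that ALL tests stay quiet is (1 - alpha) ** num_outcomes,
    # so the chance that at least one fires is the complement.
    return 1 - (1 - alpha) ** num_outcomes


for k in (1, 5, 20, 40):
    chance = chance_of_false_positive(k)
    print(f"{k:>2} outcomes: {chance:.0%} chance of at least one false positive")
```

Under those assumptions, one outcome gives the familiar 5%, five outcomes give about 23%, twenty give about 64%, and forty give about 87%.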
There is a similar phenomenon at play behind the Monty Hall Paradox which I’ve written about before. It is even more counter-intuitive and requires much more math so let’s avoid that for now.
What authors have a very unfortunate history of doing is a trick known as “HARKing” – Hypothesizing After the Results are Known. In this situation, the authors looked at all of their outcomes equally, then reported on anything that was found to have a correlation.
To use xkcd’s jelly bean example, imagine an investigator doing one study that looked at all of those different colors as separate outcomes and found a correlation between green jelly beans and acne. So they create a hypothesis around the color green (something about the chemicals in that particular dye or whatever) and publish their results. Looks very scientific, but it’s far from it.
This isn’t malicious; these authors are typically well intentioned. They collected their data and looked for correlations. “What have we here? There seems to be something about green jelly beans…” The desire to find a correlation is strong. Really really strong.
When you set up a primary outcome BEFORE DATA COLLECTION BEGINS, you are telling the world that this is the outcome that is being looked at rigorously. The secondary outcomes cannot be considered with the same rigor because they run a higher likelihood of a false positive.
Now, let’s look at the published article (FINALLY). Does it match with the pre-trial registration? Is the primary outcome the main point of the paper? (HINT: Check the title.) As I mentioned before, that desire to show an effect is really really strong. If the primary outcome didn’t show the desired effect, they may have just switched it out with a secondary outcome that did show something. WRONG!!! This is why those studies don’t get reproduced (I told you we would get back to this). They reported something that had a high likelihood of false positive – just like the green jelly beans. When you run the study again, the random noise shows up somewhere else because, you know, it’s random.
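The idea that random noise moves around when you repeat a study can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a model of any real trial: it pretends that a study tracks 20 independent outcomes, that none of them has a real effect, and it relies on the fact that p-values are spread evenly between 0 and 1 when there is no effect. All names and numbers in the code are hypothetical.

```python
import random

ALPHA = 0.05
NUM_OUTCOMES = 20  # hypothetical number of outcomes tracked in one study


def run_null_study(rng, num_outcomes=NUM_OUTCOMES):
    """Return the outcomes that look "significant" by chance alone.

    Assumption: no outcome has a real effect, so each p-value is
    uniform on [0, 1) and falls below ALPHA about 5% of the time.
    """
    return {i for i in range(num_outcomes) if rng.random() < ALPHA}


def simulate(num_trials=5000, seed=1):
    """Run many original studies, then repeat each one that found something."""
    rng = random.Random(seed)
    studies_with_hit = 0
    replicated = 0
    for _ in range(num_trials):
        original_hits = run_null_study(rng)
        if not original_hits:
            continue  # nothing "significant" to report
        studies_with_hit += 1
        # Outcome switching: report one outcome that happened to hit.
        reported = min(original_hits)
        # Repeat the study and see whether the same outcome hits again.
        if reported in run_null_study(rng):
            replicated += 1
    return studies_with_hit / num_trials, replicated / studies_with_hit


hit_rate, replication_rate = simulate()
print(f"Original studies with something to report: {hit_rate:.0%}")
print(f"Reported findings that show up again: {replication_rate:.0%}")
```

With these assumptions, roughly two thirds of the pretend studies have something "significant" to report, yet the reported finding shows up again only about 5% of the time when the study is repeated. The noise landed on a different jelly bean.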
That doesn’t mean you can’t get anything out of those secondary outcomes. It just means you can only consider them as “possible” (assuming that they don’t defy the laws of physics or electromagnetism or some other well-founded theory which, sadly, we as physical therapists often forget about). If it appears that there is a thing going on around that outcome, a NEW study should be conducted with new subjects; that old secondary outcome now being elevated to be the new primary outcome. Then you can start to draw some conclusions about it.
Reviewing the rest of the paper
NOW we can review the paper. If you did everything before this point, the review from here on out should be pretty straightforward. Ask these questions:
Does it match the pre-trial registration? If not, do they explain why? Sometimes things don’t go as planned, and that’s understandable. They should just be transparent about it so we can consider those factors. Typically the pre-trial registration will track the changes that were made to the design (check the “Tabular View” tab on the pre-trial registration page).
What did they find when they conducted the study? This is the most obvious question when reading a paper. Keep in mind that it is in the light of the previous review of the pre-trial registration.
How else can you explain these results? Ideally the authors will do this for you in the discussion and limitations section. Unfortunately they don’t always. That means it’s now your job.
How does this compare to previous literature? If it doesn’t line up, is there some clear reason why this would be more correct than the previous literature or vice versa?
Should this have a drastic effect on my practice? The answer here is usually “No”. It needs to be considered as another piece of the puzzle that includes a lot of other pieces. Or as I have said many times before:
“If one article changes everything for you, it is because you only read one article.”-Erik Meira (Me)
You will notice that I’m not getting into blinding and allocation and all of those other more nuanced things about research. They are very important, but if you do all that I described above, you will be able to have a thoughtful, honest, and pragmatic interaction with the literature.
But what if the paper you are reading isn’t pre-trial registered? You would still want to ask all the same questions, but you would have a much harder time answering them.
- Keep things simple
- The review starts with the pre-trial registration
- If you don’t have a pre-trial registration, your job is much harder
- Most studies are only powered to look at one outcome
- Outcome switching makes reviewing literature very difficult
- How else could you explain these results?