Studies of Brain Activity Aren’t as Useful as Scientists Thought

Duke researcher questions 15 years of his own work with a reexamination of functional MRI data

Brain scans showing MRI mapping for 3 tasks across 2 different days. Warm colors show the consistency of activation levels across a group of people. Cool colors represent how poorly unique patterns of activity can be reliably measured. (Annchen Knodt)

Hundreds of published studies over the last decade have claimed it's possible to predict an individual’s patterns of thoughts and feelings by scanning their brain in an MRI machine as they perform some mental tasks.

But a new analysis by some of the researchers who have done the most work in this area finds that those measurements are highly suspect when it comes to drawing conclusions about any individual person’s brain.

Watching the brain with a functional MRI (fMRI) machine is still great for finding the general brain structures involved in a given task across a group of people, said Ahmad Hariri, a professor of psychology and neuroscience at Duke University who led the reanalysis.

“Scanning 50 people is going to accurately reveal what parts of the brain, on average, are more active during a mental task, like counting or remembering names,” Hariri said.

Functional MRI measures blood flow as a proxy for brain activity. It shows where blood is being sent in the brain, presumably because neurons in that area are more active during a mental task.

The problem is that the level of activity for any given person probably won’t be the same twice, and a measure that changes every time it is collected cannot be applied to predict anyone’s future mental health or behavior.

Hariri and his colleagues reexamined 56 published papers based on fMRI data to gauge their reliability across 90 experiments. Hariri said the researchers recognized that “the correlation between one scan and a second is not even fair, it’s poor.”

They also examined data from the brain-scanning Human Connectome Project -- “Our field’s Bible at the moment,” Hariri called it -- and looked at test/retest results for 45 individuals. For six out of seven measures of brain function, the correlation between tests taken about four months apart with the same person was weak. The seventh measure studied, language processing, showed only a fair correlation, not a good or excellent one.

Finally they looked at data they collected through the Dunedin Multidisciplinary Health and Development Study in New Zealand, in which 20 individuals were put through task-based fMRI twice, two or three months apart. Again, they found poor correlation from one test to the next in an individual.
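To make the idea of test/retest reliability concrete, here is a minimal sketch of one common way to score it, an intraclass correlation (ICC) between two scanning sessions. The numbers are invented for illustration and are not from the study, and the cutoffs for "poor," "fair," "good" and "excellent" are assumed to follow a widely used convention (below 0.40, 0.40 to 0.59, 0.60 to 0.74, 0.75 and above); the article itself does not state which thresholds or which ICC variant the researchers used.

```python
from statistics import mean


def icc_consistency(scores):
    """Consistency ICC, often written ICC(3,1).

    `scores` holds one row per person and one column per session.
    """
    n = len(scores)      # number of people
    k = len(scores[0])   # number of sessions
    grand = mean(v for row in scores for v in row)
    person_means = [mean(row) for row in scores]
    session_means = [mean(row[j] for row in scores) for j in range(k)]

    # Two-way ANOVA sums of squares: people, sessions, and leftover error.
    ss_people = k * sum((m - grand) ** 2 for m in person_means)
    ss_sessions = n * sum((m - grand) ** 2 for m in session_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_people - ss_sessions

    ms_people = ss_people / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_people - ms_error) / (ms_people + (k - 1) * ms_error)


def label(icc):
    """Assumed conventional cutoffs; not taken from the article."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Invented activation scores for three people, scanned twice.
# In both cases the group average is well behaved, but only in the
# first case does each person keep the same rank from scan to scan.
stable = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
unstable = [[1.0, 3.0], [2.0, 1.0], [3.0, 2.0]]

for name, data in (("stable", stable), ("unstable", unstable)):
    value = icc_consistency(data)
    print(name, round(value, 2), label(value))
```

In the "unstable" example the average activation across the three people is identical in both sessions, yet no individual's score carries over, which is the situation the reanalysis describes: group-level maps hold up while individual-level measurements do not.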

The bottom line is that task-based fMRI in its current form can’t tell you what an individual’s brain activation will look like from one test to the next, Hariri said. The new analysis appears June 3 in Psychological Science.

“This is more relevant to my work than just about anyone else’s!” Hariri said, his voice rising. “This is my fault. I’m going to throw myself under the bus. This whole sub-branch of fMRI could go extinct if we can’t address this critical limitation.”

Hariri has been using fMRI data as part of a long-term study of 1,300 undergraduate Duke students. By combining brain scans, genetic testing and psychological assessments, Hariri is searching for biomarkers of individual differences in the way people process thoughts and emotions, such as why one person comes away from a traumatic event with PTSD or depression and another does not.

“We can’t continue with the same old ‘hot spot’ research,” Hariri said. “We could scan the same 1,300 undergrads again and we wouldn’t see the same patterns for each of them.”

One possible solution to the reliability problem, using existing technology, would be to collect data for a full hour or longer in the scanner, not just five minutes. Another strategy, Hariri said, is to develop new tasks from the ground up with the explicit purpose of reliably measuring individual differences in brain activity. In the meantime, Hariri and his team have shifted their focus to MRI measures of brain structure, which are highly reliable.

“It’s not as if we haven’t known these issues of reliability, but this paper brings them together more sharply,” said Russell Poldrack, the Albert Ray Lang Professor of Psychology at Stanford University, who had a 15-year-old fMRI paper among those that were reanalyzed.

“This is a good wakeup call, and it’s a marker of Ahmad’s integrity that he’s taking this on,” said Poldrack, who was not involved in the meta-analysis but said he has had suspicions about fMRI reliability for a few years now.

Connectivity mapping – seeing how areas of the brain are connected to address a task more than just what areas are active – is going to be the way forward, Poldrack predicted. Hariri agreed that identifying patterns of activity throughout the brain rather than in one or two areas may improve reliability.

In the meantime, the sociology behind a dramatic debunking of a scientific tool is going to be interesting to watch, Hariri and Poldrack both said.

“There’s three things you can do,” Poldrack said. “You can just up and quit, you can stick your head in the sand (and act as if nothing has changed), or you can dig in and try to solve the problems.”

This analysis was supported by the U.S. National Science Foundation. The Dunedin Study is supported by the U.S. National Institute on Aging (R01AG049789, R01AG032282) and the UK Medical Research Council (P005918), the New Zealand Health Research Council and the New Zealand Ministry of Business, Innovation and Employment (MBIE). The Human Connectome Project is supported by 16 centers of the U.S. National Institutes of Health via the Blueprint for Neuroscience Research.

CITATION: “What is the Test-Retest Reliability of Common Task-fMRI Measures? New Empirical Evidence and a Meta-Analysis,” Maxwell L. Elliott, Annchen R. Knodt, David Ireland, Meriwether L. Morris, Richie Poulton, Sandhya Ramrakha, Maria L. Sison, Terrie E. Moffitt, Avshalom Caspi, Ahmad R. Hariri. Psychological Science, June 3, 2020. DOI: 10.1177/0956797620916786