Science Advances: Reward-Based Option Competition in Human Dorsal Stream and Transition from Stochastic Exploration to Exploitation in Continuous Space

Organisms face a difficult dilemma between exploiting options that are known to be good and exploring alternative options that might be even better. This explore-exploit dilemma becomes especially hard when we move rapidly through a changing environment. As our primate ancestors adapted to hunting and foraging on swinging tree branches, they evolved dynamic maps of their environment in the dorsal or “where” cortical stream. These maps – found primarily in the parietal cortex – support exploration and steer choices toward rewarded options. Two complementary perspectives exist on how dorsal stream maps encode rewards. Reinforcement learning models integrate rewards incrementally over time, efficiently resolving the explore-exploit dilemma. Working memory models explain the rapid plasticity of parietal maps but do not explain how we resolve the explore-exploit dilemma.

In a recent paper in Science Advances, investigators including Beatriz Luna, PhD (Distinguished Professor of Psychiatry and Professor of Psychology, Bioengineering and Radiology, and Staunton Professor of Pediatrics and Psychiatry), and Alex Dombrovski, MD (Pittsburgh Foundation Endowed Professor in Brain and Mind Research and Professor of Psychiatry), from the University of Pittsburgh, presented a reinforcement learning model that unifies both perspectives, enabling rapid, information-compressing map updates and efficient transition from exploration to exploitation.

They show that activity in human frontoparietal dorsal stream regions tracks the number of competing options, as preferred options are selectively maintained on the map, while spatiotemporally distant alternatives are compressed out. When valuable new options are uncovered, posterior β1/α oscillations desynchronize within 0.4 to 0.7 s, consistent with option encoding by competing β1-stabilized subpopulations.

“Our study sheds light on how competition between dorsal stream neuronal subpopulations may enable primates, including humans, to choose among multiple options in rapidly changing environments. Neuronal subpopulations representing rewarded options get to dominate the output to motor areas through a mechanism involving low beta oscillations. This helps us focus on preferred options and ignore inferior alternatives,” said Dr. Dombrovski, the study’s senior and corresponding author.

Reward-based option competition in human dorsal stream and transition from stochastic exploration to exploitation in continuous space
Hallquist MN, Hwang K, Luna B, Dombrovski AY.

Science Advances 10, eadj2219 (2024). DOI:10.1126/sciadv.adj2219