Now in Nature Communications: New Findings Help Resolve the Explore–Exploit Dilemma

In a recent study published in Nature Communications, a team of scientists asked: when making decisions, how does one decide whether to exploit known good options or explore potentially better alternatives?

While scientists have previously worked to uncover how the brain arbitrates between exploration and exploitation using experiments with one or two options, natural environments present a multitude of choices. In natural spaces, better options often cluster together, and a map of the environment is necessary to make good choices. The hippocampus binds reward information into cognitive maps to support navigation and foraging in such spaces, but its contributions to exploration were previously unknown.

Alexandre Dombrovski, MD (Associate Professor of Psychiatry), and Bea Luna, PhD (Staunton Professor of Pediatrics and Psychiatry and Professor of Psychology), examined the roles of hippocampal subregions in resolving the explore/exploit dilemma.

Dr. Dombrovski, the study’s first author, explained: “In humans, the posterior hippocampus contains detailed, granular representations of the environment and is necessary for binding different locations together into a cognitive map. The anterior hippocampus, on the other hand, contains coarser, gist representations and responds strongly to reward and goal locations. Thus, we hypothesized that the posterior hippocampus would respond to local reward information and invigorate exploration. Further, we thought that the anterior hippocampus would drive convergence on the best part of the environment, or exploitation.”

To test their hypothesis, the research team examined the contributions of the posterior hippocampus and the anterior hippocampus to exploration and exploitation using the “clock task,” in which action values vary along an interval marked by time and visuospatial cues. The task encouraged the participants, 70 typically developing adolescents and young adults, to engage in extensive exploration and trial-by-trial learning. Participants completed the task while undergoing a functional magnetic resonance imaging scan.

The study revealed that the hippocampus plays a key role in resolving the explore/exploit dilemma. This process involved dissociated representations of reinforcement along the hippocampal long axis: rapidly evolving, state-wise reward prediction error signals in the posterior hippocampus facilitated exploration, while slowly evolving global value maximum signals in the anterior hippocampus drove the transition to exploitation.

“We navigate decision problems very much like we navigate physical spaces,” said Dr. Dombrovski. “Our posterior hippocampus creates a detailed map of the problem, recording successes and failures. Our anterior hippocampus identifies the best solution and places it on a map.”

Differential reinforcement encoding along the hippocampal long axis helps resolve the explore–exploit dilemma
Dombrovski AY, Luna B, Hallquist MN
Nature Communications 11, 5407 (2020).