How's the brain learning stuff?

First published on Feb 29, 2024

The idea that the brain learns using something similar to backpropagation has become more accepted lately. Earlier doubts about its biological feasibility have been tempered by recent findings, which suggest a more complex reality and encourage cautious optimism. From my viewpoint, the brain probably does not use backpropagation in its exact form but employs a closely related mechanism, since anything vastly different would require prohibitive computational resources. A comparison comes to mind: reinforcement learning also uses backpropagation, yet the choice of objective function is crucial. Similarly, the brain likely holds significant advantages over our current methods, possibly employing strategies akin to policy gradients or PPO (proximal policy optimization) at an algorithmic level.

At the core of the question lies a fundamental parallel between brains and modern deep networks: both must orchestrate myriad small weight changes to improve behavior. Given how slowly evolution works, the brain has likely settled on approximating the most efficient solution discovered thus far: backpropagation. There are preliminary indications of such an approximation, with backpropagation-trained models increasingly predictive of neural responses. However, similarity in outcome does not confirm similarity in process; a masterful forgery can mimic masterful artistry.

So what possibilities exist for biological implementation? The first conundrum is the contrast between propagating activities and propagating errors. Activities transmit readily through spiking, but propagating signed errors across neurons seems peculiar, since a firing rate cannot be negative. Dendritic segregation offers a potential solution: separate compartments of the same neuron receive the feedforward and feedback streams. Local differences between the resulting activities could then drive learning without any global error transmission. This aligns with evidence that top-down signals actively alter bottom-up processing, improving representations by inducing targeted differences in activity.

Another proposal invokes distinct neuronal populations to convey errors. While neat in the abstract, it finds little physiological support, and coordinating a separate error network with the forward network introduces further obstacles. Backpropagation also assumes that feedback weights exactly mirror the feedforward weights, and here too nature disappoints theoretical symmetry. Feedback's divergent connectivity only crudely resembles feedforward reciprocity, though alignment improves with learning. This asymmetry nonetheless suffices. Surprisingly, fixed random feedback weights can still drive learning in deep networks: the forward weights adapt until the random feedback carries useful error information, so the activity changes it induces end up reducing errors. Precision, while desirable, proves non-essential.
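The random-feedback claim above is easy to try on a toy problem. What follows is a minimal sketch, not a model of cortex: the two-layer tanh network, the random linear regression task, the layer sizes, the learning rate and the names `train`, `feedback_mode` and `B` are all assumptions chosen for illustration. The only difference between the two modes is which matrix carries the output error back to the hidden layer.

```python
import numpy as np

def train(feedback_mode, steps=2000, lr=0.02, seed=0):
    """Train a 2-layer tanh network on a toy linear regression task.

    feedback_mode="backprop": errors travel back through W2.T (exact gradient).
    feedback_mode="random":   errors travel back through a fixed random matrix B.
    Returns (initial_loss, final_loss).
    """
    rng = np.random.default_rng(seed)
    n_in, n_hid, n_out, n_samples = 10, 20, 5, 256  # illustrative sizes

    # Toy data: targets are a fixed random linear function of the inputs.
    X = rng.normal(size=(n_samples, n_in))
    T = X @ (rng.normal(size=(n_in, n_out)) / np.sqrt(n_in))

    W1 = rng.normal(scale=0.1, size=(n_in, n_hid))
    W2 = rng.normal(scale=0.1, size=(n_hid, n_out))
    B = rng.uniform(-0.5, 0.5, size=(n_out, n_hid))  # fixed, never trained

    losses = []
    for _ in range(steps):
        H = np.tanh(X @ W1)          # hidden activity
        Y = H @ W2                   # linear readout
        E = Y - T                    # output error
        losses.append(0.5 * np.mean(np.sum(E**2, axis=1)))

        # The one line that differs between the two learning rules.
        feedback = W2.T if feedback_mode == "backprop" else B
        delta_H = (E @ feedback) * (1.0 - H**2)  # tanh derivative is 1 - H^2

        W2 -= lr * H.T @ E / n_samples
        W1 -= lr * X.T @ delta_H / n_samples
    return losses[0], losses[-1]

if __name__ == "__main__":
    for mode in ("backprop", "random"):
        first, last = train(mode)
        print(f"{mode:8s} loss: {first:.3f} -> {last:.3f}")
```

Running the script prints the initial and final loss for both rules, so the random-feedback run can be compared directly against the exact-gradient run on the same data and the same initial weights.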

Timing is another point in need of reconciliation. Whereas backpropagation demands separate, sequential forward and backward phases, evidence points to simultaneous top-down and bottom-up cortical activity. Could fast neuromodulation impose the required phases through rapid gating? Perhaps, though concrete mechanisms remain speculative. Alternatively, might continual predictive activity in higher regions update lower areas online? Equilibrium propagation, like predictive coding, demonstrates one possible convergence: a network that settles continuously and learns from local activity differences can approximate backpropagation's effect.

Most promising is eschewing error transmission altogether. Algorithms like target propagation achieve deep learning by sending target activities backward instead of errors: information about the desired output is embedded in each layer's activity, where it can be compared locally with what the layer actually did. Avoiding explicit error relay bypasses the associated biological barriers. More broadly, approaches that represent gradients as differences in neural activity exemplify the power of induced targeted differences, whether across time, space, or population.
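To make the idea of propagating targets concrete, here is a minimal sketch in the style of difference target propagation, under loud assumptions: the network is the same kind of toy two-layer tanh regressor, and the "inverse" of the output layer is its exact pseudo-inverse computed with `np.linalg.pinv`, whereas target propagation proper would learn an approximate decoder. The names `hidden_target`, `train` and `nudge` are hypothetical and introduced only for this example.

```python
import numpy as np

def hidden_target(H, Y, Y_target, W2):
    """Hidden activity that would have produced Y_target instead of Y.

    Assumption: the linear readout Y = H @ W2 is inverted with its exact
    pseudo-inverse; a biological or learned decoder would only approximate this.
    """
    G = np.linalg.pinv(W2)               # shape (n_out, n_hid)
    return H + (Y_target - Y) @ G        # shift H by the decoded output change

def train(steps=1000, lr=0.1, nudge=0.5, seed=0):
    """Train with local target comparisons only. Returns (first, last) loss."""
    rng = np.random.default_rng(seed)
    n_in, n_hid, n_out, n_samples = 10, 20, 5, 256  # illustrative sizes

    X = rng.normal(size=(n_samples, n_in))
    T = X @ (rng.normal(size=(n_in, n_out)) / np.sqrt(n_in))
    W1 = rng.normal(scale=0.1, size=(n_in, n_hid))
    W2 = rng.normal(scale=0.1, size=(n_hid, n_out))

    losses = []
    for _ in range(steps):
        H = np.tanh(X @ W1)
        Y = H @ W2
        losses.append(0.5 * np.mean(np.sum((Y - T) ** 2, axis=1)))

        # Output target: the current output nudged part of the way to the label.
        Y_target = Y + nudge * (T - Y)
        # Hidden target: an activity pattern, not an error signal.
        H_target = hidden_target(H, Y, Y_target, W2)

        # Each layer only compares its own activity with its own target.
        W2 -= lr * H.T @ (Y - Y_target) / n_samples
        W1 -= lr * X.T @ ((H - H_target) * (1.0 - H**2)) / n_samples
    return losses[0], losses[-1]

if __name__ == "__main__":
    first, last = train()
    print(f"target-style learning loss: {first:.3f} -> {last:.3f}")
```

No error vector is ever multiplied by forward weights here. The hidden layer receives a target activity and learns from the local mismatch, which is the property the paragraph above singles out as biologically attractive.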