In many real-world situations we must react rapidly to the earliest signs that could warn us about harmful stimuli. If an obstacle blocks our way, we want to avoid it before a painful contact occurs. If we ride a bicycle, we should make corrective steering movements already at small inclination angles of the bicycle, well before we fall down. Spike-time dependent learning rules with a temporally asymmetric learning window provide a hint of how a simple form of predictive coding could be implemented on the neuronal level.
Let us consider a single neuron that receives inputs from, say, twenty presynaptic cells which are stimulated one after the other; cf. Fig. 12.1. Initially, all synapses w_ij have the same weight w_0. The postsynaptic neuron fires two spikes; cf. Fig. 12.1A. All synapses that have been activated before the postsynaptic spike are strengthened, while synapses that have been activated immediately afterwards are depressed; cf. Fig. 12.1C. In subsequent trials the threshold is therefore reached earlier; cf. Fig. 12.1B. After many trials, those presynaptic neurons that fire first have developed strong connections, while the other connections are depressed. Thus a temporally asymmetric learning rule favors connections that can serve as 'earliest predictors' of other spike events (Song et al., 2000; Mehta et al., 2000).
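The mechanism can be illustrated with a small simulation. The following is only a minimal sketch under assumptions that are not taken from the text: postsynaptic potentials decay exponentially with instantaneous rise, only the first postsynaptic spike of each trial is considered (the neuron in Fig. 12.1 fires two), the learning window is a pair of exponentials, and weights are clipped to a fixed range. All parameter values and function names are hypothetical.

```python
import numpy as np

def first_spike_index(w, t_pre, tau_m, theta):
    """Index of the input whose arrival first drives u(t) to threshold, or None.

    With instantaneously rising, exponentially decaying PSPs the potential
    peaks right at an input arrival, so only those instants are checked.
    """
    for k in range(len(w)):
        u = np.sum(w[:k + 1] * np.exp(-(t_pre[k] - t_pre[:k + 1]) / tau_m))
        if u >= theta:
            return k
    return None

def stdp_window(s, a_plus, a_minus, tau):
    """Assumed learning window for s = t_pre - t_post (s <= 0: pre fires first)."""
    return np.where(s <= 0, a_plus * np.exp(s / tau), -a_minus * np.exp(-s / tau))

def run_trials(n_trials=200, n_pre=20, w0=0.3, w_max=0.6, theta=1.5,
               tau_m=10.0, tau_stdp=20.0, a_plus=0.02, a_minus=0.02):
    t_pre = 5.0 + 2.0 * np.arange(n_pre)   # inputs fire one after the other (ms)
    w = np.full(n_pre, w0)                 # all synapses start at the same weight
    spike_times = []
    for _ in range(n_trials):
        k = first_spike_index(w, t_pre, tau_m, theta)
        if k is None:                      # no postsynaptic spike, nothing to learn
            break
        t_post = t_pre[k]
        spike_times.append(t_post)
        dw = stdp_window(t_pre - t_post, a_plus, a_minus, tau_stdp)
        w = np.clip(w + dw, 0.0, w_max)    # keep weights in an assumed range
    return w, spike_times

if __name__ == "__main__":
    w, spike_times = run_trials()
    print("first trial spike time (ms):", spike_times[0])
    print("last trial spike time (ms): ", spike_times[-1])
    print("final weights:", np.round(w, 2))
```

In this sketch the postsynaptic spike time moves toward the start of the input sequence over trials, the synapses of the earliest inputs grow toward the upper bound, and those of late inputs are depressed.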
This theoretical observation predicts a shift in so-called place fields of hippocampal neurons that seems to be in agreement with experimental observations (Mehta et al., 2000). More generally, early predictors play a central role in the theory of conditioning and reinforcement learning (Rescorla and Wagner, 1972; Schultz et al., 1997; Montague et al., 1995; Sutton and Barto, 1981).
Place cells are neurons in rodent hippocampus that are sensitive to the spatial location of the animal in an environment. The sensitive area is called the place field of the cell. If, for example, a rat runs on a linear track from a starting point S to a target point T, this movement would first activate cells with a place field close to S, then those with a place field in the middle of the track, and finally those with a place field close to T; cf. Fig. 12.2. In a simple feedforward model of the hippocampus (Mehta et al., 2000), a first set of place cells is identified with neurons in region CA3 of rat hippocampus. A cell further down the processing stream (i.e., a cell in hippocampal region CA1) receives input from several cells in CA3. If we assume that initially all connections have the same weight, the place field of a CA1 cell is therefore broader than that of a CA3 cell.
During the experiment, the rat moves repeatedly from left to right. During each movement, the same sequence of CA3 cells is activated. This has consequences for the connections from CA3 cells to CA1 cells. Hebbian plasticity with an asymmetric learning window strengthens those connections where the presynaptic neuron fires early in the sequence. Connections from neurons that fire later in the sequence are weakened. As a result the center of the place field of a cell in CA1 is shifted to the left, i.e., against the direction of movement; cf. Fig. 12.2B. The shift of place fields predicted by asymmetric Hebbian learning has been confirmed experimentally (Mehta et al., 2000).
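The predicted shift can be reproduced in a rate-based caricature of the feedforward model. The sketch below is an illustration under stated assumptions, not the model of Mehta et al. (2000): CA3 place fields are Gaussian, the rat runs at constant speed, the CA1 rate is the weighted sum of the CA3 rates, and the weight change per lap is the correlation of pre- and postsynaptic rates weighted by an antisymmetric exponential learning window. All parameter values and names are hypothetical.

```python
import numpy as np

def field_centre(w, nu_pre, x):
    """Centre of mass of the CA1 place field along the track."""
    r = w @ nu_pre                       # CA1 rate at each position
    return np.sum(x * r) / np.sum(r)

def simulate_laps(n_laps=10, eta=50.0, w_max=2.0):
    speed = 50.0                             # cm/s, assumed constant running speed
    centres = np.arange(30.0, 71.0, 5.0)     # CA3 place-field centres (cm)
    sigma = 5.0                              # CA3 place-field width (cm)
    tau = 0.02                               # learning-window time constant (s)
    dt = 0.005                               # time step (s)
    t = np.arange(0.0, 2.0, dt)
    x = speed * t                            # position during a left-to-right run

    # CA3 rates during one run, shape (n_cells, n_time)
    nu_pre = np.exp(-(x[None, :] - centres[:, None]) ** 2 / (2 * sigma ** 2))

    # Antisymmetric window W(s) with s = t_pre - t_post:
    # positive if the presynaptic activity comes first, negative otherwise.
    s = t[:, None] - t[None, :]
    window = -np.sign(s) * np.exp(-np.abs(s) / tau)

    w = np.ones(len(centres))                # equal initial weights
    before = field_centre(w, nu_pre, x)
    for _ in range(n_laps):
        nu_post = w @ nu_pre                 # CA1 rate during the run
        dw = eta * dt ** 2 * (nu_pre @ window @ nu_post)
        w = np.clip(w + dw, 0.0, w_max)
    return before, field_centre(w, nu_pre, x), w

if __name__ == "__main__":
    before, after, w = simulate_laps()
    print("CA1 field centre before (cm):", round(before, 2))
    print("CA1 field centre after (cm): ", round(after, 2))
    print("weights:", np.round(w, 2))
```

Connections from CA3 cells with fields near the start of the track gain weight and those near the end lose weight, so the centre of the CA1 field moves toward the start of the run.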
The shift of responses towards early predictors plays a central role in conditioning. The basic idea is best explained by the paradigm of Pavlovian conditioning (Pavlov, 1927). Tasting or smelling food (stimulus s2) evokes an immediate response r. During the conditioning experiment, a bell (stimulus s1) always rings at a fixed time interval T before the food stimulus. After several repetitions of the experiment, the response occurs already after the first stimulus (s1). Thus the reaction has moved from stimulus s2 to stimulus s1, which reliably predicts s2.
Spike-time dependent plasticity with an asymmetric learning window allows us to replicate this result if the time difference T between the two stimuli is less than the width of the learning window. The mechanism is identical to that of the previous example, with the only difference that the input spikes are now clustered into two groups corresponding to the stimuli s1 and s2; cf. Fig. 12.3.
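The dependence on the interval T can be made concrete with a short calculation. The sketch below assumes an exponential potentiation branch of the learning window and a postsynaptic spike that is triggered by s2 exactly T after the s1 input; the amplitude, the time constant, and the function name are hypothetical.

```python
import math

def s1_potentiation(T_ms, a_plus=0.01, tau_plus_ms=20.0):
    """Weight change of an s1 synapse when the postsynaptic spike follows s1 by T.

    Assumes a potentiation branch W(-T) = a_plus * exp(-T / tau_plus).
    """
    return a_plus * math.exp(-T_ms / tau_plus_ms)

if __name__ == "__main__":
    for T in (10.0, 50.0, 500.0):
        print(f"T = {T:6.1f} ms  ->  weight change {s1_potentiation(T):.3e}")
```

For intervals well beyond the assumed time constant, the change per pairing becomes negligible, so the s1 synapses would not be strengthened by this mechanism alone.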
In behavioral experiments with monkeys, conditioning is possible with time intervals that span several seconds (Schultz et al., 1997) whereas typical learning windows extend over 50-100 milliseconds (Zhang et al., 1998; Markram et al., 1997; Bi and Poo, 1998, 1999; Debanne et al., 1998; Magee and Johnston, 1997). In order to explain conditioning with time windows longer than 100 milliseconds, additional assumptions regarding neuronal architecture and dynamics have to be made; see, e.g., Brown et al. (1999); Suri and Schultz (2001); Fiala et al. (1996). A potential solution could be provided by delayed reverberating loops; cf. Chapter 8.3. As an aside we note that, traditionally, conditioning experiments have been discussed on the level of rate coding. For slowly changing firing rates, spike-time dependent learning rules with an asymmetric learning window yield a differential Hebbian term [cf. Eq. (11.64)] that is proportional to the derivative of the postsynaptic rate, which is the starting point of models of conditioning (Schultz et al., 1997; Montague et al., 1995).
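The origin of the differential Hebbian term can be sketched with a short expansion. This is not a reproduction of Eq. (11.64); it is an illustration under the following assumptions: pre- and postsynaptic spike trains are independent with rates \nu_j^{pre}(t) and \nu_i^{post}(t), the learning window W(s) is written as a function of s = t^{pre} - t^{post}, and the postsynaptic rate changes slowly on the time scale of W. The symbols \bar{W} and \bar{W}_1 are introduced here for convenience.

```latex
\begin{align}
  \frac{d w_{ij}}{dt}
    &= \nu_j^{\rm pre}(t) \int_{-\infty}^{\infty} W(s)\, \nu_i^{\rm post}(t-s)\, ds \\
    &\approx \nu_j^{\rm pre}(t) \int_{-\infty}^{\infty} W(s)
       \left[ \nu_i^{\rm post}(t) - s\, \dot{\nu}_i^{\rm post}(t) \right] ds \\
    &= \bar{W}\, \nu_j^{\rm pre}(t)\, \nu_i^{\rm post}(t)
       \;-\; \bar{W}_1\, \nu_j^{\rm pre}(t)\, \dot{\nu}_i^{\rm post}(t),
  \qquad
  \bar{W} = \int W(s)\, ds, \quad \bar{W}_1 = \int s\, W(s)\, ds .
\end{align}
```

For a window that is positive for s < 0 (presynaptic spike first) and negative for s > 0, \bar{W}_1 is negative, so the second term strengthens a synapse whenever presynaptic activity coincides with a rising postsynaptic rate. Under these assumptions this is the differential Hebbian contribution, proportional to the derivative of the postsynaptic rate.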
© Cambridge University Press