1 Introduction
Machine learning is going through remarkable developments powered by deep neural networks (LeCun et al., 2015). Interestingly, the workhorse of deep learning is still the classical backpropagation of errors algorithm (backprop; Rumelhart et al., 1986), which has long been dismissed in neuroscience on the grounds of biological implausibility (Grossberg, 1987; Crick, 1989). Irrespective of such concerns, growing evidence demonstrates that deep neural networks outperform alternative frameworks in accurately reproducing activity patterns observed in the cortex (Lillicrap and Scott, 2013; Yamins et al., 2014; Khaligh-Razavi and Kriegeskorte, 2014; Yamins and DiCarlo, 2016; Kell et al., 2018). Although recent developments have started to bridge the gap between neuroscience and artificial intelligence (Marblestone et al., 2016; Lillicrap et al., 2016; Scellier and Bengio, 2017; Costa et al., 2017; Guerguiev et al., 2017), how the brain could implement a backprop-like algorithm remains an open question.
In neuroscience, understanding how the brain learns to associate different areas (e.g., visual and motor cortices) to successfully drive behaviour is of fundamental importance (Petreanu et al., 2012; Manita et al., 2015; Makino and Komiyama, 2015; Poort et al., 2015; Fu et al., 2015; Pakan et al., 2016; Zmarz and Keller, 2016; Attinger et al., 2017). However, how to correctly modify synapses to achieve this has puzzled neuroscientists for decades. This is often referred to as the synaptic credit assignment problem (Rumelhart et al., 1986; Sutton and Barto, 1998; Roelfsema and van Ooyen, 2005; Friedrich et al., 2011; Bengio, 2014; Lee et al., 2015; Roelfsema and Holtmaat, 2018), for which the backprop algorithm provides an elegant solution.
Here we propose that the prediction errors that drive learning in backprop are encoded at distal dendrites of pyramidal neurons, which receive top-down input from downstream brain areas (we interpret a brain area as being equivalent to a layer in machine learning) (Petreanu et al., 2009; Larkum, 2013). In our model, these errors arise from the inability of lateral input from local interneurons (e.g., somatostatin-expressing; SST) to exactly match the top-down feedback from downstream cortical areas. Learning of bottom-up connections (i.e., feedforward weights) is driven by such error signals through local synaptic plasticity. Therefore, in contrast to previous approaches (Marblestone et al., 2016), in our framework a given neuron is used simultaneously for activity propagation (at the somatic level), error encoding (at distal dendrites) and error propagation to the soma, without the need for separate phases.
We first illustrate the different components of the model. Then, we show analytically that under certain conditions learning in our network approximates backpropagation. Finally, we empirically evaluate the performance of the model on nonlinear regression and recognition tasks.
2 Error-encoding dendritic cortical microcircuits
2.1 Neuron and network model
Building upon previous work (Urbanczik and Senn, 2014), we adopt a simplified multicompartment neuron and describe pyramidal neurons as three-compartment units (schematically depicted in Fig. 1A). These compartments represent the somatic, basal and apical integration zones that characteristically define neocortical pyramidal cells (Spruston, 2008; Larkum, 2013). The dendritic structure of the model is exploited by having bottom-up and top-down synapses converge onto separate dendritic compartments (basal and distal dendrites, respectively). This is a first approximation in line with experimental observations (Spruston, 2008) and reflects the preferred connectivity patterns of cortico-cortical projections (Larkum, 2013).
Consistent with the connectivity of SST interneurons (Urban-Ciecko and Barth, 2016), we also introduce a second population of cells within each hidden layer with both lateral and cross-layer connectivity. Their role is to cancel the top-down input so as to leave only the backpropagated errors as apical dendrite activity. These interneurons are modelled as two-compartment units (depicted in red, Fig. 1A). They are predominantly driven by pyramidal cells within the same layer through weights $\mathbf{W}_{k,k}^{IP}$, and they project back to the apical dendrites of the same-layer pyramidal cells through weights $\mathbf{W}_{k,k}^{PI}$ (Fig. 1A). Additionally, cross-layer feedback onto SST cells originating at the next upper layer provides a weak nudging signal for these interneurons, modelled after Urbanczik and Senn (2014) as a conductance-based somatic input current. We modelled this weak top-down nudging on a one-to-one basis: each interneuron is nudged towards the potential of a corresponding upper-layer pyramidal cell. The one-to-one connectivity imposes a restriction on the model architecture. It is, however, to a certain degree in accordance with recent monosynaptic input mapping experiments showing that SST cells in fact receive top-down projections (Leinweber et al., 2017). According to our proposal, these projections may encode the weak interneuron ‘teaching’ signals from higher to lower brain areas.
The somatic membrane potentials of pyramidal neurons and interneurons evolve in time according to
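$$\frac{d}{dt}\mathbf{u}_k^P(t) = -g_{lk}\,\mathbf{u}_k^P(t) + g_B\left(\mathbf{v}_{B,k}^P(t) - \mathbf{u}_k^P(t)\right) + g_A\left(\mathbf{v}_{A,k}^P(t) - \mathbf{u}_k^P(t)\right) + \sigma\,\boldsymbol{\xi}(t), \tag{1}$$
$$\frac{d}{dt}\mathbf{u}_k^I(t) = -g_{lk}\,\mathbf{u}_k^I(t) + g_D\left(\mathbf{v}_k^I(t) - \mathbf{u}_k^I(t)\right) + \mathbf{i}_k^I(t) + \sigma\,\boldsymbol{\xi}(t), \tag{2}$$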
There is one such pair of dynamical equations for every hidden layer $0 < k < N$. Input layer neurons are indexed by $k = 0$, the $g$'s are fixed conductances, and $\sigma$ controls the amount of injected noise. Basal and apical dendritic compartments of pyramidal cells are coupled to the soma with effective transfer conductances $g_B$ and $g_A$, respectively. Subscript $lk$ is for leak, $A$ for apical, $B$ for basal and $D$ for dendritic; superscript $I$ is for inhibitory and $P$ for pyramidal neuron. Eqs. 1 and 2 describe standard conductance-based voltage integration dynamics, with membrane capacitance set to unity and resting potential set to zero for clarity. Background activity is modelled as a Gaussian white noise input, $\boldsymbol{\xi}$ in the equations above. To keep the exposition brief we use matrix notation, and denote by $\mathbf{u}_k^P$ and $\mathbf{u}_k^I$ the vectors of pyramidal and interneuron somatic voltages, respectively. Both matrices and vectors, assumed column vectors by default, are typed in boldface here and throughout. Dendritic compartmental potentials are denoted by $\mathbf{v}$ and are given in instantaneous form by
$$\mathbf{v}_{B,k}^P(t) = \mathbf{W}_{k,k-1}^{PP}\,\phi\!\left(\mathbf{u}_{k-1}^P(t)\right), \tag{3}$$
$$\mathbf{v}_{A,k}^P(t) = \mathbf{W}_{k,k+1}^{PP}\,\phi\!\left(\mathbf{u}_{k+1}^P(t)\right) + \mathbf{W}_{k,k}^{PI}\,\phi\!\left(\mathbf{u}_k^I(t)\right), \tag{4}$$
where $\phi(\mathbf{u})$ is the neuronal transfer function, which acts componentwise on $\mathbf{u}$.
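To make Eqs. 1–4 concrete, here is a minimal forward-Euler sketch for a single hidden layer. When $\sigma > 0$ it becomes an Euler-Maruyama scheme. It assumes a logistic transfer function and illustrative conductances (`g_lk=0.1`, `g_B=1.0`, `g_A=0.8`, `g_D=1.0`); these are not the values used in our simulations. The interneuron dendritic potential follows Eq. 5, given below. The names `euler_step`, `W_ff` and `W_td` are hypothetical.

```python
import numpy as np

def phi(u):
    # Illustrative transfer function (logistic); any componentwise rate function works here.
    return 1.0 / (1.0 + np.exp(-u))

def euler_step(uP, uI, r_below, r_above, W_ff, W_td, W_IP, W_PI, iI,
               dt=0.1, g_lk=0.1, g_B=1.0, g_A=0.8, g_D=1.0, sigma=0.0, rng=None):
    """One Euler(-Maruyama) step of Eqs. 1-2 for hidden layer k.

    W_ff = W^PP_{k,k-1}, W_td = W^PP_{k,k+1}, W_IP = W^IP_{k,k}, W_PI = W^PI_{k,k}.
    Dendritic potentials are instantaneous functions of the current rates.
    """
    vB = W_ff @ r_below                      # Eq. 3: basal potential (bottom-up input)
    vA = W_td @ r_above + W_PI @ phi(uI)     # Eq. 4: apical potential (top-down + interneurons)
    vI = W_IP @ phi(uP)                      # Eq. 5: interneuron dendrite (lateral input)
    duP = -g_lk * uP + g_B * (vB - uP) + g_A * (vA - uP)   # Eq. 1, deterministic part
    duI = -g_lk * uI + g_D * (vI - uI) + iI                # Eq. 2, deterministic part
    uP_new = uP + dt * duP
    uI_new = uI + dt * duI
    if sigma > 0.0:
        if rng is None:
            rng = np.random.default_rng()
        # White noise enters with a sqrt(dt) scaling in the Euler-Maruyama scheme.
        uP_new = uP_new + sigma * np.sqrt(dt) * rng.standard_normal(uP.shape)
        uI_new = uI_new + sigma * np.sqrt(dt) * rng.standard_normal(uI.shape)
    return uP_new, uI_new

# Toy check: 4 inputs, 3 hidden pyramids, 2 upper-layer pyramids (hence 2 interneurons).
rng = np.random.default_rng(0)
n0, n1, n2 = 4, 3, 2
W_ff = rng.uniform(-1, 1, (n1, n0))
W_td = rng.uniform(-1, 1, (n1, n2))
W_IP = rng.uniform(-1, 1, (n2, n1))
W_PI = np.zeros((n1, n2))        # interneuron-to-pyramid weights switched off for this check
r0 = rng.uniform(0, 1, n0)       # fixed input-layer rates
r2 = rng.uniform(0, 1, n2)       # fixed upper-layer rates
uP, uI = np.zeros(n1), np.zeros(n2)
for _ in range(2000):
    uP, uI = euler_step(uP, uI, r0, r2, W_ff, W_td, W_IP, W_PI, iI=np.zeros(n2))

# With W_PI = 0, i^I = 0 and sigma = 0, Eqs. 1-2 have closed-form fixed points.
g_lk, g_B, g_A, g_D = 0.1, 1.0, 0.8, 1.0
uP_star = (g_B * (W_ff @ r0) + g_A * (W_td @ r2)) / (g_lk + g_B + g_A)
uI_star = g_D * (W_IP @ phi(uP_star)) / (g_lk + g_D)
print(np.allclose(uP, uP_star), np.allclose(uI, uI_star))
```

Switching off the interneuron-to-pyramid weights decouples the two equations, so the integrator can be checked against the analytic steady states.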
For simplicity, we reduce pyramidal output neurons to two-compartment cells: the apical compartment is absent ($g_A = 0$ in Eq. 1) and basal voltages are as defined in Eq. 3. The design can be extended to more complex morphologies, but in the framework of dendritic predictive plasticity two compartments suffice to compare the desired target with the actual prediction. Synapses proximal to the soma of output neurons provide direct external teaching input, incorporated as an additional source of current $\mathbf{i}_N^P$. In practice, one can simply set $\mathbf{i}_N^P = g_{som}\,(\mathbf{u}_N^{trgt} - \mathbf{u}_N^P)$, with some fixed somatic nudging conductance $g_{som}$. This can be modelled closer to biology by explicitly setting the somatic excitatory and inhibitory conductance-based inputs (Urbanczik and Senn, 2014). For a given output neuron, $i_N^P(t) = g_{exc,N}(t)\,(E_{exc} - u_N^P(t)) + g_{inh,N}(t)\,(E_{inh} - u_N^P(t))$. Here $E_{exc}$ and $E_{inh}$ are the excitatory and inhibitory synaptic reversal potentials, respectively, and the inputs are balanced according to $g_{exc,N} = g_{som}\,\frac{u_N^{trgt} - E_{inh}}{E_{exc} - E_{inh}}$ and $g_{inh,N} = -g_{som}\,\frac{u_N^{trgt} - E_{exc}}{E_{exc} - E_{inh}}$. The point at which no current flows, $i_N^P = 0$, defines the target teaching voltage $u_N^{trgt}$ towards which the neuron is nudged.[1]
[1] Note that in biology a target may be represented by an associative signal from the motor cortex to a sensory cortex (Attinger et al., 2017).
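The balanced-conductance construction can be checked numerically. We choose $g_{exc} = g_{som}(u^{trgt} - E_{inh})/(E_{exc} - E_{inh})$ and $g_{inh} = -g_{som}(u^{trgt} - E_{exc})/(E_{exc} - E_{inh})$. With these values the conductance-based current reduces to $g_{som}(u^{trgt} - u)$ and vanishes at the target. The sketch uses pure Python and illustrative values for $g_{som}$, $E_{exc}$ and $E_{inh}$; these are assumptions, not model parameters. The function names are hypothetical.

```python
def balanced_conductances(u_trgt, g_som, E_exc, E_inh):
    """Excitatory/inhibitory conductances whose reversal point is u_trgt.

    Both are nonnegative only when E_inh <= u_trgt <= E_exc.
    """
    g_exc = g_som * (u_trgt - E_inh) / (E_exc - E_inh)
    g_inh = -g_som * (u_trgt - E_exc) / (E_exc - E_inh)
    return g_exc, g_inh

def teaching_current(u, u_trgt, g_som=0.8, E_exc=5.0, E_inh=-1.0):
    """Conductance-based somatic teaching current onto one output neuron."""
    g_exc, g_inh = balanced_conductances(u_trgt, g_som, E_exc, E_inh)
    return g_exc * (E_exc - u) + g_inh * (E_inh - u)

u_trgt = 1.5
for u in (-0.5, 0.0, 1.5, 3.0):
    # Conductance-based current next to the simplified form g_som * (u_trgt - u)
    print(u, teaching_current(u, u_trgt), 0.8 * (u_trgt - u))
```

The two conductances always sum to $g_{som}$. Only their ratio depends on the target, which is why the total current matches the simpler nudging form.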
Interneurons are similarly modelled as two-compartment cells, cf. Eq. 2. Lateral dendritic projections from neighboring pyramidal neurons provide the main source of input as
$$\mathbf{v}_k^I(t) = \mathbf{W}_{k,k}^{IP}\,\phi\!\left(\mathbf{u}_k^P(t)\right), \tag{5}$$
whereas cross-layer, top-down synapses define the teaching current $\mathbf{i}_k^I$. This means that an interneuron at layer $k$ permanently (i.e., when learning or performing a task) receives balanced somatic teaching excitatory and inhibitory input from a pyramidal neuron at layer $k+1$ on a one-to-one basis (as above, but with $\mathbf{u}_{k+1}^P$ as target). With this setting, the interneuron is nudged to follow the corresponding next-layer pyramidal neuron. See SM for detailed parameters.
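Consider the simplified teaching current $i^I = g_{som}(u_{k+1}^P - u^I)$, i.e. the "in practice" form given for output neurons with the paired upper-layer potential as target. Without noise, Eq. 2 then has a closed-form fixed point, $u^I = (g_D v^I + g_{som} u_{k+1}^P)/(g_{lk} + g_D + g_{som})$. This is a leaky weighted average of the interneuron's own dendritic prediction and the upper-layer potential it follows. The sketch holds $v^I$ and the upper-layer potential fixed and uses illustrative conductances; the function names are hypothetical.

```python
def interneuron_fixed_point(vI, u_above, g_lk=0.1, g_D=1.0, g_som=0.2):
    """Steady state of Eq. 2 (sigma = 0) with i^I = g_som * (u_above - u^I)."""
    return (g_D * vI + g_som * u_above) / (g_lk + g_D + g_som)

def relax_interneuron(vI, u_above, u0=0.0, dt=0.05, steps=4000,
                      g_lk=0.1, g_D=1.0, g_som=0.2):
    """Integrate Eq. 2 by forward Euler with the dendritic input vI held fixed."""
    u = u0
    for _ in range(steps):
        i_teach = g_som * (u_above - u)     # one-to-one nudging from the paired layer-(k+1) pyramid
        u += dt * (-g_lk * u + g_D * (vI - u) + i_teach)
    return u

vI, u_above = 0.6, 1.0
print(relax_interneuron(vI, u_above), interneuron_fixed_point(vI, u_above))
# A stronger nudging conductance pulls the interneuron closer to the upper-layer potential.
print(interneuron_fixed_point(vI, u_above, g_som=5.0))
```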
2.2 Synaptic learning rules
The synaptic learning rules we use belong to the class of dendritic predictive plasticity rules (Urbanczik and Senn, 2014; Spicher et al., 2018), which can be expressed in their general form as
$$\frac{dw}{dt} = \eta\,\left(\phi(u) - \phi(\hat{v})\right) r, \tag{6}$$
where $w$ is an individual synaptic weight, $\eta$ is a learning rate, $u$ and $\hat{v}$ denote distinct compartmental potentials, $\phi$ is a rate function, and $r$ is the presynaptic input. Eq. 6 was originally derived with the aim of reducing the prediction error of somatic spiking, where $u$ represents the somatic potential and $\hat{v}$ is a function of the postsynaptic dendritic potential.
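For a single synapse, Eq. 6 is easy to illustrate. The sign of the weight change follows the prediction error $\phi(u) - \phi(\hat{v})$, and repeated updates drive the dendritic prediction towards the somatic potential. The sketch assumes a logistic $\phi$, a dendritic prediction $\hat{v} = w\,r$ and a clamped somatic potential. These assumptions are chosen for illustration only; `predictive_update` is a hypothetical name.

```python
import math

def phi(u):
    # Illustrative rate function (logistic); Eq. 6 holds for any rate function phi.
    return 1.0 / (1.0 + math.exp(-u))

def predictive_update(w, u, v_hat, r, eta=0.5, dt=1.0):
    """One Euler step of Eq. 6: dw/dt = eta * (phi(u) - phi(v_hat)) * r."""
    return w + dt * eta * (phi(u) - phi(v_hat)) * r

# The sign of the change follows the prediction error phi(u) - phi(v_hat).
print(predictive_update(0.0, u=1.0, v_hat=0.0, r=1.0) > 0)    # soma above prediction: potentiation
print(predictive_update(0.0, u=-1.0, v_hat=0.0, r=1.0) < 0)   # soma below prediction: depression
print(predictive_update(0.0, u=1.0, v_hat=0.0, r=0.0) == 0)   # silent presynapse: no change

# Single synapse with dendritic prediction v_hat = w * r: learning drives
# the prediction towards the clamped somatic potential u.
w, u, r = 0.0, 2.0, 1.0
for _ in range(2000):
    w = predictive_update(w, u, v_hat=w * r, r=r)
print(w)
```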
In our model the plasticity rules for the various connection types are:
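$$\frac{d}{dt}\mathbf{W}_{k,k-1}^{PP} = \eta_{k,k-1}^{PP}\left(\phi(\mathbf{u}_k^P) - \phi(\hat{\mathbf{v}}_{B,k}^P)\right)\left(\mathbf{r}_{k-1}^P\right)^T, \tag{7}$$
$$\frac{d}{dt}\mathbf{W}_{k,k}^{IP} = \eta_{k,k}^{IP}\left(\phi(\mathbf{u}_k^I) - \phi(\hat{\mathbf{v}}_k^I)\right)\left(\mathbf{r}_k^P\right)^T, \tag{8}$$
$$\frac{d}{dt}\mathbf{W}_{k,k}^{PI} = \eta_{k,k}^{PI}\left(\mathbf{v}_{rest} - \mathbf{v}_{A,k}^P\right)\left(\mathbf{r}_k^I\right)^T, \tag{9}$$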
where $(\cdot)^T$ denotes vector transpose and $\mathbf{r}_k \equiv \phi(\mathbf{u}_k)$ are the layer-$k$ firing rates. The synaptic weights evolve according to the product of dendritic prediction error and presynaptic rate. They can undergo both potentiation and depression, depending on the sign of the first factor (i.e., the prediction error).
For basal synapses, this prediction error factor amounts to a difference between the postsynaptic rate and a local dendritic estimate that depends on the branch potential. In Eqs. 7 and 8, $\hat{\mathbf{v}}_{B,k}^P = \frac{g_B}{g_{lk} + g_B + g_A}\,\mathbf{v}_{B,k}^P$ and $\hat{\mathbf{v}}_k^I = \frac{g_D}{g_{lk} + g_D}\,\mathbf{v}_k^I$ take into account the dendritic attenuation factors of the different compartments. The plasticity rule (9) of lateral interneuron-to-pyramidal synapses, on the other hand, aims to silence the apical compartment, i.e. to set it to the resting potential $\mathbf{v}_{rest}$, which here and throughout is zero for simplicity. This introduces an attractive state for learning in which the contribution from interneurons balances (or cancels out) the top-down dendritic input. This learning rule of apical-targeting interneuron synapses can be thought of as a dendritic variant of the homeostatic inhibitory plasticity proposed by Vogels et al. (2011) and Luz and Shamir (2012).
In experiments where the top-down connections are plastic, the weights evolve according to
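$$\frac{d}{dt}\mathbf{W}_{k,k+1}^{PP} = \eta_{k,k+1}^{PP}\left(\phi(\mathbf{u}_k^P) - \phi(\hat{\mathbf{v}}_{TD,k}^P)\right)\left(\mathbf{r}_{k+1}^P\right)^T, \tag{10}$$
with $\hat{\mathbf{v}}_{TD,k}^P = \mathbf{W}_{k,k+1}^{PP}\,\mathbf{r}_{k+1}^P$. An implementation of this rule requires a subdivision of the apical compartment into two distal parts. One receives the top-down input (with voltage $\hat{\mathbf{v}}_{TD,k}^P$); the other receives the lateral input from the interneurons (with voltage $\mathbf{v}_{A,k}^P - \hat{\mathbf{v}}_{TD,k}^P$).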
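Eqs. 7–10 translate directly into outer products of a postsynaptic error vector and a presynaptic rate vector. The NumPy sketch below rests on several assumptions:

- a logistic $\phi$, with illustrative conductances and learning rates;
- attenuated predictions $\hat{\mathbf{v}}_{B,k}^P = \frac{g_B}{g_{lk}+g_B+g_A}\mathbf{v}_{B,k}^P$ and $\hat{\mathbf{v}}_k^I = \frac{g_D}{g_{lk}+g_D}\mathbf{v}_k^I$, which are the steady states of Eqs. 1 and 2 with a silent apical compartment and no teaching input;
- for the Eq. 10 check, a single fixed pair of rates, which simplifies the continuous setting.

The function names are hypothetical.

```python
import numpy as np

def phi(u):
    return 1.0 / (1.0 + np.exp(-u))   # illustrative logistic rate function

def microcircuit_updates(uP, uI, vB, vA, vI, r_below,
                         eta_ff=0.01, eta_IP=0.01, eta_PI=0.01,
                         g_lk=0.1, g_B=1.0, g_A=0.8, g_D=1.0, v_rest=0.0):
    """Weight-change rates of Eqs. 7-9 for hidden layer k (multiply by dt to apply)."""
    vB_hat = g_B / (g_lk + g_B + g_A) * vB      # attenuated basal prediction of the soma
    vI_hat = g_D / (g_lk + g_D) * vI            # attenuated interneuron dendritic prediction
    rP, rI = phi(uP), phi(uI)
    dW_ff = eta_ff * np.outer(rP - phi(vB_hat), r_below)   # Eq. 7: W^PP_{k,k-1}
    dW_IP = eta_IP * np.outer(rI - phi(vI_hat), rP)        # Eq. 8: W^IP_{k,k}
    dW_PI = eta_PI * np.outer(v_rest - vA, rI)             # Eq. 9: W^PI_{k,k}
    return dW_ff, dW_IP, dW_PI

def topdown_update(W_td, uP, r_above, eta=0.5):
    """One Euler step of Eq. 10 for W^PP_{k,k+1}."""
    v_td_hat = W_td @ r_above                   # top-down part of the apical potential
    return W_td + eta * np.outer(phi(uP) - phi(v_td_hat), r_above)

rng = np.random.default_rng(1)
r_below = rng.uniform(0, 1, 4)
vB, vI, uI = rng.uniform(-1, 1, 3), rng.uniform(-1, 1, 2), rng.uniform(-1, 1, 2)

# Eqs. 7 and 9: soma at its basal steady state with a silent apical compartment -> no change.
uP = 1.0 / (0.1 + 1.0 + 0.8) * vB
dW_ff, dW_IP, dW_PI = microcircuit_updates(uP, uI, vB, np.zeros(3), vI, r_below)
# Eq. 9: a positive apical potential makes interneuron-to-pyramid weights more negative.
_, _, dW_PI_pos = microcircuit_updates(uP, uI, vB, np.full(3, 0.5), vI, r_below)

# Eq. 10: with fixed rates, top-down weights learn to reconstruct layer-k potentials.
r_above = rng.uniform(0.2, 1.0, 3)
uP_k = rng.uniform(-1.0, 1.0, 4)
W_td = np.zeros((4, 3))
for _ in range(5000):
    W_td = topdown_update(W_td, uP_k, r_above)
print(np.abs(dW_ff).max(), (dW_PI_pos < 0).all(), np.abs(W_td @ r_above - uP_k).max())
```

In this toy setting, Eq. 10 minimizes the mismatch between the top-down prediction $\mathbf{W}_{k,k+1}^{PP}\mathbf{r}_{k+1}^P$ and the layer-$k$ somatic potential. This is the inverse, reconstruction-like role of plastic top-down weights.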
2.3 Comparison to previous work
It has been suggested that error backpropagation could be approximated by an algorithm known as contrastive Hebbian learning, which requires alternating between two learning phases (Ackley et al., 1985). This link between the two algorithms was first established for an unsupervised learning task (Hinton and McClelland, 1988). It was later analyzed (Xie and Seung, 2003) and generalized to broader classes of models (O’Reilly, 1996; Scellier and Bengio, 2017).
The concept of apical dendrites as distinct integration zones, together with the suggestion that this could simplify the implementation of backprop, has been put forward before (Körding and König, 2000, 2001). Our microcircuit design builds upon this view, offering a concrete mechanism that enables apical error encoding. In a similar spirit, two-phase learning recently reappeared in a study that exploits dendrites for deep learning with biological neurons (Guerguiev et al., 2017). In this more recent work, the error that induces plasticity at the forward synapses is the temporal difference between apical dendrite activity with and without the teaching input. This difference is used directly for learning the bottom-up synapses without influencing the somatic activity of the pyramidal cell. In contrast, we postulate that the apical dendrite has an explicit error representation by simultaneously integrating top-down excitation and lateral inhibition. As a consequence, we do not need to postulate separate temporal phases, and our network operates continuously while plasticity at all synapses is always turned on.
Error minimization is an integral part of brain function according to predictive coding theories (Rao and Ballard, 1999; Friston, 2005). Interestingly, recent work has shown that backprop can be mapped onto a predictive coding network architecture (Whittington and Bogacz, 2017), related to the general framework introduced by LeCun (1988). Whittington and Bogacz (2017) suggest a possible network implementation that requires intricate circuitry with appropriately tuned error-representing neurons. In that work, the only plastic synapses are those that connect prediction and error neurons. By contrast, in our model, lateral, bottom-up and top-down connections are all plastic, and errors are directly encoded in dendritic compartments.
3 Results
3.1 Learning in dendritic error networks approximates backprop
In our model, neurons implicitly carry and transmit errors across the network. In the supplementary material, we formally show such propagation of errors for networks in a particular regime, which we term self-predicting. In self-predicting nets, when no external target is provided to output layer neurons, the lateral input from interneurons cancels the internally generated top-down feedback and renders apical dendrites silent. In this case, the output becomes a feedforward function of the input, which can in theory be optimized by conventional backprop. We demonstrate that synaptic plasticity in self-predicting nets approximates the weight changes prescribed by backprop.
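The self-predicting condition can be illustrated in a deliberately simplified setting. We assume instantaneous steady states and ignore the conductance attenuation factors and the interneuron nudging; the exact conditions follow from the SM analysis. Under these assumptions, choosing $\mathbf{W}_{k,k}^{IP} = \mathbf{W}_{k+1,k}^{PP}$ and $\mathbf{W}_{k,k}^{PI} = -\mathbf{W}_{k,k+1}^{PP}$ makes the interneurons reproduce the upper-layer activity. Their inhibition then cancels the top-down input in Eq. 4. A deviation of the upper layer from its prediction (e.g., due to a target) is not cancelled and shows up as apical activity.

```python
import numpy as np

def phi(u):
    return 1.0 / (1.0 + np.exp(-u))   # illustrative logistic rate function

rng = np.random.default_rng(3)
n_k, n_up = 5, 3
W_up = rng.uniform(-1, 1, (n_up, n_k))     # bottom-up W^PP_{k+1,k}
W_td = rng.uniform(-1, 1, (n_k, n_up))     # top-down  W^PP_{k,k+1}
r_k = rng.uniform(0, 1, n_k)               # layer-k pyramidal rates

# Simplified self-predicting weights (attenuation factors ignored in this sketch):
W_IP = W_up.copy()      # interneurons reproduce the upper layer's basal input
W_PI = -W_td.copy()     # interneurons cancel the top-down input

u_up = W_up @ r_k       # upper layer without a target: soma equals basal potential
u_I = W_IP @ r_k        # interneuron potential equals its dendritic prediction
vA = W_td @ phi(u_up) + W_PI @ phi(u_I)     # Eq. 4
print(np.abs(vA).max())                     # apical compartment is silent

# Nudge the upper layer towards some target; the interneurons do not predict this deviation.
u_up_nudged = u_up + 0.1 * (rng.uniform(-1, 1, n_up) - u_up)
vA_err = W_td @ phi(u_up_nudged) + W_PI @ phi(u_I)
print(np.abs(vA_err).max())                 # nonzero: the apical potential now carries an error
```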
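We summarize below the main points of the full analysis (see SM). First, we show that somatic membrane potentials at hidden layer $k$ integrate feedforward predictions (encoded in basal dendritic potentials) with backpropagated errors (encoded in apical dendritic potentials):
$$\mathbf{u}_k^P = \mathbf{u}_k^- + \lambda^{N-k}\,\mathbf{W}_{k,k+1}^{PP}\,\mathbf{D}_{k+1}^-\cdots\mathbf{W}_{N-1,N}^{PP}\,\mathbf{D}_N^-\left(\mathbf{u}_N^{trgt} - \mathbf{u}_N^-\right) + \mathcal{O}(\lambda^{N-k+1}).$$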
Parameter $\lambda$ sets the strength of feedback and teaching versus bottom-up inputs and is assumed to be small to simplify the analysis. The first term is the basal contribution and corresponds to $\mathbf{u}_k^-$. This is the activation computed by a purely feedforward network, obtained by removing lateral and top-down weights from the model (here and below, we use superscript ‘$-$’ to refer to the feedforward model). The second term (of order $\lambda^{N-k}$) is an error that is backpropagated from the output layer down to $k$th-layer hidden neurons. Matrix $\mathbf{D}_k^-$ is a diagonal matrix whose $i$th entry contains the derivative of the neuronal transfer function evaluated at $u_{k,i}^-$.
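Second, we compare model synaptic weight updates for the bottom-up connections to those prescribed by backprop. Output layer updates are exactly equal by construction. For hidden neuron synapses, we obtain
$$\Delta\mathbf{W}_{k,k-1}^{PP} = \eta_{k,k-1}^{PP}\,\lambda^{N-k}\,\mathbf{D}_k^-\,\mathbf{W}_{k,k+1}^{PP}\,\mathbf{D}_{k+1}^-\cdots\mathbf{W}_{N-1,N}^{PP}\,\mathbf{D}_N^-\left(\mathbf{u}_N^{trgt} - \mathbf{u}_N^-\right)\left(\mathbf{r}_{k-1}^-\right)^T + \mathcal{O}(\lambda^{N-k+1}).$$
The factor $\lambda^{N-k}$ can be absorbed in the learning rate. Up to this factor, the plasticity rule becomes equal to the backprop weight change in the weak feedback limit $\lambda \to 0$, provided that the top-down weights are set to the transpose of the corresponding feedforward weights, $\mathbf{W}_{k,k+1}^{PP} = (\mathbf{W}_{k+1,k}^{PP})^T$.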
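The product structure of the hidden-layer update, $\mathbf{D}_k^-\mathbf{W}_{k,k+1}^{PP}\mathbf{D}_{k+1}^-\cdots\mathbf{W}_{N-1,N}^{PP}\mathbf{D}_N^-(\mathbf{u}_N^{trgt} - \mathbf{u}_N^-)(\mathbf{r}_{k-1}^-)^T$, can be evaluated layer by layer. The recursion is $\boldsymbol{\delta}_N = \mathbf{D}_N^-(\mathbf{u}_N^{trgt} - \mathbf{u}_N^-)$ and $\boldsymbol{\delta}_k = \mathbf{D}_k^-\mathbf{W}_{k,k+1}^{PP}\boldsymbol{\delta}_{k+1}$. It has the form of the backprop recursion when the top-down matrices are transposes of the forward weights. With fixed random top-down matrices it becomes the feedback-alignment variant. The sketch assumes a logistic $\phi$ and omits the $\lambda^{N-k}$ and learning-rate factors; `hidden_errors` is a hypothetical name.

```python
import numpy as np

def phi(u):
    return 1.0 / (1.0 + np.exp(-u))   # illustrative logistic transfer function

def dphi(u):
    s = phi(u)
    return s * (1.0 - s)

def hidden_errors(u_ff, W_fb, u_trgt):
    """Error vectors delta_k for k = N, ..., 1 via delta_k = D_k W_fb[k] delta_{k+1}.

    u_ff[k] are feedforward potentials u_k^- (k = 1..N); W_fb[k] is the top-down
    matrix from layer k+1 to layer k. The update of W^PP_{k,k-1} is proportional
    to np.outer(delta[k], r_{k-1}).
    """
    N = max(u_ff)                                  # highest layer index
    delta = {N: dphi(u_ff[N]) * (u_trgt - u_ff[N])}
    for k in range(N - 1, 0, -1):
        delta[k] = dphi(u_ff[k]) * (W_fb[k] @ delta[k + 1])   # one step down the hierarchy
    return delta

rng = np.random.default_rng(4)
sizes = [4, 5, 3, 2]                               # layers 0..N with N = 3
N = len(sizes) - 1
W_ff = {k: rng.uniform(-1, 1, (sizes[k], sizes[k - 1])) for k in range(1, N + 1)}
r = {0: rng.uniform(0, 1, sizes[0])}
u_ff = {}
for k in range(1, N + 1):                          # purely feedforward pass (superscript '-')
    u_ff[k] = W_ff[k] @ r[k - 1]
    r[k] = phi(u_ff[k])
u_trgt = rng.uniform(-1, 1, sizes[N])

# Backprop-like case: top-down weights are transposes of the forward weights.
W_bp = {k: W_ff[k + 1].T for k in range(1, N)}
delta_bp = hidden_errors(u_ff, W_bp, u_trgt)

# Feedback-alignment case: fixed random top-down weights of the same shapes.
W_fa = {k: rng.uniform(-1, 1, (sizes[k], sizes[k + 1])) for k in range(1, N)}
delta_fa = hidden_errors(u_ff, W_fa, u_trgt)

# Explicit matrix product for the lowest hidden layer, k = 1.
D = {k: np.diag(dphi(u_ff[k])) for k in range(1, N + 1)}
explicit = D[1] @ W_bp[1] @ D[2] @ W_bp[2] @ D[3] @ (u_trgt - u_ff[3])
print(np.allclose(delta_bp[1], explicit))
dW_10 = np.outer(delta_bp[1], r[0])                # direction of the W^PP_{1,0} update
```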
In our simulations, top-down weights are handled in one of two ways. They are either set at random and kept fixed, or learned so as to minimize an inverse reconstruction loss. In the first case, the equation above shows that the plasticity model optimizes the predictions according to an approximation of backprop known as feedback alignment (Lillicrap et al., 2016). In the second case, the network implements a form of target propagation (Bengio, 2014; Lee et al., 2015).
3.2 Deviations from self-predictions encode backpropagated errors
To illustrate learning in the model and to confirm our analytical insights, we first study a very simple task: memorizing a single input-output pattern association with only one hidden layer. The task naturally generalizes to multiple memories.
Given a self-predicting network (established by microcircuit plasticity, Fig. S1, see SM for more details), we focus on how prediction errors get propagated backwards when a novel teaching signal is provided to the output layer. This signal is modeled via the activation of additional somatic conductances in output pyramidal neurons. Here we consider a network model with an input, a hidden and an output layer (layers 0, 1 and 2, respectively; Fig. 1A).
When the pyramidal cell activity in the output layer is nudged towards some desired target (Fig. 1B (i)), the bottom-up synapses from the lower layer neurons to the basal dendrites are adapted. This again follows the plasticity rule that implements the dendritic prediction of somatic spiking (see Eq. 7). What these synapses cannot explain away encodes a dendritic error in the pyramidal neurons of the lower layer 1. In fact, the self-predicting microcircuit can only cancel the feedback that is produced by the lower layer activity.
The somatic integration of apical activity induces plasticity at the bottom-up synapses (Eq. 7). As the apical error changes the somatic activity, plasticity of the weights tries to further reduce the error in the output layer. Importantly, the plasticity rule depends only on local information available at the synaptic level: postsynaptic firing and dendritic branch voltage, as well as the presynaptic activity. This is on par with phenomenological models of synaptic plasticity (Sjöström et al., 2001; Clopath et al., 2010; Bono and Clopath, 2017). This learning occurs concurrently with modifications of lateral interneuron weights, which track changes in the output layer. Through the course of learning, the network comes to a point where the novel top-down input is successfully predicted (Fig. 1B,C).
3.3 Network learns to solve a nonlinear regression task
We now test the learning capabilities of the model on a nonlinear regression task, where the goal is to associate sensory input with the output of a separate multilayer network that transforms the same sensory input (Fig. 2A). More precisely, a pyramidal neuron network of dimensions 30-50-10 (and 10 hidden layer interneurons) learns to approximate a random nonlinear function implemented by a held-aside feedforward network of dimensions 30-20-10. One teaching example consists of a randomly drawn input pattern assigned to the corresponding target generated by the teacher network, with two fixed scale factors. Teacher network weights and input pattern entries are sampled from a uniform distribution. We used a soft rectifying nonlinearity as the neuronal transfer function, with a parameter setting that led to neuronal activity in the nonlinear, sparse firing regime.
The network is initialized to a random synaptic weight configuration. Both pyramidal-pyramidal ($\mathbf{W}_{1,0}^{PP}$, $\mathbf{W}_{2,1}^{PP}$, $\mathbf{W}_{1,2}^{PP}$) and pyramidal-interneuron ($\mathbf{W}_{1,1}^{IP}$, $\mathbf{W}_{1,1}^{PI}$) weights are independently drawn from a uniform distribution. The top-down weight matrix $\mathbf{W}_{1,2}^{PP}$ is kept fixed throughout, in the spirit of feedback alignment (Lillicrap et al., 2016). Output layer teaching currents are set so as to nudge $\mathbf{u}_2^P$ towards the teacher-generated $\mathbf{u}_2^{trgt}$. Learning rates were manually chosen to yield the best performance. Some learning rate tuning was required to ensure the microcircuit could track the changes in the bottom-up pyramidal-pyramidal weights, but we did not observe high sensitivity once the correct parameter regime was identified. Error curves are exponential moving averages of the sum of squared errors loss computed after every example on unseen input patterns. Test error performance is measured in a noise-free setting ($\sigma = 0$). To dampen fluctuations, the plasticity induction terms given by Eqs. 7–9 are low-pass filtered with a fixed time constant before being definitively consolidated; synaptic plasticity is kept on throughout. Plasticity and neuron model parameters are as defined above.
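The sketch below shows one way such teacher-generated regression targets could be produced, together with the error-curve smoothing and plasticity low-pass filtering mentioned above. Everything specific here is an illustrative assumption, not the setting used for Fig. 2:

- the softplus form $\gamma\log(1 + e^{\beta(u-\theta)})$ and its parameter values;
- the uniform weight range and the input distribution;
- the two scale factors.

`TeacherNetwork`, `soft_rectifier`, `ema` and `lowpass_step` are hypothetical helpers.

```python
import numpy as np

def soft_rectifier(u, gamma=0.1, beta=1.0, theta=3.0):
    """Assumed softplus-type transfer function gamma * log(1 + exp(beta * (u - theta))).

    np.logaddexp(0, x) computes log(1 + exp(x)) without overflow.
    """
    return gamma * np.logaddexp(0.0, beta * (u - theta))

class TeacherNetwork:
    """Held-aside random 30-20-10 feedforward network that defines regression targets.

    The weight range and the two scale factors are illustrative assumptions.
    """
    def __init__(self, sizes=(30, 20, 10), scale=(4.0, 1.0), seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.uniform(-1, 1, (sizes[1], sizes[0]))
        self.W2 = rng.uniform(-1, 1, (sizes[2], sizes[1]))
        self.scale = scale

    def target(self, r0):
        # Target somatic potential for the student's output layer.
        hidden = soft_rectifier(self.scale[0] * (self.W1 @ r0))
        return self.scale[1] * (self.W2 @ hidden)

def ema(prev, value, alpha=0.05):
    """Exponential moving average, e.g. for smoothing the squared-error curve."""
    return (1.0 - alpha) * prev + alpha * value

def lowpass_step(filtered, raw, dt, tau):
    """One Euler step of a first-order low-pass filter applied to a plasticity induction term."""
    return filtered + dt / tau * (raw - filtered)

rng = np.random.default_rng(1)
teacher = TeacherNetwork()
r0 = rng.uniform(0, 1, 30)
u_trgt = teacher.target(r0)
print(u_trgt.shape)
```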
We let learning occur in continuous time without pauses or alternations in plasticity as input patterns are sequentially presented. This is in contrast to previous learning models that rely on computing activity differences over distinct phases, requiring temporally nonlocal computation or globally coordinated plasticity rule switches (Hinton and McClelland, 1988; O’Reilly, 1996; Xie and Seung, 2003; Scellier and Bengio, 2017; Guerguiev et al., 2017). Furthermore, we relaxed the bottom-up vs. top-down weight symmetry imposed by backprop and kept the top-down weights fixed. Forward weights quickly aligned with the feedback weights (see Fig. S1), in line with the recently discovered feedback alignment phenomenon (Lillicrap et al., 2016). This simplifies the architecture, because top-down and interneuron-to-pyramidal synapses need not be changed. We set the scale of the top-down weights and of the apical and somatic conductances such that feedback and teaching inputs were strong. This tests the model outside the weak feedback regime ($\lambda \to 0$) for which our SM theory was developed. Finally, to test robustness, we injected a weak noise current into every neuron.
Our network was able to learn this harder task (Fig. 2B), performing considerably better than a shallow learner where only hidden-to-output weights were adjusted (Fig. 2C). Useful changes were thus made to hidden layer bottom-up weights. The self-predicting network state emerged during learning from a random initial configuration (see SM; Fig. S1).
3.4 Microcircuit network learns to classify handwritten digits
Next, we turn to the problem of classifying MNIST handwritten digits. We wondered how our model would fare in this benchmark. In particular, we asked whether the prediction errors computed by the interneuron microcircuit would allow learning the weights of a hierarchical nonlinear network with multiple hidden layers. To that end, we trained a deeper, larger 4-layer network (with 784-500-500-10 pyramidal neurons, Fig. 3A) by pairing digit images with teaching inputs that nudged the 10 output neurons towards the correct class pattern. We initialized the network to a random but self-predicting configuration where interneurons cancelled top-down inputs, rendering the apical compartments silent before training started. Top-down and interneuron-to-pyramidal weights were kept fixed.
Here, for computational efficiency, we used simplified network dynamics in which the compartmental potentials are updated in only two steps before synaptic changes are applied. For each presented MNIST image, both pyramidal neurons and interneurons are first initialized to their bottom-up prediction state (3), $\mathbf{u}_k^P = \mathbf{v}_{B,k}^P$ and $\mathbf{u}_k^I = \mathbf{v}_k^I$, starting from layer $k = 1$ up to the top layer $N$. Output layer neurons are then nudged towards their desired target $\mathbf{u}_N^{trgt}$, yielding updated somatic potentials $\mathbf{u}_N^P = (1 - \lambda_N)\,\mathbf{v}_{B,N}^P + \lambda_N\,\mathbf{u}_N^{trgt}$. To obtain the remaining final compartmental potentials, the network is visited in reverse order, proceeding from layer $k = N-1$ down to $k = 1$. For each $k$, interneurons are first updated to include top-down teaching signals, $\mathbf{u}_k^I = (1 - \lambda_I)\,\mathbf{v}_k^I + \lambda_I\,\mathbf{u}_{k+1}^P$. This yields apical compartment potentials according to (4). We then update hidden layer somatic potentials as a convex combination with mixing factor $\lambda_k$, $\mathbf{u}_k^P = (1 - \lambda_k)\,\mathbf{v}_{B,k}^P + \lambda_k\,\mathbf{v}_{A,k}^P$. The convex combination factors introduced above are directly related to neuron model parameters as conductance ratios. Synaptic weights are then updated according to Eqs. 7–10. These simplified dynamics approximate the full recurrent network relaxation in the deterministic setting $\sigma \to 0$, with the approximation improving as the top-down dendritic coupling is decreased, $g_A \to 0$.
We train the models on the standard MNIST handwritten image database, further splitting the training set into 55000 training and 5000 validation examples. The reported test error curves are computed on the 10000 held-aside test images. The four-layer network shown in Fig. 3 is initialized in a self-predicting state with appropriately scaled initial weight matrices. For our MNIST networks, we used relatively weak feedback weights and apical and somatic conductances (see SM) to justify the simplified approximate dynamics described above. We nonetheless found that performance did not appreciably degrade with larger values. To speed up training we use a minibatch strategy on every learning rule, whereby weight changes are averaged across 10 images before being applied. We take the neuronal transfer function to be a logistic function, and include a learnable threshold on each neuron, modelled as an additional input fixed at unity with a plastic weight. Desired target class vectors are one-hot coded. During testing, the output is determined by picking the class label corresponding to the neuron with the highest firing rate. We found the model to be relatively robust to learning rate tuning on the MNIST task. The exception was the rescaling by the inverse mixing factor needed to compensate for teaching signal dilution (see SM for the exact parameters).
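The bookkeeping around the MNIST runs is sketched below: threshold input, one-hot targets, argmax readout and minibatch averaging of weight changes. The target levels `low` and `high` and the exact form of the learning-rate rescaling are assumptions. All function names are hypothetical.

```python
import numpy as np

def with_threshold_input(r):
    """Append a constant unit input; its plastic weight acts as a learnable threshold."""
    return np.append(r, 1.0)

def one_hot(label, n_classes=10, low=0.0, high=1.0):
    """Target class vector; the low/high levels are illustrative."""
    t = np.full(n_classes, low)
    t[label] = high
    return t

def predicted_class(r_out):
    """Class label of the output neuron with the highest firing rate."""
    return int(np.argmax(r_out))

def minibatch_step(W, per_example_updates):
    """Average weight changes over a minibatch (10 images in the text) before applying them."""
    return W + np.mean(per_example_updates, axis=0)

def rescaled_rate(eta, mixing_factor):
    """Assumed form of the inverse-mixing-factor rescaling that offsets teaching-signal dilution."""
    return eta / mixing_factor

W = np.zeros((10, 785))                                # 784 pixels + 1 threshold input
updates = [np.full((10, 785), 0.1 * i) for i in range(10)]
W = minibatch_step(W, updates)
print(W[0, 0], predicted_class(np.array([0.1, 0.7, 0.2])), one_hot(3))
```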
The network was able to achieve a test error of 1.96% (Fig. 3B). This figure is not overly far from the reference mark of non-convolutional artificial neural networks optimized with backprop (1.53%) and is comparable to recently published results, which lie within the range 1.6–2.4% (Lee et al., 2015; Lillicrap et al., 2016; Nøkland, 2016). The performance of our model also compares favorably to the 3.2% test error reported by Guerguiev et al. (2017) for a two-hidden-layer network. This was possible thanks to feedback alignment dynamics, despite the asymmetry of forward and top-down weights, which is at odds with exact backprop. Apical compartment voltages remained approximately silent when output nudging was turned off (data not shown). This reflects the maintenance of a self-predicting state throughout learning, which enabled the propagation of errors through the network. We also wanted to show that the microcircuit was able to propagate errors to deeper hidden layers, and that the task was not solved by useful changes to the weights onto the topmost hidden layer alone. To this end, we reran the experiment while keeping fixed the pyramidal-pyramidal weights connecting the two hidden layers. The network still learned the dataset and achieved a test error of 2.11%.
As top-down weights are likely plastic in cortex, we also trained a one-hidden-layer (784-1000-10) network where top-down weights were learned on a slow timescale according to learning rule (10). This inverse learning scheme is closely related to target propagation (Bengio, 2014; Lee et al., 2015). Such learning could play a role in perceptual denoising, pattern completion and disambiguation, and could boost alignment beyond that achieved by pure feedback alignment (Bengio, 2014). We started from random initial conditions and kept all weights plastic (bottom-up, lateral and top-down) throughout. Under these conditions, our network achieved a test error of 2.48% on MNIST. Once more, useful changes were made to hidden synapses, even though the microcircuit had to track changes in both the bottom-up and the top-down pathways.
4 Conclusions
Our work makes several predictions across different levels of investigation. Here we briefly highlight some of these predictions and related experimental observations. The most fundamental feature of the model is that distal dendrites encode error signals that instruct learning of lateral and bottom-up connections. Monitoring such dendritic signals during learning is challenging. However, recent experimental evidence suggests that prediction errors in mouse visual cortex arise from a failure to locally inhibit motor feedback (Zmarz and Keller, 2016; Attinger et al., 2017), consistent with our model. Interestingly, the plasticity rule for apical dendritic inhibition, which is central to error encoding in the model, received support from another recent experimental study (Chiu et al., 2018).
A further implication of our model is that prediction errors occurring at a higher-order cortical area would also imply co-occurring prediction errors at earlier areas. Recent experimental observations in the macaque face-processing hierarchy support this (Schwiedrzik and Freiwald, 2017).
Here we have focused on the role of a specific interneuron type (SST) as a feedback-specific interneuron. There are many more interneuron types that we do not consider in our framework. One such type is the PV (parvalbumin-positive) cell, which has been postulated to mediate a somatic excitation-inhibition balance (Vogels et al., 2011; Froemke, 2015) and competition (Masquelier and Thorpe, 2007; Nessler et al., 2013). These functions could in principle be combined with our framework, in that PV interneurons may be involved in representing another type of prediction error (e.g., generative errors).
Humans have the ability to perform fast (e.g., one-shot) learning, whereas neural networks trained by backpropagation of error (or approximations thereof, like ours) require iterating over many training examples to learn. This is an important open problem that stands in the way of understanding the neuronal basis of intelligence. One possibility where our model naturally fits is to consider multiple subsystems (for example, the neocortex and the hippocampus) that transfer knowledge to each other and learn at different rates (McClelland et al., 1995; Kumaran et al., 2016).
Overall, our work provides a new view on how the brain may solve the credit assignment problem for time-continuous input streams by approximating the backpropagation algorithm, and brings together many puzzling features of cortical microcircuits.
Acknowledgements
The authors would like to thank Timothy P. Lillicrap, Blake Richards, Benjamin Scellier and Mihai A. Petrovici for helpful discussions. WS thanks Matthew Larkum for many inspiring discussions on dendritic processing. JS thanks Elena Kreutzer, Pascal Leimer and Martin T. Wiechert for valuable feedback and critical reading of the manuscript.
This work has been supported by the Swiss National Science Foundation (grant 310030L156863 of WS), the European Union’s Horizon 2020 Framework Programme for Research and Innovation under the Specific Grant Agreement No. 785907 (Human Brain Project), NSERC, CIFAR, and Canada Research Chairs.
References
Ackley et al. (1985) Ackley, D. H., Hinton, G. E., and Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9(1):147–169.
Attinger et al. (2017) Attinger, A., Wang, B., and Keller, G. B. (2017). Visuomotor coupling shapes the functional development of mouse visual cortex. Cell, 169(7):1291–1302.e14.
 Bengio (2014) Bengio, Y. (2014). How autoencoders could provide credit assignment in deep networks via target propagation. arXiv:1407.7906
 Bono and Clopath (2017) Bono, J. and Clopath, C. (2017). Modeling somatic and dendritic spike mediated plasticity at the single neuron and network level. Nature Communications, 8(1):706.
 Bottou (1998) Bottou, L. (1998). Online algorithms and stochastic approximations. In Saad, D., editor, Online Learning and Neural Networks. Cambridge University Press, Cambridge, UK.
Chiu et al. (2018) Chiu, C. Q., Martenson, J. S., Yamazaki, M., Natsume, R., Sakimura, K., Tomita, S., Tavalin, S. J., and Higley, M. J. (2018). Input-specific NMDAR-dependent potentiation of dendritic GABAergic inhibition. Neuron, 97(2):368–377.
Clopath et al. (2010) Clopath, C., Büsing, L., Vasilaki, E., and Gerstner, W. (2010). Connectivity reflects coding: a model of voltage-based STDP with homeostasis. Nature Neuroscience, 13(3):344–352.

Costa et al. (2017) Costa, R. P., Assael, Y. M., Shillingford, B., de Freitas, N., and Vogels, T. P. (2017). Cortical microcircuits as gated-recurrent neural networks. In Advances in Neural Information Processing Systems, pages 271–282.
Crick (1989) Crick, F. (1989). The recent excitement about neural networks. Nature, 337:129–132.
 Dorrn et al. (2010) Dorrn, A. L., Yuan, K., Barker, A. J., Schreiner, C. E., and Froemke, R. C. (2010). Developmental sensory experience balances cortical excitation and inhibition. Nature, 465(7300):932–936.
 Friedrich et al. (2011) Friedrich, J., Urbanczik, R., and Senn, W. (2011). Spatiotemporal credit assignment in neuronal population learning. PLOS Computational Biology, 7(6):e1002092.
 Friston (2005) Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 360(1456):815–836.
Froemke (2015) Froemke, R. C. (2015). Plasticity of cortical excitatory-inhibitory balance. Annual Review of Neuroscience, 38(1):195–219.
Fu et al. (2015) Fu, Y., Kaneko, M., Tang, Y., Alvarez-Buylla, A., and Stryker, M. P. (2015). A cortical disinhibitory circuit for enhancing adult plasticity. eLife, 4:e05558.
 Grossberg (1987) Grossberg, S. (1987). Competitive learning: From interactive activation to adaptive resonance. Cognitive Science, 11(1):23–63.
 Guerguiev et al. (2017) Guerguiev, J., Lillicrap, T. P., and Richards, B. A. (2017). Towards deep learning with segregated dendrites. eLife, 6:e22901.
 Hinton and McClelland (1988) Hinton, G. E. and McClelland, J. L. (1988). Learning representations by recirculation. In Anderson, D. Z., editor, Neural Information Processing Systems, pages 358–366. American Institute of Physics.
 Kell et al. (2018) Kell, A. J., Yamins, D. L., Shook, E. N., Norman-Haignere, S. V., and McDermott, J. H. (2018). A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron.
 Khaligh-Razavi and Kriegeskorte (2014) Khaligh-Razavi, S.-M. and Kriegeskorte, N. (2014). Deep supervised, but not unsupervised, models may explain IT cortical representation. PLOS Computational Biology, 10(11):1–29.
 Körding and König (2000) Körding, K. P. and König, P. (2000). Learning with two sites of synaptic integration. Network: Comput. Neural Syst., 11:1–15.
 Körding and König (2001) Körding, K. P. and König, P. (2001). Supervised and unsupervised learning with two sites of synaptic integration. Journal of Computational Neuroscience, 11:207–215.
 Kumaran et al. (2016) Kumaran, D., Hassabis, D., and McClelland, J. L. (2016). What learning systems do intelligent agents need? Complementary learning systems theory updated. Trends in Cognitive Sciences, 20(7):512–534.
 Larkum (2013) Larkum, M. (2013). A cellular mechanism for cortical associations: an organizing principle for the cerebral cortex. Trends in Neurosciences, 36(3):141–151.
 LeCun (1988) LeCun, Y. (1988). A theoretical framework for backpropagation. In Touretzky, D., Hinton, G., and Sejnowski, T., editors, Proceedings of the 1988 Connectionist Models Summer School, pages 21–28. Morgan Kaufmann, Pittsburg, PA.
 LeCun et al. (2015) LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521(7553):436–444.
 Lee et al. (2015) Lee, D.-H., Zhang, S., Fischer, A., and Bengio, Y. (2015). Difference target propagation. In Machine Learning and Knowledge Discovery in Databases, pages 498–515. Springer.
 Leinweber et al. (2017) Leinweber, M., Ward, D. R., Sobczak, J. M., Attinger, A., and Keller, G. B. (2017). A Sensorimotor Circuit in Mouse Cortex for Visual Flow Predictions. Neuron, 95(6):1420–1432.e5.
 Lillicrap et al. (2016) Lillicrap, T. P., Cownden, D., Tweed, D. B., and Akerman, C. J. (2016). Random synaptic feedback weights support error backpropagation for deep learning. Nature Communications, 7:13276.
 Lillicrap and Scott (2013) Lillicrap, T. P. and Scott, S. H. (2013). Preference distributions of primary motor cortex neurons reflect control solutions optimized for limb biomechanics. Neuron, 77(1):168–179.
 Luz and Shamir (2012) Luz, Y. and Shamir, M. (2012). Balancing feedforward excitation and inhibition via Hebbian inhibitory synaptic plasticity. PLOS Computational Biology, 8(1):e1002334.
 Makino and Komiyama (2015) Makino, H. and Komiyama, T. (2015). Learning enhances the relative impact of top-down processing in the visual cortex. Nature Neuroscience, 18(8):1116–1122.
 Manita et al. (2015) Manita, S., Suzuki, T., Homma, C., Matsumoto, T., Odagawa, M., Yamada, K., Ota, K., Matsubara, C., Inutsuka, A., Sato, M., et al. (2015). A top-down cortical circuit for accurate sensory perception. Neuron, 86(5):1304–1316.
 Marblestone et al. (2016) Marblestone, A. H., Wayne, G., and Kording, K. P. (2016). Toward an integration of deep learning and neuroscience. Frontiers in Computational Neuroscience, 10:94.
 Masquelier and Thorpe (2007) Masquelier, T. and Thorpe, S. (2007). Unsupervised learning of visual features through spike timing dependent plasticity. PLOS Computational Biology, 3.
 McClelland et al. (1995) McClelland, J. L., McNaughton, B. L., and O’Reilly, R. C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102(3):419.
 Nessler et al. (2013) Nessler, B., Pfeiffer, M., Buesing, L., and Maass, W. (2013). Bayesian computation emerges in generic cortical microcircuits through spike-timing-dependent plasticity. PLOS Computational Biology, 9(4):e1003037.
 Nøkland (2016) Nøkland, A. (2016). Direct feedback alignment provides learning in deep neural networks. In Advances in Neural Information Processing Systems, pages 1037–1045.
 O’Reilly (1996) O’Reilly, R. C. (1996). Biologically plausible error-driven learning using local activation differences: The generalized recirculation algorithm. Neural Computation, 8(5):895–938.
 Pakan et al. (2016) Pakan, J. M., Lowe, S. C., Dylda, E., Keemink, S. W., Currie, S. P., Coutts, C. A., Rochefort, N. L., and Mrsic-Flogel, T. D. (2016). Behavioral-state modulation of inhibition is context-dependent and cell type specific in mouse visual cortex. eLife, 5:e14985.
 Petreanu et al. (2012) Petreanu, L., Gutnisky, D. A., Huber, D., Xu, N.-l., O’Connor, D. H., Tian, L., Looger, L., and Svoboda, K. (2012). Activity in motor-sensory projections reveals distributed coding in somatosensation. Nature, 489(7415):299–303.
 Petreanu et al. (2009) Petreanu, L., Mao, T., Sternson, S. M., and Svoboda, K. (2009). The subcellular organization of neocortical excitatory connections. Nature, 457(7233):1142–1145.
 Poort et al. (2015) Poort, J., Khan, A. G., Pachitariu, M., Nemri, A., Orsolic, I., Krupic, J., Bauza, M., Sahani, M., Keller, G. B., Mrsic-Flogel, T. D., and Hofer, S. B. (2015). Learning enhances sensory and multiple non-sensory representations in primary visual cortex. Neuron, 86(6):1478–1490.
 Rao and Ballard (1999) Rao, R. P. and Ballard, D. H. (1999). Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1):79–87.
 Roelfsema and Holtmaat (2018) Roelfsema, P. R. and Holtmaat, A. (2018). Control of synaptic plasticity in deep cortical networks. Nature Reviews Neuroscience, 19(3):166.

 Roelfsema and van Ooyen (2005) Roelfsema, P. R. and van Ooyen, A. (2005). Attention-gated reinforcement learning of internal representations for classification. Neural Computation, 17(10):2176–2214.
 Rumelhart et al. (1986) Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323:533–536.

 Scellier and Bengio (2017) Scellier, B. and Bengio, Y. (2017). Equilibrium propagation: Bridging the gap between energy-based models and backpropagation. Frontiers in Computational Neuroscience, 11:24.
 Schwiedrzik and Freiwald (2017) Schwiedrzik, C. M. and Freiwald, W. A. (2017). High-level prediction signals in a low-level area of the macaque face-processing hierarchy. Neuron, 96(1):89–97.e4.
 Sjöström et al. (2001) Sjöström, P. J., Turrigiano, G. G., and Nelson, S. B. (2001). Rate, Timing, and Cooperativity Jointly Determine Cortical Synaptic Plasticity. Neuron, 32(6):1149–1164.
 Spicher et al. (2018) Spicher, D., Clopath, C., and Senn, W. (2018). Predictive plasticity in dendrites: from a computational principle to experimental data (in preparation).
 Spruston (2008) Spruston, N. (2008). Pyramidal neurons: dendritic structure and synaptic integration. Nature Reviews Neuroscience, 9(3):206–221.
 Sutton and Barto (1998) Sutton, R. S. and Barto, A. G. (1998). Reinforcement learning: An introduction, volume 1. MIT Press, Cambridge, Mass.
 Urban-Ciecko and Barth (2016) Urban-Ciecko, J. and Barth, A. L. (2016). Somatostatin-expressing neurons in cortical networks. Nature Reviews Neuroscience, 17(7):401–409.
 Urbanczik and Senn (2014) Urbanczik, R. and Senn, W. (2014). Learning by the dendritic prediction of somatic spiking. Neuron, 81(3):521–528.
 Vogels et al. (2011) Vogels, T. P., Sprekeler, H., Zenke, F., Clopath, C., and Gerstner, W. (2011). Inhibitory plasticity balances excitation and inhibition in sensory pathways and memory networks. Science, 334(6062):1569–1573.
 Whittington and Bogacz (2017) Whittington, J. C. R. and Bogacz, R. (2017). An approximation of the error backpropagation algorithm in a predictive coding network with local Hebbian synaptic plasticity. Neural Computation, 29(5):1229–1262.
 Xie and Seung (2003) Xie, X. and Seung, H. S. (2003). Equivalence of backpropagation and contrastive Hebbian learning in a layered network. Neural Computation, 15(2):441–454.
 Yamins and DiCarlo (2016) Yamins, D. L. and DiCarlo, J. J. (2016). Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19(3):356–365.
 Yamins et al. (2014) Yamins, D. L., Hong, H., Cadieu, C. F., Solomon, E. A., Seibert, D., and DiCarlo, J. J. (2014). Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111(23):8619–8624.
 Zmarz and Keller (2016) Zmarz, P. and Keller, G. B. (2016). Mismatch receptive fields in mouse visual cortex. Neuron, 92(4):766–772.
Supplementary Material: Dendritic cortical microcircuits approximate the backpropagation algorithm
The dendritic cortical circuit learns to predict self-generated top-down input
The microcircuit model introduced in the main text is key to encoding and backpropagating errors across the network. Here, we illustrate how synaptic plasticity of lateral interneuron connections establishes a network regime, which we term self-predicting, in which lateral input cancels the self-generated top-down feedback, effectively silencing apical dendrites. SST cells therefore act as functionally inhibitory units, and we henceforth refer to them simply as interneurons. Crucially, when the circuit is in this self-predicting state, presenting a novel external signal at the output layer gives rise to top-down activity that cannot be explained away by the interneuron circuit. Below we show that these apical mismatches between top-down and lateral input constitute backpropagated, neuron-specific errors that drive plasticity on the forward weights to the hidden pyramidal neurons.
Learning to predict the feedback signals involves adapting both the weights onto and the weights from the lateral interneuron circuit. Consider a network that is driven by a succession of sensory input patterns (Fig. S1B, bottom row). The task of cancelling the feedback input is divided between the weights from pyramidal cells to interneurons and the weights from interneurons to pyramidal cells.
First, driven by the somatic teaching feedback, learning of the pyramidal-to-interneuron weights leads interneurons to better reproduce the activity of the respective higher layer (Fig. S1B (i)). A failure to reproduce upper-layer activity generates an internal prediction error at the dendrites of the interneurons, which triggers synaptic plasticity (as defined by Eq. 8) that corrects the erroneous dendritic prediction and eventually leads the lower-layer interneurons to faithfully track the upper-layer activity (Fig. S1B (ii)). The mathematical analysis (see section below, Eq. 37) shows that the plasticity rule (8) makes the inhibitory population compute the same function of the pyramidal activity in its own layer as the pyramidal neurons of the next layer up. Thus, the interneurons learn to mimic the pyramidal neurons of the layer above (Fig. S1Ci).
Second, as the interneurons mirror upper-layer activity, interneuron-to-pyramidal synapses within the same layer (Eq. 9) learn to cancel the top-down input to the apical dendrite (Fig. S1Cii), independently of the actual input stimulus that drives the network. In doing so, the interneuron-to-pyramidal weights come to mirror the top-down weights onto the lower-layer pyramidal neurons. Learning of the weights onto and from the interneurons proceeds in parallel: as the interneurons begin to predict the activity of pyramidal cells in the layer above, plasticity at interneuron-to-pyramidal synapses (Eq. 9) can find a synaptic weight configuration that precisely cancels the top-down feedback (see also Eq. 39 below). At this stage, every pattern of activity generated by the hidden layers of the network is explained by the lateral circuitry (Fig. S1C (ii)). Importantly, once learning of the lateral interneuron weights has converged, the apical input cancellation occurs irrespective of the actual bottom-up sensory input. Interneuron synaptic plasticity therefore leads the network to a self-predicting state.
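The two-stage learning of the lateral weights can be illustrated with a short Python sketch. It is a minimal sketch under simplifying assumptions of our own, not the simulation behind Fig. S1: neurons are linear rate units, plasticity is applied once per input pattern at steady state, the interneuron soma is taken as a convex mix (mixing factor `lam`) of its dendritic prediction and the upper-layer teaching signal, and the weight matrices `W_UP` and `W_DOWN`, the helper names and all learning rates are hypothetical. The pyramidal-to-interneuron update follows the somatic-minus-dendritic form of rule (8), and the interneuron-to-pyramidal update drives the apical potential back to rest, in the spirit of rule (9).

```python
import random

def matvec(W, x):
    """Matrix-vector product for nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def add_outer(W, err, x, lr):
    """In-place update W += lr * err x^T (postsynaptic error times presynaptic rate)."""
    for i, e in enumerate(err):
        for j, xj in enumerate(x):
            W[i][j] += lr * e * xj

# Hypothetical weights: 3 lower-layer pyramidal cells, 2 upper-layer pyramidal cells.
W_UP = [[1.0, -0.5, 0.3], [0.2, 0.8, -0.6]]       # bottom-up weights, lower -> upper layer
W_DOWN = [[0.5, -0.3], [0.1, 0.7], [-0.4, 0.2]]   # top-down weights, upper -> lower apical

def apical_potential(W_pi, r_up, r_int):
    """Apical potential relative to rest: top-down input plus lateral interneuron input."""
    return [a + b for a, b in zip(matvec(W_DOWN, r_up), matvec(W_pi, r_int))]

def train_lateral(steps=5000, lam=0.5, eta_ip=0.5, eta_pi=0.1, seed=0):
    rng = random.Random(seed)
    W_ip = [[0.0] * 3 for _ in range(2)]   # pyramidal -> interneuron weights
    W_pi = [[0.0] * 2 for _ in range(3)]   # interneuron -> pyramidal (apical) weights
    for _ in range(steps):
        r_k = [rng.random() for _ in range(3)]   # lower-layer rates for one input pattern
        r_up = matvec(W_UP, r_k)                 # upper-layer rates (linear units)
        v_int = matvec(W_ip, r_k)                # interneuron dendritic prediction
        # nudged interneuron soma: convex mix of own prediction and upper-layer teaching signal
        r_int = [(1 - lam) * v + lam * t for v, t in zip(v_int, r_up)]
        # pyramidal-to-interneuron rule: somatic activity minus dendritic prediction
        add_outer(W_ip, [u - v for u, v in zip(r_int, v_int)], r_k, eta_ip)
        # interneuron-to-pyramidal rule: push the apical potential back to rest (0)
        v_apical = apical_potential(W_pi, r_up, r_int)
        add_outer(W_pi, [-v for v in v_apical], r_int, eta_pi)
    return W_ip, W_pi

W_ip, W_pi = train_lateral()
```

After training, `W_ip` approaches `W_UP` and `W_pi` approaches the negative of `W_DOWN`, so the apical potential stays near rest for any input pattern, which is the self-predicting state described above.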
We propose that this state could emerge during development, consistent with experimental findings (Dorrn et al., 2010; Froemke, 2015). Starting from a cross-layer self-predicting configuration helps speed up the learning of specific tasks, but is not essential. Indeed, we were able to train a nonlinear regression model (cf. Fig. 2) and an MNIST network starting from random initial conditions. Appropriate tuning of learning rates quickly led the network to a self-predicting state, which unlocked learning of the task (see Fig. S2).
Supplementary data
Below we detail the model parameters used to generate the figures presented in the paper.
Fig. S1 details. The parameters for the compartmental model neuron were: , , . Interneuron somatic teaching conductances were balanced to yield overall nudging strength . Initial weight matrix entries were independently drawn from a uniform distribution . We used a soft rectifying transfer function . We chose background activity levels of . The learning rates were set as and . Input patterns were smoothly transitioned by low-pass filtering with time constant . A transition between patterns was triggered every 100 ms. Weight changes were low-pass filtered with time constant . The dynamical equations were solved using Euler’s method with a time step of 0.1 ms, which resulted in 1000 integration time steps per pattern.
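The numerical scheme described above can be illustrated with a small sketch. It assumes a softplus-type form gamma * log(1 + exp(beta * (u - theta))) for the soft rectifier and a first-order low-pass filter for the pattern transitions; the values of `gamma`, `beta`, `theta` and the filter time constant `tau` are placeholders, not the values used for Fig. S1, and the function names are hypothetical. It also makes the step count explicit: 100 ms per pattern at 0.1 ms per Euler step gives 1000 steps.

```python
import math

def soft_rectifier(u, gamma=1.0, beta=1.0, theta=0.0):
    """Softplus-type rectifier gamma * log(1 + exp(beta * (u - theta))).

    gamma, beta and theta are placeholder values, not those used for Fig. S1."""
    x = beta * (u - theta)
    # numerically stable evaluation of log(1 + exp(x))
    return gamma * (max(x, 0.0) + math.log1p(math.exp(-abs(x))))

def lowpass_euler(patterns, tau, dt, steps_per_pattern):
    """Forward-Euler integration of tau * dx/dt = p - x, holding each pattern p in turn."""
    x = 0.0
    trace = []
    for p in patterns:
        for _ in range(steps_per_pattern):
            x += dt / tau * (p - x)
            trace.append(x)
    return trace

DT = 0.1                   # Euler time step in ms
PATTERN_DURATION = 100.0   # ms between pattern transitions
STEPS_PER_PATTERN = round(PATTERN_DURATION / DT)   # 1000 steps per pattern

# hypothetical 3 ms filter time constant: the filtered input settles well within one pattern
trace = lowpass_euler([1.0, 0.0], tau=3.0, dt=DT, steps_per_pattern=STEPS_PER_PATTERN)
```

Because the filter time constant is much shorter than the 100 ms pattern duration, each filtered pattern is essentially reached long before the next transition, so plasticity mostly sees the network near a steady state.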
Fig. 2 details. Initial forward and pyramidal-interneuron weights were drawn independently from a uniform distribution . The network learned under a background noise level of . The learning rates were , , , . The top-down weight matrix was kept fixed, so the model relied on a feedback alignment mechanism to learn. Remaining parameters were as used for Fig. S1.
Fig. 3 details. We chose mixing factors and . Forward learning rates were , , . Lateral learning rates were and . Initial forward weights were drawn at random from a uniform distribution , and the remaining weights from .
Supplementary analysis
In this supplementary note we present a set of mathematical results concerning the network and plasticity model described in the main text.
To proceed analytically we make a number of simplifying assumptions. Unless noted otherwise, we study the network in a deterministic setting and consider the limiting case where lateral microcircuit synaptic weights match the corresponding forward weights:
(11)  
(12) 
The particular choice of proportionality factors, which depend on the neuron model parameters, is motivated below. Under the above configuration, the network becomes self-predicting.
To formally relate the encoding and propagation of errors implemented by the inhibitory microcircuit to the backpropagation of errors algorithm from machine learning, we consider the limit where top-down input is weak compared to the bottom-up drive. This limiting case results in error signals that decrease exponentially with layer depth, but allows us to proceed analytically.
We further assume that the top-down weights converging to the apical compartments are equal to the corresponding forward weights, . Such weight symmetry is not essential for successful learning in a broad range of problems, as demonstrated in the main simulations and as observed before (Lee et al., 2015; Lillicrap et al., 2016; Nøkland, 2016). It is, however, required to frame learning as a gradient descent procedure. Furthermore, in the analyses of the learning rules, we assume that synaptic changes take place at a fixed point of the neuronal dynamics; we therefore consider discrete-time versions of the plasticity rules. This approximates the continuous-time plasticity model as long as changes in the inputs are slow compared to the neuronal dynamics.
For convenience, we will occasionally drop neuron type indices and refer to bottom-up weights and to top-down weights . Additionally, we assume without loss of generality that the dendritic coupling conductance for interneurons is equal to the basal dendritic coupling of pyramidal neurons, . Finally, whenever it is useful to distinguish whether output layer nudging is turned off, we use superscript ‘’.
Interneuron activity in the self-predicting state. Following Urbanczik and Senn (2014), we note that steady-state interneuron somatic potentials can be expressed as a convex combination of basal dendritic and pyramidal neuron potentials that are provided via somatic teaching input:
(13) 
with and the effective dendritic transfer and leak conductances, respectively, and the total excitatory and inhibitory teaching conductance. In the equation above, is the interneuron dendritic prediction (cf. Eq. 8), and is a mixing factor which controls the nudging strength for the interneurons. In other words, the current prediction and the teaching signal are averaged with coefficients determined by normalized conductances. We will later consider the weak nudging limit of .
The relation holds when pyramidal-to-interneuron synaptic weights are equal to pyramidal-to-pyramidal forward weights, up to a scale factor: , which simplifies to for the last layer where (to reduce clutter, we use the slightly abusive notation whereby should be understood to be zero when referring to output layer neurons). This is the reason for the particular choice of ideal pyramidal-to-interneuron weights presented in the preamble. The network is then internally consistent, in the sense that the interneurons predict the model’s own predictions, held by pyramidal neurons.
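The convex-combination form of Eq. 13 can be checked numerically by solving for the fixed point of a single interneuron compartment. In this sketch we assume that the interneuron obeys du/dt = -g_lk u + g_D (v - u) + g_E (E_E - u) + g_I (E_I - u), and that the excitatory and inhibitory teaching conductances g_E and g_I are set from the upper-layer pyramidal potential so that their sum equals a fixed g_som and their conductance-weighted reversal potential equals that pyramidal potential. All conductance and reversal-potential values, and the function names, are hypothetical. Under these assumptions the fixed point equals (1 - lam) * v_hat + lam * u_teach, with lam = g_som / (g_lk + g_D + g_som) and v_hat = g_D / (g_lk + g_D) * v.

```python
def interneuron_steady_state(v_dend, u_teach, g_lk=0.1, g_d=1.0, g_som=0.8,
                             e_exc=14 / 3, e_inh=-1 / 3):
    """Fixed point of du/dt = -g_lk u + g_d (v_dend - u) + g_exc (e_exc - u) + g_inh (e_inh - u).

    Assumption: the teaching conductances are derived from the upper-layer pyramidal
    potential u_teach so that g_exc + g_inh = g_som and g_exc e_exc + g_inh e_inh = g_som u_teach."""
    g_exc = g_som * (u_teach - e_inh) / (e_exc - e_inh)
    g_inh = -g_som * (u_teach - e_exc) / (e_exc - e_inh)
    numerator = g_d * v_dend + g_exc * e_exc + g_inh * e_inh
    return numerator / (g_lk + g_d + g_exc + g_inh)

def convex_combination(v_dend, u_teach, g_lk=0.1, g_d=1.0, g_som=0.8):
    """(1 - lam) * attenuated dendritic prediction + lam * teaching potential."""
    lam = g_som / (g_lk + g_d + g_som)       # nudging strength (mixing factor)
    v_hat = g_d / (g_lk + g_d) * v_dend      # dendritic prediction as seen at the soma
    return (1 - lam) * v_hat + lam * u_teach
```

The two functions agree for any dendritic input and any teaching potential between the two reversal potentials (where both conductances stay non-negative), which is why the nudging strength acts as a pure mixing factor between prediction and teaching signal.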
Bottom-up predictions in the absence of external nudging. We first study the situation where the input pattern is stationary and the output layer teaching input is disabled, . We show that the fixed point of the network dynamics is a state where somatic voltages are equal to basal voltages, up to a dendritic attenuation factor. In other words, the network effectively behaves as if it were feedforward, in the sense that it computes the same function as the corresponding network with identical bottom-up weights but no top-down or lateral connections.
Specifically, in the absence of external nudging (indicated by the in the superscript), the somatic voltages of pyramidal neurons and interneurons are given by the bottom-up dendritic predictions,
(14)  
(15) 
To show that Eq. 14 describes the state of the network, we start at the output layer and set Eq. 1 to zero. Because nudging is turned off, we observe that is equal to if layer also satisfies . The same argument applies recursively to the hidden layer below when its apical voltage vanishes, . Now we note that at the fixed point the interneuron matches the corresponding pyramidal neuron, owing to the assumption that the network is in a self-predicting state, which yields . Together with the fact that , we conclude that the interneuron contribution to the apical compartment cancels the top-down pyramidal neuron input, yielding the required condition .
The above argument can be iterated down to the input layer, where activity is constant, and we arrive at Eq. 14.
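The fixed-point argument can be checked on a small network with one hidden layer. In this sketch the lateral weights are set to ideal values under our own assumptions: top-down weights equal to the transposed forward weights, interneuron-to-pyramidal weights equal to their negative, and pyramidal-to-interneuron weights equal to the forward weights (valid for the last layer when the interneuron dendritic coupling equals the pyramidal basal coupling). The interneuron teaching input is modelled as a single conductance `g_som` pulling the interneuron towards the output potential, `tanh` stands in for the transfer function, and all weights, conductances and function names are hypothetical. Starting from random potentials, the relaxed somatic voltages should match those of the stripped-down feedforward network while the apical potentials vanish.

```python
import math
import random

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def transpose(W):
    return [list(col) for col in zip(*W)]

def relax_without_nudging(W1, W2, x, g_lk=0.1, g_b=1.0, g_a=0.8, g_som=0.8,
                          dt=0.1, steps=2000, seed=0):
    """Euler-integrate hidden pyramidal cells, output pyramidal cells and hidden-layer
    interneurons with output nudging switched off; return the relaxed state."""
    phi = math.tanh                              # stand-in transfer function
    B = transpose(W2)                            # top-down weights (assumed symmetric)
    rng = random.Random(seed)
    u1 = [rng.uniform(-1, 1) for _ in W1]        # hidden pyramidal somata
    u2 = [rng.uniform(-1, 1) for _ in W2]        # output pyramidal somata
    ui = [rng.uniform(-1, 1) for _ in W2]        # interneurons paired with output cells

    def apical(u2, ui):
        # top-down input B r2 plus lateral input -B r_int (ideal interneuron-to-pyramidal weights)
        return [a - b for a, b in zip(matvec(B, [phi(u) for u in u2]),
                                      matvec(B, [phi(u) for u in ui]))]

    for _ in range(steps):
        v_b1 = matvec(W1, x)
        v_a1 = apical(u2, ui)
        v_b2 = matvec(W2, [phi(u) for u in u1])  # output basal input, also interneuron dendritic input
        du1 = [-g_lk * u + g_b * (vb - u) + g_a * (va - u) for u, vb, va in zip(u1, v_b1, v_a1)]
        du2 = [-g_lk * u + g_b * (vb - u) for u, vb in zip(u2, v_b2)]
        dui = [-g_lk * u + g_b * (vb - u) + g_som * (up - u) for u, vb, up in zip(ui, v_b2, u2)]
        u1 = [u + dt * d for u, d in zip(u1, du1)]
        u2 = [u + dt * d for u, d in zip(u2, du2)]
        ui = [u + dt * d for u, d in zip(ui, dui)]
    return u1, u2, ui, apical(u2, ui)

def feedforward(W1, W2, x, g_lk=0.1, g_b=1.0, g_a=0.8):
    """Reference network: attenuated basal predictions only, no top-down or lateral input."""
    u1 = [g_b / (g_lk + g_b + g_a) * v for v in matvec(W1, x)]
    u2 = [g_b / (g_lk + g_b) * v for v in matvec(W2, [math.tanh(u) for u in u1])]
    return u1, u2

W1 = [[0.5, -0.4], [0.3, 0.8], [-0.6, 0.2]]   # hypothetical weights, 2 inputs -> 3 hidden
W2 = [[0.7, -0.3, 0.5], [-0.2, 0.6, 0.4]]      # 3 hidden -> 2 outputs
x = [1.0, 0.5]
u1, u2, ui, v_a1 = relax_without_nudging(W1, W2, x)
u1_ff, u2_ff = feedforward(W1, W2, x)
```

With these choices the difference between each output cell and its interneuron decays on its own, independently of the rest of the network, so the apical input vanishes first and the hidden and output layers then relax to their attenuated bottom-up predictions.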
Zero plasticity induction in the absence of nudging. In view of Eq. 14, which states that in the absence of external nudging the somatic voltages correspond to the basal predictions, the plasticity rules (7) and (8) induce no changes in basal synapses onto pyramidal neurons and interneurons, respectively. Similarly, the apical voltages are equal to rest, , when the top-down input is fully predicted, and no synaptic plasticity is induced in the interneuron-to-pyramidal synapses, see (9). When noisy background currents are present, the average prediction error is zero, while momentary fluctuations still trigger plasticity. Note that the above also holds when the dynamics are away from equilibrium, under the additional constraint that the integration time constant of interneurons matches that of pyramidal neurons.
Recursive prediction error propagation. Prediction errors arise in the model whenever lateral interneurons cannot fully explain top-down input, leading to a deviation from baseline in apical dendrite activity. Here, we look at the network steady-state equations for a stationary input pattern and derive an iterative relationship that describes how prediction mismatches originating downstream propagate across the network. The following compartmental potentials are thus evaluated at a fixed point of the neuronal dynamics.
Under the assumption (11) of matching interneuron-to-pyramidal and top-down weights, apical compartment potentials simplify to
(16) 
where we introduced the error vector , defined as the difference between pyramidal and interneuron firing rates. This deviation can be intuitively understood as a layer-wise interneuron prediction mismatch, which is zero when interneurons perfectly explain pyramidal neuron activity. We now evaluate this difference vector at a fixed point to obtain a recurrence relation that links consecutive layers.
The steady-state somatic potentials of hidden pyramidal neurons are given by
(17) 
To simplify the following, we assume that the apical attenuation factor is equal to the interneuron nudging strength . As previously mentioned, we proceed under the assumption of weak feedback, that is, with this factor small. For the corresponding interneurons, we insert Eq. 17 into Eq. 13 and note that, when the network is in a self-predicting state, we have , yielding
(18) 
Using the identities (17) and (18), we now expand the difference vector to first order around the bottom-up predictions as follows
(19) 
Matrix is a diagonal matrix with diagonal equal to , i.e., whose th element reads . It contains the derivative of the neuronal transfer function evaluated componentwise at the bottom-up predictions . Recalling Eq. 16, we obtain a recurrence relation
(20) 
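Because the derivative matrix is diagonal, the first-order expansion acts neuron by neuron and can be checked for a single neuron. In this sketch we assume steady states of the form u_pyr = v_hat + lam * v_apical for the upper-layer pyramidal neuron (apical attenuation equal to the nudging strength) and u_int = (1 - lam) * v_hat + lam * u_pyr for its self-predicting interneuron; these forms, the function names, the use of `tanh` as transfer function and the input values are our assumptions for illustration. The voltage gap is then exactly lam * (1 - lam) * v_apical, and the rate gap approaches lam * (1 - lam) * phi'(v_hat) * v_apical with a relative error that shrinks roughly in proportion to lam.

```python
import math

def phi(u):
    return math.tanh(u)                 # stand-in transfer function

def dphi(u):
    return 1.0 - math.tanh(u) ** 2      # its derivative (one diagonal entry of the matrix D)

def layer_mismatch(v_hat, v_apical, lam):
    """Return the voltage gap, the exact rate gap and its first-order estimate for one neuron.

    Assumed steady states: u_pyr = v_hat + lam * v_apical and
    u_int = (1 - lam) * v_hat + lam * u_pyr (self-predicting interneuron)."""
    u_pyr = v_hat + lam * v_apical
    u_int = (1 - lam) * v_hat + lam * u_pyr
    exact = phi(u_pyr) - phi(u_int)
    first_order = lam * (1 - lam) * dphi(v_hat) * v_apical
    return u_pyr - u_int, exact, first_order

for lam in (0.1, 0.01, 0.001):
    gap, exact, approx = layer_mismatch(0.3, 0.5, lam)
    print(lam, gap, abs(exact - approx) / abs(exact))
```

Under these assumed forms the interneuron sits at v_hat + lam**2 * v_apical, so it almost exactly reproduces the bottom-up prediction, and the mismatch carried by the difference vector is dominated by the apical term.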
Finally, the output-layer pyramidal neurons provide the initial condition, since they are directly nudged towards the desired target. Their membrane potentials can be written as
(21) 
and this gives an estimate for the error in the output layer of the form
(22) 
where for simplicity we took the same mixing factor for the pyramidal output neurons and the interneurons. Then, for an arbitrary layer, assuming that the synaptic weights and the remaining fixed parameters do not scale with , we arrive at
(23) 
Thus, steady-state potentials of apical dendrites (cf. Eq. 16) recursively encode neuron-specific prediction errors that can be traced back to a mismatch at the output layer.
Learning as approximate error backpropagation. In the previous section we found that neurons implicitly carry and transmit error information across the network. We now show how the proposed synaptic plasticity model, when applied at a steady state of the neuronal dynamics, can be recast as an approximate gradient descent learning procedure.
More specifically, we compare our model against learning, through backprop (Rumelhart et al., 1986) or approximations thereof (Lee et al., 2015; Lillicrap et al., 2016), the weights of the feedforward multilayer network obtained by removing interneurons and top-down connections from the intact network. For this reference model, the activations are by construction equal to the bottom-up predictions obtained in the full model when output nudging is turned off, , cf. Eq. 14. Thus, optimizing the weights in the feedforward model is equivalent to optimizing the predictions of the full model.
We now assume that the neuronal transfer function is monotonically increasing and define the loss function
(24) 
where denotes the number of output neurons. can be thought of as the multilayer, multi-output analogue of the loss function optimized by the single-neuron model (Urbanczik and Senn, 2014), where it stems directly from the particular form chosen for the learning rule (7). The nudging strength parameter controls the mixing with the target and can be understood as an additional learning rate parameter. Albeit unusual in form, the function imposes a cost similar to an ordinary squared error loss. Importantly, it has a minimum when and it is lower bounded. Furthermore, it is differentiable with respect to compartmental voltages (and synaptic weights), and is therefore suitable for gradient descent optimization. As a side remark, integrates to a quadratic function when is linear.
Gradient descent proceeds by changing synaptic weights according to
(25) 
The required partial derivatives can be efficiently computed by the backpropagation of errors algorithm. For the network architecture we study, this yields a learning rule of the form
(26) 
The error factor can be expressed recursively as follows:
(27) 
ignoring constant factors that depend on conductance ratios, which can be dealt with by redefining learning rates or backward pass weights. As in the previous section, matrix is a diagonal matrix, with diagonal equal to .
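For reference, the backprop recursion can be sketched for a network with one hidden layer and checked against finite differences. This sketch assumes a plain squared-error loss on linear output potentials rather than the loss of Eq. 24, uses `tanh` as a stand-in transfer function, and uses hypothetical weights and function names. The hidden error factor is the diagonal derivative matrix times the transposed forward weights applied to the output error factor, and each weight change is an error factor times a presynaptic rate, as in Eqs. 26 and 27.

```python
import math

def forward(W1, W2, x):
    u1 = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    r1 = [math.tanh(u) for u in u1]
    u2 = [sum(w * ri for w, ri in zip(row, r1)) for row in W2]
    return u1, r1, u2

def squared_error(W1, W2, x, target):
    _, _, u2 = forward(W1, W2, x)
    return 0.5 * sum((t - u) ** 2 for t, u in zip(target, u2))

def backprop_deltas(W1, W2, x, target):
    """Negative gradients -dL/dW via the recursion delta_1 = D_1 W2^T delta_2."""
    u1, r1, u2 = forward(W1, W2, x)
    delta2 = [t - u for t, u in zip(target, u2)]             # output error factor
    back = [sum(W2[i][j] * delta2[i] for i in range(len(W2))) for j in range(len(r1))]
    delta1 = [(1.0 - math.tanh(u) ** 2) * b for u, b in zip(u1, back)]
    dW2 = [[d * r for r in r1] for d in delta2]              # error factor times presynaptic rate
    dW1 = [[d * xi for xi in x] for d in delta1]
    return dW1, dW2

def numerical_neg_grad(W1, W2, x, target, which, i, j, h=1e-6):
    """Central finite-difference estimate of -dL/dW[i][j] for W1 (which=0) or W2 (which=1)."""
    def loss_with(offset):
        Ws = [[row[:] for row in W1], [row[:] for row in W2]]
        Ws[which][i][j] += offset
        return squared_error(Ws[0], Ws[1], x, target)
    return -(loss_with(h) - loss_with(-h)) / (2 * h)

W1 = [[0.5, -0.4], [0.3, 0.8], [-0.6, 0.2]]   # hypothetical weights, 2 inputs -> 3 hidden
W2 = [[0.7, -0.3, 0.5], [-0.2, 0.6, 0.4]]      # 3 hidden -> 2 outputs
x, target = [1.0, 0.5], [0.8, -0.5]
dW1, dW2 = backprop_deltas(W1, W2, x, target)
```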
We first compare the fixed point equations of the original network to the feedforward activations of the reference model. Starting from the bottommost hidden layer and using Eqs. 16, 17 and 23, we notice that , as the bottom-up input is the same in both cases. Inserting this into the steady-state potentials of the second hidden layer and linearizing the neuronal transfer function gives . This can be repeated, and for an arbitrary layer and neuron type we find
(28)  
(29) 
Writing Eq. 28 in the first form emphasizes that the apical contributions dominate the bottom-up corrections, which are of order .
Next, we prove that, up to a factor and to first order, the apical term in Eq. 28 represents the backpropagated error in the feedforward network, . Starting from the apical potentials of the topmost hidden layer, we re-evaluate the difference vector (22) using (28). Linearization of the neuronal transfer function gives
(30) 
Inserting the expression above into Eq. 28 and using Eq. 29, the apical compartment potentials at layer can then be recomputed. This procedure can be iterated until the input layer is reached. In general form, somatic membrane potentials at hidden layer can be expressed as
(31)  
(32) 
This equation shows that, to leading order of , hidden neurons mix and propagate forward purely bottom-up predictions with top-down errors that are computed at the output layer and spread backwards.
We are now in a position to compare model synaptic weight updates to the ones prescribed by backprop. Output layer updates are exactly equal by construction, . For pyramidal-to-pyramidal neuron synapses from hidden layer to layer , we obtain
(33) 
while backprop learning rule (26) can be written as
(34) 
where we used that, to first order, the output layer error factor is . Hence, up to a factor of which can be absorbed in the learning rate , changes induced by synaptic plasticity are equal to the backprop learning rule (26) in the limit , provided that the top-down weights are set to the transpose of the corresponding feedforward weights, . The ‘quasi-feedforward’ condition has also been invoked to relate backprop to two-phase contrastive Hebbian learning in Hopfield networks (Xie and Seung, 2003).
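The comparison can be checked numerically on a small network. The sketch relies on several assumptions of our own, not on the simulation setup of the paper: one hidden layer; top-down weights equal to the transposed forward weights and ideal lateral weights; interneuron dendritic coupling equal to the pyramidal basal coupling; a single conductance `g_n` used for the output teaching input, the interneuron teaching input and the hidden apical coupling, so that all mixing factors equal lam = g_n / (g_lk + g_b + g_n); `tanh` as transfer function; and an output error factor phi'(u) * (target - u) for the reference network, the first-order form of the output-layer rule. All weights and function names are hypothetical. In this particular setup, dividing the hidden-layer plasticity signal phi(u) - phi(v_hat) by lam**2 * (1 - lam) should recover the backprop error factor up to corrections of order lam.

```python
import math

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def transpose(W):
    return [list(col) for col in zip(*W)]

def phi(u):
    return [math.tanh(ui) for ui in u]           # stand-in transfer function

def dphi(u):
    return [1.0 - math.tanh(ui) ** 2 for ui in u]

def hidden_signal(W1, W2, x, u_target, g_lk, g_b, g_n, dt=0.1, steps=3000):
    """Euler-integrate the nudged microcircuit; return phi(u1) - phi(v_hat1) for the hidden layer."""
    B = transpose(W2)                            # top-down weights set to W2^T
    u1, u2, ui = [0.0] * len(W1), [0.0] * len(W2), [0.0] * len(W2)
    for _ in range(steps):
        r1, r2, ri = phi(u1), phi(u2), phi(ui)
        v_b1 = matvec(W1, x)
        v_a1 = [a - b for a, b in zip(matvec(B, r2), matvec(B, ri))]   # lateral weights = -B
        v_b2 = matvec(W2, r1)                    # output basal input = interneuron dendritic input
        du1 = [-g_lk * u + g_b * (vb - u) + g_n * (va - u) for u, vb, va in zip(u1, v_b1, v_a1)]
        du2 = [-g_lk * u + g_b * (vb - u) + g_n * (t - u) for u, vb, t in zip(u2, v_b2, u_target)]
        dui = [-g_lk * u + g_b * (vb - u) + g_n * (up - u) for u, vb, up in zip(ui, v_b2, u2)]
        u1 = [u + dt * d for u, d in zip(u1, du1)]
        u2 = [u + dt * d for u, d in zip(u2, du2)]
        ui = [u + dt * d for u, d in zip(ui, dui)]
    v_hat1 = [g_b / (g_lk + g_b + g_n) * v for v in matvec(W1, x)]
    return [a - b for a, b in zip(phi(u1), phi(v_hat1))]

def backprop_delta1(W1, W2, x, u_target, g_lk, g_b, g_n):
    """Hidden error factor D1 W2^T D2 (u_target - u2) of the reference feedforward network."""
    u1 = [g_b / (g_lk + g_b + g_n) * v for v in matvec(W1, x)]
    u2 = [g_b / (g_lk + g_b) * v for v in matvec(W2, phi(u1))]
    delta2 = [d * (t - u) for d, t, u in zip(dphi(u2), u_target, u2)]
    return [d * b for d, b in zip(dphi(u1), matvec(transpose(W2), delta2))]

W1 = [[0.5, -0.4], [0.3, 0.8], [-0.6, 0.2]]     # hypothetical weights, 2 inputs -> 3 hidden
W2 = [[0.7, -0.3, 0.5], [-0.2, 0.6, 0.4]]        # 3 hidden -> 2 outputs
x, u_target = [1.0, 0.5], [0.8, -0.5]
g_lk, g_b, g_n = 0.1, 1.0, 0.01
lam = g_n / (g_lk + g_b + g_n)
model = hidden_signal(W1, W2, x, u_target, g_lk, g_b, g_n)
delta1 = backprop_delta1(W1, W2, x, u_target, g_lk, g_b, g_n)
scaled = [m / (lam ** 2 * (1 - lam)) for m in model]
```

The raw hidden-layer signal is tiny (of order lam squared), which reflects the exponential decay of errors with depth noted earlier; the rescaled signal `scaled` is what matches `delta1`.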
Interneuron plasticity. The analyses of the previous sections relied on the assumption that the synaptic weights to and from interneurons were set to their ideal values, cf. Eqs. 11 and 12. We now study the plasticity of the lateral microcircuit synapses and show that, under mild conditions, learning rules (8) and (9) yield the desired synaptic weight matrices.
We first study the learning of pyramidal-to-interneuron synapses . To quantify the degree to which the weights deviate from their optimal setting, we introduce the convex loss function
(35) 
where denotes the trace of matrix and , as defined in Eq. 12.
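To illustrate why such a convex, trace-based loss is a natural measure, we take as an assumption one simple candidate, half the squared Frobenius distance 0.5 * tr((W - W*)^T (W - W*)); this is not necessarily the exact form of Eq. 35. For linear interneurons, and with plasticity averaged over input patterns, the pyramidal-to-interneuron rule under these assumptions reduces to W <- W - rate * (W - W*) C, where C is the correlation matrix of presynaptic rates. Since C is positive semidefinite, this update cannot increase the assumed loss for a small enough rate. The sketch estimates C from hypothetical uniformly distributed patterns and checks that the loss decreases monotonically towards zero; the ideal matrix `W_star` and the function names are hypothetical.

```python
import random

def matmul(A, B):
    """Product of two nested-list matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def trace_loss(W, W_star):
    """0.5 * tr((W - W*)^T (W - W*)), i.e. half the squared Frobenius distance."""
    return 0.5 * sum((w - s) ** 2 for rw, rs in zip(W, W_star) for w, s in zip(rw, rs))

def input_correlation(n_inputs, n_samples, seed=0):
    """Sample estimate of C = E[r r^T] for uniformly distributed presynaptic rates."""
    rng = random.Random(seed)
    C = [[0.0] * n_inputs for _ in range(n_inputs)]
    for _ in range(n_samples):
        r = [rng.random() for _ in range(n_inputs)]
        for i in range(n_inputs):
            for j in range(n_inputs):
                C[i][j] += r[i] * r[j] / n_samples
    return C

def averaged_updates(W, W_star, C, rate, steps):
    """Iterate W <- W - rate * (W - W*) C and record the loss after every step."""
    losses = [trace_loss(W, W_star)]
    for _ in range(steps):
        E = [[w - s for w, s in zip(rw, rs)] for rw, rs in zip(W, W_star)]
        G = matmul(E, C)
        W = [[w - rate * g for w, g in zip(rw, rg)] for rw, rg in zip(W, G)]
        losses.append(trace_loss(W, W_star))
    return W, losses

W_star = [[1.0, -0.5, 0.3], [0.2, 0.8, -0.6]]   # hypothetical ideal weights (cf. Eq. 12)
W_final, losses = averaged_updates([[0.0] * 3 for _ in range(2)], W_star,
                                   input_correlation(3, 500), rate=0.5, steps=2000)
```

Convergence speed is set by the smallest eigenvalue of C: input directions that are rarely explored are learned slowly, which is one reason why a rich succession of input patterns helps the lateral circuit reach the self-predicting state.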