Geometrical Aspects of Dynamics of Learning in Neural Networks

Shun-ichi Amari

RIKEN Brain Science Institute

Learning takes place in neural networks by modifying their connection weights, that is, in the parameter space consisting of the connection weights. Since the parameter space is identified with the set of all networks of a given architecture, it is called a neuromanifold. The neuromanifold is not a flat Euclidean space but a Riemannian space whose geometry reflects the architecture of the neural network. Information geometry is used to elucidate this geometry.
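As a minimal formal sketch of the Riemannian structure referred to above: when a network is described by a parametrized family of probability distributions $p(x;\theta)$, information geometry equips the parameter space with the Fisher information metric,

```latex
g_{ij}(\theta) \;=\; \mathbb{E}_{p(x;\theta)}\!\left[
  \frac{\partial \log p(x;\theta)}{\partial \theta^{i}}\,
  \frac{\partial \log p(x;\theta)}{\partial \theta^{j}}
\right],
```

so that the squared length of an infinitesimal change $d\theta$ is $ds^{2} = \sum_{i,j} g_{ij}(\theta)\, d\theta^{i} d\theta^{j}$ rather than the Euclidean $\sum_i (d\theta^i)^2$.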

It has recently been remarked that a neuromanifold includes many algebraic singularities due to its symmetric hierarchical structure. Moreover, such singular points affect the trajectories of learning dynamics; in particular, they cause the so-called plateaus. Ordinary backpropagation learning becomes extremely slow because of these plateaus.

The present talk analyzes these singularities and shows how they give rise to plateaus. The natural gradient learning algorithm is shown to avoid such plateau phenomena. We use a number of simple models to elucidate statistical and learning aspects of parameters in the presence of singularities. The maximum likelihood estimator is no longer efficient, and the Cramér-Rao paradigm does not hold in this case, because the central limit theorem does not hold either. This opens a new direction in statistics and learning theory.
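To make the natural gradient idea concrete, here is a minimal sketch (not the talk's actual experiments; the numbers are hypothetical): the update replaces the Euclidean gradient with the gradient premultiplied by the inverse Fisher metric, theta <- theta - lr * G^{-1} grad, which compensates for ill-conditioned curvature of the parameter space.

```python
import numpy as np

def natural_gradient_step(theta, grad, fisher, lr=0.1):
    """One natural-gradient update: theta <- theta - lr * G^{-1} grad.

    The Fisher information matrix G rescales the ordinary gradient so the
    step follows steepest descent in the Riemannian metric of the
    parameter space rather than in the Euclidean one.
    """
    return theta - lr * np.linalg.solve(fisher, grad)

# Toy illustration with an ill-conditioned (diagonal) Fisher metric.
theta = np.array([1.0, 1.0])
grad = np.array([2.0, 0.02])       # Euclidean gradient of some loss
fisher = np.diag([4.0, 0.04])      # hypothetical Fisher metric

# Ordinary gradient descent barely moves the flat (second) coordinate...
euclidean = theta - 0.1 * grad     # -> [0.8, 0.998]
# ...while the natural gradient moves both coordinates equally, since
# G^{-1} grad = [0.5, 0.5] regardless of the raw gradient magnitudes.
natural = natural_gradient_step(theta, grad, fisher, lr=0.1)  # -> [0.95, 0.95]
```

Near a singularity the Fisher matrix degenerates, which is precisely why the plain update stalls on plateaus while the metric-aware update does not.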