RIKEN Brain Science Institute
Learning takes place in neural networks by
modifying their connection weights, that
is, in the parameter space consisting of
the connection weights. The parameter space
is identified with the set of all the networks,
and is therefore called a neuromanifold. The
neuromanifold is not a flat Euclidean space
but a Riemannian space whose geometry
represents the architecture of neural networks.
Information geometry is used to elucidate this Riemannian structure.
It has recently been remarked that a neuromanifold includes many algebraic singularities due to its symmetric hierarchical structure. Moreover, such singular points affect the trajectories of the learning dynamics. In particular, they cause the so-called plateaus, on which ordinary backpropagation learning becomes extremely slow.
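As a minimal illustration (a toy sketch, not a model from the talk), consider a one-unit student network y = v·tanh(w·x). Near (v, w) = (0, 0) the model is singular: many parameter values realize the same (zero) function, and the squared-error gradient nearly vanishes there, which is exactly the plateau phenomenon. The teacher parameters and sample size below are illustrative assumptions.

```python
import numpy as np

# Toy student y = v * tanh(w * x); near (v, w) = (0, 0) the model is
# singular (many parameters realize the same function) and the
# squared-error gradient nearly vanishes, so learning stalls.
# Teacher parameters (v* = 1, w* = 2) and sample size are assumptions.

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = 1.0 * np.tanh(2.0 * x)  # teacher network

def grad_norm(v, w):
    err = v * np.tanh(w * x) - y
    gv = np.mean(err * np.tanh(w * x))               # dL/dv
    gw = np.mean(err * v * x / np.cosh(w * x) ** 2)  # dL/dw
    return float(np.hypot(gv, gw))

print(grad_norm(1e-3, 1e-3))  # near the singularity: tiny gradient
print(grad_norm(1.0, 1.0))    # away from it: ordinary-sized gradient
```

The gradient norm near the singular region is orders of magnitude smaller than elsewhere, so plain gradient descent started there makes almost no progress.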
The present talk analyzes these singularities and shows how they give rise to plateaus. The natural gradient learning algorithm is shown to avoid such plateau phenomena. We use a number of simple models to elucidate the statistical and learning-theoretic aspects of parameters in the presence of singularities. The maximum likelihood estimator is no longer efficient, and the Cramér-Rao paradigm does not hold in this case, because the central limit theorem does not hold either. This opens a new direction in statistics and learning theory.
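A hedged sketch of why the natural gradient helps: take a Gaussian model with unit variance and mean reparameterized as mu = theta^3, so theta = 0 is a singular point where the Fisher information F(theta) = (dmu/dtheta)^2 vanishes. Ordinary gradient descent crawls near that point, while the natural gradient update, which rescales by the (damped) inverse Fisher information, restores uniform progress. The learning rate, damping constant, and step counts are illustrative assumptions, not values from the talk.

```python
import numpy as np

# Gaussian model with mean mu, unit variance, reparameterized as
# mu = theta**3, so theta = 0 is singular: the Fisher information
# F(theta) = (dmu/dtheta)**2 = 9 * theta**4 vanishes there.
# Learning rate, damping, and step count are illustrative assumptions.

rng = np.random.default_rng(0)
data = rng.normal(1.0, 1.0, size=1000)  # true mean mu* = 1

def grad_and_fisher(theta):
    mu = theta ** 3
    dmu = 3 * theta ** 2               # d mu / d theta
    g = -np.mean(data - mu) * dmu      # gradient of the negative log-likelihood
    return g, dmu ** 2                 # Fisher information in theta-coordinates

def train(natural, steps=50, lr=0.1, damping=1e-2):
    theta = 0.05                       # start close to the singular point
    for _ in range(steps):
        g, F = grad_and_fisher(theta)
        theta -= lr * (g / (F + damping) if natural else g)
    return theta

plain = train(natural=False)  # lingers on the plateau near theta = 0
ngrad = train(natural=True)   # escapes and recovers a mean close to 1
print(plain ** 3, ngrad ** 3)
```

With the same budget of steps, the plain update leaves the recovered mean near zero, while the natural gradient update brings it close to the true value.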