Abstract: Deep networks notoriously suffer from performance deterioration on previous tasks when learning from sequential tasks, i.e., catastrophic forgetting. Recent methods of gradient projection ...
Abstract: In this paper, we study the convergence properties of the natural gradient methods. By reviewing the mathematical condition for the equivalence between the Fisher information matrix and the ...