Continual learning (learning new tasks in sequence while maintaining performance on old tasks) remains particularly challenging for artificial neural networks. Surprisingly, the amount of forgetting does not increase with the dissimilarity between the learned tasks, but appears to be worst in an intermediate similarity regime. In this paper we theoretically analyse both a synthetic teacher-student framework and a real data setup to provide an explanation of this phenomenon, which we name the Maslow's hammer hypothesis. Our analysis reveals a trade-off between node activation and node re-use that results in the worst forgetting in the intermediate regime. Using this understanding, we reinterpret popular algorithmic interventions for catastrophic interference in terms of this trade-off, and identify the regimes in which they are most effective.

Maslow's Hammer for Catastrophic Forgetting: Node Re-Use vs Node Activation / Lee, S.; Mannelli, S. S.; Clopath, C.; Goldt, S.; Saxe, A. - 162:(2022), pp. 12455-12477. (Paper presented at the International Conference on Machine Learning, held in Baltimore, Maryland, USA, 17-23 July 2022).

Maslow's Hammer for Catastrophic Forgetting: Node Re-Use vs Node Activation

Goldt S.;
2022-01-01

Year: 2022
Series: Proceedings of Machine Learning Research
Volume: 162
Pages: 12455-12477
URL: https://arxiv.org/abs/2205.09029
Publisher: ML Research Press
Authors: Lee, S.; Mannelli, S. S.; Clopath, C.; Goldt, S.; Saxe, A.


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/135773