Direct Feedback Alignment for Continual Learning, Bayesian Neural Networks and Recurrent Neural Networks / Folchini, Sara. - (2024 Dec 16).

Direct Feedback Alignment for Continual Learning, Bayesian Neural Networks and Recurrent Neural Networks

FOLCHINI, SARA
2024-12-16

Abstract

Despite their huge success in many fields of science and engineering, neural networks continue to suffer from a number of shortcomings. For example, they exhibit catastrophic forgetting: they forget the details of a task when learning a second one. They are also susceptible to adversarial attacks: small input perturbations that make the network change its prediction. These problems do not seem to hamper biological neural networks. One difference between biological and artificial neural networks that could account for this discrepancy is the learning rule: while artificial neural networks are almost universally trained with (variants of) stochastic gradient descent, this algorithm is not a good model of learning in biological neural networks. In this thesis, we therefore study the potential of a more biologically plausible learning algorithm, Direct Feedback Alignment (DFA), to alleviate these problems. The key idea of DFA is to project the error signal directly onto the weight updates via dedicated, random feedback matrices. We first explore this idea in the context of continual learning. We find that in fully connected networks trained on image-classification tasks, DFA can alleviate catastrophic forgetting in two ways: using the same feedback matrix across tasks constrains the network weights to a particular region of weight space, while using a distinct feedback matrix for each task orthogonalises the weight updates. We then investigate the ability of DFA to increase robustness to adversarial perturbations, and more generally to mimic the benefits of Bayesian neural networks, by adding dynamics to the feedback weights in order to sample network parameters. We find that ensembles of networks sampled with this dynamical DFA exhibit enhanced robustness to gradient-based adversarial attacks; furthermore, the test accuracy of the ensemble outperforms that of a network trained with backpropagation. We finally explore the potential of Direct Feedback Alignment to train recurrent neural networks, an important model of the recurrent computations that are omnipresent in the brain. We show that DFA can be used to train simple variants of Long Short-Term Memory networks (LSTMs), overcoming the bottlenecks of the standard backpropagation-through-time algorithm. In summary, our results highlight the potential of Direct Feedback Alignment in three different domains. They raise the possibility that while biologically inspired learning rules for artificial neural networks may not always reach the on-task performance of vanilla backpropagation, their advantages become clear once they are applied to complex, multi-goal settings such as continual learning.
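As a concrete illustration of the update rule described in the abstract, the following minimal NumPy sketch performs one DFA step on a two-layer network. It is a sketch of the generic rule, not the implementation used in the thesis; the layer sizes, tanh activation, and learning rate are placeholder assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Tiny two-layer network; sizes are illustrative placeholders.
n_in, n_hid, n_out = 784, 256, 10
W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_hid, n_in))
W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hid), size=(n_out, n_hid))

# DFA's defining ingredient: a fixed random feedback matrix that
# projects the output error directly onto the hidden layer,
# replacing the transposed forward weights (W2.T) that backprop uses.
B1 = rng.normal(0.0, 1.0 / np.sqrt(n_out), size=(n_hid, n_out))

def dfa_step(x, y, lr=0.01):
    """One DFA update on a single example (x: input vector, y: one-hot target)."""
    global W1, W2
    # Forward pass: tanh hidden layer, linear readout.
    a1 = W1 @ x
    h1 = np.tanh(a1)
    y_hat = W2 @ h1
    e = y_hat - y                      # output error signal

    # Feedback: project the error through the fixed random matrix B1
    # instead of backpropagating it through W2.
    delta1 = (B1 @ e) * (1.0 - h1**2)  # tanh'(a1) = 1 - tanh(a1)^2

    # Local weight updates from projected errors and layer inputs.
    W2 -= lr * np.outer(e, h1)
    W1 -= lr * np.outer(delta1, x)

Replacing B1 @ e with W2.T @ e recovers standard backpropagation. In the continual-learning setting described above, reusing the same B1 across tasks corresponds to the weight-constraining regime, while drawing a fresh B1 for each task corresponds to the update-orthogonalising regime.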
Goldt, Sebastian Dominik
Laio, Alessandro
Arora, Viplove
Cossu, Andrea
Folchini, Sara
Files in this item:

thesis_Sara_Folchini-corrected.pdf (open access)
Description: Ph.D. thesis
Type: Thesis
Licence: Not specified
Size: 5.7 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/144470