
Plug-and-play neural compression: A knowledge distillation framework with flexible dimensionality reduction / Meneghetti, L., Bianchi, E., Demo, N., Rozza, G. - In: JOURNAL OF SYSTEMS ARCHITECTURE. - ISSN 1383-7621. - 175:(2026). [10.1016/j.sysarc.2026.103778]

Plug-and-play neural compression: A knowledge distillation framework with flexible dimensionality reduction

Meneghetti, Laura; Bianchi, Edoardo; Demo, Nicola; Rozza, Gianluigi

2026-01-01

Abstract

The widespread adoption of embedded vision systems in industrial applications has highlighted the limitations of deep learning models, which are characterized by a high number of parameters. This is a significant concern within the scientific community because of the computational resources and memory these models require for training and inference. To address it, we propose a flexible and effective methodology for neural network compression that integrates a pluggable dimensionality reduction layer with a Knowledge Distillation (KD) approach. The proposed compression framework allows various state-of-the-art techniques to be explored and compared as the reduction mechanism. Specifically, we investigate and implement reduction layers based on tensor decompositions, such as the Averaged Higher-Order Singular Value Decomposition (AHOSVD), and on non-linear methods such as bottleneck projection layers, convolutional autoencoders (CAEs), and MLP-Mixer architectures. In our approach, the reduction layer replaces certain layers of the original network, projecting feature maps into a lower-dimensional space. The subsequent KD process then guides the compressed network to retain high performance. We conducted extensive experiments on image classification tasks, evaluating the efficacy of networks incorporating these reduction strategies across multiple architectures (VGG19, ResNet101) and datasets (CIFAR-10, CIFAR-100, STL-10). Our approach was then compared against both the original, uncompressed models and quantization, a widely used reduction method, in terms of accuracy, model size, parameter reduction, and inference time. The results demonstrate the versatility and effectiveness of our approach in achieving substantial neural network compression and efficiency across various reduction layer instantiations, while consistently maintaining high accuracy.
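The abstract does not state the exact distillation objective, so the following is only a minimal sketch of a commonly used (Hinton-style) KD loss for a single sample, written in plain Python. The function names `softmax` and `kd_loss`, the temperature of 4.0, and the weight `alpha` are illustrative assumptions and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of a list of logits at a given temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Assumed Hinton-style distillation loss for one sample (not the paper's exact form)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) between the temperature-softened distributions.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Ordinary cross-entropy of the student against the hard label (temperature 1).
    ce = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft term's gradient scale comparable to the hard term.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce
```

In this sketch the compressed network (with its reduction layer) would play the role of the student and the original, uncompressed network the role of the teacher; when the two produce identical logits the soft term vanishes and only the weighted cross-entropy remains.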


Year: 2026
Journal: JOURNAL OF SYSTEMS ARCHITECTURE
Volume number: 175
Article number: 103778
DOI: https://dx.doi.org/10.1016/j.sysarc.2026.103778
All authors: Meneghetti, Laura; Bianchi, Edoardo; Demo, Nicola; Rozza, Gianluigi
Appears in types: 1.1 Journal article

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/151331
