KD-AHOSVD: Neural Network Compression via Knowledge Distillation and Tensor Decomposition / Meneghetti, Laura; Bianchi, Edoardo; Demo, Nicola; Rozza, Gianluigi. - 15569 LNCS:(2025), pp. 81-92. (Paper presented at the 18th International Workshop on Design and Architecture for Signal and Image Processing, DASIP 2025, held in Barcelona, Spain, 20-22 January 2025) [10.1007/978-3-031-87897-8_7].
KD-AHOSVD: Neural Network Compression via Knowledge Distillation and Tensor Decomposition
Meneghetti, Laura; Bianchi, Edoardo; Demo, Nicola; Rozza, Gianluigi
2025-01-01
Abstract
In the field of Deep Learning, the high number of parameters in models has become a significant concern within the scientific community due to the increased computational resources and memory required for training and inference. Addressing this issue, we propose a novel tensorized technique to compress network architectures. Our approach aims to significantly reduce the network’s size and the number of parameters by integrating Averaged Higher Order Singular Value Decomposition with a novel Knowledge Distillation approach. Specifically, we replace certain layers of the original architecture with layers that perform linear projections onto a reduced space defined by our reduction technique. We conducted experiments on image classification tasks using multiple architectures and datasets. The evaluation focuses on final accuracy, model size, and parameter reduction, comparing our approach with both the original models and quantization, a widely used reduction method. The results underscore the effectiveness of our method in significantly reducing the number of parameters and the overall size of neural networks while maintaining high performance.
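The abstract describes replacing layers with linear projections onto a reduced space obtained from an averaged Higher Order SVD of intermediate feature tensors. Below is a minimal, self-contained sketch of that general idea; it is not the authors' implementation, and the function names, rank choices, and batch-averaging scheme are illustrative assumptions.

```python
# Hypothetical sketch of an averaged-HOSVD projection step (illustrative only;
# names, ranks, and the averaging scheme are assumptions, not the paper's code).
import numpy as np

def mode_unfold(tensor, mode):
    """Unfold a tensor along one mode into a matrix (mode dimension as rows)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def ahosvd_projectors(feature_batch, ranks):
    """Average each mode's covariance over a batch of feature tensors, then keep
    the leading eigenvectors of every mode as projection matrices."""
    projectors = []
    for mode, rank in enumerate(ranks):
        dim = feature_batch.shape[mode + 1]
        cov = np.zeros((dim, dim))
        for sample in feature_batch:          # average over the batch
            unfolded = mode_unfold(sample, mode)
            cov += unfolded @ unfolded.T
        cov /= len(feature_batch)
        _, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
        projectors.append(eigvecs[:, -rank:]) # keep the leading directions
    return projectors

def project(sample, projectors):
    """Linearly project a feature tensor onto the reduced space, mode by mode."""
    reduced = sample
    for mode, U in enumerate(projectors):
        reduced = np.moveaxis(np.tensordot(U.T, reduced, axes=(1, mode)), 0, mode)
    return reduced

# Toy usage: 8 feature maps of shape (16, 10, 10) reduced to (4, 5, 5).
features = np.random.rand(8, 16, 10, 10)
Us = ahosvd_projectors(features, ranks=(4, 5, 5))
print(project(features[0], Us).shape)         # -> (4, 5, 5)
```

In a compression setting along these lines, the projection matrices would be folded into replacement layers so that downstream computations operate on the reduced tensors rather than the full-size feature maps.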