Plug-and-play neural compression: A knowledge distillation framework with flexible dimensionality reduction / Meneghetti, Laura; Bianchi, Edoardo; Demo, Nicola; Rozza, Gianluigi. - In: JOURNAL OF SYSTEMS ARCHITECTURE. - ISSN 1383-7621. - 175:(2026). [10.1016/j.sysarc.2026.103778]

Plug-and-play neural compression: A knowledge distillation framework with flexible dimensionality reduction

Meneghetti, Laura; Demo, Nicola; Rozza, Gianluigi
2026-01-01

Abstract

The widespread adoption of embedded vision systems in industrial applications has highlighted the limitations of deep learning models, which are characterized by a large number of parameters. This is a significant concern within the scientific community because of the computational resources and memory required to train these models and run inference with them. To address this, we propose a flexible and effective methodology for neural network compression that integrates a pluggable dimensionality reduction layer with a Knowledge Distillation (KD) approach. The proposed compression framework allows various state-of-the-art techniques to be explored and compared as the reduction mechanism. Specifically, we investigate and implement reduction layers based on tensor decompositions, such as the Averaged Higher-Order Singular Value Decomposition (AHOSVD), and on non-linear methods, such as bottleneck projection layers, convolutional autoencoders (CAEs), and MLP-Mixer architectures. In our approach, the reduction layer replaces certain layers of the original network and projects feature maps into a lower-dimensional space. The subsequent KD process then guides the compressed network to retain high performance. We conducted extensive experiments on image classification tasks, evaluating networks that incorporate these reduction strategies across multiple architectures (VGG19, ResNet101) and datasets (CIFAR-10, CIFAR-100, STL-10). We then compared our approach against both the original, uncompressed models and quantization, a widely used reduction method, in terms of accuracy, model size, parameter reduction, and inference time. The results demonstrate the versatility and effectiveness of our approach in achieving substantial neural network compression and efficiency across various reduction layer instantiations, while consistently maintaining high accuracy.
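The abstract does not state which distillation objective is used, so the following is only a minimal sketch of one common choice: a temperature-softened KD loss that mixes cross-entropy on the true label with the KL divergence between teacher and student outputs. The function names `softmax` and `kd_loss`, and the defaults `temperature=4.0` and `alpha=0.5`, are illustrative assumptions and are not taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Distillation loss for one sample (an assumed form, not the paper's)."""
    # Hard-label term: cross-entropy of the student against the true class.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[label])
    # Soft-label term: KL(teacher || student) on temperature-softened outputs.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_t, p_s) if t > 0)
    # The T^2 factor keeps the soft term on a comparable scale across temperatures.
    return alpha * ce + (1.0 - alpha) * temperature ** 2 * kl
```

In this sketch the uncompressed network would play the teacher and the network containing the reduction layer would play the student; when both produce the same logits, the soft term vanishes and only the weighted cross-entropy remains.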
Year: 2026
Volume: 175
Article number: 103778
Authors: Meneghetti, Laura; Bianchi, Edoardo; Demo, Nicola; Rozza, Gianluigi
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/151331