Alternate Through the Epochs Stochastic Gradient Training for Multi-Task Neural Networks
This presentation addresses the training challenges of neural networks in Multi-Task Learning (MTL) settings. Specifically, we focus on Multi-Task Neural Network (MTNN) architectures characterized by hard parameter sharing and a multi-head design. A key training issue in such settings is that the gradients of the different task-specific loss functions may conflict, which can degrade training quality compared to single-task training. While existing approaches often modify the training procedure or adaptively adjust the weights of the aggregate loss, we propose a novel method based on alternating stochastic gradient updates between the shared and the task-specific parameters [1]. This strategy enforces simultaneous minimization of the task-specific losses while enhancing regularization and reducing both memory and computational costs. We provide a theoretical convergence analysis under standard assumptions and support our claims with empirical results that demonstrate the effectiveness of the proposed training approach.

References:
[1] Bellavia S., Della Santa F., Papini A., \emph{ATE-SG: Alternate Through the Epochs Stochastic Gradient for Multi-Task Neural Networks}, arXiv preprint.
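The core idea, alternating which block of parameters receives stochastic gradient updates across epochs, can be illustrated with a minimal PyTorch-style sketch. The alternation schedule, the network sizes, and the detaching of the shared trunk during head-only epochs are illustrative assumptions made here for brevity; the precise ATE-SG algorithm and its convergence analysis are those of [1].

```python
# Minimal sketch of epoch-wise alternating updates for a hard-parameter-sharing,
# multi-head MTNN. Assumption: the exact ATE-SG schedule and details differ in [1].
import torch
import torch.nn as nn


class MultiTaskNet(nn.Module):
    """One shared trunk (hard parameter sharing) plus one head per task."""

    def __init__(self, in_dim=16, hidden=32, n_tasks=2):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(n_tasks))

    def forward(self, x, detach_trunk=False):
        z = self.trunk(x)
        if detach_trunk:
            z = z.detach()  # no gradient flows into the shared trunk
        return [head(z) for head in self.heads]


def train_alternating(model, data, n_epochs=10, lr=1e-2):
    loss_fn = nn.MSELoss()
    opt_shared = torch.optim.SGD(model.trunk.parameters(), lr=lr)
    opt_heads = torch.optim.SGD(model.heads.parameters(), lr=lr)

    for epoch in range(n_epochs):
        # Alternate through the epochs: even epochs update the shared parameters,
        # odd epochs update the task-specific heads (this schedule is an assumption).
        update_shared = (epoch % 2 == 0)
        for x, targets in data:  # targets: one tensor per task
            preds = model(x, detach_trunk=not update_shared)
            losses = [loss_fn(p, t) for p, t in zip(preds, targets)]
            total = sum(losses)  # aggregate of the task-specific losses
            opt_shared.zero_grad()
            opt_heads.zero_grad()
            total.backward()
            (opt_shared if update_shared else opt_heads).step()


# Hypothetical toy usage: random data for two regression tasks.
torch.manual_seed(0)
net = MultiTaskNet()
batches = [(torch.randn(8, 16), [torch.randn(8, 1), torch.randn(8, 1)])
           for _ in range(5)]
train_alternating(net, batches)
```

Detaching the trunk in head-only epochs means gradients are not propagated through the shared parameters in those epochs, which is one way the alternating scheme can reduce memory and computational cost relative to jointly updating all parameters at every step.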
