A Line-Search-Based SGD Algorithm with Adaptive Importance Sampling
Stochastic Gradient Descent (SGD) is widely employed in large-scale supervised learning due to its simplicity and scalability. However, its performance can be hampered by the high variance of the stochastic gradient and by its sensitivity to the choice of step-size. To address these issues, importance sampling techniques have been proposed to reduce gradient variance by approximating the optimal sampling distribution, which is well defined in theory but hard to compute in practice. In this work, we introduce a novel optimization scheme that combines adaptive importance sampling with automatic step-size selection based on a stochastic Armijo-type line-search strategy. This approach avoids manual tuning of the initial step-size and the design of a decay schedule, making the method more robust and easier to deploy in real-world large-scale problems. The proposed scheme ensures efficient variance reduction and adaptivity, enhancing both convergence speed and stability. Additionally, we propose a flexible mini-batch variant that eliminates the rigid partitioning strategy typically used in previous methods such as SGD-AIS, enabling compatibility with dynamic sampling techniques like LISA. We present numerical experiments on real-world datasets for supervised classification.
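The abstract does not give implementation details, but the two ingredients it names, an adaptive importance-sampling distribution and a stochastic Armijo-type line search, can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the function and parameter names (`sgd_ais_armijo`, `eta_max`, `c`, `beta`), the running gradient-norm heuristic used to update the sampling distribution, and the exact form of the Armijo condition are not taken from the paper.

```python
import numpy as np

def sgd_ais_armijo(w, X, y, loss_fn, grad_fn, n_iters=1000,
                   eta_max=1.0, c=0.1, beta=0.5, max_backtracks=30):
    """Illustrative SGD loop combining adaptive importance sampling with a
    stochastic Armijo backtracking line search (hypothetical sketch, not the
    authors' algorithm)."""
    n = X.shape[0]
    # Running per-sample gradient-norm estimates drive the sampling distribution,
    # approximating the intractable optimal choice p_i ~ ||grad f_i(w)||.
    norms = np.ones(n)
    for _ in range(n_iters):
        p = norms / norms.sum()
        i = np.random.choice(n, p=p)                  # adaptive importance sampling
        g_i = grad_fn(w, X[i], y[i])
        norms[i] = np.linalg.norm(g_i) + 1e-12        # update the sampling distribution
        g = g_i / (n * p[i])                          # importance-weighted, unbiased estimate
        # Stochastic Armijo backtracking on the sampled loss: no manual step-size schedule.
        eta, f_i = eta_max, loss_fn(w, X[i], y[i])
        for _ in range(max_backtracks):
            if loss_fn(w - eta * g, X[i], y[i]) <= f_i - c * eta * g.dot(g):
                break
            eta *= beta                               # shrink the step until the condition holds
        w = w - eta * g
    return w

# Minimal usage on a synthetic least-squares problem (purely illustrative).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
w_true = rng.standard_normal(5)
y = X @ w_true + 0.01 * rng.standard_normal(200)
sq_loss = lambda w, x, yi: 0.5 * (x @ w - yi) ** 2
sq_grad = lambda w, x, yi: (x @ w - yi) * x
w_hat = sgd_ais_armijo(np.zeros(5), X, y, sq_loss, sq_grad)
```

In this sketch the reweighting by 1/(n p_i) keeps the gradient estimate unbiased, while the backtracking loop replaces a hand-tuned step-size schedule; the paper's mini-batch variant and its compatibility with dynamic sampling schemes such as LISA are not reflected here.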
