SIMAI 2025

Adaptability of KANs as activation functions in classical MLP networks

  • De Luca, Pasquale (University of Naples Parthenope)
  • Di Nardo, Emanuel (University of Naples Parthenope)
  • Ciaramella, Angelo (University of Naples Parthenope)


The proposed work focuses on a recent class of neural networks known as Kolmogorov-Arnold Networks (KANs), which explicitly leverage the Kolmogorov-Arnold representation theorem. These networks learn nonlinear functions directly, modeling them through splines. KANs have demonstrated faster convergence than classical MLPs on a range of problems; however, when applied to highly complex tasks or data types such as images, they require a substantial number of parameters. Various KAN variants have been proposed, but our goal is to formally and rigorously combine the universal approximation capabilities of MLPs with the adaptive nonlinear function learning of KANs. This approach aims to overcome a common trade-off in classical networks: the choice and use of activation functions. Typically, in an MLP, such functions project the linear combination of inputs into a nonlinear space, thereby "breaking" the inherent linearity of the model. However, these activation functions restrict the output to a specific range of values, which may not always approximate the target problem effectively. This limitation has motivated extensive research on activation functions, not only because they can halt learning through issues such as the vanishing gradient, but also because they constrain the output space too tightly. To address this, it is proposed to exploit the ability of a KAN to learn arbitrary nonlinear functions from data by replacing standard activation functions with a KAN activation. This substitution aims to remove the limitations of traditional activations by introducing a more flexible yet controlled mechanism. Additional constraints can easily be imposed by adjusting the spline parameters, which helps prevent the learned activation function from driving the network to overfit or fail to generalize, since it is otherwise "free" to adapt arbitrarily. Further, it is hypothesized that this control over the activation function may contribute to both regularizing and normalizing the network, potentially removing the need for normalization layers.
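
As a rough illustration of the idea described above (not the authors' implementation), the sketch below shows how a fixed activation in a PyTorch MLP might be replaced by a learnable, per-channel activation. For simplicity it parameterizes the activation with Gaussian radial basis functions on a fixed grid rather than the B-splines used in KANs; the class names, grid range, and number of basis functions are hypothetical choices for the example.

```python
import torch
import torch.nn as nn


class LearnableSplineActivation(nn.Module):
    """Per-channel learnable activation: a weighted sum of Gaussian radial
    basis functions on a fixed grid (a simplified stand-in for the spline
    activations used in KANs)."""

    def __init__(self, num_features: int, grid_min: float = -2.0,
                 grid_max: float = 2.0, num_basis: int = 8):
        super().__init__()
        grid = torch.linspace(grid_min, grid_max, num_basis)
        self.register_buffer("grid", grid)  # fixed basis centers
        self.inv_width = num_basis / (grid_max - grid_min)
        # one set of learnable basis coefficients per feature/channel
        self.coeff = nn.Parameter(torch.zeros(num_features, num_basis))
        # residual linear term keeps gradients flowing early in training
        self.linear = nn.Parameter(torch.ones(num_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_features) -> basis: (batch, num_features, num_basis)
        basis = torch.exp(-((x.unsqueeze(-1) - self.grid) * self.inv_width) ** 2)
        return self.linear * x + (basis * self.coeff).sum(dim=-1)


class MLPWithKANActivation(nn.Module):
    """Classical MLP in which the fixed activations (e.g. ReLU) are replaced
    by learnable spline-like activations."""

    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            LearnableSplineActivation(hidden_dim),
            nn.Linear(hidden_dim, hidden_dim),
            LearnableSplineActivation(hidden_dim),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


if __name__ == "__main__":
    model = MLPWithKANActivation(in_dim=16, hidden_dim=32, out_dim=1)
    x = torch.randn(8, 16)
    print(model(x).shape)  # torch.Size([8, 1])
```

In this sketch, restricting the grid range or penalizing the basis coefficients plays the role of the spline-parameter constraints mentioned in the abstract, i.e. keeping the learned activation flexible but not arbitrarily free.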