SIMAI 2025

Event Detection and Modelling of Physical Systems From Acoustic Responses

  • Chinellato, Erik (Università degli Studi di Padova)
  • Marcuzzi, Fabio (Università degli Studi di Padova)


Acoustic signals are often used in applications to detect and analyze the behavior of physical systems, such as operating machines, natural phenomena, or living beings. In many situations, the acoustic response of a physical system can be uniquely linked to a precise state of the system or to a precise event acting on it. Here we are interested in numerical methods that perform this recognition through approximation and mathematical modeling, since they yield quantitative estimates; in particular, they can perform indirect measurements of physical phenomena. Acoustic responses are attractive because their data come from contactless sensors (microphones), which are easy to deploy in a wide range of situations. However, the mathematical problem is much harder than with contact or direct sensors (for example, an accelerometer in vibration analysis): source separation is needed to distinguish the response of the physical system from the other sounds and noises produced by the environment. We cast this as an approximation problem: detecting known clean sources and estimating their shape and intensity through an approximate nonnegative matrix factorization (NMF). To optimize the separation of clean sources from background noise, learning from data substantially improves the algorithm, which leads to a "Deep-NMF". Moreover, since we are dealing with physical systems, embedding the properties of a physical model yields a stronger and more interpretable method, realized here by the Physics-Aware Deep-NMF (PAD-NMF) and Deep-NMFD algorithms. The audio spectrogram of the response is fed into a neural-network-like scheme that provides a data-enhanced additive decomposition into multiple sources, which can then be analyzed. The mathematical model of the audio sources is embedded in the network structure itself and resembles the Hankel matrices traditionally used to model the responses of discrete linear time-invariant (DLTI) systems.
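The approximate NMF described above can be illustrated with a minimal sketch, assuming a magnitude spectrogram `V` and a fixed dictionary `W_src` whose columns are the spectra of known clean sources. A few extra "noise" atoms are learned from the data to absorb the background, and all factors are updated with the classic Lee-Seung multiplicative rules for the squared Frobenius error. The function names, the number of noise atoms, and the iteration count are illustrative assumptions; this is a plain (shallow) semi-supervised NMF, not the PAD-NMF or Deep-NMFD algorithm.

```python
import numpy as np


def semi_supervised_nmf(V, W_src, n_noise=2, n_iter=300, eps=1e-12, seed=0):
    """Approximate V ~= W_src @ H_src + W_noise @ H_noise with all factors >= 0.

    V       : (F, T) nonnegative magnitude spectrogram of the recording.
    W_src   : (F, K) fixed dictionary, one column per known clean source.
    n_noise : number of free "background" atoms learned from the data
              (hypothetical parameter, chosen for illustration).

    Uses Lee-Seung multiplicative updates for the squared Frobenius error.
    Returns (H_src, W_noise, H_noise).
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    K = W_src.shape[1]
    W_noise = rng.random((F, n_noise)) + eps
    H = rng.random((K + n_noise, T)) + eps

    for _ in range(n_iter):
        W = np.hstack([W_src, W_noise])
        # Activation update: intensities of sources and noise atoms over time.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # Dictionary update: only the noise atoms change; W_src stays fixed.
        H_noise = H[K:]
        W_noise *= (V @ H_noise.T) / (W @ H @ H_noise.T + eps)

    return H[:K], W_noise, H[K:]


def relative_error(V, W_src, H_src, W_noise, H_noise):
    """Frobenius reconstruction error relative to ||V||."""
    V_hat = W_src @ H_src + W_noise @ H_noise
    return np.linalg.norm(V - V_hat) / np.linalg.norm(V)
```

Keeping the source atoms fixed preserves their physical interpretation: row `k` of `H_src` is the estimated intensity of source `k` in each frame, while the noise atoms soak up whatever the known sources cannot explain. A common route from here to a "Deep-NMF" is to unroll a fixed number of such iterations as network layers with learnable parameters; the sketch stops short of that.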
Block-Hankel spectrograms and dictionaries are used to model and embed the time correlation between consecutive spectrogram columns, leading to more precise and interpretable results. Relevant results include hit detection in general mixtures (e.g., piano notes within a multi-instrument music track), model parameter estimation, and real-time recognition and play.
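The block-Hankel idea can be sketched as follows, assuming a spectrogram `V` of shape `(F, T)` and a window of `L` consecutive frames. Each column of the embedded matrix stacks `L` adjacent spectrogram columns, so block `(i, j)` equals `V[:, i + j]` and blocks are constant along anti-diagonals. A dictionary atom in this space is then a whole `L`-frame spectro-temporal template rather than a single spectrum. The helper names (`block_hankel`, `template_to_atom`, `activations`) and the synthetic template are hypothetical, and detection is done here with ordinary multiplicative NMF updates, not with the authors' Deep-NMFD.

```python
import numpy as np


def block_hankel(V, L):
    """Block-Hankel embedding of a spectrogram.

    V : (F, T) spectrogram. L : number of consecutive frames per column.
    Returns an (F*L, T-L+1) matrix whose j-th column stacks
    V[:, j], V[:, j+1], ..., V[:, j+L-1]; block (i, j) equals V[:, i+j].
    """
    F, T = V.shape
    if not 1 <= L <= T:
        raise ValueError("L must satisfy 1 <= L <= T")
    n_cols = T - L + 1
    return np.vstack([V[:, i:i + n_cols] for i in range(L)])


def template_to_atom(P):
    """Flatten an (F, L) template frame by frame (column-major order),
    matching the stacking order used by block_hankel."""
    return P.flatten(order="F")


def activations(Y, W, n_iter=200, eps=1e-12):
    """Nonnegative activations A with Y ~= W @ A for a fixed dictionary W
    (multiplicative updates for the squared Frobenius error)."""
    A = np.ones((W.shape[1], Y.shape[1]))
    WtY, WtW = W.T @ Y, W.T @ W
    for _ in range(n_iter):
        A *= WtY / (WtW @ A + eps)
    return A


if __name__ == "__main__":
    F, L, T = 4, 3, 20
    P = np.arange(1.0, F * L + 1).reshape(F, L)  # hypothetical 3-frame "hit"
    V = np.zeros((F, T))
    V[:, 5:5 + L] = P                              # the event starts at frame 5
    Y = block_hankel(V, L)
    A = activations(Y, template_to_atom(P)[:, None])
    print(int(np.argmax(A[0])))  # 5
```

Because the whole template is matched at once, a partial overlap (frames 3, 4, 6, 7 in the example) scores lower than the exact alignment, which is what makes the activation peak usable as an onset estimate.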