Lipschitz standardization for robust multivariate learning
Current trends in machine learning rely on out-of-the-box gradient-based approaches. With the aim of mitigating numerical errors and to improve the convergence of the learning process, a common empirical practice is to standardize or normalize the data. However, there is a lack of theoretical analysis regarding why and when these methods result in an improvement of the learning process. In this work, we first study these methods in the context of black-box variational inference, specifically analyzing the effect that scaling the data has on the smoothness of the optimization landscape. Our analysis shows that no general rule applies in order to decide which of the existing data scaling methods, or even if they, will improve the learning process. Second, we highlight the issues that arise when dealing with multivariate data, due to the discrepancy in smoothness of the likelihood functions for different variables, and the inability to scale discrete data. Finally, we propose a novel Lipschitz standardization, and its extension for discrete data, which overcomes the aforementioned limitations. Specifically, as backed by our experiments, Lipschitz standardization i) favors a fairer learning across different variables in the data; and ii) results in faster and more accurate learning.
READ FULL TEXT