### Research Project by Ingmar Schuster, Kashif Rasul, Urs Bergmann

This project develops probabilistic time series models, a problem setting common to many modern internet companies. In other words, we want our model to provide a predictive distribution for the value of a time series at future time points. Such solutions apply to a number of Zalando use cases, for example demand and trend forecasting or the detection of anomalous behaviour. Probabilistic models also help our stakeholders assess the risk associated with a forecast and quantify its uncertainty, especially when predicting far into the future. We currently follow two main approaches: one based on a recurrent neural network, the other on reproducing kernel Hilbert space (RKHS) operators.

In the deep learning approach, a long short-term memory (LSTM) network is trained to output, given past measurements, the parameters of a family of distributions such that the observed data at the current time has high likelihood [2,3]. For future time points, several samples drawn from the predictive distribution are used to propagate uncertainty. We use embeddings [1] to encode categorical covariates, and we develop techniques to model interactions or correlations between different time series.
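The training objective and the sampling step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: the output family is assumed to be Gaussian, and `toy_step` is a hypothetical stand-in for a single LSTM step with hand-picked, untrained weights. All function names are invented for this example.

```python
import math
import random


def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under a Gaussian N(mu, sigma^2)."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)


def toy_step(h, y_prev, w=(0.5, 0.5)):
    """Hypothetical stand-in for one LSTM step (assumption, not a real LSTM).

    Updates the hidden state h from the previous value and maps it to the
    parameters (mu, sigma) of the predictive distribution for the next value.
    """
    h = math.tanh(w[0] * h + w[1] * y_prev)
    mu = 2.0 * h
    sigma = math.log1p(math.exp(h)) + 1e-3  # softplus keeps the scale positive
    return h, mu, sigma


def sequence_nll(series):
    """Training loss: summed NLL of each observation given its past."""
    h, total = 0.0, 0.0
    for y_prev, y in zip(series[:-1], series[1:]):
        h, mu, sigma = toy_step(h, y_prev)
        total += gaussian_nll(y, mu, sigma)
    return total


def sample_paths(series, horizon, n_paths, seed=0):
    """Draw future trajectories, feeding each sampled value back as input."""
    rng = random.Random(seed)
    # Warm up the hidden state on the observed history.
    h = 0.0
    for y_prev in series[:-1]:
        h, _, _ = toy_step(h, y_prev)
    paths = []
    for _ in range(n_paths):
        h_i, y_i, path = h, series[-1], []
        for _ in range(horizon):
            h_i, mu, sigma = toy_step(h_i, y_i)
            y_i = rng.gauss(mu, sigma)  # sample instead of using the mean
            path.append(y_i)
        paths.append(path)
    return paths


history = [0.1, 0.4, 0.3, 0.6]
loss = sequence_nll(history)
paths = sample_paths(history, horizon=3, n_paths=200)
# Approximate empirical 10% / 90% band for the last forecast step.
last = sorted(p[-1] for p in paths)
lower, upper = last[20], last[180]
```

Because each trajectory conditions on its own earlier samples, the spread across trajectories reflects how uncertainty accumulates over the forecast horizon. In a real model the weights would be fitted by minimizing `sequence_nll` with a gradient-based optimizer.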

The RKHS-based approach, on the other hand, uses recently developed RKHS operator ideas [4,5] to estimate the distribution for the current time given past measurements. Unlike a classical regression task, which yields a point estimate, we seek an estimate of the full distribution of the output variable given the input. Nonparametric closed-form solutions exist for the operator of interest, but they do not allow certain constraints to be placed on the resulting solutions. For this reason, and to keep model complexity under control, we employ stochastic gradient optimization algorithms. The resulting method captures uncertainty like Gaussian processes but, unlike them, is restricted neither to Gaussian output distributions nor to real-valued domains or univariate outputs. Nor is it limited to time series models: it can be used for any distribution regression task.
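To make the closed-form solution mentioned above concrete, the following sketch implements the standard kernel conditional mean embedding estimator in the spirit of [5]. It is an illustration only, not the stochastic gradient method used in this project. The Gaussian kernel, the scalar inputs, the regularization value and all function names are assumptions made for this example.

```python
import math


def rbf(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def cme_weights(xs, x_new, reg=1e-3, bandwidth=1.0):
    """Weights beta(x_new) = (K + n * reg * I)^{-1} k(x_new)."""
    n = len(xs)
    K = [[rbf(xi, xj, bandwidth) + (n * reg if i == j else 0.0)
          for j, xj in enumerate(xs)] for i, xi in enumerate(xs)]
    k_new = [rbf(xi, x_new, bandwidth) for xi in xs]
    return solve(K, k_new)


def conditional_expectation(g, xs, ys, x_new, reg=1e-3, bandwidth=1.0):
    """Estimate E[g(Y) | X = x_new] as a weighted sum of g over observed outputs."""
    beta = cme_weights(xs, x_new, reg, bandwidth)
    return sum(b * g(y) for b, y in zip(beta, ys))


xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # past measurements (inputs)
ys = [0.0, 2.0, 4.0, 6.0, 8.0]   # current values (outputs)
# Different choices of g probe different aspects of the conditional distribution.
mean_est = conditional_expectation(lambda y: y, xs, ys, 2.0)
tail_est = conditional_expectation(lambda y: 1.0 if y > 5.0 else 0.0, xs, ys, 2.0)
```

The same weights serve for any test function `g`, which is what makes this an estimate of the whole conditional distribution rather than of a single statistic. The weights are not constrained to be non-negative or to sum to one, so an estimate such as `tail_est` is not guaranteed to lie in [0, 1]. This is one example of a constraint the closed-form solution cannot enforce.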

References

[1] Guo, C. & Berkhahn, F. (2016). Entity Embeddings of Categorical Variables. https://arxiv.org/abs/1604.06737

[2] Flunkert, V., Salinas, D. & Gasthaus, J. (2017). DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks. https://arxiv.org/abs/1704.04110

[3] Zhu, L. & Laptev, N. (2017). Deep and Confident Prediction for Time Series at Uber. https://arxiv.org/abs/1709.01907

[4] Klus, S., Schuster, I., & Muandet, K. (2017). Eigendecompositions of transfer operators in reproducing kernel Hilbert spaces. Submitted to Journal of Machine Learning Research. https://arxiv.org/abs/1712.01572

[5] Song, L., Huang, J., Smola, A., & Fukumizu, K. (2009). Hilbert space embeddings of conditional distributions with applications to dynamical systems. In Proceedings of the 26th Annual International Conference on Machine Learning.