Adversarial Learning

Research Project by Christian Bracher, Martin Heusel, Sebastian Heinz & Roland Vollgraf

Deep learning computer vision models, trained on our huge archive of fashion images, already power various customer-facing AI products at Zalando, including recommendations, search, and “complete the look”, a product that generates outfits. We currently face two obstacles in applying our models even further: (1) models trained on Zalando’s highly curated shop imagery don’t generalize well to other assortments or to customer-produced content; and (2) model performance is tainted by spurious correlations with confounding but irrelevant sources of information inside the Zalando image catalog. We aim to overcome both obstacles with the help of methods from adversarial learning.

Implementation Challenges

Adversarial learning tasks can be formulated as two-player games. The first player’s goal is to predict a certain attribute (such as the image source), while the second player’s goal is to sabotage the first player. Mathematically, this corresponds to solving a saddle point optimization problem. Because adversarial learning that follows this strategy is notoriously unstable (see the figure below for the particular challenge of “orbiting”), we plan to improve its training behaviour by refining the optimization algorithms, exploring 2nd-order methods as well as insights from image GANs. We also plan to approach the problem by finding adversary objectives (such as maximum mean discrepancy (MMD) or meta-learning) that avoid the need to solve a saddle point optimization problem, without unduly affecting classification ability.
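To make the MMD idea more concrete, here is a minimal sketch of a (biased) squared-MMD estimate with a Gaussian kernel, written in plain Python. It is an illustration only, not the project's implementation: the function names, the bandwidth, and the synthetic "shop" and "customer" samples are all assumptions made for this example. The point is that such a quantity measures how different two sets of embeddings are without training a second, adversarial player.

```python
import math
import random


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two vectors given as lists of floats."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared maximum mean discrepancy."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


rng = random.Random(0)


def sample(mean, n=100, dim=2):
    """Hypothetical stand-in for image embeddings: Gaussian points."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


shop = sample(0.0)        # assumed: embeddings of one image source
shop_again = sample(0.0)  # assumed: a second batch from the same source
customer = sample(1.5)    # assumed: embeddings from a shifted source

same_source = mmd_squared(shop, shop_again)
different_source = mmd_squared(shop, customer)
print("same source:     ", same_source)
print("different source:", different_source)
```

Used as a training penalty, a term of this kind could push the embeddings of two image sources towards each other by plain minimization, with no saddle point to solve. Whether this preserves classification ability is exactly the open question stated above.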

“Orbiting” of weight traces (sinusoidal graphs) implies lack of convergence.
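Orbiting can be reproduced on a toy saddle point problem. The sketch below assumes the game f(x, y) = x * y, where x minimizes and y maximizes f, and whose only saddle point is (0, 0). With simultaneous gradient steps the two weights circle the saddle point and slowly drift outward, so each weight trace looks like a sinusoid. For comparison, the sketch also runs the extragradient method, a well-known stabilizer that is used here purely as an illustration and is not necessarily one of the methods explored in this project. All names and parameter values are hypothetical.

```python
import math


def simulate(steps=200, lr=0.1, extragradient=False):
    """Play the toy game f(x, y) = x * y; x minimizes f, y maximizes f."""
    x, y = 1.0, 0.0
    trace = [(x, y)]
    for _ in range(steps):
        if extragradient:
            # Look-ahead step, then update with gradients taken at that point.
            x_half, y_half = x - lr * y, y + lr * x
            x, y = x - lr * y_half, y + lr * x_half
        else:
            # Simultaneous gradient descent (on x) and ascent (on y).
            x, y = x - lr * y, y + lr * x
        trace.append((x, y))
    return trace


def radius(point):
    """Distance of a weight pair from the saddle point (0, 0)."""
    return math.hypot(point[0], point[1])


gda = simulate()
eg = simulate(extragradient=True)
print("plain gradient steps, final distance:", radius(gda[-1]))
print("extragradient steps, final distance: ", radius(eg[-1]))
```

For this bilinear game, each plain step multiplies the distance from the saddle point by sqrt(1 + lr^2), so the iterates can never converge, whereas each extragradient step multiplies it by sqrt(1 - lr^2 + lr^4), which is below 1 for 0 < lr < 1.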

Learning-to-Forget Example

Fashion DNA is our article embedding, which encodes, among other things, the type of fabric, the season, the targeted gender and age, the color, and the brand of a fashion item. Via adversarial learning, we trained a model on top of Fashion DNA whose task is to forget all article properties except color and pattern. Examples of neighborhoods in the resulting article embedding are shown below.