Visual search, and in particular the street-to-shop task of matching fashion items seen in everyday images to similar catalogue articles, is a challenging and commercially important problem in computer vision. Building on our successful Studio2Shop model, we report results on Street2Fashion2Shop, a pipeline architecture that stacks Street2Fashion, a segmentation model that removes the background from a street image, with Fashion2Shop, an improved model that matches the remaining foreground against “title images”, i.e. front views of fashion articles on a white background. Both segmentation and product matching rely on deep convolutional neural networks. The pipeline lets us circumvent the scarcity of high-quality annotated in-the-wild data by leveraging task-specific datasets at every step. We show that fashion-specific training data leads to superior performance of the segmentation model. Studio2Shop built its performance on FashionDNA, an in-house product representation trained on the rich, professionally curated Zalando catalogue; our study presents a substantially improved version of FashionDNA that boosts the accuracy of the matching model. Results on external datasets confirm the viability of our approach.
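The two-stage structure of the pipeline can be sketched in a few lines. This is a toy illustration only: the segmentation step stands in for Street2Fashion by whiting out background pixels given a mask, and a fixed random projection stands in for the learned FashionDNA embedding; the actual models are deep convolutional networks, and all names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def segment_foreground(street_image, mask):
    """Stand-in for Street2Fashion: keep the garment, paint the background white
    so the query resembles a title image (white background)."""
    out = street_image.copy()
    out[~mask] = 255
    return out

# Toy "FashionDNA": a fixed random projection instead of a learned CNN embedding.
PROJ = rng.standard_normal((32 * 32, 8))

def embed(image):
    """Map an image to an L2-normalised feature vector."""
    v = image.reshape(-1).astype(float) @ PROJ
    return v / np.linalg.norm(v)

def rank_catalogue(query_image, catalogue_images, k=3):
    """Stand-in for Fashion2Shop: rank catalogue title images by
    cosine similarity to the segmented query."""
    q = embed(query_image)
    sims = np.stack([embed(c) for c in catalogue_images]) @ q
    return np.argsort(-sims)[:k]

# Usage: a street image whose garment is identical to catalogue item 4
catalogue = [rng.integers(0, 256, (32, 32)) for _ in range(10)]
street = catalogue[4].copy()
mask = np.ones((32, 32), dtype=bool)  # trivial mask: everything is foreground
fashion = segment_foreground(street, mask)
ranking = rank_catalogue(fashion, catalogue)
```

With a perfect mask and an exact catalogue duplicate, the matching stage should rank item 4 first; in practice the interesting cases are near-duplicates, which is what the learned embedding handles.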