Research Project by Leonidas Lefakis, Alan Akbik & Roland Vollgraf
The FEIDEGGER (fashion images and descriptions in German) dataset is a new multi-modal corpus that focuses specifically on the domain of fashion items and their visual descriptions in German. The dataset was created as part of ongoing research at Zalando into text-image multi-modality in the area of fashion.
Unlike other tasks typically encountered in multi-modal learning, in fashion the relevant information in the visual data often consists of very fine-grained details that need to be reflected in the textual descriptions. Furthermore, to write such detailed descriptions, users must often rely on a domain-specific vocabulary. These particularities make the creation of a multi-modal fashion dataset a challenging task.
To create FEIDEGGER, we used crowd-sourcing and developed a novel annotation and assessment pipeline to ensure the high quality of the final dataset. The pipeline and the motivation behind its design decisions are described in our published work.
The dataset itself consists of 8732 high-resolution images, each depicting a dress available on the Zalando shop against a white background, as shown in Figure 1. For each image we provide five textual annotations in German, each written by a separate user. An example of the resulting multi-modal data can be seen in Figure 2 (note that the English translations are not part of the dataset).
Fig. 1 : Mosaic of dresses
Fig. 2 : Example of a dress image and its corresponding textual descriptions
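The structure described above (one image paired with five German descriptions) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the field names `image_path` and `descriptions`, and the helper functions are hypothetical and are not taken from the released dataset files. The sample strings are placeholders, not real annotations.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Stated in the text: every image has five German annotations.
DESCRIPTIONS_PER_IMAGE = 5


@dataclass
class FashionRecord:
    """One dress image with its descriptions (hypothetical layout)."""
    image_path: str          # hypothetical field name
    descriptions: List[str]  # hypothetical field name


def validate(records: List[FashionRecord]) -> List[int]:
    """Return indices of records lacking exactly five non-empty descriptions."""
    bad = []
    for i, rec in enumerate(records):
        non_empty = [d for d in rec.descriptions if d.strip()]
        if len(non_empty) != DESCRIPTIONS_PER_IMAGE:
            bad.append(i)
    return bad


def to_pairs(records: List[FashionRecord]) -> List[Tuple[str, str]]:
    """Flatten records into (image_path, description) training pairs."""
    return [(r.image_path, d) for r in records for d in r.descriptions]


if __name__ == "__main__":
    # Placeholder data for illustration only.
    records = [
        FashionRecord("dress_0001.jpg", ["Beschreibung"] * 5),
        FashionRecord("dress_0002.jpg", ["Beschreibung"] * 4),  # one missing
    ]
    print("incomplete records:", validate(records))
    print("image-text pairs:", len(to_pairs(records)))
```

With all 8732 images complete, flattening in this way would yield 8732 × 5 = 43,660 image-description pairs.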