Research Project by Duncan Blythe (ex-member), Alan Akbik & Roland Vollgraf
The aim of this project is to use the intrinsic structure of language to train deep-learning models that better model textual data. Good modeling of textual data implies:
- Being able to differentiate proper language use from improper language use
- Being able to generate sensible and diverse samples from the language under study
- Enabling useful applications of language modeling in downstream tasks, such as predicting customer ratings from textual reviews ("sentiment analysis") or locating mentions of fashion brands in a large corpus of text ("named entity recognition").
Our approach is illustrated in the image below:
We consider a sentence such as "The dog is chasing the cat". Standard language models model the likelihood of future words given already seen words (how likely is "cat" given "The dog is chasing"?). Our approach goes further: it also models the likelihood of future grammatical roles given past words and their grammatical roles. This is displayed at the top of the image — for instance, it is very likely that a noun follows the verb phrase "is chasing". Our project investigates to what extent this two-fold modeling improves the language understanding that can be derived from the model.
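The idea of scoring words jointly with their grammatical roles can be illustrated with a minimal sketch. The following toy model is an assumption for illustration only (a smoothed bigram model over joint word/tag tokens, not the neural architecture from the paper): each step conditions on the previous word *and* its tag, so a well-formed sequence like "the dog is chasing the cat" scores higher than a scrambled one.

```python
import math
from collections import defaultdict

# Toy corpus of (word, part-of-speech tag) sequences. The tags and the
# corpus itself are illustrative assumptions, not data from the project.
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("is", "AUX"),
     ("chasing", "VERB"), ("the", "DET"), ("cat", "NOUN")],
    [("the", "DET"), ("cat", "NOUN"), ("is", "AUX"),
     ("chasing", "VERB"), ("the", "DET"), ("dog", "NOUN")],
]

# Count bigram transitions over joint (word, tag) tokens.
counts = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    prev = ("<s>", "<s>")
    for token in sentence:
        counts[prev][token] += 1
        prev = token

# Vocabulary of observed joint tokens, used for add-one smoothing.
vocab = {tok for nxt in counts.values() for tok in nxt}

def joint_log_prob(sentence):
    """Log-probability of a (word, tag) sequence under the joint
    bigram model, with add-one smoothing."""
    logp = 0.0
    prev = ("<s>", "<s>")
    for token in sentence:
        nexts = counts.get(prev, {})
        c = nexts.get(token, 0)
        total = sum(nexts.values())
        logp += math.log((c + 1) / (total + len(vocab)))
        prev = token
    return logp

# A grammatical sentence scores higher than a scrambled permutation.
scrambled = [("dog", "NOUN"), ("the", "DET"), ("chasing", "VERB"),
             ("is", "AUX"), ("cat", "NOUN"), ("the", "DET")]
print(joint_log_prob(corpus[0]), joint_log_prob(scrambled))
```

Conditioning on the tag in addition to the word is what lets the model assign low probability to sequences whose grammatical-role pattern is implausible, even when every individual word is common.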
Our experiments show that in certain cases this joint modeling can significantly improve on the standard approach of modeling only the words themselves. Our method uses techniques based on sequential Monte Carlo sampling.
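Sequential Monte Carlo can be sketched in a few lines. The sketch below is an assumption for illustration (a basic particle filter over a hand-specified toy hidden Markov model of grammatical roles, not the procedure from the paper): a population of tag-sequence hypotheses ("particles") is propagated word by word, weighted by how well each hypothesis explains the observed word, and resampled so that plausible hypotheses survive.

```python
import random

random.seed(0)

# Toy hidden Markov model over grammatical roles; all probabilities are
# illustrative assumptions, not values from the project.
tags = ["DET", "NOUN", "VERB"]
trans = {  # p(next tag | current tag)
    "<s>":  {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1},
    "DET":  {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.1, "VERB": 0.8},
    "VERB": {"DET": 0.7, "NOUN": 0.2, "VERB": 0.1},
}
emit = {  # p(word | tag)
    "DET":  {"the": 0.9, "dog": 0.05, "chases": 0.05},
    "NOUN": {"the": 0.05, "dog": 0.9, "chases": 0.05},
    "VERB": {"the": 0.05, "dog": 0.05, "chases": 0.9},
}

def smc_tag(words, n_particles=500):
    """Sequential Monte Carlo: propagate tag hypotheses (particles) from
    the transition model, weight by emission likelihood, and resample."""
    particles = [["<s>"] for _ in range(n_particles)]
    for word in words:
        weights = []
        for p in particles:
            probs = trans[p[-1]]
            # Propose the next tag from the transition distribution.
            tag = random.choices(tags, weights=[probs[t] for t in tags])[0]
            p.append(tag)
            # Weight the particle by how well the tag explains the word.
            weights.append(emit[tag].get(word, 1e-6))
        # Multinomial resampling proportional to the weights.
        particles = [list(p) for p in
                     random.choices(particles, weights=weights,
                                    k=n_particles)]
    # Return the most frequent tag sequence among surviving particles.
    seqs = [tuple(p[1:]) for p in particles]
    return max(set(seqs), key=seqs.count)

print(smc_tag(["the", "dog", "chases"]))
```

The appeal of this family of methods is that the set of possible tag sequences grows exponentially with sentence length, yet a fixed number of particles keeps the computation tractable while concentrating on the high-probability sequences.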
We have published our findings in a preprint: https://arxiv.org/pdf/1803.03665