A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
- Online Demo
- Pre-trained word and character n-gram embeddings: download (912 MB)
- Our JMT model has been extended to learn task-oriented latent graph structures for neural machine translation: link
- My blog post about this paper is here, and several other articles cover it: link 1, link 2, link 3, link 4, link 5, link 6.
- This paper was presented at the Continual Learning and Deep Networks Workshop, held in conjunction with NIPS 2016.
Transfer and multi-task learning have traditionally focused on either a single source-target pair or very few, similar tasks. Ideally, the linguistic levels of morphology, syntax, and semantics would benefit each other by being trained in a single model. We introduce a joint many-task model together with a strategy for successively growing its depth to solve increasingly complex tasks. Higher layers include shortcut connections to lower-level task predictions to reflect linguistic hierarchies. We use a simple regularization term that allows optimizing all model weights to improve one task's loss without catastrophic interference with the other tasks. Our single end-to-end model obtains state-of-the-art or competitive results on five different tasks spanning tagging, parsing, semantic relatedness, and textual entailment.
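The "simple regularization term" is a successive L2 penalty: while optimizing one task's loss, the shared parameters are pulled back toward the values they held before that task's updates, which is what prevents later tasks from clobbering earlier ones. Below is a minimal PyTorch sketch of this idea; the module shapes, the `delta` strength, and the stand-in task loss are illustrative assumptions, not the released code.

```python
import torch
import torch.nn as nn

# Hypothetical shared encoder and task head; sizes are illustrative.
encoder = nn.LSTM(input_size=100, hidden_size=100, batch_first=True)
task_head = nn.Linear(100, 45)  # e.g., POS tag scores

def successive_reg(module, anchors, delta=1e-2):
    """delta * ||theta - theta'||^2 over the shared parameters, where
    theta' (anchors) are the values saved before the current task's
    updates. `delta` is an assumed strength, tuned per task in practice."""
    return delta * sum(
        ((p - a) ** 2).sum() for p, a in zip(module.parameters(), anchors)
    )

# Snapshot theta' once per epoch, before training the current task.
anchors = [p.detach().clone() for p in encoder.parameters()]

x = torch.randn(8, 20, 100)        # (batch, time, embedding)
h, _ = encoder(x)
logits = task_head(h)
task_loss = logits.pow(2).mean()   # stand-in for the real task loss
loss = task_loss + successive_reg(encoder, anchors)
loss.backward()
```

The anchors are re-snapshotted each epoch, so the penalty only discourages large within-epoch drift of the shared weights rather than freezing them outright; all parameters remain trainable for every task.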