Lookahead Part-Of-Speech Tagger

Overview

This is a C++ implementation of the part-of-speech (POS) tagging algorithm described in [1]. The tagger is fast (more than 500 sentences per second), accurate (97.22% accuracy on the WSJ corpus), and can be trained on your own POS-annotated corpus. The distribution includes model files trained for English.

How to use the tagger

1. Download the latest version of the tagger

29 Feb 2012 lapos-0.1.2.tar.gz (source code for Linux + GCC)
12 Aug 2011 lapos-0.1.1.tar.gz (source code for Linux + GCC)
28 Jun 2011 lapos-0.1.tar.gz (source code for Linux + GCC)

2. Expand the archive


> tar xvzf lapos-X.X.tar.gz

3. Compile


> cd lapos-X.X/
> make

4. Tag sentences

Prepare the input in one-sentence-per-line format, then run the "lapos" command:


> echo "He opened the window." | ./lapos -t -m ./model_wsj02-21

He/PRP opened/VBD the/DT window/NN ./.
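
If you want to post-process the tagger's output in a script, the following is a minimal Python sketch, not part of the lapos distribution. It assumes the input and output formats shown in step 4: one sentence per line going in, and whitespace-separated word/TAG tokens coming out. The function names (to_one_sentence_per_line, parse_tagged_line) are hypothetical helpers introduced here for illustration. The parser splits each token at its last slash, on the assumption that tags never contain a slash while words might.

```python
def to_one_sentence_per_line(sentences):
    """Format sentences as tagger input: one sentence per line.

    Whitespace inside each sentence (including newlines) is collapsed
    to single spaces, and empty sentences are skipped.
    """
    lines = [" ".join(s.split()) for s in sentences if s.strip()]
    return "\n".join(lines) + "\n"


def parse_tagged_line(line):
    """Turn one output line such as 'He/PRP opened/VBD' into (word, tag) pairs.

    Assumption: the tag is everything after the LAST slash in a token,
    so a word that itself contains a slash is kept intact.
    """
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("/")
        if not sep:
            raise ValueError("token has no word/TAG separator: %r" % token)
        pairs.append((word, tag))
    return pairs


if __name__ == "__main__":
    # The output line from the example in step 4.
    for word, tag in parse_tagged_line("He/PRP opened/VBD the/DT window/NN ./."):
        print(word, tag)
```

Splitting at the last slash matters for the final token of the example, "./.", which parses as the word "." with the tag ".".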

How to build a tagging model with your own annotated corpus

Please see the README file.

References

[1] Yoshimasa Tsuruoka, Yusuke Miyao, and Jun'ichi Kazama. 2011. Learning with Lookahead: Can History-Based Models Rival Globally Optimized Models? In Proceedings of CoNLL, pp. 238-246.

This page is maintained by Yoshimasa Tsuruoka