Lookahead Part-Of-Speech Tagger


Overview

This is a C++ implementation of the part-of-speech (POS) tagging algorithm described in [1]. The tagger is fast (more than 500 sentences per second), accurate (97.22% tagging accuracy on the WSJ corpus), and can be retrained on your own POS-annotated corpus. The distribution includes model files trained for English.
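[1] describes a history-based tagger that assigns tags left to right and, before committing to each decision, looks ahead at possible future decisions. The toy C++ sketch below illustrates only the general idea of depth-limited lookahead decoding. The scorer type (LocalScorer), the function names, and the integer tag encoding are hypothetical; this is not the lapos code or API, and the actual model, features, and training procedure are those described in [1].

```cpp
#include <algorithm>
#include <functional>
#include <limits>
#include <string>
#include <vector>

// HYPOTHETICAL local scorer: score of assigning `tag` to word `i`, given the
// tag `prev` at position i-1 (-1 at the start of the sentence).
using LocalScorer = std::function<double(const std::vector<std::string>& words,
                                         int i, int prev, int tag)>;

// Best total score over all tag sequences for positions i .. i+depth-1
// (truncated at the sentence end), given tag `prev` at position i-1.
// Exhaustive search, so the cost grows as num_tags^depth.
double BestContinuation(const std::vector<std::string>& words, int num_tags,
                        const LocalScorer& score, int i, int prev, int depth) {
  const int n = static_cast<int>(words.size());
  if (depth == 0 || i >= n) return 0.0;
  double best = -std::numeric_limits<double>::infinity();
  for (int t = 0; t < num_tags; ++t) {
    const double s = score(words, i, prev, t) +
                     BestContinuation(words, num_tags, score, i + 1, t, depth - 1);
    best = std::max(best, s);
  }
  return best;
}

// Left-to-right tagging with lookahead: at each position, commit to the tag
// whose local score plus best `depth`-step continuation is highest.
// depth == 0 reduces to plain greedy tagging.
std::vector<int> TagWithLookahead(const std::vector<std::string>& words,
                                  int num_tags, const LocalScorer& score,
                                  int depth) {
  std::vector<int> tags;
  int prev = -1;
  for (int i = 0; i < static_cast<int>(words.size()); ++i) {
    int best_tag = 0;
    double best_value = -std::numeric_limits<double>::infinity();
    for (int t = 0; t < num_tags; ++t) {
      const double value =
          score(words, i, prev, t) +
          BestContinuation(words, num_tags, score, i + 1, t, depth);
      if (value > best_value) {
        best_value = value;
        best_tag = t;
      }
    }
    tags.push_back(best_tag);  // the decision is final; no backtracking
    prev = best_tag;
  }
  return tags;
}
```

With a scorer that slightly prefers a tag at the first word but strongly rewards a different tag pair later, depth 0 commits to the locally best tag, while depth 1 sees the later reward and chooses differently. That is the kind of error lookahead is meant to avoid without full global decoding.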

How to use the tagger

1. Download the latest version of the tagger

2. Expand the archive

> tar xvzf lapos-X.X.tar.gz

3. Compile

> cd lapos-X.X/
> make

4. Tag sentences

Prepare the input with one sentence per line, then feed it to the "lapos" command on standard input, specifying the model with -m:

> echo "He opened the window." | ./lapos -t -m ./model_wsj02-21
He/PRP opened/VBD the/DT window/NN ./.
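As the example shows, the output writes each token as word/TAG, with tokens separated by spaces. If you post-process the output in C++, a minimal sketch like the following splits one output line into (word, tag) pairs. The function name ParseTaggedLine is hypothetical. The sketch splits each token at its last '/', assuming that tags never contain '/', and it does not handle any escaping the tagger might apply to slashes inside words.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// HYPOTHETICAL helper: split a line such as "He/PRP opened/VBD ./." into
// (word, tag) pairs. Assumes space-separated word/TAG tokens.
std::vector<std::pair<std::string, std::string>> ParseTaggedLine(
    const std::string& line) {
  std::vector<std::pair<std::string, std::string>> result;
  std::istringstream in(line);
  std::string token;
  while (in >> token) {
    // Split at the LAST '/', so a word like "./." yields (".", ".").
    const std::size_t slash = token.rfind('/');
    if (slash == std::string::npos || slash == 0) {
      // No usable separator: keep the whole token with an empty tag.
      result.emplace_back(token, "");
    } else {
      result.emplace_back(token.substr(0, slash), token.substr(slash + 1));
    }
  }
  return result;
}
```

For example, ParseTaggedLine("He/PRP opened/VBD the/DT window/NN ./.") returns five pairs, the first being ("He", "PRP") and the last (".", ".").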

How to build a tagging model with your own annotated corpus

Please see the README file.

References

[1] Yoshimasa Tsuruoka, Yusuke Miyao, and Jun'ichi Kazama. 2011. Learning with Lookahead: Can History-Based Models Rival Globally Optimized Models? In Proceedings of CoNLL, pp. 238-246.


This page is maintained by Yoshimasa Tsuruoka