Friday 4 September 2015

An Introduction to TnT and Malayalam POS Tagging


TnT (Trigrams'n'Tags) is an efficient statistical part-of-speech tagger that can be trained on different languages and virtually any tagset. It is not tied to a particular language: we train it on a corpus, in any language, that has been tagged manually, and then use the trained model to tag new text.

Here I am going to explain how you can tag parts of speech (POS) for a Malayalam corpus.

Step1:
Download the TnT .tar.gz file from the following site

Step2:
Open a terminal and extract the TnT .tar.gz file:
$ tar zxvf components-tnt.tar.gz

Step3:
Add a .txt file containing the POS-tagged Malayalam corpus to the /components/tnt folder.
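
As a rough sketch of what such a file might look like, the snippet below writes a three-line sample and checks that every line has exactly two whitespace-separated fields. It assumes the usual TnT layout of one token per line followed by its tag, with lines starting with %% treated as comments. The words, the tags (PRP, NN, VB) and the file name sample_tagged.txt are made up for illustration and do not come from any real tagset or corpus.

```shell
# Hypothetical sample: token, tab, tag (one token per line).
printf '%s\t%s\n' 'ഞാൻ' 'PRP' 'പുസ്തകം' 'NN' 'വായിച്ചു' 'VB' > sample_tagged.txt

# Count lines that are not blank, not %% comments, and not exactly two fields.
bad=$(awk '!/^%%/ && NF > 0 && NF != 2 { n++ } END { print n + 0 }' sample_tagged.txt)
echo "malformed lines: $bad"
```

Running the same awk check on your own tagged.txt before training is a cheap way to catch lines where a tag is missing.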

Step4:
Train the tagger on the tagged corpus file. To do this, open a terminal inside /components/tnt and run the following command (assuming the tagged file is named tagged.txt):
$ ./tnt-para tagged.txt
This should write the model files tagged.lex and tagged.123 next to the corpus.

Step5:
Add a test file of untagged Malayalam words to /components/tnt.
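
TnT generally expects the text to be tagged with one token per line, so running text has to be split first. A minimal sketch, assuming whitespace-separated words; the file name raw.txt and the sample sentence are hypothetical, and punctuation attached to a word is not separated here:

```shell
# Hypothetical input: one sentence of running text.
printf '%s\n' 'ഞാൻ പുസ്തകം വായിച്ചു' > raw.txt

# Print every whitespace-separated word on its own line.
awk '{ for (i = 1; i <= NF; i++) print $i }' raw.txt > Test1.txt
cat Test1.txt
```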

Step6:
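Test the tagger by running the following command in the terminal (assuming the test file is named Test1.txt). The first argument, tagged, is the name of the model trained in Step4.
$ ./tnt tagged Test1.txt
The result is then shown in the terminal: each word of the test file with the POS tag assigned to it.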
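
To get a rough accuracy figure, the tagger's output can be compared with a hand-tagged version of the same test file. The sketch below assumes both files hold one token and its tag per line, in the same order, and that lines starting with %% are comments. The file names gold.txt and output.txt and their contents are hypothetical; output.txt stands in for the tagger's result redirected to a file.

```shell
# Hypothetical gold standard and tagger output (token, tab, tag).
printf '%s\t%s\n' 'ഞാൻ' 'PRP' 'പുസ്തകം' 'NN' 'വായിച്ചു' 'VB' 'അവൻ' 'PRP' > gold.txt
printf '%s\t%s\n' 'ഞാൻ' 'PRP' 'പുസ്തകം' 'NN' 'വായിച്ചു' 'NN' 'അവൻ' 'PRP' > output.txt

# Drop comments and blank lines, keep only the tag column of each file.
awk '!/^%%/ && NF >= 2 { print $2 }' gold.txt   > gold.tags
awk '!/^%%/ && NF >= 2 { print $2 }' output.txt > output.tags

# Pair the tags line by line and compute the percentage that agree.
accuracy=$(paste gold.tags output.tags |
  awk '{ total++; if ($1 == $2) correct++ }
       END { if (total) printf "%.2f", 100 * correct / total }')
echo "accuracy: $accuracy%"
```

This only gives a meaningful number if the gold file was not part of the training corpus and both files contain exactly the same tokens.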


For more, visit: https://youtu.be/Q6kxhZDOE18

4 comments:

  3. Can you give some clue to calculate the accuracy of the tagger please?
