alWAYS Beginner: An introduction to TnT And Malayalam POS Tagging

TnT (Trigrams'n'Tags) is an efficient statistical part-of-speech tagger that can be trained on different languages and on virtually any tagset. It is not tied to one language: we can train it with a tagged corpus in any language (the tagging of that corpus can be done manually), and after that we can test the tagger on new text.

Here I am going to explain how you can tag parts of speech (POS) for a Malayalam corpus.

Step1:

Download the TnT .tar.gz file from the following site:

https://nats-www.informatik.uni-hamburg.de/CDG/DownloadPage

Step2:
Open a terminal and extract the TnT .tar.gz file:

$ tar zxvf components-tnt.tar.gz

Step3:
Add a .txt file containing the POS-tagged Malayalam corpus to the /components/tnt folder.
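If you are building the tagged file yourself, here is a minimal Python sketch of how it could be written. It assumes the usual TnT corpus layout as I understand it: one token per line, then whitespace, then the tag, with lines starting with %% treated as comments. The function name `to_tnt_lines`, the example sentence and the tags (PRP, NN, VM, SYM) are only illustrations and are not taken from any real corpus or tagset.

```python
# Minimal sketch: render tagged sentences as TnT-style corpus text.
# Assumption: one "token<TAB>tag" pair per line, "%%" starts a comment line.


def to_tnt_lines(sentences):
    """Return corpus text for a list of sentences of (token, tag) pairs."""
    lines = ["%% hypothetical example corpus"]
    for sentence in sentences:
        for token, tag in sentence:
            lines.append(f"{token}\t{tag}")
    return "\n".join(lines) + "\n"


# Illustrative Malayalam sentence with made-up tag labels.
example_sentences = [
    [("ഞാൻ", "PRP"), ("വീട്ടിൽ", "NN"), ("പോയി", "VM"), (".", "SYM")],
]

example_text = to_tnt_lines(example_sentences)
print(len(example_text.splitlines()), "lines generated")
```

To get the file itself, the returned text can be saved as tagged.txt, for example with `open("tagged.txt", "w", encoding="utf-8")`. UTF-8 is my assumption here, so use whatever encoding your corpus already has.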

Step4:
Train the tagger with the tagged corpus file. For this, open a terminal inside /components/tnt and run the following command (assuming the tagged file is named tagged.txt):
$ ./tnt-para tagged.txt
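A badly formed line in the corpus can spoil the training, so it may help to check the file before running the command above. The following is only a rough sketch under the same format assumption (one token and one tag per line, %% for comments); `find_bad_lines` is a hypothetical helper of my own and not part of TnT.

```python
# Minimal sketch: list corpus lines that do not look like "token tag".
# Assumption: blank lines and lines starting with "%%" can be ignored.


def find_bad_lines(corpus_text):
    """Return (line_number, line) pairs that do not have exactly two fields."""
    bad = []
    for number, line in enumerate(corpus_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("%%"):
            continue  # skip blank lines and comments
        if len(stripped.split()) != 2:
            bad.append((number, line))
    return bad


sample = "%% comment\nword1\tNN\nword2\n"
for number, line in find_bad_lines(sample):
    print("check line", number)
```

An empty result does not prove that the tags are right, only that every line has one token and one tag.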

Step5:
Add a test file of Malayalam words to /components/tnt.
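As far as I know, TnT expects the file to be tagged to have one token per line. If your test text is ordinary running text, a small sketch like the one below could split it up. It splits on spaces and separates a few ASCII punctuation marks from the words. I did not use a `\w` regular expression because in Python that would cut Malayalam words at the vowel signs. The names `tokenize` and `to_test_file_text` and the punctuation list are my own assumptions.

```python
# Minimal sketch: turn running text into one token per line.
# Assumption: words are separated by spaces, and only the ASCII punctuation
# marks listed below need to be split off as separate tokens.

PUNCTUATION = ".,;:!?()\"'"


def tokenize(text):
    """Split text on whitespace and peel punctuation off both ends."""
    tokens = []
    for chunk in text.split():
        leading = []
        while chunk and chunk[0] in PUNCTUATION:
            leading.append(chunk[0])
            chunk = chunk[1:]
        trailing = []
        while chunk and chunk[-1] in PUNCTUATION:
            trailing.insert(0, chunk[-1])
            chunk = chunk[:-1]
        tokens.extend(leading)
        if chunk:
            tokens.append(chunk)
        tokens.extend(trailing)
    return tokens


def to_test_file_text(text):
    """Return the text of a test file: one token per line."""
    return "\n".join(tokenize(text)) + "\n"


print(len(tokenize("ഞാൻ വീട്ടിൽ പോയി.")), "tokens")
```

The text returned by `to_test_file_text` can then be saved as the test file. Real Malayalam text may need a more careful tokenizer than this.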

Step6:
Test the tagger by running the following command in the terminal (assuming the test file is Test1.txt):
$ ./tnt tagged Test1
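The result is then shown in the terminal: each word of the test file with its assigned POS tag. If TnT reports that it cannot find the test file, give the full file name (Test1.txt) instead.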
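If you want to work with the result instead of only reading it on screen, you can redirect the output to a file with the normal shell `>` and read it back. Below is a minimal sketch that assumes the output has one token and its tag per line, separated by whitespace, with %% comment lines at the top. `parse_tnt_output` is a hypothetical helper, and the sample text is made up for illustration, not real TnT output.

```python
# Minimal sketch: read tagger output back into (token, tag) pairs.
# Assumption: "token<whitespace>tag" per line, "%%" lines are comments.


def parse_tnt_output(output_text):
    """Return a list of (token, tag) pairs from tagger output text."""
    pairs = []
    for line in output_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("%%"):
            continue  # skip blank lines and comment headers
        fields = stripped.split()
        if len(fields) >= 2:
            # Keep the first two columns; any further columns are ignored.
            pairs.append((fields[0], fields[1]))
    return pairs


# Made-up sample text, only to show the call.
sample_output = "%% header line\nword1\t\tNN\nword2\tVM\n"
print(parse_tnt_output(sample_output))
```

From the list of pairs it is easy to count tags or compare them with a hand-tagged version of the same file.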

For more, visit: https://youtu.be/Q6kxhZDOE18

Friday 4 September 2015
