TnT (Trigrams'n'Tags) is an efficient statistical part-of-speech tagger that is trainable on different languages and virtually any tagset. It is not tied to one language: we can train it on a tagged corpus in any language (the corpus can be tagged manually) and then test the tagging on new text.
Here I am going to explain how to train and test TnT on a Malayalam corpus.
Step1:
Download TnT .tar.gz file from the following site
Step2:
Open a terminal and extract the TnT .tar.gz file
$ tar zxvf components-tnt.tar.gz
Step3:
Add a .txt file containing a POS-tagged Malayalam corpus inside the /components/tnt folder
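As a rough illustration of what such a file might look like, the sketch below writes a tiny tagged.txt in the current directory. It assumes the usual TnT corpus layout of one token per line, with the word and its tag separated by whitespace. The tag names (PRP, NN, VB, SYM) are placeholders and are not taken from any particular Malayalam tagset, so substitute the tags of your own corpus.

```shell
# Write a tiny example corpus: one token per line, word<TAB>tag.
# The tags used here are hypothetical placeholders, not a real tagset.
printf '%s\t%s\n' \
  'ഞാൻ' 'PRP' \
  'വീട്ടിൽ' 'NN' \
  'പോയി' 'VB' \
  '.' 'SYM' > tagged.txt

# Sanity check: every line should have exactly two whitespace-separated fields.
awk 'NF != 2 { bad++ } END { exit (bad > 0) }' tagged.txt && echo "format looks OK"
```

A real training corpus needs far more than one sentence; the check at the end is only a quick way to catch lines where the tag is missing.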
Step4:
Train the tagger with the tagged corpus file. To do this, open a terminal inside /components/tnt and run the following command (assuming the tagged file is named tagged.txt):
$ ./tnt-para tagged.txt
Step5:
Add a test file of Malayalam words inside /components/tnt
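If the test text starts out as ordinary running sentences, it may need to be split into tokens first. Here is a minimal sketch, assuming TnT reads its input as one token per line and that plain whitespace splitting is good enough (punctuation attached to a word is not separated here). The file name raw.txt is hypothetical.

```shell
# raw.txt is a hypothetical file holding plain, untagged text.
printf '%s\n' 'ഞാൻ വീട്ടിൽ പോയി .' > raw.txt

# Turn each run of whitespace into a newline, then drop empty lines,
# leaving one token per line in Test1.txt.
tr -s '[:space:]' '\n' < raw.txt | sed '/^$/d' > Test1.txt

cat Test1.txt
```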
Step6:
Test the tagger by running the following command in the terminal. Assume the test file is Test1.txt
$ ./tnt tagged Test1
The result is then shown in the terminal: each word of the test file with its assigned POS tag.
For more, visit: https://youtu.be/Q6kxhZDOE18
Thanks, it works well!!
Can you give some clue to calculate the accuracy of the tagger please?