Thursday, 17 September 2015

TnT tagging through Python

From my previous post, I hope all of you know how to tag text with the TnT tagger. To tag a file called test_file with the model tagged_file, run the following command in a terminal:
$ ./tnt tagged_file test_file
To save the tagged output to a file, use the following command instead:
$ ./tnt tagged_file test_file > output.txt
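
Once output.txt is saved, it can be read back into Python for further processing. The snippet below is only a minimal sketch: the function name parse_tnt_output is my own, and it assumes that each output line holds a token followed by its tag, separated by whitespace, and that lines starting with %% are comments. Please check these assumptions against your own output file.

```python
def parse_tnt_output(text):
    """Turn tagger output text into a list of (token, tag) pairs.

    Assumed format (verify against your own output.txt):
    one "token <whitespace> tag" pair per line, "%%" starts a comment line.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("%%"):
            continue
        fields = line.split()
        # Keep only lines that have at least a token and a tag.
        if len(fields) >= 2:
            pairs.append((fields[0], fields[1]))
    return pairs


# Small hand-written sample with placeholder tags, not real TnT output.
sample = "%% a comment line\nthe\tDET\ndog\t\tN\n\nbarks\tV\n"
pairs = parse_tnt_output(sample)
print(pairs)
```

For a real file, read it first, for example with open('output.txt', encoding='utf-8'), and pass the contents to parse_tnt_output. The UTF-8 encoding is also an assumption.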

Here is another way to tag test_file, this time from Python code. As above, the tagged output can be stored in another file for future reference. Add the following lines to a Python script:
import os

tnt_executable = '/home/ajuna/components/tnt/tnt'  # path to the tnt executable
tnt_model = '/home/ajuna/components/tnt/tagged_file'  # path to the model named tagged_file

def run_tnt(input='tokenized_file', output='taggedoutput.txt'):
    """Call tnt from Linux and redirect its output to a file."""
    # Build the same command line we would type in the terminal
    tnt_command = '%s %s %s > %s' % (tnt_executable, tnt_model, input, output)
    os.system(tnt_command)
run_tnt('test_file', 'output.txt')
# Here test_file is the input file to be tagged (it contains words), and output.txt is the file where the output is saved (words with their tags after running the TnT tagger).
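
The same call can also be made without going through the shell, which avoids trouble with spaces or special characters in file names. This is a minimal sketch using the standard subprocess module, not a replacement for the script above; the four-argument signature is my own choice, and it assumes the tagger writes its result to standard output, as the > redirection above suggests.

```python
import subprocess


def run_tnt(executable, model, input_file, output_file):
    """Run `executable model input_file` and save its standard output.

    The arguments are passed as a list, so no shell is involved and
    no quoting of file names is needed.
    """
    with open(output_file, "wb") as out:
        # check=True raises CalledProcessError if the program fails.
        subprocess.run([executable, model, input_file], stdout=out, check=True)
```

With the paths from the script above, the call would look like run_tnt('/home/ajuna/components/tnt/tnt', '/home/ajuna/components/tnt/tagged_file', 'test_file', 'output.txt').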

Sunday, 6 September 2015

Know your python-nltk

NLTK (Natural Language Toolkit) is a platform for writing Python programs that process natural language. NLTK is free and open source, and it is available for Windows, Mac OS X, and Linux.

Install python-nltk:
$ sudo apt-get install python-nltk

To find out where NLTK is installed:
$ python
>>> import nltk
>>> nltk.__path__

To find out where the NLTK data is located (in a Python session where nltk is already imported):
>>> nltk.data.path

To find out which version of NLTK you are running:
$ python
>>> import nltk
>>> nltk.__version__

OR
$ python -c 'import nltk; print(nltk.__version__)'
OR
$ pip freeze | grep nltk
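
All three checks can also be collected in one small script. This is just a sketch; the function name nltk_info is my own, and it returns None instead of failing when NLTK is not installed.

```python
def nltk_info():
    """Return the NLTK version and paths, or None if NLTK is missing."""
    try:
        import nltk
    except ImportError:
        return None
    return {
        "version": nltk.__version__,          # version string
        "package_path": list(nltk.__path__),  # where the package is installed
        "data_path": list(nltk.data.path),    # directories searched for NLTK data
    }


info = nltk_info()
if info is None:
    print("NLTK is not installed")
else:
    for key, value in info.items():
        print(key, "=", value)
```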

Friday, 4 September 2015

An introduction to TnT And Malayalam POS Tagging


TnT (Trigrams'n'Tags) is an efficient statistical part-of-speech tagger that is trainable on different languages and virtually any tagset. It is not tied to any particular language: we can train it on a tagged corpus in any language, where the corpus may be tagged manually, and then test the tagger on new text.
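
To give a feeling for what "trigrams" means here, the sketch below counts sequences of three consecutive tags in a manually tagged corpus. It is only an illustration of the kind of statistic the name refers to, not TnT's actual implementation; the function name, the toy sentences and the tags DET, N, V and ADV are all made-up placeholders.

```python
from collections import Counter


def tag_trigram_counts(tagged_sentences):
    """Count every run of three consecutive tags in the corpus.

    tagged_sentences is a list of sentences, each a list of (token, tag) pairs.
    """
    counts = Counter()
    for sentence in tagged_sentences:
        tags = [tag for _token, tag in sentence]
        # Slide a window of width 3 over the tag sequence.
        for i in range(len(tags) - 2):
            counts[tuple(tags[i:i + 3])] += 1
    return counts


# Hypothetical toy corpus with placeholder tags.
corpus = [
    [("the", "DET"), ("dog", "N"), ("barks", "V")],
    [("the", "DET"), ("cat", "N"), ("sleeps", "V"), ("here", "ADV")],
]
counts = tag_trigram_counts(corpus)
for trigram, n in counts.items():
    print(trigram, n)
```

A statistical tagger can use counts like these to estimate which tag is likely to follow the two previous tags.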