Thursday, 17 September 2015

TnT tagging through Python

From my previous post, I hope you all know how to tag text with the TnT tagger. To tag a file called test_file with the model tagged_file, run the following command in a terminal:
$ ./tnt tagged_file test_file
To save the tagged output to a file, use the following command instead:
$ ./tnt tagged_file test_file > output.txt

Here is another way to tag test_file, this time from Python code. As above, the tagged output can be stored in another file for future reference. Add the following lines to a Python script:
import os

tnt_executable = '/home/ajuna/components/tnt/tnt'  # path to the tnt executable
tnt_model = '/home/ajuna/components/tnt/tagged_file'  # path to the model named tagged_file

def run_tnt(input_file='tokenized_file', output_file='taggedoutput.txt'):
    """Call TnT from Linux and redirect its tagged output to output_file."""
    # Build the same command line that would be typed in the terminal.
    tnt_command = '%s %s %s > %s' % (tnt_executable, tnt_model, input_file, output_file)
    os.system(tnt_command)

run_tnt('test_file', 'output.txt')
# Here test_file is the input file to be tagged (it contains words), and output.txt is the file where the output is saved (words with their tags after running the TnT tagger).
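
The same call can also be sketched with the subprocess module, which avoids building a shell string and lets the script check that the input file exists before TnT runs. This is only a rough sketch: the function name run_tnt_checked and its parameters are made up for illustration, and it assumes the tnt binary takes the model and the file to tag as its two arguments, as in the terminal commands above.

```python
import os
import subprocess

def run_tnt_checked(executable, model, input_file, output_file):
    """Run `executable model input_file` and save its stdout to output_file.

    Hypothetical helper, not part of TnT. Returns the process exit code.
    """
    # Fail early with a clear message if the file to tag is missing.
    if not os.path.isfile(input_file):
        raise FileNotFoundError('input file not found: %s' % input_file)
    # Passing the arguments as a list avoids shell quoting problems with paths.
    with open(output_file, 'w') as out:
        result = subprocess.run([executable, model, input_file], stdout=out)
    return result.returncode
```

For example, run_tnt_checked('/home/ajuna/components/tnt/tnt', '/home/ajuna/components/tnt/tagged_file', 'test_file', 'output.txt') should write the tagged words to output.txt, and a non-zero return value would suggest that TnT itself reported a problem.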

4 comments:

  1. This comment has been removed by the author.

  2. Error:

    Building suffix trie (113 lowercase, 0 uppercase)
    Estimating lambdas (done)
    lambda1 = 2.857143e-01 lambda2 = 2.857143e-01 lambda3 = 4.285714e-01
    lam_bi1 = 3.333333e-01 lam_bi2 = 6.666667e-01
    suffix theta = 1.947190e-01
    Error: cannot find corpus 'tagged_file'

  3. You first need to create a tagged_file for training, which contains a list of words with their corresponding tags.
