From my previous post, I hope all of you know how to do tagging using the TnT tagger.
To tag a file called test_file with the model tagged_file, run the following command in a terminal:
$ ./tnt tagged_file test_file
To save the tagged output to a file, use the following command instead:
$ ./tnt tagged_file test_file > output.txt
Here is another way to tag test_file, this time from Python code. As above, the tagged output can be stored in another file for future reference. Add the following lines to a Python script:
import os

tnt_executable = '/home/ajuna/components/tnt/tnt'  # path to the tnt executable
tnt_model = '/home/ajuna/components/tnt/tagged_file'  # path to the model named tagged_file

def run_tnt(input_file='tokenized_file', output_file='taggedoutput.txt'):
    """Call tnt from Linux and redirect its output to output_file."""
    tnt_command = '%s %s %s > %s' % (tnt_executable, tnt_model, input_file, output_file)
    os.system(tnt_command)

# tag test_file and save the result in output.txt
run_tnt('test_file', 'output.txt')
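The script above builds a shell command string, which can fail without notice if a path contains spaces or if the input file does not exist. As a rough sketch of an alternative, the same call can be made with the standard subprocess module. The function name run_tagger and its parameters are my own choices for illustration and are not part of TnT. The only assumption about tnt is the command form already shown above (executable, model, input file, with the result on standard output).

```python
import os
import subprocess


def run_tagger(executable, model, input_file, output_file):
    """Run `executable model input_file` and write its stdout to output_file.

    Returns the exit code of the process. Raises FileNotFoundError when the
    input file is missing, so the problem is reported before the tagger starts.
    """
    if not os.path.isfile(input_file):
        raise FileNotFoundError("input file not found: %s" % input_file)
    with open(output_file, "w") as out:
        # A list of arguments avoids shell quoting problems with unusual paths.
        result = subprocess.run([executable, model, input_file], stdout=out)
    return result.returncode


# Example call, using the paths from this post (adjust them to your machine):
# run_tagger('/home/ajuna/components/tnt/tnt',
#            '/home/ajuna/components/tnt/tagged_file',
#            'test_file', 'output.txt')
```

A non-zero return value means the command reported a failure, so it is worth checking before trusting the contents of the output file.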
Error:
Building suffix trie (113 lowercase, 0 uppercase)
Estimating lambdas (done)
lambda1 = 2.857143e-01 lambda2 = 2.857143e-01 lambda3 = 4.285714e-01
lam_bi1 = 3.333333e-01 lam_bi2 = 6.666667e-01
suffix theta = 1.947190e-01
Error: cannot find corpus 'tagged_file'
You first need to create a tagged_file for training, which contains a list of words with their corresponding tags.
This gives wrong output.