Hi,
I am using this Python code to get the n-grams for a word:
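# N: length of each n-gram
N = 6
# read the whole input file
f_in = open("test.txt", 'r')
ln = f_in.read()
wlen = len(ln)
i = 0
# slide a window of N characters over the text
while i < wlen - N + 1:
    # print the letters of the current window, separated by spaces
    print(" ".join(ln[i:i+N]))
    i = i + 1
# close file
f_in.close()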
The file "text.txt" contains the word "text".
The result I get for N = 2 is (te, ex, xt),
but the correct result is (=t, te, ex, xt, t=), where = stands for a space.
Also, the biggest N I can use is N = 4, the number of letters in the word, but I want to use bigger values,
e.g. N = 5, where the result would be (=text, text=, ext==, xt===, t====).
Any ideas on how to solve this would be very helpful.
Thanks
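
To make the wanted output concrete: both examples above seem to follow the same rule, namely one pad character in front of the word and N - 1 pad characters behind it. Here is a rough sketch of that rule, not a definitive solution. The function name padded_ngrams and the pad parameter are made up for illustration, and the word is passed in directly instead of being read from the file.

```python
def padded_ngrams(word, n, pad=" "):
    """Return the character n-grams of `word`.

    Assumption (inferred from the two examples in the post): the word is
    padded with one `pad` character in front and n - 1 behind.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    padded = pad + word + pad * (n - 1)
    # Slide a window of n characters over the padded word.
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


if __name__ == "__main__":
    # "=" is used as the pad only so the padding is visible when printed.
    print(padded_ngrams("text", 2, pad="="))  # ['=t', 'te', 'ex', 'xt', 't=']
    print(padded_ngrams("text", 5, pad="="))  # ['=text', 'text=', 'ext==', 'xt===', 't====']
```

With this padding N can be larger than the word length, since the padded string always has at least N characters.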