Hello,
I am trying to write a script that compares each word in one file against every word in a second file. I know how to write nested for loops that read the first entry in file1 and compare it to every entry in file2, so that part is no problem. Each line in file1 starts with a non-alphabetic prefix followed by either one word or two words. Each line in file2 contains multiple words.
file1 contains:
12/3:word
45/6:word word
78/9:word
01/23:word word
and so on...
I can design a regex that captures each section correctly (checking it on regexpal.com) in file1:
[0-9]*\/[0-9]*:[a-zA-Z]*
I can check the regex in a terminal emulator (GNOME Terminal):
echo "45/6:word word" | sed -e 's/\([0-9]*\/[0-9]*:\)\([a-z]*.[a-zA-Z]*\)/\2/'
That correctly removes the first part and returns word word as expected. However, when I read from file1 in a loop, I get two entries for each line that has more than one word. For example, if file1 has 10 lines and two of them contain two words, I get 12 results rather than 10:
for word in `less file1 | sed -e 's/\([0-9]*\/[0-9]*:\)\([a-z]*.[a-zA-Z]*\)/\2/'`
do
echo $word
done
Why does it work with echo and not with my loop? I have tried adding .* and .$ to the end of the second group in the sed expression in an effort to match through the end of the line, but neither helped. I suspect it might have something to do with using less, but I have not found a solution. I also tried cat in place of less, with no difference. Any help is greatly appreciated!
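One thing that may be relevant: the shell splits the output of a command substitution on whitespace before a for loop iterates over it, so a two-word line would become two loop items no matter what sed prints. Below is a minimal sketch that compares that behaviour with a while read loop, which takes one line per iteration. The temporary file, the simplified sed expression (it only strips the numeric prefix), and the variable names for_count, read_count and last_entry are assumptions made for illustration, not part of the original script.

```shell
#!/bin/sh
# Hypothetical sample data standing in for file1 (written to a temporary file).
tmpfile=$(mktemp)
printf '%s\n' '12/3:word' '45/6:word word' '78/9:word' '01/23:word word' > "$tmpfile"

# for loop over a command substitution: the output is split on whitespace,
# so each two-word line produces two iterations.
for_count=0
for word in $(sed -e 's/^[0-9]*\/[0-9]*://' "$tmpfile"); do
    for_count=$((for_count + 1))
done

# while read loop: one iteration per line, with inner spaces preserved.
read_count=0
last_entry=
while IFS= read -r entry; do
    read_count=$((read_count + 1))
    last_entry=$entry
done <<EOF
$(sed -e 's/^[0-9]*\/[0-9]*://' "$tmpfile")
EOF

echo "for loop iterations:   $for_count"
echo "while read iterations: $read_count"
echo "last entry read:       $last_entry"

rm -f "$tmpfile"
```

With the four sample lines above, the for loop runs 6 times and the while read loop runs 4 times, which matches the 12-versus-10 discrepancy described in the question. Quoting the variable when printing it (echo "$entry") keeps the two words together as a single value.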