Hello everyone,
I'm looking for some help with a couple of simple tasks. I only need this for linguistic analysis, so apologies if these are dumb questions. :)
I have a simple script that uses grep to find the lines in a file that contain a certain word:
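# number of lines that contain the word
linecount=$(grep "someword" "$1"/*file.txt | wc -l)
echo "$linecount"
# number of words on those lines, with the first tab-separated field dropped
wordcount=$(grep "someword" "$1"/file.txt | cut -f2- | wc -w)
echo "$wordcount"
echo 'avg words per line:'
# average with two decimal places
echo "scale=2; $wordcount / $linecount" | bc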
What would be the simplest way to:
- find the maximum line length (in words)? Could wc -L be used for this somehow?
- count the vocabulary size (simply the number of distinct tokens) across all the matching lines? I could only get uniq -c to work on whole lines, not on words.
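A possible sketch for both points, in case it helps frame the question. It assumes tokens are separated by whitespace and that case and punctuation are left untouched, so "Word" and "word," would count as different tokens. GNU wc -L reports the longest line in characters rather than words, so the sketch uses awk's NF (number of fields on a line) instead. The temporary directory and the sample lines are made up for illustration and only stand in for $1/file.txt.

```shell
#!/bin/sh
# Hypothetical sample data standing in for $1/file.txt:
# an ID column, a tab, then the text of the line.
dir=$(mktemp -d)
printf 'id1\tsomeword is here\nid2\tnothing to see\nid3\there someword is again and again\n' > "$dir/file.txt"

# Matching lines with the first tab-separated field removed,
# the same way the word count is computed in the script above.
matches=$(grep "someword" "$dir/file.txt" | cut -f2-)

# Maximum line length in words: NF is the number of
# whitespace-separated fields awk sees on each line.
max_words=$(printf '%s\n' "$matches" | awk 'NF > max { max = NF } END { print max + 0 }')

# Vocabulary size: put one token per line, drop duplicates,
# then count the non-empty lines that remain.
vocab_size=$(printf '%s\n' "$matches" | tr -s '[:space:]' '\n' | sort -u | grep -c .)

echo "max words per line: $max_words"
echo "vocabulary size: $vocab_size"

rm -r "$dir"
```

Replacing sort -u with sort | uniq -c would give a per-token frequency list instead of just the count, since uniq then sees one word per line.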
I would really appreciate any help. Many thanks in advance!