I am working on a shell script that takes a single command-line parameter, a file path (which may be relative or absolute). The script should examine that file and print a single line consisting of the phrase:

Windows ASCII

if the file is an ASCII text file with CR/LF line terminators, or

Something else

if the file is binary or ASCII with “Unix” LF line terminators.

So far I have tried the following scripts:

#!/bin/sh
file=$1
if grep -q "\r\n" "$file"; then
    echo Windows ASCII
else
    echo Something else
fi


#!/bin/sh
file=$1
if test -f "$file"
then
    echo Windows ASCII
else
    echo Something else
fi



#!/bin/sh
file=$1
case $(file "$file") in
*"ASCII text, with CRLF line terminators")
    echo "Windows ASCII"
    ;;
*)
    echo "Something else"
    ;;
esac

All versions run, but when I pass something that is not Windows ASCII, such as /bin/cat or SomeFile.sh, it still identifies it as Windows ASCII. When I pass a .doc file, it displays Something else as expected, but on folders it displays Windows ASCII. I think I am not handling this properly, but I am unsure. Any pointers on how to fix this?
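Why the grep attempt misfires: grep matches one line at a time with the trailing LF already removed, so a \n in the pattern can never match. With GNU grep, \r in a pattern is also not a carriage return; it is treated as a plain r (newer versions warn about a "stray \"). The pattern "\r\n" therefore ends up matching the two letters "rn", which occur in many binaries and scripts. A CR/LF line shows up to grep as a line whose last byte is a carriage return. A minimal POSIX sh sketch of that check, assuming a grep that accepts a raw CR byte in a pattern; the function name has_crlf is hypothetical. On its own it only checks line endings and does not rule out binary files (a binary can contain a CR byte before an LF by chance), so it should be paired with a text check.

```shell
#!/bin/sh
# has_crlf FILE: succeed if any line of FILE ends in a carriage return,
# i.e. FILE contains at least one CR/LF pair.
# (has_crlf is a hypothetical helper name, not a standard command.)
has_crlf() {
    cr=$(printf '\r')              # a literal carriage-return byte
    # "CR at end of line" means the CR sits immediately before an LF.
    grep -q -e "${cr}\$" -- "$1"
}

# Run only when invoked with a file argument.
if [ "$#" -eq 1 ]; then
    if has_crlf "$1"; then
        echo "CRLF line endings found"
    else
        echo "No CRLF line endings"
    fi
fi
```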

I took a different approach to this problem. Instead of grepping for line-ending characters right away, I use the file command to get the file's MIME type. If the file is not a text file, the script reports that and exits. If it is a text file, it is then checked for CR/LF line endings to determine whether it uses Windows line endings.

You can, of course, modify this to suit your needs.

#!/bin/bash
# Determines whether or not a file is a text file
# with Windows line endings.

# Helper function to print usage.
print_usage () {
    # Print the usage message and exit with an error code.
    echo -e "\nUsage: ./istextfile.sh <file>\n"
    exit 1
}

# Make sure the user entered valid arguments.
if [[ -z "$1" ]]; then
    print_usage
elif [[ ! -f "$1" ]]; then
    # -f is false both for missing paths and for directories, devices, etc.
    echo -e "\nNot a regular file: $1"
    print_usage
fi

# Get the MIME type for this file. (--brief omits the leading "name:" prefix,
# so file names containing spaces are handled correctly.)
filetype="$(file --brief --mime-type "$1")"
# Accept any text/* type (e.g. text/plain or text/x-shellscript) as text.
if [[ "$filetype" == text/* ]]; then
    echo "This is a text file: $1"
else
    # This is not a text file, we have no use for it.
    echo "Not a text file: $1 ($filetype)"
    exit 1
fi

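# Determine line endings: grep strips the LF from each line, so a Windows
# (CR/LF) line is one that ends in a carriage return. $'\r' is bash
# ANSI-C quoting for a literal CR byte; a plain "\r" is not a CR to grep.
if grep -q $'\r$' "$1"; then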
    echo "Windows: Yes"
else
    echo "Windows: No"
fi
exit 0
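The script above can be narrowed to the exact two outputs the question asks for. One caveat: a text/plain MIME type does not guarantee ASCII (UTF-8 text is text/plain too), so this sketch instead asks file for the character encoding, which it reports as us-ascii for files containing only ASCII text, and then looks for lines ending in a carriage return. A minimal POSIX sh sketch, assuming a file(1) that supports --brief and --mime-encoding; the function name classify is hypothetical.

```shell
#!/bin/sh
# classify FILE: print "Windows ASCII" for ASCII text with CR/LF line
# endings, otherwise "Something else".
# (classify is a hypothetical helper name.)
classify() {
    # Directories, devices, and missing paths are never Windows ASCII.
    if [ ! -f "$1" ]; then
        echo "Something else"
        return
    fi
    # --brief drops the "name:" prefix; binary files report "binary".
    encoding=$(file --brief --mime-encoding -- "$1")
    cr=$(printf '\r')              # a literal carriage-return byte
    if [ "$encoding" = "us-ascii" ] && grep -q -e "${cr}\$" -- "$1"; then
        echo "Windows ASCII"
    else
        echo "Something else"
    fi
}

# Run only when invoked with a file argument.
if [ "$#" -eq 1 ]; then
    classify "$1"
fi
```

Because the encoding check runs first, a binary such as /bin/cat is rejected even if it happens to contain a CR/LF byte pair, and a folder is rejected by the -f test.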

The file command alone can give you some insight into what type of file you are dealing with.

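# Get the file type description. (-b omits the leading file name, so
# only the description itself is tested below.)
filedesc="$(file -b "$1")"

# A quoted right-hand side makes =~ a literal substring test, not a regex.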
if [[ "$filedesc" =~ "ASCII text" ]]; then
    echo "This file is an ASCII text file."
elif [[ "$filedesc" =~ "UTF-8 Unicode text" ]]; then
    echo "This file is a UTF-8 text file."
else
    echo "This is not ASCII or UTF-8."
fi

For ASCII files with Windows line endings, file reports "ASCII text, with CRLF line terminators", which you can test for in the same way.
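When file is run without -b (--brief), it prefixes its output with the file name, so a substring test can also match text that happens to be in the name; -b avoids that. A POSIX sh sketch of the question's check along these lines; the function name is hypothetical, and the pattern assumes the description starts with "ASCII text" and mentions CRLF line terminators, with other details (such as a note about very long lines) possibly in between. Descriptions that start with something else, for example when file first identifies a script type, fall through to Something else; widen the pattern if those should count.

```shell
#!/bin/sh
# windows_ascii_by_description FILE: classify FILE from file(1)'s brief
# description. (Hypothetical helper name.)
windows_ascii_by_description() {
    # -b omits the leading "FILE: " so the file name cannot affect the match.
    # A directory is described as "directory", so it falls through to *).
    case $(file -b -- "$1") in
        "ASCII text"*"CRLF line terminators"*)
            echo "Windows ASCII" ;;
        *)
            echo "Something else" ;;
    esac
}

# Run only when invoked with a file argument.
if [ "$#" -eq 1 ]; then
    windows_ascii_by_description "$1"
fi
```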
