Warning: This code is tested, but not as-is. I combined multiple files to simplify the snippet.
One of my specialties is document imaging and the custom tools that handle it. A common request, for both scanner sources and file imports, is to correct pages that are not properly oriented. Automatic detection and rotation of these images is a nice feature because it saves users from having to fix orientation in their scanning utility of choice.
This snippet is a command-oriented solution using the Tesseract OCR engine and a .NET ImageMagick library (both available on NuGet). It might be used like this:
    using (var fs = File.Open(file, FileMode.Open))
    {
        try
        {
            // Run the command; the result is an in-memory copy of the corrected image
            using (var ms = new AutoOrient().Run(fs, new AutoOrientOptions()))
            {
                // Close the original file so we can manage it as necessary
                fs.Close();
                // Overwrite the original file
                File.WriteAllBytes(file, ms.ToArray());
            }
        }
        catch (Exception ex)
        {
            // Separate the message from the exception details
            Log("Error running command: " + ex.ToString());
        }
    }
There's nothing unusual going on here, but I'm offering this to save others time in developing something similar. The curious may find the simple command-oriented design interesting as well. Questions and comments are welcome. :)
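For readers curious about the command-oriented shape of the call above, here is a minimal sketch of what such a contract could look like. It only mirrors the visible call pattern (a stream and an options object go in, a MemoryStream comes out). The names IImageCommand, PassThroughOptions, and PassThroughCommand are hypothetical and are not part of the original code; the real AutoOrient and AutoOrientOptions types may be structured differently.

```csharp
using System;
using System.IO;

// Hypothetical contract: a command takes an input stream plus an options
// object and returns its result as a new MemoryStream.
public interface IImageCommand<TOptions>
{
    MemoryStream Run(Stream input, TOptions options);
}

// Hypothetical options type with no settings; a real command would expose
// its tunable parameters here.
public sealed class PassThroughOptions
{
}

// Trivial stand-in command that copies the input unchanged. It exists only
// to show the shape of the contract, not any image processing.
public sealed class PassThroughCommand : IImageCommand<PassThroughOptions>
{
    public MemoryStream Run(Stream input, PassThroughOptions options)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        var output = new MemoryStream();
        input.CopyTo(output);
        output.Position = 0; // rewind so callers can read from the start
        return output;
    }
}
```

Because every command would share the same Run signature under this assumption, the calling code shown earlier would not need to change when one command is swapped for another.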
Note that Tesseract OCR requires external training files, which can be downloaded from https://code.google.com/p/tesseract-ocr/downloads/list. The files I'm using are tesseract-ocr-3.02.eng.tar.gz and tesseract-ocr-3.01.osd.tar.gz. The latter is critical for orientation information.
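Since a missing training file is an easy mistake to make, a small startup check can help. The sketch below assumes the two archives have been extracted so that eng.traineddata and osd.traineddata sit together in a single tessdata directory; those file names, the directory layout, and the TessdataCheck helper itself are assumptions for illustration, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class TessdataCheck
{
    // Assumed file names after extracting the eng and osd archives.
    private static readonly string[] RequiredFiles =
    {
        "eng.traineddata",
        "osd.traineddata"
    };

    // Returns the names of required training files that are not present
    // in the given directory. An empty list means everything was found.
    public static IReadOnlyList<string> FindMissing(string tessdataDir)
    {
        return RequiredFiles
            .Where(name => !File.Exists(Path.Combine(tessdataDir, name)))
            .ToList();
    }
}
```

A caller might run TessdataCheck.FindMissing on its tessdata path before processing any images and log the returned names, so that a missing osd file shows up as a clear message rather than as a failure inside the OCR engine.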