I need to detect the text block on a book page (no OCR, just the area where the text is, working at the pixel level) and then cut everything else out. I'm working with scanned books, so the pages have specks and smudges. The easiest way to clean a page seems to be to detect its text block and remove everything outside it. Page numbers may cause problems, though: they should be detected and left as they are.
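A minimal sketch of one common approach, projection profiles (my choice here, not something named in the question). It uses only NumPy and assumes an 8-bit grayscale page with dark text on light paper. Rows and columns count as "text" only when enough of their pixels are ink, so isolated specks drop out, and the largest cluster of such rows and columns is taken as the text block. The function names and the defaults for `ink_threshold`, `min_ink_fraction`, `max_gap` and `margin` are illustrative assumptions that must be tuned to the scan resolution: `max_gap` (in pixels) should exceed the spacing between paragraphs but stay below the gap separating the body text from a page number. As written, smaller clusters such as a page number are treated like noise and whitened, so keeping them needs an extra rule on top of this.

```python
import numpy as np

def _largest_run(mask, max_gap):
    """Return (first, last) index of the biggest cluster of True entries,
    treating gaps of up to max_gap False entries as part of one cluster."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return None
    # Start a new cluster wherever more than max_gap empty entries separate two inked ones
    breaks = np.flatnonzero(np.diff(idx) > max_gap + 1)
    clusters = np.split(idx, breaks + 1)
    best = max(clusters, key=len)  # cluster with the most inked rows/columns
    return int(best[0]), int(best[-1])

def text_block_bbox(gray, ink_threshold=128, min_ink_fraction=0.01, max_gap=50):
    """Estimate (top, bottom, left, right) of the main text block in an
    8-bit grayscale page (dark text on light paper); None if no text found."""
    ink = gray < ink_threshold  # True where the pixel is dark enough to be ink
    h, w = ink.shape
    # Row profile: a row is "text" only if enough of its width is ink
    rows = ink.sum(axis=1) >= min_ink_fraction * w
    row_run = _largest_run(rows, max_gap)
    if row_run is None:
        return None
    top, bottom = row_run
    # Column profile, computed only inside the detected band of text rows
    band = ink[top:bottom + 1]
    cols = band.sum(axis=0) >= min_ink_fraction * band.shape[0]
    col_run = _largest_run(cols, max_gap)
    if col_run is None:
        return None
    left, right = col_run
    return top, bottom, left, right

def whiten_outside(gray, bbox, margin=10):
    """Return a copy of the page, same size, with everything outside
    bbox (grown by margin pixels) set to white."""
    top, bottom, left, right = bbox
    h, w = gray.shape
    out = np.full_like(gray, 255)
    t, b = max(top - margin, 0), min(bottom + margin, h - 1)
    l, r = max(left - margin, 0), min(right + margin, w - 1)
    out[t:b + 1, l:r + 1] = gray[t:b + 1, l:r + 1]
    return out
```

To try it on a scan, load the page as grayscale with Pillow, e.g. `np.asarray(Image.open(path).convert("L"))`, and write the result back with `Image.fromarray(cleaned).save(...)`. Whitening instead of cropping keeps every page the same size, which is usually convenient when reassembling a book.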
My scans are TIFF files, and there are several ways to convert them to BMP. Any suggestions? Code snippets, libraries to use, or anything else that might help would be welcome.
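On the TIFF side, a minimal sketch assuming the Pillow library (my suggestion, not something named in the question). Pillow reads TIFF directly, so a separate BMP step may not be needed for processing at all; if BMP files are still wanted, this converts every page of a possibly multi-page TIFF to 8-bit grayscale BMP. The function name `tiff_pages_to_bmp` and the output filename pattern are hypothetical.

```python
from PIL import Image, ImageSequence

def tiff_pages_to_bmp(tiff_path, out_pattern="page_{:03d}.bmp"):
    """Convert every page of a (possibly multi-page) TIFF into an 8-bit
    grayscale BMP file; returns the list of written paths."""
    written = []
    with Image.open(tiff_path) as tif:
        # ImageSequence.Iterator walks through all frames (pages) of the TIFF
        for i, frame in enumerate(ImageSequence.Iterator(tif)):
            out_path = out_pattern.format(i)
            # "L" is 8-bit grayscale; 1-bit (black/white) scans become 0/255
            frame.convert("L").save(out_path, format="BMP")
            written.append(out_path)
    return written
```

For example, `tiff_pages_to_bmp("book.tif")` would write `page_000.bmp`, `page_001.bmp`, and so on into the current directory.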
Thanks