pythonbegin 0 Light Poster

Hi all,

I am working on an HTML parser using Selenium WebDriver, and on one of the pages I found an image with text on it. The page contains other information as well. I looked at the page source to see whether I could extract the text on the image using WebDriver, but I could not find any tag or field containing that text.

Is there any way to extract the text and numbers from the image using Selenium WebDriver? If not, can I save the image using WebDriver and then extract the text from it with external modules?
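
For the second option, a rough sketch of what saving the image through WebDriver might look like is given here. It assumes a reasonably recent Selenium release in which elements expose `screenshot_as_png`. The helper name `save_image_element` and the selector passed to it are hypothetical and would need to match the real page.

```python
def save_image_element(driver, css_selector, out_path):
    """Write a PNG screenshot of the first element matching css_selector."""
    # "css selector" is the string value behind selenium's By.CSS_SELECTOR,
    # so this helper does not need to import selenium itself.
    element = driver.find_element("css selector", css_selector)
    # screenshot_as_png returns the element's rendered pixels as PNG bytes.
    png_bytes = element.screenshot_as_png
    with open(out_path, "wb") as handle:
        handle.write(png_bytes)
    return out_path
```

With a live driver this would be called as `save_image_element(driver, "img", "test.png")`, where `"img"` is only a placeholder selector.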

To test this, I manually downloaded the image (attached) and tried to extract the text and numbers using the modules "pytesser" and "Image".
Here is the image:

[image: test]

And here's my code:

from pytesser import *
import Image
image = Image.open('test.png')
print image_to_string(image)  # should print the text from the image
# or
print image_file_to_string('test.png')

I got this error:

raise IOError("cannot write mode %s as BMP" % im.mode)
IOError: cannot write mode RGBA as BMP

Then I tried to convert the image from RGBA to RGB:

if image.mode == 'RGBA':
    image = image.convert('RGB')  # drop the alpha channel
image.save("test-rgbaTOrgb.png")

Here is the RGB image (attached):

[image: testrgbaTOrgb]

Then I ran:

print image_file_to_string("test-rgbaTOrgb.png")

It ran without errors, but the output was just some strange characters: "as zwa- spa; `"
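
One thing that might be worth trying here (an unverified guess, not a confirmed fix): `convert('RGB')` simply drops the alpha channel, so transparent areas keep whatever color is stored underneath, which is often black. The sketch below assumes Pillow's `PIL.Image`, the maintained successor of the old `Image` module. It pastes the image onto a white background, converts it to grayscale, and enlarges it before OCR. The function name `flatten_and_enlarge` and the default scale factor of 3 are hypothetical choices.

```python
from PIL import Image

def flatten_and_enlarge(image, scale=3, background=(255, 255, 255)):
    """Composite onto a solid background, convert to grayscale, upscale."""
    rgba = image.convert("RGBA")
    canvas = Image.new("RGB", rgba.size, background)
    # Use the alpha channel as the paste mask so transparent pixels
    # show the background color instead of their hidden RGB values.
    canvas.paste(rgba, mask=rgba.split()[3])
    gray = canvas.convert("L")
    width, height = gray.size
    return gray.resize((width * scale, height * scale), Image.LANCZOS)
```

The returned image could then be saved and passed to the OCR call in place of the plain RGB conversion. Whether this improves recognition on this particular image is untested.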

I would like the output to be just:
"Gene:BRCA1, Median Rank:431.5, p-value:2.60E-6"
with everything else on the image discarded.
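
Assuming the OCR step eventually returns readable text, keeping only these three fields could be sketched with a regular expression. The label spellings and their order are assumptions taken from the expected output above, since the actual layout of the text on the image is not known. `extract_fields` is a hypothetical helper name.

```python
import re

# Assumed labels and order: Gene, Median Rank, p-value.
FIELD_PATTERN = re.compile(
    r"Gene\s*:\s*(?P<gene>\w+).*?"
    r"Median\s+Rank\s*:\s*(?P<rank>[\d.]+).*?"
    r"p-value\s*:\s*(?P<pvalue>[\d.]+E[-+]?\d+)",
    re.IGNORECASE | re.DOTALL,
)

def extract_fields(ocr_text):
    """Return the three fields as one line, or None if they are not found."""
    match = FIELD_PATTERN.search(ocr_text)
    if match is None:
        return None
    return "Gene:{gene}, Median Rank:{rank}, p-value:{pvalue}".format(
        **match.groupdict()
    )
```

Returning `None` when nothing matches makes it easy to tell that the OCR output was still garbled.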

Thanks.
