Hi forum,
I'm a very old dog trying to learn some new tricks, and could use some help.
I want to do some sophisticated web page scraping which involves, among other things, analysing the format attributes of web page text. Formatting can of course be specified inline in the HTML or in CSS files, and every browser has the logic to combine these instructions and decide how to display the page.
Clearly I don't want to replicate this logic myself; I want access to the results of the browser's work. I've seen exactly the data I want in the Firefox DOM Inspector: it displays the DOM itself and, in full detail, what it calls the 'computed style' of a piece of text. I presume these 'computed style' values are either held directly on its DOM model or are in some way accessible through it.
What I'm looking for is a way to use these Firefox capabilities from within my own external Python module. How do I do it?
I haven't yet looked closely at the tools in the Python libraries. They can build a DOM, but I doubt they do any format interpretation. Am I right?
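For what it's worth, a quick check with the standard library seems to bear this out. The sketch below (the class name `StyleAttrCollector` is just a hypothetical illustration) uses `html.parser` to walk a page and record each element's `style` attribute. It only ever sees the raw inline string; the `color` and `font-weight` set in the `<style>` block never get applied to the `<p>`, because nothing here performs the CSS cascade that a browser does.

```python
from html.parser import HTMLParser


class StyleAttrCollector(HTMLParser):
    """Records (tag, raw style attribute) for every start tag; no CSS is evaluated."""

    def __init__(self):
        super().__init__()
        self.styles = []  # list of (tag name, style string or None)

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; we keep only the inline style text
        self.styles.append((tag, dict(attrs).get("style")))


page = """
<html><head><style>p { color: red; font-weight: bold; }</style></head>
<body><p style="font-size: 12px">Hello</p></body></html>
"""

collector = StyleAttrCollector()
collector.feed(page)
print(collector.styles)
# [('html', None), ('head', None), ('style', None), ('body', None), ('p', 'font-size: 12px')]
```

So the parser gives you the structure and the literal attributes, but working out the final "computed" colour or weight of that text would still be up to you.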
I have read briefly about various Firefox add-ons which incorporate Python within Firefox, but it seems that these are primarily designed to allow the use of Python as a scripting language within the browser. Can these add-ons be turned on their heads to allow an external Python programme to gain access to Firefox objects, and possibly to direct Firefox actions?
Or perhaps there's an entirely different way to crack it?
It's great to think someone might actually have the answer! Thanks in advance for your help.
kenny