I am building a program that will 'integrate' with a website I have built. The user wants a client-based app that can upload text from a .doc file to the website.
The website uses a database to store the text in HTML format.
The HTML produced by Word is bloated and pretty useless to any program other than Word (but then you already know that).
What I want is to take the HTML output from Word, strip it of all the unneeded tags, and send it to the DB (that last bit I can do).
Or, even better, take the text and formatting straight from a .doc and strip all formatting other than the basic kinds (bold, italics, underlines, tables, etc.).
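To make the first option (cleaning up Word's HTML output) concrete, here is a rough sketch of the kind of tag stripping I have in mind. It is written in Python rather than VB.net purely for illustration and uses only the standard library's html.parser. The tag whitelist, the class name WordHtmlCleaner and the function clean_word_html are my own made-up names, not part of any existing tool, and the sketch assumes the Word HTML is reasonably well formed (for example, that the head element is properly closed).

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist of "basic" formatting tags; adjust as needed.
ALLOWED_TAGS = {"p", "br", "b", "strong", "i", "em", "u",
                "table", "tr", "td", "th"}
# Elements whose contents are discarded entirely (Word puts its styles here).
SKIPPED_TAGS = {"head", "style", "script", "title"}
# Allowed tags that never take a closing tag.
VOID_TAGS = {"br"}


class WordHtmlCleaner(HTMLParser):
    """Hypothetical helper: keeps whitelisted tags (without attributes) and text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []       # pieces of the cleaned output
        self.skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            # Attributes (class, style, lang, ...) are dropped on purpose.
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            if self.skip_depth > 0:
                self.skip_depth -= 1
        elif (self.skip_depth == 0 and tag in ALLOWED_TAGS
              and tag not in VOID_TAGS):
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text outside skipped elements is kept and re-escaped for HTML.
        if self.skip_depth == 0:
            self.parts.append(escape(data, quote=False))


def clean_word_html(html):
    """Return html with only the whitelisted tags and the text kept."""
    cleaner = WordHtmlCleaner()
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.parts)
```

For example, clean_word_html('<p class=MsoNormal>Hi <b style="x">there</b></p>') should give '<p>Hi <b>there</b></p>'. Comments, including Word's conditional comments, are dropped because the parser's default comment handler does nothing with them.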
Apart from a short look at VB.net in college, I haven't really used it, so I have no idea whether this is feasible, and Google doesn't bring up much useful information.
Any ideas?