Hi! I just inherited a rather large legacy site here at work that has no database behind it. It consists of a large number of HTML pages with the content written directly into each page. I need to extract that content and move it into a database or into XML files.
Each section of the HTML pages has a header tag and a standard title, so I'm thinking I should write a Perl script to parse the pages based on the header tags and insert the sections into MySQL.
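To make the idea concrete, here is a rough sketch of what I have in mind. It is written in Python rather than Perl, and it uses an in-memory SQLite database as a stand-in for MySQL, purely so the example is self-contained. The names SectionParser, extract_sections, store_sections and the sections table are placeholders I made up. I'm also assuming that sections are delimited by h1 to h6 tags and that storing plain text with the markup stripped is acceptable.

```python
import sqlite3
from html.parser import HTMLParser

# Assumption: any of these tags marks the start of a new section.
HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SectionParser(HTMLParser):
    """Collects (title, body) pairs, starting a new section at each header tag."""

    def __init__(self):
        super().__init__()
        self.sections = []       # finished (title, body) tuples
        self._title = None       # title of the section being read, None before the first header
        self._body = []          # text chunks of the section being read
        self._in_header = False  # True while inside a header tag

    def handle_starttag(self, tag, attrs):
        if tag in HEADER_TAGS:
            self._flush()        # close the previous section, if any
            self._title = ""
            self._in_header = True

    def handle_endtag(self, tag):
        if tag in HEADER_TAGS:
            self._in_header = False

    def handle_data(self, data):
        if self._in_header:
            self._title += data
        elif self._title is not None:
            self._body.append(data)  # text before the first header is ignored

    def _flush(self):
        if self._title is not None:
            # Collapse runs of whitespace so the stored text is one clean line.
            body = " ".join("".join(self._body).split())
            self.sections.append((self._title.strip(), body))
        self._body = []

    def close(self):
        super().close()
        self._flush()            # store the last section on the page
        self._title = None


def extract_sections(html):
    """Return a list of (title, body) tuples for one HTML page."""
    parser = SectionParser()
    parser.feed(html)
    parser.close()
    return parser.sections


def store_sections(conn, page, sections):
    """Insert the sections of one page into a hypothetical 'sections' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sections (page TEXT, title TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO sections (page, title, body) VALUES (?, ?, ?)",
        [(page, title, body) for title, body in sections],
    )
    conn.commit()


if __name__ == "__main__":
    sample = "<h2>Overview</h2>\n<p>Some text.</p>\n<h2>Contact</h2>\n<p>Call us.</p>"
    conn = sqlite3.connect(":memory:")
    store_sections(conn, "index.html", extract_sections(sample))
    for row in conn.execute("SELECT page, title, body FROM sections"):
        print(row)
```

A real run would loop over every HTML file on disk and swap the SQLite connection for a MySQL one. If the inner markup of each section needs to be preserved, the parser would have to keep the raw tags instead of only the text.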
Before I begin, I thought I'd check with you guys to see if any of you have had similar experiences and have recommendations.
Thanks!
Tom Tolleson