Hey guys, this one has been a thorn in my side. I've been working on scraping a website that uses ASPX and has __EVENTVALIDATION/__VIEWSTATE hidden inputs. None of my other scraping experiments has been this difficult. Maybe one of you geniuses here at Daniweb has an idea of how to solve this?
I've managed to get Selenium/PHPUnit to automate the process of opening the browser, typing in the URL, filling out the required fields (today's date), and landing on the page I need to scrape, and I can read the __VIEWSTATE and __EVENTVALIDATION values from any of those pages.
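For reference, here's a stripped-down sketch of the kind of PHPUnit/Selenium test I mean. The URL, browser, and element names are placeholders, not the real site's:

```php
<?php
// Rough sketch only -- the URL and element names below are placeholders.
class SearchPageTest extends PHPUnit_Extensions_Selenium2TestCase
{
    protected function setUp()
    {
        $this->setBrowser('firefox');
        $this->setBrowserUrl('http://example.com/'); // placeholder base URL
    }

    public function testFillSearchForm()
    {
        $today = date('n/j/Y'); // whatever date format the site expects

        $this->url('Search.aspx'); // placeholder page name
        $this->byName('txtStartDate')->value($today);
        $this->byName('txtThruDate')->value($today);
        $this->byName('btnSearch')->click();

        // The hidden ASP.NET fields are readable once the result page loads
        $viewState       = $this->byId('__VIEWSTATE')->attribute('value');
        $eventValidation = $this->byId('__EVENTVALIDATION')->attribute('value');

        // source() gives the full HTML of the loaded result page
        file_put_contents('result.html', $this->source());
    }
}
```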
I've spent hours researching ways to scrape the resulting page and have come up with several (useless?) ideas: finding a PHP function that can scrape the currently active (fully loaded) page, making an XMLHttpRequest from JavaScript, and a few other random ones (none of which have worked so far, obviously).
I'm now trying to figure out how cURL works so I can 'emulate' a live user, but I have no idea how to structure the request for my specific case. In Firefox I can use Firebug's network tab to see the request headers and request body, and I can even right-click the request and 'copy as cURL', but I don't know what to do with any of those values. It appears the values I need are __VIEWSTATE and __EVENTVALIDATION (which I've already put into variables), the current date twice (txtStartDate and txtThruDate), and btnSearch=Search... as far as I know that's it. Once I get the HTML into DOM form, I already have the code to scrape it using Simple HTML DOM Parser. This link seems pretty close to what I need, but I'm not sure how to format the request: http://stackoverflow.com/questions/15337197/trying-to-connect-to-aspx-site-using-curl (the first answer in particular seems like it might be the right way to look at it?).
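To make it concrete, here's a rough sketch of how I'm picturing the cURL flow. The URL is a placeholder, the field names are the ones Firebug shows me, and I'm assuming the form posts back to the same .aspx page:

```php
<?php
$url = 'http://example.com/Search.aspx'; // placeholder for the real page

// Keep cookies between requests so the ASP.NET session survives
$cookieJar = tempnam(sys_get_temp_dir(), 'cookies');

// Step 1: GET the form page to pick up fresh __VIEWSTATE / __EVENTVALIDATION
$ch = curl_init($url);
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_COOKIEJAR      => $cookieJar,
    CURLOPT_COOKIEFILE     => $cookieJar,
    CURLOPT_FOLLOWLOCATION => true,
));
$html = curl_exec($ch);

// Pull the hidden fields out of the markup
$dom = new DOMDocument();
@$dom->loadHTML($html); // suppress warnings on sloppy HTML
$vsNode = $dom->getElementById('__VIEWSTATE');
$evNode = $dom->getElementById('__EVENTVALIDATION');
if (!$vsNode || !$evNode) {
    die("Couldn't find the hidden fields -- my guess at the page structure is wrong\n");
}
$viewState       = $vsNode->getAttribute('value');
$eventValidation = $evNode->getAttribute('value');

// Step 2: POST the form back with the hidden fields plus my own values
$today = date('n/j/Y'); // whatever date format the site expects
curl_setopt_array($ch, array(
    CURLOPT_POST       => true,
    CURLOPT_POSTFIELDS => http_build_query(array(
        '__VIEWSTATE'       => $viewState,
        '__EVENTVALIDATION' => $eventValidation,
        'txtStartDate'      => $today,
        'txtThruDate'       => $today,
        'btnSearch'         => 'Search',
    )),
));
$resultHtml = curl_exec($ch);
curl_close($ch);

// $resultHtml should now hold the page to feed into Simple HTML DOM Parser
```

Does that look like the right shape? I'm also not sure whether I need to send __EVENTTARGET/__EVENTARGUMENT or any other hidden fields the form carries.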
If anyone has any idea what I'm talking about or wants me to clarify anything, please let me know! I've lost a lot of hair over this one.