Hi,
I'm new to cURL and I'm trying to use it to crawl a page.
I checked the HTTP headers for the form submission and noticed that there is an initial POST, then a redirect, which is followed by a GET. Cookies are set during the redirect.
I have tried to reproduce this with cURL, but I am not able to retrieve the contents of the page. Any help is appreciated.
Here is my code:
**********************************************************
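// Temporary file that stores cookies between requests
$file_path = tempnam("/tmp", "cookies.txt");
$url = "http://www.myurl.com";
$postData = "myVar=1&myVar2=2";
$ch = curl_init($url);
// Write cookies to, and read them back from, the same file
curl_setopt($ch, CURLOPT_COOKIEJAR, $file_path);
curl_setopt($ch, CURLOPT_COOKIEFILE, $file_path);
// Follow the redirect and set the Referer header automatically
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_AUTOREFERER, 1);
// Setting POSTFIELDS makes this a POST request
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
// Return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$html = curl_exec($ch);
curl_close($ch);
echo $html;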
*****************************************************************
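When curl_exec() comes back empty, it can help to see how far the transfer got before guessing at the cause. The following is a minimal sketch, not a definitive diagnostic: describeTransfer is a hypothetical helper name, and it only formats three fields that curl_getinfo() is documented to return (url, http_code, redirect_count) plus the message from curl_error().

```php
<?php
// Hypothetical helper: summarises the array returned by curl_getinfo().
function describeTransfer(array $info, string $error = ''): string
{
    $summary = sprintf(
        'final URL: %s | status: %d | redirects followed: %d',
        $info['url'] ?? '(unknown)',
        $info['http_code'] ?? 0,
        $info['redirect_count'] ?? 0
    );

    // Append the cURL error message only when there is one.
    return $error === '' ? $summary : $summary . ' | cURL error: ' . $error;
}

// Usage with the $ch handle from the code above, placed before curl_close($ch):
// echo describeTransfer(curl_getinfo($ch), curl_error($ch));
```

A redirect count of 0 would suggest the redirect was never followed, while a non-empty error on the HTTPS hop would point at the transfer itself rather than at the cookies.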
Here is the output from Live HTTP Headers for the page I'm trying to crawl:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
http://www.myurl.com/example

POST /example HTTP/1.1
Host: www.myurl.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://www.myurl.com/homepage.shtml
Content-Type: application/x-www-form-urlencoded
Content-Length: XX
ipaddr=X.X.X.X&myvar1=1&myvar2=2

HTTP/1.x 302 Moved Temporarily
Connection: close
Date: Mon, 07 Apr 2008 17:16:32 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Location: https://www.myurl.com/page2.jsp;jsessionid=XXXXXXYYYYYYYZZZZZZZ?myvar1=1&myvar2=2
Content-Type: text/html
Set-Cookie: JSESSIONID=XXXXXXYYYYYYYZZZZZZZ; path=/
Set-Cookie: LANGUAGEID=1; expires=Thursday, 07-Apr-2009 17:16:32 GMT; path=/
Set-Cookie: REGIONID=10; expires=Thursday, 07-Apr-2009 17:16:32 GMT; path=/
Set-Cookie: LANGUAGEID=1; expires=Thursday, 07-Apr-2009 17:16:32 GMT; path=/
Set-Cookie: REGIONID=10; expires=Thursday, 07-Apr-2009 17:16:32 GMT; path=/

https://www.myurl.com/page2.jsp;jsessionid=XXXXXXYYYYYYYZZZZZZZ?myvar1=1&myvar2=2

GET /page2.jsp;jsessionid=XXXXXXYYYYYYYZZZZZZZ?myvar1=1&myvar2=2 HTTP/1.1
Host: www.myurl.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://www.myurl.com/homepage.shtml
Cookie: JSESSIONID=XXXXXXYYYYYYYZZZZZZZ; LANGUAGEID=1; REGIONID=10

HTTP/1.x 200 OK
Connection: close
Date: Mon, 07 Apr 2008 17:16:32 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Content-Type: text/html
Set-Cookie: LANGUAGEID=1; expires=Thursday, 07-Apr-2009 17:16:32 GMT; path=/
Set-Cookie: REGIONID=10; expires=Thursday, 07-Apr-2009 17:16:32 GMT; path=/
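
Comparing this trace with the code above, the browser POSTs to /example rather than to the site root, sends an ipaddr field along with the form variables, and identifies itself with a User-Agent and a Referer. Below is a minimal sketch that mirrors those details, on the assumption that the server cares about them (the trace alone does not prove that). The helper names buildPostBody, buildCrawlOptions and fetchPage are hypothetical, and the field values are the placeholders from the trace.

```php
<?php
// Hypothetical helper: URL-encodes the form fields into a POST body.
function buildPostBody(array $fields): string
{
    return http_build_query($fields, '', '&');
}

// Hypothetical helper: cURL options mirroring the traced browser request.
function buildCrawlOptions(string $cookieFile, string $postBody): array
{
    return [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $postBody,
        // Same file for reading and writing, so cookies set on the 302
        // are sent again on the follow-up request.
        CURLOPT_COOKIEJAR      => $cookieFile,
        CURLOPT_COOKIEFILE     => $cookieFile,
        // By default libcurl switches from POST to GET after a 302,
        // which matches the POST-then-GET sequence in the trace.
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_RETURNTRANSFER => true,
        // Header values copied from the trace.
        CURLOPT_REFERER        => 'http://www.myurl.com/homepage.shtml',
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13',
    ];
}

// Hypothetical helper: runs the request and surfaces cURL errors
// instead of silently returning false.
function fetchPage(string $url, array $options): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $html;
}

// Example call (commented out because it needs network access):
// $body = buildPostBody(['ipaddr' => 'X.X.X.X', 'myvar1' => 1, 'myvar2' => 2]);
// echo fetchPage('http://www.myurl.com/example', buildCrawlOptions(tempnam('/tmp', 'cookies'), $body));
```

Since the redirect goes from http to https, a failure on the second hop would show up in the exception message, which may help separate a TLS or certificate problem from a missing cookie or form field.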