I've been writing a web crawler for a while and I'm almost done. It works perfectly, but when it crawls vBulletin forums I get weird URLs.

Examples:

forum/index.php?phpsessid=oed7fqnm9ikhqq9jvbt23lo8e4
index.php/topic,5583.0.html?phpsessid=93f6a28f192c8cc8b035688cf8b5e06d

Obviously this is being caused by PHP session IDs.

What can I do to stop this?
I tried using cookies with HTTP::Cookies but the problem persists.

thanks

Actually, using cookies fixed it, but the site has to be requested twice (the first time to get the cookie and the second one to get normal links).
I will leave this thread open in case someone knows of another solution.
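
For anyone who lands here later, the two-request approach described above could look roughly like this. It is only a sketch, assuming the crawler is built on LWP::UserAgent; the helper names make_agent and fetch_with_session and the agent string are made up for illustration, and the URL is taken from the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Build one agent with an in-memory cookie jar and reuse it for every request.
sub make_agent {
    return LWP::UserAgent->new(
        cookie_jar => HTTP::Cookies->new,
        agent      => 'my-crawler/0.1',    # placeholder name, an assumption
    );
}

# Hypothetical helper: fetch a page twice so the second response is
# generated for a client that already holds the session cookie.
sub fetch_with_session {
    my ($ua, $url) = @_;

    $ua->get($url);                   # first request: only collects the cookie
    my $response = $ua->get($url);    # second request: the cookie is sent back

    die 'Request failed: ' . $response->status_line . "\n"
        unless $response->is_success;

    return $response->decoded_content;
}

# Run only when a URL is passed on the command line.
if (@ARGV) {
    my $ua = make_agent();
    print fetch_with_session($ua, $ARGV[0]);
}
```

As long as the same agent object is reused, the extra request should only be needed for the first page of each site, since the cookie jar keeps the session cookie for the pages that follow.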

If the phpsessid always occurs at the end of the URL, you could remove it with a regex substitution like this:

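#!/usr/bin/perl
use strict;
use warnings;

my $url = 'index.php/topic,5583.0.html?phpsessid=93f6a28f192c8cc8b035688cf8b5e06d';

# Remove a trailing "?phpsessid=<id>" (only matches at the very end of the URL).
$url =~ s/\?phpsessid=\w+$//;

print "$url\n";    # prints index.php/topic,5583.0.html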
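
If the session ID is not always the last thing in the URL (as in the first example, where other parameters could follow it), a slightly more general version might look like the sketch below. The helper name strip_session_id is made up for illustration, and it assumes query parameters are separated by "&".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remove a phpsessid parameter wherever it sits
# in the query string, leaving the other parameters untouched.
sub strip_session_id {
    my ($url) = @_;

    # Set any #fragment aside so it is not mistaken for part of the query.
    my ($rest, $fragment) = split /#/, $url, 2;
    my ($path, $query)    = split /\?/, $rest, 2;

    if (defined $query) {
        # Keep every parameter except phpsessid (matched case-insensitively).
        my @kept = grep { !/^phpsessid=/i } split /&/, $query;
        $rest = @kept ? $path . '?' . join('&', @kept) : $path;
    }

    return defined $fragment ? "$rest#$fragment" : $rest;
}

print strip_session_id('forum/index.php?phpsessid=abc123&board=2'), "\n";
print strip_session_id('index.php/topic,5583.0.html?phpsessid=93f6a28f192c8cc8b035688cf8b5e06d'), "\n";
```

The first call should print forum/index.php?board=2 and the second index.php/topic,5583.0.html. Normalizing links this way before queueing them also keeps the crawler from treating the same page as new every time the session ID changes.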

Go with HTTPS for security. Try this: clear all cookies.
