Hi, I was searching for a correct URL regex, but I'm unsure how to write one. I'm not much of a regex expert, so I can't be sure my attempt will always work. I want to be able to match even inner URLs (ones with a path and query string), such as:
http://google.com
http://www.google.com
http://google.com/something?some=unsome&w=anything

I only want to allow http(s), not other protocols such as gopher or news. Can someone help, please? This is what I have so far:

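$allowed_chars = '[a-zA-Z0-9-]';                 // characters valid inside a host label
$protocol = 'https?://';                         // http or https only
$w3 = '(www\.)?';                                // dot escaped so it matches a literal "."
$subdomain = "({$allowed_chars}+\.)*";           // zero or more "label." prefixes
$domain = "{$allowed_chars}+\.";
$domain_format = '[a-zA-Z]{2,3}';                // e.g. com, org, co
$country_code = '(\.[a-zA-Z]{2,3})?';            // optional, e.g. .uk
$end_parts = '([/?#][a-zA-Z0-9+._/&:?=%#-]*)?';  // optional path, query string, fragment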
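
For what it's worth, here is one way pieces like the ones above might be joined into a single anchored pattern and tried against the sample URLs. It is only a sketch: the function name is_http_url is made up for illustration, the character classes are a guess at what was intended, and a pattern this simple will still reject some perfectly valid URLs (ports, IP addresses, TLDs longer than three letters).

```php
<?php
// Hypothetical helper: returns true if $url looks like an http(s) URL.
function is_http_url(string $url): bool
{
    $label    = '[a-zA-Z0-9-]+';               // one host label, e.g. "google"
    $protocol = 'https?://';                   // http or https only
    $host     = "({$label}\.)+[a-zA-Z]{2,3}";  // labels ending in a 2-3 letter TLD (assumption)
    $rest     = '([/?#][^\s]*)?';              // optional path, query string, fragment

    // "~" is the delimiter, so "/" needs no escaping; ^ and $ anchor the whole string.
    $pattern = '~^' . $protocol . $host . $rest . '$~i';

    return preg_match($pattern, $url) === 1;
}

var_dump(is_http_url('http://google.com'));                                  // bool(true)
var_dump(is_http_url('http://www.google.com'));                              // bool(true)
var_dump(is_http_url('http://google.com/something?some=unsome&w=anything')); // bool(true)
var_dump(is_http_url('gopher://google.com'));                                // bool(false)
```

Anchoring with ^ and $ is what keeps gopher:// and news: out; without the anchors the pattern could still match an http URL embedded in a longer string.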

I came up against this same issue myself recently, and finally settled on using cURL to determine whether or not a URL was valid.

If you open a cURL connection to the URL and check the HTTP response headers for a success status, you can be sure it exists. Likewise, if you receive something like a 404, you know the URL doesn't exist.

A good example can be found here. I would, however, be inclined to accept 301, 302, and other redirect responses as valid URLs too.
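
In case the link goes stale, a rough sketch of the idea with PHP's cURL extension might look like the following. The helper names url_exists and status_means_exists are made up for illustration, the 10-second timeout is an arbitrary choice, and counting every 2xx and 3xx status as "exists" is an assumption based on the remark about 301 and 302.

```php
<?php
// Hypothetical helper: which HTTP status codes count as "the URL exists".
// Accepting 2xx and 3xx (e.g. 301, 302) is an assumption, not a standard rule.
function status_means_exists(int $status): bool
{
    return $status >= 200 && $status < 400;
}

// Hypothetical helper: sends a HEAD request and reports whether the URL seems to exist.
function url_exists(string $url): bool
{
    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,   // HEAD request: headers only, no body
        CURLOPT_RETURNTRANSFER => true,   // return the response instead of echoing it
        CURLOPT_FOLLOWLOCATION => false,  // report the redirect status itself
        CURLOPT_TIMEOUT        => 10,     // seconds; arbitrary choice
    ]);
    $ok     = curl_exec($ch) !== false;   // false means the request itself failed
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return $ok && status_means_exists($status);
}
```

Something like url_exists('http://google.com') would then do the check. Bear in mind that some servers refuse HEAD requests (for example with a 405), so a fallback GET may be needed, and that a network check is much slower than a regex.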

Cheers,
R

commented: Good idea. +3

Good idea, Robothy, thanks! :)
