Hi, new to Perl. Learning regular expressions. I'm trying to validate a form field using regular expressions. The field criteria are:

- Begin with a letter
- 4-8 characters long
- must include at least 1 digit

So far I've got:

$fieldValid=$username=~/^\D\w{3,7}\d+/;

I read this as "beginning with any non-digit [^0-9], containing between 3 and 7 alphanumeric characters and having one or more of the previous character which is any digit"

I don't fully understand it, but \w{3,7} seems to satisfy the requirement of being 4-8 alphanumeric characters long?
The following test usernames fall over:

a1bcdefg
ab1cdefg
abc1defg

The following are accepted:

abcd1efg
abcd1efg
abcdef1g
abcdefg1

So the location of the digit seems to be a factor, but not sure why.

You may have to define 'character' a little better for your purposes. (Will you allow *?!@#$%^&()<>:;'~`)?

But the following works, while a bit kludgy:

#! /usr/bin/perl
use strict;
use warnings;

# Begin with a letter
# 4-8 characters long
# must include at least 1 digit

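while (<DATA>)
{
	chomp;
	# All three criteria must hold; anything else is reported as NO.
	if ( length($_) >= 4 && length($_) <= 8	# 4-8 characters long
		&& /^[a-z]/i			# begins with a letter (either case)
		&& /\d/ )			# contains at least one digit
	{
		print "$_ : OK\n";
	}
	else
	{
		print "$_ : NO\n";
	}
}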

__DATA__
1ab
abc
a1b
ab1
abc1
1abc
a1bc
ab1c
1a1bcdefg
a1bcdefg
ab1cdefg
abc1defg
abcd1efg
abcd1efg
abcdef1g
abcdefg1
a1234567
12345678
123456789
abcdefgh
abcdefghi

/^\D\w{3,7}\d+/;

the above means:

^\D starts with one non-digit character
\w{3,7} followed by 3 to 7 word characters, same as a-zA-Z0-9_
\d+ followed by one or more digits
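
To see why the position of the digit matters, the pattern can be run against the test names from the question. This is only a sketch; the helper name matches_original is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the pattern from the question unchanged.
sub matches_original {
    my ($name) = @_;
    return $name =~ /^\D\w{3,7}\d+/ ? 1 : 0;
}

# \D takes 1 character and \w{3,7} takes at least 3 more, so \d+ can only
# match a digit that sits in position 5 or later.
for my $name (qw(a1bcdefg ab1cdefg abc1defg abcd1efg abcdef1g abcdefg1)) {
    printf "%-8s : %s\n", $name, matches_original($name) ? 'match' : 'no match';
}
```

Names whose only digit is in positions 2 to 4 cannot match, which lines up with the three usernames that "fall over".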

what you probably want is two regexps:

/\d/ has at least one digit

/^[a-zA-Z][a-zA-Z0-9]{3,7}/;

Not sure how to use two regexps. Can you put them in the same statement like so:

$fieldValid=$username=~/^[a-zA-Z][a-zA-Z0-9]{3,7}/ && /\d/;

or would I have to use a while loop, something like:

$formValid=1;
while ($username=~/\d/){
     	$fieldValid=$username=~/^[a-zA-Z][a-zA-Z0-9]{3,7}/;
        unless ($fieldValid)
        {$formValid=0
         }
}$formValid=0;

Which I read as "while there is a digit in $username, test the regular expression which if satisfied returns $fieldValid = true. If not satisfied, $fieldValid = false. Unless $fieldValid is true, then set $formValid = 0. If there is NOT a digit in $username, then set $formValid = 0".

If the second way works, it sounds like a long-winded way of doing it.

if ($username =~ /\d/ &&  $username =~ /^[a-zA-Z][a-zA-Z0-9]{3,7}/) {
   # $username is good, do whatever you want
}
else {
   # $username is bad
}

But that might still not be good enough. The above will match strings like:

a1111111111111111111111111111111111111111111111111111111111111.....


if you need to match a specific length you have to add the end of string anchor ($) to the second regexp:

if ($username =~ /\d/ &&  $username =~ /^[a-zA-Z][a-zA-Z0-9]{3,7}$/) {
   # $username is good, do whatever you want
}
else {
   # $username is bad
}
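
As a rough, self-contained illustration of the two anchored checks, here they are wrapped in a small sub. The name is_valid_username and the sample names are made up for this sketch, and it assumes only letters and digits are allowed.

```perl
use strict;
use warnings;

# Hypothetical helper combining the two checks.
sub is_valid_username {
    my ($name) = @_;
    return 0 unless defined $name;
    my $ok = $name =~ /\d/                             # at least one digit
          && $name =~ /^[a-zA-Z][a-zA-Z0-9]{3,7}$/;    # letter first, 4-8 in total
    return $ok ? 1 : 0;
}

for my $name (qw(a1bc abcd 1abc a1234567 a12345678)) {
    print "$name: ", ( is_valid_username($name) ? 'GOOD' : 'BAD' ), "\n";
}
```

Note that $ also matches just before a trailing newline, so the input should be chomped first, or \z used in place of $.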

you could do something like

if ($str =~ /^[a-z](?:\d\w{2,6}|\w{2,6}\d|\w\d\w{1,5}|\w{2}\d\w{0,4}|\w{3}\d\w{0,3}|\w{4}\d\w{0,2}|\w{5}\d\w?)$/i)
{
     ... stuff ...
}

but i would rather

if ($str =~ /^[a-z](\w{3,7})/ )
{
    if ($1 =! /\d/ || $1 =! /[A-Z]/ || $1 != /[a-z]/)
    { 
        die "must have one number, cap, and lowercase!\n";
    }

    ... stuff ...
}

hmm.... my long regex above is cut off by the webcode formatting.

that's kind of crappy. and all that after everyone bitches about people not using the webcode tags :P

anyhow, you'll have to click the "toggle plain text" to see it.

>that's kind of crappy.
Yes, it is. You should take advantage of the /x modifier. :icon_rolleyes:

Hi guys, thanks for all your responses. Think I found what I was after, all in one regex.

The one offered by jephthah allows non-letters at the start, so &? etc allowed.

I managed to use a condensed regex using a lookahead:

/(?=^[a-zA-Z0-9]{4,8}$)[a-zA-Z][a-zA-Z0-9]{0,2}\d+/

Curious about the /x modifier though, because I have another regex that's taking up about two page widths. Is /x a way to split your code over 2 or more lines?
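
For comparison, a lookahead can carry just the "at least one digit" requirement while the rest of the pattern handles the first letter and the length. In the pattern posted above, [a-zA-Z0-9]{0,2}\d+ has to find a digit within the first four characters, so a name like abcd1efg is rejected. The sketch below is a different pattern, not the one posted; the name valid_with_lookahead is hypothetical, and it assumes only letters and digits are allowed.

```perl
use strict;
use warnings;

sub valid_with_lookahead {
    my ($name) = @_;
    # (?=.*\d) looks ahead from the start for a digit anywhere in the string;
    # [a-zA-Z][a-zA-Z0-9]{3,7} then enforces letter-first and 4-8 characters.
    return $name =~ /^(?=.*\d)[a-zA-Z][a-zA-Z0-9]{3,7}$/ ? 1 : 0;
}

for my $name (qw(a1bcdefg abcd1efg abcdefg1 abcdefgh 1abcdefg)) {
    print "$name: ", ( valid_with_lookahead($name) ? 'OK' : 'NO' ), "\n";
}
```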

The one offered by jephthah allows non-letters at the start, so &? etc allowed.

what are you talking about? it does not in any sense allow non letters at the start, or anywhere else for that matter ... it will allow the underscore "_" in any position other than the first, which is typically allowed (and desired) anywhere alphanumerics are allowed.

and i'm afraid to tell you, your last attempt won't work at all, for anything. maybe mine ain't pretty, but it works.

Curious about the /x modifier tho, because I have another regex thats taking up about two page widths - is /x a way to split your code over 2 or more lines?

yeah, narue just busted me. it means "ignore whitespace" ... so you can put a CR/LF in there for readability.
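
As a rough illustration of /x: whitespace and # comments inside the pattern are ignored, so a long regex can be laid out over several lines. The sketch below uses the letters-and-digits username rule discussed in this thread; the names $username_re and looks_like_username are made up.

```perl
use strict;
use warnings;

# With /x, whitespace and # comments inside the pattern are ignored.
my $username_re = qr/
    ^                   # start of string
    [a-zA-Z]            # first character must be a letter
    [a-zA-Z0-9]{3,7}    # then 3 to 7 letters or digits (4-8 in total)
    $                   # end of string
/x;

# Hypothetical helper: shape check plus the separate "has a digit" check.
sub looks_like_username {
    my ($name) = @_;
    my $ok = $name =~ $username_re && $name =~ /\d/;
    return $ok ? 1 : 0;
}

print "$_: ", ( looks_like_username($_) ? 'OK' : 'NO' ), "\n" for qw(a1bc abcd 1abc);
```

One thing to watch: under /x a literal space in the pattern has to be written as \s, [ ], or an escaped space, otherwise it is ignored too.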

I put it in my test script as follows:

#! /usr/bin/perl -Tw
use strict;
use warnings;

# Begin with a letter
# 4-8 characters long
# must include at least 1 digit

while (<DATA>)
{
	chomp;
	if ($_ =~/[a-z](?:\d\w{2,6}|\w{2,6}\d|\w\d\w{1,5}|\w{2}\d\w{0,4}|\w{3}\d\w{0,3}|\w{4}\d\w{0,2}|\w{5}\d\w?)$/i)
	{
		print "$_: OK\n";
	}
	else
	{
		print "$_: NO\n";
	}
}


__DATA__
a
ab
abc
abcd
abcdefghi
abcde1fgh
1abc
$abcd1
?abcd1
abcd1&
abcd1?
Abc1
a123
a1234567

And ran it in my command line using $ perl -Tw validusername_test.pl and it gave me the following output:

a: NO
ab: NO
abc: NO
abcd: NO
abcdefghi: NO
abcde1fgh: OK
1abc: NO
$abcd1: OK
?abcd1: OK
abcd1&: NO
abcd1?: NO
Abc1: OK
a123: OK
a1234567: OK

To me, $abcd1 and ?abcd1 are accepted when they shouldn't be, because they don't start with a letter. Am I missing something?

Also, in your regex and the one I used, abcd1& and abcd1? are rejected when they shouldn't be, so yes, you're right, mine doesn't work as it should.

mine should not in any way accept either & or ? or any other non-alphanumeric characters, other than the _ underscore and then only after the first position.

i've used RegexBuddy to validate. there's no way it can allow it. i don't know why you're seeing what you're seeing if it is somehow allowing it.

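/^                      # start
[a-z]                   # single letter
(?:                     # match any one of the following:
    \d\w{2,6}       |   # either: digit in position 2
    \w{2,6}\d       |   # or: digit in the last position
    \w\d\w{1,5}     |   # or: digit in position 3
    \w{2}\d\w{0,4}  |   # or: digit in position 4
    \w{3}\d\w{0,3}  |   # or: digit in position 5
    \w{4}\d\w{0,2}  |   # or: digit in position 6
    \w{5}\d\w?          # or: digit in position 7
)
$/xi                    # end; x allows this layout, i is case insensitive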

this does allow, as i mentioned, the "_"... this is typically standard, as it seems you are testing for a valid username and/or password.

if you do not wish to allow underscores, replace the \w with [a-zA-Z0-9]


FULL DISCLAIMER: this is ugly, I don't like it, and i'm sure there's a better way. but it does work, and as always, TMTOWTDI.


but i would rather

if ($str =~ /^[a-z](\w{3,7})/ )
{
    if ($1 =! /\d/ || $1 =! /[A-Z]/ || $1 != /[a-z]/)
    { 
        die "must have one number, cap, and lowercase!\n";
    }

    ... stuff ...
}

Your second "if" condition contains some errors. "=!" is not a valid perl operator and "!=" is the wrong operator, they should all be "!~".
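
A small sketch of the "!~" form, which is true when the pattern does NOT match. The helper name missing_classes and the sample string are made up; the capture is copied out of $1 first, because later successful matches overwrite the capture variables.

```perl
use strict;
use warnings;

# Hypothetical helper: lists which character classes are missing.
sub missing_classes {
    my ($text) = @_;
    my @missing;
    push @missing, 'digit'     if $text !~ /\d/;       # no digit found
    push @missing, 'uppercase' if $text !~ /[A-Z]/;    # no capital found
    push @missing, 'lowercase' if $text !~ /[a-z]/;    # no lowercase found
    return @missing;
}

my $str = 'aBcd1';
if ( $str =~ /^[a-z](\w{3,7})/ ) {
    my $rest = $1;    # copy $1 before running any further matches
    my @missing = missing_classes($rest);
    print @missing ? "missing: @missing\n" : "ok\n";
}
```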

Your second "if" condition contains some errors. "=!" is not a valid perl operator and "!=" is the wrong operator, they should all be "!~".

oops, dangit. you're right.

i thought that looked odd for some reason, but it was late, and i was in a hurry to go to bed.

that's two, now.

:embarrass:

what was wrong with my previous suggestion?

use strict;
use warnings;
while (<DATA>) {
   chomp;
   if (/\d/ &&  /^[a-zA-Z][a-zA-Z0-9]{3,7}$/) {
      print "$_ GOOD\n";
   }
   else {
      print "$_ BAD\n";
   }
}
__DATA__
a
ab
abc
abcd
abcdefghi
abcde1fgh
1abc
$abcd1
?abcd1
abcd1&
abcd1?
Abc1
a123
a1234567

okay, bradleykirby, i see the problem.

you did not put the caret ^ at the beginning, where it should be.

oops, dangit. you're right.

i thought that looked odd for some reason, but it was late, and i was in a hurry to go to bed.

that's two, now.

:embarrass:

hehehe.... sounds like something I would do. Watch out for those post-in-hurry-to-get-to-bed-it's-late replies. ;)


what was wrong with my previous suggestion?

/\d/ &&  /^[a-zA-Z][a-zA-Z0-9]{3,7}$/

um... i don't know?

it's clearly more elegant than mine.

and doesn't pedantically force him to accept my idea of underscores.

i think yours is the winner

i dont know why you're seeing what you're seeing if it is somehow allowing it.

look closely and you will see why:

=~/[a-z]

that is the code he posted, not you.

okay, bradleykirby, i see the problem.

you did not put the caret ^ at the beginning, where it should be.

Ahh, you caught it. Very good. :)


Also in your regex and the one I used, abcd1& and abcd1? are rejected when they shouldn't be, so yes youre right mine doesn't work as it should.

You need to think about what your requirements are and post them clearly. I think we are all under the impression that the string can only contain digits and alphas (must start with an alpha) and can only be a certain length.

page three!

lol.

i think Kevin's is the best solution. mine works, but is ugly.

what was wrong with my previous suggestion?


Nothing, the regex works fine, but I can't get this to work in my script. I'm doing the validation in a subroutine:

sub validateForm
{   $failedFields="";
    $formValid=1;

    $fieldValid=$username=~/\d/ &&  /^[a-zA-Z][a-zA-Z0-9]{3,7}$/;
    unless ($fieldValid)
    { $failedFields.="Username,";
        $formValid=0
    }

        return $formValid;
}

I have an if statement that sends the user to register the form if the validateForm sub returns true, or to an output failed message if false.

The error message I get is " Use of uninitialized value in pattern match (m//) at" referring to the line

$fieldValid=$username=~/\d/ &&  /^[a-zA-Z][a-zA-Z0-9]{3,7}$/;

That's when I read up about lookaheads being able to combine regexes.

use some parentheses


.

You need to think about what your requirements are and post them clearly, I think we are all under the impression that the string can only contain digits and alphas (must start with an alpha) and can only be a certain length.

Begin with a letter.
4-8 characters - is this the source of confusion perhaps... I read 4-8 characters as containing anything including symbols. Maybe that's where I'm wrong - characters should only be [a-zA-Z0-9] and not contain symbols?

that sounds a bit ambiguous.

if you're doing a username, convention is to allow letters, numbers, and underscores only ... and to insist on a letter only for the first position

in regex, the '\w' will search for just that.

if you're doing a password, convention is to allow any and all characters to occur at any place within the string.

your first post indicated you were getting a username.


.

use some parentheses


.

Bingo!

awesome.

now we might get this to four pages!

:-D

The trees would be hating it!

:)

sub validateForm
{      $failedFields="";
	if ($username =~ /\d/ && $username =~ /^[a-zA-Z][a-zA-Z0-9]{3,7}$/) {
             return 1;
        }
        else {
	     $failedFields .= "Username,";
             return 0;
	}
}

Looks like you're not using the "strict" pragma, you should. If you need to add more characters/symbols into the {3,7} range just add them to the character class, or add \W which is the opposite of \w, to add all symbols.
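
A sketch of the same check under "use strict", with the form data passed in as an argument instead of read from globals. The earlier "uninitialized value" warning comes from =~ binding more tightly than &&, so the second pattern was tested against $_ rather than $username. The names validate_form and the username hash key are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical rewrite: takes the form fields as a hash reference and
# returns (is_valid, list of failed field names).
sub validate_form {
    my ($form) = @_;
    my @failed;

    my $username = defined $form->{username} ? $form->{username} : '';
    # Each pattern needs its own "$username =~"; a bare /.../ tests $_.
    unless ( $username =~ /\d/ && $username =~ /^[a-zA-Z][a-zA-Z0-9]{3,7}$/ ) {
        push @failed, 'Username';
    }

    return ( @failed ? 0 : 1, @failed );
}

my ( $ok, @failed ) = validate_form( { username => 'abc1' } );
print $ok ? "valid\n" : "failed: @failed\n";
```

Returning the failed field names alongside the flag avoids the need for a global like $failedFields, which is one of the things strict would complain about.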
