Anyone having experience in MySQL and REGEXP ?

I have 1 input field, "what"

Now, the value from the "what" field can contain several substrings and I like each of them to be searched. Can this be done by using REGEXP or do I need to split the "what" input string into arrays before searching ?. Or is there other tricks using the LIKE clause ?

Another issue while talking about REGEXP in MySQL. Let's say I have the input value "e-mail" and the database field contains the value "email". There must be some way to put some variants into the expression to accept the "-" and match the values.

Any examples is welcome.

what = Request.Form("what")


WHERE searchwords LIKE ' % " & what & " % ' "

WHERE searchwords REGEXP '[[:<:]]"&what&"'"

.... just to "simplyfy" my question a bit :-)


If the Input is ("e-mail") then I like to match db column value ("email")
And reverse:
If the Input is ("email") then I like to match db column value ("e-mail")

The string "e-mail" is just an example, it can can be any other words, and the hyphen can be anywhere in the word OR NOT.

I have busted my head off trying some acrobatic expressions for this operations, So any help or examples are extremely welcome.

The period is a wild card. You would first need to replace the string that you are working with. replace all "-" for "." and then setup your regex. This way it will match: email and e-mail. However, it will not match em-ail or -email. It has to be where it is, hence the wildcard.

Hi there :-)
Thank you for your time on this.

To handle spaces I also do replacements for that. This part is interesting
becourse it respond to both existing and NON existing spaces in database.

The hyphen "-" in the INPUT does only respond to existing hyphens in the database. Don't know if my thinking or coding is wrong, but the hyphen should be threathed like the space.

As long as the hyphen "-" exists in the INPUT string there is no fuzz.
----------------------------------------
Input:
e-mailmarketing OR e-mail marketing

Output:
e-mailmarketing
e-mail marketing
emailmarketing
email marketing


Now, without the hyphen "-"
----------------------------------------
Input
emailmarketing OR email marketing

Output:
emailmarketing
email marketing

whatinput = "e-mailmarketing"

If InStr(whatinput,"-") <> 0 Then
whatinput = Replace(whatinput,"-","[\-|\.]?")
what = whatinput

If InStr(what," ") <> 0 Then
what = Replace(what," ","[\ ]?")
End If

Else
what = whatinput
End If

Database query

REGEXP '[[:<:]]"&what&"'"

I must make an reverse on my previous post. It is obvious why not the hyphens been handled as the space. The space is replaced by space, if it exists.

The question remain, how to identify and match hyphens in database if no hyphens exists in the INPUT string.

you would have to remove them, or use some very complex regex which would be extremely slow. Something would be like:
[a-Z|-] for each character. You would end up having to search every character, and allow the possibility of it being a hyphen. My suggestion.. forget about it or remove hypens from words, or replace words that are common with a common spelling:

email -> e-mail

It keeps consistency, and no one really cares.

Otherwise you are looking at 10 minute load times for records in excess of 1,000, if those strings are more than 200 characters long.

OK, guess you have a good point there.

I will give it a few more shots then I give up. :-)

I know you are extremely talented with database queries.
It would be great if you have a minute to help me analyse this question

I have 2 fields for input
1. what, anything that may identify a company, by name, words etc.
2. where , can contain strings as "statename", or "countyname" or "cityname". So this means, in my case searching for matches in 3 different tables looking for statename, county or city or specific addresses.

Within one single SQL query there is no problem searching for one at time.
I have tried using OR operations, but it doesn't works very well. And I try to limit the input fields to a minimum.

This example doesn't works:
AND KO.county LIKE '%"&where&"%' OR RG.state LIKE '%"&where&"%' "

I don't expect a full specification, but if you have a few hints how to start building on this handle different inputs and searching in different tables and keep the database connections to a minimum.

SQL = "SELECT CL.organisation, CA.adress,CA.postnr,CA.ort,RG.state,RG.stateID,KO.county,KO.countyID "&_
"FROM tbclients CL, tbclientaddress CA, tbstate RG, tbcounty KO "&_
"WHERE CL.clientID = CA.clientlink "&_
"AND CA.state = RG.stateID "&_
"AND CA.county = KO.countyID "&_
"AND CL.searchwords REGEXP '[[:<:]]"&what&"'" &_
"AND KO.county LIKE '%"&where&"%' "

You will only have one connection, but multiple record sets.

Change this a bit to do a more sophisticated join:

SQL = "SELECT CL.organisation, CA.adress,CA.postnr,CA.ort,RG.state,RG.stateID,KO.county,KO.countyID "&_
"FROM tbclientaddress CA JOIN tbclients CL ON CA.clientlink = CL.clientID JOIN tbstate RG ON CA.state = RG.stateID JOIN tbcounty KO ON CA.county = KO.countyID "&_
"WHERE CL.searchwords REGEXP '[[:<:]]"&what&"'" &_
"AND KO.county LIKE '%"&where&"%' "

Anyway, the reason why this won't work is because you are missing parenthesis:

AND (KO.county LIKE '%"&where&"%' OR RG.state LIKE '%"&where&"%') "

The reason for that it is failing at the or part, as it has no way to correctly organize the where segment. By adding the parenthesis, you are allowing it to be: (AND county=) or (And state=).

There is no medal big enough for you !

Many thanks !

I will send you a message if I come up with something useful in the Regexp issue.

Well for the regex, what you can do is do the search for the hyphenless version. Then when inside the search, do a regex replace command to remove all hyphens. That way it will match regardless

Okidoki
Not sure I fully understand that.

Can the REPLACE be used at the column name like this

AND REPLACE(CL.searchwords, "-",".") REGEXP '[[:<:]]"&what&"'"

I don't use this stuff, so I don't truly know. But yes, I do think you can do that. However, make sure noquotes and it is single quotes. Also make sure that you are not replacing it with a period, but rather an empty string ( '' ).

OK
I will take deep dive into this and check the MySQL reference as well. I'll get back with some feedback.

Thanks again

oh if it's MySQL, then yes there will need to be changes.

I did a quick check at MySQL and REPLACE is used in UPDATE operations as far as I can see now.

REPLACE is a special keyword, what it says is that it will insert the record if it does not exist, however, if it does then it will overwrite or REPLACE it.

That's why I said in MySQL it will be different. Not sure what it is for though ^^

I may do some labs on this .. :-)
I am also on the EXCLUDE functions. But the examples I have found yet in this field is hairy. Probably easier to learn chinese.

And yes sir, you have earned a golden plate in Acrobatic Server Pages and Extreme Expressions

Input:
emailmarketing

Output:
e-mailmarketing
emailmarketing

REPLACE(CL.searchwords,'-','') REGEXP '[[:<:]]"&what&"'"


I don't know how this affecting a table query with 10000 records, but it works.

You can download some software that will check where your bottle necks are, to see what lines are taking the most time to complete. However, glad to hear it's working.

Thanks

Have you ever considered start working in the programming business ;-)

Looking forward to hear from you again.

yeah, but no one will hire me.

it's crappy.

Well, I don't really believe that :-)

I have a biz idea if you are interested ?. Not sure if it work, but if it does ..?

Give me a couple of days to phrase it in proper english and get back to you.

I know I have put "solved" to this thread but I put my hope to some answer anyway.

I have a new challenge in my search facility.

The input field "where" are planned to be used for different searches as state, county, city or addresses.

Now, the address format used up here in the Northpole are as followed:
streetname [space] streetnumber [space] zipcode [space] city

Like this:
Coldstreet 123 12345 Freezytown

So if the input are:
where = "Coldstreet 123 12345 Freezytown"

I like to replace "Freezytown" with "" becourse I can identify unique records anyway by Coldstreet 123 and zipcode 12345, which have to be splitted for search in database.

If the input are:
where = "Coldstreet 123 12345"

Leave it as it is but it must be splitted for search in database. The Coldstreet 123 alone is not unique, but together with zipcode 12345 it is.

If the input are:
where = "Coldstreet 123"
Or
where = "12345"
Or
where = "Freezytown"

Leave it just as it is


Example code of the splitfunction so far..

where = "Coldstreet 123 12345 Freezytown"
Words = Split(where, " ")
For i = 0 to Ubound(Words)
 Response.Write i & " = " & Words(i) & "<br>"
Next

Output:
0 = Coldstreet
1 = 123
2 = 12345
3 = Freezytown

The db query so far...

SQL = "SELECT CL.organisation, CA.adress,CA.zipcode,CA.city,RG.state,RG.stateID,KO.county,KO.countyID "&_
"FROM tbclientaddress CA "&_
"JOIN tbclients CL ON CA.clientlink = CL.clientID "&_
"JOIN tbstate RG ON CA.state = RG.stateID "&_
"JOIN tbcounty KO ON CA.county = KO.countyID "&_
"WHERE REPLACE(CL.searchwords,'-','') REGEXP '[[:<:]]"&what&"'" &_
"AND (REPLACE(CA.zipcode,' ','') REGEXP '[[:<:]]"&where&"' "&_
"OR REPLACE(KO.county,'-','') REGEXP '[[:<:]]"&where&"' "&_
"OR REPLACE(RG.state,'-','') REGEXP '[[:<:]]"&where&"') "

The splitting function so far...

I am not yet satisfied.

It works quiet well if the city doesn't contain any space:

Also I must probably set some limitations to this while if a string holds a streetname with a missing streetnumber must at least be followed by a zipcode, otherwise it would be threathed as a cityname.
And a cityname must have a zipcode before or nothing.

Where = "Coldstreet 123A 12345 Freezytown"

Words = Split(Where, " ")

For i = 0 to Ubound(Words)

If i = 0 AND i < UBound(Words) AND IsNumeric(Words(i)) = False Then
 Response.Write i & " = Street: " & Words(i) & "<br>"
End If

If i = 1 AND i < UBound(Words) AND IsNumeric(Left(Words(i),1)) = True AND Len(Words(i)) < 5 Then
Response.Write i & " = Streetnumber: " & Words(i) & "<br>"
 ElseIf i = UBound(Words) AND IsNumeric(Left(Words(i),1)) = True AND Len(Words(i)) < 5 Then
Response.Write i & " = Streetnumber: " & Words(i) & "<br>"
End If

If i = 2 AND i < UBound(Words) AND (IsNumeric(Words(i)) = True) AND Len(Words(i)) >= 5 Then
 Response.Write i & " = Zipcode: " & Words(i) & "<br>"
 ElseIf i = UBound(Words) AND (IsNumeric(Words(i)) = True) AND Len(Words(i)) >= 5 Then
 Response.Write i & " = Zipcode: " & Words(i) & "<br>"
End If

If i > 0 AND i = UBound(Words) AND IsNumeric(Left(Words(i),1)) = False Then
 Response.Write i & " = City: " & Words(i) & "<br>"
ElseIf i = 0 AND i = UBound(Words) AND IsNumeric(Left(Words(i),1)) = False Then
 Response.Write i & " = City: " & Words(i) & "<br>"
End If

Next

The final version (maybe)

As long as there is no spaces in streetname or cityname it seems to work pretty much as I wanted.

If anyone comes up with some ideas how deal with spaces it would be welcome.

Where = "Coldstreet 123A 12345 Freezytown"

Words = Split(Where, " ")

For i = 0 to Ubound(Words)

If i = 0 AND i < UBound(Words) AND IsNumeric(Words(i)) = False Then
 Response.Write i & " = Street: " & Words(i) & "<br>"
End If

If i = 1 AND i < UBound(Words) AND IsNumeric(Left(Words(i),1)) = True AND Len(Words(i)) < 5 Then
Response.Write i & " = Streetnumber: " & Words(i) & "<br>"
 ElseIf i = UBound(Words) AND IsNumeric(Left(Words(i),1)) = True AND Len(Words(i)) < 5 Then
Response.Write i & " = Streetnumber: " & Words(i) & "<br>"
End If

If i > 0 AND i < UBound(Words) AND (IsNumeric(Words(i)) = True) AND Len(Words(i)) >= 5 Then
 Response.Write i & " = Zipcode: " & Words(i) & "<br>"
 ElseIf i = UBound(Words) AND (IsNumeric(Words(i)) = True) AND Len(Words(i)) >= 5 Then
 Response.Write i & " = Zipcode: " & Words(i) & "<br>"
End If

If i > 0 AND i = UBound(Words) AND IsNumeric(Left(Words(i),1)) = False Then
 Response.Write i & " = City: " & Words(i) & "<br>"
ElseIf i = 0 AND i = UBound(Words) AND IsNumeric(Left(Words(i),1)) = False Then
 Response.Write i & " = City: " & Words(i) & "<br>"
End If

Next

With city, do a split at the space, and then do the following:

For numero = 0 To UBounds(array)
sql &= " OR city LIKE '%" & array(numero) & "%'
If numero > 0 Then sql &= " OR city LIKE '%" & array(numero) & "%" & array(numero-1) & "%'"
Next

Thanks
I know your examples ALWAYS working

To be absolute honest, I dont don't really know where to put it =-(

I must have seen to many lines of code the past days. Sorry for hanging to much weight on your time and effort.

Where = "Coldstreet 123A 12345 Freezytown"

Okay, now obviously this is:

123 Coldstreet (apt. A??)
Freezytown, (State?), 12345

Right?

The letters like "A" is used sometimes in streetnumbers and in those cases it is a part of the number, like the 123A or 123 A.

The format is:
streetaddress, streetnumber, zipcode, city

Some actually use the apt in addresses but it's rare, and it then looks like this:
streetaddress, streetnumber aptnumber, zipcode, city

The database holds the streetname and streetnumber together in the same column.

I guess the US format is as you write with the streetnumber first, followed by streetname.

Yes it is. Okay then this is how I would do it:

'Prebuilt SQL query is set to sql up to WHERE clause

Dim array()

array = words.split(" ")

For numero = 0 To UBounds(array)
If numero = 0 Then sql = sql & " street LIKE '%" & array(numero) & "%'
If numero = 1 Then sql = sql & " OR streetnumber LIKE '%" & array(numero) & "'"
If numero = 2 Then sql = sql & " OR zipcode LIKE '%" & array(numero) & "%'"
If numero = 3 Then sql = sql & " OR city LIKE '%" & array(numero) & "%'"
If numero > 3 Then sql = sql & " OR city LIKE '%" & array(numero) & "%" & array(numero-1) & "%' OR city LIKE '%" array(numero) & "%'"
Next

execute(sql)
Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.