Matching Strings With Wildcards

NathanOliver 4 Tallied Votes 480 Views

Hey All

This code is for matching a string with a wildcard in it to another string. If you want an asterisk in the first string to be counted as a character and not as a wildcard, put a \ in front of it. I have tested it against quite a few different cases and it has worked for everything I tried. Included is a short main function demonstrating it working. Feel free to use it if you want, but I CAN'T guarantee that it is bug free.

Nathan

Ancient Dragon commented: Good job :) +28
Zoon commented: Very useful :) +1
// WildcardCompare.h
#ifndef WILDCARDCOMPARE_H
#define WILDCARDCOMPARE_H

#include <string>
#include <vector>

bool WildcardCompare(std::string searchTerm, std::string checkAgaints)
{
    bool found = false;
    std::vector<std::string> parts;
    std::string temp;
    if (searchTerm == "*")
        return true;
    if (searchTerm.size() - 1 > checkAgaints.size())
        return false;
    std::string::const_iterator it = searchTerm.begin(), end = searchTerm.end();
    size_t counter = 0;
    while (it != end)
    {
        if (*it != '*')
            temp += *it;
        if (*it == '*')
        {
            parts.push_back(temp);
            temp = "";
        }
        it++;
    }
    parts.push_back(temp);
    std::vector<std::string>::const_iterator vecIt = parts.begin(), vecEnd = parts.end();
    if (!parts[0].empty())
    {
        if (parts[0] != checkAgaints.substr(0, parts[0].size()))
            return false;
    }
    else
        vecIt++;
    size_t size = checkAgaints.size();
    size_t pos = 0;
    size_t tempSize;
    while (vecIt != vecEnd)
    {
        temp = *vecIt;
        tempSize = temp.size();
        if (temp.empty())
            return true;
        while (pos + tempSize < size)
        {
            if (temp == checkAgaints.substr(pos, temp.size()))
            {
                checkAgaints.erase(0, pos + temp.size());
                found = true;
                break;
            }
            pos++;
        }
        if (found == false)
            return false;
        pos = 0;
        vecIt++;
    }
    return true;
}

#endif

// main.cpp
#include <iostream>
#include "WildcardCompare.h"

int main()
{
    std::string searchFor = "*my*day.*";
    std::string word = "this is my good day.txt";
    if (WildcardCompare(searchFor, word))
        std::cout << "works";
    else
        std::cout << "doesnt work";
    searchFor = "*my\*day.*";
    word = "my*day.txt";
    if (WildcardCompare(searchFor, word))
        std::cout << "works";
    else
        std::cout << "doesnt work";
    std::cin.get();
    return 0;
}
Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster

>> searchFor = "*my\*day.*";

That will not work as written: \* is not a recognised escape sequence, so you will typically get a compiler warning and lose the backslash. If you want a \ in the string then you have to put two of them, so the code should be searchFor = "*my\\*day.*"; Otherwise the program appears to work OK for the tests I made. Now I think you should improve the program by allowing ? as a wildcard that matches just a single character.
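
To make the escaping point concrete, here is a small sketch. The helper names escapedPattern and rawPattern are made up for illustration and are not part of the posted code. In an ordinary string literal "\\" becomes a single backslash at run time, so the matcher sees *my\*day.*; a C++11 raw string literal should give the same characters without the doubling.

```cpp
#include <string>

// Hypothetical helpers, for illustration only.

// In an ordinary literal, "\\" is one backslash character at run time.
inline std::string escapedPattern()
{
    return "*my\\*day.*";
}

// C++11 raw string literal: backslashes are taken literally, no doubling needed.
inline std::string rawPattern()
{
    return R"(*my\*day.*)";
}
```

Both functions return the same ten characters, with the backslash at index 3.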

Excizted 67 Posting Whiz

Very useful, thank you :)

Excizted 67 Posting Whiz

I think there are some problems.
I tried escaping the asterisk and it didn't seem to work.
I tried setting searchFor to "*.php" and word to "MyScript.php", and it returned false.

NathanOliver 429 Veteran Poster Featured Poster

Thank you AD for your reply. The single slash in my code was a typo. As for having a ? act as any single letter, I am implementing that right now and I should have it finished shortly. I'll post the updated code once it's done.

@ Excizted I fixed what was causing the problem and when I post the updated code "*.php" for searchFor and "MyScript.php" for word will work.

NathanOliver 429 Veteran Poster Featured Poster

Okay, well, I believe I have it all sorted out. I added support for ? as a single-letter wildcard and it appears to work just fine. I wound up having to write another function that compares two strings when one of them has a ? in it. I just go through the string that has the ?'s in it, and wherever there is a ? I replace it with the letter at the same position in the second string. The main function has not changed from my first post, so I will just post my new WildcardCompare.h. If anyone else finds something it doesn't work for or that isn't right, let me know so I can try to fix it. Thanks.
Test cases:

"*this*my*\\**da?.php" against "this is my *good day.php": true
"???.txt" against "bad.txt": true
"\\*.txt" against "8.txt": true
"*.php" against "MyScript.php": true

#ifndef WILDCARDCOMPARE_H
#define WILDCARDCOMPARE_H

#include <string>
#include <vector>

bool SingleWildcardMatch(std::string, std::string);

bool WildcardCompare(std::string searchTerm, std::string checkAgaints)
{
    bool found = false;
    std::vector<std::string> parts;
    std::string temp;
    if (searchTerm == "*")
        return true;
    if (searchTerm.size() - 1 > checkAgaints.size())
        return false;
    std::string::const_iterator it = searchTerm.begin(), end = searchTerm.end();
    size_t counter = 0;
    while (it != end)
    {
        if (*it == '*')
        {
            parts.push_back(temp);
            temp = "";
            it++;
            continue;
        }
        if (*it == '\\' && *(++it) == '*')
        {
            temp += "*";
            it++;
            continue;
        }
        temp += *it;
        it++;
    }
    parts.push_back(temp);
    bool singleWildcardPresent = false;
    if(temp.find_first_of("?",0) != std::string::npos)
        singleWildcardPresent = true;
    std::vector<std::string>::const_iterator vecIt = parts.begin(), vecEnd = parts.end();
    if (!parts[0].empty() && !singleWildcardPresent)
    {
        if (parts[0] != checkAgaints.substr(0, parts[0].size()))
            return false;
    }
    if (!parts[0].empty() && singleWildcardPresent)
    {
        if (!SingleWildcardMatch(parts[0], checkAgaints.substr(0, parts[0].size())))
            return false;
    }
    else
        vecIt++;
    size_t size = checkAgaints.size();
    size_t pos = 0;
    size_t tempSize;
    while (vecIt != vecEnd)
    {
        temp = *vecIt;
        tempSize = temp.size();
        if (temp.empty())
            return true;
        while (pos + tempSize <= size)
        {
            if (singleWildcardPresent)
            {
                if (SingleWildcardMatch(temp, checkAgaints.substr(pos, temp.size())))
                {
                    checkAgaints.erase(0, pos + temp.size());
                    found = true;
                    break;
                }
            }
            else
            {
                if (temp == checkAgaints.substr(pos, temp.size()))
                {
                    checkAgaints.erase(0, pos + temp.size());
                    found = true;
                    break;
                }
            }
            pos++;
        }
        if (found == false)
            return false;
        pos = 0;
        vecIt++;
    }
    return true;
}

bool SingleWildcardMatch(std::string first, std::string second)
{
    if (first.size() != second.size())
        return false;
    std::string::iterator firstIt = first.begin(), firstEnd = first.end();
    std::string::const_iterator secondIt = second.begin();
    while (firstIt != firstEnd)
    {
        if (*firstIt == '?')
            *firstIt = *secondIt;
        firstIt++;
        secondIt++;
    }
    if (first == second)
        return true;
    return false;
}
#endif
nezachem 616 Practically a Posting Shark

One problem with this code is that it can't match a literal '?'. Another is its complexity. I can't even try to understand how it works, let alone debug it.
This is how I would approach globbing:

#include "glob.h"

int globMatch(char * pattern, char * text)
{
    while(*pattern && *text) {
        char p = *pattern++;
        switch(p) {
        case '*':
            while(*text) {
                int rc = globMatch(pattern, text++);
                if(rc != MATCH_FAIL)
                    return rc;
            }
            return MATCH_FAIL;
        case '\\':
            if((p = *pattern++) == 0)
                return MATCH_ERROR;
            if(p != *text++)
                return MATCH_FAIL;
            break;
        case '?':
            text++;
            break;
        default:
            if(p != *text++)
                return MATCH_FAIL;
        }
    }
    return ((*pattern == 0) && (*text == 0))? MATCH_SUCCESS: MATCH_FAIL;
}

with an obvious glob.h:

#ifndef _GLOB_H_
#define _GLOB_H_

#define MATCH_ERROR    -1
#define MATCH_SUCCESS   0
#define MATCH_FAIL      1

#ifdef __cplusplus
extern "C" {
#endif
int globMatch(char * pattern, char * text);
#ifdef __cplusplus
}
#endif

#endif
Zoon commented: Useful post +1
Zoon 0 Newbie Poster

Very interesting code.
I tried it out and with simple things it worked, eg. "Hello.*" will match "Hello.cpp".
What doesn't work is "MyFile*.mat.*" trying to match "MyFileForComputers.mat.php" which obviously should match.

Otherwise that is a nice approach.
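
For what it's worth, tracing the globMatch above suggests why a trailing * can fail: the inner loop stops as soon as the text runs out, so the empty remainder is never tried against the rest of the pattern. Below is a sketch of one possible adjustment. It is not nezachem's code: globMatchSketch is a hypothetical name, it returns bool instead of the MATCH_ codes, and it assumes a lone backslash at the end of the pattern should simply not match.

```cpp
// Sketch only: a recursive glob matcher that also tries the empty remainder
// after '*'. globMatchSketch is a hypothetical name, not code from the thread.
inline bool globMatchSketch(const char* pattern, const char* text)
{
    for (; *pattern != '\0'; ++pattern)
    {
        switch (*pattern)
        {
        case '*':
            // Try every suffix of text, including the empty one at the very end.
            for (;; ++text)
            {
                if (globMatchSketch(pattern + 1, text))
                    return true;
                if (*text == '\0')
                    return false;
            }
        case '?':
            // Any single character, but there has to be one.
            if (*text == '\0')
                return false;
            ++text;
            break;
        case '\\':
            // Escaped character: the next pattern character is matched literally.
            ++pattern;
            if (*pattern == '\0' || *pattern != *text)
                return false;
            ++text;
            break;
        default:
            if (*pattern != *text)
                return false;
            ++text;
        }
    }
    // The pattern is used up; it is a match only if the text is used up too.
    return *text == '\0';
}
```

With this version, "MyFile*.mat.*" against "MyFileForComputers.mat.php" should match, because the final * is allowed to swallow "php" and then see an empty remainder.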

Zoon 0 Newbie Poster

Indeed this code is a bit complex.
It seems to work in many cases, I did find one that doesn't work though.
Let me try to reproduce it with a test string.

Edit: "D?or*.material" against "Door424.material" returns false.

NathanOliver 429 Veteran Poster Featured Poster

I'll look into it later today and see what's going on.

iamthwee

And this, kids, is why building a regular expression function is so damn hard.

NathanOliver 429 Veteran Poster Featured Poster

@ Zoon - The first example you posted with "MyFile*.mat.*" against "MyFileForComputers.mat.php" worked fine with the code I have. The second example you posted with "D?or*.material" against "Door424.material" did return false and it was a variable name error. On line 40 of my second code post it should be

if(parts[0].find_first_of("?",0) != std::string::npos)

Thanks for catching that.
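
A possible way to avoid this kind of flag mix-up altogether is to compare every part with a routine that treats ? as "any character", whether or not the part actually contains one. The sketch below is only an illustration of that idea, not the attached code; segmentMatchesAt is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Sketch: does `segment` match `text` starting at index `pos`?
// A '?' in the segment stands for any single character.
inline bool segmentMatchesAt(const std::string& segment,
                             const std::string& text,
                             std::size_t pos)
{
    // The segment must fit entirely inside text from pos onwards.
    if (pos > text.size() || segment.size() > text.size() - pos)
        return false;
    for (std::size_t i = 0; i < segment.size(); ++i)
    {
        if (segment[i] != '?' && segment[i] != text[pos + i])
            return false;
    }
    return true;
}
```

Because the same call works for parts with and without a ?, no singleWildcardPresent style flag is needed, and no temporary copies of the strings are made.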

@ nezachem - I understand it might look a little complex, but it is pretty self-explanatory. I went the STL route and that sometimes makes it look a little more complicated. Also, I wanted to try to solve this without using recursion.
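
For readers curious what a non-recursive matcher can look like without splitting the pattern into parts, here is a sketch of the common two-index approach that backtracks to the most recent *. It is not the attached code, wildcardMatchIterative is a hypothetical name, and it assumes escapes are not needed (every * and ? is a wildcard).

```cpp
#include <cstddef>
#include <string>

// Sketch: iterative wildcard match. '*' matches any run of characters
// (including none), '?' matches exactly one character. No escape handling.
inline bool wildcardMatchIterative(const std::string& pattern,
                                   const std::string& text)
{
    std::size_t p = 0, t = 0;
    std::size_t starPos = std::string::npos; // index of the most recent '*'
    std::size_t starText = 0;                // text index that '*' has consumed up to

    while (t < text.size())
    {
        if (p < pattern.size() && pattern[p] == '*')
        {
            // Remember the star and first try letting it match nothing.
            starPos = p++;
            starText = t;
        }
        else if (p < pattern.size() &&
                 (pattern[p] == '?' || pattern[p] == text[t]))
        {
            ++p;
            ++t;
        }
        else if (starPos != std::string::npos)
        {
            // Mismatch: let the last star swallow one more character and retry.
            p = starPos + 1;
            t = ++starText;
        }
        else
        {
            return false;
        }
    }
    // Text is used up; only trailing stars may remain in the pattern.
    while (p < pattern.size() && pattern[p] == '*')
        ++p;
    return p == pattern.size();
}
```

The trade-off is that only the last * is ever revisited, which is enough for plain * and ? patterns and keeps the function free of recursion and string copies.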

I will work on adding support for matching a literal ? as well. Not sure if I'll get to it today though. I'm attaching the updated code this time instead of posting it to save space.

NathanOliver 429 Veteran Poster Featured Poster

Well, I think I have a solution now that lets the user write \? to get an actual ?. I had to change my approach a little bit and added a couple of new functions, but it works for the test cases I have tried. I'm just attaching the code again.
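
Since the attachment is not shown in the thread, the following is only a guess at the general idea, not the attached code: one way to support \? and \* is to tokenise the pattern first, so that escaped characters become ordinary literals before any matching happens. TokenKind, Token, and tokenizePattern are hypothetical names, and the sketch assumes a lone backslash at the end of the pattern is kept as a literal backslash.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token types for a wildcard pattern.
enum class TokenKind { Literal, AnyOne, AnyMany };

struct Token
{
    TokenKind kind;
    char ch; // only meaningful when kind == TokenKind::Literal
};

// Sketch: split a pattern into tokens, resolving backslash escapes.
inline std::vector<Token> tokenizePattern(const std::string& pattern)
{
    std::vector<Token> tokens;
    for (std::size_t i = 0; i < pattern.size(); ++i)
    {
        const char c = pattern[i];
        if (c == '\\' && i + 1 < pattern.size())
        {
            // Escaped character: take the next one literally, whatever it is.
            tokens.push_back(Token{TokenKind::Literal, pattern[++i]});
        }
        else if (c == '*')
        {
            tokens.push_back(Token{TokenKind::AnyMany, '\0'});
        }
        else if (c == '?')
        {
            tokens.push_back(Token{TokenKind::AnyOne, '\0'});
        }
        else
        {
            // Ordinary character (also a trailing lone backslash).
            tokens.push_back(Token{TokenKind::Literal, c});
        }
    }
    return tokens;
}
```

A matcher can then walk the token list instead of the raw string, so it never has to ask whether a given ? or * was escaped.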