Hi people,

I'm trying to match strings from a pattern containing wildcards.

Basically I need to be able to go through a stack of strings, and see which of them matches my pattern.

Fx My*.txt will match only one of these.
MyFile.txt
MyFile.php

I'm aiming to end up with a function like this, which returns all matching strings.

std::vector<std::string> matchStrings(const std::string& pattern, const std::vector<std::string>& stringsToMatch);

I've been thinking a lot about this, and wrote many different codes to do this, but I cannot think of a way that works in all scenarios. I'd also like it to be efficient.

If anyone can point out a free-to-use code available, which does this, please point me there :) This is not some school homework that I must achieve myself. I already did google without luck.

Otherwise, please post your ideas or experiences on this subject.

My current attempt is posted below, although it doesn't get far. I stopped, as I realised it wouldn't work in some scenarios, where the non-wildcard parts of the pattern occurs multiple times.

#include <iostream>
#include <string>
#include <vector>

typedef std::string String;
typedef std::vector<String> Stringvector;

int main(int argc, char** argv)
{
    Stringvector strings;
    strings.push_back("MyNotes.txt");
    strings.push_back("Groceries.txt");
    strings.push_back("MyCarPlates.txt");

    Stringvector matches;

    String pattern = "My*.txt";
    Stringvector texts;

    {   // split the pattern into separate strings, separated by wildcards
        String::size_type pos = pattern.find("*");
        while(pos != String::npos)
        {
            texts.push_back(pattern.substr(0, pos));
            pattern.replace(0, pos+1, "");
            pos = pattern.find("*");
        }
        texts.push_back(pattern);
        pattern = "";
    }

    for(Stringvector::const_iterator it = strings.begin(); it != strings.end(); ++it)
    {
        String str = *it; // temporary string
        for(Stringvector::const_iterator it = texts.begin(); it != texts.end(); ++it)
        {
            String::size_type pos = str.find(*it);
            while(pos != String::npos)
            {
                str.replace(pos, (*it).length(), "#"); // let's temporarily pinpoint non-wildcard pattern matches
                pos = str.find(*it);
            }
        }
    }

    return 0;
}

I appreciate your assistance ;)

After some googling, it looks like if you want a really robust solution you should use something like Boost Regex

http://www.boost.org/doc/libs/1_35_0/libs/regex/doc/html/index.html

If any of the experts here have a way to do it with std::string and std::algorithm only, that would be nice, but as soon as you allow multiple wildcards I think it gets pretty tough.

Dave

There are a bunch of different cases you will have to use to solve this. First I would separate the extension from the file name. Then for a simple case like "*.txt" you would just use the extension string to match the file names.

std::vector<string> wordList;
std::vector<string> matches;
// fill the list with file names
// separate the filename and put the name in filename and
// the extension in extension.  here im doing it manually
std::string filename = ""; // nothing here since its just a wild-card
std::string extension = ".txt";
size_t size = wordlist.size();
size_t length;
// go through the vector and check just the extensions since
// the filename is a wild-card
for (size_t i = 0; i < size; i++)
{
    length = wordList[i].size();
    if (extension == wordList[i].substr(length - 4, length))
        matches.push_back(wordList[i]
}
// now you have a list of all the files that match the extension

For cases like "my*.txt" we would separate the file name and the extension again. So the filename string would be "my*" and are extension string would be ".txt". To solve this you can use the function find_first_of(). Pass in the first letter from the filename string and if it returns that there is a "m" in the string then you start a comparison against the two string until there is a mismatch or you reach the *.

std::vector<string> wordList;
std::vector<string> matches;
// fill the list with file names
// separate the filename and put the name in filename and
// the extension in extension.  here im doing it manually
std::string filename = "my";
std::string extension = ".txt";
size_t size = wordlist.size();
size_t length;
// now we have to see if the string starts with the filename string
for (size_t i = 0; i < size; i++)
{
    length = filename.size();
    if (filename == wordList[i].substr(0, length)
    {
        length = wordList[i].size();
        if (extension == wordList[i].substr(length - 4, length))
        matches.push_back(wordList[i]
    }
}
// now you have a list of all the files that start with the filename and
// end with the right extension

Now we move on to dealing with a case like "my*day.txt". With this we will have to check the string in two different spots at the same time. Fortunately its not that more complex the the previous example. The math might look a little complex in my if statement but you should be able to follow it.

std::vector<string> wordList;
std::vector<string> matches;
// fill the list with file names
// separate the filename and put the name in filename and
// the extension in extension.  here im doing it manually
std::string filename = "my*day";
std::string extension = ".txt";
size_t size = wordlist.size();
size_t filenameLength, wordLength;
// since the wildcard is in the string we need to find what spot its in
size_t wildcard = filename.find("*");
// now we have to see if the string starts with the filename string
for (size_t i = 0; i < size; i++)
{
    filenameLength = filename.size();
    if (filename == wordList[i].substr(0, wildcard)
    {
        wordLength = wordList[i].size();
        // first part matches now see if the end matches
        if(filename.substr(wildcard + 1, length) == 
            wordList[i].substr(wordLength - 4 - (filenameLength - wildcard + 1), wordLength - 4)
            if (extension == wordList[i].substr(wordLength - 4, wordLength))
                matches.push_back(wordList[i]
    }
}
// now you have a list of all the files the have the right begining and end and
// end with the right extension

I'll leave it at this for now. Should show you how you can do it for all cases. you might have to make this recursive for really complex cases like "my*day*ruined*morning.txt"

As usual everything with boost is insanely complicated and the examples aren't very helpful...

I believe this does what you want (remember to link to boost_regex)

#include <iostream>
#include <string>
#include <boost/regex.hpp>

using namespace std;

int main(int argc, char** argv) 
{
   std::string strRE = "Hello.*World.*Goodbye";
   
   boost::regex re(strRE.c_str());
   
   std::string PositiveTest = "HelloEveryoneWorldThenGoodbye";
   std::string NegativeTest = "Test";
   
    if (boost::regex_search(PositiveTest, re))
    {
      cout << "Found." << endl;
    }
    else
    {
      cout << "Not found." << endl;
    }
   
    if (boost::regex_search(NegativeTest, re))
    {
      cout << "Found." << endl;
    }
    else
    {
      cout << "Not found." << endl;
    }
  return 0;
 
}

Though you may learn something by following NathanOliver's suggestion, using Boost you don't have to reinvent the wheel :)

Dave

It is nice to have differnt things to chose from though. Personaly I like the challange of figuering this stuff out. But yes its nice the people at boost already thought of this one

I wrote this code many years ago and due to the less than stellar variable names it's hard to understand - but well, it works.

static bool PlaceHolderMatch(const char* QS,const char* MS,int QL,int ML,char Placeholder='?')
{
  if (QL!=ML)return false;
  for (int i=0;i<QL;i++)
  {
    if (MS[i]==Placeholder)continue;
    if (MS[i]!=QS[i])return false;
  }
  return true;
}

static bool PlaceHolderMatchIgnoreCase(const char* QS,const char* MS,int QL,int ML,char Placeholder='?')
{
  if (QL!=ML)return false;
  for (int i=0;i<QL;i++)
  {
    if (MS[i]==Placeholder)continue;
    if (tolower(MS[i])!=tolower(QS[i]))return false;
  }
  return true;
}

bool wildcardMatch(const std::string& QueryStr,const std::string& MatchStr,bool IgnoreCase=false,char Wildcard='*',char Placeholder='?')
{
  bool (*const MatchFunction)(const char* QS,const char* MS,int QL,int ML,char Placeholder)=IgnoreCase ? PlaceHolderMatchIgnoreCase : PlaceHolderMatch;

  const char* QS=QueryStr.c_str(),*MS=MatchStr.c_str();
  const int QL=QueryStr.length(),ML=MatchStr.length();
  if (ML==0)return QL==0;
  int QP=0;
  for (int i=0;i<ML;i++)
  {
    if (MS[i]==Placeholder)
    {
      QP++;
      continue;
    }
    if (MS[i]==Wildcard)
    {
      int WCTS=i+1;
      while (WCTS<ML && MS[WCTS]==Wildcard)WCTS++;
      if (WCTS>=ML)break;
      int WCTE=WCTS+1;
      while (WCTE<ML && MS[WCTE]!=Wildcard)WCTE++;
      int LL=WCTE-WCTS;
      if (WCTE>=ML)return MatchFunction(QS+QL-LL,MS+WCTS,(QL-QP>=LL) ? LL : (QL-QP),LL,Placeholder);
      int CLim=QL-LL-QP;
      bool Match=false;
      for (int j=0;j<=CLim;j++)
      {
        if (MatchFunction(QS+j+QP,MS+WCTS,LL,LL,Placeholder))
        {
          QP+=j+LL;
          Match=true;
          break;
        }
      }
      if (!Match)return false;
      i=WCTE-1;
      continue;
    }
    if (MS[i]!=QS[QP])return false;
    QP++;
  }
  if (QP<QL && MS[ML-1]!=Wildcard)return false;
  return true;
}
Member Avatar for iamthwee

As usual everything with boost is insanely complicated and the examples aren't very helpful...

I second this. I find the documentation so poor and convoluted.

I'd much rather use regex in either dotNet or python.

You probably don't want to include boost in your project, just for this one thing.
It's not that hard to figure out, so unless you already have boost linked, try doing it yourself! ;)

Many thank NathanOliver.
Long post - much appreciated :)
I'll see how far I get with this.

I'll not be using boost, it's a bit overkill to link a whole library to get regex matching, in a case I simply need wildcard matching :)

Thank you, all of you, anyway for posting!

Well I had some free time today and this peaked my interest so I created a function that should do the trick. I made it a code snippet and this will take you there.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.