find & count string withing a string> works, but any help improving it?

Question

somjit{} 60 Junior Poster in Training

12 Years Ago

this is an exam question i failed to answer properly, im trying to work it out. the code i came up with is below, it works ok, but i was hoping if anyone could help in improving it. iv read in various threads in daniweb itself about making a code run faster etc.. havent really understood much, maybe anyone can help me understand them here, with this code if possible. thanks a lot in advance :)

#include<stdio.h>
#include<string.h>
#include<stdlib.h>

void remLin(char *);

int main(){

    int index,checker,count=0,ls1,ls2;
    char *s1,*s2;

    printf("enter max length of s1:");
    scanf("%d",&ls1);
    printf("enter max length of s2:");
    scanf("%d",&ls2);

    while((getchar())!='\n'){}

    printf("\nenter s1: ");
    fgets( (s1=(char*)malloc(sizeof(char)*ls1)) , ls1 , stdin);
    remLin(s1);

    printf("\nenter s2: ");
    fgets( (s2=(char*)malloc(sizeof(char)*ls2)) , ls2 , stdin);
    remLin(s2);

    for(index=0,checker=0;s1[index]!='\0';index++){

        printf("\n\n at for() start>> location: %3d  s1 contains: %3c  count value: %3d",index,s1[index],count);

        while((s1[index]==s2[checker]) && ((s1[index])!='\0') && ((s2[checker])!='\0')){
            checker++;
            index++;        
        }

        if(checker>0)
            index=index-1; // this compensates for one extra index increment at the end of while()

        if(checker==(strlen(s2))){
            count++;        

        }
        checker=0;

        printf("\n at for()   end>> location: %3d  s1 contains: %3c  count value: %3d",index,s1[index],count);
    }

    printf("\n\ncount of %s in %s is: %d\n\n",s2,s1,count);


    return 0;
}

void remLin(char *str){
    int len=strlen(str);
    if(str[len-1]=='\n')
        str[len-1]='\0';
    realloc(str,len); // trims up the string
    printf("\nentered string: %s and its length: %d\n\n",str,strlen(str)); // strlen will not count the null terminator
}

somjit.
ps: pardon a few obvious comment lines, i just kept them there so that i can understand what i did later on. :)

c

Edited 12 Years Ago by somjit{}

2 Contributors
10 Replies
293 Views
1 Day Discussion Span
Latest Post 12 Years Ago Latest Post by Adak

Adak 419 Nearly a Posting Virtuoso

12 Years Ago

Code is useless at this point. Not your code, but ANY code. First, before ANYTHING ELSE, get the right algorithm. And you don't have it.

No amount of optimizing later will overcome the problems of a poor choice in the algorithm.

And this is the algorithm you should be studying:
http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

THAT is the most widely used string searching algorithm, and at that link, at the bottom of the page, is a link to a C example showing it.

What's wrong with your algorithm? You are stepping through the string by merely 1 char at a time.

ONE CHAR AT A TIME

Think about it. How could that be improved?

The thing is, you need to work it through with paper and pen several times, by hand. Do YOU check for a string inside a larger string, by checking EACH AND EVERY CHAR?

Of course not!

What do YOU do? Work it through several times, S-L-O-W-L-Y, and notice the pattern that your eye and pen follow doing this. The pattern will reveal itself if you repeat it enough.

And do read up on Boyer-Moore, for all kinds of hints. You don't need to use all their tricks, but you should use some of them.

Save this code, and later, put a timer in it and compare it's time, with Boyer-Moore's program. The difference is astounding if the text being searched is large. The larger the sub-string, the quicker BM's algorithm gets.

Edited 12 Years Ago by Adak

Adak 419 Nearly a Posting Virtuoso

12 Years Ago

There are WAY too many algorithms to remember even the most important one's. Because depending on what the program is doing, LOTS of them are very important.

I'm a hobbiest, and never took anything beyond the first semester of C, but I've got a FEW years of hobby experience with it, so:

I try to match the speed (complexity) of the algorithm, with the job. No sense trying to optimize a telephone directory sort by name, when it's your own personal directory, with 100 names or less. On the other hand, if it's the county/district directory, with 5 Million names - yeah, that's worth a very fast algorithm.

Fast algorithms I like:
Insertion sort - for almost sorted arrays, and all small one's (less than 1,000) say. Oddly, it's faster than anything else for sorting small and almost sorted arrays.

Except of course, for the King of Fast:
Counting sort - limited use, but BLAZING UNMATCHED speed, if it can be used. (which is rare)

Quicksort - with Insertion sort on small sub-arrays to optimize it, if it REALLY needs the best speed. Pick first + last value of each sub-array/2 for the pivot. Uses minimal amount of extra memory. You hear bad mouthing about it - well, not if you pick the pivot this way.

Merge sort - for large external data that can't all fit in memory.

Binary Search - great general purpose searcher of sorted data. Don't use it for tiny jobs, but ALL the medium and large ones get it.

strstr() - all but the biggest and most demanding string search jobs.
Boyer-Moore - for the most demanding string search jobs.

Depth First Search - uses minimal memory to do a LOT of look ahead. Kind of an odd "bottom feeder" (goes to the bottom depth and works around like a catfish). Used recursively with Alpha-Beta, it kicks ass in games like checkers, chess, etc. Closely related to Backtracking in general.

Dancing Links - too complex for my preference, but this coloring algorithm kicks ass, on several problems - including solving standard and giant Sudoku puzzles. Takes great advantage of linked lists ability to be inserted or removed, quick and easy.

In general, I avoid linked lists - arrays rock thank you!

Sieve of Eratosthenes - finds prime numbers up to the limits of your array size, in amazingly fast time. Makes Trial by Division programs look like they're standing still.

Diijskstra's - made for a graph, but applies to travel in route planning, and etc.. It's not the very BEST to work with the Traveling Salesman problem, but it's darn good for a lot of problems.

Easy swap permutation creator (iterative, not recursive). Doesn't give you sorted output in the permutations it creates, but it's blazing fast.

Odometer - just something I coded up to work like the odometer on a car, recording miles or km traveled. I've used it several times to solve problems. Sometimes a brute force method is quick and easy, on PC's today. (Not so much for mobile phones, etc.). I used this to try and find the key for the dead pigeon message from WWII, that was in the news last Fall. No luck however, seems the message was coded using a one-time pad (and thus unbreakable).

Collision Detection - easy if you have just a few "asteroids", but when you have a lot of them, you need to optimize it - that's when it gets interesting.

That's most of one's I've used - except Dancing Links, which I have just studied. Most of the ways to optimize something is to work through it by hand (or in your imagination), and by being "lazy" about it. Some of the best stuff we keyboard, is with the delete key because we realize "hey, i don't need this big function there, if I just do something trivial over here".

I avoid:

Bubble sort like the plague. I use Substitution sort to find all possible pairs from a set of (numbers, names, whatever).

Bit fiddling: I don't work with bits enough to keep "fluent" with them. They'r wickedly fast, and sometimes beautifully done, but they're last on my optimizing list - dead last.

Linear searching: against my religion. ;)

There are zillions of algorithms out there, and Wikipedia has a bunch of the important one's that I've looked up, with just a few exceptions - even Dancing Links!

tux4life commented: Nice list ;) +13

Adak 419 Nearly a Posting Virtuoso

12 Years Ago

Interesting. Stanford and MIT (among several others), also have computer classes/lessons on-line.

There are several other "problem/challenge" computer programming sites also:

These are the two problems I've helped with:
Codechef: http://www.codechef.com/BTCD2012/problems/DOORS
SPOJ: http://www.spoj.com/problems/TDKPRIME/

And the funniest of them all:
http://www.ioccc.org/

hilarious! ;) Park your C code sanity, at the door!

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

somjit{} 60 Junior Poster in Training Featured Poster · Answer 1 · 2013-02-03T10:32:34+00:00

thanks a lot for the link ! i never had algorithms as a subject, so the thought that there might be an algorithm for this problem didnt occur to me.. thanks for the advice. :)

Adak 419 Nearly a Posting Virtuoso · Answer 2 · 2013-02-03T12:37:48+00:00

Thinking more about your post, I'm not sure whether you wanted a more optimal program, or a simpler - clearer - program. If it's simpler and clearer, or for a small assignment/job, then I would use strstr(), (part of the string.h include file definitions), in a loop:

#include <stdio.h>
#include <string.h>

int main(void) {
   int n,len;
   char text[273]={"I am the eagle, I live in high country, in rocky\n"
   "cathedrals that reach to the sky. I am the hawk,\n"
   "and there's blood on my feathers, but time is still turning,\n"
   "they soon will be dry. All those who see me, and all\n"
   "who believe in me, share in the freedom I feel when I fly."};

   char target[273];
   char *pch=text;
   printf("\nOur text is: \n%s\n\nEnter the string you want to search for: ",text);
   fflush(stdout);
   scanf("%[^\n]",target); //take all input up to a newline
   printf("\nOur target to search for is: %s\n\n",target);
   len=strlen(target);

   n=0;
   while(pch>(text-len)) {
      pch=strstr(pch, target);
      //printf("\ntext: %P   pch: %P  Press enter when ready\n\n",text,pch); getchar();
      if(pch) {
         printf("%s\n\n",pch);
         ++n;
      }
      pch+=len;
   }

   printf("\nTarget was found: %d times\n",n);
   return 0;
}

But this program has a problem. Try searching for:
in rocky cathedrals that reach to the sky.

And it can't find it. The problem is the newline - the target doesn't have one, right after "rocky", and the text phrase, does have one.

So you'd have to search for"
in rocky
cathedrals that reach to the sky.

Which would require replacing [^\n] in the scanf(), with maybe[^~], or other char not in the target phrase. Which is probably not what you would want.

The answer might be to pre-process the text, replacing all newlines with spaces. Then do the search, remembering to use a space between words, even when the words wrap around to another line on a long target phrase. Every char is critical in these string searching programs.

I don't know if I could have coded this up during an exam, but for a clear and quick string searcher, it's hard to beat, imo.

somjit{} 60 Junior Poster in Training Featured Poster · Answer 3 · 2013-02-04T06:08:54+00:00

@adak

I'm not sure whether you wanted a more optimal program, or a simpler - clearer - program

i wanted a more optimal code for the code i posted, n the algorithm u gave was just that.. a more optimal n "professional" (if thats the correct word) of doing it.

regards the strstr() function... its quite handy.. definately suits the clearer/simpler category.

I don't know if I could have coded this up during an exam

i find it quite difficult to write actual code in pencil n paper even at home... let alone during exams. perhaps im not alone then. :)

edit: can you give a list of a few common algorithms ? i never had that subject (im not a CS guy), so googling on my own might just be a little overwhelming.

somjit{} 60 Junior Poster in Training Featured Poster · Answer 4 · 2013-02-04T12:50:41+00:00

Sieve of Eratosthenes

thats one mighty cool sounding algorithm.... obviously im going to try that out first! :D

There are WAY too many algorithms to remember

had a feeling that would be the case, hence didnt want to google...
thanks a lot for the list! hoping to learn them soon.

bdw, any links that might be helpful in understanding time complexity of codes written? i would like to learn that a lot.

Adak 419 Nearly a Posting Virtuoso · Answer 5 · 2013-02-04T13:40:38+00:00

You could start here:
http://www.daniweb.com/software-development/computer-science/threads/13488/time-complexity-of-algorithm

Wikipedia has a couple pages on it. Start here, and follow the links to Big O Notation:
http://en.wikipedia.org/wiki/Analysis_of_algorithms

Eratosthenes was an interesting Greek - brilliant and well balanced in several sciences, he became known as "second" when he entered several math/science contests and came in second place. He was the head librarian in the famous Alexandria library, when it had the world's greatest knowledge resource.

He got the (very) last laugh it seems, since his Sieve for finding primes can be optimized easily, and runs in first place (fastest time), for all the ints you can fit into an array, without causing swapping to HD.

Google up "Euler Project" for a series of computer/math challenges.

somjit{} 60 Junior Poster in Training Featured Poster · Answer 6 · 2013-02-04T16:48:53+00:00

prime, greek, euler... u just reminded me of the phi function of the RSA algorithm :D im a big cryptography fan, especially coz of all the neat maths associated with it...

somjit{} 60 Junior Poster in Training Featured Poster · Answer 7 · 2013-02-04T18:23:28+00:00

@adak:

this might interest you

i just learned about this site today when i was talking to a friend of mine about the euler project site you mentioned.

one learns more when one shares :)