Hey there,

I'm having issues with a custom string tokenizer I'm using for an assignment. I've looked around and haven't managed to find anything that really answers my question, so here goes nothing.

Whenever I run the program and it runs the tokenizer, I get a seg fault. I'm fairly new to pointer arithmetic, so if someone could push me in the right direction, that would be great. Here is the offending function and main function:

char** tokenize(char* str, char ch, char** arr){

    while(*str != '\0'){ /*While we have not reached the end of the string*/

        if(*str == ch){

            str++; /*Increment str*/
            arr++; /*Increment arr*/
        }

        **arr = *str; /*The character at arr's current string = the character at str.*/
        *arr++; /*Next character in arr's string*/
        str++; /*Next memory location in str.*/
    }

    return arr;
}

int main(int argc, char *argv[]){

    char* input = (char*)malloc(sizeof(char) * MAXLINE);
    char** buf = (char**)malloc(sizeof(char*) * (MAXLINE/2));
    int i = 0;

    for(;;){

        printf("sedit>> ");
        fgets(input,MAXLINE,stdin);

        buf = tokenize(input,' ',buf);
        printf("%s",*buf);

        printf("%s",input);
        printf("\n");

    }

    return 0;
}

Thank you kindly!

Should it work the same as strtok? Try leaving malloc out at first and use array indexes instead of pointer arithmetic. Once that works, you can switch to pointer arithmetic (just increment an address instead of an index) and malloc.
Getting working code first and converting it afterwards would probably be easier :)
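
A minimal sketch of that index-first approach, for illustration only. The name tokenize_indexed and the MAX_TOKENS and MAX_TOKEN_LEN limits are made up for this example and are not from the original code. It assumes the input may end with the newline that fgets keeps, and it silently truncates tokens that are too long.

```c
#include <stddef.h>

#define MAX_TOKENS 8      /* hypothetical limit, chosen only for this sketch */
#define MAX_TOKEN_LEN 32  /* hypothetical limit, includes the trailing '\0' */

/* Splits str on ch using array indexes only (no malloc, no pointer stepping).
   Returns the number of tokens stored in tokens[]. */
size_t tokenize_indexed(const char str[], char ch,
                        char tokens[MAX_TOKENS][MAX_TOKEN_LEN])
{
    size_t count = 0; /* tokens completed so far */
    size_t len = 0;   /* characters written to the current token */
    size_t i;

    for (i = 0; ; i++) {
        char c = str[i];
        int at_end = (c == '\0' || c == '\n');

        if (c == ch || at_end) {
            if (len > 0) {                 /* close a non-empty token */
                tokens[count][len] = '\0';
                count++;
                len = 0;
            }
            if (at_end || count == MAX_TOKENS)
                break;
        } else if (len < MAX_TOKEN_LEN - 1) {
            tokens[count][len] = c;        /* extra characters are dropped */
            len++;
        }
    }
    return count;
}
```

Calling it with "hello world\n" and ' ' as the delimiter should leave "hello" in tokens[0] and "world" in tokens[1]. Once this behaves, the indexes can be swapped for pointers one at a time.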

That's probably because your pointer-to-pointer arr is only an array of pointers; none of those pointers has any memory behind it to store the tokens.

What you could do is, after creating the array of pointers, allocate a string buffer for every single pointer in the array.

Also, personally, I find that calloc() is easier to read than malloc() when dealing with anything that isn't simply a single slab of memory. ;P just throwing that out there. lol
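
A rough sketch of that two-step allocation, using calloc as suggested. The helper names alloc_token_buffers and free_token_buffers are hypothetical, and the sizes are left to the caller; this is one way to do it, not the only one.

```c
#include <stdlib.h>

/* Allocates `count` zero-filled string buffers of `len` bytes each.
   Returns NULL if any allocation fails (nothing is leaked in that case). */
char **alloc_token_buffers(size_t count, size_t len)
{
    char **arr = calloc(count, sizeof *arr); /* step 1: the array of pointers */
    size_t i;

    if (arr == NULL)
        return NULL;

    for (i = 0; i < count; i++) {
        arr[i] = calloc(len, sizeof **arr);  /* step 2: one buffer per pointer */
        if (arr[i] == NULL) {
            while (i > 0)                    /* undo the earlier allocations */
                free(arr[--i]);
            free(arr);
            return NULL;
        }
    }
    return arr;
}

/* Releases everything obtained from alloc_token_buffers. */
void free_token_buffers(char **arr, size_t count)
{
    size_t i;

    if (arr == NULL)
        return;
    for (i = 0; i < count; i++)
        free(arr[i]);
    free(arr);
}
```

Each buffer needs room for the longest expected token plus the terminating '\0', and the loop bound has to match the number of pointers actually allocated.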

Okay, I allocated memory for each element of the buffer array like so in main:

    char* input = (char*)malloc(sizeof(char) * MAXLINE);
    char** buf = (char **)calloc(40, sizeof(char*));
    int i = 0;


    /*Memory allocation for each char* in buf*/
    for(i = 0; i < 16; i++) {
            buf[i] = (char *)calloc(1, 2*sizeof(char));
    }

Which solves the segmentation fault issue; however, my tokenize function only stores the first character of the input string for each index in the buffer.

eg.

Inputting "hello world", and printing buf[0] and buf[1] results in "h" and "w". Why is this? I've tried different levels of dereferencing and each time it has the same result.

I had some trouble getting your function to work on my end since it kept crashing.

The postfix ++ operator has higher precedence than the unary * dereference operator, so the statement *arr++ is parsed as *(arr++). It steps through the arr array instead of stepping through the *arr string, which is what we intend. Changing it to (*arr)++ should be good.
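
To make the difference concrete, here is a small sketch with two hypothetical helpers (step_postfix and step_parenthesized are names invented for this example). Each one applies one of the two expressions and returns arr afterwards so the effect can be inspected.

```c
#include <stddef.h>

/* *arr++ is parsed as *(arr++): arr itself moves on to the next string,
   and the pointer stored in the array is left alone. */
char **step_postfix(char **arr)
{
    (void)*arr++;   /* the dereferenced value is read and thrown away */
    return arr;     /* now points one element further into the array */
}

/* (*arr)++ increments the pointer stored in arr[0]; arr does not move. */
char **step_parenthesized(char **arr)
{
    (*arr)++;       /* arr[0] now points one character further */
    return arr;     /* same array position as before */
}
```

With char *strs[2] pointing at two strings, step_postfix(strs) returns strs + 1 and leaves strs[0] alone, while step_parenthesized(strs) returns strs and moves strs[0] forward by one character.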

Also, str gets incremented twice in one pass when you enter the if block, so it would be good to put a continue inside it.

Furthermore, arr has already been modified by the time you return it, so you are returning a different value than the one you were passed. It would be good to save the original value in a temporary pointer-to-pointer before anything else.

Even then, the function is still NOT working. So I come to the conclusion that, because we advance with (*arr)++, we have effectively also changed the value of arr[0], do you see? So when we print arr[0] and the other elements, we are actually printing from the end of each string; just like we modified arr, we also modified every member of the array of pointers that we wrote through. You can check this by applying a negative offset to one of the strings in the array of pointers. printf("%s", arr[0] -= 3); should print the last 3 characters. Of course, only after fixing the stuff I pointed out.
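
A small sketch of that effect, assuming the copy loop advances the stored pointer the same way the thread's code does. The function name copy_advancing_slot is made up for this illustration.

```c
#include <stddef.h>

/* Copies src into the buffer that *slot points to by advancing *slot itself.
   Afterwards *slot points at the terminating '\0', not at the start.
   Returns the number of characters copied. */
size_t copy_advancing_slot(char **slot, const char *src)
{
    size_t n = 0;

    while (*src != '\0') {
        **slot = *src;  /* write through the stored pointer */
        (*slot)++;      /* the stored pointer drifts away from the start */
        src++;
        n++;
    }
    **slot = '\0';
    return n;
}
```

After copying "hello" this way, printing *slot shows an empty string, *slot - 3 shows "llo", and *slot - n recovers "hello". The data is in the buffer; only the pointer to its start was lost.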

That's actually quite neat. @_@

I tried changing the function like so, to avoid the precedence issue and to work through a temporary pointer-to-pointer:

char** tokenize(char* str, char ch, char** arr){

    char** temp = arr;

    while(*str != '\0'){ /*While we have not reached the end of the string*/

        if(*str != ch){

            **temp = *str; /*The character at arr's current string = the character at str.*/
            (*temp)++; /*Next character in arr's current string*/
            str++; /*Next memory location in str.*/

        }
        else{

            str++; /*Increment str*/
            temp++; /*Increment arr*/

        }
    }
    return arr;
}

My terminal output is now empty, but it does not cause a seg fault. I think that only spaces are being stored...?

Remember that each element of your pointer-to-pointer arr is itself a pointer. During the execution of the program, these pointers have already been offset by doing (*arr)++. Therefore, each member of the pointer-to-pointer no longer points to the start of its string. Do you see?

As I said, try applying a negative offset to each member of the pointer-to-pointer and you will see that this is true.

Within the tokenizer, instead of offsetting through the arr variable, you can store each string buffer's address in a temporary pointer and offset using that pointer, so that the pointers our pointer-to-pointer contains do not change. Something like

char *temp = *arr;
*temp = *str;
temp++;

Definite improvement there! Here's the changed function:

char** token(char* str, char ch, char** arr){

    char *temp = *arr;

    while(*str != '\0'){ /*While we have not reached the end of the string*/

        if(*str == ch){
            str++; /*Increment str*/
            temp++; /*Increment arr*/
            continue;
        }

        *temp = *str; /*The character at arr's current string = the character at str.*/
        temp++; /*Next character in arr's string*/
        str++; /*Next memory location in str.*/
    }

    return arr;
}

Still doesn't give the right output though. If I say, input "foo bar", it will print "foo" even though I've called for the values at indexes 0 and 1. (Now I understand what you mean about the whole offset array thing though!)

The temp pointer only holds the address of one particular string buffer from the pointer-to-pointer at a time. So whenever you are done with a token, you have to assign the next string buffer in the pointer-to-pointer to temp. Using temp does not remove the need to back up arr, because you will still step through the array of pointers.

What is happening in that snippet is that you take the first buffer in the pointer-to-pointer and put your tokens ONLY in it; you are not populating the rest of arr.

temp should be assigned a member of arr once per token during the loop; when you move on to another token, you need to grab the next string buffer in arr. That snippet never does this.

The reason the whole input string is not printed is that the if block also increments temp, which skips a cell in the buffer. That cell keeps the value '\0', because calloc() initializes all memory to zero and '\0' == 0, so printing stops there.

You still need to back up arr in order to return the right value.
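
Putting those points together, a sketch of the loop could look like the following. This is only one possible shape, not the original poster's final code: the name tokenize_sketch and the extra max_tokens and buf_len parameters are assumptions added here so the function cannot run past the allocated buffers. It expects max_tokens >= 1 and buf_len >= 1, and it treats the newline left by fgets as the end of input.

```c
#include <stddef.h>

/* Copies each ch-separated token of str into the next buffer in arr.
   arr must hold max_tokens pointers, each to a buffer of buf_len bytes.
   Returns the original arr. */
char **tokenize_sketch(const char *str, char ch, char **arr,
                       size_t max_tokens, size_t buf_len)
{
    char **start = arr;                 /* backup: arr is advanced below */
    char **last = arr + max_tokens - 1; /* last usable slot */
    char *temp = *arr;                  /* write cursor in the current buffer */
    size_t used = 0;                    /* characters written to that buffer */

    while (*str != '\0' && *str != '\n') {
        if (*str == ch) {
            str++;
            if (used == 0)              /* leading or repeated delimiter */
                continue;
            *temp = '\0';               /* finish the current token */
            if (arr == last)            /* no slots left: drop the rest */
                return start;
            arr++;                      /* grab the next string buffer */
            temp = *arr;
            used = 0;
            continue;
        }
        if (used < buf_len - 1) {       /* keep room for the '\0' */
            *temp = *str;
            temp++;
            used++;
        }
        str++;
    }
    *temp = '\0';                       /* terminate the final token */
    return start;
}
```

The pointers stored in arr are never modified, so arr[0] and arr[1] still point at the start of "foo" and "bar" after tokenizing "foo bar". Slots past the last token are left untouched, so a caller that reuses the buffers between lines would need to clear them or track the token count separately.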

So this means you have something like char *array[n]; ?
You have allocated the pointers, but now you need to malloc the memory they will point to and memcpy the data into that memory. I don't see how you are copying it.
