Parsing a string: By lines.

Stack Overflow

Greetings,

String parsing isn't always an easy task, especially when you need to split a single string into many pieces while also keeping performance in mind.

The following code does this task simply. Precise allocation, where each buffer is sized to exactly what it will hold, performs well in an algorithm like this.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int getLineCount(char *buffer);
int parseStringbyLines(char *buffer, char ***string);

int main() {
	int i, j, lines;
	char string[64];	// large enough for the 40-character sample plus '\0'
	char **parsed = NULL;

	strcpy(string, "Hello\nMy friends\nLets parse this string!");

	// Parse string
	lines = parseStringbyLines(string, &parsed);
	if (!lines) {
		printf("Parsing failed.\n");
		return 0;
	}

	// Print parsed string
	printf("%d lines parsed.\n\n", lines);
	for (i = 0; i < lines; i++)
		printf("%s\n", parsed[i]);

	// Free memory
	for (j = 0; j < lines; j++)
		free(parsed[j]);
	free(parsed);

	return 0;
}

int getLineCount(char *buffer) {
	int z = 1;
	char *pch;

	// Find first match
	pch = strchr(buffer, '\n');

	// Increment line count
	while (pch) {
		pch = strchr(pch+1, '\n');
		z++;
	}

	return z;
}

int parseStringbyLines(char *buffer, char ***string) {
	int		*newLine;
	int		b, j, l, z = 1;
	int		lineCount, len;
	char 		*pch, **temp = NULL;

	/*
	** Get line count
	** Allocate memory for new line handling
	** Check if memory allocating failed
	*/
	lineCount = getLineCount(buffer);
	newLine = (int *)malloc(lineCount + sizeof(int) * sizeof(*newLine));
	if (!newLine)
		return 0;
	newLine[0] = 0;

	// Find first occurrence of a new line
	pch = strchr(buffer, '\n');
	if (!pch) {
		free(newLine);
		return 0;
	}

	// If found, find all positions
	while (pch) {
		newLine[z] = pch-buffer+1;
		pch = strchr(pch+1, '\n');
		z++;
	}
	newLine[z] = (int)strlen(buffer) + 1;

	// Allocate memory to our temporary pointer
	temp = (char **)malloc(lineCount * (sizeof *temp));
	if (!temp) {
		free(newLine);
		return 0;
	}

	// Go through all lines found
	for (l = 0; l < z; l++) {
		b = 0;
		len = newLine[l+1] - newLine[l];	// line length plus 1 for '\0'

		// Allocate memory per index
		temp[l] = (char *)malloc(len * sizeof(**temp));
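		if (!temp[l]) {
			// Release everything allocated so far
			while (l > 0)
				free(temp[--l]);
			free(temp);
			free(newLine);
			return 0;
		}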

		// Put our data in
		for (j = newLine[l]; j < newLine[l+1]-1; j++) {
			temp[l][b] = buffer[j];
			b++;
		}
		temp[l][b] = '\0';
	}

	// Free memory for line position
	free(newLine);

	// Set our pointer to point to char **temp
	*string = temp;

	// Return lines found
	return z;
}
artun 0 Newbie Poster

Hello all,

The size calculation in the memory allocation for newLine (line 63 of the snippet) would make sense if it were:

newLine = (int *)malloc(lineCount * sizeof(int) + sizeof(*newLine));

Agree?

babyshambles 0 Newbie Poster

newLine = (int *)malloc(lineCount * sizeof(int) * sizeof(*newLine));

??
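
For what it's worth, here is a minimal sketch of the sizing in question. It assumes the array must hold one start offset per line plus one end sentinel, which is how the loop in the original post indexes it (it writes newLine[0] through newLine[lineCount]). Under that reading the additive form, lineCount * sizeof(int) + sizeof(*newLine), comes to exactly (lineCount + 1) ints, while the all-multiplied form asks for sizeof(int) times more than is needed. The helper name lineOffsets is made up for this illustration and is not part of the original snippet.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not in the original post): returns an array of
 * line-start offsets for buffer plus one sentinel entry, so the array
 * holds *count + 1 ints. The caller frees the result. */
int *lineOffsets(const char *buffer, int *count) {
    int lines = 1;
    int z = 1;
    const char *p;
    int *offsets;

    /* One line, plus one more for every '\n' found */
    for (p = buffer; (p = strchr(p, '\n')) != NULL; p++)
        lines++;

    /* lines start offsets + 1 sentinel => (lines + 1) elements */
    offsets = malloc((size_t)(lines + 1) * sizeof *offsets);
    if (!offsets)
        return NULL;

    offsets[0] = 0;
    for (p = buffer; (p = strchr(p, '\n')) != NULL; p++)
        offsets[z++] = (int)(p - buffer) + 1;

    /* Sentinel: as if a final '\n' sat at the terminating '\0' */
    offsets[z] = (int)strlen(buffer) + 1;

    *count = lines;
    return offsets;
}
```

Writing the size as (lines + 1) * sizeof *offsets keeps the element count and the element size visibly separate, which makes this kind of mix-up harder to repeat.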

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster

Way too complicated and with unnecessary code. Here is a greatly simplified way to do it. Using strtok() you could parse a string that has more than one kind of delimiter, such as '\n', '\t', ',' and the space; these are typical of CSV files (exported XLS files).

int parseStringbyLines(char *buffer, char ***string) 
{
    char** lines = NULL;
    char* ptr;
    char *temp;
    int size = 0;
    if( buffer == NULL || *buffer == '\0' || string == NULL || *string != NULL)
        return 0;
    temp = strdup(buffer); // duplicate the string
    ptr = strtok(temp, "\n");
    while(ptr)
    {
        lines = realloc(lines, (size += 1) * sizeof(char*));
        lines[size-1] = strdup(ptr);
        ptr = strtok(NULL, "\n");
    }
    free(temp);
    *string = lines;
    return size;
}
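
The multi-delimiter use of strtok() mentioned above could look roughly like the sketch below. The helper name splitFields and its fixed-size output array are assumptions made for illustration. One thing to keep in mind: strtok() treats a run of delimiters as a single separator, so empty lines or empty fields are skipped, whereas the offset-based version in the first post keeps an empty line as an empty string.

```c
#include <string.h>

/* Hypothetical helper: splits text in place on any of '\n', '\t', ','
 * or ' ' and stores up to max token pointers in out.
 * Returns the number of tokens stored.
 * strtok() overwrites delimiters with '\0', so text must be writable. */
int splitFields(char *text, char **out, int max) {
    int n = 0;
    char *tok = strtok(text, "\n\t, ");

    while (tok && n < max) {
        out[n++] = tok;                 /* points into text, no copy made */
        tok = strtok(NULL, "\n\t, ");
    }
    return n;
}
```

Calling it on a writable array holding "a,b\tc d\n\ne" yields the five tokens a, b, c, d and e; the blank line between d and e does not produce a token.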
Dave Sinkula 2,398 long time no c Team Colleague

strdup is nonstandard. Your realloc idiom is one to avoid.

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster

>>Your realloc idiom is one to avoid.
Just standard C. When the first parameter is NULL realloc() acts like malloc().

>>strdup is nonstandard
You might be right. That function can easily be re-implemented for any compiler.
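
A re-implementation along those lines might be as small as the sketch below. strdup() comes from POSIX rather than from the ISO C library of the time, so the name dupString is made up here to avoid clashing with a strdup the platform may already provide.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for strdup(): returns a malloc'd copy of s,
 * or NULL if the allocation fails. The caller frees the result. */
char *dupString(const char *s) {
    size_t n = strlen(s) + 1;   /* include the terminating '\0' */
    char *copy = malloc(n);

    if (copy)
        memcpy(copy, s, n);
    return copy;
}
```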

Dave Sinkula 2,398 long time no c Team Colleague

>>Your realloc idiom is one to avoid.
>>Just standard C. When the first parameter is NULL realloc() acts like malloc().

*sigh*

Aia 1,977 Nearly a Posting Maven
lines = realloc(lines, (size += 1) * sizeof(char*));

On realloc failure a new pointer is made pointing to NULL and the original block is lost and not freed.
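
To spell the point out: when realloc() fails it returns NULL and leaves the old block allocated, so assigning the result straight back to lines overwrites the only pointer to that block. A common way around it, sketched here with a made-up helper name growLines, is to receive the result in a temporary first.

```c
#include <stdlib.h>

/* Hypothetical helper: grows *lines so it can hold newCount pointers.
 * Returns 1 on success and 0 on failure. On failure *lines is left
 * untouched, so the caller can still free it. */
int growLines(char ***lines, size_t newCount) {
    char **tmp = realloc(*lines, newCount * sizeof **lines);

    if (!tmp)
        return 0;       /* old block is still valid and still owned */
    *lines = tmp;
    return 1;
}
```

In the loop above, the call would become something like growLines(&lines, size + 1), with size incremented only after the call succeeds.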

jephthah commented: good to know +6
Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster

>>On realloc failure a new pointer is made pointing to NULL and the original block is lost and not freed.

If realloc() fails, then the program has much larger problems than a little memory leak, such as the entire program crashing, or being about to crash, due to lack of memory. And that could even extend to the entire operating system. Consequently, on modern computers with several gigabytes of RAM I don't even worry about realloc() failing. If you are working with embedded systems that have very limited RAM then that would be different. But not very many people are doing that.

Aia 1,977 Nearly a Posting Maven

>>If realloc() fails, then the program has much larger problems than a little memory leak, such as the entire program crashing, or being about to crash, due to lack of memory. And that could even extend to the entire operating system. Consequently, on modern computers with several gigabytes of RAM I don't even worry about realloc() failing. If you are working with embedded systems that have very limited RAM then that would be different. But not very many people are doing that.

Regardless of your excuses, it is bad programming.

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster

>>Regardless of your excuses, it is bad programming.
It's not an excuse -- it's a fact of reality. You can pretty-up the code all you want but when malloc() and realloc() fail the program's gonna crash, and good programming ain't going to help that one bit.

Dave Sinkula 2,398 long time no c Team Colleague

I'd hate to be maintaining code you've written.

You're like the Herb Schildt of Daniweb. :(

Aia 1,977 Nearly a Posting Maven

>>It's not an excuse -- it's a fact of reality. You can pretty-up the code all you want but when malloc() and realloc() fail the program's gonna crash, and good programming ain't going to help that one bit.

Memory management in C is of the utmost importance. Not checking the return values of malloc() and realloc() is bad programming. Ignoring possible memory leaks is negligence. The failures of these standard functions are not always related to a lack of memory, but rather to a failure to give you the memory you want to use. If the program crashes, it is because you have created a bad program.
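
As a rough illustration of what checking every return value can look like for this task, here is a sketch that frees whatever it has built and reports failure instead of crashing. The names parseLinesChecked and freeLines are made up for this example, and it uses only standard C (no strdup() or strtok()), so empty lines are kept as empty strings.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: frees the first count lines and the array itself. */
void freeLines(char **lines, int count) {
    int i;
    for (i = 0; i < count; i++)
        free(lines[i]);
    free(lines);
}

/* Sketch: splits buffer on '\n' into freshly allocated strings.
 * Returns the line count, or 0 on failure with *out left unchanged. */
int parseLinesChecked(const char *buffer, char ***out) {
    char **lines = NULL;
    int count = 0;
    const char *start = buffer;

    if (!buffer || !out)
        return 0;

    for (;;) {
        const char *end = strchr(start, '\n');
        size_t len = end ? (size_t)(end - start) : strlen(start);
        char **tmp;
        char *copy;

        /* Grow through a temporary so a failed realloc cannot leak lines */
        tmp = realloc(lines, (size_t)(count + 1) * sizeof *lines);
        if (!tmp) {
            freeLines(lines, count);
            return 0;
        }
        lines = tmp;

        /* Copy this line, leaving room for the terminating '\0' */
        copy = malloc(len + 1);
        if (!copy) {
            freeLines(lines, count);
            return 0;
        }
        memcpy(copy, start, len);
        copy[len] = '\0';
        lines[count++] = copy;

        if (!end)
            break;              /* that was the last line */
        start = end + 1;
    }

    *out = lines;
    return count;
}
```

The caller releases the result with freeLines(parsed, lines), which replaces the two-step free loop in main() from the first post.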

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster

I don't want to continue this discussion here because it's hijacking the thread and not relevant to the topic of this thread.
