hi,

I have an XML document in a String that comes from an external input. The problem is that some junk characters are getting appended to the end of this string. I need to find the length of the string up to the end of the valid part. Once I have this length, I can allocate a char array of that length, copy my string into it, and pass it on for processing. Basically, I have to get rid of the junk. Any ideas?

example string

"<?xml version="1.0" encoding="UTF-16"?>
<Company>
<Division>
<Employee>
</Employee>
</Division>
</Company>
"

Can you define "junk" for us?

When I do a cout of this string, I get something like this:

<?xml version="1.0" encoding="UTF-16"?>
<Company>
<Division>
<Employee>
</Employee>
</Division>
</Company>/

or some other character appended after the last tag.

I have tried using this code:

const char* ps = reinterpret_cast<const char*>(requestBody.Data());
	String inpMessage(ps);
	size_t Offset = inpMessage.FindFirst(TEXT("</Company>"));
	const char* pm = "</Company>";
	char* pn = strstr(ps,pm);	
	size_t end = 0;
	while ( *pn != '>')
	{
		end++;
		++pn;
	}
	end = end+2;
	size_t length = Offset + end;
	char* px = new char[length];
	//memcpy(px,ps,length);
	int i;
	for(i=0;i<(length-1);i++)
	{
		px[i] = *ps;
		*ps++;
		cout << px[i];
	}	
	i++;
	px[i] = '\0';		
	cout << px[i] << endl;   //till here no extra chars
	cout << px << endl;	   //here it shows the appended char
	RWTString inMessage(px);	
	delete [] px;

Here requestBody.Data() is the input; it returns a void*. I used the position of the last tag to get the length of the string and then tried copying the data, but sometimes I still get a character appended to the string after the closing > of the last tag. What is interesting is that when I do a cout of each character inside the loop, it doesn't show any extra characters, but when I do a cout of the entire char* in one shot, it shows an extra character.

Because of this, the XML does not get parsed at all. The String class is our own wrapper around the standard string.

>i++;
>px[i] = '\0';
That last increment of i is one too many. I'd wager that explains your mystery character.

Since I was doing a (length-1) in the loop, that was not the problem. I had forgotten to do a memset after allocating the memory, so I added this line of code:

char* px = new char[length];
memset(px,'\0',length); // added this

After this it seems to be working fine. Is this a mandatory step in all situations?

thanks a lot
chandra

>Since i was doing a (length-1) in the loop that was not the problem
Yes, that was the problem. You allocated length characters to px. Let's say for the sake of brevity that length is 5. That means the first index is 0 and the last index is 4. px[4] is where your null character should be placed, and px[5] is out of bounds. This is your loop, which stops at length-1:

i = 0, px[0] = *ps
i = 1, px[1] = *ps
i = 2, px[2] = *ps
i = 3, px[3] = *ps
i = 4, break
// The loop is done at this point, i == 4
// This is where your bug is, you increment i again
i = 5
px[5] = '\0'

Notice that you never assigned anything to px[4]. This is the gap in your off-by-one error, and that's where your garbage character is coming from. Using memset doesn't solve the problem, it simply hides it by pre-setting px[4] to '\0'. You still have an off-by-one error, and you're still writing to px[length], which is out of bounds.
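For what it's worth, here is a minimal sketch of how the copy could look without the off-by-one. It is not your actual code: copyThroughTag and trimAfterTag are hypothetical helper names, and the sketch assumes the input is a null-terminated narrow string, which may not hold for whatever requestBody.Data() returns. The first function allocates one extra slot for the terminator and writes '\0' at the last valid index; the second does the same job with std::string so there is no manual buffer at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: returns a newly allocated, null-terminated copy of
// `input` up to and including the first occurrence of `closingTag`.
// Returns nullptr if the tag is not found. The caller must delete[] the result.
// Assumes `input` and `closingTag` are valid null-terminated strings.
char* copyThroughTag(const char* input, const char* closingTag)
{
    assert(input != nullptr && closingTag != nullptr);

    const char* tag = std::strstr(input, closingTag);
    if (tag == nullptr)
        return nullptr;

    // Number of characters to keep, not counting the terminator.
    const std::size_t keep =
        static_cast<std::size_t>(tag - input) + std::strlen(closingTag);

    // One extra slot for '\0', so the valid indices are 0 .. keep.
    char* out = new char[keep + 1];
    std::memcpy(out, input, keep);  // fills out[0] .. out[keep - 1]
    out[keep] = '\0';               // last valid index, no gap left behind
    return out;
}

// Same idea with std::string, which manages the buffer and terminator itself.
std::string trimAfterTag(const std::string& input, const std::string& closingTag)
{
    const std::string::size_type pos = input.find(closingTag);
    if (pos == std::string::npos)
        return input;  // tag missing: return the input unchanged
    return input.substr(0, pos + closingTag.size());
}
```

With either version every element of the buffer is assigned exactly once, so the memset is not needed to get a correct result.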

Oh man, that was it. I got the point, Narue, and I fixed the code once and for all. For safety reasons I've still kept the memset. That was a logical mistake and a bad one; I need to look out for such things in future.

thanks a ton
chandra
