We already have the built-in trim methods, but trimming doesn't get rid of internal (and unwanted) extra spaces. So this is where the Normalize method comes into play. It trims on the left, it trims on the right, but most importantly it also collapses extra whitespace on the inside; one could say it trims inside-out. In effect it treats the string exactly the way an HTML parser treats whitespace.
I had a hard time deciding what to name the method, but I finally settled on:
String.prototype.Normalize();
Using it is straightforward and simple:
[string_var].Normalize();
I've also taken care to make it usable in a manner like:
"".Normalize(" this string contains too many spaces ");
// which returns:
>> "this string contains too many spaces" // not anymore.
This might prove useful on rare occasions, and quite possibly never. Still, it doesn't hurt to have it at hand, even though using it like:
" this string contains to many spaces ".Normalize();
as with other existing methods would also be possible, but I don't consider it as clean and as readable as previous.
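For reference, here is a minimal sketch of how such a method could be defined, built on the same match-and-join approach used in the test page further down; the dual calling convention (on the string itself, or on an empty string with an argument) is my assumption based on the usage shown above. The capital N also keeps it apart from the native String.prototype.normalize() (Unicode normalization) that modern engines ship.
// A sketch, not the exact implementation:
// collapse all runs of whitespace to single spaces and trim both ends.
String.prototype.Normalize = function (str) {
// if an argument is passed, normalize that instead of `this`
// (mirrors the "".Normalize("...") usage shown above)
  var s = (str === undefined) ? String(this) : String(str);
  var parts = s.match(/\S+/g);          // every run of non-whitespace characters
  return parts ? parts.join(' ') : '';  // rejoin with single spaces; blank input yields ''
};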
Yet when working with strings, speed is always an issue...
So I wrote a test and ran it a few times. It turns out that 'blazing' is not an overstatement.
To make a browser 'sweat' and to sidestep some pseudo-optimization cheats, I took a string of 1024 bytes [1KB] x 100,000 iterations = roughly 100MB worth of data processed, and the results were, well, very satisfactory (~3 seconds). Simply opening, that is rendering, a 100MB plain-text page [locally] would most probably take at least as much time.
The test string is a 'worst case scenario': highly atomized, every 2-letter "word" separated by 2 whitespace characters.
The code, which as it turns out could also be used as a >>real-world<< browser benchmark (and is provided below), takes only the bare algorithm of the method presented and adds the extra optimization code that, in my experience, is necessary to make this lengthy string iteration as fast as possible.
The loop used is the fastest one I know of.
For the regexp pattern, the constructor form is used (which in modern browsers still provides a small, though barely noticeable, improvement).
The result assignment line is wrapped in parentheses, where the improvement is very noticeable, and so on; a short side-by-side sketch follows.
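To make those last two points concrete, here is a tiny illustrative snippet (not part of the test page) contrasting the regexp literal with the constructor form, and the plain with the parenthesized assignment; how much either actually helps, if at all, depends on the engine, so treat it purely as the pattern being described.
// Sketch only: the two regexp forms and the two assignment forms discussed above.
var s = "  oo  pp  qq  ";             // sample input
var reLiteral = /\S+/g;               // literal form
var reCtor = new RegExp("\\S+", "g"); // constructor form, as used in the test page
var c;
c = s.match(reCtor).join(' ');        // plain assignment
(c = s.match(reCtor).join(' '));      // parenthesized assignment, as in the benchmark loop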
The full test page is below; the results are from my machine:
First click time:
Op 3.112 seconds
IE 3.136 seconds
Sa 3.316 seconds
Fx 3.402 seconds
Ch 4.801 seconds
[all browsers at their latest release versions]
(your actual results will differ depending on your hardware)
p.s.:
the second click changes the string and the results, but that's not very important, since by then the function is working on an already normalized string.
The Test Page:
<!doctype html>
<html>
<head>
<title>String Normalize: 100MB worth of data</title>
<style>
#cnt { word-wrap: break-word }
</style>
<script>
function go(){
  var cnt = document.getElementById('cnt');
  var s = cnt.innerHTML;
  var re = new RegExp("\\S+", "gi");
  var c, endT, iter = 100000;
  var start = new Date();
  while(iter--){ // the actual workload
    (c = s.match(re).join(' '));
  }
  endT = new Date();
  return cnt.innerHTML =
    "parsed in: " +
    ((endT.valueOf() - start.valueOf()) / 1000) +
    ' seconds!' + '<br>' + c.fontcolor('red');
}
onclick=function(){go()}
</script>
</head>
<body>
<p>click: test/result...</p>
<pre id='cnt'>oo pp qq rr ss tt uu vv xx yy zz 00 11 22 33 44 55 66 77 88 99 oo pp qq rr ss tt uu vv xx yy zz 00 11 22 33 44 55 66 77 88 99 oo pp qq rr ss tt uu vv xx yy zz 00 11 22 33 44 55 66 77 88 99 oo pp qq rr ss tt uu vv xx yy zz 00 11 22 33 44 55 66 77 88 99 oo pp qq rr ss tt uu vv xx yy zz 00 11 22 33 44 55 66 77 88 99 oo pp qq rr ss tt uu vv xx yy zz 00 11 22 33 44 55 66 77 88 99 oo pp qq rr ss tt uu vv xx yy zz 00 11 22 33 44 55 66 77 88 99 oo pp qq rr ss tt uu vv xx yy zz 00 11 22 33 44 55 66 77 88 99 oo pp qq rr ss tt uu vv xx yy zz 00 11 22 33 44 55 66 77 88 99 oo pp uu vv xx yy zz 00 11 22 33 44 55 66 77 88 99 oo pp uu vv xx yy zz 00 11 22 33 44 55 66 77 88 99 oo pp uu vv xx yy zz 00 11 22 33 44 55 66 77 88 99 oo pp uu vv xx yy zz 00 11 22 33 44 55 66 77 88 00</pre>
</body>
</html>
All suggestions and remarks are welcome.
Have fun.