So, in answering another thread on this forum, I decided to do some performance testing on String, mainly because a friend of mine said "You should always use new String, it's the fastest!".
I wasn't convinced and argued in favour of StringBuilder, at which point I was directed to some "performance" tests of their own. Needless to say, I was not impressed by the numbers: they showed new String as the clear winner, at least 20% faster than StringBuilder. This concerned me a great deal! I decided to investigate and get to the bottom of it.
After dissecting my friend's test, I believe I found the answer: compiler optimisation. In their test, they performed 10,000 iterations of setting strings, using the SAME string every time. There is an important point to make here.
When you create a string it is assigned memory. This isn't anything astounding; however, string literals are handled a little differently.
Let's take the following:
String firstString = "10";
String secondString = "10";
Although you have declared two strings, each with a hard-coded value, only one string object is created. The literal "10" is a constant at compile time, so it is interned: it is stored in memory at a single place, and both variables simply reference that one location. (A string built at run time, for example with new String(someCharArray), is always a separate object.)
If you then did:
firstString = firstString + "00";
an entirely new string will have been created in memory, because strings are immutable.
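The sharing of identical literals, and the fresh allocation on concatenation, can be checked with a few lines. This is only a small sketch: the class name InterningDemo and the variable names are made up for illustration, and object.ReferenceEquals is used to ask whether two variables point at the same object.

```csharp
using System;

public static class InterningDemo
{
    public static void Main()
    {
        // Two identical literals are interned, so both variables
        // refer to the same string object.
        string a = "10";
        string b = "10";
        Console.WriteLine(object.ReferenceEquals(a, b));      // True

        // A string built at run time from a char[] is a fresh object,
        // even though its contents match the literal.
        string c = new string(new[] { '1', '0' });
        Console.WriteLine(object.ReferenceEquals(a, c));      // False
        Console.WriteLine(a == c);                            // True (compares contents)

        // Concatenation does not modify the existing string; it creates a new one.
        string before = a;
        a = a + "00";
        Console.WriteLine(object.ReferenceEquals(before, a)); // False
        Console.WriteLine(before);                            // 10
    }
}
```

So a test that keeps assigning the same literal mostly measures how quickly a reference can be copied, not how quickly a string can be built.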
Seeing this, I decided to create my own test. It performs 10,000 iterations of the same logic on a randomly generated string 10,000 characters in length.
My output was as follows:
Using CHARACTER ARRAY
------------------------------
Time for new String: 21ms
Time for concat: 2325ms
Time for stringbuilder: 62ms
Using IEnumerable<Char>
------------------------------
Time for new String: 915ms
Time for concat: 2631ms
Time for stringbuilder: 81ms
Using List<Char>
------------------------------
Time for new String: 47ms
Time for concat: 2660ms
Time for stringbuilder: 10ms
In all cases, concat absolutely sucks and should never see the light of day again ;)
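For anyone wanting to try this kind of comparison, a minimal sketch might look like the following. It is not the exact test behind the numbers above: the Time helper, the fixed random seed and the reduced iteration count are assumptions made to keep it short, and "concat" is assumed here to mean appending one character at a time with +=.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class StringBuildTimings
{
    // Hypothetical helper: runs `build` `iterations` times and returns elapsed milliseconds.
    public static long Time(int iterations, Func<string> build)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            build();
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const int length = 10000;   // characters per string, as in the post
        const int iterations = 50;  // assumption: kept small so the += case finishes quickly

        // Random input, so the string is not a compile-time constant.
        var random = new Random(42);
        char[] chars = new char[length];
        for (int i = 0; i < length; i++)
        {
            chars[i] = (char)('a' + random.Next(26));
        }

        long ctor = Time(iterations, () => new string(chars));

        long concat = Time(iterations, () =>
        {
            string s = "";
            foreach (char c in chars)
            {
                s += c; // allocates a brand new string for every character
            }
            return s;
        });

        long builder = Time(iterations, () =>
        {
            var sb = new StringBuilder(length);
            foreach (char c in chars)
            {
                sb.Append(c); // appends into the builder's internal buffer
            }
            return sb.ToString();
        });

        Console.WriteLine("Time for new String: " + ctor + "ms");
        Console.WriteLine("Time for concat: " + concat + "ms");
        Console.WriteLine("Time for stringbuilder: " + builder + "ms");
    }
}
```

Absolute numbers will vary by machine and runtime, so treat the output as a rough comparison only.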
In terms of the first test, I don't think any compiler trick is needed to explain the result. When new String is handed a character array it knows the final length up front, so it can allocate the string once and copy the characters across in a single block, with no per-character work. Still, this is the number we wanted to know :)
In the second test we can see more clearly that iterating an enumerator to create the string is a fairly slow process. StringBuilder easily wins here as it appends into a buffer that grows as needed. I suspect that new String does not, and instead generates a new object for each character in the string, which has to be enumerated again. I believe this explains the poor performance.
Using a List we can see that StringBuilder has the best performance by a long way, whilst new String comes back into contention. This is probably due to the single enumeration.
IMPORTANT NOTE: The IEnumerable interface has given us a lot of flexibility in C# and is absolutely brilliant for passing data around methods. But for this very reason it is also rather dangerous! A LINQ query exposed as an IEnumerable uses something called deferred execution; that is, the value calculation is not actually performed until you use it.
Example:
IEnumerable<Int32> myInts = myBigArrayOfInts.Where(i => i > 0); // Get all integers larger than 0
// Some code is here
// that doesn't even touch
// the variable myInts
Console.WriteLine(myInts.Count()); // Execution of line 1 happens here! This is the first time we use myInts.
Console.WriteLine(myInts.First()); // Execution happens again!
What this also means is that each time you enumerate myInts, it will execute the query again! Personally, I prefer to think of IEnumerable as a method pointer, a query method pointer if you will, as it helps to conceptualise what the code is doing.
To overcome this issue, you have to put the results of the IEnumerable into a concrete collection:
List<Int32> myInts = myBigArrayOfInts.Where(i => i > 0).ToList(); // Execution happens here, as we are converting it to a list
Console.WriteLine(myInts[0]); // Works just like an array and doesn't need to re-execute the query
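To see the re-execution for yourself, one option is to count how often the filter actually runs. The following is a small sketch; the predicateCalls counter and the sample array are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    public static void Main()
    {
        int[] source = { -2, 5, 0, 7 };
        int predicateCalls = 0; // counts how many times the filter is evaluated

        IEnumerable<int> query = source.Where(i =>
        {
            predicateCalls++;
            return i > 0;
        });
        Console.WriteLine(predicateCalls); // 0: defining the query runs nothing

        Console.WriteLine(query.Count());  // 2
        Console.WriteLine(predicateCalls); // 4: the whole array was scanned

        Console.WriteLine(query.First());  // 5
        Console.WriteLine(predicateCalls); // 6: the query ran again, stopping at the first match

        List<int> list = query.ToList();   // one more full pass over the array
        Console.WriteLine(predicateCalls); // 10

        Console.WriteLine(list[0]);        // 5
        Console.WriteLine(list.Count);     // 2
        Console.WriteLine(predicateCalls); // 10: reading the list never re-runs the filter
    }
}
```

Once the results are in the list, the counter stops moving, which is exactly the behaviour you want inside a tight loop.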
So, to relate this back to the above code: every time new String uses the IEnumerable<Char>, it will likely re-execute the query that retrieves all characters in the String, then pick out which character it is up to, create a new String and then do it all again with the next character. List doesn't suffer from this because the query has already been executed, in effect turning it into a big character array just like the first test (with some performance hit due to the way List lookup works).