Performance of String Operations

Ketsuekiame 0 Tallied Votes 249 Views Share

So, in answering another thread on this forum, I decided to do some performance testing on String. Mainly because a friend of mine said "You should always use new String it's the fastest!".

I wasn't convinced and argued in favour of StringBuilder, at which point I was directed to some "performance" tests of their own. Needless to say, I was not impressed at the numbers; it showed new String as a clear winner being at least 20% faster than StringBuilder. This concerned me a great deal! I decided to investigate and get to the bottom of it.

After dissecting my friend's test, I believe I found the answer. Compiler optimisation. In their test, they performed 10,000 iterations of setting strings. The SAME string every time. There is an important point to make here.

When you create a string it is assigned memory. This isn't anything astounding, however, strings are a little different in terms of conception.

Lets take the following:

String firstString = new String("10");
String secondString = new String("10");

Although you have specified two strings, each with a hard-coded value, this will be "optimised" away. Here, the word "10" will be stored in memory at a single place. The two string values that are created, will simply reference this "10" memory location as "10" is a constant at compile time.

If you then did; firstString = firstString + "00"; an entirely new string will have been created in memory.

Seeing this, I decided to create my own test. It performs 10,000 iterations of the same logic on a randomly generated string 10,000 characters in length.

My output was as follows:

Using CHARACTER ARRAY
------------------------------
Time for new String: 21ms
Time for concat: 2325ms
Time for stringbuilder: 62ms

Using IEnumerable<Char>
------------------------------
Time for new String: 915ms
Time for concat: 2631ms
Time for stringbuilder: 81ms

Using List<Char>
------------------------------
Time for new String: 47ms
Time for concat: 2660ms
Time for stringbuilder: 10ms

In all cases, concat absolutely sucks and should never see the light of day again ;)

In terms of the first test, I believe the compiler was still able to optimise this. A character array is simply a string at the end of the day and the application will simply update pointer references rather than create new objects. However, this is what we wanted to know :)

In the second test we can see more clearly now that iterating an enumerator to create the string is a fairly slow process. StringBuilder will easily win out here as it uses dynamic memory. I suspect that new String does not and instead generates a new object for each character in the string, which has to be enumerated again. I believe this explains the poor performance.

Using a List we can see that StringBuilder has the best performance by a long way whilst new String comes back into action again. This is probably due to the single enumeration.

IMPORTANT NOTE: The IEnumerable interface has given us a lot of flexibility in C# and is absolutely brilliant for passing data around methods. But it is for this reason it is also rather dangerous! IEnumerable performs something called deferred execution that is, the value calculation is not actually performed until you use it.

Example:

IEnumerable<Int32> myInts = myBigArrayOfInts.Where(i => i > 0); // Get all integers larger than 0
// Some code is here
// that doesn't even touch
// the variable myInts
Console.WriteLine(myInts.Count()); // Execution of line 1 happens here! This is the first time we use myInts.
Console.WriteLine(myInts.First()); // Execution happens again!

What this also means, is that each time you call myInts, it will execute the enumerable!. Personally, I prefer to think of IEnumerable as a method pointer, a query method pointer if you will, as it helps to conceptualise what the code is doing.

To overcome this issue, you have to put the IEnumerable into a concrete class...

List<Int32> myInts = myBigArrayOfInts.Where(i => i > 0).ToList(); // Execution happens here, as we are converting it to a list
Console.WriteLine(myInts[0]); // Works just like an array and doesn't need to re-execute the query

So to relate this back to the above code, everytime new String uses the IEnumerable<Char> it will likely re-execute the query that retrieves all characters in the String and then pick out which character it is up to, create a new String and then do it all again with the next character. List doesn't suffer from this, because the query has already been executed, in effect turning it into a big character array just like the first test. (With some performance hit due to the way List lookup works)

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;

namespace ScratchPad
{
    class Program
    {
        private static string RandomString(int size)
        {
            Random random = new Random();

            StringBuilder builder = new StringBuilder();
            char ch;
            for (int i = 0; i < size; i++)
            {
                ch = Convert.ToChar(Convert.ToInt32(Math.Floor(26 * random.NextDouble() + 65)));
                builder.Append(ch);
            }

            return builder.ToString();
        }

        static void Main(string[] args)
        {
            Stopwatch nsStopwatch = new Stopwatch();
            Stopwatch ccStopwatch = new Stopwatch();
            Stopwatch sbStopwatch = new Stopwatch();
            String myString;
            
            

            for (Int32 i = 10000; i > 0; i--)
            {
                String perfString = RandomString(10000);
                Char[] charString = perfString.ToArray();

                nsStopwatch.Start();
                myString = new String(charString);
                nsStopwatch.Stop();

                ccStopwatch.Start();
                myString = String.Concat(charString);
                ccStopwatch.Stop();

                sbStopwatch.Start();
                StringBuilder sb = new StringBuilder();
                sb.Append(charString);
                myString = sb.ToString();
                sbStopwatch.Stop();
            }
            Console.WriteLine("Using CHARACTER ARRAY\n------------------------------");
            Console.WriteLine("Time for new String: {0}ms ", nsStopwatch.ElapsedMilliseconds);
            Console.WriteLine("Time for concat: {0}ms", ccStopwatch.ElapsedMilliseconds);
            Console.WriteLine("Time for stringbuilder: {0}ms", sbStopwatch.ElapsedMilliseconds);
            Console.WriteLine();

            nsStopwatch.Reset();
            ccStopwatch.Reset();
            sbStopwatch.Reset();

            for (Int32 i = 10000; i > 0; i--)
            {
                String perfString = RandomString(10000);
                IEnumerable<Char> numCharString = perfString.AsEnumerable();

                nsStopwatch.Start();
                myString = new String(numCharString.ToArray());
                nsStopwatch.Stop();

                ccStopwatch.Start();
                myString = String.Concat(numCharString);
                ccStopwatch.Stop();

                sbStopwatch.Start();
                StringBuilder sb = new StringBuilder();
                sb.Append(numCharString);
                myString = sb.ToString();
                sbStopwatch.Stop();
            }

            Console.WriteLine("Using IEnumerable<Char>\n------------------------------");
            Console.WriteLine("Time for new String: {0}ms ", nsStopwatch.ElapsedMilliseconds);
            Console.WriteLine("Time for concat: {0}ms", ccStopwatch.ElapsedMilliseconds);
            Console.WriteLine("Time for stringbuilder: {0}ms", sbStopwatch.ElapsedMilliseconds);
            Console.WriteLine();

            nsStopwatch.Reset();
            ccStopwatch.Reset();
            sbStopwatch.Reset();

            for (Int32 i = 10000; i > 0; i--)
            {
                String perfString = RandomString(10000);
                List<Char> listCharString = perfString.ToList();

                nsStopwatch.Start();
                myString = new String(listCharString.ToArray());
                nsStopwatch.Stop();

                ccStopwatch.Start();
                myString = String.Concat(listCharString);
                ccStopwatch.Stop();

                sbStopwatch.Start();
                StringBuilder sb = new StringBuilder();
                sb.Append(listCharString);
                myString = sb.ToString();
                sbStopwatch.Stop();
            }

            Console.WriteLine("Using List<Char>\n------------------------------");
            Console.WriteLine("Time for new String: {0}ms ", nsStopwatch.ElapsedMilliseconds);
            Console.WriteLine("Time for concat: {0}ms", ccStopwatch.ElapsedMilliseconds);
            Console.WriteLine("Time for stringbuilder: {0}ms", sbStopwatch.ElapsedMilliseconds);

            Console.ReadLine();
		}
	}
}
tinstaafl 1,176 Posting Maven

When I use this,

            testStopwatch.Start();
            myString = Convert.ToString(listCharString);
            testStopwatch.Stop();

I get a time about one tenth of the stringbuilder's time.

Ketsuekiame 860 Master Poster Featured Poster

If you actually look at the output, you'll see that it will return the ToString() method of the object you pass to it.

So although quick, the result is something along the lines of System.Collections.Generic.List'1[System.Char] and not the actual string content. Hence giving the impression it's incredibly fast because it's not having to enumerate the object, just retrieve the type name.

TnTinMN 418 Practically a Master Poster

I guess mileage will vary. I ran you code under VS2008 on a 32-bit XP and Vista and got similar results. Note that my Vista laptop is way under powered. The XP machine has 2 GB Ram were-as the Vista machine has 3-GB ram. Compiled Any-CPU, Release build.

The interesting thing is that I would have thought that the relative ratio between the same test on different machines would have been close. But this is not the case. Perhaps a function of memory?

XP:

        Using CHARACTER ARRAY
        ------------------------------
        Time for new String: 47ms
        Time for concat: 9ms
        Time for stringbuilder: 85ms

        Using IEnumerable<Char>
        ------------------------------
        Time for new String: 1349ms
        Time for concat: 2ms
        Time for stringbuilder: 51ms

        Using List<Char>
        ------------------------------
        Time for new String: 117ms
        Time for concat: 11ms
        Time for stringbuilder: 3ms

Vista:

    Using CHARACTER ARRAY
    ------------------------------
    Time for new String: 318ms
    Time for concat: 111ms
    Time for stringbuilder: 178ms

    Using IEnumerable<Char>
    ------------------------------
    Time for new String: 2691ms
    Time for concat: 15ms
    Time for stringbuilder: 154ms

    Using List<Char>
    ------------------------------
    Time for new String: 307ms
    Time for concat: 130ms
    Time for stringbuilder: 23ms
Ketsuekiame 860 Master Poster Featured Poster

Interesting how String.Concat appears to be a clear winner on your machine. I used Visual Studio 2012, .NET 4.5 set to Release Mode, compiled and then ran from the command line. (Windows 7)

The machine I ran it on has 12Gb of RAM.

I tested it at home also, using the same settings as above, however, on a much faster machine running Windows 8 and got similar results, albeit faster in all cases.

As you compiled using 2008 (and I also presume .net 3.5?), I think it's safe to assume, looking at your results, that it created completely different machine code.

Here is my binary file for comparison (links to DropBox): http://db.tt/J9Sfdbgv

TnTinMN 418 Practically a Master Poster

Yes, mine is .Net 3.5.

Well there is a difference in the IL (extracted using Reflector) on the string concat and the concat if the only difference.

Mine:

[System]System.Diagnostics.Stopwatch::Start()
        L_0049: ldloc.s chArray
        L_004b: call string [mscorlib]System.String::Concat(object)
        L_0050: pop 
        L_0051: ldloc.1 
        L_0052: callvirt instance void 
[System]System.Diagnostics.Stopwatch::Stop()

Yours:

[System]System.Diagnostics.Stopwatch::Start()
        L_0049: ldloc.s chArray
        L_004b: call string [mscorlib]System.String::Concat<char>(class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0>)
        L_0050: pop 
        L_0051: ldloc.1 
        L_0052: callvirt instance void [System]System.Diagnostics.Stopwatch::Stop()

I wonder what would happen if you targeted 3.5 instead.

Ketsuekiame 860 Master Poster Featured Poster

Looks like the .NET 3.5 compiler is much better at this. Perhaps this is indicative of a change to how collections work in 4.5? Either that, or the 3.5 compiler is much better at optimising.

Output:

Using CHARACTER ARRAY
------------------------------
Time for new String: 0ms
Time for concat: 8ms
Time for stringbuilder: 25ms

Using IEnumerable<Char>
------------------------------
Time for new String: 1000ms
Time for concat: 0ms
Time for stringbuilder: 27ms

Using List<Char>
------------------------------
Time for new String: 32ms
Time for concat: 7ms
Time for stringbuilder: 1ms

The 0ms are not actually 0, however, as you can see I work on milliseconds and not ticks.

TnTinMN 418 Practically a Master Poster

Or it is a glitch in the 4.5 Framework. This is why many refuse to use a version of .Net until it has been vetted for a few years. Unfortunately MS's bug reporting automatically discounts rerports of bugs that are not being reported for their latest and greatest even if the bug has existed in all versions. Lesson learned if you find a bug using an old .Net framework, reproduce it on the newest or else you will be ignored. I recently got blown off for reporting an error in VB2010 and I haven't had time or need to install the needed stuff to support VS2012.

I suggest that you file a report at: http://connect.microsoft.com/

Ketsuekiame 860 Master Poster Featured Poster

Unfortunately some people don't have the luxury to wait (or at least don't have the power to make such decisions).

I will write up a report when I get home. This has been a good exercise though (which is still valid for 4.5 so take heed if you target this framework!) and yet another good example of peer review in action :)

TnTinMN 418 Practically a Master Poster

Unfortunately some people don't have the luxury to wait (or at least don't have the power to make such decisions).

Interesting perspective. Usually it is the other way around at least in my experience (which admittedly is limited, but I have read of many others who in the same situation).

Well, anyways good luck and please post back with a link to your report. I will vote it up to see if that helps pod along the elephant.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.