Reading time: about 5 minutes

In a previous article, I wrote about finding sequential calls that could be parallelized. Today, I want to address another issue with an algorithmic angle: finding and optimizing performance problems in C# code.

Story time

Just like in the previous post, I was investigating performance issues back in May 2023. After fixing a couple of problems with our dependencies, I wanted to have a look at how our APIs were doing in production. Sure, I could see the big picture in Application Insights’ Performance view, with a nice breakdown by route, and I could even see how each dependency fared. But I lacked the insider view telling me which parts of our own code were hitting the CPU or the memory intensively.

Application Insights Profiler

After a quick chat with our Azure expert, I found out we could activate a profiler directly in Azure. This profiler has an overhead, so activating it would take a toll on our websites’ performance. We agreed to activate it for a short period of time, see how it behaved, and decide whether to do it again.
We ended up activating it for a few hours a day over a couple of days. That gave me access to the Code Optimizations page, which lists potential performance issues that Azure observed while the profiler was enabled.

An example of the "Code Optimizations" page in Azure. There is a short introduction text above a large table. The table has two sections: "Memory", which lists 4 potential issues ordered by Peak Usage (descending), and "CPU", which lists 5 potential issues in the same order.
The table can be filtered by time range, role, subscription, resource or insight type.

Each issue comes with a short summary such as "Excessive allocations due to String.Ctor()".

When selecting a line in the table, a details panel appears on the right side of the page, explaining what the issue is and recommending how to fix it.

First discoveries

At the top of this list, I found a few things I had no control over: deserialization of database calls, the GraphQL task scheduler, Regex calls (be very careful to follow best practices when playing with Regex!)… However, I eventually found something weird that allocated quite a lot of objects and ate up CPU as well 🤔 I took a look at the code and…

public static string? GetTranslation(IEnumerable<TranslationCacheDto> translations, string itemName)
{
    return translations.FirstOrDefault(t => $"cache.key.{itemName?.Replace("/\\s/g", "").ToLower()}".Equals(t.CacheKey, StringComparison.InvariantCultureIgnoreCase))?.TranslatedText;
}

Well, it’s messy to say the least, and it was originally a one-liner expression body. As I have learned over time, it’s usually a good idea to break one-liners down to properly understand them. In our case, the bulk of the logic lives in the FirstOrDefault() predicate. Here’s what the code does in a nutshell:

  1. Replaces something inside the itemName passed as a parameter;
  2. Then converts it to lowercase;
  3. Then appends it to the end of the “cache.key.” string;
  4. Then compares it with each translation’s CacheKey property, ignoring case and using the invariant culture;
  5. Finally, returns the first matching translation’s TranslatedText property, or null.
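
To make these steps easier to follow, here is the same predicate unrolled into separate statements. This is just a readability sketch I wrote for this article (not code from the repository); it behaves exactly like the one-liner, and the comments are numbered to match the list above:

public static string? GetTranslation(IEnumerable<TranslationCacheDto> translations, string itemName)
{
    // 5. FirstOrDefault() returns the first matching translation (or null); we then read its TranslatedText.
    return translations.FirstOrDefault(t =>
    {
        // 1. "Replace" inside itemName (String.Replace only matches literal text).
        var cleaned = itemName?.Replace("/\\s/g", "");

        // 2. Convert it to lowercase.
        var lowered = cleaned?.ToLower();

        // 3. Append it to the "cache.key." prefix.
        var key = $"cache.key.{lowered}";

        // 4. Compare it with this translation's CacheKey, ignoring case, using the invariant culture.
        return key.Equals(t.CacheKey, StringComparison.InvariantCultureIgnoreCase);
    })?.TranslatedText;
}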

A few things come to mind here:

  1. Applying ToLower() and then comparing with an InvariantCultureIgnoreCase comparer is redundant: the case-insensitive comparison already covers it. So we can safely remove the ToLower() call and keep the custom comparer, which gets rid of a useless string allocation for each iteration of FirstOrDefault().
  2. The string concatenation achieved through interpolation is performed at every iteration of the IEnumerable, so a new string is allocated each time for nothing. We can perform this interpolation once, outside the FirstOrDefault(), for a nice reduction in memory allocations.
  3. That String.Replace() call looks fishy. It seems the original developer intended to pass a Regex here to remove whitespace, but… String.Replace() does not handle regular expressions (see the sketch just after this list). For what it’s worth, I checked in the database that none of the items contains the “/\s/g” string.
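
For reference, if removing whitespace really was the intent, it would take Regex.Replace (or similar); String.Replace only looks for the literal text it is given. A minimal sketch, separate from the actual fix:

using System.Text.RegularExpressions;

var itemName = "my item name";

// Looks for the literal characters "/\s/g" and removes them; here, nothing matches so nothing changes.
var untouched = itemName.Replace("/\\s/g", "");          // "my item name"

// What was probably intended: remove every whitespace character.
var stripped = Regex.Replace(itemName, @"\s", "");       // "myitemname"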

Looks like we can easily improve things here! I added a few tests to make sure I wasn’t breaking anything and went on with my fix. But how do I make sure I actually fixed something?

Sample code!

Let’s implement the original and “fixed” code

This first analysis looks promising on paper, but let’s test it. To this end, I have created a dedicated repository; feel free to have a look. The BadImplem class hosts the original algorithm. The ImprovedImplem class contains the improvements listed above, which gives the following code:

public static string? GetTranslation(IEnumerable<TranslationCacheDto> translations, string itemName)
{
    var key = $"cache.key.{itemName}";
    return translations
        .FirstOrDefault(translation => key.Equals(translation.CacheKey, StringComparison.InvariantCultureIgnoreCase))
        ?.TranslatedText;
}

Can we improve this further? Well, the point of this algorithm is to look for a specific value and return it. By iterating over a list and comparing each item, the time complexity of the algorithm is O(n). In practice, iterating once in a while over 10 items would be no big deal performance-wise. But looking at actual numbers in our production environment, it turned out the list contained 50,000+ items and the algorithm was called several times per second, so a single lookup could mean tens of thousands of string comparisons in the worst case. That’s not good; our improvements may not be enough.

A chart showing several curves relating the number of operations to the number of elements.
O(n!) is the worst: the number of operations grows extremely fast as the number of elements increases.
The best is O(1) (constant time), followed by O(log n), then O(n), O(n log n), O(n²) and O(2^n).
A visual representation of complexities in “big O” notation (via freecodecamp.org).

Thinking in O(1)

This is one case where I immediately think of using a Dictionary. Retrieving an element from a Dictionary has a time complexity of O(1), the fastest possible. There is some overhead when creating the Dictionary, though, so we’ll need to measure that as well. Anyway, I added the following code to build the Dictionary in Program.cs:

void InitializeDictionary(Dictionary<string, string> dictionary, TranslationCacheDto[] translationDtos, string[] itemNames)
{
    // Fill the dictionary with the same 50,000 entries used by the other implementations.
    for (var i = 0; i < 50000; i++)
    {
        dictionary.Add(itemNames[i], translationDtos[i].TranslatedText);
    }
}

And the following trivial code to read from the Dictionary:

public static string GetTranslation(IDictionary<string, string> translationsByItemName, string itemName)
{
    return translationsByItemName[itemName];
}
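
One caveat for completeness: unlike the original version, which returns null when nothing matches, the indexer throws a KeyNotFoundException when the key is missing, and the default Dictionary comparer is case-sensitive whereas the original comparison ignored case. If that matters in your context, a TryGetValue-based variant (a sketch, not part of the benchmarked code) keeps the original “null when not found” behavior:

public static string? GetTranslation(IDictionary<string, string> translationsByItemName, string itemName)
{
    // Returns null when the key is absent, mirroring the original FirstOrDefault() behavior.
    return translationsByItemName.TryGetValue(itemName, out var translatedText)
        ? translatedText
        : null;
}

// And if case-insensitive keys are needed, the Dictionary can be built with a suitable comparer:
// var dictionary = new Dictionary<string, string>(StringComparer.InvariantCultureIgnoreCase);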

Measurements

OK, now let’s see how these 3 approaches compare with each other. Take a look at Program.cs. Basically, once the data and the dictionary are set up, we try 1000 times to retrieve a random value with each implementation, then look at how much CPU was used and how many memory allocations were made in the process. The measurement loop looks roughly like the sketch below.
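
A rough sketch of the idea rather than the exact Program.cs (DictionaryImplem is a made-up name for the class hosting the Dictionary lookup shown above):

var random = new Random();

for (var i = 0; i < 1000; i++)
{
    // Pick a random item name and resolve its translation with each implementation.
    var itemName = itemNames[random.Next(itemNames.Length)];

    BadImplem.GetTranslation(translations, itemName);
    ImprovedImplem.GetTranslation(translations, itemName);
    DictionaryImplem.GetTranslation(translationsByItemName, itemName);
}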
We build the solution in Release mode, hit Alt+F2 in Visual Studio, select the CPU Usage profiler aaaand…

CPU usage

A call tree of the run as recorded by Visual Studio. Calls are ordered by CPU %, in descending order.
The biggest CPU consumer is BadImplem, with 59.89%.
Then ImprovedImplem uses 32.91% CPU.
Finally, building the Dictionary in InitializeDictionary takes 0.5% CPU, and looking up the values takes 0.02%.

There we have it. Now we can say with confidence that the first improved version almost halves the CPU usage.

What else? Obviously, O(1) beats everyone hands down. Even building the Dictionary is only a fraction of the cost of the other approaches: building the Dictionary AND looking up the random values takes about 1/100th of the cost of the original version! Surely this is the best method, then? Not so fast; let’s measure the memory allocations first. (Note: in order to shorten the analysis, I lowered the number of random items to look up from 1000 to 200.)

Memory allocation

A call tree of the run as recorded by Visual Studio.

The worst case is once again the BadImplem version, with 10 million allocations and a cumulative memory footprint of about 310 MB.
Then comes InitializeDictionary, which only allocates 32 objects but has a memory footprint of about 3.9 MB.
Finally, ImprovedImplem performs 600 allocations while taking only 17 KB.

Indeed, the Dictionary initialization allocates very few objects (of course, the translationDtos were already created by InitializeData()). But it eats a non-negligible amount of memory, almost 4 MB. And ImprovedImplem makes very few allocations AND takes virtually no memory.

Conclusion

So what conclusion can we draw from all this? Like many answers when it comes to development: “it depends“. Sure, the first implementation is not optimal, so it’s probably safe to pick either of the other two. But which one? Every solution has its pros and cons, and I have only proposed two alternatives here.

If you are in a context where you have to rebuild the Dictionary from scratch on every request and memory is tight, then the O(n) ImprovedImplem would be worth considering. But if CPU consumption is the main concern, or if temporarily using 4 MB of memory is not an issue, I would probably go with the Dictionary implementation.

Another important lesson to be learned here is: be very careful whenever you manipulate strings and perform string comparisons! Strings are immutable, so every “modification” actually allocates a new string.
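
A tiny illustration of that point (a standalone sketch, unrelated to the repository):

var itemName = "My Item Name";

// Neither call modifies itemName; each one allocates and returns a brand-new string.
var lowered = itemName.ToLower();                 // "my item name"
var key = $"cache.key.{lowered}";                 // "cache.key.my item name"

Console.WriteLine(ReferenceEquals(itemName, lowered));    // False
Console.WriteLine(itemName);                              // still "My Item Name"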

I have barely scratched the surface of what can be done with profilers. The idea was to give you a glimpse on some basic code as well as a quick introduction to available tools. I hope it was useful !

Categories: Dev

Guillaume Téchené

Software developer since 2000, I try to make the right things right. I usually work with .Net (C#) and Azure, mostly because the ecosystem is really friendly and helps me focus on the right things. I try to avoid unnecessary (a.k.a. accidental) complexity and in general everything that gets in the way of solving the essential complexity induced by the business needs. This is why I favor a test-first approach to focus on the problem space and ask questions to business experts before even coding the feature.