Sunday, July 18, 2010

When it comes to performance issues, most of the time it’s the code

For the past 4 weeks I have been assigned to a project with a lot of performance issues. It’s an ASP.NET 3.5 project based on the DNN 4.9 framework. The project is quite complex: the team has already put almost 50 man-years into it and, as usual, most of the effort went into getting the functional part working as per the requirements.

When I first came to the project I was told that the team had reviewed the code, found very few issues with it, and was already tackling them. So I started by verifying the server configuration and other parameters to make sure the problem was not there. After a couple of days of research, with everything looking OK, I decided to take a look at the code, and the real problems started popping up. Below I summarize some of the issues we discovered during the last 2 weeks. The purpose of this blog entry is not to pinpoint everything that was wrong in the code, but to highlight some areas that tend to get neglected in many projects.

  • Looking at the performance counters, I noticed that the number of CLR exceptions was really high (300% of total requests); an acceptable value is anything below 5%. So we ran the project in debug mode with the debugger set to break on exceptions, and found that
    • one of the common modules, which was called on all pages, was trying to set the value of a property (Int32) from a database column that was BigInt. Fixing this small mismatch took care of 60-70% of the total exceptions. An ORM like EF would have caught this kind of error at development time.
    • Response.Redirect causes a ThreadAbortException because the normal processing of the page is terminated. The same applies to Response.End and Server.Transfer. To avoid this exception you have to use Response.Redirect(Page, False), but then execution continues after the redirect statement. To skip the rest of the request processing, follow it with HttpContext.ApplicationInstance.CompleteRequest(). This took care of another 10% of the exceptions. For more on this issue refer to the MSDN article.
    • We found a few more pages where exceptions were used to determine business process flow. We replaced that logic by adding an out parameter and using it to determine the flow.
    • There are still about 6% exceptions, but that’s way down from 300% of the total requests, so for now we have stopped looking further. The whole process took just 2 days from 2 developers, and we were able to improve the web server health significantly.
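The redirect and out-parameter fixes above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `RedirectHelper`, `OrderRules`, `TryApprove` and the approval rules are hypothetical names invented for the example, while `Response.Redirect(url, false)` and `CompleteRequest()` are the standard System.Web calls mentioned above.

```csharp
using System.Web;

// Hypothetical helper: redirect without raising ThreadAbortException.
public static class RedirectHelper
{
    public static void RedirectWithoutAbort(HttpContext context, string url)
    {
        // endResponse = false: Response.End is not called, so the thread is not aborted.
        context.Response.Redirect(url, false);

        // Asks ASP.NET to skip the remaining pipeline events and go to EndRequest.
        // Code that follows this call in the current handler still runs,
        // so the caller should return immediately afterwards.
        context.ApplicationInstance.CompleteRequest();
    }
}

// Hypothetical business rule: the outcome is reported through the return value
// and an out parameter instead of a thrown exception.
public static class OrderRules
{
    public static bool TryApprove(decimal amount, decimal creditLimit, out string rejectionReason)
    {
        if (amount <= 0)
        {
            rejectionReason = "Amount must be positive.";
            return false;
        }

        if (amount > creditLimit)
        {
            rejectionReason = "Amount exceeds the credit limit.";
            return false;
        }

        rejectionReason = null;
        return true;
    }
}
```

A caller would then branch on the result (`if (!OrderRules.TryApprove(amount, limit, out reason)) { ... }`) rather than wrapping the call in try/catch, so the expected "rejected" path no longer shows up in the CLR exception counter.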

 

  • Another counter that we noticed was active sessions. The number of active sessions kept climbing steadily. Ideally, in a load test where each user logs out at the end of the run, the session for that user should be released (abandoned). The sign-out code was in fact doing that; the culprit was the default page with the login screen, where the user was redirected after logout. That page loaded the language dropdown and stored the default language in a session variable so that it would be available to the next page after the user clicked login. So the session was released when the user signed out, but as soon as the default page was processed a new session was created, and that one stayed alive on the server until it timed out. This was leaving a lot of unused sessions on the server and forcing an application restart after a few continuous load tests.
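One way to keep the login page from creating a session is to carry the language in a cookie instead of a session variable. The post does not say which fix was actually applied, so the following is only a sketch under that assumption; `LanguagePreference`, the cookie name and the 30-day lifetime are all hypothetical.

```csharp
using System;
using System.Web;

// Hypothetical helper: remember the selected language without touching Session,
// so that rendering the login page does not start a new session.
public static class LanguagePreference
{
    private const string CookieName = "preferredLanguage"; // assumed name

    public static void Save(HttpResponse response, string languageCode)
    {
        HttpCookie cookie = new HttpCookie(CookieName, languageCode);
        cookie.Expires = DateTime.Now.AddDays(30); // assumed lifetime
        response.Cookies.Set(cookie);
    }

    public static string Load(HttpRequest request, string fallbackLanguage)
    {
        // Request.Cookies returns null when the cookie is absent.
        HttpCookie cookie = request.Cookies[CookieName];
        if (cookie == null || string.IsNullOrEmpty(cookie.Value))
        {
            return fallbackLanguage;
        }
        return cookie.Value;
    }
}
```

The page after login would call `LanguagePreference.Load(Request, "en-US")` where it previously read the session variable, and only authenticated pages would write to Session.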

 

  • The last area that we targeted on the web side was the grid control. The application required a totally configurable grid control with customization at row level, filters, business rules and security. Using the MS grid and relying on its events to perform this customization was getting expensive, so we wrote our own server grid control and put all the customizations into it. This gave us a further improvement of 50-60% on the web side.

Finally we have reached a situation where our web server is in good health, and the whole attention is now on the database side, as CPU utilization on the database server is going up to 100%. We have identified some queries and SPs and are working on them. In a coming post I will try to put down the findings from the database side.

But the key to the whole process was simply monitoring some key indicators. There is an excellent book on MSDN covering the whole process, especially Chapter 15 and Chapter 17.