A few days ago I had to deal with a ColdFusion server that had stopped serving page requests. The server could be contacted and could be controlled over RDP, however any request for a ColdFusion page was met with nothing.
After looking around and finding nothing obvious, the CF service was restarted. This solved the problem, however I did notice that the amount of used memory had not reduced fully. The machine itself runs Windows 2008 R2 with 32GB of RAM, with CF set to take up 20GB. Usually on a CF restart the amount of used memory would drop to a couple of gig, however this time used memory was still at 17GB. I started Task Manager and looked at the running processes, but nothing was taking up 17GB worth of memory. Where had this memory gone? Then I noticed the nonpaged pool figure, not something I usually look at, but lo and behold it was sitting at around 17GB.
So what was this and how do I fix it? To cut a long story short, after some research I eventually established that this type of behaviour was indicative of a memory leak (memory being allocated by a process or driver but never released).
So while the restart of the CF service had rectified the issue in the short term, with 17GB still assigned and unusable, CF was going to run into the same problem again (and if the leak kept on growing, there would be less and less time between each restart).
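As a rough back-of-the-envelope check, the figures above (32GB of RAM, CF set to take up 20GB, 17GB stuck in the nonpaged pool) can be put into a quick sum. This is only a sketch: the function name is made up for illustration, and it ignores whatever Windows and other processes need, so the real headroom would be smaller still.

```python
# Figures are the ones quoted for this server; the helper name is
# purely illustrative.
def headroom_gb(total_ram_gb, leaked_gb):
    """Physical memory left once the leaked pool memory is subtracted."""
    return total_ram_gb - leaked_gb


TOTAL_RAM_GB = 32   # RAM installed in the machine
CF_MAX_GB = 20      # what CF is set to take up
LEAKED_GB = 17      # nonpaged pool that never came back

left = headroom_gb(TOTAL_RAM_GB, LEAKED_GB)
shortfall = CF_MAX_GB - left

print(f"{left}GB left for everything, CF is set to use up to {CF_MAX_GB}GB")
print(f"Shortfall: {shortfall}GB")
```

In other words, even before CF gets anywhere near its limit, there is not enough physical memory left to go round.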
The machine was rebooted, which gave us back the 17GB and of course bought us some time. In the meantime I set about trying to track down the memory leak.
The screenshot below shows the nonpaged pool size (bottom left, 3088MB). It shows 3GB rather than 17GB because it was taken some time after the reboot.
So how to track down what was causing the memory leak? This is when I came across Mark Russinovich’s excellent blog entry:
http://blogs.technet.com/b/markrussinovich/archive/2009/03/26/3211216.aspx
Mark Russinovich is a god among Windows people. His Sysinternals tools are legendary, and since joining Microsoft he has kept up the good work.
The article is a good read and I recommend reading it all, but the section we are interested in is Tracking Pool Leaks.
This article led me on to poolmon. It was a pain to get, as I had to download the whole Windows Driver Kit from Microsoft although I only needed the poolmon tool. Why Microsoft don’t allow this to be downloaded separately I don’t know. Anyway, I downloaded and installed the WDK locally and then copied poolmon.exe to the offending server.
On running poolmon you get the following:
OK, you can go and read up about all the bits and pieces here, as I am not going to go into them (I don’t fully understand it all myself). But there are a number of ways to filter and order the list:
Press B to order the list by Bytes
Press D to order the list by Diff
Press F to order the list by Frees
Press A to order the list by Allocations
Press P to toggle between listing Paged, NonPaged, and Paged & NonPaged
OK, with the above commands I was able to track down my memory leak to a specific driver. I listed only NonPaged (pressing P a few times) and then ordered by Bytes. A memory leak shows up as a lot of allocations with a frees value that does not correlate (i.e. not a lot of frees compared to allocations: memory is allocated but not freed). As you can see from my list, the tag BLFP stands out instantly. As I understand it, your memory leak may or may not be as obvious as this. If that’s the case, I believe you need to track the figures over time, which is covered in Mark Russinovich’s article above.
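To make the "lots of allocations, not many frees" idea concrete, here is a minimal sketch of the same check in Python. It does not read anything from poolmon: the PoolTag class and the suspects function are names I have made up for illustration, and the numbers are invented just to show the shape of the problem, they are not the real figures from this server.

```python
from dataclasses import dataclass


@dataclass
class PoolTag:
    """One row of a poolmon-style listing (hypothetical structure)."""
    tag: str
    allocs: int
    frees: int
    bytes_used: int

    @property
    def diff(self):
        # Allocations that have not been freed yet, i.e. allocs minus frees.
        return self.allocs - self.frees


def suspects(rows, top=3):
    """Return the tags holding the most bytes, largest first."""
    return sorted(rows, key=lambda r: r.bytes_used, reverse=True)[:top]


# Invented numbers for illustration only.
rows = [
    PoolTag("Ntfs", 5_000_000, 4_990_000, 20_000_000),
    PoolTag("BLFP", 9_000_000, 1_000_000, 3_000_000_000),
    PoolTag("File", 8_000_000, 7_995_000, 5_000_000),
]

for r in suspects(rows):
    print(f"{r.tag:<5} diff={r.diff:>10,} bytes={r.bytes_used:>14,}")
```

A healthy tag has frees tracking close behind allocations, so its diff stays small. A leaking tag has a diff and a byte count that are out of all proportion to the rest of the list.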
With the tag identified I had a look on the net. As you will see from Mark’s article, you can search the system for the offending driver, which I was going to do, but I thought I would give Google a go first. For once I was lucky and stumbled across someone who had gone through the same issue.
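I did not need to do the driver search myself, but for anyone who does, the idea is simply to look through the driver files for the four-character tag. Below is a rough Python sketch of that idea, not the exact method from Mark’s article. It assumes the tag is stored as plain ASCII inside the driver binary and that the drivers live in the usual System32\drivers folder. The function name is my own, and a match is only a hint, since the same four bytes can turn up in a file by coincidence.

```python
import os
from pathlib import Path


def drivers_containing_tag(tag, drivers_dir):
    """Return names of .sys files whose raw bytes contain the tag."""
    root = Path(drivers_dir)
    if not root.is_dir():
        return []

    needle = tag.encode("ascii")
    hits = []
    for path in sorted(root.glob("*.sys")):
        try:
            if needle in path.read_bytes():
                hits.append(path.name)
        except OSError:
            # Some driver files may be locked or unreadable; skip them.
            continue
    return hits


if __name__ == "__main__":
    # Assumed default driver location on Windows.
    system_root = os.environ.get("SystemRoot", r"C:\Windows")
    drivers = Path(system_root) / "System32" / "drivers"
    print(drivers_containing_tag("BLFP", drivers))
```

Run it from an elevated prompt on the affected server, then check the properties of any file it lists to see which vendor and product it belongs to.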
The fault lay with the Broadcom Virtual Network Adapter used for teaming the NICs together on a Dell R710. So in the end all I needed to do was download the updated version of the Broadcom Management Application, which updates this virtual adapter driver.
I hope that makes sense. I have tried to summarise, simplify and explain what I did. I took a shortcut by googling the tag rather than tracking down the driver myself, so you may have to go and do some work here, but hopefully this will get some people started.
Update: To get the update for the virtual adapter you don’t want to download the physical adapter drivers. You need to download the Broadcom Management Application Installer, as this contains the virtual adapter driver. The link below is a shortcut to the download page (skip past the physical adapter downloads, unless of course you want to update those too).
http://www.broadcom.com/support/ethernet_nic/netxtremeii.php
Cheers