Silver's Simple Site - Weblog - 2006 - September - 01


NVidia: 1 point for, 427 against

Last Sunday, I decided to start playing Oblivion again. Turns out I hadn't uninstalled it from last time, so that was easy. Except for the video playback problem - this was a problem originally, clearly hadn't fixed itself, and consists of all videos (opening sequence, main menu background, etc.) showing up as static images from apparently uninitialised memory. Which is fun.

Begin Oblivion Video Fixing Sequence, Take 2 (I'd tried before when I first got the game).

  1. Download and install latest NVidia drivers for Windows XP 64bit (91.31).
  2. Run Oblivion, and find the videos work!

Well that was shockingly easy. That is the one and only point for NVidia, though.

The first and most obvious downside to updating my NVidia drivers was the new "NVidia Control Panel", which does a good job of not quite matching Explorer in every way. It's got a UI consistency you could only otherwise have found in 1995 shareware, too. Horrible.

The looks of the new control panel 'thing' are not the only problems it has, oooh no. If you're running as a Standard User Account (LUA), as everyone does (right?), it fails to save any of the application-specific 3D settings. Better than that, it looks as if they were saved when they weren't! (The Apply button disappears in a fit of horrible UI design and there's no error message at all.)

Then there is the "NVidia Display Driver Service" (nvsvc64.exe), which sits in the background doing (apparently) nothing except leaking. It was leaking Paged Pool, Non-paged Pool, Commit and Handles earlier, although currently it only seems to be leaking Non-paged Pool and Handles. The 3 memory values were leaking at a combined rate of (approximately) 1.8MB/hour, and the handles at (approximately) 1700/hour. Yummy.
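To put those rates in perspective, here is a quick back-of-the-envelope projection in Python. It assumes the leak carries on at the same constant rate, which I haven't verified, and the function name is just made up for this sketch:

```python
MB_PER_HOUR = 1.8        # combined Paged Pool + Non-paged Pool + Commit rate quoted above
HANDLES_PER_HOUR = 1700  # handle leak rate quoted above


def projected_leak(hours):
    """Return (megabytes, handles) leaked after `hours`, assuming constant rates."""
    return MB_PER_HOUR * hours, HANDLES_PER_HOUR * hours


for label, hours in (("day", 24), ("week", 24 * 7)):
    mb, handles = projected_leak(hours)
    print(f"one {label}: ~{mb:.0f}MB, ~{handles} handles")
```

On those assumptions, that's roughly 43MB and 40,800 handles for every day the machine stays up.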

Finally, we come to the actual driver itself. The main deal. Which leaks entire processes through a really bizarre bug.

For this to make sense, I'll explain a few simple facts about the Windows Kernel:

  • It has an Object Manager that tracks all objects in kernel-space and user-space.
  • All objects have a "Handle Count" and "Pointer Count" - the former is for (obviously) any open handles to the object, which is mostly for user-space code, and the latter is for kernel code that simply has a pointer (it's a reference counter).
  • When both counts reach zero, non-permanent (i.e. most) objects are removed and cleaned up.
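
As a toy illustration of those rules, here is a Python sketch of the bookkeeping. It is not the real Object Manager, and the class and method names are made up for this example. Opening a handle bumps both counts here, which matches the way the PointerCount in the debugger output further down includes the handles:

```python
class KernelObject:
    """Toy stand-in for an Object Manager object (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.handle_count = 0
        self.pointer_count = 0
        self.destroyed = False

    def reference(self):
        # Kernel code takes a plain pointer reference.
        self.pointer_count += 1

    def dereference(self):
        self.pointer_count -= 1
        self._maybe_destroy()

    def open_handle(self):
        # A handle counts as a handle *and* as a pointer reference.
        self.handle_count += 1
        self.pointer_count += 1

    def close_handle(self):
        self.handle_count -= 1
        self.pointer_count -= 1
        self._maybe_destroy()

    def _maybe_destroy(self):
        # Non-permanent objects go away once nothing refers to them.
        if self.handle_count == 0 and self.pointer_count == 0:
            self.destroyed = True


proc = KernelObject("example.exe")
proc.open_handle()      # user-space opens the process
proc.reference()        # a driver grabs a pointer
proc.close_handle()     # user-space is done with it
print(proc.destroyed)   # False: the driver still holds its reference
proc.dereference()      # the driver lets go
print(proc.destroyed)   # True
```

The important bit is the middle of that usage: with no handles left, a single outstanding pointer reference is still enough to keep the object alive.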

When you start a new process, naturally there enters into existence a kernel "Process" object (along with all the shenanigans that go with that). I started the NVidia Control Panel for this test.

lkd> !process fffffadfb2fe2750 1
PROCESS fffffadfb2fe2750
   SessionId: 0  Cid: 14c4    Peb: 7fffffd4000  ParentCid: 0230
   DirBase: 9546c000  ObjectTable: fffffa80009c0580  HandleCount: 189.
   Image: nvcplui.exe
   VadRoot fffffadfb1996b30 Vads 202 Clone 0 Private 4739. Modified 240. Locked 0.
   DeviceMap fffffa800249dc10
   Token                             fffffa80077cbcf0
   ElapsedTime                       00:00:48.515
   UserTime                          00:00:00.000
   KernelTime                        00:00:00.000
   QuotaPoolUsage[PagedPool]         1287904
   QuotaPoolUsage[NonPagedPool]      16720
   Working Set Sizes (now,min,max)  (8402, 50, 345) (33608KB, 200KB, 1380KB)
   PeakWorkingSetSize                8698
   VirtualSize                       657 Mb
   PeakVirtualSize                   658 Mb
   PageFaultCount                    15774
   MemoryPriority                    BACKGROUND
   BasePriority                      8
   CommitCharge                      5247

lkd> !object fffffadfb2fe2750
Object: fffffadfb2fe2750  Type: (fffffadfb5ab86c0) Process
   ObjectHeader: fffffadfb2fe2720
   HandleCount: 2  PointerCount: 74

Most of the above is not too important, but the Image: and two counts from !object are - notice it starts with 2 handles and 74 pointers (2 of which will be the 2 handles). These are all constant while I look around the control panel. Then I go to the "Adjust image settings with preview" page, which has a real live 3D animation. Big mistake! Only moments after going to it, the object has:

    HandleCount: 2  PointerCount: 2029

And it keeps going up, even after switching to another view! It was going up at something like 1000 pointers/second, although I don't have timestamps for my debugging log. By the time I closed the application, it was:

    HandleCount: 0  PointerCount: 21516

Notice that there are no handles - nothing in user-space cares about it any more. There are still over 21,000 pointers to it in kernel-space, though. Or so the Object Manager is led to believe. One last look at the process object in detail gives:

lkd> !process fffffadfb2fe2750 1
PROCESS fffffadfb2fe2750
   SessionId: 0  Cid: 14c4    Peb: 7fffffd4000  ParentCid: 0230
   DirBase: 9546c000  ObjectTable: 00000000  HandleCount:   0.
   Image: nvcplui.exe
   VadRoot 0000000000000000 Vads 0 Clone 0 Private 253. Modified 769. Locked 0.
   DeviceMap fffffa800249dc10
   Token                             fffffa80077cbcf0
   ElapsedTime                       00:01:31.953
   UserTime                          00:00:11.593
   KernelTime                        00:00:02.734
   QuotaPoolUsage[PagedPool]         0
   QuotaPoolUsage[NonPagedPool]      0
   Working Set Sizes (now,min,max)  (6, 50, 345) (24KB, 200KB, 1380KB)
   PeakWorkingSetSize                11123
   VirtualSize                       80 Mb
   PeakVirtualSize                   670 Mb
   PageFaultCount                    27942
   MemoryPriority                    BACKGROUND
   BasePriority                      8
   CommitCharge                      0

Interesting points on this are that the ObjectTable, VadRoot and CommitCharge are now all zero. This means that the process' virtual address space has been cleaned up entirely. The process is not even in the session process table (list of processes for the logged in session), although it is in the overall kernel process table (which nothing in user-space can see - Task Manager can't see it).
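
The ElapsedTime fields in the two !process dumps also give a rough cross-check on the leak rate, since I have no proper timestamps. The Python sketch below assumes the leak started after the first dump, that the final PointerCount of 21516 was read no later than the second dump, and that the ElapsedTime values can be taken at face value. The helper name is made up for this example:

```python
def to_seconds(elapsed):
    """Convert an 'HH:MM:SS.mmm' ElapsedTime string to seconds."""
    hours, minutes, seconds = elapsed.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


first_dump = to_seconds("00:00:48.515")   # PointerCount was 74 around here
second_dump = to_seconds("00:01:31.953")  # PointerCount had reached 21516 by here

window = second_dump - first_dump  # longest the leak could have been running
leaked = 21516 - 74                # references gained in that window

print(f"{leaked} references in at most {window:.1f}s: at least {leaked / window:.0f}/s")
```

That works out to a little under 500 references/second as a lower bound on the average. The leak only started part-way through that window, so the real rate was higher, which fits with the "something like 1000 pointers/second" I saw.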

So what's happened? Almost certainly, a driver (most likely the NVidia one, since this only happens with applications that use 3D acceleration and only since the driver upgrade) is adding a reference to the process it is handling but never releasing it. The result is thousands of leaked references (there are unlikely to be any actual leaked pointers). A few of my Oblivion processes have a PointerCount of over 3 million.
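
I obviously can't see the driver source, so the following Python sketch is only my guess at the general shape of the bug, with made-up names throughout: a per-frame routine that takes a reference on the owning process and forgets the matching release, next to a version that pairs them up.

```python
class ProcessObject:
    """Toy process object with a single reference count (hypothetical)."""

    def __init__(self):
        self.pointer_count = 0


def render_frame_buggy(process):
    # Take a reference on the owning process for the duration of the call...
    process.pointer_count += 1
    # ...do the work, then return without the matching release.


def render_frame_fixed(process):
    process.pointer_count += 1
    try:
        pass  # the actual work would go here
    finally:
        # Matching release, even if the work fails part-way through.
        process.pointer_count -= 1


leaky, clean = ProcessObject(), ProcessObject()
for _ in range(1000):
    render_frame_buggy(leaky)
    render_frame_fixed(clean)

print(leaky.pointer_count, clean.pointer_count)  # 1000 0
```

Nothing is ever lost track of in the buggy version, which is why there's no leaked memory to go looking for. The count just never comes back down, so the process object can never be cleaned up.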

Excellent work, NVidia. You've managed to leak in such a special way that no-one will even notice. Except me and my wonderful friend windbg.

Tags: NVidia | Posted: 02:30AM on Friday, 01 September, 2006

Powered by the Content Parser System, copyright 2002 - 2024 James G. Ross.