It's well known that the \Windows\winsxs directory in Windows Vista is large and can take up a significant portion of disk space on smaller devices. Less well known is what it actually does, why it is huge, or why it's not entirely a waste of space. I'll try to answer all three of these today.
To help dig into this, I've written a small C++ application that reports certain information about the files in a particular directory - or about individual files - and I'll use its output to help explain the answers. Binaries and source code are provided at the end.
What does winsxs do?
At the highest level, it is a collection of "packages" - lots of small groups of files. Each package (group of files) is identified by:
- Architecture ("x86", "amd64", "ia64", "wow64" (32bit on 64bit) and "msil")
- Name
- Signing Key
- Version
- Locale ("none", "en-us", etc.)
- Package Hash
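The directory names under \Windows\winsxs (see the shell32.dll output later in this post) appear to join these six fields with underscores, in the order listed above. As a rough illustration, here is a minimal C++17 sketch that splits such a name back into its parts. The struct, the function name and the assumption that only the package name might itself contain underscores are mine, not anything documented by Windows.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical struct: the fields mirror the list above, not a Windows API.
struct PackageIdentity {
    std::string architecture;
    std::string name;
    std::string signingKey;
    std::string version;
    std::string locale;
    std::string hash;
};

// Splits a winsxs directory name of the assumed form
//   arch_name_key_version_locale_hash
// The first field is taken from the left and the last four from the right,
// so any extra underscores end up inside the package name.
std::optional<PackageIdentity> parsePackageDirectory(const std::string& dir) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (true) {
        const auto pos = dir.find('_', start);
        if (pos == std::string::npos) {
            parts.push_back(dir.substr(start));
            break;
        }
        parts.push_back(dir.substr(start, pos - start));
        start = pos + 1;
    }
    if (parts.size() < 6) {
        return std::nullopt;  // not enough fields to be a package directory
    }

    const std::size_t n = parts.size();
    PackageIdentity id;
    id.architecture = parts[0];
    id.signingKey   = parts[n - 4];
    id.version      = parts[n - 3];
    id.locale       = parts[n - 2];
    id.hash         = parts[n - 1];
    // Everything between the architecture and the signing key is the name.
    for (std::size_t i = 1; i + 4 < n; ++i) {
        if (i > 1) id.name += '_';
        id.name += parts[i];
    }
    return id;
}
```

Calling parsePackageDirectory on the shell32 directory name from later in this post should yield "amd64" as the architecture, "6.0.6001.18062" as the version and "none" as the locale.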
It does more than just store the packages, though; applications can bind themselves to particular packages and specify what versions are acceptable. This is most evident with the Microsoft Visual C++ runtime files which, along with the Common Controls version 6 package, are more or less all you'll find in \Windows\winsxs on Windows XP. There are other features, like package redirections, which are in Windows XP as well.
Why is winsxs huge?
For Windows Vista, however, the main change is not so much in how it works (although that was improved and made more robust as well) as in what it's used to store. In Windows Vista, all the operating system components are in packages inside \Windows\winsxs. That's all the core features (like Explorer) and everything you can enable in "Turn Windows features on or off" (like IIS). That's a lot of stuff!
A consequence of putting system components in \Windows\winsxs is that, in Windows Vista, they can be uninstalled reliably and with less dependence on order. Windows XP keeps a load of \Windows\$NtUninstall... directories to manage this, but that approach has problems with the ordering of installs and uninstalls.
Back in Windows 2000, you could install all the data from the CD-ROM onto the hard disk to avoid needing the CD when turning features on or off. I used it, but few people knew about it as you had to set it all up by hand (some OEMs did bother, though). This was a much simpler version of what \Windows\winsxs is in Windows Vista today. If you've ever turned on East-Asian support in Windows XP, you'll probably know how annoying it is to go and find the CD just to copy some fonts and locale support - this is just one of many places where Windows Vista already has the data available and it will never need you to rummage for the install DVD.
Why is winsxs not entirely a waste of space?
I'm sure you can appreciate that not needing to rummage for a DVD when changing features is good, but you probably don't do that often. Windows Vista might have a lot of data in \Windows\winsxs, but what about the features that are enabled? Don't they use up twice their size in disk space?
Here's the clever bit.
Installed features only consume space for their data, if they have any, but not for any of the code. How does that work? Hardlinks, a little-known feature of NTFS (exposed to Win32 applications since Windows 2000) that allows the same file (and the same data) to be available from multiple locations in the directory hierarchy. What matters when measuring disk usage is that the vast majority of tools - Windows Explorer included - don't know or don't bother to know about hardlinks, so they count the same data once for every name it has. That is partly why I wrote my own tool to explore this further.
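To make the idea concrete, here is a minimal sketch that gives one file a second name and then asks the filesystem how many names it has. It uses the portable C++17 std::filesystem library rather than the Win32 API, and it is not taken from my scanner; the function name and file names are just for illustration.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Writes a small file into `dir`, adds a second name (a hardlink) for it,
// and returns the number of names the file now has.
// `dir` must already exist on a filesystem that supports hardlinks.
std::uintmax_t demoHardLink(const fs::path& dir) {
    const fs::path original = dir / "original.txt";
    const fs::path alias    = dir / "alias.txt";

    // Start from a clean slate so the sketch can be re-run.
    fs::remove(original);
    fs::remove(alias);

    {
        std::ofstream out(original);
        out << "same data, two names\n";
    }  // the stream is closed here, so the data is flushed to the file

    // Both paths now refer to the same file; no data is copied.
    fs::create_hard_link(original, alias);

    return fs::hard_link_count(original);
}
```

After the call, both names report the same size, yet the data exists only once on disk. A tool that simply adds up the sizes of every name it finds will count it twice.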
Let's look at an example, C:\Windows\system32\shell32.dll ("Hardlink Scanner.exe" /F C:\Windows\system32\shell32.dll):
Breakdown for "C:\Windows\system32\shell32.dll"
===============================================
Unique ID: 8e0000000a1534
Hardlink count: 2
Naive file size: 25,794,560 bytes
Unique file size: 12,897,280 bytes
Kind of file: normal
Filenames:
\Windows\winsxs\amd64_microsoft-windows-shell32_31bf3856ad364e35_6.0.6001.18062_none_c808e76dca883949\shell32.dll
\Windows\System32\shell32.dll
In this example, it's clear this is a 64bit system as it is the "amd64" architecture that is linked into \Windows\system32. More important is that although both locations appear to have their own 12MB shell32.dll, using nearly 25MB in total, there is actually only one file and only 12MB of disk space is used.
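The two sizes in that output can be thought of as follows: the naive size adds up every name, while the unique size adds up each underlying file only once. Here is a minimal in-memory sketch of that calculation. It is not the code from my scanner, and the struct and function names are made up for the example; the file ID stands for whatever value identifies the underlying file (the "Unique ID" in the output above).

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical record: one entry per file name found during a scan.
struct NamedFile {
    std::uint64_t fileId;     // same value for every hardlink to one file
    std::uint64_t sizeBytes;  // size reported for this name
};

struct SizeTotals {
    std::uint64_t naiveBytes  = 0;  // every name counted
    std::uint64_t uniqueBytes = 0;  // each underlying file counted once
};

SizeTotals sumSizes(const std::vector<NamedFile>& names) {
    SizeTotals totals;
    std::unordered_set<std::uint64_t> seen;
    for (const NamedFile& f : names) {
        totals.naiveBytes += f.sizeBytes;
        // insert() reports whether this file ID was new to the set.
        if (seen.insert(f.fileId).second) {
            totals.uniqueBytes += f.sizeBytes;
        }
    }
    return totals;
}
```

Feeding it the two shell32.dll names, both with the same ID and a size of 12,897,280 bytes, gives a naive total of 25,794,560 bytes and a unique total of 12,897,280 bytes, matching the figures reported above.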
Another example, the entire directory C:\Windows\SysWOW64 ("Hardlink Scanner.exe" C:\Windows\SysWOW64):
Hardlinks:   Files            Unique Files
<total>      4,278            4,275 ( 99%)
1              394 (  9%)       394 (  9%)
2            3,101 ( 72%)     3,101 ( 72%)
3              725 ( 16%)       725 ( 16%)
4               53 (  1%)        50 (  1%)
5                1 (  0%)         1 (  0%)
7                4 (  0%)         4 (  0%)
Here, about 90% of the files in the directory have at least one other hardlink that is outside this directory (you can work this out by comparing the "Files" and "Unique Files"; if the hardlinks were all inside this directory, the unique counts would be significantly lower). The higher-than-2 hardlink counts also bring up another clever feature of the Windows Vista \Windows\winsxs directory. Let's have a look at the 32bit MPEG 2 decoder ("Hardlink Scanner.exe" /F C:\Windows\SysWOW64\Mpeg2Data.ax):
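The comparison between "Files" and "Unique Files" may be easier to follow with a small model. The sketch below groups scanned directory entries by their hardlink count and, for each group, counts both the entries and the distinct underlying files. Again, this is an illustrative in-memory version with names I've invented, not the scanner's actual code.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical record for one directory entry found during a scan.
struct ScannedFile {
    std::uint64_t fileId;     // shared by every hardlink to the same file
    unsigned      linkCount;  // total names the file has on the volume
};

struct Bucket {
    std::size_t files       = 0;  // directory entries seen in the scan
    std::size_t uniqueFiles = 0;  // distinct file IDs among them
};

// Builds one bucket per hardlink count, like the rows of the table above.
std::map<unsigned, Bucket> histogramByLinkCount(
        const std::vector<ScannedFile>& scan) {
    std::map<unsigned, Bucket> buckets;
    std::set<std::uint64_t> seen;
    for (const ScannedFile& f : scan) {
        Bucket& bucket = buckets[f.linkCount];
        ++bucket.files;
        // Only the first name seen for a given file counts as unique.
        if (seen.insert(f.fileId).second) {
            ++bucket.uniqueFiles;
        }
    }
    return buckets;
}
```

If a file with two hardlinks has both names inside the scanned directory, it adds 2 to "Files" but only 1 to "Unique Files". In the SysWOW64 table the 2-hardlink row shows 3,101 in both columns, so every one of those files has its other name somewhere else; only the 4-hardlink row (53 against 50) contains names that duplicate each other inside the directory.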
Breakdown for "C:\Windows\SysWOW64\Mpeg2Data.ax"
================================================
Unique ID: 100000004f2ab
Hardlink count: 4
Naive file size: 278,528 bytes
Unique file size: 69,632 bytes
Kind of file: normal
Filenames:
\Windows\winsxs\x86_microsoft-windows-v..e-filters-tvdigital_31bf3856ad364e35_6.0.6001.18061_none_dbbd58ec573f657a\Mpeg2Data.ax
\Windows\winsxs\x86_microsoft-windows-v..e-filters-tvdigital_31bf3856ad364e35_6.0.6001.18000_none_dbfd382a570fa47d\Mpeg2Data.ax
\Windows\SysWOW64\Mpeg2Data.ax
\Windows\winsxs\x86_microsoft-windows-v..e-filters-tvdigital_31bf3856ad364e35_6.0.6001.18115_none_dbf76b9657133c48\Mpeg2Data.ax
Although there are three different versions of this package, Mpeg2Data.ax didn't change between versions and so \Windows\winsxs has used hardlinks between the packages as well as to the final destination. That's reduced the space consumption to just 25% for this file.
What does the saving look like for the whole of \Windows? ("Hardlink Scanner.exe" C:\Windows):
Hardlinks:   Files             Unique Files
<total>      93,670            76,585 ( 81%)
1            61,354 ( 65%)     61,354 ( 80%)
2            22,750 ( 24%)     12,467 ( 16%)
3             6,117 (  6%)      2,143 (  2%)
4               552 (  0%)        138 (  0%)
5             1,365 (  1%)        273 (  0%)
6               796 (  0%)        133 (  0%)
7               329 (  0%)         47 (  0%)
8               200 (  0%)         25 (  0%)
9                18 (  0%)          2 (  0%)
10               30 (  0%)          3 (  0%)
Naive file size:   25,266,466,330
Unique file size:  19,881,302,264 ( 78%)
Difference:         5,385,164,066 ( 21%)
That's more than 5GB saved by using hardlinks within \Windows. Scanning my entire drive ("Hardlink Scanner.exe" C:), which also takes in \Program Files (Internet Explorer and Media Player, for example), gives an overall difference of 5.9GB.
So, just take those computed sizes from Windows Explorer (and other apps) with a pinch of salt when dealing with Windows Vista's system files.
Code: Hardlink Scanner (x86), Hardlink Scanner (x64) and Hardlink Scanner source (Visual Studio 2008 Solution).