How to tune Windows Server 2012 R2 to handle an NTFS file structure with 50 million files

cache | ntfs | vmware-vsphere | windows-server-2012-r2

I have a developer utility that I will use to generate 50 million files. The directory structure goes four levels deep: the top level contains 16 directories (years 2000-2016), the next level months (1-12), the next level days (1-31), and finally the XML files themselves (up to 85 KB each). The final directory could hold 3,000+ files (I haven't done the math to figure out how 50 million will fit within that directory structure).
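(Doing that math quickly in PowerShell, and assuming the files spread evenly across every leaf directory, the average actually works out to roughly 8,400 per directory:)

PS> $leafDirs = 16 * 12 * 31     # years x months x days = 5,952 leaf directories
PS> [math]::Round(50e6 / $leafDirs)
8401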

I am currently running the utility and I'm about 1/3 of the way through (it takes days to execute). As I feared, traversing any part of the directory tree is a painful experience: it takes several seconds just to browse it in Explorer. This is with server-grade hardware: 7,200 RPM SAS disks (I know this isn't fast nowadays) in a 12 TB RAID 5 or 10 array, with four 3.4 GHz Xeon CPUs allocated.

How do I increase Windows Server 2012 R2's ability to cache file handles in memory? I do not have the NFS service running.


M:\>defrag /a /v /h m:
Microsoft Drive Optimizer
Copyright (c) 2013 Microsoft Corp.

Invoking slab consolidation on DB MDF (M:)...


The operation completed successfully.

Post Defragmentation Report:

    Volume Information:
            Volume size                 = 12.99 TB
            Cluster size                = 64 KB
            Used space                  = 1.55 TB
            Free space                  = 11.44 TB

    Slab Consolidation:
            Space efficiency            = 100%
            Potential purgable slabs    = 1

M:\>

C:\Windows\system32>fsutil fsinfo ntfsinfo m:
NTFS Volume Serial Number :       0x9c60357c60355de8
NTFS Version   :                  3.1
LFS Version    :                  2.0
Number Sectors :                  0x000000067ffbefff
Total Clusters :                  0x000000000cfff7df
Free Clusters  :                  0x000000000b6bcb45
Total Reserved :                  0x0000000000000004
Bytes Per Sector  :               512
Bytes Per Physical Sector :       4096
Bytes Per Cluster :               65536
Bytes Per FileRecord Segment    : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length :           0x0000000320900000
Mft Start Lcn  :                  0x000000000000c000
Mft2 Start Lcn :                  0x0000000000000001
Mft Zone Start :                  0x00000000018f8780
Mft Zone End   :                  0x00000000018f9420
Resource Manager Identifier :     A47067E0-6356-11E6-8

C:\Windows\system32>

RamMap metafile details:

Total=2,882,220 K, Active=2,736,688 K, Standby=143,968 K, Modified=852 K, Modified no write=712 K.

What else would be of interest on this page?

At this time the server is allocated 16 GB of memory. I could ask for a lot more.


C:\Windows\system32>fsutil.exe 8dot3name query m:
The volume state is: 1 (8dot3 name creation is disabled).
The registry state is: 2 (Per volume setting - the default).

Based on the above two settings, 8dot3 name creation is disabled on m:

C:\Windows\system32>

Contig v1.8 - Contig
Copyright (C) 2001-2016 Mark Russinovich
Sysinternals

m:\$Mft is in 80 fragments
m:\$Mft::$BITMAP is in 32 fragments

Summary:
     Number of files processed:      2
     Number unsuccessfully processed: 0
     Average fragmentation       : 56 frags/file

NtfsInfo v1.2 - NTFS Information Dump
Copyright (C) 2005-2016 Mark Russinovich
Sysinternals - www.sysinternals.com


Volume Size
-----------
Volume size            : 13631357 MB
Total sectors          : 27917021183
Total clusters         : 218101727
Free clusters          : 184577826
Free space             : 11536114 MB (84% of drive)

Allocation Size
----------------
Bytes per sector       : 512
Bytes per cluster      : 65536
Bytes per MFT record   : 0
Clusters per MFT record: 0

MFT Information
---------------
MFT size               : 16210 MB (0% of drive)
MFT start cluster      : 49152
MFT zone clusters      : 33255616 - 33258848
MFT zone size          : 202 MB (0% of drive)
MFT mirror start       : 1

Meta-Data files
---------------

Best Answer

You currently have an MFT of 0x320900000 = 13,431,209,984 bytes, or about 12.5 GiB, with only about 2 GiB of that in memory. More RAM will allow you to keep more of it in memory, if you want to cache more of the "file handles", i.e. the file system metadata.
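(That figure comes straight from the Mft Valid Data Length line in the fsutil output above; a quick PowerShell check:)

PS> $mftBytes = 0x320900000              # Mft Valid Data Length from fsutil
PS> $mftBytes
13431209984
PS> [math]::Round($mftBytes / 1GB, 1)    # GiB
12.5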

No matter what filesystem you use, there will be metadata, and depending on your usage patterns you may be better off investing in more RAM and/or faster storage. If it is unrealistic to hold all of the metafile information in RAM, and your workload typically touches new files rather than repeatedly reusing a smaller subset, then faster storage may be needed to cut seek times and raise the IOPS the storage can deliver - for example a RAID 10 array with many mirror pairs to stripe across, built from SSDs and/or 15K RPM SAS disks.

Keep in mind that the Windows memory manager's default settings may not suit your situation, and you may need to tweak them, particularly if you're not planning on having enough RAM to fit the whole MFT in memory in addition to what the rest of the system requires. I notice that nearly all of your metafile data is marked as Active memory, meaning the Windows caching system is not allowed to discard it from RAM while it is not being used. My PowerShell script in Windows Server 2008 R2 Metafile RAM Usage can be used (on Server 2008 through 2012 R2, and I expect 2016) to set a minimum and maximum on the amount of metafile memory that is marked as active, forcing the rest onto the standby list. This allows the cache system to better prioritise what stays in RAM.
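The linked script is the full version; as a minimal sketch of the mechanism it relies on, the Win32 calls GetSystemFileCacheSize and SetSystemFileCacheSize cap the system working set that holds the metafile. The 1 GiB / 8 GiB limits below are made-up example values, and the call must run elevated with SeIncreaseQuotaPrivilege available (which the full script takes care of enabling):

# Minimal sketch, not the linked script: query the current system file
# cache limits, then set a hard minimum/maximum (example values only).
Add-Type -Namespace Win32 -Name Cache -MemberDefinition @'
[DllImport("kernel32.dll", SetLastError = true)]
public static extern bool GetSystemFileCacheSize(
    ref UIntPtr min, ref UIntPtr max, ref uint flags);
[DllImport("kernel32.dll", SetLastError = true)]
public static extern bool SetSystemFileCacheSize(
    UIntPtr min, UIntPtr max, uint flags);
'@

$min = [UIntPtr]::Zero; $max = [UIntPtr]::Zero; $flags = [uint32]0
$null = [Win32.Cache]::GetSystemFileCacheSize([ref]$min, [ref]$max, [ref]$flags)
"Current: min=$min max=$max flags=$flags"

# FILE_CACHE_MAX_HARD_ENABLE (0x1) | FILE_CACHE_MIN_HARD_ENABLE (0x4):
# keep at least 1 GiB and at most 8 GiB active (hypothetical limits).
$ok = [Win32.Cache]::SetSystemFileCacheSize([UIntPtr][uint64]1GB, [UIntPtr][uint64]8GB, 0x5)
if (-not $ok) { throw (New-Object System.ComponentModel.Win32Exception) }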

Edit: While I'm not familiar with JMeter, it sounds like the filesystem usage pattern is going to be:

  1. Write them all in a sequential manner.
  2. Read them all as fast as it can, in a mostly sequential manner.
  3. Read them all a second time in a partly random pattern (as each thread competes to read the group of files it wants) to send them over the network.

With that usage pattern, to see a reasonable benefit from adding a LOT more RAM, you would need to add enough to fit the whole MFT in RAM. This is generally a waste of money when it would be more cost-effective to add a bit more RAM and upgrade the storage to significantly improve IOPS. That should be faster than pairing colossal amounts of RAM with a slow 7.2K RPM RAID 5 array, or even a RAID 10 made of only 4 disks, as the metadata is not the only data being read from and written to storage. See this calculator as a tool for estimating expected IOPS and how different disk counts and RAID levels affect performance.
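(As a rough illustration of the kind of estimate such calculators make - the per-disk figures of ~75 IOPS for a 7.2K RPM disk and ~175 for 15K RPM, and the classic write penalties of 4 for RAID 5 and 2 for RAID 10, are rule-of-thumb assumptions, not measurements:)

# Rule-of-thumb front-end IOPS: raw disk IOPS divided by the blended write penalty.
function Get-EstimatedIops($disks, $iopsPerDisk, $writeFraction, $writePenalty) {
    $raw = $disks * $iopsPerDisk
    [math]::Round($raw / ((1 - $writeFraction) + $writeFraction * $writePenalty))
}

Get-EstimatedIops 4 75  0.5 4   # 4x 7.2K RPM in RAID 5,  50% writes -> ~120
Get-EstimatedIops 8 175 0.5 2   # 8x 15K RPM  in RAID 10, 50% writes -> ~933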

In the above case, the only way that adding even more RAM can beat a system with faster storage is to add enough RAM that all data, including the file content, is in RAM as well. This is why some database systems advertise that they operate "100% in memory", so that there are no storage system delays.
