Data deduplication was first introduced in Windows Server 2012. Windows Server 2016 ships the third version of the deduplication component, which has been significantly revised and improved. In this article we look at the new deduplication features and settings and at how they differ from previous implementations.
What’s New in Data Deduplication on Windows Server 2016
- The first and most important change in Windows Server 2016 data deduplication is the introduction of multithreading. Windows Server 2012 R2 deduplication works in single-threaded mode and can't use more than one processor core per volume. This severely limits performance, and to work around the restriction you have to split the disks into several smaller volumes. The maximum volume size should not exceed 10 TB.
In Windows Server 2016, the revised engine can run deduplication jobs in multithreaded mode, with each volume using multiple compute threads and I/O queues. Multithreading and other engine changes also raised the limits on file and volume sizes. Because multithreaded deduplication is faster and removes the need to partition a disk into multiple volumes, you can use deduplication on volumes of up to 64 TB. The maximum file size has increased as well: files of up to 1 TB are now supported.
- Support for virtualized backup applications. Windows Server 2012 had only one type of deduplication, designed primarily for ordinary file servers. Deduplication of continuously running VMs was not supported, because deduplication could not work with open files.
Windows Server 2012 R2 data deduplication started to use VSS and, as a result, began to support deduplication of virtual machines. A separate deduplication type exists for such workloads.
Windows Server 2016 adds a third deduplication type, designed specifically for virtualized backup servers (e.g. DPM).
- Nano Server support. Nano Server is a deployment option that installs Windows Server 2016 with a minimum number of components. Nano Server fully supports deduplication.
- Support for Cluster OS Rolling Upgrade. Cluster OS Rolling Upgrade is a new feature of Windows Server 2016 that lets you upgrade the operating system on each cluster node in turn from Windows Server 2012 R2 to Windows Server 2016 without stopping the cluster. This is possible thanks to a special mixed mode in which cluster nodes can run Windows Server 2012 R2 and Windows Server 2016 at the same time.
Mixed mode means that the same data may reside on nodes with different versions of deduplication. Deduplication in Windows Server 2016 supports this mode and keeps deduplicated data accessible during the cluster upgrade.
How to Install and Enable the Deduplication Feature on Windows Server 2016
The first step in enabling deduplication is to install the corresponding server role. You can do this from Server Manager: run the Add Roles and Features Wizard and, under File and Storage Services, select the Data Deduplication role service.
Alternatively, run the following PowerShell command:
Install-WindowsFeature -Name FS-Data-Deduplication -IncludeAllSubfeature -IncludeManagementTools
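After the installation it can be useful to confirm that the role service is actually present. The following is a minimal sketch rather than a required step; it assumes it is run in an elevated PowerShell session on the server itself.

```powershell
# Check whether the Data Deduplication role service is installed.
$feature = Get-WindowsFeature -Name FS-Data-Deduplication

if ($feature.Installed) {
    Write-Output "Data Deduplication is installed."
} else {
    Write-Output "Data Deduplication is not installed (state: $($feature.InstallState))."
}

# List the cmdlets provided by the Deduplication module.
Get-Command -Module Deduplication | Select-Object -ExpandProperty Name
```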
How to Enable and Configure Deduplication
After installing the components, you need to enable deduplication for a specific volume (or for several volumes). This can be done in two ways: from the graphical snap-in or with PowerShell.
To configure deduplication from the GUI, open Server Manager, go to File and Storage Services -> Volumes, right-click the desired volume and select Configure Data Deduplication from the menu.
Then select the desired type of deduplication (General purpose file server, for example) and press Apply. Additionally, you can specify file types that should not be deduplicated and exclude certain directories.
Next, set up the schedule on which the deduplication job will run. Click the Set Deduplication Schedule button.
By default, background optimization is enabled, and you can configure two additional throughput optimization tasks. Only a few settings are available here: the days of the week, the start time and the duration.
PowerShell gives you many more options for customizing deduplication. To enable deduplication on a volume, use the following command:
Enable-DedupVolume -Name D: -UsageType HyperV
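The file type and folder exclusions described for the GUI can also be set from PowerShell with Set-DedupVolume. The sketch below is only an illustration: the D:\Temp folder, the iso and tmp extensions and the 3-day minimum file age are placeholder values, not recommendations, and it uses the Default usage type, which corresponds to a general purpose file server.

```powershell
# Assumption: D: is the data volume; folder, extensions and file age are placeholders.
Enable-DedupVolume -Volume 'D:' -UsageType Default

# Skip a folder and some file types, and only optimize files older than 3 days.
# Extensions are given here without a leading dot.
Set-DedupVolume -Volume 'D:' -ExcludeFolder 'D:\Temp' -ExcludeFileType 'iso', 'tmp' -MinimumFileAgeDays 3

# Review the resulting volume settings.
Get-DedupVolume -Volume 'D:' |
    Format-List Volume, Enabled, UsageType, ExcludeFolder, ExcludeFileType, MinimumFileAgeDays
```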
List the current deduplication schedules:
Get-DedupSchedule
As you can see, in addition to background optimization there is a priority optimization job (PriorityOptimization), as well as scheduled garbage collection (GarbageCollection) and scrubbing (Scrubbing) jobs. None of these jobs is visible in the GUI.
PowerShell lets you fine-tune the parameters of the deduplication jobs. For example, create a new optimization schedule. The job should start at 9 PM Monday through Friday and run for 11 hours with normal priority, using no more than 20% of RAM and 20% of CPU cores:
New-DedupSchedule -Name ThroughputOptimization -Type Optimization -Days @(1,2,3,4,5) -DurationHours 11 -Start (Get-Date "12/8/2016 9:00 PM") -Memory 20 -Cores 20 -Priority Normal
To disable priority optimization:
Set-DedupSchedule -Name PriorityOptimization -Enabled $false
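To confirm that schedule changes took effect, the schedules can be listed again with a few selected columns. This is a sketch under the assumption that a custom schedule named ThroughputOptimization has been created as shown above; the last line is commented out because it deletes that schedule.

```powershell
# Show all deduplication schedules with their key settings.
Get-DedupSchedule |
    Format-Table Name, Type, Enabled, Days, Start, DurationHours, Priority -AutoSize

# Show the full details of a single schedule.
Get-DedupSchedule -Name 'ThroughputOptimization' | Format-List

# Remove the custom schedule when it is no longer needed.
# Remove-DedupSchedule -Name 'ThroughputOptimization'
```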
Manual deduplication run
If necessary, you can run a deduplication job manually. For example, to run a full optimization of volume D with the highest priority:
Start-DedupJob -Volume D: -Type Optimization -Memory 75 -Cores 100 -Priority High -Full
To keep track of running deduplication jobs, use the Get-DedupJob command. Note that only one job can run at a time; the rest wait in the queue until it completes.
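For a long-running job, polling Get-DedupJob in a loop gives a simple progress view. The sketch below assumes volume D: and an arbitrary 30-second polling interval; it stops as soon as no running or queued jobs remain for the volume.

```powershell
# Poll the job queue for volume D: until it is empty.
while ($jobs = Get-DedupJob -Volume 'D:' -ErrorAction SilentlyContinue) {
    $jobs | Format-Table Type, State, Progress, Volume -AutoSize | Out-Host
    Start-Sleep -Seconds 30
}
Write-Output "No running or queued deduplication jobs on D:."
```

If you only need to block until a manually started job finishes, Start-DedupJob also accepts the -Wait switch.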
Viewing deduplication state
The deduplication state of a volume can be viewed using:
Get-DedupVolume -Volume D: | fl
This shows the basic parameters of the volume: total size, used and saved space, savings rate and so on.
To check the deduplication status of the volume, including the results of the most recent jobs, use the command:
Get-DedupStatus -Volume D: | fl
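The raw status output reports sizes in bytes, so a short summary with calculated columns can be easier to read. This is one possible way to present it, assuming volume D:; the SavedGB and FreeGB column names are made up for this example.

```powershell
# Refresh the cached statistics, then print a compact summary in gigabytes.
Update-DedupStatus -Volume 'D:' | Out-Null

Get-DedupStatus -Volume 'D:' | Select-Object Volume,
    @{ Name = 'SavedGB'; Expression = { [math]::Round($_.SavedSpace / 1GB, 2) } },
    @{ Name = 'FreeGB';  Expression = { [math]::Round($_.FreeSpace / 1GB, 2) } },
    OptimizedFilesCount, InPolicyFilesCount, LastOptimizationTime
```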
How to disable data deduplication?
You can disable deduplication on a volume from the GUI or by using PowerShell. For example:
Disable-DedupVolume -Name D:
Turning off deduplication on a volume cancels all scheduled jobs for it. It also prevents any deduplication job from running on the volume except unoptimization; read-only commands such as the Get cmdlets still work. The data stays in the state it was in before you turned deduplication off; only new files are no longer deduplicated.
To return the data to its original state, run an unoptimization (un-deduplication) job. For example, the following command unoptimizes volume D at the highest possible speed:
Start-DedupJob -Volume D: -Type Unoptimization -Memory 100 -Cores 100 -Priority High -Full
Note that unoptimization requires additional free space. If there is not enough free space on the volume, the procedure will fail.
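A rough pre-check can catch the most obvious out-of-space cases before the job starts. The sketch below uses a hypothetical helper, Test-UnoptimizationFits, which is not part of the Deduplication module. It assumes that the space needed is approximately the SavedSpace value reported by Get-DedupStatus, which is only an estimate, so leave a safety margin in practice.

```powershell
# Hypothetical helper: returns $true when free space exceeds the space saved by deduplication.
function Test-UnoptimizationFits {
    param(
        [Parameter(Mandatory)] [uint64] $FreeBytes,
        [Parameter(Mandatory)] [uint64] $SavedBytes
    )
    return $FreeBytes -gt $SavedBytes
}

# Run the live check only where the Deduplication cmdlets are available.
if (Get-Command Get-DedupStatus -ErrorAction SilentlyContinue) {
    $status = Get-DedupStatus -Volume 'D:'
    if (Test-UnoptimizationFits -FreeBytes $status.FreeSpace -SavedBytes $status.SavedSpace) {
        Write-Output "D: appears to have enough free space for unoptimization."
    } else {
        Write-Output "D: probably lacks free space; free up or add space first."
    }
}
```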