Parallel rebuild is a Storage Spaces feature that repairs a storage pool even if the failed disk has not been replaced. This feature is not new to Storage Spaces Direct: it has existed in Storage Spaces since Windows Server 2012. It is an automatic process that occurs only if you have enough free space in the storage pool. This is why Microsoft recommends leaving some free space in the storage pool for parallel rebuild. This reserved space is often forgotten when designing a Storage Spaces Direct solution, which is why I wanted to write this theoretical topic.
How parallel rebuild works
Parallel rebuild needs some free space to work; you can think of it as spare capacity. With a traditional RAID array (RAID 6, for example), you usually dedicate a hot spare disk that takes over in case of failure. In Storage Spaces (Direct), instead of a spare disk, we have spare free space. Parallel rebuild occurs when a disk fails. If enough capacity is available, parallel rebuild runs automatically and immediately to restore the resiliency of the volumes: Storage Spaces Direct creates new copies of the data that was hosted on the failed disk, using the free space on the remaining disks.
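To make the "enough capacity" condition concrete, here is a toy Python model. It is not the actual Storage Spaces Direct algorithm: it assumes the data of the failed disk can land in the free space of any surviving disk, ignores fault-domain placement rules, and uses a hypothetical policy that splits the rebuild in proportion to each disk's free space. The function names and figures are made up for illustration.

```python
def can_rebuild(failed_disk_used_tb, surviving_free_tb):
    """Toy check: can the data of the failed disk be re-created
    in the free space left on the surviving disks?"""
    # Simplification: any surviving disk may receive any piece of the
    # lost data; real placement must also respect fault domains.
    return sum(surviving_free_tb) >= failed_disk_used_tb


def rebuild_plan(failed_disk_used_tb, surviving_free_tb):
    """Hypothetical policy: split the rebuild across every surviving disk
    in proportion to its free space, so every disk takes part."""
    if failed_disk_used_tb == 0:
        return [0.0] * len(surviving_free_tb)
    total_free = sum(surviving_free_tb)
    if total_free < failed_disk_used_tb:
        raise ValueError("not enough free space in the pool to rebuild")
    # Each share is <= that disk's free space because
    # failed_disk_used_tb / total_free <= 1.
    return [failed_disk_used_tb * free / total_free for free in surviving_free_tb]


# Hypothetical figures: the failed disk held 1.5TB, and five surviving
# disks each have 0.5TB free.
print(can_rebuild(1.5, [0.5] * 5))   # True
print(rebuild_plan(1.5, [0.5] * 5))  # about 0.3TB written to each disk
```

If the surviving disks had only 0.2TB free each (1TB in total), `can_rebuild` would return False: the volumes would stay degraded until the failed disk is replaced.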
When you receive the new disk (4 hours later, because you bought a 4-hour support contract :p), you can replace the failed disk. The new disk is automatically added to the storage pool if the auto pool option is enabled. Once the disk is in the storage pool, an automatic rebalance process runs to spread data across all disks for the best efficiency.
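The rebalance step can be sketched the same way. This is a simplified model, not the real implementation: it assumes all disks have the same size, the replacement disk starts empty, and rebalancing aims for equal used capacity on every disk. The function name and the figures are hypothetical.

```python
def rebalance_target(used_per_disk_tb, new_disk_count=1):
    """Return (target used capacity per disk, data moved to new disks).

    Assumes same-size disks, empty new disks, and a rebalance that aims
    for equal used capacity on every disk.
    """
    total_used = sum(used_per_disk_tb)
    disk_count = len(used_per_disk_tb) + new_disk_count
    target = total_used / disk_count
    # The new disks start empty, so they receive the full target each.
    moved_to_new_disks = target * new_disk_count
    return target, moved_to_new_disks


# Hypothetical state: 23 surviving disks hold 1.2TB each after the
# parallel rebuild, and one replacement disk joins the pool.
target, moved = rebalance_target([1.2] * 23)
print(f"target per disk: {target:.2f}TB, moved to new disk: {moved:.2f}TB")
```

Once the rebalance is done, the free space freed on the other disks is available again as reserve for the next failure.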
How to calculate the amount of free space
Microsoft recommends leaving free space equal to one capacity drive per node, up to four drives:
- 2-node configuration: leave free the capacity of 2 capacity drives
- 3-node configuration: leave free the capacity of 3 capacity drives
- 4-node or larger configuration: leave free the capacity of 4 capacity drives
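The rule above boils down to `min(nodes, 4)` capacity drives. A minimal Python sketch, assuming every capacity drive has the same size (the function names are hypothetical):

```python
def reserved_drives(node_count):
    """Capacity drives to keep free: one per node, capped at four."""
    # The recommendation above only covers 2-node and larger clusters.
    if node_count < 2:
        raise ValueError("expected a cluster of at least 2 nodes")
    return min(node_count, 4)


def reserved_capacity_tb(node_count, capacity_drive_tb):
    """Free space (TB) to keep in the pool for parallel rebuild."""
    return reserved_drives(node_count) * capacity_drive_tb


for nodes in range(2, 7):
    print(nodes, "nodes ->", reserved_drives(nodes), "drives")
# 2 nodes -> 2 drives
# 3 nodes -> 3 drives
# 4 nodes -> 4 drives
# 5 nodes -> 4 drives
# 6 nodes -> 4 drives
```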
Let’s consider a 4-node S2D cluster in which each node has the following storage configuration. I plan to deploy 3-way mirroring:
- 3x SSD of 800GB (Cache)
- 6x HDD of 2TB (Capacity), i.e. 12TB per node and 48TB of raw storage for the cluster.
Because I deploy a 4-node configuration, I should leave free space equivalent to four capacity drives. In this example, 8TB (4x 2TB) should be kept free for parallel rebuild, which leaves 40TB available. Because I want to implement 3-way mirroring, I divide the available capacity by 3, so about 13.3TB is usable storage.
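The same calculation as a small Python helper. It is a sketch under the assumptions of this example: only the capacity HDDs are counted, all capacity drives have the same size, and `copies=3` models 3-way mirroring. The function name is hypothetical.

```python
def plan_capacity(nodes, drives_per_node, drive_tb, copies=3):
    """Return raw, reserved, available and usable capacity in TB.

    Only capacity drives are counted; copies=3 models 3-way mirroring.
    """
    raw = nodes * drives_per_node * drive_tb
    reserved = min(nodes, 4) * drive_tb  # one drive per node, max four
    available = raw - reserved
    usable = available / copies
    return raw, reserved, available, usable


raw, reserved, available, usable = plan_capacity(nodes=4, drives_per_node=6, drive_tb=2)
print(f"raw={raw}TB reserved={reserved}TB available={available}TB usable={usable:.1f}TB")
# raw=48TB reserved=8TB available=40TB usable=13.3TB
```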
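Now I choose to add a fifth node to this cluster. According to the Microsoft recommendation, the reserve is capped at four capacity drives, so I don't need to reserve any additional space for parallel rebuild. I can add the whole 12TB of the new node (6x HDD of 2TB) to the available capacity, for a total of 52TB, or about 17.3TB of usable storage with 3-way mirroring.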
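Extending the calculation above: from four nodes on, the reserve stays fixed at 8TB in this example, so each extra node adds its full 12TB to the available capacity, i.e. 4TB of usable space with 3-way mirroring. The sketch assumes every added node has the same 6x 2TB capacity drives; the function name is hypothetical.

```python
def available_and_usable_tb(nodes, drives_per_node=6, drive_tb=2, copies=3):
    """Available and usable capacity (TB) for the example cluster.

    The parallel rebuild reserve stays at four drives from 4 nodes up.
    """
    raw = nodes * drives_per_node * drive_tb
    available = raw - min(nodes, 4) * drive_tb
    return available, available / copies


for nodes in (4, 5, 6):
    available, usable = available_and_usable_tb(nodes)
    print(f"{nodes} nodes: available={available}TB usable={usable:.1f}TB")
# 4 nodes: available=40TB usable=13.3TB
# 5 nodes: available=52TB usable=17.3TB
# 6 nodes: available=64TB usable=21.3TB
```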
Parallel rebuild is an interesting feature because it restores resiliency even if the failed disk is not yet replaced. But parallel rebuild has a cost in storage usage: don’t forget the reserved capacity when you are planning capacity.