Storage Spaces Direct: Parallel rebuild

Parallel rebuild is a Storage Spaces feature that repairs a storage pool even if the failed disk has not been replaced. This feature is not new in Storage Spaces Direct: it has existed since Windows Server 2012 with Storage Spaces. It is an automatic process that runs if you have enough free space in the storage pool, which is why Microsoft recommends leaving some free space in the pool to allow parallel rebuild. This free space is often forgotten when designing a Storage Spaces Direct solution, which is why I wanted to write this theoretical topic.

How parallel rebuild works

Parallel rebuild needs some free space to work. It acts like spare capacity. When you create a RAID6 volume, a disk is often kept as a spare in case of failure. In Storage Spaces (Direct), instead of a spare disk, we have spare free space. Parallel rebuild occurs when a disk fails. If enough capacity is available, parallel rebuild runs automatically and immediately to restore the resiliency of the volumes. In practice, Storage Spaces Direct creates a new copy of the data that was hosted on the failed disk.

When you receive the new disk (4 hours later, because you took a 4-hour support contract :p), you can replace the failed disk. The disk is automatically added to the storage pool if the auto pool option is enabled. Once the disk is added to the storage pool, an automatic rebalance process runs to spread data across all disks for the best efficiency.

How to calculate the amount of free space

Microsoft recommends leaving free space equal to one capacity drive per node, up to four drives:

  • 2-node configuration: leave the capacity of 2 capacity devices free
  • 3-node configuration: leave the capacity of 3 capacity devices free
  • 4-node or larger configuration: leave the capacity of 4 capacity devices free
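The rule above can be written as a small calculation. This is only a minimal Python sketch of the recommendation as described here; the function name and parameters are hypothetical and are not part of any Microsoft tool or API.

```python
def reserve_capacity_tb(nodes: int, capacity_drive_tb: float,
                        max_reserved_drives: int = 4) -> float:
    """Free space to leave in the pool for parallel rebuild, in TB.

    Assumption: one capacity drive per node, capped at
    max_reserved_drives (4 in the recommendation quoted above).
    """
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    reserved_drives = min(nodes, max_reserved_drives)
    return reserved_drives * capacity_drive_tb


if __name__ == "__main__":
    # Example with 2TB capacity drives, as used later in this topic.
    for nodes in (2, 3, 4, 5):
        print(f"{nodes} nodes: reserve {reserve_capacity_tb(nodes, 2.0)} TB")
```

The cap is the important part: beyond four nodes, the reserve no longer grows with the cluster.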

Let’s take a 4-node S2D cluster where each node has the following storage configuration. I plan to deploy 3-Way Mirroring:

  • 3x SSD of 800GB (Cache)
  • 6x HDD of 2TB (Capacity). Total across the 4 nodes: 48TB of raw storage.

Because I deploy a 4-node configuration, I should leave free space equivalent to four capacity drives. So, in this example, 8TB should be reserved for parallel rebuild, which leaves 40TB available. Because I want to implement 3-Way Mirroring, I divide the available capacity by 3. So 13.3TB is the usable storage.

Now I choose to add a node to this cluster. I don’t need to reserve more space for parallel rebuild, because the Microsoft recommendation caps the reserve at four capacity drives. So I add 12TB of capacity (6x HDD of 2TB) to the available capacity, for a total of 52TB, or about 17.3TB usable with 3-Way Mirroring.
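The arithmetic of this example can be checked with a short Python sketch. It is a simplification under stated assumptions: all names are hypothetical, cache drives are ignored because they do not add capacity, and pool metadata and TB/TiB differences are not taken into account.

```python
from typing import NamedTuple


class Capacity(NamedTuple):
    raw_tb: float        # sum of all capacity drives
    reserve_tb: float    # free space kept for parallel rebuild
    available_tb: float  # raw minus reserve
    usable_tb: float     # available divided by the number of data copies


def plan_capacity(nodes: int, drives_per_node: int, drive_tb: float,
                  copies: int = 3, max_reserved_drives: int = 4) -> Capacity:
    """Estimate usable capacity for a mirrored volume layout.

    Assumptions: identical nodes, reserve of one capacity drive per node
    capped at max_reserved_drives, copies=3 for 3-Way Mirroring.
    """
    raw = nodes * drives_per_node * drive_tb
    reserve = min(nodes, max_reserved_drives) * drive_tb
    available = raw - reserve
    return Capacity(raw, reserve, available, available / copies)


if __name__ == "__main__":
    for nodes in (4, 5):
        c = plan_capacity(nodes, drives_per_node=6, drive_tb=2.0)
        print(f"{nodes} nodes: raw={c.raw_tb}TB reserve={c.reserve_tb}TB "
              f"available={c.available_tb}TB usable={c.usable_tb:.1f}TB")
```

With the values from this topic, the 4-node cluster gives 40TB available and the 5-node cluster gives 52TB, because the reserve stays at 8TB in both cases.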

Conclusion

Parallel rebuild is an interesting feature because it restores resiliency even if the failed disk has not yet been replaced. But parallel rebuild has a cost in terms of storage usage. Don’t forget the reserved capacity when you are planning your capacity.

About Romain Serre

Romain Serre works in Lyon as a Senior Consultant. He is focused on Microsoft technology, especially Hyper-V, System Center, storage, networking and Cloud OS technology such as Microsoft Azure or Azure Stack. He is an MVP and a Microsoft Certified Solutions Expert (MCSE Server Infrastructure & Private Cloud), and he is certified on Hyper-V and on Microsoft Azure (Implementing a Microsoft Azure Solution).

2 comments

  1. Romain,

    First let me thank you for sharing your knowledge and experience.

    I wonder if you have an idea of the rationale behind this recommendation?

    I generally leave the equivalent of the largest performance drive and the largest capacity drive, so that in the case of any single-drive failure, after rebalancing I can remove the drive and replace it.

    That is, of course, much less than the MS recommendation. The only downside I see is for multiple-drive failures.

    Or am I missing something else?

    Thanks again,

    Sergio

    • Hey,

      First, this is a recommendation and it is not mandatory. We do that in case of failure. When a disk crashes, if you have free space, the data that was on the disk is replicated again into the free space. So you don’t have to wait for the new disk to arrive to restore resiliency. It’s like a “pro-active” repair process and I find it really genius 🙂
