S2D Real case: detect a lack of cache

Posted by: Romain Serre in Storage, December 4, 2018

Last week I worked for a customer who was experiencing a performance issue on an S2D cluster. The customer’s infrastructure is composed of one compute cluster (Hyper-V) and one 4-node S2D cluster. First, I checked whether the issue was related to the network, then whether a hardware failure was causing the performance drop. Then I ran the watch-cluster.ps1 script from VMFleet.

The following screenshot comes from the watch-cluster.ps1 script. As you can see, one CSV has almost 25 ms of latency. High latency hurts overall performance, especially when IO-intensive applications are hosted. If we look at the cache, many misses per second are recorded, especially on the high-latency CSV. But why can Miss/sec produce high latency?
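The pattern in the screenshot (high latency and a high miss rate on the same CSV) can be expressed as a simple check. The sketch below is only an illustration: the `CsvSample` fields, the `flag_cache_pressure` helper and both thresholds are hypothetical, they are not the column names or logic of watch-cluster.ps1, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass
class CsvSample:
    """One observation per CSV. Field names are hypothetical, not watch-cluster.ps1 output."""
    name: str
    latency_ms: float
    cache_hits_per_sec: float
    cache_misses_per_sec: float

    @property
    def miss_ratio(self) -> float:
        # Fraction of cache lookups that were misses (0.0 when there is no IO).
        total = self.cache_hits_per_sec + self.cache_misses_per_sec
        return self.cache_misses_per_sec / total if total else 0.0


def flag_cache_pressure(samples, latency_threshold_ms=10.0, miss_ratio_threshold=0.3):
    """Return the CSVs where both latency and miss ratio exceed (hypothetical) thresholds."""
    return [
        s.name
        for s in samples
        if s.latency_ms >= latency_threshold_ms and s.miss_ratio >= miss_ratio_threshold
    ]


# Illustrative values only: one CSV with high latency and many misses, one healthy CSV.
samples = [
    CsvSample("CSV-01", latency_ms=24.8, cache_hits_per_sec=2000, cache_misses_per_sec=3000),
    CsvSample("CSV-02", latency_ms=2.1, cache_hits_per_sec=9000, cache_misses_per_sec=500),
]
print(flag_cache_pressure(samples))  # ['CSV-01']
```

Requiring both conditions avoids flagging a CSV that is slow for another reason (network, failing drive), which matches the order of checks described above.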

What happens in case of lack of cache?

The solution I troubleshot is composed of 2 SSDs and 8 HDDs per node. The cache ratio is 1:4 and the cache capacity is almost 6.5% of the raw capacity. The IO path in normal operation is depicted in the following diagram:
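The cache ratio comes straight from the drive counts, while the cache share depends on the drive sizes, which the post does not give. The sketch below assumes hypothetical sizes (1.6 TB SSDs, 6 TB HDDs) and assumes "raw capacity" means the sum of all drives (cache plus capacity); the resulting percentage is therefore illustrative and not the customer's actual figure.

```python
from fractions import Fraction


def cache_ratio(ssd_per_node: int, hdd_per_node: int) -> str:
    """Cache-to-capacity drive ratio, e.g. 2 SSD : 8 HDD -> '1:4'."""
    r = Fraction(ssd_per_node, hdd_per_node)
    return f"{r.numerator}:{r.denominator}"


def cache_share_of_raw(ssd_per_node: int, ssd_tb: float, hdd_per_node: int, hdd_tb: float) -> float:
    """Cache capacity as a percentage of raw capacity (assumed here to be SSD + HDD)."""
    cache_tb = ssd_per_node * ssd_tb
    raw_tb = cache_tb + hdd_per_node * hdd_tb
    return 100 * cache_tb / raw_tb


# 2 SSDs and 8 HDDs per node, with hypothetical drive sizes.
print(cache_ratio(2, 8))                             # 1:4
print(round(cache_share_of_raw(2, 1.6, 8, 6.0), 2))  # 6.25
```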

In the current situation, there are a lot of Miss/sec, which means that the SSDs cannot handle these IOs because there is not enough cache. The following diagram depicts the IO path for a cache miss:

In case of a miss, the IO goes directly to the HDDs without being cached on the SSDs. HDDs are much slower than SSDs, so each IO served directly by an HDD adds latency, and as latency increases, overall performance decreases.
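A simplified way to see why misses drive latency up is to treat average latency as a weighted mix of SSD and HDD latency. This is a rough model, not how S2D measures latency: it ignores queueing, write-back destaging and read/write differences, and the 0.5 ms and 10 ms device latencies are hypothetical round numbers.

```python
def effective_latency_ms(hit_ratio: float, ssd_latency_ms: float, hdd_latency_ms: float) -> float:
    """Average latency if hits are served by the SSD cache and misses by HDD (simplified model)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * ssd_latency_ms + (1.0 - hit_ratio) * hdd_latency_ms


# Hypothetical device latencies: SSD 0.5 ms, HDD 10 ms.
print(round(effective_latency_ms(0.95, 0.5, 10.0), 3))  # 0.975
print(round(effective_latency_ms(0.50, 0.5, 10.0), 3))  # 5.25
```

Even with these rough numbers, dropping from a 95% to a 50% hit ratio multiplies average latency by more than five, because almost all of the time is spent waiting on the HDDs.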

How to resolve that?

To resolve this issue, I advised the customer to add two SSDs to each node. These SSDs should be equivalent (or nearly equivalent) to those already installed in the nodes. Adding these SSDs improves the cache ratio to 1:2 and raises the cache capacity to at least 10% of the raw capacity.
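When sizing (or resizing) the cache tier, the question becomes how many SSDs per node are needed to reach a target cache share. The sketch below solves cache / (cache + HDD) >= target for the SSD count. The drive sizes (1.6 TB SSD, 6 TB HDD), the 10% target and the "raw = SSD + HDD" definition are assumptions for illustration; `ssds_needed_per_node` is a hypothetical helper, not part of any S2D tooling.

```python
import math


def ssds_needed_per_node(hdd_per_node: int, hdd_tb: float, ssd_tb: float, target_pct: float) -> int:
    """Smallest SSD count per node so that cache >= target_pct of raw capacity (SSD + HDD)."""
    if not 0 < target_pct < 100:
        raise ValueError("target_pct must be between 0 and 100")
    p = target_pct / 100
    # cache / (cache + hdd) >= p  <=>  cache >= p / (1 - p) * hdd
    required_cache_tb = p / (1 - p) * hdd_per_node * hdd_tb
    # Small epsilon so floating-point noise at an exact boundary does not add an extra drive.
    return math.ceil(required_cache_tb / ssd_tb - 1e-9)


# 8 x 6 TB HDDs per node, 1.6 TB SSDs, target of 10% (all hypothetical).
print(ssds_needed_per_node(8, 6.0, 1.6, 10))  # 4
```

With these assumed sizes, 3 SSDs give about 9.1% and 4 SSDs about 11.8%, so the answer is 4 per node, i.e. adding two SSDs to the existing two.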

It’s really important to size the cache tier carefully when you design your solution to avoid this issue. As a fellow MVP said: storage is cheap, downtime is expensive.

Tech-Coffee
