Deploy a 2-node vSAN cluster

A 2-node hyperconverged cluster is useful in a branch office where you need high availability, or for a small infrastructure. With a 2-node hyperconverged solution, you don’t need a NAS or a SAN for shared storage, so the hardware footprint is reduced and manageability is improved, because a hyperconverged solution is easier to operate than a standard infrastructure with a SAN. VMware provides a Software-Defined Storage solution called vSAN. vSAN can be deployed from 2 nodes up to 64 nodes per cluster; the 2-node configuration is intended for ROBO (Remote Office and Branch Office) scenarios.

A 2-node cluster requires a Witness Appliance, which VMware provides for free as a virtual appliance. The Witness Appliance is based on ESXi; this is the first time VMware supports a nested ESXi in a production scenario. This topic describes how to deploy a 2-node vSAN cluster and its Witness Appliance.

Why you need a witness appliance

vSAN behaves somewhat like RAID over the network. vSAN currently supports RAID 1 and RAID 5/6. When you deploy a 2-node vSAN cluster, only RAID 1 is available. When a VM object such as a VMDK is stored on vSAN, the data is written to one node and replicated to the other (like a classical RAID 1 across two physical disks). So, two components are created: the original data and the replica.

In a vSAN environment, a storage object such as a VMDK needs more than half of its components alive to remain accessible. So, in the above vSAN cluster, if a node goes down, you lose half of the VMDK components and the VMDK is no longer accessible. Not really a resilient solution :).

To solve this issue, VMware has introduced the vSAN Witness Appliance. Thanks to this appliance, in addition to these two components, a witness component is created. So even if you lose a node or the witness appliance, more than half of the components remain available.
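To make the quorum rule concrete, here is a tiny sketch I wrote for illustration (real vSAN counts per-component votes, but the principle is the same): an object stays accessible only while more than half of its components are reachable.

```python
# Simplified illustration of the vSAN quorum rule (not VMware code):
# an object is accessible only while more than half of its components are up.
def object_accessible(total_components: int, alive_components: int) -> bool:
    return alive_components > total_components / 2

# Without a witness: a VMDK has 2 components, losing one node leaves 1 of 2.
print(object_accessible(2, 1))  # False -> the VMDK becomes inaccessible

# With the witness component: losing one node still leaves 2 of 3.
print(object_accessible(3, 2))  # True -> the VMDK stays accessible
```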

The Witness Appliance must not be hosted in the 2-node vSAN cluster itself; that is not supported by VMware. You can deploy a third ESXi host and run the Witness Appliance on it, but the appliance must have access to the vSAN network.

The Witness Appliance is provided by VMware as an OVA file. It is free and comes with a special license, so it is really easy to deploy.

Requirements

To deploy this infrastructure, you need two nodes (physical or virtual), each with at least one storage device for the cache tier and one for the capacity tier. If you deploy an all-flash solution, a 10Gb/s network is recommended for the vSAN traffic. On my side, I have deployed the 2-node vSAN with the following hardware in each node:

  • 1x Asrock D1520D4i (Xeon D-1520) (NIC: 2x 1GbE Intel i210 for VM and management traffic)
  • 4x 16GB DDR4 ECC Unregistered
  • 1x Intel NVMe 600T 128GB (Operating System)
  • 1x Intel S3610 400GB (Cache)
  • 1x Samsung SM863 480GB (Capacity)
  • 1x Intel X520-DA2 for the vSAN and vMotion traffic

Both nodes are already in a cluster and connected to a Synology NAS; currently, all the VMs are stored on the Synology NAS. The two nodes are directly connected to each other through the 10Gb adapters.

The storage adapter provided by the D1520D4i motherboard is not on the vSAN HCL. I strongly recommend checking the HCL before buying hardware for production.

To compute the memory needed by vSAN, you can use this formula provided by VMware:

BaseConsumption + (NumDiskGroups x ( DiskGroupBaseConsumption + (SSDMemOverheadPerGB x SSDSize)))

  • BaseConsumption: This is the fixed amount of memory consumed by vSAN per ESXi host. This is currently 3 GB. This memory is mostly used to house the vSAN directory, per host metadata, and memory caches.
  • NumDiskGroups: This is the number of disk groups in the host; it ranges from 1 to 5.
  • DiskGroupBaseConsumption: This is the fixed amount of memory consumed by each individual disk group in the host. This is currently 500 MB. This is mainly used to allocate resources used to support inflight operations on a per disk group level.
  • SSDMemOverheadPerGB: This is the fixed amount of memory we allocate for each GB of SSD capacity. This is currently 2 MB in hybrid systems and is 7 MB for all flash systems. Most of this memory is used for keeping track of blocks in the SSD used for write buffer and read cache.
  • SSDSize: The size of the cache SSD in GB.

So, in my case:

3 GB + (1 x (0.5 GB + (0.007 GB x 400 GB))) = 6.3 GB

My node requires at least 6.3 GB of free memory for vSAN.
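If you want to script this sizing, here is a small Python helper implementing the formula above (my own sketch, values in GB; with one disk group, a 400 GB all-flash cache device and the 7 MB/GB overhead, it reproduces the 6.3 GB result):

```python
# Sketch of the vSAN memory formula above (all values in GB).
def vsan_memory_gb(num_disk_groups: int, cache_ssd_gb: float, all_flash: bool = True,
                   base_gb: float = 3.0, disk_group_base_gb: float = 0.5) -> float:
    ssd_overhead_per_gb = 0.007 if all_flash else 0.002  # 7 MB (all flash) / 2 MB (hybrid) per GB
    return base_gb + num_disk_groups * (disk_group_base_gb + ssd_overhead_per_gb * cache_ssd_gb)

print(round(vsan_memory_gb(num_disk_groups=1, cache_ssd_gb=400), 1))  # 6.3
```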

The vSAN Witness Appliance (version 6.2) can be downloaded from VMware as an OVA. In my deployment, I will do something that is not supported: I will place the witness appliance inside the 2-node vSAN cluster itself. This is absolutely not supported in production, so don’t reproduce it in your production environment; deploy the witness appliance on a third ESXi host.


Deploy the vSAN witness appliance

To deploy the witness appliance, open the vSphere Web Client and right-click the cluster or host where you want to host the appliance. Then select Deploy OVF Template.

Next, choose a host or a cluster to run the witness appliance.

In the next screen, you can review the details of the OVF you are deploying. As shown in the screenshot below, the product is VMware Virtual SAN Witness Appliance.

Next, accept the license agreement and click Next.

The template provides three deployment configurations. Choose the one that matches your environment; the description shows the environment supported by each deployment configuration.

Then choose the datastore where you want to store the witness appliance files.

Next, choose the network to connect the witness appliance to.

To finish, specify a root password. Then click Next and run the deployment.

Configure the witness appliance network

Once the witness appliance is deployed, you can start it. Then open a remote console.

When the appliance has started, you can configure the network as on any ESXi node.

So, I configure a static IP, set the name of the appliance, and disable IPv6.

When I have finished the settings, my appliance looks like this:

Add appliance to vCenter

The witness appliance can be added to vCenter like any other ESXi host. Just right-click a datacenter or folder and select Add Host.

Next, provide the connection settings and credentials. When you reach the Assign license screen, select the license related to the witness appliance.

When you have finished the wizard, the witness appliance should be added to vCenter.

Once you have added the witness appliance, navigate to Configure | VMkernel adapters and check that vmk1 has vSAN traffic enabled.
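If you prefer to check this from a script, here is a hedged pyVmomi sketch; the vCenter address, credentials and the witness host name witness.lab.local are placeholders, not values from this deployment. It lists which VMkernel adapters of the witness host are tagged for vSAN traffic, so vmk1 should show up as enabled.

```python
# Sketch: list the VMkernel adapters tagged for vSAN traffic on the witness host.
# Hostnames and credentials below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="MyPassword", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
    witness = next(h for h in view.view if h.name == "witness.lab.local")
    net_config = witness.configManager.virtualNicManager.QueryNetConfig("vsan")
    selected = set(net_config.selectedVnic or [])
    for vnic in net_config.candidateVnic or []:
        state = "vSAN traffic enabled" if vnic.key in selected else "not tagged for vSAN"
        print(vnic.device, "->", state)   # expect: vmk1 -> vSAN traffic enabled
finally:
    Disconnect(si)
```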


Deploy 2-Node vSAN Cluster

Because my two nodes are already in a DRS cluster, I first have to turn off vSphere HA: you can’t enable vSAN in a cluster where vSphere HA is enabled. To turn off vSphere HA, select the cluster and go to Configure | vSphere Availability.
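The same step can be scripted with pyVmomi if you prefer; this is a sketch reusing the si connection from the previous snippet, and the cluster name Cluster-ROBO is a placeholder. The identical call with enabled=True is what re-enables vSphere HA at the end of the configuration.

```python
# Sketch: turn off vSphere HA on the cluster before enabling vSAN.
# "Cluster-ROBO" is a placeholder name; `si` comes from the previous snippet.
from pyVim.task import WaitForTask
from pyVmomi import vim

content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "Cluster-ROBO")

spec = vim.cluster.ConfigSpecEx(dasConfig=vim.cluster.DasConfigInfo(enabled=False))
WaitForTask(cluster.ReconfigureComputeResource_Task(spec=spec, modify=True))
```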


Next, navigate to Virtual SAN and select General, then click Configure.


Then I enable Deduplication and Compression and choose Configure two host Virtual SAN cluster.


Next, the wizard checks that network adapters with vSAN traffic enabled are available.


Then the wizard claims disks for the cache tier and the capacity tier.


Next, choose the witness appliance and click Next.


Next, you should have one disk for the cache tier and another for the capacity tier. Just click Next.


To enable vSAN, just click Finish.
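For reference only, plain vSAN can also be enabled from the core vSphere API, reusing the cluster object from the HA snippet above. This is a hedged sketch: the 2-node (stretched) topology, the witness selection and the Deduplication and Compression settings configured in this wizard are not exposed by this legacy call and require the web client or the vSAN Management SDK.

```python
# Sketch: enable basic vSAN on the cluster through the core vSphere API.
# The 2-node/witness/deduplication settings from the wizard are NOT covered here.
from pyVim.task import WaitForTask
from pyVmomi import vim

vsan_spec = vim.vsan.cluster.ConfigInfo(enabled=True)
WaitForTask(cluster.ReconfigureComputeResource_Task(
    spec=vim.cluster.ConfigSpecEx(vsanConfig=vsan_spec), modify=True))
```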


When vSAN is enabled successfully, you should see three hosts (the two nodes and the witness appliance) and at least three disk groups.


In Fault Domains & Stretched Cluster, you should see something like the screenshot below. The witness host should be enabled. You can see that the 2-node configuration is handled in the same way as a stretched cluster.

Now you can enable vSphere HA again, as shown below.

After moving a virtual machine to vSAN, you can see the configuration below. The VMDK has two components and a witness. Even if I lose one of the components or the witness, the VMDK remains accessible.

Final configuration

In this section, you can find some recommendations provided by VMware for vSAN, mostly related to the vSphere Availability settings of the cluster. First, I change the heartbeat datastores setting to Use datastores only from the specified list and select no datastore. This is a VMware recommendation for vSAN when the vSAN nodes are also connected to another VMFS or NFS datastore: heartbeat datastores are disabled so that only the network heartbeat is used. If you leave heartbeat datastores enabled and the network fails, vSphere HA will not restart the VMs on another node. So keep this setting enabled only if you don’t want VMs to be restarted on another node in case of a network failure.

To avoid the warning raised because the datastore heartbeat is disabled (The number of vSphere HA heartbeat datastores for this host is 0, which is less than required: 2), you can add the following line in the advanced options:

das.ignoreInsufficientHbDatastore = true

For the vSAN configuration, VMware recommends enabling Host Monitoring and changing the response for host isolation to Power off and restart VMs. Thanks to Host Monitoring, the network is used for heartbeating to determine the state of a host. The Datastore with PDL (Permanent Device Loss) and Datastore with APD (All Paths Down) responses should be set to Disabled (for further information, read the VMware documentation). To finish, configure VM Monitoring as you wish.
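These HA settings can also be applied with pyVmomi; here is a hedged sketch reusing the cluster object from the earlier snippet (the heartbeat datastore selection itself is easier to change in the web client). It enables Host Monitoring, sets the isolation response to Power off and restart VMs, and adds the advanced option mentioned above.

```python
# Sketch: apply the vSphere HA recommendations described above on the cluster.
from pyVim.task import WaitForTask
from pyVmomi import vim

das_config = vim.cluster.DasConfigInfo(
    enabled=True,                       # vSphere HA on
    hostMonitoring="enabled",           # use network heartbeating
    defaultVmSettings=vim.cluster.DasVmSettings(isolationResponse="powerOff"),
    option=[vim.option.OptionValue(key="das.ignoreInsufficientHbDatastore",
                                   value="true")],
)
WaitForTask(cluster.ReconfigureComputeResource_Task(
    spec=vim.cluster.ConfigSpecEx(dasConfig=das_config), modify=True))
```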

Conclusion

VMware vSAN provides an easy way to get highly available VM storage in a branch office. Compared with Microsoft Storage Spaces Direct, the 2-node vSAN cluster is more complex to deploy because of the Witness Appliance, which requires a third ESXi host in the same site or in another datacenter. With Storage Spaces Direct, I can use a simple file share or Microsoft Azure as a witness. Apart from this issue, vSAN is a great solution for your hyperconverged infrastructure.

About Romain Serre

Romain Serre works in Lyon as a Senior Consultant. He is focused on Microsoft technologies, especially Hyper-V, System Center, storage, networking and Cloud OS technologies such as Microsoft Azure or Azure Stack. He is an MVP and a Microsoft Certified Solutions Expert (MCSE Server Infrastructure & Private Cloud), certified on Hyper-V and on Microsoft Azure (Implementing a Microsoft Azure Solution).

15 comments

  1. Very awesome write up. The beginning screen shots were very helpful in understanding the layout of the 2 node cluster. Also I found this article detailing how to configure the direct connect networking to be very helpful. It shows how to tag an interface with witness traffic. https://blogs.vmware.com/virtualblocks/2016/10/18/2nodedirectconnect/

    1 question. If the 2 node acts as a RAID 1, does that mean I will only get 1/2 of my data storage capacity because it is mirrored? I set it up like you did above and my datastore shows the full amount of all disks.

  2. Do you still use the hardware RAID beneath the vSAN RAID?

  3. What I’m wondering about is what licensing is required for a stand-alone 2 node vSAN? Will it work with vSphere essentials plus along with a ROBO license?

  4. Hi Romain,
    On your final configuration, VMware recommends that because you disable HA Heartbeat datastore that you should add additional vSAN isolation addresses from the vSAN network.
    However, with a 2 node direct-connect you do not have any additional vSAN IP addresses available. What would be your recommendation on this?

  5. Thanks for this — was able to follow the steps you provided to setup our 2 node cluster.
    Just a couple of questions:
    – On our 2 node vSAN cluster, if I take 1 host out and select “Full Evacuation” maintenance mode, it errors out and does not allow you to evacuate all data.
    – If above fails, how would one upgrade the ESXi host to the next version in the future (assuming a full install instead)?

    Thanks…

    • Hi,

      A full evacuation is not mandatory. In a 2-node configuration, each object has 3 components:
      – the original data
      – the replicated data
      – a witness

      The witness is hosted on the Witness ESXi appliance. The original data is on one node while the replicated data is on the other.

      As long as two of these components are online, the object is accessible. So you can stop a node or stop the witness and your VMs keep running.

      With a full evacuation, you migrate everything from one node to the other. This option is useful when you plan to replace the host permanently. For an ESXi update, select Ensure data accessibility from other hosts.

  6. Thanks for the excellent article Romain. One question: if I deploy the witness appliance in a 2-host configuration, say hostA and hostB, and my witness appliance is on hostB, and hostB goes down and so does the witness.

    How will HA work in this scenario?
