The cluster resource could not be deleted since it is a core resource

Last month, I wanted to change the witness of a cluster from a Cloud Witness to a File Share Witness. The cluster is a 2-node S2D cluster and, as discussed with Microsoft, a Cloud Witness should not be used with a 2-node cluster. If the Cloud Witness fails (for example when the subscription has expired), the cluster can crash. I have experienced this in production. So, for all my customers, I decided to change the Cloud Witness to a File Share Witness.

Naively, I tried to change the Cloud Witness to a File Share Witness directly. But it doesn’t work 🙂 You get this message: The cluster resource could not be deleted since it is a core resource. In this topic I’ll show you the issue and the resolution.

Issue

As you can see below, my cluster is using a Cloud Witness to add an additional vote.

So, I decide to replace the Cloud Witness with a File Share Witness. I right-click on the cluster | More Actions | Configure Cluster Quorum Settings.

Then I choose Select the quorum witness.

I select Configure a file share witness.

I specify the file share path as usual.

To finish, I click Next and it should replace the Cloud Witness with the File Share Witness.

Actually no: you get the following error message.

If you check the witness state, it is offline.

This issue occurs because cluster core resources cannot be removed. So how do I remove the Cloud Witness?

Resolution

To remove the Cloud Witness, choose again to configure the cluster quorum and this time select Advanced Quorum Configuration.

Then select Do not configure a quorum witness.

Voilà, the Cloud Witness is gone. Now you can add the File Share Witness.

In the below screenshot you can see the configuration to add a file share witness to the cluster.

Now the file share witness is added to the cluster.

Conclusion

If you want to remove a Cloud Witness or a File Share Witness, you first have to configure the quorum with no witness. Then you can add the witness type you want.
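For reference, the same two-step change can be done with PowerShell. The cmdlets below are a minimal sketch and the file share path \\FS01\Witness is only an example:

# Step 1: remove the witness by configuring the quorum without one
Set-ClusterQuorum -NoWitness

# Step 2: configure the File Share Witness (example path, adapt it to your environment)
Set-ClusterQuorum -FileShareWitness \\FS01\Witness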

Cluster Health Service in Windows Server 2016

Before Windows Server 2016, the alerting and the monitoring of a cluster were handled by monitoring tools such as System Center Operations Manager (SCOM). The monitoring tool used WMI, PowerShell scripts, performance counters or whatever else to get the health of the cluster. In Windows Server 2016, Microsoft has added the Health Service to the cluster, which provides metrics and fault information. Currently, the Health Service is enabled only when using Storage Spaces Direct and in no other scenario. When you enable Storage Spaces Direct in the cluster, the Health Service is enabled automatically.

The Health Service aggregates monitoring information (faults and metrics) from all nodes in the cluster. This information is available from a single point and can be consumed with PowerShell or through an API. The Health Service can raise alerts in real time regarding events in the cluster. These alerts contain the severity, the description, the recommended action and the location information related to the fault domain. The Health Service raises alerts for several faults, as you can see below:

The rollup monitors can help to find the root cause of a fault. For example, in order for the server monitor to be healthy, all underlying monitors must also be healthy. If an underlying monitor is not healthy, the parent monitor shows an alert.

In the above example, a drive is down in a node. So, the Health Service raises an alert for the drive and the parent node monitor is in error state.

In the next version, the Health Service will be smarter. The cluster monitor will be “only” in a warning state because the cluster still has enough nodes to run the service and, after all, a single drive down is not a severe issue for the service. This feature should be called severity masking.

The Health Service is also able to gather metrics about the cluster such as IOPS, capacity, CPU usage and so on.

Use Cluster Health Service

Show Metrics

To show the metrics gathered by the Health Service, run the cmdlet Get-StorageHealthReport as below:

Get-StorageSubSystem *Cluster* | Get-StorageHealthReport

As you can see, you have consolidated information such as the available memory, the IOPS, the capacity, the average CPU usage and so on. We can imagine a tool that gathers this information from the API several times per minute to show charts or pies.
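As a tiny illustration (a sketch, not a real monitoring tool), a loop like the following would poll the report every 30 seconds; the interval is arbitrary:

# Poll the Health Service report every 30 seconds (example interval)
while ($true) {
    Get-StorageSubSystem *Cluster* | Get-StorageHealthReport
    Start-Sleep -Seconds 30
}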

Show Alerts

To get current alerts in the cluster, run the following cmdlet:

Get-StorageSubSystem *Cluster* | Debug-StorageSubSystem

To show you a screenshot, I run the cmdlet against my lab Storage Spaces Direct cluster, which does not follow best practices. The following alert is raised because I don’t have enough reserve capacity:

Then I stop a node in my cluster:

I have several issues in my cluster! The Health Service has detected that the node is down and that some cables are disconnected. This is because my Mellanox adapters are direct-attached to the other node.

SCOM Dashboard

This dashboard is not yet available at the time of writing, but in the future Microsoft should release the below SCOM dashboard which leverages the Cluster Health Service.

Another example: DataOn Must

DataOn is a company that provides hardware which is compliant with Storage Spaces (Direct). DataOn has also released dashboards called DataOn Must which are based on the Health Service. DataOn Must is currently only available when you buy DataOn hardware. Thanks to the Health Service API, we can have fancy and readable charts and pies about the health of the Storage Spaces Direct cluster.

I would like to thank Cosmos Darwin for the review of this topic and for having left me the opportunity to talk about severity masking.

2-node hyperconverged cluster with Windows Server 2016

Last week, Microsoft announced the final release of Windows Server 2016 (the bits can be downloaded here). In addition, Microsoft announced that Windows Server 2016 now supports a 2-node hyperconverged cluster configuration. I can now publish the setup of my lab configuration, which is almost a production platform. Only the SSDs are not enterprise grade and one Xeon is missing per server, but to show you how easy it is to implement a hyperconverged solution, it is fine. In this topic, I will show you how to deploy a 2-node hyperconverged cluster from the beginning with Windows Server 2016. But before running some PowerShell cmdlets, let’s take a look at the design.

Design overview

In this part I’ll talk about the implemented hardware and how both nodes are connected. Then I’ll introduce the network design and the required software implementation.


Hardware consideration

First of all, it is necessary to present the design. I have bought two nodes that I have built myself; they are not provided by a manufacturer. Below you can find the hardware that I have implemented in each node:

  • CPU: Xeon 2620v2
  • Motherboard: Asus Z9PA-U8 with ASMB6-iKVM for KVM-over-Internet (Baseboard Management Controller)
  • PSU: Fortron 350W FSP FSP350-60GHC
  • Case: Dexlan 4U IPC-E450
  • RAM: 128GB DDR3 registered ECC
  • Storage devices:
    • 1x Intel SSD 530 128GB for the Operating System
    • 1x Samsung NVMe SSD 950 Pro 256GB (Storage Spaces Direct cache)
    • 4x Samsung SATA SSD 850 EVO 500GB (Storage Spaces Direct capacity)
  • Network Adapters:
    • 1x Intel 82574L 1GB for VM workloads (two controllers). Integrated to motherboard
    • 1x Mellanox Connectx3-Pro 10GB for storage and live-migration workloads (two controllers). Mellanox are connected with two passive copper cables with SFP provided by Mellanox
  • 1x Switch Ubiquiti ES-24-Lite 1GB

If I were in production, I’d replace the SSDs with enterprise-grade SSDs and I’d add an NVMe SSD for the caching. To finish, I’d buy servers with two Xeons. Below you can find the hardware implementation.

Network design

To support this configuration, I have created five network subnets:

  • Management network: 10.10.0.0/24 – VID 10 (Native VLAN). This network is used for Active Directory, management through RDS or PowerShell and so on. Fabric VMs will also be connected to this subnet.
  • DMZ network: 10.10.10.0/24 – VID 11. This network is used by DMZ VMs such as web servers, AD FS etc.
  • Cluster network: 10.10.100.0/24 – VID 100. This is the cluster heartbeat network.
  • Storage01 network: 10.10.101.0/24 – VID 101. This is the first storage network. It is used for SMB 3.1.1 traffic and for Live-Migration.
  • Storage02 network: 10.10.102.0/24 – VID 102. This is the second storage network. It is used for SMB 3.1.1 traffic and for Live-Migration.

I can’t leverage Simplified SMB MultiChannel because I don’t have a 10GB switch, so each 10GB controller must belong to a separate subnet.

I will deploy a Switch Embedded Teaming for the 1GB network adapters. I will not implement a Switch Embedded Teaming for the 10GB adapters because there is no 10GB switch.

Logical design

I will have two nodes called pyhyv01 and pyhyv02 (Physical Hyper-V).

The first challenge concerns the failover cluster. Because I have no other physical server, the domain controllers will be virtual. If I place the domain controller VMs in the cluster, how can the cluster start? So the DC VMs must not be in the cluster and must be stored locally. To support high availability, each node will host a domain controller locally in the system volume (C:\). In this way, the node boots, the DC VM starts and then the failover cluster can start.

Both nodes are deployed in Core mode because I really don’t like graphical user interfaces for hypervisors. I don’t deploy Nano Server because I don’t like the Current Branch for Business model for Hyper-V and storage usage. The following features will be deployed on both nodes:

  • Hyper-V + PowerShell management tools
  • Failover Cluster + PowerShell management tools
  • Storage Replica (this is optional, only if you need the storage replica feature)

The storage configuration will be easy: I’ll create a unique Storage Pool with all the SATA and NVMe SSDs. Then I will create two Cluster Shared Volumes that will be distributed across both nodes. The CSVs will be called CSV-01 and CSV-02.

Operating system configuration

I show how to configure a single node. You have to repeat these operations on the second node in the same way. This is why I recommend you put the commands in a script: the script will help to avoid human errors.

Bios configuration

The BIOS settings may differ depending on the manufacturer and the motherboard, but I always do the same things on each server:

  • Check that the server boots in UEFI
  • Enable virtualization technologies such as VT-d, VT-x, SLAT and so on
  • Configure the server for high performance (so that the CPUs run at the maximum available frequency)
  • Enable HyperThreading
  • Disable all unwanted hardware (audio card, serial/COM port and so on)
  • Disable PXE boot on unwanted network adapters to speed up the boot of the server
  • Set the date/time

Next I check that the memory is seen and that all storage devices are detected. When I have time, I run a memtest on the server to validate the hardware.

OS first settings

I have deployed my nodes from a USB stick prepared with Easy2Boot. Once the system is installed, I have deployed the drivers for the motherboard and for the Mellanox network adapters. Because I can’t connect to Device Manager with a remote MMC, I use the following commands to check whether the drivers are installed:

# List installed system drivers with their file version
gwmi Win32_SystemDriver | select name,@{n="version";e={(gi $_.pathname).VersionInfo.FileVersion}}

# List Plug and Play signed drivers (device name and driver version)
gwmi Win32_PnPSignedDriver | select devicename,driverversion

After all drivers are installed, I configure the server name, the updates, the remote connection and so on. For this, I use sconfig.

This tool is easy, but it doesn’t provide automation. You can do the same thing with PowerShell cmdlets, but I have only two nodes to deploy and I find this easier. All you have to do is move through the menus and set the parameters. Here I have changed the computer name, enabled Remote Desktop and downloaded and installed all updates. I strongly recommend you install all updates before deploying Storage Spaces Direct.

Then I set the power plan to high performance by using the below cmdlet:

POWERCFG.EXE /S SCHEME_MIN

Once the configuration is finished, you can install the required roles and features. You can run the following cmdlet on both nodes:

Install-WindowsFeature Hyper-V, Data-Center-Bridging, Failover-Clustering, RSAT-Clustering-Powershell, Hyper-V-PowerShell, Storage-Replica

Once you have run this cmdlet the following roles and features are deployed:

  • Hyper-V + PowerShell module
  • Datacenter Bridging
  • Failover Clustering + PowerShell module
  • Storage Replica

Network settings

Once the OS configuration is finished, you can configure the network. First, I rename network adapters as below:

get-netadapter |? Name -notlike vEthernet* |? InterfaceDescription -like Mellanox*#2 | Rename-NetAdapter -NewName Storage-101

get-netadapter |? Name -notlike vEthernet* |? InterfaceDescription -like Mellanox*Adapter | Rename-NetAdapter -NewName Storage-102

get-netadapter |? Name -notlike vEthernet* |? InterfaceDescription -like Intel*#2 | Rename-NetAdapter -NewName Management01-0

get-netadapter |? Name -notlike vEthernet* |? InterfaceDescription -like Intel*Connection | Rename-NetAdapter -NewName Management02-0

Next I create the Switch Embedded Teaming, called SW-1G, with both 1GB network adapters:

New-VMSwitch -Name SW-1G -NetAdapterName Management01-0, Management02-0 -EnableEmbeddedTeaming $True -AllowManagementOS $False

Now we can create two virtual network adapters for the management and the heartbeat:

Add-VMNetworkAdapter -SwitchName SW-1G -ManagementOS -Name Management-0
Add-VMNetworkAdapter -SwitchName SW-1G -ManagementOS -Name Cluster-100

Then I configure the VLANs on the vNICs and on the storage NICs:

Set-VMNetworkAdapterVLAN -ManagementOS -VMNetworkAdapterName Cluster-100 -Access -VlanId 100
Set-NetAdapter -Name Storage-101 -VlanID 101 -Confirm:$False
Set-NetAdapter -Name Storage-102 -VlanID 102 -Confirm:$False

The below screenshot shows the VLAN configuration on the physical and virtual adapters.
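If you prefer to check this from PowerShell, something like the following should display the same VLAN settings (standard Hyper-V and NetAdapter cmdlets):

# VLAN configuration of the management OS vNICs
Get-VMNetworkAdapterVlan -ManagementOS

# VLAN ID of the physical storage adapters
Get-NetAdapter -Name Storage-* | Format-Table Name, VlanID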

Next I disable VM Queue (VMQ) on the 1GB network adapters and configure it on the 10GB network adapters. When I set the VMQ, I use multiples of 2 because Hyper-Threading is enabled. I start with a base processor number of 2 because it is recommended to leave the first core (core 0) for other processes.

Disable-NetAdapterVMQ -Name Management*

# Logical processors 2 to 4 will be used for network traffic on Storage-101
Set-NetAdapterRSS Storage-101 -BaseProcessorNumber 2 -MaxProcessors 2 -MaxProcessorNumber 4

# Logical processors 6 to 8 will be used for network traffic on Storage-102
Set-NetAdapterRSS Storage-102 -BaseProcessorNumber 6 -MaxProcessors 2 -MaxProcessorNumber 8
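To verify the settings, you can display the VMQ and RSS configuration of the adapters, for example:

# RSS settings of the storage adapters
Get-NetAdapterRSS -Name Storage-*

# VMQ should now be disabled on the 1GB adapters
Get-NetAdapterVmq -Name Management*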


Next I configure Jumbo Frame on each network adapter.

Get-NetAdapterAdvancedProperty -Name * -RegistryKeyword "*jumbopacket" | Set-NetAdapterAdvancedProperty -RegistryValue 9014

Now we can enable RDMA on storage NICs:

Get-NetAdapter *Storage* | Enable-NetAdapterRDMA

The below screenshot is the result of Get-NetAdapterRDMA.
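If you want to run the check yourself, the cmdlet referenced above can be scoped to the storage adapters:

Get-NetAdapterRDMA -Name Storage-*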

Even though it is not strictly useful here, because I have no 10GB switch and no other connections on the 10GB network adapters, I configure DCB:

# Turn on DCB
Install-WindowsFeature Data-Center-Bridging

# Set a policy for SMB-Direct
New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

# Turn on Flow Control for SMB
Enable-NetQosFlowControl -Priority 3

# Make sure flow control is off for other traffic
Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7

# Apply policy to the target adapters
Enable-NetAdapterQos -InterfaceAlias "Storage-101"
Enable-NetAdapterQos -InterfaceAlias "Storage-102"

# Give SMB Direct 30% of the bandwidth minimum
New-NetQosTrafficClass "SMB" -Priority 3 -BandwidthPercentage 30 -Algorithm ETS

OK, now that the network adapters are configured, we can set the IP addresses and test communication on the network.

New-NetIPAddress -InterfaceAlias "vEthernet (Management-0)" -IPAddress 10.10.0.5 -PrefixLength 24 -DefaultGateway 10.10.0.1 -Type Unicast | Out-Null
Set-DnsClientServerAddress -InterfaceAlias "vEthernet (Management-0)" -ServerAddresses 10.10.0.20 | Out-Null

New-NetIPAddress -InterfaceAlias "vEthernet (Cluster-100)" -IPAddress 10.10.100.5 -PrefixLength 24 -Type Unicast | Out-Null

New-NetIPAddress -InterfaceAlias "Storage-101" -IPAddress 10.10.101.5 -PrefixLength 24 -Type Unicast | Out-Null

New-NetIPAddress -InterfaceAlias "Storage-102" -IPAddress 10.10.102.5 -PrefixLength 24 -Type Unicast | Out-Null

#Disable DNS registration of Storage and Cluster network adapter (Thanks to Philip Elder :))

Set-DNSClient -InterfaceAlias Storage* -RegisterThisConnectionsAddress $False
Set-DNSClient -InterfaceAlias *Cluster* -RegisterThisConnectionsAddress $False

Then I test Jumbo Frames: it works.
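A simple way to test it is a ping with the don’t-fragment flag and a large payload. I assume here that 10.10.101.6 is the Storage-101 address of the second node; adapt the address to your own plan:

# 8000-byte payload with the don't-fragment flag: succeeds only if Jumbo Frames work end to end
ping 10.10.101.6 -f -l 8000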

Now my nodes can communicate with each other through the network. Once you have reproduced these steps on the second node, we can deploy the domain controllers.

Connect to Hyper-V remotely

For the next steps, I work from my laptop with remote PowerShell. To display the Hyper-V VM consoles, I have installed RSAT and the Hyper-V console on my Windows 10 laptop.

Before being able to connect to Hyper-V remotely, some configuration is required on both the server and the client side. On both nodes, run the following cmdlet:

Enable-WSManCredSSP -Role server

On your laptop, run the following cmdlets (replace fqdn-of-hyper-v-host with the FQDN of each future Hyper-V host):

# -Concatenate appends to the TrustedHosts list instead of overwriting the previous value
Set-Item WSMan:\localhost\Client\TrustedHosts -Value "10.10.0.5"
Set-Item WSMan:\localhost\Client\TrustedHosts -Value "fqdn-of-hyper-v-host" -Concatenate
Set-Item WSMan:\localhost\Client\TrustedHosts -Value "10.10.0.6" -Concatenate
Set-Item WSMan:\localhost\Client\TrustedHosts -Value "fqdn-of-hyper-v-host" -Concatenate

Enable-WSManCredSSP -Role client -DelegateComputer "10.10.0.5"
Enable-WSManCredSSP -Role client -DelegateComputer "fqdn-of-hyper-v-host"
Enable-WSManCredSSP -Role client -DelegateComputer "10.10.0.6"
Enable-WSManCredSSP -Role client -DelegateComputer "fqdn-of-hyper-v-host"

Then, run gpedit.msc and configure the following policy:

Now you can leverage the new Hyper-V Manager capability which enables you to use alternate credentials to connect to Hyper-V.

Domain controller deployment

Before deploying the VMs, I copied the Windows Server 2016 ISO to C:\temp on both nodes. Then I ran the following script from my laptop:

# Create the first DC VM
Enter-PSSession -ComputerName 10.10.0.5 -Credential pyhyv01\administrator

$VMName = "VMADS01"
# Create Gen 2 VM with dynamic memory, autostart action to 0s and auto stop action set. 2vCPU
New-VM -Generation 2 -Name $VMName -SwitchName SW-1G -NoVHD -MemoryStartupBytes 2048MB -Path C:\VirtualMachines
Set-VM -Name $VMName -ProcessorCount 2 -DynamicMemory -MemoryMinimumBytes 1024MB -MemoryMaximumBytes 4096MB -MemoryStartupBytes 2048MB -AutomaticStartAction Start -AutomaticStopAction ShutDown -AutomaticStartDelay 0 -AutomaticCriticalErrorAction None -CheckpointType Production

# Create and add a 60GB dynamic VHDX to the VM
New-VHD -Path C:\VirtualMachines\$VMName\W2016-STD-1.0.vhdx -SizeBytes 60GB -Dynamic
Add-VMHardDiskDrive -VMName $VMName -Path C:\VirtualMachines\$VMName\W2016-STD-1.0.vhdx

# Rename the network adapter
Get-VMNetworkAdapter -VMName $VMName | Rename-VMNetworkAdapter -NewName Management-0

# Add a DVD drive with W2016 ISO
Add-VMDvdDrive -VMName $VMName
Set-VMDvdDrive -VMName $VMName -Path C:\temp\14393.0.160715-1616.RS1_RELEASE_SERVER_EVAL_X64FRE_EN-US.ISO

# Set the DVD drive as first boot
$VD = Get-VMDvdDrive -VMName $VMName -ControllerNumber 0 -ControllerLocation 1
Set-VMFirmware -VMName $VMName -FirstBootDevice $VD

# Add a data disk to the VM (10GB dynamic)
New-VHD -Path C:\VirtualMachines\$VMName\data.vhdx -SizeBytes 10GB -Dynamic
Add-VMHardDiskDrive -VMName $VMName -Path C:\VirtualMachines\$VMName\Data.vhdx

# Start the VM
Start-VM -Name $VMName
Exit

# Create the second DC VM with the same capabilities as above
Enter-PSSession -ComputerName 10.10.0.6 -Credential pyhyv02\administrator
$VMName = "VMADS02"

New-VM -Generation 2 -Name $VMName -SwitchName SW-1G -NoVHD -MemoryStartupBytes 2048MB -Path C:\VirtualMachines

Set-VM -Name $VMName -ProcessorCount 2 -DynamicMemory -MemoryMinimumBytes 1024MB -MemoryMaximumBytes 4096MB -MemoryStartupBytes 2048MB -AutomaticStartAction Start -AutomaticStopAction ShutDown -AutomaticStartDelay 0 -AutomaticCriticalErrorAction None -CheckpointType Production

New-VHD -Path C:\VirtualMachines\$VMName\W2016-STD-1.0.vhdx -SizeBytes 60GB -Dynamic
Add-VMHardDiskDrive -VMName $VMName -Path C:\VirtualMachines\$VMName\W2016-STD-1.0.vhdx
Get-VMNetworkAdapter -VMName $VMName | Rename-VMNetworkAdapter -NewName Management-0
Add-VMDvdDrive -VMName $VMName
Set-VMDvdDrive -VMName $VMName -Path C:\temp\14393.0.160715-1616.RS1_RELEASE_SERVER_EVAL_X64FRE_EN-US.ISO
$VD = Get-VMDvdDrive -VMName $VMName -ControllerNumber 0 -ControllerLocation 1
Set-VMFirmware -VMName $VMName -FirstBootDevice $VD
New-VHD -Path C:\VirtualMachines\$VMName\data.vhdx -SizeBytes 10GB -Dynamic
Add-VMHardDiskDrive -VMName $VMName -Path C:\VirtualMachines\$VMName\Data.vhdx
Start-VM -Name $VMName
Exit

Deploy the first domain controller

Once the VMs are created, you can connect to their consoles from Hyper-V Manager to install the OS. A better way would be to use a sysprepped image, but because this is a from-scratch infrastructure, I don’t have a gold master. By using sconfig, you can install updates and enable Remote Desktop. Once the operating systems are deployed, you can connect to the VMs through PowerShell Direct.

Below you can find the configuration of the first domain controller:

# Remote connection to first node
Enter-PSSession -ComputerName 10.10.0.5 -Credential pyhyv01\administrator

# Establish a PowerShell direct session to VMADS01
Enter-PSSession -VMName VMADS01 -Credential VMADS01\administrator

# Rename network adapter
Rename-NetAdapter -Name Ethernet -NewName Management-0

# Set IP Addresses
New-NetIPAddress -InterfaceAlias "Management-0" -IPAddress 10.10.0.20 -PrefixLength 24 -Type Unicast | Out-Null

# Set the DNS (this IP is my DNS server for internet in my lab)
Set-DnsClientServerAddress -InterfaceAlias "Management-0" -ServerAddresses 10.10.0.229 | Out-Null

# Initialize and mount the data disk
initialize-disk -Number 1
New-Volume -DiskNumber 1 -FileSystem NTFS -FriendlyName Data -DriveLetter E

# Install required feature
install-WindowsFeature AD-Domain-Services, DNS -IncludeManagementTools

# Deploy the forest
Import-Module ADDSDeployment

# "WinThreshold" corresponds to the Windows Server 2016 functional level (the value should soon be named Win2016)
Install-ADDSForest `
    -CreateDnsDelegation:$false `
    -DatabasePath "E:\NTDS" `
    -DomainMode "WinThreshold" `
    -DomainName "int.HomeCloud.net" `
    -DomainNetbiosName "INTHOMECLOUD" `
    -ForestMode "WinThreshold" `
    -InstallDns:$true `
    -LogPath "E:\NTDS" `
    -NoRebootOnCompletion:$false `
    -SysvolPath "E:\SYSVOL" `
    -Force:$true

Promote the second domain controller

Once the first domain controller is deployed and the forest is ready, you can promote the second domain controller:

Enter-PSSession -ComputerName 10.10.0.6 -Credential pyhyv02\administrator

# Establish a PowerShell direct session to VMADS02
Enter-PSSession -VMName VMADS02 -Credential VMADS02\administrator

# Rename network adapter
Rename-NetAdapter -Name Ethernet -NewName Management-0

# Set IP Addresses
New-NetIPAddress -InterfaceAlias "Management-0" -IPAddress 10.10.0.21 -PrefixLength 24 -Type Unicast | Out-Null

# Set the DNS to the first DC
Set-DnsClientServerAddress -InterfaceAlias "Management-0" -ServerAddresses 10.10.0.20 | Out-Null

# Initialize and mount the data disk
initialize-disk -Number 1
New-Volume -DiskNumber 1 -FileSystem NTFS -FriendlyName Data -DriveLetter E

# Install required feature
install-WindowsFeature AD-Domain-Services, DNS -IncludeManagementTools

# Promote the additional domain controller
Import-Module ADDSDeployment
Install-ADDSDomainController `
    -NoGlobalCatalog:$false `
    -CreateDnsDelegation:$false `
    -Credential (Get-Credential) `
    -CriticalReplicationOnly:$false `
    -DatabasePath "E:\NTDS" `
    -DomainName "int.HomeCloud.net" `
    -InstallDns:$true `
    -LogPath "E:\NTDS" `
    -NoRebootOnCompletion:$false `
    -SiteName "Default-First-Site-Name" `
    -SysvolPath "E:\SYSVOL" `
    -Force:$true

Configure the directory

Once the second server has rebooted, we can configure the directory as below:

Enter-PSSession -computername VMADS01.int.homecloud.net
#Requires -version 4.0
$DN = "DC=int,DC=HomeCloud,DC=net"

# New Default OU
New-ADOrganizationalUnit -Name "Default" -Path $DN
$DefaultDN = "OU=Default,$DN"
New-ADOrganizationalUnit -Name "Computers" -Path $DefaultDN
New-ADOrganizationalUnit -Name "Users" -Path $DefaultDN

# Redir container to OU
cmd /c redircmp "OU=Computers,OU=Default,$DN"
cmd /c redirusr "OU=Users,OU=Default,$DN"

# Create Accounts tree
New-ADOrganizationalUnit -Name "Accounts" -Path $DN
$AccountOU = "OU=Accounts,$DN"
New-ADOrganizationalUnit -Name "Users" -Path $AccountOU
New-ADOrganizationalUnit -Name "Groups" -Path $AccountOU
New-ADOrganizationalUnit -Name "Services" -Path $AccountOU

# Create Servers tree
New-ADOrganizationalUnit -Name "Servers" -Path $DN
$ServersOU = "OU=Servers,$DN"
New-ADOrganizationalUnit -Name "Computers" -Path $ServersOU
New-ADOrganizationalUnit -Name "Groups" -Path $ServersOU
New-ADOrganizationalUnit -Name "CNO" -Path $ServersOU

# New User's groups
$GroupAcctOU = "OU=Groups,$AccountOU"
New-ADGroup -Name "GG-FabricAdmins" -Path $GroupAcctOU -GroupScope DomainLocal -Description "Fabric Server's administrators"
New-ADGroup -Name "GG-SQLAdmins" -Path $GroupAcctOU -GroupScope DomainLocal -Description "SQL Database's administrators"

# New Computer's groups
$GroupCMPOU = "OU=Groups,$ServersOU"
New-ADGroup -Name "GG-Hyperv" -Path $GroupCMPOU -GroupScope DomainLocal -Description "Hyper-V Servers"
New-ADGroup -Name "GG-FabricServers" -Path $GroupCMPOU -GroupScope DomainLocal -Description "Fabric servers"
New-ADGroup -Name "GG-SQLServers" -Path $GroupCMPOU -GroupScope DomainLocal -Description "SQL Servers"
Exit

Ok, our Active Directory is ready, we can now add Hyper-V nodes to the domain 🙂

Add nodes to domain

To add both nodes to the domain, I run the following cmdlets from my laptop:

Enter-PSSession -ComputerName 10.10.0.5 -Credential pyhyv01\administrator
$domain = "int.homecloud.net"
$password = 'P@$$w0rd' | ConvertTo-SecureString -AsPlainText -Force # single quotes so that $$ is not expanded
$username = "$domain\administrator"
$credential = New-Object System.Management.Automation.PSCredential($username,$password)
Add-Computer -DomainName $domain -Credential $credential -OUPath "OU=Computers,OU=Servers,DC=int,DC=HomeCloud,DC=net" -Restart

Wait until pyhyv01 has rebooted, then run the same cmdlets on pyhyv02. Now you can log on to pyhyv01 and pyhyv02 with domain credentials. You can install the Domain Services RSAT on the laptop to browse Active Directory.

2-node hyperconverged cluster deployment

Now that the Active Directory is available, we can deploy the cluster. First, I test the cluster to verify that all is ok:

Enter-PSSession -ComputerName pyhyv01.int.homecloud.net -credential inthomecloud\administrator
Test-Cluster pyhyv01, pyhyv02 -Include "Storage Spaces Direct",Inventory,Network,"System Configuration"

Check the report for any issues with the configuration. If the report is good, run the following cmdlets:

# Create the cluster
New-Cluster -Name Cluster-Hyv01 -Node pyhyv01,pyhyv02 -NoStorage -StaticAddress 10.10.0.10

Once the cluster is created, I set a Cloud Witness so that Azure has a vote in the quorum.

# Add a cloud Witness (require Microsoft Azure account)
Set-ClusterQuorum -CloudWitness -Cluster Cluster-Hyv01 -AccountName "<StorageAccount>" -AccessKey "<AccessKey>"

Then I configure the network name in the cluster:

#Configure network name
(Get-ClusterNetwork -Name "Cluster Network 1").Name="Storage-102"
(Get-ClusterNetwork -Name "Cluster Network 2").Name="Storage-101"
(Get-ClusterNetwork -Name "Cluster Network 3").Name="Cluster-100"
(Get-ClusterNetwork -Name "Cluster Network 4").Name="Management-0"

Next I configure Node Fairness to run each time a node is added to the cluster and every 30 minutes. When the CPU of a node is utilized above 70%, Node Fairness balances the VMs across the other nodes.

# Configure Node Fairness
(Get-Cluster).AutoBalancerMode = 2
(Get-Cluster).AutoBalancerLevel = 2

Then I configure the Fault Domain Awareness to have fault tolerance based on racks. It is not really useful in this configuration, but if you add nodes to the cluster later, it can be. I enable it now because it is recommended to make this configuration before enabling Storage Spaces Direct.

# Configure the Fault Domain Awareness
New-ClusterFaultDomain -Type Site -Name "Lyon"
New-ClusterFaultDomain -Type Rack -Name "Rack-22U-01"
New-ClusterFaultDomain -Type Rack -Name "Rack-22U-02"
New-ClusterFaultDomain -Type Chassis -Name "Chassis-Fabric-01"
New-ClusterFaultDomain -Type Chassis -Name "Chassis-Fabric-02"

Set-ClusterFaultDomain -Name Lyon -Location "France, Lyon 8e"
Set-ClusterFaultDomain -Name Rack-22U-01 -Parent Lyon
Set-ClusterFaultDomain -Name Rack-22U-02 -Parent Lyon
Set-ClusterFaultDomain -Name Chassis-Fabric-01 -Parent Rack-22U-01
Set-ClusterFaultDomain -Name Chassis-Fabric-02 -Parent Rack-22U-02
Set-ClusterFaultDomain -Name pyhyv01 -Parent Chassis-Fabric-01
Set-ClusterFaultDomain -Name pyhyv02 -Parent Chassis-Fabric-02

To finish with the cluster, we have to enable Storage Spaces Direct and create the volumes. But before that, I run the following script to clean up the disks:

icm (Get-Cluster -Name Cluster-Hyv01 | Get-ClusterNode) {
    Update-StorageProviderCache

    Get-StoragePool |? IsPrimordial -eq $false | Set-StoragePool -IsReadOnly:$false -ErrorAction SilentlyContinue

    Get-StoragePool |? IsPrimordial -eq $false | Get-VirtualDisk | Remove-VirtualDisk -Confirm:$false -ErrorAction SilentlyContinue

    Get-PhysicalDisk | Reset-PhysicalDisk -ErrorAction SilentlyContinue

    Get-Disk |? Number -ne $null |? IsBoot -ne $true |? IsSystem -ne $true |? PartitionStyle -ne RAW |% {

        $_ | Set-Disk -isoffline:$false

        $_ | Set-Disk -isreadonly:$false

        $_ | Clear-Disk -RemoveData -RemoveOEM -Confirm:$false

        $_ | Set-Disk -isreadonly:$true

        $_ | Set-Disk -isoffline:$true

    }

    Get-Disk |? Number -ne $null |? IsBoot -ne $true |? IsSystem -ne $true |? PartitionStyle -eq RAW | Group -NoElement -Property FriendlyName

} | Sort -Property PsComputerName,Count

Now we can enable Storage Spaces Direct and create volumes:

Enable-ClusterStorageSpacesDirect

New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName CSV-01 -FileSystem CSVFS_ReFS -Size 922GB

New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName CSV-02 -FileSystem CSVFS_ReFS -Size 922GB

To finish, I rename the volume folders in C:\ClusterStorage to match their names in the cluster:

Rename-Item -Path C:\ClusterStorage\volume1\ -NewName CSV-01
Rename-Item -Path C:\ClusterStorage\volume2\ -NewName CSV-02

Final Hyper-V configuration

First, I set default VM and virtual disk folders:

Set-VMHost -ComputerName pyhyv01 -VirtualHardDiskPath 'C:\ClusterStorage\CSV-01'
Set-VMHost -ComputerName pyhyv01 -VirtualMachinePath 'C:\ClusterStorage\CSV-01'
Set-VMHost -ComputerName pyhyv02 -VirtualHardDiskPath 'C:\ClusterStorage\CSV-02'
Set-VMHost -ComputerName pyhyv02 -VirtualMachinePath 'C:\ClusterStorage\CSV-02'

Then I configure the Live-Migration protocol and the number of simultaneous migrations allowed:

Enable-VMMigration -ComputerName pyhyv01, pyhyv02
Set-VMHost -MaximumVirtualMachineMigrations 4 `
           -MaximumStorageMigrations 4 `
           -VirtualMachineMigrationPerformanceOption SMB `
           -ComputerName pyhyv01, pyhyv02

Next I add Kerberos delegation to configure Live-Migration in Kerberos mode:

Enter-PSSession -ComputerName VMADS01.int.homecloud.net
$HyvHost = "pyhyv01"
$Domain = "int.homecloud.net"

Get-ADComputer pyhyv02 | Set-ADObject -Add @{"msDS-AllowedToDelegateTo"="Microsoft Virtual System Migration Service/$HyvHost.$Domain", "cifs/$HyvHost.$Domain","Microsoft Virtual System Migration Service/$HyvHost", "cifs/$HyvHost"}

$HyvHost = "pyhyv02"

Get-ADComputer pyhyv01 | Set-ADObject -Add @{"msDS-AllowedToDelegateTo"="Microsoft Virtual System Migration Service/$HyvHost.$Domain", "cifs/$HyvHost.$Domain","Microsoft Virtual System Migration Service/$HyvHost", "cifs/$HyvHost"}
Exit

Then I set authentication of Live-Migration to Kerberos.

Set-VMHost -ComputerName pyhyv01, pyhyv02 `
           -VirtualMachineMigrationAuthenticationType Kerberos

Next, I configure the Live-Migration network priority:
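One possible way to do this with PowerShell (a sketch, not necessarily the method shown in the original screenshot) is to exclude the non-storage cluster networks from Live-Migration through the MigrationExcludeNetworks parameter of the Virtual Machine resource type:

# Exclude every cluster network whose name does not start with "Storage" from Live-Migration
$ExcludedNetworks = (Get-ClusterNetwork | Where-Object { $_.Name -notlike "Storage*" }).Id -join ";"
Get-ClusterResourceType -Name "Virtual Machine" | Set-ClusterParameter -Name MigrationExcludeNetworks -Value $ExcludedNetworks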

To finish I configure the cache size of the CSV to 512MB:

(Get-Cluster).BlockCacheSize = 512

Try a node failure

Now I’d like to shut down a node to verify that the cluster stays up. Let’s see what happens when I shut down a node:

As you have seen in the above video, even if I stop a node, the workloads keep working. When the second node starts up again, the virtual disks enter the Regenerating state, but you are still able to access the data.

You can visualize the storage job with the below cmdlet:
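The cmdlet is most likely Get-StorageJob, which lists the running repair and regeneration jobs:

Get-StorageJob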

Conclusion

A 2-node configuration is really a great scenario for a small office or a branch office. Without the cost of an expensive 10GB switch and a SAN, you can have high availability with Storage Spaces Direct. This kind of cluster is not really hard to deploy, but I heavily recommend you leverage PowerShell for the implementation. Currently I’m also working on VMware vSAN and I can confirm that Microsoft has a better solution for 2-node configurations. In the vSAN scenario, you need a third ESXi host in a third room. In the Microsoft environment, you only need a witness in another room, such as Microsoft Azure with the Cloud Witness.

Understand Failover Cluster Quorum

This topic aims to explain the quorum configuration in Failover Clustering. As part of my job, I work with Hyper-V clusters where the quorum is not well configured, so my customers don’t get the expected behavior when an outage occurs. I work especially on Hyper-V clusters, but the following topic applies to most Failover Cluster configurations.

What’s a Failover Cluster Quorum

A Failover Cluster Quorum configuration specifies the number of failures that a cluster can support in order to keep working. Once the threshold limit is reached, the cluster stops working. The most common failures in a cluster are nodes that stop working or nodes that can’t communicate anymore.

Imagine that the quorum doesn’t exist and you have a two-node cluster. Now there is a network problem and the two nodes can’t communicate. If there is no quorum, what prevents both nodes from operating independently and taking ownership of the disks on each side? This situation is called Split-Brain. The quorum exists to avoid Split-Brain and prevent corruption of the disks.

The quorum is based on a voting algorithm. Each node in the cluster has a vote. The cluster keeps working while more than half of the voters are online: this is the quorum (or the majority of votes). When there are too many failures and not enough online voters to constitute a quorum, the cluster stops working.

Below is a two-node cluster configuration:

The majority of votes is 2 votes. So a two-node cluster as above is not really resilient, because if you lose a node, the cluster is down.

Below is a three-node cluster configuration:

Now you add a node to your cluster, so you have a three-node cluster. The majority of votes is still 2 votes. But because there are three nodes, you can lose a node and the cluster keeps working.

Below is a four-node cluster configuration:

Despite its four nodes, this cluster can support only one node failure before losing the quorum. The majority of votes is 3 votes, so you can lose only one node.

On a five-node cluster, the majority of votes is still 3 votes, so you can lose two nodes before the cluster stops working, and so on. As you can see, the majority of votes must remain online for the cluster to keep working, and this is why it is recommended to have an odd number of votes. But sometimes we want only a two-node cluster for an application that doesn’t require more nodes (such as Virtual Machine Manager, SQL AlwaysOn and so on). In this case we add a disk witness, a file share witness or, in Windows Server 2016, a Cloud Witness.

Failover Cluster Quorum Witness

As said before, it is recommended to have an odd number of votes. But sometimes we don’t want an odd number of nodes. In this case, a disk witness, a file share witness or a cloud witness can be added to the cluster. This witness also has a vote. So when there is an even number of nodes, the witness gives you an odd majority of votes. Below are the requirements and recommendations for each witness type (except the Cloud Witness):

Disk witness

  Description:
  • Dedicated LUN that stores a copy of the cluster database
  • Most useful for clusters with shared (not replicated) storage

  Requirements and recommendations:
  • Size of LUN must be at least 512 MB
  • Must be dedicated to cluster use and not assigned to a clustered role
  • Must be included in clustered storage and pass storage validation tests
  • Cannot be a disk that is a Cluster Shared Volume (CSV)
  • Basic disk with a single volume
  • Does not need to have a drive letter
  • Can be formatted with NTFS or ReFS
  • Can be optionally configured with hardware RAID for fault tolerance
  • Should be excluded from backups and antivirus scanning

File share witness

  Description:
  • SMB file share that is configured on a file server running Windows Server
  • Does not store a copy of the cluster database
  • Maintains cluster information only in a witness.log file
  • Most useful for multisite clusters with replicated storage

  Requirements and recommendations:
  • Must have a minimum of 5 MB of free space
  • Must be dedicated to the single cluster and not used to store user or application data
  • Must have write permissions enabled for the computer object for the cluster name

The following are additional considerations for a file server that hosts the file share witness:

  • A single file server can be configured with file share witnesses for multiple clusters.
  • The file server must be on a site that is separate from the cluster workload. This allows equal opportunity for any cluster site to survive if site-to-site network communication is lost. If the file server is on the same site, that site becomes the primary site, and it is the only site that can reach the file share.
  • The file server can run on a virtual machine if the virtual machine is not hosted on the same cluster that uses the file share witness.
  • For high availability, the file server can be configured on a separate failover cluster.

So below you can find again a two-node cluster, this time with a witness:


Now that there is a witness, you can lose a node and keep the quorum. Even if a node is down, the cluster keeps working. So when you have an even number of nodes, a quorum witness is required. But to keep an odd majority of votes, when you have an odd number of nodes, you should not implement a quorum witness.

Quorum configuration

Below you can find the four possible cluster configurations (taken from TechNet):

  • Node Majority (recommended for clusters with an odd number of nodes)
    • Can sustain failures of half the nodes (rounding up) minus one. For example, a seven node cluster can sustain three node failures.
  • Node and Disk Majority (recommended for clusters with an even number of nodes).
    • Can sustain failures of half the nodes (rounding up) if the disk witness remains online. For example, a six node cluster in which the disk witness is online could sustain three node failures.
    • Can sustain failures of half the nodes (rounding up) minus one if the disk witness goes offline or fails. For example, a six node cluster with a failed disk witness could sustain two (3-1=2) node failures.
  • Node and File Share Majority (for clusters with special configurations)
    • Works in a similar way to Node and Disk Majority, but instead of a disk witness, this cluster uses a file share witness.
    • Note that if you use Node and File Share Majority, at least one of the available cluster nodes must contain a current copy of the cluster configuration before you can start the cluster. Otherwise, you must force the starting of the cluster through a particular node. For more information, see “Additional considerations” in Start or Stop the Cluster Service on a Cluster Node.
  • No Majority: Disk Only (not recommended)
    • Can sustain failures of all nodes except one (if the disk is online). However, this configuration is not recommended because the disk might be a single point of failure.

Stretched Cluster Scenario

Unfortunately (I don’t like stretched clusters in Hyper-V scenarios), some customers have a stretched cluster between two datacenters. And the most common mistake I see, made to save money, is the below scenario:

So the customer tells me: OK, I’ve followed the recommendation because I have four nodes in my cluster and I have added a witness to obtain an odd majority of votes. So let’s start production. The cluster runs for a while and then, one day, room 1 is underwater. So you lose Room 1:

In this scenario you should also have stretched storage, so if you have implemented a disk witness it should move to room 2. But in the above case you have lost the majority of votes and so the cluster stops working (sometimes, with some luck, the cluster keeps working because the disk witness has time to fail over, but that is luck). So when you implement a stretched cluster, I recommend the below scenario:

In this scenario, even if you lose a room, the cluster keeps working. Yes I know, three rooms are expensive, but I never recommended you build a stretched cluster 🙂 (in the Hyper-V case). Fortunately, in Windows Server 2016, the quorum witness can be hosted in Microsoft Azure (Cloud Witness).

Dynamic Quorum (Windows Server 2012 feature)

Dynamic Quorum enables the cluster to assign votes to nodes dynamically to avoid losing the majority of votes, so the cluster can run even with a single node (known as last-man standing). Let’s take the above example of a four-node cluster without a quorum witness. I said that the quorum is 3 votes, so without Dynamic Quorum, if you lose two nodes, the cluster is down.

Now I enable the Dynamic Quorum. The majority of votes is computed automatically from the running nodes. Let’s take again the four-node example:

So, why implement a witness, especially for a stretched cluster? Because Dynamic Quorum works great when the failures are sequential, not simultaneous. In the stretched cluster scenario, if you lose a room, the failure is simultaneous and the Dynamic Quorum doesn’t have time to recalculate the majority of votes. Moreover, I have seen strange behavior with Dynamic Quorum, especially with two-node clusters. This is why, in Windows Server 2012, I always disabled the Dynamic Quorum when I didn’t use a quorum witness.

The Dynamic Quorum has been enhanced in Windows Server 2012 R2, which introduces the Dynamic Witness. This feature calculates whether the quorum witness has a vote. There are two cases:

  • If there is an even number of nodes in the cluster with the Dynamic Quorum enabled, the Dynamic Witness gives the quorum witness a vote.
  • If there is an odd number of nodes in the cluster with the Dynamic Quorum enabled, the Dynamic Witness removes the quorum witness’s vote.

So since Windows Server 2012 R2, Microsoft recommends always implementing a witness in a cluster and letting the Dynamic Quorum decide for you.

The Dynamic Quorum is enabled by default since Windows Server 2012. In the below example, there is a four-node cluster on Windows Server 2016, but the behavior is the same.

I verify that the Dynamic Quorum and the Dynamic Witness are enabled:
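This check can be done with the DynamicQuorum and WitnessDynamicWeight cluster properties and the DynamicWeight property of the nodes, for example:

# 1 = enabled, 0 = disabled
(Get-Cluster).DynamicQuorum
(Get-Cluster).WitnessDynamicWeight

# Current vote weight of each node
Get-ClusterNode | Format-Table Name, DynamicWeight, NodeWeight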

Both the Dynamic Quorum and the Dynamic Witness are enabled. Because I have four nodes, the witness has a vote, and this is why the Dynamic Witness is enabled. If you want to disable the Dynamic Quorum, you can run this command:

(Get-Cluster).DynamicQuorum = 0

To finish, Microsoft has enhanced the Dynamic Quorum by adjusting the votes of the online nodes to keep an odd number of votes. First the cluster plays with the Dynamic Witness to keep an odd majority of votes. Then, if it can’t adjust the number of votes with the Dynamic Witness, it removes a vote from a running node.

For example, you have a four-node stretched cluster and you have lost your quorum witness. Now you have two nodes in room one and two nodes in room two. The cluster will remove a vote from a node to keep a majority in one room. In this way, even if you lose a node, the cluster keeps working.
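In Windows Server 2012 R2, you can influence which side loses its vote in such a 50/50 split with the LowerQuorumPriorityNodeID cluster property. A minimal sketch, assuming node ID 3 belongs to the room you prefer to sacrifice:

# List the node IDs
Get-ClusterNode | Format-Table Name, Id

# The node with ID 3 gets the lower quorum priority: its vote is removed first in a tie
(Get-Cluster).LowerQuorumPriorityNodeID = 3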

Cloud Quorum Witness (Windows Server 2016 feature)

By implementing a Cloud Witness, you avoid spending money on a third room for a stretched cluster. Below is the scenario:

The Cloud Witness, hosted in Microsoft Azure, also has one vote. In this way you still have an odd majority of votes. For that, you need an existing storage account in Microsoft Azure and its access key.

Now you just have to configure the quorum as for a standard witness. Select Configure a Cloud Witness when asked.

Then specify the Azure Storage Account and a storage key.

At the end of the configuration, the Cloud Witness should be online.
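The same configuration can be done with PowerShell; this is the cmdlet already used in the 2-node hyperconverged topic above (replace the placeholders with your storage account name and access key):

Set-ClusterQuorum -CloudWitness -AccountName "<StorageAccount>" -AccessKey "<AccessKey>"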

Conclusion

In conclusion, I recommend the following when you configure the quorum in a failover cluster:

  • Prior to Windows Server 2012 R2, always keep an odd majority of votes
    • In case of an even number of nodes, implement a witness
    • In case of an odd number of nodes, do not implement a witness
  • Since Windows Server 2012 R2, always implement a quorum witness
    • Dynamic Quorum manages the votes assigned to the nodes
    • Dynamic Witness manages the vote assigned to the quorum witness
  • In case of a stretched cluster, implement the witness in a third room or use Microsoft Azure.
