Slide 1
Slide 2
Stretching Failover Clusters and Using Storage Replica in Windows Server 2016
Elden Christensen & Ned Pyle
Program Managers, Windows Server – High Availability & Storage
BRK3487
Slide 3
Session Objectives and Takeaways
Session Objective:
Understand how to stretch a Windows Server Failover Cluster to achieve Disaster Recovery as well as High Availability
Key Takeaways:
How to plan, design, and deploy a stretch cluster with Windows Server 2016
Assumption: A fundamental understanding of using Failover Clustering to achieve high availability
Slide 4
Is my Cluster Resilient to Site Outages?
High Availability (HA) with Failover Clustering allows applications or VMs to maintain service availability by moving them between nodes in a cluster
But what if there is a catastrophic event?
Fire, flood, earthquake …
[Diagram: Site 1]
Slide 5
Stretching Clusters for Disaster Recovery
Extends a cluster from being a High Availability solution to also being a Disaster Recovery solution
Apps fail over to a separate physical location
Servers in separate locations in the same cluster
[Diagram: Site 1, Site 2]
Slide 6
Finding your RTO and RPO
Recovery Point Objective (RPO):
Accepted amount of data loss when an outage occurs
Synchronous replication has zero data loss
Asynchronous replication results in some data loss
Recovery Time Objective (RTO):
Accepted amount of downtime when an outage occurs
Automated recovery involves a human to take action (increased downtime)
Automatic recovery involves the system detecting and taking action (less downtime)
Stretch Clusters can achieve low RPO and RTO
Predictable behavior, as humans introduce the greatest point of failure
Disaster Avoidance (new trend):
Switching over to the recovery site with an impending disaster (such as a hurricane)
Slide 7
Terminology
Stretch Cluster
Not Multi-site Cluster
Not Metro Cluster
Not Geo Cluster
Not Geo Metro Cluster
Slide 8
Considerations when Stretching Clusters
Networking
Quorum
Storage
Slide 9
Network Considerations
Stretching Failover Clusters
Slide 10
Stretch Cluster Network Considerations
Subnets: Different datacenters (usually) equate to different subnets
Latency: Longer distance means greater network latency
[Diagram: Site 1 and Site 2 on subnets 10.10.10.X and 20.20.20.X, connected over a WAN]
Slide 11
Tuning Cluster Heartbeats for Latency
Property              Default  Recommended  Description
SameSubnetDelay       1        1            Frequency heartbeats (HB) sent
SameSubnetThreshold   5        10           Missed HB before interface considered down
CrossSubnetDelay      1        1            Frequency HB sent to nodes on dissimilar subnets
CrossSubnetThreshold  5        20           Missed HB before interface considered down to nodes on dissimilar subnets
PowerShell:
(Get-Cluster).SameSubnetThreshold = 10
(Get-Cluster).CrossSubnetThreshold = 20
Slide 12
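As a rough illustration of what the heartbeat thresholds above mean in practice, the sketch below multiplies each delay by its threshold to estimate how long a node can stay silent before its interface is considered down. It assumes the Delay values on the slide are in seconds, which the slide does not state. The function and dictionary names are hypothetical and are not part of any cluster API.

```python
# Rough sketch: estimate how long heartbeats can be missed before a
# cluster interface is considered down. Assumes the Delay values from
# the slide are in seconds (an assumption, not stated on the slide).

def tolerated_silence_seconds(delay_s, threshold):
    """Time without heartbeats before the interface is marked down."""
    return delay_s * threshold

# (delay, threshold) pairs taken from the slide's table.
SETTINGS = {
    "same subnet, default": (1, 5),
    "same subnet, recommended": (1, 10),
    "cross subnet, default": (1, 5),
    "cross subnet, recommended": (1, 20),
}

for name, (delay_s, threshold) in SETTINGS.items():
    print(f"{name}: {tolerated_silence_seconds(delay_s, threshold)} s")
```

Under that assumption, the recommended cross-subnet setting tolerates four times as long a gap as the default, which gives a higher-latency WAN link more room before a failover is triggered.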
Multi-subnet Resource Configuration
Single Network Name resource
Dependent on multiple IP Address resources
Configured in an ‘or’ dependency
Only IP address resources with a corresponding network come online
Online IP resource registered in DNS
DNS registration behavior configurable via NetName private property RegisterAllProvidersIP
Dependencies in Cluster Validation Report
[Diagram: Network Name Resource depends on IP Address Resource A OR IP Address Resource B]
Slide 13
Client Reconnect Considerations
Nodes in dissimilar subnets
Client access point fails over across subnets
Client needs new address
[Diagram: Site 1 and Site 2, each with a DNS server; addresses 10.10.10.10 and 20.20.20.20; labels: DNS Replication, Record Created, Record Updated, Record Updated, Record Obtained]
Slide 14
Strategy #1: Software Defined Networking
Hyper-V Network Virtualization (HNV) can abstract a VM's logical subnet boundaries
Each virtual network has the illusion that it is running as a physical network
Overlays the physical network
Encapsulation uses the NVGRE protocol
Virtual networks enable VMs to move across different physical networks without reconfiguring the IP address in the guest OS
Recommended for Hyper-V deployments
[Diagram: Network Virtualization across Site 1 and Site 2; addresses 10.10.10.10, 20.20.20.20, 30.30.30.30]
Slide 15
Strategy #2: Configure Network Name Properties
RegisterAllProvidersIP (default = 0 for FALSE):
Determines if all IP Addresses for Network Name will be registered by DNS
TRUE (1): IP Addresses registered whether online or offline
Ensure application set to try all IP Addresses, so clients can connect quicker
Supported / recommended for SQL Server deployments
HostRecordTTL (default = 1200 seconds):
Controls time the DNS record lives on client for a cluster network name
Shorter TTL: DNS records for clients updated sooner
PowerShell syntax:
Get-ClusterResource ClusNN | Set-ClusterParameter RegisterAllProvidersIP 1
Get-ClusterResource ClusNN | Set-ClusterParameter HostRecordTTL 300
Slide 16
Strategy #3: Prefer Local Failover
Scale up for local failover for higher availability
No change in IP addresses for HA
Means not going over the WAN and is still usually preferred
Cross-site failover for disaster recovery
[Diagram: Site 1 and Site 2, each with a DNS server; addresses 10.10.10.10 and 20.20.20.20; VM = 10.10.10.111]
Slide 17
Strategy #4: Stretch VLANs
Deploying a VLAN minimizes client reconnection times
Layer 2 spans sites, so the IP address never changes
[Diagram: VLAN spanning Site 1 and Site 2; DNS Server 1 and DNS Server 2; address 10.10.10.10 (shown twice); FS = 10.10.10.111]
Slide 18
Strategy #5: Abstraction in Network Device
Network device uses 3rd IP
3rd IP is the one registered in DNS & used by client
Example: http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/App_Networking/extmsftw2k8vistacisco.pdf
[Diagram: Site 1 and Site 2; DNS Server 1 and DNS Server 2; addresses 10.10.10.10, 20.20.20.20, 30.30.30.30; VM = 30.30.30.30]
Slide 19
Cluster Network Security over the WAN
Intra-cluster communication is signed and secure by default
Recommended to set to encrypted if spanning an unsecured WAN
Value  Description
0      Clear Text
1      Signed (default)
2      Encrypted
PowerShell syntax:
(Get-Cluster).SecurityLevel = 2
[Diagram: Site 1 and Site 2 on subnets 10.10.10.X and 20.20.20.X]
Slide 20
Recap: Configuring your Stretch Cluster network
Adjust inter-node heartbeat thresholds
Understand NetName Resource Configuration
Optimize Client Reconnection on CAP Failover
Encrypt inter-node communication over unsecured WANs
Slide 21
Quorum Considerations
Stretching Failover Clusters
Slide 22
Quorum Overview
When nodes cannot talk to each other, there must be a way to reconcile who stays up and who shuts down
A set of nodes which has a majority of votes has quorum and stays up
Votes can be assigned to:
Nodes: each node can have 1 vote
Witness: the witness can only have 1 vote
Considerations:
When a site is lost, it can result in a stretch cluster losing many votes at the same time
Plan for site losses to ensure the cluster can maintain quorum
Slide 23
Quorum Resiliency to a Site Loss
3 out of 5 votes needed to maintain quorum
Dynamic Quorum will re-calculate quorum once stabilized to 3 votes
Site 2 Down!!!
Site 1 can reach Cloud Witness!
Cluster Survives!
[Diagram: nodes 1 to 4 split across Site 1 and Site 2, one vote each, plus a witness (5) with one vote]
Slide 24
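The "3 out of 5" figure above follows from simple majority arithmetic, which the sketch below works through. It covers only the static vote count and does not model Dynamic Quorum; the function names are hypothetical.

```python
# Minimal sketch of majority-vote arithmetic for the site-loss scenario.
# Only the static vote count is modeled; Dynamic Quorum is not.

def votes_needed(total_votes):
    """Smallest strict majority of the total vote count."""
    return total_votes // 2 + 1

def has_quorum(surviving_votes, total_votes):
    """True if the surviving partition holds a strict majority."""
    return surviving_votes >= votes_needed(total_votes)

# Slide scenario: 4 node votes plus 1 witness vote.
total = 4 + 1
# Site 2 goes down; Site 1 keeps its 2 nodes and can still reach the witness.
surviving = 2 + 1

print(votes_needed(total))
print(has_quorum(surviving, total))
# By this arithmetic alone, an even 2/2 split with no witness gives
# neither half a strict majority.
print(has_quorum(2, 4))
```

This is why an odd total, reached by adding a witness, matters for a two-site cluster with equal node counts.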
Cloud Witness in Windows Server 2016
Leverages Azure as arbitration point
Quorum configuration achieved without an extra site
Writes a single blob file per cluster (unique blob file name)
Arbitration across sites using the blob file
Minimal data written to the blob file (node state change)
Same Azure Account/Container for multiple clusters
[Diagram: Site 1 and Site 2 reach the Azure witness over https]
Slide 25
Demo
Configuring Cloud Witness
Reference: http://blogs.msdn.com/b/clustering/archive/2014/11/14/10572766.aspx
Slide 26
Picking a Witness
Both share the same arbitration logic
Neither keeps a copy of the cluster database
Cloud Witness                                     File Share Witness
Writes to blob file                               Writes to file on share
Access file using Azure Storage Service REST API  Access file using SMB protocol
Requires Microsoft Azure Storage Account          Requires File Server hosted separately
Cluster manages permissions on the blob file      Requires CNO permissions to File Share
Slide 27
Defining Primary Datacenter for Quorum Split
Cluster will survive simultaneous 50% loss of votes
One site automatically elected to win
The site that does not contain the node named by the LowerQuorumPriorityNodeID cluster common property wins
Nodes in the other site drop out of the cluster
[Diagram: Cluster spanning Site 1 and Site 2]
Slide 28
Configuring Cluster for Manual Failover
Quorum can be configured so that failover to the DR site is always manual (aka Automated failover)
Cluster cannot survive failure of primary site
Remove node vote weights of backup site
Use Preferred owners to keep workloads on primary site
Loss of Primary Site: Start-ClusterNode -ForceQuorum
Recovery of Primary Site: Start-ClusterNode -PreventQuorum
[Diagram: nodes 1 to 4 across Site 1 and Site 2; two Vote labels]
Slide 29
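To see why removing the backup site's votes makes failover manual, the sketch below applies the same majority rule to weighted votes. The node names and their assignment to sites are assumptions made for illustration; the slide only shows four nodes and two votes.

```python
# Sketch: effect of removing vote weights from the backup site.
# Node names and site membership are hypothetical.

def partition_has_quorum(node_votes, surviving_nodes):
    """True if the surviving nodes hold a strict majority of all votes."""
    total = sum(node_votes.values())
    surviving = sum(node_votes[name] for name in surviving_nodes)
    return surviving > total / 2

# Primary-site nodes keep their vote; backup-site nodes are set to 0.
NODE_VOTES = {"node1": 1, "node2": 1, "node3": 0, "node4": 0}
PRIMARY_SITE = {"node1", "node2"}
BACKUP_SITE = {"node3", "node4"}

# Backup site lost: the primary site still holds every vote.
print(partition_has_quorum(NODE_VOTES, PRIMARY_SITE))
# Primary site lost: the backup site holds no votes, so it cannot form
# quorum on its own and an administrator has to force it.
print(partition_has_quorum(NODE_VOTES, BACKUP_SITE))
```

In this model the backup site never wins on its own, which matches the slide's point that recovery there starts with a deliberate Start-ClusterNode -ForceQuorum.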
Configuring Preferred Nodes
Preferred Owners can prioritize placement of a workload on a set of nodes
Sort ordered list
Set nodes in primary site at the top of the list
Cluster Group property
PowerShell syntax:
Get-ClusterGroup MyVM | Set-ClusterOwnerNode -Owners Node1, Node2
Slide 30
Recap: Configuring your Stretch Cluster Quorum
Recommended to use Cloud Witness
When no access to Azure use File Share Witness in a 3rd site
Automatic failover – Keep number of nodes on primary and secondary sites equal
Manual failover – Remove votes of nodes on secondary site
Slide 31
Storage Considerations
Stretching Failover Clusters
Slide 32
Storage Replica is here to help you tolerate disasters
Slide 33
Superstorm Sandy
Chicago (you are here)
NYC
“Can you hear me now?”
Slide 34
Storage Replica
Replication:
Block-level, volume-based
Synchronous & asynchronous
SMB 3.1.1 transport
Flexibility:
Any Windows volume
Any fixed disk storage
Any storage fabric
Management:
Failover Cluster Manager
Windows PowerShell
WMI
End to end MS Storage Stack
Slide 35
Demo
“Look out, that ape has a bus!”
Slide 36
Stretch Clusters
Increase cluster DR capabilities
Synchronous only
Asymmetric storage
Two sites, two sets of shared storage
Cluster storage: CSV or role-assigned PDR
Manage with FCM or Windows PowerShell
Hyper-V and General Use File Server are the main use cases in the Technical Preview
Not Scale Out File Server
[Diagram: Cluster spanning Site 1 and Site 2]
Slide 37
Synchronous workflow
[Diagram: Applications (local or remote); Source Server Node (SR) with Data and Log; Destination Server Node (SR) with Data and Log; steps numbered 1 to 5 with markers t and t1]
Slide 38
Requirements & Recommendations
Slide 39
Requirements for SR
Datacenter Edition
Active Directory
No need for Schema updates, AD objects, certain AD functional levels, etc.
≥1 Gbps end to end network between servers
Disks:
GPT, not MBR
Whatever your cluster thinks is Available Storage is ok for SR
Same disk geometry (between logs, between data) and partitioning for data
Free space for logs on a Windows NTFS/ReFS volume
Firewall ports: SMB, WS-MAN
Ignite Insider
Slide 40
Recommendations for Stretch Cluster
These are very strong recommendations ;)
Network latency:
≤5ms round trip average
Assuming the light speed vacuum ideal, 5ms is ~1500km round trip
Reality: optical fiber reduces that by ~35%, and you cross switches, routers, firewalls, etc.
Financial limits, availability
End result: most customers end up at 30-50km
Network bandwidth:
It depends on your IO and sharing of the pipe (SR may not be the only traffic for the DR site)
Learn your IOPS math (125MB/s of IO == ~1Gb/s network usage)
Log volume performance and size:
Flash (SSD, NVMe, etc.)
Larger logs allow faster recovery from larger outages and less rollover, but cost space
Slide 41
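The latency and bandwidth figures above can be checked with a few lines of arithmetic. The sketch below uses the vacuum speed of light, treats the slide's "~35%" fiber reduction as a speed factor of 0.65, and assumes decimal units (1 MB = 10^6 bytes, 1 Gb = 10^9 bits). The function names are hypothetical.

```python
# Sketch of the slide's latency and bandwidth arithmetic.
# Assumptions: vacuum light speed, fiber modeled as 65% of that speed,
# decimal units for MB and Gb.

SPEED_OF_LIGHT_KM_PER_S = 299_792.458

def round_trip_distance_km(rtt_ms, speed_fraction=1.0):
    """Total distance a signal covers in rtt_ms milliseconds."""
    return SPEED_OF_LIGHT_KM_PER_S * speed_fraction * rtt_ms / 1000.0

def megabytes_to_gigabits_per_s(mb_per_s):
    """Convert an IO rate in MB/s to network usage in Gb/s."""
    return mb_per_s * 8 / 1000.0

# Ideal case: 5 ms round trip in a vacuum.
print(round(round_trip_distance_km(5)))
# Optical fiber at roughly 65% of vacuum speed.
print(round(round_trip_distance_km(5, speed_fraction=0.65)))
# 125 MB/s of IO expressed as network usage.
print(megabytes_to_gigabits_per_s(125))
```

Both distances are round-trip totals, so the one-way separation between sites is half of each figure, before any delay added by switches, routers, and firewalls.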
Familiar Failover Cluster Manager GUI
Slide 42
Demo
Failover Cluster Manager Provisioning and Failover
Slide 43
Want more?
Come to the Storage Replica Session at 5:00 today
BRK3489: Exploring Storage Replica in Windows Server vNext
See all of Jake’s tricks and maybe get some swag!
Slide 44
In Review: Session Objectives and Takeaways
Windows Server 2016 will deliver an end-to-end solution to stretch clusters
Stretch clusters bring special considerations around:
Networking
Quorum
Storage
There is a rich ecosystem of storage replication vendors and solutions which enable stretch clusters as well
Slide 45
Learn more with FREE IT Pro Resources
Expand your Modern Infrastructure Knowledge
Free technical training resources:
On-demand online training: http://aka.ms/moderninfrastructure
Free ebooks:
Deploying Hyper-V with Software-Defined Storage & Networking: http://aka.ms/deployinghyperv
Microsoft System Center: Integrated Cloud Platform: http://aka.ms/cloud-platform-ebook
Join the IT Pro community:
Twitter @MS_ITPro
Get hands-on: Free virtual labs:
Microsoft Virtualization with Windows Server and System Center: http://aka.ms/virtualization-lab
Windows Azure Pack: Install and Configure: http://aka.ms/wap-lab
Slide 46
Please evaluate this session
Your feedback is important to us!
Visit Myignite at http://myignite.microsoft.com or download and use the Ignite Mobile App with the QR code above.
Slide 47