Stretching failover clusters and using Storage Replica in Windows Server 2016 - PowerPoint Presentation

Elden Christensen & Ned Pyle, Program Managers, Windows Server High Availability & Storage (BRK3487). Session objective: understand how to stretch a Windows Server Failover Cluster to achieve Disaster Recovery as well as High Availability.



Presentation Transcript


Stretching failover clusters and using Storage Replica in Windows Server 2016

Elden Christensen & Ned Pyle, Program Managers, Windows Server – High Availability & Storage

BRK3487

Session Objective:

Understand how to stretch a Windows Server Failover Cluster to achieve Disaster Recovery as well as High Availability

Key Takeaways

How to plan, design, and deploy a stretch cluster with Windows Server 2016

Assumption: A fundamental understanding of using Failover Clustering to achieve high availability

Session Objectives And Takeaways

Site 1

Is my Cluster Resilient to Site Outages?

But what if there is a catastrophic event?

Fire, flood, earthquake …

High Availability (HA) with Failover Clustering allows applications or VMs to maintain service availability by moving them between nodes in a cluster

Stretching Clusters for Disaster Recovery

Extends a cluster from being a High Availability solution to also being a Disaster Recovery solution

Apps fail over to a separate physical location

Diagram: Site 1 and Site 2, with servers in separate locations in the same cluster

Stretch Clusters can achieve low RPO and RTO

Predictable behavior, as humans introduce the greatest point of failure

Disaster Avoidance (new trend): switching over to the recovery site ahead of an impending disaster (such as a hurricane)

Finding your RTO and RPO

Recovery Point Objective (RPO): the accepted amount of data loss when an outage occurs
Synchronous replication has zero data loss
Asynchronous replication results in some data loss

Recovery Time Objective (RTO): the accepted amount of downtime when an outage occurs
Manual recovery involves a human taking action (increased downtime)
Automatic recovery involves the system detecting and taking action (less downtime)

Terminology

Stretch Cluster

Not Multi-site Cluster

Not Metro Cluster

Not Geo Cluster

Not Geo Metro Cluster

Considerations when Stretching Clusters

Networking

Quorum

Storage

Network Considerations

Stretching Failover Clusters

Stretch Cluster Network Considerations

Diagram: Site 1 (10.10.10.X) and Site 2 (20.20.20.X) connected over a WAN

Subnets: different datacenters (usually) equate to different subnets
Latency: longer distance means greater network latency

Tuning Cluster Heartbeats for Latency

Property (default / recommended): description

SameSubnetDelay (1 / 1): frequency at which heartbeats (HB) are sent
SameSubnetThreshold (5 / 10): missed HBs before the interface is considered down
CrossSubnetDelay (1 / 1): frequency at which HBs are sent to nodes on dissimilar subnets
CrossSubnetThreshold (5 / 20): missed HBs before the interface is considered down to nodes on dissimilar subnets

PowerShell:
(Get-Cluster).SameSubnetThreshold = 10
(Get-Cluster).CrossSubnetThreshold = 20

Dependencies in Cluster Validation Report

Single Network Name resource

Dependent on multiple IP Address resources

Configured in an ‘or’ dependency

Only IP Address resources with a corresponding network come online
The online IP resource is registered in DNS
DNS registration behavior is configurable via the NetName private property RegisterAllProvidersIP

Multi-subnet Resource Configuration: a Network Name resource depends on IP Address Resource A OR IP Address Resource B
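For reference, a minimal PowerShell sketch of wiring up such an "or" dependency; the resource names ClusNN, IP-A, and IP-B are placeholders, not names from the deck:

PowerShell (sketch):
# Make the network name depend on either IP address resource (an "or" dependency)
Set-ClusterResourceDependency -Resource "ClusNN" -Dependency "[IP-A] or [IP-B]"
# Verify the resulting dependency expression
Get-ClusterResourceDependency -Resource "ClusNN"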

Client Reconnect Considerations

Site 1 (10.10.10.10) and Site 2 (20.20.20.20), each with its own DNS server

Nodes are in dissimilar subnets
When the client access point fails across subnets, the client needs the new address
Diagram flow: the DNS record is created for the original IP, updated when the access point fails over, replicated between the DNS servers, and then obtained by the client

Hyper-V Network Virtualization (HNV) can abstract VMs' logical subnet boundaries

Each virtual network has the illusion that it is running as a physical network
Overlays the physical network, encapsulating with the NVGRE protocol
Virtual networks enable VMs to move across different physical networks without reconfiguring the IP address in the guest OS

Recommended for Hyper-V deployments

Strategy #1: Software Defined Networking

Diagram: Network Virtualization spans Site 1 (10.10.10.10) and Site 2 (20.20.20.20); the VM keeps its virtualized address (30.30.30.30) in either site

Strategy #2: Configure Network Name Properties

RegisterAllProvidersIP (default = 0 for FALSE)
Determines whether all IP addresses for the Network Name will be registered in DNS
TRUE (1): IP addresses are registered whether online or offline
Ensure the application is set to try all IP addresses, so clients can connect more quickly
Supported / recommended for SQL Server deployments

HostRecordTTL (default = 1200 seconds)
Controls how long the DNS record lives on the client for a cluster network name
Shorter TTL: DNS records for clients are updated sooner

PowerShell syntax:
Get-ClusterResource ClusNN | Set-ClusterParameter RegisterAllProvidersIP 1
Get-ClusterResource ClusNN | Set-ClusterParameter HostRecordTTL 300
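As a hedged follow-on to the two Set-ClusterParameter lines above (ClusNN is the deck's placeholder resource name), the new values only take effect once the network name resource is cycled:

PowerShell (sketch):
# Check the current private properties of the network name resource
Get-ClusterResource ClusNN | Get-ClusterParameter RegisterAllProvidersIP, HostRecordTTL
# Take the network name offline and back online so the new settings and DNS registrations apply
# (expect a brief interruption to the client access point)
Stop-ClusterResource ClusNN
Start-ClusterResource ClusNN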

Strategy #3: Prefer Local Failover

Scale up for local failover for higher availability
No change in IP addresses for HA; staying local means not going over the WAN and is still usually preferred
Cross-site failover is for disaster recovery

Diagram: Site 1 (10.10.10.10) runs the VM (10.10.10.111); Site 2 (20.20.20.20); each site has its own DNS server

Strategy #4: Stretch VLANs

Deploying a VLAN minimizes client reconnection times
Layer 2 spans both sites, so the IP address never changes

Diagram: Site 1 and Site 2 share a stretched VLAN (10.10.10.10 in both sites); FS = 10.10.10.111; DNS Server 1 and DNS Server 2

Strategy #5: Abstraction in a Network Device

The network device uses a 3rd IP; the 3rd IP is the one registered in DNS and used by the client
Example: http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/App_Networking/extmsftw2k8vistacisco.pdf

Diagram: Site 1 (10.10.10.10) and Site 2 (20.20.20.20), each with its own DNS server; VM = 30.30.30.30

Cluster Network Security over the WAN

Intra-cluster communication is signed and secure by default
Recommended to set it to encrypted if spanning an unsecure WAN

Value: description
0: Clear text
1: Signed (default)
2: Encrypted

PowerShell syntax:
(Get-Cluster).SecurityLevel = 2

Diagram: Site 1 (10.10.10.X) and Site 2 (20.20.20.X) connected over the WAN

Recap: Configuring your Stretch Cluster network

Adjust intra-node heartbeat thresholds
Understand NetName resource configuration
Optimize client reconnection on CAP failover
Encrypt intra-node communication over unsecure WANs

Quorum Considerations

Stretching Failover Clusters

Quorum Overview

When nodes cannot talk to each other, there must be a way to reconcile who stays up and who shuts down
A set of nodes that holds a majority of votes has quorum and stays up
Votes can be assigned to nodes and a witness: each node can have 1 vote, and the witness can only have 1 vote

Considerations: when a site is lost, a stretch cluster can lose many votes at the same time
Plan for site losses to ensure the cluster can maintain quorum
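A quick way to see how votes and the witness are currently configured (a minimal sketch using the in-box Failover Clustering cmdlets):

PowerShell (sketch):
# Show the current quorum configuration (witness type and resource)
Get-ClusterQuorum
# Show each node's assigned vote and its current dynamic vote
Get-ClusterNode | Format-Table Name, State, NodeWeight, DynamicWeight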

Quorum Resiliency to a Site Loss

Diagram: nodes 1 and 2 in Site 1, nodes 3 and 4 in Site 2, plus a witness, for 5 votes total
3 out of 5 votes are needed to maintain quorum; Dynamic Quorum will recalculate quorum once the cluster stabilizes to 3 votes
Site 2 goes down, but Site 1 can still reach the Cloud Witness, so the cluster survives

Cloud Witness in Windows Server 2016

Leverages Azure as the arbitration point
Quorum configuration achieved without an extra site
Writes a single blob file per cluster (unique blob file name)
Arbitration across sites using the blob file
Minimal data written to the blob file (node state changes)
The same Azure account/container can be used for multiple clusters

Diagram: Site 1 and Site 2 both reach the Azure witness over HTTPS
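A minimal configuration sketch, assuming an existing Azure storage account (the demo that follows covers this in full; the account name and access key below are placeholders):

PowerShell (sketch):
# Point the cluster quorum at an Azure storage account acting as the Cloud Witness
Set-ClusterQuorum -CloudWitness -AccountName "<StorageAccountName>" -AccessKey "<StorageAccountAccessKey>"
# Confirm the witness is in place
Get-ClusterQuorum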

Demo

Configuring Cloud Witness
Reference: http://blogs.msdn.com/b/clustering/archive/2014/11/14/10572766.aspx

Picking a Witness

Cloud Witness and File Share Witness share the same arbitration logic and do not keep a copy of the cluster database

Cloud Witness: writes to a blob file; accesses the file using the Azure Storage Service REST API; requires a Microsoft Azure storage account; the cluster manages permissions on the blob file
File Share Witness: writes to a file on a share; accesses the file using the SMB protocol; requires a file server hosted separately; requires CNO permissions on the file share

Defining the Primary Datacenter for a Quorum Split

The cluster will survive a simultaneous 50% loss of votes
One site is automatically elected to win: the site without the node named by the LowerQuorumPriorityNodeID cluster common property wins
Nodes in the other site drop out of the cluster

Diagram: one cluster spanning Site 1 and Site 2
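A hedged sketch of setting that property; "Node3" stands in for any node in the site that should lose a 50/50 split:

PowerShell (sketch):
# Mark a node in the secondary site as lower priority so the primary site wins a tie
(Get-Cluster).LowerQuorumPriorityNodeID = (Get-ClusterNode -Name "Node3").Id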

Configuring the Cluster for Manual Failover

Quorum can be configured so that failover to the DR site is always manual (i.e., never automatic)
The cluster cannot survive failure of the primary site on its own
Remove the node vote weights of the backup site
Use Preferred Owners to keep workloads on the primary site

Diagram: nodes 1 and 2 in Site 1 hold the votes; nodes 3 and 4 in Site 2 have no votes

Loss of primary site: Start-ClusterNode -ForceQuorum
Recovery of primary site: Start-ClusterNode -PreventQuorum
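A minimal sketch of the vote changes and the recovery commands above; Node1 through Node4 are placeholders, with Node3 and Node4 in the backup site:

PowerShell (sketch):
# Remove votes from the backup-site nodes so that site cannot form quorum on its own
(Get-ClusterNode -Name "Node3").NodeWeight = 0
(Get-ClusterNode -Name "Node4").NodeWeight = 0
# After losing the primary site, manually force the DR site to start and form quorum
# Start-ClusterNode -Name "Node3" -ForceQuorum
# When the primary site returns, start its nodes without letting them form a competing quorum
# Start-ClusterNode -Name "Node1" -PreventQuorum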

Configuring Preferred Nodes

Preferred Owners can prioritize placement of a workload on a set of nodes
It is a sorted, ordered list: set the nodes in the primary site at the top of the list
It is a Cluster Group property

PowerShell syntax:
Get-ClusterGroup MyVM | Set-ClusterOwnerNode Node1, Node2

Recap: Configuring your Stretch Cluster Quorum

Recommended to use Cloud Witness

When there is no access to Azure, use a File Share Witness in a 3rd site
Automatic failover: keep the number of nodes on the primary and secondary sites equal
Manual failover: remove the votes of the nodes on the secondary site

Storage Considerations

Stretching Failover Clusters

Storage Replica is here to help you tolerate disasters

Superstorm Sandy

Chicago (you are here)

NYC

“Can you hear me now?”

Storage Replica

Replication: block-level, volume-based; synchronous and asynchronous; SMB 3.1.1 transport
Flexibility: any Windows volume; any fixed disk storage; any storage fabric
Management: Failover Cluster Manager; Windows PowerShell; WMI
End-to-end Microsoft storage stack

Demo

“Look out, that ape has a bus!”

Stretch Clusters

Synchronous only
Asymmetric storage: two sites, two sets of shared storage
Cluster storage: CSV or role-assigned PDR
Manage with FCM or Windows PowerShell
Increases cluster DR capabilities
Hyper-V and General Use File Server are the main use cases in the Technical Preview; not Scale-Out File Server

Diagram: one cluster spanning Site 1 and Site 2
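A hedged PowerShell sketch of creating the replication partnership between the two sites once the cluster storage exists; the computer, replication-group, and volume names are placeholders, not values from the deck:

PowerShell (sketch):
# Replicate the Site 1 data volume (a CSV) and its log to the Site 2 disks
New-SRPartnership -SourceComputerName "SR-SRV01" -SourceRGName "RG01" -SourceVolumeName "C:\ClusterStorage\Volume1" -SourceLogVolumeName "L:" -DestinationComputerName "SR-SRV03" -DestinationRGName "RG02" -DestinationVolumeName "D:" -DestinationLogVolumeName "L:"
# Check replication group and partnership state
Get-SRGroup
Get-SRPartnership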

Synchronous workflow

Diagram: applications (local or remote) issue writes to the source server node; Storage Replica writes to the source log and data volume, replicates to the destination server node's log and data volume, and acknowledges the write back to the application only after the destination confirms (steps 1 to 5 in the original diagram)

Requirements & Recommendations

Requirements for SR

Datacenter Edition
Active Directory: no need for schema updates, AD objects, certain AD functional levels, etc.
≥1 Gbps end-to-end network between servers

Disks:
GPT, not MBR
Whatever your cluster thinks is Available Storage is OK for SR
Same disk geometry (between logs, between data) and partitioning for data
Free space for logs on a Windows NTFS/ReFS volume
Firewall ports: SMB, WS-MAN

Recommendations for Stretch Cluster

Network latency: ≤5 ms round-trip average
Assuming the light-speed-in-a-vacuum ideal, 5 ms is ~1500 km round trip
Reality: optical fiber reduces that by ~35%, and you cross switches, routers, firewalls, etc., plus financial limits and availability
End result: most customers end up at 30-50 km

Network bandwidth: it depends on your IO and on sharing of the pipe (SR may not be the only traffic headed to the DR site)
Learn your IOPS math (125 MB/s of IO == ~1 Gb/s of network usage)

Log volume performance and size: flash (SSD, NVMe, etc.)
Larger logs allow faster recovery from larger outages and less rollover, but cost space

These are very strong recommendations ;)
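One hedged way to check these latency and bandwidth numbers against a real environment before deploying is the StorageReplica module's Test-SRTopology cmdlet; the server, volume, and path names below are placeholders:

PowerShell (sketch):
# Measure bandwidth, latency, IOPS, and initial-sync estimates between the two sites
Test-SRTopology -SourceComputerName "SR-SRV01" -SourceVolumeName "D:" -SourceLogVolumeName "L:" -DestinationComputerName "SR-SRV03" -DestinationVolumeName "D:" -DestinationLogVolumeName "L:" -DurationInMinutes 30 -ResultPath "C:\Temp"
# The cmdlet writes an HTML report with its measurements to the result path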

Familiar Failover Cluster Manager GUI

Demo

Failover Cluster Manager Provisioning and Failover

Want more?

Come to the Storage Replica session at 5:00 today: BRK3489, Exploring Storage Replica in Windows Server vNext

See all of Jake’s tricks and maybe get some swag!Slide44

Windows Server 2016 will deliver an end-to-end solution to stretch clusters

Stretch clusters bring special considerations around: networking, quorum, and storage
There is a rich ecosystem of storage replication vendors and solutions which enable stretch clusters as well

In Review: Session Objectives And Takeaways

Learn more with FREE IT Pro Resources

Free technical training resources:

Expand your Modern Infrastructure knowledge with on-demand online training: http://aka.ms/moderninfrastructure

Free ebooks:
Deploying Hyper-V with Software-Defined Storage & Networking: http://aka.ms/deployinghyperv
Microsoft System Center: Integrated Cloud Platform: http://aka.ms/cloud-platform-ebook

Join the IT Pro community:

Twitter: @MS_ITPro

Get hands-on: Free virtual labs:

Microsoft Virtualization with Windows Server and System Center: http://aka.ms/virtualization-lab
Windows Azure Pack: Install and Configure: http://aka.ms/wap-lab

Visit MyIgnite at http://myignite.microsoft.com or download and use the Ignite Mobile App with the QR code above.

Please evaluate this session. Your feedback is important to us!