High Availability (HA)
Presentation Transcript

Slide1

High Availability (HA)
Slide2

Agenda

High Availability Introduction

Front-End High Availability

User Experience

Server Management

SQL Back-End High Availability
Slide3

High Availability Introduction
Slide4

Understanding Availability

What is Availability?

By definition: the level of redundancy applied to a system to ensure a given degree of operational continuity during planned and unplanned outages.

The focus of this feature is to provide service availability – i.e., keep the service up and running

Define the degree of availability

The secondary objective is to minimize the impact on the user experience in case of failure of any of the Lync assets or infrastructure glitches (AD, DNS, network connectivity, etc.)
Slide5

Determining Availability Requirements

Designs should be based on Business Requirements and Service Level Agreements (SLA)s

What are the business drivers?

What SLAs are in place?

Are these SLAs shared with other teams (network, hardware, Active Directory, SQL)?

How do they define “availability”?

How are “RPO” and “RTO” defined?

Level of Availability has “potential” implications

Cost

Complexity

Management
Slide6

Availability 9s

Considerations

Availability % | Downtime per year | Downtime per month (30 days) | Downtime per week
90% ("one nine") | 36.5 days | 72 hours | 16.8 hours
95% | 18.25 days | 36 hours | 8.4 hours
98% | 7.30 days | 14.4 hours | 3.36 hours
99% ("two nines") | 3.65 days | 7.20 hours | 1.68 hours
99.5% | 1.83 days | 3.60 hours | 50.4 minutes
99.8% | 17.52 hours | 86.23 minutes | 20.16 minutes
99.9% ("three nines") | 8.76 hours | 43.2 minutes | 10.1 minutes
99.95% | 4.38 hours | 21.56 minutes | 5.04 minutes
99.99% ("four nines") | 52.56 minutes | 4.32 minutes | 1.01 minutes
99.999% ("five nines") | 5.26 minutes | 25.9 seconds | 6.05 seconds
99.9999% ("six nines") | 31.5 seconds | 2.59 seconds | 0.605 seconds
Slide7

HA Capabilities within Skype for Business

HA: Server failure

Server clustering via HLB and Domain Name Service (DNS) load balancing

Mechanism built into Skype for Business Server to automatically distribute services and groups of users across the Front End Servers within an Enterprise Pool

HA: Back-end failure

Support for choice of technology: SQL Failover Clustering, SQL AlwaysOn, SQL Mirroring

Support for automatic failover (FO)/failback (FB) (with witness) and manual FO/FB

Integrated into the core product tools such as Topology Builder, Skype Server Control Panel and Skype Server Management Shell
Slide8

Front End High Availability

Overview
Slide9

The Availability Brick Model Evolution
Slide10

Fabric v3 Within Skype for Business

Availability Model – Services

Supports MCU Factory, Conference Directory, Routing Group, LYSS

Fast failover with full service availability

Automatic scaling and load balancing

Failover management (Activate/Deactivate API during patching)

Performs primary/secondary node election

Replication between primary and secondary nodes

Availability Model – Users

Users are mapped to Groups

Each group is a persisted stateful service with up to 3 replicas

User requests are serviced by the primary replica

User Location Routing
Slide11

Server OS Fabric Considerations

Operating system selection impacts the installed version of Windows Fabric during setup:

Recommended OS: Windows Server 2012 R2

Note: If migrating from Windows 2008 R2, it is recommended to deploy side-by-side starting with Windows Server 2012 R2 instead

Latest fixes for Windows Fabric may not be available for older operating systems

Faster replica rebuilds under slow network conditions

Reduced size of Windows Fabric performance counter logs

Ability to enable remote copying of fabric logs

Ability to control the size of Windows Fabric trace files

Better resiliency of fabric services

Better handling of certain error conditions

Operating System | Installed version of Windows Fabric
Windows Server 2008 R2 | Windows Fabric v3
Windows Server 2012 | Windows Fabric v3
Windows Server 2012 R2 | Windows Fabric v3
Slide12

Pool Quorum

When servers detect another server or cluster to be down based on their own state, they consult the Arbitrator before committing that decision.

Voter system: a minimum number of voters is required to prevent service startup failures and provide for pool failover, as shown in the following table:

Total number of Front End Servers in the pool (defined in Topology) | Number of servers that must be running for the pool to be functional
2 | 1
3-4 | Any 2
5-6 | Any 3
7 | Any 4
8-9 | Any 4 of the first 7 servers
10-12 | Any 5 of the first 9 servers
Slide13

Example: Pool Quorum – Voter In-Depth

Two Server Pool

Three Server Pool

Six Server Pool

C:\ProgramData\Windows Fabric\FabricHostSettings.xml
Slide14
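To see which voter entries a given Front End is carrying, that file can be inspected directly. A minimal PowerShell sketch, assuming the default path above; the exact XML element names vary between fabric versions, so this simply greps for vote-related entries:

# List vote-related entries from the local Windows Fabric host settings.
Select-String -Path 'C:\ProgramData\Windows Fabric\FabricHostSettings.xml' `
    -Pattern 'vote' |
    ForEach-Object { $_.Line.Trim() }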

Pool Startup Fabric Behavior Scenarios

At Cluster Boot up

A Primary member for each Routing Group service is created

The Primary member synchronizes data available in blob store to local database

The elected Secondary member for each routing group will be synchronized with the Primary

When a Front End restarts

Windows Fabric automatically load-balances appropriate services to this Front End once the restart is complete.

The Front End is made an idle Secondary for services, and subsequently an active Secondary

To manage any service, only three nodes with synchronized communication are required
Slide15

Fabric Group Based Routing Scenarios

All users assigned to a group are homed on the same Front End (FE)

Groups fail over to other registrars within a pool when the Primary member fails

Groups are rebalanced dynamically when FEs are added/removed

Routing Groups are assigned to a dedicated Replica Set
Slide16

Example: Fabric Routing Group Assignment
Slide17

Intra-Pool Load Balancing & Replication

Persistent User Data

Synchronous replication to two more FEs (Backup / Replicas)

Presence, Contacts/Groups, User Voice Setting, Conferences

Synchronous writes to the back end for conference state

Transient User Data

Not replicated across Front End servers

Presence changes due to user activity, including:

- Calendar

- Inactivity

- Phone call

Minimal portions of conference data replicated to include:

- Active Conference Roster

- Active Conference MCUs

Limited usage of “Shared” Storage Blob

- Data rehydration of client endpoints
- Disaster recovery
Slide18

Replica Set Behavior

Three replicas – 1 Primary, 2 Secondaries (quorum)

If one replica goes down, another one takes over as the primary

For 15-30 minutes, fabric will not attempt to build another replica*

If during this time one of the two remaining replicas goes down, the replica set is in quorum loss

Fabric will wait indefinitely for the two replicas to come up again

*User count impacts this window
Slide22

Replica Set Stateful Service Failover

[Diagram: five nodes (Node 1-5), each running the OS and hosting stateful service replicas – Primaries and Secondaries – with replication between the primary and secondary replicas]
Slide23

Survivable Branch Routing Group Scenarios

What about SBA/SBS-homed users?

SBA/SBS will have a pool defined for User Services

This pool will contain the Routing Groups for the users assigned to the SBS/SBA

One pool can service multiple SBA/SBS

Each SBS/SBA gets its own unique Routing Group

All users homed on SBS/SBA are in the same RG

This can include up to 5000 users based on current sizing guidelines

This Routing Group will have up to 3 copies, like any other Routing Group

Note: Since one SBA can be associated with only one pool, in large environments SBAs should be staggered across the pools they are associated with, to provide the highest level of availability possible.
Slide24
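One way to verify this on a live system is to dump the pool and routing-group information for a few branch users and compare. A minimal sketch, assuming a hypothetical user; the property carrying the routing group varies by version, so Format-List * shows everything returned:

# Inspect pool placement and routing group for a branch-site user.
Get-CsUserPoolInfo -Identity 'sip:amy@contoso.com' | Format-List *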

Survivable Branch Routing Group Scenarios

Let’s check out some SBS users…
Slide25

Survivable Branch Routing Group Scenarios
Slide26

Survivable Branch Routing Group Scenarios

Let’s add a new SBS to the topology… first we’ll check the Routing Group distribution

Now, after publishing the new SBA, let’s look again…
Slide27

Survivable Branch Routing Group Scenarios

After creating users on the new SBS, let’s check the routing group ID

Look familiar?
Slide28

High Availability User Experience

Primary Copy Offline
Slide29

Example: User Experience

Now, stop services on POOLA2…
Slide30

Example: User Experience

Notice that one of the secondary copies was promoted to primary

Server restored
Slide31

Example: User Experience
Slide32

Example: User Experience

Amy’s client logs show her client trying to REGISTER, with a 301 redirect to POOLA3 (which is up)
Slide33

Example: User Experience

But what about a 2-FE pool? Is it different because we don’t have 3 copies?

Nope… still works fine.
Slide34

High Availability User Experience

All Copies Offline
Slide35

Example: User Experience

Now, stop VMs POOLA4, POOLA5, POOLA2…
Slide36

Example: User Experience

Amy’s Routing Group is in Quorum Loss (No Primaries)
Slide37

Example: User Experience

HOW DO I GET OUT OF THIS?!?!

Perform a QuorumLossRecovery on the affected pool (see the sketch below).
Slide38

Example: User Experience
Slide39
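Recovering from quorum loss uses the Reset-CsPoolRegistrarState cmdlet covered later in this deck, with the QuorumLossRecovery reset type. A minimal sketch, with an example pool FQDN:

# Force fabric to rebuild the routing-group services that lost quorum.
# Data not yet persisted to the back end may be lost during recovery.
Reset-CsPoolRegistrarState -PoolFqdn 'poola.contoso.com' -ResetType QuorumLossRecovery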

High Availability Server Management

Patching
Slide40

Server Grouping – Upgrade Domains

Logical grouping of servers on which software maintenance, such as upgrades and security updates, is performed at the same time

You cannot lose more than one Upgrade Domain at a time

Loss of multiple Upgrade Domains = quorum loss
Slide41

Upgrade domains and service placements

[Diagram: six nodes grouped into UD:/UpgradeDomain1, UD:/UpgradeDomain2 and UD:/UpgradeDomain3, showing Primary (P) and Secondary (S) replica placement across the nodes]
Slide42

Upgrade Domains

Related to the number of FEs in the pool at creation time (Topology Builder logic)

How can I tell?

Get-CsPoolUpgradeReadinessState | Select-Object -ExpandProperty UpgradeDomains

What if I add more FEs to the pool?

Depending on the initial creation state, more UDs may be created, or more servers placed into existing UDs

Initial Pool Size | Number of Upgrade Domains | Front End Placement per Upgrade Domain
12 | 8 | First 8 FEs into 4 UDs with 2 each, then 4 UDs with 1 each
8 | 8 | Each FE placed into its own UD
9 | 8 | First 2 FEs into one UD, then 7 UDs with 1 each
5 | 5 | Each FE placed into its own UD
Slide43

Example: Topology Builder Upgrade Domain

Within this example we see:

One upgrade domain for a Standard Edition pool

One upgrade domain for the Monitoring Server role
Slide44

Cmdlets

Get-CsUserPoolInfo -Identity <user>

Primary pool/FEs, secondary pool/FEs, routing group
Slide45

More Cmdlets

Get-CsPoolFabricState

Detailed information about all the fabric services running in a pool

Get-CsPoolUpgradeReadinessState

Returns information indicating whether or not your Lync Registrar pools are ready to be upgraded/patched

Reset-CsRoutingGroup

Administrators can reset Windows Fabric routing groups that are missing or are otherwise not working correctly. Missing routing groups can be identified by using the Get-CsPoolFabricState cmdlet and the FilterOnMissingReplicas parameter.

Skip-CsRoutingGroupAtServiceStartup
Slide46
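Putting the last two together, a hedged sketch of finding and resetting a broken routing group (the pool FQDN is an example, and the exact parameters of Reset-CsRoutingGroup should be confirmed with Get-Help before use):

# Report fabric state, filtered to routing groups with missing replicas.
Get-CsPoolFabricState -PoolFqdn 'poola.contoso.com' -FilterOnMissingReplicas

# Then reset an affected routing group reported in the output, e.g.:
# Reset-CsRoutingGroup -RoutingGroup <routing group GUID from the output>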

Resetting the Pool

Reset-CsPoolRegistrarState

FullReset – cluster changes 1->Any, 2->Any, Any->2, Any->1, Upgrade Domain changes

QuorumLossRecovery – force fabric to rebuild services that lost quorum

ServiceReset – voter change (default if no ResetType specified)

MachineStateRemoved – removes the specified server from the pool
Slide47
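All of the reset types take the same basic shape; a minimal sketch with an example pool FQDN:

# Default reset (ServiceReset), e.g. after a voter change.
Reset-CsPoolRegistrarState -PoolFqdn 'poola.contoso.com' -ResetType ServiceReset

# MachineStateRemoved additionally needs the server to remove identified;
# see Get-Help Reset-CsPoolRegistrarState -Full for the parameter set.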

Windows Fabric v3 Changes

Better load balancing to consider replica movement costs

Performance improvements, bug fixes (including issues reported by Skype), and improved debug-ability

Lease level performance improvements—lease expiration in cluster manifest honored

Support for Deactivate API

Slowly drain replicas out instead of a sudden spike resulting in heavy load on secondary

Prevents Windows Fabric from placing replicas on FEs that are shutting down

Safe upgrade checks

Official support for Server 2012 R2
Slide48

Skype for Business Patching Process Evolution

[Diagram: Lync 2013 patching process compared with Skype for Business]

From eight steps down to four!
Slide49

Skype for Business Server patching

Simplified workflow leverages Windows Fabric v2/v3 APIs

4 steps (see the sketch after this list):

Invoke-CsComputerFailOver to fail over (stop) a front end; take the FE out of rotation, move the replicas out

Perform patching/upgrade

Invoke-CsComputerFailBack to fail back (start) a front end; bring the FE into active state, move replicas in

Repeat for all front ends in the pool
Slide50
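A hedged sketch of that four-step loop for one pool (server names are examples, and the actual patching step is whatever your update process is; as noted later in this deck, run against only one Front End at a time):

# Patch each Front End in turn: fail over, patch, fail back.
$frontEnds = 'fe01.contoso.com', 'fe02.contoso.com'   # example names
foreach ($fe in $frontEnds) {
    # Take the FE out of rotation and let fabric drain its replicas.
    Invoke-CsComputerFailOver -ComputerName $fe

    # <apply Windows/Skype for Business updates to $fe here>

    # Bring the FE back into rotation; fabric moves replicas back in.
    Invoke-CsComputerFailBack -ComputerName $fe
}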

Invoke-CsComputerFailOver

Checks for availability of a sufficient number of servers

Waits for replica stability across the pool

- Confirms all replicas exist before taking the server down

Initiates deactivation of the node; waits for success/failure from Windows Fabric

Stops services after successfully deactivating the node
Slide51

Invoke-CsComputerFailBack

Starts services if not already started

Activates the node; by doing so, Windows Fabric will now consider this server for replica placement
Slide52

How is this better?

Simpler workflow, fewer commands, fewer errors

Faster: 2–3 hrs. for a 12 FE pool (down from 8–12 hrs.)

More reliable

Checks for readiness across the pool within the cmdlet before failover

Leverages Windows Fabric v2/v3 deactivate/activate node APIs, ensuring more dependable operation (moving replicas in/out, not moving replicas onto a node going down)

Since the scope is always one front end, avoids situations where multiple front ends could be down within a pool (a reason servers don’t start)

Will not allow failover of a server if there are existing replica issues in the pool

Enforces minimum server requirements implicitly (if other servers are down)

Progress indicators
Slide53

How is this better?

Invoke-CsComputerFailOver progress indicator

Invoke-CsComputerFailBack progress indicator
Slide54

Additional Patching Process Notes

Prerequisite: Skype for Business Server, Fabric version 2.0 CU3+

Don’t execute on more than one server at a time in a pool (it might block)

Invoke-CsComputerFailOver requires RTCSRV service to be running

Invoke-CsComputerFailBack will start RTCSRV service

Stopping services outside of this cmdlet is out of scope
Slide55

High Availability Server Management

Pool Cold Start
Slide56

Pool Cold Start scenarios

Lync 2013 to Skype for Business in-place upgrade

Adding a new pool

Pool failback starts a pool if it was offline

Miscellaneous cases where an administrator decides to take down the entire pool for a maintenance activity (not recommended in 2013)
Slide57

Previous Lync 2013/Pool Cold Start Problems

Typically, all the servers need to be started for one server to be up

Confusing minimum-number-of-servers requirements

Starting a subset of a pool is not straightforward, since it involves running Routing Group quorum loss recovery

Incomplete information on why a server cannot be started

No automatic recovery actions initiated for failures
Slide58

Skype for Business Server Pool Start

Start the servers within a pool with a single command with easy-to-follow instructions

Checks prerequisites for all pool servers

For problems encountered that might cause issues during pool/server cold start, provides alerts and diagnosis with required resolution steps

In some cases, allows the pool to start even if some of the routing group replicas are stuck
Slide59

Example: Starting a pool with a single cmdlet

Start-CsPool

Prerequisite checks (all servers Skype for Business Server, Windows Fabric v2+)

Attempts to start all servers in the pool

If there are problems starting any server, performs extended diagnosis and alerts

If a problem on a Front End cannot be fixed, run Start-CsPool with an exclusion list:

- Fails if minimum server requirements cannot be met due to the exclusion list
- Determines whether the operation requires quorum loss recovery
- If there is no data loss, performs implicit quorum loss recovery
- If there will be data loss, seeks admin approval with data-loss information, or a configured option can skip specific routing group replicas and proceed with the start

Starts all servers if there are no issues
Slide60
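A minimal sketch of the cold start (pool FQDN is an example; the exclusion-list parameter is not shown because its exact name should be confirmed with Get-Help Start-CsPool on your build):

# Cold-start all servers in the pool, with built-in prerequisite checks.
Start-CsPool -PoolFqdn 'poola.contoso.com'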

SQL Back-End High Availability

Overview
Slide61

SQL Mirroring Back-End HA Diagram
Slide62

SQL Mirroring File Share

What is it?

Temporary location used during setup; .BAK files are written here.

The primary SQL server needs Read/Write access, the mirror needs Read-Only

Where should it go?

Any file server, with proper permissions for SQL Service access

Do NOT use DFS! .BAK files are excluded from replication by default

Do not use the Lync Pool File Share

This is a one-time-use share
Slide63
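A hedged sketch of creating such a share with PowerShell (the path, share name and SQL service accounts are all examples; grant whatever accounts your SQL instances actually run as):

# One-time share used only during mirroring setup; names are examples.
New-Item -ItemType Directory -Path 'D:\SQLMirrorShare' | Out-Null
New-SmbShare -Name 'SQLMirrorShare' -Path 'D:\SQLMirrorShare' `
    -FullAccess 'CONTOSO\sqlsvc-primary' `
    -ReadAccess 'CONTOSO\sqlsvc-mirror'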

SQL Mirroring Witness as SQL Express

SQL Express is fully supported as a witness

Remember to enable TCP/IP

Start the SQL Browser service (if using dynamic ports)

Open the necessary firewall ports as required
Slide64
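On the witness box, the Browser-service and firewall items might look like the following sketch (port 7022 is only an example witness mirroring-endpoint port; match whatever your endpoint actually uses):

# Allow inbound traffic to the witness mirroring endpoint (example port).
New-NetFirewallRule -DisplayName 'SQL Mirroring Witness Endpoint' `
    -Direction Inbound -Protocol TCP -LocalPort 7022 -Action Allow

# Start the SQL Browser service (needed when dynamic ports are in use).
Set-Service -Name SQLBrowser -StartupType Automatic
Start-Service -Name SQLBrowser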

SQL AlwaysOn

Overview
Slide65

SQL Server AlwaysOn Overview

SQL Server AlwaysOn Features

Next generation of database mirroring technologies

Provides high availability and disaster recovery in SQL

Introduced in SQL Server 2012 and present in SQL Server 2014

Runs on top of WSFC (Windows Server Failover Clustering)
Slide66

SQL AlwaysOn Advantages

Latest and greatest SQL HA solution

Although database mirroring is still available in its original feature set, it is now considered a deprecated feature and will be removed in a future release of SQL Server

More reliable

SQL AlwaysOn (one primary, up to three corresponding secondary replicas) vs. Mirroring (one primary, one mirror)

Multi-database failovers

Useful in applications with several databases

Databases can be added to an Availability Group that can be failed over between replicas

All databases in an Availability Group are failed over at the same time
Slide67

SQL AlwaysOn Availability Groups

Provides high availability by maximizing availability of databases for an enterprise

An Availability Group is a set of databases that are failed over together at the same time

Supports one primary replica and up to two secondary replicas in synchronous-commit mode

The Availability Group listener responds to and redirects incoming client requests to replicas

Each Availability Group replica has a local SQL instance and a local copy of the databases
Slide68

SQL AlwaysOn Failover Cluster Instance (FCI)

Provides high availability through redundancy at the server-instance level

One SQL instance is installed across all failover clustering nodes

A single copy of the databases is located on shared disk storage, which is failed over between nodes
Slide69

SQL AlwaysOn Deployment Scenarios

New install scenario

- Install new backend using SQL Enterprise 2012 or SQL Enterprise 2014

- Add new Skype for Business Server pool with AlwaysOn back-end to topology

- Install Skype for Business Server and add databases to Availability Group

Upgrade scenario

- Upgrade an existing Lync Server 2013 pool to Skype for Business Server

- Upgrade back-end server to SQL Enterprise 2012 or SQL Enterprise 2014

- Enable SQL AlwaysOn for Skype for Business Server databases
Slide70

SQL AlwaysOn deployment options

Deploying AlwaysOn for new pools

- Creating a new SQL AlwaysOn Availability Group for a new pool

Deploying AlwaysOn for existing pools

- Moving from a SQL standalone back end to AlwaysOn Availability Groups
- Moving from a SQL mirrored back end to AlwaysOn Availability Groups
Slide71

SQL Version Requirements

Standalone Server

- Standard or Enterprise Edition

AlwaysOn Failover Clustering Instance (FCI)

- Standard or Enterprise Edition (two nodes)

- Enterprise Edition (three or more nodes)

Mirroring

- Standard or Enterprise Edition

AlwaysOn Availability Groups

- Enterprise Edition required
Slide72

SQL AlwaysOn Support Information

Supported with Skype for Business Server

Note: Availability Groups are not supported with Lync Server 2010 or 2013

Lync/Skype Version | Standalone | Failover Clustering | Mirroring | Availability Groups
Lync Server 2010 | SQL 2008 R2 SP2 | SQL 2008 R2 SP2 | Not supported | Not supported
Lync Server 2013 | SQL 2008 R2 SP2, SQL 2012 SP1 | SQL 2008 R2 SP2, SQL 2012 SP1 | SQL 2008 R2 SP2, SQL 2012 SP1 | Not supported
Skype for Business Server 2015 | SQL 2008 R2 SP2, SQL 2012 SP1, SQL 2014 | SQL 2008 R2 SP2, SQL 2012 SP1, SQL 2014 | SQL 2008 R2 SP2, SQL 2012 SP1, SQL 2014 | SQL 2012 SP1, SQL 2014
Slide73

Supported SQL Availability Group settings

Supported configurations* for Skype for Business Server

Replicas must all be in the same subnet

Only synchronous-commit mode is supported

Automatic failover mode is supported

No support for read access on secondary replicas

No support for having an off-site replica in Azure

*Note: Other configurations are possible and not actively blocked, but are not supported
Slide74

SQL AlwaysOn

Deployment
Slide75

Windows Server Failover Clustering Requirements

Windows Server 2008 R2 SP1 or higher

WSFC feature installed, with sufficient nodes for desired configuration

Select the File Share Witness option for the quorum witness

Cluster nodes cannot be Active Directory domain controllers

Cluster nodes must be from the same Active Directory domain

Cluster nodes must be connected to the same network subnet
Slide76

SQL Server and Database Requirements

SQL Server 2012 SP1/2014 Enterprise Edition or higher

SQL installation steps differ depending on the HA option selected

SQL AlwaysOn must be manually enabled on the SQL service, and the service restarted

Full recovery model required for each Availability Group database

Full backup required for each database added to an Availability Group

The database folder structure must be duplicated across all AG replicas

High availability option | Installation selection
Availability Groups (AG) | New SQL Server stand-alone installation (all replicas)
Failover Cluster Instance (FCI) | New SQL Server failover cluster installation (first node); Add node to a SQL Server failover cluster (additional nodes)
Slide77

Creating a new AlwaysOn Availability Group

Creating a new Availability Group for a new pool can be somewhat confusing:

Creating a new SQL back-end Availability Group requires at least one database

Databases for a new pool cannot be created until a SQL back end is available
Slide78

Creating a new SQL AlwaysOn Availability Group

Step 1: Add a new SQL Store using the FQDN of the Availability Group Listener

- In Topology Builder, select the option New Front End Pool
- When prompted to define the SQL Server store, click New
- Add the FQDN of the Availability Group Listener as the SQL Server FQDN
- Select High Availability Settings, and choose SQL AlwaysOn Availability Groups
- Add the FQDN of the SQL primary replica as the FQDN for the SQL Server AlwaysOn Instance
- Complete the configuration of the new pool and publish the topology

Step 2: Enable AlwaysOn Availability Groups

Step 3: Create a new AlwaysOn Availability Group for the back-end databases

Step 4: Update the settings for the SQL Store and publish the topology
Slide79

Creating a new AlwaysOn SQL Store

Availability Group Listener FQDN

Primary Replica FQDN

Note: Databases will be created on the Primary Replica when the topology is published
Slide80

Creating a new AlwaysOn Availability Group

Step 1: Add a new SQL Store using the FQDN of the Availability Group Listener

Step 2: Enable AlwaysOn Availability Groups

- Add the Windows Server Failover Clustering (WSFC) feature on each replica server
- Validate the cluster configuration
- Create a new Windows Failover Cluster
- Configure cluster quorum settings
- Enable AlwaysOn Availability Groups

Step 3: Create a new AlwaysOn Availability Group for the back-end databases

Step 4: Update the settings for the SQL Store and publish the topology
Slide81

Installing the Windows Server Failover Clustering Feature

Windows Failover Clustering

Built-in Windows Server feature

Installed through “Add Roles and Features”

Windows Server 2008 R2 SP1 or higher

Important

Cluster nodes cannot be Active Directory domain controllers

Cluster nodes must be from the same Active Directory domain

Cluster nodes must be connected to the same network subnet
Slide83
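As an alternative to the Add Roles and Features wizard, the feature can be installed with PowerShell; a minimal sketch to run on each replica server:

# Install the failover clustering feature plus its management tools.
Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools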

Validating the Cluster Configuration

Adding cluster nodes

Add the servers that will be part of the cluster

All new nodes will be verified and tested
Slide84

Validating the Cluster Configuration

Node validation

All new nodes will be verified and tested

Address and fix all errors and warnings
Slide85

Validating the Cluster Configuration

Successful validation

Select “Create the cluster now using the validated nodes”

A new Windows Failover Cluster will be created using all configured nodes
Slide86

Creating the Cluster

Creating the cluster

Define an administrative cluster name

For cluster administration only; not used by Skype for Business

Automated DNS registration for the IP address
Slide87

Creating the Cluster

Creating the cluster

Uncheck “Add all eligible storage to the cluster”

AlwaysOn Availability Groups do not require or utilize shared storage
Slide88

Creating the Cluster

Creating the cluster

A cluster consisting of the configured nodes is created
Slide89
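The validation and creation steps can also be scripted; a minimal sketch with example node, cluster and address values (note -NoStorage, matching the guidance above):

# Validate the prospective nodes, then create the cluster without storage.
Test-Cluster -Node 'sql01.contoso.com', 'sql02.contoso.com'

New-Cluster -Name 'SQLAOCLUSTER' `
    -Node 'sql01.contoso.com', 'sql02.contoso.com' `
    -StaticAddress '192.168.1.50' -NoStorage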

Configuring Quorum Settings

Quorum Witness

A two-node cluster requires a witness for a majority vote

The quorum should not be defined on a node

Skype for Business uses a File Share Witness

The file share should be accessible from both nodes

The File Share Witness is configured
Slide93
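The same quorum change can be made from PowerShell; a minimal sketch with an example share path:

# Point the cluster quorum at a file share witness (node and file share majority).
Set-ClusterQuorum -NodeAndFileShareMajority '\\fileserver\SQLWitnessShare'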

Enabling SQL AlwaysOn

Configure the SQL instance

AlwaysOn has to be enabled manually; if it is not enabled, an Availability Group cannot be created

Check “Enable AlwaysOn Availability Groups” on the “AlwaysOn High Availability” tab

A restart of the SQL instance is required
Slide96
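If the SQL PowerShell module is available, this step can be scripted as well; a hedged sketch (instance names are examples; -Force restarts the SQL service as part of the change):

# Enable AlwaysOn on each replica's SQL instance (restarts the service).
Import-Module SQLPS -DisableNameChecking
Enable-SqlAlwaysOn -ServerInstance 'sql01.contoso.com' -Force
Enable-SqlAlwaysOn -ServerInstance 'sql02.contoso.com' -Force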

Creating a new AlwaysOn Availability Group

Step 1: Add a new SQL Store using the FQDN of the Availability Group Listener

Step 2: Enable AlwaysOn Availability Groups

Step 3: Create a new AlwaysOn Availability Group for the back-end databases

- Set the recovery model for each database to Full
- Perform a SQL backup of each database
- Duplicate the database folder structure on each replica server
- Create the new Availability Group and add the back-end databases

Step 4: Update the settings for the SQL Store and publish the topology
Slide97
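For reference, the per-database preparation in Step 3 maps to straightforward T-SQL, shown here driven from PowerShell. A sketch only: rtcxds is one of the Skype for Business back-end databases, but a pool has several, and the instance name and backup path are examples:

# Prepare one back-end database for the Availability Group.
$db = 'rtcxds'   # repeat for each back-end database
Invoke-Sqlcmd -ServerInstance 'sql01.contoso.com' -Query @"
ALTER DATABASE [$db] SET RECOVERY FULL;
BACKUP DATABASE [$db] TO DISK = N'D:\Backups\$db.bak' WITH INIT;
"@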

Creating a New Availability Group

[Screenshot walkthrough: creating the new Availability Group and adding the back-end databases]
Slide104

Creating a new AlwaysOn Availability Group

Step 1: Add a new SQL Store using the FQDN of the Availability Group Listener

Step 2: Enable AlwaysOn Availability Groups

Step 3: Create a new AlwaysOn Availability Group for the back-end databases

Step 4: Update the settings for the SQL Store and publish the topology

- In Topology Builder, open the properties of the Availability Group SQL Store
- Under High Availability Settings, change the “FQDN for the SQL Server AlwaysOn instance” value to the FQDN of the Availability Group Listener
- Publish the topology
Slide105

Modifying the AlwaysOn SQL Store

Availability Group Listener FQDN
Slide106

Questions?
Slide107

Sponsors
Slide108

Jabra – Evolving the Lync Communication Experience

Our History

145 years of communication experience

Microsoft Gold Partner since 2007

Partner of the Year finalist 2013

Over 40 Lync certified devices

Exclusive USB/3.5mm controller on EVOLVE 40 and 80 models

Over 4 million Lync devices sold!

Jabra Programs for Skype for Business

Jabra Partner Program

Pilot, POC and Deployment Offers

Device Deployment and Management Tools

Partner Demo and Seed Programs

www.Jabra.com/microsoft

Contact: Bill Orlansky, borlansky@Jabra.com
Slide109

Voyager Focus UC

Works seamlessly across Bluetooth-enabled desk phones, laptops, mobile phones, tablets, and smart watches

Smart sensors answer calls when you put on the headset, mute when you take it off, and pause music for incoming calls

Enhanced voice alerts announce caller ID, mute and connection status, and talk-time level

The Dynamic Mute Alert feature senses and alerts you when you try to talk while muted

Stereo Bluetooth headset with active noise canceling (ANC)
Slide110

Sennheiser

Sennheiser headsets and speakerphones provide excellent sound quality, wearing comfort and hearing protection for contact centers, offices and UC professionals

Sennheiser is an ideal companion for Skype for Business

www.sennheiser.com/cco

Slide111
Slide112

Appendix
Slide113

Additional Skype for Business Role High Availability

Overview
Slide114

Additional Role Availability Considerations

Director

Mediation Server

Edge Server

By deploying multiple servers within a pool, capacity and redundancy are provided

Load balancing via DNS LB or HWLB
Slide115

Video Interop (VIS) Role Availability Considerations

3rd-Party Video Gateway Pool Failover Support

One or more VIS pools can support trunks to one or more video gateways

The video gateway can fall back to a backup VIS pool if the preferred one is unavailable

A combination of one or more methods can be used in the same topology

VIS Server (Front End) Failover Support

VIS is able to leverage the concept of an “Associated Backup Pool” defined on a Front End Pool

After 5 minutes of consecutive failures, VIS moves to the backup pool

Inbound calls relayed by VIS to a failed primary pool will ring for 10 seconds before trying the backup

VIS also adheres to manual failover commands in the event of Front End pool maintenance
Slide116

Office Web Apps (OWA) High Availability

Overview
Slide117

Office Web Apps (OWA) Availability Topologies

Stretched OWA Farm Across Datacenters
Slide118

Office Web Apps (OWA) Availability Topologies

Dedicated OWA Farm within a Single Datacenter
Slide119

Office Web Apps (OWA) Availability Considerations

Stretched OWA Farm Across Datacenters

We have not tested stretch-farm scenarios

There is a great deal of communication between Office Web Apps servers within a farm

We will not attempt to optimize this communication for stretch farms

There is no reason that a stretch farm can’t work, provided that:

There is a consistent intra-farm latency of <1 ms one way, 99.9% of the time over a period of ten minutes. (Intra-farm latency is commonly defined as the latency between the front-end web servers and the database servers.)

The bandwidth speed is at least 1 gigabit per second.

Our statement is: “…We will support customers who set up stretch farms; however, if an issue turns out to be the result of how the stretch farm is configured, we are not in a position to deliver a change to Office Web Apps Server to support that configuration. In cases where a stretch farm is on a very low latency network (say, fiber within the same city) and the network is configured such that all the machines in the farm appear to each other as machines in the same subnet, there is nothing about our architecture that will fail…”
Slide120

Office Web Apps (OWA) Availability Considerations

Dedicated OWA Farm within a Single Datacenter

“Best Practice” Recommendation

Our statement is: “…Stick to one datacenter. Servers within an Office Web Apps Server farm must be within the same data center. Don’t distribute them geographically. Generally you need only one farm, unless you have security needs that require an isolated network that has its own Office Web Apps Server farm.”
Slide121

Persistent Chat (pChat) High Availability

Overview
Slide122

Persistent Chat Availability Topologies

HA for pChat within a Single Datacenter

Persistent Chat Front Ends: pool with up to four active Persistent Chat Front End Servers

Persistent Chat back end: SQL mirroring, with optional witness, provides automated failover

Distributed File System Replication (DFSR) for file share replication

The compliance database uses the same mechanism as the Persistent Chat content database
Slide123

Persistent Chat Availability Topologies

SQL Mirroring with “Log Shipping” Across Datacenters
Slide124

Persistent Chat Availability Topologies

SQL Mirroring with “Log Shipping” Across Datacenters

Using a Stretched Pool

- Looks like one pChat pool within the topology
- Physical machines within the pool span datacenters
- 50% active and 50% backup capacity, to be used in case of a disaster

Persistent Chat Back End

- SQL mirroring for high availability within a datacenter
- SQL log shipping across datacenters for high availability across datacenters

Physical topology

- Site 1: Skype pool 1, Persistent Chat pool – machines (1-4) all active, database + mirror + witness (optional)
- Site 2: Skype pool 2, Persistent Chat pool – machines (5-8) all idle, backup database (SQL log shipping target)
Slide125

Persistent Chat Availability Topologies

SQL AlwaysOn with “Log Shipping” Across Datacenters
Slide126

Persistent Chat Availability Topologies

SQL AlwaysOn Across Datacenters

Using a Stretched Pool

- Looks like one pChat pool within the topology
- Physical machines within the pool span datacenters
- 50% active and 50% backup capacity, to be used in case of a disaster

Persistent Chat Back End

- SQL AlwaysOn for high availability within a datacenter
- SQL log shipping across datacenters for high availability across datacenters

Physical topology

- Site 1: Skype pool 1, Persistent Chat pool – machines (1-4) all active, database + SQL AlwaysOn
- Site 2: Skype pool 2, Persistent Chat pool – machines (5-8) all idle, backup database (SQL log shipping target)
Slide127

Persistent Chat Availability Considerations

Factors affecting the implemented HA design

A single Persistent Chat pool stretched across datacenter sites that have Skype for Business pools paired for DR

- Additional cold standby servers available to take load in case of DR: four active at a time, four standby
- VLAN not required
- The lifetimes of the Lync and pChat pools are not coupled – one can fail without the other
- Persistent Chat failure does not impact IM/Presence and Voice workloads, and vice-versa

Geo-located datacenters: active servers in both datacenters at any point, active database in one datacenter

- Requires high bandwidth and a low latency connection
- Clients in site one could be talking to a server in site two
- A Persistent Chat server in site two could be talking to the primary database in site one

Geo-distributed datacenters: active Persistent Chat servers in only one datacenter, active database in the same datacenter

Mirroring with optional witness or SQL AlwaysOn provides automated HA for the database
Slide128

Infrastructure High Availability

Overview
Slide129

Simple URLs Supported by GeoDNS

External

1. The SfB client (located in the APA region) queries a DNS server for the name lyncdiscover.contoso.com.
2. The DNS server responds with a list of Name Servers (NS) to which this zone has been delegated, which in our case is a list of geoDNS devices.
3. A query is made to the first NS, called apa-geodns.contoso.com, for the record lyncdiscover.contoso.com.
4. At this stage the geoDNS device checks the requestor’s IP address and compares it against a global IP address database to determine which host record to return to the client.
5. A health check has been performed by all geoDNS devices to ensure a response is given for a functional HLB.
6. The geoDNS device returns a response to the original DNS query, which contains a healthy HLB VIP and (optionally) the closest VIP to the user’s geographic location.
7. The SfB client connects directly to the HLB VIP.
Slide130
client connects directly to the HLB VIPSlide130

Simple URLs Supported by

GeoDNS

Internal

The

SfB

client (located in

APA

region) queries a DNS server for the name

lyncdiscoverinternal.contoso.com

.

The DNS server responds with a list of Name Servers (NS) to which this zone has been delegated to respond which in our case is a list of

geoDNS

devices.

A query is made to the first NS called

apa-geodns.contoso.com

for the record lyncdiscoverinternal.contoso.com.At this stage the geoDNS device will check the requestor’s IP address and compare it against a global IP address database to determine which host record to return to the client.A health check has been performed by all geoDNS devices to ensure a response is given for a functional HLB.The

geoDNS

device returns a response to the original DNS query which contains a healthy HLB VIP and (optionally) the closest VIP to the user’s geographic location.

The

SfB

client connects directly to the HLB VIPSlide131

Example: URLs Supported by GeoDNS
Slide132

Example: URLs Supported by Azure Traffic Manager

$TrafficManagerProfile = Get-AzureTrafficManagerProfile -Name "ContosoProfile"

Add-AzureTrafficManagerEndpoint -TrafficManagerProfile $TrafficManagerProfile -DomainName "webext-pool01.contoso.com" -Status "Enabled" -Type Any | Set-AzureTrafficManagerProfile

Add-AzureTrafficManagerEndpoint -TrafficManagerProfile $TrafficManagerProfile -DomainName "webext-pool02.contoso.com" -Status "Enabled" -Type Any | Set-AzureTrafficManagerProfile

$TrafficManagerProfile | Set-AzureTrafficManagerProfile -MonitorPort 443 -MonitorProtocol Https -MonitorRelativePath "/favicon.ico"