High Availability (HA)
Agenda
High Availability Introduction
Front-End High Availability
User Experience
Server Management
SQL Back-End High Availability
High Availability
Introduction
Understanding Availability
What is Availability?
By definition: the level of redundancy applied to a system to ensure a given degree of operational continuity during planned and unplanned outages.
The focus of this feature is to provide service availability – i.e., keep the service up and running
Define the degree of availability
The secondary objective is to minimize the impact on the user experience in case of failure of any of the Lync assets or infrastructure glitches (AD, DNS, network connectivity, etc.)
Determining Availability Requirements
Designs should be based on Business Requirements and Service Level Agreements (SLA)s
What are the business drivers?
What SLAs are in place?
Are these SLAs shared with other teams (network, hardware, Active Directory, SQL)?
How do they define “availability”?
How are “RPO” and “RTO” defined?
The level of availability has potential implications for:
Cost
Complexity
Management
Availability 9s
Considerations

| Availability % | Downtime per year | Downtime per month (30 days) | Downtime per week |
|---|---|---|---|
| 90% ("one nine") | 36.5 days | 72 hours | 16.8 hours |
| 95% | 18.25 days | 36 hours | 8.4 hours |
| 98% | 7.30 days | 14.4 hours | 3.36 hours |
| 99% ("two nines") | 3.65 days | 7.20 hours | 1.68 hours |
| 99.5% | 1.83 days | 3.60 hours | 50.4 minutes |
| 99.8% | 17.52 hours | 86.23 minutes | 20.16 minutes |
| 99.9% ("three nines") | 8.76 hours | 43.2 minutes | 10.1 minutes |
| 99.95% | 4.38 hours | 21.56 minutes | 5.04 minutes |
| 99.99% ("four nines") | 52.56 minutes | 4.32 minutes | 1.01 minutes |
| 99.999% ("five nines") | 5.26 minutes | 25.9 seconds | 6.05 seconds |
| 99.9999% ("six nines") | 31.5 seconds | 2.59 seconds | 0.605 seconds |
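These figures follow directly from the availability percentage; as a quick sanity check, a short PowerShell sketch for the 99.9% row:

# Downtime per year, in hours, for a given availability percentage
$availability = 99.9
$hoursPerYear = 365 * 24
"{0:N2} hours" -f ((100 - $availability) / 100 * $hoursPerYear)   # 8.76 hours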
HA Capabilities within Skype for Business
HA: Server failure
Server clustering via HLB and Domain Name Service (DNS) load balancing
Mechanism built into Skype for Business Server to automatically distribute services and groups of users across the various Front End Servers within an Enterprise Pool
HA: Back-end failure
Support for choice of technology: SQL Failover Clustering, SQL AlwaysOn, SQL Mirroring
Supports automatic failover (FO)/failback (FB) (with witness) and manual FO/FB
Integrated into the core product tools such as Topology Builder, Skype for Business Server Control Panel and Management Shell
Front End High Availability
Overview
The Availability Brick Model Evolution
Fabric v3 Within Skype for Business
Availability Model – Services
Supports MCU Factory, Conference Directory, Routing Group, LYSS
Fast failover with full service availability
Automatic scaling and load balancing
Failover management (Activate/Deactivate API during patching)
Performs primary/secondary node election
Replication between primary and secondary nodes
Availability Model – Users
Users are mapped to Groups
Each group is a persisted stateful service with up to 3 replicas
User requests are serviced by the primary replica
User Location Routing
Server OS Fabric Considerations
Operating system selection impacts the installed version of Windows Fabric during setup:
Recommended OS: Windows Server 2012 R2
Note: If migrating from Windows Server 2008 R2, we recommend deploying side-by-side starting with Windows Server 2012 R2 instead
Latest fixes for Windows Fabric may not be available for older operating systems:
Faster replica rebuilds under slow network conditions
Reduced size of Windows Fabric performance counter logs
Ability to enable remote copying of fabric logs
Ability to control the size of Windows Fabric trace files
Better resiliency of fabric services
Better handling of certain error conditions

| Operating System | Installed version of Windows Fabric |
|---|---|
| Windows Server 2008 R2 | Windows Fabric v3 |
| Windows Server 2012 | Windows Fabric v3 |
| Windows Server 2012 R2 | Windows Fabric v3 |
Pool Quorum
When servers detect another server or cluster to be down based on their own state, they consult the Arbitrator before committing that decision.
Voter system: a minimum number of voters is required to prevent service startup failures and provide for pool failover, as shown in the following table:

| Total number of Front End Servers in the pool (defined in Topology) | Number of servers that must be running for the pool to be functional |
|---|---|
| 2 | 1 |
| 3-4 | Any 2 |
| 5-6 | Any 3 |
| 7 | Any 4 |
| 8-9 | Any 4 of the first 7 servers |
| 10-12 | Any 5 of the first 9 servers |
Example: Pool Quorum – Voter In-Depth
Two Server Pool
Three Server Pool
Six Server Pool
C:\ProgramData\Windows Fabric\FabricHostSettings.xml
Pool Startup Fabric Behavior Scenarios
At cluster boot up:
A Primary member for each Routing Group service is created
The Primary member synchronizes data available in the blob store to the local database
The elected Secondary member for each routing group is synchronized with the Primary
When a Front End restarts:
Windows Fabric automatically load balances appropriate services to this Front End once the restart is complete.
The Front End is made an idle Secondary for services, and subsequently an active Secondary
To manage any service, only (3) nodes with synchronized communication are required
Fabric Group Based Routing Scenarios
All users assigned to a group are homed on the same Front End (FE)
Groups fail over to other registrars within a pool when the Primary member fails
Groups are rebalanced dynamically when FEs are added/removed
Routing Groups are assigned to a dedicated Replica Set
Example: Fabric Routing Group Assignment
Intra-Pool Load Balancing & Replication
Persistent User Data
Synchronous replication to two other FEs (backup replicas)
Presence, Contacts/Groups, User Voice Settings, Conferences
Synchronous writes to the back end for conference state
Transient User Data
Not replicated across Front End Servers
Presence changes due to user activity, including:
- Calendar
- Inactivity
- Phone call
Minimal portions of conference data replicated, including:
- Active Conference Roster
- Active Conference MCUs
Limited usage of “Shared” Storage Blob
Data rehydration of client endpoints
Disaster recovery
Replica Set Behavior
Three replicas – 1 Primary, 2 Secondaries (quorum)
If one replica goes down, another one takes over as the Primary
For 15-30 minutes, fabric will not attempt to build another replica*
If during this time one of the two remaining replicas goes down, the replica set is in quorum loss
Fabric will wait indefinitely for the two replicas to come up again
* The interval varies with user count
Replica Set Stateful Service Failover
[Diagram: five nodes (Node 1-5), each running an OS instance and hosting stateful service replicas (Primary/Secondary), with replication between the replicas]
Survivable Branch Routing Group Scenarios
What about SBA/SBS-homed users?
The SBA/SBS will have a pool defined for User Services
This pool will contain the Routing Groups for the users assigned to the SBS/SBA
One pool can service multiple SBA/SBS
Each SBS/SBA gets its own unique Routing Group
All users homed on an SBS/SBA are in the same RG
This can include up to 5,000 users based on current sizing guidelines
This Routing Group will have up to 3 copies, like any other Routing Group
Note: Since (1) SBA can be associated with (1) pool, in large environments SBAs should be staggered across the pools they are associated with, to provide the highest level of availability possible
Survivable Branch Routing Group Scenarios
Let’s check out some SBS users…
Survivable Branch Routing Group Scenarios
Survivable Branch Routing Group Scenarios
Let’s add a new SBS to the topology… first we’ll check the Routing Group distribution
Now, after publishing the new SBA, let’s look again…
After creating users on the new SBS, let’s check the routing group ID
Survivable Branch Routing Group Scenarios
Look familiar?
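A rough Management Shell equivalent of the checks shown in these screenshots, as a sketch (pool FQDN and user SIP address are illustrative):

# Show the fabric services, including routing group replicas, for the pool
Get-CsPoolFabricState -PoolFqdn "poola.contoso.com"
# Show the routing group and preferred FEs for one SBS-homed user
Get-CsUserPoolInfo -Identity "sip:user01@contoso.com" | Format-List *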
High Availability User Experience
Primary Copy Offline
Example: User Experience
Now, stop services on POOLA2…
Example: User Experience
Notice that one of the secondary copies was promoted to primary
Server restored
Example: User Experience
Example: User Experience
Amy’s client logs show her client trying to REGISTER and getting a 301 to POOLA3 (up)
Example: User Experience
But what about a 2-FE pool? Is it different because we don’t have 3 copies?
Nope… still works fine.
High Availability User Experience
All Copies Offline
Example: User Experience
Now, stop VMs POOLA4, POOLA5, POOLA2…
Example: User Experience
Amy’s Routing Group is in Quorum Loss (no Primaries)
Example: User Experience
How do I get out of this?!
Perform a QuorumLossRecovery on the affected pool.
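A sketch of that recovery step (pool name illustrative); QuorumLossRecovery forces fabric to rebuild the routing group services that lost quorum:

Reset-CsPoolRegistrarState -PoolFqdn "poola.contoso.com" -ResetType QuorumLossRecovery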
Example: User Experience
High Availability Server Management
Patching
Server Grouping – Upgrade Domains
Logical grouping of servers on which software maintenance, such as upgrades and security updates, is performed at the same time
You cannot lose more than one Upgrade Domain at a time
Loss of multiple Upgrade Domains = quorum loss
Upgrade domains and service placements
[Diagram: Nodes 1-6 grouped into three upgrade domains (UD:/UpgradeDomain1, UD:/UpgradeDomain2, UD:/UpgradeDomain3), with primary (P) and secondary (S) replicas spread across the upgrade domains]
Upgrade Domains
Related to the number of FEs in the pool at creation time (Topology Builder logic)
How can I tell?
Get-CsPoolUpgradeReadinessState | Select-Object -ExpandProperty UpgradeDomains
What if I add more FEs to the pool?
Depending on the initial creation state, more UDs may be created, or more servers placed into existing UDs

| Initial Pool Size | Number of Upgrade Domains | Front End Placement per Upgrade Domain |
|---|---|---|
| 12 | 8 | First 8 FEs into 4 UDs with 2 each, then 4 UDs with 1 each |
| 8 | 8 | Each FE placed into its own UD |
| 9 | 8 | First 2 FEs into one UD, then 7 UDs with 1 each |
| 5 | 5 | Each FE placed into its own UD |
Example: Topology Builder Upgrade Domain
Within this example we see:
(1) Upgrade domain for a Standard Edition Pool
(1) Upgrade domain for the Monitoring Server Role
Cmdlets
Get-CsUserPoolInfo -Identity <user>
Primary pool/FEs, secondary pool/FEs, routing group
More Cmdlets
Get-CsPoolFabricState
Detailed information about all the fabric services running in a pool
Get-CsPoolUpgradeReadinessState
Returns information indicating whether or not your Lync Registrar pools are ready to be upgraded/patched
Reset-CsRoutingGroup
Administrators can reset Windows Fabric routing groups that are missing or are otherwise not working correctly. Missing routing groups can be identified by using the Get-CsPoolFabricState cmdlet and the FilterOnMissingReplicas parameter.
Skip-CsRoutingGroupAtServiceStartup
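A sketch of that repair workflow (pool FQDN and routing group GUID are illustrative):

# Find routing groups with missing replicas in the pool
Get-CsPoolFabricState -PoolFqdn "poola.contoso.com" -FilterOnMissingReplicas
# Reset one of the routing groups reported above
Reset-CsRoutingGroup -RoutingGroup "e25ba04d-4a1e-4ab2-86e8-47d573a282ee"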
Resetting the Pool
Reset-CsPoolRegistrarState
FullReset – cluster changes 1->Any, 2->Any, Any->2, Any->1, Upgrade Domain changes
QuorumLossRecovery – force fabric to rebuild services that lost quorum
ServiceReset – voter change (default if no ResetType specified)
MachineStateRemoved – removes the specified server from the pool
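For example, a hedged sketch of removing a dead server from a pool’s fabric state (server and pool names illustrative; check Get-Help Reset-CsPoolRegistrarState for the exact parameter set):

Reset-CsPoolRegistrarState -PoolFqdn "poola.contoso.com" -ResetType MachineStateRemoved -MachineFqdn "poola2.contoso.com"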
Windows Fabric v3 Changes
Better load balancing that considers replica movement costs
Performance improvements, bug fixes (including issues reported by Skype), and improved debuggability
Lease-level performance improvements – lease expiration in the cluster manifest is honored
Support for the Deactivate API
Slowly drains replicas out instead of a sudden spike resulting in heavy load on the secondary
Prevents Windows Fabric from placing replicas on FEs that are shutting down
Safe upgrade checks
Official support for Windows Server 2012 R2
Skype for Business Patching Process Evolution
From (8) steps in Lync 2013 to (4) in Skype for Business!
Skype for Business Server patching
Simplified workflow leverages Windows Fabric v2/v3 APIs
4 steps (see the sketch below):
1. Invoke-CsComputerFailOver to fail over (stop) a Front End: takes the FE out of rotation and moves the replicas out
2. Perform patching/upgrade
3. Invoke-CsComputerFailBack to fail back (start) the Front End: brings the FE into the active state and moves replicas in
4. Repeat for all Front Ends in the pool
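A sketch of one iteration of that loop, run against a single Front End at a time (server name illustrative):

# Step 1: drain the FE – take it out of rotation and move replicas out
Invoke-CsComputerFailOver -ComputerName "poola2.contoso.com"
# Step 2: apply the patch/upgrade on this server
# Step 3: bring the FE back into rotation; fabric moves replicas back in
Invoke-CsComputerFailBack -ComputerName "poola2.contoso.com"
# Step 4: repeat for the next FE in the pool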
Invoke-CsComputerFailOver
Checks for the availability of a sufficient number of servers
Waits for replica stability across the pool
- Confirms all replicas exist before taking the server down
Initiates deactivation of the node; waits for success/failure from Windows Fabric
Stops services after successfully deactivating the node
Invoke-CsComputerFailBack
Starts services if not already started
Activates the node – by doing so, Windows Fabric will now consider this server for replica placement
How is this better?
Simpler workflow, fewer commands, fewer errors
Faster: 2-3 hrs. for a 12-FE pool (down from 8-12 hrs.)
More reliable
Checks for readiness across the pool within the cmdlet before failover
Leverages Windows Fabric v2/v3 deactivate/activate node APIs, ensuring more dependable operation (moving replicas in/out, not moving replicas onto a node going down)
Since the scope is always one Front End, avoids situations where multiple Front Ends could be down within a pool (a reason servers don’t start)
Will not allow failover of a server if there are existing replica issues in the pool
Enforces minimum server requirements implicitly (if other servers are down)
Progress indicators
How is this better?
Invoke-CsComputerFailOver progress indicator
Invoke-CsComputerFailBack progress indicator
Additional Patching Process Notes
Prerequisite: Skype for Business Server, Fabric version 2.0 CU3+
Don’t execute on more than one server at a time in a pool (it might block)
Invoke-CsComputerFailOver requires RTCSRV service to be running
Invoke-CsComputerFailBack will start RTCSRV service
Stopping services outside of these cmdlets is out of scope
High Availability Server Management
Pool Cold Start
Pool Cold Start scenarios
Lync 2013 to Skype for Business in-place upgrade
Adding a new pool
Pool failback starts a pool if it was offline
Miscellaneous cases where the administrator decides to take down the entire pool for a maintenance activity (not recommended in 2013)
Previous Lync 2013 Pool Cold Start Problems
Typically, all the servers need to be started for one server to be up
Confusing minimum-number-of-servers requirements
Starting a subset of a pool is not straightforward, since it involves running Routing Group quorum loss recovery
Incomplete information on why a server cannot be started
No automatic recovery actions initiated for failures
Skype for Business Server Pool Start
Start the servers within a pool with a single command, with easy-to-follow instructions
Checks prerequisites for all pool servers
For problems that might cause issues during pool/server cold start, provides alerts and diagnosis with the required resolution steps
In some cases, allows the pool to start even if some of the routing group replicas are stuck
Example: Starting a pool with a single cmdlet
Start-CsPool (see the sketch below)
Prerequisite checks (all servers on Skype for Business Server, Windows Fabric v2+)
Attempts to start all servers in the pool
If there are problems starting any server, performs extended diagnosis and alerts
If a problem on a Front End cannot be fixed, run Start-CsPool with an exclusion list
- Fails if minimum server requirements cannot be met due to the exclusion list
- Does this operation require quorum loss recovery?
  - If no data loss, performs an implicit quorum loss recovery
  - If there will be data loss, seeks admin approval with data loss information, or a configured option to skip specific routing group replicas and proceed with the start
Starts all servers if no issues
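A minimal sketch of the happy path (pool name illustrative). Start-CsPool also accepts an exclusion list for Front Ends that cannot be recovered; see Get-Help Start-CsPool for the exact parameter:

# Cold start every server in the pool, with prerequisite checks and diagnosis
Start-CsPool -PoolFqdn "poola.contoso.com"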
SQL Back End High Availability
Overview
SQL Mirroring Back-End HA Diagram
SQL Mirroring File Share
What is it?
A temporary location used during setup; .BAK files are written here.
The primary SQL Server needs R/W access, the mirror R/O
Where should it go?
Any file server, with proper permissions for SQL service access
Do NOT use DFS! .BAK files are excluded from DFS replication by default
Do not use the Lync pool file share
This is a one-time-use share
SQL Mirroring Witness as SQL Express
SQL Express is fully supported as a witness
Remember to enable TCP/IP
Start the SQL Browser service (if using dynamic ports)
Open the necessary firewall ports as required
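For example, a hedged sketch of the witness firewall rules; the ports shown are common defaults (TCP 5022 for the mirroring endpoint, UDP 1434 for SQL Browser) and should be matched to your instance:

# Allow the database mirroring endpoint
New-NetFirewallRule -DisplayName "SQL Mirroring Endpoint" -Direction Inbound -Protocol TCP -LocalPort 5022 -Action Allow
# Allow SQL Browser for dynamic port resolution
New-NetFirewallRule -DisplayName "SQL Browser" -Direction Inbound -Protocol UDP -LocalPort 1434 -Action Allow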
SQL AlwaysOn
Overview
SQL Server AlwaysOn Overview
SQL Server AlwaysOn Features
Next generation of database mirroring technologies
Provides high availability and disaster recovery in SQL
Introduced in SQL Server 2012 and present in SQL Server 2014
Runs on top of WSFC (Windows Server Failover Clustering)
SQL AlwaysOn Advantages
Latest and greatest SQL HA solution
Although database mirroring is still available in its original feature set, it is now considered a deprecated feature and will be removed in a future release of SQL Server
More reliable
SQL AlwaysOn (one primary, up to three corresponding secondary replicas)
Mirroring (one primary, one mirror)
Multi-database failovers
Useful in applications with several databases
Databases can be added to an Availability Group that can be failed over between replicas
All databases in an Availability Group are failed over at the same time
SQL AlwaysOn Availability Groups
Provides high availability by maximizing the availability of databases for an enterprise
An Availability Group is a set of databases that are failed over together at the same time
Supports one primary replica and up to two secondary replicas in synchronous-commit mode
The Availability Group listener responds to and redirects incoming client requests to replicas
Each Availability Group replica has a local SQL instance and a local copy of the databases
SQL AlwaysOn Failover Cluster Instance (FCI)
Provides high availability through redundancy at the server-instance level
One SQL instance is installed across all failover clustering nodes
A single copy of the databases is located on shared disk storage, which is failed over between nodes
SQL AlwaysOn Deployment Scenarios
New install scenario
- Install a new back end using SQL Enterprise 2012 or SQL Enterprise 2014
- Add a new Skype for Business Server pool with the AlwaysOn back end to the topology
- Install Skype for Business Server and add the databases to the Availability Group
Upgrade scenario
- Upgrade an existing Lync Server 2013 pool to Skype for Business Server
- Upgrade the back-end server to SQL Enterprise 2012 or SQL Enterprise 2014
- Enable SQL AlwaysOn for the Skype for Business Server databases
SQL AlwaysOn deployment options
Deploying AlwaysOn for new pools
- Creating a new SQL AlwaysOn Availability Group for a new pool
Deploying AlwaysOn for existing pools
- Moving from a SQL standalone back end to AlwaysOn Availability Groups
- Moving from a SQL mirrored back end to AlwaysOn Availability Groups
SQL Version Requirements
Standalone Server
- Standard or Enterprise Edition
AlwaysOn Failover Clustering Instance (FCI)
- Standard or Enterprise Edition (two nodes)
- Enterprise Edition (three or more nodes)
Mirroring
- Standard or Enterprise Edition
AlwaysOn Availability Groups
- Enterprise Edition required
SQL AlwaysOn Support Information
Supported with Skype for Business Server
Note: Availability Groups are not supported with Lync Server 2010 or 2013

| Lync/Skype Version | Standalone | Failover Clustering | Mirroring | Availability Groups |
|---|---|---|---|---|
| Lync Server 2010 | SQL 2008 R2 SP2 | SQL 2008 R2 SP2 | Not supported | Not supported |
| Lync Server 2013 | SQL 2008 R2 SP2, SQL 2012 SP1 | SQL 2008 R2 SP2, SQL 2012 SP1 | SQL 2008 R2 SP2, SQL 2012 SP1 | Not supported |
| Skype for Business Server 2015 | SQL 2008 R2 SP2, SQL 2012 SP1, SQL 2014 | SQL 2008 R2 SP2, SQL 2012 SP1, SQL 2014 | SQL 2008 R2 SP2, SQL 2012 SP1, SQL 2014 | SQL 2012 SP1, SQL 2014 |
Supported SQL Availability Group settings
Supported configurations* for Skype for Business Server
Support having replicas only in the same subnet
Support only synchronous-commit mode
Support the automatic failover mode
No support for read access on secondary replicas
No support for having an off-site replica in Azure
*Note: Other configurations are possible and not actively blocked, but are not supported
SQL AlwaysOn
Deployment
Windows Server Failover Clustering Requirements
Windows Server 2008 R2 SP1 or higher
WSFC feature installed, with sufficient nodes for desired configuration
Select the File Share Witness option for the quorum witness
Cluster nodes cannot be Active Directory domain controllers
Cluster nodes must be from the same Active Directory domain
Cluster nodes must be connected to the same network subnet
SQL Server and Database Requirements
SQL Server 2012 SP1/2014 Enterprise Edition or higher
SQL installation steps differ depending on the HA option selected
SQL AlwaysOn must be manually enabled on the SQL service, followed by a service restart
Full recovery model required for each Availability Group database
Full backup required for each database added to an Availability Group
The database folder structure must be duplicated across all AG replicas

| High availability option | Installation selection |
|---|---|
| Availability Groups (AG) | New SQL Server stand-alone installation (all replicas) |
| Failover Cluster Instance (FCI) | New SQL Server failover cluster installation (first node); Add node to a SQL Server failover cluster (additional nodes) |
Creating a new AlwaysOn Availability Group
Creating a new Availability Group for a new pool can be somewhat confusing:
Creating a new SQL back-end Availability Group requires at least one database
Databases for a new pool cannot be created until a SQL back end is available
Creating a new SQL AlwaysOn Availability Group
Step 1: Add a new SQL Store using the FQDN of the Availability Group Listener
- In Topology Builder, select the option New Front End Pool
- When prompted to define the SQL Server store, click New
- Add the FQDN of the Availability Group Listener as the SQL Server FQDN
- Select High Availability Settings, and choose SQL AlwaysOn Availability Groups
- Add the FQDN of the SQL primary replica as the FQDN for the SQL Server AlwaysOn Instance
- Complete the configuration of the new pool and publish the topology
Step 2: Enable AlwaysOn Availability Groups
Step 3: Create a new AlwaysOn Availability Group for the back-end databases
Step 4: Update the settings for the SQL Store and publish the topology
Creating a new AlwaysOn SQL Store
Availability Group Listener FQDN
Primary Replica FQDN
Note: Databases will be created on the Primary Replica when the topology is published
Creating a new AlwaysOn Availability Group
Step 1: Add a new SQL Store using the FQDN of the Availability Group Listener
Step 2: Enable AlwaysOn Availability Groups (see the sketch below)
- Add the Windows Server Failover Clustering (WSFC) feature on each replica server
- Validate the cluster configuration
- Create a new Windows Failover Cluster
- Configure cluster quorum settings
- Enable AlwaysOn Availability Groups
Step 3: Create a new AlwaysOn Availability Group for the back-end databases
Step 4: Update the settings for the SQL Store and publish the topology
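A rough PowerShell equivalent of the clustering portion of Step 2 (server and cluster names illustrative):

# Install the WSFC feature on each replica server
Install-WindowsFeature Failover-Clustering -IncludeManagementTools
# Validate the configuration, then create a cluster with no shared storage
Test-Cluster -Node "sql01.contoso.com","sql02.contoso.com"
New-Cluster -Name "SQLCLUSTER01" -Node "sql01.contoso.com","sql02.contoso.com" -NoStorage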
Installing the Windows Server Failover Clustering Feature
Windows Failover Clustering
Built-in Windows Server feature
Installed through “Add Roles and Features”
Windows Server 2008 R2 SP1 or higher
Important
Cluster nodes cannot be Active Directory domain controllers
Cluster nodes must be from the same Active Directory domain
Cluster nodes must be connected to the same network subnet
Validating the Cluster Configuration
Adding cluster nodes
Add the servers that will be part of the cluster
All new nodes will be verified and tested
Validating the Cluster Configuration
Node validation
All new nodes will be verified and tested
Address and fix all errors and warnings
Validating the Cluster Configuration
Successful validation
Select “Create the Cluster now using the validated nodes”
A new Windows Failover Cluster will be created using all configured nodes
Creating the Cluster
Creating the cluster
Define an administrative cluster name
For cluster administration only
Not used by Skype for Business
Automated DNS registration for the IP address
Creating the Cluster
Creating the cluster
Uncheck “Add all eligible storage to the cluster”
AlwaysOn Availability Groups do not require or utilize shared storage
Creating the Cluster
Creating the cluster
A cluster consisting of the configured nodes is created
Configuring Quorum Settings
Quorum Witness
A two-node cluster requires a witness for a majority vote
The quorum witness should not be defined on a node
Skype for Business uses a File Share Witness
The file share should be accessible from both nodes
The File Share Witness is configured
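The same quorum change scripted, as a sketch (share path illustrative):

# Point the cluster quorum at a file share witness reachable from both nodes
Set-ClusterQuorum -FileShareWitness "\\fileserver01.contoso.com\ClusterWitness"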
Enabling SQL AlwaysOn
Configure the SQL instance
AlwaysOn has to be enabled manually; if it is not enabled, an availability group cannot be created
Check “Enable AlwaysOn Availability Groups” on the “AlwaysOn High Availability” tab
A restart of the SQL instance is required
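The same step scripted, as a sketch using the SQLPS module (instance name illustrative); -Force restarts the SQL service to apply the change:

Import-Module SQLPS -DisableNameChecking
# Enable AlwaysOn on the instance and restart the SQL service
Enable-SqlAlwaysOn -ServerInstance "sql01.contoso.com" -Force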
Creating a new AlwaysOn Availability Group
Step 1: Add a new SQL Store using the FQDN of the Availability Group Listener
Step 2: Enable AlwaysOn Availability Groups
Step 3: Create a new AlwaysOn Availability Group for the back-end databases (see the sketch below)
- Set the recovery model for each database to Full
- Perform a SQL backup of each database
- Duplicate the database folder structure on each replica server
- Create the new Availability Group and add the back-end databases
Step 4: Update the settings for the SQL Store and publish the topology
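A sketch of the recovery-model and backup prerequisites for one database (server, database, and share names illustrative; repeat for each back-end database):

# Set the Full recovery model, then take the full backup the Availability Group requires
Invoke-Sqlcmd -ServerInstance "sql01.contoso.com" -Query "ALTER DATABASE [rtcxds] SET RECOVERY FULL"
Backup-SqlDatabase -ServerInstance "sql01.contoso.com" -Database "rtcxds" -BackupFile "\\fileserver01\sqlbackup\rtcxds.bak"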
Creating a new Availability Group
Creating a new AlwaysOn Availability Group
Step 1: Add a new SQL Store using the FQDN of the Availability Group Listener
Step 2: Enable AlwaysOn Availability Groups
Step 3: Create a new AlwaysOn Availability Group for the back-end databases
Step 4: Update the settings for the SQL Store and publish the topology
- In Topology Builder, open the properties of the Availability Group SQL Store
- Under High Availability Settings, change the FQDN for the SQL Server AlwaysOn Instance value to the FQDN of the Availability Group Listener
- Publish the topology
Modifying the AlwaysOn SQL Store
Availability Group Listener FQDN
Questions?
Sponsors
Jabra – Evolving the Lync Communication Experience
Our History
145 years of communication experience
Microsoft Gold Partner since 2007
Partner of the Year Finalist 2013
Over 40 Lync certified devices
Exclusive USB/3.5mm controller on EVOLVE 40 and 80 models
Over 4 million Lync devices sold!
Jabra Programs for Skype for Business
Jabra Partner Program
Pilot, POC and Deployment Offers
Device Deployment and Management Tools
Partner Demo and Seed Programs
www.Jabra.com/microsoft
Contact: Bill Orlansky, borlansky@Jabra.com
Voyager Focus UC
Stereo Bluetooth headset with active noise canceling (ANC)
Works seamlessly across Bluetooth-enabled desk phones, laptops, mobile phones, tablets, and smart watches
Smart sensors answer calls by simply putting on the headset, mute by taking the headset off, and pause music for incoming calls
Enhanced voice alerts announce caller ID, mute and connection status, and talk time level
Dynamic Mute Alert feature senses and alerts you when you try to talk while muted
Sennheiser
Sennheiser headsets and speakerphones provide excellent sound quality, wearing comfort and hearing protection for contact centers, offices and UC professionals
Sennheiser is an ideal companion for Skype for Business
www.sennheiser.com/cco
Appendix
Additional Skype for Business Role High Availability
Overview
Additional Role Availability Considerations
Director
Mediation Server
Edge Server
By deploying multiple servers within a pool, capacity and redundancy are provided
Load balancing via DNS LB or HWLB
Video Interop (VIS) Role Availability Considerations
3rd Party Video Gateway Pool Failover Support
One or more VIS pools can support trunks to one or more video gateways
The video gateway can fall back to a backup VIS pool if the preferred one is unavailable
A combination of one or more methods can be used in the same topology
VIS Server (Front End) Failover Support
VIS is able to leverage the concept of an “Associated Backup Pool” defined on a Front End Pool
After 5 minutes of consecutive failures, VIS moves to the backup pool
Inbound calls relayed by VIS to a failed primary pool will ring for 10 seconds before trying the backup
VIS also adheres to manual failover commands in the event of Front End pool maintenance
Office Web App High Availability
Overview
Office Web App (OWA) Availability Topologies
Stretched OWA Farm Across Datacenters
Office Web App (OWA) Availability Topologies
Dedicated OWA Farm within Single Datacenter
Office Web App (OWA) Availability Considerations
Stretched OWA Farm Across Datacenters
We have not tested stretch farm scenarios
There is a great deal of communication between Office Web Apps servers within a farm
We will not attempt to optimize this communication for stretch farms
There is no reason that a stretch farm can’t work, provided that:
There is a consistent intra-farm latency of <1 ms one way, 99.9% of the time over a period of ten minutes. (Intra-farm latency is commonly defined as the latency between the front-end web servers and the database servers.)
The bandwidth must be at least 1 gigabit per second.
Our statement is: “…We will support customers who set up stretch farms; however, if an issue turns out to be the result of how the stretch farm is configured, we are not in a position to deliver a change to Office Web Apps Server to support that configuration. In cases where a stretch farm is on a very low latency network (say, fiber within the same city) and the network is configured such that all the machines in the farm appear to each other as machines in the same subnet, there is nothing about our architecture that will fail…”
Office Web App (OWA) Availability Considerations
Dedicated OWA Farm within Single Datacenter
“Best Practice” Recommendation
Our statement is: “…Stick to (1) datacenter. Servers within an Office Web Apps Server farm must be within the same datacenter. Don’t distribute them geographically. Generally you need only (1) farm, unless you have security needs that require an isolated network that has its own Office Web Apps Server farm.”
Persistent Chat (pChat) High Availability
Overview
Persistent Chat Availability Topologies
HA for pChat within a Single Datacenter
Persistent Chat Front Ends:
Pool with up to (4) active Persistent Chat Front End Servers
Persistent Chat back end:
SQL mirroring, with an optional witness, provides automated failover
Distributed File System Replication (DFSR) for file share replication
The compliance database uses the same mechanism as the Persistent Chat content database
Persistent Chat Availability Topologies
SQL Mirroring with “Log Shipping” Across Datacenters
Persistent Chat Availability Topologies
SQL Mirroring with “Log Shipping” Across Datacenters
Using a Stretched Pool
- Looks like (1) pChat pool within the topology
- Physical machines within the pool span datacenters
- 50% active and 50% backup capacity, to be used in case of a disaster
Persistent Chat Back End
- SQL mirroring for high availability within a datacenter
- SQL log shipping across datacenters for high availability across datacenters
Physical topology
- Site 1: Skype pool 1, Persistent Chat pool – machines (1-4) all active, database + mirror + witness (optional)
- Site 2: Skype pool 2, Persistent Chat pool – machines (5-8) all idle, backup database (SQL log shipping target)
Persistent Chat Availability Topologies
SQL AlwaysOn with “Log Shipping” Across Datacenters
Persistent Chat Availability Topologies
SQL AlwaysOn Across Datacenters
Using a Stretched Pool
- Looks like (1) pChat pool within the topology
- Physical machines within the pool span datacenters
- 50% active and 50% backup capacity, to be used in case of a disaster
Persistent Chat Back End
- SQL AlwaysOn for high availability within a datacenter
- SQL log shipping across datacenters for high availability across datacenters
Physical topology
- Site 1: Skype pool 1, Persistent Chat pool – machines (1-4) all active, database + SQL AlwaysOn
- Site 2: Skype pool 2, Persistent Chat pool – machines (5-8) all idle, backup database (SQL log shipping target)
Persistent Chat Availability Considerations
Factors affecting the implemented HA design
Single Persistent Chat pool stretched across datacenter sites that have Skype for Business pools paired for DR
- Additional cold standby servers available to take load in case of DR: (4) active at a time, four standby
- VLAN not required
- Lifetime of the Lync and pChat pools is not coupled – one can fail without the other
- Persistent Chat failure does not impact IM/Presence and Voice workloads, and vice-versa
Geo-located datacenters: active servers in both datacenters at any point, active database in one datacenter
- Requires high bandwidth and a low-latency connection
- Clients in site one could be talking to a server in site two
- A Persistent Chat server in site two could be talking to the primary database in site one
Geo-distributed datacenters: active Persistent Chat servers in only one datacenter, active database in the same datacenter
Mirroring with optional witness or SQL AlwaysOn provides automated HA for the database
Infrastructure High Availability
Overview
Simple URLs Supported by GeoDNS – External
1. The SfB client (located in the APA region) queries a DNS server for the name lyncdiscover.contoso.com.
2. The DNS server responds with a list of Name Servers (NS) to which this zone has been delegated to respond, which in our case is a list of geoDNS devices.
3. A query is made to the first NS, called apa-geodns.contoso.com, for the record lyncdiscover.contoso.com.
4. At this stage the geoDNS device checks the requestor’s IP address against a global IP address database to determine which host record to return to the client.
5. A health check has been performed by all geoDNS devices to ensure a response is given for a functional HLB.
6. The geoDNS device returns a response to the original DNS query which contains a healthy HLB VIP and (optionally) the VIP closest to the user’s geographic location.
7. The SfB client connects directly to the HLB VIP.
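A sketch of tracing that resolution by hand with Resolve-DnsName (names illustrative):

# Find the name servers the zone has been delegated to (the geoDNS devices)
Resolve-DnsName -Name "lyncdiscover.contoso.com" -Type NS
# Query one geoDNS device directly and see which HLB VIP it returns
Resolve-DnsName -Name "lyncdiscover.contoso.com" -Server "apa-geodns.contoso.com"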
Simple URLs Supported by GeoDNS – Internal
1. The SfB client (located in the APA region) queries a DNS server for the name lyncdiscoverinternal.contoso.com.
2. The DNS server responds with a list of Name Servers (NS) to which this zone has been delegated to respond, which in our case is a list of geoDNS devices.
3. A query is made to the first NS, called apa-geodns.contoso.com, for the record lyncdiscoverinternal.contoso.com.
4. At this stage the geoDNS device checks the requestor’s IP address against a global IP address database to determine which host record to return to the client.
5. A health check has been performed by all geoDNS devices to ensure a response is given for a functional HLB.
6. The geoDNS device returns a response to the original DNS query which contains a healthy HLB VIP and (optionally) the VIP closest to the user’s geographic location.
7. The SfB client connects directly to the HLB VIP.
Example: URLs Supported by GeoDNS
Example: URLs Supported by Azure Traffic Manager

$TrafficManagerProfile = Get-AzureTrafficManagerProfile -Name "ContosoProfile"

Add-AzureTrafficManagerEndpoint -TrafficManagerProfile $TrafficManagerProfile -DomainName "webext-pool01.contoso.com" -Status "Enabled" -Type Any | Set-AzureTrafficManagerProfile

Add-AzureTrafficManagerEndpoint -TrafficManagerProfile $TrafficManagerProfile -DomainName "webext-pool02.contoso.com" -Status "Enabled" -Type Any | Set-AzureTrafficManagerProfile

$TrafficManagerProfile | Set-AzureTrafficManagerProfile -MonitorPort 443 -MonitorProtocol Https -MonitorRelativePath "/favicon.ico"