Slide1
Exchange Server 2010 SP2 High Availability Deep Dive
Scott Schnoll
Principal Technical Writer
Microsoft Corporation
EXL401
Slide2
Agenda
Recent Behavior Changes – SP2 + UR3
Database Availability Group Networks
Active Manager
Best Copy Selection
Datacenter Activation Coordination Mode
Slide3
Deep Dive
Code changes in Service Pack 2 (SP2) and Update Rollup 3 for SP2
Recent Behavior Changes
Slide4
OWA Cross-Site Silent Redirection with SSO
Introduced in Service Pack 2
If you access OWA via a CAS in the ‘wrong’ AD site, the CAS has a decision to make: it can proxy or redirect to the target site
  If there is no ExternalURL in that site, we proxy, the mailbox opens, and the user gets access
  If the target site has an ExternalURL, the user gets a page with a link to click
    The user clicks the link, logs in again, and gets access
    The user has to log in twice
We removed the need to click the link, which in some scenarios results in a Single Sign On experience
Slide5
OWA Cross-Site Silent Redirection with SSO
Documentation
  http://aka.ms/ljimzw
  http://aka.ms/wjysst
Video
  http://aka.ms/qjtvmq
Slide6
Changes to Set-DatabaseAvailabilityGroup
Introduced in Update Rollup 3 for Exchange 2010 SP2
Enables use of the AllowCrossSiteRPCClientAccess property of the DAG
  True – When a database *overs from DC1 to DC2, Outlook will continue to use the CAS array in DC1 as the RPC endpoint
  False (default) – When a database *overs from DC1 to DC2, the Outlook profile will be updated to use the CAS array in DC2 as the RPC endpoint
    This will require a restart of the Outlook client
AlternateWitnessServer and AlternateWitnessDirectory can now be set to $null
Slide7
Changes to Set-DatabaseAvailabilityGroup
Documentation
  http://aka.ms/qjtvmq
  http://technet.microsoft.com/en-us/library/dd297934.aspx*
*To be published very soon!
Slide8
Changes to Active Manager
Introduced in Update Rollup 3 for Exchange 2010 SP2
Mailbox database move using dial tone portability can take an extremely long time to fail if the source or destination mailbox server is down
  Get-Mailbox -Database DB1 | Set-Mailbox -Database DTDB1
  RPC attempts to purge remote mailbox objects, but it can’t when one side is down, and it will eventually time out
  Could take 40-60 seconds per mailbox!
We no longer need to wait that long, as we now handle the down server in a different manner
Slide9
High Availability Concepts
http://aka.ms/ExHAConcepts
Slide10
Deep Dive
Database Availability Group Networks
Slide11
Database Availability Group Networks
A DAG network is a collection of one or more subnets
Two types of DAG networks
  MAPI Network - connects DAG members to Active Directory, other Exchange servers, DNS, etc.; also used by content indexing
    Registered in DNS / DNS configured
    Uses default gateway
    Client for Microsoft Networks/File and Print Sharing enabled
  Replication Network - used for continuous replication
    Not registered in DNS / DNS not configured
    Does not use a default gateway
    Client for Microsoft Networks/File and Print Sharing disabled
Slide12
Database Availability Group Networks
All DAGs must have:
  Exactly one MAPI network
  Zero or more Replication networks
    Separate network(s) on separate subnet(s)
    LRU (least recently used) determines which network is used in a multiple replication network environment
Automatically created when server is added to DAG
  Based on cluster’s enumeration of networks
  Cluster enumeration based on subnet
  One cluster network is created for each subnet
Slide13
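The "one network per subnet" rule can be sketched in a few lines of Python. This is only an illustrative model, not how the cluster service is implemented: the function name `enumerate_dag_networks` and the sample addresses are assumptions chosen for the example, and the DAGNetworkNN naming simply mimics the default names shown in this deck.

```python
import ipaddress
from collections import defaultdict

def enumerate_dag_networks(interfaces):
    """Group (server, "ip/prefix") pairs into one network per subnet.

    Hypothetical helper: models the enumeration rule only.
    """
    by_subnet = defaultdict(list)
    for server, cidr in interfaces:
        iface = ipaddress.ip_interface(cidr)
        # iface.network is the subnet the address belongs to, e.g. 192.168.0.0/24
        by_subnet[iface.network].append((server, str(iface.ip)))

    networks = {}
    # Number networks in order of first appearance: DAGNetwork01, DAGNetwork02, ...
    for index, (subnet, members) in enumerate(by_subnet.items(), start=1):
        networks[f"DAGNetwork{index:02d}"] = {
            "subnet": str(subnet),
            "interfaces": members,
        }
    return networks

# Sample values: two servers whose NICs sit on four different subnets.
interfaces = [
    ("EX1", "192.168.0.15/24"),
    ("EX1", "10.0.0.15/24"),
    ("EX2", "192.168.1.15/24"),
    ("EX2", "10.0.1.15/24"),
]
networks = enumerate_dag_networks(interfaces)
for name, net in networks.items():
    print(name, net["subnet"], net["interfaces"])
```

With servers on four distinct subnets the model yields four networks. If both servers shared the same two subnets it would yield two.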
Database Availability Group Networks
Server / Network     IP Address / Subnet Bits   Default Gateway
EX1 – MAPI           192.168.0.15/24            192.168.0.1
EX1 – REPLICATION    10.0.0.15/24               N/A
EX2 – MAPI           192.168.0.16/24            192.168.0.1
EX2 – REPLICATION    10.0.0.16/24               N/A

Name           Subnet(s)        Interface(s)                             MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24   EX1 (192.168.0.15), EX2 (192.168.0.16)   True                  True
DAGNetwork02   10.0.0.0/24      EX1 (10.0.0.15), EX2 (10.0.0.16)         False                 True
Slide14
Database Availability Group Networks
Name           Subnet(s)        Interface(s)         MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24   EX1 (192.168.0.15)   True                  True
DAGNetwork02   10.0.0.0/24      EX1 (10.0.0.15)      False                 True
DAGNetwork03   192.168.1.0/24   EX2 (192.168.1.15)   True                  True
DAGNetwork04   10.0.1.0/24      EX2 (10.0.1.15)      False                 True

Server / Network     IP Address / Subnet Bits   Default Gateway
EX1 – MAPI           192.168.0.15/24            192.168.0.1
EX1 – REPLICATION    10.0.0.15/24               N/A
EX2 – MAPI           192.168.1.15/24            192.168.1.1
EX2 – REPLICATION    10.0.1.15/24               N/A
Slide15
Database Availability Group Networks
Collapse subnets into two DAG networks and disable replication for the MAPI network:

Name           Subnet(s)        Interface(s)         MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24   EX1 (192.168.0.15)   True                  True
DAGNetwork02   10.0.0.0/24      EX1 (10.0.0.15)      False                 True
DAGNetwork03   192.168.1.0/24   EX2 (192.168.1.15)   True                  True
DAGNetwork04   10.0.1.0/24      EX2 (10.0.1.15)      False                 True

Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork01 -Subnets 192.168.0.0,192.168.1.0 -ReplicationEnabled:$false
Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork02 -Subnets 10.0.0.0,10.0.1.0
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork03
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork04
Slide16
Database Availability Group Networks
Collapse subnets into two DAG networks and disable replication for the MAPI network:

Name           Subnet(s)        Interface(s)         MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24   EX1 (192.168.0.15)   True                  True
DAGNetwork02   10.0.0.0/24      EX1 (10.0.0.15)      False                 True
DAGNetwork03   192.168.1.0/24   EX2 (192.168.1.15)   True                  True
DAGNetwork04   10.0.1.0/24      EX2 (10.0.1.15)      False                 True

Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork01 -Subnets 192.168.0.0,192.168.1.0 -ReplicationEnabled:$false
Slide17
Database Availability Group Networks
Collapse subnets into two DAG networks and disable replication for the MAPI network:

Name           Subnet(s)                        Interface(s)                             MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24, 192.168.1.0/24   EX1 (192.168.0.15), EX2 (192.168.1.15)   True                  False
DAGNetwork02   10.0.0.0/24                      EX1 (10.0.0.15)                          False                 True
DAGNetwork03
DAGNetwork04   10.0.1.0/24                      EX2 (10.0.1.15)                          False                 True

Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork01 -Subnets 192.168.0.0,192.168.1.0 -ReplicationEnabled:$false
Slide18
Database Availability Group Networks
Collapse subnets into two DAG networks and disable replication for the MAPI network:

Name           Subnet(s)                        Interface(s)                             MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24, 192.168.1.0/24   EX1 (192.168.0.15), EX2 (192.168.1.15)   True                  False
DAGNetwork02   10.0.0.0/24                      EX1 (10.0.0.15)                          False                 True
DAGNetwork03
DAGNetwork04   10.0.1.0/24                      EX2 (10.0.1.15)                          False                 True

Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork01 -Subnets 192.168.0.0,192.168.1.0 -ReplicationEnabled:$false
Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork02 -Subnets 10.0.0.0,10.0.1.0
Slide19
Database Availability Group Networks
Collapse subnets into two DAG networks and disable replication for the MAPI network:

Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork01 -Subnets 192.168.0.0,192.168.1.0 -ReplicationEnabled:$false
Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork02 -Subnets 10.0.0.0,10.0.1.0

Name           Subnet(s)                        Interface(s)                             MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24, 192.168.1.0/24   EX1 (192.168.0.15), EX2 (192.168.1.15)   True                  False
DAGNetwork02   10.0.0.0/24, 10.0.1.0/24         EX1 (10.0.0.15), EX2 (10.0.1.15)         False                 True
DAGNetwork03
DAGNetwork04
Slide20
Database Availability Group Networks
Collapse subnets into two DAG networks and disable replication for the MAPI network:

Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork01 -Subnets 192.168.0.0,192.168.1.0 -ReplicationEnabled:$false
Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork02 -Subnets 10.0.0.0,10.0.1.0
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork03
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork04

Name           Subnet(s)                        Interface(s)                             MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24, 192.168.1.0/24   EX1 (192.168.0.15), EX2 (192.168.1.15)   True                  False
DAGNetwork02   10.0.0.0/24, 10.0.1.0/24         EX1 (10.0.0.15), EX2 (10.0.1.15)         False                 True
DAGNetwork03
DAGNetwork04
Slide21
Database Availability Group Networks
Collapse subnets into two DAG networks and disable replication for the MAPI network:

Name           Subnet(s)                        Interface(s)                             MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24, 192.168.1.0/24   EX1 (192.168.0.15), EX2 (192.168.1.15)   True                  False
DAGNetwork02   10.0.0.0/24, 10.0.1.0/24         EX1 (10.0.0.15), EX2 (10.0.1.15)         False                 True

Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork01 -Subnets 192.168.0.0,192.168.1.0 -ReplicationEnabled:$false
Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork02 -Subnets 10.0.0.0,10.0.1.0
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork03
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork04
Slide22
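The before and after tables in the collapse walkthrough can be reproduced with a small Python model. It is a sketch for reasoning about the sequence, not the cmdlets themselves: `set_network` and `remove_network` are hypothetical stand-ins, and the rule that a network must be empty before it is removed is an assumption of the model (it matches the tables, where DAGNetwork03 and DAGNetwork04 are empty before removal).

```python
def set_network(networks, name, subnets=None, replication_enabled=None):
    """Assign subnets to one network, taking them away from any other network."""
    if subnets is not None:
        for other in networks.values():
            other["subnets"] = [s for s in other["subnets"] if s not in subnets]
        networks[name]["subnets"] = list(subnets)
    if replication_enabled is not None:
        networks[name]["replication"] = replication_enabled

def remove_network(networks, name):
    """Remove a network; the model only allows this once it holds no subnets."""
    if networks[name]["subnets"]:
        raise ValueError(f"{name} still has subnets assigned")
    del networks[name]

# Starting state: one network per subnet, as enumerated by the cluster.
networks = {
    "DAGNetwork01": {"subnets": ["192.168.0.0/24"], "replication": True},
    "DAGNetwork02": {"subnets": ["10.0.0.0/24"], "replication": True},
    "DAGNetwork03": {"subnets": ["192.168.1.0/24"], "replication": True},
    "DAGNetwork04": {"subnets": ["10.0.1.0/24"], "replication": True},
}

# Same order as the commands on the slide.
set_network(networks, "DAGNetwork01",
            subnets=["192.168.0.0/24", "192.168.1.0/24"],
            replication_enabled=False)
set_network(networks, "DAGNetwork02", subnets=["10.0.0.0/24", "10.0.1.0/24"])
remove_network(networks, "DAGNetwork03")
remove_network(networks, "DAGNetwork04")

for name, net in networks.items():
    print(name, net["subnets"], "ReplicationEnabled:", net["replication"])
```

Running the steps in a different order (for example, removing DAGNetwork03 first) raises an error in this model, which is why the slide moves the subnets before removing the leftover networks.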
Database Availability Group Networks
When using a single NIC
  It is both the MAPI and the Replication network
  ReplicationEnabled is $True
When using multiple NICs
  One NIC is the MAPI network (typically)
    ReplicationEnabled is $False
  Other NIC(s) are Replication network(s)
  Replication uses LRU to pick network to use
  If Replication networks are unavailable, MAPI network is used
Slide23
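The LRU selection with MAPI fallback described above can be sketched as follows. This is a minimal model under stated assumptions: the class name `ReplicationNetworkPicker`, the network names, and the idea of passing in the set of currently available networks are all hypothetical, and the real selection logic inside the Replication service is not shown in this deck.

```python
class ReplicationNetworkPicker:
    """Pick the least recently used replication network, else fall back to MAPI."""

    def __init__(self, replication_networks, mapi_network):
        # Front of the list = least recently used.
        self._order = list(replication_networks)
        self._mapi = mapi_network

    def pick(self, available):
        for name in self._order:
            if name in available:
                # Move the chosen network to the back: it is now most recently used.
                self._order.remove(name)
                self._order.append(name)
                return name
        # No replication network is reachable, so the MAPI network is used.
        return self._mapi

picker = ReplicationNetworkPicker(["Repl1", "Repl2"], "MAPI")
both = {"Repl1", "Repl2"}
first = picker.pick(both)
second = picker.pick(both)
third = picker.pick(both)
fallback = picker.pick(set())
print(first, second, third, fallback)
```

With both replication networks available, successive picks alternate between them. With none available, the picker returns the MAPI network.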
Database Availability Group Networks
DAG/cluster should ignore iSCSI or dedicated backup networks
Set-DatabaseAvailabilityGroupNetwork -Identity <DAG Network Name> -ReplicationEnabled:$false -IgnoreNetwork:$true
Slide24
Database Availability Group Networks
Install hotfix from KB 2469100
  Fixes issue with manually added route table entries disappearing
Block cross-network communication
[Diagram: Subnet 1 through Subnet 4 with interfaces labeled M and R; paths are marked Allowed or Blocked]
Slide25
Deep Dive
Active Manager
Slide26
Active Manager
Internal Exchange component that manages the high availability platform
Runs inside the Microsoft Exchange Replication service on every Mailbox server
Is the definitive source of information on where a database is active
  Stores this information in cluster database
  Provides this information to Active Manager client running on other server roles (Client Access and Hub Transport)
Slide27
Active Manager
Standalone Active Manager
Primary Active Manager (PAM)
Standby Active Manager (SAM)
Active Manager Client
  Runs in RPC Client Access service on CAS and Transport service on Hub
Slide28
Active Manager
Primary Active Manager (PAM)
  Runs on the node that owns the cluster core resources (cluster group)
  Gets topology change notifications
  Reacts to server failures
  Selects the best database copy on failovers and targetless switchovers
  Detects failures of local Information Store and local databases
Slide29
Active Manager
Standby Active Manager (SAM)
  Runs on every other node in the DAG
  Detects failures of local Information Store and local databases
    Reacts to failures by asking PAM to initiate a failover
  Responds to queries from CAS/Hub about which server hosts the active copy
Both roles are necessary for automatic recovery
  If the Microsoft Exchange Replication service is stopped, automatic recovery will not happen
Slide30
Active Manager
Which DAG member is the current PAM?
  Get-DatabaseAvailabilityGroup DAG1 -Status | fl PrimaryActiveManager
How can I move the PAM role?
  Move-ClusterGroup "Cluster Group" -Node MBX2
  or
  cluster group "cluster group" /move
Slide31
Active Manager
Transitions of Active Manager role state are logged in the Microsoft-Exchange-HighAvailability/Operational event log (crimson channel)
Slide32
Active Manager Functionality
Mount and Dismount Databases
Provide Database Availability Information
Provide Interface for Administrative Tasks
Maintain Database and Server State Information
Monitor for Failures and Initiate Recovery
Slide33
Database *over
Database *overs
  Switchover - An administrator action invoked by a task
  Failover - Automatic operation initiated by the PAM
Begins with a Dismount operation and ends with a Mount operation
Slide34
Mount / Dismount Database
Mount Database
  An admin action invoked through a task
  The last part of a database *over
Dismount Database
  An admin action invoked through a task
  The first part of a database switchover
Slide35
AutoDismount
Occurs when a DAG loses quorum
All DAG members are running (but may not be participating in the cluster)
Databases dismounted as quickly as possible by terminating the Information Store service
The only exception to the “SAM can take no action” rule
Slide36
Crimson Channel
Present on all Exchange 2010 Mailbox servers
Applications and Services Logs\Microsoft\Exchange
  HighAvailability
    BlockReplication
    Debug
    Operational
    TruncationDebug
  MailboxDatabaseFailureItems
    Debug
    Operational
Applications and Services Logs\Microsoft\Windows
  FailoverClustering
Slide37
HighAvailability
Events for startup/shutdown of MSExchangeRepl.exe and its components:
  Active Manager
  Third-Party Replication API
  Tasks RPC server
  TCP listener
  VSS writer
Used by Active Manager for events related to role monitoring, database mount operations, log truncation, and cluster-related events
Slide38
MailboxDatabaseFailureItems
Used to log events associated with any failures that affect a replicated mailbox database
Slide39
Slide40
Deep Dive
Best Copy Selection
Slide41
Best Copy Selection
Process of finding the best copy of an individual database to activate, given a list of potential copies for activation and their status
Active Manager selects the “best” copy to become the new active copy when the existing active copy fails or when an administrator performs a targetless switchover
During best copy selection, any servers that are unreachable or activation-blocked are ignored
Slide42
Best Copy Selection First Three Steps – RTM
Sort copies by copy queue length to minimize data loss, using activation preference as a secondary sorting key if necessary
Select “best” copy from sorted list based on which set of criteria is met by each copy
Run Attempt Copy Last Logs (ACLL) and try to copy any missing log files from previous active copy
Slide43
Best Copy Selection First Three Steps – SP1+
Sort copies by activation preference when auto database mount dial is set to Lossless (otherwise, sort copies based on RTM behavior)
Select “best” copy from sorted list based on which set of criteria is met by each copy
Run Attempt Copy Last Logs (ACLL) and try to copy any missing log files from previous active copy
Slide44
Best Copy Selection Last Step – RTM+
Is database mountable?
  Is copy queue length <= AutoDatabaseMountDial?
    If Yes, database is marked as current active and mount request is issued by Active Manager to the Information Store
    If not, next database in sorted list is tried, if one is available. If one is not available, an administrator must manually resolve the problem
Slide45
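The final mount check can be modeled as a loop over the sorted candidates. The sketch below is illustrative only: `try_activate` and the `mount` callback are hypothetical names, and the numeric thresholds (Lossless = 0, GoodAvailability = 6, BestAvailability = 12 logs) are the documented AutoDatabaseMountDial values as I understand them, which the slide itself does not list.

```python
# Assumed AutoDatabaseMountDial thresholds, in log files (not stated on the slide).
MOUNT_DIAL_THRESHOLDS = {"Lossless": 0, "GoodAvailability": 6, "BestAvailability": 12}

def try_activate(sorted_copies, mount_dial, mount):
    """Walk the sorted list and mount the first copy that passes the dial check.

    sorted_copies: dicts with "name" and "copy_queue_length" (measured after ACLL).
    mount: callable returning True when the mount request succeeds.
    Returns the name of the mounted copy, or None if an administrator must step in.
    """
    limit = MOUNT_DIAL_THRESHOLDS[mount_dial]
    for copy in sorted_copies:
        if copy["copy_queue_length"] <= limit and mount(copy):
            return copy["name"]
    return None

copies = [
    {"name": "Server3\\DB1", "copy_queue_length": 2},
    {"name": "Server2\\DB1", "copy_queue_length": 4},
]
print(try_activate(copies, "GoodAvailability", lambda copy: True))
print(try_activate(copies, "Lossless", lambda copy: True))
```

Under GoodAvailability the first copy in the list qualifies. Under Lossless neither copy does, because both still have logs outstanding, so the function returns None.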
Best Copy Selection – Selection Criteria
Criteria   Copy Queue Length   Replay Queue Length   Content Index Status
1          < 10 logs           < 50 logs             Healthy
2          < 10 logs           < 50 logs             Crawling
3          N/A                 < 50 logs             Healthy
4          N/A                 < 50 logs             Crawling
5          N/A                 < 50 logs             N/A
6          < 10 logs           N/A                   Healthy
7          < 10 logs           N/A                   Crawling
8          N/A                 N/A                   Healthy
9          N/A                 N/A                   Crawling
10         Any database copy with a status of Healthy, DisconnectedAndHealthy, DisconnectedAndResynchronizing, or SeedingSource
Slide46
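One way to read the criteria table is as an ordered list of progressively looser filters: try set 1 against every copy in sorted order, then set 2, and so on. The Python sketch below encodes that reading. The names `CRITERIA`, `ACCEPTABLE_STATES`, and `select_copy` are hypothetical, and applying the copy status requirement to all ten sets (the table states it only for set 10) is an assumption of this model.

```python
# (max copy queue length, max replay queue length, required CI state); None = N/A
CRITERIA = [
    (10, 50, "Healthy"),       # 1
    (10, 50, "Crawling"),      # 2
    (None, 50, "Healthy"),     # 3
    (None, 50, "Crawling"),    # 4
    (None, 50, None),          # 5
    (10, None, "Healthy"),     # 6
    (10, None, "Crawling"),    # 7
    (None, None, "Healthy"),   # 8
    (None, None, "Crawling"),  # 9
    (None, None, None),        # 10: any copy in an acceptable state
]
ACCEPTABLE_STATES = {
    "Healthy", "DisconnectedAndHealthy",
    "DisconnectedAndResynchronizing", "SeedingSource",
}

def select_copy(sorted_copies):
    """Return (criteria number, copy name) for the first match, or None."""
    eligible = [c for c in sorted_copies if c["state"] in ACCEPTABLE_STATES]
    for number, (max_cql, max_rql, ci) in enumerate(CRITERIA, start=1):
        for copy in eligible:
            if max_cql is not None and copy["cql"] >= max_cql:
                continue
            if max_rql is not None and copy["rql"] >= max_rql:
                continue
            if ci is not None and copy["ci"] != ci:
                continue
            return number, copy["name"]
    return None

# Sample copy: queue of exactly 10 logs, index still crawling.
lagging = {"name": "Server4\\DB1", "cql": 10, "rql": 0,
           "ci": "Crawling", "state": "Healthy"}
print(select_copy([lagging]))
```

The sample copy fails sets 1 and 2 because its copy queue length is not below 10, fails set 3 because its index is crawling, and is accepted by set 4.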
Example: Best Copy Selection – RTM
Four copies of DB1
DB1 active on Server1

Database Copy   Activation Preference   Copy Queue Length   Replay Queue Length   CI State   Database State
Server2\DB1     2                       4                   0                     Healthy    Healthy
Server3\DB1     3                       2                   2                     Healthy    DiscAndHealthy
Server4\DB1     4                       10                  0                     Crawling   Healthy

[Diagram: a DB1 copy on each of Server1, Server2, Server3, and Server4; one copy is marked with an X]
Slide47
Example: Best Copy Selection – RTM
Sort list of available copies by Copy Queue Length (using AP if necessary):
  Server3\DB1
  Server2\DB1
  Server4\DB1

Database Copy   Activation Preference   Copy Queue Length   Replay Queue Length   CI State   Database State
Server2\DB1     2                       4                   0                     Healthy    Healthy
Server3\DB1     3                       2                   2                     Healthy    DiscAndHealthy
Server4\DB1     4                       10                  0                     Crawling   Healthy
Slide48
Example: Best Copy Selection – RTM
Only two copies meet first set of criteria for activation (CQL < 10; RQL < 50; CI = Healthy):
  Server3\DB1 (lowest copy queue length – tried first)
  Server2\DB1
  Server4\DB1 (does not meet the first set: CQL is 10 and CI is Crawling)

Database Copy   Activation Preference   Copy Queue Length   Replay Queue Length   CI State   Database State
Server2\DB1     2                       4                   0                     Healthy    Healthy
Server3\DB1     3                       2                   2                     Healthy    DiscAndHealthy
Server4\DB1     4                       10                  0                     Crawling   Healthy
Slide49
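The RTM example can be checked with a short Python sketch that uses the values from the table. The helper names `rtm_sort` and `meets_first_criteria` and the dictionary keys are hypothetical; only the sort key (copy queue length, then activation preference) and the first criteria set come from the slides.

```python
copies = [
    {"name": r"Server2\DB1", "ap": 2, "cql": 4, "rql": 0, "ci": "Healthy"},
    {"name": r"Server3\DB1", "ap": 3, "cql": 2, "rql": 2, "ci": "Healthy"},
    {"name": r"Server4\DB1", "ap": 4, "cql": 10, "rql": 0, "ci": "Crawling"},
]

def rtm_sort(copies):
    # Primary key: copy queue length; secondary key: activation preference.
    return sorted(copies, key=lambda c: (c["cql"], c["ap"]))

def meets_first_criteria(copy):
    # First criteria set: CQL < 10, RQL < 50, content index Healthy.
    return copy["cql"] < 10 and copy["rql"] < 50 and copy["ci"] == "Healthy"

ordered = rtm_sort(copies)
candidates = [c["name"] for c in ordered if meets_first_criteria(c)]
print([c["name"] for c in ordered])
print(candidates)
```

The sort puts Server3\DB1 first because it has the shortest copy queue, and the filter leaves Server3\DB1 and Server2\DB1 as the two candidates, in that order.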
Example: Best Copy Selection – SP1+
Four copies of DB1
DB1 active on Server1
Auto database mount dial set to Lossless

[Diagram: a DB1 copy on each of Server1, Server2, Server3, and Server4; one copy is marked with an X]

Database Copy   Activation Preference   Copy Queue Length   Replay Queue Length   CI State   Database State
Server2\DB1     2                       4                   0                     Healthy    Healthy
Server3\DB1     3                       2                   2                     Healthy    DiscAndHealthy
Server4\DB1     4                       10                  0                     Crawling   Healthy
Slide50
Example: Best Copy Selection – SP1+
Sort list of available copies by Activation Preference:
  Server2\DB1
  Server3\DB1
  Server4\DB1

Database Copy   Activation Preference   Copy Queue Length   Replay Queue Length   CI State   Database State
Server2\DB1     2                       4                   0                     Healthy    Healthy
Server3\DB1     3                       2                   2                     Healthy    DiscAndHealthy
Server4\DB1     4                       10                  0                     Crawling   Healthy
Slide51
Example: Best Copy Selection – SP1+
Sort list of available copies by Activation Preference:
  Server2\DB1 (lowest preference value – tried first)
  Server3\DB1
  Server4\DB1

Database Copy   Activation Preference   Copy Queue Length   Replay Queue Length   CI State   Database State
Server2\DB1     2                       4                   0                     Healthy    Healthy
Server3\DB1     3                       2                   2                     Healthy    DiscAndHealthy
Server4\DB1     4                       10                  0                     Crawling   Healthy
Slide52
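The difference between the RTM and SP1+ sort comes down to which key is used, as this Python sketch shows with the same table values. The function name `sp1_sort` and the dictionary keys are hypothetical; the rule itself (activation preference when the mount dial is Lossless, RTM behavior otherwise) is taken from the SP1+ slide.

```python
copies = [
    {"name": r"Server3\DB1", "ap": 3, "cql": 2},
    {"name": r"Server4\DB1", "ap": 4, "cql": 10},
    {"name": r"Server2\DB1", "ap": 2, "cql": 4},
]

def sp1_sort(copies, mount_dial):
    if mount_dial == "Lossless":
        # SP1+: with a Lossless dial, activation preference decides the order.
        return sorted(copies, key=lambda c: c["ap"])
    # Otherwise fall back to RTM behavior: copy queue length, then preference.
    return sorted(copies, key=lambda c: (c["cql"], c["ap"]))

lossless_order = [c["name"] for c in sp1_sort(copies, "Lossless")]
other_order = [c["name"] for c in sp1_sort(copies, "GoodAvailability")]
print(lossless_order)
print(other_order)
```

With the dial set to Lossless, Server2\DB1 is tried first even though Server3\DB1 has the shorter copy queue. With any other dial setting the order matches the RTM example.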
Best Copy Selection – Post-Activation Events
Transport Dumpster requests will be initiated for the mailbox database to recover any lost messages
The new active and mounted mailbox database will generate new log files using the same log generation sequence
When the previous active copy recovers, it will run through divergence detection and either perform an incremental resynchronization or require an administrator to reseed the database copy
Slide53
Deep Dive
The MommyMayIMount Bit
Datacenter Activation Coordination Mode
Slide54
Datacenter Activation Coordination Mode
DAC mode is a property of a DAG
Acts as an application-level form of quorum
  Designed to prevent multiple copies of the same database from mounting on multiple members
Enables the use of Exchange cmdlets for datacenter switchovers of DAGs
  Stop-DatabaseAvailabilityGroup
  Restore-DatabaseAvailabilityGroup
  Start-DatabaseAvailabilityGroup
Slide55
Datacenter Activation Coordination Mode
Uses the Datacenter Activation Coordination Protocol (DACP), also known as the MommyMayIMount bit
DACP is an in-memory bit in the Exchange Replication service
DACP has two possible values:
  0 = cannot automatically mount databases on startup
  1 = can automatically mount databases on startup, provided Automount consensus is True
Slide56
Datacenter Activation Coordination Mode
Microsoft Exchange Replication service startup sequence
  Active Manager initializes
  DACP bit is set to 0
  DAG member communicates with other DAG members
    If the starting DAG member can communicate with all members on the StartedMailboxServers list, starting DAG member DACP bit is set to 1
    If starting DAG member can communicate with another DAG member that has a DACP bit set to 1, starting DAG member DACP bit is set to 1
    If starting DAG member can communicate only with members that have a DACP bit set to 0, starting DAG member DACP bit remains at 0
Slide57
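The three startup rules can be written as a small decision function. This Python sketch is a model of the rules as stated on the slide, not the Replication service's code: the name `dacp_bit_after_startup` is hypothetical, `other_started_members` is assumed to be the StartedMailboxServers list without the starting member, and `reachable` is assumed to map each member the starting node can contact to that member's current DACP bit.

```python
def dacp_bit_after_startup(other_started_members, reachable):
    """Return the starting member's DACP bit (0 or 1) after it polls its peers."""
    bit = 0  # the bit always starts at 0 when Active Manager initializes

    # Rule 1: every other member on the StartedMailboxServers list answers.
    if all(member in reachable for member in other_started_members):
        bit = 1
    # Rule 2: at least one reachable member already has its bit set to 1.
    elif any(value == 1 for value in reachable.values()):
        bit = 1
    # Rule 3: only members with a bit of 0 are reachable (or none at all),
    # so the bit stays at 0 and databases are not mounted automatically.
    return bit

members = ["MBX2", "MBX3"]
print(dacp_bit_after_startup(members, {"MBX2": 0, "MBX3": 0}))  # all reachable
print(dacp_bit_after_startup(members, {"MBX2": 1}))             # one peer already at 1
print(dacp_bit_after_startup(members, {"MBX2": 0}))             # partial contact, all 0
```

The third call is the case DAC mode exists for: a member that comes up and can reach only peers that are also unsure keeps its bit at 0 rather than mounting databases on its own.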
Datacenter Activation Coordination Mode
Slide58
Datacenter Activation Coordination Mode
Slide59
Datacenter Activation Coordination Mode
[Diagram: DAG members labeled with DACP bit values 0, 0, 1, 1]
Slide60
Datacenter Activation Coordination Mode
Side effects of enabling DAC mode
  When a DAG in DAC mode is started after a complete shutdown, databases will not be mountable until all DAG members are up, running, and in communication with each other
  When performing a datacenter switchover where only a single node remains in the cluster supporting the DAG, any reboot that changes both the boot time of the witness server and the boot time of the DAG member will prevent databases from mounting automatically
    If the reboots were necessary and valid operations, administrators can force the databases online without causing split brain
Slide61
Related Content
EXL308 - Real World High Availability and Site Resilient Design
EXL307 - Using a Load Balancer in Your Exchange Server 2010 Environment
EXL316 - Microsoft Lync 2010: Availability, Resiliency, and Recovery
EXL203 - How to Tell Your Manager You Need Quotas on Your Mailboxes
Find Me Later Today At the Exchange Booth from 12:30-1:30
Slide62
Track Resources
Lync Team Blog: http://blogs.technet.com/b/uc/
Lync Facebook: http://www.facebook.com/MicrosoftOfficeCommunicator
Lync Website: http://lync.microsoft.com/en-us/Pages/unified-communications.aspx
Lync Server Blog: http://blogs.technet.com/b/nexthop/
Slide63
Resources
Connect. Share. Discuss.
http://northamerica.msteched.com
Learning
Microsoft Certification & Training Resources
www.microsoft.com/learning
TechNet
Resources for IT Professionals
http://microsoft.com/technet
Resources for Developers
http://microsoft.com/msdn
Slide64
Complete an evaluation on CommNet and enter to win!
Slide65
MS Tag
Scan the Tag to evaluate this session now on myTechEd Mobile
Slide66
Slide67
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
Slide68