
Exchange Server 2013 - PowerPoint Presentation

Uploaded On 2016-07-03



Presentation Transcript

Slide1
Slide2

Exchange Server 2013
High Availability | Site Resilience
Scott Schnoll
Principal Technical Writer
OUC-B314
Slide3

Storage
High Availability
Site Resilience
Announcements
Slide4

Storage
Slide5

Storage Challenges
Disks
  Capacity is increasing, but IOPS are not
Databases
  Database sizes must be manageable
Database Copies
  Reseeds must be fast and reliable
  Passive database copy IOPS are inefficient
  Lagged copies have asymmetric storage requirements and require manual care
Slide6

Storage Innovations
Multiple Databases Per Volume
Autoreseed
Self-Recovery from Storage Failures
Lagged Copy Innovations
Slide7

Multiple databases per volume
Slide8

Multiple databases per volume
[Diagram: sixteen database copies, DB1 through DB4 on each of four volumes, each copy marked Active, Passive, or Lagged]
4-member DAG
4 databases
4 copies of each database
4 databases per volume
Symmetrical design
Slide9

Multiple databases per volume
[Diagram: four copies of DB1, one database copy per disk, marked Active, Passive, or Lagged; seeding stream labeled 20 MB/s]
Single database copy/disk:
  Reseed 2TB Database = ~23 hrs
  Reseed 8TB Database = ~93 hrs
Slide10

Multiple databases per volume
[Diagram: DB1 through DB4 on each of four volumes, each copy marked Active, Passive, or Lagged; seeding streams labeled 12 MB/s and 20 MB/s]
Single database copy/disk:
  Reseed 2TB Database = ~23 hrs
  Reseed 8TB Database = ~93 hrs
4 database copies/disk:
  Reseed 2TB Disk = ~9.7 hrs
  Reseed 8TB Disk = ~39 hrs
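As a rough back-of-envelope check on the reseed estimates above, the sketch below models reseed time as data size divided by aggregate seeding throughput. The assumptions are mine, not the slide's: decimal units (1 TB = 1,000,000 MB), a completely full disk, and, for the four-copies-per-disk case, four streams seeding in parallel at the 12, 12, 20, and 20 MB/s rates labeled in the diagram. The slide's own figures evidently rest on assumptions that are not stated, so the absolute numbers here will not match them exactly; the point is the ratio between one stream and several parallel streams.

```python
def reseed_hours(size_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to copy size_tb terabytes at throughput_mb_s MB/s."""
    size_mb = size_tb * 1_000_000  # assumption: decimal terabytes
    return size_mb / throughput_mb_s / 3600


# One database copy per disk: a single seeding stream at 20 MB/s.
single_stream = reseed_hours(2, 20)

# Four database copies per disk: four streams seed in parallel from
# different source servers, so their throughputs are summed (assumption).
parallel_streams = reseed_hours(2, 12 + 12 + 20 + 20)

print(f"single stream:    {single_stream:.1f} h")
print(f"parallel streams: {parallel_streams:.1f} h")
print(f"speed-up:         {single_stream / parallel_streams:.1f}x")
```

Under this simple model the speed-up depends only on the ratio of aggregate to single-stream throughput, which is why spreading several database copies across one volume shortens the time needed to refill a replaced disk.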

Slide11

Multiple databases per volume
Requirements
  Single logical disk/partition per physical disk
Recommendations
  Databases per volume should equal the number of copies per database
  Same neighbors on all servers
Slide12

Autoreseed
Slide13

Seeding Challenges
Disk failure on active copy = database failover
Failed disk and database corruption issues need to be addressed quickly
Fast recovery to restore redundancy is needed
Slide14

Seeding Innovations
Automatically restore redundancy after disk failure using provisioned spares
[Diagram: in-use storage and spares; a failed disk, marked X, triggers a disk reseed operation]
Slide15

Autoreseed Workflow
Slide16

Autoreseed Workflow
1. Detect a copy in an F&S (FailedAndSuspended) state for 15 minutes in a row
2. Try to resume the copy 3 times (with 5-minute sleeps in between)
3. Try assigning a spare volume 5 times (with 1-hour sleeps in between)
4. Try InPlaceSeed with SafeDeleteExistingFiles 5 times (with 1-hour sleeps in between)
5. Once all retries are exhausted, the workflow stops
6. If 3 days have elapsed and the copy is still F&S, the workflow state is reset and starts from Step 1
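The retry budget in the workflow steps above can be laid out as a simple timeline. This is a minimal sketch, not the actual Autoreseed implementation: it assumes every attempt fails, that attempts take no time, and that the sleep is also applied after the last attempt of a stage before the next stage begins. The `Stage` class and `worst_case_schedule` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    action: str
    attempts: int
    sleep_minutes: int  # pause after each attempt (assumption: also after the last)


# Retry counts and sleeps are taken from the workflow steps above.
STAGES = [
    Stage("resume copy", attempts=3, sleep_minutes=5),
    Stage("assign spare volume", attempts=5, sleep_minutes=60),
    Stage("in-place seed with SafeDeleteExistingFiles", attempts=5, sleep_minutes=60),
]
DETECTION_MINUTES = 15       # copy must be F&S this long before any action
RESET_MINUTES = 3 * 24 * 60  # workflow state is reset after 3 days


def worst_case_schedule():
    """Return (minute, action) for every attempt when every attempt fails."""
    minute = DETECTION_MINUTES
    schedule = []
    for stage in STAGES:
        for _ in range(stage.attempts):
            schedule.append((minute, stage.action))
            minute += stage.sleep_minutes
    return schedule


schedule = worst_case_schedule()
for minute, action in schedule:
    print(f"t+{minute:4d} min: {action}")
```

Under these assumptions all retries are spent within the first day, so a copy that is still F&S then waits, untouched, until the 3-day reset starts the sequence again.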

Slide17

Autoreseed Workflow
Prerequisites
  Copy is not ReseedBlocked or ResumeBlocked
  Logs and database file(s) are on the same volume
  Database and log folder structure matches the required naming convention
  No active copies on the failed volume
  All copies are F&S on the failed volume
  No more than 8 F&S copies on the server (if so, might be a controller failure)
For InPlaceSeed
  Up to 10 concurrent seeds are allowed
  If a database file exists, wait for 2 days before in-place reseeding
    Waiting period is based on the LastWriteTime of the database file
Slide18

Autoreseed
[Diagram: folder tree under the root (\) with ExchDbs (MDB1, MDB2) and ExchVols (Vol1, Vol2, Vol3); database folders contain MDB1.DB and MDB1.log]
AutoDagDatabasesRootFolderPath
AutoDagVolumesRootFolderPath
AutoDagDatabaseCopiesPerVolume = 1
Slide19

Autoreseed
Requirements
  Single logical disk/partition per physical disk
  Specific database and log folder structure must be used
Recommendations
  Same neighbors on all servers
  Databases per volume should equal the number of copies per database
Configuration instructions (updated April 2013)
  http://aka.ms/autoreseed
Slide20

Autoreseed
Numerous fixes in CU1
  Autoreseed not detecting spare disks correctly
  Autoreseed not using spare disks
  Increased Autoreseed copy limits (previously 4, now 8)
  Better tracking around mount path and ExchangeVolume path
Get-MailboxDatabaseCopyStatus displays ExchangeVolumeMountPoint
  Shows the mount point of the database volume under C:\ExchangeVolumes
Slide21

Other Seeding Innovations in CU1
Update-MailboxDatabaseCopy includes new parameters designed to aid with automation

Parameter: Description
BeginSeed: Useful for scripting reseeds. The task asynchronously starts the seeding operation and then exits the cmdlet.
MaximumSeedsInParallel: Used with the Server parameter to specify the maximum number of parallel seeding operations across the specified server during a full server reseed operation. Default is 10.
SafeDeleteExistingFiles: Used to perform a seeding operation with a single copy redundancy pre-check prior to the seed. Because this parameter includes the redundancy safety check, it requires a lower level of permissions than DeleteExistingFiles, enabling a limited-permission administrator to perform the seeding operation.
Server: Used as part of a full server reseed operation to reseed all database copies in an F&S state. Can be used with MaximumSeedsInParallel to start reseeds of database copies in parallel across the specified server, in batches of up to the value of the MaximumSeedsInParallel parameter at a time.
Slide22

Self-recovery from storage failures
Slide23

Recovery Challenges
Storage controllers are basically mini-PCs
  As such, they can crash, hang, etc., requiring administrative intervention
Other operator-recoverable conditions can occur
  Loss of vital system elements
  Hung or highly latent IO
Slide24

Recovery Innovations
Innovations added in Exchange 2010 carried forward
New recovery behaviors added to Exchange 2013
Even more added to Exchange 2013 CU1

Exchange Server 2010
  ESE Database Hung IO (240s)
  Failure Item Channel Heartbeat (30s)
  SystemDisk Heartbeat (120s)
Exchange Server 2013
  System Bad State (302s)
  Long I/O times (41s)
  MSExchangeRepl.exe memory threshold (4GB)
Exchange Server 2013 CU1
  Bus reset (event 129)
  Replication service endpoints not responding
  Cluster database hang (GUM updates blocked)
Slide25

Lagged copy innovations
Slide26

Lagged Copy Challenges
Activation is difficult
Lagged copies require manual care
Lagged copies cannot be page patched
Slide27

Lagged Copy Innovations
Automatic log file replay
  Low disk space (enable in registry)
  Page patching (enabled by default)
  Less than 3 other healthy copies (enable in Active Directory; configure in registry)
Integration with Safety Net
  No need for log surgery or hunting for the point of corruption
Slide28

High Availability
Slide29

High Availability Challenges
High availability focuses on database health
Best copy selection insufficient for new architecture
DAG network configuration still manual
Slide30

High Availability Innovations
Managed Availability
Best Copy and Server Selection
DAG Network Autoconfig
Slide31

Managed Availability
Slide32

Managed Availability
Key tenets for Exchange 2013
  Access to a mailbox is provided by the protocol stack on the Mailbox server that hosts the active copy of the mailbox
  If a protocol is down on a Mailbox server, all access to active databases on that server via that protocol is lost
Managed Availability was introduced to detect and automatically recover from these kinds of failures
  For most protocols, quick recovery is achieved via a restart action
  If the restart action fails, a failover can be triggered
Slide33

Managed Availability
An internal framework used by component teams
Sequencing mechanism to control when recovery actions are taken versus alerting and escalation
Enhances the Best Copy Selection algorithm by taking into account overall server health of source and target
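The idea of sequencing recovery actions before alerting can be pictured with a small sketch: try the cheapest action first, move to the next only if it fails, and escalate to a human once nothing works. This is only an illustration of the ordering described in this talk (restart, then failover, then escalation); the `recover` function and its signature are hypothetical and do not reflect the internal Managed Availability framework.

```python
from typing import Callable, List, Tuple


def recover(actions: List[Tuple[str, Callable[[], bool]]]) -> str:
    """Run recovery actions in order and stop at the first one that succeeds.

    Each action is a (name, callable) pair; the callable returns True when
    health was restored. Returns the name of the successful action, or
    "escalate" when every action failed.
    """
    for name, action in actions:
        if action():
            return name
    return "escalate"


# Example: the protocol restart does not help, but a failover does.
outcome = recover([
    ("restart", lambda: False),
    ("failover", lambda: True),
])
print(outcome)
```

Keeping the actions in an ordered list makes the escalation policy explicit: alerting happens only after the automated options are exhausted.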

Slide34

Managed Availability
MA failovers are a recovery action from failure
  Detected via a synthetic operation or live data
  Throttled in time and across the DAG
MA failovers can happen at database or server level
  Database: Store-detected database failure can trigger database failover
  Server: Protocol failure can trigger server failover
Single Copy Alert integrated into MA
  ServerOneCopyInternalMonitorProbe (part of DataProtection Health Set)
  Alert is per-server to reduce flow
  Still triggered across all machines with copies
  Logs 4138 (red) and 4139 (green) events
Slide35

Best Copy and Server Selection
Slide36

Best Copy Selection Challenges
Exchange 2010 used several criteria
  Copy queue length
  Replay queue length
  Database copy status, including activation blocked
  Content index status
Using just these criteria is not good enough for Exchange 2013, because protocol health is not considered
Slide37

Best Copy and Server Selection
Still an Active Manager algorithm performed at *over (switchover or failover) time, based on extracted health of the system
  Replication health still determined by same criteria and phases
  Criteria now include health of the entire protocol stack
    Considers a prioritized protocol health set in the selection, using four priorities: critical, high, medium, low
    Failover responders trigger added checks to select a "protocol not worse" target
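One way to read the "protocol not worse" check is as a comparison between the most important unhealthy priority on the source server and on a candidate target. The sketch below illustrates that reading only. The function names, the data layout (a set of unhealthy priorities per server), and the comparison rule itself are assumptions made for illustration, not the actual Active Manager logic.

```python
# Priorities from most to least important, as listed in the talk.
PRIORITIES = ["critical", "high", "medium", "low"]


def worst_unhealthy(unhealthy: set) -> int:
    """Index of the most important unhealthy priority.

    Returns len(PRIORITIES) when nothing is unhealthy, so a fully healthy
    server always compares as best.
    """
    for index, priority in enumerate(PRIORITIES):
        if priority in unhealthy:
            return index
    return len(PRIORITIES)


def not_worse(source_unhealthy: set, target_unhealthy: set) -> bool:
    """True if the target's worst problem is no more important than the source's."""
    return worst_unhealthy(target_unhealthy) >= worst_unhealthy(source_unhealthy)


# Hypothetical example: the source has an unhealthy high-priority protocol.
source = {"high"}
candidates = {
    "mbx2": {"medium"},    # only a medium-priority problem
    "mbx3": {"critical"},  # a critical-priority problem
    "mbx4": set(),         # fully healthy
}
acceptable = sorted(name for name, bad in candidates.items()
                    if not_worse(source, bad))
print(acceptable)
```

The server names are placeholders; the intent is only to show that a failover triggered by a protocol failure should not land on a server whose protocol health is worse than the one it is leaving.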

Slide38

Best Copy and Server Selection
Managed Availability imposes 4 new constraints on the Best Copy Selection algorithm
[Diagram: constraints numbered 1 through 4]
Slide39

BCSS Changes in CU1
PAM tracks number of active databases per server
  Honors MaximumActiveDatabases, if configured
  Allows Active Manager to exclude servers that are already hosting the maximum number of active databases when determining potential candidates for activation
  Keeps an in-memory state that tracks the number of active databases per server
    When the PAM role moves or when the Exchange Replication service is restarted on the PAM, this information is rebuilt from the cluster database
Slide40

DAG Network Innovations
Slide41

DAG Network Challenges
DAG networks must be manually collapsed in a multi-subnet deployment
Small remaining administrative burden for deployment and initial configuration
Slide42

DAG Network Innovations
Automatically collapsed in multi-subnet environment
Automatic or manual configuration
  Default is Automatic
  Requires specific settings on MAPI and Replication network interfaces
Manual edits and EAC controls blocked by default
Set DAG to manual network setup to edit or change DAG networks
Slide43
Slide44

Site Resilience
Slide45

Site Resilience Challenges
Operationally complex
Mailbox and Client Access recovery connected
Namespace is a single point of failure (SPOF)
Slide46

Site Resilience Innovations
Key Characteristics
  DNS resolves to multiple IP addresses
  Almost all protocol access in Exchange 2013 is HTTP
  HTTP clients have built-in IP failover capabilities
  Clients skip past IPs that produce hard TCP failures
  Admins can switchover by removing VIP from DNS
  Namespace no longer a SPOF
  No dealing with DNS latency
Slide47

Site Resilience Innovations
Operationally simplified
Mailbox and Client Access recovery independent
Namespace provides redundancy
Slide48

Site Resilience
Operationally Simplified
  Previously, loss of CAS, CAS array, VIP, LB, etc., required the admin to perform a datacenter switchover
  In Exchange Server 2013, recovery happens automatically
  The admin focuses on fixing the issue, instead of restoring service
Slide49

Site Resilience
Mailbox and CAS recovery independent
  Previously, CAS and Mailbox server recovery were tied together in site recoveries
  In Exchange Server 2013, recovery is independent, and may come automatically in the form of failover
  This is dependent on business requirements and configuration
Slide50

Site Resilience
Namespace provides redundancy
  Previously, the namespace was a single point of failure
  In Exchange 2013, the namespace provides redundancy by leveraging multiple A records and the ability of the client's OS/HTTP stack to fail over
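The client-side behavior described here, walking the list of addresses a name resolves to and skipping any that produce a hard TCP failure, can be sketched as follows. This is an illustration, not the logic of any particular HTTP stack. The `first_reachable` helper and the injected `connect` callable are hypothetical; a real client would attempt something like `socket.create_connection((address, 443), timeout=5)` instead of the stand-in used here. The two addresses are the VIPs from the failover example later in this talk.

```python
from typing import Callable, Iterable, Optional


def first_reachable(addresses: Iterable[str],
                    connect: Callable[[str], None]) -> Optional[str]:
    """Return the first address that accepts a connection, or None."""
    for address in addresses:
        try:
            connect(address)
        except OSError:
            # Hard TCP failure (refused, unreachable, timed out): try the next IP.
            continue
        return address
    return None


# Stand-in for a real TCP connect: the first VIP is down.
failed_vips = {"192.168.1.50"}


def fake_connect(address: str) -> None:
    if address in failed_vips:
        raise ConnectionRefusedError(f"{address} is not answering")


# mail.contoso.com resolves to both VIPs.
chosen = first_reachable(["192.168.1.50", "10.0.1.50"], fake_connect)
print(chosen)
```

Because the retry happens inside the client, service continues while the failed VIP is still published in DNS; removing the record simply stops clients from wasting an attempt on it.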

Slide51

Site Resilience
Support for new deployment scenarios
  With the namespace simplification, consolidation of server roles, separation of CAS array and DAG recovery, de-coupling of CAS and Mailbox by AD site, and load balancing changes, three locations, if available, can simplify mailbox recovery in response to datacenter-level events
  You must have at least three locations
    Two locations with Exchange; one with witness server
    Exchange sites must be well-connected
    Witness server site must be isolated from network failures affecting Exchange sites
Slide52

Site Resilience Failover Examples
Slide53

Site Resilience Failover Examples
[Diagram: primary datacenter Redmond and alternate datacenter Portland; Client Access servers cas1 through cas4; VIP 192.168.1.50 (marked failed with an X) and VIP 10.0.1.50]
DNS before the failure: mail.contoso.com: 192.168.1.50, 10.0.1.50
With multiple VIP endpoints sharing the same namespace, if one VIP fails, clients automatically fail over to the alternate VIP(s)
Removing the failing IP from DNS puts you in control of the in-service time of the VIP
DNS after removing the failed VIP: mail.contoso.com: 10.0.1.50
Slide54

Site Resilience Failover Examples
[Diagram: DAG dag1 with mbx1, mbx2, mbx3, and mbx4 across primary datacenter Redmond and alternate datacenter Portland; witness server in third datacenter Paris; a failure is marked with an X]
Assuming MBX3 and MBX4 are operating and one of them can lock the witness.log file, automatic failover of active databases should occur
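The reason a third location helps comes down to vote counting: the DAG's underlying cluster stays up only while a majority of its voters are reachable. The sketch below assumes a simple majority model in which each of the four DAG members and the witness server counts as one vote, and it assumes mbx1 and mbx2 are the members that failed, as implied by the scenario above. The `has_quorum` helper is a hypothetical name introduced for illustration.

```python
def has_quorum(total_votes: int, votes_online: int) -> bool:
    """Majority rule: strictly more than half of all votes must be online."""
    return votes_online * 2 > total_votes


TOTAL_VOTES = 4 + 1  # mbx1..mbx4 plus the witness server

# Primary datacenter lost: mbx3, mbx4, and the third-site witness remain.
survivors_with_witness = has_quorum(TOTAL_VOTES, votes_online=3)

# Same failure, but the witness is unreachable as well.
survivors_without_witness = has_quorum(TOTAL_VOTES, votes_online=2)

print(survivors_with_witness, survivors_without_witness)
```

With the witness in a location that is isolated from failures affecting either Exchange site, the surviving datacenter can still reach a majority on its own, which is what allows the failover to be automatic.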

Slide55

Site Resilience Failover Examples
[Diagram: DAG dag1 with mbx1, mbx2, mbx3, and mbx4, plus the witness, across primary datacenter Redmond and alternate datacenter Portland; three failures are marked with an X]
Slide56

Site Resilience Failover Examples
[Diagram: DAG dag1 with mbx1, mbx2, mbx3, and mbx4, the witness, and an alternate witness across primary datacenter Redmond and alternate datacenter Portland; a failure is marked with an X]
Mark the failed servers/site as down:
  Stop-DatabaseAvailabilityGroup DAG1 -ActiveDirectorySite:Redmond
Stop the Cluster service on the remaining DAG members:
  Stop-Service ClusSvc
Activate the DAG members in the second datacenter:
  Restore-DatabaseAvailabilityGroup DAG1 -ActiveDirectorySite:Portland
Slide57

ANNOUNCEMENTS
Slide58
Coming in CU2
Slide59

Coming in CU2
Microsoft Exchange DAG Management service
  MSExchangeDAGMgmt
  Has MonitoringComponent moved into it
  Continues to write events to the same place that the Replication service writes to (Application event log with source of MSExchangeRepl and crimson channel)
  Additional functionality will be moved from MSExchangeRepl to MSExchangeDAGMgmt in the future
Slide60

Possibly Coming in CU2
Use Windows Azure for witness server
  Testing and validation currently underway
  Requires extending internal Active Directory permissions to public cloud
  Involves creating a file server on top of Azure IaaS VM role
  HA file server in Azure: Two persistent VMs can use XStore for shared storage
Slide61

Coming in CU2
Enterprise Edition support for 100 databases/server
To enable this
  We made code changes in CU2
  We fixed some blocking bugs
  We did extensive testing in validation, including internal Dogfood environments
Slide62

Questions?

Scott Schnoll

scott.schnoll@microsoft.com

Twitter: @Schnoll
Blog: http://aka.ms/Schnoll
Slide63

Related content

Microsoft Exchange Server 2013 Managed Availability

Microsoft Exchange Server 2013 Sizing

Virtualization in Microsoft Exchange Server 2013

Exchange 2013 On-Premises Upgrade and Coexistence
Exchange Server 2013 Tips & Tricks
Slide64

Track resources

Exchange Team Blog: http://blogs.technet.com/b/exchange/
Twitter: Follow @MSFTExchange
Join the conversation, use #IamMEC
Check out:
  Microsoft Exchange Conference 2014: www.iammec.com
  Office 365 FastTrack: http://fasttrack.office.com/
  Technical Training with Ignite: http://ignite.office.com/
Slide65

Resources
Learning: Microsoft Certification & Training Resources, www.microsoft.com/learning
msdn: Resources for Developers, http://microsoft.com/msdn
TechNet: Resources for IT Professionals, http://microsoft.com/technet
Sessions on Demand: http://channel9.msdn.com/Events/TechEd
Slide66

Complete an evaluation on CommNet and enter to win!
Slide67

Scan the Tag to evaluate this session now on myTechEd Mobile
Slide68

© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.