Exchange Design Concepts and Best Practices
  Boris Lokhvitsky
  MCM | Exchange
  Principal Consultant / Delivery Architect
  Microsoft Consulting Services
  BRK3131

Agenda
  Architecture Concepts and Design Principles
    Exchange Evolution: from Hardware to Software Powered Solution
    Exchange Design Principles
    Availability and Reliability: role of Critical Dependencies and Redundant Components
    Architecture Models: Shared Infrastructure vs. Building Blocks
    Modern Servers and Datacenters: Scale Like the Cloud
    Design Options: Supported vs. Recommended vs. Structured
    People and Process, not just Technology
  How Architecture Drives Design Decisions
    Consolidated vs. Distributed Design
    Service Site Model: Bound vs. Unbound
    All about DAGs: Sizing, IP-less DAG, Database Copy Layout, and Dedicated Replication Network
    Virtualization and Role Consolidation
    Storage Challenges: SAN/DAS, RAID/JBOD, Thin Provisioning, Native vs. Low Level Data Replication
    Server and Storage Platform, Disk and Database Layout
    Backups and Lagged Copies
    Archiving: Retention vs. Compliance
    Client Breakdown and Penalty Factors

Architecture Concepts and Design Principles

Hardware vs. Software Powered Solution
  Exchange 2003: shared infrastructure with redundant hardware components
  Exchange 2010/2013/2016: commodity building blocks with software controlled redundancy
  New architecture and design principles
  [Figure: two I/O meters] How much disk performance does an Exchange database need?

Exchange Design Principles
  In the modern Exchange world, software, not hardware, powers and controls the solution
  Availability
    Reduce complexity, simplify the solution
    Decrease the number of system dependencies to improve availability and lower the risks
    Use native capabilities where possible, as it makes the design simpler
    Deploy redundant solution components to increase availability and protect the solution
    Avoid failure domains: do not group redundant solution components into blocks that could be impacted by a single failure
  Functionality / Productivity
    Enable and enhance user experience
    Provide functionality and access that is required or expected by the end users
    Provide large low cost mailboxes
    Use Exchange as a single data repository
    Increase value with Lync and SharePoint integration
    Build a bridge to the cloud: ensure feature rich cloud integration and co-existence
  Operations: Optimize People and Process, not just Technology
    Decrease complexity of team collaboration by leveraging solution / workload focused teams
    Simplify / optimize administration / monitoring / troubleshooting process
  Cost: Reduce / minimize total cost of ownership (TCO) for the solution
    Use commodity hardware and leverage native product capabilities
    Implement storage solution that minimizes cost, complexity, and administrative overhead

Availability and Reliability
  Failures *do* happen!
  Critical system dependencies decrease availability
    Deploy multi-role servers
    Avoid intermediate and extra components (e.g. SAN; network teaming; archiving servers)
    Simpler design is always better: KISS
  Redundant components increase availability
    Multiple database copies
    Multiple balanced servers
  Failure domains combining redundant components decrease availability
    Examples: SAN; blade chassis; virtualization hosts
  Software, not hardware, is driving the solution
    Exchange powered replication and managed availability
    Redundant transport and Safety Net
    Load balancing and proxying to the destination
  Availability principles: DAG beyond the “A”
    http://blogs.technet.com/b/exchange/archive/2011/09/16/dag-beyond-the-a.aspx

Shared Infrastructure and Failure Domains
  Classical shared infrastructure design introduces numerous critical dependency components
  Relying on hardware requires expensive redundant components
  Failure domains reduce availability and introduce significant extra complexity

Building Block Architecture
  Inexpensive commodity servers and storage
    Scale the solution out, not in; more servers mean better availability
    Nearline SAS storage: provide large mailboxes by using large low cost drives
    Exchange I/O reduced by 93% since Exchange 2003
    Exchange 2013 database needs ~10 IOPS; single Nearline SAS disk provides ~60 IOPS; single 2.5” 15K rpm SAS disk provides ~230 IOPS
  Redundancy and availability provided and controlled by Exchange, not by infrastructure
    3+ redundant database copies eliminate the need for RAID and backups
    Redundant servers eliminate the need for redundant server components (e.g. NIC teaming or MPIO)
  DAG is the ultimate building block allowing you to scale the solution
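
The IOPS figures on the Building Block Architecture slide can be turned into a quick headroom check. The sketch below is illustrative only: it takes the slide's approximate numbers (~10 IOPS per database, ~60 IOPS per Nearline SAS disk, ~230 IOPS per 15K rpm SAS disk) at face value, and the function name is hypothetical. It looks at IOPS alone and ignores capacity, reseed time, and background maintenance.

```python
import math


def databases_per_disk_by_iops(disk_iops: float, database_iops: float = 10.0) -> int:
    """Number of databases a single disk can serve, judged by IOPS alone."""
    if disk_iops <= 0 or database_iops <= 0:
        raise ValueError("IOPS values must be positive")
    return math.floor(disk_iops / database_iops)


# Approximate figures quoted on the slide.
nearline_sas = databases_per_disk_by_iops(60)
sas_15k = databases_per_disk_by_iops(230)
print(f"Nearline SAS: {nearline_sas} databases per disk by IOPS")
print(f"15K rpm SAS:  {sas_15k} databases per disk by IOPS")
```

Even the slower, larger disk has IOPS headroom for several databases, which is the point the slide makes about preferring large low cost drives.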

Modern Server: Commodity Hardware
  Google, Microsoft, Amazon, and Yahoo! have been using commodity hardware for 10+ years already
    Not only for messaging but for other technologies as well (started with search, actually)
  Inexpensive commodity server and storage as a building block
    Easily replaceable, highly scalable, extremely cost efficient
  Software, not hardware, is the brain of the solution
  Photo Credit: Stephen Shankland/CNET

People and Process (not just Technology)
  Decrease complexity of team collaboration
  Simplify administration / monitoring / troubleshooting
  Solution focused teams
    Traditional application team owns only a small piece of the solution / workflow
    Multiple teams must be engaged to implement the design or troubleshoot the issue
    Team organization based on solution, not on specific infrastructure or technology, simplifies administration / troubleshooting and reduces operational costs
  Own your solution!
  Or… if you can’t do it right, reduce your OpEx by moving to O365 ;)

Exchange Product Line Architecture
  Exchange PLA: special tightly scoped reference architecture offering from Microsoft Consulting Services
    Based on deployment best practices and collective customer experience
    Structured design based on detailed rule sets to avoid common mistakes and misconceptions
  Based on cornerstone design principles:
    4 database copies across 2 sites
    Unbound Service Site model (single namespace)
    Witness in the 3rd site
    Multi-role servers
    DAS storage with NL SAS or SATA JBOD configuration
    L7 load balancing (no session affinity)
    Large low cost mailboxes (25/50 GB standard mailbox size)
    Enable access for all internal / external clients
    System Center for monitoring
    Exchange Online Protection for messaging hygiene
  [Figure: design options scale: Supported, Recommended, Structured, Standardized. "You should be here!"]

How Architecture Drives Design Decisions

Consolidated vs. Distributed Design
  Consolidated design is usually the preferred option
    Maximum reduction of server footprint, deployment costs, and administrative overhead
    Simplifies use of a single unified namespace
    Still needs 2-3 datacenters for proper implementation of site resilience
  Distributed design places servers closer to end users
    Optimizes client to server traffic (except, perhaps, in DR scenarios)
    Might be driven by regulatory compliance requirements (keep data on the home soil)
    Presents challenges for site resilience / DR design / firewalls / administration
  Choice between client-server vs. server-server traffic

Bound vs. Unbound Service Site Model
  Bound Service model
    Binds user mailboxes to a preferred datacenter
    Uses site specific namespaces
    Uses Active/Passive DAG design
    Building block: two active/passive DAGs
    PLA recommended for Exchange 2010
  Unbound Service model (recommended)
    Users don’t have a preferred datacenter
    Allows use of single unified namespace
    Uses Active/Active DAG design
    Building block: one active/active DAG
    PLA recommended for Exchange 2013

DAG Sizing: Small, Medium, Large?
  DAG size is important because DAG is the building block
  General guidance is to prefer larger DAG size
    Larger DAGs provide better availability and load balancing
    If 1 server with X active mailboxes fails in an N-node DAG, the active mailbox count on each surviving server increases only by X/(N-1)
    Proper symmetric database copy layout is important to achieve good mailbox load balancing
  Larger DAGs, however, have disadvantages too
    Large DAGs are more vulnerable to network issues, as the number of network connections in an N-server DAG is N*(N-1)/2 (a 16-node DAG needs 120 P2P connections!)
    Intermittent network issues can cause databases to become misbalanced and DAG nodes to be evicted; see this article for details: http://aka.ms/partitioned-cluster-networks
    More impact on cluster writes and increased failure zone
  Scalability planning due to growth also impacts the decision
    Adding just a few servers to the existing DAG is hard as it requires database copy layout changes
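
The two DAG sizing formulas, X/(N-1) for redistributed load and N*(N-1)/2 for point-to-point connections, are easy to tabulate. The following is a minimal sketch with hypothetical function names; it assumes a perfectly symmetric database copy layout, so the failed server's load spreads evenly over the survivors. Note that for N = 16 the connection formula evaluates to 120.

```python
def extra_active_mailboxes_per_survivor(failed_server_mailboxes: float, dag_nodes: int) -> float:
    """Additional active mailboxes each surviving node takes on: X / (N - 1)."""
    if dag_nodes < 2:
        raise ValueError("a DAG needs at least two nodes to redistribute load")
    return failed_server_mailboxes / (dag_nodes - 1)


def point_to_point_connections(dag_nodes: int) -> int:
    """Number of node pairs in a full mesh of N nodes: N * (N - 1) / 2."""
    return dag_nodes * (dag_nodes - 1) // 2


# Illustrative comparison; the 1500 mailboxes per server is a made-up figure.
for nodes in (4, 8, 16):
    extra = extra_active_mailboxes_per_survivor(1500, nodes)
    links = point_to_point_connections(nodes)
    print(f"{nodes:2d} nodes: +{extra:.0f} mailboxes per survivor, {links} node pairs")
```

The table makes the trade-off on the slide concrete: the per-survivor load increase shrinks as the DAG grows, while the number of node pairs that must stay healthy grows quadratically.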
Database Copy Layout Principles
Goal: Provide symmetric database copy layout to ensure even load distribution
http://blogs.technet.com/b/exchange/archive/2010/09/10/3410995.aspx
  [Figure: symmetric database copy layout, showing load redistribution after Server3 failure and Server6 failure]

Homeless DAG
  New capability in Exchange 2013: DAG without a Cluster Administrative Access Point (a.k.a. IP-less DAG)
    http://blogs.technet.com/b/scottschnoll/archive/2014/02/25/database-availability-groups-and-windows-server-2012-r2.aspx
    http://blogs.technet.com/b/timmcmic/archive/2015/04/29/my-exchange-2013-dag-has-gone-commando.aspx
  Recommended and preferred model, default in Exchange 2016
  Advantages: reduced complexity
    No need to deal with cluster name object (CNO) computer account and permissions
    No need to reserve and manage DAG IP addresses
    Single cluster resource left (File Share Witness)
  Disadvantages:
    Cannot use Failover Cluster Manager; must use PowerShell cluster commands
    Might present issues to some 3rd party applications that use cluster name (e.g. backups); move away from those
  Useful PowerShell cmdlets:
    Get-Cluster -Name DAG01 | select *
    Get-ClusterNode -Cluster DAG01 [-Name SVR01] | select *
    Get-ClusterNetwork -Cluster DAG01 [-Name DAGNetwork01] | select *
    Get-ClusterQuorum -Cluster DAG01 | fl
    Get-ClusterGroup -Cluster DAG01
    Move-ClusterGroup -Cluster DAG01 -Name "Cluster Group" -Node SVR01
    Get-ClusterLog -Cluster DAG01

Network: HA/SR and Replication Network
  High Availability (HA) is redundancy of solution components within a datacenter
  Site Resilience (SR) is redundancy across datacenters providing a DR solution
  Both HA and SR are based on native Exchange data replication
    Each database exists in multiple copies, one of them active
    Data is shipped to passive copies via transaction log replication over the network
    It is possible to use a dedicated isolated network for Exchange data replication
  Network requirements for replication:
    Each active-to-passive database replication stream generates X bandwidth
    The more database copies, the more bandwidth is required
    Exchange natively encrypts and compresses replication traffic
  Pros and cons for dedicated replication network => Not recommended
    Replication network can help isolate client traffic from replication traffic
    Replication network must be truly isolated along the entire data transfer path: having separate NICs but sharing the network path after the first switch is meaningless
    Replication network requires configuring static routes and eliminating cross talk; this leads to extra complexity and increases risk of human error
    If server NICs are 10Gbps capable, it’s easier to have a single network for everything
    No need for network teaming: think of a NIC as JBOD
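
The replication bandwidth rule on the slide (X per active-to-passive stream, growing with the number of copies) can be sketched as simple multiplication. This is a rough model under stated assumptions, not a sizing method: the function name and the sample numbers are hypothetical, every database is assumed to generate the same log traffic, and seeding, reseeding, and compression effects are ignored.

```python
def replication_bandwidth_mbps(active_databases: int,
                               copies_per_database: int,
                               per_stream_mbps: float) -> float:
    """Aggregate log shipping bandwidth: one stream per passive copy of each database."""
    if copies_per_database < 1:
        raise ValueError("each database needs at least its active copy")
    passive_copies = copies_per_database - 1
    return active_databases * passive_copies * per_stream_mbps


# Hypothetical inputs: 100 databases, X = 0.5 Mbps per stream.
for copies in (2, 3, 4):
    total = replication_bandwidth_mbps(100, copies, 0.5)
    print(f"{copies} copies: {total:.0f} Mbps of replication traffic")
```

To estimate cross-site traffic only, the same function can be called with the number of remote copies plus one in place of `copies_per_database`.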

Virtualization vs. Role Consolidation
  Introduces an additional critical solution component and associated performance and maintenance overhead
  Reduces availability and introduces extra complexity
  Could make sense for small deployments, helping consolidate workloads, but this introduces shared infrastructure
  Role consolidation has been the guidance since Exchange 2010, and now there is only a single role in Exchange 2016!
  Deploying multiple Exchange servers on the same host would create a failure domain
  Hypervisor powered high availability is not needed with proper Exchange DAG designs
  No real benefits from virtualization, as Exchange provides equivalent benefits natively at the application level

Storage Design Options / Challenges
  Many designs are supported; there are three storage design dimensions
  SAN vs. DAS
    SAN is NOT faster than DAS
    Reduce complexity
    No need for expensive redundant high performing intermediate SAN components
    SAN concept follows shared infrastructure model, not building block
  RAID vs. JBOD (RBOD)
    No need for disk redundancy: data redundancy is moved to application level
    Think of Ex2013 servers as software RAID
    RAID is supported but doubles disk count (assuming RAID-10) and cost
    Enable controller caching: 75/25 write/read
  FC vs. SAS vs. SATA
    Need large disks to provide large mailboxes
    In Ex2013 IOPS requirements reduced ~93% from Ex2003!
    Typical Ex2013 database requires ~10 IOPS
    7200 rpm LFF (3.5”) SATA/NL-SAS disk provides ~60 IOPS
    15K rpm SFF (2.5”) SAS/FC disk provides ~230 IOPS
    No need for fast but small and expensive high performing disks

RAID vs. JBOD with Native Replication
  Conceptually similar replication: goal is to introduce redundant copy of the data
  Software, not hardware powered
  Application aware replication
  Enables each server and associated storage as independent isolated building block
  Exchange 2013 is capable of automatic reseed using hot spare (no manual actions besides replacing the failed disk!)
  Finally, cost factor: RAID1/0 requires 2x storage (you still want 4 database copies for Exchange availability)!
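
The cost point about RAID1/0 can be illustrated by counting database disks across all copies. This is a small sketch with a hypothetical function name; the figure of 9 database disks per copy is borrowed from the typical disk layout described elsewhere in the deck and is used purely as an example.

```python
def database_disks_total(disks_per_copy: int, database_copies: int, raid10: bool) -> int:
    """Total database disks across all copies; RAID1/0 mirrors every disk."""
    mirror_factor = 2 if raid10 else 1
    return disks_per_copy * mirror_factor * database_copies


jbod = database_disks_total(9, 4, raid10=False)
raid = database_disks_total(9, 4, raid10=True)
print(f"JBOD:    {jbod} disks for 4 database copies")
print(f"RAID1/0: {raid} disks for 4 database copies")
```

Because the four database copies are kept in either case, the mirroring simply doubles the disk count without replacing any of the application-level redundancy.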

Thick Risks of Thin Provisioning
  Exchange mailboxes will grow but they don’t consume that much on Day 1
  The desire not to pay for full storage capacity upfront is understood
  However, inability to provision more storage and extend capacity quickly when needed is a big risk
  Successful thin provisioning requires significant operational maturity and process excellence unseen in the wild
  Microsoft guidance and best practice is to use thick provisioning with low cost storage
  Incremental provisioning model can be considered a reasonable compromise
  [Figure: Thick Provisioning vs. Thin Provisioning vs. Incremental Provisioning]

Native vs. Low Level Data Replication
  Exchange continuous replication is a native transactional replication (based on transaction data shipping)
  Database itself is not replicated (transaction logs are played back to target database copy)
  Each transaction is checked for consistency and integrity before replay (hence physical corruption cannot propagate)
  Page patching is automatically activated for corrupted pages
  Replication data stream can be natively encrypted and compressed (both settings configurable, default is cross site only)
  In case of data loss Exchange automatically reseeds or resynchronizes the database (depending on type of loss)
  If hot spare disk is configured, Exchange automatically uses it for reseeding (like RAID rebuild does)

Typical Exchange Disk Layout
  Two mirrored (RAID1) disks for system partition (OS; Exchange binaries, transport queues, logs)
  One hot spare disk
  Nine or more RBOD disks (single disk RAID-0) for Exchange databases with collocated transaction logs
  Four database copies collocated per disk, not to exceed 2TB database size

Need more? Layout for more storage
  There are servers that can house more than 12 LFF disks (up to 16 with rear bay)
  There are already DAS enclosures available that provide 720 TB capacity in a single 4U unit (90 x 8TB drives)!
  Scalability limits: still 100 database copies / server
  This means no more than 25 drives @ 4 databases/disk or 50 drives @ 2 databases/disk
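
The drive limits quoted above follow directly from the 100 database copies per server ceiling. The sketch below reproduces that arithmetic; the function names are hypothetical, and the limit is the one stated on the slide.

```python
MAX_DATABASE_COPIES_PER_SERVER = 100  # scalability limit quoted on the slide


def max_database_drives(databases_per_disk: int,
                        copy_limit: int = MAX_DATABASE_COPIES_PER_SERVER) -> int:
    """Largest number of database drives a server can use before hitting the copy limit."""
    if databases_per_disk < 1:
        raise ValueError("at least one database per disk is required")
    return copy_limit // databases_per_disk


def enclosure_capacity_tb(drive_count: int, drive_size_tb: int) -> int:
    """Raw capacity of an enclosure in TB."""
    return drive_count * drive_size_tb


print(max_database_drives(4))        # drives usable at 4 databases per disk
print(max_database_drives(2))        # drives usable at 2 databases per disk
print(enclosure_capacity_tb(90, 8))  # raw TB in the 4U enclosure example
```

The comparison shows why a 90-drive enclosure cannot be fully used by a single server at these database densities: the copy limit is reached long before the drive bays run out.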

Backup and Recovery
  Challenge 1: Backup or Extra copy?
    Exchange Native Data Protection: 3+ highly available database copies
    Lagged copy to protect from unlikely scenarios (logical corruption, admin error)
    Replay Lag Manager with automatic copy conversion
    What will you do with your tapes?
  Challenge 2: How do I recover data without backup?
    Self Service Recovery to restore items from Recoverable Items - Deletions
    Single Item Recovery to protect items in Recoverable Items and restore them administratively via mailbox search
    Large mailboxes can accommodate large “dumpster”: no need for archive mailbox or 3rd party archiving

Archiving: Retention vs. Compliance
  Where is my archive?
    10+ years ago archiving enabled offloading Exchange data to lower cost storage
    With large mailboxes on commodity storage it does not make sense anymore
    Single data repository is better than multiple solutions and lots of PST files
  What do you need archiving for?
    There’s still the in-place archive, a.k.a. online archive, which is just a second mailbox available in online mode
    Only needed based on client performance considerations or when you cannot extend mailbox capacity
    Outlook 2013+ has the magic slider = no impact from large OST file
    1K / 20K / 100K / 1M items in critical path folders (client has impact too): are you stubbing?
  Retention
    Retention is to help users delete their data when it is no longer needed
    Retention tags and retention policies
  Compliance
    Compliance is to prevent users from deleting sensitive data that might be requested by legal
    Litigation Hold, In-Place Hold to protect and preserve data at rest (in the mailbox)
    Data Loss Prevention (DLP) to protect data in transit
    Unified Compliance Center to combine Exchange/Lync and SharePoint together for eDiscovery

Client Breakdown
  In today’s world, users access Exchange mailboxes from many clients and devices
    Cumulative client concurrency can be over 100%
  Penalty factors are measured in units of load created by a single Outlook client
    Some clients generate more server load than a baseline Outlook client
    Penalty factor should be calculated as weighted average across all types of clients
  Caveats of this model:
    Individual penalty factors are provided for illustration purpose only and should be adjusted based on internal test results, client configurations, vendor guidance and other relevant factors
    Penalty factor for BES 5 is based on performance benchmarking guide published by Blackberry at http://aka.ms/bes5performance
    Penalty factor for Good is based on data published on Good Portal at http://aka.ms/goodperformance
    Continue to monitor system utilization closely and adjust sizing model as necessary as you scale out
  Sample client breakdown calculation (for illustration only)
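
One way to read "weighted average across all types of clients" is to weight each client's penalty factor by its share of concurrent usage. The sketch below takes that reading as an assumption. The function name is hypothetical, and every client name, concurrency share, and penalty factor in the sample is made up for illustration; none of them come from the deck or from vendor guidance.

```python
def weighted_penalty_factor(client_mix: dict) -> float:
    """Weighted average penalty factor.

    client_mix maps a client name to (concurrency_share, penalty_factor),
    where the baseline Outlook client has penalty factor 1.0.
    Shares may sum to more than 1.0, since users run several clients at once.
    """
    total_share = sum(share for share, _ in client_mix.values())
    if total_share <= 0:
        raise ValueError("total concurrency share must be positive")
    weighted_sum = sum(share * factor for share, factor in client_mix.values())
    return weighted_sum / total_share


# Hypothetical client mix: cumulative concurrency is 160%.
sample_mix = {
    "Outlook (baseline)": (0.8, 1.0),
    "Mobile device":      (0.6, 2.0),
    "Web client":         (0.2, 1.5),
}
print(f"Weighted penalty factor: {weighted_penalty_factor(sample_mix):.2f}")
```

The resulting factor would then scale the baseline per-mailbox load in a sizing model, and should be revisited as real utilization data becomes available.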

Exchange 2013 PLA Conceptual Design
  Four or more physical servers (collocated roles) in each DAG, split symmetrically between two datacenter sites
  Four database copies (one lagged, with Replay Lag Manager) for HA and SR on DAS storage with JBOD; minimized failure domains
  Unbound Service Site model with single unified load balanced namespace and Witness in the 3rd datacenter

Main takeaways
  Keep your design simple!
  Follow building block architecture principles
  Ensure sufficient availability
  Do your Exchange design right or go to Office 365!

Sessions to attend
  BRK3197 - Exchange Server Preferred Architecture
  BRK3129 - Deploying Exchange Server 2016
  BRK3178 - Exchange on IaaS: Concerns, Tradeoffs, and Best Practices
  BRK3173 - Experts Unplugged: Exchange Server Deployment and Architecture
  BRK2189 - Desktop Outlook: Evolved and Redefined
  BRK3102 - Experts Unplugged: Exchange Server High Availability and Site Resilience
  BRK3125 - High Availability and Site Resilience: Learning from the Cloud and Field
  BRK3147 - Meeting Complex Security Requirements for Publishing Exchange
  BRK3160 - Mail Flow and Transport Deep Dive
  BRK3163 - Making Managed Availability Easier to Monitor and Troubleshoot
  BRK3180 - Tools and Techniques for Exchange Performance Troubleshooting
  BRK3186 - Behind the Curtain: Running Exchange Online
  BRK3206 - Exchange Storage for Insiders: It’s ESE
  BRK4105 - Under the hood with DAGs
  BRK4115 - Advanced Exchange Hybrid Topologies

Pre-Release Programs
  Be first in line!
  Exchange & SharePoint On-Premises Programs
  Customers get:
    Early access to new features
    Opportunity to shape features
    Close relationship with the product teams
    Opportunity to provide feedback
    Technical conference calls with members of the product teams
    Opportunity to review and comment on documentation
  Get selected to be in a program:
    Sign up at Ignite at the Preview Program desk
    OR
    Fill out a nomination: http://aka.ms/joinoffice
  Questions:
    Visit the Preview Program desk in the Expo Hall
    Contact us at: ignite2015taps@microsoft.com

Evaluation
  Please complete session evaluation!
  Contact information
    E-mail: borisl@microsoft.com
    Profile: https://www.linkedin.com/in/borisl
    Social: https://www.facebook.com/lokhvitsky

Please evaluate this session
  Your feedback is important to us!
  Visit Myignite at http://myignite.microsoft.com or download and use the Ignite Mobile App with the QR code above.

Thank You!