
Oracle - PowerPoint Presentation

Uploaded On 2016-12-09

Presentation Transcript

Slide1

Oracle Data Guard at CERN

Emil Pilecki

Credit: Luca Canali, Marcin Blaszczyk, Steffen Pade

Slide2

Agenda

About CERN

Oracle and Data Guard at CERN

DG perks and benefits

Zero data loss over long distances (far sync)

Far sync testing results

Slide3

About CERN

European Organization for Nuclear Research founded in 1954

21 member states, 2 candidates, 6 observers + UNESCO and EU

60 non-member states collaborate with CERN

2500 staff members and 10 000 scientists

Slide4

LHC and Experiments

Large Hadron Collider (LHC) – a particle accelerator that collides beams at very high energy

27 km long circular tunnel

Located ~100m underground

Protons travel at 99.9999972% the speed of light

Collisions are analysed using special detectors and software in the experiments dedicated to the LHC

New particle discovered!

Consistent with the Higgs boson

Announced on July 4th 2012

Slide5

Oracle at CERN

Since 1982, version 2.3

Oracle DBs play a key role in the LHC production chains

Accelerator logging and monitoring systems

Online acquisition, offline data (re)processing, data distribution, analysis

Grid infrastructure and operation services

Monitoring, dashboards, etc.

Data management services

File catalogues, file transfers, etc.

Metadata and transaction processing for tape storage system

Administrative services

Slide6

CERN’s Databases

Over 100 Oracle databases, mostly RAC

NAS storage plus some SAN with ASM

~400 TB of data files for production DBs

Examples of CERN’s critical DBs:

LHC logging database ~170 TB, expected growth up to 70 TB / year

13 Production experiments’ databases ~140 TB in total

15 production systems protected with Data Guard

Active Data Guard since 11g

Slide7

Our Data Guard architecture

[Diagram: two configurations, redo transport in maximum performance mode]

1. Low load ADG: Primary Database with one Active Data Guard standby for read only workloads and disaster recovery

2. Busy & critical ADG: Primary Database with one Active Data Guard standby for disaster recovery and one Active Data Guard standby for read only workloads

LOG_ARCHIVE_DEST_X='SERVICE=<tns_alias> OPTIONAL ASYNC NOAFFIRM VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=<standby_db_unique_name>'

Slide8

(Active) Data Guard benefits

Features and functionalities we profit from:

Data protection for disaster recovery

Replication and offloading of read only workload

Database backups from standby

Safeguard against logical data corruptions with flashback

Snapshot standby for testing

Fast upgrades and hardware migrations

Detection of lost writes

Automatic block media recovery

Slide9

Disaster recovery

We have been using it for a few years

Switchover/failover is our first line of defence

Has already saved the day for production services

Current disaster recovery site is 10 km away from our main datacentre

Remote site in Hungary to be used soon

Over 1000 km distance

Network latency of 25 ms is a challenge

Plan to move most of the standby databases there within 1 year

Slide10

Offloading production databases

Efficient replication of the whole database

Workload distribution

Transactional workload runs on primary

Read-only workload can be moved to ADG

Read-mostly workload: DMLs can be redirected to primary with a dblink
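A minimal sketch of the read-mostly pattern, not taken from the slides. The link name `primary_rw`, the user `app_user`, the TNS alias `PRIMARY_RW_SVC` and the table `app_audit` are all hypothetical. The link is created on the primary because the standby is open read-only; its definition reaches the standby through redo apply.

```sql
-- Run on the primary (hypothetical names throughout).
-- The link points back at the primary's read-write service.
CREATE DATABASE LINK primary_rw
  CONNECT TO app_user IDENTIFIED BY "change_me"
  USING 'PRIMARY_RW_SVC';

-- Run in a session connected to the ADG standby.
-- The query is served locally by the standby ...
SELECT COUNT(*) FROM app_audit;

-- ... while the occasional write travels over the link to the primary.
INSERT INTO app_audit@primary_rw (event_time, message)
VALUES (SYSTIMESTAMP, 'report generated');
COMMIT;
```

A row written this way becomes visible on the standby only once the corresponding redo has been shipped and applied, so the application has to tolerate that short delay.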

Database backups from standby

Significantly reduces load on primary

Removes sequential I/O of full backup

ADG allows usage of block change tracking for fast incremental backups

Slide11

Flashback and snapshot standby

Flashback enabled on standby only

Recover from human errors and data corruptions

Avoid impacting primary database with flashback logs generation
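One possible way to switch on flashback logging on the standby only is sketched below. It is an illustration rather than CERN's documented procedure, and it assumes that a fast recovery area (`db_recovery_file_dest`) is already configured on the standby.

```sql
-- Run on the physical standby.

-- Redo apply is stopped before flashback logging is switched on.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

ALTER DATABASE FLASHBACK ON;

-- Restart redo apply in the background.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;

-- FLASHBACK_ON shows whether flashback logging is active on this database.
SELECT flashback_on FROM v$database;
```

Because the primary is left without flashback logging, it does not pay the extra I/O for flashback logs, which is the point made on the slide.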

Snapshot standby

Testing changes before implementing them on primary

Safe – redo is still sent to standby

Very easy to use

SQL> ALTER DATABASE CONVERT TO SNAPSHOT STANDBY;

SQL> ALTER DATABASE CONVERT TO PHYSICAL STANDBY;

Slide12

Fast upgrades and migrations

[Diagram: upgrade in six steps]

Primary RAC database on Clusterware 11g + RDBMS 11g, with RW access

Data Guard RAC database on Clusterware 12c + RDBMS 11g, fed by redo transport from the primary

RW access moves to the Data Guard RAC database

RDBMS upgrade to Clusterware 12c + RDBMS 12c (database downtime)

Upgrade complete!

Slide13

Fast upgrades and migrations

Risk mitigation

Fresh installation of the new clusterware

Old system stays untouched

Allows full upgrade test

Allows stress testing of new system

Downtime reduction

~ 1h for RDBMS upgrade

Additional hardware is required, unless a migration to new hardware is planned anyway

Slide14

Lost write detection and ABMR

Slave exiting with ORA-752 exception
Errors in file /ORA/dbs0a/PDBR_RAC50/diag/rdbms/pdbr_rac50/PDBR1/trace/PDBR1_pr0l_92600.trc:
ORA-00752: recovery detected a lost write of a data block
ORA-10567: Redo is inconsistent with data block (file# 67, block# 57976209, file offset is 2494701568 bytes)
ORA-10564: tablespace STRMMON
ORA-01110: data file 67: '/ORA/dbs03/PDBR_RAC50/datafile/STRMMON_67.dbf'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 435213427
Mon Apr 14 06:52:02 2014
Recovery Slave PR0L previously exited with exception 752

Stops redo application when a lost write is detected

Previous consistent block version still on standby

Helps to diagnose and repair the error

Automatic Block Media Recovery with ADG

Fixes physical block corruptions

Works both ways: Primary <-> ADG

Slide15

Zero data loss replication

Use synchronous redo transport method

Commits of DML statements are slowed down, because they wait for an acknowledgement from the standby

LOG_ARCHIVE_DEST_X='SERVICE=<tns_alias> OPTIONAL SYNC AFFIRM VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=<standby_db_unique_name>'
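As a rough illustration, the destination above could be set on a running primary as follows. The destination number `2`, the TNS alias `stby_tns` and the unique name `stby` are placeholders. The slides do not say which protection mode was used; the sketch assumes maximum availability, one of the two modes that provide zero data loss.

```sql
-- Run on the primary. stby_tns and stby are hypothetical names.
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=stby_tns OPTIONAL SYNC AFFIRM VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=stby'
  SCOPE=BOTH;

-- Assumption: maximum availability rather than maximum protection.
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY;

-- Check the configured mode and the level currently achieved.
SELECT protection_mode, protection_level FROM v$database;
```

With maximum availability the primary keeps running if the standby becomes unreachable, at the cost of temporarily losing the zero data loss guarantee.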

[Diagram: Primary Database sends redo to the Data Guard Standby (redo transport); the standby returns a commit acknowledgement]

Network latency matters!!!

Slide16

Far Sync concepts

Long distances = high network latency = slow commit acknowledgement with SYNC redo transport
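On the primary, the time sessions spend waiting for a commit to complete is recorded under the `log file sync` wait event, and with SYNC redo transport that wait includes the round trip to the synchronous destination. A small query such as the following sketch gives the average wait since instance startup; the column alias is chosen here for illustration.

```sql
-- Run on the primary: average commit wait since instance startup.
SELECT event,
       total_waits,
       ROUND(time_waited_micro / NULLIF(total_waits, 0) / 1000, 2) AS avg_wait_ms
FROM   v$system_event
WHERE  event = 'log file sync';
```

Comparing this figure before and after a change of redo transport settings shows how much of the commit time is due to the network.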

[Diagram: redo transport paths labelled sync and async, 25 ms network latency]

Slide17

Far Sync testing at CERN

Functional

Does it work? Are there any bugs?

Performance

Simulated heavy DML workload with and without Far Sync

Oracle Real Application Testing – workload captured from production databases

[Diagram: redo transport through a Far Sync instance, 25 ms network latency]

Slide18

Far Sync testing results

Functional tests

It works well!!! but…

3-7013523981: FRA not cleaned up automatically on FAR SYNC instance

3-7023772221: Failover to alternate destination does not work with FAR SYNC

Both bugs still present in 12.1.0.1 production

Some configuration issues with Data Guard Broker

[Diagram: redo transport through a Far Sync instance, 25 ms network latency]

Slide19

Far Sync testing results

Performance tests with simulated heavy DML workload

256 parallel sessions inserting data in 500-row batches, 50 batches per session. The target table is partitioned and indexed: 4 local b-tree indexes, 6 local bitmap indexes, and a global primary key index with reversed keys.

Each session inserts data into its own partition.

Slide20

Far Sync testing results

Performance tests with Oracle Real Application Testing framework

Real production workload captured per schema

Workload replay with and without Far Sync, at 25 ms latency

Replay parameters:

connect_time_scale=0

think_time_scale=0
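These two parameters are arguments of `DBMS_WORKLOAD_REPLAY.PREPARE_REPLAY`. A minimal sketch of how they might be passed is shown below; the replay name `farsync_test` and the directory object `RAT_DIR` are hypothetical, and the captured workload is assumed to have been preprocessed already.

```sql
-- Run on the replay (test) database. Names are hypothetical.
BEGIN
  DBMS_WORKLOAD_REPLAY.INITIALIZE_REPLAY(
    replay_name => 'farsync_test',
    replay_dir  => 'RAT_DIR');

  -- A scale of 0 drops the captured delays before session connects and
  -- between successive calls, so the workload is replayed back to back.
  DBMS_WORKLOAD_REPLAY.PREPARE_REPLAY(
    connect_time_scale => 0,
    think_time_scale   => 0);
END;
/
```

After this step the replay clients are started and the replay itself is launched with `DBMS_WORKLOAD_REPLAY.START_REPLAY`. Removing the idle time makes the replay a stress test rather than a reproduction of the original pacing.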

CMSR – DML mostly workload

LCGR – read only workload

Slide21

Far Sync summary

Very promising for long distance replication if data loss is not acceptable

Up to 60% performance gain (DML only workloads) with 25ms network latency

Lightweight and easy to deploy (virtual machine)

If latency <5ms most likely you don’t need Far Sync

There are still bugs that need fixing
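To give an idea of what deploying a far sync instance involves, here is a sketch of the core statements in Oracle 12c. It is not the configuration used in the tests. The unique names `prim`, `farsync` and `stby`, the control file path and the destination number are placeholders, and steps such as copying the control file, the password file and creating standby redo logs are left out.

```sql
-- On the primary: create a control file for the far sync instance.
ALTER DATABASE CREATE FAR SYNC INSTANCE CONTROLFILE AS '/tmp/farsync.ctl';

-- On the primary: ship redo synchronously to the nearby far sync instance.
ALTER SYSTEM SET log_archive_config = 'DG_CONFIG=(prim,farsync,stby)' SCOPE=BOTH;
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=farsync SYNC AFFIRM VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=farsync'
  SCOPE=BOTH;

-- On the far sync instance: forward redo asynchronously to the remote standby.
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=stby ASYNC VALID_FOR=(STANDBY_LOGFILES,STANDBY_ROLE) DB_UNIQUE_NAME=stby'
  SCOPE=BOTH;
```

The far sync instance holds only a control file and standby redo logs, no data files, which is why a small virtual machine can be enough to host it.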

[Diagram: redo transport through a Far Sync instance, 25 ms network latency]

Slide22

Discussion