/
Incremental ETL Using CDC for SQL and Oracle with SQL Serve Incremental ETL Using CDC for SQL and Oracle with SQL Serve

Incremental ETL Using CDC for SQL and Oracle with SQL Serve - PowerPoint Presentation

min-jolicoeur
min-jolicoeur . @min-jolicoeur
Follow
434 views
Uploaded On 2017-05-24

Incremental ETL Using CDC for SQL and Oracle with SQL Serve - PPT Presentation

Matt Masson Senior Program Manager Microsoft Corporation DBI322 Change Data Capture Your First Data Warehouse Challenges More work to do Less time to do it More people using the system Mo Data Mo Problems ID: 551734

data cdc processing microsoft cdc data microsoft processing oracle update sql change flow source server mode load net control

Share:

Link:

Embed:

Download Presentation from below link

Download Presentation The PPT/PDF document "Incremental ETL Using CDC for SQL and Or..." is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.


Presentation Transcript

Slide1

Incremental ETL Using CDC for SQL and Oracle with SQL Server Integration Services (SSIS) 2012

Matt MassonSenior Program ManagerMicrosoft Corporation

DBI322Slide2

Change Data CaptureSlide3

Your First Data WarehouseSlide4

Challenges

More work to do

Less time to do it

More people

using the systemSlide5

Mo’ Data, Mo’ Problems

Processing time increases with data volumeScales linearly (assuming no bottlenecks)

Do Less Work!Slide6

Change Data Capture - How It Works

Enable it on the Source DatabaseInserts, Updates, and Deletes are automatically trackedConsume the changesSlide7

Demo

CDC in SQL Server 2012Slide8

SQL Server CDC – Summary

Enable it on the source systemAutomatic capture of changes

Configuration optionsUses transactional replicationSQL Server 2008 and later

-- enable CDC on the database

exec

sys

.

sp_cdc_enable_db

--

enable CDC on a table

exec

sys.sp_cdc_enable_table @

source_schema = N'dbo',

@

source_name = N'MyTable', @

role_name = N'cdc_admin

',

@supports_net_changes = 1 Slide9

Using Change Data CaptureSlide10

CDC Components for SSIS 2012

Use before and after data flows to handle CDC state information

Read change data from source system

Uses CDC state information

Splits rows based on operation

Smart Conditional SplitSlide11

Workflow

Initial Load

Incremental LoadSlide12

Initial Load

Scenario 1: Active database

Mark Initial Load StartBulk load from source to destinationMark Initial Load EndScenario 2: Inactive database or using snapshot/flashbackMark CDC startOracle – provide the System Change Number (SCN

)

SQL – provide the Log Sequence Number (

LSN

) or snapshot name

CDC Control Task will use current value if none is provided

Bulk load data from source to destinationSlide13

Incremental Load

Use CDC Control Task to get the processing rangeRead from cdc_states

tableStored in package variableUse CDC Source to read changes since last runProcess change rowsUse CDC Control Task to mark processing endSlide14

Using the Reprocessing Indicator

Option in the CDC Source to include a reprocessing flagAdds __$reprocessing column to data flow

Flag is set to True for rows that require special handlingRows that overlap with initial loadReprocessing a range after a failureSlide15

Demo

CDC Components for SSISSlide16

CDC Components - Summary

CDC Control TaskRetrieve and persist stateUse before and after your data flow

CDC SourceReads change data from source tableCDC SplitterSeparate rows by operation typeSlide17

Processing Modes

AllAll with Old ValuesNetNet with Update Mask

Net with MergeSlide18

Which Processing Mode Should I Use?

All vs. Net

All changesDo I need to record every change? Do I enjoy complicated ETL work?Net changesDo I want to push the work to the source system?Update Mask / Old ValuesDo I need to do something when specific columns change?Slide19

Processing Mode - All

Operation

KeyNameInsert50

Smith

Update

50

Smyth

Insert

60

Jones

Delete

60

Jones

Update

10WilliamsDelete20

MartinSlide20

Processing Mode - All

Control Flow

Data FlowSlide21

Processing Mode – All with Old Values

Operation

__$update_maskKeyName

Insert

0xFF

50

Smith

Update Old

0x80

50

Smith

Update

0x80

50Smyth

Insert0xFF

60Jones

Delete0xFF60Jones

Update Old

0x80

10WilliamUpdate0x80

10WilliamsDelete0xFF20MartinSlide22

Processing Mode – All with Old Values

Control Flow

Data FlowSlide23

Processing Mode – Net

Operation

KeyNameInsert

50

Smyth

Update

10

Williams

Delete

20

MartinSlide24

Processing Mode – Net

Control Flow

Data FlowSlide25

Processing Mode – Net with Update Mask

Operation

Key__$Key_ChangedName

__$

Name_Changed

Insert

50

True

Smyth

True

Update

10

False

WilliamsTrueDelete

20TrueMartin

TrueSlide26

Processing Mode – Net with Update Mask

Control Flow

Data FlowSlide27

Processing Mode – Net with Merge

Operation

KeyNameUpdate

50

Smyth

Update

10

Williams

Delete

20

MartinSlide28

Processing Mode – Net with Merge

Control Flow

Data FlowSlide29

CDC For OracleSlide30

How it Works

Oracle

Database

LogMiner

CDC Service

SQL Server

Mirror Tables

CDC Tables

CDC DesignerSlide31

Setup

Oracle CDC components are not installed by defaultMSIs found on SQL media under: \Tools\AttunityCDCOracle

Oracle CDC ServiceAttunityOracleCdcService.msiOracle CDC DesignerAttunityOracleCdcDesigner.msiSlide32

Oracle CDC Service Configuration

Name your serviceProvide a service accountSQL instance information

CDC master passwordOne service per Oracle DBSlide33

Oracle CDC Designer

Allows you to create and manage Oracle CDC InstancesYou can have multiple instances per CDC ServiceAn instance contains

Oracle database connection informationTables and columns being trackedMirroring SQL Server instance informationSlide34

Oracle CDC Designer – Creating an InstanceSlide35

Oracle CDC Designer – Managing an Instance

List all of your services and CDC capture instancesControl the instance state (start, stop, reset)Regenerate setup scriptsSlide36

FAQSlide37

Supported Versions

SQL Server (Enterprise Edition)2008, 2008 R2 and 2012Oracle

10g Release 210.2.0.1—10.2.0.5 (patch set as of April 2010)11g Release 111.1.0.6—11.1.0.7 (patch set as of September 2008)11g Release 211.2.0.1—11.2.0.2 (patch set as of November 2010)Slide38

Common Questions

Do I need any special permissions?Getting current LSN during initial load requires db_owner

Call made to sys.sp_replincrementlsn Will enabling CDC impact performance?Similar overhead to transactional replicationAsynchronous processing reduces overall impactIncreases data sizePrevents minimally logged operationsLess than

10% overhead

with normal workload (and no IO issues)Slide39

Books Online Resources

CDC Flow Componentshttp://msdn.microsoft.com/en-us/library/hh231087.aspx

Tuning the Performance of Change Data Capturehttp://msdn.microsoft.com/en-us/library/dd266396.aspx Comparing Change Data Capture and Change Trackinghttp://msdn.microsoft.com/en-us/library/cc280519.aspx

Slide40

Related Content

Breakout

Sessions

Hands-on

Labs

Related Certification Exam

Find Me Later At…

Required Slide

*delete this box when your slide is finalized

Speakers,

please list the Breakout Sessions,

Labs

, Demo Stations and Certification

Exams

that relate to your session. Also indicate when they can find you staffing in the TLC

.

Breakthrough Insights: Credible, Consistent Data

Exam 70-463: Implementing a Data Warehouse with Microsoft SQL Server 2012

Breakthrough Insights: Credible, Consistent Data

Product Demo

Stations

DBI310: EIM: Bringing Together SSIS, MDS and DQS

DBI329: Upgrading SSIS Packages to SQL Server 2012

DBI207: BI Power HourDBI24-HOL: Exploring Microsoft SQL Server Integration ServicesSlide41

Contact

Matt Masson

mmasson@microsoft.com

m

attmasson.com

@

mattmassonSlide42

Track Resources

@sqlserver

@ms_teched

m

v

a

Microsoft Virtual Academy

SQL Server 2012 Eval Copy

Get Certified!

Hands-On LabsSlide43

Resources

Connect. Share. Discuss.

http://northamerica.msteched.com

Learning

Microsoft Certification & Training Resources

www.microsoft.com/learning

TechNet

Resources for IT Professionals

http://microsoft.com/technet

Resources for Developers

http://microsoft.com/msdn Slide44

Required Slide

Complete an evaluation on CommNet and enter to win!Slide45

MS Tag

Scan the Tag

to evaluate this

session now on

myTechEd

MobileSlide46

©

2012 Microsoft

Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the

part

of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.

MICROSOFT

MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.Slide47