Matt Masson Senior Program Manager Microsoft Corporation DBI322 Change Data Capture Your First Data Warehouse Challenges More work to do Less time to do it More people using the system Mo Data Mo Problems ID: 551734
Download Presentation The PPT/PDF document "Incremental ETL Using CDC for SQL and Or..." is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.
Slide1
Incremental ETL Using CDC for SQL and Oracle with SQL Server Integration Services (SSIS) 2012
Matt MassonSenior Program ManagerMicrosoft Corporation
DBI322Slide2
Change Data CaptureSlide3
Your First Data WarehouseSlide4
Challenges
More work to do
Less time to do it
More people
using the systemSlide5
Mo’ Data, Mo’ Problems
Processing time increases with data volumeScales linearly (assuming no bottlenecks)
Do Less Work!Slide6
Change Data Capture - How It Works
Enable it on the Source DatabaseInserts, Updates, and Deletes are automatically trackedConsume the changesSlide7
Demo
CDC in SQL Server 2012Slide8
SQL Server CDC – Summary
Enable it on the source systemAutomatic capture of changes
Configuration optionsUses transactional replicationSQL Server 2008 and later
-- enable CDC on the database
exec
sys
.
sp_cdc_enable_db
--
enable CDC on a table
exec
sys.sp_cdc_enable_table @
source_schema = N'dbo',
@
source_name = N'MyTable', @
role_name = N'cdc_admin
',
@supports_net_changes = 1 Slide9
Using Change Data CaptureSlide10
CDC Components for SSIS 2012
Use before and after data flows to handle CDC state information
Read change data from source system
Uses CDC state information
Splits rows based on operation
Smart Conditional SplitSlide11
Workflow
Initial Load
Incremental LoadSlide12
Initial Load
Scenario 1: Active database
Mark Initial Load StartBulk load from source to destinationMark Initial Load EndScenario 2: Inactive database or using snapshot/flashbackMark CDC startOracle – provide the System Change Number (SCN
)
SQL – provide the Log Sequence Number (
LSN
) or snapshot name
CDC Control Task will use current value if none is provided
Bulk load data from source to destinationSlide13
Incremental Load
Use CDC Control Task to get the processing rangeRead from cdc_states
tableStored in package variableUse CDC Source to read changes since last runProcess change rowsUse CDC Control Task to mark processing endSlide14
Using the Reprocessing Indicator
Option in the CDC Source to include a reprocessing flagAdds __$reprocessing column to data flow
Flag is set to True for rows that require special handlingRows that overlap with initial loadReprocessing a range after a failureSlide15
Demo
CDC Components for SSISSlide16
CDC Components - Summary
CDC Control TaskRetrieve and persist stateUse before and after your data flow
CDC SourceReads change data from source tableCDC SplitterSeparate rows by operation typeSlide17
Processing Modes
AllAll with Old ValuesNetNet with Update Mask
Net with MergeSlide18
Which Processing Mode Should I Use?
All vs. Net
All changesDo I need to record every change? Do I enjoy complicated ETL work?Net changesDo I want to push the work to the source system?Update Mask / Old ValuesDo I need to do something when specific columns change?Slide19
Processing Mode - All
Operation
KeyNameInsert50
Smith
Update
50
Smyth
Insert
60
Jones
Delete
60
Jones
Update
10WilliamsDelete20
MartinSlide20
Processing Mode - All
Control Flow
Data FlowSlide21
Processing Mode – All with Old Values
Operation
__$update_maskKeyName
Insert
0xFF
50
Smith
Update Old
0x80
50
Smith
Update
0x80
50Smyth
Insert0xFF
60Jones
Delete0xFF60Jones
Update Old
0x80
10WilliamUpdate0x80
10WilliamsDelete0xFF20MartinSlide22
Processing Mode – All with Old Values
Control Flow
Data FlowSlide23
Processing Mode – Net
Operation
KeyNameInsert
50
Smyth
Update
10
Williams
Delete
20
MartinSlide24
Processing Mode – Net
Control Flow
Data FlowSlide25
Processing Mode – Net with Update Mask
Operation
Key__$Key_ChangedName
__$
Name_Changed
Insert
50
True
Smyth
True
Update
10
False
WilliamsTrueDelete
20TrueMartin
TrueSlide26
Processing Mode – Net with Update Mask
Control Flow
Data FlowSlide27
Processing Mode – Net with Merge
Operation
KeyNameUpdate
50
Smyth
Update
10
Williams
Delete
20
MartinSlide28
Processing Mode – Net with Merge
Control Flow
Data FlowSlide29
CDC For OracleSlide30
How it Works
Oracle
Database
LogMiner
CDC Service
SQL Server
Mirror Tables
CDC Tables
CDC DesignerSlide31
Setup
Oracle CDC components are not installed by defaultMSIs found on SQL media under: \Tools\AttunityCDCOracle
Oracle CDC ServiceAttunityOracleCdcService.msiOracle CDC DesignerAttunityOracleCdcDesigner.msiSlide32
Oracle CDC Service Configuration
Name your serviceProvide a service accountSQL instance information
CDC master passwordOne service per Oracle DBSlide33
Oracle CDC Designer
Allows you to create and manage Oracle CDC InstancesYou can have multiple instances per CDC ServiceAn instance contains
Oracle database connection informationTables and columns being trackedMirroring SQL Server instance informationSlide34
Oracle CDC Designer – Creating an InstanceSlide35
Oracle CDC Designer – Managing an Instance
List all of your services and CDC capture instancesControl the instance state (start, stop, reset)Regenerate setup scriptsSlide36
FAQSlide37
Supported Versions
SQL Server (Enterprise Edition)2008, 2008 R2 and 2012Oracle
10g Release 210.2.0.1—10.2.0.5 (patch set as of April 2010)11g Release 111.1.0.6—11.1.0.7 (patch set as of September 2008)11g Release 211.2.0.1—11.2.0.2 (patch set as of November 2010)Slide38
Common Questions
Do I need any special permissions?Getting current LSN during initial load requires db_owner
Call made to sys.sp_replincrementlsn Will enabling CDC impact performance?Similar overhead to transactional replicationAsynchronous processing reduces overall impactIncreases data sizePrevents minimally logged operationsLess than
10% overhead
with normal workload (and no IO issues)Slide39
Books Online Resources
CDC Flow Componentshttp://msdn.microsoft.com/en-us/library/hh231087.aspx
Tuning the Performance of Change Data Capturehttp://msdn.microsoft.com/en-us/library/dd266396.aspx Comparing Change Data Capture and Change Trackinghttp://msdn.microsoft.com/en-us/library/cc280519.aspx
Slide40
Related Content
Breakout
Sessions
Hands-on
Labs
Related Certification Exam
Find Me Later At…
Required Slide
*delete this box when your slide is finalized
Speakers,
please list the Breakout Sessions,
Labs
, Demo Stations and Certification
Exams
that relate to your session. Also indicate when they can find you staffing in the TLC
.
Breakthrough Insights: Credible, Consistent Data
Exam 70-463: Implementing a Data Warehouse with Microsoft SQL Server 2012
Breakthrough Insights: Credible, Consistent Data
Product Demo
Stations
DBI310: EIM: Bringing Together SSIS, MDS and DQS
DBI329: Upgrading SSIS Packages to SQL Server 2012
DBI207: BI Power HourDBI24-HOL: Exploring Microsoft SQL Server Integration ServicesSlide41
Contact
Matt Masson
mmasson@microsoft.com
m
attmasson.com
@
mattmassonSlide42
Track Resources
@sqlserver
@ms_teched
m
v
a
Microsoft Virtual Academy
SQL Server 2012 Eval Copy
Get Certified!
Hands-On LabsSlide43
Resources
Connect. Share. Discuss.
http://northamerica.msteched.com
Learning
Microsoft Certification & Training Resources
www.microsoft.com/learning
TechNet
Resources for IT Professionals
http://microsoft.com/technet
Resources for Developers
http://microsoft.com/msdn Slide44
Required Slide
Complete an evaluation on CommNet and enter to win!Slide45
MS Tag
Scan the Tag
to evaluate this
session now on
myTechEd
MobileSlide46
©
2012 Microsoft
Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the
part
of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT
MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.Slide47