EXTRACTION, CLEANING AND TRANSFORMATION TOOLS

2017-12-10 40K 40 0 0

Description

Extraction, Cleaning and Transformation Tools. Prepared by Aakanksha Agrawal & Richa Pandey, M.Tech CSE, 3rd semester. Main function: data extraction involves gathering data from multiple heterogeneous sources. ID: 614273



Slide1

EXTRACTION, CLEANING AND TRANSFORMATION TOOLS

Prepared by Aakanksha Agrawal & Richa Pandey

M.Tech CSE, 3rd semester

Slide2

Main Function:

Data Extraction - Involves gathering data from multiple heterogeneous sources.

Data Cleaning - Involves finding and correcting the errors in data.

Data Transformation - Involves converting the data from legacy format to warehouse format.
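The three functions can be sketched as a small pipeline. This is a minimal illustration only: the two sample sources, the field names, and the `extract`, `clean`, and `transform` helpers are hypothetical and do not come from the slides.

```python
from datetime import date

# Hypothetical records from two heterogeneous sources (assumed for illustration).
CSV_SOURCE = ["1, alice ,2017-12-01", "2,BOB,2017-12-02"]
DICT_SOURCE = [{"id": 3, "name": "Carol", "joined": "2017-12-03"}]


def extract():
    """Gather rows from both sources into one common dict layout."""
    rows = []
    for line in CSV_SOURCE:
        cust_id, name, joined = line.split(",")
        rows.append({"id": cust_id, "name": name, "joined": joined})
    rows.extend(dict(r) for r in DICT_SOURCE)
    return rows


def clean(rows):
    """Correct simple errors: stray whitespace and inconsistent casing."""
    return [
        {
            "id": int(r["id"]),
            "name": str(r["name"]).strip().title(),
            "joined": str(r["joined"]).strip(),
        }
        for r in rows
    ]


def transform(rows):
    """Convert to an assumed warehouse format: typed tuples with real dates."""
    return [(r["id"], r["name"], date.fromisoformat(r["joined"])) for r in rows]


warehouse_rows = transform(clean(extract()))
```

Keeping the three stages as separate functions mirrors the split described above, so each stage can be checked on its own.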


Slide3

Extract and Load Process

Data extraction takes data from the source systems. Data load takes the extracted data and loads it into the data warehouse. Note: Before loading the data into the data warehouse, the information extracted from the external sources must be reconstructed.


Slide4

A) Controlling the Process

Controlling the process involves determining when to start data extraction and when to run the consistency checks on the data. The controlling process ensures that the tools, the logic modules, and the programs are executed in the correct sequence and at the correct time.

B) When to Initiate Extract

Data needs to be in a consistent state when it is extracted, i.e., the data warehouse should represent a single, consistent version of the information to the user.
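One way to picture the controlling process is a small controller that refuses to start extraction until every source reports a consistent state, and then runs the steps in a fixed order. This is a sketch under assumptions: `run_controlled`, the step names, and the source names are hypothetical.

```python
def run_controlled(steps, sources_consistent):
    """Run steps in the given order, but only if every source is consistent.

    `steps` is an ordered list of (name, callable) pairs and
    `sources_consistent` maps source name -> bool. Both are hypothetical.
    """
    not_ready = [name for name, ok in sources_consistent.items() if not ok]
    if not_ready:
        # Do not start extraction while any source is in an inconsistent state.
        return {"started": False, "waiting_on": not_ready, "executed": []}

    executed = []
    for name, step in steps:
        step()  # an exception raised here stops the later steps
        executed.append(name)
    return {"started": True, "waiting_on": [], "executed": executed}


log = []
steps = [
    ("extract", lambda: log.append("extract")),
    ("load", lambda: log.append("load")),
    ("clean_transform", lambda: log.append("clean_transform")),
]

blocked = run_controlled(steps, {"orders_db": True, "billing_db": False})
done = run_controlled(steps, {"orders_db": True, "billing_db": True})
```

In the first call nothing runs because one source is not yet consistent; in the second call the steps run once, in sequence.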

Slide5

C) Loading the Data

After extracting the data, it is loaded into a temporary data store where it is cleaned up and made consistent. Note: Consistency checks are executed only when all the data sources have been loaded into the temporary data store.
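The rule in the note can be sketched with an in-memory stand-in for the temporary data store. The `StagingStore` class, the table names, and the particular check (orders that reference an unknown customer) are assumptions made for illustration.

```python
class StagingStore:
    """In-memory stand-in for the temporary data store (hypothetical)."""

    def __init__(self, expected_sources):
        self.expected = set(expected_sources)
        self.tables = {}

    def load(self, source, rows):
        self.tables[source] = list(rows)

    def all_loaded(self):
        return self.expected <= set(self.tables)

    def check_consistency(self):
        """Return order rows whose customer_id is missing from customers."""
        if not self.all_loaded():
            raise RuntimeError("consistency checks need every source loaded")
        known = {c["customer_id"] for c in self.tables["customers"]}
        return [o for o in self.tables["orders"] if o["customer_id"] not in known]


staging = StagingStore(["customers", "orders"])
staging.load("orders", [{"order_id": 10, "customer_id": 1},
                        {"order_id": 11, "customer_id": 9}])
ready_before = staging.all_loaded()   # customers not loaded yet
staging.load("customers", [{"customer_id": 1}])
orphans = staging.check_consistency()
```

Running the check before both tables are present would raise an error, which is why the check is gated on `all_loaded()`.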


Slide6

Clean and Transform Process

Once the data is extracted and loaded into the temporary data store, it is time to perform Cleaning and Transforming. Steps involved in Cleaning and Transforming:

A) Clean and transform the loaded data into a structure
B) Partition the data
C) Aggregation


Slide7

A) Clean and Transform the Loaded Data into a Structure

Cleaning and transforming the loaded data helps speed up the queries. It can be done by making the data consistent:

within itself
with other data within the same data source
with the data in other source systems
with the existing data present in the warehouse
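Some of these consistency levels can be illustrated with a short cleaning pass. This is a sketch only: the alias table, the field names, and the `clean_rows` function are hypothetical, and consistency across other source systems is reduced here to a shared value mapping.

```python
# Hypothetical mapping used to make one field consistent across sources.
COUNTRY_ALIASES = {"usa": "US", "u.s.": "US", "us": "US", "india": "IN", "in": "IN"}


def clean_rows(staged, warehouse_keys):
    """Return (clean, rejected) rows from the staged data.

    - within itself: a row must have an id and a known country value
    - within the same load: duplicate ids are rejected
    - with the warehouse: ids already present there are rejected
    """
    clean, rejected, seen = [], [], set()
    for row in staged:
        country = COUNTRY_ALIASES.get(str(row.get("country", "")).strip().lower())
        key = row.get("id")
        if key is None or country is None or key in seen or key in warehouse_keys:
            rejected.append(row)
            continue
        seen.add(key)
        clean.append({"id": key, "country": country})
    return clean, rejected


staged = [
    {"id": 1, "country": "USA"},
    {"id": 2, "country": " india "},
    {"id": 2, "country": "IN"},        # duplicate within the load
    {"id": 3, "country": "Atlantis"},  # unknown value
    {"id": 7, "country": "us"},        # already in the warehouse
]
clean, rejected = clean_rows(staged, warehouse_keys={7})
```

Rejected rows are kept rather than dropped so they can be corrected and reloaded later.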


Slide8

A) Clean and Transform the Loaded Data into a Structure

Transforming involves converting the source data into a structure. Structuring the data increases the query performance and decreases the operational cost. The data contained in a data warehouse must be transformed to support performance requirements and control the ongoing operational costs.


Slide9

B) Partition the Data

Partitioning optimizes hardware performance and simplifies the management of the data warehouse. Each fact table is split into multiple separate partitions.
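As an illustration, a fact table can be partitioned by month. The slides do not say which partitioning key to use, so the month key, the `sale_date` and `amount` fields, and the `partition_by_month` helper are assumptions.

```python
from collections import defaultdict
from datetime import date


def partition_by_month(fact_rows):
    """Group fact rows into partitions keyed by 'YYYY-MM' of the sale date."""
    partitions = defaultdict(list)
    for row in fact_rows:
        partitions[row["sale_date"].strftime("%Y-%m")].append(row)
    return dict(partitions)


sales_fact = [
    {"sale_date": date(2017, 11, 30), "amount": 40.0},
    {"sale_date": date(2017, 12, 1), "amount": 25.0},
    {"sale_date": date(2017, 12, 10), "amount": 15.0},
]
partitions = partition_by_month(sales_fact)

# A query for one month only has to read that month's partition.
december_total = sum(r["amount"] for r in partitions["2017-12"])
```

Old partitions can also be archived or dropped as a unit, which is one way partitioning simplifies management.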


C) Aggregation

Aggregation is required to speed up common queries.

Aggregation relies on the fact that most common queries will analyze a subset or an aggregation of the detailed data.
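A minimal sketch of the idea: totals are computed once from the detailed rows and stored, so a common query becomes a lookup instead of a scan. The grouping by region and month, the field names, and `build_summary` are hypothetical.

```python
from collections import defaultdict


def build_summary(detail_rows):
    """Precompute total amount per (region, month) from the detailed data."""
    summary = defaultdict(float)
    for row in detail_rows:
        summary[(row["region"], row["month"])] += row["amount"]
    return dict(summary)


detail = [
    {"region": "north", "month": "2017-12", "amount": 10.0},
    {"region": "north", "month": "2017-12", "amount": 5.0},
    {"region": "south", "month": "2017-12", "amount": 7.5},
]
summary = build_summary(detail)

# The common query reads the small summary, not every detail row.
north_december = summary[("north", "2017-12")]
```

The trade-off is that the summary has to be rebuilt or updated whenever new detail data is loaded.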

Slide10

EXTRACTION, CLEANING AND TRANSFORMATION

The tasks of capturing data from source systems, cleansing and transforming it, and loading the results into a target system can be carried out either by separate products or by a single integrated solution. Integrated solutions can fall into one of the categories below:

Code Generators
Database Data Replication Tools
Dynamic Transformation Engines


Slide11


Thank you

