/
Modularizing Data with  dbt Modularizing Data with  dbt

Modularizing Data with dbt - PowerPoint Presentation

BabyDoll
BabyDoll . @BabyDoll
Follow
342 views
Uploaded On 2022-08-01

Modularizing Data with dbt - PPT Presentation

Oregon State University Jacob Mastel Data Modeler Oregon State University Banner Institution One of only 2 universities with land sea space and sun grant designations Home of the CORE team ID: 931624

dbt data sql sfr2410 data dbt sfr2410 sql models project tests citizenship model unit single test components architecture tools

Share:

Link:

Embed:

Download Presentation from below link

Download Presentation The PPT/PDF document "Modularizing Data with dbt" is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.


Presentation Transcript

Slide1

Modularizing Data with dbt

Oregon State University

Jacob Mastel – Data Modeler

Slide2

Oregon State University

Banner Institution

One of only 2 universities with land, sea, space, and sun grant designations

Home of

the:CORE teamYInMn blue aka “Bluetiful”Maraschino cherriesThe computer mouse

Slide3

Talk Roadmap

A high level overview of our AWS architecture

The data tools we evaluated

An example project

Final Q&A

Slide4

High Level AWS Architecture

Extract

Most on-

prem

data is pulled into s3 using either DMS or Lambda

functions.

Load

An in-house application, Elsa, intelligently reads the data in s3 and imports it into

Snowflake.

Transform

dbt

is used to create and manage SQL models

.

Slide5

Evaluation of Data Tools

Primary Criteria:

Cost

Developer time

ELT vs ELTVendor Lock In

Slide6

Why we chose dbt

Mostly SQL and YAML, with some other text files mixed in

Fairly easy to read

Very simple to integrate into a

github workflowBrings DRY principles to SQLAllows us to “unit test” our incoming data as well as the developed modelsVery little vendor lock inSupports multiple RDMS outputs“Compiles” to SQLRelies on our existing service for the heavy lifting

Slide7

SFR2410: A test project

~20,000 lines of SQL-

ish

ETL code written with

AppworkUsed to create official snapshot data used for legal reporting and long term statistical analysisDue to its important nature and complexity, nobody wants to touch it

Slide8

SFR2410: Setting it up

dbt

init

dbt-SFR2410cd dbt-SFR2410git init

Analysis: models not pushed to database

Data: static data used in models

Macros: jinja2 macros

Models: What appears in the database

Tests: Unit tests

Slide9

What’s a Jinja?

dbt

models are built using

jinja

enhanced SQLThink what old school, inline PHP is to HTML

Slide10

SFR2410: Breaking the Monolith

Instead of one monolithic procedure, we have small, specialized components.

Package configuration allows for a significant control over model groups, or even individual models

From

dbt_project.yml:

Slide11

SFR2410: Comparing Blocks

Citizenship in SFR2410

Citizenship in dbt-SFR2410

Slide12

SFR2410: Combining Components

The smaller components are then combined into a larger, complete model

output/

student.sql

Slide13

SFR2410: Unit Testing

Unit tests are just SQL statements looking for errors

Citizenship should return a single entry. This test looks for any GOBINTL_PIDM with two entries

If the test fails, we already know exactly which records caused the failure

tests/single_citizenship_entry.sql

Slide14

SFR2410: But a component is too slow!

Each model can be configured within its individual SQL file, or can be adjusted within the project file

In this case, we’ve selected a single model within the

supporting

group to be materialized into a table.dbt-project.yml

Slide15

Questions?