Dplyr I EPID 799C Mon Sep 24 2017 PowerPoint Presentation, PPT - DocSlides


Slide1

Dplyr I

EPID 799C, Mon Sep 24 2017

Slide2

Today’s Overview

Pipes

dplyr verbs: filter, summarise, group_by

Windows

Demos

Homework 1: thoughts

Homework 2: due Wed

Homework 3: coming soon!

Slide3

dplyr: theory, verbs

Slide4

dplyr: the big picture

Standard grammar of data manipulation: standard “words” and “phrases.” More abstraction for us humans.

Dataset-oriented: base R largely operates on vectors, while dplyr is oriented toward operating on whole datasets at once. Its functions take a dataset and aim to return a dataset.

Smart & efficient: e.g., point dplyr at a database connection, and it translates your code to SQL for you (via the dbplyr backend).
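A small sketch of the vector-versus-dataset contrast, using the starwars tibble that ships with dplyr: the base R call returns a bare number, while the dplyr verb takes the whole table and returns a (one-row) table.

```r
library(dplyr)  # the starwars tibble ships with dplyr

# Base R style: pull out one vector, get back one number
base_avg <- mean(starwars$height, na.rm = TRUE)

# dplyr style: hand over the whole dataset, get back a dataset
dplyr_avg <- summarise(starwars, avg_height = mean(height, na.rm = TRUE))

is.numeric(base_avg)      # a plain numeric vector of length 1
is.data.frame(dplyr_avg)  # a tibble with one row and one column
```

Returning a dataset is what lets the next verb in a chain pick up where the last one left off.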

Slide5

dplyr: the big picture

One-table verbs: filter, select, arrange, summarize, mutate, group_by

Linking phrases: the pipe %>% (think “…then…”)

Multi-table verbs: mutating & filtering table joins, set operations, binding

Concepts / tidy data: wide & long data

Slide6

Whiteboard Overview

Use the words in a sentence!

Slide7

Sidenote: Star Wars

starwars is one of a few datasets included in tidyr / dplyr.

http://dplyr.tidyverse.org/reference/starwars.html#examples

Slide8

filter()

Slide9

filter()

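We already have ways to do this with [ ]:

filter(starwars, homeworld == "Tatooine")

is almost the same as:

starwars[starwars$homeworld == "Tatooine", ]

(One difference: filter() drops rows where the condition is NA, while base [ ] indexing returns such rows as rows of NAs.)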
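The "almost" is mostly about missing values. A minimal sketch on a hypothetical three-row table (toy data, not from starwars) where one homeworld is NA:

```r
library(dplyr)

# Hypothetical toy data: the third row has a missing homeworld
toy <- data.frame(name      = c("A", "B", "C"),
                  homeworld = c("Tatooine", "Alderaan", NA),
                  stringsAsFactors = FALSE)

via_filter  <- filter(toy, homeworld == "Tatooine")
via_bracket <- toy[toy$homeworld == "Tatooine", ]

nrow(via_filter)   # 1: filter() treats an NA condition as "drop this row"
nrow(via_bracket)  # 2: base [ ] turns the NA condition into an all-NA row
```

If you do want the missing rows, say so explicitly, e.g. filter(toy, homeworld == "Tatooine" | is.na(homeworld)).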

Slide10

select()

Slide11

select()

select(starwars, name, height, mass)
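Beyond listing names, select() also accepts column ranges, helper functions such as ends_with(), and negative selections. A short sketch on starwars:

```r
library(dplyr)

by_name   <- select(starwars, name, height, mass)
by_range  <- select(starwars, name:mass)                     # name through mass, in order
by_helper <- select(starwars, name, ends_with("color"))      # all *_color columns
dropped   <- select(starwars, -films, -vehicles, -starships) # everything except these

names(by_helper)
```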

Slide12

arrange()

arrange(starwars, name)

arrange(starwars, desc(homeworld))
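arrange() takes several sort keys (later keys break ties) and, for local data, always puts NA values last, even inside desc(). A sketch on a small hypothetical table:

```r
library(dplyr)

# Hypothetical toy data with a missing homeworld
toy <- data.frame(homeworld = c("Naboo", NA, "Tatooine", "Naboo"),
                  name      = c("Padme", "Unknown", "Luke", "Jar Jar"),
                  stringsAsFactors = FALSE)

# Sort by homeworld (descending), breaking ties by name (ascending)
sorted <- arrange(toy, desc(homeworld), name)
sorted$name   # "Luke" "Jar Jar" "Padme" "Unknown": the NA row ends up last
```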

Slide13

mutate()

Row-by-row actions

Slide14

mutate()

mutate(starwars, is_tatooine_native = homeworld == "Tatooine")     # adds a column, keeps the rest

transmute(starwars, is_tatooine_native = homeworld == "Tatooine")  # keeps only the new column
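The difference between the two verbs shows up directly in the column counts:

```r
library(dplyr)

with_flag <- mutate(starwars, is_tatooine_native = homeworld == "Tatooine")
only_flag <- transmute(starwars, is_tatooine_native = homeworld == "Tatooine")

ncol(with_flag) - ncol(starwars)  # 1: all original columns kept, plus the new one
names(only_flag)                  # "is_tatooine_native"
```

Both keep every row; only the set of columns differs.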

Slide15

mutate()

Window functions: take a vector and return a vector of the same length, e.g. lag(), lead(), cumsum(), min_rank().

Others (rolling & recycled aggregates) are beyond the scope of this introduction.
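A minimal sketch of a few window functions inside mutate(), on hypothetical weekly case counts (made-up numbers, purely for illustration):

```r
library(dplyr)

# Hypothetical weekly case counts
counts <- data.frame(week = 1:5, cases = c(3, 5, 2, 8, 4))

windowed <- mutate(counts,
                   prev_cases = lag(cases),             # previous row's value (NA for row 1)
                   change     = cases - lag(cases),     # week-to-week difference
                   running    = cumsum(cases),          # cumulative total
                   rank_cases = min_rank(desc(cases)))  # 1 = largest count
windowed
```

Each new column has one value per input row, which is what separates window functions from the many-to-one summaries below.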

Slide16

summarise()

Many to one operations

Slide17

summarise()

summarise(starwars,
          avg_height = mean(height, na.rm = TRUE),
          avg_mass   = mean(mass, na.rm = TRUE))

summarise_at(starwars, c("height", "mass"), mean, na.rm = TRUE)
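In dplyr 1.0 and later, summarise_at() is superseded by across() inside summarise(). A sketch of the same summary in that style, assuming dplyr >= 1.0 is installed:

```r
library(dplyr)  # across() requires dplyr >= 1.0

# Same output columns as the summarise_at() call above
avgs <- summarise(starwars,
                  across(c(height, mass), ~ mean(.x, na.rm = TRUE)))
avgs
```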

Slide18

group_by()

Groups the rows of a data.frame by one or more variables, so that later summarising (or windowed) actions are performed separately for each group.

Slide19

group_by()

group_by(starwars, homeworld)

summarise_at(group_by(starwars, homeworld), c("height", "mass"), mean, na.rm = TRUE)
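After group_by(), summarise() returns one row per group, and n() counts the rows in each group. A sketch on starwars (note that a missing homeworld forms its own group):

```r
library(dplyr)

by_world <- starwars %>%
  group_by(homeworld) %>%
  summarise(n = n(),                                  # characters per homeworld
            avg_height = mean(height, na.rm = TRUE))

# One row per distinct homeworld, NA included
nrow(by_world) == n_distinct(starwars$homeworld)
```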

Slide20

Multi-Table Operations

Slide21

Tibble sidenote

Slide22

Tibbles

A layer built on data.frames

They largely work the same (if something doesn't, convert with as.data.frame()), but tibbles also support retaining groups, prettier printing, etc.

class(starwars)

str(starwars)

Note a slick move with films, vehicles, starships…: they are list-columns, where each cell holds a whole character vector.
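A short sketch of working with those list-columns: each cell is extracted with [[ ]], and lengths() gives the number of elements per cell (here, films per character).

```r
library(dplyr)

is.list(starwars$films)      # TRUE: a list-column
starwars$films[[1]]          # character vector of films for the first character

# lengths() counts the elements in each list cell
film_counts <- starwars %>%
  mutate(n_films = lengths(films)) %>%
  select(name, n_films)
```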

Slide23

Pipes

Slide24

The Pipe

What?

The simplest pipe (%>%) takes what’s on the left and makes it the new first argument of what’s on the right:

a %>% b(arg1=1, arg2=2)

becomes

b(a, arg1=1, arg2=2)
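A quick check of that rewrite rule, plus magrittr's "." placeholder for the case where the data should go somewhere other than the first argument (lm() takes a formula first, so the data is sent to data = .):

```r
library(dplyr)  # re-exports magrittr's %>%

x <- c(4, 9, 16)

piped  <- x %>% sqrt() %>% sum()
nested <- sum(sqrt(x))
identical(piped, nested)   # TRUE

# "." marks where the left-hand side goes; it is then NOT also inserted first
fit <- starwars %>% lm(mass ~ height, data = .)
```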

Slide25

The Pipe

Slide26

The Pipe

Why?

Easier to chain than nesting calls or creating multiple temporary datasets. And we often think and operate in chains, doing something new to the thing we just worked on.

“Take this thing, do this to it, do this other thing, then another, then group that, summarize that, and plot it.”

Helps reorder R constructs to match human language.

dplyr (with pipes) creates a “grammar” of data manipulation, which helps translate concepts into “sentences.”

Slide27

The Pipe

library(dplyr)
library(nycflights13)  # provides the flights data

a1 <- group_by(flights, year, month, day)
a2 <- select(a1, arr_delay, dep_delay)  # grouping columns are kept automatically
a3 <- summarise(a2,
                arr = mean(arr_delay, na.rm = TRUE),
                dep = mean(dep_delay, na.rm = TRUE))
a4 <- filter(a3, arr > 30 | dep > 30)

Slide28

The Pipe

filter(
  summarise(
    select(group_by(flights, year, month, day), arr_delay, dep_delay),
    arr = mean(arr_delay, na.rm = TRUE),
    dep = mean(dep_delay, na.rm = TRUE)
  ),
  arr > 30 | dep > 30
)

Slide29

The Pipe

flights %>%
  group_by(year, month, day) %>%
  select(arr_delay, dep_delay) %>%
  summarise(
    arr = mean(arr_delay, na.rm = TRUE),
    dep = mean(dep_delay, na.rm = TRUE)
  ) %>%
  filter(arr > 30 | dep > 30)

Slide30

The Pipe

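births$sex %>% hist()

starwars %>% filter(mass > 100)

# films is a list-column, so test membership cell by cell
starwars %>% filter(vapply(films, function(f) "Revenge of the Sith" %in% f, logical(1)))

# This is also how the new sf package for GIS works....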
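Why the cell-by-cell test matters: %in% coerces each list cell to a single string, so a plain films %in% "Revenge of the Sith" only matches characters who appear in that one film alone. A sketch on a hypothetical two-row tibble:

```r
library(dplyr)  # tibble() is re-exported by dplyr

# Hypothetical toy data: "A" appears in two films, "B" in one
toy <- tibble(name  = c("A", "B"),
              films = list(c("A New Hope", "Revenge of the Sith"),
                           "Revenge of the Sith"))

# Naive: the multi-film cell never matches
naive <- filter(toy, films %in% "Revenge of the Sith")

# Per cell: ask "is the film in this character's vector?"
per_cell <- filter(toy, vapply(films, function(f) "Revenge of the Sith" %in% f,
                               logical(1)))

nrow(naive)     # 1: only "B"
nrow(per_cell)  # 2: both characters
```

purrr::map_lgl() is a common tidyverse alternative to vapply() here.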

Slide31

The Pipe

planet_bmi = starwars %>%
  group_by(homeworld) %>%
  summarise_at(c("height", "mass"), mean, na.rm = TRUE) %>%
  mutate(bmi = mass / (height/100)^2)  # height is in cm, so /100 gives metres

Slide32

The Pipe

More complex piping here:

https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html

Slide33

Tidy Data

wide? long?

Slide34

tidyr

Slide35

tidyr

Slide36

tidyr

Most common: gather(), spread()

Less common: separate(), unite()

Slide37

a = starwars %>% gather("num", "val", height, mass, birth_year)  # wide -> long

b = a %>% spread(num, val)  # long -> wide again
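In tidyr 1.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A sketch of the same round trip, assuming tidyr >= 1.0; the select() step (an addition, not in the slide) keeps only the id and measure columns so the example stays small:

```r
library(dplyr)
library(tidyr)  # pivot_longer()/pivot_wider() require tidyr >= 1.0

long <- starwars %>%
  select(name, height, mass, birth_year) %>%
  pivot_longer(c(height, mass, birth_year), names_to = "num", values_to = "val")

wide <- long %>% pivot_wider(names_from = num, values_from = val)
```

long has three rows per character (one per measure); wide is back to one row per character.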

Slide38

Advanced Concepts

Things we’re not covering, but you should know exist

http://dplyr.tidyverse.org/reference/index.html

Slide39

Working with Databases

The dbplyr package translates your dplyr code into SQL and sends it to a database connection.

Try it out if you have access to a server!

https://github.com/tidyverse/dbplyr

https://github.com/tidyverse/dbplyr/blob/master/vignettes/dbplyr.Rmd
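No server handy? A minimal sketch, assuming the dbplyr package is installed: lazy_frame() with simulate_sqlite() builds a stand-in "remote" table, so you can see the SQL that dplyr verbs generate without connecting to any database. The column names are placeholders chosen to match starwars.

```r
library(dplyr)
library(dbplyr)

# A fake remote table: no real database, just SQLite's SQL dialect
remote <- lazy_frame(homeworld = character(), height = numeric(),
                     con = simulate_sqlite())

query <- remote %>%
  filter(homeworld == "Tatooine") %>%
  summarise(avg_height = mean(height, na.rm = TRUE))

show_query(query)  # prints the generated SELECT ... WHERE ... statement
```

With a real connection (e.g. from DBI::dbConnect()), the same verbs run inside the database and collect() pulls the result into R.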

Slide40

Non-Standard Evaluation

It’s why you can refer to columns without quoting them.

It gets really hairy.

Use case: what if you want to “program” with dplyr, e.g. write your own functions that call it?
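A minimal sketch of programming with dplyr, assuming dplyr >= 1.0: embracing an argument with {{ }} forwards an unquoted column name into the verbs, and the .data pronoun handles column names stored as strings. group_mean() is a hypothetical helper, not part of dplyr.

```r
library(dplyr)  # {{ }} and .data need dplyr >= 1.0 (rlang >= 0.4)

# Hypothetical helper: mean of any column, by any grouping column
group_mean <- function(data, group_var, value_var) {
  data %>%
    group_by({{ group_var }}) %>%
    summarise(avg = mean({{ value_var }}, na.rm = TRUE))
}

# Columns are passed unquoted, just like in dplyr's own verbs
by_species <- group_mean(starwars, species, height)

# With a column name held in a string, use the .data pronoun
col <- "mass"
by_species_mass <- starwars %>%
  group_by(species) %>%
  summarise(avg = mean(.data[[col]], na.rm = TRUE))
```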

Slide41

Integration w/ other packages

%>% passes an object (often a dataset) into the first argument of the next function.

What have we seen recently that starts with data?

Slide42

What does this do?

# Insta-ggplot
library(ggplot2)

starwars %>%
  group_by(homeworld) %>%
  summarise_at(c("height", "mass"), mean, na.rm = TRUE) %>%
  mutate(bmi = mass / (height/100)^2) %>%
  ggplot(aes(homeworld, bmi, fill = homeworld)) +  # from here on, layers join with +, not %>%
  geom_col(show.legend = FALSE) +
  coord_flip()

Slide43

Putting it all together

Back to births

Slide44

Let’s Try

What are the mean and SD of weeks of gestation by race-ethnicity group?

Construct a dplyr “sentence” to look at county-specific effects on preterm and pnc5. (HW3!)

Slide45

Answers

library(dplyr)
library(tidyr)
library(ggplot2)

births %>%
  left_join(data.frame(mrace = 1:4, race_f = c("W", "B", "AI/AN", "O")),
            by = "mrace") %>%
  group_by(race_f, methnic) %>%
  summarise(avg_gest = mean(wksgest, na.rm = TRUE),
            gest_sd  = sd(wksgest, na.rm = TRUE),
            n        = sum(!is.na(wksgest))) %>%   # non-missing observations
  # normal-approximation 95% CI for the mean: avg +/- 1.96 * SD / sqrt(n)
  mutate(ci_low  = avg_gest - 1.96 * gest_sd / sqrt(n),
         ci_high = avg_gest + 1.96 * gest_sd / sqrt(n)) %>%
  arrange(avg_gest) %>%
  filter(methnic != "U" & race_f != "O") %>%
  ungroup() %>%                                     # drop grouping before merging the group columns
  unite(raceeth, race_f, methnic, sep = ".") %>%
  ggplot(aes(raceeth, avg_gest, fill = avg_gest)) +
  geom_col() +
  geom_linerange(aes(x = raceeth, ymin = ci_low, ymax = ci_high), color = "grey") +
  geom_text(aes(label = round(avg_gest, 1)), nudge_y = 1)

#Q2: See you on Wednesday! Homework 3!

