Dplyr I EPID 799C Mon Sep 24 2017

Uploaded On 2018-10-21

Presentation Transcript

Slide1

Dplyr I

EPID 799C

Mon Sep 24 2017

Slide2

Today’s Overview

Pipes

dplyr verbs: filter, summarise, group_by

Windows

demos

Homework 1: thoughts

Homework 2: due Wed

Homework 3: coming soon!

Slide3

dplyr theory, verbs

Slide4

Dplyr big-picture

Standard grammar of data manipulation: Standard “words” and “phrases.” More abstraction for us humans.

Dataset abstracted. Base R largely operates on vectors. Dplyr is oriented toward operating on data sets all at once. Functions aim at returning datasets.

Smart & efficient. E.g. use dplyr on a database connection, and dplyr translates to SQL for you.

Slide5

Dplyr big-picture

One Table Verbs

filter, select, arrange, summarize, mutate, group_by

Linking Phrases

Pipe %>% (think “…then…”)

Multi-Table Verbs

mutating & filtering table joins, set operations, binding

Concepts / tidy data

wide & long data

Slide6

Whiteboard Overview

Use the words in a sentence!

Slide7

Sidenote: Star Wars

One of a few datasets included in tidyr / dplyr

http://dplyr.tidyverse.org/reference/starwars.html#examples

Slide8

filter()

Slide9

filter()

We already have ways to do this in base R: []

filter(starwars, homeworld == "Tatooine")

Almost the same as:

starwars[starwars$homeworld == "Tatooine", ]

Slide10
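A note on the “almost” in the filter() comparison above: one difference worth knowing is how missing values are handled. filter() drops rows where the condition is NA, while base [] returns an all-NA row for each of them. The sketch below uses a small made-up data frame (people is hypothetical, not part of the course data) to show this.

```r
library(dplyr)

# Hypothetical toy data: one homeworld is missing (NA)
people <- data.frame(
  name = c("Luke", "Leia", "Yoda"),
  homeworld = c("Tatooine", "Alderaan", NA),
  stringsAsFactors = FALSE
)

# dplyr: rows where the condition is NA are dropped
via_filter <- filter(people, homeworld == "Tatooine")

# base R: the NA comparison produces an extra row filled with NA
via_bracket <- people[people$homeworld == "Tatooine", ]

# base R with which(): NA comparisons are skipped, matching filter()
via_which <- people[which(people$homeworld == "Tatooine"), ]

nrow(via_filter)
nrow(via_bracket)
nrow(via_which)
```

If you stay in base R, wrapping the condition in which() is one way to get the same rows as filter().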

select()

Slide11

select()

select(starwars, name, height, mass)

Slide12

arrange()

arrange(starwars, name)

arrange(starwars, desc(homeworld))

Slide13

mutate()

Row-by-row actions

Slide14

mutate()

mutate(starwars, is_tatooine_native = homeworld == "Tatooine")

transmute(starwars, is_tatooine_native = homeworld == "Tatooine")

Slide15
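Following up on the mutate() / transmute() pair above: the two calls compute the same new column, but differ in what they keep. A quick sketch (sw is just a hypothetical two-column subset to keep the output short):

```r
library(dplyr)

# Hypothetical small subset of starwars, for readability
sw <- select(starwars, name, homeworld)

# mutate() keeps the existing columns and adds the new one
with_flag <- mutate(sw, is_tatooine_native = homeworld == "Tatooine")

# transmute() keeps only the columns you create
only_flag <- transmute(sw, is_tatooine_native = homeworld == "Tatooine")

names(with_flag)
names(only_flag)
```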

mutate()

Window functions

Others (rolling & recycled aggregates) are beyond the scope of this introduction.

Slide16
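To make the window-function idea above concrete: a window function returns one value per row (unlike summarise(), which collapses rows). The sketch below shows a ranking, an offset, and a cumulative function inside a grouped mutate(). The scores data is made up for illustration.

```r
library(dplyr)

# Hypothetical toy data
scores <- tibble(
  team   = c("a", "a", "a", "b", "b"),
  week   = c(1, 2, 3, 1, 2),
  points = c(10, 30, 20, 5, 15)
)

windowed <- scores %>%
  group_by(team) %>%
  arrange(week, .by_group = TRUE) %>%
  mutate(
    rank_in_team  = min_rank(desc(points)),  # ranking: 1 = highest points in the team
    prev_points   = lag(points),             # offset: previous week's value, NA for the first
    running_total = cumsum(points)           # cumulative aggregate within the team
  ) %>%
  ungroup()

windowed
```

Each function is evaluated within its team because of the group_by(), and the row count stays the same.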

summarise()

Many-to-one operations

Slide17

summarise()

summarise(starwars,
  avg_height = mean(height, na.rm = TRUE),
  avg_mass = mean(mass, na.rm = TRUE))

summarise_at(starwars, c("height", "mass"), mean, na.rm = TRUE)

Slide18
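A side note on summarise_at() from the summarise() slide above: in more recent dplyr releases (1.0.0 and later, as far as I know) the scoped *_at() verbs are superseded by across(). If your installed version has across(), the same summary can be sketched like this:

```r
library(dplyr)

# Scoped-verb form used on the slide
old_style <- summarise_at(starwars, c("height", "mass"), mean, na.rm = TRUE)

# across() form; assumes dplyr >= 1.0.0
new_style <- summarise(
  starwars,
  across(c(height, mass), ~ mean(.x, na.rm = TRUE))
)

old_style
new_style
```

Both forms should return a one-row table with the mean height and mass.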

group_by()

Groups the rows of a data.frame* by one or more variables, so that summarizing (or windowed*) actions are performed within each group.

Slide19

group_by()

group_by(starwars, homeworld)

summarise_at(group_by(starwars, homeworld), c("height", "mass"), mean, na.rm = TRUE)

Slide20
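To see what group_by() on its own does in the calls above: it does not change the rows, it only records the grouping, and the next verb then works group by group. A small sketch using the same starwars data:

```r
library(dplyr)

by_world <- group_by(starwars, homeworld)

# Grouping metadata is attached, but the data itself is unchanged
group_vars(by_world)
nrow(by_world) == nrow(starwars)

# summarise() now returns one row per homeworld (NA counts as its own group)
world_means <- summarise(
  by_world,
  n = n(),
  avg_height = mean(height, na.rm = TRUE)
)

world_means
```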

Multi-Table Operations

Slide21

Tibble sidenote

Slide22

Tibbles

A layer built on data.frames

Largely work the same (if not, as.data.frame() it), but support retaining groups, prettier printing, etc.

class(starwars)

str(starwars)

Note a slick move with films, vehicles, starships… (these are list columns)

Slide23

Pipes

Slide24

The Pipe

What?

Simplest pipe (%>%) takes what’s on the left and makes it the new first argument of what’s on the right

a %>% b(arg1=1, arg2=2)

becomes

b(a, arg1=1, arg2=2)
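A tiny check of the rewrite rule above, using base functions so nothing else is assumed (the numbers are arbitrary):

```r
library(magrittr)  # provides %>%; dplyr re-exports the same operator

# Nested form: read inside-out
nested <- round(mean(c(1.234, 5.678)), digits = 1)

# Piped form: read left to right ("take the vector, then mean, then round")
piped <- c(1.234, 5.678) %>% mean() %>% round(digits = 1)

identical(nested, piped)
```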

Slide25

The Pipe

Slide26

The Pipe

Why?

Easier to chain than nesting or multiple temporary datasets. And we often think or operate in chains, doing something new to the thing we just worked on.

“Take this thing, do this to it, do this other thing, then another, then group that, summarize that, and plot it.”

Helps reorder R constructs to human language.

Dplyr (with pipes) creates a “grammar” of data manipulation, which helps translate concepts into “sentences.”

Slide27

The Pipe

# flights comes from the nycflights13 package
a1 <- group_by(flights, year, month, day)
a2 <- select(a1, arr_delay, dep_delay)
a3 <- summarise(a2,
  arr = mean(arr_delay, na.rm = TRUE),
  dep = mean(dep_delay, na.rm = TRUE))
a4 <- filter(a3, arr > 30 | dep > 30)

Slide28

The Pipe

filter(
  summarise(
    select(
      group_by(flights, year, month, day),
      arr_delay, dep_delay
    ),
    arr = mean(arr_delay, na.rm = TRUE),
    dep = mean(dep_delay, na.rm = TRUE)
  ),
  arr > 30 | dep > 30
)

Slide29

The Pipe

flights %>%
  group_by(year, month, day) %>%
  select(arr_delay, dep_delay) %>%
  summarise(
    arr = mean(arr_delay, na.rm = TRUE),
    dep = mean(dep_delay, na.rm = TRUE)
  ) %>%
  filter(arr > 30 | dep > 30)

Slide30

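The Pipe

births$sex %>% hist()

starwars %>% filter(mass > 100)

# films is a list column, so test membership inside each character's film list
starwars %>%
  filter(sapply(films, function(f) "Revenge of the Sith" %in% f))

# How new sf in GIS works....

Slide31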

The Pipe

planet_bmi = starwars %>%
  group_by(homeworld) %>%
  summarise_at(c("height", "mass"), mean, na.rm = TRUE) %>%
  mutate(bmi = mass / (height/100)^2)

Slide32

The Pipe

More complex piping here:

https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html

Slide33

Tidy Data

wide? long?

Slide34

tidyr

Slide35

tidyr

Slide36

tidyr

Most common: gather(), spread()

Less common: separate(), unite()

Slide37

a = starwars %>%
  gather("num", "val", height, mass, birth_year)

b = a %>%
  spread(num, val)

Slide38
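A note on the gather() / spread() pair above: newer tidyr releases (1.0.0 and later, as far as I know) offer pivot_longer() and pivot_wider() as the recommended replacements, while gather() and spread() still work. A minimal round trip on made-up data (wide is hypothetical) might look like this:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in "wide" form: one row per person
wide <- tibble(
  name   = c("Luke", "Leia"),
  height = c(172, 150),
  mass   = c(77, 49)
)

# wide -> long: one row per person-measurement (like gather())
long <- wide %>%
  pivot_longer(c(height, mass), names_to = "num", values_to = "val")

# long -> wide again (like spread())
back <- long %>%
  pivot_wider(names_from = num, values_from = val)

long
back
```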

Advanced Concepts

Things we’re not covering, but you should know exist

http://dplyr.tidyverse.org/reference/index.html

Slide39

Working with Databases

Package dbplyr

Translates your dplyr into SQL code to send to a connection

Try it out if you have access to a server!

https://github.com/tidyverse/dbplyr

https://github.com/tidyverse/dbplyr/blob/master/vignettes/dbplyr.Rmd
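If you do not have a server handy, a throwaway in-memory SQLite database is one way to try this. The sketch below assumes the DBI, RSQLite and dbplyr packages are installed; the table name starwars_small is made up, and the list columns are dropped because they cannot be stored in a database table.

```r
library(dplyr)
library(dbplyr)

# In-memory SQLite database (assumes DBI and RSQLite are installed)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Copy a flat subset of starwars into the database
sw_db <- copy_to(
  con,
  select(starwars, name, height, mass, homeworld),
  "starwars_small"
)

# Ordinary dplyr code; nothing is computed in R yet
query <- sw_db %>%
  filter(homeworld == "Tatooine") %>%
  summarise(avg_height = mean(height, na.rm = TRUE))

show_query(query)          # print the SQL that dplyr generated
result <- collect(query)   # run it in the database and pull the result into R

DBI::dbDisconnect(con)
```

The work happens in the database until collect() is called, which is the point of the “smart & efficient” remark earlier in the deck.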

Slide40

Non-Standard Evaluation

It’s why unquoted column names work

It gets really hairy

Use case: What if you want to “program” with dplyr?

Slide41
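On the “program with dplyr” use case above: with a reasonably recent dplyr / rlang, two common patterns are the embrace operator {{ }} for column names passed unquoted, and the .data pronoun for column names passed as strings. The helper names mean_by and mean_by_name and the toy data are hypothetical, shown only as a sketch.

```r
library(dplyr)

# Hypothetical helper: caller passes bare column names, {{ }} forwards them
mean_by <- function(data, group_var, value_var) {
  data %>%
    group_by({{ group_var }}) %>%
    summarise(avg = mean({{ value_var }}, na.rm = TRUE))
}

# Hypothetical helper: caller passes column names as strings, .data looks them up
mean_by_name <- function(data, group_name, value_name) {
  data %>%
    group_by(.data[[group_name]]) %>%
    summarise(avg = mean(.data[[value_name]], na.rm = TRUE))
}

# Made-up data to try both
toy <- tibble(g = c("x", "x", "y"), v = c(1, 3, 10))

res_bare   <- mean_by(toy, g, v)
res_string <- mean_by_name(toy, "g", "v")
```

The same helpers should work on starwars, e.g. mean_by(starwars, homeworld, height).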

Integration w/ other packages

%>% passes objects (often data) along as the first argument of the next function.

What have we seen recently that starts with data?

Slide42

What does this do?

# Instaggplot
starwars %>%
  group_by(homeworld) %>%
  summarise_at(c("height", "mass"), mean, na.rm = TRUE) %>%
  mutate(bmi = mass / (height/100)^2) %>%
  ggplot(aes(homeworld, bmi, fill = homeworld)) +
  geom_col(show.legend = FALSE) +
  coord_flip()

Slide43

Putting it all together

Back to births

Slide44

Let’s Try

What are the mean and sd of weeks of gestation by race-ethnicity group?

Construct a dplyr “sentence” to look at county-specific effects on preterm and pnc5. (HW3!)

Slide45

Answers

births %>%
  left_join(data.frame(mrace = 1:4, race_f = c("W", "B", "AI/AN", "O"))) %>%
  group_by(race_f, methnic) %>%
  summarise(
    avg_gest = mean(wksgest, na.rm = TRUE),
    gest_sd = sd(wksgest, na.rm = TRUE),
    n = n()) %>%
  mutate(
    ci_low = avg_gest - 0.5*1.96*gest_sd,
    ci_high = avg_gest + 0.5*1.96*gest_sd) %>%
  arrange(avg_gest) %>%
  filter(methnic != "U" & race_f != "O") %>%
  unite(raceeth, race_f, methnic, sep = ".") %>%
  ggplot(aes(raceeth, avg_gest, fill = avg_gest)) +
  geom_col() +
  geom_linerange(aes(x = raceeth, ymin = ci_low, ymax = ci_high), color = "grey") +
  geom_text(aes(label = round(avg_gest, 1)), nudge_y = 1)

#Q2: See you on Wednesday! Homework 3!
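A note on ci_low / ci_high in the answer above: the interval there is avg_gest ± 0.5*1.96*gest_sd, which does not use n. If the intent is a normal-approximation 95% confidence interval for each group mean (an assumption on my part, not something the slide states), a common form is mean ± 1.96 * sd / sqrt(n). A sketch on made-up data standing in for births:

```r
library(dplyr)

# Hypothetical toy data standing in for the course's births file
toy <- tibble(
  grp     = c("a", "a", "a", "a", "b", "b", "b", "b"),
  wksgest = c(38, 39, 40, 41, 36, 38, 40, 42)
)

gest_ci <- toy %>%
  group_by(grp) %>%
  summarise(
    avg_gest = mean(wksgest, na.rm = TRUE),
    gest_sd  = sd(wksgest, na.rm = TRUE),
    n        = sum(!is.na(wksgest))   # count only non-missing values
  ) %>%
  mutate(
    se      = gest_sd / sqrt(n),      # standard error of the mean
    ci_low  = avg_gest - 1.96 * se,
    ci_high = avg_gest + 1.96 * se
  )

gest_ci
```

The resulting ci_low and ci_high columns can be passed to geom_linerange() exactly as in the answer.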