Uploaded On 2017-01-14

Slide1

R workshop for Advanced users

Sohee Kang, PhD

Math and Stats Learning Centre & Computer and Mathematical Sciences

Slide2

Subscripting

Subscripting can be used to access and manipulate the elements of objects like vectors, matrices, arrays, data frames and lists.

Subscripting operations are fast and efficient, and should be the preferred method when dealing with data in R.

Slide3

Numeric subscripts

In R, the first element of an object has subscript 1.

A vector of subscripts can be used to access multiple elements of an object.

> x <- 1:10
> x
 [1]  1  2  3  4  5  6  7  8  9 10
> x[c(1,3,5)]
[1] 1 3 5

Negative subscripts extract all elements of an object except the ones specified.

> x[-c(1,3,5)]
[1]  2  4  6  7  8  9 10

Slide4

Logical Testing

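==        equals
>, <      greater than, less than
>=, <=    greater than or equal to, less than or equal to
!         not
&, &&     and (single is element-by-element, double is first element)
|, ||     or (same distinction between single and double)

> x <- 1:10
> x < 5 & x > 2
 [1] FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
> x < 5 && x > 2
[1] FALSE

Note: The last result comes from older versions of R, where && and || silently used only the first element of each operand. Since R 4.3.0, giving && or || an operand of length greater than one is an error, so use them only with single logical values.

Slide5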
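As a small sketch of the operators from the preceding slide (the variable names are illustrative, not from the workshop), the element-wise operators can be combined with any() and all() when a single TRUE/FALSE is needed, which avoids handing a vector to &&:

```r
x <- 1:10

# Element-wise comparison: one TRUE/FALSE per element of x
in_range <- x < 5 & x > 2

# Collapse the logical vector to a single value
any_in_range <- any(in_range)   # is at least one element between 2 and 5?
all_in_range <- all(in_range)   # are all elements between 2 and 5?

# && is safe when both sides are single logical values
first_is_one <- length(x) > 0 && x[1] == 1

print(which(in_range))
print(c(any = any_in_range, all = all_in_range, first = first_is_one))
```

Because && stops as soon as the left side is FALSE, the check on length(x) protects the x[1] comparison from an empty vector.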

logical subscripts

We can use logical values to choose which elements of the object to access. Elements corresponding to TRUE in the logical vector are included, and elements corresponding to FALSE are ignored.

> x <- 1:10; names(x) <- letters[1:10]
> x > 5
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
> x[x > 5]
 f  g  h  i  j
 6  7  8  9 10

# using logical subscript to modify the object
> x[x > 5] <- 0
> x
a b c d e f g h i j
1 2 3 4 5 0 0 0 0 0

Slide6

subscripting multidimensional objects

For multidimensional objects, subscripts can be provided for each dimension.

To select all elements of a given dimension, use the "empty" subscript.

> mat <- matrix(1:12, 3, 4, byrow=TRUE)
> mat
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
> mat[5]
[1] 6
> mat[2,2]
[1] 6
> mat[1, ]
[1] 1 2 3 4
> mat[c(1,3),]
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    9   10   11   12

Slide7

Exercise for Vectors

Create a vector of the positive odd integers less than 100

Remove the values greater than 60 and less than 80

Find the variance of the remaining set of values

Slide8
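One possible solution to the vector exercise on the preceding slide is sketched below. It reads "greater than 60 and less than 80" as strictly between the two bounds, which is an assumption, and the variable names are illustrative.

```r
# Positive odd integers less than 100
odd <- seq(1, 99, by = 2)

# Drop the values strictly between 60 and 80 by keeping the complement
kept <- odd[!(odd > 60 & odd < 80)]

# Sample variance of what remains
kept_var <- var(kept)

print(kept)
print(kept_var)
```

The same removal could be written with a negative subscript, for example odd[-which(odd > 60 & odd < 80)], but the logical form also behaves sensibly when nothing matches.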

the order() function

sorting rows of a matrix arbitrarily

# sort the iris data frame by Sepal.Length
> iris.sort <- iris[order(iris[,"Sepal.Length"]),]
> head(iris.sort)
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
14          4.3         3.0          1.1         0.1  setosa
9           4.4         2.9          1.4         0.2  setosa
39          4.4         3.0          1.3         0.2  setosa
43          4.4         3.2          1.3         0.2  setosa
42          4.5         2.3          1.3         0.3  setosa
4           4.6         3.1          1.5         0.2  setosa

Try to sort the iris data frame in decreasing order with respect to Sepal.Width.

Slide9
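For the sorting task posed on the preceding slide, a minimal sketch might look like the following; the object names iris.desc and iris.desc2 are made up for illustration.

```r
# Sort rows by Sepal.Width, largest first
iris.desc <- iris[order(iris$Sepal.Width, decreasing = TRUE), ]
print(head(iris.desc))

# For a numeric key, negating it gives the same ordering of values
iris.desc2 <- iris[order(-iris$Sepal.Width), ]
```

order() returns row positions rather than sorted values, which is why it is used inside the row subscript of the data frame.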

combined selections for matrices

Suppose we want to get all the columns for which the element at the first row is less than 3:

> mat[ , mat[1, ] < 3]
     [,1] [,2]
[1,]    1    2
[2,]    5    6
[3,]    9   10

Slide10
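One detail worth knowing about the combined selection on the preceding slide: when only a single column (or row) matches, R simplifies the result to a plain vector unless drop = FALSE is given. The sketch below uses the same mat; the other names are illustrative.

```r
mat <- matrix(1:12, 3, 4, byrow = TRUE)

# Two columns match: the result is still a matrix
two_cols <- mat[, mat[1, ] < 3]

# Only one column matches: by default the result is dropped to a vector
one_col_vec <- mat[, mat[1, ] < 2]

# drop = FALSE keeps the matrix shape
one_col_mat <- mat[, mat[1, ] < 2, drop = FALSE]

print(dim(two_cols))
print(is.matrix(one_col_vec))
print(dim(one_col_mat))
```

Keeping the matrix shape matters when later code calls functions such as nrow() or ncol() on the result.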

complex logical expressions

subscripting data frames

> dat <- data.frame(a = seq(5, 20, by=3), b = c(8, NA, 12, 15, NA, 21))
> dat
   a  b
1  5  8
2  8 NA
3 11 12
4 14 15
5 17 NA
6 20 21
> dat[dat$b < 10, ]
      a  b
1     5  8
NA   NA NA
NA.1 NA NA
>
> # removing the missing values
> dat[!is.na(dat$b) & (dat$b < 10), ]
  a b
1 5 8

Slide11

the function subset()

subscripting data frames

The function subset() allows one to perform selections of the elements in a data frame in a very simple way.

> dat <- data.frame(a = seq(5, 20, by=3), b = c(8, NA, 12, 15, NA, 21))
> subset(dat, b < 10)
  a b
1 5 8

Note: The subset() function always returns a new data frame, matrix or vector, and is not adequate for modifying elements of a data frame.

Slide12
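To illustrate the note on the preceding slide, the sketch below contrasts selecting with subset() and modifying through a subscript. The names big_b and dat_fixed are illustrative, and replacing missing values with 0 is only an example, not a recommendation from the workshop.

```r
dat <- data.frame(a = seq(5, 20, by = 3), b = c(8, NA, 12, 15, NA, 21))

# subset() returns a new data frame; rows where the condition is NA are
# dropped, and `select` chooses the columns to keep
big_b <- subset(dat, b > 10, select = c(a, b))

# To change values, assign through a subscript on a copy (or the original)
dat_fixed <- dat
dat_fixed$b[is.na(dat_fixed$b)] <- 0

print(big_b)
print(dat_fixed)
```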

Data Selection and Manipulation

sample(x, size, replace, prob)          take a random sample from x
%in%                                    return logical vector of matches
which(x)                                return index of TRUE results
all(…), any(…)                          return TRUE if all or any arguments are TRUE
unique(x)                               return unique observations in vector
duplicated(x)                           return logical vector flagging duplicated observations
sort                                    sort vector or factor
order                                   return the sorting order, based on one or multiple arguments
merge()                                 merge two data frames by common cols or rows
ceiling, floor, trunc, round, signif    rounding functions

Slide13
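A few of the functions listed on the preceding slide are sketched below on small made-up inputs; the vectors and data frames are illustrative only.

```r
x <- c(3, 1, 2, 3, 1)

print(unique(x))               # distinct values, in order of first appearance
print(duplicated(x))           # TRUE where a value has already been seen
print(sort(x))                 # the sorted values
print(order(x))                # the positions that would sort x
print(which(x %in% c(1, 2)))   # positions of the values that match

# merge() joins two data frames on their common column(s), here "id"
left  <- data.frame(id = c(1, 2, 3), height = c(1.7, 1.8, 1.6))
right <- data.frame(id = c(2, 3, 4), weight = c(80, 55, 70))
both  <- merge(left, right)
print(both)

# Rounding helpers
rounded <- c(ceiling(2.3), floor(2.7), trunc(-2.7), round(2.567, 1), signif(123456, 2))
print(rounded)
```

By default merge() keeps only the ids present in both data frames; its all, all.x and all.y arguments change that.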

%in%, which

> x <- sample(1:10, 20, replace = TRUE)

> x

[1] 4 10 2 3 4 3 6 4 7 3 9 1 3 4 7 1 3 2 8

[20] 5

> x %in% c(3, 10, 2, 1)
 [1] FALSE TRUE TRUE TRUE FALSE TRUE FALSE FALSE FALSE
[10] TRUE FALSE TRUE TRUE FALSE FALSE TRUE TRUE TRUE
[19] FALSE FALSE

> x[x %in% c(3, 10, 2, 1)]
 [1] 10 2 3 3 3 1 3 1 3 2

> which(x %in% c(3, 10, 2, 1))
 [1] 2 3 4 6 10 12 13 16 17 18

Slide14

Exercise 1

Slide15

Loops and Functions

A loop allows the program to repeatedly execute commands. Loops are common to many programming languages and their use may facilitate the implementation of many operations.

There are three kinds of loops in R:

`for' loops
`while' loops
`repeat' loops

Note: Loops can be very inefficient in R. For that reason, their use is not advised, unless necessary.

Slide16
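To make the efficiency note on the preceding slide concrete, here is a sketch of the same computation written as a loop and as a single vectorised call. The size n and the variable names are arbitrary choices for illustration.

```r
n <- 10000

# Loop version: fill a preallocated vector one element at a time
roots_loop <- numeric(n)
for (i in seq_len(n)) {
  roots_loop[i] <- sqrt(i)
}

# Vectorised version: one call on the whole vector
roots_vec <- sqrt(seq_len(n))

print(identical(roots_loop, roots_vec))
```

Both versions give the same values; the vectorised form is shorter and leaves the iteration to R's internal code, and system.time() can be wrapped around either version to compare them on a given machine.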

`for' loops

General form:

for (variable in sequence) {

set_of_expressions

}

> for(i in 1:10) {print(sqrt(i))}
[1] 1
[1] 1.414214
[1] 1.732051
...
[1] 3.162278

Slide17

Easy Example:

col.v <- rainbow(100)
cex.v <- seq(1, 10, length.out=100)
plot(0:1, 0:1, type="n")
for(i in 1:200){
  print(i)
  points(x=runif(1), y=runif(1), pch=16, col=sample(col.v, size=1), cex=sample(cex.v, size=1))
  Sys.sleep(0.1)
}

Slide18

`while' loops

General form:

while (condition) {

set_of_expressions

}

> a <- 0; b <- 1
> while(b < 10) {
    print(b)
    temp <- a+b
    a <- b
    b <- temp
  }
[1] 1
[1] 1
[1] 2
[1] 3
[1] 5
[1] 8

Slide19

`repeat' loops

General form:

repeat {
  set_of_expressions
  if (condition) { break }
}

> a <- 0; b <- 1
> repeat {
    print(b)
    temp <- a+b
    a <- b
    b <- temp
    if(b>=10){break}
  }
[1] 1
[1] 1
[1] 2
[1] 3
[1] 5
[1] 8

Note: repeat takes no condition of its own. The loop is terminated by the break command.

Slide20

cleaning the mess

To have a cleaner version when working with loops, we can do:

# Arithmetic Progression

> x <- 1; d <- 2

> while (length(x) < 10) {
    position <- length(x)
    new <- x[position]+d
    x <- c(x,new)
  }
> print(x)
 [1]  1  3  5  7  9 11 13 15 17 19

Slide21

writing functions

A function is a collection of commands that perform a specific task.

General form:

function.name <- function (arguments){
  set_of_expressions
  return (answer)
}

Slide22

writing functions

Example: Arithmetic Progression

> AP <- function(a, d, n){
    x <- a
    while (length(x) < n){
      position <- length(x)
      new <- x[position]+d
      x <- c(x, new)
    }
    return(x)
  }

Slide23

writing functions

Once you run this code, you will have available a new function called AP.

To run the function, type on the console:

> AP(1,2,10)

[1] 1 3 5 7 9 11 13 15 17 19

> AP(1,0,10)
 [1] 1 1 1 1 1 1 1 1 1 1

Note that for d==0 the function is returning a sequence of ones. We can easily fix this with an if statement.

Slide24

the `if' statement

General form:

if (condition) {

set_of_expressions

}

We can also combine the `if' with the `else' statement:

if (condition) {
  set_of_expressions
} else {
  set_of_expressions
}

Slide25

the `if' statement

> AP <- function(a, d, n){
    if(d == 0) {
      # return() already leaves the function, so no break is needed here
      return("Error: argument `d' should not be 0")
    } else {
      x <- a
      while (length(x) < n){
        position <- length(x)
        new <- x[position]+d
        x <- c(x, new)
      }
      return(x)
    }
  }
> AP(1, 0, 3)
[1] "Error: argument `d' should not be 0"

Slide26
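The version on the preceding slide reports the problem by returning a character string, which a caller could mistake for a valid result. A hedged alternative is sketched below: it signals a real error with stop() and builds the progression with seq() instead of growing a vector. AP2 is a hypothetical name, and the sketch assumes n is at least 1.

```r
AP2 <- function(a, d, n) {
  if (d == 0) {
    stop("argument `d' should not be 0")
  }
  # seq() builds the whole progression in one call
  seq(from = a, by = d, length.out = n)
}

print(AP2(1, 2, 10))

# tryCatch() lets the caller handle the error instead of halting the script
result <- tryCatch(AP2(1, 0, 3), error = function(e) conditionMessage(e))
print(result)
```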

Exercise 2

1. Generate 10000 exponential random numbers with rate = 0.1
2. Check the distribution by histogram
3. Check the mean and standard deviation
4. Using a "for" loop, generate 1000 random samples of size = 30 from (1), compute the mean of each sample and store the means.
5. Check the distribution of sample means from (4)

Slide27
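One way Exercise 2 on the preceding slide could be approached is sketched here. The seed and all variable names are arbitrary, sampling is done without replacement (an assumption, since the exercise does not say), and hist() is called with plot = FALSE so the sketch also runs without a graphics window.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# 1. 10000 exponential random numbers with rate 0.1
pop <- rexp(10000, rate = 0.1)

# 2. Histogram (plot = FALSE only computes the bins; drop it to draw the plot)
pop_hist <- hist(pop, plot = FALSE)

# 3. Mean and standard deviation (both should be near 1 / rate = 10)
pop_mean <- mean(pop)
pop_sd <- sd(pop)

# 4. 1000 samples of size 30 from pop, storing each sample mean
sample_means <- numeric(1000)
for (i in 1:1000) {
  sample_means[i] <- mean(sample(pop, size = 30))
}

# 5. Distribution of the sample means
means_hist <- hist(sample_means, plot = FALSE)
print(c(mean = pop_mean, sd = pop_sd, mean_of_means = mean(sample_means)))
```

Preallocating sample_means with numeric(1000) avoids growing the vector inside the loop.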

Graphics

R offers an incredible variety of graphs. Type this code to get a sense of what is possible:

demo(graphics)

x <- 10*(1:nrow(volcano))
y <- 10*(1:ncol(volcano))
image(x, y, volcano, col = terrain.colors(100), axes = FALSE)
contour(x, y, volcano, levels = seq(90, 200, by = 5), add = TRUE, col = "peru")
axis(1, at = seq(100, 800, by = 100))
axis(2, at = seq(100, 600, by = 100))
box()
title(main = "Maunga Whau Volcano", font.main = 4)

http://cran.r-project.org/web/views/Graphics.html

Slide28

Managing graphics and graphical devices: opening multiple graphical devices

data(iris)
plot(iris$Sepal.Length, iris$Sepal.Width, pch=19)
dev.new()
plot(iris$Sepal.Length, iris$Petal.Length, pch=19)
# you can also use "X11()", but it may not work on some Mac computers

Slide29

jpeg(file="SepalLength_vs_SepalWidth.jpeg")
plot(iris$Sepal.Length, iris$Sepal.Width, pch=19)
dev.off() # closes graphical device

png(file="SepalLength_vs_SepalWidth.png")
plot(iris$Sepal.Length, iris$Sepal.Width, pch=19)
dev.off()

pdf(file="SepalLength_vs_SepalWidth.pdf")
plot(iris$Sepal.Length, iris$Sepal.Width, pch=19)
dev.off()

postscript(file="SepalLength_vs_SepalWidth.ps") # often used for publication
plot(iris$Sepal.Length, iris$Sepal.Width, pch=19)
dev.off()

Slide30
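The file devices on the preceding slide all follow the same pattern: open the device, draw, then close it with dev.off(), written with parentheses so that the function is actually called. The sketch below writes to a temporary file, whose path is a made-up location for illustration, and checks that the file was produced.

```r
out_file <- tempfile(fileext = ".pdf")  # hypothetical output location

pdf(file = out_file)
plot(iris$Sepal.Length, iris$Sepal.Width, pch = 19)
dev.off()  # the file is only complete once the device is closed

print(file.exists(out_file))
```

Typing dev.off without parentheses only prints the function and leaves the device open, so the output file may be empty or incomplete.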

High level graphical functions (create a graph)

iris[1:5,]
plot(iris$Sepal.Length)
plot(iris$Sepal.Length, iris$Sepal.Width)
plot(iris$Petal.Length, iris$Petal.Width)
plot(iris$Petal.Length, iris$Petal.Width, xlab="Petal length (cm)", ylab="Petal Width (cm)", cex.axis=1.5, cex.lab=1.5, bty="n", pch=19)
boxplot(iris$Sepal.Length ~ iris$Species)
boxplot(iris$Sepal.Length ~ iris$Species, names=expression(italic("Iris setosa"), italic("Iris versicolor"), italic("Iris virginica")), ylab="Sepal length (cm)", cex.axis=1.5, cex.lab=1.5)
coplot(iris$Petal.Length ~ iris$Petal.Width | iris$Sepal.Length, overlap=0, pch=19)
pairs(iris)
hist(iris$Sepal.Length)

Slide31

Low level graphical functions (affect an existing graph)

plot(iris$Petal.Length, iris$Petal.Width, xlab="Petal length (cm)", ylab="Petal Width (cm)", cex=1.3, cex.axis=1.5, cex.lab=1.5, bty="n")
points(iris$Petal.Length[iris$Species=="setosa"], iris$Petal.Width[iris$Species=="setosa"], cex=1.3, pch=19, col="red")
points(iris$Petal.Length[iris$Species=="versicolor"], iris$Petal.Width[iris$Species=="versicolor"], cex=1.3, pch=19, col="blue")
points(iris$Petal.Length[iris$Species=="virginica"], iris$Petal.Width[iris$Species=="virginica"], cex=1.3, pch=19, col="green")
legend("topleft", c("Iris setosa", "I. versicolor", "I. virginica"), pch=19, col=c("red", "blue", "green"), cex=1.3)
legend("bottomright", c(expression(italic("Iris setosa")), expression(italic("Iris versicolor")), expression(italic("Iris virginica"))), pch=19, col=c("red", "blue", "green"), cex=1.3)
legend(1.3, 1.9, c(expression(italic("Iris setosa")), expression(italic("Iris versicolor")), expression(italic("Iris virginica"))), pch=19, col=c("red", "blue", "green"), cex=1.3)

Slide32

Plotting in 2d

The cars data frame is a two-column data set of car speeds and stopping distances from the 1920s.

head(cars)
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10

Slide33

Plotting in 2d

By default plot() produces a scatterplot

> plot(cars)

Axis labels are from the names in the data frame
Axis scale is from the range of the data

Slide34

Plotting in 2d

To add details it's better to use low-level functions.

plot(cars, type="p")
lines(lowess(cars), col="red")  # lines() adds to the plot; line() fits a robust line instead

Slide35

Combining plots

To show multiple plots in one graphics window use the par() command with the mfrow parameter:

par(mfrow=c(2,2))
plot(cars,type="p")
plot(cars,type="l")
plot(cars,type="h")
plot(cars,type="s")

Slide36
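Changes made with par() stay in effect for later plots on the same device. A common pattern, sketched here for the mfrow example on the preceding slide, is to keep the value returned by par() and restore it afterwards. The pdf(NULL) call is an assumption made so the sketch runs without a visible window; remove it and dev.off() to see the plots on screen.

```r
pdf(NULL)  # null device, so nothing is written to disk

# par() returns the previous settings, which can be restored later
old_par <- par(mfrow = c(2, 2))
for (type in c("p", "l", "h", "s")) {
  plot(cars, type = type, main = paste("type =", type))
}
par(old_par)

restored <- par("mfrow")
print(restored)
dev.off()
```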

Managing graphics and graphical devices: partitioning a graphical device

layout(matrix(1:4, 2, 2)) # see help for "layout"
layout.show(4) # see help for "layout.show"
plot(iris$Sepal.Length, iris$Sepal.Width, pch=19, cex.lab=1.5, cex.axis=1.5, xlab="Sepal length (cm)", ylab="Sepal width (cm)")
plot(iris$Sepal.Length, iris$Petal.Length, pch=19, cex.lab=1.5, cex.axis=1.5, xlab="Sepal length (cm)", ylab="Petal length (cm)")
plot(iris$Sepal.Length, iris$Petal.Width, pch=19, cex.lab=1.5, cex.axis=1.5, xlab="Sepal length (cm)", ylab="Petal width (cm)")
plot(iris$Sepal.Length, iris$Petal.Width, pch=19, type="n", axes=FALSE, bty="n", xlab="", ylab="")
mtext("Sepal length", line=-3, cex=1.5)
mtext("versus", line=-5, cex=1.5)
mtext("other variables", line=-7, cex=1.5)
mtext("in Anderson's Iris", line=-9, cex=1.5)

layout(matrix(1:6, 3, 2))
layout.show(6)
plot(iris$Sepal.Length, iris$Sepal.Width, pch=19)
plot(iris$Sepal.Length, iris$Petal.Length, pch=19)
plot(iris$Sepal.Length, iris$Petal.Width, pch=19)
plot(iris$Sepal.Width, iris$Petal.Length, pch=19)
plot(iris$Sepal.Width, iris$Petal.Width, pch=19)
plot(iris$Petal.Length, iris$Petal.Width, pch=19)

Slide37

Plotting in 3D

Some data is well-suited to three dimensional plots.

The matrix volcano records elevations of the volcano Maunga Whau in New Zealand.

volcano[1:4,1:4]
dim(volcano)
x<-1:87
y<-1:61
# The default is a heat map
image(x,y,volcano)

Slide38

Plotting in 3D

# Changing the color map
image(x,y,volcano,col=terrain.colors(range(volcano)))

Slide39

Plotting in 3D

# A contour map
contour(x,y,volcano)

# A perspective plot
persp(x,y,volcano)

Slide40
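The 3D examples on the preceding slides hard-code the grid as 1:87 and 1:61. A small sketch that derives the coordinates from the matrix itself is given below; the viewing angles theta and phi are arbitrary choices, and pdf(NULL) is an assumption used only so the code runs without opening a window.

```r
# Coordinates taken from the matrix itself instead of hard-coded 1:87 and 1:61
x <- seq_len(nrow(volcano))
y <- seq_len(ncol(volcano))

pdf(NULL)  # null device; remove this line and dev.off() to see the plots
image(x, y, volcano)
contour(x, y, volcano, add = TRUE)          # overlay contour lines on the heat map
persp(x, y, volcano, theta = 30, phi = 30)  # rotate the perspective view
dev.off()
```

Deriving x and y from nrow() and ncol() keeps the code valid if a different elevation matrix is substituted for volcano.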

The package ggplot2

The ggplot2 package was created in 2005 by Hadley Wickham.

It implements the object oriented design ideas of Leland Wilkinson's The Grammar of Graphics.

Slide41

The Package ggplot2

The function qplot() is a ggplot2 version of the regular function plot()

library(ggplot2)
qplot(speed,dist,data=cars)
ggplot(data=cars, aes(x=speed, y=dist))+geom_point()

Slide42

The package ggplot2

Combining geoms can produce more complex plots

library(ggplot2)
library(foreign)
dat <- read.dta("http://www.ats.ucla.edu/stat/data/ologit.dta")
ggplot(dat, aes(x = apply, y = gpa)) + geom_boxplot(size = .75)

Slide43

The package ggplot2

p <- ggplot(dat, aes(x = apply, y = gpa)) + geom_boxplot(size = .75)
p <- p + geom_jitter(alpha = .5)
p