Slide1
R workshop for Advanced users
Sohee Kang, PhD
Math and Stats Learning Centre & Computer and Mathematical Sciences
Slide2
Subscripting
Subscripting can be used to access and manipulate the elements of objects like vectors, matrices, arrays, data frames and lists.
Subscripting operations are fast and efficient, and should be the preferred method when dealing with data in R.
Slide3
Numeric subscripts
In R, the first element of an object has subscript 1.
A vector of subscripts can be used to access multiple elements of an object.
> x <- 1:10
> x
 [1]  1  2  3  4  5  6  7  8  9 10
> x[c(1,3,5)]
[1] 1 3 5
Negative subscripts extract all elements of an object except the ones specified.
> x[-c(1,3,5)]
[1]  2  4  6  7  8  9 10
Slide4
Logical Testing
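==        equals
>, <      greater than, less than
>=, <=    greater than or equal to, less than or equal to
!         not
&, &&     and (single is element-by-element, double tests a single condition)
|, ||     or (single is element-by-element, double tests a single condition)
> x <- 1:10
> x < 5 & x > 2
 [1] FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
> x[1] < 5 && x[1] > 2
[1] FALSE
Note: Older versions of R accepted whole vectors with && and || and silently used only the first element. Since R 4.3.0, passing a vector longer than one is an error.
Slide5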
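A small sketch of where each form of "and" from the logical operators above belongs: the single & builds a logical vector that can be used as a subscript, while the double && is meant for one TRUE/FALSE condition, as in an if test. The variable name in.range is illustrative only.

```r
x <- 1:10

# '&' compares element by element and returns one logical value per element
in.range <- x < 5 & x > 2
x[in.range]

# '&&' expects single values, so reduce the vectors first with any() or all()
if (any(x < 5) && all(x > 0)) {
  message("some values are below 5 and all are positive")
}
```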
logical subscripts
We can use logical values to choose which elements of the object to access. Elements corresponding to TRUE in the logical vector are included, and elements corresponding to FALSE are ignored.
> x <- 1:10; names(x) <- letters[1:10]
> x > 5
    a     b     c     d     e     f     g     h     i     j 
FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE 
> x[x > 5]
 f  g  h  i  j 
 6  7  8  9 10 
# using logical subscript to modify the object
> x[x > 5] <- 0
> x
a b c d e f g h i j 
1 2 3 4 5 0 0 0 0 0 
Slide6
subscripting multidimensional objects
For multidimensional objects, subscripts can be provided for each dimension.
To select all elements of a given dimension, use the "empty" subscript.
A single subscript treats the matrix as one long vector, counted column by column.
> mat <- matrix(1:12, 3, 4, byrow=TRUE)
> mat
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
> mat[5]
[1] 6
> mat[2,2]
[1] 6
> mat[1, ]
[1] 1 2 3 4
> mat[c(1,3),]
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    9   10   11   12
Slide7
Exercise for Vectors
Create a vector of the positive odd integers less than 100
Remove the values greater than 60 and less than 80
Find the variance of the remaining set of values
Slide8
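One possible way to work through the vector exercise above, using only subscripting. The names odds and kept are illustrative, and "greater than 60 and less than 80" is read here as strictly between the two bounds.

```r
# positive odd integers less than 100
odds <- seq(1, 99, by = 2)

# drop the values strictly between 60 and 80 with a negated logical subscript
kept <- odds[!(odds > 60 & odds < 80)]

# variance of what is left
var(kept)
```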
the order() function
sorting rows of a matrix arbitrarily
# sort the iris data frame by Sepal.Length
> iris.sort <- iris[order(iris[,"Sepal.Length"]),]
> head(iris.sort)
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
14          4.3         3.0          1.1         0.1  setosa
9           4.4         2.9          1.4         0.2  setosa
39          4.4         3.0          1.3         0.2  setosa
43          4.4         3.2          1.3         0.2  setosa
42          4.5         2.3          1.3         0.3  setosa
4           4.6         3.1          1.5         0.2  setosa
Try to sort the iris data frame in decreasing order with respect to Sepal.Width.
Slide9
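For the sorting exercise on Sepal.Width, a minimal sketch: order() accepts a decreasing argument, and extra keys can be supplied to break ties. The object names iris.desc and iris.desc2 are illustrative.

```r
# order() returns row positions; decreasing = TRUE reverses the sort direction
iris.desc <- iris[order(iris$Sepal.Width, decreasing = TRUE), ]
head(iris.desc)

# ties can be broken with a second key; here Sepal.Length, also decreasing
iris.desc2 <- iris[order(iris$Sepal.Width, iris$Sepal.Length, decreasing = TRUE), ]
```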
combined selections for matrices
Suppose we want to get all the columns for which the element at the first row is less than 3:
> mat[ , mat[1, ] < 3]
     [,1] [,2]
[1,]    1    2
[2,]    5    6
[3,]    9   10
Slide10
complex logical expressions
subscripting data frames
> dat <- data.frame(a = seq(5, 20, by=3), b = c(8, NA, 12, 15, NA, 21))
> dat
   a  b
1  5  8
2  8 NA
3 11 12
4 14 15
5 17 NA
6 20 21
> dat[dat$b < 10, ]
      a  b
1     5  8
NA   NA NA
NA.1 NA NA
> # removing the missing values
> dat[!is.na(dat$b) & (dat$b < 10), ]
  a b
1 5 8
Slide11
the function subset()
subscripting data frames
The function subset() allows one to perform selections of the elements in a data frame in a very simple way. Rows for which the condition is NA are dropped.
> dat <- data.frame(a = seq(5, 20, by=3), b = c(8, NA, 12, 15, NA, 21))
> subset(dat, b < 10)
  a b
1 5 8
Note: The subset() function always returns a new data frame, matrix or vector, and is not adequate for modifying elements of a data frame.
Slide12
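Because subset() returns a copy, changes to a data frame go through subscript assignment instead. A minimal sketch on the same dat object; the replacement values (0 and the column mean) are arbitrary choices for illustration.

```r
dat <- data.frame(a = seq(5, 20, by = 3), b = c(8, NA, 12, 15, NA, 21))

# subset() only reads: the result is a copy and dat itself is unchanged
small <- subset(dat, b < 10)

# to modify in place, assign through a subscript;
# which() keeps only the TRUE positions, so NA comparisons are not selected
rows <- which(dat$b < 10)
dat$a[rows] <- 0

# missing values in b can be replaced the same way
dat$b[is.na(dat$b)] <- mean(dat$b, na.rm = TRUE)
dat
```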
Data Selection and Manipulation
sample(x, size, replace, prob)          take a random sample from x
%in%                                    return logical vector of matches
which(x)                                return index of TRUE results
all(…), any(…)                          return TRUE if all or any arguments are TRUE
unique(x)                               return unique observations in vector
duplicated(x)                           return logical vector marking duplicated observations
sort                                    sort vector or factor
order                                   return the ordering of rows, based on one or more arguments
merge()                                 merge two data frames by common cols or rows
ceiling, floor, trunc, round, signif    rounding functions
Slide13
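A few of the data selection functions listed above, applied to small made-up inputs. The vectors and data frames here (v, left, right) are invented for illustration only.

```r
v <- c(3, 1, 2, 3, 1)

unique(v)          # distinct values, in order of first appearance
duplicated(v)      # TRUE where a value has already been seen
v[!duplicated(v)]  # same result as unique(v)
any(v > 2)
all(v > 0)

# merge() joins two data frames on their shared column (here "id")
left  <- data.frame(id = c(1, 2, 3), height = c(150, 160, 170))
right <- data.frame(id = c(2, 3, 4), weight = c(55, 65, 75))
both  <- merge(left, right, by = "id")
both

# the rounding family applied to the same number
z <- 2.567
rounded <- c(ceiling(z), floor(z), trunc(z), round(z, 1), signif(z, 2))
rounded
```

By default merge() keeps only the ids present in both data frames; the arguments all, all.x and all.y change that.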
%in%, which
> x <- sample(1:10, 20, replace = TRUE)
> x
[1] 4 10 2 3 4 3 6 4 7 3 9 1 3 4 7 1 3 2 8
[20] 5
> x %in% c(3, 10, 2, 1)
 [1] FALSE  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE
[10]  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE
[19] FALSE FALSE
> x[x %in% c(3, 10, 2, 1)]
 [1] 10  2  3  3  3  1  3  1  3  2
> which(x %in% c(3, 10, 2, 1))
 [1]  2  3  4  6 10 12 13 16 17 18
Slide14
Exercise 1
Slide15
Loops and Functions
A loop allows the program to repeatedly execute commands. Loops are common to many programming languages and their use may facilitate the implementation of many operations.
There are three kinds of loops in R:
`for' loops
`while' loops
`repeat' loops
Note: Loops can be very inefficient in R compared with vectorized operations. For that reason, their use is not advised unless necessary.
Slide16
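To illustrate the note on loop efficiency, here is a sketch that computes the same result twice: once with a loop that grows a vector, once with a single vectorized expression. The task (squaring the integers 1 to n) and the value of n are arbitrary.

```r
n <- 10000

# loop version: grows the result one element at a time
squares.loop <- c()
for (i in 1:n) {
  squares.loop <- c(squares.loop, i^2)
}

# vectorized version: one expression operates on the whole vector
squares.vec <- (1:n)^2

# both give the same numbers
all.equal(squares.loop, squares.vec)
```

Wrapping either version in system.time() shows the difference in run time on your own machine.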
`for' loops
General form:
for (variable in sequence) {
set_of_expressions
}
> for(i in 1:10) {print(sqrt(i))}
[1] 1
[1] 1.414214
[1] 1.732051
...
[1] 3.162278
Slide17
Easy Example:
col.v <- rainbow(100)
cex.v <- seq(1, 10, length.out=100)
plot(0:1, 0:1, type="n")
for(i in 1:200){
  print(i)
  points(x=runif(1), y=runif(1), pch=16, col=sample(col.v, size=1), cex=sample(cex.v, size=1))
  Sys.sleep(0.1)
}
Slide18
`while' loops
General form:
while (condition) {
set_of_expressions
}
> a <- 0; b <- 1
> while(b < 10) {
    print(b)
    temp <- a+b
    a <- b
    b <- temp
  }
[1] 1
[1] 1
[1] 2
[1] 3
[1] 5
[1] 8
Slide19
`repeat' loops
General form:
repeat {
set_of_expressions
if (condition) { break }
}
> a <- 0; b <- 1
> repeat {
    print(b)
    temp <- a+b
    a <- b
    b <- temp
    if(b>=10){break}
  }
[1] 1
[1] 1
[1] 2
[1] 3
[1] 5
[1] 8
Note: A repeat loop has no condition of its own. The loop is terminated by the break command.
Slide20
cleaning the mess
To have a cleaner version when working with loops, we can do:
# Arithmetic Progression
> x <- 1; d <- 2
> while (length(x) < 10) {
    position <- length(x)
    new <- x[position]+d
    x <- c(x,new)
  }
> print(x)
 [1]  1  3  5  7  9 11 13 15 17 19
Slide21
writing functions
A function is a collection of commands that perform a specific task.
General form:
function.name <- function (arguments){
set_of_expressions
return (answer)
}
Slide22
writing functions
Example: Arithmetic Progression
> AP <- function(a, d, n){
    x <- a
    while (length(x) < n){
      position <- length(x)
      new <- x[position]+d
      x <- c(x, new)
    }
    return(x)
  }
Slide23
writing functions
Once you run this code, you will have available a new function called AP.
To run the function, type on the console:
> AP(1,2,10)
[1] 1 3 5 7 9 11 13 15 17 19
> AP(1,0,10)
 [1] 1 1 1 1 1 1 1 1 1 1
Note that for d==0 the function is returning a sequence of ones.
We can easily fix this with an if statement.
Slide24
the `if' statement
General form:
if (condition) {
set_of_expressions
}
We can also combine the `if' with the `else' statement:
if (condition) {
set_of_expressions
} else {
set_of_expressions
}
Slide25
the `if' statement
> AP <- function(a, d, n){
    if(d == 0) {
      return("Error: argument `d' should not be 0")
    } else {
      x <- a
      while (length(x) < n){
        position <- length(x)
        new <- x[position]+d
        x <- c(x, new)
      }
      return(x)
    }
  }
> AP(1, 0, 3)
[1] "Error: argument `d' should not be 0"
Slide26
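Returning the message as a string means the caller of AP receives a character value where a numeric vector was expected. A common alternative in R, sketched here under the hypothetical name AP2 (not from the slides), is to signal the problem with stop(), which halts the function and can be caught with tryCatch().

```r
AP2 <- function(a, d, n) {
  if (d == 0) {
    stop("argument 'd' should not be 0")
  }
  x <- a
  while (length(x) < n) {
    x <- c(x, x[length(x)] + d)
  }
  x
}

AP2(1, 2, 5)

# tryCatch() lets the caller decide what to do when the error is raised;
# here the error message is simply captured as a string
result <- tryCatch(AP2(1, 0, 3), error = function(e) conditionMessage(e))
result
```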
Exercise 2
(1) Generate 10000 exponential random numbers with rate = 0.1
(2) Check the distribution by histogram
(3) Check the mean and standard deviation
(4) Using a "for" loop, generate 1000 random samples of size = 30 from (1), compute the mean of each sample and store the means.
(5) Check the distribution of sample means from (4)
Slide27
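One possible outline for Exercise 2, offered as a sketch and not as the official solution. The seed and the object names (pop, means, n.samples) are arbitrary, and the samples in step (4) are drawn without replacement, which is an assumption.

```r
set.seed(1)  # arbitrary seed so the run can be repeated

# (1) 10000 exponential random numbers with rate 0.1
pop <- rexp(10000, rate = 0.1)

# (2) and (3): histogram, mean and standard deviation
hist(pop)
mean(pop)
sd(pop)

# (4) 1000 samples of size 30, storing each sample mean
n.samples <- 1000
means <- numeric(n.samples)   # pre-allocate instead of growing the vector
for (i in 1:n.samples) {
  means[i] <- mean(sample(pop, size = 30))
}

# (5) distribution of the sample means
hist(means)
```

For reference, an exponential distribution with rate 0.1 has mean and standard deviation both equal to 10, so the values in step (3) should be close to 10.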
Graphics
R offers an incredible variety of graphs. Type this code to get a sense of what is possible:
demo(graphics)
x <- 10*(1:nrow(volcano))
y <- 10*(1:ncol(volcano))
image(x, y, volcano, col = terrain.colors(100), axes = FALSE)
contour(x, y, volcano, levels = seq(90, 200, by = 5), add = TRUE, col = "peru")
axis(1, at = seq(100, 800, by = 100))
axis(2, at = seq(100, 600, by = 100))
box()
title(main = "Maunga Whau Volcano", font.main = 4)
http://cran.r-project.org/web/views/Graphics.html
Slide28
Managing graphics and graphical devices: opening multiple graphical devices
data(iris)
plot(iris$Sepal.Length, iris$Sepal.Width, pch=19)
dev.new()
plot(iris$Sepal.Length, iris$Petal.Length, pch=19)
#you can also use "X11()", but it may not work on some Mac computers
Slide29
jpeg(file="SepalLength_vs_SepalWidth.jpeg")
plot(iris$Sepal.Length, iris$Sepal.Width, pch=19)
dev.off() #closes graphical device
png(file="SepalLength_vs_SepalWidth.png")
plot(iris$Sepal.Length, iris$Sepal.Width, pch=19)
dev.off()
pdf(file="SepalLength_vs_SepalWidth.pdf")
plot(iris$Sepal.Length, iris$Sepal.Width, pch=19)
dev.off()
postscript(file="SepalLength_vs_SepalWidth.ps") #often used for publication
plot(iris$Sepal.Length, iris$Sepal.Width, pch=19)
dev.off()
Slide30
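The file devices above all follow the same pattern: open the device, draw, close it. A minimal sketch that writes to a temporary file and confirms the result; the use of tempfile() and the width and height values are choices made for this illustration.

```r
# tempfile() gives a throwaway path; replace it with your own file name
out <- tempfile(fileext = ".pdf")

pdf(file = out, width = 6, height = 4)   # size in inches
plot(iris$Sepal.Length, iris$Sepal.Width, pch = 19)
dev.off()                                # the file is only complete once the device is closed

file.exists(out)
```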
High level graphical functions (create a graph)
iris[1:5,]
plot(iris$Sepal.Length)
plot(iris$Sepal.Length, iris$Sepal.Width)
plot(iris$Petal.Length, iris$Petal.Width)
plot(iris$Petal.Length, iris$Petal.Width, xlab="Petal length (cm)", ylab="Petal width (cm)", cex.axis=1.5, cex.lab=1.5, bty="n", pch=19)
boxplot(iris$Sepal.Length ~ iris$Species)
boxplot(iris$Sepal.Length ~ iris$Species, names=expression(italic("Iris setosa"), italic("Iris versicolor"), italic("Iris virginica")), ylab="Sepal length (cm)", cex.axis=1.5, cex.lab=1.5)
coplot(iris$Petal.Length ~ iris$Petal.Width | iris$Sepal.Length, overlap=0, pch=19)
pairs(iris)
hist(iris$Sepal.Length)
Slide31
Low level graphical functions (affect an existing graph)
plot(iris$Petal.Length, iris$Petal.Width, xlab="Petal length (cm)", ylab="Petal width (cm)", cex=1.3, cex.axis=1.5, cex.lab=1.5, bty="n")
points(iris$Petal.Length[iris$Species=="setosa"], iris$Petal.Width[iris$Species=="setosa"], cex=1.3, pch=19, col="red")
points(iris$Petal.Length[iris$Species=="versicolor"], iris$Petal.Width[iris$Species=="versicolor"], cex=1.3, pch=19, col="blue")
points(iris$Petal.Length[iris$Species=="virginica"], iris$Petal.Width[iris$Species=="virginica"], cex=1.3, pch=19, col="green")
legend("topleft", c("Iris setosa", "I. versicolor", "I. virginica"), pch=19, col=c("red", "blue", "green"), cex=1.3)
legend("bottomright", c(expression(italic("Iris setosa")), expression(italic("Iris versicolor")), expression(italic("Iris virginica"))), pch=19, col=c("red", "blue", "green"), cex=1.3)
legend(1.3, 1.9, c(expression(italic("Iris setosa")), expression(italic("Iris versicolor")), expression(italic("Iris virginica"))), pch=19, col=c("red", "blue", "green"), cex=1.3)
Slide32
Plotting in 2d
The cars data frame is a two-column data set of car speeds and stopping distances from the 1920s.
> head(cars)
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10
Slide33
Plotting in 2d
By default plot() produces a scatterplot
> plot(cars)
Axis labels are from the names in the data frame
Axis scale is from the range of the data
Slide34
Plotting in 2d
To add details it's better to use low-level functions.
plot(cars, type="p")
lines(lowess(cars), col="red")
Slide35
Combining plots
To show multiple plots in one graphics window use the par() command with the mfrow parameter:
par(mfrow=c(2,2))
plot(cars, type="p")
plot(cars, type="l")
plot(cars, type="h")
plot(cars, type="s")
Slide36
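A setting made with par(mfrow = ...) stays in effect for later plots on the same device. A small sketch of one way to restore the previous layout: par() returns the old values when it sets new ones, so they can be saved and put back. The name old.par is illustrative.

```r
# par() returns the previous settings when it changes them
old.par <- par(mfrow = c(1, 2))

plot(cars, type = "p")
plot(cars, type = "l")

# put the saved settings back, so the next plot fills the whole window again
par(old.par)
par("mfrow")
```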
Managing graphics and graphical devices: partitioning a graphical device
layout(matrix(1:4, 2, 2)) #see help for "layout"
layout.show(4) #see help for "layout.show"
plot(iris$Sepal.Length, iris$Sepal.Width, pch=19, cex.lab=1.5, cex.axis=1.5, xlab="Sepal length (cm)", ylab="Sepal width (cm)")
plot(iris$Sepal.Length, iris$Petal.Length, pch=19, cex.lab=1.5, cex.axis=1.5, xlab="Sepal length (cm)", ylab="Petal length (cm)")
plot(iris$Sepal.Length, iris$Petal.Width, pch=19, cex.lab=1.5, cex.axis=1.5, xlab="Sepal length (cm)", ylab="Petal width (cm)")
plot(iris$Sepal.Length, iris$Petal.Width, pch=19, type="n", axes=FALSE, bty="n", xlab="", ylab="")
mtext("Sepal length", line=-3, cex=1.5)
mtext("versus", line=-5, cex=1.5)
mtext("other variables", line=-7, cex=1.5)
mtext("in Anderson's Iris", line=-9, cex=1.5)
layout(matrix(1:6, 3, 2))
layout.show(6)
plot(iris$Sepal.Length, iris$Sepal.Width, pch=19)
plot(iris$Sepal.Length, iris$Petal.Length, pch=19)
plot(iris$Sepal.Length, iris$Petal.Width, pch=19)
plot(iris$Sepal.Width, iris$Petal.Length, pch=19)
plot(iris$Sepal.Width, iris$Petal.Width, pch=19)
plot(iris$Petal.Length, iris$Petal.Width, pch=19)
Slide37
Plotting in 3D
Some data is well-suited to three dimensional plots
The matrix volcano records elevations of the volcano Maunga Whau in New Zealand
volcano[1:4,1:4]
dim(volcano)
x <- 1:87
y <- 1:61
#Default is a heat map
image(x, y, volcano)
Slide38
Plotting in 3D
#Changing the color map
#terrain.colors() takes the number of colors wanted
image(x, y, volcano, col=terrain.colors(100))
Slide39
Plotting in 3D
#A contour map
contour(x, y, volcano)
#A perspective plot
persp(x, y, volcano)
Slide40
The package ggplot2
The ggplot2 package was created in 2005 by Hadley Wickham.
It implements the object oriented design ideas of Leland Wilkinson's The Grammar of Graphics.
Slide41
The Package ggplot2
The function qplot() is a ggplot2 version of the regular function plot()
library(ggplot2)
#qplot() is deprecated since ggplot2 3.4.0; the ggplot() call below draws the same plot
qplot(speed, dist, data=cars)
ggplot(data=cars, aes(x=speed, y=dist)) + geom_point()
Slide42
The package ggplot2
Combining geoms can produce more complex plots
library(ggplot2)
library(foreign)
dat <- read.dta("http://www.ats.ucla.edu/stat/data/ologit.dta")
ggplot(dat, aes(x = apply, y = gpa)) + geom_boxplot(size = .75)
Slide43
The package ggplot2
p <- ggplot(dat, aes(x = apply, y = gpa)) + geom_boxplot(size = .75)
p <- p + geom_jitter(alpha = .5)
p