Slide1
Introduction to Graphics in R
3/12/2014
Slide2
First, let’s get some data
Load the Duncan dataset
It’s in the car package. Remember how to get it?
library(car)
data(Duncan)
Slide3
Getting started
Okay, now plot income levels:
plot(Duncan$income)
What is this graph? Can you make it a line plot instead?
plot(Duncan$income, type="l")
Slide4
Histogram
The X axis is useless. Wouldn’t a histogram be more informative?
Make a histogram
If you’re stuck, use Google
hist(Duncan$income)
Slide5
Fix the title
‘Histogram of Duncan$income’ is not a good title
Change it to ‘Income Distribution in Duncan Dataset’
hist(Duncan$income, main="Income Distribution in Duncan Dataset")
Slide6
Another option
There’s another way to set the title. Maybe some of you will have done this (my crystal ball is murky):
hist(Duncan$income)
title("Income Distribution in Duncan Dataset")
But wait. That looks awful: the new title is drawn on top of the default one. We need to keep hist() from printing its own title. How do we do that?
hist(Duncan$income, main="")
title("Income Distribution in Duncan Dataset")
Slide7
Scatterplot
Okay, let’s look at income vs. prestige
Make a scatterplot comparing income (x-axis) to prestige (y-axis)
plot(Duncan$income, Duncan$prestige)
Did you get the x- and y-axes right?
Add a title: Income vs. Prestige
title("Income vs. Prestige")
Slide8
Scatterplot: Axis labels
The axis labels display the variable names. Can we do better than that?
Label the X axis “Income” and the Y axis “Prestige”
plot(Duncan$income, Duncan$prestige, xlab="Income", ylab="Prestige")
Slide9
Scatterplot: Axis range
How come income doesn’t have ticks at 0 and 100 but prestige does?
Make both axes run from 0 to 100
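Why the difference? Here is a rough sketch of what base graphics does by default, assuming the standard axis style (xaxs="r"): the axis limits are the data range padded by 4% on each side, and only "pretty" tick values that fall inside those limits get drawn. The helper names default_limits and visible_ticks are made up for this illustration, and the input values are illustrative rather than read from the dataset.

```r
# Hypothetical helper: approximate the default axis limits in base graphics,
# i.e. the data range extended by 4% at each end (axis style "r").
default_limits <- function(x) {
  r <- range(x, na.rm = TRUE)
  r + c(-1, 1) * 0.04 * diff(r)
}

# Hypothetical helper: candidate tick values that fall inside those limits.
visible_ticks <- function(x) {
  lim <- default_limits(x)
  ticks <- pretty(lim)
  ticks[ticks >= lim[1] & ticks <= lim[2]]
}

# Illustrative values: one variable spanning 7 to 81, another spanning 3 to 97.
narrow_ticks <- visible_ticks(c(7, 81))
wide_ticks <- visible_ticks(c(3, 97))
```

With these inputs the narrower variable gets no tick at 0 or 100, because both values fall outside its padded limits, while the wider one gets both. Setting xlim by hand overrides the data-driven limits.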
plot(Duncan$income, Duncan$prestige, xlab="Income", ylab="Prestige", xlim=c(0,100))
Slide10
Scatterplot Axis Tick Marks
Actually, your collaborator wants tick marks every 5 points on the X axis.
DO IT
Caveat: this is trickier:
plot(Duncan$income, Duncan$prestige, xlab="Income", ylab="Prestige", xlim=c(0,100), xaxt="n")
axis(1, at=seq(0,100, by=5))
Slide11
Axis labels sideways
Your collaborator still isn’t happy. Turn the x labels sideways.
plot(Duncan$income, Duncan$prestige, xlab="Income", ylab="Prestige", xlim=c(0,100), xaxt="n")
axis(1, las=2, at=seq(0,100, by=5))
Slide12
More columns
Now your collaborator wants to see how education affects this relationship. Create a dichotomous variable named ‘high_education’ categorizing education > 50 as TRUE and <= 50 as FALSE
Duncan$high_education <- Duncan$education > 50
Slide13
High education: sanity check
How many high and low education jobs are there?
table(Duncan$high_education)
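If the counts look off, it can help to try the same two steps on a tiny vector first. A minimal sketch with made-up scores (these are not values from Duncan), showing that the comparison yields a logical vector and that a score of exactly 50 lands in the FALSE group:

```r
# Made-up education scores for illustration only.
education <- c(86, 76, 92, 25, 50, 44)

# The comparison returns one TRUE/FALSE per element.
high_education <- education > 50

# table() counts how many FALSE and TRUE values there are.
counts <- table(high_education)

# sum() of a logical vector counts the TRUEs, a quick cross-check.
n_high <- sum(high_education)
```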
Plot education (y-axis) by high_education (x-axis)
plot(Duncan$high_education, Duncan$education)
Does it look right?
Slide14
Adding color
Okay, now color your income/prestige graph so high-education jobs are blue and low-education jobs are red
This is a little tricky
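The tricky part is turning TRUE/FALSE into a position in a vector of color names. A small sketch of one way to do it, using a made-up logical vector (the variable names here are only for illustration): as.numeric() maps FALSE to 0 and TRUE to 1, so adding 1 gives the index positions 1 and 2.

```r
# Made-up flags for illustration only.
high <- c(TRUE, FALSE, TRUE)

# FALSE -> 0 -> index 1, TRUE -> 1 -> index 2.
idx <- as.numeric(high) + 1

# Index into the color names: position 1 is "red", position 2 is "blue".
point_colors <- c("red", "blue")[idx]

# An alternative that some find easier to read.
point_colors_alt <- ifelse(high, "blue", "red")
```

Either vector can be passed as the col argument of plot(), one color per point.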
colors <- as.numeric(Duncan$high_education)+1
plot(Duncan$income, Duncan$prestige, col=c("red", "blue")[colors], xlab="Income", ylab="Prestige", xlim=c(0,100), xaxt="n")
axis(1, at=seq(0,100, by=5))
Slide15
Box plot
Okay, now run this code:
plot(Duncan$type, Duncan$income)
What happened? Why didn't we get a scatterplot? Can you get one?
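A hint about what is going on: plot() picks its behavior from the class of its first argument. When x is a factor and y is numeric, it draws one box plot per level; when x is numeric, it draws points. A minimal sketch with made-up data (the values are hypothetical, and the off-screen pdf(NULL) device is only there so the example runs without opening a window):

```r
# Made-up job types and incomes for illustration only.
job_type <- factor(c("prof", "wc", "bc", "prof", "bc"))
income <- c(62, 42, 21, 80, 14)

pdf(NULL)                              # draw off-screen
plot(job_type, income)                 # factor x: one box plot per level
plot(as.numeric(job_type), income)     # numeric x: ordinary scatterplot
invisible(dev.off())

# The numeric codes follow the level order, which is alphabetical by default:
# bc = 1, prof = 2, wc = 3.
type_codes <- as.numeric(job_type)
```

Note that the numeric codes carry no meaning beyond level order, so the x-axis of the resulting scatterplot needs careful labeling.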
plot(as.numeric(Duncan$type), Duncan$income)
Slide16
More than one plot at a time
Now your collaborator wants your scatterplot and histogram side-by-side. (Don’t worry about color if you don't want to)
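One detail worth knowing before changing the layout: par() with no arguments also returns read-only settings, so restoring its full result later produces warnings. A minimal sketch of a save-and-restore pattern that avoids this, run on an off-screen device (the variable names layout_during and layout_after are only for illustration):

```r
pdf(NULL)                           # off-screen device, so nothing pops up

opar <- par(no.readonly = TRUE)     # save only the settings that can be restored
par(mfrow = c(1, 2))                # one row, two columns
layout_during <- par("mfrow")

# ... draw the two plots here; each new plot fills the next panel ...

par(opar)                           # put the saved settings back
layout_after <- par("mfrow")

invisible(dev.off())
```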
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,2))
hist(Duncan$income, main="Income Distribution in Duncan Dataset")
plot(Duncan$income, Duncan$prestige, xlab="Income", ylab="Prestige", xlim=c(0,100), xaxt="n")
axis(1, at=seq(0,100, by=5))
par(opar)
Slide17
ggplot
ggplot is a whole different beast from base graphics
ggplot is like R itself – some work to get oriented, but powerful once you do
You don't have to know ggplot to be successful using R
But you do have to experiment with it for this class
Slide18
Load the ggplot library
Hint: the package name, confusingly, is ggplot2
Slide19
Plot income vs. prestige
It will be easiest to start using qplot.
qplot mimics plot(), but uses the ggplot layout engine.
qplot(Duncan$income, Duncan$prestige)
Slide20
ggplot
qplot is the training wheels version of ggplot
ggplot's syntax takes some getting used to. Try this:
ggplot(Duncan) + aes(x=income, y=prestige) + geom_point()
Huh? What are the pluses about?
Slide21
ggplot syntax
ggplot objects are weird
You execute them (like a command) to draw their plot
But you construct them by adding options to them
Options specify data source, data columns, etc., resulting in code like this:
p <- ggplot(Duncan)
p <- p + aes(x=income, y=prestige)
p + geom_point()
Slide22
Where ggplot shines
In my opinion, it's harder to think about doing simple plots in ggplot
But when I want to do something multi-faceted (e.g. with different colors, sizes, etc.), ggplot makes it really easy
I use it a lot to understand 3+-way relationships in data
Slide23
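To make the multi-faceted case concrete, here is a small sketch that maps four variables onto one scatterplot: position, color, and size. It assumes ggplot2 is installed, and the data frame is made up for illustration (the column names echo Duncan, but these are not rows from it). It also shows the build-then-draw pattern: the object is assembled with +, and nothing appears until it is printed.

```r
library(ggplot2)

# Made-up jobs for illustration only; not rows from Duncan.
jobs <- data.frame(
  income    = c(62, 80, 21, 14, 42, 55),
  prestige  = c(82, 90, 15, 20, 41, 60),
  education = c(86, 93, 25, 30, 55, 70),
  type      = factor(c("prof", "prof", "bc", "bc", "wc", "wc"))
)

# Build the plot object step by step; nothing is drawn yet.
p <- ggplot(jobs)
p <- p + aes(x = income, y = prestige, color = type, size = education)
p <- p + geom_point()

# Printing the object is what draws it (here to an off-screen device).
pdf(NULL)
print(p)
invisible(dev.off())
```

Adding a fourth or fifth variable is a matter of adding another argument to aes(), which is where this approach tends to pay off compared with base graphics.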
ggplot example (one of many)
Slide24
ggplot code for that example
ggplot(data=nycnames) +
  aes(x=as.factor(race), y=n1_013002p, color=as.factor(nbhdarkwalk)) +
  geom_point(position="jitter") +
  scale_x_discrete(breaks=1:7, limits=1:7, name="Subject Race", labels=c('Asian', 'Black', 'First\nPeoples', 'Pacific\nIslander', 'Non-Hispanic\nWhite', 'Other', 'Hispanic')) +
  scale_color_discrete(breaks=1:4, limits=1:4, name="Neighborhood Safe After Dark", labels=c('Strongly Agree', 'Somewhat Agree', 'Somewhat disagree', 'Strongly Disagree')) +
  scale_y_continuous(name="Neighborhood percent white (1km buffer)")
Slide25
Exercises