Introduction to Graphics in R

lois-ondreau
Uploaded On 2016-12-02

Presentation Transcript

Slide1

Introduction to Graphics in R

3/12/2014

Slide2

First, let’s get some data

Load the Duncan dataset

It’s in the car package. Remember how to get it?

library(car)

data(Duncan)

Slide3

Getting started
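Before plotting, it can help to confirm what was actually loaded. A minimal sketch, assuming the car package is installed and provides Duncan as described on the previous slide:

```r
# Assumes the car package is installed; Duncan has one row per occupation.
library(car)
data(Duncan)

str(Duncan)              # column names and types
head(Duncan)             # first few rows
summary(Duncan$income)   # spread of the variable plotted next
```

The columns used in the rest of this walkthrough are income, education, prestige, and type.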

Okay, now plot income levels:

plot(Duncan$income)

What is this graph? Can you make it a line plot instead?

plot(Duncan$income, type="l")

Slide4

Histogram

The X axis is useless. Wouldn’t a histogram be more informative?

Make a histogram

If you’re stuck, use Google
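R's built-in help is another place to look. A small sketch using only base R; the search term "hist" is just an example:

```r
# Look up plotting functions from inside R (base R only).
found <- apropos("hist")   # names of visible objects containing "hist"
print(found)

help("hist")               # same as typing ?hist at the console
```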

hist(Duncan$income)

Slide5

Fix the title

‘Histogram of Duncan$income’ is not a good title

Change it to ‘Income Distribution in Duncan Dataset’

hist(Duncan$income, main="Income Distribution in Duncan Dataset")

Slide6

Another option

There’s another way to set the title. Maybe some of you will have done this (my crystal ball is murky):

hist(Duncan$income)
title("Income Distribution in Duncan Dataset")

But wait. That looks awful: the new title is drawn on top of the default one. We need to suppress the title in the hist() call. How do we do that?

hist(Duncan$income, main="")
title("Income Distribution in Duncan Dataset")

Slide7

Scatterplot

Okay, let’s look at income vs. prestige

Make a scatterplot comparing income (x-axis) to prestige (y-axis)

plot(Duncan$income, Duncan$prestige)

Did you get the x- and y- axes right?
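One way to make the axis assignment harder to get wrong is plot()'s formula interface, where the formula reads "y ~ x". A sketch, assuming Duncan is loaded from the car package as earlier:

```r
library(car)
data(Duncan)

# y ~ x: prestige goes on the vertical axis, income on the horizontal axis.
plot(prestige ~ income, data = Duncan)
```

With this form the default axis labels are the bare column names rather than Duncan$income and Duncan$prestige.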

Add a title: Income vs. Prestige

title("Income vs. Prestige")

Slide8

Scatterplot: Axis labels

The axis labels display the variable names. Can we do better than that?

Label the X axis “Income” and the Y axis “Prestige”

plot(Duncan$income, Duncan$prestige, xlab="Income", ylab="Prestige")

Slide9

Scatterplot: Axis range

How come income doesn’t have ticks at 0 and 100 but prestige does?

Make both axes run from 0 to 100

plot(Duncan$income, Duncan$prestige, xlab="Income", ylab="Prestige", xlim=c(0,100))

Slide10

Scatterplot: Axis tick marks

Actually, your collaborator wants tick marks every 5 points on the X axis.

DO IT

Caveat: this is trickier:

plot(Duncan$income, Duncan$prestige, xlab="Income", ylab="Prestige", xlim=c(0,100), xaxt="n")
axis(1, at=seq(0,100, by=5))

Slide11

Axis labels sideways

Your collaborator still isn’t happy. Turn the x labels sideways.

plot(Duncan$income, Duncan$prestige, xlab="Income", ylab="Prestige", xlim=c(0,100), xaxt="n")
axis(1, las=2, at=seq(0,100, by=5))

Slide12

More columns

Now your collaborator wants to see how education affects this relationship. Create a dichotomous variable named ‘high_education’ categorizing education > 50 as TRUE and <= 50 as FALSE

Duncan$high_education <- Duncan$education > 50

Slide13

High education: sanity check

How many high and low education jobs are there?

table(Duncan$high_education)
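A numeric cross-check can complement the counts: the education values in the FALSE group should not exceed 50, and those in the TRUE group should all be above 50. A sketch, assuming Duncan is loaded and high_education is defined as on the previous slide:

```r
library(car)
data(Duncan)
Duncan$high_education <- Duncan$education > 50

# Minimum (row 1) and maximum (row 2) of education within each group.
ranges <- sapply(split(Duncan$education, Duncan$high_education), range)
print(ranges)
```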

Plot education (y-axis) by high_education (x-axis)

plot(Duncan$high_education, Duncan$education)

Does it look right?

Slide14

Adding color

Okay, now color your income/prestige graph so high-education jobs are blue and low-education jobs are red

This is a little tricky
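The trick relies on logical values converting to 0 and 1, so adding 1 gives an index of 1 or 2 into a two-element color vector. A tiny standalone sketch; the flags vector and variable names here are made up for illustration and are not part of the Duncan data:

```r
# Hypothetical flags for illustration only.
flags <- c(FALSE, TRUE, TRUE, FALSE)

idx <- as.numeric(flags) + 1     # FALSE -> 1, TRUE -> 2
two_colors <- c("red", "blue")   # position 1 = FALSE, position 2 = TRUE
point_colors <- two_colors[idx]
print(point_colors)              # "red" "blue" "blue" "red"

# An equivalent, arguably more readable, spelling:
same_colors <- ifelse(flags, "blue", "red")
```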

colors <- as.numeric(Duncan$high_education)+1
plot(Duncan$income, Duncan$prestige, col=c("red", "blue")[colors], xlab="Income", ylab="Prestige", xlim=c(0,100), xaxt="n")
axis(1, at=seq(0,100, by=5))

Slide15

Box plot

Okay, now run this code:

plot(Duncan$type, Duncan$income)

What happened? Why didn't we get a scatterplot? Can you get one?
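The behaviour comes from how plot() picks a method based on the class of its first argument: Duncan$type is a factor, and a factor on the x-axis with a numeric y gives one box plot per level rather than individual points. A sketch for inspecting what is behind the factor, assuming Duncan is loaded from the car package as earlier:

```r
library(car)
data(Duncan)

class(Duncan$type)                # "factor"
levels(Duncan$type)               # the category labels
codes <- as.numeric(Duncan$type)  # the integer codes behind the labels
table(Duncan$type, codes)         # which code goes with which label
```

Converting to the integer codes is what turns the x-axis back into plain numbers.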

plot(as.numeric(Duncan$type), Duncan$income)

Slide16

More than one plot at a time

Now your collaborator wants your scatterplot and histogram side-by-side. (Don’t worry about color if you don't want to.)

# no.readonly=TRUE saves only the settable parameters, so par(opar) restores them without warnings
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,2))
hist(Duncan$income, main="Income Distribution in Duncan Dataset")
plot(Duncan$income, Duncan$prestige, xlab="Income", ylab="Prestige", xlim=c(0,100), xaxt="n")
axis(1, at=seq(0,100, by=5))
par(opar)

Slide17

ggplot

ggplot is a whole different beast from base graphics

ggplot is like R itself: some work to get oriented, but powerful once you do

You don't have to know ggplot to be successful using R

But you do have to experiment with it for this class

Slide18

Load the ggplot library

Hint: the package name, confusingly, is ggplot2

Slide19

Plot income vs. prestige
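The examples from here on need ggplot2 attached. A minimal sketch, assuming that installing from CRAN is acceptable in your setup; the mirror URL is just one common choice:

```r
# Install once if missing (assumes a CRAN mirror is reachable), then attach.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}
library(ggplot2)
```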

It will be easiest to start using qplot. qplot mimics plot(), but uses the ggplot layout engine.

qplot(Duncan$income, Duncan$prestige)

Slide20

ggplot

qplot is the training wheels version of ggplot

ggplot's syntax takes some getting used to. Try this:

ggplot(Duncan) + aes(x=income, y=prestige) + geom_point()

Huh? What are the pluses about?

Slide21

ggplot syntax

ggplot objects are weird

You execute them (like a command) to draw their plot

But you construct them by adding options to them

Options specify data source, data columns, etc, resulting in code like this:

p <- ggplot(Duncan)
p <- p + aes(x=income, y=prestige)
p + geom_point()

Slide22

Where ggplot shines

In my opinion, it's harder to think about doing simple plots in ggplot

But when I want to do something multi-faceted (e.g. with different colors, sizes, etc.), ggplot makes it really easy

I use it a lot to understand 3+-way relationships in data

Slide23
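As a small illustration of mapping several variables at once, here is a sketch using the Duncan data from earlier rather than the presenter's own dataset. The choice of which column goes to color and which to size, and the legend titles, are assumptions made for this example:

```r
library(car)      # for the Duncan data
library(ggplot2)
data(Duncan)

# Four variables in one plot: position (income, prestige),
# color (type), and point size (education).
p <- ggplot(Duncan, aes(x = income, y = prestige, color = type, size = education)) +
  geom_point() +
  labs(x = "Income", y = "Prestige", color = "Occupation type", size = "Education")
print(p)
```

Each extra variable is one more argument to aes(), which is the contrast with the manual color indexing needed in base graphics.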

ggplot example (one of many)

Slide24

ggplot code for that example

ggplot(data=nycnames) +
  aes(x=as.factor(race), y=n1_013002p, color=as.factor(nbhdarkwalk)) +
  geom_point(position="jitter") +
  scale_x_discrete(breaks=1:7, limits=1:7, name="Subject Race", labels=c('Asian', 'Black', 'First\nPeoples', 'Pacific\nIslander', 'Non-Hispanic\nWhite', 'Other', 'Hispanic')) +
  scale_color_discrete(breaks=1:4, limits=1:4, name="Neighborhood Safe After Dark", labels=c('Strongly Agree', 'Somewhat Agree', 'Somewhat disagree', 'Strongly Disagree')) +
  scale_y_continuous(name="Neighborhood percent white (1km buffer)")

Slide25

Exercises