R-Data Structure

R-Data Structure - PowerPoint Presentation

tawny-fly
Uploaded On 2015-09-27


Presentation Transcript

Slide1

R-Data Structure

The simplest data structure R operates on is the vector.

Vector

A vector can contain numerical, character (string), or logical values, but all elements of one vector share a single type.

The size of a vector can be increased by concatenating additional elements with c().

Adding a string to a numerical vector with c() does not produce a list: every element is coerced to character. To hold values of mixed types, use list().
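A minimal sketch of how c() treats mixed content in base R. The variable names are illustrative and not from the slides.

```r
# A numeric vector: every element is a double.
v <- c(1, 2, 3)
class(v)                      # "numeric"

# Concatenating a string coerces ALL elements to character.
mixed <- c(v, "four")
class(mixed)                  # "character"
mixed                         # "1" "2" "3" "four"

# A list keeps each element's own type.
l <- list(1, "four", TRUE)
sapply(l, class)              # "numeric" "character" "logical"
```

The vector grows by concatenation, but mixing types changes the type of the whole vector, which is why list() is the structure to use for mixed content.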

Slide2

Operations on numerical vectors

Normal operations: -, +, *, 1/x (reciprocal), mean, etc.

For example:

    v <- c(1, 2, 3, 4)
    inv <- 1/v    # assigns to inv the reciprocal of each value of v

Example:

    y <- c(v, 0, v)
    z <- mean(y)
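The same statements with the resulting values spelled out, as a small sketch. Arithmetic on a vector is applied element by element.

```r
v <- c(1, 2, 3, 4)

inv <- 1 / v        # element-wise reciprocal: 1, 0.5, 1/3, 0.25
doubled <- v * 2    # element-wise product: 2 4 6 8

y <- c(v, 0, v)     # 9 elements: 1 2 3 4 0 1 2 3 4
z <- mean(y)        # sum is 20, so the mean is 20/9

length(y)
z
```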

Slide3

Sequences (special vectors of numeric values)

1:n means 1, 2, ..., n

Example 1: v <- c(1:3) means v <- c(1, 2, 3)

Example 2: n <- 1:30

Example 3: n <- 2*1:15 (":" has higher priority than "*", so this is 2*(1:15))

Example 4: n <- seq(-5:5)

Exercise: Try seq(-5,5) and compare with seq(-5:5).

Use help(seq) to learn more about the seq instruction.
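A short sketch for the exercise. With two arguments seq() counts from the first to the second; with a single vector argument it returns the positions of that vector.

```r
a <- seq(-5, 5)     # from -5 to 5 in steps of 1: 11 values
b <- seq(-5:5)      # one vector argument: same as seq_along(-5:5), i.e. 1 to 11

n1 <- 2 * 1:15      # ":" binds tighter than "*": 2, 4, ..., 30
n2 <- (2 * 1):15    # parentheses force the other grouping: 2, 3, ..., 15

a
b
length(n1)          # 15
length(n2)          # 14
```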

Slide4

Matrix

    matrix(data, nrow, ncol, byrow)

The data argument is a vector of the elements that will fill the matrix.

The nrow and ncol arguments specify the dimensions of the matrix. Often only one dimension argument is needed. For example, if there are 20 elements in the data vector and ncol is specified to be 4, then R will automatically determine that there should be 5 rows and 4 columns, since 4*5=20.

byrow takes a value in {TRUE, FALSE}.

The byrow argument specifies how the matrix is to be filled. The default value for byrow is FALSE, which means that by default the matrix is filled column by column.
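A minimal sketch of the 20-element case described above, showing the inferred row count and the effect of byrow. The variable names are illustrative.

```r
data <- 1:20

m_col <- matrix(data, ncol = 4)                 # nrow inferred as 5; filled column by column
m_row <- matrix(data, ncol = 4, byrow = TRUE)   # same shape; filled row by row

dim(m_col)     # 5 4
m_col[1, ]     # 1 6 11 16
m_row[1, ]     # 1 2 3 4
```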

Slide5

[,1] means "all the rows of column 1"

[1,] means "all the columns of row 1"
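The two index forms applied to a small matrix, as a sketch. The matrix m is made up for illustration.

```r
m <- matrix(1:6, nrow = 2)   # 2 x 3 matrix, filled column by column

m[, 1]                # all rows of column 1: 1 2
m[1, ]                # all columns of row 1: 1 3 5
m[2, 3]               # a single element: 6
m[, 1, drop = FALSE]  # column 1 kept as a 2 x 1 matrix instead of a plain vector
```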

Slide6

Data Frame

A data frame is used for storing data tables. It is a list of vectors of equal length. For example, the following variable df is a data frame containing three vectors v1, v2, v3.

    v1 = c(2, 3, 5)
    v2 = c("aa", "bb", "cc")
    v3 = c(TRUE, FALSE, TRUE)
    df = data.frame(v1, v2, v3)    # df is a data frame
    df
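A brief sketch of how the resulting data frame can be inspected, using the same three vectors. Each vector becomes one column, and each column keeps its own type.

```r
v1 <- c(2, 3, 5)
v2 <- c("aa", "bb", "cc")
v3 <- c(TRUE, FALSE, TRUE)
df <- data.frame(v1, v2, v3)

str(df)      # compact summary: 3 observations of 3 variables
nrow(df)     # 3
ncol(df)     # 3
df$v1        # one column, as a vector: 2 3 5
df[2, ]      # one row, as a one-row data frame
```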

Slide7

List: A list is a vector in which the various elements need not be of the same type.

Example:

    V <- list(1, "2", "hello", TRUE)

Note that c(1, "2", "hello", TRUE) would not create a list; it creates a character vector.

Factor: A factor is a vector of categorical data. Storing data as factors ensures that the modeling functions will treat such data correctly.

Example:

    > data = c(1,2,2,3,1,2,3,3,1,2,3,3,1)
    > fdata = factor(data)
    > fdata
    [1] 1 2 2 3 1 2 3 3 1 2 3 3 1
    Levels: 1 2 3

The output shows the content of fdata and also the distinct values (levels) of the categorical attribute.
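A small sketch contrasting list() with c(), and showing two common ways to inspect a factor. It uses the same values as the example above.

```r
l <- list(1, "2", "hello", TRUE)   # a list: each element keeps its type
v <- c(1, "2", "hello", TRUE)      # a character vector: "1" "2" "hello" "TRUE"

is.list(l)     # TRUE
class(v)       # "character"

fdata <- factor(c(1, 2, 2, 3, 1, 2, 3, 3, 1, 2, 3, 3, 1))
levels(fdata)  # the distinct categories: "1" "2" "3"
table(fdata)   # how often each category occurs: 4, 4 and 5 times
```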

Slide8

Importing Data

    read.table(path, more parameters…)

    mydata <- read.table("c:/mydata.csv", header=TRUE, sep=",", row.names="id")

"c:/mydata.csv": path to the file. Note the "/" instead of "\" on MS Windows systems.

header=TRUE: include the header row.

sep=",": delimiter used in the file.

row.names="id": optional row names.

Use help(read.table) for more info.

Also consider the read.csv() instruction to import comma-delimited files. For example:

    read.csv("http://www2.cs.uh.edu/~zechun_cao/TA_Resources/iris.data")
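A self-contained sketch of the same calls. So that it runs without any particular file on disk, it first writes a tiny made-up CSV to a temporary file; the file contents and column names are assumptions for illustration only.

```r
# Write a small comma-delimited file to a temporary location.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,x1,x2", "a,1,2.5", "b,3,4.5"), tmp)

# read.table with explicit header, delimiter, and row-name column.
mydata <- read.table(tmp, header = TRUE, sep = ",", row.names = "id")
rownames(mydata)   # "a" "b"
names(mydata)      # "x1" "x2" (the id column became the row names)

# read.csv assumes a header row and a comma delimiter by default.
mydata2 <- read.csv(tmp)
names(mydata2)     # "id" "x1" "x2"

unlink(tmp)        # remove the temporary file
```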

Slide9

[Figure: original file and output]

Slide10

Importing Data

Excel File: read.xls

The best way to import data in Excel format is to save the data as .csv and then use read.table() to import it. However, read.xls is often used. Since it is not part of the core R library, it has to be installed and loaded into the workspace.

Exercise: Use read.xls to read an Excel file into R. (read.xls is part of the gdata package.)

Slide11

Answer

    > install.packages(pkgs="gdata")
    > library(gdata)
    > data <- read.xls(path)

Slide12

Operations on Dataset/sub-setting

Select columns (variables)
Drop columns (variables)
Select Observations (rows)
Random Sampling (exercise)

dataset:

    x1,x2,x3,class
    0,2,2,A
    0,3,2.5,B
    0,3,3,A
    1,3,3,B
    1,3.5,4,c
    1,3,2,A
    1,4,2,c
    1,4,3,A
    0,1,3,B
    0,1,4,A
    1,2,2,c
    1,2.5,1,A

Slide13

Operations on Dataset/sub-setting

Sub-setting by selecting columns

Example 1:

    # select variables x1, x3
    myvars <- c("x1", "x3")
    mysubSet <- dataset[myvars]
    mysubSet

Example 2:

    # select jth variable and kth thru mth variables
    newdata <- dataset[c(j, k:m)]
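A runnable sketch of both selections. It builds the data frame from the first seven rows of the slide's dataset, and the values chosen for j, k, and m are assumptions for illustration.

```r
# First seven rows of the example dataset, read from an inline string.
dataset <- read.csv(text = "x1,x2,x3,class
0,2,2,A
0,3,2.5,B
0,3,3,A
1,3,3,B
1,3.5,4,c
1,3,2,A
1,4,2,c")

# Select columns by name.
myvars <- c("x1", "x3")
by_name <- dataset[myvars]
names(by_name)    # "x1" "x3"

# Select the jth column and the kth through mth columns by position.
j <- 1; k <- 3; m <- 4
by_pos <- dataset[c(j, k:m)]
names(by_pos)     # "x1" "x3" "class"
```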

Slide14

Operations on Dataset/sub-setting

Drop some columns

    # exclude 1st and 3rd variable
    mysubSet <- dataset[c(-1, -3)]

Also, to delete a column, assign NULL to the column. Example:

    # delete variable x1
    mydata$x1 <- NULL
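A sketch of the two ways of dropping columns, plus a name-based variant that is not on the slide. The data frame holds the first seven rows of the example dataset.

```r
dataset <- read.csv(text = "x1,x2,x3,class
0,2,2,A
0,3,2.5,B
0,3,3,A
1,3,3,B
1,3.5,4,c
1,3,2,A
1,4,2,c")

# Negative positions exclude columns: drop the 1st and 3rd.
mysubSet <- dataset[c(-1, -3)]
names(mysubSet)    # "x2" "class"

# Assigning NULL deletes a column in place (done on a copy here).
mydata <- dataset
mydata$x1 <- NULL
names(mydata)      # "x2" "x3" "class"

# Variant (not from the slides): drop columns by name.
by_name <- dataset[, !(names(dataset) %in% c("x1", "x3"))]
names(by_name)     # "x2" "class"
```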

Slide15

Operations on Dataset/sub-setting

Select Observations

    # first n observations (rows 1 to n, for all columns)
    mysubSet <- dataset[1:n, ]

    # based on variable values
    mysubSet <- dataset[which(dataset$x3 == 2 & dataset$x2 > 2), ]

Or equivalently:

    # based on variable values
    attach(dataset)
    mysubSet <- dataset[which(x3 == 2 & x2 > 2), ]
    detach(dataset)
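A runnable sketch of the row selections on the first seven rows of the example dataset. As an alternative to attach()/detach(), which the slide uses, it evaluates the condition with with(); that substitution is a suggestion and not part of the original slides.

```r
dataset <- read.csv(text = "x1,x2,x3,class
0,2,2,A
0,3,2.5,B
0,3,3,A
1,3,3,B
1,3.5,4,c
1,3,2,A
1,4,2,c")

# First n observations, all columns.
n <- 3
first_n <- dataset[1:n, ]

# Rows where x3 equals 2 and x2 is greater than 2.
by_value <- dataset[which(dataset$x3 == 2 & dataset$x2 > 2), ]

# Same condition without repeating the data frame name.
by_with <- dataset[with(dataset, x3 == 2 & x2 > 2), ]

rownames(by_value)   # "6" "7": original row numbers are kept
```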

Slide16

Operations on Dataset/sub-setting

Sampling

    dataset = read.csv("C:/Users/paul/Desktop/R_wd/Lab/example.csv")
    dataset
    dataset[sample(nrow(dataset), 3), ]

Exercise: Using the dataset below, write a script that selects 4 rows randomly.

Step 1: import the file.

Step 2: use srsdf to sample.

    x1,x2,x3,class
    0,2,2,A
    1,3,2.5,B
    1.5,3.8,3,A
    2,4,3,B
    2.1,3.5,4,c
    2.3,3.8,2,A
    2.8,4,2,c
    3,4,3,A
    3.2,4.5,3,B
    3.4,4.6,4,A
    3.6,4.8,2,c
    3.6,5,1,A

Slide17

Answer

    dataset = read.csv("C:/Users/paul/Desktop/R_wd/Lab/example.csv")
    dataset
    dataset[sample(nrow(dataset), 4), ]    # 4 rows, as the exercise asks
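A self-contained sketch of the sampling step. Instead of reading the file from disk it builds the exercise dataset from an inline string, and it sets a seed so that repeated runs pick the same rows; both choices are assumptions made for illustration.

```r
dataset <- read.csv(text = "x1,x2,x3,class
0,2,2,A
1,3,2.5,B
1.5,3.8,3,A
2,4,3,B
2.1,3.5,4,c
2.3,3.8,2,A
2.8,4,2,c
3,4,3,A
3.2,4.5,3,B
3.4,4.6,4,A
3.6,4.8,2,c
3.6,5,1,A")

set.seed(42)                       # optional: makes the random choice repeatable
idx <- sample(nrow(dataset), 4)    # 4 distinct row numbers, drawn without replacement
picked <- dataset[idx, ]
picked
```

sample(n, size) draws without replacement by default, so no row is selected twice.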

Slide18

Operations on Dataset

Split a data frame or matrix

    split()    # divide into groups by vector/factor

Example:

    > dataset = read.csv("C:/Users/paul/Desktop/R_wd/input/Data_TPRTI/weka/EXAMPLE.csv")
    > classes <- split(dataset, dataset$class)
    > classes

Observe that split() has grouped the rows of the same class together, because the grouping column was specified to be the class column.
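A runnable sketch of split() on the first seven rows of the example dataset used earlier in the deck, read from an inline string rather than from the file path on the slide.

```r
dataset <- read.csv(text = "x1,x2,x3,class
0,2,2,A
0,3,2.5,B
0,3,3,A
1,3,3,B
1,3.5,4,c
1,3,2,A
1,4,2,c")

# split() returns a named list with one data frame per class value.
classes <- split(dataset, dataset$class)

names(classes)           # one entry per class: A, B, c
sapply(classes, nrow)    # rows per class: A has 3, B has 2, c has 2
classes[["A"]]           # the rows whose class is "A"
```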

Slide19

Operations on Dataset/sub-setting

    subset()    # subset data with logical statement

The subset() function is the easiest way to select variables and observations. In the following example, we select all rows that have x3==2 and x2>2. We keep the x1, x2, and class columns.

    mysubSet <- subset(dataset, x3 == 2 & x2 > 2, select = c(x1, x2, class))
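A runnable sketch of the subset() call on the first seven rows of the example dataset, next to the equivalent bracket expression for comparison.

```r
dataset <- read.csv(text = "x1,x2,x3,class
0,2,2,A
0,3,2.5,B
0,3,3,A
1,3,3,B
1,3.5,4,c
1,3,2,A
1,4,2,c")

# Rows with x3 == 2 and x2 > 2; keep only x1, x2, and class.
mysubSet <- subset(dataset, x3 == 2 & x2 > 2, select = c(x1, x2, class))

# The same selection written with bracket indexing.
same <- dataset[dataset$x3 == 2 & dataset$x2 > 2, c("x1", "x2", "class")]

mysubSet    # two rows (original rows 6 and 7), three columns
```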

Slide20

Operations on Dataset/sub-setting

Use help() to learn about:

Merge data frames

    merge()    # merges two data frames, d1 and d2, into one data frame

Combine a row or column with a data frame

    cbind()    # add a new column to a data frame
    rbind()    # add a new row to a data frame
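A minimal sketch of the three functions on two small made-up data frames. The names d1 and d2 follow the slide; the column names and values are assumptions for illustration.

```r
d1 <- data.frame(id = c(1, 2, 3), x = c(10, 20, 30))
d2 <- data.frame(id = c(2, 3, 4), y = c("b", "c", "d"))

# merge() joins on the shared column; by default only matching ids are kept.
merged <- merge(d1, d2, by = "id")
merged$id        # 2 3

# cbind() adds a column; the new column must have one value per row.
with_col <- cbind(d1, z = d1$x * 2)
names(with_col)  # "id" "x" "z"

# rbind() adds a row; the new row must have the same column names.
with_row <- rbind(d1, data.frame(id = 4, x = 40))
nrow(with_row)   # 4
```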

Slide21

Practice exercise

Exercise

1- Download the dataset Iris from www.cs.uh.edu/~zechun_cao/DM12F.html and import the data into your R session.

2- Find out how many classes are in the file. The output column is the last column.

3- Multiply the 3rd column by 2 and combine this new column with the data frame.

Slide22

Complete the exercise.

Thank you!
