Applied Bioinformatics

Uploaded On 2015-10-02



Presentation Transcript

Slide1

Applied Bioinformatics: Introduction to Linux and R

Bing Zhang

Department of Biomedical Informatics

Vanderbilt University

bing.zhang@vanderbilt.edu

Slide2

Quick summary of the introduced Linux commands

Command                   Meaning
rsh <hostname>            Remote shell
passwd                    Modify a user's password
exit                      Exit the shell
pwd                       Display the path of the current directory
ls                        List files and directories
ls -a                     List all files and directories (including hidden ones)
ls -a -l                  List all files and directories in a long listing format
mkdir <directory name>    Make a directory
cd <directory name>       Change to the named directory
cd                        Change to the home directory
cd ~                      Change to the home directory
cd ..                     Change to the parent directory
rmdir <directory name>    Remove a directory
more <file name>          View the contents of a file
cp <file1> <file2>        Copy file1 and name the copied file file2
mv <file1> <file2>        Move or rename file1 to file2
rm <file name>            Remove a file
man <command>             Display manual pages for a command

Slide3

Getting help

man (display manual pages for a command)

Space bar to show the next page

Up and down arrows to move up and down

q to exit

Slide4

Exercise

Task
    Command
Go to home directory
    cd
Display manual pages for the command ls
    man ls
List the contents of the current directory
    ls
List the contents of the current directory, including entries starting with . and using a long listing format
    ls -a -l
Create a test directory if you don't have one yet (ignore this if you already have it)
    mkdir test
Go to the test directory
    cd test
Copy the file sample_data.txt under directory /home/igptest to the current directory with the same name
    cp /home/igptest/sample_data.txt .
View the content of the copied file
    more sample_data.txt
Make a copy of the file
    cp sample_data.txt sample_data_copy.txt
View the content of the new copy
    more sample_data_copy.txt
List the contents of the current directory
    ls
Remove the new copy
    rm sample_data_copy.txt
List the contents of the current directory
    ls

Slide5

Data manipulation with filters

Filters: programs that accept textual data and then transform it in a particular way.

head, tail, cut, sort, uniq, sed
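As a rough illustration of what a filter does, the sketch below runs sort, uniq, and sed on a small temporary file. The file name filter_demo.txt and its contents are made up for this example and are not part of the course data; the sed substitution is just one arbitrary choice of transformation.

```shell
# Build a small tab-separated file in a temporary directory.
# (filter_demo.txt and its contents are hypothetical example data.)
workdir=$(mktemp -d)
demo="$workdir/filter_demo.txt"
printf 'Line_01\t5\nLine_02\t12\nLine_02\t12\nLine_03\tNA\n' > "$demo"

# sort: order lines by the numeric value in column 2.
# The non-numeric entry "NA" is treated as zero, so it sorts first.
sorted=$(sort -k 2 -n "$demo")

# uniq: collapse adjacent duplicate lines (the two Line_02 rows become one).
unique=$(uniq "$demo")

# sed: replace the first occurrence of "Line_" on each line with "Sample_".
renamed=$(sed 's/Line_/Sample_/' "$demo")

printf '%s\n' "$sorted" "---" "$unique" "---" "$renamed"

# Clean up the temporary directory.
rm -r "$workdir"
```

Note that uniq only removes duplicates that sit next to each other, which is why it is usually applied to sorted input.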

Task
    Command
View the content of a file
    more sample_data.txt
Get the first 10 lines of the file
    head sample_data.txt
Get the first 5 lines of the file
    head -n 5 sample_data.txt
Get all but the last 5 lines of the file
    head -n -5 sample_data.txt
Get the last 10 lines of the file
    tail sample_data.txt
Get the last 5 lines of the file
    tail -n 5 sample_data.txt
Get all lines starting from line 5
    tail -n +5 sample_data.txt
Get the first three columns of the file
    cut -f 1-3 sample_data.txt
Get selected columns of the file
    cut -f 1,3,5 sample_data.txt
Sort all lines based on the numerical values in the second column (non-numeric entries are interpreted as zero)
    sort -k 2 -n sample_data.txt

Slide6

Data manipulation with piping and redirection

Piping (|): sending data from one program to another program.

Redirection: sending output from one program to a file

>: save output to a file

>>: append output to a file
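The difference between |, > and >> can be sketched on a throwaway file. Everything below is an assumption made for illustration: the files demo.txt and demo_subset.txt live in a temporary directory and hold 12 generated lines, not the course's sample data.

```shell
# Create a 12-line, 4-column tab-separated file in a temporary directory.
# (demo.txt and its contents are hypothetical example data.)
workdir=$(mktemp -d)
demo="$workdir/demo.txt"
subset="$workdir/demo_subset.txt"
i=1
while [ "$i" -le 12 ]; do
  printf 'Line_%02d\t%d\t%d\t%d\n' "$i" "$i" "$((i * 2))" "$((i * 3))"
  i=$((i + 1))
done > "$demo"

# Pipe: head passes its first 10 lines to cut, which keeps columns 1-3.
# ">" then saves the result to a file, replacing any existing content.
head "$demo" | cut -f 1-3 > "$subset"
lines_after_write=$(wc -l < "$subset" | tr -d ' ')

# ">>" appends instead of overwriting: add the last 2 lines of the original.
tail -n 2 "$demo" >> "$subset"
lines_after_append=$(wc -l < "$subset" | tr -d ' ')

first_line=$(head -n 1 "$subset")
echo "after >: $lines_after_write lines; after >>: $lines_after_append lines"

# Clean up the temporary directory.
rm -r "$workdir"
```

Running the same > command twice would leave the file unchanged, whereas running the >> command twice would add the two lines again.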

Task
    Command
Get the first 10 lines of the file and then get the first three columns
    head sample_data.txt | cut -f 1-3
Get the first 10 lines of the file, then get the first three columns of these lines, and then redirect the content to a new file
    head sample_data.txt | cut -f 1-3 > sample_data_subset.txt
View the new file
    more sample_data_subset.txt
Append the last 10 lines of the old file to the end of the new file
    tail sample_data.txt >> sample_data_subset.txt
View the new file
    more sample_data_subset.txt

Slide7

Editing files with nano

nano is a user-friendly text editor

A quick tutorial

http://staffwww.fullcoll.edu/sedwards/Nano/IntroToNano.html

Task
    Command
Open sample_data.txt for editing
    nano sample_data.txt
Delete the text "Line_01" and the space after it, save the file, and then exit
    In nano, ^O for saving and ^X for exit
View the edited file
    more sample_data.txt
View the content of the .bashrc file, which is located under your home directory. The file contains commands that are executed when a new interactive bash shell starts.
    more ~/.bashrc
Open the .bashrc file under your home directory for editing
    nano ~/.bashrc
Add the line "setpkgs -a R" to the end of this file. This will allow you to use the R environment for statistical computing, which has been installed on the ACCRE system.
    In nano, ^O for saving and ^X for exit
View the edited .bashrc file
    more ~/.bashrc
Run the .bashrc file
    source ~/.bashrc

Slide8

What is R

R is a free software environment for statistical computing and graphics. It includes:

- an effective data handling and storage facility
- a suite of operators for calculations on arrays, in particular matrices
- a large, coherent, integrated collection of intermediate tools for data analysis
- graphical facilities for data analysis and display either on-screen or on hardcopy
- a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities

Slide9

R Installation and tutorial

Download and install R

- http://www.r-project.org/
- Choose a CRAN (Comprehensive R Archive Network) mirror
- Binary distributions of the base system and contributed packages
  - Windows version
  - Mac OS X version
  - Linux version (already installed on the ACCRE cluster, will be used for this module)

Tutorials

- An Introduction to R: http://cran.r-project.org/doc/manuals/r-release/R-intro.html

Slide10

R interface

Command-line R: Linux/OS X

- Type R in your Linux shell to start R
- Type q() in the R interface to close R

R GUI: OS X (the Windows GUI is similar)

- Download and install on your laptop

RStudio: a powerful and user-friendly interface for R, excellent for both beginners and developers (http://www.rstudio.com/)

Slide11

Install and load packages

CRAN packages

- http://cran.r-project.org/web/packages/
- >6000 packages

Bioconductor packages

- http://www.bioconductor.org/
- ~1000 packages for the analysis of high-throughput genomics data

Task
    R code
Install a CRAN package
    install.packages("package name")
Install a Bioconductor package
    source("http://www.bioconductor.org/biocLite.R")
    biocLite("package name")
Load a package/library
    library("package name")

Slide12

Basic R syntax

Object <- function(arguments)

<-: assignment operator

Object <- object[arguments]

Task
    R code
Assign a numeric vector with six numbers to object x using the c() function
    x <- c(1.3, 10.4, 5.6, 3.1, 6.4, 21.7)
Assign a subset of x to a new object y
    y <- x[1:3]
Show the content of x
    x
Show the content of y
    y
Get information on the function c
    ?c
Display the output of a function without assignment
    c(1,2,5)

Slide13

Data types

Numeric data: 1, 2, 3

Character data: "a", "b", "c"

Logical data: TRUE, FALSE, TRUE

Task
    R code
Assign a numeric vector with six numbers to object x using the c() function
    x <- c(1.3, 10.4, 5.6, 3.1, 6.4, 21.7)
Create a character vector from x
    as.character(x)
Create a logical vector from x
    x>5

Slide14

Data objects

Vectors: an ordered collection of items of the same data type (numeric, character, or logical), 1-dimensional

Matrices: 2-dimensional objects, all items must have the same data type

Arrays: similar to matrices but can have more than two dimensions

Data frames: similar to matrices, but columns can have different data types

Lists: an ordered collection of objects

Functions

Task
    R code
Create a numeric vector with numbers ranging from 1 to 9
    c(1:9)
Create a 3x3 numeric matrix
    matrix(c(1:9),nrow=3,ncol=3,byrow=TRUE)
Create another 3x3 numeric matrix by changing an argument
    matrix(c(1:9),nrow=3,ncol=3,byrow=FALSE)

Slide15

Operators and calculations

Comparison operators: ==, !=, <, >, <=, >=

Logical operators: & (AND), | (OR), ! (NOT)

Calculations

Arithmetic operators: +,-,*,/,^

Arithmetic functions: log, exp, sqrt, mean, var, sd, sum, etc.

Task
    R code
Comparisons
    3==5
    3!=5
    3<5
Logical operators
    x<-5
    y<-(-8)
    x>0 | y>0
    x>0 & y>0
Calculations
    (4+2^2)/(2*2)
    x<-c(1,3,5,7,9)
    y<-c(2,4,6,8,10)
    x+y
    sum((x-mean(x))^2)/(length(x)-1)
    var(x)

Slide16

Data import, simple analyses, and export

Task
    R code
Import data from a tabular file
    myData <- read.table("~/test/sample_data.txt", header=TRUE, sep="\t")
Display the new object
    myData
Get class name of the object
    class(myData)
Convert data frame to matrix
    myMatrix <- as.matrix(myData)
Get class name of the matrix
    class(myMatrix)
Display the matrix object
    myMatrix
Get dimensions of the matrix
    dim(myMatrix)
Get a high-level summary
    summary(myMatrix)
Log transformation of the data
    myMatrix_log <- log2(myMatrix)
Calculate variance for row #1
    var(myMatrix_log[1,])
Calculate variances for all rows
    variances <- apply(myMatrix_log, 1, var)
Calculate means for all rows
    means <- apply(myMatrix_log, 1, mean)
Data subsetting
    myMatrix_log[1:3, 1:2]
    myMatrix_log[c("Line_02", "Line_04"),]
    myMatrix_log[means > median(means),]
Combining data
    results <- cbind(myMatrix_log, means, variances)
Write data to a tabular file
    write.table(results, "~/test/sample_data_output.txt", sep="\t", quote=FALSE)
Quit R
    q()

Go to your test directory and check the file sample_data_output.txt.

Slide17

Copying files to/from a local computer

Windows

- Application: Bitvise SSH (https://www.bitvise.com/ssh-client-download)

Mac

- Application: Cyberduck (https://cyberduck.io/)
- Click on "Open Connection"
- Select "SFTP (SSH File Transfer Protocol)"
- Server: vmplogin.accre.vanderbilt.edu
- Username: your_user_name
- Password: your-password
- Don't change other items

Slide18

Copying files to/from a local computer (using Bitvise SFTP in Windows)

Slide19

Copying files to/from a local computer (using Cyberduck in Mac)
