Applied Bioinformatics
Introduction to Linux and R
Bing Zhang
Department of Biomedical Informatics
Vanderbilt University
bing.zhang@vanderbilt.edu

Quick summary of the introduced Linux commands

Command                   Meaning
rsh <hostname>            Remote shell
passwd                    Modify a user’s password
exit                      Exit the shell
pwd                       Display the path of the current directory
ls                        List files and directories
ls -a                     List all files and directories
ls -a -l                  List all files and directories in a long listing format
mkdir <directory name>    Make a directory
cd <directory name>       Change to named directory
cd                        Change to home directory
cd ~                      Change to home directory
cd ..                     Change to parent directory
rmdir <directory name>    Remove a directory
more                      View the contents of a file
cp <file1> <file2>        Copy file1 and name the copied file file2
mv <file1> <file2>        Move or rename file1 to file2
rm <file name>            Remove a file
man <command>             Display manual pages for a command

Getting help
man <command> (display manual pages for a command)
  space bar to show the next page
  up and down arrows to move up and down
  q to exit

Exercise

Task
    Command
Go to home directory
    cd
Display manual pages for the command ls
    man ls
List the contents of the current directory
    ls
List the contents of the current directory, including entries starting with . and using a long listing format
    ls -a -l
Create a test directory if you don’t have one yet; ignore this if you already have it
    mkdir test
Go to the test directory
    cd test
Copy the file sample_data.txt under directory /home/igptest to the current directory with the same name
    cp /home/igptest/sample_data.txt .
View the content of the created file
    more sample_data.txt
Make a copy of the file
    cp sample_data.txt sample_data_copy.txt
View the content of the new copy
    more sample_data_copy.txt
List the contents of the current directory
    ls
Remove the new copy
    rm sample_data_copy.txt
List the contents of the current directory
    ls

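The exercise can be rehearsed safely in a scratch directory first. This is a minimal sketch under assumptions: the scratch location and the file source.txt are hypothetical stand-ins for your home directory and /home/igptest/sample_data.txt.

```shell
# Work in a throwaway directory so nothing in your home directory changes.
scratch=$(mktemp -d)

# Hypothetical stand-in for /home/igptest/sample_data.txt.
printf 'hello\n' > "$scratch/source.txt"

# Create the test directory and copy the file into it under a new name.
mkdir "$scratch/test"
cp "$scratch/source.txt" "$scratch/test/sample_data.txt"

# Make a copy, list the directory, remove the copy, and list again.
cp "$scratch/test/sample_data.txt" "$scratch/test/sample_data_copy.txt"
before=$(ls "$scratch/test")
rm "$scratch/test/sample_data_copy.txt"
after=$(ls "$scratch/test")

printf 'Before rm:\n%s\n' "$before"
printf 'After rm:\n%s\n' "$after"
```

The second listing shows only sample_data.txt, which confirms that rm removed the copy and left the original in place.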
Data manipulation with filters
Filters: programs that accept textual data and then transform it in a particular way.
  head, tail, cut, sort, uniq, sed, …

Task
    Command
View the content of a file
    more sample_data.txt
Get the first 10 lines of the file
    head sample_data.txt
Get the first 5 lines of the file
    head -n 5 sample_data.txt
Get all but the last 5 lines of the file
    head -n -5 sample_data.txt
Get the last 10 lines of the file
    tail sample_data.txt
Get the last 5 lines of the file
    tail -n 5 sample_data.txt
Get all lines starting from line 5
    tail -n +5 sample_data.txt
Get the first three columns of the file
    cut -f 1-3 sample_data.txt
Get selected columns of the file
    cut -f 1,3,5 sample_data.txt
Sort all lines based on the numerical values in the second column (non-numeric entries are interpreted as zero)
    sort -k 2 -n sample_data.txt

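A minimal sketch of these filters on a tiny tab-delimited file. The file demo_data.txt and its contents are hypothetical and only stand in for sample_data.txt, whose actual contents are not shown here.

```shell
# Build a small tab-delimited stand-in file (hypothetical contents).
workdir=$(mktemp -d)
demo="$workdir/demo_data.txt"
printf 'Name\tA\tB\tC\tD\n'       >  "$demo"
printf 'Line_01\t5\t1\t2\t3\n'    >> "$demo"
printf 'Line_02\t12\t4\t5\t6\n'   >> "$demo"
printf 'Line_03\t3\t7\t8\t9\n'    >> "$demo"

# head: the first 2 lines (header plus first data row).
first_two=$(head -n 2 "$demo")

# tail: the last line, and all lines starting from line 3.
last_row=$(tail -n 1 "$demo")
from_line_3=$(tail -n +3 "$demo")

# cut: columns 1 and 3 (cut splits on tabs by default).
cols_1_and_3=$(cut -f 1,3 "$demo")

# sort: numeric sort on the second column; the header's "A" counts as zero.
sorted=$(sort -k 2 -n "$demo")

printf '%s\n' "$sorted"
```

Because the header entry in column 2 is not a number, it sorts as zero and stays on top, followed by the rows with values 3, 5, and 12.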
Data manipulation with piping and redirection
Piping (|): sending data from one program to another program.
Redirection: sending output from one program to a file
  >: save output to a file
  >>: append output to a file

Task
    Command
Get the first 10 lines of the file and then get the first three columns
    head sample_data.txt | cut -f 1-3
Get the first 10 lines of the file, then get the first three columns of these lines, and then redirect the content to a new file
    head sample_data.txt | cut -f 1-3 > sample_data_subset.txt
View the new file
    more sample_data_subset.txt
Append the last 10 lines of the old file to the end of the new file
    tail sample_data.txt >> sample_data_subset.txt
View the new file
    more sample_data_subset.txt

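A minimal sketch of piping and both kinds of redirection. The 12-row file demo_data.txt and the output file subset.txt are hypothetical stand-ins, generated here so the example runs on its own.

```shell
# Generate a 12-row, 4-column tab-delimited file (hypothetical contents).
workdir=$(mktemp -d)
demo="$workdir/demo_data.txt"
subset="$workdir/subset.txt"
i=1
while [ "$i" -le 12 ]; do
  printf 'row%s\t%s\t%s\t%s\n' "$i" "$i" "$((i * 2))" "$((i * 3))"
  i=$((i + 1))
done > "$demo"

# Pipe: head passes its output to cut; ">" saves the result to a new file.
head -n 3 "$demo" | cut -f 1-2 > "$subset"

# ">>" appends to the existing file instead of replacing it.
tail -n 2 "$demo" >> "$subset"

cat "$subset"
```

The resulting file has five lines: three two-column rows written by `>` and two full rows added by `>>`. Running the `>` command again would replace the whole file, so use `>>` whenever earlier output must be kept.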
Editing files with nano
nano is a user-friendly text editor
A quick tutorial: http://staffwww.fullcoll.edu/sedwards/Nano/IntroToNano.html

Task
    Command
Open sample_data.txt for editing
    nano sample_data.txt
Delete the text "Line_01" and the space after it, save the file, and then exit
    In nano, ^O for saving and ^X for exit
View the edited file
    more sample_data.txt
View the content of the .bashrc file, which is located under your home directory. The file includes commands that are executed each time an interactive bash shell starts.
    more ~/.bashrc
Open the .bashrc file under your home directory for editing
    nano ~/.bashrc
Add "setpkgs -a R" to the end of this file. This will allow you to use the R environment, which has been installed in the ACCRE system for statistical computing.
    In nano, ^O for saving and ^X for exit
View the edited .bashrc file
    more ~/.bashrc
Run the .bashrc file
    source ~/.bashrc

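A minimal sketch of what appending a line and then running the file does, using a throwaway stand-in so your real ~/.bashrc is untouched. The temporary file and the variable DEMO_GREETING are hypothetical; the ACCRE-specific setpkgs command is not run here.

```shell
# Stand-in for ~/.bashrc (hypothetical temporary file).
rc_file=$(mktemp)

# Append a line to the end of the file, as you would do by hand in nano.
printf 'DEMO_GREETING="hello from rc"\n' >> "$rc_file"

# Run the file in the current shell; in bash, "source" does the same as ".".
. "$rc_file"

# The variable set inside the file is now available in this shell.
echo "$DEMO_GREETING"
```

Running the file in the current shell is what makes its settings take effect immediately, without logging out and back in.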
What is R
R is a free software environment for statistical computing and graphics. It includes:
  an effective data handling and storage facility
  a suite of operators for calculations on arrays, in particular matrices
  a large, coherent, integrated collection of intermediate tools for data analysis
  graphical facilities for data analysis and display either on-screen or on hardcopy
  a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities

R Installation and tutorial
Download and install R
  http://www.r-project.org/
  Choose a CRAN (Comprehensive R Archive Network) mirror
  Binary distributions of the base system and contributed packages
    Windows version
    Mac OS X version
    Linux version (already installed on the ACCRE cluster, will be used for this module)
Tutorials
  An introduction to R: http://cran.r-project.org/doc/manuals/r-release/R-intro.html

R interface
Command-line R: Linux/OS X
  Type R in your Linux shell to start R
  Type q() in the R interface to close R
R GUI: OS X (Windows GUI is similar)
  Download and install on your laptop
RStudio: a powerful and user-friendly interface for R, excellent for both beginners and developers (http://www.rstudio.com/)

Install and load packages
CRAN packages
  http://cran.r-project.org/web/packages/
  >6000 packages
BioConductor packages
  http://www.bioconductor.org/
  ~1000 packages for the analysis of high-throughput genomics data

Task
    R code
Install a CRAN package
    install.packages("package name")
Install a BioConductor package
    source("http://www.bioconductor.org/biocLite.R")
    biocLite("package name")
Load a package/library
    library("package name")

Basic R syntax
Object <- function(arguments)
  <-: assignment operator
Object <- object[arguments]

Task
    R code
Assign a numeric vector with six numbers to object x using the c() function
    x <- c(1.3, 10.4, 5.6, 3.1, 6.4, 21.7)
Assign a subset of x to a new object y
    y <- x[1:3]
Show the content of x
    x
Show the content of y
    y
Get information on function c
    ?c
Display the output of a function without assignment
    c(1, 2, 5)

Data types
Numeric data: 1, 2, 3
Character data: "a", "b", "c"
Logical data: TRUE, FALSE, TRUE

Task
    R code
Assign a numeric vector with six numbers to object x using the c() function
    x <- c(1.3, 10.4, 5.6, 3.1, 6.4, 21.7)
Create a character vector from x
    as.character(x)
Create a logical vector from x
    x > 5

Data objects
Vectors: an ordered collection of items of the same data type (numeric, character, or logical), 1-dimensional
Matrices: 2-dimensional objects, all items must have the same data type
Arrays: similar to matrices but can have more than two dimensions
Data frames: similar to matrices, but different columns can have different data types
Lists: an ordered collection of objects
Functions

Task
    R code
Create a numeric vector with numbers ranging from 1 to 9
    c(1:9)
Create a 3x3 numeric matrix
    matrix(c(1:9), nrow=3, ncol=3, byrow=TRUE)
Create another 3x3 numeric matrix by changing an argument
    matrix(c(1:9), nrow=3, ncol=3, byrow=FALSE)

Operators and calculations
Comparison operators: ==, !=, <, >, <=, >=
Logical operators: & (AND), | (OR), ! (NOT)
Calculations
  Arithmetic operators: +, -, *, /, ^
  Arithmetic functions: log, exp, sqrt, mean, var, sd, sum, etc.

Task
    R code
Comparisons
    3 == 5
    3 != 5
    3 < 5
Logical operators
    x <- 5
    y <- (-8)
    x > 0 | y > 0
    x > 0 & y > 0
Calculations
    (4 + 2^2) / (2 * 2)
    x <- c(1, 3, 5, 7, 9)
    y <- c(2, 4, 6, 8, 10)
    x + y
    sum((x - mean(x))^2) / (length(x) - 1)
    var(x)

Data import, simple analyses, and export

Task
    R code
Import data from a tabular file
    myData <- read.table("~/test/sample_data.txt", header=TRUE, sep="\t")
Display the new object
    myData
Get class name of the object
    class(myData)
Convert data frame to matrix
    myMatrix <- as.matrix(myData)
Get class name of the matrix
    class(myMatrix)
Display the matrix object
    myMatrix
Get dimensions of the matrix
    dim(myMatrix)
Get a high-level summary
    summary(myMatrix)
Log transformation of the data
    myMatrix_log <- log2(myMatrix)
Calculate variance for row #1
    var(myMatrix_log[1, ])
Calculate variances for all rows
    variances <- apply(myMatrix_log, 1, var)
Calculate means for all rows
    means <- apply(myMatrix_log, 1, mean)
Data subsetting
    myMatrix_log[1:3, 1:2]
    myMatrix_log[c("Line_02", "Line_04"), ]
    myMatrix_log[means > median(means), ]
Combining data
    results <- cbind(myMatrix_log, means, variances)
Write data to a tabular file
    write.table(results, "~/test/sample_data_output.txt", sep="\t", quote=FALSE)
Quit R
    q()
Go to your test directory, and check the file sample_data_output.txt

Copying files to/from a local computer
Windows
  Application: Bitvise SSH (https://www.bitvise.com/ssh-client-download)
Mac
  Application: Cyberduck (https://cyberduck.io/)
  Click on “Open Connection”
  Select “SFTP (SSH File Transfer Protocol)”
  Server: vmplogin.accre.vanderbilt.edu
  Username: your_user_name
  Password: your-password
  Don’t change other items

Copying files to/from a local computer (using Bitvise SFTP in Windows)

Copying files to/from a local computer (using Cyberduck in Mac)