Unix Shell Workshop
Part 2: Loops, shell scripts, and finding things
Quick review of Unix commands
ls - list all files and directories in the current directory
pwd - shows the current location/directory
mkdir name - creates a directory called 'name'
rm filename or dirname - removes files; -r removes directories and all files inside them
wc - gets the line, word, and character count of a file; with -l it returns just the number of lines (wc -l filename)
sort - sort the content of a file or the output created by a command
cat - output the content of a file to the terminal
head/tail - show either the first few lines or the last few lines, respectively
Loops
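Loops allow us to run the same set of commands many times without having to retype them
Imagine having several hundred genome files with the same endings
The wildcard (*) operator will not work with some commands because the shell expands it before the command runs
EX: cp *.dat original-*.dat becomes cp basilisk.dat unicorn.dat original-*.dat
No file matches original-*.dat yet, so that pattern is passed through unchanged and cp treats it as the destination, which fails because it is not an existing directory
Instead we can use a loop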
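As a rough sketch of what such a loop could look like, assuming the goal is to copy each .dat file to a new file whose name starts with original-. The temporary directory and the placeholder file contents are assumptions made so the example can run anywhere:

```shell
# Work in a throwaway directory so nothing real is touched (assumption for this demo).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two placeholder files standing in for the real data files.
echo "basilisk data" > basilisk.dat
echo "unicorn data" > unicorn.dat

# The shell expands *.dat once, before the loop starts, so the copies
# created inside the loop are not picked up again.
for filename in *.dat
do
    cp "$filename" "original-$filename"
done

ls
```

Each pass through the loop runs cp with exactly one source and one destination, which is why it works where the single wildcard command does not.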
Loops
Type the following:
$ for filename in basilisk.dat unicorn.dat
> do
>     head -n 3 "$filename"
> done
Loops
What does this loop do?
Loops
Output:
COMMON NAME: basilisk
CLASSIFICATION: basiliscus vulgaris
UPDATED: 1745-05-02
COMMON NAME: unicorn
CLASSIFICATION: equus monoceros
UPDATED: 1738-11-24
Loops
What else can loops be used for?
Well... anything really!
Now to the fun part!!
Shell scripts
Now you will see the true power of the shell.
We will write our first shell script.
The purpose is to condense many shell commands into a single place.
Shell scripts
Navigate back to the top directory (type cd at the prompt) and then type:
cd molecules
nano middle.sh
Shell scripts
Inside nano type
head -n 15 octane.pdb | tail -n 5
Press CTRL-O (then Return) to save the file, and then CTRL-X to exit nano.
Check the directory to confirm that middle.sh exists
Run this simple script with the following command:
bash middle.sh
Shell scripts
Now, what if we wanted to give/pass the script the filename instead of having it "hard-coded"?
Open up middle.sh with nano
Change the line in the file to
head -n 15 "$1" | tail -n 5
Now you run the script with bash middle.sh octane.pdb
Or on a different file with bash middle.sh pentane.pdb
Shell scripts
However, we still have to edit middle.sh every time we want to change the range of the lines found!
In the name of making everything as easy as possible, we should change this script to take inputs for the range values as well.
Open it again in nano
Change the command to
head -n "$2" "$1" | tail -n "$3"
Shell scripts
Now the script can be executed like this: bash middle.sh octane.pdb 15 5
This prints the last 5 of the first 15 lines of octane.pdb, that is, lines 11 to 15
You can pass any other file name and any other range in the same way
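To see how the three arguments map onto the output, here is a small self-contained sketch. It assumes a stand-in file, numbers.txt, in which line N simply contains the number N (this file is not part of the workshop data), and it writes the three-argument middle.sh into a temporary directory:

```shell
# Throwaway directory for the demo (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A 20-line stand-in for octane.pdb: line N just contains the number N.
seq 1 20 > numbers.txt

# The three-argument version of middle.sh.
cat > middle.sh <<'EOF'
# Usage: bash middle.sh filename end_line num_lines
# Keep the first $2 lines of file $1, then print the last $3 of those.
head -n "$2" "$1" | tail -n "$3"
EOF

# The last 5 of the first 15 lines, i.e. lines 11 to 15.
result=$(bash middle.sh numbers.txt 15 5)
echo "$result"
```

Because each line holds its own line number, the printed values 11 through 15 show directly which lines were selected.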
Shell scripts
What if we wanted to run this script on everything in a directory?
We would write another script containing a loop: we give it the directory name, and it runs middle.sh on each file in that directory.
This is much like what we just did with loops
Scripts can do everything you have learned so far; they just make it easier to run many repetitive commands at once
Imagine processing hundreds of thousands of sequences. You certainly do not want to type commands that many times. Computers do that very efficiently.
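One possible shape for such a wrapper is sketched below. The script name middle-all.sh, the samples directory, and its contents are all hypothetical, chosen only so the example can run on its own:

```shell
# Throwaway directory for the demo (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The three-argument middle.sh from the earlier slides.
cat > middle.sh <<'EOF'
head -n "$2" "$1" | tail -n "$3"
EOF

# Hypothetical wrapper: run middle.sh on every regular file in directory $1,
# passing along the range values $2 and $3.
cat > middle-all.sh <<'EOF'
for filename in "$1"/*
do
    if [ -f "$filename" ]
    then
        echo "== $filename"
        bash middle.sh "$filename" "$2" "$3"
    fi
done
EOF

# Placeholder data: two small files in a directory called samples.
mkdir samples
seq 1 10 > samples/a.txt
seq 101 110 > samples/b.txt

result=$(bash middle-all.sh samples 4 2)
echo "$result"
```

The wrapper only adds the loop; the actual line selection still lives in middle.sh, so a change to the range logic needs to be made in one place only.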
Before we move on
Any questions?
Finding things
grep - finds lines of text within a file that match a pattern; it is very powerful when combined with regular expressions
find - finds files and directories
Now return to the main directory by typing cd and pressing the Return key
Type cd writing
Type cat haiku.txt
Finding things
Output:
The Tao that is seen
Is not the true Tao, until
You bring fresh toner.

With searching comes loss
and the presence of absence:
"My Thesis" not found.

Yesterday it worked
Today it is not working
Software is like that.
Finding things
Type
grep not haiku.txt
This will output every line that contains not
Now type
grep day haiku.txt
You should get two lines, but in both of them day is part of a larger word (Yesterday, Today)!
If we want to match day only as a whole word we need to add the -w flag
Try it with
grep -w day haiku.txt
You should get no output
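The difference between the two searches can be checked with a short sketch. It recreates haiku.txt in a temporary directory, assuming the three stanzas are separated by blank lines, and uses grep's -c option (count matching lines), which is not covered in these slides:

```shell
# Throwaway directory for the demo (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate haiku.txt; blank lines between stanzas are an assumption.
cat > haiku.txt <<'EOF'
The Tao that is seen
Is not the true Tao, until
You bring fresh toner.

With searching comes loss
and the presence of absence:
"My Thesis" not found.

Yesterday it worked
Today it is not working
Software is like that.
EOF

# Substring match: counts the lines containing "day" anywhere.
substring_matches=$(grep -c day haiku.txt)

# Whole-word match: grep exits with status 1 when nothing matches,
# so "|| true" keeps the script going.
word_matches=$(grep -c -w day haiku.txt || true)

echo "substring: $substring_matches, whole word: $word_matches"
```

The non-zero exit status on "no match" is worth remembering, because it lets scripts test whether a pattern occurs at all.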
Finding things
If you want to look for a phrase you have to put it in quotes, for example double quotes ("")
So for instance type grep -w "is not" haiku.txt
Other options are -n, which numbers the lines that match, -i to ignore case, and -v to find the lines where the word or phrase doesn't appear
Type grep -n -w "the" haiku.txt
You should get lines 2 and 6
If you type grep -n -w -i "the" haiku.txt then you will also get line 1
If you add -v, by typing grep -n -w -v "the" haiku.txt, the output should be every line but 2 and 6
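The line numbers quoted above can be reproduced with the sketch below. It again recreates haiku.txt with blank lines between the stanzas (an assumption, but one that the line numbers 2 and 6 depend on), and it uses a small helper function, line_numbers, that exists only in this example:

```shell
# Throwaway directory for the demo (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate haiku.txt; blank lines between stanzas are an assumption.
cat > haiku.txt <<'EOF'
The Tao that is seen
Is not the true Tao, until
You bring fresh toner.

With searching comes loss
and the presence of absence:
"My Thesis" not found.

Yesterday it worked
Today it is not working
Software is like that.
EOF

# Helper (hypothetical, for this demo only): keep just the line numbers
# from "grep -n" output and join them with spaces.
line_numbers() {
    cut -d: -f1 | paste -s -d ' ' -
}

whole_word=$(grep -n -w "the" haiku.txt | line_numbers)
ignore_case=$(grep -n -w -i "the" haiku.txt | line_numbers)
inverted=$(grep -n -w -v "the" haiku.txt | line_numbers)

echo "-n -w:    $whole_word"
echo "-n -w -i: $ignore_case"
echo "-n -w -v: $inverted"
```

Note that with -v the blank lines (4 and 8) are reported as well, since they do not contain the word either.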
Finding things
Now navigate to the top-level directory
Type
find . -type d
This command finds all directories in and below the current directory (.)
If we change the d to an f (find . -type f) we get a listing of all of the files
find automatically descends into folders and finds every file there
Finding things
Often it is more useful to find only the files in the current folder itself
This can be done with -maxdepth:
find . -maxdepth 1 -type f
If you use -mindepth you can have it return only things that are at or below a certain depth:
find . -mindepth 2 -type f
To search for things by name you can use -name:
find . -name '*.txt'
Notice that we use the wildcard character here, inside quotes so that find rather than the shell expands it, to return all files in the current directory and all subdirectories that end in the .txt extension
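The three variations can be compared side by side on a tiny directory tree. The tree below is made up for illustration, and the output is piped through sort only because find does not promise any particular order:

```shell
# Throwaway directory for the demo (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up tree: one file at depth 1, one at depth 2, one at depth 3.
mkdir -p writing/data
touch notes.txt writing/haiku.txt writing/data/one.dat

# Only files directly in the current directory.
top_level=$(find . -maxdepth 1 -type f | LC_ALL=C sort)

# Only files at least two levels down.
deeper=$(find . -mindepth 2 -type f | LC_ALL=C sort)

# All .txt files at any depth; the quotes stop the shell expanding *.txt.
text_files=$(find . -name '*.txt' | LC_ALL=C sort)

echo "top level:  $top_level"
echo "deeper:     $deeper"
echo "text files: $text_files"
```

The -maxdepth and -mindepth options are available in both GNU and BSD find, but they are extensions rather than part of the POSIX standard.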
Before we move on
Any questions?
Let’s write a shell script
Now we are going to write our own shell script. It combines a few of the things we've learned to count the number of lines in every file in the current directory and its subdirectories, and writes the counts to a summary file
First I will walk you through the steps of what we need to do, then I will have you type the script, and then we will go through it line by line so you know what each part is doing
Let’s write a shell script
First we are going to use find to get a list of all files in the directory and all subdirectories
Then we are going to have it call the wc -l command on each file in the list
Let’s write a shell script
FILES=$(find . -type f)
for FILE in $FILES
do
    wc -l "$FILE" >> line_number_summary.txt
done
Let’s write a shell script
The first line (FILES=$(find . -type f)) puts a list of all the files into a variable called FILES
The next line (for FILE in $FILES) starts a loop that steps through each file in the list; the do and done lines mark the beginning and end of the loop
Finally, the line inside the loop (wc -l "$FILE" >> line_number_summary.txt) runs the line count command on each file and appends the output to the summary file.
We are appending, or adding to the end of the file, with the >> operator rather than overwriting with the > operator we used earlier to create files
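The loop over $FILES splits the list on whitespace, so a file name containing a space would be treated as two names. One possible alternative, sketched here rather than taken from the workshop, lets find run wc itself with -exec. The directory and file names are made up, and the summary file is excluded from the search so that a rerun does not count it:

```shell
# Throwaway directory for the demo (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up files, including names with spaces.
mkdir "my data"
printf 'a\nb\nc\n' > "my data/three lines.txt"
printf 'x\n' > one.txt

# -exec ... \; runs "wc -l" once per file and passes each name as a single
# argument, so spaces are safe. "! -name" skips the summary file itself.
find . -type f ! -name line_number_summary.txt -exec wc -l {} \; >> line_number_summary.txt

cat line_number_summary.txt
```

The trade-off is that the explicit for loop is easier to extend with extra commands per file, while the -exec form is shorter and robust to unusual file names.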
Let’s write a shell script
Now let's write a script to set up a file structure for research
Let's say you have a standard file structure you like to work with. We will write this script so that you can set up that structure under a top-level name of your choosing
Let’s write a shell script
# this will create a top-level directory named after the first argument
# then it will create as many folders as you give it arguments in that directory
# the first argument is the top level of the file structure
mkdir "$1"
Let’s write a shell script
# use a for loop to go through the rest of the arguments and create folders for them
for FOLDER in "${@:2}"
do
    mkdir "$1/$FOLDER"
done
Let’s write a shell script
All the lines starting with "#" are comments, which are there to explain what you are doing
First we make the initial directory with mkdir "$1"
Then we use a for loop to go over all of the arguments from the second one on (for FOLDER in "${@:2}")
Finally, inside the loop we run mkdir "$1/$FOLDER" to create the rest of the file structure
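A variant of the same idea is sketched below, using shift instead of ${@:2} and mkdir -p so that rerunning the script on an existing structure is not an error. The script name make_project.sh and the folder names are hypothetical:

```shell
# Throwaway directory for the demo (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical script name; the logic mirrors the script described above.
cat > make_project.sh <<'EOF'
# Usage: bash make_project.sh top_dir subdir...
top="$1"
shift                  # drop the first argument; "$@" is now just the subfolders
mkdir -p "$top"        # -p: no error if the directory already exists
for folder in "$@"
do
    mkdir -p "$top/$folder"
done
EOF

bash make_project.sh thesis data results "raw reads"
ls thesis
```

Quoting "$@" and "$top/$folder" keeps a folder name such as "raw reads" together as one directory instead of two.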
Practical Example
For this example we want to collect all the molecule pdb files, which hold our data, with a find statement.
We then want to grab only the lines marked with an H using a grep statement.
We will use a loop to process all our files without having to write repeating commands.
Finally, as part of our grep statement, we will write the output to a new file for us to read.
Practical Example
# grab pdb files from molecules folder
FILES=$(find ~/data-shell/molecules -name "*.pdb")
# going file by file
for FILE in $FILES
do
    # grab all the lines that have H in them, write them out to an output file
    grep -w "H" "$FILE" > "$FILE.out"
done
Practical Example
Our first command, FILES=$(find ~/data-shell/molecules -name "*.pdb"), finds all files in the molecules directory whose names end in .pdb and stores the list in the FILES variable
Our loop is the same as the one we used in the line-counting script.
Our second command is the heart of the script: grep -w "H" "$FILE" > "$FILE.out"
This command searches the file for all the lines that contain H as a whole word, then writes them to an output file named after the file we searched, with .out appended to the end.
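To try the grep step without the workshop data, the sketch below builds a tiny molecules directory in a temporary location. The file sample.pdb and its contents are made up and only loosely imitate the pdb layout, and the loop uses a plain *.pdb wildcard in place of find, so it looks in one directory only:

```shell
# Throwaway directory for the demo (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up stand-in for a molecule file: one carbon line, two hydrogen lines.
mkdir molecules
cat > molecules/sample.pdb <<'EOF'
COMPND      SAMPLE
ATOM      1  C           1       0.000   0.000   0.000
ATOM      2  H           1       0.000   1.000   0.000
ATOM      3  H           1       1.000   0.000   0.000
EOF

# For each pdb file, keep only the lines where H stands alone as a word
# and write them to a file with .out added to the original name.
for FILE in molecules/*.pdb
do
    grep -w "H" "$FILE" > "$FILE.out"
done

cat molecules/sample.pdb.out
```

Since a period cannot be part of a variable name, "$FILE.out" expands to the file name followed by .out, giving molecules/sample.pdb.out here.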
That is it for today
Please see your email for information about the next meeting.
Questions?
Comments?