Galaxy for Bioinformatics Analysis

Uploaded On 2017-09-11

Presentation Transcript

Slide1

Galaxy for Bioinformatics Analysis

An Introduction

TCD Bioinformatics Support Team

Fiona Roche, PhD

Date: 31/08/15

Slide2

Overview

What is Galaxy?

Why is it useful?

Command-line vs Galaxy

A Basic Analysis with Galaxy

Resources for Learning

Slide3

What is Galaxy?

A web-based genome analysis platform designed for experimental biologists

www.galaxyproject.org

Slide4

Why is it useful to a biologist?

Easy to use!

Allows data import from popular resources

Provides access to best practice bioinformatics tools

Allows you to build analysis pipelines and share them

Provides multiple ways to visualise your data

Slide5

Case Study: ChIP-seq Analysis Pipeline

Sequencing
Pre-processing of raw reads
Map reads to reference genome
Quality control
Peak calling
Enriched regions

Slide6

Case Study: ChIP-seq Analysis Pipeline

Sequencing
Pre-processing of raw reads
Map reads to reference genome
Quality control
Peak calling
Enriched regions

Visualisation with genome browser
Motif discovery
Relationship with gene structure
Gene set analysis
Differential profile analysis

Slide7

Question?

Which promoter regions of genes do these enriched regions map to?

Slide8

Command-line approach

1. Extract gene coordinates from UCSC
2. Extract 1kb upstream coordinates from UCSC
3. Merge upstream coordinates and gene annotation
4. Clean files
5. Join the input files
6. Create user track for UCSC
7. Import to UCSC
8. Run a wrapper script to enable a re-run of this pipeline with different parameters.

Slide9

Command-line approach

1. Extract gene coordinates from UCSC
2. Extract 1kb upstream coordinates from UCSC
3. Merge upstream coordinates and gene annotation
4. Clean files
5. Join the input files
6. Create user track for UCSC
7. Import to UCSC
8. Run a wrapper script to enable a re-run of this pipeline with different parameters.

Slide10

Galaxy Approach

Slide11

The Galaxy Interface

Data sources and Tools

Main Analysis window

History of commands

Main Menus

Slide12

Overview of Analysis

Question: Which gene promoter regions do these enriched regions map to?

Analysis steps:

Import two datasets into Galaxy
  Genomic coordinates of enriched peaks
  Genomic coordinates of genes
Extract upstream regions of genes
Data cleaning
Identify overlap between promoter regions and enriched regions
Visualise on a genome browser

Slide13

Let’s begin!

Register an account

http://bioinf.gen.tcd.ie/workshops/Galaxy/

Slide14

Let’s begin!

Step 1: Get Data into Galaxy

Slide15

Step 1: Get data #1

TAF1 peaks

Get Data -> Upload File -> Paste/Fetch -> Enter URL -> Start

1. Click Upload File
2. Click Paste/Fetch to display the URL box above
3. Paste in the URL containing your data
4. Select ‘tabular’ file type
5. Type hg19 and specify Human Feb. 2009 (GRCh37/hg19) (hg19)
6. Click Start to upload the data to your history!
7. Click Close

http://bioinf.gen.tcd.ie/workshops/Galaxy/TAF1_peaks.txt

Slide16

Data uploaded to your history!

The file was sent to your history and given a number

The history keeps track of all steps in your analysis

Slide17

Step 2: Rename your History

1. Click here to rename your history
2. Click the cog wheel if you want to create a new history or see a list of your saved histories

You can have multiple histories with different names

Slide18

Step 3: Review your dataset

1. Click on dataset name to expand/collapse the metadata and mini view of the file content
2. Click the eye icon to see the file contents in the main analysis window
3. Click the pencil icon to edit the file attributes
4. Click the x to delete the file

Slide19

Step 4a: Edit dataset

Change File name to a shorter name

1. Click the pencil icon to edit the file attributes
2. There are four tabs in edit mode. To change the file name, click Attributes
3. First rename the file
4. Copy and paste the old name into the info to keep a record of it
5. Click save

Slide20

Step 4b: Edit dataset

Change File format so Galaxy knows where to find chr, start, end

1. Click Datatype to change the file format
2. Select interval from drop down and then click save
3. Define which columns of your TAF1 file are “chrom”, “start” and “end”. Look at the mini view image to see your TAF1 file
4. Click save
5. Format changed to interval. Galaxy now knows where chr, start and end are.

Slide21
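As an aside on Step 4b: telling Galaxy which columns hold “chrom”, “start” and “end” is what lets it read each row as a genomic interval. The sketch below shows the same idea in Python. It is only an illustration: the function name `parse_interval`, the 1-based column numbers and the example row are assumptions, since the real layout of TAF1_peaks.txt is not shown in the slides.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int


def parse_interval(line: str, chrom_col: int, start_col: int, end_col: int) -> Interval:
    """Read one tab-separated row, using 1-based column numbers."""
    fields = line.rstrip("\n").split("\t")
    return Interval(
        chrom=fields[chrom_col - 1],
        start=int(fields[start_col - 1]),
        end=int(fields[end_col - 1]),
    )


# Hypothetical row: the column order here is an assumption for illustration.
row = "chr1\t100\t500\tpeak_1"
interval = parse_interval(row, chrom_col=1, start_col=2, end_col=3)
print(interval)
```

If the columns were in a different order, only the three column numbers would change, which is the choice made in the Datatype tab.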

Step 5: Get data #2 -> Genes

Get Data -> UCSC Main Table Browser

Slide22

Step 5: Get data #2 -> Genes

Ensure all drop downs as shown below are selected

1. Select all fields from drop downs as shown above, then click get output
2. Click Send query to Galaxy

Slide23

Step 6: Edit dataset

Change File name to a shorter name

Click the pencil icon to edit the file name

File name changed

File format = bed

Galaxy already knows where chr, start and end are

Slide24

Step 7: Get Promoter Regions

Tool: Operate on Genomic Intervals -> Get Flanks

1. Select Genes dataset
2. Select upstream
3. Select 1000bp upstream
4. Click Execute
5. Output sent to history! Same file content as ‘Genes’ but start and end coordinates are replaced with promoter regions
6. Rename file to ‘Promoters’

Slide25
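As an aside on Step 7 (Get Flanks): replacing a gene's start and end with its 1000bp upstream region can be sketched as below. This is a minimal illustration, not the Galaxy tool's code. It assumes BED-style 0-based, half-open coordinates, that "upstream" depends on the strand, and that coordinates are clamped at 0. The function name `upstream_flank` is hypothetical.

```python
from typing import Tuple


def upstream_flank(start: int, end: int, strand: str, length: int = 1000) -> Tuple[int, int]:
    """Return (start, end) of the region `length` bp upstream of a gene.

    Assumes 0-based, half-open coordinates. For a '+' gene, upstream lies
    before the start; for a '-' gene, it lies after the end.
    """
    if strand == "-":
        return end, end + length
    # Clamp at 0 so a gene near the chromosome start cannot go negative.
    return max(0, start - length), start


# Hypothetical genes, for illustration only.
print(upstream_flank(5000, 8000, "+"))  # promoter before the gene start
print(upstream_flank(5000, 8000, "-"))  # promoter after the gene end
```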

Step 8: Clean dataset

Tool: Text Manipulation -> Cut

1. Cut out the specific columns we want from the ‘Promoters’ file
2. Click Execute
3. Rename the output file to ‘Clean Promoters’

Slide26
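As an aside on Step 8 (Cut): keeping only selected columns of a tab-separated file can be sketched as below. The slides do not say which columns were kept, so the column numbers and the example row are assumptions, and `cut_columns` is a hypothetical helper, not the Galaxy tool itself.

```python
from typing import Sequence


def cut_columns(line: str, columns: Sequence[int]) -> str:
    """Keep only the given 1-based columns of a tab-separated row, in the order listed."""
    fields = line.rstrip("\n").split("\t")
    return "\t".join(fields[c - 1] for c in columns)


# Hypothetical 'Promoters' row with more columns than we need.
row = "chr1\t4000\t5000\tgeneA\t0\t+\textra"
print(cut_columns(row, [1, 2, 3, 4, 6]))
```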

Datasets ready for analysis!

Both files are associated with human hg19

Galaxy knows for each file where chr, start and end are.

Now, we are ready to join these files and see which promoters have TAF1 peaks!

Dataset #1

Dataset #2

Slide27

How do we Join Genomic Intervals?

Example

Interval file #1

Chr   Start  End   Name    Strand
Chr1  100    500   int1    +
Chr1  1000   1200  int2    +

Interval file #2

Chr   Start  End   Name    Strand
Chr1  200    400   cloneA  +
Chr1  900    1000  cloneB  +

Intervals that overlap!

Chr1  100  500  int1  +  Chr1  200  400  cloneA  +

#1: 100-500, 1000-1200

#2: 200-400, 900-1000

Slide28
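The overlap test behind the “How do we Join Genomic Intervals?” example can be sketched as follows, using the four intervals from that slide. One assumption is labelled here: coordinates are treated as half-open (the end position is not part of the interval), which is why 1000-1200 and 900-1000 only touch and are not reported as overlapping, consistent with the single overlapping pair shown on the slide.

```python
# The two interval files from the example slide: (chr, start, end, name, strand).
file1 = [("Chr1", 100, 500, "int1", "+"), ("Chr1", 1000, 1200, "int2", "+")]
file2 = [("Chr1", 200, 400, "cloneA", "+"), ("Chr1", 900, 1000, "cloneB", "+")]


def overlaps(a, b):
    """True if two intervals share at least one base (half-open coordinates assumed)."""
    same_chrom = a[0] == b[0]
    return same_chrom and a[1] < b[2] and b[1] < a[2]


# Compare every interval in file #1 with every interval in file #2.
hits = [(a[3], b[3]) for a in file1 for b in file2 if overlaps(a, b)]
print(hits)
```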

Step 9: Join on Genomic Intervals

Tool: Operate on Genomic Intervals -> Join

The first dataset is the one we want to filter (i.e. the large dataset containing all of the promoter regions)

The second dataset is the one we use for the filter (i.e. we want to filter the promoter dataset for just those regions that contain the TAF1 peaks)

Inner join returns only the genomic regions that overlap in both files

Click Execute

Slide29
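The inner join in Step 9 can be sketched as a filter of the first dataset by the second: a row is kept only when it overlaps a row in the other file, and the two rows are written side by side. This is an illustrative sketch, not Galaxy's implementation. The promoter and peak rows are made up, `inner_join` is a hypothetical name, and half-open coordinates are assumed.

```python
def inner_join(first, second):
    """Return first-dataset rows joined to each overlapping second-dataset row.

    Rows are tuples starting with (chrom, start, end, ...). Rows of `first`
    with no overlapping partner in `second` are dropped (inner join).
    """
    joined = []
    for a in first:
        for b in second:
            if a[0] == b[0] and a[1] < b[2] and b[1] < a[2]:
                joined.append(a + b)
    return joined


# Hypothetical data: two promoter regions, one peak.
promoters = [("chr1", 4000, 5000, "geneA"), ("chr1", 9000, 10000, "geneB")]
peaks = [("chr1", 4500, 4600, "peak1")]

overlap = inner_join(promoters, peaks)
print(overlap)
```

This nested loop is fine for a handful of rows. For tens of thousands of promoters, a sorted sweep or an interval index would be a more sensible choice.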

Step 9: Join on Genomic Intervals

Output

We have reduced the promoters from >54,000 to 154!

All of these promoter regions contain a TAF1 peak region.

Rename the output file to ‘Overlap’

Slide30

Step 10: Build Custom Tracks for UCSC

Tool: Graph/Display Data -> Build custom track

Click ‘Insert Track’ to open the track information.

We will add three tracks to UCSC:

1. TAF1 peaks
2. Promoter regions
3. TAF1 peaks in promoter regions

Slide31

Step 10: Build Custom Tracks for UCSC

Track 1: TAF1 peaks

Select dataset
Label the track
Describe the track
Select the colour of the track

Click ‘Insert Track’ to open another track

Slide32

Step 10: Build Custom Tracks for UCSC

Tracks 2 and 3:

Click Execute when all three tracks are filled in

Slide33
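For orientation, a UCSC custom track is a plain-text file in which each track begins with a `track` line (name, description, colour) followed by its interval rows. The sketch below builds one such track in Python. It is a hedged illustration of the file layout, not what the Galaxy tool runs: `build_custom_track` is a hypothetical helper, and the label, description, colour and row are made-up values.

```python
from typing import Iterable, Tuple


def build_custom_track(name: str, description: str,
                       colour: Tuple[int, int, int],
                       bed_rows: Iterable[str]) -> str:
    """Return the text of one custom track: a 'track' header line plus BED rows."""
    r, g, b = colour
    header = f'track name="{name}" description="{description}" color={r},{g},{b}'
    return "\n".join([header, *bed_rows]) + "\n"


# Hypothetical row and labels, for illustration only.
peak_rows = ["chr1\t200\t400\tcloneA"]
track = build_custom_track("TAF1 peaks", "TAF1 enriched regions", (255, 0, 0), peak_rows)
print(track, end="")
```

Concatenating three such blocks, one per dataset, gives a single file describing three tracks.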

Step 10: Build Custom Tracks for UCSC

Output

This single output file contains the information to visualise three tracks on UCSC Genome Browser

Click here to visualise your three tracks on UCSC Genome Browser

Slide34

Visualisation on UCSC Genome Browser

The three tracks

Zoom out to see a larger genomic context

Slide35

Extract Workflow from History

Want to rerun your analysis but extract 3kb upstream?

Click the cog wheel and select ‘Extract Workflow’ from the drop down menu

Slide36

Extract Workflow from History

Create a workflow name

Lists all the tools used to create your history

Click Create workflow

Slide37

Extract Workflow from History

Click edit workflow

Or access your workflows from the top menu

Slide38

Editing Workflows

Each box is a step of the analysis

Noodles connect the steps

Click on a box and you can edit the variables of that step in the Details section on the right (in orange)

Use blue window to move around the workflow

Slide39

Editing Workflows

This input dataset is the transcription factor dataset. Label this dataset in the details box on the right

Slide40

Editing Workflows

This input dataset is the Gene dataset. Label this dataset in the details box on the right

Slide41

Editing Workflows

1. Click on Get Flanks tool to edit the upstream promoter region
2. Change the upstream promoter region to 3000
3. Click cog wheel to save workflow. Then click cog wheel again to Run the workflow

Slide42
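Changing the Get Flanks value from 1000 to 3000 and re-running is, in effect, calling the same pipeline with a different parameter. A rough Python analogy is sketched below. It assumes strand-aware, half-open coordinates, and the gene and peak are made up so that the peak falls outside a 1kb promoter but inside a 3kb one. `run_workflow` and `promoter` are hypothetical names, not Galaxy functions.

```python
def promoter(gene, upstream):
    """Upstream region of a gene given as (chrom, start, end, name, strand)."""
    chrom, start, end, name, strand = gene
    if strand == "-":
        return chrom, end, end + upstream, name
    return chrom, max(0, start - upstream), start, name


def run_workflow(genes, peaks, upstream=1000):
    """Names of genes whose promoter overlaps at least one peak."""
    hits = []
    for gene in genes:
        chrom, p_start, p_end, name = promoter(gene, upstream)
        if any(c == chrom and s < p_end and p_start < e for c, s, e in peaks):
            hits.append(name)
    return hits


# Hypothetical inputs: the peak sits about 2.5 kb upstream of geneA.
genes = [("chr1", 5000, 8000, "geneA", "+")]
peaks = [("chr1", 2500, 2600)]

print(run_workflow(genes, peaks, upstream=1000))
print(run_workflow(genes, peaks, upstream=3000))
```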

Running Workflows

1. Select Transcription factor file (e.g. TAF1_peaks)
2. Select Genes file (e.g. Genes)
3. Send output to a new history
4. Run workflow and go for a coffee!!

Slide43

Your new History!

Slide44

Summary

What you learned today

Getting data into Galaxy

How to review and edit datasets

Running Common Galaxy Tools

How to visualise your data in UCSC genome browser

How to extract workflows from a history

Slide45

Large Tool Repository

Slide46

Data Visualisations

UCSC Genome Browser

Clustered Heatmaps

Visualisation of ChIP-seq data

Charts

Circster – structural variation

Slide47

Galaxy Learning Resources

Slide48

Thank You

Please fill in the online survey at

bioinf.gen.tcd.ie/surveys/Galaxy