Deep Learning in Bioinformatics


Uploaded by ImNotABaby (@ImNotABaby) on 2022-08-01






Presentation Transcript

Slide1

Deep Learning in Bioinformatics

Asmitha Rathis

Slide2

Why Bioinformatics?

Protein structure

Genetic Variants

Anomaly classification

Protein classification

Segmentation/Splicing

Slide3

Why is Deep Learning beneficial?

Deep learning models scale well to large datasets and are effective at identifying complex patterns in feature-rich data

They learn high-level abstractions through multiple layers of non-linear transformations
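To make "multiple layers of non-linear transformations" concrete, here is a minimal pure-Python sketch of a feed-forward network. Each layer is a linear map followed by a ReLU, so later layers operate on features computed by earlier ones. The layer widths and random weights are arbitrary placeholders and nothing is trained.

```python
import math
import random

def dense(v, weights, bias):
    """Linear map: output j = sum_i weights[j][i] * v[i] + bias[j]."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, x) for x in v]

def init_layer(n_in, n_out, rng):
    """Random weights scaled by 1/sqrt(n_in); zero biases."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(x, layers):
    """Each layer transforms the previous layer's output: linear map, then ReLU."""
    h = x
    for weights, bias in layers:
        h = relu(dense(h, weights, bias))
    return h

rng = random.Random(0)
sizes = [8, 16, 16, 4]  # hypothetical widths: input, two hidden layers, output features
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
features = forward([rng.random() for _ in range(sizes[0])], layers)
print(features)
```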

Slide4

Terms

What are Motifs?

short, recurring patterns in DNA that are presumed to have a biological function

What is non-coding DNA?

DNA that does not encode protein sequences.
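A common way to represent a motif is a position frequency matrix (per-position base counts), converted to log-odds scores and slid along a sequence. The sketch below assumes a uniform background and a pseudocount of 0.5; the toy `TGA` counts are made up for illustration.

```python
import math

BASES = "ACGT"

def pfm_to_log_odds(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds scores against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background) for b in BASES})
    return pwm

def scan(sequence, pwm):
    """Score every window of len(pwm) bases; higher means a closer match to the motif."""
    width = len(pwm)
    return [
        sum(pwm[j][sequence[i + j]] for j in range(width))
        for i in range(len(sequence) - width + 1)
    ]

# Made-up counts for a short motif with a strongly preferred "TGA" core.
toy_counts = [
    {"A": 0, "C": 1, "G": 0, "T": 9},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 9, "C": 0, "G": 1, "T": 0},
]
pwm = pfm_to_log_odds(toy_counts)
scores = scan("CCTGACC", pwm)
best = max(range(len(scores)), key=scores.__getitem__)
print(best)  # start index of the best-matching window
```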

Slide5

Papers

DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences - Daniel Quang and Xiaohui Xie [2016]

Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning - Babak Alipanahi et al. [2015]

Exploiting the past and the future in protein secondary structure prediction - Pierre Baldi et al. [1999]

Slide6

DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences

A predictive model for the function of non-coding DNA has enormous benefit for translational research

98% of the human genome is non-coding DNA, and 93% of disease-associated variants lie in these regions

Previous work: the DeepSEA model

DanQ proposes a novel hybrid framework that combines a convolutional neural network with a bi-directional long short-term memory (BLSTM) recurrent neural network

Slide7

Network Model

Convolution layer to detect regulatory motifs

Recurrent (BLSTM) layer to capture long-range dependencies between the motifs, i.e. a regulatory "grammar"
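The convolution-then-recurrence design can be sketched in plain Python. This is a toy forward pass under loose assumptions, not the DanQ implementation: the sizes are tiny and hypothetical, the weights are random rather than trained, a plain tanh recurrence stands in for the LSTM cells, and only the final state of each direction feeds the output layer. It still shows the data flow: one-hot DNA, convolution kernels acting as motif detectors, max pooling, a forward and a backward pass over the pooled motif features, and one sigmoid output per binary target.

```python
import math
import random

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of [A, C, G, T] indicator vectors."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]

def rand_matrix(rows, cols, rng, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def conv_relu(x, kernels):
    """1-D convolution along the sequence; each width x 4 kernel acts as a motif detector."""
    width = len(kernels[0])
    return [
        [max(0.0, sum(k[j][c] * x[i + j][c] for j in range(width) for c in range(4))) for k in kernels]
        for i in range(len(x) - width + 1)
    ]

def max_pool(x, size):
    """Non-overlapping max pooling along the sequence, channel by channel."""
    return [[max(col) for col in zip(*x[i:i + size])] for i in range(0, len(x) - size + 1, size)]

def rnn_final_state(steps, w_in, w_h):
    """Plain tanh recurrence (a simplified stand-in for an LSTM); returns the last hidden state."""
    h = [0.0] * len(w_h)
    for step in steps:
        h = [
            math.tanh(sum(a * v for a, v in zip(w_in[u], step)) + sum(b * v for b, v in zip(w_h[u], h)))
            for u in range(len(w_h))
        ]
    return h

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny, hypothetical sizes with random (untrained) weights.
rng = random.Random(42)
n_kernels, width, hidden, n_targets = 4, 5, 3, 2
kernels = [rand_matrix(width, 4, rng) for _ in range(n_kernels)]
forward_rnn = (rand_matrix(hidden, n_kernels, rng), rand_matrix(hidden, hidden, rng))
backward_rnn = (rand_matrix(hidden, n_kernels, rng), rand_matrix(hidden, hidden, rng))
w_out = rand_matrix(n_targets, 2 * hidden, rng)

def predict(seq):
    pooled = max_pool(conv_relu(one_hot(seq), kernels), size=3)  # motif features
    # Read the pooled features left-to-right and right-to-left, then concatenate.
    h = rnn_final_state(pooled, *forward_rnn) + rnn_final_state(pooled[::-1], *backward_rnn)
    return [sigmoid(sum(w * v for w, v in zip(row, h))) for row in w_out]  # one probability per target

probs = predict("ACGTTGCATGCAACGTAGCT")
print(probs)
```

Reading off only the final states keeps the sketch short; a full model would pass richer recurrent outputs on to further layers.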

Slide8

Training Details

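Two initialization schemes: random initialization, and initializing the convolution kernels from known motifs

Dropout is used for regularization

Trained with the RMSprop algorithm using a minibatch size of 100

About 60 epochs are needed to fully train the model, and each epoch of training takes ∼6 h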
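For reference, the RMSprop update keeps a running average of squared gradients and divides each step by its square root. The sketch below applies it with minibatches of 100 to a made-up one-parameter problem; the learning rate (0.05) and decay (0.9) are illustrative choices, not DanQ's settings, and dropout is not shown.

```python
import math
import random

def rmsprop_step(params, grads, cache, lr=0.05, decay=0.9, eps=1e-8):
    """In-place RMSprop update: divide each gradient by a running RMS of its recent values."""
    for i, g in enumerate(grads):
        cache[i] = decay * cache[i] + (1.0 - decay) * g * g
        params[i] -= lr * g / (math.sqrt(cache[i]) + eps)

# Toy problem (made up): fit a scalar w to noisy targets y by minimising mean (w - y)^2.
rng = random.Random(0)
data = [rng.gauss(3.0, 1.0) for _ in range(1000)]
params, cache = [0.0], [0.0]
batch_size = 100  # the minibatch size quoted on the slide

for epoch in range(20):
    rng.shuffle(data)
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        grad = 2.0 * sum(params[0] - y for y in batch) / len(batch)  # d/dw of the batch loss
        rmsprop_step(params, [grad], cache)

print(round(params[0], 2))  # ends near the sample mean of the data
```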

Slide9

Results

Computed the ROC curve (and the area under it) for each of the 919 binary targets on the test set

The predicted probability for each sequence was the average of the predictions for the forward sequence and its reverse complement
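The two evaluation steps on this slide can be sketched together: averaging a model's output on a sequence and on its reverse complement, then computing one ROC AUC per binary target. `toy_model` is a hypothetical stand-in for a trained network, and the AUC uses the pairwise-comparison formulation, which is quadratic in the number of examples and only meant for small illustrations.

```python
def reverse_complement(seq):
    """Reverse the sequence and swap A<->T, C<->G."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))

def averaged_prediction(model, seq):
    """Average per-target probabilities for a sequence and its reverse complement."""
    forward, reverse = model(seq), model(reverse_complement(seq))
    return [(a + b) / 2.0 for a, b in zip(forward, reverse)]

def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count as 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def per_target_auc(label_rows, prob_rows):
    """One AUC per binary target (column); each target needs both positives and negatives."""
    n_targets = len(label_rows[0])
    return [
        roc_auc([row[t] for row in label_rows], [row[t] for row in prob_rows])
        for t in range(n_targets)
    ]

def toy_model(seq):
    # Hypothetical single-target "model": fraction of A bases (not strand-symmetric).
    return [seq.count("A") / len(seq)]

seqs = ["AAAT", "ACGT", "GGCC", "AATT"]
label_rows = [[1], [0], [0], [1]]
prob_rows = [averaged_prediction(toy_model, s) for s in seqs]
aucs = per_target_auc(label_rows, prob_rows)
print(aucs)
```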

Slide10

Results

Precision-recall curves
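A precision-recall curve can be traced by ranking examples by predicted probability and lowering the threshold one example at a time; summing precision over each increase in recall gives the area under that step curve (average precision). This is a minimal sketch with made-up labels and scores; tied scores are simply processed in sorted order.

```python
def precision_recall_curve(labels, scores):
    """(recall, precision) after admitting each example, from highest score to lowest."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(labels)
    tp = fp = 0
    points = []
    for _, y in ranked:
        if y == 1:
            tp += 1
        else:
            fp += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points

def average_precision(labels, scores):
    """Area under the step-shaped PR curve: precision weighted by each gain in recall."""
    ap, prev_recall = 0.0, 0.0
    for recall, precision in precision_recall_curve(labels, scores):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

labels = [1, 0, 1, 0]          # made-up ground truth for one target
scores = [0.9, 0.8, 0.7, 0.1]  # made-up predicted probabilities
ap = average_precision(labels, scores)
print(ap)
```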

Slide11

Future Work

Better initialization techniques

Currently, half of the kernels are initialized with known motifs from the JASPAR database

Datasets from more cell types
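One way to seed convolution kernels with known motifs is sketched below. The scheme is an assumption made for illustration, not necessarily the one DanQ used: each motif's counts are normalised to probabilities, 0.25 is subtracted so uninformative positions sit near zero, the motif is centred in a wider kernel, and the remaining kernels (and kernel positions) get small random weights. JASPAR distributes motifs as count matrices; the per-position dict layout and `toy_counts` values here are made up.

```python
import random

BASES = "ACGT"

def random_kernel(width, rng, scale):
    return [[rng.uniform(-scale, scale) for _ in BASES] for _ in range(width)]

def motif_to_kernel(pfm_counts, width, rng, scale=0.05):
    """Hypothetical scheme: centre the motif's (probability - 0.25) values in a
    width x 4 kernel and fill the remaining positions with small random weights."""
    if len(pfm_counts) > width:
        raise ValueError("motif is wider than the kernel")
    kernel = random_kernel(width, rng, scale)
    offset = (width - len(pfm_counts)) // 2
    for j, column in enumerate(pfm_counts):
        total = sum(column[b] for b in BASES)
        kernel[offset + j] = [column[b] / total - 0.25 for b in BASES]
    return kernel

def init_kernels(n_kernels, width, known_motifs, rng, scale=0.05):
    """Initialise up to half of the kernels from known motifs and the rest randomly."""
    n_from_motifs = min(len(known_motifs), n_kernels // 2)
    kernels = [motif_to_kernel(m, width, rng, scale) for m in known_motifs[:n_from_motifs]]
    kernels += [random_kernel(width, rng, scale) for _ in range(n_kernels - n_from_motifs)]
    return kernels

# Made-up motif given as per-position base counts.
toy_counts = [
    {"A": 0, "C": 1, "G": 0, "T": 9},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 9, "C": 0, "G": 1, "T": 0},
]
kernels = init_kernels(n_kernels=4, width=5, known_motifs=[toy_counts], rng=random.Random(0))
print(kernels[0][2])  # the motif's "G" position, centred in the kernel
```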

Slide12

Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning: DeepBind

DNA- and RNA-binding proteins play a central role in gene regulation, including transcription and alternative splicing.

In transcription, the sequence specificity of a DNA-binding protein (usually a transcription factor) describes how specifically it recognizes its target DNA motif.

Slide13

Challenges

Data come in qualitatively different forms, e.g. microarray and sequencing data

The quantity of data is very large

Models need to overcome the biases of existing technologies

Slide14

Data

For training, DeepBind uses a set of sequences and, for each sequence, an experimentally determined binding score.
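A DeepBind-style model can be sketched as convolution with motif detectors, rectification with a per-detector threshold, pooling across the sequence, and a final layer that outputs a binding score; training then compares that score with the experimentally determined one. In this sketch the single hand-set detector, the squared-error loss, and the "measured" scores are all hypothetical, and only max pooling is used.

```python
BASES = "ACGT"

def one_hot(seq):
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]

def deepbind_style_score(seq, detectors, thresholds, weights, bias):
    """Convolve each motif detector along the sequence, rectify with its threshold,
    max-pool over positions, then combine the pooled values linearly."""
    x = one_hot(seq)
    pooled = []
    for detector, threshold in zip(detectors, thresholds):
        width = len(detector)
        responses = [
            max(0.0, sum(detector[j][c] * x[i + j][c] for j in range(width) for c in range(4)) - threshold)
            for i in range(len(x) - width + 1)
        ]
        pooled.append(max(responses))
    return bias + sum(w * p for w, p in zip(weights, pooled))

# Hypothetical hand-set detector for "TGA" (rows are positions; columns are A, C, G, T).
tga = [[-1, -1, -1, 1], [-1, -1, 1, -1], [1, -1, -1, -1]]
params = dict(detectors=[tga], thresholds=[1.0], weights=[1.0], bias=0.0)

# Made-up (sequence, measured binding score) pairs; real ones would come from experiments.
training_set = [("CCTGACC", 1.8), ("CCCCCCC", 0.1)]
predictions = [deepbind_style_score(s, **params) for s, _ in training_set]
mse = sum((p - y) ** 2 for p, (_, y) in zip(predictions, training_set)) / len(training_set)
print(predictions, mse)
```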

Slide15

Binding score:

Slide16

Training/Testing Details

Training on in vitro data and testing on in vivo data.

In vitro: performing a given procedure in a controlled environment outside of a living organism

In vivo: testing on whole, living organisms or cells, usually animals (including humans) and plants

Slide17

Slide18

Results

Slide19

Analysis of potentially disease-causing genomic variants

Binding models are used to identify, group, and visualize variants that potentially change protein binding

The height of each letter indicates the importance of that base

The mutation map indicates how much each possible mutation would increase or decrease the binding score.

A cancer risk variant in a MYC enhancer weakens a TCF7L2 binding site.
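A mutation map of this kind can be computed for any scoring model by substituting every possible base at every position and recording the change in score relative to the original sequence. Below, `toy_score` is a hypothetical stand-in for a trained binding model, and the raw score differences are reported without any further scaling.

```python
BASES = "ACGT"

def mutation_map(seq, score_fn):
    """delta[b][i] = score(seq with base b at position i) - score(seq); 0.0 for the reference base."""
    reference = score_fn(seq)
    delta = {b: [0.0] * len(seq) for b in BASES}
    for i, ref_base in enumerate(seq):
        for b in BASES:
            if b != ref_base:
                mutant = seq[:i] + b + seq[i + 1:]
                delta[b][i] = score_fn(mutant) - reference
    return delta

def toy_score(seq, site="TGA"):
    """Hypothetical binding model: best count of bases matching `site` in any window."""
    w = len(site)
    return max(sum(a == b for a, b in zip(seq[i:i + w], site)) for i in range(len(seq) - w + 1))

delta = mutation_map("CCTGACC", toy_score)
for base in BASES:
    print(base, delta[base])  # negative entries weaken the site, positive ones strengthen it
```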

Slide20

Analysis of Splicing Patterns

Slide21

Exploiting the past and the future in protein secondary structure prediction

Predicting the secondary structure of a protein (alpha helix, beta sheet, or coil) is an important step towards understanding its three-dimensional structure as well as its function.

Older methods: machine-learning models with fixed-size input windows do not capture variable, long-range information, and increasing the window size leads to overfitting
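In a bidirectional recurrent network, the prediction at residue $t$ can draw on information from both sides of the sequence instead of a fixed window. A generic formulation, written in our own notation with placeholder functions $\phi$, $\beta$, $\eta$ (for example small neural networks), is:

```latex
\begin{aligned}
F_t &= \phi\left(F_{t-1}, I_t\right), \qquad F_0 = 0,\\
B_t &= \beta\left(B_{t+1}, I_t\right), \qquad B_{T+1} = 0,\\
O_t &= \eta\left(F_t, B_t, I_t\right).
\end{aligned}
```

Here $I_t$ is the input encoding at residue $t$ of a protein of length $T$, $F_t$ summarises residues $1,\dots,t$ (the "past"), $B_t$ summarises residues $t,\dots,T$ (the "future"), and $O_t$ gives the probabilities of helix, sheet, and coil at residue $t$. Because $F_t$ and $B_t$ can carry information from arbitrarily far along the chain, the usable context is not bounded by a fixed window.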

Slide22

Slide23

Slide24

Results

Slide25

Results

Overall performance close to 76% correct classification using 6 BRNNs (bidirectional recurrent neural networks)

Use a range to limit the size of the window

Size of window
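Assuming the 76% figure is the usual per-residue three-state accuracy (often called Q3), and assuming the six networks are combined by averaging their class probabilities (the slide does not say how they were combined), the evaluation could be sketched as follows. The probability tables and observed labels are made up.

```python
CLASSES = "HEC"  # helix, sheet (strand), coil

def ensemble_predict(prob_tables):
    """Average per-residue class probabilities across models, then pick the most likely class."""
    n_models = len(prob_tables)
    predicted = []
    for per_residue in zip(*prob_tables):  # one tuple of model outputs per residue
        mean = [sum(p[k] for p in per_residue) / n_models for k in range(len(CLASSES))]
        predicted.append(CLASSES[max(range(len(CLASSES)), key=mean.__getitem__)])
    return "".join(predicted)

def q3_accuracy(predicted, observed):
    """Fraction of residues whose three-state label is predicted correctly."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)

# Made-up outputs from two models for a 4-residue segment (columns: H, E, C).
model_a = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.1, 0.1, 0.8], [0.3, 0.3, 0.4]]
model_b = [[0.6, 0.3, 0.1], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6], [0.1, 0.2, 0.7]]
predicted = ensemble_predict([model_a, model_b])
print(predicted, q3_accuracy(predicted, "HECC"))
```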

Slide26

Questions

Which of the more recent models and technologies seen in class could be applied to these problems?

Can these techniques be applied to other bioinformatics tasks?