Uploaded On 2022-08-02


Slide1

Gene Signatures and Knowledge-Guided Gene Set Characterization Lab

KnowEnG Center

Signatures and Knowledge-Guided Characterization | KnowEnG Center

PowerPoint by Charles Blatti

Slide2

Introduction

The goals of this lab are as follows:

Define a novel gene expression signature based on estrogen receptor status in TCGA samples using the integrative LINCS (iLINCS) data portal, and identify other similar known gene expression signatures.

Use networks of prior knowledge to identify pathways, additional genes, and other annotations that relate to the gene set of our novel signature using SPIA, GeneMANIA, and KnowEnG’s DRaWR.


Slide3

Step 0A: Start the VM

Follow the instructions for starting the VM. (This is the Remote Desktop software.) The instructions are different for UIUC and Mayo participants.

Instructions for UIUC users are here: http://publish.illinois.edu/compgenomicscourse/files/2020/06/SetupVM_UIUC.pdf

Instructions for Mayo users are here: http://publish.illinois.edu/compgenomicscourse/files/2020/06/VM_Setup_Mayo.pdf

Variant Calling Workshop | Chris Fields | 2020


Slide4

Step 0B: Local Files (for UIUC users)

For viewing and manipulating the files needed for this laboratory exercise, denote the path C:\Users\IGB\Desktop\VM on the VM as the following:

[course_directory]

We will use the files found in:

[course_directory]\07_Signatures_and_Characterization


Slide5

Step 0B: Local Files (for Mayo Clinic users)

For viewing and manipulating the files needed for this laboratory exercise, denote the path C:\Users\Public\Desktop\datafiles on the VDI as the following:

[course_directory]

We will use the files found in:

[course_directory]\07_Signatures_and_Characterization


Slide6

Creating a Novel Gene Expression Signature

In this exercise, we will use the integrative iLINCS data portal to extract gene expression data from TCGA BRCA samples and build a gene signature based on the estrogen receptor status.


Slide7

Step 1: Perturbagen and Disease Datasets

Open your web browser and go to the iLINCS data portal: http://www.ilincs.org/ilincs/

This portal, curated by the LINCS Data Coordination and Integration Center, contains transcriptomic and proteomic datasets from the many LINCS affiliated projects, including the LINCS L1000 assay. It also contains several other large public datasets of perturbations to cell lines and samples of disease.

We will define a custom gene signature from TCGA data and see how it relates to the library of signatures generated from the LINCS L1000 project.


Slide8

Step 2: Select Breast Cancer Dataset

Click on “Datasets” in the options along the top

Select the “All Datasets” tab

Click “Choose” button for TCGA datasets

Find “919 mRNA-seq breast invasive carcinoma (BRCA) samples from TCGA project” by Collins, et al. Click “Analyze”.


Slide9

Step 3A: Creating a Novel Gene Signature

Click on “Create a Signature”

In “Select grouping variable” dropdown select “breast_carcinoma_estrogen_receptor_status”

In “Select group 1” dropdown select “Negative”

In “Select group 2” dropdown select “Positive”

Finally, click on “Create Signature” button


Slide10

Step 3B: Our ER Status Gene Signature

When the signature is calculated, a quick summary of the number of samples from each group is presented.

Next, we will look more closely at the genes involved in our signature.

Slide11

Step 4A: Examining Gene Expression of our Signature

To get statistics about how the signature is defined, we will select “Modify the list of selected genes”.

We are presented with a volcano plot showing the log fold change (x-axis) and differential expression significance (y-axis) of each gene.

Thresholds on both of these criteria define the genes selected for the signature.

Slide the differential expression range to the values -3 and 3 so there are only about 100 genes in our ER status signature. Click “Analyze”.
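The two-threshold selection behind the volcano plot can be sketched in a few lines of Python. This is only an illustration of the idea, not the portal's implementation: the gene names other than ESR1, all numeric values, and the p-value cutoff of 0.05 are made-up assumptions; only the log differential expression range of -3 to 3 comes from the step above.

```python
from dataclasses import dataclass


@dataclass
class GeneStat:
    symbol: str          # gene symbol
    log_diff_exp: float  # log fold change (x-axis of the volcano plot)
    p_value: float       # differential expression p-value (y-axis, usually shown as -log10)


def select_signature_genes(stats, min_abs_log_diff=3.0, max_p_value=0.05):
    """Keep genes that pass BOTH thresholds: large effect and small p-value."""
    return [
        g.symbol
        for g in stats
        if abs(g.log_diff_exp) >= min_abs_log_diff and g.p_value <= max_p_value
    ]


# Toy values for illustration only; these are not iLINCS output.
toy_stats = [
    GeneStat("ESR1", 4.2, 1e-120),    # passes both thresholds
    GeneStat("GENE_A", -3.5, 1e-40),  # strongly down, passes both
    GeneStat("GENE_B", 1.1, 1e-10),   # significant but effect too small
    GeneStat("GENE_C", 3.8, 0.2),     # large effect but not significant
]

selected = select_signature_genes(toy_stats)
print(selected)
```

Widening the range (for example from 2 to 3) shrinks the selected list, which is how the slider brings the signature down to about 100 genes.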

Slide12

Step 4B: Examining Gene Expression of our Signature

Click the “Signature Data” tab and “Show selected genes” to see the list of selected signature genes.

Note that ESR1, estrogen receptor 1, is the most significantly differentially expressed gene, which is consistent with the immunohistochemical staining assay result that defined the positive and negative groups.

Because the number of samples (868) is high, the differential expression p-values are very significant for these top signature genes.

Click the “Download” button to save the “Signature with only selected genes” table as an Excel file called “subsetSig_*.xls” and move it to [course_directory]/07_Signatures_and_Characterization/. We will use this later.


Slide13

Finding Similar Signatures from Public Libraries

Here we will attempt to find signatures from large public collections that relate to the ER status signature we defined. These public signatures are defined using both basic methods as well as the Characteristic Direction method.


Slide14

Step 5A: Finding Related L1000 Signatures

Click on the “Connected Signatures” tab and “Use Complete Signature” in the bottom half of the screen.

We will start by looking at shRNA gene knockdown signatures defined from the LINCS L1000 project, in which only 978 genes are measured.

Click the checkbox next to “LINCS consensus (CGS) gene knockdown signature”. These consensus signatures are defined by combining all of the different shRNAs with different seeds that target the same gene and by comparing them to appropriate control experiments.


Slide15

Step 5B: Finding Related L1000 Signatures

To view the similar signatures when the calculation is complete, expand the results by clicking on “992 of LINCS consensus (CGS) gene knockdown signature”.

The third result is CDK4, cell division protein kinase 4, an important regulator of cell cycle progression.

Previous literature has shown that silencing of CDK4 will have a variable influence on cancer progression based on the expression of estrogen receptor.

Inhibition of CDK4 increases migration and stem-like cell activity in ER negative breast cancer.


Slide16

Step 5C: Finding Related L1000 Signatures

By clicking on the small graph icon in the Concordance column, we can see the correlation between our ER status signature and the LINCS L1000 signature for CDK4 knockdown.
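To make the idea of a correlation between two signatures concrete, here is a minimal sketch that computes a Pearson correlation over the genes shared by two signatures. The slides do not state which statistic iLINCS uses for the Concordance column, so Pearson correlation is an assumption here, and the gene names and values are toy data.

```python
import math


def pearson_concordance(sig_a, sig_b):
    """Pearson correlation of two signatures (gene -> log diff. expression),
    computed only over the genes measured in both."""
    shared = sorted(set(sig_a) & set(sig_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared genes")
    xs = [sig_a[g] for g in shared]
    ys = [sig_b[g] for g in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a signature is constant over the shared genes")
    return cov / math.sqrt(var_x * var_y)


# Toy signatures; G4 and G5 are each measured in only one signature and are ignored.
er_status_sig = {"G1": 2.0, "G2": -1.0, "G3": 0.5, "G4": 3.0}
knockdown_sig = {"G1": 1.0, "G2": -0.5, "G3": 0.25, "G5": 9.0}

r = pearson_concordance(er_status_sig, knockdown_sig)
print(round(r, 3))
```

Restricting to shared genes matters for L1000 comparisons, because the knockdown signatures cover far fewer genes than an RNA-seq signature.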


Slide17

Step 6: Related ENCODE Signatures

Uncheck the checkbox for “LINCS consensus (CGS) gene knockdown signature” and click the checkbox next to “ENCODE Transcription Factor Binding Signatures”. Expand the results when computed.

These signatures are defined by creating gene-level scores that integrate the distance of transcription factor ChIP peaks and the likelihood that the gene is regulated in the condition using the TREG (True REGulatory TF-gene interactions) method.

Two of the top five results we recover are TF signatures of estrogen receptor binding, meaning our ER status differential expression signature matches ER differential binding signatures. The other TFs in the top signatures also have known roles in mediating ER binding.


Slide18

Step 7A: Finding Related Signatures Using Characteristic Direction

Click on the “Analysis Results” tab, which contains many different methods for analyzing our novel ER status gene signature. We will discuss some of these next.

The public signatures so far have been defined by independently considering whether each gene is differentially expressed. The following exercise uses signatures defined with the characteristic direction method (L1000CDS²), which represents each signature with an arrow in gene expression space.


Slide19

Step 7B: Finding Related Signatures Using Characteristic Direction

The L1000CDS² tool is a LINCS L1000 characteristic direction signature search engine where users can find matches to their input signature from 33K small molecule perturbagen signatures covering 62 cell lines and 4K small molecules.

We will go directly to the L1000CDS² tool by pasting this link in our browser: http://amp.pharm.mssm.edu/L1000CDS2/


Slide20

Step 7C: Finding Related Signatures Using Characteristic Direction

Find, open, and copy the contents of the file [course_directory]/07_Signatures_and_Characterization/ERstatus_top100_logDiffExp.csv

This is our ER status gene signature with the pair of columns [Name_GeneSymbol, Value_LogDiffExp] extracted from our earlier Excel download

Paste the contents of this file into the “up genes” text box on the left; its name will change to “signature”.

In the Configuration panel:

switch mimic to reverse small molecule signature

make sure the latest database version is selected

uncheck the three remaining checkboxes

Click the “Search” button.
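The column extraction described in this step can be sketched as follows. This is a hypothetical illustration, not the script used to prepare the course file: it assumes the downloaded table has been saved as tab-delimited text with a header row, the Significance_pvalue column name and all values are invented, and the exact layout of ERstatus_top100_logDiffExp.csv is not shown in the slides. Only the two column names Name_GeneSymbol and Value_LogDiffExp come from the step above.

```python
import csv
import io


def extract_signature_columns(table_text, delimiter="\t"):
    """Pull (gene symbol, log differential expression) pairs out of a
    delimited table that has a header row, strongest effects first."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter=delimiter)
    rows = [(r["Name_GeneSymbol"], float(r["Value_LogDiffExp"])) for r in reader]
    rows.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return rows


def to_csv_text(rows):
    """Serialize the pairs as comma-separated 'symbol,value' lines."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Toy table standing in for the exported signature (values are made up).
toy_table = (
    "Name_GeneSymbol\tValue_LogDiffExp\tSignificance_pvalue\n"
    "GENE_A\t-3.5\t1e-40\n"
    "ESR1\t4.2\t1e-120\n"
)

pairs = extract_signature_columns(toy_table)
print(to_csv_text(pairs), end="")
```

Keeping the signed value next to each symbol is what lets a search tool treat the input as a weighted signature rather than a plain gene list.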

Slide21

Step 7D: Finding Related Signatures Using Characteristic Direction

These are the top small molecule LINCS L1000 signatures that are the most opposite to our ER status signature.

The idea is that if our ER status signature represents a direction in gene space, these signatures of small molecule perturbations represent the best reversal of that signature.

The top two results are unnamed Broad compounds, but the JAK2 inhibitor cucurbitacin I is known to reduce mammary tumorigenesis and metastasis by inhibiting Rac1 activity, which is frequently elevated in ER positive tumors.
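One way to picture "reversal of a direction in gene space" is with cosine similarity between signature vectors: a value near -1 means the two arrows point in opposite directions. The sketch below ranks a toy library by that measure. It is an assumption made for illustration; the slides do not describe how L1000CDS² actually scores matches, and the compound names and values are invented.

```python
import math


def cosine_similarity(sig_a, sig_b):
    """Cosine of the angle between two signatures (gene -> value).
    Genes missing from one signature are treated as 0 in that signature."""
    genes = sorted(set(sig_a) | set(sig_b))
    a = [sig_a.get(g, 0.0) for g in genes]
    b = [sig_b.get(g, 0.0) for g in genes]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("signature has zero length")
    return dot / (norm_a * norm_b)


def rank_reversers(query, library):
    """Sort library signatures from most opposite (cosine near -1)
    to most similar (cosine near +1) relative to the query."""
    scored = [(name, cosine_similarity(query, sig)) for name, sig in library.items()]
    return sorted(scored, key=lambda item: item[1])


# Toy query and library; names and values are hypothetical.
query_sig = {"G1": 1.0, "G2": -1.0}
toy_library = {
    "compound_X": {"G1": -2.0, "G2": 2.0},  # exactly opposite direction
    "compound_Y": {"G1": 1.0, "G2": -1.0},  # same direction (a mimic)
    "compound_Z": {"G1": 1.0, "G2": 1.0},   # orthogonal
}

ranked = rank_reversers(query_sig, toy_library)
for name, score in ranked:
    print(name, round(score, 3))
```

Reading the same ranking from the bottom up gives the "mimic" view, which is why the tool offers both modes from one search.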


Slide22

Step 8: Caveats about using LINCS signatures

Only 978 genes are measured by the L1000 assay. Other gene values can be imputed from the Connectivity Map dataset. However, the missing or imputed values can make signature analysis less reliable.

Also, although tens of thousands of signatures exist, most possible signatures are still missing. Tools are being developed to identify signatures by learning models on dense parts of the data cube and then learning how to correctly transfer those models to sparse regions.


Slide23

Discovering Pathways Related to Our Gene Signature

In this section, we will consider some of the characterization resources that are available for gene signatures and gene sets.


Slide24

Step 9: Standard Gene Set Enrichment

Back on the iLINCS tab, two of the “Analysis Results” tools that are linked to are Enrichr and DAVID.

DAVID is the enrichment tool used in the Regulatory Genomics lab.

Both tools use standard statistical enrichment tests to examine the overlap of the 100 genes of our ER status gene signature with Gene Ontology term annotations, pathways, and other gene sets.

These tools output the results in slightly different ways, so you may want to explore them in your own time.
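A standard overlap enrichment test of the kind mentioned above can be sketched with the hypergeometric distribution, which is equivalent to a one-sided Fisher exact test. This is a generic textbook sketch, not the exact procedure used by Enrichr or DAVID (each tool applies its own corrections and background choices), and the function name and example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, annotated, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from a universe of `universe_size` genes, `annotated` of which carry
    the term. Equivalent to a one-sided Fisher exact test."""
    max_overlap = min(annotated, selected)
    total = comb(universe_size, selected)
    tail = sum(
        comb(annotated, k) * comb(universe_size - annotated, selected - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Small worked example: 10 genes in the universe, 4 annotated with a term,
# 3 genes in our set, 2 of which are annotated.
p_small = hypergeom_enrichment_p(10, 4, 3, 2)
print(round(p_small, 4))

# Hypothetical realistic scale: 100 signature genes, a 200-gene term,
# 20,000 background genes, 8 genes in the overlap.
print(hypergeom_enrichment_p(20000, 200, 100, 8))
```

For the small example the tail probability is (C(4,2)·C(6,1) + C(4,3)·C(6,0)) / C(10,3) = 40/120, so an overlap of 2 out of 3 is unremarkable at that size.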


Slide25

Step 10A: Pathway Network Enrichment Test

Signaling Pathway Impact Analysis (SPIA) is a method for assessing the impact of a gene set on a pathway. It combines standard enrichment p-values with network perturbation based p-values.

Click on “Pathway Analysis”.

The estrogen signaling pathway is the third result related to our ER status gene signature, although the overall adjusted p-value “SPIA padj” is not significant. Our gene signature is computed to activate the pathway.
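The combination of the two p-values can be illustrated with a Fisher-style product rule: if two independent p-values are multiplied to give c, the probability of seeing a product that small or smaller by chance is c - c·ln(c). The sketch below assumes this is the combination SPIA applies to its enrichment p-value and perturbation p-value; treat that, and the names p_nde and p_pert, as assumptions to check against the SPIA documentation. The multiple-testing adjustment that produces “SPIA padj” is not shown.

```python
import math


def combine_spia_pvalues(p_nde, p_pert):
    """Combine an over-representation p-value (p_nde) and a network
    perturbation p-value (p_pert) into one global p-value.
    For independent uniform p-values, P(product <= c) = c - c*ln(c)."""
    for p in (p_nde, p_pert):
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must be in (0, 1]")
    c = p_nde * p_pert
    return c - c * math.log(c)


# Two moderately small p-values reinforce each other...
print(round(combine_spia_pvalues(0.1, 0.1), 4))
# ...while one weak line of evidence dilutes a strong one.
print(round(combine_spia_pvalues(0.01, 1.0), 4))
```

Note that the combined value is always at least as large as the raw product, so a pathway needs support from both lines of evidence to reach a very small global p-value.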


Slide26

Step 10B: Pathway Network Enrichment Test

Click on the KEGG icon in the last column for Estrogen signaling pathway.

Yellow nodes are up-regulated genes in our signature, blue are down-regulated


Slide27

Step 11A: GeneMANIA

Return to the analysis result by clicking on “Differential Expression Signature” in the tool bar at the top

The last linked tool we will explore today from iLINCS is GeneMANIA.

GeneMANIA is a network-based guilt-by-association algorithm that finds the network neighbors of an input gene set from a heterogeneous collection of interaction networks.

Go to https://genemania.org/


Slide28

Step 11A: GeneMANIA

We are going to enter the top 20 differentially expressed genes from our ER status gene signature. We will use GeneMANIA to return 20 additional network neighbor genes (not necessarily differentially expressed themselves)

Then we will look at functional enrichment of this combined set of 40 genes.

Find, open, and copy the contents of the file [course_directory]/07_Signatures_and_Characterization/ERstatus_top20.txt

This is the top 20 differentially expressed genes of our ER status gene signature, extracted from the Name_GeneSymbol column of our earlier Excel download.

Paste this list into the text box at the top left corner of the main page


Slide29

Step 11B: GeneMANIA

Click on the stacked-dots options button

This first list shows all the possible networks that GeneMANIA will consider combining for the analysis of our twenty genes

Select “Customise advanced options”

This menu shows that we are going to find at most 20 neighbors using the automatic network weighting scheme, which is based on our 20 query genes.

Click the search magnifying glass.


Slide30

Step 11C: GeneMANIA

The resulting network contains our 20 input genes (striped) and our 20 predicted network neighbors (solid). The size of each network neighbor indicates its final guilt-by-association value on the composite affinity network.

You may choose between three arrangements of the graph. The stacked arrangement may be easiest for understanding the nodes. You can hover over any node to highlight its neighbors.

For example:

NCOA7 is also known as Estrogen Nuclear Receptor Coactivator 1.

NCOA3 is associated with Estrogen-Receptor Positive Breast Cancer.

Both are connected to ESR1 (and other top 20 genes) through pathway edges, and neither is in our original 100-gene differential expression signature.


Slide31

Step 11D: GeneMANIA

On the right side are the selected interaction networks that were relevant to the 20 input genes, sorted by type and by weight. You can toggle the networks to display any set of edges.

The highest weighted co-expression network is from breast tumors and relates the top 20 genes to each other fairly well, but does not connect them to the predicted 20.


Slide32

Step 11E: GeneMANIA

Finally, we can perform the standard enrichment tests incorporating our predicted neighbors into our gene set.

Click on the pie chart in the bottom left corner

We see most of the results relating to hormone and steroid signaling pathways and receptors.


Slide33

Gene Set Characterization Using Discriminative Random Walks

In this final exercise, we will find terms related to the 100 top differentially expressed genes of our ER status signature using the DRaWR method that incorporates the functional annotation terms directly in the network-based algorithm.


Slide34

Step 12A: Log In to the KnowEnG Platform

KnowEnG Platform: https://knoweng.org/analyze/

Log in with CILogon, a login service that works through other accounts.

Search: Urbana, Mayo, Google, GitHub

Slide35

Step 12B: Launch DRaWR Analysis

The first page has links to many resources, but we will get started by clicking “Start a New Pipeline”

The KnowEnG Analysis Platform has many knowledge network-informed pipelines. You will learn about more of them in the afternoon session.

For now, hover over Gene Set Characterization and click “Start Pipeline”.


Slide36

Step 12C: Upload Data

Leave the default species “Human”.

Find, open, and copy the contents of the file [course_directory]/07_Signatures_and_Characterization/ERstatus_top100.txt

This is the top 100 differentially expressed genes of our ER status gene signature, extracted from the Name_GeneSymbol column of our earlier Excel download.

Click on the “Upload New Data” tab.

Select the “Paste a Gene List” button.

Give your gene list a name, e.g. “ERstatus_gene_list”.

Paste the file contents into the gene list text box. Click “Done”.

Click “Select” next to the name of your pasted list and you should see a checkmark.

Click “Next”.


Slide37

Step 12D: Configure Algorithm Parameters

We will choose to use a subset of 4 gene set collections available in the knowledge network:

Ontologies: Gene Ontology (default)

Pathways: Enrichr Pathway Membership (must add)

Pathways: Reactome Pathways Curated (must add)

Tissue Expression: GEO Expression Set (must add)

(unclick Protein Domains: PFam Protein Domains)

Click “Next”.


Slide38

Step 12E: Configure Network Parameters

Click “Yes” for the question about using the Knowledge Network.

The Knowledge Network we will use is an integrated network from the HumanNet project (“HumanNet Integrated Network”).

Network size information can be found here

The amount of network smoothing controls how much importance is put on network connections instead of the original 100 genes. We will use the default of 50%.

Click “Next”.


Slide39

Step 12F: Reminder about DRaWR Algorithm

Squares are the Gene Ontology and pathway terms we selected.

Query Genes are our 100 ER status signature genes.

Gray edges are the HumanNet Integrated Network.

We are asking the algorithm to find property squares that a random walker, who is forced to restart often at the query genes, will visit unusually frequently.
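The random walker idea can be sketched on a toy network. The code below runs a random walk with restart twice, once restarting at the query genes and once restarting at all genes as a baseline, and scores each property node by the difference. It is a simplified illustration under stated assumptions, not the DRaWR implementation: the graph, node names, and the difference-based score are hypothetical, and mapping the 50% network smoothing setting to a restart probability of 0.5 is an assumption.

```python
def build_adjacency(edges):
    """Undirected adjacency lists from a list of (node, node) edges."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency


def random_walk_with_restart(adjacency, restart_nodes, restart_prob=0.5, n_iter=200):
    """Approximate stationary visit probabilities of a walker that, at each
    step, jumps back to a random restart node with probability restart_prob
    and otherwise moves to a random neighbor."""
    nodes = sorted(adjacency)
    restart = {n: (1.0 / len(restart_nodes) if n in restart_nodes else 0.0) for n in nodes}
    prob = dict(restart)
    for _ in range(n_iter):
        nxt = {n: restart_prob * restart[n] for n in nodes}
        for node in nodes:
            share = (1.0 - restart_prob) * prob[node] / len(adjacency[node])
            for neighbor in adjacency[node]:
                nxt[neighbor] += share
        prob = nxt
    return prob


# Toy heterogeneous network: four genes, two property (term) nodes.
edges = [
    ("g1", "term_A"), ("g2", "term_A"),  # term_A annotates g1 and g2
    ("g3", "term_B"), ("g4", "term_B"),  # term_B annotates g3 and g4
    ("g2", "g3"),                        # one gene-gene interaction edge
]
adjacency = build_adjacency(edges)
all_genes = {"g1", "g2", "g3", "g4"}
query_genes = {"g1", "g2"}

query_prob = random_walk_with_restart(adjacency, query_genes)
baseline_prob = random_walk_with_restart(adjacency, all_genes)

# A property node is interesting if the query walker visits it more often
# than the baseline walker does.
scores = {t: query_prob[t] - baseline_prob[t] for t in ("term_A", "term_B")}
print(scores)
```

Comparing against a baseline walk is what keeps very well-connected terms from ranking highly for every query simply because any walker visits them often.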


Slide40

Step 12G: Launch DRaWR Job

Change the job name to “gene_set_characterization-DRaWR-HN”.

Verify all the parameters are correct.

Click “Submit Job”.

While this is running, we are going to launch the standard Fisher exact enrichment tests with the same data sets.

Click “Start New Pipeline”.

Slide41

Step 13: Launch Standard Enrichment Tests

Hover over Gene Set Characterization and click “Start Pipeline”

Click “Select” next to the name of your pasted list and you should see a checkmark. Click “Next”.

Select the same 4 collections:

Ontologies: Gene Ontology (default)

Pathways: Enrichr Pathway Membership (must add)

Pathways: Reactome Pathways Curated (must add)

Tissue Expression: GEO Expression Set (must add)

(unclick Protein Domains: PFam Protein Domains)

Click “Next”.

Click “No” for the question about using the Knowledge Network. Click “Next”.

Change the job name to “gene_set_characterization-fisher”.

Verify all the parameters are correct.

Click “Submit Job”.


Slide42

Step 14A: View DRaWR Results

Click the “Go to Data Page” button

You can check the status of your jobs here. Gray arrows mean that your job is currently queued or running. A red icon means something went wrong.

Otherwise, when your job is successfully finished, you should be able to click the green arrow and see the primary result files.

Click on the DRaWR job “gene_set_characterization-DRaWR-HN”.

Then click on the “View Results” button


Slide43

Step 14B: View DRaWR Results

Slide the filter slider all the way to the right.

The DRaWR method picks up many GEO Expression gene sets that relate to ESR1 and estrogen and estradiol.

DRaWR also ranks highly a number of pathway and Gene Ontology terms related to the extracellular matrix, which is known to have many molecules affected by estrogens and related to ER expression.


Slide44

Step 14C: View Fisher Results

Click the “Data” link at the top of the page

Click on the Fisher job “gene_set_characterization-fisher”.

Then click on the “View Results” button.

Slide the filter slider all the way to the right.

The Fisher method finds the same GEO Expression gene sets that relate to ESR1 and estrogen and estradiol, as well as some additional estradiol ones that DRaWR missed. It also detects many more less obviously related GEO gene sets.

The standard enrichment method does not detect any highly significant enrichments with pathways or Gene Ontology terms.

The extracellular matrix terms detected by DRaWR are strongly connected to the signature genes, but mostly through their HumanNet network neighbors and not direct connections.


Slide45

Main Take Home Messages

When you create your own gene signature, you can search libraries of public gene signatures that might provide you with insights relating to mechanisms (e.g. gene knockdowns and transcription factor binding) or treatments (e.g. reverse small molecule perturbagens).

A gene signature, or more simply a gene set, can be analyzed in the context of a pathway, interaction, or other affinity network to provide complementary annotations to standard enrichment tests.
