Slide1
Sequence Feature Variant Type and Evolutionary Trajectory Analysis using the Influenza Research Database (IRD)
19 July 2011
Richard H. Scheuermann, Ph.D.
Department of Pathology
U.T. Southwestern Medical Center
Slide2
Outline
Brief overview of the NIAID-sponsored Influenza Research Database (IRD)
Comprehensive integrated database
Analysis and visualization tools
U.S. NIH-funded, free access, open to all
Developed by a team of research scientists, bioinformaticians and professional software developers
www.fludb.org
www.viprbrc.org for other human viral pathogens
Novel approach to genotype-phenotype association studies – Sequence Feature Variant Type (SFVT) analysis
Evolutionary Trajectory analysis of the pandemic (H1N1) 2009 strain
Slide3
Public Health Impact of Influenza
Seasonal flu epidemics occur yearly during the fall/winter months and result in 3-5 million cases of severe illness worldwide.
More than 200,000 people are hospitalized each year with seasonal flu-related complications in the U.S.
Approximately 36,000 deaths occur due to seasonal flu each year in the U.S.
Populations at highest risk are children under age 2, adults age 65 and older, and groups with other comorbidities.
Pandemics
1918 Spanish flu (H1N1); 20 - 100 million deaths
1957 Asian flu (H2N2); 1 - 1.5 million deaths
1968 Hong Kong flu (H3N2); 750,000 - 1 million deaths
2009 Swine origin (H1N1); > 16,000 deaths as of March 2010
Source: World Health Organization - http://www.who.int/mediacentre/factsheets/fs211/en/index.html
Slide4
Influenza Virus
Orthomyxoviridae family
Negative-strand RNA
Segmented
Enveloped
8 RNA segments encode 11 proteins
Classified based on serology of HA and NA
Slide5
IRD Overview
www.fludb.org
Slide6
Slide7
Search Access to Data
www.fludb.org
Slide8
Data Types
Slide9
Core Query Attributes
Slide10
Advanced Query Options
Slide11
Segment search results
Slide12
Analysis and Visualization
www.fludb.org
Slide13
Analysis and Visualization Tools
Slide14
Workbench Access
www.fludb.org
Slide15
My Private Workbench
Slide16
Slide17
Slide18
Slide19
Slide20
www.viprbrc.org
Slide21
IRD Summary
Funded by U.S. National Institute of Allergy and Infectious Diseases (NIAID)
Free and open access with no use restrictions
Developed by a team of research scientists, bioinformaticians and professional software developers
Comprehensive collection of public data
Novel derived data, novel analytical tools, unique functions
Integration – Integration – Integration
www.fludb.org
www.viprbrc.org
Slide22
Novel approach to genotype-phenotype association studies – Sequence Feature Variant Type (SFVT) Analysis
Slide23
Limitations to Phylogenetics
Traditional virus phylogenetics focuses on comparative analysis of whole genomes or genome segments, and is most useful for understanding virus evolution
However, the genetic determinants of important viral phenotypes, e.g. virulence, host range, replication efficiency, immune response evasion, etc., are determined by focused functional regions of viral proteins
Therefore, specific genotype-phenotype associations can be masked by other evolutionary factors that contribute to traditional phylogenetic analysis
Slide24
SFVT approach
Identify regions of protein/gene with known structural or functional properties – Sequence Features (SF)
e.g. an alpha-helical region, the binding site for another protein, an enzyme active site, an immune epitope
Determine the extent of sequence variation for each SF by defining each unique sequence as a Variant Type (VT)
High-level, comprehensive grouping of all virus strains by VT membership for each SF independently
Genotype-phenotype association statistical analysis, e.g. genetic determinants of host range, virulence, replication rate
Example SFs: Influenza A_NS1_nuclear-export-signal_137(10); Influenza A_NS1_alpha-helix_171(17)
Variant types for Influenza A_NS1_nuclear-export-signal_137(10):
VT-1   I F D R L E T L I L
VT-2   I F N R L E T L I L
VT-3   I F D R L E T I V L
VT-4   L F D Q L E T L V S
VT-5   I F D R L E N L T L
VT-6   I F N R L E A L I L
VT-7   I Y D R L E T L I L
VT-8   I F D R L E T L V L
VT-9   I F D R L E N I V L
VT-10  I F E R L E T L I L
VT-11  L F D Q M E T L V S
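The VT definition above (each unique sequence within an SF is one Variant Type) can be sketched in a few lines of Python. This is a minimal illustration, not the IRD implementation: the function name, the strain names, and the rule that VT numbers follow descending strain count are all assumptions. Only the residue strings come from the nuclear export signal variant types on this slide.

```python
from collections import defaultdict


def assign_variant_types(feature_seqs):
    """Group strains by the exact residues they carry in one Sequence Feature.

    feature_seqs maps a strain name to its residues in the SF region.
    VT numbers are assigned by descending strain count (an assumption made
    for this sketch; the source does not state the numbering rule).
    """
    strains_by_seq = defaultdict(list)
    for strain, seq in feature_seqs.items():
        strains_by_seq[seq].append(strain)

    # Most common sequence first; ties broken alphabetically for determinism.
    ordered = sorted(strains_by_seq, key=lambda s: (-len(strains_by_seq[s]), s))
    vt_of_sequence = {seq: f"VT-{i}" for i, seq in enumerate(ordered, start=1)}
    strains_by_vt = {
        vt_of_sequence[seq]: sorted(strains_by_seq[seq]) for seq in ordered
    }
    return vt_of_sequence, strains_by_vt


# Hypothetical strain names; residues taken from the VT list on this slide.
example = {
    "strain_A": "IFDRLETLIL",
    "strain_B": "IFDRLETLIL",
    "strain_C": "IFNRLETLIL",
    "strain_D": "LFDQLETLVS",
    "strain_E": "IFDRLETLIL",
}
vt_of_sequence, strains_by_vt = assign_variant_types(example)
```

Because grouping is done per SF, the same strain can belong to VT-1 of one feature and VT-4 of another, which is what allows each SF to be analyzed independently.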
Slide25
SF definition
Based on experimentation reported in the literature and 3D protein structures (PDB records)
Captured by manual curation
Defined by the specific amino acid positions in the polypeptide chain
Annotated with the known structural or functional properties
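Since an SF is defined by specific amino acid positions, extracting its residues from a protein sequence is a simple lookup. The sketch below assumes 1-based positions that need not be contiguous. The class name, the feature names, and the toy protein string are hypothetical and do not represent IRD records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceFeature:
    """A curated region described by 1-based amino acid positions."""
    name: str
    positions: tuple  # need not be contiguous

    def extract(self, protein_seq):
        """Return the residues found at this feature's positions."""
        if max(self.positions) > len(protein_seq):
            raise ValueError("feature extends past the end of the sequence")
        return "".join(protein_seq[p - 1] for p in self.positions)


# Toy protein and feature definitions, for illustration only.
toy_protein = "MDPNTVSSFQ"
contiguous_sf = SequenceFeature("toy_region", (3, 4, 5, 6))
scattered_sf = SequenceFeature("toy_site", (2, 5, 9))

contiguous_residues = contiguous_sf.extract(toy_protein)  # positions 3-6
scattered_residues = scattered_sf.extract(toy_protein)    # positions 2, 5, 9
```

Storing positions rather than a start and end lets one definition cover both contiguous regions and discontinuous sites such as residues that are close only in the 3D structure.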
Slide26
Influenza A Sequence Features as of 18JUL2011
4128 SFs total
Slide27
NS1 Sequence Features
Slide28
SF8 (nuclear export signal)
Slide29
VT for SF8 (nuclear export signal)
Slide30
VT-1 strains
Slide31
Do variations in NS1 sequence features influence influenza virus host range?
Slide32
NS1 Sequence Features
Slide33
VT for SF8 (nuclear export signal)
Slide34
VT distribution by host
Slide35
Causes of apparent NS1 VT-associated host range restriction
Virus spread - capability + opportunity
Phenotypic property of the virus – limited capacity
Restricted founder effect – limited opportunity
Restricted spatial-temporal distribution
Sampling bias – assumption of random sampling
Oversampling – avian H5N1 in Asia; 2009 H1N1
Undersampling – large and domestic cats
Linkage to causative variant
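One common way to test whether VT membership and host are associated is a Pearson chi-square test on a VT-by-host contingency table. The source does not say which statistical test IRD applies, so this is a sketch under that assumption, and the counts are invented for illustration. The function assumes every row and column total is non-zero.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom.

    table is a list of rows of observed counts, e.g. one row per VT and
    one column per host. Every row and column total must be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical counts: rows are two VTs, columns are avian and human hosts.
counts = [
    [30, 10],
    [10, 30],
]
stat, dof = chi_square_statistic(counts)
```

A large statistic only shows that the counts are unevenly distributed. It cannot by itself separate a true phenotypic restriction from founder effects, sampling bias, or linkage to a causative variant elsewhere, which are the alternative explanations listed on this slide.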
Slide36
VT-11 strains
Slide37
VT for SF8 (nuclear export signal)
Slide38
VT lineages
Slide39
VT-4 lineage
Slide40
Slide41
VT-4 lineage = B allele/group
Slide42
VT-16 & VT-9 lineages
Slide43
Slide44
VT-7 lineage
Slide45
Slide46
Evolutionary Trajectory analysis of the pandemic (H1N1) 2009 strain
Slide47
Phylogenetic Analysis
Evolutionary origin
Select a representative pandemic (H1N1) 2009 sequence from the IRD database
BLAST to identify most similar sequences
Assess phylogenetic relationships
Slide48
Pandemic (H1N1) 2009 selection
Slide49
BLAST Result
Slide50
Segment 1 phylogenetic tree
Tree labels: Swine/Ohio/2004; Duck/USA/2000s; Human/USA/2007 (seasonal); Swine/USA/1990s; Pandemic (H1N1) 2009
Slide51
Temporal component
Reference strain A/California/04/2009
BLAST
Return top 1000 results
Normalize data
Graph nucleotide differences versus isolation year differences
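The last step, nucleotide differences versus isolation year differences, can be sketched as follows. The slide does not describe the normalization step, so this sketch assumes the BLAST hits have already been aligned to the reference and trimmed to the same length. The function names, strain names, and sequences are hypothetical.

```python
def nucleotide_differences(ref_seq, other_seq):
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(ref_seq) != len(other_seq):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(ref_seq, other_seq) if a != b)


def trajectory_points(reference, hits):
    """Return (name, year difference, nucleotide differences) for each hit.

    reference is (isolation_year, aligned_sequence);
    hits is a list of (name, isolation_year, aligned_sequence).
    """
    ref_year, ref_seq = reference
    return [
        (name, ref_year - year, nucleotide_differences(ref_seq, seq))
        for name, year, seq in hits
    ]


# Toy data; real input would be the aligned BLAST hits for one segment.
reference = (2009, "ATGGCATTCA")
hits = [
    ("hit_2005", 2005, "ATGGCGTTCA"),
    ("hit_1990", 1990, "ATGACGTTTA"),
]
points = trajectory_points(reference, hits)
```

Each tuple gives one point on the chart: year difference on the x-axis and nucleotide differences on the y-axis.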
Slide52
NP chart
Slide53
NS chart
Slide54
HA chart
Slide55
Chart labels: Group 1; Group 3; Group 2
Slide56
<= Cali/04/09
NS blue cluster (G1)
Slide57
<= Cali/04/09
NS green cluster (G2)
Slide58
Phylogenetic Trees Quantification
Analysis method
Build tree for Group 1 and Group 2 strains separately
Analyze branch lengths of trees
Results
Avg. Group 1 Branch Length: 0.0034 (S.D. 0.0062)
Avg. Group 2 Branch Length: 0.0075 (S.D. 0.0118)
T-test (2 sample, unequal variance): 3.22 × 10^-5
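A two-sample t-test with unequal variance is usually Welch's t-test. The sketch below computes the Welch statistic and the Welch-Satterthwaite degrees of freedom with the standard library only. The slide does not give the number of branches per group, so the input values are hypothetical and are not the branch lengths behind the reported result.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(sample_a) / len(sample_a)
    se2_b = variance(sample_b) / len(sample_b)

    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, dof


# Hypothetical values standing in for per-branch lengths of each group.
group_1 = [1.0, 2.0, 3.0]
group_2 = [2.0, 4.0, 6.0]
t_stat, dof = welch_t(group_1, group_2)
```

Turning the statistic into a p-value requires the t distribution, which the Python standard library does not provide; a statistics package would be needed for that step.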
Slide59
Chart labels: Group 1; Group 3; Group 2
Slide60
HA trendline
Slide61
Evolutionary Trajectory Slopes vs. Mutation Rate
Segment  Group 1 Slope  Group 2 Slope  Mutation Rate
PB2      6.8            24.9           4.3
PB1      7.6            26.9
PA       5.9            23.2
HA       5.5            28.8           5.7
NP       2.9            18.2           3.6
NA       3.8            23.1           3.2
M        1.3            5.6            1.5
NS       2.0            12.5           1.6
Units: substitutions/segment/year
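A trendline slope of this kind can be obtained with an ordinary least-squares fit of nucleotide differences against isolation year differences. The slides do not state which fitting method was used, so the method is an assumption, and the data points below are invented to make the arithmetic easy to check.

```python
def least_squares_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical points: x = isolation year difference, y = nucleotide differences.
year_differences = [0, 1, 2, 3]
nucleotide_diffs = [2, 5, 8, 11]
slope, intercept = least_squares_line(year_differences, nucleotide_diffs)
```

With year differences on the x-axis and per-segment nucleotide differences on the y-axis, the slope has units of substitutions/segment/year, the units shown in the table.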
Slide62
Evolutionary Trajectory (E.T.)
Similar but Distantly Related (SDR)
Slide63
Garten, et al. Science 2009
Slide64
Garten, et al. Science 2009
Slide65
<= Cali/04/09
ET
Slide66
<= Cali/04/09
SDR
Slide67
North American H1N1 Lineage - HA
HA – Group 1
H1N1 2009
American Swine, 2000’s
American Swine, 90’s
American Swine, 80’s
American Swine, 70’s
American Swine, 40 - 60’s
Slide68
Evolutionary Trajectory Plots
Evolutionary Trajectory of a strain, with candidates displayed.
Slide69
Slide70
Slide71
Summary
The Influenza Research Database (IRD) provides a comprehensive resource of data, analysis and visualization tools about influenza virus – www.fludb.org
SFVT represents a novel tool that can be used to better understand genotype-phenotype relationships for flu
IRD can be used to illuminate the viral origins of the pandemic (H1N1) 2009 virus
IRD is continually evolving to capture and integrate additional data and analytical tools to support the needs of the influenza research community
Slide72
Acknowledgments
U.T. Southwestern: Richard Scheuermann (PI), Burke Squires, Jyothi Noronha, Victoria Hunt, Shubhada Godbole, Brett Pickett, Yun Zhang
MSSM: Adolfo Garcia-Sastre, Eric Bortz, Gina Conenello, Peter Palese
Vecna: Chris Larsen, Al Ramsey
LANL: Catherine Macken, Mira Dimitrijevic
U.C. Davis: Nicole Baumgarth
Northrop Grumman: Ed Klem, Mike Atassi, Kevin Biersack, Jon Dietrich, Wenjie Hua, Wei Jen, Sanjeev Kumar, Xiaomei Li, Zaigang Liu, Jason Lucas, Michelle Lu, Bruce Quesenberry, Barbara Rotchford, Hongbo Su, Bryan Walters, Jianjun Wang, Sam Zaremba, Liwei Zhou
IRD SWG: Gillian Air, OMRF; Carol Cardona, Univ. Minnesota; Adolfo Garcia-Sastre, Mt Sinai; Elodie Ghedin, Univ. Pittsburgh; Martha Nelson, Fogarty; Daniel Perez, Univ. Maryland; Gavin Smith, Duke Singapore; David Spiro, JCVI; Dave Stallknecht, Univ. Georgia; David Topham, Rochester; Richard Webby, St Jude
USDA: David Suarez
Sage Analytica: Robert Taylor, Lone Simonsen
CEIRS Centers
N01AI40041
Slide73
Segment 6 (NA) By Host