Richard H. Scheuermann, Ph.D.


Uploaded On 2022-08-03






Presentation Transcript

Slide1

Richard H. Scheuermann, Ph.D.
Director of Informatics
J. Craig Venter Institute
On behalf of the GSC-BRC Metadata Working Group

Standardized Metadata for Human Pathogen/Vector Genomic Sequences

Slide2

Genome Sequencing Centers for Infectious Disease (GSCID)

Bioinformatics Resource Centers (BRC)

www.viprbrc.org

www.fludb.org

Slide3

High Throughput Sequencing
- Enabling technology
  - Epidemiology of outbreaks
  - Pathogen evolution
  - Host range restriction
  - Genetic determinants of virulence and pathogenicity
- Metadata requirements
  - Temporal-spatial information about isolates
  - Selective pressures
  - Host species of specimen source
  - Disease severity and clinical manifestations

Slide4

Metadata Submission Spreadsheets


Slide5

Complex Query Interface

Slide6

Metadata Inconsistencies
- Each project was providing different types of metadata
- No consistent nomenclature was being used
- Impossible to perform reliable comparative genomics analysis
- Required extensive custom bioinformatics system development

Slide7

GSC-BRC Metadata Standards Working Group
- NIAID assembled representatives from the three centers of its Genome Sequencing Centers for Infectious Diseases program (Broad, JCVI, UMD) and the five centers of its Bioinformatics Resource Centers program (EuPathDB, IRD, PATRIC, VectorBase, ViPR)
- Develop an approach for capturing standardized metadata for pathogen isolate sequencing projects
- Bottom-up approach to capture the data that users consider important
- Compatible with data standards and submission requirements

Slide8

Metadata Standardization Process
- Collect example metadata sets from sequencing project white papers and other project sources (e.g. CEIRS)
- Identify data fields that appear to be common across projects and samples (core) and data fields that appear to be pathogen- or project-specific
- For each data field, provide a common set of attributes, including preferred term, definition, synonyms, allowed value sets (preferably using controlled vocabularies), expected syntax, etc.
- Assemble all metadata fields into a semantic network based on the Ontology for Biomedical Investigations (OBI)
- Compare, map, and harmonize with other relevant initiatives, including Genomic Standards Consortium MIxS and NCBI BioProjects/BioSamples
- Draft data submission spreadsheets
- Beta test the version 1.0 standard with new GSCID white paper projects, collecting feedback
- Adopt the version 1.1 metadata standard and data submission spreadsheets for all GSCID white paper and BRC-associated projects
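The per-field attributes listed in the third step (preferred term, definition, synonyms, allowed values, expected syntax) can be pictured as a small record type. This is a minimal sketch assuming a Python dataclass representation; the class name `MetadataField`, its methods, the definition text and the allowed-value set are all hypothetical illustrations, not the working group's spreadsheet format.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetadataField:
    """Hypothetical representation of one standardized metadata field."""
    field_id: str                               # e.g. "CS4"
    preferred_term: str                         # e.g. "Specimen Source Gender"
    definition: str
    synonyms: tuple = ()                        # alternative names for the field
    allowed_values: Optional[frozenset] = None  # controlled vocabulary, if any
    syntax: Optional[str] = None                # expected syntax as a regex, if any

    def matches_name(self, name: str) -> bool:
        """True if `name` is the preferred term or a synonym (case-insensitive)."""
        candidates = {self.preferred_term, *self.synonyms}
        return name.casefold() in {c.casefold() for c in candidates}

    def validate(self, value: str) -> list:
        """Return a list of problems with `value`; an empty list means it passes."""
        problems = []
        if self.allowed_values is not None and value not in self.allowed_values:
            problems.append(f"{self.field_id}: {value!r} is not an allowed value")
        if self.syntax is not None and not re.fullmatch(self.syntax, value):
            problems.append(f"{self.field_id}: {value!r} does not match {self.syntax}")
        return problems


# Illustrative values only; the definition and vocabulary are assumptions.
gender = MetadataField(
    field_id="CS4",
    preferred_term="Specimen Source Gender",
    definition="Biological sex of the host from which the specimen was taken.",
    synonyms=("host-sex", "sex"),
    allowed_values=frozenset({"male", "female", "unknown"}),
)
```

With this sketch, `gender.validate("female")` returns an empty list, `gender.validate("F")` reports one problem, and `gender.matches_name("host-sex")` is true, which is how synonyms from other standards could be recognized during harmonization.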

Slide9

Core Project

| Metadata Field ID | Metadata Field Descriptor | OBO Foundry ID | BioProject/BioSample | MIxS |
|---|---|---|---|---|
| CP1 | Project Title | http://purl.obolibrary.org/obo/OBI_0001622 | Title | project name |
| CP2 | Project ID | http://purl.obolibrary.org/obo/OBI_0001628 | | |
| CP3 | Project Description | http://purl.obolibrary.org/obo/OBI_0001615 | Description | |
| CP4 | Supporting Grants/Contract ID | http://purl.obolibrary.org/obo/OBI_0001629 | Grant Agency | |
| CP5 | Publication Citation | http://purl.obolibrary.org/obo/OBI_0001617 | PubMed ID | ref_biomaterial |
| CP6 | Sample Provider Principal Investigator (PI) Name | | | |
| CP7 | Sample Provider PI's Institution | | | |
| CP8 | Sample Provider PI's email | | | |
| CP9 | Sequencing Facility | | | |
| CP10 | Sequencing Facility Contact Name | | | |
| CP11 | Sequencing Facility Contact's Institution | | | |
| CP12 | Sequencing Facility Contact's email | | | |
| CP13 | Bioinformatics Resource Center | http://purl.obolibrary.org/obo/OBI_0001626 | | |
| CP14 | Bioinformatics Resource Center Contact Name | | | |
| CP15 | Bioinformatics Resource Center Contact's Institution | | | |
| CP16 | Bioinformatics Resource Center Contact's email | | | |
| CP17 | Target Material | | Material | |
| CP18 | Project Method | | Methodology | |
| CP19 | Project Objectives | | Objective | |
| CP20 | Sample Scope | | Sample Scope | |
| CP21 | Target Capture | | Capture | |
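The OBO Foundry IDs in the core project table are persistent URLs (PURLs) that share one pattern: the base http://purl.obolibrary.org/obo/, then the ontology prefix, an underscore and the local ID. A minimal sketch of converting between that form and the compact PREFIX:ID form; the function names and the `CORE_PROJECT_TERMS` dictionary are illustrative, not part of any OBO tooling.

```python
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def curie_to_purl(curie: str) -> str:
    """Turn a compact identifier such as 'OBI:0001622' into its OBO PURL."""
    prefix, sep, local_id = curie.partition(":")
    if not (sep and prefix and local_id):
        raise ValueError(f"not a PREFIX:ID identifier: {curie!r}")
    return f"{OBO_PURL_BASE}{prefix}_{local_id}"


def purl_to_curie(purl: str) -> str:
    """Inverse of curie_to_purl for PURLs of the form .../obo/PREFIX_ID."""
    if not purl.startswith(OBO_PURL_BASE):
        raise ValueError(f"not an OBO PURL: {purl!r}")
    prefix, sep, local_id = purl[len(OBO_PURL_BASE):].partition("_")
    if not (sep and prefix and local_id):
        raise ValueError(f"no PREFIX_ID term in: {purl!r}")
    return f"{prefix}:{local_id}"


# Core project fields that carry an OBO Foundry ID on this slide.
CORE_PROJECT_TERMS = {
    "CP1": "OBI:0001622",   # Project Title
    "CP2": "OBI:0001628",   # Project ID
    "CP3": "OBI:0001615",   # Project Description
    "CP4": "OBI:0001629",   # Supporting Grants/Contract ID
    "CP5": "OBI:0001617",   # Publication Citation
    "CP13": "OBI:0001626",  # Bioinformatics Resource Center
}
```

For example, `curie_to_purl(CORE_PROJECT_TERMS["CP1"])` yields http://purl.obolibrary.org/obo/OBI_0001622, the PURL listed for Project Title. Storing the compact form and deriving the URL keeps the spreadsheet short while still resolving to the ontology term.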

 

Slide10

Core Sample

| Metadata Field ID | Metadata Field Descriptor | OBO Foundry ID | NCBI BioSample | MIxS |
|---|---|---|---|---|
| CS1 | Specimen Source ID | http://purl.obolibrary.org/obo/OBI_0001141 | host-subject-id | host_subject_id |
| CS2 | Specimen Source Species | http://purl.obolibrary.org/obo/OBI_0100026 | specific_host | host_taxid |
| CS3 | Specimen Source Common Name | | host-common-name | host_common_name |
| CS4 | Specimen Source Gender | http://purl.obolibrary.org/obo/PATO_0000047 | host-sex | sex |
| CS5 | Specimen Source Age - Value | http://purl.obolibrary.org/obo/OBI_0001167 | host-age | age |
| CS6 | Specimen Source Age - Unit | http://purl.obolibrary.org/obo/UO_0000003 | host-age | |
| CS7 | Specimen Source Health Status | http://purl.obolibrary.org/obo/OGMS_0000022 | host-health-state | disease status |
| CS8 | Specimen Collection Date | http://purl.obolibrary.org/obo/OBI_0001619 | collection_date | collection date |
| CS9 | Specimen Collection Location - Latitude | http://purl.obolibrary.org/obo/OBI_0001620 | lat_lon | geographic location (lat and long) |
| CS10 | Specimen Collection Location - Longitude | http://purl.obolibrary.org/obo/OBI_0001621 | lat_lon | geographic location (lat and long) |
| CS11 | Specimen Collection Location - Location | http://purl.obolibrary.org/obo/GAZ_00000448 | geo_loc_name | |
| CS12 | Specimen Collection Location - Country | http://purl.obolibrary.org/obo/OBI_0001627 | geo_loc_name | geographic location (country and/or sea) |
| CS13 | Specimen ID | http://purl.obolibrary.org/obo/OBI_0001616 | sample name | |
| CS14 | Specimen Type | http://purl.obolibrary.org/obo/OBI_0001479 | host-tissue-sampled | body habitat, body site, body product |
| CS15 | Suspected Organism(s) in Specimen - Species | http://purl.obolibrary.org/obo/OBI_0000925 | organism | |
| CS16 | Suspected Organism(s) in Specimen - Subclass | | strain | subspecific genetic lineage |
| CS17 | Human Pathogenicity of Suspected Organism(s) in Specimen | http://purl.obolibrary.org/obo/OBI_0000925 | | phenotype |
| CS18 | Environmental Material | http://purl.obolibrary.org/obo/ENVO_00010483 | isolation-source | environment (material) |
| CS19 | Organism Detection Method | http://purl.obolibrary.org/obo/OBI_0001624 | | sample collection device or method |
| CS20 | Specimen Repository | | culture-collection | source material identifiers |
| CS21 | Specimen Repository Sample ID | | culture-collection | source material identifiers |
| CS22 | Sample ID - Sequencing Facility | | | |
| CS23 | Nucleic Acid Extraction Method | http://purl.obolibrary.org/obo/OBI_0666667 | samp_mat_process | sample material processing |
| CS24 | Nucleic Acid Preparation Method | | samp_mat_process | sample material processing |
| CS25 | Sequencing Method | http://purl.obolibrary.org/obo/OBI_0600047 | | sequencing method |
| CS26 | Assembly Algorithm | http://purl.obolibrary.org/obo/OBI_0001522 | | assembly |
| CS27 | Depth of Coverage - Average | http://purl.obolibrary.org/obo/OBI_0001618 | | finishing strategy |
| CS28 | Annotation Algorithm | http://purl.obolibrary.org/obo/OBI_0001625 | | |
| CS29 | GenBank Record ID | http://purl.obolibrary.org/obo/OBI_0001614 | | |
| CS30 | Comments | http://purl.obolibrary.org/obo/IAO_0000300 | host-description | |
| CS31 | Specimen Collector Name | | collected-by | |
| CS32 | Specimen Collector's Institution | | | |
| CS33 | Specimen Collector's email | | | |
| CS34 | Sample Category | | attribute_package | |
| CS35 | Host Disease | | host-disease | |
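Several core sample fields share one NCBI BioSample attribute in the table above (CS5 and CS6 both map to host-age; CS20 and CS21 both map to culture-collection), so exporting a record is a many-to-one translation. A minimal sketch, assuming records are dictionaries keyed by field ID; the mapping is a subset copied from the table, while the function name and the rule of joining shared values with a space are hypothetical. Fields whose BioSample value needs a specific format (for example CS9/CS10 into lat_lon) are deliberately left out.

```python
# Subset of the core sample table: field ID -> NCBI BioSample attribute.
CS_TO_BIOSAMPLE = {
    "CS1": "host-subject-id",
    "CS2": "specific_host",
    "CS3": "host-common-name",
    "CS4": "host-sex",
    "CS5": "host-age",        # age value
    "CS6": "host-age",        # age unit
    "CS7": "host-health-state",
    "CS8": "collection_date",
    "CS13": "sample name",
    "CS14": "host-tissue-sampled",
    "CS15": "organism",
    "CS16": "strain",
    "CS18": "isolation-source",
    "CS20": "culture-collection",
    "CS21": "culture-collection",
}


def to_biosample(record: dict) -> dict:
    """Translate {field_id: value} into {biosample_attribute: value}.

    Unmapped fields and blank values are skipped. Values that share an
    attribute are joined with a space in field-ID order (an assumption
    made for illustration, e.g. CS5 "34" + CS6 "years" -> "34 years").
    """
    mapped = [f for f in record if f in CS_TO_BIOSAMPLE]
    grouped = {}
    for field_id in sorted(mapped, key=lambda f: int(f[2:])):  # "CS5" -> 5
        value = str(record[field_id]).strip()
        if value:
            grouped.setdefault(CS_TO_BIOSAMPLE[field_id], []).append(value)
    return {attr: " ".join(values) for attr, values in grouped.items()}
```

For instance, `to_biosample({"CS1": "P-0042", "CS4": "female", "CS5": "34", "CS6": "years", "CS22": "SEQ-17"})` gives host-subject-id "P-0042", host-sex "female" and host-age "34 years"; CS22 is dropped because the table lists no BioSample counterpart.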

 

Slide11

Metadata Processes

[Diagram: metadata process model. Stages: Investigation, Host Characterization, Specimen Isolation, Material Processing, Sequencing Assay and Data Processing. Entities include the specimen source (organism or environmental material) with its type, ID and qualities; the specimen and its collector; the isolation protocol; enriched microorganism and microorganism genomic nucleic acid (NA) samples; sequencing assay inputs (input sample, reagents, technician, equipment); primary data, sequence data and genotype/serotype/gene data; and the sequence data record produced by the data archiving process, denoted by a GenBank ID. Data transformations include image processing, assembly, variant detection, serotype marker detection and gene detection; a quality assessment assay is also shown. Each process is located_in a temporal-spatial region, and links use the relations has_input, has_output, has_specification, has_part, is_about, denotes, located_in, has_quality and instance_of.]
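Because every step in the process model is linked by has_input and has_output, the provenance of an archived sequence can be recovered by walking those links backwards. A minimal sketch, assuming the model is stored as plain (subject, relation, object) triples; all node names (`isolation_1`, `genbank_id_X`, and so on) are hypothetical instance data, and following only the first input of each process is a simplification (the sequencing assay, for example, also consumes reagents).

```python
from collections import defaultdict

# Hypothetical instance data shaped like the process diagram: each process
# has_input / has_output entities, and a GenBank ID denotes the archived record.
TRIPLES = [
    ("isolation_1", "has_input", "host_patient_7"),
    ("isolation_1", "has_output", "specimen_7a"),
    ("processing_1", "has_input", "specimen_7a"),
    ("processing_1", "has_output", "na_sample_7a"),
    ("sequencing_1", "has_input", "na_sample_7a"),
    ("sequencing_1", "has_input", "reagent_kit_3"),
    ("sequencing_1", "has_output", "primary_data_7a"),
    ("assembly_1", "has_input", "primary_data_7a"),
    ("assembly_1", "has_output", "sequence_7a"),
    ("archiving_1", "has_input", "sequence_7a"),
    ("archiving_1", "has_output", "record_7a"),
    ("genbank_id_X", "denotes", "record_7a"),
]


def trace(identifier, triples):
    """Walk has_output/has_input links backwards from a record or entity.

    Returns (processes, origin): the processes visited, newest first, and
    the earliest entity reached. Only the first has_input of each process
    is followed.
    """
    denotes = {s: o for s, rel, o in triples if rel == "denotes"}
    produced_by = {o: s for s, rel, o in triples if rel == "has_output"}
    inputs_of = defaultdict(list)
    for s, rel, o in triples:
        if rel == "has_input":
            inputs_of[s].append(o)

    entity = denotes.get(identifier, identifier)  # resolve an ID to its record
    processes, visited = [], set()
    while entity in produced_by and entity not in visited:
        visited.add(entity)
        process = produced_by[entity]
        processes.append(process)
        if not inputs_of[process]:
            break
        entity = inputs_of[process][0]
    return processes, entity
```

Calling `trace("genbank_id_X", TRIPLES)` walks archiving, assembly, sequencing, processing and isolation in turn and ends at `host_patient_7`, the specimen source. A production system would more likely use an RDF store and SPARQL property paths for the same query.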

Slide12

[Diagram: Specimen Isolation. An organism or environmental material plays the specimen source role, equipment plays the specimen capture role, and a person (with a name and an affiliation, via has_affiliation) plays the specimen collector role. Specimen isolation procedure X, an instance_of specimen isolation procedure type, has_specification an isolation protocol, has_input the specimen source and has_output specimen X, an instance_of specimen type denoted by an ID. The procedure is located_in a temporal-spatial region made up of a spatial region (GPS location; geographic location name) and a temporal interval (date/time). Other nodes: organism part, environment (has_quality), hypothesis (is_about), IRB/IACUC approval (has_authorization), and an organism with a pathogenic disposition (has part, has disposition). The source organism has the qualities gender, age and health status. Core sample field labels on the diagram: CS1 (source ID), CS2/3, CS4 (gender), CS5/6 (age), CS7 (health status), CS8, CS9/10, CS11/12, CS13, CS14, CS15/16 and CS18.]
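In this model, CS9 and CS10 are the two coordinates of one GPS location within the specimen's spatial region, yet the core sample table maps both onto a single NCBI BioSample attribute, lat_lon. A minimal sketch of that combination step, assuming decimal-degree inputs and assuming BioSample expects a "DD.DDDD N|S DDD.DDDD E|W" style string (check NCBI's current attribute documentation before relying on it); the function name `format_lat_lon` is hypothetical.

```python
def format_lat_lon(latitude: float, longitude: float, places: int = 4) -> str:
    """Combine decimal-degree latitude (CS9) and longitude (CS10) into one string.

    The 'DD.DDDD N|S DDD.DDDD E|W' layout is an assumed match for the
    NCBI BioSample lat_lon attribute. Negative latitude means south and
    negative longitude means west.
    """
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"longitude out of range: {longitude}")
    lat_hemisphere = "N" if latitude >= 0 else "S"
    lon_hemisphere = "E" if longitude >= 0 else "W"
    return (f"{abs(latitude):.{places}f} {lat_hemisphere} "
            f"{abs(longitude):.{places}f} {lon_hemisphere}")
```

For example, `format_lat_lon(38.98, -77.11)` returns "38.9800 N 77.1100 W". Keeping latitude and longitude as separate, range-checked numeric fields in the standard and formatting them only at export time avoids parsing free-text coordinates later.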

Slide13

Core Project Semantics

Slide14

Outcome of Metadata Standards WG
- Consistent metadata captured across GSCID
- Bottom-up approach focuses standard on important features
- Support more standardized BRC interface development
- Harmonization with related stakeholders: Genomic Standards Consortium MIxS, OBO Foundry OBI and NCBI BioProject/BioSample
- Represented in the context of an extensible semantic framework

Slide15

Utility of Semantic Representation
- Identified gaps in data field list (e.g. temporal components)
- Includes logical structure for other, project-specific, data fields (extensible)
- Identified gaps in ontology data standards (use case-driven standard development)
- Identified commonalities in data structures (reusable)
- Support for semantic queries and inferential analysis in future
- Ontology-based framework is extensible: Sequencing => “omics”

Slide16

Acknowledgements

Bruce Birren2,b, Lauren Brinkac1,a, Vincent Bruno3,c, Elizabeth Caler1,a, Ishwar Chandramouliswaran1,a, Sinéad Chapman2,b, Frank Collins8,h, Christina Cuomo2,b, Joana Carneiro Da Silva3,c, Valentina Di Francesco4, Vivien Dugan1,a, Scott Emrich8,h, Mark Eppinger3,c, Michael Feldgarden2,b, Claire Fraser3,c, W. Florian Fricke3,c, Maria Giovanni4, Gloria Giraldo-Calderon8,h, Omar S. Harb5,g, Matt Henn2,b, Erin Hine3,c, Julie Dunning Hotopp3,c, Jessica C. Kissinger6,g, Eun Mi Lee4, Punam Mathur4, Garry Myers3,c, Emmanuel Mongodin3,c, Cheryl Murphy2,b, Dan Neafsey2,b, Karen Nelson1,a, Ruchi Newman2,b, William Nierman1,a, Brett E. Pickett1,d,e, Julia Puzak4, David Rasko3,c, David S. Roos5,g, Lisa Sadzewicz3,c, Richard H. Scheuermann1,d,e, Lynn M. Schriml3,c, Bruno Sobral7,f, Tim Stockwell1,a, Chris Stoeckert5,g, Dan Sullivan7,f, Luke Tallon3,c, Herve Tettelin3,c, Doyle V. Ward2,b, David Wentworth1,a, Owen White3,c, Rebecca Will7,f, Jennifer Wortman2,b, Alison Yao4, Jie Zheng5,g

1 J. Craig Venter Institute, Rockville, MD and San Diego, CA
2 Broad Institute, Cambridge, MA
3 Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD
4 National Institute of Allergy and Infectious Diseases, Rockville, MD
5 University of Pennsylvania, Philadelphia, PA
6 University of Georgia, Athens, GA
7 Cyberinfrastructure Division, Virginia Bioinformatics Institute, Blacksburg, VA
8 University of Notre Dame, South Bend, IN

a J. Craig Venter Institute Genome Sequencing Center for Infectious Diseases
b Broad Institute Genome Sequencing Center for Infectious Diseases
c Institute for Genome Sciences Genome Sequencing Center for Infectious Diseases
d Influenza Research Database Bioinformatics Resource Center
e Virus Pathogen Resource Bioinformatics Resource Center
f PATRIC Bioinformatics Resource Center
g EuPathDB Bioinformatics Resource Center
h VectorBase Bioinformatics Resource Center

Tanya Barrett (NCBI), Pelin Yilmaz (Genomic Standards Consortium)

N01AI2008038 / N01AI40041