Slide1
The Future of Microalgal Taxonomy
Anne Thessen, athessen@mbl.edu
David Patterson, dpatterson@mbl.edu
(Data Conservancy, Life Sciences)
Slide2
Scientist’s Dream
Computer, what is the trajectory of the planet Seti Alpha 5?
Slide3
Taxonomist’s Dream
How many algal species can be found on this planet?
Slide4
Taxonomist’s Dream
What species is this?
Slide5
Taxonomist’s Dream
Slide6
Taxonomist’s Dream
Slide7
Setting the stage for a ‘big new biology’
BIG = data-centric (like particle physics and astronomy)
Characterized by data sharing via a virtual pool
New = new skill sets, tools, cyber-infrastructure to exploit the data pool
Data-driven discovery as a new means of understanding
GenBank as a model within the Life Sciences
Slide8
Small science
Large number of providers with small amounts of data.
Small number of providers with lots of data.
Slide9
Names
Aa paleacea
Limulus polyphemus
Kiwa hirsuta
Osedax frankpressi
Kingia australis
Pieris japonica
Pieris rapae
Trypanosoma brucei
Homo sapiens
Slide10
Many names for one taxon
Didimosphenia geminata
Didymosphenia geminata
Didymosphenia geminata
Didymosphenia geminata
Rock snot
Didymo
Echinella geminata
Gomphonema geminatum
Gomphonema vulgare
Slide11
Reconciliation Group
Didymosphenia geminata
Didimosphenia geminata
Didymo
Rock Snot
Echinella geminata
Gomphonema geminatum
Gomphonema vulgare
Slide12
Reconciliation Group
Didymosphenia geminata
Didimosphenia geminata
Didymo
Rock Snot
Echinella geminata
Gomphonema geminatum
Gomphonema vulgare
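The reconciliation group above can be sketched as a simple lookup, in which every name string in the group resolves to one label for the taxon. This is a minimal illustration and not the Global Names implementation: the function names (`normalize`, `build_index`, `reconcile`) are hypothetical, and using "Didymosphenia geminata" as the group's label is an assumption made for the example.

```python
# Hypothetical reconciliation group built from the name strings on the slide.
# Treating "Didymosphenia geminata" as the group label is an assumption.
RECONCILIATION_GROUPS = {
    "Didymosphenia geminata": [
        "Didimosphenia geminata",
        "Didymo",
        "Rock snot",
        "Echinella geminata",
        "Gomphonema geminatum",
        "Gomphonema vulgare",
    ],
}


def normalize(name):
    """Lowercase and collapse whitespace so lookups ignore case and spacing."""
    return " ".join(name.lower().split())


def build_index(groups):
    """Map every normalized name string to the label of its group."""
    index = {}
    for label, variants in groups.items():
        for name in [label, *variants]:
            index[normalize(name)] = label
    return index


INDEX = build_index(RECONCILIATION_GROUPS)


def reconcile(name):
    """Return the group label for a name string, or None if it is unknown."""
    return INDEX.get(normalize(name))
```

With this index, `reconcile("Rock Snot")` and `reconcile("Gomphonema geminatum")` resolve to the same label, so data filed under either name can be brought together.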
Slide13
One name for many taxa
Cyclophora
Cyclophora Castracane 1878
Cyclophora tenuis
Contextual data: Diatom, Chloroplast, Frustule, Benthic, Marine
Cyclophora Hübner 1822
Cyclophora porata
Contextual data: Food, Moth, Wings, Exoskeleton, Caterpillar
Disambiguate by authority, species, contextual data
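Disambiguating a homonym by authority or contextual data can be sketched as follows. This is a minimal illustration under stated assumptions: the `GenusRecord` type and `disambiguate` function are hypothetical, and the assignment of each set of contextual terms to an authority follows a reading of the slide (diatom terms with Castracane 1878, moth terms with Hübner 1822).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenusRecord:
    """One sense of a genus name (hypothetical structure for illustration)."""
    name: str
    authority: str
    context: frozenset


# Contextual terms are taken from the slide; their grouping is an assumption.
HOMONYMS = [
    GenusRecord("Cyclophora", "Castracane 1878",
                frozenset({"diatom", "chloroplast", "frustule", "benthic", "marine"})),
    GenusRecord("Cyclophora", "Hübner 1822",
                frozenset({"food", "moth", "wings", "exoskeleton", "caterpillar"})),
]


def disambiguate(name, authority=None, context_terms=()):
    """Narrow the candidate senses of a name by authority, then by context."""
    candidates = [r for r in HOMONYMS if r.name == name]
    if authority is not None:
        candidates = [r for r in candidates if r.authority == authority]
    terms = {t.lower() for t in context_terms}
    if not terms or len(candidates) < 2:
        return candidates
    # Score each candidate by how many contextual terms it shares with the text.
    scores = [len(r.context & terms) for r in candidates]
    best = max(scores)
    if best == 0:
        return candidates  # no overlap: the name stays ambiguous
    return [r for r, s in zip(candidates, scores) if s == best]
```

A bare `disambiguate("Cyclophora")` returns both senses, while supplying an authority or terms such as "moth" and "wings" narrows the result to one.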
Slide14
Global Names Architecture
Diagram labels: GNA; Provider Services; Consumer Services; DATA AND SERVICE PROVIDERS; DATA AND SERVICE CONSUMERS; EXPERTS
Slide15
Names-based cyberinfrastructure
Managing names to manage biodiversity data
All names (scientific, vernacular, surrogate)
For all organisms
Many names for one species reconciled
One name for many species disambiguated
Global Names Architecture: a virtual layer, using names services to link together distributed data
Globalnames.org
Micro*scope (microscope.mbl.edu) and Encyclopedia of Life (eol.org)
Slide16
Legacy Data
Narrative tradition in biology
Too much for a human
Can we get a machine to do the work?
NLP!!!
Slide17
Legacy Data
Use NLP/machine learning to extract names and characters
Hong Cui
Slide18
Legacy Data
Spirogyra:chloroplasts:present
Slide19
Legacy Data
Spirogyra:chloroplasts:present:attribution
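The colon-separated statements on these two slides can be read as small structured records. The sketch below parses them into a named tuple; the field names (`taxon`, `character`, `state`, `attribution`) are assumptions chosen for illustration, since the slides show only the values.

```python
from typing import NamedTuple, Optional


class CharacterStatement(NamedTuple):
    """A taxon, one of its characters, the character state, and its source."""
    taxon: str
    character: str
    state: str
    attribution: Optional[str]


def parse_statement(text):
    """Parse 'taxon:character:state' with an optional ':attribution' part."""
    parts = [p.strip() for p in text.split(":")]
    if len(parts) == 3:
        return CharacterStatement(*parts, None)
    if len(parts) == 4:
        return CharacterStatement(*parts)
    raise ValueError(f"expected 3 or 4 colon-separated fields, got {len(parts)}")
```

Keeping the attribution as its own field lets each extracted statement be traced back to where it came from.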
Slide20
Coffee Ontology
coffee is a drink
Slide21
Existing Ontology
Slide22
Semantic Web
Slide23
Data Discovery and Aggregation
Slide24
Future Data
Triple Store
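As a rough illustration of what a triple store does, the sketch below keeps subject-predicate-object statements in memory and answers pattern queries in which any position may be left open. It is a toy under stated assumptions, not a real semantic web stack: the `TripleStore` class is hypothetical, and treating "Spirogyra chloroplasts present" as a triple is an assumption based on the earlier Legacy Data slides.

```python
class TripleStore:
    """A toy in-memory store of (subject, predicate, object) statements."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        """Record one statement; adding it twice has no further effect."""
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the pattern; None matches anything."""
        pattern = (subject, predicate, obj)
        return sorted(
            t for t in self._triples
            if all(want is None or want == got for want, got in zip(pattern, t))
        )


store = TripleStore()
store.add("coffee", "is a", "drink")                # from the Coffee Ontology slide
store.add("Spirogyra", "chloroplasts", "present")   # assumed triple form
```

Asking `store.match(obj="present")` returns every statement whose object is "present", which is the kind of open-ended query that lets data from different sources be discovered and aggregated.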
Slide25
The New Workforce
Informatics/computing training
Modified workflows
Importance of data management and preservation
Slide26
In Summary
Big New Biology is coming; taxonomy can benefit from being a part of it
Existing data can be made machine-readable using information extraction algorithms
Existing workflows can be modified to capture data close to the source
Data can be shared using the semantic web
Slide27
Acknowledgments
Dima Mozzherin
David Shorthouse
Sayeed Choudhury
Pete DeVries