Slide1
Glimpses of future research practice: a musical study
David De Roure
Slide2
Overview
Generation 1 – Early adopters
Generation 2 – Embedding
Generation 3 – Radical sharing
Music case study
Slide3
10 years ago we saw a few early adopters of e-Science technology; now we see acceleration of research through broader adoption and sharing of tools, techniques and artefacts, both for 'big science' and the 'long tail scientist'.
Will this incremental trend continue or are we seeing glimpses of a phase change ahead, where researchers harness these emerging digital capabilities to address research questions in ways that simply were not possible before?
This talk will draw on examples in music information retrieval and linked data from the NEMA and SALAMI projects, together with glimpses of research from the myExperiment social website, to suggest we are now moving into the next phase of research practice.
Slide4
e-Science
e-Science was defined by John Taylor (Director General of the UK Research Councils) as “global collaboration in key areas of science and the next generation of infrastructure that will enable it”.
e-Science was the name of the destination.
It became the name of the journey.
When we arrive, the destination is just called science.
Slide5
“e-research extends e-Science and cyberinfrastructure to other disciplines, including the humanities and social sciences.”
e-Research
http://mitpress.mit.edu/catalog/item/default.asp?tid=12185&ttype=2
Slide6
2000 – 2005
Generation 1
Slide7
...the imminent flood of scientific data expected from the next generation of experiments, simulations, sensors and satellites
Tony Hey and Anne Trefethen
Source: CERN, CERN-EX-0712023, http://cdsweb.cern.ch/record/1203203
Slide8
26/2/2007 | myExperiment | Slide 8
Jeremy Frey
Slide9
Workflows are the new rock and roll
Machinery for coordinating the execution of (scientific) services and linking together (scientific) resources
The era of Service Oriented Applications
Repetitive and mundane boring stuff made easier
Carole Goble
E. Science laboris
Slide10
Kepler
Triana
BPEL
Taverna
Trident
Meandre
Galaxy
Slide11
co-shaping
co-design
co-creation
co-constitution
co-evolution
co-construction
co-
co-realisation
Slide12
humility
the quality of being modest, reverential, even politely submissive, and never being arrogant, contemptuous or rude
Slide13
Box of Chemists
My Chemistry Experiment
Slide14
CombeChem
Slide15
empower
to equip or supply with an ability; enable
service
the performance of duties or the duties performed as or by a waiter or servant
Slide16
Current practices of early adopters of tools.
Characterised by researchers using tools within their particular problem area, with some re-use of tools, data and methods within the discipline. Traditional publishing is supplemented by publication of some digital artefacts like workflows and links to data.
Science is accelerated and practice is beginning to shift to emphasise in silico work.
1st Generation Summary
Slide17
2005 – 2010
Generation 2
Slide18
Paul writes workflows for identifying biological pathways implicated in resistance to Trypanosomiasis in cattle.
Paul meets Jo. Jo is investigating Whipworm in mouse.
Jo reuses one of Paul’s workflows without change.
Jo identifies the biological pathways involved in sex dependence in the mouse model, believed to be involved in the ability of mice to expel the parasite.
Previously a manual two-year study by Jo had failed to do this.
Reuse, Recycling, Repurposing
Slide19
“A biologist would rather share their toothbrush than their gene name”
Mike Ashburner and others
Professor in Dept of Genetics, University of Cambridge, UK
Slide20
Data mining: my data’s mine and your data’s mine
Slide21
mySpace for scientists!
Facebook for scientists!
Not Facebook for scientists!
Slide22
Web 2
Open Repositories
Researchers
Social Network
The experiment that is
Developers
Social Scientists
Slide23
“Facebook for Scientists” ...but different to Facebook!
A repository of research methods
A community social network of people and things
A Social Virtual Research Environment
A probe into researcher behaviour
Open source (BSD) Ruby on Rails app
REST and SPARQL interfaces, Linked Data compliant
Inspiration for: BioCatalogue, MethodBox and SysmoDB
myExperiment currently has 3849 members, 234 groups, 1315 workflows, 349 files and 133 packs
Slide24
Slide25
data
method
Slide26
[Figure: Paul’s Pack, also labelled Paul’s Research Object. Items: QTL, Workflow 16, Workflow 13, Common pathways, Results, Logs, Metadata, Paper, Slides. Arrow labels: feeds into, produces, included in, published in.]
Slide27
Research Objects enable data-intensive research to be:
Replayable – go back and see what happened
Repeatable – run the experiment again
Reproducible – independent experiment to reproduce
Reusable – use as part of new experiments
Repurposeable – reuse the pieces in new experiments
Reliable – robust under automation
Referenceable – citable and traceable
The Six Rs of Research Object Behaviours
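A minimal sketch of how a pack like Paul's might be represented, assuming a Research Object is an aggregation of resources plus typed relations between them. The class, its methods and the particular arrangement of relations are hypothetical illustrations, not myExperiment's data model or API. Only the relation labels (feeds into, produces, included in) and the ORE "aggregates" term come from the slides.

```python
from collections import deque
from dataclasses import dataclass, field

# Predicate from the OAI-ORE vocabulary cited in the talk.
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"


@dataclass
class ResearchObject:
    """Hypothetical in-memory stand-in for a pack / Research Object."""
    name: str
    triples: set = field(default_factory=set)

    def relate(self, subject: str, predicate: str, obj: str) -> None:
        # Both ends of a relation become aggregated resources of the pack.
        for resource in (subject, obj):
            self.triples.add((self.name, ORE_AGGREGATES, resource))
        self.triples.add((subject, predicate, obj))

    def resources(self) -> list:
        # Everything the pack aggregates, in a stable order.
        return sorted(o for _, p, o in self.triples if p == ORE_AGGREGATES)

    def upstream(self, resource: str) -> list:
        # Breadth-first walk backwards over the typed relations,
        # collecting everything the given resource was derived from.
        seen, queue = set(), deque([resource])
        while queue:
            current = queue.popleft()
            for s, p, o in self.triples:
                if o == current and p != ORE_AGGREGATES and s not in seen:
                    seen.add(s)
                    queue.append(s)
        return sorted(seen)


# Illustrative relations only: the arrangement of arrows is assumed.
pack = ResearchObject("Paul's Pack")
pack.relate("QTL data", "feeds into", "Workflow 16")
pack.relate("Workflow 16", "produces", "Results")
pack.relate("Results", "included in", "Paper")

print(pack.resources())
print(pack.upstream("Paper"))
```

Walking the relations backwards from the paper is one way to read "Referenceable" and "Replayable": the trail from a published result to the workflow and data behind it stays inside the object.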
http://blog.openwetware.org/deroure/?p=56
Slide28
Slide29
“Scientists and developers journeying together”
Slide30
Projects delivering now.
Some institutional embedding.
Key characteristic is re-use - of the increasing pool of tools, data and methods across areas/disciplines. Contain some freestanding, recombinant, reproducible research objects.
New scientific practices are established and opportunities arise for completely new scientific investigations.
Some expert curation.
2nd Generation Summary
Slide31
2010 – 2015
Generation 3
Slide32
4th Paradigm
The Fourth Paradigm: Data-Intensive Scientific Discovery
Presenting the first broad look at the rapidly emerging field of data-intensive science
http://research.microsoft.com/en-us/collaboration/fourthparadigm/
Slide33
Slide34
BioEssays, 26(1):99–105, January 2004
Slide35
Francois Belleau
Slide36
“…to discover proteins that interact with transmembrane proteins, particularly those that can be related to neuro-degenerative diseases in which amyloids play a significant role”
Taverna provenance exposed as RDF
myExperiment RDF document for a protein discovery workflow
Mocked-up BioCatalogue document using myExperiment RDF data as example
Provisional RDF documents obtained from the ConceptWiki (conceptwiki.org) development server
An RDF document for an example protein, obtained from the RDF interface of the UniProt web site
A Bioinformatics Experiment
Scott Marshall
Marco Roos
Taverna
Slide37
LifeGuide
http://www.lifeguideonline.org/
Slide38
http://www.galaxyzoo.org/
Slide39
MethodBox
http://www.methodbox.org/
Slide40
The solutions we'll be delivering in 5 years
Characterised by global reuse of tools, data and methods across any discipline, and surfacing the right levels of complexity for the researcher. Routine use.
Key characteristic is radical sharing.
Research is significantly data driven - plundering the backlog of data, results and methods. Increasing automation and decision-support for the researcher - the VRE becomes assistive.
Curation is autonomic and social.
3rd Generation Summary
Slide41
Easy and low risk to start
Progress to advanced skills
For researchers
No obligation
Go as far as you want
Find a service & relax
Intellectual ramps
Malcolm Atkinson
Slide42
NRAO/AUI/NSF
telescopes for the naked mind
Datascopes
Malcolm Atkinson
From Signal to Understanding
Slide43
Slide44
2010 – 2011 and beyond
Music and Linked Data
Slide45
Slide46
http://www.openarchives.org/ore/terms/aggregates
http://eprints.ecs.soton.ac.uk/id/eprint/20817
Slide47
EPrints
Slide48
It’s about enabling the join
Ben Fields, 6th October 2010
Slide49
SALAMI: Structural Analysis of Large Amounts of Music Information
David De Roure
J. Stephen Downie
Ichiro Fujinaga
Slide50
www.diggingintodata.org
Slide51
Digital Music Collections
Crowdsourced ground truth
Community Software
Linked Data
Repositories
Supercomputer
23,000 hours of recorded music
250,000 hours NCSA Supercomputer time
Music Information Retrieval Community
Slide52
The SALAMI collaboration
DDeR (e-Research South), J. Stephen Downie (Illinois) and Ichiro Fujinaga (McGill)
NCSA donating 250,000 supercomputer hours
350,000 pieces of music (23,000 hours)
Internet Archive, DRAM, IMIRSEL, McGill
Feature analysis and structural analysis
Music Ontology by Yves Raimond (BBC)
Musicologists from McGill and Southampton
Sharing of analyses
Slide53
seasr.org/meandre
Meandre
Slide54
“Signal”
Digital Audio
“Ground Truth”
Community
It’s web-like!
Q. If and when should community-generated content be assimilated into managed repositories?
Structural Analysis
Slide55
How country is my country?
www.nema.ecs.soton.ac.uk/countrycountry
Slide56
Stephen Downie
Music and computational thinking
Slide57
Take homes
Co-*
Ramps
Datascopes
Linked data rocks
Computational thinking
It’s about enabling the join
Slide58
david.deroure@oerc.ox.ac.uk
Visit wiki.myexperiment.org
Thanks to: Jeremy Frey & CombeChem; Carole Goble & myGrid; Iain Buchan, Sean Bechhofer and the myExperiment team; Doug Kell; Marco Roos; Stephen Downie, Kevin Page, Ben Fields and the NEMA/SALAMI team; Malcolm Atkinson