Biomedical articles per year
Questions of biomedical experts
Yes/No question: “Are there any DNMT3 proteins present in plants?”
Exact Answer: “Yes”
Ideal Answer: “Yes. The plant DOMAINS REARRANGED METHYLTRANSFERASE2 (DRM2) is a homolog of the mammalian de novo methyltransferase DNMT3. DRM2 contains a novel arrangement of the motifs required for DNA methyltransferase catalytic activity.”
Questions of biomedical experts
Factoid question: “What is the methyl donor of DNA (cytosine-5)-methyltransferases?”
Exact Answer: “S-adenosyl-L-methionine”
Ideal Answer: “S-adenosyl-L-methionine (AdoMet, SAM) is the methyl donor of DNA (cytosine-5)-methyltransferases. DNA (cytosine-5)-methyltransferases catalyze the transfer of a methyl group from S-adenosyl-L-methionine to the C-5 position of cytosine residues in DNA.”
Questions of biomedical experts (III)
List question: “Which species may be used for the biotechnological production of itaconic acid?”
Exact Answer: “Aspergillus terreus”, “Aspergillus niger”, “Ustilago maydis”
Ideal Answer: “In 1955, the production of itaconic acid was firstly described for Ustilago maydis. Some Aspergillus species, like A. itaconicus and A. terreus, show the ability to synthesize this organic acid and A. terreus can secrete significant amounts to the media. Itaconic acid is mainly supplied by biotechnological processes with the fungus Aspergillus terreus. Cloning of the cadA gene into the citric acid producing fungus A. niger showed that it is possible to produce itaconic acid also in a different host organism.”
Questions of biomedical experts (III)
Summary question: “How do histone methyltransferases cause histone modification?”
Exact Answer: -
Ideal Answer: “Histone methyltransferases (HMTs) are responsible for the site-specific addition of covalent modifications on the histone tails, which serve as markers for the recruitment of chromatin organization complexes. There are two major types of HMTs: histone-lysine N-methyltransferases and histone-arginine N-methyltransferases. The former methylate specific lysine (K) residues such as 4, 9, 27, 36, and 79 on histone H3 and residue 20 on histone H4. The latter methylate arginine (R) residues such as 2, 8, 17, and 26 on histone H3 and residue 3 on histone H4. Depending on what residue is modified and the degree of methylation (mono-, di- and tri-methylation), lysine methylation of histones is linked to either transcriptionally active or silent chromatin.”
Finding relevant snippets
Not only texts: ontologies, linked data, …
Information from structured data
List question: “Which forms of cancer is the Tpl2 gene associated with?”
Related concepts:
http://www.disease-ontology.org/api/metadata/DOID:162 (cancer)
http://www.uniprot.org/uniprot/M3K8_RAT (TPL2 synonym)
Related RDF triple:
Subject: http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseases/3003 (lung cancer)
Predicate: http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseasome/associatedGene
Object: http://www4.wiwiss.fu-berlin.de/diseasome/resource/genes/TPL2
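The related RDF triple from the Tpl2 example can be held as a plain (subject, predicate, object) record. The following is a minimal sketch in Python that uses only the three URIs shown on the slide; the `Triple` type and the `local_name` and `genes_for` helpers are illustrative names, not part of any BioASQ tool.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object (all URIs here)."""
    subject: str
    predicate: str
    obj: str


BASE = "http://www4.wiwiss.fu-berlin.de/diseasome/resource"

# The triple from the slide: lung cancer --associatedGene--> TPL2.
lung_cancer_tpl2 = Triple(
    subject=f"{BASE}/diseases/3003",
    predicate=f"{BASE}/diseasome/associatedGene",
    obj=f"{BASE}/genes/TPL2",
)


def local_name(uri: str) -> str:
    """Return the last path segment of a URI (hypothetical helper)."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


def genes_for(disease_uri: str, triples: List[Triple]) -> List[str]:
    """List gene names linked to a disease via the associatedGene predicate."""
    return [
        local_name(t.obj)
        for t in triples
        if t.subject == disease_uri
        and local_name(t.predicate) == "associatedGene"
    ]


print(genes_for(f"{BASE}/diseases/3003", [lung_cancer_tpl2]))
```

Answering the list question then amounts to the reverse lookup: collecting every subject whose `associatedGene` object is the TPL2 resource.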
BioASQ Vision
Make sure this knowledge is used to the benefit of patients
Need to make it accessible to biomedical experts
Search is not effective enough
Push research in automated answering of questions
A challenge for such systems can achieve a multiplying effect
What is BioASQ?
A challenge funded by the European Union (FP7).
Task a: Hierarchical text classification
Organizers distribute new unclassified PubMed articles.
Participants assign MeSH terms to the articles.
Evaluation based on annotations of PubMed curators.
Task b: IR, QA, summarization, …
Organizers distribute English biomedical questions.
Participants provide: relevant articles, snippets, concepts, triples, “exact” answers, “ideal” answers.
Evaluation: both automatic (GMAP, MRR, ROUGE etc.) and manual (by biomedical experts).
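To make the Task b deliverables concrete, here is a sketch of a record holding what a participant returns for one question. It is an assumption-laden illustration: the class name `TaskBResponse`, its field names, and the question-type strings are hypothetical and do not reproduce the official BioASQ submission format. The four question types are the ones shown in the example slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Question types taken from the example slides; the string spellings are assumed.
QUESTION_TYPES = {"yesno", "factoid", "list", "summary"}


@dataclass
class TaskBResponse:
    """Hypothetical container for one system's response to one question."""
    question: str
    question_type: str
    articles: List[str] = field(default_factory=list)
    snippets: List[str] = field(default_factory=list)
    concepts: List[str] = field(default_factory=list)
    triples: List[Tuple[str, str, str]] = field(default_factory=list)
    exact_answer: Optional[List[str]] = None
    ideal_answer: str = ""

    def __post_init__(self) -> None:
        if self.question_type not in QUESTION_TYPES:
            raise ValueError(f"unknown question type: {self.question_type}")
        # The summary example in the slides carries no exact answer.
        if self.question_type == "summary" and self.exact_answer:
            raise ValueError("summary questions carry only an ideal answer")


response = TaskBResponse(
    question="Are there any DNMT3 proteins present in plants?",
    question_type="yesno",
    exact_answer=["Yes"],
    ideal_answer=(
        "Yes. The plant DOMAINS REARRANGED METHYLTRANSFERASE2 (DRM2) is a "
        "homolog of the mammalian de novo methyltransferase DNMT3."
    ),
)
print(response.question_type, response.exact_answer)
```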
The challenge
[Figure: diagram with labels “Task a” and “Task b”]
Behind the scenes
BioASQ Platform
Datasets
Task b data contain gold articles, snippets, concepts, triples, “exact” and “ideal” answers prepared by biomedical experts from around Europe.
Task a: 1st challenge / 2nd challenge
Training: 10,876,004 / 12,628,968
Test: 8349071950
Task b: 1st challenge / 2nd challenge
Training: 29 / 310
Test: 281 / 500
Data sources
They include both text and structured info.
PubMed abstracts, PubMed Central articles, MeSH.
Gene Ontology, UniProt, Jochem, Disease Ontology.
Annotation: questions and queries
Annotation: snippets
Annotation: answers
Assessment: relevance of material
Assessment: information in answers
BioASQ social network
Oracle
Two cycles
2013 Schedule: March 2013, June 2013, August 2013, September 2013
2014 Schedule: February 2014, March 2014, May 2014, September 2014
The official challenge is over, but…
Task a continues to run each week.
An oracle for task b will be available soon.
Oracles will remain available.
Third cycle is being designed …
Challenge participants so far
Challenge participants in each cycle
Evaluation measures
Task a: Hierarchical text classification
Flat measures for multi-label classification: Accuracy, MiF, MaF, EBF
Hierarchical measures: LCA-F (new), HF
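As an illustration of the flat label-based measures, the sketch below computes micro-averaged F (MiF) and macro-averaged F (MaF) for multi-label output. It assumes the standard textbook definitions: MiF pools true positives, false positives and false negatives over all labels, while MaF averages the per-label F1 over every label seen in the gold or predicted sets. The official evaluation may differ in details, and the label names are toy data.

```python
from typing import Dict, List, Set


def label_counts(gold: List[Set[str]], pred: List[Set[str]]) -> Dict[str, List[int]]:
    """Per-label [tp, fp, fn] counts accumulated over all documents."""
    counts: Dict[str, List[int]] = {}
    for g, p in zip(gold, pred):
        for label in g | p:
            c = counts.setdefault(label, [0, 0, 0])
            if label in g and label in p:
                c[0] += 1  # true positive
            elif label in p:
                c[1] += 1  # false positive
            else:
                c[2] += 1  # false negative
    return counts


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Pool the counts over all labels, then compute a single F1."""
    counts = list(label_counts(gold, pred).values())
    return f1(sum(c[0] for c in counts),
              sum(c[1] for c in counts),
              sum(c[2] for c in counts))


def macro_f(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Compute F1 per label, then take the unweighted mean."""
    counts = label_counts(gold, pred)
    if not counts:
        return 0.0
    return sum(f1(*c) for c in counts.values()) / len(counts)


# Toy example: two articles with gold and predicted label sets.
gold = [{"Humans", "Neoplasms"}, {"Humans"}]
pred = [{"Humans"}, {"Humans", "Mice"}]
print(round(micro_f(gold, pred), 4), round(macro_f(gold, pred), 4))
```

In this toy case the frequent label is always right and the two rare labels are always wrong, so the micro average (2/3) is much higher than the macro average (1/3). This is why the two are usually reported together.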
Task b: IR, QA, summarization, …
Phase A: standard IR measures, mean precision, mean recall, mean F-measure, MAP (used for winners selection), G-MAP
Phase B:
‘Exact answers’ (based on type): accuracy (yes/no), strict/lenient accuracy, MRR (factoid), mean F-measure (list)
‘Ideal answers’: manual scores from the experts {Readability, Repetition, Information Precision and Recall}, plus ROUGE
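Two of the ranking measures named for Task b can be sketched briefly. The code below assumes the usual definitions: MRR averages the reciprocal rank of the first correct item per question, MAP averages per-question average precision, and G-MAP is the geometric mean of average precision with a small smoothing constant. The value of `eps` and the handling of edge cases are assumptions, not the official BioASQ settings.

```python
import math
from typing import List, Sequence, Set


def reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    """1/rank of the first correct item, or 0 if none is returned."""
    for rank, item in enumerate(ranked, start=1):
        if item in gold:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@k over the ranks k that hold a relevant item.

    Assumes the ranked list contains no duplicates.
    """
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def gmap(aps: List[float], eps: float = 0.01) -> float:
    """Geometric mean of AP values; eps keeps log() defined when AP is 0."""
    if not aps:
        return 0.0
    return math.exp(mean([math.log(ap + eps) for ap in aps]))


# MRR over two factoid questions (toy candidate lists).
rr = [
    reciprocal_rank(["methionine", "S-adenosyl-L-methionine"],
                    {"S-adenosyl-L-methionine"}),
    reciprocal_rank(["DRM2"], {"DRM2"}),
]
print(mean(rr))

# Average precision for one question with two relevant documents.
print(average_precision(["d1", "d2", "d3"], {"d1", "d3"}))
```

MAP is then `mean` applied to the per-question `average_precision` values. G-MAP uses the same inputs but penalises systems that fail badly on individual questions.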
First year technology/results overview
Task 1a
Mainly SVMs and learning-to-rank.
Mostly flat classification, ignoring class taxonomy.
Mediocre results by hierarchical methods.
One of the systems outperformed NLM’s system.
Task 1b
Phase A (retrieve relevant documents, concepts, snippets, triples): low performance (compared to baselines).
Phase B (formulate ‘exact’ and ‘ideal’ answers): poor performance for ‘exact’ answers (except for yes/no questions); high performance for ‘ideal’ answers (paragraph-sized summaries), but starting with gold documents, snippets etc.
Large scope for improvements, esp. in Task 1b.
“Exact” answer results (batch 2/3)
“Ideal” answer results (batch 2/3)
Results – task a – flat measures
Results – task a – hierarchical
First challenge prizes
Sustainability
BioASQ Oracle
Software release and installation instructions
Benchmark datasets
BioASQ social network
Involvement of the biomedical community in the process
Attracting sponsors for prizes
Making the challenge viable, at very low cost, after the end of the project
Project Consortium
National Centre for Scientific Research “Demokritos” - NCSR “D” (EL)
Transinsight GmbH – TI (D)
Universite Joseph Fourier - UJF (F)
University Leipzig - ULEI (D)
Universite Pierre et Marie Curie Paris 6 – UPMC (F)
Athens University of Economics and Business – Research Centre – AUEB-RC (EL)
Get in touch!
BioASQ workshop @CLEF (Sheffield, Sept 14)
Visit www.bioasq.org
Follow @BioASQ
Useful Links
BioASQ Annotation & assessment tools:
http://at.bioasq.org/
http://assess.bioasq.org/
https://github.com/AKSW/BioASQ-AT
BioASQ social network:
http://sn.bioasq.org/
https://github.com/AKSW/BioASQ-SN
BioASQ platform: http://bioasq.lip6.fr/
BioASQ Oracles: http://bioasq.lip6.fr/oracle/
A. Kosmopoulos, I. Partalas, E. Gaussier, G. Paliouras, I. Androutsopoulos, Evaluation Measures for Hierarchical Classification: a unified view and novel approaches. Data Mining and Knowledge Discovery (To appear)