step in this process. NLP systems generally rely on nomenclatures and o

myesha-ticknor
Uploaded On 2017-02-25

Presentation Transcript

O. TU … step in this process. NLP systems generally rely on nomenclatures and ontological specifications as resources for determining the names of the entities, assigning semantic categories that are consistent with the corresponding ontology, and assigning identifiers that map to … and would further database interoperability. This paper presents work towards this goal. We have automatically created lexical resources from four model organism nomenclature systems (mouse, fly, worm, and yeast), and have studied performance concerning ambiguity, synonymy, and name variations … are quite challenging. In this paper we focus mainly on ambiguity. We determined that the percentage of ambiguous gene names within the individual nomenclatures, across the four nomenclatures, and with general English ranged from 0% to 10.18%, 1.187% to 20.30%, and 0% to 2.49%, respectively.

When actually processing text, we … retrieval (IR) methods to help extract, organize and facilitate access … terms mentioned is extremely challenging because 1) new genes are continually being named or known … ones, 3) the nomenclature conventions differ for … Another substantial … an ambiguous target form. Similar results were … words. With lexicon MB (which had English words removed), … MBE was used, about 149,000 additional MGI IDs (a 45% increase) were obtained when processing the same set of abstracts, bringing the total to 477,000.

Table 3. Occurrences of ambiguities of MGI gene names within MGI and across species. These were obtained as a result of processing a set …

… broader terms within … that the other two species (mouse and fly) have a higher percentage … rule does not place too many restrictions on the format of the gene names, and thus more ambiguities tend to arise. For example, alp is a symbol for the abnormal leg pattern gene … and have the same functions) with the same name. For example, MGI has curated 9,981 mouse/human ortholog … was exacerbated.
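The three ambiguity rates reported above (within a nomenclature, across nomenclatures, and with general English) can be illustrated with a minimal sketch. It assumes that each lexicon maps a lower-cased gene name or synonym to the set of gene IDs it denotes. It also assumes that "ambiguous" means a name with more than one ID, a name shared with another organism's lexicon, or a name found in an English word list. These definitions, the toy lexicons, the IDs, and the word list are hypothetical and are not taken from the paper's actual procedure or data.

```python
from typing import Dict, Set

# Assumed shape: lower-cased gene name or synonym -> set of gene IDs it names.
Lexicon = Dict[str, Set[str]]


def within_ambiguity(lexicon: Lexicon) -> float:
    """Percentage of names in one nomenclature that denote more than one gene."""
    if not lexicon:
        return 0.0
    ambiguous = sum(1 for ids in lexicon.values() if len(ids) > 1)
    return 100.0 * ambiguous / len(lexicon)


def cross_ambiguity(lexicons: Dict[str, Lexicon], organism: str) -> float:
    """Percentage of one organism's names that also occur in another organism's lexicon."""
    own = lexicons[organism]
    if not own:
        return 0.0
    other_names: Set[str] = set()
    for org, lex in lexicons.items():
        if org != organism:
            other_names.update(lex)  # iterating a dict yields its keys (the names)
    shared = sum(1 for name in own if name in other_names)
    return 100.0 * shared / len(own)


def english_ambiguity(lexicon: Lexicon, english_words: Set[str]) -> float:
    """Percentage of names that are also general English words."""
    if not lexicon:
        return 0.0
    hits = sum(1 for name in lexicon if name in english_words)
    return 100.0 * hits / len(lexicon)


# Hypothetical toy data: invented names and IDs, for illustration only.
LEXICONS: Dict[str, Lexicon] = {
    "mouse": {"alp": {"MOUSE:1"}, "shh": {"MOUSE:2"}, "cat": {"MOUSE:3", "MOUSE:4"}},
    "fly": {"alp": {"FLY:1"}, "hh": {"FLY:2"}},
}
ENGLISH: Set[str] = {"cat", "the", "leg"}

if __name__ == "__main__":
    for org, lex in LEXICONS.items():
        print(org,
              round(within_ambiguity(lex), 2),
              round(cross_ambiguity(LEXICONS, org), 2),
              round(english_ambiguity(lex, ENGLISH), 2))
    # mouse 33.33 33.33 33.33
    # fly 0.0 50.0 0.0
```

Each rate is computed over distinct names rather than over occurrences in text. The paper also counts ambiguous occurrences in processed abstracts (Table 3), which would additionally require a tagger run over a corpus.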
Overall, 33% of the mouse genes that were extracted shared a name with other genes, either within the … that first identifies the applicable organism(s) for each article would help alleviate the ambiguity problem somewhat.

The information presented in this research may not be complete in a number of respects. We found that the worm data that we collected did not … symbols are in place of the official worm symbols, they would cause more ambiguities than we determined. This also raises the issue of completeness. We obtained the names of synonyms (aliases) from the websites and considered ambiguity in gene names and in the number of ambiguous occurrences based on that information. However, the names, which … work will involve expanding our study of ambiguity to include more organisms and also … under study, but the rate is probably low as suggested by … and substantial … species, … gene recall rate than the abstracts. Not surprisingly, in the random sample for Group II, 58% of missed MGI … as containing only one primary gene, and therefore that gene … feedback to
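The suggestion that a system should first identify the applicable organism(s) for each article can be sketched as follows. Gene-name lookup is restricted to the lexicons of the organisms detected in the text. The cue-word lists, the naive word matching, the fallback to all organisms, and the toy lexicons are hypothetical choices made for illustration. This is not a method described in the source.

```python
import re
from typing import Dict, Set

# Assumed shape: lower-cased gene name or synonym -> set of gene IDs it names.
Lexicon = Dict[str, Set[str]]

# Hypothetical cue words for spotting the organism(s) an article discusses.
ORGANISM_CUES: Dict[str, Set[str]] = {
    "mouse": {"mouse", "mice", "murine"},
    "fly": {"drosophila", "fly", "flies"},
    "worm": {"elegans", "nematode"},
    "yeast": {"yeast", "cerevisiae"},
}

# Hypothetical toy lexicons: invented names and IDs, for illustration only.
LEXICONS: Dict[str, Lexicon] = {
    "mouse": {"alp": {"MOUSE:1"}, "shh": {"MOUSE:2"}},
    "fly": {"alp": {"FLY:1"}, "hh": {"FLY:2"}},
}


def detect_organisms(text: str) -> Set[str]:
    """Return organisms whose cue words occur in the text (naive word match)."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {org for org, cues in ORGANISM_CUES.items() if tokens & cues}


def resolve(name: str, text: str, lexicons: Dict[str, Lexicon]) -> Dict[str, Set[str]]:
    """Map a gene name to candidate IDs, keeping only organisms detected in the text.

    If no organism cue is found, every organism is kept, so the name stays ambiguous.
    """
    organisms = detect_organisms(text) or set(lexicons)
    key = name.lower()
    return {org: set(lexicons[org][key])
            for org in sorted(organisms)
            if org in lexicons and key in lexicons[org]}


if __name__ == "__main__":
    print(resolve("alp", "alp mutants show abnormal legs in Drosophila", LEXICONS))
    # {'fly': {'FLY:1'}}
    print(resolve("alp", "a study of alp expression", LEXICONS))
    # {'fly': {'FLY:1'}, 'mouse': {'MOUSE:1'}}
```

The fallback keeps recall when no organism is mentioned, but it leaves cross-species ambiguity unresolved. Organism filtering also cannot resolve names that are ambiguous within a single nomenclature, which is consistent with the text's expectation that it would help only "somewhat."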