The Insider’s Guide to Accessing NLM Data


Uploaded by elise on 2023-09-24






Presentation Transcript

1. The Insider’s Guide to Accessing NLM Data
EDirect for PubMed
Part 3: Formatting Results and Unix Tools
Kate Majewski
National Library of Medicine
National Institutes of Health
U.S. Department of Health and Human Services

2. Remember our theme…
Get exactly the data you need…
…and only the data you need…
…in the format you need.

3. EDirect for PubMed Agenda
Part 1: Getting PubMed Data
Part 2: Extracting Data from XML
Part 3: Formatting Results and Unix Tools
Part 4: xtract Conditional Arguments
Part 5: Developing and Building Scripts

4. Today’s Agenda
Quick recap of Part Two
Grouping elements with -block
Customizing separators with -tab and -sep
Saving to a file
Reading from a file

5. Recap of Part Two
xtract: pulls data from XML and arranges it in a table
-pattern: defines the rows for xtract
-element: defines the columns for xtract

6. Recap of Part Two (cont'd)
Identify XML elements by name: ArticleTitle
Identify specific child elements with the Parent/Child construction: MedlineCitation/PMID
Identify attributes with "@": MedlineCitation@Status

7. Questions from last class? Homework?

8. -tab and -sep
-tab changes the separator after each column
-sep changes the separator between multiple values in the same column

9. -tab "\t" -sep "\t"
xtract command:
xtract -pattern PubmedArticle -tab "\t" -sep "\t" \
  -element MedlineCitation/PMID ISSN LastName
Output (every value separated by a tab):
24102982 1742-4658 Wu Doyle Barry Beauvais
21171099 1097-4598 Wu Gussoni
17150207 0012-1606 Yoon Molloy Wu Cowan Gussoni

10. -tab "\t" -sep " "
xtract command:
xtract -pattern PubmedArticle -tab "\t" -sep " " \
  -element MedlineCitation/PMID ISSN LastName
Output (columns separated by tabs, last names separated by spaces):
24102982 1742-4658 Wu Doyle Barry Beauvais
21171099 1097-4598 Wu Gussoni
17150207 0012-1606 Yoon Molloy Wu Cowan Gussoni

11. -tab "|" -sep " "
xtract command:
xtract -pattern PubmedArticle -tab "|" -sep " " \
  -element MedlineCitation/PMID ISSN LastName
Output:
24102982|1742-4658|Wu Doyle Barry Beauvais
21171099|1097-4598|Wu Gussoni
17150207|0012-1606|Yoon Molloy Wu Cowan Gussoni

12. -tab "|" -sep ", "
xtract command:
xtract -pattern PubmedArticle -tab "|" -sep ", " \
  -element MedlineCitation/PMID ISSN LastName
Output:
24102982|1742-4658|Wu, Doyle, Barry, Beauvais
21171099|1097-4598|Wu, Gussoni
17150207|0012-1606|Yoon, Molloy, Wu, Cowan, Gussoni

13. With -tab/-sep, order matters!
xtract command:
xtract -pattern PubmedArticle \
  -element MedlineCitation/PMID -tab "|" -element ISSN \
  -tab ":" -element Volume Issue
Output (PMID and ISSN separated by the default tab):
24102982 1742-4658|280:23
21171099 1097-4598|43:1
17150207 0012-1606|301:1
-tab/-sep only affect subsequent -elements.

14. With -tab/-sep, order matters!
xtract command:
xtract -pattern PubmedArticle \
  -element MedlineCitation/PMID -tab "|" -element ISSN \
  -tab ":" -element Volume Issue
Output (PMID and ISSN separated by the default tab):
24102982 1742-4658|280:23
21171099 1097-4598|43:1
17150207 0012-1606|301:1
Later -tab/-sep values overwrite earlier ones.
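To try -tab and -sep without a network connection, one option is to save a small XML sample to a file and feed it to xtract with input redirection (<). This sketch assumes EDirect's xtract is installed and on your PATH. The file name sample.xml and the record are hypothetical: a simplified version of the author sample shown on slide 19 (PMID 98765432 is a placeholder, and real PubMed XML nests Author inside Article/AuthorList).

```shell
# Hypothetical sample record (placeholder PMID, simplified structure).
cat > sample.xml <<'EOF'
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>98765432</PMID>
      <Author><LastName>Wu</LastName><Initials>MP</Initials></Author>
      <Author><LastName>Billings</LastName><Initials>JS</Initials></Author>
      <Author><LastName>Melendez</LastName><Initials>BJ</Initials></Author>
      <Author><LastName>Collins</LastName><Initials>FS</Initials></Author>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
EOF

# Defaults: tabs between columns and between values in the same column.
default_out=$(xtract -pattern PubmedArticle \
  -element MedlineCitation/PMID LastName < sample.xml)

# -tab "|" goes between columns; -sep ", " goes between the LastName values.
custom_out=$(xtract -pattern PubmedArticle -tab "|" -sep ", " \
  -element MedlineCitation/PMID LastName < sample.xml)

printf '%s\n' "$default_out" "$custom_out"
```

With xtract installed, the second line printed should follow the slide 12 pattern: 98765432|Wu, Billings, Melendez, Collins.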

15. Exercise 1
Write an xtract command that:
Has a new row for each PubMed record
Has columns for PMID, Journal Title Abbreviation, and Author-supplied Keywords
Separates each column with "|"
Separates multiple keywords in the last column with commas
Your output should look like this:
26359634|Elife|Argonaute,RNA silencing,biochemistry[…]

16. Exercise 1 Solution
xtract -pattern PubmedArticle -tab "|" -sep "," \
  -element MedlineCitation/PMID ISOAbbreviation Keyword

17. Getting Author Information
We want a list of all of the authors for each citation:
One row per PubMed record
PMID
All of the authors’ last names and initials

18. Authors: First Draft
We want a list of all of the authors for each citation. Try:
xtract -pattern PubmedArticle \
  -element MedlineCitation/PMID LastName Initials
This doesn't work the way we expect: it shows all the last names, then all the initials.
We want to retain the relationship between each last name and its corresponding initials.

19. xtract-ing authors
XML input:
<PubmedArticle>
  <MedlineCitation>
    <PMID>98765432</PMID>
    <Author> <LastName>Wu</LastName> <Initials>MP</Initials> </Author>
    <Author> <LastName>Billings</LastName> <Initials>JS</Initials> </Author>
    <Author> <LastName>Melendez</LastName> <Initials>BJ</Initials> </Author>
    <Author> <LastName>Collins</LastName> <Initials>FS</Initials> </Author>
[…]
xtract command:
xtract -pattern PubmedArticle \
  -element MedlineCitation/PMID LastName Initials
xtract output:
98765432 Wu Billings Melendez Collins MP JS BJ FS

20. -block
Groups multiple child elements of the same parent element
xtract -pattern PubmedArticle -element MedlineCitation/PMID \
  -block Author -element LastName Initials

21. How -block works
XML input:
<PubmedArticle>
  <MedlineCitation>
    <PMID>98765432</PMID>
    <Author> <LastName>Wu</LastName> <Initials>MP</Initials> </Author>
    <Author> <LastName>Billings</LastName> <Initials>JS</Initials> </Author>
    <Author> <LastName>Melendez</LastName> <Initials>BJ</Initials> </Author>
    <Author> <LastName>Collins</LastName> <Initials>FS</Initials> </Author>
[…]
xtract command:
xtract -pattern PubmedArticle -element MedlineCitation/PMID \
  -block Author -element LastName Initials
xtract output:
98765432 Wu MP Billings JS Melendez BJ Collins FS

22. This is good, but we can do better
Everything is separated by tabs.
xtract command:
xtract -pattern PubmedArticle -element MedlineCitation/PMID \
  -block Author -element LastName Initials
Output:
24102982 Wu MP Doyle JR Barry B Beauvais A
21171099 Wu MP Gussoni E
17150207 Yoon S Molloy MJ Wu MP Cowan DB

23. What we know so far…
xtract command:
xtract -pattern PubmedArticle -tab "|" -sep ", " \
  -element MedlineCitation/PMID ISSN LastName
Output:
24102982|1742-4658|Wu, Doyle, Barry, Beauvais
21171099|1097-4598|Wu, Gussoni
17150207|0012-1606|Yoon, Molloy, Wu, Cowan, Gussoni

24. Two elements in the same column
Use a comma to group multiple elements.
xtract command:
xtract -pattern PubmedArticle -element MedlineCitation/PMID \
  -block Author -sep " " -element LastName,Initials
Output:
24102982 Wu MP Doyle JR Barry B Beauvais A
21171099 Wu MP Gussoni E
17150207 Yoon S Molloy MJ Wu MP Cowan DB Gussoni E

25. How -block creates columns
xtract command:
xtract -pattern PubmedArticle -element MedlineCitation/PMID \
  -block Author -sep " " -element LastName,Initials
Output (one tab-separated column per author):
24102982 Wu MP Doyle JR Barry B Beauvais A
21171099 Wu MP Gussoni E
17150207 Yoon S Molloy MJ Wu MP Cowan DB Gussoni E

26. "-block" resets -tab/-sep to default
xtract command:
xtract -pattern PubmedArticle -tab "|" \
  -element MedlineCitation/PMID \
  -block Author -sep " " -element LastName,Initials
Output (authors still separated by tabs):
24102982|Wu MP Doyle JR Barry B Beauvais A
21171099|Wu MP Gussoni E
17150207|Yoon S Molloy MJ Wu MP Cowan DB Gussoni E

27. "-block" resets -tab/-sep to default
xtract command:
xtract -pattern PubmedArticle -tab "|" \
  -element MedlineCitation/PMID \
  -block Author -tab "|" -sep " " -element LastName,Initials
Output:
24102982|Wu MP|Doyle JR|Barry B|Beauvais A
21171099|Wu MP|Gussoni E
17150207|Yoon S|Molloy MJ|Wu MP|Cowan DB|Gussoni E
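An offline sample also makes it easy to compare the -block commands side by side. This sketch assumes xtract is on your PATH and uses a hypothetical, simplified copy of the slide 19 record (placeholder PMID). It runs the first-draft command, the -block command with comma-grouped LastName,Initials, and the version that sets -tab again inside the block.

```shell
# Hypothetical sample record (placeholder PMID, simplified structure).
cat > authors.xml <<'EOF'
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>98765432</PMID>
      <Author><LastName>Wu</LastName><Initials>MP</Initials></Author>
      <Author><LastName>Billings</LastName><Initials>JS</Initials></Author>
      <Author><LastName>Melendez</LastName><Initials>BJ</Initials></Author>
      <Author><LastName>Collins</LastName><Initials>FS</Initials></Author>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
EOF

# First draft: all last names, then all initials (the pairing is lost).
draft=$(xtract -pattern PubmedArticle \
  -element MedlineCitation/PMID LastName Initials < authors.xml)

# -block Author visits each Author in turn; the comma keeps
# LastName and Initials together in one column, joined by -sep " ".
blocked=$(xtract -pattern PubmedArticle -element MedlineCitation/PMID \
  -block Author -sep " " -element LastName,Initials < authors.xml)

# -block resets -tab, so set -tab "|" again inside the block.
piped=$(xtract -pattern PubmedArticle -tab "|" \
  -element MedlineCitation/PMID \
  -block Author -tab "|" -sep " " -element LastName,Initials < authors.xml)

printf '%s\n' "$draft" "$blocked" "$piped"
```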

28. Exercise 2
Write an xtract command that:
Has a new row for each PubMed record
Has a column for PMID
Lists all of the MeSH headings, separated by "|"
If a heading has subheadings attached, separates the heading and subheadings with "/"
Example output:
24102982|Cell Fusion|Myoblasts/cytology/metabolism|Muscle Development/physiology

29. Exercise 2 Solution
xtract -pattern PubmedArticle -tab "|" \
  -element MedlineCitation/PMID -block MeshHeading \
  -tab "|" -sep "/" -element DescriptorName,QualifierName
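You can check the Exercise 2 solution offline the same way. This sketch assumes xtract is installed; mesh.xml is a hypothetical, simplified record rebuilt only from the example output on slide 28, so the real record for PMID 24102982 may contain more headings and attributes.

```shell
# Hypothetical sample rebuilt from the slide 28 example output.
cat > mesh.xml <<'EOF'
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>24102982</PMID>
      <MeshHeadingList>
        <MeshHeading>
          <DescriptorName>Cell Fusion</DescriptorName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName>Myoblasts</DescriptorName>
          <QualifierName>cytology</QualifierName>
          <QualifierName>metabolism</QualifierName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName>Muscle Development</DescriptorName>
          <QualifierName>physiology</QualifierName>
        </MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
EOF

# "|" between the PMID and each heading; "/" between a heading and its subheadings.
mesh_out=$(xtract -pattern PubmedArticle -tab "|" \
  -element MedlineCitation/PMID -block MeshHeading \
  -tab "|" -sep "/" -element DescriptorName,QualifierName < mesh.xml)

printf '%s\n' "$mesh_out"
```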

30. Saving Results to a File
">" saves results to a file, in the format of your choice.
Example:
efetch -db pubmed -id 24102982,21171099,17150207 \
  -format xml > testfile.txt
Check using: ls
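A quick way to see how ">" behaves, without network access, is to redirect ordinary command output. In this sketch printf stands in for efetch (an assumption made only so it runs offline). Note that ">" replaces a file's contents every time it is used.

```shell
# printf stands in for efetch here; the PMIDs are the ones from the slide.
printf '%s\n' 24102982 21171099 17150207 > testfile.txt

# Check that the file was created.
ls testfile.txt

# A second ">" to the same file replaces what was there before.
printf '%s\n' 25359968 > testfile.txt
cat testfile.txt
```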

31. But where is my file!?
Try: pwd
Cygwin users, try this: $ cygpath -w ~
Mac users: look in your Users folder: Users/<your user name>/

32. Another way to find your files
Find the "edirect" folder on your computer.
Save a file with a distinctive name, then search for it. Example:
efetch -db pubmed -id 24102982,21171099,25359968,17150207 \
  -format uid > specialname.csv
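Both tips can also be followed from the command line. This sketch writes a stand-in specialname.csv with printf (instead of efetch), prints the current directory with pwd, and searches your home folder by name with find. The -maxdepth limit is an assumption to keep the search fast; raise it if the file is buried deeper.

```shell
# Stand-in for the efetch -format uid command on the slide.
printf '%s\n' 24102982 21171099 25359968 17150207 > specialname.csv

# ">" wrote the file into the current working directory.
pwd

# Search your home folder (up to 3 levels deep) for the distinctive name.
find "$HOME" -maxdepth 3 -name 'specialname.csv' 2>/dev/null
```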

33. Exercise 3: Retrieving XML
How can I get the full XML of all articles about the relationship of Zika virus to microcephaly in Brazil? Save your results to a file.

34. Exercise 3 Solution
esearch -db pubmed \
  -query "zika virus microcephaly brazil" | \
efetch -format xml > zika.xml

35. cat
Short for "concatenate"
Used to open files and display them on screen
Can also combine or append files
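Here is a small sketch of the uses of cat listed above: displaying a file, combining files, and (together with the ">>" redirection operator) appending to a file. The file names are hypothetical.

```shell
# Two small files of PMIDs (hypothetical names).
printf '%s\n' 24102982 21171099 > part1.txt
printf '%s\n' 17150207 > part2.txt

# Display a file on screen.
cat part1.txt

# Combine (concatenate) both files into a new one.
cat part1.txt part2.txt > combined.txt

# ">>" appends to the end of a file instead of overwriting it.
printf '%s\n' 25359968 >> combined.txt
cat combined.txt
```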

36. Reading a search string from a file
esearch -db pubmed -query "$(cat searchstring.txt)"
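The double quotes around $(cat searchstring.txt) matter: they pass the whole file contents to -query as a single argument. The sketch below needs only a POSIX shell and counts arguments with set -- to show the difference. The query reuses Exercise 3, and the esearch line is left as a comment because it needs EDirect and network access.

```shell
# Save a search string to a file.
printf '%s\n' 'zika virus microcephaly brazil' > searchstring.txt

# Quoted: the whole query becomes one argument.
set -- "$(cat searchstring.txt)"
quoted_count=$#

# Unquoted: the shell splits it into separate words.
set -- $(cat searchstring.txt)
unquoted_count=$#

query="$(cat searchstring.txt)"
printf 'quoted: %s argument(s), unquoted: %s\n' "$quoted_count" "$unquoted_count"

# With EDirect installed:
# esearch -db pubmed -query "$query" | efetch -format uid
```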

37. Reading a list of PMIDs from a file
Could use a similar technique
Requires the input to be specially formatted
Is there another way?
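For example, efetch's -id option takes a comma-separated list (as in the efetch examples above), but a file saved with -format uid has one PMID per line. The sketch below assumes that layout and joins the lines with the standard paste utility. This is one possible conversion, not necessarily the technique the course has in mind, and the efetch call is commented out because it needs EDirect and network access.

```shell
# One PMID per line, as -format uid produces.
printf '%s\n' 24102982 21171099 25359968 17150207 > specialname.csv

# Join the lines with commas: -s merges lines serially, -d sets the delimiter.
ids=$(paste -s -d ',' specialname.csv)
printf '%s\n' "$ids"

# With EDirect installed:
# efetch -db pubmed -id "$ids" -format abstract
```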

38. Piping esearch to efetch
Pipes the PMIDs retrieved with esearch to efetch, which uses them as its -id argument.
Also passes along the database (-db).
esearch -db pubmed -query "asthenopia[mh] AND nursing[sh]" | \
  efetch -format uid

39. EDirect and the History server
[Diagram: esearch passes the DB and PMIDs to efetch.]

40. EDirect and the History server

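41. EDirect and the History server
[Diagram: esearch stores the DB and PMIDs on the History server and passes a WebEnv and Query Key to efetch, which uses them to retrieve the DB and PMIDs from the History server.]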

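42. EDirect and the History server
[Diagram: epost stores the DB and PMIDs on the History server and passes a WebEnv and Query Key to efetch, which uses them to retrieve the DB and PMIDs from the History server.]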

43. epost
Uploads a list of PMIDs to the History server. Example:
epost -db pubmed -id 24102982,21171099

44. An epost-efetch pipeline
cat specialname.csv | epost -db pubmed | efetch -format xml

45. Using the -input argument
epost -db pubmed -input specialname.csv | \
  efetch -format abstract

46. Coming next time…
Limiting output using conditional arguments

47. In the meantime…
Insider’s Guide online: https://dataguide.nlm.nih.gov
Sign up for the "utilities-announce" mailing list!
Questions? https://dataguide.nlm.nih.gov/contact

48. Questions?