at the National Agricultural Library Ursula Pieper IT Specialist Web Team Lead National Agricultural Library Agricultural Research Service United States Department of Agriculture Feb 17 2016 ID: 687661
Slide1
Open Source Technologies at the National Agricultural Library
Ursula Pieper
IT Specialist – Web Team Lead
National Agricultural Library
Agricultural Research Service
United States Department of Agriculture
Feb 17, 2016
Slide2
Ursula Pieper
Ursula.Pieper@ars.usda.gov
301-504-7379
Acknowledgements:
Knowledge Services Division (Susan McCarthy)
Monica Poelchau and Chris Childers (i5K Workspace)
Peter Arbuckle and Ezra Kahn (LCA Commons)
Jeffrey Campbell (LTAR)
Cynthia Parr (Ag Data Commons)
Information Services Division (Vernon Chapman)
Chuck Schoppet, NAL (Fedora Commons/Islandora)
Slide3
Why Open Source?
Benefit from community contributions and support
Security managed by community
Cost – Vendor lock-in
Can get customized locally
Interoperability
Re-use of skills
Slide4
Available Expertise @ NAL
PHP
Drupal
Python
Grails
Java
Solr
Subject Matter Experts
Django
Slide5
Open Source based Projects (Selection)
Technologies: Drupal, Python, Grails, Java, Solr, Django
Ag Data Commons: Scientific data catalog/repository
LCA Commons: Life Cycle Assessment repo and tools
PubAg: Catalog of agricultural scientific literature
I5K@NAL Workspace: Repository and workspace for Arthropod Genomes
Long Term Agro-ecosystem Research: Historical and future agricultural research data
National Nutrient Database
Dr. Duke's Phytochemical and Ethnobotanical Databases
Slide6
Open Source based Projects (Selection)
Technology groups: Drupal, Grails, Java Based
Ag Data Commons: http://data.nal.usda.gov
i5K@NAL Workspace: http://i5k.nal.usda.gov
LCA Commons: http://lcacommons.gov
PubAg – Data Management System: http://pubag.nal.usda.gov
LCA Commons: http://lcacommons.gov
National Nutrient Database: http://ndb.nal.usda.gov/ndb/
Phytochem Database (Duke): http://phytochem.nal.usda.gov
Long-term Agro-ecosystem Research: http://ltar.nal.usda.gov
Slide7
Ag Data Commons
Requirements
Public Access to USDA funded research results
Support scientific research and evidence-based policy
Re-use / re-analysis
REE Action Plan: 2012 goals
Journal submission requirements
Mandates
America COMPETES Act
OSTP Memorandum
M-13-13, Open Data Policy
Slide8
Ag Data Commons
A data catalog and repository based on the Drupal DKAN distribution
Slide9
Summary of Required Capabilities
Comprehensive catalog of research results
Support for compliance reporting
Feeds Data.gov
Enhanced dataset description for discovery and reuse
Flexibility to support distributed data repositories
Some disciplines already have repositories (e.g. GenBank)
Preservation of valuable data for long-term research
Supportive infrastructure for small agencies & labs
Link scholarly literature to its supporting data
Sustainable business model
Slide10
Ag Data Commons Pilot
Standard DKAN Features
Drupal 7 Installation Profile
Fulfills Project Open Data requirements
Dataset content type: POD 1.1 metadata schema
Unlimited number of resources can get uploaded
data.json and rdf available
Additional Features
Social media links
Some data analysis tools (map, graph through recline library)
License display
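To make the data.json feature concrete, here is a minimal Python sketch of reading a catalog that follows the Project Open Data (POD) 1.1 metadata schema. The field names (dataset, identifier, distribution, downloadURL) come from the POD 1.1 schema; the sample records and the summarize_catalog helper are made up for illustration and are not taken from the Ag Data Commons.

```python
import json

# Hypothetical two-record catalog in POD 1.1 layout (illustrative data only).
SAMPLE_CATALOG = """
{
  "conformsTo": "https://project-open-data.cio.gov/v1.1/schema",
  "dataset": [
    {
      "title": "Example soil moisture measurements",
      "identifier": "example-0001",
      "accessLevel": "public",
      "modified": "2016-01-15",
      "keyword": ["soil", "moisture"],
      "distribution": [
        {"downloadURL": "https://example.org/soil.csv", "mediaType": "text/csv"}
      ]
    },
    {
      "title": "Example dataset without files",
      "identifier": "example-0002",
      "accessLevel": "public",
      "modified": "2016-02-01",
      "keyword": ["placeholder"]
    }
  ]
}
"""


def summarize_catalog(catalog_text):
    """Return one small dict per dataset: identifier, title, download URLs."""
    catalog = json.loads(catalog_text)
    summary = []
    for dataset in catalog.get("dataset", []):
        # "distribution" is optional, so default to an empty list.
        urls = [
            dist["downloadURL"]
            for dist in dataset.get("distribution", [])
            if "downloadURL" in dist
        ]
        summary.append(
            {
                "identifier": dataset.get("identifier"),
                "title": dataset.get("title"),
                "downloads": urls,
            }
        )
    return summary


if __name__ == "__main__":
    for entry in summarize_catalog(SAMPLE_CATALOG):
        print(entry["identifier"], entry["title"], entry["downloads"])
```

A harvester such as Data.gov consumes the same structure, which is why a catalog that publishes data.json can feed it without extra export code.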
Slide11
Ag Data Commons Pilot
What’s missing from DKAN?
DKAN’s main use case:
Government and organizational documents and datasets
General improvements
Large File upload, virus checking, file size display
Harvest Dashboard – for harvesting external POD datasets or data using other standards
Solr search
Versioning
Data curation workflow
Scientific data require additional functionality
DOI assignments to datasets
Identity management for authors (ORCID, etc.)
Citation information (Primary citation, Methods citation, Related publications)
Collection of additional metadata
Long-term archiving capabilities
Funding source reference
Embargo period
Specialized taxonomies
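As an illustration of the extra scientific metadata listed above, the following Python sketch models a dataset record with a DOI, an author ORCID iD, a funding source, and an embargo date. The class and field names are hypothetical and do not reflect the actual Ag Data Commons data model. The ORCID check uses the ISO 7064 MOD 11-2 checksum that ORCID documents for its identifiers.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the format and checksum of an ORCID iD such as 0000-0002-1825-0097."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


@dataclass
class ScientificDataset:
    """Hypothetical record holding the additional fields named on the slide."""
    title: str
    doi: Optional[str]            # assigned when the dataset is published
    author_orcid: str
    funding_source: str
    embargo_until: Optional[date] = None

    def is_public(self, today: date) -> bool:
        # A dataset without an embargo date is public immediately.
        return self.embargo_until is None or today >= self.embargo_until


if __name__ == "__main__":
    record = ScientificDataset(
        title="Example field trial results",
        doi=None,
        author_orcid="0000-0002-1825-0097",  # ORCID's documented sample iD
        funding_source="Example funding program",
        embargo_until=date(2016, 6, 1),
    )
    print(is_valid_orcid(record.author_orcid))
    print(record.is_public(date(2016, 2, 17)))
```

Validating identifiers at submission time keeps malformed author records out of the catalog, and storing the embargo as a date lets the release decision be computed rather than tracked by hand.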
Slide12
Ag Data Commons Pilot
Lessons learned
Keeping codebase compliant with standard DKAN
All configuration changes need to get committed to code
Codebase cannot clash with standard DKAN (which requires discipline when under time pressure)
Significant pain merging NAL customizations with new DKAN releases
Local programming and systems support is necessary (our model)
Contributing back to DKAN and Drupal
Many of NAL’s customizations are adopted (and then maintained) by standard DKAN
General Drupal functionality:
Open data schema mapper
NALT Thesaurus
Taking advantage of customizations by other organizations
Workflow, Stories, Visualizations
Slide13
Ag Data Commons Pilot
https://data.nal.usda.gov
Slide14
I5k Workspace@NAL
Provides tools and resources for scientists working on insect genomes.
Goal: to store insect genome sequences, visualize them, enable their curation, and make them accessible to scientists.
Designed specifically to handle and support genomic data.
Website: https://i5k.nal.usda.gov
Slide15
Key open-source software used by the i5k Workspace
Main portal/website built with Drupal/Tripal
Key web application for genome visualization and feature annotation: JBrowse/Apollo
Slide16
Key open-source software used by the i5k Workspace
Slide17
I5K Workspace @ NAL
1. Drupal + Tripal
Chado is a database schema for biological data
Tripal allows Drupal to access data stored in the Chado database to populate web pages using Drupal functionality.
Community: small and academic
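To show the kind of lookup a Chado-backed page performs, here is a small Python sketch using an in-memory SQLite database. The table names feature and feature_relationship and the subject/object columns follow Chado's sequence module, but the schema is heavily simplified: real Chado stores types as controlled-vocabulary term IDs (type_id), which are replaced here by plain text. The gene and transcript names are made up.

```python
import sqlite3

# Simplified stand-in for two Chado tables (assumption: text types instead
# of cvterm foreign keys, and only the columns needed for this example).
SCHEMA = """
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type       TEXT NOT NULL
);
CREATE TABLE feature_relationship (
    subject_id INTEGER NOT NULL REFERENCES feature(feature_id),
    object_id  INTEGER NOT NULL REFERENCES feature(feature_id),
    type       TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database with two genes and two transcripts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO feature VALUES (?, ?, ?)",
        [
            (1, "GENE0001", "gene"),
            (2, "GENE0001-RA", "mRNA"),
            (3, "GENE0001-RB", "mRNA"),
            (4, "GENE0002", "gene"),
        ],
    )
    # Each transcript (subject) is part_of its gene (object).
    conn.executemany(
        "INSERT INTO feature_relationship VALUES (?, ?, ?)",
        [(2, 1, "part_of"), (3, 1, "part_of")],
    )
    conn.commit()
    return conn


def transcripts_of(conn, gene_name):
    """Return the names of all features linked to the gene by part_of."""
    rows = conn.execute(
        """
        SELECT child.uniquename
        FROM feature AS parent
        JOIN feature_relationship AS rel ON rel.object_id = parent.feature_id
        JOIN feature AS child ON child.feature_id = rel.subject_id
        WHERE parent.uniquename = ? AND rel.type = 'part_of'
        ORDER BY child.uniquename
        """,
        (gene_name,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(transcripts_of(build_demo_db(), "GENE0001"))
```

Tripal wraps queries of this shape so that Drupal pages can be populated from Chado without hand-written SQL in each page template.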
Slide18
I5K Workspace @ NAL
2. Apollo
Apollo is a web application that allows interactive, instantaneous editing of genome features
It is one of the key features of the i5k Workspace
Community: small and academic
Slide19
I5K Workspace @ NAL
Customized Resources
Registration module for Apollo application
Completely built in house
Integrates notifications, account creation, and captcha
Visualizing custom data types: gene pages
Hierarchical view to display gene/transcript relationships
Search website (many thousands of nodes)
Apache Solr search
Slide20
I5K Workspace @ NAL
Tripal: Lessons learned
Customization requires one full-time developer at the NAL
Because our customizations are forked off the main repository, any updates in the main branch require more updates on our part
Customizations are too specific to our website to be able to fully contribute back to/integrate with the main project
Slide21
I5K Workspace @ NAL
Apollo: Customized resources
Instead of building customized resources, we contributed financially to the salary of the lead developer.
Improvements were not specific to the NAL’s goals, but were aimed at improving the stability of the application
Even without a financial contribution, bug reports and feature requests from the entire user community are usually addressed very quickly due to an active development team, and a lead developer solely focused on this project.
Slide22
I5K Workspace @ NAL
Apollo: Lessons learned
How you interact with the development community of an OSS project depends on 1) the community itself and 2) the specificity of the customization required
Slide23
I5K Workspace @ NAL
https://i5k.nal.usda.gov
Slide24
Life Cycle Assessment (LCA) Commons
LCA Commons is a repository that provides access to data and tools that support life cycle assessment of agricultural products.
We collect, curate, and provide access to data edited and formatted explicitly for use in LCA
The LCA Commons is designed specifically to handle and support unit process data for LCA.
Website: www.lcacommons.gov
Slide25
LCA Commons Technology Stack
Three separate applications accessed through Drupal web content management system.
Discovery and Editorial Applications
Groovy/Grails web implementation of domain specific openLCA data model/modeling tool
LCA Collection on Ag Data Commons
DKAN catalog and datastore
Slide26
LCA Commons Technology Stack
Slide27
LCA Commons Technology Stack
[Diagram. Columns: Discovery Application, Editorial Application, LCA Collection on Ag Data Commons, lcacommons.gov. Layers: Application, Framework, Technology, Database. Components named: Groovy/Grails, Solr Index, openLCA API, Activiti BPM, DKAN, Drupal, Custom User Mgt., openLCA mySQL, DKAN Datastore, DKAN Catalog.]
Slide28
LCA Commons
Customized Resources
The openLCA datastore is not designed explicitly for data management beyond what is necessary for desktop modeling.
This has required developing custom “work-arounds” for data management
Activiti BPM has required significant customization for editorial workflow for LCA data
Will need to develop customized search capabilities that enable search across all three applications through Drupal
Slide29
LCA Commons
Lessons learned
Technology selection based on clearly defined functional requirements is critical
Using openLCA for an application for which it was not exactly designed has required custom development AND innovation in the field
Spurred openLCA developer to build functionality that more closely meets our needs and pushed the domain forward in terms of data sharing and management
Slide30
LCA Commons
http://lcacommons.gov
Slide31
PubAg Data Management System
PubAg is the National Agricultural Library's search system for agricultural information.
Content:
Full-text articles relevant to the agricultural sciences
Citations to peer-reviewed journal articles.
Repository (Data Management):
Fedora Commons/Islandora/Drupal
Public Interface:
Apache Solr and Java application layer
Slide32
PubAg Data Management System
Slide33
PubAg Data Management System
From Islandora (https://wiki.duraspace.org/)
Slide34
PubAg Data Management System
Lessons learned
Customization needed to accommodate NAL Quality Assurance and workflow
Performance tuning is necessary and non-trivial for large repositories
Slide35
PubAg Data Management System
Internal Access Only
Slide36
Long-Term Agroecosystem Research Network
Historical and future agricultural research data
https://ltar.nal.usda.gov
Aims to ensure sustained crop and livestock production and ecosystem services from agroecosystems.
Aims to forecast and verify the effects of environmental trends, public policies, and emerging technologies.
Slide37
Long-Term Agroecosystem Research Network
Historical and future agricultural research data
18 sites across the country
Aim: 30 to 100+ years of data
Slide38
Long-Term Agroecosystem Research Network
Slide39
Long-Term Agroecosystem Research Network
Lessons learned
The project is still in the initial stages
Lesson learned so far: we still have a lot to learn
Slide40
Long-Term Agroecosystem Research Network
http://ltar.nal.usda.gov
Slide41
Conclusion
What have we learned?
Use of open source technology
Allows us to test out technology in depth without a huge initial investment
Gives us access to community development (avoids reinventing the wheel)
Is mainly useful when customized
?