
Open Source Technologies at the National Agricultural Library. Ursula Pieper, IT Specialist, Web Team Lead, National Agricultural Library, Agricultural Research Service, United States Department of Agriculture. Feb 17, 2016.



Presentation Transcript

Slide1

Open Source Technologies at the National Agricultural Library

Ursula Pieper

IT Specialist – Web Team Lead

National Agricultural Library

Agricultural Research Service

United States Department of Agriculture

Feb 17, 2016

Slide2


Ursula Pieper

Ursula.Pieper@ars.usda.gov

301-504-7379

Acknowledgements:

Knowledge Services Division (Susan McCarthy)

Monica Poelchau and Chris Childers (i5K Workspace)

Peter Arbuckle and Ezra Kahn (LCA Commons)

Jeffrey Campbell (LTAR)

Cynthia Parr (Ag Data Commons)

Information Services Division (Vernon Chapman)

Chuck Schoppet, NAL (Fedora Commons/Islandora)

Slide3

Why Open Source?

Benefit from community contributions and support

Security managed by the community

Cost (avoids vendor lock-in)

Can be customized locally

Interoperability

Re-use of skills

Slide4

Available Expertise @ NAL

PHP

Drupal

Python

Grails

Java

Solr

Subject Matter Experts

Django

Slide5

Open Source-Based Projects (Selection)

Drupal

Python

Grails

Java

Solr

Django

Ag Data Commons

Scientific data catalog/repository

LCA Commons

Life Cycle Assessment repo and tools

PubAg

Catalog of agricultural scientific literature

i5K@NAL Workspace

Repository and workspace for arthropod genomes

Long-Term Agroecosystem Research

Historical and future agricultural research data

National Nutrient Database

Dr. Duke's Phytochemical and Ethnobotanical Databases

Slide6

Open Source-Based Projects (Selection)

Drupal

Grails

Java-based

Ag Data Commons

http://data.nal.usda.gov

i5K@NAL Workspace

http://i5k.nal.usda.gov

LCA Commons

http://lcacommons.gov

PubAg Data Management System

http://pubag.nal.usda.gov


National Nutrient Database

http://ndb.nal.usda.gov/ndb/

Phytochem Database (Duke)

http://phytochem.nal.usda.gov

Long-Term Agroecosystem Research

http://ltar.nal.usda.gov

Slide7

Ag Data Commons Requirements

Public access to USDA-funded research results

Support scientific research and evidence-based policy

Re-use / re-analysis

REE Action Plan: 2012 goals

Journal submission requirements

Mandates

America COMPETES Act

OSTP Memorandum

M-13-13, Open Data Policy

Slide8

Ag Data Commons

A data catalog and repository based on the Drupal DKAN distribution

Slide9

Summary of Required Capabilities

Comprehensive catalog of research results

Support for compliance reporting

Feeds

Data.gov

Enhanced dataset description for discovery and reuse

Flexibility to support distributed data repositories

Some disciplines already have repositories (e.g., GenBank)

Preservation of valuable data for long-term research

Supportive infrastructure for small agencies & labs

Link scholarly literature to its supporting data

Sustainable business model

Slide10

Ag Data Commons Pilot: Standard DKAN Features

Drupal 7 Installation Profile

Fulfills Project Open Data requirements

Dataset content type: POD 1.1 metadata schema

Unlimited number of resources can be uploaded

data.json and RDF available

Additional Features

Social media links

Some data analysis tools (map and graph views through the Recline library)

License display
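A catalog that fulfills Project Open Data requirements publishes its datasets in a data.json file whose entries follow the POD 1.1 schema. The Python sketch below shows one way to sanity-check such a file before Data.gov harvests it. It is not part of DKAN. The field list is an assumed subset of the POD 1.1 required fields; the full schema also requires, for example, bureauCode and programCode for federal datasets. check_catalog is a hypothetical helper.

```python
import json

POD_SCHEMA = "https://project-open-data.cio.gov/v1.1/schema"

# Assumed subset of the POD 1.1 required dataset fields (not the full list).
REQUIRED_DATASET_FIELDS = (
    "title",
    "description",
    "keyword",
    "modified",
    "publisher",
    "contactPoint",
    "identifier",
    "accessLevel",
)
ACCESS_LEVELS = {"public", "restricted public", "non-public"}


def check_catalog(catalog_json: str) -> dict[str, list[str]]:
    """Return {dataset identifier (or position): [problems]} for a data.json document."""
    catalog = json.loads(catalog_json)
    problems: dict[str, list[str]] = {}
    if catalog.get("conformsTo") != POD_SCHEMA:
        problems["catalog"] = ["conformsTo does not name the POD 1.1 schema"]
    for index, dataset in enumerate(catalog.get("dataset", [])):
        key = dataset.get("identifier") or f"dataset[{index}]"
        # Absent and empty values ("" or []) are both reported as missing.
        issues = [f"missing {field}" for field in REQUIRED_DATASET_FIELDS if not dataset.get(field)]
        level = dataset.get("accessLevel")
        if level and level not in ACCESS_LEVELS:
            issues.append(f"invalid accessLevel {level!r}")
        if issues:
            problems[key] = issues
    return problems


if __name__ == "__main__":
    sample = json.dumps({
        "conformsTo": POD_SCHEMA,
        "dataset": [{"title": "Soil moisture", "identifier": "ds-1", "accessLevel": "public"}],
    })
    print(check_catalog(sample))
```

An empty result means the catalog declares the POD 1.1 schema and every dataset carries the assumed fields with a valid access level.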

Slide11

Ag Data Commons Pilot: What’s missing from DKAN?

DKAN’s main use case:

Government and organizational documents and datasets

General improvements

Large File upload, virus checking, file size display

Harvest Dashboard – for harvesting external POD datasets or data using other standards

Solr search

Versioning

Data curation workflow

Scientific data require additional functionality:

DOI assignments to datasets

Identity management for authors (ORCID, etc.)

Citation information (primary citation, methods citation, related publications)

Collection of additional metadata

Long-term archiving capabilities

Funding source reference

Embargo period

Specialized taxonomies

Slide12

Ag Data Commons Pilot: Lessons learned

Keeping codebase compliant with standard DKAN

All configuration changes must be committed to code

The codebase must not clash with standard DKAN (which requires discipline under time pressure)

Significant pain merging NAL customizations with new DKAN releases

Local programming and systems support is necessary (our model)

Contributing back to DKAN and Drupal

Many of NAL’s customizations are adopted (and then maintained) by standard DKAN

General Drupal functionality:

Open data schema mapper

NALT Thesaurus

Taking advantage of customizations by other organizations

Workflow, Stories, Visualizations
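The "Open data schema mapper" listed above maps a site's own content fields onto the POD schema. The Python sketch below illustrates that idea only and does not use the Drupal module's API. It maps a flat local record onto a nested POD 1.1 dataset object. The local field names in FIELD_MAP and the to_pod function are hypothetical.

```python
from typing import Any, Callable

# Hypothetical local field names -> (POD 1.1 key path, value transform).
# A dot in the path means a nested object, e.g. contactPoint.hasEmail.
FIELD_MAP: dict[str, tuple[str, Callable[[Any], Any]]] = {
    "title": ("title", str),
    "body": ("description", str),
    "field_tags": ("keyword", list),
    "changed": ("modified", str),
    "uuid": ("identifier", str),
    "field_publisher": ("publisher.name", str),
    "field_contact_name": ("contactPoint.fn", str),
    # POD 1.1 expects hasEmail as a mailto: URI.
    "field_contact_email": ("contactPoint.hasEmail", lambda value: f"mailto:{value}"),
}


def to_pod(record: dict[str, Any]) -> dict[str, Any]:
    """Build a nested POD 1.1 dataset object from a flat local record."""
    dataset: dict[str, Any] = {}
    for local_field, (pod_path, transform) in FIELD_MAP.items():
        if local_field not in record:
            continue  # absent fields are omitted; unmapped fields are ignored
        *parents, leaf = pod_path.split(".")
        target = dataset
        for part in parents:
            target = target.setdefault(part, {})
        target[leaf] = transform(record[local_field])
    return dataset
```

Keeping the mapping in data rather than code makes the configuration easy to commit alongside the codebase, which fits the lesson above.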

Slide13

Ag Data Commons Pilot

https://data.nal.usda.gov

Slide14

i5K Workspace@NAL

Provides tools and resources for scientists working on insect genomes.

Goal: to store insect genome sequences, visualize them, enable their curation, and make them accessible to scientists.

Designed specifically to handle and support genomic data.

Website: https://i5k.nal.usda.gov

Slide15

Key open-source software used by the i5K Workspace

Main portal/website built with Drupal/Tripal

Key web application for genome visualization and feature annotation: JBrowse/Apollo

Slide16

Key open-source software used by the i5K Workspace

Slide17

I5K Workspace @ NAL

1. Drupal + Tripal

Chado is a database schema for biological data.

Tripal allows Drupal to access data stored in the Chado database to populate web pages using Drupal functionality.

Community: small and academic
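To make the Chado/Tripal relationship concrete: Chado stores sequence features in a feature table and links them through feature_relationship, with feature and relationship types drawn from cvterm. The Python sketch below builds a heavily simplified, in-memory SQLite subset of those three tables. Real Chado has many more columns and runs on PostgreSQL. The sketch then queries the transcripts of a gene, which is the kind of gene/transcript hierarchy a gene page displays. The identifiers and the build_demo_db and transcripts_of helpers are hypothetical. The query assumes the common Chado convention that a transcript (subject) is part_of its gene (object).

```python
import sqlite3

# Heavily simplified subset of three Chado tables (real Chado has many more
# columns, e.g. feature.organism_id, and is deployed on PostgreSQL).
SCHEMA = """
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id INTEGER NOT NULL REFERENCES cvterm (cvterm_id)
);
CREATE TABLE feature_relationship (
    subject_id INTEGER NOT NULL REFERENCES feature (feature_id),
    object_id INTEGER NOT NULL REFERENCES feature (feature_id),
    type_id INTEGER NOT NULL REFERENCES cvterm (cvterm_id)
);
"""

# Children (subjects) of a gene (object), linked by part_of and typed as mRNA.
TRANSCRIPTS_OF_GENE = """
SELECT child.uniquename
FROM feature AS gene
JOIN feature_relationship AS fr ON fr.object_id = gene.feature_id
JOIN cvterm AS rel ON rel.cvterm_id = fr.type_id AND rel.name = 'part_of'
JOIN feature AS child ON child.feature_id = fr.subject_id
JOIN cvterm AS child_type ON child_type.cvterm_id = child.type_id AND child_type.name = 'mRNA'
WHERE gene.uniquename = ?
ORDER BY child.uniquename
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding one hypothetical gene with two transcripts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO cvterm VALUES (?, ?)", [(1, "gene"), (2, "mRNA"), (3, "part_of")])
    conn.executemany(
        "INSERT INTO feature VALUES (?, ?, ?)",
        [(10, "GENE0001", 1), (11, "GENE0001-RA", 2), (12, "GENE0001-RB", 2)],
    )
    # subject (transcript) part_of object (gene)
    conn.executemany("INSERT INTO feature_relationship VALUES (?, ?, ?)", [(11, 10, 3), (12, 10, 3)])
    return conn


def transcripts_of(conn: sqlite3.Connection, gene: str) -> list[str]:
    """Return the uniquenames of mRNA features that are part_of the given gene."""
    return [row[0] for row in conn.execute(TRANSCRIPTS_OF_GENE, (gene,))]


if __name__ == "__main__":
    print(transcripts_of(build_demo_db(), "GENE0001"))  # ['GENE0001-RA', 'GENE0001-RB']
```

Tripal performs this kind of lookup against the real Chado schema so that Drupal pages can present the results.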

Slide18

I5K Workspace @ NAL

2. Apollo

Apollo is a web application that allows interactive, instantaneous editing of genome features.

It is one of the key features of the i5K Workspace.

Community: small and academic

Slide19

I5K Workspace @ NAL

Customized Resources

Registration module for Apollo application

Completely built in house

Integrates notifications, account creation, and captcha

Visualizing custom data types: gene pages

Hierarchical view to display gene/transcript relationships

Search website (many thousands of nodes)

Apache Solr search

Slide20

I5K Workspace @ NAL

Tripal: Lessons learned

Customization requires one full-time developer at NAL.

Because our customizations are forked off the main repository, any update to the main branch requires additional updates on our part.

Customizations are too specific to our website for us to fully contribute them back to, or integrate them with, the main project.

Slide21

I5K Workspace @ NAL

Apollo: Customized resources

Instead of building customized resources, we contributed financially to the salary of the lead developer.

Improvements were not specific to NAL's goals but were aimed at improving the stability of the application.

Even without a financial contribution, bug reports and feature requests from the entire user community are usually addressed very quickly, thanks to an active development team and a lead developer focused solely on this project.

Slide22

I5K Workspace @ NAL

Apollo: Lessons learned

How you interact with the development community of an OSS project depends on (1) the community itself and (2) how specific the required customization is.

Slide23

I5K Workspace @ NAL

https://i5k.nal.usda.gov

Slide24

Life Cycle Assessment (LCA) Commons

LCA Commons is a repository that provides access to data and tools that support life cycle assessment of agricultural products.

We collect, curate, and provide access to data edited and formatted explicitly for use in LCA

The LCA Commons is designed specifically to handle and support unit process data for LCA.

Website: www.lcacommons.gov
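Unit process data describe a single process as a set of input and output exchanges, each quantified relative to one reference flow. This lets an LCA model scale every exchange linearly to whatever amount of product is demanded. The Python sketch below illustrates that idea with made-up values. The class names and fields are simplified assumptions, not the openLCA data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exchange:
    flow: str
    amount: float  # quantity per reference amount of the owning process
    unit: str
    is_input: bool


@dataclass(frozen=True)
class UnitProcess:
    name: str
    reference_flow: str
    reference_amount: float
    exchanges: tuple[Exchange, ...]

    def scaled_to(self, demand: float) -> list[Exchange]:
        """Scale every exchange linearly so the process yields `demand` units of its reference flow."""
        if self.reference_amount <= 0:
            raise ValueError("reference amount must be positive")
        factor = demand / self.reference_amount
        return [
            Exchange(e.flow, e.amount * factor, e.unit, e.is_input)
            for e in self.exchanges
        ]


# Made-up illustrative values, not real LCA data.
process = UnitProcess(
    name="hypothetical crop production",
    reference_flow="harvested crop",
    reference_amount=1000.0,  # kg of reference flow
    exchanges=(
        Exchange("fuel", 4.0, "kg", True),
        Exchange("water", 200.0, "m3", True),
        Exchange("emission to air", 0.5, "kg", False),
    ),
)

if __name__ == "__main__":
    for exchange in process.scaled_to(250.0):
        direction = "in " if exchange.is_input else "out"
        print(f"{direction} {exchange.flow}: {exchange.amount:g} {exchange.unit}")
```

Because every exchange is expressed per reference flow, the data only remain usable if those exchange lists are curated carefully.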

Slide25

LCA Commons Technology Stack

Three separate applications accessed through the Drupal web content management system:

Discovery and Editorial Applications

Groovy/Grails web implementation of the domain-specific openLCA data model/modeling tool

LCA Collection on Ag Data Commons

DKAN catalog and datastore

Slide26

LCA Commons Technology Stack

Slide27

LCA Commons Technology Stack

[Diagram: LCA Commons technology stack in application, technology, and database layers. Components shown: lcacommons.gov; Discovery Application; Editorial Application; LCA Collection on Ag Data Commons; Groovy/Grails framework; Solr index; openLCA API; Activiti BPM; DKAN; Drupal; custom user management; openLCA MySQL; DKAN datastore; DKAN catalog.]

Slide28

LCA Commons: Customized Resources

The openLCA datastore was not designed explicitly for data management beyond what is necessary for desktop modeling.

This has required developing custom "work-arounds" for data management.

Activiti BPM has required significant customization for the editorial workflow for LCA data.

We will need to develop customized search capabilities that enable search across all three applications through Drupal.

Slide29

LCA Commons: Lessons learned

Technology selection based on clearly defined functional requirements is critical.

Using openLCA for an application it was not exactly designed for has required custom development AND innovation in the field.

It spurred the openLCA developer to build functionality that more closely meets our needs and pushed the domain forward in terms of data sharing and management.

Slide30

LCA Commons

http://lcacommons.gov

Slide31

PubAg Data Management System

PubAg is the National Agricultural Library's search system for agricultural information.

Content:

Full-text articles relevant to the agricultural sciences

Citations to peer-reviewed journal articles

Repository (Data Management): Fedora Commons/Islandora/Drupal

Public Interface: Apache Solr and Java application layer

Slide32

PubAg Data Management System

Slide33

PubAg Data Management System

From Islandora (https://wiki.duraspace.org/)

Slide34

PubAg Data Management System: Lessons learned

Customization needed to accommodate NAL quality assurance and workflow

Performance tuning is necessary and non-trivial for large repositories

Slide35

PubAg Data Management System

Internal Access Only

Slide36

Long-Term Agroecosystem Research Network

Historical and future agricultural research data

https://ltar.nal.usda.gov

Aims to ensure sustained crop and livestock production and ecosystem services from agroecosystems.

Aims to forecast and verify the effects of environmental trends, public policies, and emerging technologies.

Slide37

Long-Term Agroecosystem Research Network

Historical and future agricultural research data

18 sites across the country

Aim: 30 to 100+ years of data

Slide38

Long-Term Agroecosystem Research Network

Slide39

Long-Term Agroecosystem Research Network: Lessons learned

The project is still in its initial stages.

The lesson learned so far: we still have a lot to learn.

Slide40

Long-Term Agroecosystem Research Network

http://ltar.nal.usda.gov

Slide41

Conclusion: What have we learned?

Use of open source technology:

Allows us to test out technology in depth without a huge initial investment

Gives us access to community development (avoids reinventing the wheel)

Is mainly useful when customized?