Slide1
Data Science and Visualization
2014 Summer Internship - Tetherless World Constellation
Sumithra Gnanasekar
Lakshmi Chenicheri
Slide2
Objective
Visualize Minimum Information about a Marker Gene Sequence (MiMarks) compliant datasets
A dark data exercise
*Slide3
MiMarks
A standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences
Describes the environment from which the sample was taken
Ensures contextual data is collected and submitted
*Slide4
MiMarks Checklist
Slide5
Datasets
Two datasets from a bacterial diversity study from the Western English Channel
Focused on the seasonal structure of microbial communities
Dataset 1 was converted from Excel to CSV
Dataset 2 was converted from SRA to CSV
Data cleaning was undertaken to retrieve relevant fields
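The cleaning step is not described in detail in the slides. As a minimal sketch of what retrieving relevant fields from a converted CSV might look like, the Python snippet below keeps only the named columns and drops rows where any of them is blank. The column names and values are made up for illustration and are not taken from the actual datasets.

```python
import csv
import io


def select_fields(csv_text, fields):
    """Keep only the named columns and drop rows where any of them is blank."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cleaned = []
    for row in reader:
        values = {name: (row.get(name) or "").strip() for name in fields}
        if all(values.values()):
            cleaned.append(values)
    return cleaned


# Made-up sample rows; the real column names and values may differ.
SAMPLE = """sample_id,depth,nitrate,phosphate,notes
S1,10,0.52,0.04,ok
S2,10,,0.05,missing nitrate
S3,25,0.61,0.06,
"""

rows = select_fields(SAMPLE, ["sample_id", "nitrate", "phosphate"])
for row in rows:
    print(row)
```

Row S2 is dropped because its nitrate value is blank; S3 is kept because the blank column (notes) is not among the selected fields.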
*Slide6
Tools for Visualization
R
Google Charts integrated with R
Shiny (RStudio)
D3.js
D3.js was ultimately chosen for its flexibility and the range of visualizations it offers
*Slide7
Scatter Plot Dataset 1
Allows the user to filter fields
Drill and expand
Group based on fields
Handy in determining correlations between variables
*Slide8
Analysis of Scatter Plot Dataset 1
Depth, density, total_Depth of water column, longitude and latitude were found to be independent of the other environmental variables
Near-linear correlation between nitrate and silicate, and between nitrate and phosphate
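The slides report these relationships from visual inspection of the scatter plots. One way such a near-linear relationship could be quantified is with the Pearson correlation coefficient, sketched below in Python. The nitrate and silicate values are invented for illustration and are not measurements from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented values, for illustration only.
nitrate = [1.0, 2.0, 3.0, 4.0, 5.0]
silicate = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(nitrate, silicate)
print(f"r = {r:.3f}")
```

A value of r close to 1 or -1 indicates a near-linear relationship; a value close to 0 indicates no linear relationship, although the variables may still be related in some other way.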
*Slide9
Scatter Plot Dataset 2
Allows the user to filter fields
Drill and expand
*Slide10
Analysis of Scatter Plot Dataset 2
Linear trend seen in the scatter plots of:
Spots vs Bases
Nitrate vs Phosphate
Org_nitro vs Org_carb
Temperature vs Density
*Slide11
Temporal Visualization
Allows the user to filter values by time and analyze the effect on the other variables
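The interactive version of this filter lives in the linked visualization. The underlying idea, restricting records to a time window before examining the other variables, can be sketched in Python as follows. The field name collection_date and the records are assumptions made for illustration.

```python
from datetime import date


def filter_by_date(records, start, end, key="collection_date"):
    """Return the records whose ISO-formatted date falls within [start, end]."""
    return [r for r in records if start <= date.fromisoformat(r[key]) <= end]


# Invented records; the field names are assumptions.
records = [
    {"sample_id": "S1", "collection_date": "2008-01-15", "temperature": 9.8},
    {"sample_id": "S2", "collection_date": "2008-07-02", "temperature": 15.1},
    {"sample_id": "S3", "collection_date": "2008-08-20", "temperature": 16.4},
    {"sample_id": "S4", "collection_date": "2008-12-05", "temperature": 11.0},
]

summer = filter_by_date(records, date(2008, 6, 1), date(2008, 8, 31))
print([r["sample_id"] for r in summer])
```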
*Slide12
DOI Visualization
Visually represents DOIs associated with data points
On clicking a bubble, the metadata for that DOI is fetched and displayed
*Slide13
Bubble Chart
Visually represents the environment data associated with each sample
Bubble size corresponds to organism count
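The slides do not say how organism count is mapped to bubble size. A common convention, assumed here, is to make the bubble area (rather than the radius) proportional to the value, so that large counts are not visually exaggerated. The function name, the maximum radius of 40, and the counts below are hypothetical.

```python
import math


def bubble_radius(count, max_count, max_radius=40.0):
    """Radius such that bubble area is proportional to count."""
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    if count < 0:
        raise ValueError("count must be non-negative")
    return max_radius * math.sqrt(count / max_count)


# Invented organism counts per sample.
counts = {"S1": 100, "S2": 25, "S3": 0}
largest = max(counts.values())
radii = {name: bubble_radius(c, largest) for name, c in counts.items()}
print(radii)
```

With this scaling, a sample with a quarter of the largest count gets half the largest radius, and therefore a quarter of the area.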
*Slide14
RDF Conversion
The RDF conversion for MiMarks compliant datasets involves two steps:
Construct an ontology or use an existing one
Convert the dataset into triple instances using CSV to RDF conversion tools
csv2rdf4lod is an open source tool that can be used to easily convert the data in a CSV file into RDF encoded data
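To make the second step concrete, here is a minimal Python sketch of turning CSV rows into RDF triples in N-Triples syntax. It is not csv2rdf4lod and does not reproduce its behavior or configuration. The base URI http://example.org/, the sample/ and vocab/ path segments, and the column names are placeholders; a real conversion would use terms from the chosen ontology and typed literals where appropriate.

```python
import csv
import io
from urllib.parse import quote


def escape_literal(value):
    """Escape characters that are not allowed raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text, id_column, base="http://example.org/"):
    """Emit one plain-literal triple per non-empty cell, keyed by id_column."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{base}sample/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            # Skip the id column, surplus cells (key None), and blank cells.
            if column is None or column == id_column or not value:
                continue
            predicate = f"<{base}vocab/{quote(column, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


# Invented rows; the base URI and column names are placeholders.
SAMPLE_CSV = "sample_id,depth,nitrate\nS1,10,0.52\nS2,25,\n"

triples = csv_to_ntriples(SAMPLE_CSV, "sample_id")
print("\n".join(triples))
```

Each row becomes a subject, each column a predicate, and each cell a literal object; blank cells produce no triple.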
*Slide15
Spatio-temporal features of MiMarks, VAMPS and CoDL datasets
Some tools and visualizations that can be used for the MiMarks, VAMPS and CoDL datasets are as follows:
Planetary.js, an open source tool, can represent the spatial features interactively
Motion charts can show change over a period of time, with the changing quantity represented as the size of the bubble
A calendar-based representation of values is another option if the data is continuous
*Slide16
Links to Visualizations
Timeline crossfiltering visualization:
http://dco.tw.rpi.edu/viz/timeline/index.html
DOI visualization:
http://dco.tw.rpi.edu/viz/doiVis/index.html
Scatterplot visualization for Dataset 1:
http://dco.tw.rpi.edu/viz/scatterPlot/demo/demo.html
Bubble chart visualization:
http://dco.tw.rpi.edu/viz/Bubblechart/bubble_dataset2/index.html
Scatterplot visualization for Dataset 2:
http://dco.tw.rpi.edu/viz/scatterplot_dataset2/demo/demo.html