
Data Science and Visualization - PowerPoint Presentation

Uploaded On 2016-08-07


2014 Summer Internship, Tetherless World Constellation. Sumithra Gnanasekar, Lakshmi Chenicheri. Objective: Visualize Minimum Information about a Marker Gene Sequence (MiMarks) compliant datasets. ID: 437074




Presentation Transcript

Slide1

Data Science and Visualization

2014 Summer Internship - Tetherless World Constellation

Sumithra Gnanasekar

Lakshmi Chenicheri

Slide2

Objective

Visualize Minimum Information about a Marker Gene Sequence (MiMarks) compliant datasets

A dark data exercise

*

Slide3

MiMarks

A standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences

Describes the environment from which the sample was taken

Ensures contextual data is collected and submitted

*

Slide4

MiMarks Checklist

Slide5

Datasets

Two datasets from a bacterial diversity study from the Western English Channel

Focused on the seasonal structure of microbial communities

Dataset 1 was converted from Excel to CSV

Dataset 2 was converted from SRA to CSV

Data cleaning was undertaken to retrieve relevant fields
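The slides do not describe how the cleaning was done. As a minimal sketch of one possible approach, the plain JavaScript below parses CSV text, keeps only a chosen set of fields, and drops rows where any of those fields is missing or non-numeric. The helper names, the field names, and the sample values are all illustrative assumptions, and the parser assumes no quoted commas.

```javascript
// Hypothetical helper: parse simple CSV text (no quoted commas) into row objects.
function parseCsv(text) {
  const [headerLine, ...lines] = text.trim().split(/\r?\n/);
  const headers = headerLine.split(",").map((h) => h.trim());
  return lines.map((line) => {
    const cells = line.split(",");
    const row = {};
    headers.forEach((h, i) => {
      row[h] = (cells[i] ?? "").trim();
    });
    return row;
  });
}

// Hypothetical helper: keep only the listed fields as numbers and drop
// rows where any of them is empty or not a finite number.
function keepNumericFields(rows, fields) {
  return rows
    .map((row) =>
      Object.fromEntries(
        // Number("") is 0, so empty cells are mapped to NaN explicitly.
        fields.map((f) => [f, row[f] === "" ? NaN : Number(row[f])])
      )
    )
    .filter((row) => fields.every((f) => Number.isFinite(row[f])));
}

// Made-up sample data; not taken from the actual datasets.
const sample = [
  "sample_id,nitrate,phosphate,notes",
  "S1,0.5,0.05,ok",
  "S2,,0.07,missing nitrate",
  "S3,7.2,0.45,ok",
].join("\n");

const cleaned = keepNumericFields(parseCsv(sample), ["nitrate", "phosphate"]);
console.log(cleaned);
```

A real dataset with quoted fields would need a proper CSV parser rather than a plain split on commas.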

*

Slide6

Tools for Visualization

R

Google charts integrated with R

Shiny (RStudio)

D3.js

D3.js was ultimately chosen for its flexibility and the range of visualizations available

*

Slide7

Scatter Plot Dataset 1

Allows the user to filter fields

Drill and expand

Group based on fields

Handy in determining correlations between variables
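The filtering and grouping behind such a plot can be sketched in plain JavaScript, independent of any charting library. This is an illustration only: the function names, the `season` and `temperature` fields, and the values are assumptions, not the code or data behind the actual visualization.

```javascript
// Hypothetical helper: keep rows whose numeric field lies within [min, max].
function filterRange(rows, field, min, max) {
  return rows.filter((row) => row[field] >= min && row[field] <= max);
}

// Hypothetical helper: group rows by the value of one field.
function groupBy(rows, field) {
  const groups = new Map();
  for (const row of rows) {
    const key = row[field];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  return groups;
}

// Made-up sample rows; not taken from the actual datasets.
const rows = [
  { season: "winter", temperature: 9.1 },
  { season: "winter", temperature: 8.7 },
  { season: "summer", temperature: 16.2 },
  { season: "summer", temperature: 25.0 },
];

// Filter first, then group what remains.
const inRange = filterRange(rows, "temperature", 8, 20);
const bySeason = groupBy(inRange, "season");
for (const [season, members] of bySeason) {
  console.log(season, members.length);
}
```

Each group could then be drawn with its own color or marker so that clusters stand out in the scatter plot.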

*

Slide8

Analysis of Scatter Plot Dataset 1

Depth, density, total_Depth of water column, longitude and latitude were found to be independent of the other environmental variables

Near-linear correlation between nitrate and silicate, and between nitrate and phosphate
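The slides report these correlations from visual inspection of the scatter plots. One way to quantify such a relationship is the Pearson correlation coefficient, sketched below in plain JavaScript. The function name and the sample values are illustrative assumptions; the slides do not say that this statistic was computed.

```javascript
// Pearson correlation coefficient of two equal-length numeric arrays.
// Returns a value in [-1, 1]; the result is NaN if either array is constant.
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("need two arrays of equal length with at least 2 values");
  }
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Made-up values for illustration; not taken from the actual datasets.
const nitrate = [1, 2, 3, 4];
const phosphate = [2, 4, 6, 8];
console.log(pearson(nitrate, phosphate));
```

A value close to 1 or -1 indicates a near-linear relationship, while a value near 0 indicates no linear relationship.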

*

Slide9

Scatter Plot Dataset 2

Allows the user to filter fields

Drill and expand

*

Slide10

Analysis of Scatter Plot Dataset 2

Linear trend seen in the scatter plots of:

Spots vs Bases

Nitrate vs Phosphate

Org_nitro vs Org_carb

Temperature vs Density
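A linear trend like the ones listed above can be summarized by fitting a straight line with ordinary least squares. The sketch below is a plain JavaScript illustration; the function name and the sample points are assumptions, and the slides do not state that a fit was actually computed.

```javascript
// Ordinary least-squares fit of y = slope * x + intercept
// to an array of [x, y] pairs.
function fitLine(points) {
  const n = points.length;
  if (n < 2) throw new Error("need at least 2 points");
  let sumX = 0;
  let sumY = 0;
  let sumXY = 0;
  let sumXX = 0;
  for (const [x, y] of points) {
    sumX += x;
    sumY += y;
    sumXY += x * y;
    sumXX += x * x;
  }
  const denom = n * sumXX - sumX * sumX;
  if (denom === 0) throw new Error("all x values are identical");
  const slope = (n * sumXY - sumX * sumY) / denom;
  const intercept = (sumY - slope * sumX) / n;
  return { slope, intercept };
}

// Made-up points lying exactly on y = 2x + 1; not taken from the datasets.
const fit = fitLine([
  [0, 1],
  [1, 3],
  [2, 5],
]);
console.log(fit);
```

The fitted line could be overlaid on the scatter plot to make the trend visible at a glance.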

*

Slide11

Temporal Visualization

Allows one to filter values by time and analyze the effect on other variables
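The idea of filtering by a time window and then summarizing another variable can be sketched in plain JavaScript as follows. The `collection_date` and `temperature` field names, the helper names, and the values are assumptions for illustration; the actual visualization's code is not shown in the slides.

```javascript
// Hypothetical helper: keep rows whose date falls within [start, end].
// Dates are ISO strings such as "2008-07-01"; rows with unparsable dates
// are dropped because comparisons with NaN are false.
function filterByDate(rows, start, end, dateField = "collection_date") {
  const from = Date.parse(start);
  const to = Date.parse(end);
  return rows.filter((row) => {
    const t = Date.parse(row[dateField]);
    return t >= from && t <= to;
  });
}

// Hypothetical helper: arithmetic mean of one numeric field.
function mean(rows, field) {
  if (rows.length === 0) return NaN;
  return rows.reduce((sum, row) => sum + row[field], 0) / rows.length;
}

// Made-up sample rows; not taken from the actual datasets.
const samples = [
  { collection_date: "2008-01-15", temperature: 9.5 },
  { collection_date: "2008-07-01", temperature: 15.0 },
  { collection_date: "2008-08-12", temperature: 16.0 },
  { collection_date: "2009-02-10", temperature: 9.0 },
];

const summer = filterByDate(samples, "2008-06-01", "2008-08-31");
console.log(summer.length, mean(summer, "temperature"));
```

Re-running the summary each time the time window changes is what lets a viewer see how the other variables respond.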

*

Slide12

DOI Visualization

Visually represents DOIs associated with data points

On clicking a bubble, the metadata for that DOI is fetched and displayed
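The slides do not say how the metadata is fetched. One possible approach, shown here purely as an assumption, is DOI content negotiation: requesting `https://doi.org/<doi>` with an `Accept` header asking for CSL JSON. The function names are hypothetical, and the example DOI is just a placeholder, not one from the datasets.

```javascript
// Hypothetical helper: build the request for a DOI metadata lookup
// via content negotiation on doi.org.
function doiMetadataRequest(doi) {
  return {
    url: "https://doi.org/" + encodeURI(doi),
    options: {
      headers: { Accept: "application/vnd.citationstyles.csl+json" },
    },
  };
}

// Hypothetical helper: perform the lookup and return the parsed JSON.
// Requires an environment with a global fetch (modern browsers, Node 18+).
async function fetchDoiMetadata(doi) {
  const { url, options } = doiMetadataRequest(doi);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`DOI lookup failed with status ${response.status}`);
  }
  return response.json();
}

// Only the request is built here; no network call is made.
const request = doiMetadataRequest("10.1000/182");
console.log(request.url);
```

In a click handler, `fetchDoiMetadata` would be called with the DOI attached to the clicked bubble and the result rendered next to the chart.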

*

Slide13

Bubble Chart

Visually represents the environment data associated with each sample

Bubble size corresponds to organism count
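The slides do not specify whether the count is mapped to bubble radius or bubble area. A common choice, assumed here, is to make the area proportional to the count, which means the radius grows with the square root of the count. The function name, the default maximum radius, and the counts below are illustrative.

```javascript
// Map a count to a bubble radius so that bubble AREA is proportional
// to the count (radius scales with the square root of the count).
// maxRadius is the radius, in pixels, given to the largest count.
function bubbleRadius(count, maxCount, maxRadius = 40) {
  if (maxCount <= 0) return 0;
  return maxRadius * Math.sqrt(Math.max(count, 0) / maxCount);
}

// Made-up organism counts; not taken from the actual datasets.
const counts = [25, 100, 400];
const largest = Math.max(...counts);
const radii = counts.map((c) => bubbleRadius(c, largest));
console.log(radii);
```

Mapping the count directly to the radius instead would exaggerate large values, because perceived size follows area rather than radius. D3 also offers a square-root scale for the same purpose.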

*

Slide14

RDF Conversion

The RDF conversion for MiMarks compliant datasets involves two steps:

Construct an ontology or use an existing one

Convert the dataset into triple instances using CSV to RDF conversion tools

csv2rdf4lod is an open source tool that can be used to easily convert the data in a CSV file into RDF encoded data
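To illustrate what turning a CSV row into triples means, the sketch below emits N-Triples lines from one row object in plain JavaScript. It does not reproduce csv2rdf4lod or its configuration; the function names, the `http://example.org/` base IRI, the vocabulary layout, and the sample row are all assumptions made for the example.

```javascript
// Escape a value for use inside an N-Triples string literal.
function escapeLiteral(value) {
  return String(value)
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
}

// Hypothetical helper: turn one row into N-Triples lines.
// The subject IRI comes from the id field; every other non-empty field
// becomes a predicate under <baseIri>vocab/ with a plain string literal.
function rowToTriples(row, idField, baseIri) {
  const subject = `<${baseIri}sample/${encodeURIComponent(row[idField])}>`;
  return Object.keys(row)
    .filter((field) => field !== idField && row[field] !== "")
    .map((field) => {
      const predicate = `<${baseIri}vocab/${encodeURIComponent(field)}>`;
      return `${subject} ${predicate} "${escapeLiteral(row[field])}" .`;
    });
}

// Made-up row; not taken from the actual datasets.
const triples = rowToTriples(
  { sample_id: "S1", depth: "5", env_biome: "marine" },
  "sample_id",
  "http://example.org/"
);
console.log(triples.join("\n"));
```

In practice the predicates would come from the ontology chosen in the first step rather than from an ad hoc vocabulary built out of column names.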

*

Slide15

Spatio-temporal feature of MiMarks, VAMPS and CoDL datasets

Some tools or visualizations that can be used to visualize the MiMarks, VAMPS and CoDL datasets are as follows:

Planetary.js, an open source tool, would be effective in representing the spatial features interactively

Motion charts can show change over a period of time, with the changing quantity represented as the size of the bubble

A calendar-based representation of values is another option if the data is continuous

*

Slide16

Links to Visualizations

Timeline crossfiltering visualization:

http://dco.tw.rpi.edu/viz/timeline/index.html

DOI visualization:

http://dco.tw.rpi.edu/viz/doiVis/index.html

Scatterplot visualization for Dataset 1:

http://dco.tw.rpi.edu/viz/scatterPlot/demo/demo.html

Bubble chart Visualization:

http://dco.tw.rpi.edu/viz/Bubblechart/bubble_dataset2/index.html

Scatterplot visualization for Dataset 2:

http://dco.tw.rpi.edu/viz/scatterplot_dataset2/demo/demo.html

*