Challenges facing data-enabled interdisciplinary training

What is DESE?
If your science and engineering is not data-enabled…
…you’re not doing it right.

http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

Big Data in Agriculture (Today)
Syngenta Challenge: What seed varieties to plant?
Consider expected weather conditions, knowledge about the soil at the farm, and performance studies of candidate soybean varieties from numerous sources.

Tomorrow
Problem becomes Gene (60K) × Environment (?) × Phenotype (thousands)
G × E = P
Visualize results so that a farmer can understand and act on them: actionable intelligence

VELOCITY
Try it!
http://nextml.org/chemistry
Up to thousands of respondents at one time
With each choice, the back end must update the embedding and deliver a new query without noticeable delay for the user
Serious data-handling and infrastructure-design challenge

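The per-response loop described above can be sketched roughly as follows. This is a minimal illustration only: it assumes that each query is a triplet comparison ("is item A more like B or C?") and that the embedding is refined with one stochastic-gradient step on a hinge loss per answer. The function names, the loss, the learning rate, and the random choice of the next query are all hypothetical and are not taken from the system behind nextml.org.

```python
import random

def sq_dist(u, v):
    """Squared Euclidean distance between two points."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))

def triplet_loss(X, anchor, near, far, margin=1.0):
    """Hinge loss for the answer 'anchor is closer to near than to far'."""
    return max(0.0, margin + sq_dist(X[anchor], X[near]) - sq_dist(X[anchor], X[far]))

def update_embedding(X, anchor, near, far, lr=0.1, margin=1.0):
    """Apply one gradient step in place; return False if the answer is already satisfied."""
    if triplet_loss(X, anchor, near, far, margin) == 0.0:
        return False
    a, n, f = X[anchor], X[near], X[far]
    for k in range(len(a)):
        # Gradients of margin + |a-n|^2 - |a-f|^2 with respect to each point.
        grad_a = 2 * (f[k] - n[k])
        grad_n = -2 * (a[k] - n[k])
        grad_f = 2 * (a[k] - f[k])
        a[k] -= lr * grad_a
        n[k] -= lr * grad_n
        f[k] -= lr * grad_f
    return True

def next_query(num_items, rng=random):
    """Pick the next triplet uniformly at random (an assumed placeholder policy)."""
    return tuple(rng.sample(range(num_items), 3))

def handle_response(X, anchor, near, far, rng=random):
    """Work done per user choice: update the embedding, then return a new query."""
    update_embedding(X, anchor, near, far)
    return next_query(len(X), rng)
```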
Variety

Leveling the playing field
Everyone comes in with different skills and tool sets.
How do we get each discipline “up to speed” on critical data-science skills…
…without requiring extensive additional coursework or lengthening time to degree?

One-way street problem
Students in computer science and engineering have data-science skills that apply broadly…
…but “apply skillset A to dataset B” != cross-disciplinary science.
What will engage interest from both the computational and the applied sides to promote true interactions?

Tower of Babel 1
Data science tools and standards vary considerably across disciplines and even across labs…
…yet for students to interact, a common set of tools is required.
How do such standards get set, and what should they be?

Tower of Babel 2
Each discipline has its own jargon, which can be efficient within a discipline but a barrier across disciplines.
Talks are hard to follow when (a) you can’t understand the terms and when (b) you have to stop to explain every third word.
How do we promote a shared language for data science?

Data science infrastructure
Means of collecting, sharing, and documenting data are proliferating.
Especially with big data, issues arise:
Privacy, data sharing, large data sets, documentation of data, etc.
What are the right tools and infrastructure for managing data storage, documentation, access, etc.?
Open science? Amazon? Wiki? GitHub? Slack? WordPress?

Plan
Small group breakout 1 (15 min): Elaborate on and rank-order the list of challenges
What are we missing? Add any additional challenges to the Google Doc
What is most pressing? Rank-order the listed challenges (last 2 min)
Report back (15 min): Which challenge was your table’s top-ranked, and why?
Small group breakout 2 (15 min): Top n challenges assigned to tables; regroup at a table that interests you and discuss solutions
Note solutions on the Google Doc
Report back: What are your table’s solutions?

These are questions for you!
Teaching basic data science to students who are not in quantitative areas.
What basic skills should scientists have to at least get started?
How should these skills be taught?
How do we promote true interdisciplinary collaboration, rather than partitioning tasks by discipline?
How do we balance the utility of jargon against its alienating effects? How do we best promote good communication from data science to the disciplines?
How do we manage variety and promote standards in software use and development?
How do we build infrastructure for big data sharing, security, and documentation?