Reproducible computational social science

Uploaded On 2017-06-08

Presentation Transcript

Slide1

Reproducible computational social science

Allen Lee

Center for Behavior, Institutions, and the Environment

https://cbie.asu.edu

Slide2

Computational Social Science

Wicked collective action problems

Innovation -> Problems -> Innovation

Mitigate transaction costs for information transfer

Slide3

Methodologies

Case study analysis

Controlled experiments

Computational modeling

Integrative data analysis / natural experiments

Slide4

Case Study Analysis

seshatdatabank.info

Our goal is to test rival social scientific hypotheses with historical and archaeological data … treating history as a predictive, analytic science.

Slide5

SES Library

Descriptions of social-ecological systems from around the world

Embeds mathematical models relating to specific cases, where relevant to specific social-ecological dynamics, via XPPAUT

Slide6

Controlled Behavioral Experiments

Web-based experiments: Mechanical Turk, oTree, nodeGame, vcweb

Desktop experiments: zTree, CoNG, foraging, irrigation

Diversity in software platforms is valuable but also presents challenges

General issues summarized in "Experimental platforms for behavioral experiments on social-ecological systems" (Janssen, Lee, Waring, 2014)

Slide7
Slide8
Slide9
Slide10

Computational Modeling

Extrapolate potential future scenarios for complex systems with many interacting actors

Computational modeling makes the processes underlying complex phenomena explicit, sharable, & reproducible
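As a minimal sketch of what "explicit, sharable, & reproducible" can mean in practice: the toy model below is a hypothetical contribution game, not a model from the talk. Every assumption is a named parameter, the random seed is fixed so reruns are identical, and one parameter is varied while the others are held constant as a simple one-at-a-time sensitivity analysis. The function names, the update rule, and all parameter values are illustrative assumptions.

```python
import random
from statistics import mean


def run_model(n_agents=50, n_rounds=100, imitation_rate=0.1, seed=42):
    """Toy contribution game (hypothetical).

    Cooperators pay 1 into a pot that is doubled and shared equally.
    Each round, an agent may copy a randomly chosen agent that earned more.
    Returns the final fraction of cooperators.
    """
    rng = random.Random(seed)  # seeded generator, so reruns are identical
    cooperates = [rng.random() < 0.5 for _ in range(n_agents)]
    for _ in range(n_rounds):
        share = 2.0 * sum(cooperates) / n_agents
        payoffs = [share - (1.0 if c else 0.0) for c in cooperates]
        updated = list(cooperates)
        for i in range(n_agents):
            j = rng.randrange(n_agents)
            if payoffs[j] > payoffs[i] and rng.random() < imitation_rate:
                updated[i] = cooperates[j]
        cooperates = updated
    return sum(cooperates) / n_agents


def sweep(rates, seeds=range(5)):
    """Vary imitation_rate only, averaging the outcome over several seeds."""
    return {
        rate: mean(run_model(imitation_rate=rate, seed=s) for s in seeds)
        for rate in rates
    }


if __name__ == "__main__":
    for rate, cooperation in sweep([0.0, 0.05, 0.2, 0.5]).items():
        print(f"imitation_rate={rate:<5} mean final cooperation={cooperation:.2f}")
```

Because the seed and all parameters are arguments, anyone with the file can rerun a specific configuration or substitute a different assumption and compare the results.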

Assumptions are laid bare, and alternative assumptions / parameterizations can be explored via sensitivity analysis

George Box: “All models are wrong, but some are useful”

Slide11

Multiple methods

Convergent validity

Multiple methods complement each other, e.g., experiments, case study analysis, formal modeling (Poteete et al., 2010)

Slide12

Reproducibility

Victoria Stodden: how do we know inference is reliable, and why should we believe "Big Data" findings?

Need new standards for conducting “Data and Computational Science” and communicating results: sound workflows, sharing specifications, guides to good practice

Distinguishing between empirical, statistical, and computational reproducibility

Slide13

Replicable Research Workflows

Planning, organizing, and documenting your research protocols

Developing code for data analysis or experiments

Running your analyses (generating visualizations) or conducting experiments (generating data)

Presenting / publishing findings

Cleaning and documenting your code and data

Archival and documentation with contextual metadata that preserves provenance

https://osf.io is a good example of a full-stack system

Slide14

Archiving data

Vines T H et al. (2013), Current Biology, DOI:10.1016/j.cub.2013.11.014

Slide15

CoMSES Net

Computational Model Library for archiving model code, next generation in active development and planning stages

Provide suite of microservices for transparency and reproducibility in computational modeling

Slide16

The MIRACLE project: Cyberinfrastructure for visualizing model outputs

Dawn Parker, Michael Barton, Terence Dawson, Tatiana Filatova, Xiongbing Jin, Allen Lee, Ju-Sung Lee, Lorenzo Milazzo, Calvin Pritchard, J. Gary Polhill, Kirsten Robinson, and Alexey Voinov

Slide17

Background and motivation

Growing interest in analyzing highly detailed “big data”

Concurrent development of a new generation of simulation models, including agent-based models (ABMs), which themselves produce “big data” as outputs

Need for tools and methods to analyze and compare these two data sources

Slide18

Motivation

Sharing model code is great, but there are large barriers to entry to getting someone else’s model running (Collberg et al., 2015)

Sharing model output data can accomplish many of the goals of code sharing

It also lets other researchers explore new parameter spaces, or use different algorithms

Sharing of analysis algorithms may jump-start development of complex-systems-specific output analysis methods

Slide19

Objectives

Collect, extend, and share methods for statistical analysis and visualization of output from computational agent-based models of coupled human and natural systems (ABM-CHANS).

Provide interactive visualization and analysis of archived model output data for ABM-CHANS models

Slide20

Objectives, cont.

Conduct meta-analyses of our own projects, and invite the ABM-CHANS community to conduct further meta-analyses using the new tools.

Apply the statistical analysis algorithms we develop to empirical datasets to validate their applicability to large-scale data from complex social systems.

Slide21

Metadata for ABM output data

Goals

User needs to understand the data (what’s inside the files, what are the relationships between the files, project and owners…)

User needs to know how the data were generated (input data, analysis scripts, parameters, computer environment, workflows that chain several scripts…)

Two types of metadata

Metadata that describe the current state of data (data structure, file and data table content): Fine Grain Metadata

Metadata that describe the provenance of data (how the data were generated): Coarse Grain Metadata

Slide22

Capturing metadata

Goal: Automated metadata extraction with minimum user input

Fine grain metadata

Automatically extracting metadata from files (CSV columns, ArcGIS Shapefile metadata and attribute table columns, etc.)

Coarse grain metadata

Workflow describes how a script could produce a certain file type, while provenance describes how script A produces file B

Provenance can be automatically captured when a user runs scripts and workflows using the MIRACLE system (computer environment, user name, application name, process, input files and parameters, output files)
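A minimal sketch of both kinds of capture, assuming nothing about how MIRACLE itself implements them. `csv_column_metadata` reads fine grain metadata (column names and a row count) from a CSV file. `run_with_provenance` runs a script and returns a coarse grain record with the fields listed above. All function names, the record layout, and the script calling convention are hypothetical.

```python
import csv
import getpass
import hashlib
import platform
import subprocess
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def csv_column_metadata(path):
    """Fine grain metadata: column names and data-row count of a CSV file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        n_rows = sum(1 for _ in reader)
    return {"file": str(path), "columns": header, "rows": n_rows}


def sha256(path):
    """Checksum that ties a provenance record to exact file contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def current_user():
    """User name, or 'unknown' where the platform cannot report one."""
    try:
        return getpass.getuser()
    except Exception:
        return "unknown"


def run_with_provenance(script, inputs, outputs, parameters):
    """Coarse grain metadata: run `script` and record how outputs were made.

    Assumed calling convention: script <inputs...> <outputs...> --key=value
    """
    args = [sys.executable, str(script), *map(str, inputs), *map(str, outputs)]
    args += [f"--{key}={value}" for key, value in parameters.items()]
    started = datetime.now(timezone.utc).isoformat()
    result = subprocess.run(args, check=True)
    return {
        "environment": {"python": platform.python_version(), "os": platform.platform()},
        "user": current_user(),
        "application": sys.executable,
        "process": {"command": args, "started": started, "returncode": result.returncode},
        "inputs": {str(p): sha256(p) for p in inputs},
        "parameters": dict(parameters),
        "outputs": {str(p): sha256(p) for p in outputs},
    }


if __name__ == "__main__":
    # Demo with a throwaway script that copies its input to its output.
    work = Path(tempfile.mkdtemp())
    (work / "in.csv").write_text("agent,round,payoff\n1,1,0.5\n")
    (work / "copy.py").write_text("import shutil, sys\nshutil.copy(sys.argv[1], sys.argv[2])\n")
    record = run_with_provenance(work / "copy.py", [work / "in.csv"], [work / "out.csv"], {"scale": 2})
    print(record["process"]["command"])
    print(csv_column_metadata(work / "out.csv"))
```

Storing checksums with the file names lets a later reader confirm that a given output really came from the recorded inputs.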

Workflows can be constructed based on captured provenance

Slide23

MIRACLE platform use cases

Within a research group:

Efficiently share and discuss new model results

Let group members explore new parameter spaces

Create accessible archives for publications

Across groups:

Provide prototypes to new researchers, or those looking for new analysis methods

Provide examples for teaching and labs

Facilitate additional “after-market” research and publication

Slide24

MIRACLE project goals

Develop, share, test, and compare new statistical methods appropriate for analysis of complex systems data;

Improve communication and assessment within the modeling community;

Reduce barriers to entry for use of models;

Improve the ability of policy makers and stakeholders to understand and interact with model output

Slide25

CoMSES Net: Catalog

Track the state of archival

Provide collective-action tools to incentivize model sharing

Slide26

CoMSES Net: Catalog

Slide27

CoMSES Net Future Goals

Provide one-stop shop for computational modeling

containerized execution with bundled dependencies

integration with Jupyter and CyVerse and modeling platforms like RePast, NetLogo

Reparameterizable data analysis and exploration via the MIRACLE project

Bibliometric tracking

Collective action tools to incentivize prosocial behavior among scientists

Slide28

From http://stanford.edu/~vcs/talks/UIUCDataSummit-Feb5-2016-STODDEN.pdf

Slide29

Guide to good practice

Learn to use a source control system (git, mercurial, SVN)

Use it with discipline:

commit early, commit often

write meaningful log messages

create tags and releases at important checkpoints during the research process

List versioned dependencies (e.g., packrat, Maven/gradle, pip)

Slide30

Guide to good practice

Plan for reproducibility

Use version control efficiently

Archive everything: data, code, and contextual / provenance metadata

Prefer open, durable formats (plaintext, CSV, open file formats)

Use cloud backups

Automate where possible
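One way to automate part of this checklist, sketched under stated assumptions: the script below writes a plaintext CSV manifest of SHA-256 checksums for every file in a project directory, and can later report which files have changed or gone missing. The file name `MANIFEST.csv` and the function names are illustrative choices, not an established convention.

```python
import csv
import hashlib
import tempfile
from pathlib import Path


def write_manifest(root, manifest_name="MANIFEST.csv"):
    """Record relative path, size, and SHA-256 of every file under `root`."""
    root = Path(root)
    manifest = root / manifest_name
    rows = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if path == manifest:
            continue  # do not checksum the manifest itself
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        rows.append((path.relative_to(root).as_posix(), path.stat().st_size, digest))
    with open(manifest, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "bytes", "sha256"])
        writer.writerows(rows)
    return rows


def verify_manifest(root, manifest_name="MANIFEST.csv"):
    """Return listed paths that are missing or whose checksum has changed."""
    root = Path(root)
    changed = []
    with open(root / manifest_name, newline="") as f:
        for row in csv.DictReader(f):
            path = root / row["path"]
            if not path.is_file():
                changed.append(row["path"])
            elif hashlib.sha256(path.read_bytes()).hexdigest() != row["sha256"]:
                changed.append(row["path"])
    return changed


if __name__ == "__main__":
    # Demo on a throwaway project directory.
    project = Path(tempfile.mkdtemp())
    (project / "results.csv").write_text("run,value\n1,0.5\n")
    write_manifest(project)
    print(verify_manifest(project))  # nothing has changed yet
```

Running `write_manifest` at each tagged release, and `verify_manifest` before reusing an archive, gives a cheap check that archived data and code are the same bytes that produced the published results.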

Learn the basics of “software carpentry”

Slide31

Guides to good practice

Slide32

Computational Social Science

Slide33

Comments / Questions?