Slide1
Reproducible computational social science
Allen Lee
Center for Behavior, Institutions, and the Environment
https://cbie.asu.edu
Slide2
Computational Social Science
Wicked collective action problems
Innovation -> Problems -> Innovation
Mitigate transaction costs for information transfer
Slide3
Methodologies
Case study analysis
Controlled experiments
Computational modeling
Integrative data analysis / natural experiments
Slide4
Case Study Analysis
seshatdatabank.info
“Our goal is to test rival social scientific hypotheses with historical and archaeological data … treating history as a predictive, analytic science.”
Slide5
SES Library
Descriptions of social-ecological systems from around the world
Embeds mathematical models of the relevant social-ecological dynamics for specific cases, implemented in xppaut
Slide6
Controlled Behavioral Experiments
Web-based experiments: Mechanical Turk, oTree, nodeGame, vcweb
Desktop experiments: zTree, CoNG, foraging, irrigation
Diversity in software platforms is valuable but also presents challenges
General issues summarized in “Experimental platforms for behavioral experiments on social-ecological systems” (Janssen, Lee, Waring, 2014)
Slide7
Slide8
Slide9
Slide10
Computational Modeling
Extrapolate potential future scenarios for complex systems with many interacting actors
Computational modeling makes the processes underlying complex phenomena explicit, shareable, and reproducible.
Assumptions are laid bare, and alternative assumptions / parameterizations can be explored via sensitivity analysis.
George Box: “All models are wrong, but some are useful.”
Slide11
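As a concrete illustration of the sensitivity analysis just described, here is a minimal Python sketch: a toy logistic-growth model (a hypothetical stand-in for a richer simulation) swept over an explicit parameter grid, so every assumption is visible and the run is reproducible.

```python
import itertools

def model(growth_rate, carrying_capacity, steps=200):
    """Toy logistic-growth model; a hypothetical stand-in for a richer simulation."""
    population = 1.0
    for _ in range(steps):
        population += growth_rate * population * (1 - population / carrying_capacity)
    return population

# Explicit parameter grid: assumptions are laid bare and easy to vary.
parameter_grid = {
    "growth_rate": [0.05, 0.1, 0.2],
    "carrying_capacity": [50, 100, 200],
}

# Sweep over all combinations -- a basic sensitivity analysis.
results = {
    combo: model(*combo)
    for combo in itertools.product(*parameter_grid.values())
}
```

Each (growth_rate, carrying_capacity) pair maps to a model outcome, so the effect of each parameterization can be inspected directly.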
Multiple methods
Convergent validity
Multiple methods complement each other, e.g., experiments, case study analysis, formal modeling (Poteete, et al., 2010)
Slide12
Reproducibility
Victoria Stodden: how do we know inference is reliable, and why should we believe "Big Data" findings?
Need new standards for conducting “Data and Computational Science” and communicating results: sound workflows, sharing specifications, guides to good practice
Distinguishing between empirical, statistical, and computational reproducibility
Slide13
Replicable Research Workflows
Planning, organizing, and documenting your research protocols
Developing code for data analysis or experiments
Running your analyses (generating visualizations) or conducting experiments (generating data)
Presenting / publishing findings
Cleaning and documenting your code and data
Archival and documentation with contextual metadata that preserves provenance
https://osf.io is a good example of a full-stack system
Slide14
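As a sketch of contextual metadata that preserves provenance, the snippet below records who ran an analysis, on what platform, and with which inputs and parameters. The field names here are hypothetical; a full-stack system like OSF records far more.

```python
import getpass
import platform
import sys
from datetime import datetime, timezone

def capture_provenance(input_files, parameters):
    """Record contextual metadata for one analysis run (hypothetical schema)."""
    try:
        user = getpass.getuser()
    except Exception:  # some batch environments have no login name
        user = "unknown"
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "input_files": list(input_files),
        "parameters": dict(parameters),
    }
```

Writing such a record alongside every generated output file is a simple way to keep results traceable back to the run that produced them.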
Archiving data
Vines TH et al. (2013) Current Biology. DOI: 10.1016/j.cub.2013.11.014
Slide15
CoMSES Net
Computational Model Library for archiving model code; next generation in active development and planning stages
Provide a suite of microservices for transparency and reproducibility in computational modeling
Slide16
The MIRACLE project: Cyberinfrastructure for visualizing model outputs
Dawn Parker, Michael Barton, Terence Dawson, Tatiana Filatova, Xiongbing Jin, Allen Lee, Ju-Sung Lee, Lorenzo Milazzo, Calvin Pritchard, J. Gary Polhill, Kirsten Robinson, and Alexey Voinov
Slide17
Background and motivation
Growing interest in analyzing highly detailed “big data”
Concurrent development of a new generation of simulation models, including ABMs, which themselves produce “big data” as outputs
Need for tools and methods to analyze and compare these two data sources
Slide18
Motivation
Sharing model code is great, but there are large barriers to entry in getting someone else’s model running (Collberg et al., 2015)
Sharing model output data can accomplish many of the goals of code sharing
It also lets other researchers explore new parameter spaces, or use different algorithms
Sharing of analysis algorithms may jump-start development of complex-systems-specific output analysis methods
Slide19
Objectives
Collect, extend, and share methods for statistical analysis and visualization of output from computational agent-based models of coupled human and natural systems (ABM-CHANS).
Provide interactive visualization and analysis of archived model output data for ABM-CHANS models
Slide20
Objectives, cont.
Conduct meta-analyses of our own projects, and invite the ABM-CHANS community to conduct further meta-analyses using the new tools.
Apply the statistical analysis algorithms we develop to empirical datasets to validate their applicability to large-scale data from complex social systems.
Slide21
Metadata for ABM output data
Goals
User needs to understand the data (what’s inside the files, what are the relationships between the files, project and owners…)
User needs to know how the data were generated (input data, analysis scripts, parameters, computer environment, workflows that chain several scripts…)
Two types of metadata
Metadata that describe the current state of the data (data structure, file and data table content): fine-grain metadata
Metadata that describe the provenance of the data (how the data were generated): coarse-grain metadata
Slide22
Capturing metadata
Goal: Automated metadata extraction with minimum user input
Fine-grain metadata
Automatically extracted from files (CSV columns, ArcGIS Shapefile metadata and attribute table columns, etc.)
Coarse-grain metadata
A workflow describes how a script could produce a certain file type, while provenance describes how script A produced file B
Provenance can be captured automatically when the user runs scripts and workflows through the MIRACLE system (computer environment, user name, application name, process, input files and parameters, output files)
Workflows can be constructed from captured provenance
Slide23
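The fine-grain extraction step above can be sketched with the Python standard library alone: read a CSV header and its first data row, and infer a rough type per column. The single-cell inference rule is a simplifying assumption; a real system would inspect many rows.

```python
import csv
from pathlib import Path

def infer_type(value):
    """Crude type inference from a single cell (a simplifying assumption)."""
    for cast, name in ((int, "integer"), (float, "float")):
        try:
            cast(value)
            return name
        except ValueError:
            pass
    return "string"

def extract_csv_metadata(path):
    """Extract fine-grain metadata (column names and inferred types) from a CSV file."""
    path = Path(path)
    with path.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        first_row = next(reader, [])
    # Pad so every column gets an entry even if the first data row is short.
    first_row += [""] * (len(header) - len(first_row))
    return {
        "file": path.name,
        "columns": [
            {"name": col, "type": infer_type(val) if val else "unknown"}
            for col, val in zip(header, first_row)
        ],
    }
```

The resulting dictionary can be serialized next to the data file, giving later users a machine-readable description of what is inside it.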
MIRACLE platform use cases
Within a research group:
Efficiently share and discuss new model results
Let group members explore new parameter spaces
Create accessible archives for publications
Across groups:
Provide prototypes to new researchers, or those looking for new analysis methods
Provide examples for teaching and labs
Facilitate additional “after-market” research and publication
Slide24
MIRACLE project goals
Develop, share, test, and compare new statistical methods appropriate for analysis of complex systems data;
Improve communication and assessment within the modeling community;
Reduce barriers to entry for use of models;
Improve the ability of policy makers and stakeholders to understand and interact with model output
Slide25
CoMSES Net: Catalog
Track the state of archival
Provide collective-action tools to incentivize model sharing
Slide26
CoMSES Net: Catalog
Slide27
CoMSES Net Future Goals
Provide a one-stop shop for computational modeling
Containerized execution with bundled dependencies
Integration with Jupyter and CyVerse, and with modeling platforms like RePast and NetLogo
Reparameterizable data analysis and exploration via the MIRACLE project
Bibliometric tracking
Collective-action tools to incentivize prosocial behavior among scientists
Slide28
From http://stanford.edu/~vcs/talks/UIUCDataSummit-Feb5-2016-STODDEN.pdf
Slide29
Guide to good practice
Learn to use a source control system (git, Mercurial, SVN)
Use it with discipline:
commit early, commit often
write meaningful log messages
create tags and releases at important checkpoints during the research process
List versioned dependencies (e.g., packrat, Maven/Gradle, pip)
Slide30
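Versioned dependency lists can also be generated programmatically. The sketch below uses Python's standard importlib.metadata to produce pip-style pins; the package names are placeholders for whatever a project actually depends on.

```python
from importlib import metadata

def pin_dependencies(packages):
    """Produce pip-style 'name==version' pins for the installed packages."""
    pins = []
    for pkg in packages:
        try:
            pins.append(f"{pkg}=={metadata.version(pkg)}")
        except metadata.PackageNotFoundError:
            # Flag anything missing so the environment can be fixed.
            pins.append(f"# {pkg}: not installed")
    return pins

# Hypothetical project dependencies; the result can be written to requirements.txt.
requirements = pin_dependencies(["numpy", "pandas"])
```

Checking the pinned list into version control alongside the analysis code records exactly which library versions produced a given result.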
Guide to good practice
Plan for reproducibility
Use version control efficiently
Archive everything: data, code, and contextual / provenance metadata
Prefer open, durable formats (plaintext, CSV, open file formats)
Use cloud backups
Automate where possible
Learn the basics of “software carpentry”
Slide31
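“Automate where possible” can be as simple as a single entry point that re-runs the entire analysis from raw input to results. A minimal sketch, where the cleaning and analysis steps are hypothetical stand-ins for a real pipeline:

```python
def clean(raw_text):
    """Normalize whitespace and drop blank lines (stand-in for real data cleaning)."""
    return [" ".join(line.split()) for line in raw_text.splitlines() if line.strip()]

def analyze(records):
    """Toy analysis step: count the cleaned records (stand-in for real statistics)."""
    return {"n_records": len(records)}

def run_pipeline(raw_text):
    """One command regenerates everything, keeping the result reproducible."""
    return analyze(clean(raw_text))
```

Because every step is invoked from one function, anyone with the raw data can regenerate the results without reconstructing a manual sequence of commands.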
Guides to good practice
Slide32
Computational Social Science
Slide33
Comments / Questions?