Digital Stewardship and Higher Education IT

Uploaded On 2020-06-16


Lessons from the National Agenda. Prepared for NERCOMP Annual Conference, March 2014. Presented by Micah Altman <escience@mit.edu>, Director of Research, MIT Libraries; Non-Resident Senior Fellow, Brookings Institution.


Presentation Transcript

Slide2

Digital Stewardship and Higher Education IT: Lessons from the National Agenda

Prepared for NERCOMP Annual Conference, March 2014

Presented by: Micah Altman <escience@mit.edu>

Director of Research, MIT Libraries

Non-Resident Senior Fellow, Brookings Institution

Slide3

DISCLAIMER

These opinions are my own; they are not the opinions of MIT, Brookings, any of the project funders, nor (with the exception of co-authored previously published work) my collaborators.

Secondary disclaimer: “It’s tough to make predictions, especially about the future!”

Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White, etc.

Slide4

Preview

Who are the NDSA?

Why develop an agenda for digital stewardship?

What should national stewardship priorities be?

… research & foundations of stewardship

… digital content

… technical infrastructure

… organizational roles

Lessons for Higher Ed IT


Slide5

Collaborators & Co-Conspirators

The 160+ institutional members of NDSA, and the 10000+ hours contributed by their representatives to NDSA working groups, meetings and reports.

National Agenda Authors:

Micah Altman, Jefferson Bailey, Karen Cariani, Jim Corridan, Jonathan Crabtree, Blaine Dessy, Michelle Gallinger, Andrea Goethals, Abigail Grotke, Cathy Hartman, Butch Lazorchak, Jane Mandelbaum, Carol Minton Morris, Trevor Owens, Meg Phillips, John Spencer, Helen Tibbo, Tyler Walters, Kate Wittenberg, Kate Zwaard

Slide6

Who are the NDSA?

Slide7

About the NDSA

Founded in 2010, the National Digital Stewardship Alliance (NDSA) is a consortium of institutions that are committed to the long-term preservation of digital information.

Our mission is to establish, maintain, and advance the capacity to preserve our nation's digital resources for the benefit of present and future generations.

NDSA member institutions represent all sectors, and include universities, consortia, professional associations, commercial enterprises, and government agencies at the federal, state, and local levels.

The Library of Congress provides organizational support and substantive collaboration as Secretariat.

The alliance is based on collaborative community effort; there are no fees for NDSA membership. Each member institution commits to NDSA principles, and contributes effort to working groups, reports, surveys, meetings, and other NDSA initiatives.

Slide8

NDSA Initiatives

Working Groups: Recent Outputs

Extending Knowledge
- Preservation Storage Survey
- Web Harvesting Survey
- Preservation Staffing Survey
- Geospatial Selection & Appraisal report
- Content case studies
- NDSA Interview Series

Tools for Practice
- Levels of Preservation
- Digital Preservation in a Box
- Digital Preservation on Wikipedia

Dissemination
- National agenda for digital stewardship
- NDSA Innovation Awards
- NDSA Social Media

Slide9

NDSA Member Organizations

- 165 Member Organizations
- From all sectors
- Committed to digital stewardship

digitalpreservation.gov/ndsa/memberslist.html

Slide10

Why develop an agenda for digital stewardship?

Slide11

Why a national agenda for digital stewardship?

Effective digital stewardship is vital for:
- maintaining authentic public records
- growing a reliable scientific evidence base
- providing durable access to our cultural heritage

Knowledge of ongoing research, practice, and organizational collaborations is distributed widely across disciplines, sectors, and communities of practice.

Slide12

How was this accomplished?

Contributed community effort
- Development: contributions from the (now 150+) institutional members through working group participation, workshop discussion, and commentary
- Writing: LC staff, chairs of NDSA working groups, coordination committee
- Reviewing: expert reviewers in the preservation community

Integrating diverse perspectives from multiple disciplines & sectors

The persistence, organization, and commitment of the Library of Congress in its role as Secretariat

Slide13

Why Now - Climate

Strong trends towards:
- More production of digital content
- More publishing, filtering and access
- More learners and collaborators
- More attention to public information

Slide14

Trends in Higher Education Technology will Increase Need for Information Stewardship

Adoption Trends
- Growing Ubiquity of Social Media
- Integration of Online, Hybrid, and Collaborative Learning
- Rise of Data-Driven Learning and Assessment
- Shift to Students as Creators
- Evolution of Online Learning

Significant Challenges
- Low Digital Fluency of Faculty
- Scaling Teaching Innovations

Important Developments
- Learning Analytics
- 3D Printing
- Quantified Self

Implications:
- more information, in new forms, created by more people
- need to manage, understand, and retain information for teaching, research, and evaluation
- requires curation at scale

Slide15

Climate vs Weather

Climate is what you should expect -- weather is what you get. Climate for reproducibility and data management seems favorable… prepare for shifts in the weather.


Slide16

What Was Accomplished?

The National Agenda for Digital Stewardship identifies high-impact opportunities to advance:

- the state of the art
- the state of practice
- the state of collaboration

Slide17

Foundations of Content Stewardship: Framework & Research

Slide18

What is Content Stewardship?

Stewardship involves taking broad responsibility for preservation and curation.

The goal of preservation is ensuring meaningful long-term access.

Example: If you have 1000 files (bitstreams), and you’d like to have a 99.99% chance of accessing them in 20 years, how do you store them?
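
To get a feel for how demanding the example is, the sketch below works out the per-file, per-year loss probability that would still meet the target. It rests on assumptions that are not in the slides: that "accessing them" means all 1000 files survive, and that each file fails independently with the same probability every year. The function name is illustrative only.

```python
import math


def required_annual_loss_rate(n_files: int, years: int, target: float) -> float:
    """Largest per-file, per-year loss probability that still meets the target.

    Assumption (not from the slides): files fail independently, with a
    constant yearly probability q, and the target is that ALL files survive.
    """
    # target = (1 - q) ** (n_files * years); solve for q in log space
    log_survival_per_file_year = math.log(target) / (n_files * years)
    return -math.expm1(log_survival_per_file_year)


rate = required_annual_loss_rate(n_files=1000, years=20, target=0.9999)
print(f"Allowed loss probability per file per year: {rate:.2e}")
```

Under these assumptions the allowed rate is on the order of a few in a billion per file per year, which is why the storage question is not trivial.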

Slide19

Why not store everything with Amazon?

Why not put everything in Amazon?

Amazon claims reliability of 99.999999999%

(Better odds than winning Powerball ®, being struck by lightning, and finding alien life… combined)

Slide20

What’s left out of the Eleven Nines?

What are the units? Collection? Object? Bit?

How was the failure rate calculated? (It’s theoretical)
- MTBF + independence * enough replicas = lots of nines
- But: no details for the estimate provided; no historical reliability statistics provided; no service reliability auditing provided

What is the empirical evidence for MTBF?
- Storage manufacturers' hardware MTBF (mean time between failures) is inaccurate…
- Failures across hardware replicas are not independent

What threats are assumed away?
- software failure (e.g. a bug in the AWS software for its control backplane)
- legal threats (leading to account lock-out, deletion, or content removal)
- institutional threats (such as a change in Amazon’s business model)
- process threats (someone hits the delete button by mistake, forgets to pay the bill, or AWS rejects the payment)

Do SLAs or audits back up “design” reliability claims?
- No claim to reliability in SLAs (or uptime, availability, response time…)
- Can’t even prove AWS has the content without taking it out!
- Sole recovery for breach is limited to refund of fees for periods the service was unavailable
- No right to inspect Amazon logs, assistance with forensics, etc.
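
The "MTBF + independence * enough replicas" arithmetic can be sketched as follows. This is an illustration under stated assumptions, not Amazon's actual calculation (which the slides note is not published): `p` is a hypothetical yearly loss probability for one replica, and `common` is a hypothetical yearly probability of a shared failure (software bug, account lock-out, operator error) that takes out every replica at once.

```python
import math


def nines(loss_probability: float) -> float:
    """Express a loss probability as a count of 'nines' of durability."""
    return -math.log10(loss_probability)


def loss_independent(p: float, replicas: int) -> float:
    """Loss probability if replica failures really are independent."""
    return p ** replicas


def loss_with_common_mode(p: float, replicas: int, common: float) -> float:
    """Loss probability when one shared failure can destroy all replicas."""
    return common + (1.0 - common) * p ** replicas


p, k = 0.01, 3  # illustrative numbers only
print(f"independent:  {nines(loss_independent(p, k)):.1f} nines")
print(f"common mode:  {nines(loss_with_common_mode(p, k, common=1e-4)):.1f} nines")
```

Adding replicas raises the independent figure quickly, but the common-mode term puts a ceiling on the achievable nines no matter how many replicas are added.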

Slide21

And How Much Does it Really Cost?

- Glacier storage is relatively cheap
- Getting your data back is not, if you want it fast
- Creates lock-in and gotchas

Slide22

Observations

- Digital preservation does not equal “backup”
- Ensuring long-term access requires ongoing evaluation and management of a broad spectrum of risks & costs
- Without attention the digital evidence base will erode

Slide23

The Problem - Restated

Keeping risk of object loss fixed, what choices minimize $?

“Dual problem”: Keeping $ fixed, what choices minimize risk?

Extension: For specific cost functions for loss of each object, Loss(object_i), summed over all lost objects, what choices minimize:

$\text{Total cost} = \text{preservation cost} + \sum_i E[\mathrm{Loss}(\mathrm{object}_i)]$

[Figure: risk plotted against cost, annotated “Are we there yet?”]
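
A toy version of the "extension" problem can be sketched as below. The model is an assumption made for illustration: the only choice is the number of replicas, each replica has a fixed cost, copies are lost independently with probability `p_copy_loss`, and every object has the same loss value. All names and numbers are hypothetical.

```python
def total_cost(replicas: int, cost_per_replica: float, p_copy_loss: float,
               n_objects: int, loss_per_object: float) -> float:
    """Total cost = preservation cost + sum of expected losses."""
    preservation = replicas * cost_per_replica
    p_object_loss = p_copy_loss ** replicas  # all copies must be lost
    expected_loss = n_objects * p_object_loss * loss_per_object
    return preservation + expected_loss


def best_replica_count(max_replicas: int, **params) -> int:
    """Replica count in 1..max_replicas with the lowest total cost."""
    return min(range(1, max_replicas + 1),
               key=lambda k: total_cost(k, **params))


params = dict(cost_per_replica=100.0, p_copy_loss=0.1,
              n_objects=1000, loss_per_object=50.0)
for k in range(1, 6):
    print(k, round(total_cost(k, **params), 2))
print("best:", best_replica_count(5, **params))
```

With these illustrative numbers the minimum falls at three replicas: fewer copies leave expected loss dominating, more copies cost more than the loss they prevent.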

Slide24

What are some threats?

- Physical & Hardware
- Software
- Insider & External Attacks
- Curatorial Error
- Organizational Failure

Slide25

Threat Modeling

Slide26

Methods for Mitigating Bit-Level Risk

- Physical: Media, Hardware, Environment
- Number of copies
- Diversification of copies
- Formats
- File Transforms: compression, encoding, encryption
- Fixity
- Repair
- Local Storage
- File Systems: transforms, deduplication, redundancy
- Replication
- Verification
- Audit
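
As one concrete reading of the fixity, verification, and repair items above, the sketch below records a SHA-256 digest for an object, audits each replica against it, and repairs damaged replicas from a verified copy. The in-memory dictionary of replicas and the function names are hypothetical stand-ins for real storage locations.

```python
import hashlib


def digest(data: bytes) -> str:
    """Fixity information: a SHA-256 digest of the bitstream."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(replicas: dict, recorded_digest: str) -> list:
    """Verify every replica against the recorded digest and repair failures.

    `replicas` maps a (hypothetical) location name to that copy's bytes.
    Returns the names of the replicas that had to be repaired.
    """
    good = [name for name, data in replicas.items()
            if digest(data) == recorded_digest]
    if not good:
        raise ValueError("no replica matches the recorded digest")
    repaired = []
    for name in replicas:
        if name not in good:
            replicas[name] = replicas[good[0]]  # copy from a verified replica
            repaired.append(name)
    return repaired


original = b"scanned manuscript, page 1"
recorded = digest(original)
copies = {"site-a": original, "site-b": original, "site-c": b"scanned manuscr1pt"}
print(audit_and_repair(copies, recorded))
```

The recorded digest has to be stored and protected separately from the copies; otherwise corruption of the digest is indistinguishable from corruption of the content.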

Slide27

Observations

- Blind replication is rarely a rational long-run strategy, even with lots of copies.
- Without verification/audit and repair strategies, long-term risk often remains high.
- There are multiple methods for mitigating threats to access; use these to guide diversification.
- Threat / lifecycle modeling is needed in order to make a rational choice.
- Better practices, models, and evidence are needed.
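
The first two observations can be illustrated with a deliberately simplified model. The assumptions are mine, not the slides': each copy is lost independently with probability `p` per year, and "audit and repair" means that at the end of every year all damaged copies are detected and restored perfectly from any surviving copy.

```python
def loss_without_repair(p: float, copies: int, years: int) -> float:
    """Blind replication: the object is lost once every copy has decayed."""
    p_copy_dead = 1.0 - (1.0 - p) ** years
    return p_copy_dead ** copies


def loss_with_annual_repair(p: float, copies: int, years: int) -> float:
    """Annual audit and repair: loss needs all copies to fail in one year."""
    p_lost_in_a_year = p ** copies
    return 1.0 - (1.0 - p_lost_in_a_year) ** years


p, copies, years = 0.05, 3, 20  # illustrative numbers only
print(f"no repair:     {loss_without_repair(p, copies, years):.4f}")
print(f"annual repair: {loss_with_annual_repair(p, copies, years):.4f}")
```

In this toy setting the same three copies are roughly a hundred times safer with yearly audit and repair than without, which is the sense in which blind replication is a poor long-run strategy.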

Slide28

Research Priorities

- Applied Research for Cost Modeling and Audit Modeling
- Value of information
- Understanding Information Equivalence & Significance
- Policy Research on Trust Frameworks
- Preservation at Scale
- The Evidence Base for Digital Preservation

Slide29

What Else do We Need To Know?

- What is the expected future value of a specified collection of digital content?
- What content is already being effectively stewarded by other organizations?
- How much is the expected future cost of preserving that content?
- How often do different threats to information manifest?
  - storage hardware or media failures
  - software errors that cause information loss
  - stored information becoming inaccessible because of obsolete formats or loss of other contextual knowledge
  - human error or maliciousness that causes loss of content in an information system
- What is the reliability of current digital preservation networks and services?
- How successful are other proposed strategies for replication, monitoring, certification, and auditing at preventing loss due to these threats?

Slide30

The Limits of Case Studies

Most current evidence for digital preservation practices and outcomes is based on local case studies and convenience samples.

Case studies are useful for:
- existence proofs
- raising awareness of problems
- process tracing
- hypothesis generation

Case studies are not enough to:
- advance our scientific knowledge
- create robust predictive models
- test causal hypotheses
- strongly guide decision making

Systematic evidence is needed to support both:
- general selection of digital preservation practices and methods
- applications of selected digital preservation methods in a specific operational context

Slide31

How will we learn?

Apply existing research methodologies from other fields, especially fields involving observational research on humans and human systems.

Some useful methodologies:
- probability-based surveys (e.g. of information management practice and outcomes)
- replicable simulation experiments tied to theoretically grounded models of information management and risk
- creation of testbeds and test corpora that can be used to systematically compare new practices, tools, and methods
- field experiments, in which randomized interventions are applied and evaluated in real operational environments
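
As a small illustration of what a "replicable simulation experiment" can look like, the sketch below estimates an object loss rate by Monte Carlo with an explicit random seed, so that anyone rerunning it gets the same number. The risk model (independent yearly copy loss, no repair) and all parameter names are assumptions chosen for brevity, not a model proposed in the Agenda.

```python
import random


def simulate_loss_rate(n_objects: int, replicas: int, p_copy_loss: float,
                       years: int, seed: int) -> float:
    """Fraction of objects that lose every copy within the time horizon."""
    rng = random.Random(seed)  # explicit seed makes the run reproducible
    lost = 0
    for _ in range(n_objects):
        alive = replicas
        for _ in range(years):
            # each surviving copy is lost this year with prob. p_copy_loss
            failures = sum(1 for _ in range(alive)
                           if rng.random() < p_copy_loss)
            alive -= failures
            if alive == 0:
                lost += 1
                break
    return lost / n_objects


estimate = simulate_loss_rate(n_objects=10_000, replicas=3,
                              p_copy_loss=0.05, years=20, seed=2014)
print(f"estimated loss rate: {estimate:.3f}")
```

Publishing the seed, the parameters, and the code together is what lets others replicate the result or vary one assumption at a time.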

Slide32

Observations

Developing better practices will require going beyond case studies to formal modeling, computer simulation, statistical analysis, and experiments.

Slide33

National priorities for… Digital Content

Slide34

Selected Digital Content Areas that Challenge Curation

- Web and Social Media
- Electronic Records
- Moving Image and Recorded Sound
- Research Data
- “Big” Data

Slide35

Goals of content curation

- Curation involves selection of content for retention, and management for use.
- Selection requires predicting future value, in order to build an information portfolio that increases in value.
- Management requires capturing and maintaining tacit information that ensures fitness for use.
- Content size, uncertain value, rapid change, unstable form, and external context are core challenges to curation.

Slide36

Observations:

- The tacit information needed to understand formats is lost over time. Format migration plans are needed to mitigate risk.
- Information objects are rarely self-documenting; ensuring fitness for use requires metadata, provenance, “documentation”, rights, and authenticity information.
- To select content for long-term access, we need to develop theoretically grounded and empirically tested models of information valuation and portfolios.
- Cost models for digital stewardship exist, but they are most accurate for collections of small, static, digital objects in stable formats. Generally, a few things are clear:
  - Raw storage is rarely the limiting cost factor
  - Management of objects is cheapest and most effective if tacit information is captured early in the lifecycle

Slide37

National priorities for… Technical Infrastructure

Slide38

2014 Technical Infrastructure Priorities

- Interoperability and Portability in Storage Architectures
- Integration of Digital Forensics Tools
- Ensuring Content Integrity

Slide39

Interoperability and Portability in Storage Architectures

As stewardship organizations manage increasingly large and complex data sets, interoperability at various levels within the hardware and software stacks that make digital preservation possible becomes increasingly important.

Interoperability of storage devices, hardware, data tape, and file system software would help alleviate bottlenecks in the interrelationship between distinct functions in workflows.

There is a need to establish and promote technical means by which lower levels of the technology stack can integrate directly, without requiring extensive computation and processing at higher levels.

Slide40

Integration of Digital Forensics Tools

Digital forensics tools are essential for working across the range of heterogeneous kinds of digital materials coming under stewardship.

Projects like BitCurator are pulling together the suite of tools to do this work and developing processes and workflows.

We are now at the point of implementation; it’s time for organizations to start implementing and sharing information about their work.

The result of this work will be large sets of heterogeneous digital files, which will then push for the development of tools to work with these kinds of data at scale.

Slide41

Ensuring Content Integrity

Digital preservation is possible through a chain of migration of current hardware and software systems to yet-to-be-established future infrastructures.

Maintaining file fixity is a minimum requirement.

Beyond file fixity, there is a need to ensure that the semantics of the data and the quality of representation remain unchanged when the object is represented in different forms.

Identifying the significant semantic properties of the digital object, and algorithms to create semantic fingerprints, can ensure that meaning is preserved over time.
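
The difference between file fixity and a semantic fingerprint can be sketched for a very simple case. The example below is hypothetical: it treats the "significant properties" of a small table as its rows of field names and string values, and hashes a canonical form of those rows, so that a CSV copy and a JSON copy of the same table agree semantically even though their bytes differ. Real semantic fingerprints for richer formats are an open research area, as the slide notes.

```python
import csv
import hashlib
import io
import json


def file_fixity(data: bytes) -> str:
    """Bit-level fixity: digest of the exact bytes."""
    return hashlib.sha256(data).hexdigest()


def semantic_fingerprint(rows: list) -> str:
    """Digest of a canonical form of the table (assumed significant
    properties: row order, field names, and string values)."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


csv_bytes = b"id,name\n1,Ada\n2,Grace\n"
json_bytes = b'[{"name": "Ada", "id": "1"}, {"name": "Grace", "id": "2"}]'

rows_from_csv = list(csv.DictReader(io.StringIO(csv_bytes.decode("utf-8"))))
rows_from_json = json.loads(json_bytes)

print(file_fixity(csv_bytes) == file_fixity(json_bytes))
print(semantic_fingerprint(rows_from_csv) == semantic_fingerprint(rows_from_json))
```

The first comparison fails and the second succeeds, which is the behaviour one wants when checking that a format migration preserved meaning rather than bytes.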

Slide42

Observations:

- Interoperability and portability across local and cloud storage architectures remain a significant issue; beware economic and technical lock-in.
- Curation of objects acquired later in the information lifecycle often requires digital forensics; invest in tools and expertise.
- Ensuring integrity of content over time requires assessing fixity at both a file and a semantic level.

Slide43

National priorities for… Organizational Development

Slide44

State of the curation practice: Trusted Digital Repositories

An organization with a mission to provide reliable, long-term access to managed digital resources to its designated community, coupled with sufficient evidence of practices to ensure the success of this mission.

Formalized in:
- OAIS Reference Model (standardized in ISO 14721:2012)
- Trustworthy Repositories Audit & Certification (TRAC) (standardized in ISO 16363:2012)

Slide45

National Priorities for Organizational Roles, Policies, and Practices

Identifies need to increase cross‐organizational cooperation to increase the impact and leverage investments made by individual institutions.


Slide46

Auditing Distributed Digital Preservation Networks

Potential Nexuses for Preservation Failure

Technical
- Media failure: storage conditions, media characteristics
- Format obsolescence
- Preservation infrastructure software failure
- Storage infrastructure software failure
- Storage infrastructure hardware failure

External Threats to Institutions
- Third party attacks
- Institutional funding
- Change in legal regimes

Quis custodiet ipsos custodes?
- Unintentional curatorial modification
- Loss of institutional knowledge & skills
- Intentional curatorial de-accessioning
- Change in institutional mission

Source: Reich & Rosenthal 2005

Slide47

Priorities for Organizational Collaboration

- Provision networked preservation services: a network of preservation service providers with specialized services, rather than every organization performing all aspects of digital preservation. A number of core risks are institutional.
- Collaborate on shepherding and promotion of standards: digital preservation community representation on the relevant standards bodies, rather than each organization needing to participate in every body.
- Share digital preservation training and staffing resources.

Slide48

Observations

- Trustworthy repository standards provide good abstract models of a single institution's curatorial responsibilities, and an inventory of accepted practices.
- Many threats to content require multi-institutional stewardship.
- Certification of trustworthiness and evaluation of the impact of accepted practices are still in early stages.
- Both intra- and inter-institutional collaboration is needed to provision preservation services, set standards, and establish and evaluate trustworthiness.

Slide49

What’s next?

Slide50

A National Stewardship Agenda for 2015 and Beyond

- Drafts and update process starts this winter
- Community review process late spring
- An update will be presented in July at Digital Preservation 2014

Slide51

Moving Digital Stewardship Forward

NDSA has a commitment to:
- Facilitating broad collaboration
- Promoting dissemination and engagement
- Regular updates and revisions of the National Agenda and core NDSA surveys

Slide52

Want more information?

Contact NDSA for…
- Briefings, webinars, and consultations on the Agenda or other NDSA work
- Assistance in gathering comments on national policies and programs
- Assistance in recruiting experts for review and discussion panels; grant review
- Referrals to content stewards in specific areas

Slide53

Observation: Principles

- The core of digital stewardship is taking broad responsibility for preservation and curation.
- The goal of preservation is meaningful long-term access.
- The principal activities of curation are selection and management for use.

Slide54

Observations: Planning

- Blind replication is rarely a rational long-run strategy, even with lots of copies.
- Without verification and repair strategies, long-term risk often remains high.
- There are multiple methods for mitigating threats to access; use these to guide diversification.
- Threat / lifecycle modeling is needed in order to make a rational choice.
- Developing better practices will require going beyond case studies to formal modeling, computer simulation, statistical analysis, and experiments.

Slide55

Observations: Curation

- The tacit information needed to understand formats is lost over time. Format migration plans are needed to mitigate risk.
- Information objects are rarely self-documenting; ensuring fitness for use requires metadata, provenance, “documentation”, rights, and authenticity information.
- To select content for long-term access, we need to develop theoretically grounded and empirically tested models of information valuation and portfolios.
- Cost models for digital stewardship exist, but they are most accurate for collections of small, static, digital objects in stable formats. Generally, a few things are clear:
  - Raw storage is rarely the limiting cost factor
  - Management of objects is cheapest and most effective if tacit information is captured early in the lifecycle

Slide57

Observations: Infrastructure

- Interoperability and portability across local and cloud storage architectures remain a significant issue; beware economic and technical lock-in.
- Curation of objects acquired later in the information lifecycle often requires digital forensics; invest in tools and expertise.
- Ensuring integrity of content over time requires assessing fixity at both a file and a semantic level.

Slide58

Observations: Organizations

- Trustworthy repository standards provide good abstract models of a single institution's curatorial responsibilities, and an inventory of accepted practices.
- Many threats to content require multi-institutional stewardship.
- Certification of trustworthiness and evaluation of the impact of accepted practices are still in early stages.
- Both intra- and inter-institutional collaboration is needed to provision preservation services, set standards, and establish and evaluate trustworthiness.

Slide59

Key Terms

- Audit: an independent evaluation of records and activities to assess a system of controls.
- Authenticity: information used to verify the truthfulness of assertions about content or its provenance.
- Curation: selection of content for retention, and management for fit use.
- Content stewardship: broad responsibility for curation and preservation.
- File fixity: information used to verify that a digital object has not been altered or corrupted.
- Provenance: the chronology of the ownership, custody, operations on, and/or location of an information object.
- Preservation: ensuring meaningful long-term access.
- Trusted Digital Repository: an organization with a mission to provide reliable, long-term access to managed digital resources to its designated community, coupled with sufficient evidence of practices to ensure the success of this mission.

Slide60

Bibliography

Bailey, Charles (2011). Digital Curation and Preservation Bibliography. <digital-scholarship.org/dcpb/>

CCSDS (2012). Reference model for an open archival information system (OAIS). <public.ccsds.org/publications/archive/650x0m2.pdf>

Digital Curation Center (2010-4). How to Guides: <dcc.ac.uk/resources/how-guides>; Curation Reference Manual: <dcc.ac.uk/resources/curation-reference-manual>

Giaretta, David (2011). Advanced Digital Preservation. <amazon.com/Advanced-Digital-Preservation-David-Giaretta>

ISO (2012). ISO 16363:2012: Audit and certification of trustworthy digital repositories. <iso.org/iso/catalogue_detail.htm?csnumber=56510>

Johnson, L., Adams Becker, S., Estrada, V., Freeman, A. (2014). NMC Horizon Report: 2014 Higher Education Edition. Austin, Texas: The New Media Consortium.

NDSA (2013). National Agenda for Digital Stewardship. <digitalpreservation.gov/ndsa/nationalagenda/>

Rosenthal, David SH, Thomas S. Robertson, Tom Lipkis, Vicky Reich, and Seth Morabito (2005). "Requirements for digital preservation systems: A bottom-up approach". D-Lib Magazine 11(11). <dlib.org/dlib/november05/rosenthal/11rosenthal.html>

Slide61

More Information

digitalpreservation.gov/ndsa/nationalagenda

ndsa@loc.gov

Slide62

Questions?

E-mail: escience@mit.edu

Web: informatics.mit.edu