

Presentation Transcript

Slide1

Archive What I See Now

Mat Kelly, Michael L. Nelson, Michele C. Weigle
Old Dominion University
{mkelly,mln,mweigle}@cs.odu.edu
Web Science and Digital Libraries Research Group
ws-dl.blogspot.com

Slide2

What’s the Problem?

Web archives capture a lot but not everything
Individuals’ interests may not be captured
Timely capture is important
Capture capability must be enabled for all

2013 Archive-It Partner Meeting, Salt Lake City, Utah, November 12, 2013

Slide3

Use Case: Capturing Breaking Stories

Calls for seed URIs are reactive
Not quick enough for rapidly evolving events

Slide4

Use Case: Capturing Breaking Stories

Intermediate mementos missed
The story is incomplete

Slide5

Use Case: Capturing Breaking Stories

Slide6

The Amateur Archivist’s Approach

Users take ad hoc approaches
Screenshots of pages
Why? Tools are hard.
Build more accessible tools
Appeal to standards (e.g., WARC, ISO 28500:2009)
Make tools interoperable

Slide7

The Institutional Dilemma

Safety of Archives Requires $
No $, No Institution
Users’ Hard Drives Fail
No Access to Save-As files and Screenshots
A hybrid approach is needed to leverage institutional safety, formats, and tech while still allowing direct user deposits

Slide8

Video Here

Show use case where other tools cannot capture, e.g., behind authentication
Juxtapose with Archive.is, WebCite, Save Webpage As

Slide9

Scratch Slide

Slide10

So we built it!

WARCreate – Google Chrome extension
Create web archives from browser
Capture personalized content
Preserve on a whim

Mat Kelly and Michele C. Weigle, "WARCreate - Create Wayback-Consumable WARC Files from Any Webpage," In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2012). Washington, DC, June 2012, pp. 437-438.

Mat Kelly, Michele C. Weigle, Michael Nelson. "WARCreate - Create Wayback-Consumable WARC Files from Any Webpage," Digital Preservation 2012, Tools Demo Session: Web Archiving; 2012 Jul 25; Washington, DC.

Slide11

WARCreate – How it Works
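
As a rough illustration of what "creating a WARC" involves, the sketch below wraps one captured HTTP response in a single WARC response record (named headers, a blank line, then the content block). It is written in Python for brevity and is not WARCreate's code, which runs as a Chrome extension; the function name, the example URI, and the example response bytes are all assumptions made for illustration.

```python
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri: str, http_response: bytes) -> bytes:
    """Wrap raw HTTP response bytes (status line, headers, body) in one WARC record."""
    # WARC-Date records the capture time in UTC.
    capture_time = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    warc_headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {capture_time}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_response)}",
    ]
    header_block = "\r\n".join(warc_headers).encode("utf-8") + b"\r\n\r\n"
    # A record is terminated by two CRLFs after the content block.
    return header_block + http_response + b"\r\n\r\n"


# Hypothetical captured response; a browser-based tool would take these
# bytes from the page the user is currently viewing.
example_response = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hello</html>"
record = build_warc_response_record("http://example.com/", example_response)
```

Because the record is built from what the browser already received, content behind a login can be captured without handing credentials to a separate crawler.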

Slide12

Preserving the Original Context

Facebook-Supplied Data Dump
Archive created from WARCreate in Wayback

Liberated Data Doesn’t Give The Whole Picture

Slide13

Preserving the Original Context

Using Scraping Tools (e.g., wget)
Archive created from WARCreate in Wayback

The Target Controls What is Allowed

Slide14

Preserving the Original Context

A Crawler Has No Context
Archive created from WARCreate in Wayback

No Credentials, No Entry, No Archiving

Slide15

Preserving the Original Context

IA/Heritrix obey robots.txt
Archive created from WARCreate in Wayback
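
The robots behavior named on this slide can be sketched with Python's standard `urllib.robotparser`. This is only an illustration of how a robots-obeying crawler decides what to skip, not Heritrix's implementation; the robots.txt text, the `ExampleCrawler` user agent, and the URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt published by a target site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt would fetch this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


public_ok = crawler_may_fetch(ROBOTS_TXT, "ExampleCrawler", "http://example.com/public/page")
private_ok = crawler_may_fetch(ROBOTS_TXT, "ExampleCrawler", "http://example.com/private/page")
```

Here `public_ok` is true and `private_ok` is false: the obedient crawler never archives the disallowed page, whereas a capture made from the user's own browser session does not consult robots.txt at all.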

No Means No, if They Say and You Obey

Slide16

So we built it!

WARCreate – Google Chrome extension

PROBLEM: Users don’t know what to do with WARCs

Slide17

So, again, we built it!

Web Archiving Integration Layer (WAIL)
Heritrix, Wayback, etc. packaged for PC
GUI front-end allows “One-Click Preservation”
Provides means to replay WARCs

Mat Kelly, Michele C. Weigle, Michael Nelson. "Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving," Personal Digital Archiving 2013, Poster Session; 2013 Feb 21; College Park, MD.

Mat Kelly, Michael Nelson and Michele C. Weigle. "WARCreate and WAIL: WARC, Wayback and Heritrix Made Easy," Digital Preservation 2013, Workshops and Sessions: Web Archiving; 2013 Jul 24; Alexandria, VA.

Slide18

So, again, we built it!

Web Archiving Integration Layer (WAIL)

PROBLEM: Users want to preserve but store at institutions for safe keeping

PROBLEM: Even with replay, not everyone wants to use Chrome

Slide19

The Plan

Port [logo not preserved]
Add functionality in [logos not preserved] to upload WARCs to [logo not preserved]
Implement Sequential Archiving

Slide20

Porting WARCreate to Firefox

Disjoint extension/add-on APIs
Little logic can be re-used
Problems with HTTP header capture in Chrome are trivial in Firefox
Chrome = highly asynchronous fetching
JavaScript code to save to local file system from Chrome for WARCreate is re-usable

Slide21

The Plan

Port [logo not preserved]
In βeta now!

Slide22

The Plan

Slide23

Uploading WARCs: An Open Question

Working with Archive-It to determine feasibility of user-provided WARCs
Consideration of data integrity
Should data be merged with A-IT crawled WARCs?
How do we account for your www.facebook.com vs. my www.facebook.com?
Privacy?
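
One way to make the data-integrity and "your www.facebook.com vs. my www.facebook.com" questions concrete is to identify a deposited capture by more than its URI. The sketch below is only an assumption about how that could look, not a description of Archive-It's handling, which the slide leaves open: it computes a SHA-1 payload digest in the `sha1:<base32>` form often seen in `WARC-Payload-Digest` headers and pairs it with a hypothetical depositor label.

```python
import base64
import hashlib


def payload_digest(payload: bytes) -> str:
    """Digest of the captured payload, in the 'sha1:<base32>' style."""
    sha1 = hashlib.sha1(payload).digest()
    return "sha1:" + base64.b32encode(sha1).decode("ascii")


def capture_key(uri: str, depositor: str, payload: bytes) -> tuple:
    """Hypothetical identity for a user-deposited capture: URI alone is not enough."""
    return (uri, depositor, payload_digest(payload))


# Two users capture the same URI while logged in and see different content.
mine = capture_key("http://www.facebook.com/", "user-a", b"<html>user A's news feed</html>")
yours = capture_key("http://www.facebook.com/", "user-b", b"<html>user B's news feed</html>")

same_uri = mine[0] == yours[0]
same_content = mine[2] == yours[2]
```

The two captures share a URI but not a digest, so merging them by URI alone would silently conflate two different personalized pages. Recomputing the digest on receipt also gives a basic integrity check for uploaded files.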

Slide24

The Plan

Slide25

Sequential Archiving?

Like focused crawl but URIs defined on per-site basis to be comprehensive
Similar to Archive Facebook but generalized
Implement into WARCreate
Utilize per-site specification to keep tools from breaking ★

Category            Facebook    Google+    Twitter
personal stream     wall        posts      my tweets
global stream       news feed   streams    followees’ tweets
multimedia-photos   photos      photos     N/A
multimedia-videos   videos      videos     N/A
photo collection    albums      N/A        N/A
posts               notes       N/A        N/A
friends             friends     circles    following
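
The per-site specification idea on this slide can be sketched as a small lookup structure. This is an illustrative guess at the shape of such a spec, not the format of the authors' prototype: the site keys (`facebook`, `googleplus`, `twitter`) are inferred from the section names on the slide (wall, circles, tweets), and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Site keys are an assumption inferred from the section names on the slide.
SITES = ("facebook", "googleplus", "twitter")

# One row per generic category; None marks a category the site lacks (N/A).
ROWS: List[Tuple[str, Optional[str], Optional[str], Optional[str]]] = [
    ("personal stream", "wall", "posts", "my tweets"),
    ("global stream", "news feed", "streams", "followees' tweets"),
    ("multimedia-photos", "photos", "photos", None),
    ("multimedia-videos", "videos", "videos", None),
    ("photo collection", "albums", None, None),
    ("posts", "notes", None, None),
    ("friends", "friends", "circles", "following"),
]


def build_spec(rows) -> Dict[str, Dict[str, str]]:
    """Turn the category rows into a per-site mapping of category -> section."""
    spec: Dict[str, Dict[str, str]] = {site: {} for site in SITES}
    for category, *sections in rows:
        for site, section in zip(SITES, sections):
            if section is not None:
                spec[site][category] = section
    return spec


SPEC = build_spec(ROWS)


def sections_to_archive(site: str) -> List[Tuple[str, str]]:
    """Sections to visit in order; an empty list means fall back to scraping."""
    return list(SPEC.get(site, {}).items())
```

A tool would walk `sections_to_archive("twitter")` in sequence and archive each section. For an unrecognized site the list is empty, so the tool would rely on discovery and scraping instead.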

The Digital Libraries Approach - versus - Discovery & Scraping: The Information Retrieval Approach

Slide26

Sequential Archiving = Lots of Maintenance

Only (and optionally) applied on recognized sites, with scraping as fallback for establishing hierarchy
Spec lives online; tools refer to it, so they are always up to date
Standardized spec* prototype is live online

* M. Kelly, An Extensible Framework for Creating Personal Archives of Web Resources Requiring Authentication, Aug 2012

Slide27

Summary

Firefox WARCreate in Beta
Chrome WARCreate Users Can Currently Archive What They See Now
Sequential Archiving Implemented in Chrome WARCreate, needs porting
Next Big Hurdle: Working with Archive-It on WARC upload logistics

Slide28

Archive What I See Now

Download Our Archiving Tools!
Share Your Use Cases for Capturing the Unpreserved and the Unpreservable
Help Us Improve Our Tools, Give Feedback!
http://bit.ly/wc-wail

In Beta
Available Soon!

Web Archiving Integration Layer (WAIL)
One-Click Preservation
Heritrix, Wayback and Others On Your PC!

WARCreate for Chrome
Create WARC files from any web page from your browser