Slide1
How I Spent My Summer
– or –
Oxford-Illinois Digital Libraries Placement Program
Summer 2015
Jennifer Westrick, MSLIS
University of Illinois, OIDLPP
Slide2
The Initial Project
Project 3: Migration Workflow for Digital Collections.
The Bodleian has been digitizing collections and making them available online for more than twenty years. While the images created can still be useful and viable, the platforms to deliver the images are often difficult to maintain. This project will analyze existing digitization and publication workflows, and propose ways of making legacy content and collections available online through new platforms.
Written by Christine Madsen, Head of Digital Programmes, BDLSS (Bodleian Digital Libraries Systems and Services)
Slide3
Project Changed to Become an Exploration of the Current Situation
Before moving ahead on designing workflows, basic information was lacking:
- How many are we talking about?
- What platforms are they built on?
- Does anyone still use them?
Digital collections constantly changing; workflow still needed
Slide4
Deliverables:
- Spreadsheet
- Detailed walk-through/workflow of steps taken (what worked and what didn’t)
Slide5
Retiring a Digital Library:
Considerations for Legacy Digital Collections
Jennifer Westrick, MSLIS
University of Illinois, OIDLPP
Slide6
digital.bodleian was introduced in July 2015
Slide7
But what about these?
Slide8
But what about these?
Slide9
But what about these?
Slide10
But what about these?
Slide11
But what about these?
Slide12
But what about these?
Slide13
But what about these?
Slide14
But what about these?
Slide15
Slide16
These legacy collections
- Are not an issue at smaller and/or less progressive institutions
- Grew organically; often part of a funded project with specified time frame
- May be currently used
- May have important sites that reference/cite the URL
- Are (?) in digital.bodleian
Goal: easy for users, and less work for IT
Slide17
The Process: Five Steps
Define: what is a legacy collection?
Identify: the goal is a comprehensive list of possible legacy collections
Collect data
- Basic information about site
- Patron usage (Google Analytics)
- Scholarly usage (Google Scholar and Webometrics)
  - how many outside URLs link to that digital collection
  - source of that link
- Technical data
Assess the collections
Next steps for the collections
Slide18
Step One: Define
Goal: what criteria must a collection meet to be considered legacy?
A legacy collection:
- its content is now part of a newer collection
- its technology is managed by Bodleian IT staff
- some judgement calls, e.g. blogs, exhibits
- don’t forget collections within collections
Slide19
Step Two: Identify
Goal: a list of legacy collections
Sources:
- spreadsheet from Christine Madsen
- spreadsheet from Michael Popham
- quick Google search
- list of sites from IT?
Slide20
Step Three: Collect Data
Goal: Numbers in spreadsheet (color-coded!)
Four areas of analysis
- Basic information about site
- Patron usage (Google Analytics)
- Scholarly usage (Google Scholar and Webometrics)
  - how many outside URLs link to that digital collection
  - source of that link
- Technical Data
Slide21
Basic information
Slide22
Patron Usage: Google Analytics
- Google Analytics currently tracking usage on 23 of the legacy sites
- website statistics include a 30-day summary of
  - sessions
  - unique users
  - pageviews
  - pages per session
  - avg. session duration
  - % of users who were new
- screenshots saved for future analysis
Slide23
Over the past 30 days, ballads.bodleian.ox.ac.uk had
2,259 sessions
1,776 users
8,601 pageviews
3.81 pages per session
03:06 avg. session duration
72% new sessions
Slide24
Sources of the 2,259 sessions:
Google: 1,460
Direct logins: 376
Referral: 171 (harkavagrant.com)
Bing: 39
Yahoo: 13
Slide25
Slide26
Slide27
Slide28
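As a check on the ballads.bodleian.ox.ac.uk figures from the preceding slides, the arithmetic can be reproduced in a few lines of Python. This is only a sketch: the numbers are copied from the slides, the variable names are illustrative, and it is not a description of how Google Analytics computes its own reports.

```python
# Figures copied from the ballads.bodleian.ox.ac.uk slides (30-day window).
sessions = 2259
pageviews = 8601

# Sessions by source, as listed on the "Sources of the 2,259 sessions" slide.
sources = {
    "Google": 1460,
    "Direct logins": 376,
    "Referral": 171,
    "Bing": 39,
    "Yahoo": 13,
}

# Pages per session is simply pageviews divided by sessions.
pages_per_session = round(pageviews / sessions, 2)

# How many of the sessions are accounted for by the listed sources?
listed_sessions = sum(sources.values())
unlisted_sessions = sessions - listed_sessions

# Share of all sessions per listed source, as a percentage with one decimal.
shares = {name: round(100 * count / sessions, 1) for name, count in sources.items()}

print(f"pages per session: {pages_per_session}")
print(f"listed sources cover {listed_sessions} of {sessions} sessions")
for name, share in shares.items():
    print(f"{name}: {share}%")
```

The pages-per-session value reproduces the 3.81 reported on the slide. The five listed sources sum to 2,059, so 200 of the 2,259 sessions came from sources not shown on the slide.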
Other means to analyse patron usage?
- Software that requires purchase
- Google keyword search:
  - hard to do accurately - search results vary widely depending on keyword used
  - Google assumptions - tailoring results
- Alexa or other web-ranking sites: can’t get them to take entire URL
  “arshama.bodleian.ox.ac.uk” searches on “ox.ac.uk”
Slide29
URL Analysis: Google Scholar and Webometrics
Assumption:
Greater number of documents that include a reference, citation or link to our URL = greater significance in the scholarly world (and less inclination to change the URL)
Slide30
Google Scholar search:
Goal: citation information – how many times was the legacy collection’s URL cited?
Go into Google Scholar, search on URL
- # mentions (supplied by Google Scholar, at the top of the page)
- # of items that cite the URL (count each item once)
- # of citations (how many total citations from the above items)
Slide31
81 total results
3 items have cited this URL
...for a total of 44 citations
Slide32
Slide33
Google Scholar can be:
- a source of detailed information about each URL that cited the collection (started compiling; too much data)
- searched many ways – e.g. root vs entire URL
- managed; many academics and institutions monitor their inclusions
Slide34
URL Analysis with Webometrics
Different from Google Scholar because it returns URL usage on websites rather than publications, and has no focus on academia.
Best thing: List of URLs
Mike Thelwall’s Statistical Cybermetrics Research Group
Webometrics Analyst 2.0, http://lexiurl.wlv.ac.uk/index.html
Slide35
Webometrics screen shot
Search this list of URLs for wikip, .edu, .ac.uk; record totals
Slide36
Slide37
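The "search this list of URLs ... record totals" step can be sketched in Python as below. This is an illustration under assumptions, not the procedure actually used: the function name `tally_link_sources` is made up, matching is a plain case-insensitive substring test, and the sample URLs are hypothetical placeholders rather than real Webometrics output.

```python
# Substrings named on the slide. Treating them as case-insensitive
# substring matches is an assumption about how the search was done.
PATTERNS = ("wikip", ".edu", ".ac.uk")


def tally_link_sources(urls, patterns=PATTERNS):
    """Count how many URLs contain each pattern (a URL may match several)."""
    totals = {pattern: 0 for pattern in patterns}
    for url in urls:
        lowered = url.lower()
        for pattern in patterns:
            if pattern in lowered:
                totals[pattern] += 1
    return totals


# Hypothetical linking URLs, for illustration only.
sample_urls = [
    "https://en.wikipedia.org/wiki/Example",
    "https://library.example.edu/guides/ballads",
    "https://www.example.ac.uk/resources",
    "https://blog.example.com/post",
]

print(tally_link_sources(sample_urls))
```

A substring test is deliberately crude: ".edu" would also match a host such as "www.education.example.com", so totals from a sketch like this would still need a quick manual look.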
Slide38
Technical information
Goal: collect specifics about each site to assess the difficulty of future actions.
Problem: Information is not easily accessible.
Also collect information on metadata, if it exists
Slide39
Compilation
Approximately 150 rows of potential legacy sites, and 45 columns of criteria
Slide40
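One way to picture a row of the compiled spreadsheet is as a small record type. The sketch below is hypothetical: the class name, field names, and the choice of a dozen fields are assumptions drawn from the measures named on earlier slides, and the real spreadsheet's 45 columns are not listed in this presentation.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class LegacySiteRow:
    """One spreadsheet row; field names are illustrative, not the real columns."""
    # Basic information
    url: str
    # Patron usage (Google Analytics, 30-day summary)
    sessions: Optional[int] = None
    users: Optional[int] = None
    pageviews: Optional[int] = None
    # Scholarly usage (Google Scholar)
    scholar_mentions: Optional[int] = None
    scholar_citing_items: Optional[int] = None
    scholar_citations: Optional[int] = None
    # Webometrics link counts by substring
    links_wikip: Optional[int] = None
    links_edu: Optional[int] = None
    links_ac_uk: Optional[int] = None
    # Assessment
    status: str = ""
    in_digital_bodleian: Optional[bool] = None


def rows_to_csv(rows):
    """Serialise rows to CSV text; missing values are written as empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(LegacySiteRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


# Usage figures taken from the ballads.bodleian.ox.ac.uk slide.
example = LegacySiteRow(
    url="ballads.bodleian.ox.ac.uk", sessions=2259, users=1776, pageviews=8601
)
print(rows_to_csv([example]))
```

Leaving unknown values as `None` keeps "not yet collected" distinct from a measured zero, which matters when many sites have no analytics tracking at all.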
Step Four: Assess
Helpful to mark the collection if it is
- Active
- Dead link
- Duplicate or near-duplicate
- Not ours:
  - An exhibit
  - A different department or different university entirely
Is it in digital.bodleian?
Slide41
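The "Active" versus "Dead link" marking from Step Four could be partly automated. The sketch below is an assumption about how that might be done, not the method used in the project: the function names and the rule that any 2xx or 3xx response counts as "Active" are illustrative. It uses only Python's standard `urllib`. Duplicates and "not ours" sites would still need a human decision.

```python
import urllib.error
import urllib.request


def classify_status(status_code):
    """Map an HTTP status code (or None for no response) to a spreadsheet label."""
    if status_code is None:
        return "Dead link"
    if 200 <= status_code < 400:
        return "Active"
    return "Dead link"


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrives."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


def assess(url):
    """Combine the two steps: fetch the status, then label it."""
    return classify_status(fetch_status(url))
```

Some servers reject HEAD requests, so a "Dead link" result from a sketch like this is a prompt to check the site by hand rather than a final verdict.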
Step Five: Potential Next Steps
Goals: Maintain user access and lighten the workload on IT
- delete the website - redirect: automatically or with a click
  - pros: seamless to user
  - con: site still needs to be maintained
- move all legacy collections to a single server
- make the pages static and put all on a single server
- Archive-It - or other outside web archiving service
Slide42
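The two redirect variants in Step Five, "automatically or with a click", differ only in what the retired site sends back. The sketch below illustrates that difference under stated assumptions: the host name, target URL, and function names are hypothetical placeholders, and no real Bodleian addresses or server configuration are implied.

```python
import html

# Hypothetical mapping from a retired host to its new home. Both values
# are placeholders, not real Bodleian addresses.
REDIRECTS = {
    "legacy-collection.example.org":
        "https://new-platform.example.org/collections/legacy-collection/",
}


def automatic_redirect(host):
    """Return (status, headers) for an automatic redirect, or None if unmapped."""
    target = REDIRECTS.get(host)
    if target is None:
        return None
    # 301 Moved Permanently tells browsers and crawlers the move is final.
    return 301, {"Location": target}


def click_through_page(host):
    """Return an HTML notice with a link for the user to click, or None if unmapped."""
    target = REDIRECTS.get(host)
    if target is None:
        return None
    safe_target = html.escape(target, quote=True)
    return (
        "<p>This collection has moved. "
        f'<a href="{safe_target}">Continue to the new site</a>.</p>'
    )


print(automatic_redirect("legacy-collection.example.org"))
print(click_through_page("legacy-collection.example.org"))
```

Either variant still needs something answering at the old address, which is the "site still needs to be maintained" drawback noted on the slide.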
Potential Next Steps – big picture
- Create guidelines for future digital collections
  - specify supported platforms
  - metadata specifics
  - regularly scheduled maintenance and/or checks
- Updated Digital Collections Management Policy
- Much depends on staff/time/budget
Slide43
Retiring a Digital Library:
Considerations for Legacy Digital Collections
Deliverables:
- a detailed walk-through of steps taken (what worked and what didn’t)
- a spreadsheet with lots of data (color-coded!)
Jennifer Westrick, MSLIS
University of Illinois, OIDLPP
Summer 2015