Slide1
Search to Discovery: Finding Global Scholarly Resources with Primo
Pascal Calarco & Alison Hitchens, Library
December 6, 2011
Slide2
Agenda
The state of search in libraries (Pascal)
Expanding Primo beyond the local catalogue (Alison)
Questions
2011
Slide3
Library Information Systems: Milestones
[Timeline figure]
Discovery
Metasearch
Citation Linking
ILS 3rd gen (Client-server; 1990s)
ILS 2nd gen (Mainframe; 1980s)
OCLC (library network; 1972)
Early systems
MARC
Timeline axis: 1960, 1980, 1990, 2000, 2010
Slide4
In the beginning, there was the card catalog (1901+)
Indexes:
Subject
Author
Title
Interfiled cards, call number access
Slide5
Library of Congress National Union Catalog (pre-1956)
Slide6
Henriette Avram, Developer of MARC
Programmer/analyst at Library of Congress
Developed system for printing card catalog information (MARC)
ISO certification 1973
Slide7
Later, there was the Online Public Access Catalog (OPAC)
Machine Readable Cataloging (MARC)
Inventory of the print/physical holdings of a library
Better than the card catalog; keyword searching & Boolean functionality
Non-intuitive; required training or intermediation (information professional)
Generally limited to a single library
Slide8
Library networks & resource sharing
Slide9
Print to Electronic
Slide10
Now: Electronic Almost Ubiquitous
85%+ of journal literature digital
Hundreds of specialized scholarly databases
Mass print book digitization efforts
Electronic books going mainstream
Aggregated meta-indexes: 750 million metadata records for journal/newspaper articles
Slide11
Goal: improve user experience
Users want to FIND not search
Source required information to user regardless of format or location
Leverage our knowledge of academic community @ uWaterloo
Integrate into key services: LMS, CMS, other library services
Slide12
Database Content Silos
Content Silos
System Silos
Catalog
ILL
Meta-search
eReserve
Website
ScienceDirect
Web of Science
ETDs
EEBO
JSTOR
Slide13
Metasearch: an interim step
aka Federated Search; emerged 2003
Distributed search from one interface via web services, SOAP/XML gateways
Idiosyncratic and slow; vendors implemented variously
Relevancy of merged results problematic
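A minimal sketch of the distributed search described above, assuming two hypothetical targets that each return hits with their own relevance score. The target functions, field names and score scales are invented for illustration; a real metasearch tool would call each vendor's gateway over the network. The sketch only aims to show why ranking merged results is problematic when every source scores on its own scale.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote targets (not real vendor APIs).
def search_catalog(query):
    # This target reports scores between 0 and 1.
    return [
        {"title": "Discovery tools in libraries", "score": 0.91, "source": "catalog"},
        {"title": "Library systems handbook", "score": 0.40, "source": "catalog"},
    ]

def search_article_db(query):
    # This target reports scores on a 0-1000 scale.
    return [
        {"title": "Federated search evaluated", "score": 350.0, "source": "articles"},
        {"title": "Metasearch usability", "score": 120.0, "source": "articles"},
    ]

def metasearch(query, targets):
    """Send the query to every target in parallel and merge by raw score."""
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        result_lists = list(pool.map(lambda target: target(query), targets))
    merged = [hit for hits in result_lists for hit in hits]
    # Naive merge: raw scores from different targets are not comparable.
    return sorted(merged, key=lambda hit: hit["score"], reverse=True)

results = metasearch("discovery", [search_catalog, search_article_db])
for hit in results:
    print(hit["source"], hit["score"], hit["title"])
```

In this toy run every article hit outranks every catalog hit purely because of the score scale, not because it is more relevant. The overall response is also only as fast as the slowest target.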
Slide14
Problems with catalog searching & evolution to discovery
UCLA & Berkeley: information retrieval & user behavior (1986-1996)
Google Books: “digitize the world’s knowledge” (2002)
Karen Schneider, Andrew Pace, Roy Tennant: “The OPAC ‘Sucks’” (2002)
Next generation catalogs -> Discovery (2008+)
Slide15
Catalogs: Information Science Research
Christine L. Borgman (1986) “Why are online catalogs hard to use? Lessons learned from information retrieval studies” Journal of the American Society for Information Science
Ray R. Larson (1991) “The decline of subject searching: Long-term trends and patterns of index use in an online catalog” Journal of the American Society for Information Science
Ray R. Larson (1992) “Evaluation of advanced retrieval techniques in an experimental online catalog” Journal of the American Society for Information Science
Ray R. Larson (1996) “Cheshire II: designing a next-generation online catalog” Journal of the American Society for Information Science
Christine L. Borgman (1996) “Why are online catalogs still hard to use?” Journal of the American Society for Information Science
Slide16
How Users Search: What We’ve Learned
Most people make typos at least some of the time
Most searches are 2, 3, 4 words with no Boolean operators
Most searches use keyword
Search is a hesitant, iterative, often random process of discovery
Most people start elsewhere
Few read help screens
Few use advanced search – this is true even in Google
Slide17
The Google Effect
Expectations for web search tools now:
Radically simplified UI, fast results
Aggregated content
Relevant results on first page
Natural language queries
Spelling correction/adaptation
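A minimal sketch of a "did you mean?" suggestion, assuming a small hypothetical vocabulary of indexed terms. It uses Python's standard `difflib` fuzzy matching, which is only one of many ways to do spelling correction and is not how any particular search engine or discovery product implements it.

```python
import difflib

# Hypothetical vocabulary; a real system would draw terms from its index.
INDEX_TERMS = ["catalogue", "metadata", "discovery", "library", "journal"]

def did_you_mean(query, vocabulary=INDEX_TERMS):
    """Return a corrected query, or None when no correction is suggested."""
    corrected = []
    changed = False
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # Closest vocabulary term above a similarity cutoff, if any.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.75)
        if matches:
            corrected.append(matches[0])
            changed = True
        else:
            corrected.append(word)
    return " ".join(corrected) if changed else None

print(did_you_mean("libary jurnal"))
```

The cutoff of 0.75 is an arbitrary choice for the sketch; a lower value suggests more aggressively and a higher value stays closer to what the user typed.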
Slide18
The OPAC “Sucks”
The OPAC lacks common features of most search engines
Relevance ranking vs. last in, first out
Spell checking (related - did you mean?)
Popular query operators like + and –
Refine search
Sort flexibility
Faceting
Citation indexing vs full text
Developed for print materials, limitations with electronic materials or atomized items (like articles)
Difficult for certain known-item searches
Slide19
Industry Trends
Decouple the front end (search and discovery) from the back end (inventory and cataloguing)
Service Oriented Architecture – many programs loosely coupled
Cloud services -- SaaS
The 5th generation of library business systems emerging now – hosted, cloud solutions
Slide20
Discovery Characteristics
Enhanced Search Functionality
Faceted browse
Relevance ranking
“Did you mean?” / Spell Checking auto-correction, resubmit search
Content aggregation
Integrating search for books, articles, etc.
Single, Simple Search Box
FRBR – Functional Requirements for Bibliographic Records, grouping editions
Slide21
Discovery Characteristics, cont.
Enhanced Experience
Sometimes fun and engaging
Interactive/Collaborative
User centered design
Enhanced Services
Find it / Get it for me
Book Covers / Synopsis
Full text
Availability on same page as results
Slide22
Discovery Characteristics, cont.
Enhanced Content
Article Searching
Commercial Data
Merging Special Collections
Harvesting Online Collections
Grey Literature
Free Content
Enhanced Access
Syndication - Getting into users’ tools
Course Management Systems
Browser and Desktop Tool Bars
Portals
Slide23
Discovery Components
Next Generation Catalog
Next Generation “Unified Search” Aid
Normalization & Apache SOLR/Lucene
User Interface
ILS
OPAC
MARC
Vendor Data
MetaSearch
OAI
Vendor Data
Circ Data
Full Text
Slide24
Primo Central
Content Components
Primo
RACER
TUG
Archives
OCUL
Geospatial
HathiTrust
Others
Phase I
Phase II
Future
Slide25
Evolution of Discovery
Slide26
Options for Expanding Primo
Local ingestion of resources using FTP or OAI harvesting
Searching remote resources in Primo using the Primo DeepSearch API*
Subscribing to a large centralized index, such as Primo Central
*Application Programming Interface
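A minimal sketch of the OAI harvesting option listed above, assuming the remote repository speaks OAI-PMH and returns Dublin Core (`oai_dc`) records in response to a `ListRecords` request. The sample response, identifier and title are made up for illustration, and the parsing runs on an embedded string rather than a live endpoint. A real harvester would fetch each batch over HTTP and keep requesting until no resumption token is returned.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def parse_list_records(xml_text):
    """Extract (identifier, title) pairs and the resumption token, if any."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        records.append((identifier, title))
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token.strip() or None)

# Made-up response in the shape of an OAI-PMH ListRecords reply.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example public domain title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>batch-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
print(records, token)
```

A non-empty token means more batches are waiting; the harvested records would then go through normalization before indexing.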
Slide27
Local ingestion of records
Example: HathiTrust Digital Library
Harvest the public domain records from HathiTrust Digital Library
Normalize the records
Index the records in our local Primo database
Schedule updates from HathiTrust into Primo
Slide28
Normalization: creating local sort field (Date – Oldest)
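A hypothetical sketch of what a local "oldest first" sort field could look like. The rule syntax Primo itself uses is not shown here; this Python version assumes the sort key is the earliest four-digit year found in a free-text date string, with undated records sorted last. The sample records are invented.

```python
import re

def sort_year(raw_date, missing=9999):
    """Return the earliest four-digit year in a date string, or `missing`."""
    years = [int(y) for y in re.findall(r"(?<!\d)\d{4}(?!\d)", raw_date)]
    return min(years) if years else missing

# Invented records with typical catalogue-style date strings.
records = [
    {"title": "B", "date": "c1987"},
    {"title": "A", "date": "[1901?]"},
    {"title": "C", "date": "n.d."},
    {"title": "D", "date": "1955-1960"},
]

for record in records:
    record["sort_date"] = sort_year(record["date"])

oldest_first = sorted(records, key=lambda record: record["sort_date"])
print([record["title"] for record in oldest_first])
```

Computing the key once at normalization time means the index can sort on a clean numeric field instead of parsing display dates at query time.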
Slide29
Primo Normalized XML (PNX)
Slide30
Open source & Open platform
Primo uses Lucene for its indexing
SOLR exposes Lucene as a web service and allows for faceting
APIs and web services allow flexibility and customization
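A minimal sketch of a faceted request against a standalone Solr instance, using Solr's standard select parameters (`q`, `rows`, `wt`, `facet`, `facet.field`). The host, the core name `catalog` and the field names `format` and `language` are assumptions for illustration. How Primo's own index is queried is not described here, so this is not a Primo API call.

```python
from urllib.parse import urlencode

def build_facet_query(base_url, query, facet_fields, rows=10):
    """Build a Solr select URL that asks for results plus facet counts."""
    params = [("q", query), ("rows", rows), ("wt", "json"), ("facet", "true")]
    # One facet.field parameter per field to be counted.
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/select?{urlencode(params)}"

# Hypothetical local Solr core and field names.
url = build_facet_query(
    "http://localhost:8983/solr/catalog",
    "title:discovery",
    ["format", "language"],
)
print(url)
```

The JSON response to such a request carries the matching documents together with per-value counts for each requested field, which is what a facet sidebar displays.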
Slide31
We can’t index everything!
Trying out a subscription to Primo Central, a centralized index of scholarly journal articles, newspapers, conference proceedings etc.
User sees one interface; user is searching 2 indexes
Slide32
What is Primo Central Index?
A centralized index
of free and restricted resources
primarily articles & e-books
based on metadata & full-text provided by publishers/aggregators
based on the collections selected by the library in the Primo Administration module
created & maintained by our vendor, Ex Libris
Slide33
What is Primo Central Index?
A centralized index
of records harvested using the same process as our local Primo database
created using the same PNX record structure as our local Primo database
indexed using the same indexing tools as our local Primo database
Slide34
Blending local and remote resources
Both local and remote results are represented in the facets
Blended relevance ranking
Can configure Primo to boost high-ranking local results so that, when Primo ranks our 4 million records alongside hundreds of millions of Primo Central records, local results aren’t missed by the user
Slide35
Search = local resources & Primo Central
Slide36
How does it work?
Ex Libris has created & indexed records for millions of items based on information from the publishers
Primo searches Primo Central the same way it searches the local database
Full text availability is determined in advance by our URL resolver SFX, i.e.
Delivery of the resource uses menu for
Slide37
New features: snippets give context
If your search term is found in the full-text, Primo supplies a snippet highlighting the term
Slide38
New features: expanding the search
Defaults to our library’s electronic subscriptions but users can expand the search to all of Primo Central
Slide39
New Facets & Facet Values
Slide40
Added value: bX Recommender
Slide41
Trouble-shooting remote resources
We can view the PNX records using web services but we have no control over the content or the normalization rules
Records have the same structure as our local records but are missing local fields and don’t reflect local policies
Slide42
Assessing Primo Central
Over 65 hours of one-on-one usability testing and focus groups with undergraduate students, graduate students, faculty, staff and alumni
Library staff survey
Feedback form
Statistics from Cognos
Slide43
Looking to the future
What other content should be added to Primo?
How can we improve/enhance the interface?
What is the right balance for boosting local physical resources?
How do we point users to resources that can’t be searched using Primo?
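The boosting question above can be explored with a toy model. This is a sketch under stated assumptions, not Primo's ranking algorithm: it assumes local and Primo Central hits arrive with comparable scores and that the boost is a simple multiplier on local scores. The titles, scores and the `local_boost` parameter are all hypothetical.

```python
def blend(local_hits, central_hits, local_boost=1.0):
    """Merge two scored result lists, multiplying local scores by a boost."""
    blended = [(title, score * local_boost, "local") for title, score in local_hits]
    blended += [(title, score, "central") for title, score in central_hits]
    # Highest blended score first.
    return sorted(blended, key=lambda hit: hit[1], reverse=True)

# Invented scores: one local record competing with many remote articles.
local = [("Local thesis", 0.60)]
central = [("Article 1", 0.80), ("Article 2", 0.70), ("Article 3", 0.65)]

unboosted = blend(local, central)
boosted = blend(local, central, local_boost=1.5)

print([title for title, _, _ in unboosted])
print([title for title, _, _ in boosted])
```

Without a boost the local record sits below every article; with a boost of 1.5 it moves to the top. Finding the right balance means choosing a multiplier large enough to surface good local matches without burying stronger remote ones.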
Slide44
Questions?
Pascal Calarco
Associate University Librarian, Digital & Discovery Services
pvcalarco@uwaterloo.ca
Alison Hitchens
Cataloguing & Metadata Librarian
ahitchen@uwaterloo.ca