Slide1
Handle System Overview
February 2011
Larry Lannom
Corporation for National Research Initiatives
http://www.cnri.reston.va.us/
http://www.handle.net/
Slide2
Why Worry About Identifiers?
Managing increasing amounts of primary and secondary data on the Net over long periods of time
Managing increasingly complex data relationships on the Net over long periods of time
When that data, its location(s), responsible parties, and the underlying systems may change dramatically over time
Science builds on past work and increasingly relies on collaboration within virtual distributed communities
All of this absolutely requires reliable, long-term persistent references to bind together the distributed data, processes, and parties involved
Slide3
Role of Identifier Resolution Systems in Information Management on Networks
[Diagram labels: Client; Resource Discovery (Search Engines, Metadata Databases, Catalogues, Guides, etc.); Repositories / Collections, shown holding sample XML records; Identifier Resolution System.]
Slide4
Handle System
Provides basic identifier resolution system for Internet
Go from object name to current state data
Name can persist over changes in location and other attributes
Logically a single system, but physically and organizationally distributed and highly scalable
Enables association of one or more typed values, e.g., IP address, public key, URL, with each id
Optimized for speed and reliability
Secure resolution with its own PKI as an option
Open, well-defined protocol and data model
Provides infrastructure for application domains, e.g., digital libraries & publishing, e-research, id mgmt.
Slide5
Handle System Usage
Library of Congress
DTIC (Defense Technical Information Center)
IDF (International DOI Foundation)
CrossRef (scholarly journal consortium, representing >2K publishers & societies)
DataCite (consortium of 9 members from 12 countries started by TIB)
EIDR (Entertainment Identifier Registry)
mEDRA (Multilingual European DOI Registration Agency)
R.R. Bowker (bibliographic data - ISBN)
Office of Publications of the European Community (OPOCE)
Wanfang Data
OECD
National Agricultural Library/USDA
DSpace (MIT + HP)
ADL (DoD Advanced Distributed Learning initiative)
Australian National Data Service (ANDS)
EPIC (European Persistent Identifier Consortium)
GENI (Global Environment for Network Innovations)
Slide6
Handle System Usage (Jan 2011)
Assigned Prefixes
DOI – 211, 323
Other – 1,569
Handles
DOI – 49.8 M
Other – Additional millions (total per prefix known only to prefix manager)
Handle Services
Global: Six service sites (three CNRI, one CrossRef, one CNNIC, one GWDG)
Locals: >1000 registered LHS's
Traffic
Global: 100 million per month
CNRI-run proxy servers: tens of millions per month
Slide7
HANDLE.NET Version 7.0
Major upgrade; released December 2010
Berkeley DB is default storage system
Important new features:
A single template handle in the form of a base formula will allow any number of extensions to that base to be resolved according to a pattern, without registering each as a handle.
Handle values can be signed with "offline" private keys.
A new handle value type, 10320/loc, specifies a list of URL locations (including information that differentiates the locations) to which a handle can resolve.
A DNS interface means handle servers can be used to host DNS names.
Slide8
Handle System Software
Server (v7.0)
Java 1.4.2 and higher
Client Library
Java & C versions available
Proxy servlet
Java servlet, typically runs under Apache Tomcat
Build your own or use hdl.handle.net
Misc. CNRI software (admin tools, browser plug-ins, etc.)
Misc. community software (alternate clients, database modules, etc.)
All available at www.handle.net
Alternate complete implementations
Two known to CNRI, neither public
Both developed from spec, but they talked to us
Slide9
Handle String
<prefix> / <suffix>
Examples
10.1525/bio.2009.59.5.9
4263537/5030
Character Set: Unicode 2.0
Encoding: UTF-8
Prefixes
Currently allocating only numeric
Any text possible
Slide10
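The <prefix> / <suffix> form of a handle string can be illustrated with a few lines of Python. This is a minimal sketch, not Handle System client code: the function names are made up for illustration, and splitting at the first "/" assumes that the prefix itself never contains a slash.

```python
def split_handle(handle: str) -> tuple[str, str]:
    """Split a handle string into (prefix, suffix) at the first '/'.

    Assumption: the prefix never contains '/', so anything after the
    first slash (including further slashes) belongs to the suffix.
    """
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a <prefix>/<suffix> handle: {handle!r}")
    return prefix, suffix


def encode_handle(handle: str) -> bytes:
    """Encode a handle string as UTF-8, the encoding named on the slide."""
    return handle.encode("utf-8")


if __name__ == "__main__":
    for example in ("10.1525/bio.2009.59.5.9", "4263537/5030"):
        print(example, "->", split_handle(example))
```

Both examples from the slide split cleanly: the first has a dotted numeric prefix, the second a plain numeric one.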
Handles Resolve to Typed Data
Handle       Data Type   Handle Data
10.123/456   URL         http://acme.com/...
             URL         http://a-books.com/...
             HS_ADMIN    user123
             XYZ         1001110011110
Slide11
Handles Resolve to Typed Data
Handle: 10.1525/bio.2009.59.5.9
Data Type: URL
Handle Data: http://caliber.ucpress.net/doi/abs/10.1525/bio.2009.59.5.9
Data Type: HS_ADMIN
Handle Data: handle=0.na/10.1525; index=200; [delete hdl,add val,read val,modify val,del admin,add admin,list]
Data Type: 10320/loc
Handle Data:
<locations chooseby="locatt, country, weighted">
<location id="1" cr_type="MR-LIST" href="http://mr.crossref.org/iPage?doi=10.1525%2Fbio.2009.59.5.9" weight="1" />
<location id="2" cr_src="unca" label="SECONDARY_BIOONE" cr_type="MR-LIST" href="http://www.bioone.org/doi/full/10.1525/bio.2009.59.5.9" weight="0" />
</locations>
Slide12
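One way to picture a handle record is as a list of typed values that a client filters by type. The sketch below models the 10.123/456 example in memory; the class name, the helper function, and the index numbers are assumptions made for illustration, and the elided URLs are kept exactly as they appear on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandleValue:
    index: int   # hypothetical index numbers, not taken from the slides
    type: str    # data type, e.g. URL or HS_ADMIN
    data: str    # handle data for that type


# In-memory stand-in for one handle record (illustrative only).
RECORDS = {
    "10.123/456": [
        HandleValue(1, "URL", "http://acme.com/..."),
        HandleValue(2, "URL", "http://a-books.com/..."),
        HandleValue(100, "HS_ADMIN", "user123"),
        HandleValue(3, "XYZ", "1001110011110"),
    ]
}


def values_of_type(handle: str, value_type: str) -> list[HandleValue]:
    """Return every value of the requested type stored under a handle."""
    return [v for v in RECORDS.get(handle, []) if v.type == value_type]


if __name__ == "__main__":
    for value in values_of_type("10.123/456", "URL"):
        print(value.index, value.data)
```

A client that only cares about locations asks for the URL values and ignores the rest, which is why new types can be added to a record without disturbing existing clients.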
Handle Resolution
The Handle System is a collection of handle services, each of which consists of one or more replicated sites, each of which may have one or more servers.
[Diagram: the GHR and several LHSs; one service expanded into Site 1, Site 2, Site 3, … Site n; one site expanded into servers #1, #2, … #n. The example handle 123.456/abc holds two URL values (indexes 4 and 8): http://www.acme.com/ and http://www.ideal.com/.]
Slide13
[Diagram: Handle Clients and the Global Handle Registry; the client gets a request to resolve hdl:123/456.]
1. Client sends request to Global to resolve 0.NA/123 (prefix handle for 123/456)
Slide14
[Diagram: Handle Clients and the Global Handle Registry; the client gets a request to resolve hdl:123/456.]
2. Global responds with Service Information for 123
[Diagram: the Service Information record for the Acme Local Handle Service, shown as a table of placeholder entries including IP addresses.]
Slide15
Service Information - Acme Local Handle Service
[Diagram: the service information record lists a Primary Site, Secondary Site A, and Secondary Site B. Each site has one or more servers (Server 1, Server 2, Server 3), and each server entry carries an IP address (e.g., 123.45.67.8, 123.52.67.9), a port number, and a public key.]
Slide16
Service Information - Acme Local Handle Service
[Diagram: same service information record as on the previous slide.]
Slide17
Service Information - Acme Local Handle Service
[Diagram: same service information record as on the previous slide.]
Slide18
[Diagram: Handle Clients, the Global Handle Registry, and the Acme Local Handle Service with its Primary Site, Secondary Site A, and Secondary Site B, each with numbered servers; the client gets a request to resolve hdl:123/456.]
3. Client queries Server 3 in Secondary Site A for 123/456
Slide19
[Diagram: Handle Clients, the Global Handle Registry, and the Acme Local Handle Service with its Primary Site, Secondary Site A, and Secondary Site B, each with numbered servers; the client gets a request to resolve hdl:123/456.]
4. Server responds with handle data
Slide20
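The four resolution steps can be walked through with a small in-memory simulation. This is only a sketch under stated assumptions: the dictionaries stand in for the Global Handle Registry and the Acme Local Handle Service, the server names are invented, and the hash-based server choice is an assumed placeholder rather than the rule the protocol actually specifies.

```python
import hashlib

# Hypothetical stand-in for the Global Handle Registry: prefix handle ->
# service information (service name plus the servers at each site).
GLOBAL_REGISTRY = {
    "0.NA/123": {
        "service": "Acme Local Handle Service",
        "sites": {
            "Primary Site": ["server-1", "server-2"],
            "Secondary Site A": ["server-1", "server-2", "server-3"],
            "Secondary Site B": ["server-1"],
        },
    }
}

# Hypothetical stand-in for the handle data held by the local service.
LOCAL_SERVICES = {
    "Acme Local Handle Service": {
        "123/456": [("URL", "http://acme.com/index.html")],
    }
}


def prefix_handle(handle: str) -> str:
    """Map 123/456 to its prefix handle 0.NA/123."""
    return "0.NA/" + handle.split("/", 1)[0]


def pick_server(handle: str, servers: list[str]) -> str:
    """Assumed rule: hash the handle and index into the site's server list."""
    digest = hashlib.sha256(handle.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


def resolve(handle: str, site: str = "Secondary Site A"):
    # Steps 1 and 2: ask Global for the prefix handle, get service information.
    info = GLOBAL_REGISTRY[prefix_handle(handle)]
    # Step 3: query one server at the chosen site of the local service.
    server = pick_server(handle, info["sites"][site])
    # Step 4: that server responds with the handle data.
    return server, LOCAL_SERVICES[info["service"]][handle]


if __name__ == "__main__":
    print(resolve("123/456"))
```

The point of the two-level lookup is that the client needs to know only the Global Handle Registry in advance; everything about the local service arrives in the service information.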
Resolution With a Web Browser
[Diagram: a handle client (web browser) sends an HTTP Get for http://hdl.handle.net/123/456 to a Proxy/Web Server, which performs handle resolution against the Handle System (GHR and LHSs).]
Slide21
Resolution With a Web Browser
[Diagram: the Handle System (GHR and LHSs) returns handle data to the Proxy/Web Server, which answers the handle client with an HTTP Redirect to http://acme.com/index.html.]
Slide22
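The browser path through the proxy comes down to two small operations: turning a handle into a proxy URL, and turning the returned handle data into a redirect target. The following sketch illustrates both; the function names are hypothetical, and choosing the first URL-typed value is an assumed simplification of what a real proxy does.

```python
from typing import Optional
from urllib.parse import quote

PROXY = "http://hdl.handle.net/"


def proxy_url(handle: str) -> str:
    """Build the URL a browser would request from the proxy for a handle."""
    # Keep '/' unescaped so the prefix/suffix divider stays readable.
    return PROXY + quote(handle, safe="/")


def redirect_target(values: list[tuple[str, str]]) -> Optional[str]:
    """Assumed proxy behaviour: redirect to the first URL-typed value."""
    for value_type, data in values:
        if value_type == "URL":
            return data
    return None


if __name__ == "__main__":
    print(proxy_url("123/456"))
    print(redirect_target([("URL", "http://acme.com/index.html")]))
```

Because the handle is carried inside an ordinary HTTP URL, any web browser can resolve it without handle-specific software.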
Resolution with a Handle Client Plug-in
[Diagram: a handle client with a plug-in sends the handle resolution request for hdl:123/456 directly to the Handle System (GHR and LHSs) and receives the handle data.]
Slide23
Handle Admin via Web Form
[Diagram: handle clients administer handles in the Handle System (GHR and LHSs) through a web form served by a Web Server and/or Admin Servlets.]
Slide24
Handle Admin via Web Form
[Diagram: same as the previous slide.]
Slide25
Custom Admin Client
[Diagram: a custom admin client among the handle clients works directly with the Handle System (GHR and LHSs).]
Slide26
Handle Administration Embedded in Another Process
Handle Resolution Embedded in Another Process
[Diagram: handle clients embedded in other processes perform administration and resolution against the Handle System (GHR and LHSs).]
Slide27
Template Handles
An unlimited number of handles are computed on the fly from a single registered template
Re-write rules and delimiter can be defined at the prefix level, e.g., use ‘-’ as delimiter and re-write any URL values for any handle under the prefix 123
Any handle under that prefix can be divided into base and extension, e.g., 123/456-abc has a base of 123/456 and an extension of abc. The base is registered.
The data at 123/456 will then be combined with the extension string (abc) using the re-write rule
Resolve “123/456-abc” and get back
http://repository.com/getobject?id=123/456&part=abc
Resolve “123/456-def” and get back
http://repository.com/getobject?id=123/456&part=def
Slide28
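The base-plus-extension computation can be sketched in a few lines. Everything here is an assumption made to reproduce the two results shown: the value stored at the base 123/456, the rule that appends "&part=<extension>", and the use of Python regular expressions in place of the Java regular expressions a real handle server uses. It does not show the server's actual template syntax.

```python
import re
from typing import Optional

DELIMITER = "-"  # delimiter defined at the prefix level in the example

# Hypothetical registered handles: only the base is registered.
REGISTERED = {"123/456": "http://repository.com/getobject?id=123/456"}

# Hypothetical re-write rule: turn the whole extension into a query part.
PATTERN = re.compile(r"^(?P<ext>.+)$")
REPLACEMENT = r"&part=\g<ext>"


def resolve_template(handle: str) -> Optional[str]:
    """Resolve a handle, computing template handles on the fly."""
    # Individually registered handles take precedence over the template.
    if handle in REGISTERED:
        return REGISTERED[handle]
    base, sep, extension = handle.partition(DELIMITER)
    if not sep or not extension or base not in REGISTERED:
        return None
    # Combine the data at the base with the re-written extension.
    return REGISTERED[base] + PATTERN.sub(REPLACEMENT, extension)


if __name__ == "__main__":
    print(resolve_template("123/456-abc"))
    print(resolve_template("123/456-def"))
```

Neither 123/456-abc nor 123/456-def is stored anywhere; both are computed from the single registered base, which is what lets one record stand in for any number of handles.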
Template Handles
Directly results from modularity of the current implementation
Backend handle storage is pluggable
A new storage module allows handles to be computed
The rest of the handle resolution mechanisms are unchanged, only the storage module was enhanced
Any exception handles can be individually registered to over-ride the template
Re-write rules at the base level will over-ride the prefix level rules
Re-write rules use Java regular expression language
Templates allow handle strings to remain static in reference form while millions of resolution values can be changed at a single stroke
Slide29
Offline Signatures
Handle values can be signed with "offline" private keys that need not exist on any Internet-connected machine. This additional layer of verification has been applied to all entries in the Global Handle Registry. Any party that has the authority to create handle records can use this capability to sign their handle records.
There is a simple (but flexible) API for building handle value digests and signing those digests.
Slide30
Multiple Resolution
Structured alternatives, e.g., multiple locations, in a single handle value
Include selection criteria in that same value
Handle client application, e.g., proxy server, performs evaluation
Type = 10320/loc; value =
<locations chooseby="locatt, country, weighted">
<location id=0 href="http://abc…" country="gb" weight=0>
<location id=1 href="http://def…" weight=1>
<location id=2 href="http://xyz…" weight=1>
</locations>
If the user is in the UK they are redirected to http://abc…, if not then either http://def... or http://xyz... at random, 50/50
Currently deployed in CNRI-run proxies and also available in the open source proxy code
Approach extensible for future selection methods, e.g., chooseby language or other value known to the proxy
Slide31
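The selection a proxy performs on a 10320/loc value can be approximated as follows. This is a hedged sketch rather than the proxy's actual code: the function name, the way a location attribute is passed in, the default weight of 1, and the full example URLs (the slide elides them) are all assumptions. It walks the chooseby list in order and falls through to the next method when one yields no match.

```python
import random
import xml.etree.ElementTree as ET

# Well-formed version of the slide's example; the hrefs are placeholders.
EXAMPLE = """
<locations chooseby="locatt, country, weighted">
  <location id="0" href="http://abc.example/" country="gb" weight="0" />
  <location id="1" href="http://def.example/" weight="1" />
  <location id="2" href="http://xyz.example/" weight="1" />
</locations>
"""


def choose_location(loc_xml, country=None, locatt=None):
    """Pick one href by trying each chooseby method in the listed order.

    country: two-letter code known to the proxy (assumed input).
    locatt:  optional (attribute, value) pair requested by the client.
    """
    root = ET.fromstring(loc_xml.strip())
    locations = root.findall("location")
    methods = [m.strip() for m in root.get("chooseby", "").split(",")]
    for method in methods:
        if method == "locatt" and locatt is not None:
            name, wanted = locatt
            matches = [loc for loc in locations if loc.get(name) == wanted]
        elif method == "country" and country is not None:
            matches = [loc for loc in locations if loc.get("country") == country]
        elif method == "weighted":
            # Locations with weight 0 are never picked at random.
            matches = [loc for loc in locations if float(loc.get("weight", "1")) > 0]
            if matches:
                weights = [float(loc.get("weight", "1")) for loc in matches]
                return random.choices(matches, weights=weights, k=1)[0].get("href")
        else:
            matches = []
        if matches:
            return matches[0].get("href")
    return None


if __name__ == "__main__":
    print(choose_location(EXAMPLE, country="gb"))  # the weight-0 UK location
    print(choose_location(EXAMPLE, country="us"))  # one of the other two
```

Giving the UK location a weight of 0 keeps it out of the random draw, so it is reached only through the country match, which matches the behaviour described on the slide.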
Multiple Resolution "Chooseby"
Handle: 10.1525/bio.2009.59.5.9
Data Type: URL
Handle Data: http://caliber.ucpress.net/doi/abs/10.1525/bio.2009.59.5.9
Data Type: HS_ADMIN
Handle Data: handle=0.na/10.1525; index=200; [delete hdl,add val,read val,modify val,del admin,add admin,list]
Data Type: 10320/loc
Handle Data:
<locations chooseby="locatt, country, weighted">
<location id="1" cr_type="MR-LIST" href="http://mr.crossref.org/iPage?doi=10.1525%2Fbio.2009.59.5.9" weight="1" />
<location id="2" cr_src="unca" label="SECONDARY_BIOONE" cr_type="MR-LIST" href="http://www.bioone.org/doi/full/10.1525/bio.2009.59.5.9" weight="0" />
</locations>
The evaluation falls through the first two criteria and the proxy uses 'weighted' as the selection criterion. The first location (http://mr.crossref.org) wins with a weight of 1. That location goes to a script on the CrossRef site that builds the page a user sees when resolving the DOI name as http://dx.doi.org/10.1525/bio.2009.59.5.9. The page is built to include the original URL value plus the 10320/loc data plus some additional information held by CrossRef.
Slide32
Multiple Resolution "Chooseby"
The page displayed includes both the original URL and the added BioOne link:
TYPE = URL
VALUE = http://caliber.ucpress.net/doi/abs/10.1525/bio.2009.59.5.9
TYPE = 10320/loc
VALUE = http://www.bioone.org/doi/full/10.1525/bio.2009.59.5.9
Slide33
Resolving to Metadata: Special Cases
Use the multiple resolution option (handle value type 10320/loc) to redirect to metadata services
Allow it to be defined at the prefix level, with individual handle override
Trigger by content negotiation in HTTP request (linked data)
Trigger by URL parameters
Being tested with DOIs
Test version of dx.doi.org proxy up and running since mid-October
All non-standard content negotiation requests would go to RA based services, e.g., metadata.crossref.org
Requested specific metadata through URL parameters, redirected to some service, e.g., EIDR registry
Slide34
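The content negotiation trigger can be illustrated from both ends: a client that asks for metadata through the Accept header, and a proxy-side check that decides whether a request is "non-standard". Both functions below are hypothetical sketches. The media type application/rdf+xml and the set of types treated as standard are assumptions, not values taken from the slides, and the request is only constructed, never sent.

```python
from urllib.request import Request

# Assumption: media types a proxy would treat as an ordinary browser request.
STANDARD_TYPES = {"text/html", "application/xhtml+xml", "*/*"}


def build_metadata_request(doi: str, media_type: str = "application/rdf+xml") -> Request:
    """Build (but do not send) a request asking the proxy for metadata."""
    return Request("http://dx.doi.org/" + doi, headers={"Accept": media_type})


def wants_metadata(accept_header: str) -> bool:
    """Assumed proxy rule: route to a metadata service only when the
    Accept header names no standard browser media type at all."""
    if not accept_header:
        return False
    types = [part.split(";")[0].strip().lower() for part in accept_header.split(",")]
    return not any(t in STANDARD_TYPES for t in types)


if __name__ == "__main__":
    req = build_metadata_request("10.1525/bio.2009.59.5.9")
    print(req.full_url, req.get_header("Accept"))
    print(wants_metadata(req.get_header("Accept")))
```

Under this rule a browser request still gets the usual redirect, while a linked-data client asking only for RDF would be sent on to a service run by the registration agency.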
Using a Resolution System With Existing Identifiers
No lack of identifiers in the world
Actionable ISBN scheme
Example: 10.978.12345/99990
The syntax specification, reading from left to right, is:
Handle System DOI name prefix = "10."
ISBN (GS1) Bookland prefix = "978." or "979."
ISBN Publisher prefix = variable length numeric string of 2 to 8 digits
Prefix/suffix divider = "/"
ISBN Title enumerator and checkdigit = variable length numeric string of 8 to 2 digits
Slide35
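The left-to-right syntax listed above can be applied mechanically to a 13-digit ISBN once the length of the publisher prefix is known. The sketch below is illustrative only: the function name is made up, the caller must supply the publisher prefix length (it cannot be derived from the digits alone), and the ISBN check digit is not validated.

```python
def isbn13_to_actionable(isbn13: str, publisher_digits: int) -> str:
    """Build an actionable ISBN handle string from a 13-digit ISBN.

    publisher_digits: length of the ISBN publisher prefix (2 to 8),
    supplied by the caller. The check digit is not verified.
    """
    digits = isbn13.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        raise ValueError("expected a 13-digit ISBN")
    bookland, rest = digits[:3], digits[3:]
    if bookland not in ("978", "979"):
        raise ValueError("Bookland prefix must be 978 or 979")
    if not 2 <= publisher_digits <= 8:
        raise ValueError("publisher prefix must be 2 to 8 digits")
    publisher = rest[:publisher_digits]
    title_and_check = rest[publisher_digits:]  # 8 to 2 digits remain
    return f"10.{bookland}.{publisher}/{title_and_check}"


if __name__ == "__main__":
    print(isbn13_to_actionable("978-12345-9999-0", publisher_digits=5))
```

Because the publisher prefix and the title enumerator plus check digit always total ten digits, a publisher prefix of 2 to 8 digits leaves 8 to 2 digits for the suffix, as the syntax specification states.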
Handle System Management & Standards
Specification
RFC 3650: Overview
RFC 3651: Namespace and Service Definition
RFC 3652: Protocol
DoDI 1322.26
ISO standards track for DOI
U.S. Patent 6,135,646
Intent was to protect the technology as usage grew
Never used by CNRI, but has been referenced by others as prior art
It has served its purpose well and it expires in 2013
HSAC - Handle System Advisory Committee
Approx 15 members representing big users
Maturation has diminished need for advice
Time for the next stage