Slide1
STK5800 and EPrints
Services for Object Storage and Preservation
March 2008
All content in these slides is considered work in progress. In no way does it represent an absolute view of any final end product and at this stage should purely be considered a set of realistic ideas.
Slide2
Outline
StorageTek 5800 (The Honeycomb) provides high-resilience data storage with a built-in metadata layer.
EPrints is a piece of repository software for managing large collections of digital objects and their related metadata.
Slide3
EPrints
Open Source repository software to provide open access to institutional output.
Provides a powerful plugin-based package which can easily be extended at any layer to suit a user's requirements.
2 types of archive:
Those used to manage publications and small objects.
Those used to deposit large objects. These tend to contain heavier customisation.
Slide4
Preserv2
Preserv2 is the 2nd iteration of a project looking at preservation services for repositories.
Beyond simple backup:
Format Renderers, Format Translation, Risk Assessment, Interoperability and long-term storage.
Slide5
Why use a Honeycomb?
A Honeycomb is not just a “Big Disk”
A Service Based Architecture:
Big object, big storage, more powerful plugins/services.
Smaller Repositories can jointly use a single Honeycomb as a “Preservation Service”.
Preservation Service Providers can combine several servers into a “Honeycomb Cloud”.
Slide6
EPrints Architecture
EPrints (Repository) Layer
Object Storage
Metadata Storage
Slide7
EPrints and Honeycomb
EPrints (Repository) Layer
STK5800 Honeycomb
Slide8
Services for Repositories
EPrints (Repository) Layer
Metadata Services
Storage Beans
Automated Wide Area Backup
Slide9
Metadata Services
Same resilience as data.
Averts the need to store a file ID/URL somewhere in order to find an object.
Enables collections to be constructed by independent parties.
Objects can be exported into many formats accurately.
Slide10
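As a rough illustration of the Metadata Services slide, the sketch below shows how an object might be found through its metadata alone, without the repository keeping a file ID or URL. It is a toy in-memory stand-in, not the Honeycomb API: the class `MetadataStore`, its methods, and the metadata fields `creator` and `type` are all hypothetical names chosen for this example.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class MetadataStore:
    """Toy in-memory stand-in for a store with a built-in metadata layer.

    All names here are hypothetical; this is not the Honeycomb API.
    """
    _objects: dict = field(default_factory=dict)    # object id -> bytes
    _metadata: dict = field(default_factory=dict)   # object id -> metadata dict

    def store(self, data: bytes, metadata: dict) -> str:
        # The id is derived from the content, so callers never need to keep it.
        oid = hashlib.sha256(data).hexdigest()
        self._objects[oid] = data
        self._metadata[oid] = dict(metadata)
        return oid

    def query(self, **criteria) -> list:
        """Return the data of every object whose metadata matches all criteria."""
        return [
            self._objects[oid]
            for oid, md in self._metadata.items()
            if all(md.get(key) == value for key, value in criteria.items())
        ]


store = MetadataStore()
store.store(b"thesis pdf bytes", {"creator": "Smith", "type": "thesis"})
store.store(b"article pdf bytes", {"creator": "Smith", "type": "article"})

# The object is located purely by its metadata.
theses = store.query(creator="Smith", type="thesis")
```

Because any party can issue such a query, a collection can be assembled from metadata criteria alone, which is one way to read the point about collections built by independent parties.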
Storage Beans
Can perform operations upon the objects in the system without reliance upon the repository to manage these processes (e.g. Object Translation).
Preservation services can provide feedback to repository administrators on potential risks to their objects (e.g. Object Classification, age).
Can be used to extend the metadata layer to provide more powerful access to objects and their parts/pages (e.g. “Retrieve me page 10 of volume 6 of X”).
Slide11
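To make the Storage Beans idea more concrete, here is a minimal sketch of a risk-assessment job that runs over stored objects and reports back, independently of the repository. Everything in it is an assumption for illustration: the function `risk_report`, the record fields, the `AT_RISK_FORMATS` set, and the ten-year age threshold do not come from the slides or from any real Honeycomb interface.

```python
from datetime import date

# Hypothetical set of formats treated as "at risk"; not taken from the slides.
AT_RISK_FORMATS = {"application/x-legacy"}


def risk_report(objects, today, max_age_years=10):
    """Return (object id, reason) pairs for objects that may need attention."""
    report = []
    for obj in objects:
        if obj["format"] in AT_RISK_FORMATS:
            report.append((obj["id"], "format at risk"))
        elif (today - obj["deposited"]).days > max_age_years * 365:
            report.append((obj["id"], "old object"))
    return report


objects = [
    {"id": "a", "format": "application/pdf", "deposited": date(2007, 1, 1)},
    {"id": "b", "format": "application/x-legacy", "deposited": date(2007, 6, 1)},
    {"id": "c", "format": "application/pdf", "deposited": date(1995, 1, 1)},
]

# The resulting list is what would be fed back to a repository administrator.
report = risk_report(objects, today=date(2008, 3, 1))
```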
Wide Area Replication (Backup)
The possibility to link two or more Honeycombs together over a wide area to provide mirrored backup.
This can be implemented by the archive, which can store its objects in a “Honeycomb Cloud”.
Slide12
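The mirrored backup described on the Wide Area Replication slide can be pictured with a small sketch in which every write goes to all linked nodes and a read succeeds as long as one copy survives. This is an assumption-laden toy: `MirroredStore` is a hypothetical name, each node is modelled as a plain dictionary, and real wide-area replication would also have to handle network failure and resynchronisation.

```python
class MirroredStore:
    """Hypothetical "cloud" that mirrors every object to all of its nodes."""

    def __init__(self, *nodes):
        self.nodes = nodes  # each node is modelled as a plain dict

    def put(self, key, data):
        # Write the object to every node so that each holds a full copy.
        for node in self.nodes:
            node[key] = data

    def get(self, key):
        # Read from the first node that still holds the object.
        for node in self.nodes:
            if key in node:
                return node[key]
        raise KeyError(key)


site_a, site_b = {}, {}
cloud = MirroredStore(site_a, site_b)
cloud.put("dataset-1", b"raw data")

site_a.clear()  # simulate the loss of one site
recovered = cloud.get("dataset-1")
```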
Possible Architectures (2)
Repository
Repository
Repository
Slide13
Possible Architectures (3)
Repository
Repository
Repository
Slide14
Possible Architectures (4)
Repository
Repository
Repository
Slide15
Preservation Services
A “Honeycomb Cloud” provides the basis for a preservation service which can be provided to many small-scale (<200 GB) repositories.
Options for object storage:
Locally, with Honeycomb acting purely as a preservation service.
Hand all object storage and retrieval to the Honeycomb Cloud.
A half-and-half solution:
Small Objects served locally, Large Objects from Honeycomb.
Recent and Popular Objects served locally, Older Objects considered preserved.
Slide16
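The half-and-half option on the Preservation Services slide amounts to a routing rule per object. The sketch below combines both variants (by size and by recency) into one function. That combination, the name `choose_backend`, and the thresholds of 100 MiB and 180 days are assumptions made for illustration; the slides do not specify them.

```python
from datetime import date, timedelta


def choose_backend(size_bytes, last_accessed, today,
                   size_limit=100 * 1024**2, recent=timedelta(days=180)):
    """Return "local" or "honeycomb" for one object.

    Thresholds are hypothetical defaults, not values from the slides.
    """
    if size_bytes > size_limit:
        return "honeycomb"   # large objects are served from the Honeycomb
    if today - last_accessed > recent:
        return "honeycomb"   # older objects are considered preserved
    return "local"           # small, recently used objects stay local


today = date(2008, 3, 1)
small_recent = choose_backend(10 * 1024**2, date(2008, 2, 1), today)
large_recent = choose_backend(2 * 1024**3, date(2008, 2, 1), today)
small_old = choose_backend(10 * 1024**2, date(2007, 1, 1), today)
```

A popularity measure, such as a download count, could be added as a further condition in the same way.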
EPrints with the STK 4500
The out of the box repository solution for Large Repositories.
Slide17
Thumpers “Big Disk”
The Thumper system (STK 4500) is essentially a “Big Disk” server.
“Out of the Box” solution.
Expansions:
Services to enable replication between 2 Thumpers.
Preservation services using a Honeycomb.
Aimed at Repositories where tape backup is not ideal.
Slide18
ECrystals (Possible Use Case)
Large Chemistry repository which currently stores only processed result objects (small).
These result files are generated from >1 GB raw datasets.
8+ datasets generated a day.
After 6 months, result sets are worth less.
This represents 1 TB of raw data in a 6-month period.
Slide19
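The 1 TB figure in the ECrystals use case can be checked with a quick estimate. The slide gives lower bounds only (8+ datasets a day, >1 GB each), so the sketch treats them as exact values. The number of days is an assumption: 130 corresponds to weekdays only over 26 weeks, and 182 to every calendar day in half a year. Neither is stated on the slide.

```python
def raw_data_gb(datasets_per_day, dataset_gb, days):
    """Total raw data in GB produced over the given number of days."""
    return datasets_per_day * dataset_gb * days


# Assumption: datasets are produced on weekdays only (26 weeks x 5 days).
weekdays_only_gb = raw_data_gb(8, 1, 130)

# Assumption: datasets are produced every calendar day for half a year.
every_day_gb = raw_data_gb(8, 1, 182)

print(weekdays_only_gb / 1000, "TB")  # about 1 TB
print(every_day_gb / 1000, "TB")      # about 1.5 TB
```

Under either assumption the total is on the order of 1 TB, and since the inputs are lower bounds the real volume could be higher.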
ECrystals – Single Honeycomb Architecture
Current Repository Remains
All Result Sets Stored on Honeycomb
Pros
Simplistic Architecture
Sole use of Honeycomb
Year of “on-site” storage.
Cons
Cost
Backup Procedure?
EPrints (Repository) Layer
Slide20
ECrystals – Thumper with “Honeycomb Cloud”
Thumper System
Pros
Single local machine
6+ months locally accessible
Automated Preservation
Preservation Services managed by Honeycomb Cloud.
Storage Beans on Honeycomb Cloud compress older/less popular objects
Cons
?
EPrints (Repository) Layer
Slide21
Summary
Honeycomb provides:
Better separation of repository layer from storage layer.
Repository interoperability.
A new approach to storing and preserving data from institutional repositories based on EPrints and other software.
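One way to picture the separation of the repository layer from the storage layer is a small storage interface that the repository depends on, with interchangeable backends behind it. The sketch below is illustrative only: `StorageBackend`, `InMemoryBackend`, and `Repository` are hypothetical names, and neither EPrints nor the Honeycomb is claimed to expose this interface.

```python
from typing import Protocol


class StorageBackend(Protocol):
    """Hypothetical interface between the repository and storage layers."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryBackend:
    """Stand-in backend; a Honeycomb or local-disk backend would look the same."""

    def __init__(self):
        self._data = {}

    def put(self, key, data):
        self._data[key] = data

    def get(self, key):
        return self._data[key]


class Repository:
    """Repository layer that only knows about the StorageBackend interface."""

    def __init__(self, backend: StorageBackend):
        self.backend = backend

    def deposit(self, eprint_id: int, filename: str, data: bytes) -> str:
        key = f"{eprint_id}/{filename}"
        self.backend.put(key, data)
        return key

    def retrieve(self, key: str) -> bytes:
        return self.backend.get(key)


repo = Repository(InMemoryBackend())
key = repo.deposit(42, "paper.pdf", b"pdf bytes")
data = repo.retrieve(key)
```

Swapping the backend would then require no change to the repository code, which is the kind of separation the summary describes.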