Slide1
Reliability analysis of ZFS
CS 736 Project
University of Wisconsin - Madison
Slide2
Reliability Analysis of ZFS
To perform a reliability analysis of ZFS
Test existing reliability claims
Layered driver interface: simulates transient block corruptions at various levels in the ZFS on-disk hierarchy
Results
  Classes of fault handled by ZFS
  Measure of the robustness of ZFS
  Lessons on building a reliable, robust file system
Summary
Slide3
Coming Up
ZFS Organization
ZFS On Disk format
ZFS features and specs regarding reliability.
Experimental Setup and Experiments
Results and Conclusions
Future Work
Outline of the talk
Slide4
ZFS Organization
Pooled Storage Model
The disk is a ZFS pool comprising many file systems.
[Diagram: one ZFS pool containing four ZFS file systems]
Slide5
ZFS Organization
Transaction-based object file system
Every structure is an object.
An operation on one or more objects is a transaction.
Transactions are grouped into transaction groups.
All data and metadata blocks are checksummed.
No silent corruptions.
Modifications are always copy-on-write.
Always consistent on disk.
All metadata is compressed; data compression is optional.
Object based
Slide6
ZFS Structures
The entire file system is represented as:
Objects: dnode_phys_t
Object sets: dnode_phys_t[]
P/L analogy: each object is a template. The bonus buffer describes specific attributes.
Slide7
ZFS Structures
Data is transferred to disk in units of blocks.
Block pointers (blkptr_t) are used to locate, verify and describe blocks.
They contain checksum and compression information.
The physical size of a block can differ from its logical size.
Gang blocks
Blocks and block pointers
Slide8
ZFS Structures
Data Virtual Address (DVA): a combination of fields in blkptr_t that locates a block on disk.
Wideness: a blkptr_t can store up to three copies of the data it points to, each located by its own DVA. These copies are called “ditto blocks”.
  Three for pool-wide metadata
  Two for file-system-wide metadata
  One for data (configurable)
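The 3-2-1 wideness rule can be illustrated with a minimal Python sketch. The names `DVA`, `BlockPointer`, `COPIES_BY_CLASS` and `make_pointer` are hypothetical and only loosely mirror the fields of blkptr_t; the offsets are made up and this is not the on-disk layout.

```python
from dataclasses import dataclass, field
from typing import List

# Default number of copies per block class, following the 3-2-1 rule above.
# The class names are illustrative labels, not ZFS identifiers.
COPIES_BY_CLASS = {"pool_metadata": 3, "fs_metadata": 2, "data": 1}
MAX_DVAS = 3  # a block pointer holds at most three DVAs


@dataclass(frozen=True)
class DVA:
    vdev: int    # which virtual device holds the copy
    offset: int  # where on that device the copy starts
    asize: int   # allocated size of the copy


@dataclass
class BlockPointer:
    block_class: str
    dvas: List[DVA] = field(default_factory=list)

    def add_copy(self, dva: DVA) -> None:
        if len(self.dvas) >= MAX_DVAS:
            raise ValueError("a block pointer stores at most three copies")
        self.dvas.append(dva)

    @property
    def wideness(self) -> int:
        return len(self.dvas)


def make_pointer(block_class: str, asize: int) -> BlockPointer:
    """Build a pointer with the default number of copies for its class."""
    bp = BlockPointer(block_class)
    for i in range(COPIES_BY_CLASS[block_class]):
        # Place copy i at a made-up offset; a real allocator chooses these.
        bp.add_copy(DVA(vdev=0, offset=i * asize, asize=asize))
    return bp
```

Calling `make_pointer("fs_metadata", 512).wideness` gives the number of ditto copies the sketch assigns to file-system-wide metadata.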
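Block pointers
[Figure: blkptr_t layout. Three DVAs, each with vdev, offset and asize fields (vdev1/offset1, vdev2/offset2, vdev3/offset3), followed by the lvl, type, cksum, comp, psize and lsize fields.]
Slide9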
ZFS Structures
Wideness
Slide10
ZFS Structures
ZAP (ZFS Attribute Processor)
ZAP objects are used to handle arbitrary (name, object) associations within an object set (objset).
Most commonly used to implement directories.
Also used extensively throughout the DSL (Dataset and Snapshot Layer).
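How (name, object) associations implement directories can be sketched as follows. This is a toy model, assuming each directory is one ZAP object represented as a Python dict; `ZapObject`, `lookup_path` and the object numbers are hypothetical and say nothing about the real ZAP on-disk format.

```python
from typing import Dict, Optional

# A ZAP object modelled as a plain name -> object-number mapping.
ZapObject = Dict[str, int]


def lookup_path(directories: Dict[int, ZapObject], root: int, path: str) -> Optional[int]:
    """Resolve a slash-separated path by walking one ZAP object per directory."""
    current = root
    for name in filter(None, path.split("/")):
        zap = directories.get(current)
        if zap is None or name not in zap:
            return None  # not a directory, or no such entry
        current = zap[name]
    return current


# Hypothetical object numbers: 1 is the root directory, 2 is "docs", 3 is a file.
directories = {1: {"docs": 2}, 2: {"notes.txt": 3}}
```

Each path component costs one lookup in the ZAP object of the directory reached so far, which is why the same structure can serve any other name-to-object table.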
Attributes on disk
Slide11
Putting it all together
Everything in ZFS is an object.
A dnode describes and organizes a collection of blocks making up an object.
Objects
Slide12
Putting it all together
Group related objects to form objsets.
File systems, volumes, clones and snapshots are objsets.
[Diagram: objects grouped into an object set]
Object Sets
Slide13
Putting it all together
A DataSet encapsulates an objset and provides:
  Space usage
  Snapshot information
[Diagram: objects, object set, and a dataset holding snapshot information and a space map]
DataSets
Slide14
Putting it all together
DataSet Directory
  Groups datasets
  Properties such as quotas, compression
  Dataset relationships
[Diagram: objects, object set, dataset (snapshot information, space map) and dataset directory (child map, properties)]
Dataset directories
Slide15
A road less travelled
From vdev label to data
Slide16
To sum up
Layers of indirection
End-to-end checksums, stored separately from the data they cover
Wideness (ditto blocks) (3-2-1)
Compression
Copy-on-write
Scrub facility
Moving forward
Slide17
Experimental Setup
Corruption Framework
Corrupter driver: modifies physical disk blocks
Analyzer app: understands on-disk ZFS structures
Consumer app: monitors ZFS responses and error codes
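The corrupter's role can be sketched in Python as a stand-in that operates on an in-memory disk image rather than as a layered driver. `BLOCK_SIZE`, `corrupt_block` and the byte-inversion fault model are assumptions made for illustration; the slides do not specify how the real driver alters a block.

```python
import random

BLOCK_SIZE = 512  # assumed sector size for this sketch


def corrupt_block(image: bytearray, block_no: int, nbytes: int = 8, seed: int = 0) -> list:
    """Invert nbytes randomly chosen bytes inside one block; return the offsets touched."""
    start = block_no * BLOCK_SIZE
    if start + BLOCK_SIZE > len(image):
        raise IndexError("block lies outside the image")
    rng = random.Random(seed)  # seeded so an experiment can be repeated
    offsets = rng.sample(range(start, start + BLOCK_SIZE), nbytes)
    for off in offsets:
        image[off] ^= 0xFF  # invert the byte so it is guaranteed to change
    return sorted(offsets)
```

In the framework above, the analyzer would supply `block_no` for a chosen on-disk structure, and the consumer would then read through ZFS and record the response.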
Slide18
Experimental Setup - Simplification
Setup on a Solaris 10 VM
Only one physical vdev (disk)
No striping, mirror, RAID…
Initial target: pointer corruption
  Reduced sample space
  Interesting cases
Disable compression as much as possible
Slide19
Initial Finding
All metadata is compressed
Metadata compression cannot be disabled
Pointer corruption is therefore not feasible
Perform corruptions on compressed objects instead
Representative of the effects of disk faults on ZFS
Slide20
Corruption Experiments
TYPE: Type-aware object corruptions
TARGET (targeted on-disk objects):
Vdev labels [@Pool]
Uberblocks [@Pool]
Object sets
  Meta Object Set [@Pool]
    objset_phys_t (describing object set)
    Object array
  Myfs Object Set [@FS]
    objset_phys_t
    Indirect blkptr objects
    Object array
ZIL [@FS]
File Data [@FS]
Directory Data [@FS]
Slide21
Results
Object               Detection     Recovery         Correction
vdev label           YES/Checksum  YES/Replica      NO/COW
uberblock            YES/Checksum  YES/Replica      NO/COW
MOS Object           YES/Checksum  YES/Replica      NO/COW
MOS Object Set       YES/Checksum  YES/Replica      NO/COW
FS Object            YES/Checksum  YES/Replica      NO/COW
FS Indirect Objects  YES/Checksum  YES/Replica      NO/COW
FS Object Set        YES/Checksum  YES/Replica      NO/COW
ZIL                  YES/Checksum  NO               NO
Directory Data       YES/Checksum  NO/Configurable  NO/Configurable
File Data            YES/Checksum  NO/Configurable  NO/Configurable
Slide22
Summary (using IRON Taxonomy)
Detection: checksums in parent blkptrs
Recovery: replication in parent blkptrs (ditto blocks)
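A minimal Python sketch of this detect-then-recover read path follows, assuming the parent pointer supplies an expected checksum and a list of replica addresses. `Disk`, `read_block` and `ChecksumError` are hypothetical names; SHA-256 is used for convenience and is only one of the checksums ZFS supports.

```python
import hashlib
from typing import Dict, List, Tuple

Disk = Dict[int, bytes]  # address -> block contents (toy stand-in for a vdev)


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class ChecksumError(Exception):
    pass


def read_block(disk: Disk, addresses: List[int], expected: bytes) -> Tuple[bytes, int]:
    """Return the first copy whose checksum matches, plus how many copies failed."""
    failed = 0
    for addr in addresses:
        data = disk.get(addr, b"")
        if checksum(data) == expected:
            return data, failed  # detection passed; earlier failures were recovered from
        failed += 1
    # Every copy failed verification: the fault is detected but not recoverable.
    raise ChecksumError("no valid copy among %d replica(s)" % len(addresses))
```

With two or three addresses the sketch recovers from a corrupt first copy; with a single address a corrupt block can only be detected, as in the single-copy rows of the results.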
Slide23
Conclusion
Integration of file system and volume manager
  Saves an additional translation
Use of one generic block pointer for checksums and replication
Merkle tree provides robustness
Use of replication and compression in a commodity file system is viable
COW can be used effectively
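The Merkle tree idea can be sketched in Python as below. This is a simplified binary hash tree, whereas ZFS keeps each child's checksum inside the parent's block pointer and uses a much wider fan-out; `h`, `build_levels` and `root_hash` are hypothetical helper names.

```python
import hashlib
from typing import List


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(blocks: List[bytes]) -> List[List[bytes]]:
    """Return hash levels from the leaves up to a single root hash."""
    if not blocks:
        raise ValueError("need at least one block")
    levels = [[h(b) for b in blocks]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Each parent hashes the concatenation of up to two child hashes.
        levels.append([h(b"".join(prev[i:i + 2])) for i in range(0, len(prev), 2)])
    return levels


def root_hash(blocks: List[bytes]) -> bytes:
    return build_levels(blocks)[-1][0]
```

Because every parent hash covers its children, changing any leaf block changes the root hash, so a trusted root is enough to detect corruption anywhere below it.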
Slide24
Observations/Questions
No correction of ditto blocks: relies on COW
  Consecutive (n = wideness) failures without a transaction group commit?
  Snapshot corruption?
Explicit scrubbing corrects ditto blocks in place
  Potential for corruption?
Space/performance hit due to redundancy/compression
  2% hit in terms of space/IO? (Banham & Nash)
No page cache; uses ARC
Slide25
Future Work
Snapshot corruptions
Multiple device configurations
  Striping
  Mirror
  RAID-Z