Reliability analysis of ZFS

Reliability analysis of ZFS - PowerPoint Presentation

tatiana-dople
Uploaded On 2016-03-10



Presentation Transcript

Slide1

Reliability analysis of ZFS

CS 736 Project

University of Wisconsin - Madison

Slide2

Reliability Analysis of ZFS

To perform reliability analysis of ZFS

Test existing reliability claims

Layered driver interface: simulating transient block corruptions at various levels in the ZFS on-disk hierarchy.

Results

Classes of fault handled by ZFS.

Measure of the robustness of ZFS.

Lessons on building a reliable, robust file system.

Summary

Slide3

Coming Up

ZFS Organization

ZFS on-disk format

ZFS features and specs regarding reliability.

Experimental Setup and Experiments

Results and Conclusions

Future Work

Outline of the talk

Slide4

ZFS Organization

Pooled Storage Model

A disk is a ZFS pool comprising many file systems.

[Figure: a ZFS pool containing four ZFS file systems]

Slide5

ZFS Organization

Transaction-based object file system

Every structure is an object.

An operation on one or more objects is a transaction.

Transactions are grouped into transaction groups.

All data and metadata blocks are checksummed.

No silent corruptions.

Modifications are always copy-on-write (COW).

Always consistent on disk.

All metadata is compressed; data compression is optional.
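The copy-on-write and transaction-group bullets can be illustrated with a toy store. This is a minimal sketch under stated assumptions, not ZFS code: the class `CowStore`, its method names, and the use of SHA-256 as the checksum are all hypothetical choices made for illustration.

```python
import hashlib


class CowStore:
    """Toy copy-on-write store: existing blocks are never overwritten."""

    def __init__(self):
        self.blocks = {}    # block address -> bytes
        self.next_addr = 0  # next free address (toy allocator)
        self.root = {}      # committed view: name -> (address, checksum)
        self.pending = {}   # changes in the currently open transaction group

    def write(self, name, data):
        # Each modification goes to a freshly allocated block.
        addr = self.next_addr
        self.next_addr += 1
        self.blocks[addr] = data
        self.pending[name] = (addr, hashlib.sha256(data).hexdigest())

    def commit(self):
        # Committing the group publishes a new root in a single step,
        # so readers see either the old state or the new one.
        new_root = dict(self.root)
        new_root.update(self.pending)
        self.root = new_root
        self.pending = {}

    def read(self, name):
        addr, expected = self.root[name]
        data = self.blocks[addr]
        if hashlib.sha256(data).hexdigest() != expected:
            raise ValueError("checksum mismatch")
        return data


store = CowStore()
store.write("a", b"v1")
store.commit()
store.write("a", b"v2")   # not visible until the group commits
old_view = store.read("a")
store.commit()
new_view = store.read("a")
```

In this sketch the old block for `"a"` is still present after the second commit, which is the property that keeps the committed state consistent at every point.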

Object based

Slide6

ZFS Structures

The entire file system is represented as:

Objects: dnode_phys_t

Object sets: dnode_phys_t[ ]

P/L analogy: each object is a template. The bonus buffer describes specific attributes.

Slide7

ZFS Structures

Data is transferred to disk in units of blocks.

Block pointers (blkptr_t) are used to locate, verify and describe blocks.

A block pointer contains checksum and compression information.

The physical size of a block can differ from its logical size.

Gang blocks
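The difference between physical and logical block size can be shown with a small sketch. It assumes a simplified pointer with only the fields `offset`, `lsize`, `psize`, `comp` and `cksum`; the helper names `write_block` and `read_block`, the use of zlib, and SHA-256 over the stored bytes are illustrative assumptions, and gang blocks are not modelled.

```python
import hashlib
import zlib
from dataclasses import dataclass


@dataclass
class BlockPointer:
    offset: int  # where the stored bytes start on the toy disk
    lsize: int   # logical size: length of the uncompressed payload
    psize: int   # physical size: length of the bytes actually stored
    comp: str    # compression used for the stored bytes
    cksum: str   # checksum of the stored bytes


def write_block(disk, offset, payload):
    """Compress payload, store it at offset, return a pointer describing it."""
    stored = zlib.compress(payload)
    disk[offset:offset + len(stored)] = stored
    return BlockPointer(offset, len(payload), len(stored), "zlib",
                        hashlib.sha256(stored).hexdigest())


def read_block(disk, bp):
    """Locate, verify and decompress the block that bp describes."""
    stored = bytes(disk[bp.offset:bp.offset + bp.psize])
    if hashlib.sha256(stored).hexdigest() != bp.cksum:
        raise ValueError("checksum mismatch")
    payload = zlib.decompress(stored)
    if len(payload) != bp.lsize:
        raise ValueError("logical size mismatch")
    return payload


disk = bytearray(4096)
payload = b"A" * 1024
bp = write_block(disk, 512, payload)
roundtrip = read_block(disk, bp)
```

For a highly repetitive payload like this one, `bp.psize` is far smaller than `bp.lsize`, which is why the pointer has to record both.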

Blocks and block pointers

Slide8

ZFS Structures

Data Virtual Address (DVA): a combination of fields in blkptr_t used to locate a block on disk.

Wideness: a blkptr_t can store up to three copies of the data, each pointed to by its own DVA. These copies are called "ditto blocks".

Three for pool-wide metadata

Two for file-system-wide metadata

One for data (configurable)
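The 3-2-1 replication policy can be written down as a small table plus an allocation loop. This is a sketch under assumptions: the kind labels (`"pool_metadata"`, `"fs_metadata"`, `"data"`), the `data_copies` parameter, and the list-backed toy disk are hypothetical, and real placement of copies across a device is not modelled.

```python
from dataclasses import dataclass, field

MAX_DVAS = 3  # a block pointer has room for at most three DVAs

# Default number of copies per kind of block, following the slide.
DEFAULT_COPIES = {"pool_metadata": 3, "fs_metadata": 2, "data": 1}


@dataclass
class BlockPointer:
    dvas: list = field(default_factory=list)  # (vdev, offset) pairs


def copies_for(kind, data_copies=1):
    """Return how many copies to write for a block of the given kind."""
    n = data_copies if kind == "data" else DEFAULT_COPIES[kind]
    if not 1 <= n <= MAX_DVAS:
        raise ValueError("copies must be between 1 and 3")
    return n


def write_ditto(disk, kind, payload, data_copies=1):
    """Append one copy per DVA to the toy disk (a list) on vdev 0."""
    bp = BlockPointer()
    for _ in range(copies_for(kind, data_copies)):
        disk.append(payload)
        bp.dvas.append((0, len(disk) - 1))
    return bp


disk = []
pool_bp = write_ditto(disk, "pool_metadata", b"uberblock-ish")
fs_bp = write_ditto(disk, "fs_metadata", b"dnode-ish")
data_bp = write_ditto(disk, "data", b"file contents")
```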

Block pointers

[Figure: blkptr_t layout. Three DVAs (vdev1/offset1/asize, vdev2/offset2/asize, vdev3/offset3/asize) followed by the fields lvl, typ, cksum, comp, psize and lsize]

Slide9

ZFS Structures

Wideness

Slide10

ZFS Structures

ZAP (ZFS Attribute Processor)

ZAP objects are used to handle arbitrary (name, object) associations within an object set (objset).

Most commonly used to implement directories.

Also used extensively throughout the DSL (Dataset and Snapshot Layer).
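A directory built from (name, object) associations can be sketched as a dictionary per directory object. The function `resolve`, the object numbers, and the dictionary layout below are hypothetical; real ZAP objects are on-disk hash structures, which this sketch does not attempt to reproduce.

```python
def resolve(path, root_obj, zap_objects):
    """Walk a path by looking up each component in its directory's ZAP.

    zap_objects maps a directory's object number to a dict of
    name -> object number, standing in for one ZAP object per directory.
    """
    obj = root_obj
    for name in path.strip("/").split("/"):
        if not name:
            continue  # empty component, for example when the path is "/"
        obj = zap_objects[obj][name]
    return obj


# Object 1 is the root directory, object 2 is /home, object 7 is a file.
zap_objects = {
    1: {"home": 2},
    2: {"notes.txt": 7},
}
found = resolve("/home/notes.txt", 1, zap_objects)
```

A missing name surfaces as a `KeyError` in this sketch, which plays the role of a "no such file" error.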

Attributes on disk

Slide11

Putting it all together

Everything in ZFS is an object.

A dnode describes and organizes a collection of blocks making up an object.

Objects

Slide12

Putting it all together

Related objects are grouped to form objsets.

File systems, volumes, clones and snapshots are objsets.

[Figure: objects grouped into an object set]

Object Sets

Slide13

Putting it all together

A dataset encapsulates an objset and provides:

Space usage

Snapshot information

[Figure: a DataSet containing an object set (objects), snapshot information and a space map]

DataSets

Slide14

Putting it all together

A dataset directory:

Groups datasets

Holds properties such as quotas and compression

Records dataset relationships

[Figure: a DataSet Directory with a child map and properties, above a DataSet containing an object set (objects), snapshot information and a space map]

Dataset directories

Slide15

A road less travelled

From vdev label to data

Slide16

To sum up

Layers of indirection

End-to-end checksums, stored separately from the data they cover.

Wideness (ditto blocks) (3-2-1)

Compression

Copy-on-write

Scrub facility

Moving forward

Slide17

Experimental Setup

Corruption Framework

Corrupter driver: modifies physical disk blocks

Analyzer app: understands on-disk ZFS structures

Consumer app: monitors ZFS responses and error codes
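The corrupter's job, modifying a chosen physical block, can be sketched against an in-memory disk image. This is not the project's driver: `corrupt_block`, the 512-byte `BLOCK_SIZE`, and the seeded random byte selection are assumptions made so the example is reproducible.

```python
import random

BLOCK_SIZE = 512  # assumed sector size for the toy image


def corrupt_block(image, block_no, nbytes=4, seed=0):
    """Invert nbytes distinct bytes inside one block of a disk image.

    image is a bytearray standing in for the physical disk. Returns the
    sorted absolute positions that were modified.
    """
    rng = random.Random(seed)
    start = block_no * BLOCK_SIZE
    positions = rng.sample(range(start, start + BLOCK_SIZE), nbytes)
    for pos in positions:
        image[pos] ^= 0xFF  # flipping all bits guarantees the byte changes
    return sorted(positions)


image = bytearray(BLOCK_SIZE * 4)
touched = corrupt_block(image, block_no=1)
```

Only the targeted block changes, so a consumer that re-reads the other blocks should see no errors, while any read that verifies a checksum over block 1 should fail.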

Slide18

Experimental Setup - Simplification

Setup on Solaris 10 VM

Only one physical vdev (disk)

No striping, mirror, RAID…

Initial target: pointer corruption

Reduced sample space

Interesting cases

Disable compression as much as possible

Slide19

Initial Finding

All metadata compressed

Cannot disable metadata compression

Pointer Corruption not feasible

Perform corruptions on compressed objects

Representative of the effects of disk faults on ZFS

Slide20

Corruption Experiments

TYPE: Type-aware object corruptions

TARGET (targeted on-disk objects):

Vdev labels [@Pool]

Uberblocks [@Pool]

Object sets

Meta Object Set [@Pool]

objset_phys_t (describing object set)

Object array

Myfs Object Set [@FS]

objset_phys_t

Indirect blkptr objects

Object array

ZIL [@FS]

File Data [@FS]

Directory Data [@FS]

Slide21

Results

Target               Detection      Recovery          Correction
vdev label           YES/Checksum   YES/Replica       NO/COW
uberblock            YES/Checksum   YES/Replica       NO/COW
MOS Object           YES/Checksum   YES/Replica       NO/COW
MOS Object Set       YES/Checksum   YES/Replica       NO/COW
FS Object            YES/Checksum   YES/Replica       NO/COW
FS Indirect Objects  YES/Checksum   YES/Replica       NO/COW
FS Object Set        YES/Checksum   YES/Replica       NO/COW
ZIL                  YES/Checksum   NO                NO
Directory Data       YES/Checksum   NO/Configurable   NO/Configurable
File Data            YES/Checksum   NO/Configurable   NO/Configurable

Slide22

Summary (using IRON Taxonomy)

Detection: checksums in parent blkptrs

Recovery: replication in parent blkptrs (ditto blocks)
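Detection through the parent's checksum and recovery through ditto copies can be combined in one read path. The sketch below assumes a dictionary-backed disk and SHA-256; `read_verified` and its arguments are hypothetical names. It deliberately does not rewrite a bad copy, mirroring the "NO/COW" correction entries in the results.

```python
import hashlib


def checksum(data):
    return hashlib.sha256(data).hexdigest()


def read_verified(disk, copies, expected):
    """Return the first copy whose checksum matches the parent's record.

    disk maps an address to bytes, copies lists the addresses held in
    the parent block pointer, and expected is the checksum stored there.
    """
    for addr in copies:
        data = disk[addr]
        if checksum(data) == expected:
            return data  # recovery: a later replica may satisfy the read
    raise ValueError("all copies failed checksum verification")


disk = {0: b"metadata", 1: b"metadata"}
expected = checksum(b"metadata")
disk[0] = b"metaXata"  # simulate corruption of the first copy
recovered = read_verified(disk, [0, 1], expected)
```

With a single copy, as for file data by default, the same corruption would be detected but the read would fail.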

Slide23

Conclusion

Integration of File System and Volume Manager

Saves an additional translation

Use of one generic block pointer for checksums and replication

Merkle tree provides robustness

Use of replication/compression in a commodity file system is viable

COW can be used effectively
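The Merkle tree property, where each parent stores the checksums of its children, can be sketched with nested dictionaries. The node layout, `make_node`, `node_bytes` and `verify` are illustrative assumptions rather than the ZFS on-disk format.

```python
import hashlib


def h(data):
    return hashlib.sha256(data).digest()


def node_bytes(node):
    """Bytes covered by a node's checksum: its data plus child checksums."""
    return node["data"] + b"".join(ck for ck, _ in node["children"])


def make_node(data, children=()):
    """Build a node that records the checksum of every child."""
    kids = [(h(node_bytes(child)), child) for child in children]
    return {"data": data, "children": kids}


def verify(node, expected):
    """Check a node against expected, then every child against the
    checksum its parent holds for it."""
    if h(node_bytes(node)) != expected:
        return False
    return all(verify(child, ck) for ck, child in node["children"])


file_block = make_node(b"file data")
dir_block = make_node(b"directory data")
root = make_node(b"object set", [file_block, dir_block])
root_checksum = h(node_bytes(root))

ok_before = verify(root, root_checksum)
file_block["data"] = b"file dXta"  # corrupt a leaf after the tree is built
ok_after = verify(root, root_checksum)
```

Because the checksum for each block lives in its parent, trusting one root checksum is enough to validate everything reachable from it.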

Slide24

Observations/Questions

No correction of ditto blocks: relies on COW

Consecutive (n = wideness) failures without a transaction group commit?

Snapshot corruption?

Explicit scrubbing corrects ditto blocks in place

Potential for corruption?

Space/performance hit due to redundancy/compression

2% hit in terms of space/IO? (Banham & Nash)

No page cache; uses ARC
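The observation that an explicit scrub corrects ditto blocks in place can be sketched as a pass over every block's copies. This is a toy model, assuming a dictionary-backed disk, SHA-256 checksums, and a hypothetical `block_table` of (expected checksum, addresses) pairs; it is not how the ZFS scrub is implemented.

```python
import hashlib


def checksum(data):
    return hashlib.sha256(data).hexdigest()


def scrub(disk, block_table):
    """Verify every copy of every block and repair bad copies in place.

    Returns (repaired, unrecoverable): the number of copies rewritten and
    the number of blocks for which no valid copy was left.
    """
    repaired = 0
    unrecoverable = 0
    for expected, addrs in block_table:
        good = next((disk[a] for a in addrs if checksum(disk[a]) == expected),
                    None)
        if good is None:
            unrecoverable += 1
            continue
        for a in addrs:
            if checksum(disk[a]) != expected:
                disk[a] = good  # overwrite the damaged copy where it sits
                repaired += 1
    return repaired, unrecoverable


disk = {0: b"meta", 1: b"meta", 2: b"data"}
table = [(checksum(b"meta"), [0, 1]), (checksum(b"data"), [2])]
disk[1] = b"mXta"  # damage one ditto copy
disk[2] = b"dXta"  # damage the only copy of a data block
result = scrub(disk, table)
```

The in-place overwrite is the step the slide questions: unlike a copy-on-write update, it modifies a live block, so a failure during the rewrite is itself a potential source of corruption.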

Slide25

Future Work

Snapshot corruptions

Multiple device configuration

Striping

Mirror

RAID-Z
