The Google File System
Presentation by: Eric Frohnhoefer
CS5204 – Operating Systems
Assumptions
Built from inexpensive commodity components
Cheap components frequently fail
Modest number of large files
A few million files, each 100 MB or larger
Support for large streaming reads and small random reads
Files written once, then appended
High sustained bandwidth favored over low latency
Design Decisions
Single master, multiple chunkservers
File structure
Fixed-size 64 MB chunks
Each chunk divided into 64 KB blocks (layout sketched below)
32-bit checksum computed for each block
Each chunk replicated across 3+ chunkservers
Familiar interface
Create, delete, open, close, read, and write
Snapshot and record append
No client caching of file data
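As a quick illustration of these constants, here is a minimal sketch (in Python; the names are mine, not from the paper) of how a byte offset in a file maps onto the chunk and checksum-block structure:

```python
# Illustrative constants from the slide: 64 MB chunks, 64 KB checksum blocks.
CHUNK_SIZE = 64 * 1024 * 1024   # bytes per chunk
BLOCK_SIZE = 64 * 1024          # bytes per checksummed block

def locate(file_offset: int) -> tuple[int, int, int]:
    """Map a byte offset in a file to (chunk index, block index, byte in block)."""
    chunk_index = file_offset // CHUNK_SIZE
    offset_in_chunk = file_offset % CHUNK_SIZE
    return chunk_index, offset_in_chunk // BLOCK_SIZE, offset_in_chunk % BLOCK_SIZE

# Example: byte 200,000,000 of a file lives in chunk 2, block 1003.
print(locate(200_000_000))  # (2, 1003, 49664)
```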
Architecture

Single Master
Manages namespace and locking
Manages chunk placement, creation, re-replication, and rebalancing
Garbage collection
Chunkserver
Serves chunks directly to clients
Stores 64 MB chunks and checksums for each 64 KB block
Reports the chunks it holds to the master
Verifies chunk contents during idle periods (sketched below)
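A sketch of that idle-period verification. The paper specifies a 32-bit checksum per 64 KB block but not the algorithm; CRC32 here is an assumption.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB checksummed blocks

def checksum_blocks(chunk: bytes) -> list[int]:
    """Compute a 32-bit checksum (CRC32, assumed) for each 64 KB block."""
    return [zlib.crc32(chunk[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk), BLOCK_SIZE)]

def find_corrupt_blocks(chunk: bytes, stored: list[int]) -> list[int]:
    """Return indices of blocks whose checksum no longer matches; a real
    chunkserver would report the chunk to the master for re-replication."""
    return [i for i, (fresh, old) in enumerate(zip(checksum_blocks(chunk), stored))
            if fresh != old]
```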
Metadata
Namespace
Logical mapping from files to chunk locations on chunkservers
Kept up to date with heartbeat messages from chunkservers
Metadata stored in memory
Quick access
64 bytes of metadata for each 64 MB chunk (see the estimate below)
Operation log
Historical record of changes made to metadata
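The 64-bytes-per-chunk figure is what makes the in-memory design viable; a back-of-the-envelope estimate (my arithmetic, not a number from the slides):

```python
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB
METADATA_PER_CHUNK = 64         # bytes of master metadata per chunk

def master_metadata_bytes(total_file_bytes: int) -> int:
    """Approximate master memory needed for per-chunk metadata alone."""
    num_chunks = -(-total_file_bytes // CHUNK_SIZE)   # ceiling division
    return num_chunks * METADATA_PER_CHUNK

# 1 PB of file data needs under 1 GiB of per-chunk metadata in RAM.
print(master_metadata_bytes(10**15) / 2**30)   # ~0.89
```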
Consistency Model
States:
Consistent – all replicas have the same value
Defined – consistent, and the replica reflects the mutation in its entirety
Namespace mutations are atomic and serializable
Client requires additional logic:
Remove inconsistent records
Remove duplicate records
Add checksums and unique identifiers to records
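A sketch of that client-side logic, assuming each record is framed with a unique ID and a checksum; the wire format (8-byte ID, 4-byte CRC32) is invented for illustration.

```python
import struct
import zlib

def frame_record(record_id: int, payload: bytes) -> bytes:
    """Prepend a unique ID and a CRC32 checksum (format is illustrative)."""
    return struct.pack("!QI", record_id, zlib.crc32(payload)) + payload

def read_records(framed_records: list[bytes]):
    """Skip corrupt records (bad checksum) and duplicates (repeated ID)."""
    seen = set()
    for blob in framed_records:
        record_id, crc = struct.unpack("!QI", blob[:12])
        payload = blob[12:]
        if zlib.crc32(payload) != crc:   # inconsistent region: drop it
            continue
        if record_id in seen:            # duplicate from a retried append
            continue
        seen.add(record_id)
        yield payload
```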
Mutation Operation
Write operation:
1. Client requests the locations of the primary and secondary chunkservers from the master.
2. Master assigns a primary chunkserver and replies to the client.
3. Client pushes all data to the replicas; each replica stores the data in an LRU buffer.
4. Client sends the write request to the primary chunkserver.
5. Primary assigns a serial number and forwards the request to all secondary chunkservers.
6. Secondary servers reply to the primary with the operation status.
7. Primary replies to the client with the operation status.
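Condensed into an illustrative client-side sequence (every object and method below is a hypothetical stand-in, not the real GFS API):

```python
def gfs_write(master, chunk_handle, data):
    """Illustrative client-side write flow; all calls are hypothetical."""
    # Steps 1-2: ask the master for the lease holder (primary) and the
    # locations of the secondary replicas.
    primary, secondaries = master.get_lease_holder(chunk_handle)

    # Step 3: push the data to every replica; each chunkserver buffers it
    # in an LRU cache, decoupling data flow from control flow.
    for server in [primary, *secondaries]:
        server.push_data(chunk_handle, data)

    # Steps 4-5: send the write request to the primary, which assigns a
    # serial number and forwards the request to all secondaries.
    # Steps 6-7: secondaries ack the primary; the primary reports status.
    return primary.write(chunk_handle)
```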
Mutation Operation
Atomic record append:
Similar to O_APPEND mode in Unix, but without the race conditions that multiple concurrent writers would cause.
Each record is written at least once.
Same logic flow as a write, except the primary appends the record and tells the secondary chunkservers the exact location.
Used heavily by Google applications.
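"At least once" means the client retries on any replica failure, so a record can land more than once; readers filter duplicates using the record IDs sketched earlier. A hypothetical client loop:

```python
def record_append(file, payload: bytes, max_retries: int = 5) -> int:
    """At-least-once append (illustrative): retry until the primary reports
    success. A retried record may appear in the file more than once."""
    for _ in range(max_retries):
        offset, ok = file.try_append(payload)   # hypothetical RPC
        if ok:
            return offset   # offset is chosen by the primary, not the client
    raise IOError("record append failed after retries")
```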
Mutation Operation
Snapshot operation:
Master receives snapshot request and revokes outstanding leases.
After the leases are revoked, the master logs the operation.
An in-memory copy of the file or directory metadata is created.
A copy of the chunk is created, on the same chunkserver, only when the chunk is next mutated (copy-on-write).
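A sketch of the copy-on-write bookkeeping: the snapshot only bumps a reference count on each chunk, and the first later mutation to a shared chunk triggers a local copy (the data structures are mine):

```python
refcount: dict[str, int] = {}   # chunk handle -> files referencing it

def snapshot(chunk_handles: list[str]) -> list[str]:
    """Snapshotting duplicates only metadata: each chunk gains a reference."""
    for handle in chunk_handles:
        refcount[handle] = refcount.get(handle, 1) + 1
    return list(chunk_handles)   # the new file shares the same chunks

def before_mutation(handle: str, copy_chunk_locally) -> str:
    """First write to a shared chunk copies it on the same chunkserver."""
    if refcount.get(handle, 1) > 1:
        refcount[handle] -= 1
        new_handle = copy_chunk_locally(handle)  # hypothetical local copy
        refcount[new_handle] = 1
        return new_handle
    return handle
```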
Master’s Responsibilities
Namespace management
Each namespace entry has an associated read-write lock
Allows concurrent mutations in the same directory
Example: snapshot of /home/user to /save/user
Read locks acquired on /home and /save
Write locks acquired on /home/user and /save/user
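A sketch of the rule behind this example: read-lock every ancestor directory, write-lock the full path (helper names are mine; GFS acquires locks in a consistent total order to avoid deadlock):

```python
def lock_plan(path: str) -> tuple[list[str], str]:
    """Return (paths to read-lock, path to write-lock) for a mutation."""
    parts = path.strip("/").split("/")
    ancestors = ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
    return ancestors, path

# The snapshot example above takes these locks, which conflict with any
# concurrent attempt to create or write a file under the same paths:
for p in ("/home/user", "/save/user"):
    read_locks, write_lock = lock_plan(p)
    print(read_locks, "->", write_lock)
# ['/home'] -> /home/user
# ['/save'] -> /save/user
```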
Master’s Responsibilities
Periodic communication with chunkservers
Collects state, tracks cluster health
Replica placement
Maximize reliability and bandwidth utilization
Distribute chunks across multiple racks
Chunk Creation
New replicas on chunkservers with below-average disk space utilization
Limit number of recent creations on chunkserver
Replicate across racks
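The creation heuristics, expressed as one illustrative scoring pass (the ranking criteria are from the slide; the data layout and tie-breaking are invented):

```python
def place_replicas(servers: list[dict], n: int = 3) -> list[dict]:
    """Pick n chunkservers for a new chunk. Each server is a dict with
    keys "rack", "disk_used_frac", and "recent_creates" (layout is mine)."""
    ranked = sorted(servers,
                    key=lambda s: (s["disk_used_frac"], s["recent_creates"]))
    chosen, racks_used = [], set()
    for s in ranked:                 # first pass: one replica per rack
        if len(chosen) == n:
            break
        if s["rack"] not in racks_used:
            chosen.append(s)
            racks_used.add(s["rack"])
    for s in ranked:                 # fallback when there are fewer racks
        if len(chosen) == n:
            break
        if s not in chosen:
            chosen.append(s)
    return chosen
```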
Master’s Responsibilities
Re-replication
Occurs when the number of replicas falls below a user-specified goal
Re-replication is prioritized
Rebalancing
Master examines the current replica distribution and moves replicas for better disk space and load balancing.
Garbage collection
Master logs deletion immediately
File is renamed to a hidden name that includes a deletion timestamp
Files are actually deleted later, after a user-configurable interval
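A sketch of this lazy deletion: the delete is only a metadata rename to a hidden, timestamped name, and a background scan reclaims expired files (the three-day default is the paper's; names are mine):

```python
import time

GRACE_PERIOD = 3 * 24 * 3600        # default interval in the paper: three days
namespace: dict[str, object] = {}   # path -> file metadata

def delete(path: str) -> None:
    """Deletion is just a rename to a hidden name carrying a timestamp."""
    hidden = f"{path}.deleted.{int(time.time())}"
    namespace[hidden] = namespace.pop(path)

def gc_scan(now: float) -> None:
    """Background scan: drop hidden files whose grace period has expired;
    their chunks become orphans and are reclaimed in later heartbeats."""
    for name in list(namespace):
        if ".deleted." in name:
            timestamp = int(name.rsplit(".", 1)[1])
            if now - timestamp > GRACE_PERIOD:
                del namespace[name]
```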
High Availability
Fast recovery
Chunk replication
Default of 3 replicas
Distributed across multiple racks
Shadow master
Master state is fully replicated.
Mutations are committed only once the log has been written on all replicas (sketched below).
Provides read-only access even when master is down
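A sketch of that commit rule (every object here is a hypothetical stand-in for the master's real log machinery):

```python
def commit_mutation(log_record, local_log, replica_logs) -> None:
    """Acknowledge a metadata mutation only after its log record is durable
    on the master and on every master replica (all objects hypothetical)."""
    local_log.append(log_record)
    local_log.flush()                 # durable on the master itself
    for replica in replica_logs:      # and on every replica's log
        replica.append(log_record)
        replica.flush()
    # Only now may the client be told the mutation succeeded.
```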
Performance
[Tables: cluster characteristics and cluster performance]
Amazon S3
RESTful and SOAP-style interfaces
BitTorrent for distributed download
99.999999999% durability and 99.99% uptime
Replicated 3 times across 2 datacenters
Cost (worked example below)
Storage: $0.14 / GB / month
Bandwidth: $0.10 / GB
Requests: $0.01 / 1000 Requests
Permissions controlled by Access Control Lists (ACLs)
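A quick worked example at the prices quoted above (contemporaneous list prices from the slide, not current S3 pricing):

```python
STORAGE_PER_GB_MONTH = 0.14   # $/GB/month, as quoted on the slide
BANDWIDTH_PER_GB = 0.10       # $/GB transferred
PER_1000_REQUESTS = 0.01      # $ per 1000 requests

def monthly_cost(stored_gb: float, transferred_gb: float, requests: int) -> float:
    return (stored_gb * STORAGE_PER_GB_MONTH
            + transferred_gb * BANDWIDTH_PER_GB
            + requests / 1000 * PER_1000_REQUESTS)

# 500 GB stored + 100 GB served + 2M requests: $70 + $10 + $20 = $100/month.
print(monthly_cost(500, 100, 2_000_000))   # 100.0
```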
Conclusions
Simple solution
Seamlessly handles hardware failures
Purpose-built for Google's needs:
Large files
High read throughput
Record appends
References
Cluster Computing and MapReduce, Lecture 3: http://www.youtube.com/watch?v=5Eib_H_zCEY
http://courses.cs.vt.edu/cs5204/fall10-kafura-NVC/Papers/FileSystems/GoogleFileSystem.pdf
http://communication.howstuffworks.com/google-file-system.htm