CS194-24 Advanced Operating Systems Structures and Implementation

Uploaded On 2018-11-04


Lecture 23: Application-Specific File Systems, Deep Archival Storage, Security and Protection. April 29th, 2013. Prof. John Kubiatowicz. http://inst.eecs.berkeley.edu/~cs194-24



Presentation Transcript

Slide1

CS194-24 Advanced Operating Systems Structures and Implementation

Lecture 23: Application-Specific File Systems, Deep Archival Storage, Security and Protection

April 29th, 2013

Prof. John Kubiatowicz

http://inst.eecs.berkeley.edu/~cs194-24

Slide2

Goals for Today

Application-specific File Systems: Dynamo, Haystack

Deep Archival Storage: OceanStore

Security and Protection

Interactive is important! Ask Questions!

Note: Some slides and/or pictures in the following are adapted from Bovet, “Understanding the Linux Kernel”, 3rd edition, 2005

Slide3

Recall: VFS Common File Model

Four primary object types for VFS:

superblock object: represents a specific mounted filesystem

inode object: represents a specific file

dentry object: represents a directory entry

file object: represents an open file associated with a process

There is no specific directory object (VFS treats directories as files)

May need to fit the model by faking it

Example: make it look like directories are files

Example: make it look like the filesystem has inodes, superblocks, etc.

Slide4

Recall: Data-based Caching (Data “De-Duplication”)

Use a sliding-window hash function to break files into chunks

Rabin Fingerprint: randomized function of data window

Pick sensitivity: e.g. 48 bytes at a time, lower 13 bits = 0 ⇒ 2^-13 probability of happening, expected chunk size 8192 bytes

Need minimum and maximum chunk sizes

Now – if data stays same, chunk stays the same
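A minimal Python sketch of the chunking rule above. It swaps in a simple polynomial rolling hash for a true Rabin fingerprint to keep the example short, and the name `chunk_boundaries`, the hash constants, and the 2 KiB / 64 KiB size limits are assumptions for illustration, not values from the lecture.

```python
import random

WINDOW = 48            # bytes in the sliding window (from the slide)
MASK = (1 << 13) - 1   # cut when the low 13 bits of the hash are zero
BASE = 257             # assumed polynomial base (illustrative)
MOD = (1 << 61) - 1    # assumed modulus (illustrative)
MIN_CHUNK = 2048       # assumed minimum chunk size
MAX_CHUNK = 65536      # assumed maximum chunk size


def chunk_boundaries(data: bytes) -> list:
    """Return the end offsets of the content-defined chunks of data."""
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window
    cuts, start, h = [], 0, 0
    for i, b in enumerate(data):
        if i - start >= WINDOW:        # window is full: drop the oldest byte
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + b) % MOD
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & MASK) == 0) or size >= MAX_CHUNK:
            cuts.append(i + 1)
            start, h = i + 1, 0        # restart the window at the new chunk
    if start < len(data):
        cuts.append(len(data))         # trailing partial chunk
    return cuts


def pieces(data: bytes) -> set:
    """The set of chunk contents produced for data."""
    cuts = chunk_boundaries(data)
    return {data[a:b] for a, b in zip([0] + cuts, cuts)}


# Made-up test data: random bytes, then the same bytes with a 10-byte prefix.
rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(200_000))
cuts = chunk_boundaries(data)
shared = pieces(data) & pieces(b"X" * 10 + data)
```

Because a cut depends only on the last 48 bytes, inserting a prefix shifts the offsets but is expected to leave most chunk contents unchanged, so `shared` should contain most of the original chunks.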

Blocks named by cryptographic hashes such as SHA-256

Slide5

Recall: Peer-to-Peer: Fully equivalent components

Peer-to-Peer has many interacting components

View system as a set of equivalent nodes

“All nodes are created equal”

Any structure on system must be self-organizing

Not based on physical characteristics, location, or ownership

Slide6

Recall: Lookup with Leaf Set (Chord)

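[Figure: a lookup routed from Source toward the Lookup ID through nodes matching prefixes 0…, 10…, 110…, 111…, with the Response returned to Source]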

Assign IDs to nodes

Map hash values to node with closest ID

Leaf set is successors and predecessors

All that’s needed for correctness

Routing table matches successively longer prefixes

Allows efficient lookups
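A hedged sketch of the ID assignment and leaf set described above. It treats "closest ID" as the first node ID at or after the key's hash on a circular ID space (the successor rule), and it omits the prefix routing table entirely. The `Ring` class, the 32-bit ID space, and the node names are assumptions for illustration.

```python
import bisect
import hashlib

ID_BYTES = 4  # assumed 32-bit ID space, chosen for readability


def node_id(name: str) -> int:
    """Hash a node name or key into the circular ID space."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest[:ID_BYTES], "big")


class Ring:
    def __init__(self, names):
        self.ids = sorted(node_id(n) for n in names)

    def owner(self, key: str) -> int:
        """Node responsible for key: first node ID >= hash(key), wrapping around."""
        i = bisect.bisect_left(self.ids, node_id(key))
        return self.ids[i % len(self.ids)]

    def leaf_set(self, nid: int, k: int = 2) -> list:
        """The k predecessors and k successors of node nid on the ring."""
        i = self.ids.index(nid)
        n = len(self.ids)
        return [self.ids[(i + d) % n] for d in range(-k, k + 1) if d != 0]


ring = Ring([f"node{i}" for i in range(8)])
home = ring.owner("photo-1234")
neighbors = ring.leaf_set(home)   # candidates for holding replicas
```

With only the leaf set a lookup can always walk the ring one neighbor at a time, which is why it is enough for correctness; the routing table only shortens that walk.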

Data Replication: on leaf set

Slide7

Advantages/Disadvantages of Consistent Hashing

Advantages:

Automatically adapts data partitioning as node membership changes

Node given random key value automatically “knows” how to participate in routing and data management

Random key assignment gives approximation to load balance

Disadvantages:

Uneven distribution of key storage is a natural consequence of random node names ⇒ leads to uneven query load

Key management can be expensive when nodes transiently fail

Assuming that we immediately respond to node failure, must transfer state to new node set

Then when node returns, must transfer state back

Can be a significant cost if transient failure common
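The partitioning behavior listed above can be illustrated with a small experiment. This is a sketch under assumptions: SHA-256 truncated to 64 bits stands in for the hash, and the node and key names are made up. It places 10,000 keys on a ring of 10 nodes, adds an eleventh node, and records which keys change owner and how unevenly keys were spread.

```python
import bisect
import hashlib
from collections import Counter


def h(s: str) -> int:
    """Position of a name on the ring (64-bit, illustrative)."""
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")


def ring_owner(node_ids, key_hash):
    """First node clockwise from the key's position, wrapping around."""
    i = bisect.bisect_left(node_ids, key_hash)
    return node_ids[i % len(node_ids)]


def placement(nodes, keys):
    """Map every key to the ID of the node that stores it."""
    ids = sorted(h(n) for n in nodes)
    return {k: ring_owner(ids, h(k)) for k in keys}


keys = [f"key{i}" for i in range(10_000)]
nodes = [f"node{i}" for i in range(10)]

before = placement(nodes, keys)
after = placement(nodes + ["node10"], keys)

moved = [k for k in keys if before[k] != after[k]]
new_id = h("node10")
load = Counter(before.values())           # keys per node before the join
spread = (min(load.values()), max(load.values()))
```

Every key in `moved` ends up on the new node, so membership changes touch only one arc of the ring. The gap between the two values in `spread` shows the uneven storage that random node names produce.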

Disadvantages of “Scalable” routing algorithms

More than one hop to find data ⇒ O(log N) or worse

Number of hops unpredictable and almost always > 1

Node failure, randomness, etc.

Slide8

Dynamo Assumptions

Query Model – Simple interface exposed to application level

Get(), Put()

No Delete()

No transactions, no complex queries

Atomicity, Consistency, Isolation, Durability

Operations either succeed or fail, no middle ground

System will be eventually consistent, no sacrifice of availability to assure consistency

Conflicts can occur while updates propagate through system

System can still function while entire sections of network are down

Efficiency – Measure system by the 99.9th percentile

Important with millions of users, 0.1% can be in the 10,000s

Non Hostile Environment

No need to authenticate query, no malicious queries

Behind web services, not in front of them

Slide9

Service Level Agreements (SLA)

Application can deliver its functionality in a bounded time: every dependency in the platform needs to deliver its functionality with even tighter bounds.

Example: service guaranteeing that it will provide a response within 300ms for 99.9% of its requests for a peak client load of 500 requests per second

Contrast to services which focus on mean response time
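To see why a percentile-based SLA differs from one based on the mean, here is a small sketch. The latency data is synthetic and made up for illustration, and `percentile` and `meets_sla` are hypothetical helper names; only the 300 ms bound and the 99.9% figure come from the slide.

```python
import math
import random


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def meets_sla(latencies_ms, bound_ms=300.0, pct=99.9):
    """True if pct% of requests completed within bound_ms."""
    return percentile(latencies_ms, pct) <= bound_ms


rng = random.Random(1)
# Synthetic data: 99.5% of requests are fast, 0.5% are slow outliers.
latencies = ([rng.uniform(5, 50) for _ in range(9950)]
             + [rng.uniform(400, 900) for _ in range(50)])

mean_ms = sum(latencies) / len(latencies)
p999_ms = percentile(latencies, 99.9)
ok = meets_sla(latencies)
```

In this data set the mean stays far below 300 ms while the 99.9th percentile falls among the slow outliers, so the service looks healthy on average and still violates the SLA.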

[Figure: Service-oriented architecture of Amazon’s platform]

Slide10

Replication

Each data item is replicated at N hosts

“preference list”: The list of nodes responsible for storing a particular key

Successive nodes not guaranteed to be on different physical nodes

Thus preference list includes physically distinct nodes

Sloppy Quorum

R (or W) is the minimum number of nodes that must participate in a successful read (or write) operation.

Setting R + W > N yields a quorum-like system.

Latency of a get (or put) is dictated by the slowest of the R (or W) replicas. For this reason, R and W are usually configured to be less than N, to provide better latency.
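A brute-force check of the R + W > N rule, as a sketch. The function name `quorums_always_overlap` is an assumption, and the model is a strict quorum over a fixed set of N replicas; a sloppy quorum that substitutes other healthy nodes during failures is not covered by this argument.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """True if every read set of size r shares a replica with every write set of size w."""
    replicas = range(n)
    return all(set(rs) & set(ws)
               for rs in combinations(replicas, r)
               for ws in combinations(replicas, w))


# Configurations to compare: (N, R, W)
common = quorums_always_overlap(3, 2, 2)      # R + W = 4 > 3
fast_both = quorums_always_overlap(3, 1, 1)   # R + W = 2 <= 3
read_one = quorums_always_overlap(3, 1, 3)    # R + W = 4 > 3
```

When the sets always overlap, at least one replica contacted by a read has seen the latest successful write; when they do not, a read can miss it entirely.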

Replicas synchronized via anti-entropy protocol

Use of Merkle tree for each unique range

Nodes exchange root of trees for shared key range

Slide11

Administrivia

Get moving on Lab 4

Will require you to read a bunch of code to digest the VFS layer

Design due this Thursday!

So that Palmer can have design reviews on Friday

Focus on behavioral aspects

Mounting, File operations, etc.

Don’t forget final Lecture during RRR

Monday 5/6

Send me final topics

Slide12

Data Versioning

A put() call may return to its caller before the update has been applied at all the replicas

A get() call may return many versions of the same object.

Challenge: an object having distinct version sub-histories, which the system will need to reconcile in the future.

Solution: uses vector clocks in order to capture causality between different versions of the same object

A vector clock is a list of (node, counter) pairs

Every version of every object is associated with one vector clock

If every counter in the first object’s clock is less than or equal to the corresponding node’s counter in the second clock, then the first is an ancestor of the second and can be forgotten.

Slide13
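The ancestor rule from the data versioning slide can be sketched in a few lines. Clocks are modeled as dictionaries from node name to counter; the function names and the node names Sx, Sy, Sz are illustrative assumptions.

```python
def is_ancestor(a: dict, b: dict) -> bool:
    """True if every counter in clock a is <= the matching counter in clock b."""
    return all(count <= b.get(node, 0) for node, count in a.items())


def reconcile(versions):
    """Drop versions whose clock is a strict ancestor of another; keep the rest as siblings."""
    keep = []
    for value, clock in versions:
        dominated = any(is_ancestor(clock, other) and clock != other
                        for _, other in versions)
        if not dominated:
            keep.append((value, clock))
    return keep


# Hypothetical history: D2 is written at Sx, then updated concurrently at Sy and Sz.
d2 = ("D2", {"Sx": 2})
d3 = ("D3", {"Sx": 2, "Sy": 1})
d4 = ("D4", {"Sx": 2, "Sz": 1})

siblings = reconcile([d2, d3, d4])
```

Here D2 is an ancestor of both later versions and is dropped, while D3 and D4 are concurrent: neither clock dominates the other, so both survive as siblings for the client to merge.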

Vector clock example

Slide14

Conflicts (multiversion data)

Client must resolve conflicts

Only resolve conflicts on reads

Different resolution options:

Use vector clocks to decide based on history

Use timestamps to pick latest version

Examples given in paper:

For shopping cart, simply merge different versions

For customer’s session information, use latest version
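The two example policies can be sketched as client-side functions. This assumes each conflicting version arrives as a (value, timestamp) pair; the function names and sample data are hypothetical.

```python
def merge_carts(versions):
    """Shopping-cart policy: union of the items in all conflicting versions."""
    merged = set()
    for items, _timestamp in versions:
        merged |= set(items)
    return merged


def latest_version(versions):
    """Session policy: keep only the version with the newest timestamp."""
    return max(versions, key=lambda v: v[1])[0]


cart_versions = [({"book", "pen"}, 100), ({"book", "lamp"}, 105)]
session_versions = [({"page": "home"}, 100), ({"page": "checkout"}, 105)]

cart = merge_carts(cart_versions)
session = latest_version(session_versions)
```

The union never loses an added item, at the cost that an item removed in one version can reappear after the merge. The timestamp policy is simpler but silently discards the older version.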

Stale versions returned on reads are updated (“read repair”)

Vary N, R, W to match requirements of applications

High performance reads: R=1, W=N

Fast writes with possible inconsistency: W=1

Common configuration: N=3, R=2, W=2

When do branches occur?

Branches uncommon: 0.06% of requests saw > 1 version over 24 hours

Divergence occurs because of high write rate (more coordinators), not necessarily because of failure

Slide15

Haystack File System

Does it ever make sense to adapt a file system to a particular usage pattern?

Perhaps

Good example: Facebook’s “Haystack” filesystem

Specific application (Photo Sharing)

Large files!, Many files!

260 Billion images, 20 PetaBytes (10^15 bytes!)

One billion new photos a week (60 TeraBytes)

Presence of Content Delivery Network (CDN)

Distributed caching and distribution network

Facebook web servers return special URLs that encode requests to CDN

Pay for service by bandwidth

Specific usage patterns:

New photos accessed a lot (caching well)

Old photos accessed little, but likely to be requested at any time ⇒ NEEDLES

[Figure: Number of photos requested in day]

Slide16

Old Solution: NFS

Issues with this design?

Long Tail ⇒ Caching does not work for most photos

Every access to back end storage must be fast without benefit of caching!

Linear Directory scheme works badly for many photos/directory

Many disk operations to find even a single photo

Directory’s block map too big to cache in memory

“Fixed” by reducing directory size, however still not great

Meta-Data (FFS) requires ≥ 3 disk accesses per lookup

Caching all iNodes in memory might help, but iNodes are big

Fundamentally, Photo Storage different from other storage:

Normal file systems fine for developers, databases, etc.

Slide17

New Solution: Haystack

Finding a needle (old photo) in Haystack

Differentiate between old and new photos

How? By looking at “Writeable” vs “Read-only” volumes

New Photos go to Writeable volumes

Directory: Help locate photos

Name (URL) of photo has embedded volume and photo ID

Let CDN or Haystack Cache serve new photos rather than forwarding them to Writeable volumes

Haystack Store: Multiple “Physical Volumes”

Physical volume is large file (100 GB) which stores millions of photos

Data Accessed by Volume ID with offset into file

Since Physical Volumes are large files, use XFS which is optimized for large files

Slide18

Haystack Details

Each physical volume is stored as single file in XFS

Superblock: General information about the volume

Each photo (a “needle”) stored by appending to file

Needles stored sequentially in file

Naming: [Volume ID, Key, Alternate Key, Cookie]

Cookie: random value to avoid guessing attacks

Key: Unique 64-bit photo ID

Alternate Key: four different sizes, ‘n’, ‘a’, ‘s’, ‘t’

Deleted Needle: simply marked as “deleted”

Overwritten Needle: new version appended at end

Slide19
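A toy model of the needle store from the Haystack Details slides: an append-only volume plus an in-memory index from [Key, Alternate Key] to offset. The header layout, the `Volume` class, and the use of an in-memory `io.BytesIO` buffer in place of a large XFS file are all assumptions for illustration; this is not the real on-disk format.

```python
import io
import struct

# Assumed needle header: key, alternate key, cookie, deleted flag, data size.
HEADER = struct.Struct("<QcIBI")
FLAG_OFFSET = 8 + 1 + 4   # byte position of the deleted flag inside the header


class Volume:
    def __init__(self):
        self.file = io.BytesIO()   # stands in for the large volume file
        self.index = {}            # (key, alt_key) -> offset of the newest needle

    def put(self, key, alt_key, cookie, data):
        offset = self.file.seek(0, io.SEEK_END)    # needles are only appended
        self.file.write(HEADER.pack(key, alt_key, cookie, 0, len(data)) + data)
        self.index[(key, alt_key)] = offset        # an overwrite repoints the index
        return offset

    def get(self, key, alt_key, cookie):
        offset = self.index.get((key, alt_key))
        if offset is None:
            return None
        self.file.seek(offset)
        _k, _a, stored_cookie, deleted, size = HEADER.unpack(self.file.read(HEADER.size))
        if deleted or stored_cookie != cookie:     # wrong cookie: refuse the read
            return None
        return self.file.read(size)

    def delete(self, key, alt_key):
        offset = self.index.pop((key, alt_key), None)
        if offset is not None:
            self.file.seek(offset + FLAG_OFFSET)
            self.file.write(b"\x01")               # mark in place; space is not reclaimed


vol = Volume()
first = vol.put(42, b"n", 7, b"photo-v1")
second = vol.put(42, b"n", 7, b"photo-v2")   # overwrite appends a new needle
```

A read costs one index lookup and one seek into the volume, with no directory or inode traversal, which is the point of the design.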

Haystack Details (Con’t)

Replication for reliability and performance:

Multiple physical volumes combined into logical volume

Factor of 3

Four different sizes

Thumbnails, Small, Medium, Large

Lookup

User requests Webpage

Webserver returns URL of form: http://<CDN>/<Cache>/<Machine id>/<Logical volume,photo>

Possibly reference cache only if old image

CDN will strip off CDN reference if missing, forward to cache

Cache will strip off cache reference and forward to Store

In-memory index on Store for each volume map: [Key, Alternate Key] ⇒ Offset

Slide20

What about Protection?

Start by asking some high-level questions…

What do we expect of our systems?

Won’t leak our information

Won’t lose our information

Will always work when we need them

Won’t launch attacks against other people

How can we prevent systems from misbehaving?

Never connect them to the network?

Always authenticate users?

Never use them?

Protection: use of one or more mechanisms for controlling the access of programs, processes, or users to resources

Page Table Mechanism

File Access Mechanism

On-disk encryption

Can use lots of Protection but still have an insecure system!

Bugs, back doors, viruses, poorly defined policy, inside man

Denial of service, …

Slide21

Protection vs Security

Security is a very complex topic: see, e.g., CS161

Security is about Policy, i.e. what human-centered properties do we want from our system

Usually with reference to an attack model

Security is achieved through a series of Mechanisms, i.e. individual elements of the system combined together to achieve a security policy

Security: use of protection mechanisms to prevent misuse of resources

Misuse defined with respect to policy

E.g.: prevent exposure of certain sensitive information

E.g.: prevent unauthorized modification/deletion of data

Requires consideration of the external environment within which the system operates

Even the most well-constructed system cannot protect information if user accidentally reveals password

Slide22

Preventing Misuse

Types of Misuse:

Accidental:

If I delete shell, can’t log in to fix it!

Could make it more difficult by asking: “do you really want to delete the shell?”

Intentional:

Some high school brat who can’t get a date, so instead he transfers $3 billion from B to A.

Doesn’t help to ask if they want to do it (of course!)

Three Pieces to Security

Authentication: who the user actually is

Authorization: who is allowed to do what

Enforcement: make sure people do only what they are supposed to do

Loopholes in any carefully constructed system:

Log in as superuser and you’ve circumvented authentication

Log in as self and can do anything with your resources; for instance: run program that erases all of your files

Can you trust software to correctly enforce Authentication and Authorization?

Slide23

Authentication: Identifying Users

How to identify users to the system?

Passwords

Shared secret between two parties

Since only user knows password, someone types correct password ⇒ must be user typing it

Very common technique

Smart Cards

Electronics embedded in card capable of providing long passwords or satisfying challenge → response queries

May have display to allow reading of password

Or can be plugged in directly; several credit cards now in this category

Biometrics

Use of one or more intrinsic physical or behavioral traits to identify someone

Examples: fingerprint reader, palm reader, retinal scan

Becoming quite a bit more common

What else?

Consider the “Swarm” and “Un-pad” views

Slide24

Timing Attacks: Tenex Password Checking

Tenex – early 70’s, BBN

Most popular system at universities before UNIX

Thought to be very secure, gave “red team” all the source code and documentation (want code to be publicly available, as in UNIX)

In 48 hours, they figured out how to get every password in the system

Here’s the code for the password check:

for (i = 0; i < 8; i++)
    if (userPasswd[i] != realPasswd[i])
        goto error;

How many combinations of passwords?

256^8?

Wrong!

Slide25

Defeating Password Checking

Tenex used VM, and it interacts badly with the above code

Key idea: force page faults at inopportune times to break passwords quickly

Arrange 1st char in string to be last char in pg, rest on next pg

Then arrange for pg with 1st char to be in memory, and rest to be on disk (e.g., ref lots of other pgs, then ref 1st page)

a|aaaaaa
 |
page in memory | page on disk

Time password check to determine if first character is correct!

If fast, 1st char is wrong

If slow, 1st char is right, pg fault, one of the others wrong

So try all first characters, until one is slow

Repeat with first two characters in memory, rest on disk

Only 256 * 8 attempts to crack passwords
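The attack can be simulated without real page faults. In this sketch the early-exit check reports how many characters it examined, which stands in for the fast/slow timing signal an attacker would observe; `leaky_check`, `crack`, and the sample password are illustrative assumptions.

```python
def leaky_check(guess: bytes, real: bytes):
    """Early-exit comparison like the Tenex loop.

    Returns (ok, chars_examined); chars_examined models the timing leak.
    """
    for i in range(len(real)):
        if guess[i] != real[i]:
            return False, i + 1
    return True, len(real)


def crack(real: bytes):
    """Recover the password one position at a time using the leak."""
    known = bytearray(len(real))
    attempts = 0
    for pos in range(len(real)):
        for candidate in range(256):
            known[pos] = candidate
            attempts += 1
            ok, examined = leaky_check(bytes(known), real)
            if ok or examined > pos + 1:
                break   # the check moved past this position, so the byte is right
    return bytes(known), attempts


secret = b"s3cr3t!!"               # made-up 8-character password
recovered, attempts = crack(secret)
brute_force_space = 256 ** 8       # guesses needed without the leak
worst_case_with_leak = 256 * 8     # guesses needed with the leak
```

The leak turns one search over 256^8 combinations into eight independent searches over 256 values each, so `attempts` never exceeds 2048.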

Fix is easy, don’t stop until you look at all the characters

Slide26

How do we decide who is authorized to do actions in the system?

Access Control Matrix: contains all permissions in the system

Resources across top

Files, Devices, etc…

Domains in rows

A domain might be a user or a group of permissions

E.g. above: User D3 can read F2 or execute F3

In practice, table would be huge and sparse!
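One way to cope with the sparsity is to store only the non-empty cells. In this sketch the D3 entries follow the slide's example, while the other entries and the function names are made up for illustration.

```python
# Sparse access matrix: (domain, object) -> set of rights. Empty cells are absent.
matrix = {
    ("D1", "F1"): {"read"},
    ("D2", "F1"): {"read", "write"},
    ("D3", "F2"): {"read"},       # from the slide's example
    ("D3", "F3"): {"execute"},    # from the slide's example
}


def allowed(domain, obj, right):
    """Check a single cell of the matrix."""
    return right in matrix.get((domain, obj), set())


def by_object(obj):
    """One column of the matrix: every domain with rights on obj."""
    return {d: rights for (d, o), rights in matrix.items() if o == obj}


def by_domain(domain):
    """One row of the matrix: every object the domain has rights on."""
    return {o: rights for (d, o), rights in matrix.items() if d == domain}


f1_column = by_object("F1")
d3_row = by_domain("D3")
```

Slicing the matrix by column or by row gives two different ways to store the same information, depending on whether permissions travel with the object or with the domain.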

Two approaches to implementation

Access Control Lists: store permissions with each object

Still might be lots of users!

UNIX limits each file to: r,w,x for owner, group, world

More recent systems allow definition of groups of users and permissions for each group

Capability List: each process tracks the objects it has permission to touch

Popular in the past, idea out of favor today

Consider page table: Each process has list of pages it has access to, not each page has list of processes …

Slide title: Recall: Authorization: Who Can Do What?

Slide27

Authorization Continued

Principle of least privilege: programs, users, and systems should get only enough privileges to perform their tasks

Very hard to do in practice

How do you figure out what the minimum set of privileges is needed to run your programs?

People often run at higher privilege than necessary

Such as the “administrator” privilege under Windows

One solution: Signed Software

Only use software from sources that you trust, thereby dealing with the problem by means of authentication

Fine for big, established firms such as Microsoft, since they can make their signing keys well known and people trust them

Actually, not always fine: recently, one of Microsoft’s signing keys was compromised, leading to malicious software that looked valid

What about new startups?

Who “validates” them?

How easy is it to fool them?

Slide28

Mandatory Access Control (MAC)

“A type of access control by which the operating system constrains the ability of a subject or initiator to access or generally perform some sort of operation on an object or target.” From Wikipedia

Subject: a process or thread

Object: files, directories, TCP/UDP ports, etc.

Security policy is centrally controlled by a security policy administrator: users not allowed to operate outside the policy

Examples: SELinux, HiStar, etc.

Contrast: Discretionary Access Control (DAC)

Access restricted based on the identity of subjects and/or groups to which they belong

Controls are discretionary – a subject with a certain access permission is capable of passing that permission on to any other subject

Standard UNIX model

Slide29

Data Centric Access Control (DCAC?)

Problem with many current models:

If you break into OS ⇒ data is compromised

In reality, it is the data that matters – hardware is somewhat irrelevant (and ubiquitous)

Data-Centric Access Control (DCAC)

I just made this term up, but you get the idea

Protect data at all costs, assume that software might be compromised

Requires encryption and sandboxing techniques

If hardware (or virtual machine) has the right cryptographic keys, then data is released

All of the previous authorization and enforcement mechanisms reduce to key distribution and protection

Never let decrypted data or keys outside sandbox

Examples: Use of TPM, virtual machine mechanisms

Slide30

Enforcement

Enforcer checks passwords, ACLs, etc.

Makes sure that only authorized actions take place

Bugs in enforcer ⇒ things for malicious users to exploit

Normally, in UNIX, superuser can do anything

Because of coarse-grained access control, lots of stuff has to run as superuser in order to work

If there is a bug in any one of these programs, you lose!

Paradox

Bullet-proof enforcer

Only known way is to make enforcer as small as possible

Easier to make correct, but simple-minded protection model

Fancy protection

Tries to adhere to principle of least privilege

Really hard to get right

Same argument for Java or C++: What do you make private vs public?

Hard to make sure that code is usable but only necessary modules are public

Pick something in middle? Get bugs and weak protection!

Slide31

Summary

Peer-to-Peer: Use of 100s or 1000s of nodes to keep higher performance or greater availability

May need to relax consistency for better performance

Application-Specific File Systems (e.g. Haystack):

Optimize system for particular usage pattern

Security: use of protection mechanisms to prevent misuse of resources

Represents Human-Centered Policy as opposed to mechanism

Three Pieces to Security

Authentication: who the user actually is

Authorization: who is allowed to do what

Enforcement: make sure people do only what they are supposed to do

Principle of least privilege: programs, users, and systems should get only enough privileges to perform their tasks