Slide1
Conceiving “Availability”
Slide2
It seems like the basic objective
“All” a network does is make stuff available.
We view with suspicion networks that transform what they transport.
We are ok with networks that store it—that just makes it more available.
But our field seems to lack a theory of availability, or even a general definition.
Slide3
In the early days…
Pragmatically, we knew things failed.
Links and routers would fail.
So we did dynamic routing.
Routers would drop packets.
So we did retransmission.
We addressed specific aspects of availability.
We had no general theory.
Slide4
We can generalize this
Assume that a network without failures will be available.
This begs a definition of “failure”.
The definition includes deliberate adverse intervention.
To deal with failures:
Must detect them.
Must localize them to a region or component.
Must reconfigure so as not to depend on that region or component.
Must trigger repair of the failed device.
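The four steps above can be sketched on a toy topology. This is only an illustration under stated assumptions: the node names, the `paths` list, the `broken` set (standing in for ground truth that real probes would have to discover), and the `repair_queue` are all hypothetical, and localization by probing successively longer path prefixes is just one possible technique.

```python
def path_works(path, broken):
    """Stand-in for an end-to-end probe: succeeds only if no node on the path is broken."""
    return not any(node in broken for node in path)

def localize(path, broken):
    """Probe successively longer prefixes; the first failing prefix ends at the faulty node."""
    for i in range(1, len(path) + 1):
        if not path_works(path[:i], broken):
            return path[i - 1]
    return None

def handle_failure(paths, broken, repair_queue):
    primary = paths[0]
    if path_works(primary, broken):        # detect: probe the path in use
        return primary
    suspect = localize(primary, broken)    # localize to a component
    repair_queue.append(suspect)           # trigger repair (here: just record it)
    for alt in paths[1:]:                  # reconfigure: avoid the suspect component
        if suspect not in alt and path_works(alt, broken):
            return alt
    return None

repair_queue = []
chosen = handle_failure([["A", "B", "D"], ["A", "C", "D"]], {"B"}, repair_queue)
print(chosen, repair_queue)
```

Here the primary path through B fails, B is identified as the suspect and queued for repair, and traffic moves to the path through C.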
Slide5
In practice
We do ok with simple failures
Fail-stop.
Routers talk to each other—if they stop talking we assume they failed.
Localization is implied.
We do much less well with more general or Byzantine failures.
Devices that succeed at the “I’m ok” protocol but don’t actually perform their function.
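A minimal sketch of the gap between the two cases, assuming a hypothetical `Router` model with two independent flags: whether it takes part in the “I’m ok” exchange and whether it actually forwards data. The detector and probe functions are illustrative names, not part of any real routing protocol.

```python
from dataclasses import dataclass

@dataclass
class Router:
    name: str
    sends_hello: bool = True     # participates in the "I'm ok" protocol
    forwards_data: bool = True   # actually performs its function

def hello_detector(router):
    """Fail-stop detection: a router is presumed failed only if it goes silent."""
    return not router.sends_hello

def end_to_end_probe(router):
    """Functional check: does data sent through the router actually arrive?"""
    return not router.forwards_data

routers = [
    Router("r1"),                                          # healthy
    Router("r2", sends_hello=False, forwards_data=False),  # fail-stop
    Router("r3", sends_hello=True, forwards_data=False),   # says "ok" but drops traffic
]

detected_by_hello = [r.name for r in routers if hello_detector(r)]
detected_by_probe = [r.name for r in routers if end_to_end_probe(r)]
print(detected_by_hello, detected_by_probe)
```

The hello-based detector catches only r2; r3 is visible only to a check that exercises the real function.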
Slide6
Example: email
If a mail relay agent fails to receive mail, go back to DNS and find another agent.
However:
If it receives mail but does not forward it, no detection of failure.
No way to tell the operator of the agent that it has failed. Just wait until a human figures it out.
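The two cases can be simulated without any real DNS or SMTP. In this sketch the relay names (`mx1`, `mx2`), the `RelayDown` exception, and the `make_relay` helper are all hypothetical; the ordered list stands in for the set of agents a sender would get from DNS.

```python
class RelayDown(Exception):
    """Raised when a relay refuses or fails to receive mail."""

def make_relay(accepts, forwards, mailbox):
    def relay(msg):
        if not accepts:
            raise RelayDown()
        if forwards:
            mailbox.append(msg)   # delivered to the destination mailbox
        # else: accepted, then silently lost
    return relay

def send(msg, relays):
    """Try relays in preference order; return the name of the first that accepts."""
    for name, relay in relays:
        try:
            relay(msg)
            return name
        except RelayDown:
            continue              # visible failure: fall back to the next agent
    return None

# Case 1: the first relay is down, so the sender falls back and mail arrives.
box1 = []
used1 = send("hello", [("mx1", make_relay(False, True, box1)),
                       ("mx2", make_relay(True, True, box1))])

# Case 2: the first relay accepts but never forwards; the sender sees success.
box2 = []
used2 = send("hello", [("mx1", make_relay(True, False, box2)),
                       ("mx2", make_relay(True, True, box2))])
print(used1, box1, used2, box2)
```

In the second case the sender has no signal to act on: the working relay `mx2` is never tried and the mailbox stays empty.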
Slide7
Security makes it worse
Attacks do not normally manifest as simple failures.
Comcast attacked BitTorrent by injecting resets.
Redirection of packets due to rerouting.
Encryption can prevent disclosure.
But encryption turns an attack on integrity into a successful attack on availability.
Users care about availability. This is one reason they “click through” warnings.
Slide8
Detecting failure
By the end-to-end principle, the only locus that can, in general, detect failures is the end-points.
They understand what the correct function is.
But how can they localize the problem?
And how can they avoid the affected region?
And can they tell anyone about the failure?
Slide9
Is this logic true in general?
Perhaps for some architecture, this problem is mitigated by design.
Any failure that affects correct delivery can be detected by the network.
No end-node correction required.
Or there is a well-defined end-node response to all classes of failure.
Is there an exhaustive classification of failure modes?
Encryption may help.
Does not prevent failures.
Reduces the kinds of failures.
Slide10
Quantification
Are there aspects of availability that are amenable to quantification?
Can we talk in a meaningful way about a system that is “more” available?
Are any such measures useful to compare different architectural approaches with respect to availability?
Slide11
Metrics of availability
Is there enough redundancy to allow reconfiguration?
Cut-sets as a metric.
But how does this apply if the definition of success is access to a service or to content?
Outage: the opposite of available.
How much went down for how long?
To what part of the Internet?
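One way to make the cut-set metric concrete is to count the fewest links whose removal disconnects two nodes. The sketch below does this by brute force, which is only practical for tiny graphs. The topology and node names are hypothetical, and the code assumes each link appears at most once in the list.

```python
from itertools import combinations

def connected(links, src, dst):
    """Depth-first search over undirected links."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, stack = {src}, [src]
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def min_cut_size(links, src, dst):
    """Smallest number of links whose removal separates src from dst (brute force)."""
    for k in range(len(links) + 1):
        for removed in combinations(links, k):
            remaining = [link for link in links if link not in removed]
            if not connected(remaining, src, dst):
                return k
    return None  # src and dst cannot be separated (e.g. they are the same node)

links = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
print(min_cut_size(links, "A", "D"))  # two disjoint routes from A to D
print(min_cut_size(links, "A", "E"))  # every route crosses the single D-E link
```

A larger value means more link failures can be absorbed by reconfiguration. The metric says nothing about whether the service or content at the far end is itself replicated, which is the question the slide raises.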
Slide12
Definitions of availability
How do regulators define availability?
Current discussion at FCC, etc.
Builds (perhaps improperly) on definitions from the phone era.
Is the network “available” if your access ISP is working?
How do availability and censorship relate?
How much of the Internet must be reachable for it to be “available”?
Slide13
What can architecture do?
Can architectural features improve the components of availability?
Detect and localize faults at the network layer.
Provide means to reconfigure.
Must not be a new attack vector.
Allow reporting of failed components.
To which part of the ecosystem?
Is there a relation between architecture and redundancy?
Slide14
At every level…
Design at every level must build in detection, localization, reconfiguration, recovery.
A higher level may be designed to recover from unrecovered failures at lower layers.
Applications can be available even in the face of some lack of lower-level availability.
At what layer should availability be evaluated?
If you can call 911, is the phone system available?