Slide1
CS 4700 / CS 5700 Network Fundamentals
Lecture 10: Inter Domain Routing (It’s all about the Money)
Revised 2/4/2014
Slide2
Network Layer, Control Plane
Function:
  Set up routes between networks
Key challenges:
  Implementing provider policies
  Creating stable paths
[Figure: layer stack (Application, Presentation, Session, Transport, Network, Data Link, Physical); labels: BGP, RIP, OSPF, Control Plane, Data Plane]
Slide3
Outline
BGP Basics
Stable Paths Problem
BGP in the Real World
Debugging BGP Path Problems
Slide4
ASs, Revisited
[Figure: AS-1, AS-2, and AS-3; labels: Interior Routers, BGP Routers]
Slide5
AS Numbers
Each AS is identified by an AS number (ASN)
  16-bit values (latest protocol supports 32-bit ones)
  64512 – 65535 are reserved
Currently, there are > 20000 ASNs
  AT&T: 5074, 6341, 7018, …
  Sprint: 1239, 1240, 6211, 6242, …
  Northeastern: 156
  North America ASs: ftp://ftp.arin.net/info/asn.txt
Slide6
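As a small illustration of the AS Numbers slide, the sketch below classifies an ASN as 16-bit or 32-bit and checks it against the reserved range quoted on the slide. The function names are made up for this example, and the reserved range is taken from the slide as written.

```python
# Reserved 16-bit range as quoted on the slide: 64512 through 65535 inclusive.
RESERVED_16BIT = range(64512, 65536)

def asn_width(asn: int) -> int:
    """Return 16 if the ASN fits in 16 bits, else 32 (hypothetical helper)."""
    if not 0 <= asn < 2**32:
        raise ValueError("ASN must fit in 32 bits")
    return 16 if asn < 2**16 else 32

def is_reserved(asn: int) -> bool:
    """True if the ASN falls in the reserved 16-bit range from the slide."""
    return asn in RESERVED_16BIT
```

For example, `asn_width(7018)` reports a 16-bit ASN for one of the AT&T numbers listed above.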
Inter-Domain Routing
Global connectivity is at stake!
  Thus, all ASs must use the same protocol
  Contrast with intra-domain routing
What are the requirements?
  Scalability
  Flexibility in choosing routes
    Cost
    Routing around failures
Question: link state or distance vector?
  Trick question: BGP is a path vector protocol
Slide7
BGP
Border Gateway Protocol
  De facto inter-domain protocol of the Internet
  Policy based routing protocol
  Uses a Bellman-Ford path vector protocol
Relatively simple protocol, but…
  Complex, manual configuration
  Entire world sees advertisements
    Errors can screw up traffic globally
  Policies driven by economics
    How much $$$ does it cost to route along a given path?
    Not by performance (e.g. shortest paths)
Slide8
BGP Relationships
[Figure: customer/provider links and peer links among Peer 1, Peer 2, and Peer 3]
Customer pays provider
Peers do not pay each other
Peer 2 has no incentive to route traffic between Peer 1 and Peer 3
Slide9
Tier-1 ISP Peering
[Figure: peering among Tier-1 ISPs: AT&T, CenturyLink, XO Communications, Inteliquent, Verizon Business, Sprint, Level 3]
Slide10
Slide11
Peering Wars
Peer:
  Reduce upstream costs
  Improve end-to-end performance
  May be the only way to connect to parts of the Internet
Don’t Peer:
  You would rather have customers
  Peers are often competitors
  Peering agreements require periodic renegotiation
Peering struggles in the ISP world are extremely contentious; agreements are usually confidential
Slide12
Two Types of BGP Neighbors
[Figure: labels: IGP, eBGP, iBGP]
Exterior routers also speak IGP
Slide13
Full iBGP Meshes
Question: why do we need iBGP?
  OSPF does not include BGP policy info
  Prevents routing loops within the AS
iBGP updates do not trigger announcements
[Figure: labels: eBGP, iBGP]
Slide14
Path Vector Protocol
AS-path: sequence of ASs a route traverses
  Like distance vector, plus additional information
Used for loop detection and to apply policy
Default choice: route with fewest # of ASs
[Figure: AS 1 through AS 5 with prefixes 110.10.0.0/16, 120.10.0.0/16, and 130.10.0.0/16]
  120.10.0.0/16: AS 2 AS 3 AS 4
  130.10.0.0/16: AS 2 AS 3
  110.10.0.0/16: AS 2 AS 5
Slide15
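The path-vector rules from the Path Vector Protocol slide (loop detection on the AS-path, prepending your own ASN when re-advertising, and defaulting to the fewest ASs) can be sketched roughly as follows. The `Route` class and function names are illustrative assumptions, not part of any BGP implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple  # nearest AS first, origin AS last

def accept(route: Route, my_asn: int) -> bool:
    """Loop detection: reject any route whose AS-path already contains our ASN."""
    return my_asn not in route.as_path

def advertise(route: Route, my_asn: int) -> Route:
    """Re-advertise a route with our own ASN added to the front of the AS-path."""
    return Route(route.prefix, (my_asn,) + route.as_path)

def fewest_ases(routes):
    """Default choice when no policy applies: the route with the fewest ASs."""
    return min(routes, key=lambda r: len(r.as_path))
```

With the table from the slide, `Route("120.10.0.0/16", (2, 3, 4))` would be accepted by AS 1 but rejected by AS 3, since AS 3 already appears in the path.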
BGP Operations (Simplified)
Establish session on TCP port 179
Exchange active routes
Exchange incremental updates
[Figure: BGP session between AS-1 and AS-2]
Slide16
Four Types of BGP Messages
Open: Establish a peering session.
Keep Alive: Handshake at regular intervals.
Notification: Shuts down a peering session.
Update: Announce new routes or withdraw previously announced routes.
  announcement = IP prefix + attribute values
Slide17
CS 4700 / CS 5700
ECON 4700/5700 Network Fundamentals
Lecture 10: Inter Domain Routing
(It’s all about the Money)
Slide18
BGP Attributes
Attributes used to select “best” path
  LocalPref
    Local preference policy to choose most preferred route
    Overrides default fewest AS behavior
  Multi-exit Discriminator (MED)
    Specifies path for external traffic destined for an internal network
    Chooses peering point for your network
Import Rules
  What route advertisements do I accept?
Export Rules
  Which routes do I forward to whom?
Slide19
Route Selection Summary
Highest Local Preference (enforce relationships)
Shortest AS Path (traffic engineering)
Lowest MED (traffic engineering)
Lowest IGP Cost to BGP Egress (traffic engineering)
Lowest Router ID (when all else fails, break ties)
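The selection order on this slide maps naturally onto a lexicographic sort key. The sketch below is a simplification under stated assumptions: the `Candidate` fields are invented for the example, and real routers apply extra conditions (for instance, MED is normally only compared between routes from the same neighboring AS) that are ignored here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    local_pref: int   # higher is better
    as_path: tuple    # shorter is better
    med: int          # lower is better
    igp_cost: int     # lower is better (cost to the BGP egress)
    router_id: int    # lower wins the final tie-break

def selection_key(c: Candidate):
    # Tuples compare element by element, which mirrors the slide's order.
    return (-c.local_pref, len(c.as_path), c.med, c.igp_cost, c.router_id)

def select_best(candidates):
    """Pick the candidate that wins the simplified decision process."""
    return min(candidates, key=selection_key)
```

Because local preference comes first in the key, a route with a long AS-path still wins if policy assigns it a higher LocalPref.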
Slide20
Shortest AS Path != Shortest Path
[Figure: two candidate routes from Source to Destination: 4 hops through 4 ASs, versus 9 hops through 2 ASs]
Slide21
Hot Potato Routing
[Figure: two candidate routes from Source to Destination: 3 hops total with 3 hops cost, versus 5 hops total with 2 hops cost]
Slide22
Importing Routes
[Figure: an ISP receiving routes; labels: From Provider, From Peer, From Peer, From Customer, ISP Routes]
Slide23
Exporting Routes
[Figure: an ISP sending routes; labels: To Customer, To Peer, To Peer, To Provider]
To Customer: customers get all routes
To Peer, To Provider: customer and ISP routes only
$$$ generating routes
Slide24
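The export rules from the Exporting Routes slide reduce to a two-line predicate. This is a minimal sketch; the relationship constants and the `should_export` name are assumptions made for the example.

```python
CUSTOMER, PEER, PROVIDER, OWN = "customer", "peer", "provider", "own"

def should_export(learned_from: str, send_to: str) -> bool:
    """Decide whether a route learned from one neighbor type is exported to another.

    Customers get all routes; peers and providers only get customer routes
    and the ISP's own routes (the ones that generate revenue).
    """
    if send_to == CUSTOMER:
        return True
    return learned_from in (CUSTOMER, OWN)
```

Under this rule a route learned from one peer is never passed on to another peer, which is why a peer in the middle does not carry transit traffic between two other peers.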
Modeling BGP
AS relationships
  Customer/provider
  Peer
  Sibling, IXP
Gao-Rexford model
  AS prefers to use customer path, then peer, then provider
    Follow the money!
  Valley-free routing
  Hierarchical view of routing (incorrect but frequently used)
[Figure: AS hierarchy with links labeled P-P, C-P, and P-C]
Slide25
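Valley-free routing from the Modeling BGP slide is commonly stated as: a path climbs zero or more customer-to-provider links, crosses at most one peer link, then descends zero or more provider-to-customer links. The checker below is a sketch under that reading; the link-type labels are assumptions for this example.

```python
UP, FLAT, DOWN = "c2p", "p2p", "p2c"  # link types in the direction of travel

def is_valley_free(links) -> bool:
    """Return True if the sequence of link types never goes up (or sideways)
    after it has already gone sideways or down."""
    past_peak = False
    for link in links:
        if link == UP:
            if past_peak:
                return False
        elif link == FLAT:
            if past_peak:
                return False
            past_peak = True
        elif link == DOWN:
            past_peak = True
        else:
            raise ValueError(f"unknown link type: {link}")
    return True
```

For instance, `[UP, FLAT, DOWN]` passes, while `[DOWN, UP]` describes a customer carrying traffic between two of its providers and fails.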
AS Relationships: It’s Complicated
GR Model is strictly hierarchical
  Each AS pair has exactly one relationship
  Each relationship is the same for all prefixes
In practice it’s much more complicated
  Rise of widespread peering
  Regional, per-prefix peerings
  Tier-1’s being shoved out by “hypergiants”
  IXPs dominating traffic volume
Modeling is very hard, very prone to error
  Huge potential impact for understanding Internet behavior
Slide26
Other BGP Attributes
AS_SET
  Instead of a single AS appearing at a slot, it’s a set of ASes
  Why?
Communities
  Arbitrary number that is used by neighbors for routing decisions
    Export this route only in Europe
    Do not export to your peers
  Usually stripped after first interdomain hop
  Why?
Prepending
  Lengthening the route by adding multiple instances of ASN
  Why?
Slide27
Outline
BGP Basics
Stable Paths Problem
BGP in the Real World
Debugging BGP Path Problems
Slide28
What Problem is BGP Solving?
Underlying Problem     Distributed Solution
Shortest Paths         RIP, OSPF, IS-IS, etc.
???                    BGP
Knowing ??? can:
  Aid in the analysis of BGP policy
  Aid in the design of BGP extensions
  Help explain BGP routing anomalies
  Give us a deeper understanding of the protocol
Slide29
The Stable Paths Problem
An instance of the SPP:
  Graph of nodes and edges
  Node 0, called the origin
  A set of permitted paths from each node to the origin
    Each set contains the null path
  Each set of paths is ranked
    Null path is always least preferred
[Figure: graph on nodes 0 through 5 with the permitted paths of each node]
  Node 1: 1 3 0, 1 0
  Node 2: 2 1 0, 2 0
  Node 3: 3 0
  Node 4: 4 2 0, 4 3 0
  Node 5: 5 2 1 0
Slide30
A Solution to the SPP
A solution is an assignment of permitted paths to each node such that:
  Node u’s path is either null or uwP, where path wP is assigned to node w and edge (u, w) exists
  Each node is assigned the highest ranked path that is consistent with its neighbors
[Figure: the same graph and permitted paths as on the previous slide]
Solutions need not use the shortest paths, or form a spanning tree
Slide31
Simple SPP Example
[Figure: graph on nodes 0 through 4 with the permitted paths of each node]
  Node 1: 1 0, 1 3 0
  Node 2: 2 0, 2 1 0
  Node 3: 3 0
  Node 4: 4 2 0, 4 3 0
Each node gets its preferred route
Totally stable topology
Slide32
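Good Gadget
[Figure: graph on nodes 0 through 4 with the permitted paths of each node]
  Node 1: 1 3 0, 1 0
  Node 2: 2 1 0, 2 0
  Node 3: 3 0
  Node 4: 4 3 0, 4 2 0
Not every node gets preferred route
Topology is still stable
Only one stable configuration
  No matter which router chooses first!
Slide33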
SPP May Have Multiple Solutions
[Figure: graph on nodes 0, 1, and 2, drawn three times with the same permitted paths]
  Node 1: 1 2 0, 1 0
  Node 2: 2 1 0, 2 0
Slide34
Bad Gadget
[Figure: graph on nodes 0 through 4 with the permitted paths of each node]
  Node 1: 1 3 0, 1 0
  Node 2: 2 1 0, 2 0
  Node 3: 3 4 2 0, 3 0
  Node 4: 4 2 0, 4 3 0
That was only one round of oscillation!
This keeps going, infinitely
Problem stems from:
  Local (not global) decisions
  Ability of one node to improve its path selection
Slide35
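The gadgets on the preceding slides are small enough to check by brute force. The sketch below enumerates every assignment of permitted paths and keeps those that satisfy the definition of an SPP solution. It assumes the permitted paths are listed most-preferred first (as they appear on the slides) and that an edge exists wherever a permitted path uses it; the function and variable names are illustrative.

```python
from itertools import product

ORIGIN = 0

def best_consistent(node, ranked, assignment):
    """Highest-ranked permitted path whose tail equals the path assigned
    to its next hop; the empty tuple stands for the null path."""
    for path in ranked[node]:
        next_hop, tail = path[1], path[1:]
        if assignment[next_hop] == tail:
            return path
    return ()

def stable_solutions(ranked):
    """Try every assignment and return those where each node already
    holds its best consistent path."""
    nodes = sorted(ranked)
    options = [list(ranked[n]) + [()] for n in nodes]
    solutions = []
    for combo in product(*options):
        assignment = dict(zip(nodes, combo))
        assignment[ORIGIN] = (ORIGIN,)
        if all(best_consistent(n, ranked, assignment) == assignment[n] for n in nodes):
            solutions.append({n: assignment[n] for n in nodes})
    return solutions

# Permitted paths as listed on the slides, assumed to be in rank order.
GOOD_GADGET = {1: [(1, 3, 0), (1, 0)], 2: [(2, 1, 0), (2, 0)],
               3: [(3, 0)], 4: [(4, 3, 0), (4, 2, 0)]}
BAD_GADGET = {1: [(1, 3, 0), (1, 0)], 2: [(2, 1, 0), (2, 0)],
              3: [(3, 4, 2, 0), (3, 0)], 4: [(4, 2, 0), (4, 3, 0)]}
TWO_SOLUTIONS = {1: [(1, 2, 0), (1, 0)], 2: [(2, 1, 0), (2, 0)]}
```

Under these assumptions the Good Gadget has one stable assignment, the two-node instance has two, and the Bad Gadget has none. Exhaustive search only works for toy instances like these.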
SPP Explains BGP Divergence
BGP is not guaranteed to converge to stable routing
  Policy inconsistencies may lead to “livelock”
  Protocol oscillation
[Figure: classification of SPP instances; labels: Must Converge, Must Diverge, Solvable, Can Diverge, Good Gadgets, Bad Gadgets, Naughty Gadgets]
Slide36
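Beware of Backup Policies
[Figure: graph on nodes 0 through 4 with the permitted paths of each node]
  Node 1: 1 3 0, 1 0
  Node 2: 2 1 0, 2 0
  Node 3: 3 4 2 0, 3 0
  Node 4: 4 0, 4 2 0, 4 3 0
BGP is not robust
It may not recover from link failure
Slide37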
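BGP is Precarious
[Figure: graph on nodes 0 through 6 with the permitted paths of each node]
  Node 1: 1 2 0, 1 0
  Node 2: 2 1 0, 2 0
  Node 3: 3 1 0, 3 1 2 0
  Node 4: 4 3 1 0, 4 5 3 1 2 0, 4 3 1 2 0
  Node 5: 5 3 1 0, 5 6 3 1 2 0, 5 3 1 2 0
  Node 6: 6 3 1 0, 6 4 3 1 2 0, 6 3 1 2 0
If node 1 uses path 1 0, this is solvable
No longer stable
Slide38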
Can BGP Be Fixed?
Unfortunately, SPP is NP-complete
Possible Solutions
Static Approach
  Inter-AS coordination
  Automated Analysis of Routing Policies (This is very hard)
Dynamic Approach
  Extend BGP to detect and suppress policy-based oscillations?
These approaches are complementary
Slide39
Outline
BGP Basics
Stable Paths Problem
BGP in the Real World
Debugging BGP Path Problems
Slide40
Motivation
Routing reliability/fault-tolerance on small time scales (minutes) not previously a priority
Transaction oriented and interactive applications (e.g. Internet Telephony) will require higher levels of end-to-end network reliability
How well does the Internet routing infrastructure tolerate faults?
Slide41
Conventional Wisdom
Internet routing is robust under faults
  Supports path re-routing
  Path restoration on the order of seconds
BGP has good convergence properties
  Does not exhibit looping/bouncing problems of RIP
Internet fail-over will improve with faster routers and faster links
More redundant connections (multi-homing) will always improve fault-tolerance
Slide42
Delayed Routing Convergence
Conventional wisdom about routing convergence is not accurate
  Measurement of BGP convergence in the Internet
  Analysis/intuition behind delayed BGP routing convergence
  Modifications to BGP implementations which would improve convergence times
Slide43
Open Question
After a fault in a path to a multi-homed site, how long does it take for the majority of Internet routers to fail over to the secondary path?
[Figure: Customer multi-homed to a Primary ISP and a Backup ISP; labels: Route Withdrawn, Traffic, Routing table convergence, Stable end-to-end paths]
Slide44
Bad News
With unconstrained policies:
  Divergence
  Possible to create unsatisfiable policies
  NP-complete to identify these policies
  Happening today?
With constrained policies (e.g. shortest path first)
  Transient oscillations
  BGP usually converges
  It may take a very long time…
BGP Beacons: focuses on constrained policies
Slide45
16 Month Study of Convergence
Instrument the Internet
  Inject BGP faults (announcements/withdrawals) of varied prefix and AS path length into topologically and geographically diverse ISP peering sessions
  Monitor impact of faults through
    Recording BGP peering sessions with 20 tier1/tier2 ISPs
    Active ICMP measurements (512 byte/second to 100 random web sites)
Wait two years (and 250,000 faults)
Slide46
Measurement Architecture
[Figure: measurement setup; label: Researchers pretending to be an AS]
Slide47
Announcement Scenarios
Tup – A new route is advertised
Tdown – A route is withdrawn
  i.e. single-homed failure
Tshort – Advertise a shorter/better AS path
  i.e. primary path repaired
Tlong – Advertise a longer/worse AS path
  i.e. primary path fails
Slide48
Major Convergence Results
Routing convergence requires an order of magnitude longer than expected
  10s of minutes
Routes converge more quickly following Tup/Repair than Tdown/Failure events
  Bad news travels more slowly
Withdrawals (Tdown) generate several more announcements than new routes (Tup)
Slide49
Example
BGP log of updates from AS2117 for route via AS2129
One withdrawal triggers 6 announcements and one withdrawal from 2117
Increasing AS path length until final withdrawal
Slide50
Why So Many Announcements?
[Figure: topology with AS 2129, AS 5696, AS 1, AS 2117, AS 2041, and AS 3508]
Events from AS 2117:
  Route Fails: AS 2129
  Announce: 5696 2129
  Announce: 1 5696 2129
  Announce: 2041 3508 2129
  Announce: 1 2041 3508 2129
  Route Withdrawn: 2129
Slide51
How Many Announcements Does it Take For an AS to Withdraw a Route?
Answer: up to 19
Slide52
BGP Routing Table Convergence Times
[Figure: convergence time distributions; series: New Route, Long->Short Fail-Over, Short->Long Fail-Over, Failure]
Less than half of Tdown events converge within two minutes
Tup/Tshort and Tdown/Tlong form equivalence classes
Long tailed distribution (up to 15 minutes)
Slide53
Failures, Fail-overs and Repairs
Bad news does not travel fast…
Repairs (Tup) exhibit similar convergence as long-short AS path fail-over
Failures (Tdown) and short-long fail-overs (e.g. primary to secondary path) also similar
  Slower than Tup (e.g. a repair)
  80% take longer than two minutes
  Fail-over times degrade the greater the degree of multi-homing
Slide54
Intuition for Delayed Convergence
There exists a possible ordering of messages such that BGP will explore ALL possible AS paths of ALL possible lengths
BGP is O(N!), where N is the number of default-free BGP routers in a complete graph with default policy
Slide55
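To get a feel for why exploring all AS paths of all lengths is so expensive, the sketch below counts the loop-free paths from one router to a fixed origin in a complete graph. This is only a counting illustration of the factorial growth behind the O(N!) figure on the preceding slide, not a model of BGP message exchange; the function name is made up.

```python
from math import factorial

def loop_free_paths(n: int) -> int:
    """Number of loop-free paths from one router to the origin in a
    complete graph on n routers (source and origin included)."""
    if n < 2:
        raise ValueError("need at least a source and an origin")
    m = n - 2  # routers that may appear as intermediate hops
    # Choose an ordered sequence of k distinct intermediates, for every k.
    return sum(factorial(m) // factorial(m - k) for k in range(m + 1))
```

With 4 routers there are 5 candidate paths and with 5 routers there are 16; the count keeps growing roughly like a factorial as routers are added.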
Impact of Delayed Convergence
Why do we care about routing table convergence?
  It impacts end-to-end connectivity for Internet paths
ICMP experiment results
  Loss of connectivity, packet loss, latency, and packet re-ordering for an average of 3-5 minutes after a fault
Why?
  Routers drop packets when next hop is unknown
  Path switching spikes latency/delay
  Multi-pathing causes reordering
Slide56
In real life …
Discussed worst case BGP behavior
In practice, BGP policy prevents worst case from happening
BGP timers also provide synchronization and limit possible orderings of messages
Slide57
Outline
BGP Basics
Stable Paths Problem
BGP in the Real World
Debugging BGP Path Problems
Slide58
Control plane vs. Data Plane
Control:
  Make sure that if there’s a path available, data is forwarded over it
  BGP sets up such paths at the AS-level
Data:
  For a destination, send packet to most-preferred next hop
  Routers forward data along IP paths
How does the control plane know if a data path is broken?
  Direct-neighbor connectivity
  What if the outage isn’t in the direct neighbor?
Slide59
Why Network Reliability Remains Hard
Visibility
  IP provides no built-in monitoring
  Economic disincentives to share information publicly
Control
  Routing protocols optimize for policy, not reliability
  Outage affecting your traffic may be caused by distant network
Detecting, isolating and repairing network problems for Internet paths remains largely a slow, manual process
Slide60
Improving Internet Availability
New Internet design
  Monitoring everywhere in the network
  Visibility into all available routes
  Any operator can impact routes affecting her traffic
Challenges
  What should we monitor?
  What do we do with additional visibility?
  How to use additional control?
Slide61
A Practical Approach
We can do this already in today’s Internet
  Crowdsourcing monitoring
  Use existing protocols/systems in unintended ways
Allows us to address problems today
Also informs future Internet designs
Slide62
Operators Struggle to Locate Failures
Mailing List User 1
  1 Home router
  2 Verizon in Baltimore
  3 Verizon in Philly
  4 Alter.net in DC
  5 Level3 in DC
  6 * * *
  7 * * *
Mailing List User 2
  1 Home router
  2 Verizon in DC
  3 Alter.net in DC
  4 Level3 in DC
  5 Level3 in Chicago
  6 Level3 in Denver
  7 * * *
  8 * * *
“Traffic attempting to pass through Level3’s network in the Washington, DC area is getting lost in the abyss. Here's a trace from Verizon residential to Level3.”
Outages mailing list, Dec. 2010
Slide63
Reasons for Long-Lasting Outages
Long-term outages are:
  Repaired over slow, human timescales
  Not well understood
  Caused by routers advertising paths that do not work
    E.g., corrupted memory on line card causes black hole
    E.g., bad cross-layer interactions cause failed MPLS tunnel
Slide64
Key Challenges for Internet Repair
Lack of visibility
  Where is the outage?
  Which networks are (un)affected?
  Who caused the outage?
Lack of control
  Reverse paths determined by possibly distant ASes
  Limited means to affect such paths
Slide65
Goals and Approach
Improve availability through:
  Failure isolation and remediation
  Identifying the AS(es) responsible for path changes
Key techniques:
  Visibility
    Active measurements from distributed vantage points
    Passive collection of BGP feeds
  Control
    On-demand BGP prepending to route around outages
    Active BGP measurements to identify alternative paths
Slide66
LIFEGUARD: Locating Internet Failures Effectively and Generating Usable Alternate Routes Dynamically
Locate the ISP / link causing the problem
  Building blocks
  Example
  Description of technique
Suggest that other ISPs reroute around the problem
Slide67
Building blocks for failure isolation
LIFEGUARD can use:
  Ping to test reachability
  Traceroute to measure forward path
  Distributed vantage points (VPs)
    PlanetLab for our experiments
    Some can source spoof
  Reverse traceroute to measure reverse path (NSDI ’10)
    I’ll teach you about this during the security lecture
  Atlas of historical forward/reverse paths between VPs and targets
Slide68
How does LIFEGUARD locate a failure?
Before outage:
  Historical atlas enables reasoning about changes
  Traceroute yields only path from GMU to target
  Reverse traceroute reveals path asymmetry
[Figure: paths between GMU and the target; labels: Historical, Current]
Slide69
How does LIFEGUARD locate a failure?
During outage:
  Problem with ZSTTK?
  Forward path works
[Figure: paths between GMU and the target; labels: Historical, Current; probes: "Ping? Fr: VP", "Ping! To: VP"]
Slide70
How does LIFEGUARD locate a failure?
During outage:
  Forward path works
[Figure: paths between GMU and the target; labels: Historical, Current; probes: "NTT: Ping? Fr: GMU", "GMU: Ping! Fr: NTT"]
Slide71
How does LIFEGUARD locate a failure?
During outage:
  Forward path works
  Rostelecom is not forwarding traffic towards GMU
[Figure: paths between GMU and the target; labels: Historical, Current; probe: "Rostele: Ping? Fr: GMU"]
Slide72
How LIFEGUARD Locates Failures
LIFEGUARD:
  Maintains background historical atlas
  Isolates direction of failure, measures working direction
  Tests historical paths in failing direction in order to prune candidate failure locations
  Locates failure as being at the horizon of reachability
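The "horizon of reachability" step can be sketched as a walk along a historical path in the failing direction, starting from the hop nearest the source, until the first hop whose probes no longer reach the source. The hop order and ping outcomes in the usage note are made up for illustration (loosely modeled on the GMU example), and `locate_failure` is a hypothetical name, not LIFEGUARD's actual interface.

```python
def locate_failure(hops_from_source, can_reach_source):
    """Return (last hop that still reaches the source, first hop that does not),
    or None if every hop on the historical path can still reach the source.

    hops_from_source: hops on the historical path, nearest the source first.
    can_reach_source: dict mapping each hop to the outcome of a ping test.
    """
    last_reachable = None
    for hop in hops_from_source:
        if can_reach_source[hop]:
            last_reachable = hop
        else:
            return last_reachable, hop
    return None
```

For example, with hops `["NTT", "Rostelecom", "ZSTTK"]` where only NTT still reaches the source, the function blames the boundary between NTT and Rostelecom.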
Slide73
Our Approach and Outline
LIFEGUARD: Locating Internet Failures Effectively and Generating Usable Alternate Routes Dynamically
Locate the ISP / link causing the problem
Suggest that other ISPs reroute around the problem
  What would we like to add to BGP to enable this?
  What can we deploy today, using only available protocols and router support?
Slide74
Our Goal for Failure Avoidance
Enable content / service providers to repair persistent routing problems affecting them, regardless of which ISP is causing them
Setting
  Assume we can locate problem
  Assume we are multi-homed / have multiple data centers
  Assume we speak BGP
We use TransitPortal to speak BGP to the real Internet: 5 US universities as providers
Slide75
Self-Repair of Forward Paths
Slide76
A Mechanism for Failure Avoidance
Forward path: Choose route that avoids ISP or ISP-ISP link
Reverse path: Want others to choose paths to my prefix P that avoid ISP or ISP-ISP link X
Want a BGP announcement AVOID(X,P):
  Any ISP with a route to P that avoids X uses such a route
  Any ISP not using X need only pass on the announcement
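The intended effect of the hypothetical AVOID(X,P) announcement on a single ISP can be sketched as a filter over its candidate routes to P. AVOID is not an existing BGP message, and the function below is only an illustration of the semantics described above; the route tuples in the usage note use ISP names that appear later in the deck.

```python
def choose_avoiding(candidate_paths, x):
    """Return the most preferred candidate path that does not traverse x,
    or None if every candidate goes through x.

    candidate_paths: AS-level paths to prefix P, most preferred first.
    """
    for path in candidate_paths:
        if x not in path:
            return path
    return None
```

An ISP whose candidates are `("L3", "ATT", "WS")` and `("Sprint", "Qwest", "WS")` would switch to the Sprint path on receiving AVOID(L3, WS).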
Slide77
Ideal Self-Repair of Reverse Paths
[Figure: AVOID(L3,WS) announcement shown at three points in the topology]
Slide78
Do paths exist that AVOID problem?
LIFEGUARD repairs outages by instructing others to avoid particular routes.
Q: Do alternative routes exist?
A: Alternate policy-compliant paths exist in 90% of simulated AVOID(X,P) announcements.
  Simulated 10 million AVOIDs on actual measured routes.
Slide79
Practical Self-Repair of Reverse Paths
[Figure: ISPs and their routes to WS]
  ATT → WS
  Qwest → WS
  L3 → ATT → WS
  UW → L3 → ATT → WS
  Sprint → Qwest → WS
  AISP → Qwest → WS
Slide80
Practical Self-Repair of Reverse Paths
[Figure: ISPs and their routes to WS, before and after WS announces a poisoned path to approximate AVOID(L3,WS)]
Routes before:
  ATT → WS
  Qwest → WS
  L3 → ATT → WS
  UW → L3 → ATT → WS
  Sprint → Qwest → WS
  AISP → Qwest → WS
Routes after:
  WS → L3 → WS
  ATT → WS → L3 → WS
  Qwest → WS → L3 → WS
  UW → Sprint → Qwest → WS → L3 → WS
  Sprint → Qwest → WS → L3 → WS
  AISP → Qwest → WS → L3 → WS
  L3: ?
BGP loop prevention encourages switch to working path.
Slide81
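The poisoning trick on the preceding slide relies only on ordinary AS-path loop detection: the origin inserts the ISP to be avoided into its own announcement, so that ISP discards the route while everyone else still accepts it. A minimal sketch, with function names invented for the example:

```python
def poisoned_path(origin, avoid):
    """AS-path announced by the origin with the avoided AS sandwiched inside."""
    return (origin, avoid, origin)

def accepts(asn, as_path):
    """Standard loop detection: an AS rejects any path that already contains it."""
    return asn not in as_path
```

With `poisoned_path("WS", "L3")`, L3 rejects the announcement because it sees itself in the path, while ATT, Qwest, and the others accept it and keep routes that no longer go through L3.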
Other results
Results from real poisonings
  Poisoning in the wild / poisoning anomalies
  Case study of restoring connectivity
Making poisoning flexible
  Monitoring broken path while it is disabled
  Allowing ISPs w/o alternatives to use disabled route
LIFEGUARD’s scalability
  Overhead and speed of failure location
  Router update load if many ISPs deploy our approach
Alternatives to poisoning
  Compatibility with secure routing (BGPSEC, etc.)
  Comparing to other route control mechanisms
Slide82
Can poisoning approximate AVOID effects?
LIFEGUARD’s poisoning repairs outages by disabling routes to induce route exploration.
Q: Does poisoning disrupt working routes?
A: No. As I will describe:
  Under certain circumstances, we can disable a link without disabling the full ISP.
  We can speed BGP convergence by carefully crafting announcements.
Slide83
What if some routes in an ISP still work?
[Figure: animation built up over slides 83 through 93]
We only want C3 to change its route, to avoid A-B2
Forward direction is easy: choose a different route
Poisoning seems blunt, disabling an entire ISP
Selective advertising via just D1 is also blunt
If D1 and D2 (transitively) connect to different PoPs of A, selectively poison via D2 and not D1
Slide94
Can poisoning approximate AVOID effects?
LIFEGUARD’s poisoning repairs outages by disabling routes to induce route exploration.
Q: Does poisoning disrupt working routes?
A: No. As I will describe:
  “Selective poisoning” can avoid 73% of links without disabling entire AS.
    Real-world results from 5 provider BGP-Mux testbed
  We can speed BGP convergence by carefully crafting announcements.
Slide95
Naive Poisoning Causes Transient Loss
Some ISPs may have working paths that avoid problem ISP X
Naively, poisoning causes path exploration even for these ISPs
Path exploration causes transient loss
[Figure: animation over slides 95 through 102; label: AVOID(X,P)]
Slide103
Prepend to Reduce Path Exploration
Most routing decisions based on:
  (1) next hop ISP
  (2) path length
Keep these fixed to speed convergence
Prepending prepares ISPs for later poison
[Figure: animation over slides 103 through 107; label: AVOID(X,P)]
Slide108
Prepending Speeds Convergence
With no prepend, only 65% of unaffected ISPs converge instantly
With prepending, 95% of unaffected ISPs re-converge instantly, 98% in < 1/2 min.
Also speeds convergence to new paths for affected peers
Slide109
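One way to read "keep next hop and path length fixed" is that the origin prepends itself in its normal announcement so that the later poisoned announcement has the same length and the same first AS. The sketch below assumes a baseline with three copies of the origin; the helper names and that specific count are assumptions for illustration.

```python
def baseline_path(origin, copies=3):
    """Normal announcement, prepended so it is as long as the poisoned one."""
    return (origin,) * copies

def poisoned_path(origin, avoid):
    """Poisoned announcement with the avoided AS between two copies of the origin."""
    return (origin, avoid, origin)

def looks_unchanged(before, after):
    """True if the two paths agree on the two things most routing decisions use:
    the first AS (next hop ISP) and the path length."""
    return before[0] == after[0] and len(before) == len(after)
```

Switching from `baseline_path("WS")` to `poisoned_path("WS", "L3")` leaves both properties unchanged for ISPs that do not route through L3, so they have little reason to explore other paths.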
LIFEGUARD Summary
We increasingly depend on the Internet, but availability lags
Much of Internet unavailability due to long-lasting outages
LIFEGUARD: Let edge networks reroute around failures
Location challenge: Find problem, given unidirectional failures and tools that depend on connectivity
  Use reverse traceroute, isolate directions, use historical view
Avoidance challenge: Reroute without participation of transit networks
  BGP poisoning gives control to the destination
  Well-crafted announcements ease concerns
Slide110
Inter-Domain Routing Summary
BGP4 is the only inter-domain routing protocol currently in use world-wide
Issues?
  Lack of security
  Ease of misconfiguration
  Poorly understood interaction between local policies
  Poor convergence
  Lack of appropriate information hiding
  Non-determinism
  Poor overload behavior
Slide111
Lots of research into how to fix this
Security
  BGPSEC, RPKI
Misconfigurations, inflexible policy
  SDN
Policy Interactions
  PoiRoot (root cause analysis)
Convergence
  Consensus Routing
Inconsistent behavior
  LIFEGUARD, among others
Slide112
Why are these still issues?
Backward compatibility
Buy-in / incentives for operators
Stubbornness
Very similar issues to IPv6 deployment