PhD thesis defense Amit Mondal Committee Aleksandar Kuzmanovic Asst Professor Northwestern Univ Peter Dinda Assoc Professor Northwestern Univ Yan Chen Assoc Professor Northwestern ID: 648280
Slide1
Transport and Application Layer Approaches to Improve End-to-end Performance in the Internet
PhD thesis defense
Amit Mondal
Committee:
Aleksandar Kuzmanovic, Asst. Professor, Northwestern Univ
Peter Dinda, Assoc. Professor, Northwestern Univ
Yan Chen, Assoc. Professor, Northwestern Univ
Jin Li, Principal Researcher, Microsoft Research
Slide2
The Internet is a commercial infrastructure used by a diverse set of applications and services
Internet: a multiservice IP network
FTP
IPTV
VoIP
Video Conferencing
Streaming
Gaming
Slide3
Challenges involved…
Applications have end-to-end network performance requirements
Jitter, latency, packet loss, bandwidth, etc.
“low delay”
“high throughput”
Original Internet
Best effort service
No service assurance
TCP ensures only in-order packet delivery
Destination-based IP routing
Need to provide support to a new set of emerging applications in the Internet
Slide4
Application classification based on QoS
Latency sensitive, low bandwidth: VoIP, network games, SSH, chatting, web browsing, e-commerce
Latency sensitive, high bandwidth: Multimedia streaming, IPTV, audio/video conferencing
Latency insensitive, low bandwidth: Email
Latency insensitive, high bandwidth: File transfer (FTP/BitTorrent)
My focus:
Low-latency interactive TCP applications (Chapter II and III)
Telnet, SSH, network games, e-commerce, etc.
Interactive multimedia services (Chapter IV and V)
Audio/video conferencing, VoIP, streamed multimedia services, etc.
Slide5
The spectrum of QoS provisioning
Layers: Application, Transport, Network, Data Link, Physical
Columns: Infrastructure, Endpoint
Overlay routing (QRON, QSON, etc.), Chapter-IV
ECN, ECN+, packet marking & differential dropping, service differentiation, etc.
IntServ, DiffServ, traffic engineering, constraint-based routing, MPLS
Bandwidth over-provisioning
Forward error correction, bitrate adaptation, Chapter-V, etc.
TCP smart framing, limited transmit, early retransmit, Chapter-II, Chapter-III, etc.
N/A (three cells)
Slide6
Research thesis
Despite much work to improve end-to-end performance in the Internet, there still exists a significant space for improvement. In my dissertation, I develop techniques to reduce the gap further.
For example, I propose techniques that improve
Response times of short TCP flows by five times in certain scenarios
Median Mean Opinion Score (MOS) of VoIP calls over WiFi by a factor of two
Slide7
Outline
Chapter I: Introduction
Chapter II: Improving performance of thin-stream TCP applications
Chapter III: Removing exponential backoff from TCP
Chapter IV: Multi-constraint QoS routing framework
Chapter V: Audio/video performance issues: Diagnosis and solutions
Conclusion
Slide8
Chapter II: Improving thin-stream TCP flows
Upgrading mice to elephants
[Figure labels: data packets, “dummy” packets, strict priority, TCP-fair rate, packet switched, circuit switched]
A. Mondal and A. Kuzmanovic, “When TCP Friendliness Becomes Harmful”, IEEE INFOCOM 2007
A. Mondal and A. Kuzmanovic, “Upgrading Mice to Elephants: Effects and End-Point Solutions”, IEEE/ACM Transactions on Networking, Volume 18, Issue 2, April 2010
Slide9
Chapter III: Removing Exponential Backoff from TCP
V. Jacobson, “Congestion Avoidance and Control,” in ACM CCR, 18(4): 314-329, Aug 1988.
Exponential retransmit timer backoff
Implicit packet conservation principle
Response times of short and interactive flows improve by five times in certain scenarios
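To illustrate what removing the backoff changes, the following minimal sketch compares retransmission times with and without exponential timer backoff. It assumes a fixed 1 s base RTO, no RTT re-estimation between attempts, and a 60 s cap on the timer; `retransmit_times` and its parameters are hypothetical names for illustration, not the thesis implementation.

```python
def retransmit_times(rto, attempts, backoff=True, max_rto=60.0):
    """Return the times (seconds after the first send) of successive retransmissions.

    rto      -- base retransmission timeout in seconds (assumed fixed)
    attempts -- number of consecutive timeouts to simulate
    backoff  -- True doubles the timer after every timeout, False keeps it constant
    max_rto  -- cap on the timer (assumed value)
    """
    times = []
    now = 0.0
    current = rto
    for _ in range(attempts):
        now += current            # the timer expires, the packet is resent
        times.append(now)
        if backoff:
            current = min(current * 2, max_rto)
    return times


with_backoff = retransmit_times(1.0, 5, backoff=True)      # [1.0, 3.0, 7.0, 15.0, 31.0]
without_backoff = retransmit_times(1.0, 5, backoff=False)  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

After five consecutive timeouts the fifth retransmission goes out at 31 s with backoff and at 5 s without it, which is the kind of gap that matters for short and interactive flows.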
A. Mondal and A. Kuzmanovic, “Removing Exponential Backoff from TCP”, In ACM SIGCOMM CCR, Volume 38, Number 5, October 2008.
Slide10
Chapter IV: Multi-constraint QoS routing framework
We design a framework that finds paths under multiple constraints without NP-hard computation
Finding a path under multiple constraints with a Dijkstra-style search involves NP-hard computation
Hybrid of a path vector protocol and on-demand route discovery
Using simulation based on real-world data, we demonstrate that our solution is both efficient and scalable
Built a functional prototype using the Click modular router
A. Mondal, P. Sharma, S. Banerjee, and A. Kuzmanovic, “Supporting Application Network Flows with Multiple QoS Constraints”, In IEEE IWQoS 2009
Slide11
Chapter V: Audio/video performance issues: Diagnosis and solutions
Identify challenges towards high quality audio/video conferencing over the Internet
Understand loss and jitter behavior at short time scales and quantify the impact of various network scenarios
Investigate solutions
A. Mondal, R. Cutler, C. Huang, J. Li, and A. Kuzmanovic, “SureCall: Towards Glitch-Free Real-time Audio/Video Conferencing”, In IEEE IWQoS 2010
A. Mondal, C. Huang, M. Jain, J. Li, and A. Kuzmanovic, “A Case of WiFi Relay: Improving VoIP Quality for WiFi Users”, In IEEE ICC 2010
Slide12
Modern AV conferencing System
Slide13
SureCall platform
A distributed measurement and experiment platform
Understand problems and experiment with solutions
Agents installed on volunteers’ machines
Measurements and experiments driven by masters
SureCall agents are upgradeable without user intervention
Available from http://research.microsoft.com/~chengh/SureCall/SureCall.htm
Slide14
SureCall measurement
Emulated bidirectional audio/video sessions using UDP
5 minutes per hour
Audio bitrate: 24 kbps
Video bitrate: 192 kbps
STUN NAT traversal protocol for home users
Detailed packet-level traces collected
Network connectivity close to the clients
ICMP packet pair with TTL=2
Traceroute to other endpoint at the beginning and end of each session
Environmental details on client machines
CPU load, network interface type
Slide15
SureCall deployment
Microsoft global enterprise network
Many residential networks
Current deployment status
80 unique machines
Enterprise: 32
Home: 20
Both: 28
Enterprise trace and Home trace
Two separate masters (within enterprise network and in public Internet)
Slide16
SureCall dataset
4,800 hours of packet traces
4,100 from enterprise
700 from home
1,968 unique IP addresses
Enterprise: 1,212
Home: 756
Trace classification and stratification
Intra-continental vs inter-continental
Wired vs wireless
Audio-only vs audio+video
Trace preprocessing
Clock skew removal
[Figure: Clock skew in the wild]
Slide17
Jitter computation algorithm
Multiple algorithms to compute jitter
Variance of one-way-delay samples
Time difference between actual packet receiving time and ideal receiving time
Most relevant for multimedia streaming/conferencing with playout buffer
Slide18
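The two jitter definitions listed for the jitter computation algorithm can be sketched as follows. This is a minimal illustration, not the SureCall code: the function names are hypothetical, and the "ideal receiving time" is assumed here to be the send time plus the one-way delay of the first packet, which the slides do not specify.

```python
from statistics import pvariance


def jitter_variance(send_times, recv_times):
    """Definition 1: variance of the one-way-delay samples."""
    delays = [r - s for s, r in zip(send_times, recv_times)]
    return pvariance(delays)


def jitter_vs_ideal(send_times, recv_times):
    """Definition 2: actual receive time minus ideal receive time, per packet.

    Assumption: the ideal schedule is anchored at the first packet, so the
    ideal receive time of packet i is send_times[i] plus the first delay.
    """
    base_delay = recv_times[0] - send_times[0]
    return [r - (s + base_delay) for s, r in zip(send_times, recv_times)]


# Four packets sent every 20 ms; times in milliseconds (made-up values).
send = [0, 20, 40, 60]
recv = [50, 72, 95, 110]
lateness = jitter_vs_ideal(send, recv)   # [0, 2, 5, 0]
```

The second form maps directly onto a playout buffer: a packet whose lateness exceeds the buffer depth misses its playout deadline.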
Jitter in enterprise and residential networks
[Figure panels: US-US, wired traces; Inter-continental, wired traces]
Residential networks have significantly higher jitter than enterprise networks and are greatly affected by inter-continental links.
Slide19
Jitter variation across hosts
[Figure panels: Enterprise; Home]
Jitter variation is much higher in residential networks than in enterprise networks. The 95th percentile jitter values are significantly worse than median jitter values in home networks.
Slide20
Packet loss in residential and enterprise networks
Even well provisioned enterprise networks can become quite congested at short time scales.
Both enterprise and home networks show a long tail in the loss burst size distribution.
Slide21
Impact of WiFi connections
[Figure panels: Enterprise; Home]
In both enterprise and home networks, wireless traces show significantly worse jitter statistics than wired traces.
Slide22
Impact of WiFi connections
[Figure panels: Enterprise; Home]
In both enterprise and home networks, wireless traces show significantly worse jitter statistics than wired traces.
The degradation due to WiFi in enterprise scenarios is more severe than that in home scenarios.
Slide23
Impact of VPN on performance
[Figure panels: Jitter; Loss]
VPN connections cause more degradation than wireless connections.
Slide24
Can jitter predict future loss events?
Extent to which loss and jitter are correlated, i.e. whether an abrupt jitter increase can serve as a precursor of network congestion and predict future loss events
Audio/video conferencing applications can then take anticipatory action.
More than 10 ms average increase in end-to-end delay for the last three packets preceding a loss event
Enterprise networks: ~82%; home networks: ~80%
Slide25
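One way to check the delay-before-loss observation on a packet trace is sketched below. It is a hedged illustration only: the function name is hypothetical, and the baseline against which the "increase" is measured is assumed to be the session median delay, since the slides do not state which baseline was used.

```python
from statistics import median


def loss_precursor_fraction(delays_ms, threshold_ms=10.0, window=3):
    """Fraction of loss events preceded by a delay increase above the threshold.

    delays_ms -- per-packet one-way delay in ms, None for a lost packet
    A loss event starts at a lost packet whose predecessor was received.
    Assumption: the increase is measured against the median delay of the trace.
    """
    received = [d for d in delays_ms if d is not None]
    baseline = median(received)
    events = hits = 0
    for i, d in enumerate(delays_ms):
        if d is None and i > 0 and delays_ms[i - 1] is not None:
            prior = [x for x in delays_ms[max(0, i - window):i] if x is not None]
            if len(prior) < window:
                continue  # not enough received packets right before the loss
            events += 1
            if sum(prior) / window - baseline > threshold_ms:
                hits += 1
    return hits / events if events else 0.0


# Made-up trace: the first loss burst follows a delay ramp, the second does not.
trace = [20, 20, 21, 20, 35, 40, 45, None, None, 20, 20, 20, 21, 20, None, 20]
fraction = loss_precursor_fraction(trace)   # 0.5
```

An application could run the same windowed check online and, for example, raise its redundancy level when the condition fires.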
Correlation between loss burst size and jitter
[Figure panels: Enterprise; Home]
End-to-end delay increases significantly before loss events in both enterprise and home networks.
An increase in end-to-end delay is not a good indicator of loss burst size in enterprise networks.
Slide26
Network audio diagnostics
Concealed: percent of packets interpolated or extrapolated due to unrecovered packet loss
Stretched: percent of packets stretched via time compression
Classifier operates as follows
Supervised training with ground truth objectively determined by PESQ score
Slide27
Audio classifier performance
The classifier achieves a true positive rate > 80% and a false positive rate < 1% for T1 = T2 = 0.07.
Slide28
WiFi Relay: Improving VoIP Quality for WiFi Users
Large number of WiFi clients in both enterprise and residential networks
43% of enterprises provide only WiFi connections to their employees
36% use VoIP over WiFi
WiFi links can significantly degrade VoIP performance
Possible reasons
Dense deployment of APs, overloading of an AP, other wireless devices in the vicinity, etc.
Slide29
Effectiveness of redundancy
Passive analysis with voice packet replication
Replication ratio r = 2, 3, 4, or 5
Packet losses can be effectively mitigated using application-layer packet replication
Slide30
Overhead of replication
Typical audio packet size = 60 bytes
Encapsulated with RTP (12 bytes), UDP (8 bytes), IP (20 bytes), 802.11 MAC (28 bytes), and PHY (20 us for 802.11g) headers.
w/o ACK: air time = DIFS + PHY header + (60+76 bytes)/54 Mbps = 70 us
Replication ratio    Air time w/o ACK (us)    Air time w/ ACK (us)
1                    70                       102
2                    79                       111
3                    87                       120
4                    96                       128
Replicating audio packets at the application layer causes only a marginal increase in air time
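The air-time arithmetic above can be reproduced approximately with a few lines. This is a sketch under stated assumptions: the DIFS value of 28 us is an assumption (the slide does not give it), the 76 bytes of per-packet overhead and the 20 us PHY header are taken from the formula above, and `air_time_us` is a hypothetical helper. Because of the assumed DIFS, the results land within about 2 us of the table rather than matching it exactly.

```python
DIFS_US = 28.0        # assumed DIFS duration (not stated on the slide)
PHY_HEADER_US = 20.0  # PHY header time for 802.11g, from the slide
OVERHEAD_BYTES = 76   # per-packet header bytes used in the slide's formula
AUDIO_BYTES = 60      # typical audio payload, from the slide
RATE_MBPS = 54.0      # data rate, from the slide


def air_time_us(replication_ratio):
    """Air time without ACK, in microseconds, for r audio copies in one packet."""
    bits = (replication_ratio * AUDIO_BYTES + OVERHEAD_BYTES) * 8
    # bits divided by Mbit/s gives microseconds
    return DIFS_US + PHY_HEADER_US + bits / RATE_MBPS


per_copy_cost = air_time_us(2) - air_time_us(1)   # 480 bits / 54 Mbps, about 8.9 us
```

Each extra copy adds only the 60-byte payload, about 8.9 us, because the fixed DIFS and header cost is paid once per packet; this matches the roughly 9 us steps between rows of the table.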
Slide31
WiFi relay solution
Nearby wired endpoints as relays
Heavy replication between relays and wireless endpoints
No dedicated infrastructure
Slide32
Evaluation
Evaluated on SureCall platform
Upgrade SureCall clients to support relay
Simultaneous direct and relayed VoIP calls between each pair of SureCall agents
Apples-to-apples comparison
One-hop overlay (only one wireless endpoint)
Two-hop overlay (both endpoints are wireless)
Relay node selection based on enterprise internal database
Slide33
Impact of relay on jitter
No dedicated infrastructure, ordinary endpoints as relay nodes
[Figure panels: CDF of jitter difference at the 50th percentile; CDF of jitter difference at the 95th percentile]
Relay has negligible impact on end-to-end jitter
Slide34
Improvement with WiFi relay
Mean Opinion Score (MOS)
Calculated from packet loss rate and jitter (Cole et al. CCR’01)
Fixed de-jitter buffer of 100 ms
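As a rough illustration of how a MOS can be derived from delay and loss, the sketch below follows the E-model simplification described by Cole and Rosenbluth. Treat the details as assumptions: the impairment coefficients are the G.711-style values as commonly quoted from that paper and should be checked against the original, the codec used in the thesis is not stated on the slide, and the function names are hypothetical. Jitter enters through the de-jitter buffer, whose depth is added to the delay; packets arriving later than the buffer allows would be counted as lost by the caller.

```python
import math


def r_factor(delay_ms, loss_rate):
    """R-factor from mouth-to-ear delay (ms) and effective loss rate in [0, 1]."""
    late = 1.0 if delay_ms > 177.3 else 0.0
    delay_impairment = 0.024 * delay_ms + 0.11 * (delay_ms - 177.3) * late
    loss_impairment = 30.0 * math.log(1.0 + 15.0 * loss_rate)  # assumed G.711 terms
    return 94.2 - delay_impairment - loss_impairment


def mos(r):
    """Map an R-factor to a Mean Opinion Score using the E-model conversion."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)


def mos_for_call(network_delay_ms, loss_rate, buffer_ms=100.0):
    """MOS with a fixed de-jitter buffer added to the network delay."""
    return mos(r_factor(network_delay_ms + buffer_ms, loss_rate))


clean_call = mos_for_call(50.0, 0.0)
lossy_call = mos_for_call(50.0, 0.05)   # lower than clean_call
```

With this mapping, loss reduction translates directly into a higher score, which is why a relay that removes most WiFi losses can move the median MOS substantially.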
WiFi relay significantly improves VoIP quality for WiFi users
WiFi relay greatly reduces packet loss
Slide35
Summary of Chapter V
SureCall, a distributed experimental platform, to address the challenges of audio/video communications over the Internet.
Characterized enterprise and residential networks over a wide variety of network scenarios
Classifier that accurately predicts when network issues are most likely to cause audio quality degradation
WiFi relay that significantly improves VoIP quality for WiFi clients
Slide36
Conclusion
Proposed easily deployable techniques to improve performance of TCP-based interactive applications
Demonstrated that exponential backoff can be removed from TCP altogether without any stability issues
Designed an overlay framework to support multimedia services with multiple QoS constraints
Developed a distributed experimental framework, SureCall, to understand the challenges towards IP-based audio/video communications and for rapid evaluation of new protocols
Slide37
Thank you!
Slide38
Publications
[1] A. Mondal and A. Kuzmanovic, “When TCP Friendliness Becomes Harmful”, In IEEE INFOCOM 2007
[2] A. Mondal and A. Kuzmanovic, “A Poisoning-Resilient TCP Stack”, In IEEE ICNP 2007
[3] A. Mondal and A. Kuzmanovic, “Removing Exponential Backoff from TCP”, In ACM SIGCOMM CCR, Volume 38, Number 5, October 2008.
[4] A. Mondal, P. Sharma, S. Banerjee, and A. Kuzmanovic, “Supporting Application Network Flows with Multiple QoS Constraints”, In IEEE IWQoS 2009
[5] A. Kuzmanovic, A. Mondal, S. Floyd, and K.K. Ramakrishnan, “Adding Explicit Congestion Notification (ECN) Capabilities to TCP’s SYN/ACK Packets”, RFC 5562, June 2009.
[6] A. Mondal and A. Kuzmanovic, “Upgrading Mice to Elephants: Effects and End-Point Solutions”, In IEEE/ACM Transactions on Networking, Volume 18, Issue 2, April 2010
[7] A. Mondal, R. Cutler, C. Huang, J. Li, and A. Kuzmanovic, “SureCall: Towards Glitch-Free Real-time Audio/Video Conferencing”, In IEEE IWQoS 2010
[8] A. Mondal, C. Huang, M. Jain, J. Li, and A. Kuzmanovic, “A Case of WiFi Relay: Improving VoIP Quality for WiFi Users”, In IEEE ICC 2010
[9] A. Mondal, I. Trestian, Z. Quin, and A. Kuzmanovic, “P2P as CDN (Akamizing BitTorrent)”, under submission
[10] J. Miller, A. Mondal, R. Potharaju, P. Dinda, and A. Kuzmanovic, “Network Monitoring is People: Understanding End-user Perception of Network Problems”, under submission
Slide39
Backup slides
Slide40
QoS and the Internet
QoS Architectures
Integrated Services (IntServ)
Differentiated Services (DiffServ)
Multiprotocol Label Switching (MPLS)
Traffic engineering and constraint-based routing
Key Challenges
Scalability issues in core
Complex signaling protocols
Deployment overhead
Current Internet still offers only a best-effort service
Motivates investigating easily deployable solutions that improve end-to-end network performance
Slide41
QoS using transport and application layer techniques without network support
Explicit congestion notification [Floyd ’94]
Packet marking and differential dropping [Guo and Matta ’01]
Limited transmit [Allman et al. ’01]
Service differentiation [Noureddine and Tobagi ’02]
Differential congestion notification [Le et al. ’04]
TCP smart framing [Mellia et al. ’05]
ECN+ [Kuzmanovic ’05]
Early retransmit [Allman et al. ’06]
TCP SAReno [Yang and de Veciana ’02]
PCP [Anderson et al. ’06]
Slide42
Going beyond TCP-fair
Differentiated minRTO
Application-limited flows use reduced minRTO value
Short-term padding with dummy packets
Application data followed by three tiny dummy packets
Diversity approach
Application layer FEC-based approach
The simplest FEC scheme is replication
Slide43
Why Exponential Backoff?
Jacobson adopted exponential backoff from the classical shared-medium Ethernet protocol
“IP gateway has essentially the same behavior as Ether in a shared-medium network.”
Slide44
Why Exponential Backoff?
Jacobson adopted exponential backoff from the classical shared-medium Ethernet protocol
“IP gateway has essentially the same behavior as Ether in a shared-medium network.”
Not true!
Slide45
Removing exponential backoff from TCP and its implications
Other reasons: no admission control, finite flow size, skewed traffic distribution, etc.
When to resend a packet?
Implicit packet conservation principle
As soon as the retransmission timeout expires
End-to-end performance can only improve if we remove the exponential backoff from TCP
Implications
Significant improvement of response times for short and interactive TCP flows
Slide46
Multiple QoS Constraints
The Internet evolves towards a global multiservice IP network
Diverse applications and different QoS requirements
Many applications have multiple QoS requirements
Video streaming, VoIP, video conferencing, etc.
Need support for end-to-end QoS guarantees under multiple constraints
Multiple QoS constraints often make the routing problem intractable
Slide47
QoS provisioning using overlay networks
Build Overlay Backbone
Deploy overlay nodes at strategic locations in the Internet
Provide support for per-flow forwarding
e.g. Anagran Flow Aware Routers
Flow route management architecture
Discover and set up end-to-end paths for individual flows with diverse flow QoS requirements
Monitor end-to-end flow performance to trigger path adaptation
Slide48
Overlay flow QoS management architecture
[Figure: overlay nodes spanning AS1, AS2, AS3 and AS4; legend: end user, overlay node, physical link, logical link]
Sensing local link characteristics
Find a path to X with b/w > b, delay < d and loss < l%
Configure intermediate overlay nodes for per-flow forwarding
Adapt to a different path dynamically when the current path fails to meet QoS parameters
Slide49
Contribution
Design a scalable QoS routing protocol which finds paths under multiple constraints
Propose a distributed algorithm for dynamic path adaptation
Evaluate accuracy, efficiency and scalability of the protocol using large-scale simulation and compare with other existing approaches
Build a functional prototype using the Click modular router
Slide50
Design challenges
Multiple QoS metrics
Finding a feasible path under multiple constraints is NP-complete; Dijkstra’s algorithm optimizes only a single metric
Randomized and approximation algorithms
Single composite metric derived from multiple metrics
Paths might not meet individual QoS constraints
Dynamic overlay-link properties
Increases control message overhead
Slide51
Multi-constraint QoS routing protocol
Path vector protocol to disseminate path information
Tag with QoS parameters
How to aggregate path information when multiple QoS metrics are considered?
Distribute the best paths for each metric
What about QoS requests which could be served by paths which are not in the best path set?
On-demand route discovery
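The aggregation rule above (advertise only the best path per metric) can be sketched as follows. The sketch is illustrative, not the MCQoS implementation: the type and function names are hypothetical, the candidate values are loosely modelled on the tuples in the dissemination figure, and the way metrics compose along a path (delay adds, loss compounds, bandwidth is the bottleneck) is an assumption not spelled out on the slides.

```python
from typing import NamedTuple, Tuple


class PathQoS(NamedTuple):
    hops: Tuple[str, ...]
    delay_ms: float
    loss: float      # loss probability in [0, 1]
    bw_kbps: float


def extend(link: PathQoS, path: PathQoS) -> PathQoS:
    """Prepend a local link to an advertised path (assumed composition rules)."""
    return PathQoS(
        link.hops + path.hops,
        link.delay_ms + path.delay_ms,                # delays add up
        1.0 - (1.0 - link.loss) * (1.0 - path.loss),  # losses compound
        min(link.bw_kbps, path.bw_kbps),              # bottleneck bandwidth
    )


def best_per_metric(paths):
    """Keep one path per QoS metric; only these would be advertised upstream."""
    return {
        "delay": min(paths, key=lambda p: p.delay_ms),
        "loss": min(paths, key=lambda p: p.loss),
        "bw": max(paths, key=lambda p: p.bw_kbps),
    }


# Hypothetical candidate paths to destination X.
candidates = [
    PathQoS(("AS1", "X"), 2.0, 0.0001, 128.0),
    PathQoS(("AS3", "X"), 3.0, 0.00005, 378.0),
    PathQoS(("AS5", "X"), 5.0, 0.0001, 768.0),
]
advertised = best_per_metric(candidates)
local_link = PathQoS(("A",), 1.0, 0.0, 1000.0)
via_as1 = extend(local_link, advertised["delay"])
```

Any path that is not best in at least one metric is dropped from the advertisement, which is exactly why some feasible requests later need on-demand route discovery.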
A. Mondal, P. Sharma, S. Banerjee, and A. Kuzmanovic, “Supporting Application Network Flows with Multiple QoS Constraints”, In IEEE IWQoS 2009
Slide52
MCQoS: Disseminating path information
[Figure: nodes A and B exchanging a QoS path table for destination X, one entry per metric]
Delay: X, AS1 (2ms, 0.01%, 128Kbps), AS3 (3ms, 0.02%, 378Kbps), AS5
Loss: X, AS1 (2ms, 0.0%, 128Kbps), AS3 (3ms, 0.005%, 378Kbps), AS5
B/w: X, AS1 (10ms, 0.01%, 1Mbps), AS3 (5ms, 0.01%, 768Kbps), AS5
Local link info
Tag QoS characteristics
Advertise best path for each QoS metric
Slide53
MCQoS: Aggregating path information
[Figure: delay vs. bandwidth (b/w) plane divided into feasible, infeasible and undecidable regions by the best-b/w and best-delay paths]
The source node already knows a path if the QoS request falls in the feasible region
There cannot exist a feasible path in the network if the QoS request falls in the infeasible region
What about QoS requests in the undecidable region?
There will be feasible requests that can be supported, but the source node might not know about those paths and thus cannot admit flows based on local information
Slide54
MCQoS: On-demand route discovery
Admit or deny flow based on local QoS table if the request is in the feasible or infeasible region
Otherwise, on-demand route discovery for requests in the undecidable region
Exploit advertisements received from neighbors to reduce the search space during route discovery
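The admission decision described above can be sketched for the two-metric (delay, bandwidth) case. This is a minimal illustration with hypothetical names; the two table entries are made-up values chosen so that the four requests behave like the ones in the illustration slide, and the real protocol also handles loss.

```python
def classify(request, known_paths):
    """Classify a QoS request against the locally known best paths.

    request     -- (max_delay_ms, min_bw_mbps)
    known_paths -- list of (delay_ms, bw_mbps), the best path per metric
    """
    max_delay, min_bw = request
    if any(d <= max_delay and b >= min_bw for d, b in known_paths):
        return "feasible"        # a known path already satisfies the request
    best_delay = min(d for d, _ in known_paths)
    best_bw = max(b for _, b in known_paths)
    if max_delay < best_delay or min_bw > best_bw:
        return "infeasible"      # no path can beat the best value of a metric
    return "undecidable"         # needs on-demand route discovery


# Hypothetical local table: best-delay path and best-b/w path.
table = [(10, 12), (100, 50)]
decisions = [classify(r, table) for r in [(10, 3), (120, 15), (10, 100), (15, 15)]]
# ['feasible', 'feasible', 'infeasible', 'undecidable']
```

Only the last request triggers route discovery; the first three are answered from the local table without any extra messages.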
[Figure: delay vs. b/w plane with feasible, infeasible and undecidable regions; nodes A, B, C, D, E]
Slide55
MCQoS: Illustration through example
[Figure: overlay nodes A, B, C, D, E with best-delay and best-b/w entries, including (10ms, 12Mbps), (100ms, 50Mbps), (2ms, 5Mbps), (8ms, 20Mbps), (4ms, 5Mbps), (105ms, 50Mbps), (5ms, 5Mbps), (106ms, 50Mbps); other labels: (2ms, 20Mbps), (5ms, 100Mbps), (1ms, 100Mbps)]
Requests:
10ms, 3Mbps: OK, path ABD…E
120ms, 15Mbps: OK, path ABC…E
10ms, 100Mbps: X, no path
15ms, 15Mbps: ??? at first, then OK, path ABD…E
Slide56
Route maintenance in MCQoS
Route maintenance through path patching
Each intermediate node knows the QoS requirements from the node to the destination
Upstream node periodically pushes QoS requirements to downstream nodes
When a node detects a QoS violation, it triggers an alternate path search locally
Notify upstream node if no alternative path exists
[Figure: example topology with nodes A through H]
Slide57
Overhead analysis of path dissemination
[Figure: example graph with nodes 1 to 6 and weighted links]
In the MCQoS protocol, a node advertises only the best path to a destination. Many alternative paths are thus pruned, which improves scalability.
Slide58
Overhead analysis of on-demand route discovery
Parameters
Average out-degree of the nodes
Overlay distance from source to destination
Worst case
Message overhead is proportional to the sum of all possible path lengths from source to destination
Amortized cost
Fraction of requests in the undecidable region
Limit the number of hops of route discovery
More than 99% of the undecidable region is discovered within 5 hops from the source node, so the amortized cost is significantly lower than in the worst-case scenario.
Slide59
Experimental evaluation of MCQoS
Built an event-driven simulator
Generated random flat topology of nodes using GT-ITM
Outdegree min(10, size/2)
Assigned link metrics from actual PlanetLab link measurement data
Slide60
Convergence time of path dissemination
Convergence time: how long does it take to stabilize for a given network snapshot?
Re-stabilization time: how long does it take to stabilize once a link metric changes?
QRON: link-state based multi-QoS routing protocol using the composite metric approach
Being a path vector based protocol, MCQoS takes longer to converge, but it does not involve any NP-hard computation and thus scales with network size
Slide61
Message overhead of path dissemination
Message overhead of MCQoS is comparable to that of the link-state based (QRON) protocol
Slide62
Elaborating the undecidable region
Global feasible region: feasible region at the source node if the source node knew all alternative paths, as in a link-state protocol
Depletion area: part of the global feasible QoS region not known at the source node because many alternate paths are suppressed
K-hop path: paths in the undecidable region discovered within k hops of the on-demand route discovery process
Slide63
Overhead of on-demand path discovery
How many hops does it take to discover the entire depletion area?
We measure the fraction of depletion area discovered within k hops from the source node
More than 90% of the depletion area is discovered within 3 hops
Slide64
Improvement in accuracy by MCQoS
A feasible path with a composite metric might not satisfy individual QoS metrics.
The line-segment based approach often suffers from loss/distortion.
Our hybrid approach has no false positives, and the false negative percentage can be reduced to less than 1% by 3-hop on-demand route discovery.
Slide65
QoS violation ratio in dynamic environment with MCQoS
100-node topology
Generate QoS requests at a given arrival rate with b/w in [5Mbps, 55Mbps] and delay in [100ms, 400ms]
Each flow lasts between 5 and 10 minutes
We simulate the network behavior for 10 minutes
New flows arrive before the network stabilizes
Expect to observe QoS violations
Arrival rate (conn/sec)    60      120     240     300     600
Violation ratio (%)        0.32    0.33    0.78    0.4     1.12
The QoS violation ratio is negligible even with an arrival rate of 600 conn/sec.
Slide66
MCQoS enabled overlay node prototype
[Figure: prototype architecture. Control plane: MCQoS module with QoS path table, fed by local link characteristics, path advertisements from peers, route discovery requests and replies, and flow setup requests. Data plane: Click router (DataIn, DataOut) with a flow table mapping flow id (Y:p -> X:q) to next hop (C). QoS path setup message: (Y:p -> X:q, D ms, L %, B Kbps)]
Slide67
Summary
Designed a scalable multi-constraint QoS flow route management protocol
Hybrid approach of path vector routing and on-demand route discovery
Keeps a balance between flow setup time and control message overhead
No complex NP-hard computation
Performed large-scale simulations to demonstrate the efficiency and scalability of the approach
Built a prototype using the Click modular router
Slide68
‘Composite Metric’ approach to multi-QoS routing (1/2)
Composite metric = k1*delay + k2/bw, where k1 = 1, k2 = 10^7, delay in sec, b/w in bps
False positive: flow is admitted but the path does not meet the QoS
False negative: there exists a feasible path but the flow is not admitted
Slide69
‘Composite Metric’ approach to multi-QoS routing (2/2)
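A small worked example shows how a composite metric can misjudge a request. It uses the composite formula with k1 = 1 and k2 = 10^7 as defined for this approach; the two candidate paths, the request, and the function names are hypothetical values chosen for illustration.

```python
K1 = 1.0
K2 = 1e7


def composite(delay_s, bw_bps):
    """Composite metric: k1 * delay + k2 / bandwidth (lower is better)."""
    return K1 * delay_s + K2 / bw_bps


def meets(path, max_delay_s, min_bw_bps):
    """Check a path against the individual QoS constraints."""
    delay_s, bw_bps = path
    return delay_s <= max_delay_s and bw_bps >= min_bw_bps


# Hypothetical paths as (delay in seconds, bandwidth in bps).
paths = {"P1": (0.10, 10e6), "P2": (0.30, 50e6)}
best = min(paths, key=lambda name: composite(*paths[name]))   # "P2"

# Hypothetical request: delay <= 150 ms and bandwidth >= 5 Mbps.
request = (0.15, 5e6)
best_ok = meets(paths[best], *request)     # False: P2 is too slow
other_ok = meets(paths["P1"], *request)    # True: P1 would have worked
```

A router that keeps only the composite-best path either admits the flow on P2 and violates the delay bound (a false positive) or rejects it although P1 is feasible (a false negative).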
Slide70
‘Line Segment’ approach to multi-QoS routing (1/2)
Lui et al. proposed a line-segment based approach for topology aggregation in the delay-bw plane.
Tam et al. designed a distance vector based QoS protocol using the line-segment approach
False positive: Fraction of undecidable region that is actually infeasible, but the approach labels as feasible.
False negative: Fraction of undecidable region that is feasible, but the approach labels as infeasible.
Slide71
‘Line Segment’ approach to multi-QoS routing (2/2)