NetAI 2018, Budapest, Hungary


IFS-RL: An Intelligent Forwarding Strategy Based on Reinforcement Learning in Named-Data Networking. Yi Zhang (1), Bo Bai (2), Kuai Xu (3), Kai Lei (1). (1) ICNLAB, SECE, Peking University; (2) Future Network Theory Lab, 2012 Labs, Huawei



Presentation Transcript

Slide1

NetAI 2018, Budapest, Hungary

IFS-RL: An Intelligent Forwarding Strategy Based on Reinforcement Learning in Named-Data Networking
Yi Zhang (1), Bo Bai (2), Kuai Xu (3), Kai Lei (1,*)
(1) ICNLAB, SECE, Peking University
(2) Future Network Theory Lab, 2012 Labs, Huawei
(3) Arizona State University

Slide2

Outline

- Introduction
  - Named-Data Networking (NDN)
  - Intelligent Forwarding Strategy
  - Reinforcement Learning (RL)
- Methodology
  - Basic Training Algorithm
  - Learning Granularity
  - Enhancement for Topology Change
- Preliminary Experiments
- Conclusions

Slide3

Introduction

Slide4

Introduction

Named-Data Networking (NDN)
- An Information-Centric Network (ICN) architecture
- Pull-based data delivery process
  - Triggered by user requests, i.e., Interest packets
- Request forwarding is driven by forwarding engines
  - Reachability information about different content items
  - Forwarding Information Base (FIB)
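To make the role of the FIB concrete, here is a minimal sketch of a name-based lookup. It assumes hierarchical, slash-separated names and longest-prefix matching, as is usual in NDN; the `Fib` class and its method names are hypothetical and not taken from the slides.

```python
from typing import Dict, List, Optional


class Fib:
    """Toy Forwarding Information Base: name prefix -> candidate interface ids."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[int]] = {}

    def add_route(self, prefix: str, interface: int) -> None:
        # Normalize "/video/" to "/video"; the root prefix stays "/".
        key = prefix.rstrip("/") or "/"
        faces = self._entries.setdefault(key, [])
        if interface not in faces:
            faces.append(interface)

    def lookup(self, name: str) -> Optional[List[int]]:
        """Return the interfaces of the longest matching prefix, or None."""
        components = [c for c in name.split("/") if c]
        # Try the full name first, then drop one trailing component at a time.
        for end in range(len(components), -1, -1):
            prefix = "/" + "/".join(components[:end])
            if prefix in self._entries:
                return self._entries[prefix]
        return None
```

A lookup may return several interfaces for one prefix; choosing among them is the job of the forwarding strategy.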

Slide5

Introduction (Cont.)

Interest Forwarding Process in NDN
- The forwarding plane enables each router to
  - Utilize multiple alternative interfaces
  - Measure the performance of each path
- Forwarding Strategy
  - For each Interest packet, select the optimal interface from multiple alternative interfaces

[Figure: an Interest packet is forwarded through one of interface 1, interface 2, ..., interface k]

Slide6

Introduction (Cont.)

Existing forwarding strategies
- Fixed control rules
- Simplified models of the deployed environment
- Fail to achieve optimal performance across a broad set of network conditions and application demands

Propose IFS-RL
- An intelligent forwarding strategy based on RL
- Determine a self-adaptive learning granularity
- Enhance the basic model to handle topology changes

Slide7

Methodology

Slide8

Basic Training Algorithm

Reinforcement Learning (RL) Framework
- Consists of an Agent and an Environment
- For a certain time step t
  - Observe state s_t
  - Choose action a_t
  - Receive reward r_t
  - Transition from state s_t to s_{t+1}
- The goal: maximize the expected cumulative discounted reward
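The cumulative discounted reward from step t is commonly written as G_t = r_t + γ r_{t+1} + γ² r_{t+2} + ..., with a discount factor γ between 0 and 1. The sketch below computes it for a finite reward sequence; the function name and the default γ = 0.99 are illustrative assumptions, since the slides do not state a discount factor.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Sum of rewards[k] * gamma**k, accumulated from the last reward backwards."""
    total = 0.0
    for reward in reversed(rewards):
        # Each step back in time discounts everything that follows by gamma.
        total = reward + gamma * total
    return total
```

The agent maximizes the expectation of this quantity over the trajectories induced by its policy.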

Slide9

Basic Training Algorithm (Cont.)

The IFS-RL Model
- Agent: Router
  - Implemented by Neural Networks (NNs)
  - Observe the network state (e.g., RTT and number of packets for each interface)
  - Determine the optimal forwarding interface
  - Use reward information to train the NNs
- Environment: Network

Slide10

Basic Training Algorithm (Cont.)

The IFS-RL Model (Cont.)
- State: s_t = (D_t, N_t) (average delay, number of Interest packets)
  - D_t = (d_1, d_2, ..., d_K); d_i: average delay of interface i (approximated by RTT)
  - N_t = (n_1, n_2, ..., n_K); n_i: number of Interest packets forwarded by interface i

[Figure: state inputs D_t and N_t]
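As an illustration of the state definition, the sketch below assembles s_t = (D_t, N_t) from per-interface measurements. The `InterfaceStats` container, the millisecond unit, and the choice to report a delay of 0.0 for an interface with no RTT samples are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class InterfaceStats:
    rtt_samples_ms: List[float]   # RTTs observed on this interface since the last decision
    interests_forwarded: int      # Interest packets sent through this interface


def build_state(stats: Sequence[InterfaceStats]) -> List[float]:
    """Return the flat state vector (d_1..d_K, n_1..n_K)."""
    delays = [
        sum(s.rtt_samples_ms) / len(s.rtt_samples_ms) if s.rtt_samples_ms else 0.0
        for s in stats
    ]
    counts = [float(s.interests_forwarded) for s in stats]
    return delays + counts
```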

Slide11

Basic Training Algorithm (Cont.)

The IFS-RL Model (Cont.)
- Action: choose an interface based on the learned policy μ
- Reward: negative average RTT of all packets between two consecutive actions
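The reward definition can be sketched as a one-line computation. The function name is hypothetical, and returning 0.0 when no packet came back between two actions is an assumption; the slides do not say how that case is handled.

```python
from typing import Sequence


def reward_from_rtts(rtts_ms: Sequence[float]) -> float:
    """Negative mean RTT of the packets seen between two consecutive actions."""
    if not rtts_ms:
        return 0.0  # assumption: neutral reward when nothing was measured
    return -sum(rtts_ms) / len(rtts_ms)
```

A lower average RTT therefore yields a larger (less negative) reward.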

Slide12

Basic Training Algorithm (Cont.)

The IFS-RL Model (Cont.)
- Policy π(s_t, a_t) (continuous domain)
- Deep Deterministic Policy Gradient (DDPG) [T. P. Lillicrap et al. '15]
- Actor-critic method

[Figure: Actor Net. and Critic Net., with 1-D Conv. Layer, Dense Hid. Layer, and Output Layer]
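DDPG, as generally described, trains the critic toward a bootstrapped target and keeps slowly moving target copies of both networks. The slides do not show these details, so the following is a generic sketch rather than the authors' implementation; the function names and the default values of `gamma` and `tau` are assumptions.

```python
from typing import List, Sequence


def td_target(reward: float, next_q: float, gamma: float = 0.99) -> float:
    """Critic regression target: r_t + gamma * Q'(s_{t+1}, mu'(s_{t+1}))."""
    return reward + gamma * next_q


def soft_update(
    target_params: Sequence[float],
    online_params: Sequence[float],
    tau: float = 0.001,
) -> List[float]:
    """Move each target-network parameter a fraction tau toward its online counterpart."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target_params, online_params)]
```

Here `next_q` stands for the target critic's value at the next state under the target actor's action; in a real system both would come from the neural networks.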

Slide13

Learning Granularity

Setting of learning granularity
- Massive numbers of packets to be processed
- Computation must keep up with packet arrival
- Make the learning granularity part of the action space
- Use the combination of selected interface and number of time intervals
- Action: (Interface, # time intervals)
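One way to picture the composite action is as a pair that can be enumerated over all interfaces and all allowed interval counts. The flat-index encoding below is purely hypothetical; the slides do not say how the actor output is mapped to the pair (Interface, # time intervals).

```python
from typing import NamedTuple


class Action(NamedTuple):
    interface: int       # index of the selected interface, starting at 0
    num_intervals: int   # learning granularity, counted in time intervals (>= 1)


def decode_action(index: int, num_interfaces: int, max_intervals: int) -> Action:
    """Map a flat index in [0, num_interfaces * max_intervals) to an Action."""
    if not 0 <= index < num_interfaces * max_intervals:
        raise ValueError("action index out of range")
    interface, offset = divmod(index, max_intervals)
    return Action(interface, offset + 1)
```

A larger `num_intervals` means fewer decisions per second, which lets the computation keep up with packet arrival at the cost of slower reaction.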

Slide14

Learning Granularity (Cont.)

IFS-RL Algorithm (considering the learning granularity)
1. Observe state information s_t = (D_t, N_t)
2. Take action a_t according to the learned policy μ
   - Selected interface i
   - Learning granularity T_lg
3. During the period of time T_lg
   - Forward all Interest packets through interface i
4. Calculate reward r_t
5. Update the NNs' parameters according to (s_t, a_t, r_t)
6. Start the next round of learning
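The steps above can be sketched as a loop over learning rounds. The policy, the forwarding step, and the parameter update are passed in as callables because the slides do not specify them; all names and type aliases here are hypothetical, and the empty-measurement reward of 0.0 is an assumption.

```python
from typing import Callable, List, Tuple

State = List[float]
Action = Tuple[int, int]                 # (interface i, learning granularity T_lg)
Observation = Tuple[State, List[float]]  # (next state, RTT samples seen during T_lg)


def run_ifs_rl(
    policy: Callable[[State], Action],
    forward: Callable[[int, int], Observation],
    update: Callable[[State, Action, float], None],
    state: State,
    rounds: int,
) -> List[float]:
    """Run the observe / act / forward / reward / update cycle and return the rewards."""
    rewards: List[float] = []
    for _ in range(rounds):
        action = policy(state)                            # step 2: pick (i, T_lg)
        interface, t_lg = action
        next_state, rtts = forward(interface, t_lg)       # step 3: forward for T_lg
        reward = -sum(rtts) / len(rtts) if rtts else 0.0  # step 4: negative mean RTT
        update(state, action, reward)                     # step 5: train on (s_t, a_t, r_t)
        rewards.append(reward)
        state = next_state                                # steps 6 and 1: next round
    return rewards
```

Because one decision covers a whole period T_lg, the number of policy evaluations is decoupled from the packet rate.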

Slide15

Enhancement for Topo. Change

Network Topology Changes
- Lead to dimensional changes of s_t and a_t
- Set input and output formats to span the maximum number of interfaces
  - E.g., ordinary routers with a maximum of 48 interfaces
  - Zero out unavailable interfaces
- Interpretation of the actor network's output
  - Apply a mask to the (softmax) actor network's output layer
  - 0-1 vector [m_1, m_2, ..., m_k]
  - p_i: normalized probability for action i
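The formula for p_i did not survive in the transcript. A common way to realize such a mask, assumed here, is to zero out the masked entries and renormalize over the available interfaces, i.e. p_i = m_i exp(z_i) / Σ_j m_j exp(z_j), where z_i is the raw actor output for interface i. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def masked_softmax(logits: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Softmax restricted to entries whose mask is 1; masked entries get probability 0."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one interface must be available")
    # Subtract the largest unmasked logit for numerical stability.
    peak = max(z for z, m in zip(logits, mask) if m)
    weights = [math.exp(z - peak) if m else 0.0 for z, m in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]
```

With a fixed maximum width such as 48, a topology change only flips entries of the mask; the network shapes stay the same.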

Slide16

Preliminary Experiments

Slide17

Experiment Results

Experiment setting
- Simulation experiments in ndnSIM
- Metrics: throughput and drop rate
- Compared with BestRoute [A. Afana et al.'12] and EPF [K. Lei et al.'15]
- Simulation topology:

[Figure: simulation topology with Consumer, routers R1 to R6, and Producer; links labeled with bandwidths of 4 Mbps, 7 Mbps, and 10 Mbps]

Slide18

Experiment Results (Cont.)

Simulation experiment
- Simulation topology
- Packet size
  - Interest packet: 40 bytes
  - Data packet: 1024 bytes
- 4 links between consumer and producer
  - With 1 link having smaller delay: R1-R3-R6

[Figure: simulation topology with Consumer, routers R1 to R6, and Producer; links labeled with delays of 40 ms, 10 ms, and 7 ms]

Slide19

Experiment Results (Cont.)

Experimental Results
- The consumer sends Interest packets at a constant rate of 1500 packets/s for 50 s

[Figure: throughput and drop rate plots, with the IFS-RL curves labeled]
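For scale, the request rate can be converted into offered load using the packet sizes given in the setup. The sketch assumes one 1024-byte Data packet per Interest, counts only the stated packet sizes, and uses 1 Mbps = 10^6 bit/s; the function name is hypothetical.

```python
def offered_load_mbps(rate_pps: float, packet_bytes: int) -> float:
    """Offered load in Mbps for a constant packet rate and packet size."""
    return rate_pps * packet_bytes * 8 / 1e6


interest_load = offered_load_mbps(1500, 40)    # upstream Interest traffic: 0.48 Mbps
data_load = offered_load_mbps(1500, 1024)      # downstream Data traffic: 12.288 Mbps
total_interests = 1500 * 50                    # 75000 Interests over the 50 s run
```

Under these assumptions the Data demand of about 12.3 Mbps exceeds the largest link bandwidth listed in the topology (10 Mbps).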

Slide20

Experiment Results (Cont.)

Link Utilization
- Load balance of IFS-RL is not the best
- Maximize throughput and minimize packet drop rate
- Tend to choose the interface with minimum RTT

[Figure: link utilization for BestRoute, EPF, and IFS-RL]

Slide21

Conclusion

Slide22

Conclusion

IFS-RL
- An intelligent forwarding strategy
  - Deep Reinforcement Learning (DRL)
  - Deep Deterministic Policy Gradient (DDPG)
- Learning granularity
  - Incorporate learning granularity into the action space
- Network topology changes
  - Set input and output formats to span the maximum number of interfaces
  - Introduce a softmax mask
- Simulation experiment
  - Achieve higher throughput and lower drop rate
  - Need improvement in load balancing

Slide23

Thank You!
Q&A
For implementation details, please contact Yi Zhang (1601214039@sz.pku.edu.cn)