Presentation Transcript

Slide1

RL with subsampling

Theoretical Analysis

Motivation

Challenges

Learning Caching Policies with Subsampling

Haonan Wang, Hao He, Mohammad Alizadeh, Hongzi Mao

MIT Computer Science and Artificial Intelligence Laboratory

Why a learning approach for caching policies?

Handcrafting an optimal policy can be tedious

Object size distributions and arrival patterns change over time

Optimize for different objectives (e.g., minimize overwrites for SSD-based caches)

Problem setup

Experiment

Large caches significantly delay the reward feedback

Typical CDN caches are hundreds of GBs and host many millions of objects

Reward feedback is delayed ~100× longer than in typical RL applications (e.g., AlphaGo only deals with MDPs of < 400 steps)

Cache admission setting (LRU for eviction)
State: size of incoming object, steps since last object visit, object total visit count, remaining cache size
Action: admit/drop
Reward: total byte hits since last action
Step: proceed until next cache miss
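
Below is a minimal Python sketch (not the authors' code) of this cache admission MDP, assuming the trace is a list of (object_id, size) requests; class and method names are illustrative. It serves hits under LRU, pauses at each cache miss to ask the policy for an admit/drop decision, and returns the total byte hits accumulated since the previous action as the reward.

from collections import OrderedDict

class CacheAdmissionEnv:
    """Cache admission MDP sketch: decide admit/drop on each miss; LRU handles eviction."""

    def __init__(self, trace, cache_size):
        self.trace = trace                # list of (object_id, size) requests
        self.cache_size = cache_size
        self.cache = OrderedDict()        # object_id -> size, most recently used last
        self.used = 0
        self.last_visit = {}              # object_id -> index of previous request
        self.visit_count = {}             # object_id -> total visits so far
        self.t = 0

    def reset(self):
        """Advance to the first miss; return its state and the pending request."""
        _, pending, state = self._run_to_next_miss()
        return state, pending

    def _run_to_next_miss(self):
        """Serve hits until a miss; return (byte hits, missed request, its state)."""
        byte_hits = 0
        while self.t < len(self.trace):
            obj, size = self.trace[self.t]
            steps_since = self.t - self.last_visit.get(obj, self.t)
            self.last_visit[obj] = self.t
            self.visit_count[obj] = self.visit_count.get(obj, 0) + 1
            self.t += 1
            if obj in self.cache:
                self.cache.move_to_end(obj)      # LRU: refresh recency on a hit
                byte_hits += size
            else:
                state = (size, steps_since, self.visit_count[obj],
                         self.cache_size - self.used)
                return byte_hits, (obj, size), state
        return byte_hits, None, None             # trace exhausted

    def step(self, admit, pending):
        """Apply admit/drop to the pending miss, then run until the next miss."""
        obj, size = pending
        if admit and size <= self.cache_size:
            while self.used + size > self.cache_size:   # evict LRU objects as needed
                _, evicted_size = self.cache.popitem(last=False)
                self.used -= evicted_size
            self.cache[obj] = size
            self.used += size
        reward, pending, state = self._run_to_next_miss()  # byte hits since last action
        done = pending is None
        return state, reward, done, pending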

Subsample the trace by hashing on object ID
Reduce the cache size proportionally
The caching statistics remain unchanged
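
A hedged sketch of the subsampling step: hash each object ID and keep only objects that fall in a sampled bucket, then shrink the cache capacity by the same factor. The hash function and the rate parameter below are illustrative assumptions; the point is that sampling whole objects (rather than individual requests) preserves each kept object's reuse pattern, so the caching statistics carry over to the smaller problem.

import hashlib

def subsample_trace(trace, cache_size, rate=0.01):
    """Keep requests whose object ID hashes into the sampled bucket;
    shrink the cache by the same factor."""
    threshold = int(rate * 2**32)
    sampled = []
    for obj_id, size in trace:
        # Deterministic per object: either all of an object's requests are kept or none.
        h = int.from_bytes(hashlib.md5(str(obj_id).encode()).digest()[:4], "big")
        if h < threshold:
            sampled.append((obj_id, size))
    return sampled, max(1, int(cache_size * rate))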

Train RL on the reduced caching problem (small cache, subsampled trace)

Generalize the policy to the larger problem (normalize remaining cache size to [0, 1])
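
A minimal illustration of the normalization that lets the small-cache policy transfer: divide the remaining cache size by the total capacity so the feature lies in [0, 1] on both the subsampled and the original problem. Whether the other state features are also rescaled is not stated on the slide; they are passed through unchanged here.

def normalize_state(obj_size, steps_since_visit, visit_count, remaining, cache_size):
    # Only the remaining-capacity feature is rescaled to [0, 1]; leaving the
    # other features unchanged is an assumption of this sketch.
    return (obj_size, steps_since_visit, visit_count, remaining / cache_size)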

As a reference, directly training RL on the small cache (both subsampled and original trace) reaches state-of-the-art performance

For large caches, only RL with subsampling trains successfully

State transition:

Reward function:
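
The equations on this slide are not preserved in the transcript. A reconstruction consistent with the setup above (one decision per cache miss, reward = total byte hits since the last action) could be written as:

s_t = \big(\mathrm{size}(o_t),\ \Delta_t,\ n_t,\ B_{\mathrm{free},t}\big), \qquad s_t \xrightarrow{\ a_t \in \{\text{admit},\,\text{drop}\}\ } s_{t+1}

where o_t is the object missed at decision step t, \Delta_t its steps since last visit, n_t its total visit count, and B_{\mathrm{free},t} the remaining cache size after the admit/drop action and the intervening LRU hits and evictions; and

r_t = \sum_{j\,:\ \text{request } j \text{ arrives between miss } t \text{ and miss } t+1} \mathbb{1}\big[o_j \in C_j\big]\,\mathrm{size}(o_j)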