Maintaining Large And Fast Streaming Indexes On Flash

Presentation Transcript

Slide 1

Maintaining Large And Fast Streaming Indexes On Flash

Aditya Akella, UW-Madison
First GENI Measurement Workshop
Joint work with Ashok Anand, Steven Kappes (UW-Madison) and Suman Nath (MSR)

Slide 2

Memory & storage technologies


Question:

What is the role of emerging memory/storage technologies in supporting current and future measurements and applications?

This talk:

Role of flash memory in supporting applications/measurements that need large streaming indexes; improving current apps and enabling future apps

Slide 3

Streaming stores and indexes

Motivating apps/scenarios
- Caching, content-based networks (DOT), WAN optimization, de-duplication
- Large-scale & fine-grained measurements
  - E.g., IP Mon: compute per-packet queuing delays
  - Fast correlations across large collections of NetFlow records

Index features
- Streaming: data stored in a streaming fashion; maintain an online index for fast access
  - Expire old data, update the index constantly
- Large size: data store ~ several TB, index ~ 100s of GB
- Need for speed (fast reads and writes)
  - Impacts usefulness of caching applications, timeliness of fine-grained TE

Slide 4

Index workload

Key aspects
- Index lookups and writes are random
- Equal mix of reads/writes
- New data replaces some old data → fast, constant expiry

Index data structures
- Tree-like (B-tree) and log structures not suitable
  - Slow lookup (e.g., log(n) complexity in trees)
  - Poor support for flexible, fast garbage collection
- Hash tables ideal…
- …but current options for large streaming hash tables are not optimal

Slide 5

Current options for >100 GB hashtables

- DRAM: large DRAMs are expensive and can get very hot
- Disk: inexpensive, but too slow
- Flash provides a good balance between cost, performance, and power efficiency…
  - Bigger and more energy efficient than DRAM
  - Comparable to disk in price
  - >2 orders of magnitude faster than disk, if used carefully
- But… need appropriate data structures to maximize flash effectiveness and overcome its inefficiencies

Slide 6

Flash properties

Flash chips
- Layout: a large number of blocks (128 KB); each block has multiple pages (2 KB)
- Read/write granularity: page; erase granularity: block
- Read page: 50 µs, write page: 400 µs, block erase: 1 ms
- Cheap: any read (including random), sequential write
- Expensive: random writes/overwrites, sub-block deletion
  - Requires movement of valid pages out of the block to erase it

SSDs: disk-like interface for flash
- Sequential/random read, sequential write: 80 µs
- Random write: 8 ms


- Flash is good for hashtable lookups
- Insertions are hard → small random overwrites
- Expiration is hard → small random deletes

Slide 7

BufferHash data structure

- Batch expensive operations (random writes and deletes) on flash
- Maintain a hierarchy of small hashtables
  - Maintain the upper levels in DRAM
- Efficient insertion
  - Accumulate random updates in memory
  - Flush accumulated updates to the lower level on flash (at the granularity of a flash page)
- Efficient deletion
  - Delete in batch (at flash block granularity)
  - Amortizes deletion cost
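A minimal sketch of this batching idea (the names `PAGE_ENTRIES`, `Buffer`, and `FlashLog` are hypothetical, not the authors' code): random updates accumulate in a small in-DRAM hashtable, and a full buffer is flushed to flash as a single page-granularity sequential write.

```python
PAGE_ENTRIES = 64        # hypothetical: entries that fit in one 2 KB flash page

class FlashLog:
    """Append-only page store standing in for raw flash."""
    def __init__(self):
        self.pages = []

    def append_page(self, entries):
        self.pages.append(dict(entries))   # one sequential page write
        return len(self.pages) - 1         # page address

class Buffer:
    """Small in-DRAM hashtable that batches random updates."""
    def __init__(self, flash):
        self.table, self.flash = {}, flash

    def insert(self, key, value):
        self.table[key] = value
        if len(self.table) >= PAGE_ENTRIES:      # buffer full:
            self.flash.append_page(self.table)   # flush in one batched write
            self.table.clear()                   # start a fresh incarnation
```

Deletion is batched the same way: instead of deleting individual entries, whole pages (and, with the layout described on the later slides, whole blocks) are dropped at once.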

Slide 8

Handling small random updates

[Figure: a hash key is split into K bits and N bits; the K bits select one of 2^K in-DRAM buffers (HT index), and the N bits form the key within that hashtable (HT key). Full buffers move to per-buffer "super tables" on flash, each fronted by a bit-sliced Bloom filter.]

- Buffer small random updates in DRAM as 2^K small hashtables (buffers); each table uses N-bit keys
- When a hashtable is full, write it to flash without modifying existing data
- Each "super table" is a collection of small hashtables: different "incarnations" over time of the same buffer
- How to search them? Use (bit-sliced) Bloom filters
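A sketch of the key split (the constant `K` and the choice of SHA-1 are illustrative assumptions): the top K bits of the hashed key select the buffer/super table, and the remaining N bits are the key stored inside that small hashtable.

```python
import hashlib

K = 4                                  # hypothetical: 2**K = 16 buffers

def split_key(name: bytes):
    """Hash a data name into <k1, k2>: k1 = HT index, k2 = HT key."""
    h = int.from_bytes(hashlib.sha1(name).digest(), "big")   # 160-bit hash
    k1 = h >> (160 - K)                # top K bits choose the buffer
    k2 = h & ((1 << (160 - K)) - 1)    # remaining N bits key the table
    return k1, k2

buffers = [{} for _ in range(2 ** K)]  # the 2^K in-DRAM hashtables
k1, k2 = split_key(b"some data")
buffers[k1][k2] = "payload"
```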

Slide 9

Lookup
- Let key = <k1, k2>
- Check the k1'th hashtable in memory for the key k2
- If not found, use the Bloom filters to decide which hashtable h of the k1'th super table may contain the key k2
- Read and check that hashtable (e.g., in the h'th page of the k1'th block of flash)
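A sketch of this lookup path (the `Bloom` class and the structure names are illustrative assumptions; BufferHash's actual filters are bit-sliced): the in-DRAM buffer is checked first, and per-incarnation Bloom filters then gate the flash page reads.

```python
import hashlib

class Bloom:
    """Tiny Bloom filter (illustrative; the real design bit-slices these)."""
    def __init__(self, m=1024, k=3):
        self.bits, self.m, self.k = 0, m, k

    def _positions(self, key):
        for i in range(self.k):
            d = hashlib.sha1(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(d, "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        return all(self.bits >> p & 1 for p in self._positions(key))

def lookup(k1, k2, buffers, blooms, flash_pages):
    """Find k2: DRAM buffer first, then Bloom-gated flash incarnations."""
    if k2 in buffers[k1]:                 # 1. k1'th in-memory hashtable
        return buffers[k1][k2]
    for h, bf in enumerate(blooms[k1]):   # 2. one filter per incarnation
        if bf.might_contain(k2):          # a "no" skips the flash read
            page = flash_pages[k1][h]     # 3. read the h'th page on flash
            if k2 in page:                # false positives re-check here
                return page[k2]
    return None
```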

Slide 10

Expiry of hash entries

- A super table is a collection of hashtables; expire the oldest hashtable from a super table
- Option 1: use a flash block as a circular queue
  - Super table = flash block, hashtable = flash page
  - Delete the oldest hashtable incarnation (page) and replace it with a new one
  - If a flash block has p pages, the super table holds the p latest hashtables
  - Problem: a page cannot be deleted independently without erasing the whole block (which requires copying the other pages)

Slide 11

Handle expiry of hash entries

- Interleave pages from different super tables when writing to flash or SSD
- Instead of laying out each flash block as the consecutive incarnations of one super table:

  1 2 3 4 | 1 2 3 4 | 1 2 3 4 | 1 2 3 4

  do this: fill each block with the same incarnation from every super table:

  1 1 1 1 | 2 2 2 2 | 3 3 3 3 | 4 4 4 4

- Advantage: batch deletion of multiple oldest incarnations (a single block erase expires the oldest incarnation of every super table)
- Other flexible expiration policies can also be supported
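The two layouts from the figure, written out as a sketch with four 4-page blocks; the tuple `(t, i)` marks incarnation i of super table t (names are illustrative, not the authors' code):

```python
supertables  = [1, 2, 3, 4]        # super table ids
incarnations = [1, 2, 3, 4]        # 1 = oldest, 4 = newest

# "Instead of": one block per super table. Expiring the oldest incarnation
# touches one page in every block, so no whole-block erase is possible.
per_table = [[(t, i) for i in incarnations] for t in supertables]

# "Do this": one block per incarnation. Erasing a single block expires the
# oldest incarnation of all super tables at once.
interleaved = [[(t, i) for t in supertables] for i in incarnations]

oldest = interleaved.pop(0)        # one block erase = batched deletion
```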

Slide 12

Insertion
- Key = <k1, k2>
- Insert into the k1'th in-memory hashtable, using k2 as the key
- If the hashtable is full:
  - Expire the tail hashtable in the k1'th super table; this expires the oldest incarnation from all super tables
  - Copy the k1'th hashtable from memory to the head of the k1'th super table
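Putting the pieces together, a sketch of the insertion path (the structure names and the `MAX_BUFFER` constant are assumptions; head = newest incarnation, tail = oldest):

```python
MAX_BUFFER = 64   # hypothetical buffer capacity (one flash page worth)

def insert(k1, k2, value, buffers, supertables):
    """Insert <k1, k2>; on overflow, expire the tail and flush the buffer."""
    buffers[k1][k2] = value
    if len(buffers[k1]) >= MAX_BUFFER:        # k1'th buffer is full
        # With the interleaved layout, expiring the tail is one block erase
        # that drops the oldest incarnation of every super table.
        for st in supertables:
            if st:
                st.pop()                      # expire tail (oldest) incarnation
        # Flush the full buffer to the head of the k1'th super table.
        supertables[k1].insert(0, dict(buffers[k1]))
        buffers[k1].clear()
```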

Slide 13

Benchmarks

- Prototyped BufferHash on 2 SSDs and a hard drive
- 99th percentile read and write latencies under 0.1 ms
  - Two orders of magnitude better than disks, at roughly similar cost
- Built a WAN accelerator that is 3X better than current designs
- Theoretical results on tuning BufferHash parameters
  - Low Bloom filter false positives, low lookup cost, low deletion cost on average
  - Optimal buffer size

Slide 14

Conclusion

- Many emerging apps and important measurement problems need fast streaming indexes with constant reads, writes, and eviction
- Flash provides a good hardware platform for maintaining such indexes
- BufferHash helps maximize flash effectiveness and overcome its inefficiencies
- Open issues:
  - Role of flash in other measurement problems/architectures?
  - Role of other emerging memory/storage technologies (e.g., PCM)?
  - How to leverage persistence?

Slide 15

I/O operations

API
- Data store/index: StoreData(data)
  - Add data to the store; create/update the index with data_name
- Data store: address ← Lookup(data_name)
- Data store: data ← ReadData(address)
- Data store/index: ExpireOldData()
  - Remove old data from the store; clean up the index

Workload
- data_name is a hash over the data
- Index lookups and writes are random
- Equal mix of reads/writes

Index data structures
- Tree-like and log structures not suitable
- Hash tables ideal, but current options for large streaming hash tables not optimal…
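A hedged sketch of this API (only the four operation names and their signatures come from the slide; the class name and method bodies are illustrative assumptions):

```python
import hashlib

class StreamingStore:
    """Illustrative data store + index exposing the slide's four operations."""
    def __init__(self):
        self.log = []        # streaming data store (append-only addresses)
        self.index = {}      # data_name -> address (the streaming index)

    def store_data(self, data: bytes) -> str:
        """StoreData(data): append data; index it under data_name."""
        data_name = hashlib.sha1(data).hexdigest()   # data_name = hash of data
        self.log.append(data)
        self.index[data_name] = len(self.log) - 1
        return data_name

    def lookup(self, data_name: str):
        """address <- Lookup(data_name)"""
        return self.index.get(data_name)

    def read_data(self, address: int) -> bytes:
        """data <- ReadData(address)"""
        return self.log[address]

    def expire_old_data(self, keep_last: int):
        """ExpireOldData(): forget index entries older than the last keep_last."""
        cutoff = len(self.log) - keep_last
        self.index = {n: a for n, a in self.index.items() if a >= cutoff}
```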
