Slide 1: Maintaining Large And Fast Streaming Indexes On Flash

Aditya Akella, UW-Madison
First GENI Measurement Workshop
Joint work with Ashok Anand, Steven Kappes (UW-Madison) and Suman Nath (MSR)
Slide 2: Memory & storage technologies
Question: What is the role of emerging memory/storage technologies in supporting current and future measurements and applications?

This talk: the role of flash memory in supporting applications and measurements that need large streaming indexes, both improving current apps and enabling future apps.
Slide 3: Streaming stores and indexes

Motivating apps/scenarios:
- Caching, content-based networks (DOT), WAN optimization, de-duplication
- Large-scale and fine-grained measurements, e.g., IP Mon: compute per-packet queuing delays; fast correlations across large collections of NetFlow records

Index features:
- Streaming: data is stored in a streaming fashion; maintain an online index for fast access; expire old data and update the index constantly
- Large size: data store ~ several TB, index ~ 100s of GB
- Need for speed (fast reads and writes): impacts the usefulness of caching applications and the timeliness of fine-grained TE
Slide 4: Index workload
Key aspects:
- Index lookups and writes are random
- Equal mix of reads and writes
- New data replaces some old data quickly: constant expiry

Index data structures:
- Tree-like (B-tree) and log structures are not suitable: slow lookups (e.g., O(log n) complexity in trees) and poor support for flexible, fast garbage collection
- Hash tables are ideal... but current options for large streaming hash tables are not optimal
Slide 5: Current options for >100 GB hash tables
- DRAM: large DRAMs are expensive and can get very hot
- Disk: inexpensive, but too slow
- Flash provides a good balance of cost, performance, and power efficiency: bigger and more energy-efficient than DRAM; comparable to disk in price; more than two orders of magnitude faster than disk, if used carefully
- But appropriate data structures are needed to maximize flash effectiveness and overcome its inefficiencies
Slide 6: Flash properties
Flash chips:
- Layout: a large number of blocks (128 KB each), each block containing multiple pages (2 KB)
- Read/write granularity: page; erase granularity: block
- Read page: 50 us; write page: 400 us; block erase: 1 ms
- Cheap: any read (including random), sequential writes
- Expensive: random writes/overwrites and sub-block deletion (erasing requires moving valid pages out of the block first)

SSDs: a disk-like interface for flash
- Sequential/random read, sequential write: 80 us
- Random write: 8 ms
Flash is good for hash table lookups, but:
- Insertions are hard: small random overwrites
- Expiration is hard: small random deletes
Slide 7: BufferHash data structure
Batch expensive operations (random writes and deletes) on flash:
- Maintain a hierarchy of small hash tables; keep the upper levels in DRAM
- Efficient insertion: accumulate random updates in memory, then flush them to the lower level on flash at the granularity of a flash page
- Efficient deletion: delete in batches at flash-block granularity, which amortizes the deletion cost
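A back-of-the-envelope sketch of why batching pays off, using the latency figures from the flash-properties slide; the 16-byte entry size is an assumed example value, not a number from the talk:

```python
# Rough amortized-cost estimate for batching index updates on flash.
# Latencies are the figures quoted earlier in the talk; the entry
# size is an assumed example value.

PAGE_SIZE = 2 * 1024        # flash page: 2 KB
ENTRY_SIZE = 16             # assumed bytes per hash-table entry
PAGE_WRITE_US = 400         # sequential page write: 400 us
RANDOM_WRITE_US = 8000      # SSD random write: 8 ms

entries_per_page = PAGE_SIZE // ENTRY_SIZE          # 128
amortized_us = PAGE_WRITE_US / entries_per_page     # 400/128 = 3.125 us

print(f"{entries_per_page} entries per page")
print(f"~{amortized_us:.1f} us per insert when flushed page-at-a-time")
print(f"vs {RANDOM_WRITE_US} us for an in-place random write")
```

Under these assumptions a page-granularity flush makes an insert roughly three orders of magnitude cheaper than an in-place random write.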
Slide 8: Handling small random updates
[Figure: a hash key is split into a K-bit HT index, selecting one of 2^K buffers, and an N-bit HT key used within that table; buffers live in DRAM, while the super tables and their bit-sliced Bloom filters live on flash.]

- Buffer small random updates in DRAM as small hash tables (buffers); each table uses the N-bit portion of the key
- When a hash table is full, write it to flash without modifying existing data
- Each "super table" is a collection of small hash tables: different "incarnations" over time of the same buffer
- How to search them? Use (bit-sliced) Bloom filters
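A minimal sketch of the key split described above; the 20-bit key width and the K = 4 / N = 16 split are illustrative assumptions, not the talk's parameters:

```python
# Split a hash key into a K-bit buffer index (which of the 2^K
# in-DRAM hash tables to use) and an N-bit hash-table key.
# K = 4 and N = 16 (a 20-bit key) are illustrative values only.

K_BITS = 4                  # 2^4 = 16 buffers (assumed)
N_BITS = 16                 # bits used inside each small hash table

def split_key(hash_key):
    """Return (ht_index, ht_key) for a (K_BITS + N_BITS)-bit hash key."""
    assert hash_key < 1 << (K_BITS + N_BITS)     # range check
    ht_key = hash_key & ((1 << N_BITS) - 1)      # low N bits
    ht_index = hash_key >> N_BITS                # high K bits
    return ht_index, ht_key

idx, key = split_key(0xA1234)
print(idx, hex(key))   # buffer 10, key 0x1234
```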
Slide 9: Lookup

Let key = <k1, k2>:
1. Check the k1'th hash table in memory for the key k2
2. If not found, use the Bloom filters to decide which hash table h of the k1'th super table may contain the key k2
3. Read and check that hash table (e.g., in the h'th page of the k1'th block of flash)
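The lookup steps above might be sketched like this; the structures are hypothetical in-memory stand-ins (plain dicts and sets), where a real implementation would read the h'th flash page of the k1'th block instead:

```python
# Sketch of BufferHash lookup. buffers[i] is the in-DRAM hash table
# for super table i; supertables[i] is a list of (bloom_set, table)
# incarnations standing in for flash pages, newest first. These
# stand-in structures are illustrative, not the real layout.

def lookup(buffers, supertables, k1, k2):
    # 1. Check the k1'th in-memory hash table first.
    if k2 in buffers[k1]:
        return buffers[k1][k2]
    # 2. Use the Bloom filters to pick candidate incarnations.
    for bloom, table in supertables[k1]:
        if k2 in bloom:               # Bloom filter says "maybe present"
            # 3. Read that incarnation (one flash page read) and check.
            if k2 in table:
                return table[k2]
    return None

buffers = [{"a": 1}, {}]
supertables = [[], [({"b"}, {"b": 2})]]
print(lookup(buffers, supertables, 0, "a"))   # 1 (found in DRAM)
print(lookup(buffers, supertables, 1, "b"))   # 2 (found via a flash incarnation)
```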
Slide 10: Expiry of hash entries
A super table is a collection of hash tables; expire the oldest hash table from a super table.

Option 1: use a flash block as a circular queue
- Super table = flash block; hash table = flash page
- Delete the oldest hash table incarnation (page) and replace it with a new one
- If a flash block has p pages, a super table holds the p latest hash tables
- Problem: a page cannot be deleted independently of its block (erasing requires copying the other pages)
Slide 11: Handling expiry of hash entries
Interleave pages from different super tables when writing to flash or SSD.

Instead of laying blocks out as
    1 2 3 4 | 1 2 3 4 | 1 2 3 4 | 1 2 3 4
do this:
    1 1 1 1 | 2 2 2 2 | 3 3 3 3 | 4 4 4 4
(numbers label incarnations; each group of four pages is one flash block)

Advantage: batch deletion of the multiple oldest incarnations, since a single block erase expires the same-aged incarnation of several super tables. Other flexible expiration policies can also be supported.
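The two layouts can be compared with a small simulation; this is a sketch in which letters label super tables, numbers label incarnations, and the four-page block size matches the figure:

```python
# Compare the two flash layouts from the figure. Pages are labeled
# (supertable, incarnation); a block holds 4 pages and can only be
# erased as a whole. Letters/numbers are illustrative labels.

supertables = "ABCD"
incarnations = [1, 2, 3, 4]

# Layout 1: each block holds all incarnations of one super table.
per_supertable = [[(st, inc) for inc in incarnations] for st in supertables]

# Layout 2 (interleaved): each block holds one incarnation of every
# super table.
interleaved = [[(st, inc) for st in supertables] for inc in incarnations]

# Erasing interleaved block 0 expires the oldest incarnation (1) of
# all four super tables in a single block erase...
assert all(inc == 1 for _, inc in interleaved[0])
# ...whereas erasing a per-supertable block destroys every incarnation
# of one super table, including the newest.
assert {inc for _, inc in per_supertable[0]} == {1, 2, 3, 4}
print("interleaved block 0:", interleaved[0])
```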
Slide 12: Insertion

Key = <k1, k2>:
1. Insert into the k1'th in-memory hash table, using k2 as the key
2. If that hash table is full:
   - Expire the tail hash table in the k1'th super table (this expires the oldest incarnation from all super tables)
   - Copy the k1'th hash table from memory to the head of the k1'th super table
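A compact sketch of the insertion path. The four-entry buffer capacity and the dict/list stand-ins for flash pages are assumptions for illustration, and this per-super-table sketch omits the interleaving that, in the real design, expires the oldest incarnation of all super tables together:

```python
# Sketch of BufferHash insertion. When the in-DRAM buffer for k1
# fills, its contents move to the head of the k1'th super table and
# the tail (oldest) incarnation is expired. Capacities and structures
# are illustrative.

BUFFER_CAPACITY = 4     # assumed entries per in-memory hash table
MAX_INCARNATIONS = 4    # p pages per flash block

def insert(buffers, supertables, k1, k2, value):
    buf = buffers[k1]
    if len(buf) >= BUFFER_CAPACITY:
        # Flush: copy the full buffer to the head of the super table.
        supertables[k1].insert(0, dict(buf))
        buf.clear()
        # Expire the tail (oldest) incarnation if the block is full.
        if len(supertables[k1]) > MAX_INCARNATIONS:
            supertables[k1].pop()
    buf[k2] = value

buffers = [{}]
supertables = [[]]
for i in range(9):                      # 9 inserts trigger two flushes
    insert(buffers, supertables, 0, f"key{i}", i)
print(len(buffers[0]), len(supertables[0]))   # 1 entry buffered, 2 incarnations
```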
Slide 13: Benchmarks

- Prototyped BufferHash on two SSDs and a hard drive
- 99th-percentile read and write latencies under 0.1 ms: two orders of magnitude better than disks, at roughly similar cost
- Built a WAN accelerator that is 3x better than current designs
- Theoretical results on tuning BufferHash parameters: low Bloom-filter false positives, low lookup cost, low deletion cost on average, and optimal buffer size
Slide 14: Conclusion
- Many emerging apps and important measurement problems need fast streaming indexes with constant reads, writes, and eviction
- Flash provides a good hardware platform for maintaining such indexes
- BufferHash helps maximize flash effectiveness and overcome its inefficiencies
- Open issues: What is the role of flash in other measurement problems and architectures? What about other emerging memory/storage technologies (e.g., PCM)? How can their persistence be leveraged?
Slide 15: I/O operations
API:
- Data store/index: StoreData(data): add data to the store; create/update the index with data_name
- Data store: address ← Lookup(data_name)
- Data store: data ← ReadData(address)
- Data store/index: ExpireOldData(): remove old data from the store; clean up the index

Workload:
- data_name is a hash over the data
- Index lookups and writes are random
- Equal mix of reads and writes

Index data structures:
- Tree-like and log structures are not suitable
- Hash tables are ideal, but current options for large streaming hash tables are not optimal
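Tying the API together, a minimal in-memory mock of the interface above. The dict-backed store, the SHA-1 choice for data_name (the slide only says "a hash over data"), and the keep parameter of ExpireOldData are illustrative stand-ins:

```python
# Minimal mock of the slide's data-store/index API. The dict-backed
# store, SHA-1 naming, and `keep` parameter are illustrative.

import hashlib
from collections import OrderedDict

class StreamingStore:
    def __init__(self):
        self.store = OrderedDict()   # data_name -> data, oldest first

    def store_data(self, data):
        """StoreData(data): data_name is a hash over the data."""
        data_name = hashlib.sha1(data).hexdigest()
        self.store[data_name] = data
        return data_name

    def lookup(self, data_name):
        """Lookup(data_name) -> address (here, just the name itself)."""
        return data_name if data_name in self.store else None

    def read_data(self, address):
        """ReadData(address) -> data."""
        return self.store[address]

    def expire_old_data(self, keep=1000):
        """ExpireOldData(): drop the oldest entries beyond `keep`."""
        while len(self.store) > keep:
            self.store.popitem(last=False)

s = StreamingStore()
name = s.store_data(b"payload")
print(s.read_data(s.lookup(name)))   # b'payload'
```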