On the Interplay of Hardware Transactional Memory and Lock-free Indexing. Justin Levandoski (Microsoft Research Redmond), Ryan Stutsman (Microsoft Research Redmond), Darko Makreshanski (Department of Computer Science)
Slide1
To Lock, Swap or Elide: On the Interplay of Hardware Transactional Memory and Lock-free Indexing
Justin Levandoski, Microsoft Research Redmond
Ryan Stutsman, Microsoft Research Redmond
Darko Makreshanski, Department of Computer Science, ETH Zurich
Slide2
Motivation
Hardware Transactional Memory
  Proposed as hardware support for lock-free data structures [1]
  Introduced in Intel Haswell (2013)
Existing lock-free data structures
  Rely on CPU atomic primitives (CAS, FAI)
  Notoriously difficult to get right
[1] Transactional Memory: Architectural Support for Lock-Free Data Structures. M. Herlihy, J. E. B. Moss. ISCA '93
Slide3
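To illustrate the CAS-based style that the Motivation slide refers to, here is a minimal Python model of a fetch-and-increment built from compare-and-swap. Python has no hardware CAS, so `AtomicCell` and its `compare_and_swap` method are hypothetical stand-ins that emulate the primitive with an internal lock; only the retry loop reflects the lock-free pattern itself.

```python
import threading


class AtomicCell:
    """Hypothetical stand-in for a machine word that supports CAS."""

    def __init__(self, value):
        self._value = value
        # The lock only emulates the atomicity of the CPU instruction.
        self._guard = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the cell still holds `expected`."""
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True


def fetch_and_increment(cell):
    """FAI built from CAS: retry until no other thread interfered."""
    while True:
        old = cell.load()
        if cell.compare_and_swap(old, old + 1):
            return old


def run_demo(num_threads=4, increments=1000):
    """Increment one shared cell from several threads and return the total."""
    cell = AtomicCell(0)

    def worker():
        for _ in range(increments):
            fetch_and_increment(cell)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell.load()
```

Even in this tiny case the caller has to reason about what happens when the CAS fails, which hints at why larger lock-free structures are hard to get right.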
Lock-free Programming
Hardware Transactional Memory
Slide4
Overview
Q1: Does HTM obviate the need for crafty lock-free designs?
A1: No. Technical limitations prohibit use of HTM as a general-purpose solution.
Q2: What if all technical limitations are overcome?
A2: No. There are still important fundamental differences.
Q3: Can lock-free data structures benefit from HTM?
A3: Yes. Using HTM for MW-CAS can simplify lock-free designs.
Slide5
Hardware Transactional Memory
Programming Model: sequence of instructions with ACI(D) properties
  If (BeginTransaction()) Then
    < Critical Section >
    CommitTransaction()
  Else
    < Abort Fallback Codepath >
  EndIf
Lock Elision:
  AcquireElidedLock()
  < Critical Section >
  ReleaseElidedLock()
Transaction buffers stored in core-local (L1) cache
Conflict detection and atomicity piggyback on the cache-coherence protocol
Slide6
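The begin/commit/fallback structure of the HTM programming model can be sketched as a control-flow model in Python. Nothing here touches real hardware: `ElidedLock`, `TransactionAborted`, `make_section`, the `transactional` flag, and the retry count of 3 are all hypothetical names and values chosen for illustration, and an abort is simulated by raising an exception before the section has any side effects.

```python
import threading


class TransactionAborted(Exception):
    """Raised by a simulated critical section to model a hardware abort."""


class ElidedLock:
    """Try a critical section 'transactionally'; fall back to a real lock."""

    def __init__(self, max_attempts=3):
        self._fallback = threading.Lock()
        self.max_attempts = max_attempts  # assumed retry budget
        self.fallback_count = 0

    def run(self, critical_section):
        # Optimistic path: models BeginTransaction ... CommitTransaction.
        for _ in range(self.max_attempts):
            try:
                return critical_section(transactional=True)
            except TransactionAborted:
                continue  # retry the transaction
        # Abort fallback codepath: take the lock and run non-transactionally.
        with self._fallback:
            self.fallback_count += 1
            return critical_section(transactional=False)


def make_section(footprint_lines, capacity_lines=512):
    """Build a section that aborts when its footprint exceeds the capacity."""

    def section(transactional):
        if transactional and footprint_lines > capacity_lines:
            raise TransactionAborted("capacity")
        return "htm" if transactional else "lock"

    return section
```

A small section such as `make_section(10)` completes on the optimistic path, while `make_section(1000)` exhausts its retries and ends up under the lock. A real elision scheme must also make transactions conflict with a thread that holds the fallback lock; this model leaves that out.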
Bw-Tree [1] (A Lock-free B-Tree)
[Figure: address mapping table with entries A, B, C, D and pages Page A, Page B, Page C, Page D; legend: logical pointer, physical pointer]
[1] The Bw-Tree: A B-tree for New Hardware. Levandoski, Lomet, Sengupta. ICDE '13
Slide7
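The role of the address mapping table in the Bw-Tree figure can be modelled in a few lines of Python. This is a single-threaded sketch under assumptions: `MappingTable`, `Page`, and the page layout (page A listing B, C, and D as children) are hypothetical, and `compare_and_swap` is an ordinary method standing in for a hardware CAS on a table slot.

```python
class Page:
    def __init__(self, name, child_pids=()):
        self.name = name
        # Children are stored as logical page IDs, not object references.
        self.child_pids = tuple(child_pids)


class MappingTable:
    """Maps logical page IDs to the current physical page object."""

    def __init__(self):
        self._slots = {}

    def install(self, pid, page):
        self._slots[pid] = page

    def resolve(self, pid):
        return self._slots[pid]

    def compare_and_swap(self, pid, expected, new):
        # A real table would issue one hardware CAS on the slot here.
        if self._slots.get(pid) is not expected:
            return False
        self._slots[pid] = new
        return True


def relocate_demo():
    table = MappingTable()
    page_a = Page("A", child_pids=["B", "C", "D"])
    old_b = Page("B v1")
    table.install("A", page_a)
    table.install("B", old_b)

    # Replace page B's physical location; page A is not modified.
    new_b = Page("B v2")
    swapped = table.compare_and_swap("B", old_b, new_b)
    # A second attempt based on the stale pointer must fail.
    stale = table.compare_and_swap("B", old_b, Page("B v3"))

    first_child = table.resolve(table.resolve("A").child_pids[0])
    return swapped, stale, first_child.name
```

Because page A refers to B only through its logical ID, swapping a single table slot is enough to redirect every reader to the new version of B.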
Bw-Tree [1] (Lock-free Updates)
[Figure: address mapping table entry P; Page P with delta records (Δ: Insert record 50; Δ: Delete record 48; Δ: Update record 35; Δ: Insert record 60); Consolidated Page P]
[1] The Bw-Tree: A B-tree for New Hardware. Levandoski, Lomet, Sengupta. ICDE '13
Slide8
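The delta-update and consolidation steps shown for page P can be sketched as follows. This is a simplified single-threaded Python model, not the Bw-Tree implementation: the mapping table is a plain dict, `cas` is a hypothetical helper that emulates a hardware CAS on a table slot, and the record values and the order of the four deltas are invented for the example.

```python
class BasePage:
    def __init__(self, records):
        self.records = dict(records)


class Delta:
    def __init__(self, op, key, value, next_node):
        self.op, self.key, self.value = op, key, value
        self.next = next_node  # older delta or the base page


def cas(table, pid, expected, new):
    """Emulated CAS on a mapping-table slot (a hardware CAS in practice)."""
    if table[pid] is not expected:
        return False
    table[pid] = new
    return True


def prepend_delta(table, pid, op, key, value=None):
    """Install a delta in front of the page; retry if the slot changed."""
    while True:
        head = table[pid]
        if cas(table, pid, head, Delta(op, key, value, head)):
            return


def lookup(table, pid, key):
    """Walk the chain newest-first; the first matching delta wins."""
    node = table[pid]
    while isinstance(node, Delta):
        if node.key == key:
            return None if node.op == "delete" else node.value
        node = node.next
    return node.records.get(key)


def consolidate(table, pid):
    """Fold all deltas into a fresh base page and swap it in with one CAS."""
    head = table[pid]
    chain, node = [], head
    while isinstance(node, Delta):
        chain.append(node)
        node = node.next
    records = dict(node.records)
    for delta in reversed(chain):  # apply oldest delta first
        if delta.op == "delete":
            records.pop(delta.key, None)
        else:
            records[delta.key] = delta.value
    return cas(table, pid, head, BasePage(records))


def demo():
    table = {"P": BasePage({35: "old", 48: "x"})}
    prepend_delta(table, "P", "insert", 60, "d")
    prepend_delta(table, "P", "update", 35, "new")
    prepend_delta(table, "P", "delete", 48)
    prepend_delta(table, "P", "insert", 50, "c")
    before = (lookup(table, "P", 35), lookup(table, "P", 48))
    consolidate(table, "P")
    return before, sorted(table["P"].records)
```

The existing page is never modified in place: each update only prepends a new node, and consolidation builds a separate page before publishing it.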
Overview
Q1: Does HTM obviate the need for crafty lock-free designs?
Q2: What if all technical limitations are overcome?
Q3: Can lock-free data structures benefit from HTM?
Slide9
HTM Parallelized B-Tree
Q1: Does HTM obviate the need for crafty lock-free designs?
Wrap individual tree operations in a transaction
  Effortless parallelization of existing single-threaded implementations
  State of the art in using HTM for database indexing [1,2]
Using the Google B-Tree implementation [3]
  In-memory single-threaded B-Tree
[1] Exploiting Hardware Transactional Memory in Main-Memory Databases. V. Leis, A. Kemper, T. Neumann. ICDE 2014
[2] Improving In-Memory Database Index Performance with Intel® Transactional Synchronization Extensions. Karnagel et al. HPCA 2014
[3] https://code.google.com/p/cpp-btree/
Slide10
HTM Parallelized B-Tree
Works well for simple use cases
  Small key and payload sizes
  8B keys, 8B payloads
  4M key-payload pairs
  Random read-only workload
Slide11
HTM Parallelized B-Tree
Transaction size limited by cache size (32KB L1 cache, 8-way associativity)
  Sensitive to payload size
  Sensitive to tree size
  Hyper-threading
  Even more sensitive to key size
Slide12
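The cache-size limit on transactions can be made concrete with a little arithmetic. The sketch below takes the 32KB size and 8-way associativity from the slide and assumes a 64-byte cache line, which the slide does not state. It also assumes a simplified model in which every line a transaction touches must stay resident in L1, so a transaction is at risk as soon as more than 8 of its lines map to the same cache set. The function names are hypothetical.

```python
from collections import Counter

L1_BYTES = 32 * 1024  # from the slide
WAYS = 8              # from the slide
LINE_BYTES = 64       # assumption: cache-line size is not given on the slide


def l1_geometry(cache_bytes=L1_BYTES, ways=WAYS, line_bytes=LINE_BYTES):
    """Return (total cache lines, number of sets)."""
    lines = cache_bytes // line_bytes
    return lines, lines // ways


def overflows_a_set(addresses, ways=WAYS, line_bytes=LINE_BYTES):
    """True if the touched lines place more than `ways` lines in one set."""
    _, sets = l1_geometry(ways=ways, line_bytes=line_bytes)
    distinct_lines = {addr // line_bytes for addr in addresses}
    per_set = Counter(line % sets for line in distinct_lines)
    return any(count > ways for count in per_set.values())
```

Under these assumptions the cache holds 512 lines in 64 sets. Touching 512 consecutive lines fits exactly, but nine lines spaced 4096 bytes apart all land in the same set and overflow it, even though they cover only 576 bytes of data. Larger keys, payloads, and trees spread an operation across more lines, which makes such collisions more likely.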
Overview
Q1: Does HTM obviate the need for crafty lock-free designs?
Q2: What if all technical limitations are overcome?
Q3: Can lock-free data structures benefit from HTM?
Slide13
Lock-free vs HTM
Q2: What if all technical limitations are overcome?
Lock-free Bw-Tree and HTM both offer optimistic concurrency control
HTM-parallelized data structures can also provide lock-freedom
Can HTM be seen as a hardware-accelerated version of lock-free algorithms?
Fundamental difference:
  Lock-free (Bw-Tree) -> copy-on-write (MVCC-like)
  Transactional memory -> atomic update in place (2PL-like)
Different behavior under read-write contention
Slide14
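The contrast between copy-on-write and in-place update under read-write contention can be shown with a deterministic, single-threaded Python model in which a write is interleaved in the middle of a read. All names are hypothetical. The in-place variant uses a version counter as a stand-in for hardware conflict detection: a write that changes data the reader has already observed forces the reader to abort and retry.

```python
def cow_read_with_concurrent_write():
    """Copy-on-write: the reader keeps its snapshot and never aborts."""
    table = {"P": (("k", 1),)}    # immutable page version
    snapshot = table["P"]         # reader starts and picks up the pointer
    table["P"] = (("k", 2),)      # writer installs a new copy meanwhile
    value = dict(snapshot)["k"]   # reader finishes on the old version
    aborts = 0
    return value, aborts


class InPlacePage:
    def __init__(self, records):
        self.records = dict(records)
        self.version = 0

    def write(self, key, value):
        self.records[key] = value
        self.version += 1         # models invalidating readers' read sets


def inplace_read_with_concurrent_write():
    """In-place update: the interleaved write aborts the reader once."""
    page = InPlacePage({"k": 1})
    aborts = 0
    writer_pending = True
    while True:
        start = page.version      # reader begins its transaction
        if writer_pending:
            page.write("k", 2)    # writer updates the page in place
            writer_pending = False
        value = page.records["k"]
        if page.version == start: # commit only if nothing changed
            return value, aborts
        aborts += 1               # conflict detected: retry the read
```

In this model the copy-on-write reader returns the old value without any abort, while the in-place reader aborts once and then returns the new value. With more writers, the in-place reader keeps paying that retry cost.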
Read-write Contention
Experimental setup
  4 read-only point lookup threads
  0-4 write-only point update threads
  Zipfian skew (s = 2)
Workload A
  Fixed-length 8-byte keys and payloads
Workload B
  Variable-length keys (30-70 bytes)
  256-byte payloads
[Figures: Workload A, Workload B]
Slide15
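To give a feel for how concentrated a Zipfian skew of s = 2 is, the following Python sketch computes the access probability of each key rank as (1 / rank^s) divided by the sum of those weights over all ranks. The key count of 1000 used in the usage note is an assumption for illustration; the slide does not give the number of keys for this experiment.

```python
def zipf_probabilities(num_keys, s=2.0):
    """Access probability of each key, ordered from hottest to coldest."""
    weights = [1.0 / (rank ** s) for rank in range(1, num_keys + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def share_of_hottest(num_keys, top, s=2.0):
    """Fraction of all accesses that go to the `top` hottest keys."""
    return sum(zipf_probabilities(num_keys, s)[:top])
```

With 1000 keys, `share_of_hottest(1000, 1)` is roughly 0.6 and `share_of_hottest(1000, 10)` exceeds 0.9, so almost all readers and writers meet on a handful of records. That is the situation in which read-write conflicts dominate.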
Overview
Q1: Does HTM obviate the need for crafty lock-free designs?
Q2: What if all technical limitations are overcome?
Q3: Can lock-free data structures benefit from HTM?
Slide16
HTM-enabled Lock-free B-Tree
Q3: Can lock-free data structures benefit from HTM?
Bw-Tree problem: code complexity
  Structure modification operations (SMOs) such as page split and merge require multi-word CAS
  Bw-Tree separates SMOs into multiple sub-operations
  Reasoning about all possible race conditions is hard
Use HTM as hardware support for multi-word compare-and-swap
  SMOs can be installed in a single operation
  Small transaction footprint -> avoid capacity problems
Slide17
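The idea of installing a structure modification with one multi-word compare-and-swap can be sketched in Python. This is a model under assumptions: `multi_word_cas` and `split_demo` are hypothetical names, a module-level lock stands in for the hardware transaction that would make the compare and the swap atomic, and the fallback path a real HTM-based MW-CAS needs after an abort is left out.

```python
import threading

# Stand-in for a hardware transaction around the compare and the swap.
_transaction = threading.Lock()


def multi_word_cas(table, updates):
    """Apply all (slot, expected, new) updates, or none of them."""
    with _transaction:
        for slot, expected, _ in updates:
            if table.get(slot) is not expected:
                return False      # some slot changed: install nothing
        for slot, _, new in updates:
            table[slot] = new
        return True


def split_demo():
    """Split leaf L into L and R and update parent P in one step."""
    old_leaf = ("leaf", (10, 20, 30, 40))
    old_parent = ("inner", ("L",))
    table = {"L": old_leaf, "P": old_parent, "R": None}

    left = ("leaf", (10, 20))
    right = ("leaf", (30, 40))
    new_parent = ("inner", ("L", 30, "R"))

    ok = multi_word_cas(table, [
        ("L", old_leaf, left),
        ("R", None, right),
        ("P", old_parent, new_parent),
    ])
    # A competing split based on the stale leaf must fail as a whole.
    stale = multi_word_cas(table, [
        ("L", old_leaf, left),
        ("P", new_parent, old_parent),
    ])
    return ok, stale, table
```

The whole split touches only three mapping-table slots, which matches the point about a small transaction footprint. No intermediate state is ever visible to other threads, so the split needs no multi-step protocol.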
Conclusion
Does HTM obviate the need for crafty lock-free designs?
  No. Technical limitations prohibit use of HTM as a general-purpose solution.
What if all technical limitations are overcome?
  No. There are still important fundamental differences.
Can lock-free data structures benefit from HTM?
  Yes. Using HTM for MW-CAS can simplify lock-free designs.
Slide18
Conclusion