Hoda NaghibiJouybari - PowerPoint Presentation

Uploaded 2019-11-08




Presentation Transcript

Hoda NaghibiJouybari, Khaled N. Khasawneh, and Nael Abu-Ghazaleh
Constructing and Characterizing Covert Channels on GPGPUs

Covert Channel
Malicious, indirect communication of sensitive data.
(Diagram: a trojan, e.g. a gallery app, signals a spy, e.g. a weather app, over a covert channel.)
Why? Either there is no communication channel, or the communication channel is monitored. A covert channel is undetectable by monitoring systems on conventional communication channels.

Covert Channels Are a Substantial Threat on GPGPUs
- Trends to improve multiprogramming on GPGPUs.
- GPU-accelerated computing is available on major cloud platforms.
- No protection is offered by an operating system.
- High quality (low noise) and high bandwidth.

Overview
Threat: using GPGPUs for covert channels.
To demonstrate the threat, we construct error-free and high-bandwidth covert channels on GPGPUs:
- Reverse engineer scheduling at different levels on the GPU.
- Exploit scheduling to force colocation of two applications.
- Create contention on shared resources.
- Remove noise.
Key results: error-free covert channels with bandwidth of over 4 Mbps.

GPU Architecture
- Intra-SM channels: L1 constant cache, functional units, and warp schedulers.
- Inter-SM channels: L2 constant cache, global memory.

GPGPU Programming

Attack Flow

Colocation (Reverse Engineering the Scheduling)
Step 1: Thread block scheduling to the SMs (the leftover policy).
(Diagram: the GPU thread block scheduler distributes thread blocks TB 0..TB n of Kernel 1 and Kernel 2 across SM 0..SM m, connected through the interconnection network to the L2 cache and memory channels.)
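Step 1 can be sketched as a small Python model of the leftover policy: a kernel's thread blocks are dealt round-robin across SMs, and a later kernel's blocks land only in whatever capacity is left. This is a simplified model of the observed behavior, not NVIDIA's implementation; the per-SM capacity and the round-robin order are assumptions.

```python
def schedule_leftover(num_sms, capacity_per_sm, kernels):
    """kernels: list of (name, num_thread_blocks), launched in order.
    Returns a dict mapping each SM to its resident (kernel, tb) pairs."""
    sms = {sm: [] for sm in range(num_sms)}
    for name, num_tbs in kernels:
        sm = 0  # each kernel's blocks are dealt round-robin from SM 0
        for tb in range(num_tbs):
            scanned = 0
            # Skip SMs that are already full: only leftover capacity is used.
            while len(sms[sm]) >= capacity_per_sm and scanned < num_sms:
                sm = (sm + 1) % num_sms
                scanned += 1
            if len(sms[sm]) >= capacity_per_sm:
                break  # no resources left on any SM
            sms[sm].append((name, tb))
            sm = (sm + 1) % num_sms
    return sms
```

Launching the trojan and spy back to back, each with one block per SM, forces the two kernels to colocate on every SM under this policy.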

Step 2: Mapping warps to warp schedulers.
(Diagram: within SM k, the warps W 0..W k of thread blocks TB i and TB j are distributed across two warp schedulers, each with its own dispatch unit, sharing the register file, shared memory / L1 cache, and the SP, DP, load/store, and SFU units.)
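The warp-to-scheduler mapping in step 2 can be sketched in Python. The round-robin rule below (warp i served by scheduler i mod S) is consistent with the diagram on this slide; the warp IDs and scheduler count in the example are illustrative.

```python
def scheduler_of(warp_id, num_schedulers):
    # Warp i within an SM is served by warp scheduler (i mod S).
    return warp_id % num_schedulers

def colocated_pairs(trojan_warps, spy_warps, num_schedulers):
    """(trojan, spy) warp pairs that share a warp scheduler -- the pairs
    through which intra-SM issue contention is observable."""
    return [(t, s)
            for t in trojan_warps
            for s in spy_warps
            if scheduler_of(t, num_schedulers) == scheduler_of(s, num_schedulers)]
```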

Attack Flow

Cache Channel (intra-SM and inter-SM)
- Extract the cache parameters (cache size, number of sets, number of ways, and line size) using a latency plot.
- Communicate through one cache set.
(Diagram: the trojan and spy each hold a data array, TD and SD, in constant memory mapped to the same constant cache set x. To send '1', the trojan accesses its array, evicting the spy's data; the spy's cache misses yield higher latency. To send '0', the trojan does not access the set; the spy's cache hits yield low latency.)
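The set-mapping arithmetic the channel relies on can be sketched in a few lines of Python. The cache geometry below is hypothetical (the slide extracts the real parameters from a latency plot); the point is that addresses mapping to the same set can evict each other, which is the contention the channel encodes bits with.

```python
LINE_SIZE = 64   # bytes per cache line (assumed)
NUM_SETS  = 32   # number of sets (assumed)
NUM_WAYS  = 4    # associativity (assumed)

def cache_set(addr):
    # Standard set-indexing: line index modulo the number of sets.
    return (addr // LINE_SIZE) % NUM_SETS

def conflict_addresses(target_set, count):
    """First `count` line-aligned addresses that fall into `target_set`;
    accessing more than NUM_WAYS of them evicts the spy's lines there."""
    addrs, addr = [], 0
    while len(addrs) < count:
        if cache_set(addr) == target_set:
            addrs.append(addr)
        addr += LINE_SIZE
    return addrs
```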

Synchronization
(Diagram: the trojan and spy synchronize through the L1 constant cache using Wait(ReadytoSend) and Wait(ReadytoReceive) flags; threads 0-5 on each side transfer the bit string 011001 in parallel, so the spy receives 6 bits at once.)
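The handshake can be modeled on the host with ordinary threads. In this simulation, Python events and a shared list stand in for the flag and data transfers that the real channel performs through the L1 constant cache; it captures the protocol only, not the GPU mechanics.

```python
import threading

flags = {"ReadytoSend": threading.Event(), "ReadytoReceive": threading.Event()}
channel = []  # stands in for the contended cache set

def trojan(bits):
    for b in bits:
        flags["ReadytoReceive"].wait()   # wait until the spy is listening
        flags["ReadytoReceive"].clear()
        channel.append(b)                # encode the bit (stands in for creating contention)
        flags["ReadytoSend"].set()       # signal: bit is ready

def spy(n, out):
    for _ in range(n):
        flags["ReadytoReceive"].set()    # signal: ready to receive
        flags["ReadytoSend"].wait()      # wait until the trojan has sent
        flags["ReadytoSend"].clear()
        out.append(channel[-1])          # decode the bit (stands in for the latency probe)
```

The two waits force strict alternation, so each bit is read exactly once before the next is sent.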

Synchronization and Parallelization
(Diagram: the channel is replicated across SM 0..SM n of the GPU.)

SFU and Warp Scheduler Channel (intra-SM)
Limits on the number of operations issued in each cycle (Kepler SM):
- Type and number of functional units.
- Issue bandwidth of the warp schedulers.
Contention is isolated to warps assigned to the same warp scheduler.

SFU and Warp Scheduler Channel (intra-SM)
- Trojan: issues operations to the target functional unit to create contention to send '1'; issues no operations to send '0'.
- Spy: issues operations to the target functional unit and measures the time. Low latency means '0'; high latency means '1'.
- Base channel; bandwidth is improved by parallelism at the warp scheduler level (communicating different bits through warps assigned to different warp schedulers) and at the SM level.
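On the spy side, decoding reduces to thresholding the measured latencies. A minimal Python sketch; the threshold and the latency values are illustrative, not measurements from the paper.

```python
THRESHOLD = 100  # cycles; assumed to sit between the uncontended and contended clusters

def decode(latencies):
    """High latency => the trojan also issued operations to the unit => '1';
    low latency => no contention => '0'."""
    return "".join("1" if t > THRESHOLD else "0" for t in latencies)
```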

Attack Flow

What about other concurrent applications colocated with the spy and trojan?
(Diagram: benchmark applications such as Kmeans, Back Propagation, Heart Wall, and K-Nearest Neighbor sharing GPU SMs with the spy and trojan.)

Exclusive Colocation of Spy and Trojan
Concurrency limitations on GPU hardware (the leftover policy):
- Shared memory
- Registers
- Number of thread blocks
(Diagram: spy and trojan thread blocks fill each SM's shared memory and registers, leaving no resources for other kernels.)
This prevented interference from Rodinia benchmark workloads on the covert communication and achieved error-free communication in all cases.
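The exclusive-colocation trick boils down to arithmetic over one SM resource: size the spy/trojan blocks so that, after they are resident, the leftover resource cannot host any other block. The shared memory figures in the test are illustrative, not tied to a particular GPU.

```python
def blocks_to_close_sm(resource_per_sm, resource_per_block):
    """Thread blocks the spy/trojan must place on an SM so that no
    further block of the same footprint fits (leftover policy)."""
    return resource_per_sm // resource_per_block

def sm_is_closed(resource_per_sm, resource_per_block, launched):
    # True when the leftover resource cannot host one more block.
    return resource_per_sm - launched * resource_per_block < resource_per_block
```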

Results
(Chart: L1 cache covert channel bandwidth on three generations of real NVIDIA GPUs, with speedups of 1.7x, 3.8x, and 12.9x.)
Error-free bandwidth of over 4 Mbps: the fastest known microarchitectural covert channel under realistic conditions.

Results
(Chart: SFU covert channel bandwidth on three generations of real NVIDIA GPUs, with speedups of 3.5x and 13x.)

Conclusion
- GPUs' improved multiprogramming makes covert channels a substantial threat.
- Colocation at different levels is achieved by leveraging thread block scheduling and the warp-to-warp-scheduler mapping.
- The GPU's inherent parallelism and specific architectural features provide very high quality and bandwidth channels: up to an error-free channel of over 4 Mbps.

Thank You!

Memory Channel (inter-SM)
- Normal load and store operations: memory bandwidth is high, so saturating it requires many operations, each with high latency, yielding a low-bandwidth channel.
- Atomic operations: rely on atomic units that are limited in number, so it is possible to cause measurable contention.

Memory Channel (inter-SM)
Scenario 1: each thread does atomic additions on one particular global memory address.
(Diagram: trojan threads 0..n and spy threads 0..n all target the same address.)

Memory Channel (inter-SM)
Scenario 2: each thread does atomic additions on strided global memory addresses; the accesses of all threads in each warp are coalesced.
(Diagram: trojan and spy threads 0..2 striding across memory locations.)

Memory Channel (inter-SM)
Scenario 3: each thread does atomic additions on consecutive global memory addresses; the accesses of all threads in each warp are uncoalesced.
(Diagram: each trojan and spy thread repeatedly hits its own run of consecutive addresses.)
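The three scenarios differ only in how a warp's addresses map to memory segments at each iteration. A Python sketch of the address patterns, with an assumed 128-byte coalescing segment and 4-byte atomic operands (both illustrative):

```python
WARP, WORD, SEGMENT = 32, 4, 128  # warp size, operand size, segment size (assumed)

def scenario1(tid, i):
    return 0  # every thread hits one shared address

def scenario2(tid, i):
    # Strided so that, at each iteration, the warp's accesses are consecutive words.
    return (i * WARP + tid) * WORD

def scenario3(tid, i, per_thread=1024):
    # Each thread walks its own consecutive block of addresses.
    return (tid * per_thread + i) * WORD

def segments_touched(addr_fn, i):
    """Distinct segments one warp touches at iteration i (1 => fully coalesced)."""
    return len({addr_fn(t, i) // SEGMENT for t in range(WARP)})
```

Scenarios 1 and 2 keep each warp within one segment per iteration (coalesced), while scenario 3 scatters the warp across many segments (uncoalesced), which is why coalescing behavior shows up in the channel bandwidth.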

Results
Coalescing behavior on GPUs improves the channel bandwidth.
(Chart: memory covert channel bandwidth on three generations of real NVIDIA GPUs.)