Efficient Warp Execution in Presence of Divergence with

Author : myesha-ticknor | Published Date : 2018-02-20

Collaborative Context Collection. Farzad Khorasani, Rajiv Gupta, Laxmi N. Bhuyan, UC Riverside. The 48th Annual IEEE/ACM International Symposium on Microarchitecture.


Efficient Warp Execution in Presence of Divergence with: Transcript


Collaborative Context Collection. Farzad Khorasani, Rajiv Gupta, Laxmi N. Bhuyan, UC Riverside. The 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2015). One PC for the SIMD group (warp).

High Performance Computing for Engineering Applications. “Computers are useless. They can only give you answers.” Pablo Picasso. © Dan Negrut, 2011, ME964, UW-Madison. Execution Scheduling in CUDA.

Light-weight Error Detection for GPGPU. Hyeran Jeon and Murali Annavaram, University of Southern California. Supported by. Reliability Concern in GPGPU. Many of the top-ranked supercomputers

: Speeding up GPU Warps by Reducing Memory Pitstops. Ankit Sethia*, D. Anoushe Jamshidi, Scott Mahlke, University of Michigan. Graphics. Simulation. Linear Algebra. Data Analytics.

to Improve GPGPU Performance. Rachata Ausavarungnirun, Saugata Ghose, Onur Kayiran, Gabriel H. Loh, Chita Das, Mahmut Kandemir, Onur Mutlu. Overview of This Talk. Problem:

T. Rogers, M. O’Connor, and T. Aamodt, MICRO 2012. Goal: Understand the relationship between schedulers (warp/wavefront) and locality behaviors. Distinguish between inter-wavefront and intra-wavefront locality.

Farzad Khorasani, Rajiv Gupta, Laxmi N. Bhuyan, University of California Riverside. Scalable SIMD-Efficient Graph Processing on GPUs. Graph Processing. Building blocks of data analytics.

Loose Round Robin (LRR). Goes around to every warp and issues it if ready (R). If a warp is not ready (W), skips it and issues the next ready warp. Issue: Warps all run at the same speed, potentially all reaching memory access

Value Similarity. Daniel Wong†, Nam Sung Kim‡, Murali Annavaram¥. †University of California, Riverside, dwong@ece.ucr.edu. ‡University of Illinois, Urbana-Champaign.

Reducing Branch Divergence in GPU Programs. T. D. Han and T. Abdelrahman, GPGPU 2011. Reading: T. D. Han and T. Abdelrahman, “Reducing Branch Divergence in GPU Programs,” GPGPU 2011. Goal: Improve the utilization of the SIMD core.

Niladrish Chatterjee, Mike O’Connor, Gabriel H. Loh, Nuwan Jayasena, Rajeev Balasubramonian. Irregular GPGPU Applications. Conventional GPGPU workloads access vector or matrix-based data structures.

Profiling, AWS Cluster. Synchronization. Ideal case for parallelism: no resources shared between threads, no communication between threads. Many algorithms that require just a little bit of resource sharing can still be accelerated by massive parallelism of GPU.

Characterization, Impact, and Mitigation. Ping Xiang, Yi Yang, Huiyang Zhou. The 20th IEEE International Symposium on High Performance Computer Architecture, Orlando, Florida,
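
The Loose Round Robin (LRR) policy mentioned in the transcript can be illustrated with a small software simulation. This is only a sketch under stated assumptions, not how any real GPU scheduler is implemented: the function names `lrr_pick` and `simulate` are hypothetical, and warp readiness is modeled as a plain list of booleans per cycle.

```python
from typing import List, Optional


def lrr_pick(ready: List[bool], last_issued: int) -> Optional[int]:
    """Pick the next warp in round-robin order, skipping warps that are not ready.

    `ready[i]` is True when warp i can issue this cycle (an assumed model).
    The search starts at the warp after `last_issued` and wraps around.
    Returns None when no warp is ready.
    """
    n = len(ready)
    for offset in range(1, n + 1):
        candidate = (last_issued + offset) % n
        if ready[candidate]:
            return candidate
    return None


def simulate(ready_per_cycle: List[List[bool]]) -> List[Optional[int]]:
    """Run the LRR pick once per cycle and record which warp was issued."""
    issued: List[Optional[int]] = []
    last = -1  # so that the first search starts at warp 0
    for ready in ready_per_cycle:
        pick = lrr_pick(ready, last)
        issued.append(pick)
        if pick is not None:
            last = pick
    return issued


if __name__ == "__main__":
    # Warp 1 is stalled (W) for three cycles, then every warp stalls, then all recover.
    cycles = [
        [True, False, True, True],
        [True, False, True, True],
        [True, False, True, True],
        [False, False, False, False],
        [True, True, True, True],
    ]
    print(simulate(cycles))
```

Because every ready warp gets a turn in order, all warps advance at roughly the same rate, which is the behavior the transcript flags as a potential problem when they reach memory accesses together.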
