An Efficient GPU Implementation of a Tree-based n-Body Algorithm

Author : myesha-ticknor | Published Date : 2025-05-10

Transcript:
An Efficient GPU Implementation of a Tree-based n-Body Algorithm
Martin Burtscher
Department of Computer Science

High-end CPU-GPU Comparison

                        Xeon E5-2687W      Kepler GTX 680
Cores                   8 (superscalar)    1536 (simple)
Active threads          2 per core         ~11 per core
Frequency               3.1 GHz            1.0 GHz
Peak performance (SP)   397 GFlop/s        3090 GFlop/s
Peak mem. bandwidth     51 GB/s            192 GB/s
Maximum power           150 W              195 W*
Price                   $1900              $500*
Release date            March 2012         March 2012

*entire card

GPU Advantages
- Performance
  - 8x as many operations executed per second
- Main memory bandwidth
  - 4x as many bytes transferred per second
- Cost-, energy-, and size-efficiency
  - 29x as much performance per dollar
  - 6x as much performance per watt
  - 11x as much performance per area
- (based on peak values)

GPU Disadvantages
- Clearly, we should use GPUs all the time
  - So why aren’t we?
- GPUs are harder to program and tune than CPUs
  - Easy to make performance mistakes
- GPUs can only execute some types of code fast
  - Need lots of data parallelism, data reuse, & regularity
- Mostly regular codes have been ported to GPUs
  - E.g., matrix codes executing many ops/word
  - Dense matrix operations, stencil codes (PDEs)
- Our goal is to also handle irregular codes well

Outline
- Introduction
- GPU programming
- Barnes Hut algorithm
- CUDA implementation
- Experimental results
- Conclusions

CUDA Programming Model
- Non-graphics programming
  - Uses GPU as massively parallel co-processor
  - Thousands of threads needed for full efficiency
- C/C++ with extensions
  - Kernel launch: calling functions on GPU
  - Memory management: GPU memory allocation, copying data to/from GPU
  - Declaration qualifiers: device, shared, local, etc.
  - Special instructions: barriers, fences, etc.
  - Keywords: threadIdx, blockIdx

Calling GPU Kernels
- Kernels are functions that run on the GPU
- Callable by
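
The extensions listed on the CUDA Programming Model slide can be seen together in one small program. What follows is a minimal sketch and not code from the talk: the kernel name blockSum, the block size of 256 threads, and the all-ones input are assumptions chosen only for illustration. It shows a kernel launch, GPU memory allocation and copies, the __global__ and __shared__ qualifiers, the __syncthreads() barrier, and the threadIdx/blockIdx keywords.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define THREADS 256  // threads per block (assumed; must be a power of two here)

// __global__ marks a kernel: it runs on the GPU and is launched from the host.
__global__ void blockSum(const float* in, float* out, int n)
{
    __shared__ float buf[THREADS];  // on-chip memory shared by one thread block

    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global element index
    buf[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();  // barrier: every thread in the block has stored its value

    // Tree reduction inside the block: halve the active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            buf[threadIdx.x] += buf[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = buf[0];  // one partial sum per block
}

int main()
{
    const int n = 1 << 20;  // illustrative problem size
    const int blocks = (n + THREADS - 1) / THREADS;

    float* h_in = (float*)malloc(n * sizeof(float));
    float* h_out = (float*)malloc(blocks * sizeof(float));
    for (int i = 0; i < n; i++) h_in[i] = 1.0f;

    // Memory management: allocate on the GPU and copy the input over.
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    // Kernel launch: <<<number of blocks, threads per block>>>.
    blockSum<<<blocks, THREADS>>>(d_in, d_out, n);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(h_out, d_out, blocks * sizeof(float), cudaMemcpyDeviceToHost);

    // Add up the per-block partial sums on the CPU.
    double total = 0.0;
    for (int b = 0; b < blocks; b++) total += h_out[b];
    printf("sum = %.0f (n = %d)\n", total, n);

    cudaFree(d_in);
    cudaFree(d_out);
    free(h_in);
    free(h_out);
    return 0;
}
```

With these sizes the launch creates 4096 blocks of 256 threads, which is in line with the slide's point that thousands of threads are needed for full efficiency. A reduction like this is a regular code; the irregular tree traversal in Barnes Hut is harder to map onto the same model.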
