Accelerating Simulation of Agent-Based Models

Uploaded On 2016-05-17



Presentation Transcript


Accelerating Simulation of Agent-Based Models on Heterogeneous Architectures

Jin Wang†, Norman Rubin‡*, Haicheng Wu†, Sudhakar Yalamanchili†
† Georgia Institute of Technology
‡ AMD
* The author is now affiliated with NVIDIA Research


Discrete Heterogeneous Architectures

- Discrete GPU and CPU connected by the PCIe bus
- Powerful GPUs and CPUs
- Slow PCIe bus

[Figure: a GPU (e.g. AMD Southern Islands) with compute units and device memory, connected over the PCIe bus to a CPU (e.g. Intel Core i7) with host memory]

Integrated Heterogeneous Architectures

- CPU and GPU on the same die
- Less powerful CPU and GPU
- Faster on-chip memory bus
- E.g. AMD Fusion APU

[Figure: APU block diagram showing the CPU and its L2 cache, the GPU compute units, the unified north bridge (UNB) and WC blocks, and a single physical system memory divided into host memory and device memory]

Agent-Based Model on Heterogeneous Architectures

Agent-based model:
- Time-step or event-driven simulation of a group of agents with states
On a discrete GPU:
- The intrinsically parallel structure suits GPU implementations
- CPUs only transfer data and are idle most of the time
On integrated architectures:
- More computation capability can be extracted from the CPU

[Figure: agent states pass through the transit functions to produce updated states]
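The time-step scheme in the figure can be illustrated with a minimal sequential Python sketch. This is not the authors' code. `simulate`, `spread`, and the toy transit rule are hypothetical. Each step reads a frozen snapshot of the current agent states and writes a fresh list of updated states. The per-agent updates are therefore independent of one another, which is the property that lets a GPU run one agent per work-item.

```python
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")


def simulate(states: List[S],
             transit: Callable[[int, Sequence[S]], S],
             steps: int) -> List[S]:
    """Time-step agent-based simulation with double buffering.

    Every agent's new state is computed from a read-only snapshot of the
    current states, so the updates within one step are independent.
    """
    for _ in range(steps):
        current = tuple(states)  # snapshot: no agent sees a half-updated step
        states = [transit(i, current) for i in range(len(current))]
    return states


def spread(i: int, current: Sequence[int]) -> int:
    """Toy transit function (hypothetical): become 1 if self or left neighbor is 1."""
    left = current[i - 1] if i > 0 else current[i]
    return max(current[i], left)


print(simulate([0, 0, 1, 0, 0], spread, 2))  # [0, 0, 1, 1, 1]
```

If the states were updated in place instead of through the snapshot, the toy rule would push the 1 through every later agent within a single step. Double buffering keeps each step a function of the previous step only.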

This Work

Goal: efficiently use integrated CPU-GPU architectures for agent-based model simulations
Proposed:
- A massively parallel implementation of agent-based models on GPUs
- An optimization for integrated architectures that moves a portion of the computation to the CPU
- Uses traffic simulation as an example of an agent-based model

Traffic Simulation

Agent-based model simulation:
- Two-lane traffic
- Each vehicle depends on its close neighbors
States: velocity, xPosition, Lane, vehicleType
Transit functions:
- Acceleration function: depends on the preceding neighbor
- Lane-change function: depends on the preceding and back neighbors in both lanes
- Three neighbors: the preceding neighbors in both lanes and the back neighbor in the other lane
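The three neighbors can be stated precisely with a brute-force search. This is a hypothetical Python sketch that makes three assumptions: lanes are numbered 0 and 1, positions grow in the driving direction, and vehicles at exactly equal x positions are ignored. `find_neighbors` is an illustrative name. Each call scans every vehicle, so finding neighbors for all vehicles this way costs O(n²). This cost is one motivation for keeping the states sorted by x position.

```python
from typing import List, Optional, Tuple

Neighbors = Tuple[Optional[int], Optional[int], Optional[int]]


def find_neighbors(x: List[float], lane: List[int], i: int) -> Neighbors:
    """Brute-force neighbors of vehicle i on a two-lane road (lanes 0 and 1).

    Returns indices (preceding in own lane, preceding in other lane,
    back in other lane), with None where no such vehicle exists.
    """
    other = 1 - lane[i]

    def nearest(target_lane: int, ahead: bool) -> Optional[int]:
        # Vehicles in target_lane strictly ahead of (or behind) vehicle i.
        cands = [j for j in range(len(x))
                 if j != i and lane[j] == target_lane
                 and (x[j] > x[i] if ahead else x[j] < x[i])]
        if not cands:
            return None
        pick = min if ahead else max  # closest one in the requested direction
        return pick(cands, key=lambda j: x[j])

    return nearest(lane[i], True), nearest(other, True), nearest(other, False)


x = [0.0, 10.0, 5.0, 20.0, 15.0]
lane = [0, 0, 1, 1, 0]
print(find_neighbors(x, lane, 2))  # (3, 1, 0)
```

The acceleration function needs only the first entry. The lane-change function uses all three.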


GPU Massively Parallel Implementation

Data structure for states:
- Structure of Arrays
- Sorted according to x positions
- Stored in global memory
Mapping:
- One work-item for one vehicle
- A work-group for a block of vehicles
Three steps:
- Locate neighbors
- Update states
- Sort states according to x positions
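A sketch of the data layout and the work-item mapping, written in Python for readability. The field names mirror the slide's states. `VehicleStates`, `sort_by_x`, and `work_item_mapping` are hypothetical names, and the block size is a free parameter. A structure of arrays is typically chosen on GPUs because neighboring work-items then read neighboring addresses of the same array, which allows coalesced global-memory accesses. Sorting must apply one permutation to every array so that all fields stay aligned.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VehicleStates:
    """Structure of Arrays: one array per state field, indexed by vehicle."""
    velocity: List[float]
    x_position: List[float]
    lane: List[int]
    vehicle_type: List[int]

    def sort_by_x(self) -> None:
        # Compute one permutation from x and apply it to every array,
        # so index i still refers to the same vehicle in all fields.
        perm = sorted(range(len(self.x_position)),
                      key=self.x_position.__getitem__)
        for name in ("velocity", "x_position", "lane", "vehicle_type"):
            values = getattr(self, name)
            setattr(self, name, [values[i] for i in perm])


def work_item_mapping(n_vehicles: int, block: int) -> List[Tuple[int, int]]:
    """One work-item per vehicle: global id -> (work-group id, local id)."""
    return [(gid // block, gid % block) for gid in range(n_vehicles)]


s = VehicleStates([10.0, 20.0, 15.0], [30.0, 10.0, 20.0], [0, 1, 0], [0, 0, 1])
s.sort_by_x()
print(s.x_position, s.velocity)  # [10.0, 20.0, 30.0] [20.0, 15.0, 10.0]
```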

[Figure: the steps split across Kernel 1 and Kernel 2]

Locate Neighbors

Target: locate neighbors from mixed lanes
Two stages, both of which have a BSP (bulk-synchronous parallel) structure:
- Stage 1: locate group neighbors
- Stage 2: locate individual neighbors within a group

Locate Neighbors Cont.

[Figure: Stage 1, locate group neighbors; Stage 2, locate individual neighbors within a group]
- Load the group neighbors together with the current block into local memory
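The two stages can be sketched sequentially in Python. The slides do not define "group neighbor" exactly. This sketch assumes it means the following: for each block of consecutive x-sorted vehicles and each lane, the nearest vehicle just ahead of the block and the nearest vehicle just behind it. Under that assumption, the block plus its group neighbors always contains every individual neighbor of the block's vehicles. That combined set is the tile a work-group would load into local memory. Array order stands in for x position, and all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def locate_group_neighbors(lane: List[int], block: int) -> List[Dict]:
    """Stage 1 (assumed semantics): for every block of `block` consecutive
    x-sorted vehicles, find per lane the nearest vehicle ahead of the block
    and the nearest vehicle behind it."""
    n = len(lane)
    groups = []
    for start in range(0, n, block):
        end = min(start + block, n)
        ahead = {ln: next((j for j in range(end, n) if lane[j] == ln), None)
                 for ln in (0, 1)}
        behind = {ln: next((j for j in range(start - 1, -1, -1) if lane[j] == ln), None)
                  for ln in (0, 1)}
        groups.append({"start": start, "end": end, "ahead": ahead, "behind": behind})
    return groups


def locate_individual_neighbors(lane: List[int], groups: List[Dict]):
    """Stage 2: each vehicle searches only its block plus the group neighbors
    (the tile a work-group would hold in local memory). Returns, per vehicle,
    (preceding in own lane, preceding in other lane, back in other lane)."""
    result: List[Optional[Tuple]] = [None] * len(lane)
    for g in groups:
        extra = [j for j in list(g["ahead"].values()) + list(g["behind"].values())
                 if j is not None]
        tile = sorted(set(range(g["start"], g["end"])) | set(extra))
        for i in range(g["start"], g["end"]):  # one work-item per vehicle
            own, other = lane[i], 1 - lane[i]
            front_own = next((j for j in tile if j > i and lane[j] == own), None)
            front_other = next((j for j in tile if j > i and lane[j] == other), None)
            back_other = next((j for j in reversed(tile) if j < i and lane[j] == other), None)
            result[i] = (front_own, front_other, back_other)
    return result


lane = [0, 1, 0, 0, 1, 0, 1, 0]  # lanes of vehicles already sorted by x
groups = locate_group_neighbors(lane, block=3)
print(locate_individual_neighbors(lane, groups))
```

Both stages are loops over independent blocks or vehicles with no communication inside a stage. This matches the BSP structure on the slide, with a barrier between stage 1 and stage 2.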

Update States and Sort

For each vehicle (mapped to one work-item):
- Get the neighbor indices (from the "locate neighbors" step)
- Load the neighbor data (velocity, xPosition)
- Compute the new acceleration, velocity and xPosition from the neighbor data using the transit functions
- Store the newly updated states to global memory
Sort vehicles according to x position. Sorting is necessary because of:
- Lane changing
- One lane moving faster than the other
In other agent-based models: a restructuring mechanism the same as or similar to sorting
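One update-and-sort step can be sketched as follows, under three stated assumptions. First, the acceleration function is passed in as a parameter (the slides' transit functions are described in the appendix). Second, lane changing is omitted. Third, the position update uses the new velocity, a simple semi-implicit Euler step chosen for illustration. `update_and_sort` and `cruise` are hypothetical names. New states go into separate arrays so that every work-item reads only the previous step's data. The example shows why the sort is needed: a faster vehicle in one lane passes a slower one in the other lane, which breaks the x order.

```python
from typing import Callable, List, Optional, Tuple

# accel(v, gap_to_front, velocity_difference) -> acceleration; gap and
# difference are None when there is no front vehicle.
AccelFn = Callable[[float, Optional[float], Optional[float]], float]


def update_and_sort(x: List[float], v: List[float], lane: List[int],
                    front: List[Optional[int]], dt: float, accel: AccelFn
                    ) -> Tuple[List[float], List[float], List[int]]:
    n = len(x)
    new_x = [0.0] * n
    new_v = [0.0] * n
    for i in range(n):  # each iteration plays the role of one work-item
        f = front[i]     # neighbor index from the locate-neighbors step
        gap = None if f is None else x[f] - x[i]  # read OLD states only
        dv = None if f is None else v[i] - v[f]
        new_v[i] = max(0.0, v[i] + accel(v[i], gap, dv) * dt)  # no reversing
        new_x[i] = x[i] + new_v[i] * dt
    # Sort step: unequal lane speeds (and lane changes) break the x order.
    order = sorted(range(n), key=new_x.__getitem__)
    return ([new_x[i] for i in order], [new_v[i] for i in order],
            [lane[i] for i in order])


def cruise(v: float, gap: Optional[float], dv: Optional[float]) -> float:
    return 0.0  # toy acceleration function: keep constant speed


# A fast car in lane 0 overtakes a slower car ahead of it in lane 1.
print(update_and_sort([0.0, 5.0], [20.0, 10.0], [0, 1], [None, None], 1.0, cruise))
# ([15.0, 20.0], [10.0, 20.0], [1, 0])
```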

Experiments Platforms

Discrete platform:
- GPU: AMD Radeon HD7950, Southern Islands (GCN), 28 compute units, 850 MHz
- CPU: Intel Core i7-920, 2.66 GHz, 4 CPU cores / 8 threads
Integrated platform: AMD Trinity APU A10-5800K
- CPU: 4 CPU cores
- GPU: HD7660D, Northern Islands (4-way VLIW), 6 compute units, 800 MHz

Performance for GPU Implementation

Sorting consumes a large portion of the time!

[Figure: performance of the GPU implementation using radix sort, bitonic sort, and odd-even sort]

Optimization for Integrated Architectures

- Move some computation to the CPU
- Utilize the faster on-chip memory bus

[Figure: non-zero-copy memory access vs. zero-copy memory access]

Optimization: Local Sort and CPU Merge

- Most of the time, sorting is only required within a block (local sort)
- Sometimes, a merge is required across blocks

Optimization: Local Sort and CPU Merge Cont.

- Merge neighboring blocks on the CPU if necessary
- Compare the max xPosition in the current block with the min xPosition in the next block
- There can be consecutive merges
- Maximum number of consecutive blocks to be merged
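The local sort and CPU merge can be sketched sequentially in Python. `local_sort` and `cpu_merge` are hypothetical names, and `heapq.merge` from the standard library performs the two-way merges. The merge walks the blocks in order and applies the slide's boundary test: the maximum x of the current block against the minimum x of the next. After each merge, the sketch also re-tests the merged run against the previous run. That backward check is an assumption of this sketch, added so the output is fully sorted even if a vehicle moves across more than one block. When vehicles move only a short distance per step, most boundaries pass the test and need no merge.

```python
import heapq
from typing import List, Tuple


def local_sort(xs: List[float], block: int) -> List[List[float]]:
    """GPU step, sketched sequentially: each work-group sorts only its block."""
    return [sorted(xs[s:s + block]) for s in range(0, len(xs), block)]


def cpu_merge(blocks: List[List[float]]) -> Tuple[List[float], int]:
    """CPU step: merge neighboring blocks whose x ranges overlap.

    Keeps a stack of sorted runs in which each run is entirely <= the next.
    Returns the fully sorted values and the number of merges performed.
    """
    runs: List[List[float]] = []
    merges = 0
    for blk in blocks:
        cur = blk
        # Slide test: max x of the previous run vs. min x of the next block.
        # A merge can lower cur's minimum, so keep comparing with earlier
        # runs (consecutive merges); the backward check is an assumption.
        while runs and runs[-1][-1] > cur[0]:
            cur = list(heapq.merge(runs.pop(), cur))
            merges += 1
        runs.append(cur)
    return [val for run in runs for val in run], merges


# Only one block boundary is out of order here (5.0 > 4.0).
print(cpu_merge(local_sort([1.0, 2.0, 5.0, 4.0, 6.0, 7.0], block=3)))
# ([1.0, 2.0, 4.0, 5.0, 6.0, 7.0], 1)
```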

Benefit of Proposed Optimization

- Reduces the global workload
- The merge algorithm is serial, so the CPU is its more natural venue
- Communication between CPU and GPU for the merge stage is faster on the integrated platform through the on-chip memory bus

Results

Speedup over the pure GPU implementation on the discrete platform
[Figure: speedup chart, annotated "Even worse than baseline"]

Results Cont.

[Figures: breakdown for states update / sort / CPU merge, shown on three consecutive slides]

Conclusion

Optimization of agent-based models on integrated architectures, demonstrated on a traffic simulation problem
- Utilizes the computation capability of both the CPU and the GPU
- Memory access from host to device is faster through the on-chip memory bus
- Provides insight into mapping traditional GPGPU applications to integrated architectures

Thank you!

Questions?

Appendix: Traffic Simulation Models

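Acceleration model
Lane-changing model:
- Interacts with the front vehicle before/after lane changing and with the back vehicle after lane changing
- Safety: $s' > \mathrm{minGap}$ and $s'' > \mathrm{minGap}$
- Incentive: $\mathrm{acc}'(M') - \mathrm{acc}(M) > p\,[\mathrm{acc}(B') - \mathrm{acc}'(B')] + a_{\mathrm{thr}}$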

Reference: Martin Treiber and Arne Kesting. An open-source microscopic traffic simulator. Intelligent Transportation Systems Magazine, 2(3):6-13, Fall 2010.

[Figure: lane-change diagram labeled with acc'(M'), acc(M), acc(B'), acc'(B') and the gaps s' and s''; acceleration model inputs: velocity, distance to front vehicle, velocity difference from front vehicle]
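The acceleration formula on the appendix slide did not survive extraction. Its labels (velocity, distance to front vehicle, velocity difference from front vehicle) match the inputs of the Intelligent Driver Model (IDM) used in the cited simulator, so the sketch below assumes IDM. The lane-change rule implements the slide's safety and incentive inequalities, which resemble the MOBIL criterion. All parameter values and function names are assumptions chosen for illustration.

```python
import math
from typing import Optional

# Assumed IDM parameters (illustrative values, not taken from the slides).
V0 = 33.3     # desired speed [m/s]
T = 1.5       # safe time headway [s]
S0 = 2.0      # minimum jam distance [m]
A_MAX = 1.0   # maximum acceleration [m/s^2]
B_COMF = 1.5  # comfortable deceleration [m/s^2]
DELTA = 4     # acceleration exponent


def idm_acc(v: float, gap: Optional[float] = None, dv: float = 0.0) -> float:
    """IDM acceleration from velocity, distance to the front vehicle
    (None if there is none) and velocity difference v - v_front."""
    free_road = 1.0 - (v / V0) ** DELTA
    if gap is None:
        return A_MAX * free_road
    # Desired gap; max(0, ...) is a common guard against a negative dynamic term.
    s_star = S0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(A_MAX * B_COMF)))
    return A_MAX * (free_road - (s_star / gap) ** 2)


def should_change_lane(acc_m: float, acc_m_new: float,
                       acc_b_before: float, acc_b_after: float,
                       s1: float, s2: float, p: float = 0.2,
                       a_thr: float = 0.1, min_gap: float = 2.0) -> bool:
    """Lane-change rule from the appendix slide.

    acc_m, acc_m_new: acc(M) and acc'(M') of the changing vehicle.
    acc_b_before, acc_b_after: acc(B') and acc'(B') of the new back vehicle.
    s1, s2: the gaps s' and s''. p, a_thr, min_gap are assumed defaults.
    """
    safe = s1 > min_gap and s2 > min_gap
    incentive = acc_m_new - acc_m > p * (acc_b_before - acc_b_after) + a_thr
    return safe and incentive


print(idm_acc(0.0))  # 1.0 (a standing car on a free road gets A_MAX)
```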