
Configurable and Scalable Belief Propagation Accelerator for Computer Vision - PowerPoint Presentation

Uploaded On 2017-04-02






Presentation Transcript

Slide1

Configurable and Scalable Belief Propagation Accelerator for Computer Vision

Jungwook Choi and Rob A. Rutenbar

Slide2

Belief Propagation FPGA for Computer Vision

A variety of pixel-labeling apps in CV map to probabilistic graphical models, which are effectively solved by BP

FPGA acceleration: Better {Performance/Watt} + Reconfigurability
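To make the pixel-labeling formulation concrete, here is a minimal sketch of the energy that such a graphical model assigns to one labeling, assuming a 4-connected pixel grid with a per-pixel data cost and a pairwise smoothness cost. The function name `labeling_energy` and the argument layout are hypothetical and chosen for illustration only; they are not taken from the slides. BP searches for a labeling that (approximately) minimizes this kind of energy.

```python
def labeling_energy(labels, data_cost, smooth_cost):
    """Energy of one labeling on a 4-connected pixel grid (illustrative sketch).

    labels[y][x]       -- label assigned to pixel (x, y)
    data_cost[y][x][l] -- cost of giving pixel (x, y) the label l
    smooth_cost(a, b)  -- pairwise cost between neighbouring labels a and b
    """
    height, width = len(labels), len(labels[0])
    energy = 0
    for y in range(height):
        for x in range(width):
            l = labels[y][x]
            energy += data_cost[y][x][l]          # unary (data) term
            if x + 1 < width:                     # right neighbour
                energy += smooth_cost(l, labels[y][x + 1])
            if y + 1 < height:                    # bottom neighbour
                energy += smooth_cost(l, labels[y + 1][x])
    return energy


def potts(a, b):
    """Assumed smoothness cost: 0 if labels agree, 1 otherwise."""
    return 0 if a == b else 1
```

Stereo matching, denoising, and segmentation differ mainly in what the labels mean (disparities, intensities, object IDs) and in how the two cost terms are defined.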

Stereo Matching

Image denoising

Object segmentation

Slide3

Before: Point-Accelerators for BP [FPL 2012, FPGA 2013]

Very fast stereo matching, but not configurable to other BP problems

Pipelined, but not scalable/parallel (only one PE consumes entire mem BW)

Video

Stereo Matching Benchmark

Pipelined Message Passing Arch

Slide4

New: Scalable/Configurable BP Architecture

Not just a pipeline any longer: really parallel…

P parallel processor elements (pixel streams)

Efficient new memory subsystem overlaps BW and computation, checks for data conflicts

Novel, configurable Factor-Evaluation unit removes the O(|Labels|^2) complexity

Slide5

Fast Configurable Message Passing: Jump Flooding

Problem: BP message computation is quadratic in L = |Labels|

Solution: Jump Flooding* approximates the BP message in about L log(L) operations

Analogy: Like “FFT”, a smart order for label arithmetic & comparisons

*[Rong, Tan, ACM Symp. Int. 3D, 2006]
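The contrast between the quadratic message update and a jump-flooding style approximation can be sketched as follows. This is a generic 1D illustration under stated assumptions, not the accelerator's actual message passing unit: the function names are hypothetical, `h[s]` stands for the cost accumulated at source label `s` (data cost plus incoming messages), and `V(l, s)` is an assumed pairwise cost. The brute-force version evaluates all L x L label pairs; the jump-flooding version runs about log2(L) passes with halving step sizes, and in each pass every label compares its current best source only against the best sources held by the labels `step` positions away.

```python
def message_brute_force(h, V):
    """Exact min-sum message: O(L^2) evaluations of h[s] + V(l, s)."""
    L = len(h)
    return [min(h[s] + V(l, s) for s in range(L)) for l in range(L)]


def message_jump_flooding(h, V):
    """Approximate min-sum message in O(L log L) evaluations (sketch)."""
    L = len(h)
    seed = list(range(L))        # each label starts as its own best source
    step = 1
    while step < L:              # smallest power of two >= L
        step *= 2
    step //= 2                   # first jump distance
    while step >= 1:
        new_seed = seed[:]       # read old seeds, write new ones
        for l in range(L):
            best = seed[l]
            best_cost = h[best] + V(l, best)
            for nb in (l - step, l + step):
                if 0 <= nb < L:
                    cand = seed[nb]          # neighbour's current best source
                    cost = h[cand] + V(l, cand)
                    if cost < best_cost:
                        best, best_cost = cand, cost
            new_seed[l] = best
        seed = new_seed
        step //= 2
    return [h[seed[l]] + V(l, seed[l]) for l in range(L)]
```

Because every candidate is a real source label, the approximate message is never below the exact one; for some cost functions it can be strictly larger, which is why it is described as an approximation. For example, with `h = [5, 0, 5, 5]` and `V(l, s) = abs(l - s)`, both functions return `[1, 0, 1, 2]`.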

Jump Flooding

Message Passing Unit

Cost fn for inference

Slide6

Results: Positive Scalability

2, 4 PEs running (limited by Xilinx V5 size); sims 1-16 PEs

Parameterized by “Bandwidth needed to feed P processors”

If we can feed the architecture: promising scalability

Normalized Mem BW to Feed P Proc (mem blocksize B=4 fixed)

Execution Time vs (Mem BW for P processors)

P=2

P=4

Slide7

Results: Configurable BP Architecture

12-40X faster than software (PE = 4); no loss of result quality

First “custom HW” to ever run >1 {Middlebury, OpenGM} benchmarks

Comparison of Execution Time (in sec) for {Middlebury [1], OpenGM [2]} Benchmarks

Inference Results for {Middlebury, OpenGM}

[1] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother, “A comparative study of energy minimization methods for Markov random fields with smoothness-based priors,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 6, pp. 1068–1080, 2008.

[2] J. H. Kappes, B. Andres, F. Hamprecht, C. Schnorr, S. Nowozin, D. Batra, S. Kim, B. X. Kausler, J. Lellmann, N. Komodakis et al., “A comparative study of modern inference techniques for discrete energy minimization problems,” in CVPR. IEEE, 2013, pp. 1328–1335.

Speed comparable to “point-accelerator”

Slide8
