Mohammadhossein Behgam
Parallel Image Processing
Agenda
Need for parallelism
Challenges
Image processing algorithms
Data handling & Load Balancing
Communication cost & performance
What is the problem?
Image Processing applications can be very computationally demanding due to:
Large amount of data
Short response time
Complexity of the algorithm
A typical desktop workstation does not have sufficient computational resources to support large-scale image processing.
Suppose a pixmap has 1024 x 1024 8-bit pixels.
Storage requirement is $2^{20}$ bytes (1 Mbyte).
Suppose each pixel must be operated upon just once.
$2^{20}$ operations are needed in the time of one frame.
At $10^{-8}$ sec/op (10 ns/op) this would take about 10 ms.
In real-time applications, the speed of computation must be at the frame rate (typically 60-85 frames/sec).
All pixels in the image must be processed in the time of one frame, that is, in 12-16 ms.
Typically, many high-complexity operations must be performed, not just one operation.
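The arithmetic on this slide can be checked with a short sketch. The function name `frame_budget` and its parameter names are illustrative assumptions; the default values are the ones quoted above.

```python
def frame_budget(width=1024, height=1024, bytes_per_pixel=1,
                 seconds_per_op=10e-9, frames_per_second=60):
    """Return (storage in bytes, time for one op per pixel in ms, frame time in ms)."""
    pixels = width * height
    storage_bytes = pixels * bytes_per_pixel
    compute_ms = pixels * seconds_per_op * 1e3   # one operation per pixel
    frame_ms = 1e3 / frames_per_second           # time available per frame
    return storage_bytes, compute_ms, frame_ms


storage, compute_ms, frame_ms = frame_budget()
# Number of whole single-pass operations that fit in one frame at 60 frames/sec.
ops_that_fit = int(frame_ms // compute_ms)
print(storage, round(compute_ms, 2), round(frame_ms, 2), ops_that_fit)
```

Under these assumptions only one pass over the image fits in a frame on a single processor, which is the gap that parallel execution is meant to close.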
How To Alleviate This Problem
Due to the nature of many image processing algorithms, an effective way to alleviate this problem is through parallel computation.
Image processing routines can achieve near-linear speed-up.
A wide range of general-purpose or custom hardware has been used for image processing.
SIMD: using data parallelism
Suitable for low-level image analysis, where each processor performs a uniform set of operations based on the image data matrix in a fixed amount of time
MIMD: using task parallelism and pipelining
Suitable for high-level image processing simulations, such as pattern recognition, where each processor is assigned an independent task.
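A minimal sketch of the data-parallel style described above: every worker applies the same pointwise operation to its own strip of rows. The function names (`threshold_rows`, `split_rows`, `parallel_threshold`) and the threshold level are assumptions made for illustration. Threads stand in for processors here; in CPython they show the structure rather than deliver a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def threshold_rows(rows, level=128):
    """Uniform per-pixel operation: binarize each pixel against `level`."""
    return [[255 if p >= level else 0 for p in row] for row in rows]


def split_rows(image, parts):
    """Split the image into `parts` contiguous strips differing by at most one row."""
    base, extra = divmod(len(image), parts)
    strips, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        strips.append(image[start:stop])
        start = stop
    return strips


def parallel_threshold(image, workers=4):
    """Apply the same operation to every strip, then reassemble in order."""
    strips = split_rows(image, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(threshold_rows, strips)   # map preserves strip order
        return [row for strip in results for row in strip]


image = [[(x * y * 7) % 256 for x in range(8)] for y in range(10)]
assert parallel_threshold(image) == threshold_rows(image)
```

Because a pointwise operation needs no data from neighbouring strips, the workers never communicate, which is why this class of routine parallelizes so easily.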
Things To Consider
A library for parallel image processing poses challenges. A library must be:
usable in numerous parallel execution environments
usable by a wide variety of users
presenting a programming interface suitable for its audience: image processing experts, not parallel computing experts
Although applying parallel computing techniques to image processing seems attractive at first glance, in general, one wants to avoid reinventing the wheel.
should be hidden
Attention has to be given to serial performance in the library; optimization techniques should be applied to exploit data locality and multi-level memory hierarchies present in modern RISC architectures.
Load balancing algorithms should be developed and implemented that automatically (and transparently) distribute the computational load over heterogeneous active workstation clusters in an attempt to minimize overall wall clock execution time.
Low Level Image Processing
Operates directly on a single pixel or a neighborhood of stored image pixels to improve/enhance the image.
Coarse-grained data decomposition for an image processing algorithm
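One way such a decomposition might look for a neighborhood operation is sketched below, assuming a 3x3 average filter and horizontal strips. Each strip carries one extra "halo" row from each neighbouring strip so that it can be filtered independently. The function names and the border handling are assumptions for illustration, not the scheme shown in the original figure.

```python
def mean3x3(image):
    """3x3 average filter; border pixels average over the neighbours that exist."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [image[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(vals) // len(vals))
        out.append(row)
    return out


def mean3x3_by_strips(image, parts):
    """Filter each strip independently, using one halo row above and below."""
    h = len(image)
    out = []
    for k in range(parts):
        start, stop = k * h // parts, (k + 1) * h // parts
        lo, hi = max(0, start - 1), min(h, stop + 1)   # strip plus halo rows
        filtered = mean3x3(image[lo:hi])               # work one node could do
        offset = start - lo                            # skip the upper halo row
        out.extend(filtered[offset:offset + (stop - start)])
    return out


image = [[(3 * x + 5 * y) % 256 for x in range(5)] for y in range(7)]
assert mean3x3_by_strips(image, 3) == mean3x3(image)
```

The loop over strips runs serially here; since no strip depends on another's output, each iteration could be handed to a separate node, at the cost of sending the halo rows along with the strip.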
Number of steps can be reduced by separating the computation and data transfer steps in lock-step data-parallel fashion.
Parallel Implementation of Active Contour Algorithm
Data Handling & Shared Memory
A thread must have exclusive access to a shared variable for both the read and the write-back.
Sections of code that touch shared data must be protected, but mutual exclusion serializes the protected sections, so such protection should be kept to an absolute minimum.
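A minimal sketch of keeping the protected section small, assuming a shared 256-bin histogram as the shared variable: each thread counts into a private array without locking and takes the lock only for the final merge. The function name `histogram` and the chunking scheme are illustrative assumptions.

```python
import threading


def histogram(pixels, workers=4):
    """Build a 256-bin histogram with one short critical section per thread."""
    total = [0] * 256
    lock = threading.Lock()

    def work(chunk):
        local = [0] * 256              # private to this thread: no lock needed
        for p in chunk:
            local[p] += 1
        with lock:                     # exclusive read and write-back of `total`
            for value, count in enumerate(local):
                total[value] += count

    size = max(1, (len(pixels) + workers - 1) // workers)
    threads = [threading.Thread(target=work, args=(pixels[i:i + size],))
               for i in range(0, len(pixels), size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


pixels = [i % 256 for i in range(10000)]
hist = histogram(pixels)
assert sum(hist) == len(pixels)
```

Locking around every `total[p] += 1` would also be correct, but it would serialize the whole loop; merging once per thread confines the serialized part to 256 additions.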
Giving equally sized slices to each worker.
Suitable for dedicated parallel machines, or unloaded homogeneous workstation clusters, since every node will process the same amount of work in the same amount of time
, First-Serve (FFFS)
Divides the input image into more slices than there are nodes available.
Faster nodes can request more work from the manager.
Redundant FFFS
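The non-redundant scheme above, in which the image is cut into more slices than there are nodes and each node asks for more work when it finishes, can be sketched with a shared task queue. This is an assumption-laden illustration: `process_dynamic`, `work_fn` and the use of threads as stand-ins for nodes are not from the original slides.

```python
import queue
import threading


def process_dynamic(slices, work_fn, workers=3):
    """Workers pull slices from a shared queue until none are left."""
    tasks = queue.Queue()
    for item in enumerate(slices):
        tasks.put(item)                     # (slice index, slice data)
    results = [None] * len(slices)
    counts = [0] * workers                  # slices handled per worker

    def worker(wid):
        while True:
            try:
                index, data = tasks.get_nowait()   # request more work
            except queue.Empty:
                return                              # nothing left to do
            results[index] = work_fn(data)
            counts[wid] += 1

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, counts


slices = [list(range(i, i + 4)) for i in range(0, 40, 4)]   # 10 slices, 3 workers
results, counts = process_dynamic(slices, sum)
assert results == [sum(s) for s in slices]
```

How the slices end up distributed across workers depends on timing, so `counts` varies from run to run; a faster worker simply comes back to the queue more often.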
Redundant FFFS Load Balancing Algorithm
Divide input image into slices
Send each of the N workers a slice
Mark each worker as “working”
Mark N slices as “sent”; mark remaining slices as “not sent”
While there are more slices to process
    Receive “done” message from worker x
    If worker is marked as “working”
        Receive output slice from worker x
        Mark slice as “done”
        Mark worker as “idle”
        If other workers redundantly processing the same slice
            Send “abort” messages
    If there are more slices that are not being processed
        Send next slice
        Mark worker x as “working”
        Mark slice as “sent”
    Else if there are more slices that have not been returned yet
        Find a “sent” slice, send to worker x
        Mark worker as “working”
Send abort messages to workers marked as “working”
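The manager loop above can be explored with a small event-driven simulation. This is a sketch under stated assumptions, not the original implementation: worker speeds are given in slices per unit time, messages are instantaneous, an idle worker always duplicates the first slice that is still outstanding, and the name `simulate_redundant` is hypothetical.

```python
import heapq


def simulate_redundant(num_slices, speeds):
    """Return (finish time, {slice: worker that delivered it})."""
    state = ["not sent"] * num_slices
    running = {}                 # worker -> slice it is currently processing
    events = []                  # heap of (finish time, worker, slice)
    done_by = {}

    def dispatch(worker, now):
        pending = [s for s in range(num_slices) if state[s] == "not sent"]
        if not pending:          # nothing new: redundantly process a sent slice
            pending = [s for s in range(num_slices) if state[s] == "sent"]
        if not pending:
            running.pop(worker, None)        # worker stays idle
            return
        s = pending[0]
        state[s] = "sent"
        running[worker] = s
        heapq.heappush(events, (now + 1.0 / speeds[worker], worker, s))

    for w in range(len(speeds)):
        dispatch(w, 0.0)
    now = 0.0
    while len(done_by) < num_slices:
        now, w, s = heapq.heappop(events)
        if running.get(w) != s or state[s] == "done":
            continue             # stale event: this worker was aborted earlier
        state[s] = "done"
        done_by[s] = w
        # Abort every other worker on the same slice, then hand out new work.
        aborted = [v for v, t in running.items() if t == s and v != w]
        for v in [w] + aborted:
            dispatch(v, now)
    return now, done_by


finish, done_by = simulate_redundant(3, [1.0, 1.0, 0.1])
```

With two fast workers and one worker ten times slower, the slow worker's slice is re-sent to an idle fast worker, so the run ends at time 2.0 instead of waiting until time 10.0 for the slow node.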
32
Average filter  1.983  2.400  3.936  5.647  5.743
Square median filter  1.990  3.907