ENEL653 Project

Uploaded On 2016-08-01



Presentation Transcript

Slide1

ENEL653 Project

By Messaoud Mohamed Anis

Slide2

Outline

FFT divide and conquer algorithm

ADSP21469 FFT coprocessor

Background telemetry channels

Hum detector application

Slide3

Discrete Fourier Transform (DFT)

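$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{kn}, \quad k = 0, 1, \dots, N-1; \qquad W_N = e^{-j 2\pi / N}$$

Complexity:

N² complex multiplications

N(N-1) complex additions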
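The operation counts above can be checked with a minimal Python sketch of the direct evaluation. The function name `dft_direct` and the explicit counters are illustrative assumptions, not part of the presentation.

```python
import cmath


def dft_direct(x):
    """Direct DFT: X(k) = sum over n of x(n) * W_N^(k*n), W_N = exp(-2j*pi/N).

    Returns the spectrum plus the number of complex multiplications
    and complex additions actually performed.
    """
    n_points = len(x)
    spectrum = []
    mults = 0
    adds = 0
    for k in range(n_points):
        # First product of the sum (n = 0): one multiplication, no addition.
        acc = x[0] * cmath.exp(0j)
        mults += 1
        for n in range(1, n_points):
            twiddle = cmath.exp(-2j * cmath.pi * k * n / n_points)
            acc += x[n] * twiddle
            mults += 1
            adds += 1
        spectrum.append(acc)
    return spectrum, mults, adds
```

For a 4-point input the counters come out as 16 multiplications (N²) and 12 additions (N(N-1)).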

 

Slide4

FFT

A family of algorithms based on the symmetry and periodicity properties of the twiddle factors $W_N = e^{-j 2\pi / N}$.

Symmetry: $W_N^{k + N/2} = -W_N^{k}$

Periodicity: $W_N^{k + N} = W_N^{k}$

Slide5

Divide And Conquer

Decompose N into N = LM.

Remap the 1D array x(n) into a 2D array x(l, m), with 0 ≤ l ≤ L-1 and 0 ≤ m ≤ M-1.

Two possible configurations:

Column-wise: n = l + mL

Row-wise: n = Ml + m

Slide6

Divide and conquer simplification

 

 

Remap the initial array:

n = mL + l

k = Mp + q

Rewrite the DFT equation in terms of the indices (l, m) and (p, q).
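Substituting the two index mappings into the DFT gives the following factorization. This is a sketch of the standard textbook derivation, assuming $W_N = e^{-j 2\pi / N}$ and writing x(l, m) for x(mL + l) and X(p, q) for X(Mp + q).

$$X(p, q) = \sum_{l=0}^{L-1} \sum_{m=0}^{M-1} x(l, m)\, W_N^{(Mp+q)(mL+l)}$$

$$W_N^{(Mp+q)(mL+l)} = W_N^{MLmp}\, W_N^{mLq}\, W_N^{Mpl}\, W_N^{lq}, \qquad W_N^{MLmp} = 1, \quad W_N^{mLq} = W_M^{mq}, \quad W_N^{Mpl} = W_L^{pl}$$

$$X(p, q) = \sum_{l=0}^{L-1} \left\{ W_N^{lq} \left[ \sum_{m=0}^{M-1} x(l, m)\, W_M^{mq} \right] \right\} W_L^{lp}$$

The inner sum is an M-point DFT, the factor $W_N^{lq}$ is the extra twiddle multiplication, and the outer sum is an L-point DFT.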

 

Slide7

Divide and conquer result interpretation

 

The rewritten DFT splits into three parts:

M-point DFT

Multiplication by special twiddle factors

L-point DFT

Step 1: L DFTs of M points: LM² multiplications and LM(M-1) additions

Step 2: N multiplications

Step 3: M DFTs of L points: ML² multiplications and ML(L-1) additions

Complexity:

| | Direct evaluation | Divide and conquer |
|---|---|---|
| Multiplications | N² | N(M+L+1) |
| Additions | N(N-1) | N(M+L-2) |

The algorithm can be reapplied to M and L to break the calculation into even smaller DFTs.

Slide8

Divide and conquer algorithm formulation [1]

1. Remap the input into a 2D array
2. Calculate the DFT along the first dimension
3. Multiply the obtained data by the special twiddle factors
4. Calculate the DFT along the second dimension
5. Rearrange the output data
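The five steps can be sketched in plain Python as follows. This is an illustrative sketch, not the presentation's implementation: the function names are hypothetical, the input is assumed to be stored column-wise (n = l + mL), and the output is assumed to be read row-wise (k = Mp + q).

```python
import cmath


def dft(x):
    """Reference direct DFT used for the small sub-transforms."""
    n = len(x)
    return [
        sum(x[i] * cmath.exp(-2j * cmath.pi * k * i / n) for i in range(n))
        for k in range(n)
    ]


def dft_divide_and_conquer(x, L, M):
    """N-point DFT (N = L*M) computed as M-point and L-point DFTs."""
    N = L * M
    if len(x) != N:
        raise ValueError("len(x) must equal L * M")

    # Step 1: remap the input column-wise, n = l + m*L -> a[l][m].
    a = [[x[l + m * L] for m in range(M)] for l in range(L)]

    # Step 2: M-point DFT along m for every l, giving F(l, q).
    f = [dft(row) for row in a]

    # Step 3: multiply by the special twiddle factors W_N^(l*q).
    g = [
        [f[l][q] * cmath.exp(-2j * cmath.pi * l * q / N) for q in range(M)]
        for l in range(L)
    ]

    # Step 4: L-point DFT along l for every q, giving X(p, q).
    # Step 5: rearrange the output, k = M*p + q.
    X = [0j] * N
    for q in range(M):
        column = dft([g[l][q] for l in range(L)])
        for p in range(L):
            X[M * p + q] = column[p]
    return X
```

Calling `dft_divide_and_conquer(x, 3, 4)` on a 12-point input should agree with `dft(x)` up to rounding error.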

Picture taken from [1]

Slide9

ADSP21469 FFT accelerator

Features:

Supports FFT sizes from 16 – … points.

Computes a radix-2 decimation-in-time algorithm with automated bit reversal.

Contains a data memory unit of 1024 32-bit words.

Contains a twiddle coefficient memory unit of 512 32-bit words.

Contains a compute block with four floating-point multipliers and six floating-point adders.

Slide10

FFT limitation

The local memory is small, so only FFTs of 256 points or fewer can be computed internally.

FFTs longer than 256 points are broken into smaller ones using the divide and conquer approach, which decreases the performance of the accelerator.

Only internal memory can be used to load and store data for the FFT accelerator.

The FFT accelerator is clocked by PCLK, which runs at half the core clock frequency.
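As a rough illustration of the 256-point limit, the sketch below splits a large power-of-two size N into two factors V and H that each fit in local memory. The helper `split_fft_size` and its balanced-split policy are assumptions made for this example; the text does not say how the factors are actually chosen.

```python
MAX_LOCAL_POINTS = 256  # largest FFT computed internally, per the text


def split_fft_size(n, max_local=MAX_LOCAL_POINTS):
    """Return (V, H) with V * H == n and both factors <= max_local.

    Hypothetical policy: keep the two power-of-two factors as
    balanced as possible. Sizes that fit locally are returned as (n, 1).
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two")
    if n <= max_local:
        return (n, 1)
    bits = n.bit_length() - 1      # n == 2 ** bits
    v = 1 << (bits // 2)           # smaller (or equal) factor
    h = n // v
    if h > max_local:
        raise ValueError("n is too large for a single two-way split")
    return (v, h)
```

With this policy a 512-point FFT is split as 16 x 32 and a 1024-point FFT as 32 x 32.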

Slide11

FFT accelerator state machine

Idle state: used to program the accelerator's control registers.

Read state: the coefficients and input data are loaded into local memory.

Processing state: the FFT is computed.

Write state: all the computed data is written out to internal memory.

Slide12

FFT accelerator control registers

FFTCTL1:

FFTCTL2:

FFTDMASTAT:

Slide13

FFT local data organisation

Unpacked (only if …)

Packed data format

Coefficient format

Slide14

FFT Transfer control block configuration

The local memory is only accessible through DMA.

Two DMA channels are reserved for the FFT core.

The DMA channels are configured using transfer control blocks (TCBs).

FFT transfer control block

The registers in a TCB:

Act as a pointer to the next read or write location

Provide the signed increment by which the DMA controller post-modifies the corresponding memory index register

Indicate the number of words remaining to be transferred to or from memory

Configure a DMA circular buffer

Hold the starting address of the TCB for the next DMA operation on the corresponding channel

Slide15

TCB configuration for N less than 256

| Bits | Name | Description |
|---|---|---|
| 18-0 | IIx address | Address of the next chain pointer. This address must be offset by 0x80000. |
| 19 | PCI | Program controlled interrupt (PCI): 0 = no interrupt after current TCB, 1 = interrupt after current TCB |
| 20 | COEFFSEL | Coefficient select for next TCB: 0 = next TCB is data TCB, 1 = next TCB is coefficient TCB |

TCB Index register

Slide16

FFT C-API header file

Slide17

The ConfHardFFT function

Set up the FFTCTL2 register

Slide18

The RunFFT function

Slide19

Twiddle generation MATLAB function

Slide20

Timing Analysis

Cycle count

| FFT type | Data read | Coefficient read | Data write | Compute | Total |
|---|---|---|---|---|---|
| FFT accelerator, N <= 256 | 2N x 2 | 2N x 2 | 2N | | 10N + N… |
| FFT accelerator, N = V x H > 256: vertical FFT | 2N x 2 | 2V x 2 | 2N | | 28N + … |
| FFT accelerator, N = V x H > 256: special prod | 2N x 2 | 4N x 2 | 2N | 2 x 4N/4 | |
| FFT accelerator, N = V x H > 256: horizontal FFT | 2N x 2 | 2H x 2 | 2N | | |

For N = V x H > 256 the total covers all three passes.

Reads from internal memory take 2 cycles/word.

Writes to internal memory take 1 cycle/word.

It takes 2 cycles to compute a single complex butterfly by the FFT core.
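The three rules above can be turned into a rough cycle estimate. This is a sketch under stated assumptions: the FFT compute terms are not legible in the table, so the code assumes the usual radix-2 count of (points/2)·log2(points) butterflies per FFT, and it assumes the vertical pass runs H FFTs of V points and the horizontal pass V FFTs of H points. The function names are hypothetical.

```python
READ_CYCLES_PER_WORD = 2    # reads from internal memory
WRITE_CYCLES_PER_WORD = 1   # writes to internal memory
CYCLES_PER_BUTTERFLY = 2    # one complex butterfly in the FFT core


def butterflies(points):
    """Radix-2 butterfly count for one FFT (assumed, not from the table)."""
    stages = points.bit_length() - 1   # log2(points) for a power of two
    return (points // 2) * stages


def cycles_small(n):
    """Estimate for N <= 256: one pass through the accelerator."""
    data_read = 2 * n * READ_CYCLES_PER_WORD     # 2N words (real + imag)
    coeff_read = 2 * n * READ_CYCLES_PER_WORD
    data_write = 2 * n * WRITE_CYCLES_PER_WORD
    compute = butterflies(n) * CYCLES_PER_BUTTERFLY
    return data_read + coeff_read + data_write + compute


def cycles_large(v, h):
    """Estimate for N = V x H > 256: vertical, special product, horizontal."""
    n = v * h
    data_read = 2 * n * READ_CYCLES_PER_WORD
    data_write = 2 * n * WRITE_CYCLES_PER_WORD
    vertical = (data_read + 2 * v * READ_CYCLES_PER_WORD + data_write
                + h * butterflies(v) * CYCLES_PER_BUTTERFLY)
    special = (data_read + 4 * n * READ_CYCLES_PER_WORD + data_write
               + 2 * (4 * n // 4))               # "2 x 4N/4" from the table
    horizontal = (data_read + 2 * h * READ_CYCLES_PER_WORD + data_write
                  + v * butterflies(h) * CYCLES_PER_BUTTERFLY)
    return vertical + special + horizontal
```

Under these assumptions the fixed parts reproduce the 10N and 28N terms of the table: `cycles_small(256)` evaluates to 10·256 + 256·8 = 4608, and `cycles_large(16, 32)` to 28·512 + 4·(16 + 32) + 512·9 = 19136.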

Slide21

Timing Analysis: measured cycles

| FFT type | Hardware | Software | Speed ratio |
|---|---|---|---|
| 256 points | 9319 cycles | 2623 cycles | 3.55 |
| 512 points | 38919 cycles | 5303 cycles | 7.33 |
| 1024 points | - | 11151 cycles | - |

Slide22

Background telemetry channels

Background telemetry channels (BTC) are a debug feature that enables real-time data exchange with the processor over the JTAG interface without interrupting processor execution.

BTC uses an intermediary I/O buffer into which the application must periodically copy the data.

Slide23

How to use BTC step 1: adding necessary files

Add the BTC library file

Include the BTC headers and signal.h

Slide24

How to use BTC step 2: BTC initialization

BTC is initialized by calling the function btc_init().

On the ADSP21469, BTC commands from the host are processed via the low-priority emulator interrupt (EMULI), so the corresponding interrupt vector must be installed.

Slide25

How to use BTC step 3: Channel definition

Buffer used by the application

Intermediary buffer used by BTC

Channel name (fewer than 32 characters)

Channel starting address

Channel size in 32-bit words

Slide26

How to use BTC step 3: Data update

BTC data must be updated periodically by the application.

The update rate shouldn't be too high (a maximum of about 1.5 to 2 MBytes of data per second is supported).

Channel number

Buffer starting address

Buffer size in 32-bit words

Slide27

How to use BTC step 3: Display data (BTC memory)

Slide28

How to use BTC step 3: Plot data

Use the intermediary buffers

Slide29

Hum detector: Initialization and update tasks

Initialization task

Periodic task

Slide30

Hum detector: Preemptive task

Slide31

Hum detector: Screen capture

Slide32

Hum detector: Resolution

49 + 51 Hz

49 + 50 Hz

49 + 52 Hz

49 + 62 Hz

Slide33

References

[1] Proakis, Digital Communications, 5th edition, ISBN: 0-07-295716-6, 2008

[2] ADSP-214xx SHARC® Processor Hardware Reference, Analog Devices

[3] SHARC® Processor Programming Reference, Analog Devices