ECE 506 - PowerPoint Presentation

Uploaded On 2016-04-07

Presentation Transcript

Slide1

ECE 506

Reconfigurable Computing

http://www.ece.arizona.edu/~ece506

Lecture 3

Reconfigurable Architectures

Ali Akoglu

Slide2

Complex Programmable Logic Device (CPLD)

Hierarchical design to counter the size explosion of PLAs
Combinational logic with flip-flops (registered outputs)
Organized into logic blocks connected through an interconnect matrix
Usually enough logic for simple counters, state machines, decoders, etc.

Slide3

Xilinx CoolRunner-II CPLD

PLA and macrocell combination
1.8 V device, estimated current consumption of less than 100 µA
Up to 12,000 gates, 512 macrocells

Slide4

CPLD

Multiple Function Blocks (FBs) and I/O Blocks (IOBs)
Fully interconnected (FB outputs and input signals feed the FB inputs)
Each FB provides programmable logic with 54 inputs and 18 outputs.
The IOBs provide buffering for device inputs and outputs.
Output enable signals drive directly to the IOBs.

Slide5

Function Block

Comprises 18 independent macrocells, each of which can implement a combinatorial or registered function.
Logic within the FB is implemented using a sum-of-products representation.
Fifty-four inputs (108 true and complement signals) feed the programmable AND-array to form 90 product terms.
Any number of these product terms can be allocated to each macrocell by the product term allocator.
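As a quick check of the function-block figures, the following Python sketch recomputes the AND-array signal count and the share of product terms per macrocell under an even split. The constant names are illustrative only, and the even split is an assumption made for this estimate.

```python
# Function-block figures quoted on the slide.
FB_INPUTS = 54        # inputs entering the AND-array
PRODUCT_TERMS = 90    # product terms formed by the AND-array
MACROCELLS = 18       # macrocells per function block

# Each input is available in true and complemented form.
and_array_signals = 2 * FB_INPUTS

# An even split of product terms across the macrocells (an assumption used
# only for this estimate; the allocator can distribute them unevenly).
terms_per_macrocell = PRODUCT_TERMS // MACROCELLS

print(and_array_signals)    # 108
print(terms_per_macrocell)  # 5
```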

How many product terms would you assign to each macrocell?

Slide6

Macrocell

The product term allocator selects how 5 product terms are used:
  as primary data inputs to the OR gate for combinatorial functions, or
  as control inputs (clock, clock enable, set, reset, output enable).
Each macrocell can be configured for a combinatorial or registered function.

Slide7

Product Term Allocator

Controls how the five direct product terms are assigned to each macrocell (MC).
For example, all five direct terms can drive the OR function.

Slide8

Product Term Allocator

Can re-assign other product terms within the FB to increase the logic capacity of a macrocell beyond five direct terms.
Any macrocell requiring additional product terms can access uncommitted product terms in other macrocells within the FB.
Up to 15 product terms can be available to a single macrocell with only a small incremental delay (tPTA).

Slide9

Product Term Allocator

Slide10

Product Term Allocator

Can re-assign product terms from any macrocell within the FB by combining partial sums of products over several macrocells.
What is the incremental delay in this example? 2 tPTA
If all 90 product terms are available to any macrocell, what is the maximum incremental delay?

Slide11

Programmability Options

PLDs and CPLDs offer different types of programmability, covering both initial programming and reprogramming.
One-time programmable: the device is programmed once and holds its programming "forever"
  usually uses fuses to make/break links
  not reusable, but usually the cheapest
  discard the device if changes are to be made

Slide12

Programmability Options

UV-Erasable (EPROM)
  A floating gate is positioned between the regular MOS transistor control gate and the channel.
  The floating gate is initially uncharged.
To program the cell:
  A high voltage (e.g. 14 volts) is applied to the control gate (the drain is at ~12 volts).
  This causes current to flow between the source and drain.
  Electrons are accelerated to high velocity, and a small fraction of them traverse the thin oxide and become trapped on the floating gate.
  The floating gate, surrounded by an insulating layer, becomes “permanently” negatively charged and the transistor is permanently turned off.
  “Permanent” means about 10 years at 125 degrees C; at higher temperatures this time is reduced.
Cells are erased by ultraviolet (UV) light: electrons on the floating gates are excited and discharged to the substrate.

Slide13

Programmability Options

Electrically Erasable (EEPROM)
  uses a floating gate structure with a control gate on top
  both erasing and reprogramming are accomplished with an electrical current
  device can be programmed/erased on the circuit board; no special packaging or IC socket is needed
  erase time is much faster than UV erase
  programming is retained after power down (non-volatile)
  programming/erasing is limited to thousands of cycles

Slide15

Electrically Erasable PLDs

Conventional PLDs are either
  one-time programmable, or
  UV erasable,
and must be placed in a programmer to program them.
EE PLDs can be programmed and erased in place
  A small (four-wire) connection to a computer is needed
  Once programmed, they retain the program indefinitely
  The chip never has to be taken out of its circuit

Slide16

FPGA

Introduced in 1985 by Xilinx
Similar to CPLDs
A function to be implemented in an FPGA is partitioned into modules, each implemented in a logic block.
Logic blocks are connected with the programmable interconnection.

Slide17

FPGA Technology

1) Antifuse-based
  realization of interconnections
2) Memory-based
  realization of interconnections and computation
  FLASH, SRAM

Slide18

FPGA Technology

Antifuse FPGAs:
  configured by burning a set of fuses
  once configured, cannot be altered any more
  bug fixes and updates are possible for new PCBs, but hardly for already manufactured boards
  ASIC replacement for small volumes
Flash FPGAs:
  may be re-programmed several thousand times and are non-volatile
  expensive; re-configuration takes several seconds
SRAM FPGAs:
  dominating technology
  unlimited re-programming
  additional circuitry is required to load the configuration into the FPGA after power on
  re-configuration is very fast; some devices even allow partial re-configuration during operation

Slide19

Antifuse (Actel FPGA)

An antifuse is normally an open circuit.
It is a two-terminal element connected to the upper and lower layers of the antifuse; in the middle is a dielectric (Oxide-Nitride-Oxide, ONO) layer.
Initial state: the high resistance of the dielectric does not allow any current to flow.
Applying a high voltage causes large power dissipation and melts the dielectric.
This drastically reduces the resistance, so a link can be built that permanently connects the two layers.

Slide20

Antifuse chips

Advantages:
Small area
  With metal-to-metal antifuses, no silicon area is required to make connections, decreasing the area overhead of programmability.
Much lower resistance and parasitic capacitance than transistors
  possible to include more switches per device
  reduces the RC delays in the routing
No bitstream can be intercepted in the field (no bitstream transfer)
  A scanning electron microscope is needed to try to determine the antifuse states (an Actel AX2000 antifuse FPGA contains 53 million antifuses with only 2-5% programmed in an average design).
Interconnect structure is naturally “rad hard”: relatively immune to the effects of radiation (except flip-flops!)
  An SRAM-based component can be “flipped” if hit by radiation.

Slide21

Antifuse chips

Disadvantages:
Not suitable for devices that must be frequently reprogrammed
  one-time programmable FPGAs
  special programmers must be used to program a device before it is mounted on a final product
Programming involves significant changes to the properties of the materials in the fuse, which leads to scaling challenges when new IC fabrication processes are considered.

Slide22

Programmability Options

Static Random Access Memory (SRAM) programming:
  The switch is a pass transistor controlled by the state of the SRAM bit.
  Logic block configuration bits are stored in SRAM.
Can be reprogrammed an infinite number of times
Uses standard CMOS process technology
  SRAM cells are created using exactly the same CMOS technologies as the rest of the device.
  No special processing steps are required to create these components.
  Benefits from the increased integration, higher speeds and lower dynamic power consumption of new processes with smaller minimum geometries.

Slide23

Programmability Options

SRAM volatility
  programming contents are NOT retained after power down
  an external non-volatile memory device is required on power up
SRAM size
  An SRAM cell requires either 5 or 6 transistors, and the programmable element used to interconnect signals requires at least a single transistor.
SRAM security
  Since the configuration information must be loaded into the device at power up, there is the possibility that the configuration information could be intercepted and stolen for use in a competing system.

Slide24

Programmability Options

Flash programming:
  an alternative that addresses some of the shortcomings of SRAM
  uses floating gate programming technologies: charge is injected onto a gate that “floats” above the transistor
Non-volatile
  eliminates the need for external storage of configuration data
  can function immediately upon power-up
Area efficiency
  Area overhead: the programming circuitry (high and low voltage buffers) needed to program the cell.
  The cost is relatively modest, as it is amortized across numerous programmable elements.

Slide25

Programmability Options

Flash cannot be reprogrammed an infinite number of times.
  Charge buildup in the oxide eventually prevents a flash-based device from being properly erased and programmed.
Non-standard CMOS process
  around five additional process steps on top of standard CMOS
  lags behind SRAM-based devices by one or more process generations
Programming time is about three times that of an SRAM-based component.
High resistance and capacitance due to the use of transistor-based switches.
Solution: on-chip flash memory to provide non-volatile storage, with SRAM cells to control the programmable elements in the design.

Slide26

Programmability Options

An ideal technology would
  be non-volatile
  be reprogrammable
  use a standard CMOS process
  offer low on-resistances and low parasitic capacitances

Slide27

FPGA Components

How can we implement any circuit in an FPGA?
Example: half adder
  Combinational logic is represented by a truth table
  What kind of hardware can implement a truth table?

Input | Out
A B   | S
0 0   | 0
0 1   | 1
1 0   | 1
1 1   | 0

Input | Out
A B   | C
0 0   | 0
0 1   | 0
1 0   | 0
1 1   | 1

Slide28

FPGA Components

Lookup Table (LUT)

Implement the truth table in small memories (LUTs), usually SRAM.
A function is implemented by writing all possible values that the function can take into the LUT.
The input values are used to address the LUT and retrieve the value of the function corresponding to those inputs.
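The addressing idea can be sketched in a few lines of Python, using the half adder as the function to store. The `LUT` class and its method names are hypothetical, chosen only for this illustration; a real LUT is an SRAM read through a multiplexer, not software.

```python
class LUT:
    """A k-input, 1-output lookup table modelled as a small memory."""

    def __init__(self, k, function):
        self.k = k
        # "Programming": store the function value for every input combination.
        self.memory = [function(*self._bits(addr)) & 1 for addr in range(2 ** k)]

    def _bits(self, addr):
        # Split an address into k input bits, most significant first.
        return [(addr >> shift) & 1 for shift in range(self.k - 1, -1, -1)]

    def read(self, *inputs):
        # The inputs form the address; the stored bit is the function value.
        addr = 0
        for bit in inputs:
            addr = (addr << 1) | bit
        return self.memory[addr]


sum_lut = LUT(2, lambda a, b: a ^ b)    # S = A XOR B
carry_lut = LUT(2, lambda a, b: a & b)  # C = A AND B

print(sum_lut.memory)    # [0, 1, 1, 0]
print(carry_lut.memory)  # [0, 0, 0, 1]
print(sum_lut.read(1, 0), carry_lut.read(1, 1))  # 1 1
```

Each LUT here holds one output column of the half-adder truth table, so two 2-input, 1-output LUTs are enough for the whole circuit.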

[Figure: the half-adder truth tables for S and C mapped onto two 2-input, 1-output LUTs, each addressed by A and B]

Addr (A B) | S output | C output
00         | 0        | 0
01         | 1        | 0
10         | 1        | 0
11         | 0        | 1

Slide29

FPGA Components

Alternatively, could have used a 2-input, 2-output LUT
  Outputs commonly use the same inputs

[Figure: the two 2-input, 1-output LUTs for S and C merged into one 2-input, 2-output LUT addressed by A and B]

Addr (A B) | S C
00         | 0 0
01         | 1 0
10         | 1 0
11         | 0 1

Slide30

FPGA Components

Slightly bigger example: full adder
  Combinational logic can be implemented in a LUT with the same number of inputs and outputs
  3-input, 2-output LUT

Truth table (also the contents of the 3-input, 2-output LUT):

Inputs  | Outputs
A B Cin | S Cout
0 0 0   | 0 0
0 0 1   | 1 0
0 1 0   | 1 0
0 1 1   | 0 1
1 0 0   | 1 0
1 0 1   | 0 1
1 1 0   | 0 1
1 1 1   | 1 1

Slide31

FPGA Components

LUT Example: Implement the function ABD + BCD + ABC using
  2-input LUTs
  3-input LUTs
  4-input LUTs

Slide32

FPGA Components

LUTs are used as function generators
  How many SRAM locations does a k-input LUT have? $2^k$
  How many different functions can a k-input LUT implement? $2^{2^k}$

[Figure: 2-input LUTs for S and C, addressed by A and B]

Slide33

FPGA Components

Why aren’t FPGAs just a big LUT?
  The size of a truth table grows exponentially with the number of inputs
    3 inputs = 8 rows, 4 inputs = 16 rows, 5 inputs = 32 rows, etc.
  Same number of rows in truth table and LUT
  LUTs grow exponentially with the number of inputs
  Number of SRAM bits in a LUT = $2^i \cdot o$
    i = # of inputs, o = # of outputs
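To make the growth concrete, here is a small Python sketch of the sizing formula together with the count of distinct functions a k-input, 1-output LUT can hold ($2^{2^k}$). The function names are illustrative, not taken from any tool.

```python
def lut_sram_bits(i, o):
    """SRAM bits in a LUT with i inputs and o outputs: 2**i * o."""
    return (2 ** i) * o


def functions_of_k_inputs(k):
    """Number of distinct Boolean functions a k-input, 1-output LUT can hold."""
    return 2 ** (2 ** k)


for i in (3, 4, 5, 6):
    print(i, lut_sram_bits(i, 1))   # 8, 16, 32, 64 bits

print(lut_sram_bits(3, 2))          # 16 bits for a 3-input, 2-output LUT
print(functions_of_k_inputs(2))     # 16 possible 2-input functions
print(lut_sram_bits(64, 1))         # 18446744073709551616, about 1.84e19
```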

Example: 64-input combinational logic with 1 output would require $2^{64}$ SRAM bits ($\approx 1.84 \times 10^{19}$)
  Clearly, it is not feasible to use large LUTs
  So, how do FPGAs implement logic with many inputs?

Slide34

FPGA Components

Fortunately, we can map circuits onto multiple LUTs
  Divide the circuit into smaller circuits that fit in LUTs (same # of inputs and outputs)
  Example: 3-input, 2-output LUTs

Slide35

FPGA Components

Large LUTs
  Fast when using all inputs
  Waste transistors otherwise
Must also consider total chip area
  Wasting transistors may be OK if there are plenty of LUTs

Slide36

FPGA Components

What if the circuit doesn’t map perfectly?
  More inputs in LUT than in circuit
    The truth table handles this problem
  More outputs in LUT than in circuit
    Extra outputs are simply not used
    Space is wasted, so multiple outputs should be used whenever possible

Important Point

The number of gates in a circuit has no effect on the mapping into a LUT

All that matters is the number of inputs and outputs
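A short Python sketch illustrates the point: two implementations of the same function with different gate counts produce identical LUT contents. The XOR example and all names here are chosen for illustration and are not from the lecture.

```python
from itertools import product


def lut_contents(circuit, num_inputs):
    """Evaluate a circuit on every input combination to get its LUT contents."""
    return [circuit(*bits) & 1 for bits in product((0, 1), repeat=num_inputs)]


def xor_one_gate(a, b):
    # A single XOR gate.
    return a ^ b


def xor_many_gates(a, b):
    # The same function built from four NAND gates.
    nand = lambda x, y: 1 - (x & y)
    n1 = nand(a, b)
    return nand(nand(a, n1), nand(b, n1))


print(lut_contents(xor_one_gate, 2))    # [0, 1, 1, 0]
print(lut_contents(xor_many_gates, 2))  # [0, 1, 1, 0]
```

Both circuits occupy one 2-input LUT, because only the truth table is stored, not the gates.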

Unfortunately, it isn’t common to see large circuits with only a few inputs

[Figure: two circuits, labeled "1 gate" and "1,000,000 gates"]

Slide37

FPGA Components

LUT-Realization

A LUT is basically a multiplexer that evaluates the truth table stored in the configuration SRAM cells (it can be seen as a one-bit-wide ROM).

Slide38

QUIZ2

Slide39

FPGA Components

Example: Determine the best LUTs for the following circuit
  Choices
    4-input, 2-output LUT (delay = 2 ns)
    6-input, 2-output LUT (delay = 3 ns)
  Assume each SRAM cell is 6 transistors
    4-input LUT = $6 \cdot 2^4 \cdot 2 = 192$ transistors
    6-input LUT = $6 \cdot 2^6 \cdot 2 = 768$ transistors

Slide40

FPGA Components

Example: Determine the best LUTs for the following circuit
  Choices
    4-input, 2-output LUT (delay = 2 ns)
    6-input, 2-output LUT (delay = 3 ns)
  Assume each SRAM cell is 6 transistors
    4-input LUT = $6 \cdot 2^4 \cdot 2 = 192$ transistors
    6-input LUT = $6 \cdot 2^6 \cdot 2 = 768$ transistors
  6-input LUT
    Propagation delay = 3 ns
    Total transistors = 768

Slide41

FPGA Components

Example: Determine the best LUTs for the following circuit
  Choices
    4-input, 2-output LUT (delay = 2 ns)
    6-input, 2-output LUT (delay = 3 ns)
  Assume each SRAM cell is 6 transistors
    4-input LUT = $6 \cdot 2^4 \cdot 2 = 192$ transistors
    6-input LUT = $6 \cdot 2^6 \cdot 2 = 768$ transistors
  4-input LUT
    Propagation delay = 4 ns
    Total transistors = 384
  The 6-input LUT is about 1.3x faster (3 ns vs. 4 ns) but uses twice the transistors (768 vs. 384)

Slide42

FPGA Components

Problem: How to handle sequential logic
  Truth tables don’t work
Possible solution: Add a flip-flop to the output of the LUT
  BLE: the basic logic element
  The circuit can now use the output from the LUT or from the FF
  Where does select come from?
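A minimal behavioral sketch of a BLE in Python may help here. It assumes the select signal is one more configuration bit, stored alongside the LUT contents; that is a common arrangement, but the slide itself leaves the question open. The class and attribute names are hypothetical.

```python
class BLE:
    """Basic logic element: a LUT, a flip-flop, and an output select mux."""

    def __init__(self, lut_bits, use_ff):
        self.lut_bits = list(lut_bits)  # configuration SRAM: truth table
        self.use_ff = use_ff            # assumed configuration bit: mux select
        self.ff = 0                     # flip-flop state, reset to 0

    def lut(self, *inputs):
        # The inputs address the stored truth table (MSB first).
        addr = 0
        for bit in inputs:
            addr = (addr << 1) | bit
        return self.lut_bits[addr]

    def output(self, *inputs):
        # The select bit picks the registered or the combinational path.
        return self.ff if self.use_ff else self.lut(*inputs)

    def clock(self, *inputs):
        # On a clock edge the flip-flop captures the LUT output.
        self.ff = self.lut(*inputs)


# 1-bit toggle: the LUT inverts its single input, which is fed from the FF.
toggle = BLE(lut_bits=[1, 0], use_ff=True)
trace = []
for _ in range(4):
    trace.append(toggle.output(toggle.ff))
    toggle.clock(toggle.ff)
print(trace)  # [0, 1, 0, 1]
```

With `use_ff=False` the same element behaves as a purely combinational LUT, so one configuration bit decides whether the BLE output is registered.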
