Testing and Diagnosis of Interconnect Faults in Cluster-Based FPGA Architectures (David Mohabir, University of Arizona, March 19th, 2012)
Slide1
Testing and Diagnosis of Interconnect Faults in Cluster-Based FPGA Architectures
David Mohabir
University of Arizona
March 19th, 2012
Slide2
Testing and diagnosis of interconnect faults in cluster-based FPGA architectures
Section 1
Slide3
Motivation
Quickly identify faulty components
Design new, efficient testing methodologies to offset the complexity of FPGA testing as compared to ASIC testing
Knowing where a defect is located matters, because an FPGA can be reconfigured to avoid the faulty resources
Increased test generation complexity
Increased test application time
Multiple configurations to test assortment of switch settings
Slide4
Limitations
High complexity for test generation
Increased test application time
Need for external controllability and observability
Multiple configurations to test assortment of switch settings, compared to a single configuration for an ASIC
As FPGAs have more programmable switch points, this becomes a bigger issue
Slide5
Previous and related work
FPGA testing has been divided into interconnect testing and FPGA logic testing
Reduction in the need for I/O pads for testing
Several configurations are required to ensure all FPGA logic is tested in some configuration
Unutilized FPGA logic and routing are being used to implement modular redundancy
Faults can be targeted for the entire FPGA structure, or those that are application-specific
Slide6
Related work (cont'd)
The need for external controllability and observability has also been reduced using an iterative logic array (ILA) test architecture
One-dimensional configuration with one direction for signal propagation
A complete array of m × m LUT/RAM modules requires 4 test configurations, independent of the size of the array and of the modules [11]
Problems remain in defining a set of test configurations for cluster-based architectures and in diagnosis
Slide7
Related work (cont'd)
The use of LUTs with logic checkers to implement testing schemes in interconnects
Using LUTs to form shift registers to easily check the output of the test pattern
Built-in Self Test (BIST) architecture to locate any single faulty PLB and most multiple faulty PLBs
This targets FPGA logic
Cluster-based FPGA test methodologies
Do not cover specific extra-cluster faults
Slide8
Geometric Scaling
Increased defect rates
Increased device variation
Increased change in device parameters
Increased single die capacity
Increased susceptibility to transient upsets
Slide9
Defect Tolerance
If a device failure renders a bitop or an interconnect unusable, the device should be reconfigured to avoid the failing areas
Substitute good resources for bad ones
As defect rates increase, spare resources should be strategically reserved
Slide10
Interchangeable LUTs
Slide11
Interchangeability
Not all unused units will be substitutable, as location strongly affects interconnections to other logic blocks
Preferable to have fewer large pools of mostly interchangeable resources
Slide12
Cluster-based architectures
Primitive logic components are grouped into coarse-grained clusters
Richness of internal connectivity means large range of potential interconnect patterns
External access to internal test points becomes increasingly difficult as device sizes scale
Cluster I/O are the input and output pins of the cluster
Tile I/O pins include the endpoints of wire segments, which can connect to a neighboring tile via programmable interconnect points (PIPs)
Slide13
Structure
Slide14
Built-in Self Test
BIST overhead not an issue
Easily inserted and removed by reconfiguration
Test logic inside the FPGA enables test access to internal components
Each BISTER is composed of:
Test pattern generator (TPG)
Output response analyzer (ORA)
Two blocks under test (BUTs)
Slide15
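As a rough sketch of the BISTER composition listed on the Built-in Self Test slide: the slides do not say how the ORA reaches its verdict, so this example assumes a comparison-based ORA that flags any mismatch between the two blocks under test, and an exhaustive TPG. The names `run_bister`, `good_but` and `faulty_but` are hypothetical.

```python
from itertools import product


def run_bister(but_a, but_b, n_inputs=4):
    """Apply every input pattern (exhaustive TPG) to both blocks under test.

    Returns True if the comparison-based ORA saw at least one mismatch.
    """
    mismatch = False
    for pattern in product((0, 1), repeat=n_inputs):
        if but_a(*pattern) != but_b(*pattern):
            mismatch = True  # the ORA latches the failure
    return mismatch


def good_but(a, b, c, d):
    # Fault-free block: a 4-input XOR, as used for the LUTs in this approach.
    return a ^ b ^ c ^ d


def faulty_but(a, b, c, d):
    # Same block with input `a` stuck at 0 (illustrative fault).
    return 0 ^ b ^ c ^ d


print(run_bister(good_but, good_but))    # no mismatch expected
print(run_bister(good_but, faulty_but))  # mismatch expected
```

Under this assumption, two blocks carrying the same fault would produce identical outputs and escape the comparison, which is one reason each tile is paired with different neighbors as the BISTERs are shifted.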
BISTER test structure
Slide16
BISTER
Slide17
BIST strategy
To guarantee testing of all tiles, the FPGA is reconfigured to shift the BISTERs across the entire array
All tiles will be tested by acting as a BUT
Perimeter tiles are tested by using the I/O pads to access the periphery
Total test application time is related to the area of the TPG/ORA logic
Decomposes the problem into many identical subproblems whose size is determined by the test requirements of a single tile
Slide18
Interconnect Fault Detection
High density of internal cluster interconnect makes test access difficult
Must test intra-cluster interconnect and extra-cluster interconnect
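Four classes of faults
Permanent connection (segments stay connected while the PIP is configured off)
Permanent disconnection (segments stay disconnected while the PIP is configured on)
Stuck-at 0
Stuck-at 1
Slide19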
Detection and Diagnosis
Defines testability and diagnosis requirements of each fault and fault pair
Some test pattern must exist to detect each fault and differentiate each fault pair
All LUTs are configured as 4-input XOR gates
The detectability of each fault can be expressed as a function of the tile I/O
Slide20
Fault Detection Conditions
Faulty line segment s1 must be both controllable by at least one tile input and observable by at least one tile output
Slide21
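The stuck-at detection condition for a segment s1 (controllable from at least one tile input, observable at at least one tile output) can be sketched as a reachability check. This is a minimal illustration, assuming the active configuration is given as a directed graph whose edges follow signal flow and that every LUT on the way is a 4-input XOR, so a path implies sensitization. The graph and the node names (`in0`, `s1`, `out0`, and so on) are made up for the example.

```python
from collections import deque


def reachable(graph, start):
    """Return all nodes reachable from `start` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def control_set(graph, tile_inputs, segment):
    """Tile inputs that can drive `segment` in this configuration."""
    return {i for i in tile_inputs if segment in reachable(graph, i)}


def observe_set(graph, tile_outputs, segment):
    """Tile outputs at which `segment` can be observed."""
    downstream = reachable(graph, segment)
    return {o for o in tile_outputs if o in downstream}


def stuck_at_detectable(graph, tile_inputs, tile_outputs, segment):
    """A stuck-at fault needs a non-empty control set and observe set."""
    return bool(control_set(graph, tile_inputs, segment)) and bool(
        observe_set(graph, tile_outputs, segment)
    )


# Hypothetical configuration: s3 is driven but never reaches a tile output.
graph = {"in0": ["s1"], "s1": ["s2"], "s2": ["out0"], "in1": ["s3"]}
inputs, outputs = ["in0", "in1"], ["out0"]
print(stuck_at_detectable(graph, inputs, outputs, "s1"))
print(stuck_at_detectable(graph, inputs, outputs, "s3"))
```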
Fault Detection Conditions (cont'd)
Both segments of a faulty pair must be controllable, separately controllable from each other, and observable
The PIP between the two segments must be switched off
Slide22
Fault Detection Conditions (cont'd)
If s2 is the floating segment, then the non-floating segment must be controllable and the floating segment must be observable
The PIP between the two segments must be switched on
Slide23
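The PIP settings required by the two detection conditions above (off for a permanent connection, on for a permanent disconnection) can be illustrated with a toy two-segment model. This is only a sketch under stated assumptions: a shorted pair is modeled as a wired-AND, and an undriven segment floats to a fixed `float_value`; neither electrical model comes from the slides, and the function name `observe` is hypothetical.

```python
def observe(v1, v2, pip_on, fault=None, float_value=0):
    """Return the values seen on segments (s1, s2).

    v1: value driven onto s1.
    v2: value driven onto s2 when the PIP is off (ignored when the PIP is on,
        because s1 then drives s2 through the PIP).
    """
    if pip_on:
        if fault == "permanent_disconnection":
            return v1, float_value      # s2 is cut off and floats
        return v1, v1                   # s1 drives s2 through the PIP
    if fault == "permanent_connection":
        shorted = v1 & v2               # assumed wired-AND short
        return shorted, shorted
    return v1, v2                       # PIP off: independent segments


# Permanent connection: visible only with the PIP off and opposite values.
print(observe(1, 0, pip_on=False), observe(1, 0, pip_on=False, fault="permanent_connection"))

# Permanent disconnection: visible only with the PIP on and v1 != float_value.
print(observe(1, None, pip_on=True), observe(1, None, pip_on=True, fault="permanent_disconnection"))
```

With the PIP on, a permanent connection behaves like the fault-free circuit, which is why the condition asks for the PIP to be switched off and for the two segments to be separately controllable.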
Interconnect Fault Equivalence
Equivalent faults cannot be differentiated
Fault equivalence is determined by the FPGA configuration
Faults that are equivalent in one configuration may not be equivalent in another
Maximum diagnostic resolution is achieved when every pair of faults is non-equivalent in at least one configuration
Two faults are equivalent if their corresponding faulty machines produce the same output with all possible test patterns, at all outputs of the circuit
Two segments are test equivalent in a configuration if the segments have identical control sets and identical observe sets
Slide24
Interconnect Fault Equivalence (cont'd)
Two segments are test equivalent when they are controlled by the same set of tile inputs and observed by the same set of tile outputs
Slide25
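The definition of test equivalence (same set of controlling tile inputs, same set of observing tile outputs) amounts to grouping segments by a pair of sets. A minimal sketch, assuming the control and observe sets for one configuration are already known; the segment and pin names are invented for the example.

```python
from collections import defaultdict


def equivalence_classes(control_sets, observe_sets):
    """Group segments whose control set and observe set are both identical."""
    classes = defaultdict(list)
    for segment in control_sets:
        key = (frozenset(control_sets[segment]), frozenset(observe_sets[segment]))
        classes[key].append(segment)
    return sorted(sorted(members) for members in classes.values())


# Hypothetical sets for a single configuration.
control_sets = {"s1": {"in0"}, "s2": {"in0"}, "s3": {"in0", "in1"}}
observe_sets = {"s1": {"out0"}, "s2": {"out0"}, "s3": {"out0"}}

print(equivalence_classes(control_sets, observe_sets))
```

Here s1 and s2 fall into one class and cannot be told apart in this configuration, so a further configuration would be needed to separate them.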
Interconnect Fault Equivalence (cont'd)
Each segment in a faulty segment pair must be test equivalent to a segment in the other faulty segment pair
Slide26
Interconnect Fault Equivalence (cont'd)
A pair of faults may be equivalent if a segment that is not driven by a signal floats to a value 'v'
The two faults are equivalent if the floating segment is test equivalent to the segment associated with the stuck-at 'v' fault
The segment with the stuck-at fault and the floating segment must be controlled by the same set of tile inputs and observed by the same set of tile outputs
Slide27
Interconnect Fault Equivalence (cont'd)
The pair of segments involved in one fault are test equivalent to the pair of segments involved in the other fault
Each segment in a faulty segment pair must be test equivalent to a segment in the other faulty segment pair
Slide28
Test Configurations
Identifies a set of configurations for the tiles acting as BUTs in a BISTER
The size of the configuration set should be minimized to reduce test application time
Intra-cluster configurations are defined separately from extra-cluster configurations
Slide29
Intra-Cluster Configurations
Fault effect on a cluster input must propagate to at least one cluster output
Cluster outputs must be separately controllable
Slide30
BLE configurations
Observability of cluster inputs and BLE output branches must be achieved by propagating fault effects
Controllability of the BLE outputs must be achieved through the BLEs
Each BLE is composed of a LUT and a multiplexer
Both must be configured
Each LUT acts as a 4-input XOR gate
Good controllability because the output value can be determined by controlling any single input
Good observability because a fault effect on any input will propagate to the output
The majority of test configurations bypass the flip-flop
A single configuration tests the interconnect associated with the flip-flops
Slide31
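The controllability and observability claims for the 4-input XOR LUT on the BLE configurations slide can be checked exhaustively. The sketch below is illustrative only: it verifies that flipping any single input always flips the output, and builds the 16-bit truth table such a LUT would hold, assuming bit i of the table stores the output for input pattern i (a bit-ordering convention chosen here, not taken from the slides).

```python
from itertools import product


def xor_lut(a, b, c, d):
    """4-input XOR, the function every LUT implements in the test configurations."""
    return a ^ b ^ c ^ d


def single_flip_always_propagates():
    """True if changing any one input changes the output, for every pattern."""
    for bits in product((0, 1), repeat=4):
        good = xor_lut(*bits)
        for i in range(4):
            flipped = list(bits)
            flipped[i] ^= 1
            if xor_lut(*flipped) == good:
                return False
    return True


def lut_truth_table():
    """Pack the XOR outputs into a 16-bit integer, bit i = output for pattern i."""
    table = 0
    for i in range(16):
        a, b, c, d = (i >> 0) & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1
        table |= xor_lut(a, b, c, d) << i
    return table


print(single_flip_always_propagates())
print(hex(lut_truth_table()))
```

The same property gives both benefits at once: any single input can set the output to either value (controllability), and a fault effect on any input is never masked by the other three (observability).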
BLE input multiplexer configurations
Input muxes determine the controllability of BLE outputs by determining the function which defines the output of each BLE 'n'
BLE output function: all inputs XORed together
Multiplexers are not configured to create loops
All BLE outputs are separately controllable from each other, and from all cluster inputs
Each input multiplexer is configured to select data from each of its inputs in at least one configuration
There is a sensitized path from each cluster input stem to a cluster output in every configuration
Slide32
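One of the requirements on the BLE input multiplexers, that each mux selects each of its inputs in at least one configuration, can be met by a simple round-robin schedule. This sketch is not the presentation's Algorithm 1: it ignores the loop-free and separate-controllability constraints and only shows the coverage requirement. The mux and signal names are hypothetical.

```python
def mux_schedule(mux_inputs):
    """Build configurations in which mux m selects input (k mod fan-in) in configuration k.

    mux_inputs: dict mapping a mux name to the list of signals it can select.
    Returns a list of configurations, each a dict mux name -> selected signal.
    """
    n_configs = max(len(inputs) for inputs in mux_inputs.values())
    return [
        {mux: inputs[k % len(inputs)] for mux, inputs in mux_inputs.items()}
        for k in range(n_configs)
    ]


def every_input_selected(mux_inputs, schedule):
    """Check that each mux selects each of its inputs in some configuration."""
    for mux, inputs in mux_inputs.items():
        chosen = {config[mux] for config in schedule}
        if chosen != set(inputs):
            return False
    return True


# Hypothetical cluster: two BLE input muxes with different fan-in.
mux_inputs = {
    "ble0_in0": ["I0", "I1", "I2"],
    "ble1_in0": ["I1", "ble0_out"],
}
schedule = mux_schedule(mux_inputs)
print(len(schedule), every_input_selected(mux_inputs, schedule))
```

The number of configurations is set by the widest multiplexer, which is the lower bound for this one requirement; the other constraints on the slide can only increase it.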
Algorithm 1
Slide33
Input Multiplexer configurations
Slide34
Extra-Cluster Configurations
Defines current flow paths through the extra-cluster interconnect
Modeled as a flow graph
Create flow paths between tile I/O nodes which allow the detection criteria of each fault to be satisfied in at least one configuration
Flow paths are created from tile I/Os to every cluster input, and from every cluster output to tile I/Os
Slide35
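The flow-graph view of the extra-cluster interconnect can be sketched as a path search from a tile I/O node to a cluster input, and from a cluster output back to a tile I/O. This is a generic breadth-first search, not the presentation's Algorithm 2 or 3, and the node names (`tile_in_w0`, `h_track0`, and so on) are invented for the example.

```python
from collections import deque


def find_flow_path(graph, source, target):
    """Return one shortest path from source to target, or None if none exists.

    graph: dict mapping a node to the nodes reachable through one PIP or wire.
    """
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical tile: PIPs that may be switched on form the edges.
graph = {
    "tile_in_w0": ["h_track0"],
    "h_track0": ["cluster_in0", "h_track1"],
    "cluster_out0": ["v_track0"],
    "v_track0": ["tile_out_s0"],
}
print(find_flow_path(graph, "tile_in_w0", "cluster_in0"))
print(find_flow_path(graph, "cluster_out0", "tile_out_s0"))
```

The PIPs along a returned path are the ones to switch on in that configuration; repeating the search for every cluster input and output gives the set of flow paths the slide describes.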
Transparent Extra-Cluster Configuration
Slide36
Algorithm 2
Slide37
Algorithm 3
Slide38
Results
Assumptions
Cluster inputs and outputs are equally distributed around the sides of the cluster
Each cluster I/O on the north face may connect to all horizontal tracks via a set of PIPs
West face I/O connects to all vertical tracks
Cluster I/O for east and south faces connect directly to tracks in neighboring tiles
Results
Intra-cluster configuration, and two sets of extra-cluster configuration
Extra-Cluster (specific) is for when the fault independent algorithm has reached its coverage limit
By using the fault-specific extra-cluster configuration algorithm, 100% fault coverage can be guaranteed, at the cost of an increased number of configurations
Fault Coverage Achieved
Percent of fault pairs which are differentiated across all configurations
A small set of test configurations can detect and diagnose nearly all targeted interconnect faults
Slide39
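The diagnosis metric on the Results slide, the percentage of fault pairs differentiated across all configurations, can be written down directly. In this sketch each fault is given a hypothetical signature: one observed response per configuration. Two faults count as differentiated if their responses differ in at least one configuration. The fault names and responses are made up, not results from the presentation.

```python
from itertools import combinations


def pair_coverage(signatures):
    """Percent of fault pairs whose signatures differ in at least one configuration.

    signatures: dict mapping a fault name to a tuple of per-configuration responses.
    """
    pairs = list(combinations(sorted(signatures), 2))
    if not pairs:
        return 100.0
    differentiated = sum(1 for a, b in pairs if signatures[a] != signatures[b])
    return 100.0 * differentiated / len(pairs)


# Hypothetical responses over two configurations.
signatures = {
    "f1": ("fail_out0", "pass"),
    "f2": ("fail_out0", "fail_out1"),
    "f3": ("fail_out0", "pass"),  # never separated from f1
}
print(round(pair_coverage(signatures), 2))
```

Adding a configuration extends every signature by one entry, so the percentage can only stay the same or rise, which matches the trade-off between coverage and number of configurations noted in the results.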
Results
Slide40
Summary
Approach is encompassing, can guarantee 100% fault detection
Requires a good deal of computation time for the extra-cluster configurations
Does a good job of describing fault classes
I personally believe they could have described it using less mathematical jargon, so that it would make more sense to a digital logic engineer
Algorithms are described neatly in pseudocode
All details are covered
Slide41
Discussion topics
Section 2
Slide42
Discussion #1
Let’s discuss the logical ways to test circuitry for the various faults
Permanent open
Permanent closed
Stuck-at 0
Stuck-at 1
How could you design test patterns without access to all internal signals?
Slide43
Discussion #2
Algorithms
Intra-cluster
Extra-cluster
Slide44
Discussion #3
Defect mapping
Annealing placers
Mark the physical locations of defective units as costly or invalid
Routers
Mark defective wires and switches as in use or high cost
Avoid these defective components of the FPGA
Slide45
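The router side of defect mapping from Discussion #3 can be sketched with a shortest-path search that treats defective wires and switches as already in use. This is a generic Dijkstra search offered as an illustration, not the routing algorithm of any particular tool; the graph, costs and node names are invented. Raising the cost of a defective edge instead of skipping it would model the "high cost" option.

```python
import heapq


def route(graph, source, target, defective=frozenset()):
    """Cheapest path from source to target that avoids defective edges.

    graph: dict mapping a node to a list of (neighbor, cost) pairs.
    defective: set of (node, neighbor) edges marked unusable by the defect map.
    Returns the path as a list of nodes, or None if no defect-free route exists.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            path = [u]
            while u in prev:
                u = prev[u]
                path.append(u)
            return path[::-1]
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in graph.get(u, ()):
            if (u, v) in defective:
                continue  # treated as "in use": never routed through
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None


# Hypothetical routing graph with two ways from a to d.
graph = {"a": [("b", 1), ("c", 2)], "b": [("d", 1)], "c": [("d", 2)]}
print(route(graph, "a", "d"))
print(route(graph, "a", "d", defective={("b", "d")}))
```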
Discussion #4
Parity