Survey of Detection, Diagnosis, and Fault Tolerance Methods in FPGAs

Uploaded On 2018-03-18



Presentation Transcript

Slide1

Survey of Detection, Diagnosis, and Fault Tolerance Methods in FPGAs

Dan Fisher, Addison Floyd

Slide2

Outline

Introduction

Fault Detection - Motivation, Methods, etc.

Fault Diagnosis - Motivation, Methods, etc.

Fault Tolerance

Single FPGA

Multiple FPGAs

Single Faults

Multiple Faults

Conclusion

Slide3

Introduction

FPGA Background

Importance

Applications

Motivation for Fault Tolerance

http://en.wikipedia.org/wiki/Field-programmable_gate_array

Slide4

Fault Detection - Motivation

Main Causes of Faults

Degradation

Manufacturing Defects

Single Event Upsets (SEUs)

Slide5

Fault Detection - Judgement Criteria

Detection Methods are judged on:

Speed of Detection

Coverage

Resource Overhead

Performance Overhead

Detection Granularity

Slide6

Fault Detection - Criteria In-Depth

Detection Granularity - how precisely the location of a detected error is identified.

An FPGA is made up of tiles, each containing:

Logic Blocks

Connection Blocks - connect tiles

Switch Blocks - connect tiles, allow for direction change

Slide7

Fault Detection - Comparison

Slide8

Fault Detection - SEDC Method

The Method Explained

Partition data and Encode with SEDC codes

Calculate and Store check bits

Generate check bits as circuit operates

Compare calculated and generated values
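The store, regenerate, and compare steps above can be sketched in a few lines. This is only an illustration of the check-bit flow: it uses plain per-group even parity as a stand-in, not the actual SEDC code, and the 16-bit word, 4-bit partitions, and function names are assumptions made for the example.

```python
def check_bits(word, group_bits=4, width=16):
    """Return one even-parity check bit per partition of `group_bits` data bits.

    Plain parity is a stand-in here; a real SEDC encoder would replace this.
    """
    mask = (1 << group_bits) - 1
    bits = []
    for start in range(0, width, group_bits):
        group = (word >> start) & mask
        bits.append(bin(group).count("1") % 2)
    return bits


def detect_error(observed_word, stored_bits, group_bits=4, width=16):
    """Regenerate check bits from the observed data and compare with the stored ones."""
    return check_bits(observed_word, group_bits, width) != stored_bits


golden = 0b1011_0010_1110_0001       # fault-free data word (hypothetical)
stored = check_bits(golden)          # calculated once and stored
faulty = golden ^ (1 << 5)           # single bit flip in the second partition

print(detect_error(golden, stored))  # False
print(detect_error(faulty, stored))  # True
```

Per-group parity misses any even number of flips inside one partition, which is one reason stronger codes are used in practice.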

Better than Berger codes and TMR

Slide9

Fault Detection - Nazar Method

Concurrent error detection (CED) method providing single error detection

Takes advantage of properties of LUTs

Major Drawback - LUT insertion

Area improvement over duplication with comparison (DWC)

Slide10

Nazar Method - LUT Properties Explained*

1st Advantage: A LUT can be viewed as a combinational circuit that is independent of the others. Area overhead is avoided because the sub-expressions that form the circuit outputs do not need to be replicated.

2nd Advantage: A K-input LUT can compute any function of up to K inputs. So as long as the selected group has no more than K distinct inputs, the parity can be calculated using just one LUT. If the selected group also has no more than K-1 distinct outputs, the checker can be made of just one LUT (with the last input being the parity bit).

This picture shows LUTs as upside-down triangles, with one parity LUT for each group of K-1 outputs. Also shown is the checker, which is composed of just one LUT. LUTs in the same checker group can’t overlap (otherwise they wouldn’t be independent), but LUTs from different checker groups can overlap in order to provide coverage.
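The parity-LUT and checker-LUT arrangement described above can be modeled with truth tables. The sketch below is an illustration under assumptions, not the published implementation: K = 4, the three example functions, and all helper names are made up. It shows a group with K distinct inputs and K-1 outputs, one parity LUT predicting the XOR of the outputs, and one checker LUT comparing that prediction with the actual outputs.

```python
from itertools import product

K = 4  # assumed LUT input count


def make_lut(func, n_inputs=K):
    """Build a truth table (tuple of 0/1) for `func` over n_inputs bits."""
    return tuple(func(*bits) & 1 for bits in product((0, 1), repeat=n_inputs))


def eval_lut(table, bits):
    """Look up the table entry addressed by `bits` (first bit is the MSB)."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return table[index]


# Functional group: K-1 = 3 LUTs sharing the same K = 4 inputs.
f0 = make_lut(lambda a, b, c, d: a & b)
f1 = make_lut(lambda a, b, c, d: b ^ c)
f2 = make_lut(lambda a, b, c, d: c | d)

# One parity LUT predicts the XOR of the three outputs from the inputs alone.
parity = make_lut(lambda a, b, c, d: (a & b) ^ (b ^ c) ^ (c | d))

# One checker LUT: K-1 actual outputs plus the predicted parity bit; 1 means error.
checker = make_lut(lambda o0, o1, o2, p: o0 ^ o1 ^ o2 ^ p)


def error_flag(luts, inputs):
    outputs = [eval_lut(t, inputs) for t in luts]
    return eval_lut(checker, outputs + [eval_lut(parity, inputs)])


def flip_entry(table, index):
    """Model a single upset in one configuration bit of a LUT."""
    return table[:index] + (table[index] ^ 1,) + table[index + 1:]


faulty_f1 = flip_entry(f1, 0b0110)
print(error_flag([f0, f1, f2], (0, 1, 1, 0)))         # 0: fault-free
print(error_flag([f0, faulty_f1, f2], (0, 1, 1, 0)))  # 1: fault detected
```

The fault is flagged only when the inputs address the corrupted entry, which is why this is a concurrent (online) check rather than an exhaustive test.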

* Note: This slide wasn’t in the original presentation but was added to better explain the method, since some asked to know more.

Slide11

Fault Detection - Roving STARs

New method for online detection

Detected faults do not affect working logic

STARs and BISTERs

Better than other methods

*Picture added after the presentation to help clear up any confusion.

Slide12

Fault Detection - Injection Topic 1

Which modules are most sensitive to SEUs

1.4% sensitive (83% routing / 16% logic)
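Sensitivity figures of this kind are typically obtained by flipping configuration bits one at a time and checking whether the design's outputs change. The toy campaign below illustrates the idea on a single 4-input LUT whose upper two inputs are tied to 0; the design, the names, and the resulting 25% figure are assumptions for the example and are unrelated to the percentages on the slide.

```python
from itertools import product


def run(config, a, b):
    """Toy design: a 16-entry LUT with its two upper inputs tied to 0."""
    return config[(a << 1) | b]


def flip(config, i):
    """Return a copy of the configuration with bit i upset."""
    out = list(config)
    out[i] ^= 1
    return out


golden = [0] * 16
golden[0b11] = 1  # the LUT implements a 2-input AND

vectors = list(product((0, 1), repeat=2))
sensitive = [
    i for i in range(len(golden))
    if any(run(flip(golden, i), a, b) != run(golden, a, b) for a, b in vectors)
]

print(sensitive)                     # [0, 1, 2, 3]
print(len(sensitive) / len(golden))  # 0.25
```

Only the entries the design can actually address are sensitive, which is why the sensitive fraction of a real bitstream can be small.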

Density matrix

Slide13

Fault Detection - Injection Topic 2

HW module to test efficiency of SEU mitigation schemes

How to emulate SEUs - 2 step process

Example Results

Scrubbing Rate

Slide14

Fault Diagnosis - Roving STARs

Diagnose both interconnect and PLB faults

Partial Reuse

Future - Do we allow for retest of a fault?

Slide15

Fault Diagnosis - More Abramovici

BIST-based method in 2000

2004 paper further extending Roving STARs

Slide16

Fault Diagnosis - Niamat - MATS++

Diagnose multiple stuck-at faults

Use of MATS++ algorithm
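For reference, MATS++ is a march test with three elements: write 0 to every cell in any order, then read 0 and write 1 in ascending order, then read 1, write 0, and read 0 in descending order. The sketch below runs it on a simple memory model with injected stuck-at cells. The `Memory` class and fault map are hypothetical, and how the method maps the test onto FPGA resources is not covered by the slide.

```python
class Memory:
    """Bit-addressable memory model with optional stuck-at cells."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck_at = dict(stuck_at or {})  # address -> stuck value (0 or 1)

    def write(self, addr, value):
        self.cells[addr] = self.stuck_at.get(addr, value)

    def read(self, addr):
        return self.stuck_at.get(addr, self.cells[addr])


def mats_pp(mem, size):
    """Run MATS++ and return the sorted addresses that failed any read."""
    failing = set()
    for addr in range(size):               # element 1: write 0, any order
        mem.write(addr, 0)
    for addr in range(size):               # element 2: ascending (r0, w1)
        if mem.read(addr) != 0:
            failing.add(addr)
        mem.write(addr, 1)
    for addr in reversed(range(size)):     # element 3: descending (r1, w0, r0)
        if mem.read(addr) != 1:
            failing.add(addr)
        mem.write(addr, 0)
        if mem.read(addr) != 0:
            failing.add(addr)
    return sorted(failing)


print(mats_pp(Memory(8), 8))                # []
print(mats_pp(Memory(8, {2: 1, 5: 0}), 8))  # [2, 5]
```

Because each failing read is tied to an address, the same pass that detects the faults also locates them, which fits the goal of diagnosing several stuck-at faults at once.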

Goal of speeding up diagnosis

Slide17

Fault Diagnosis - Tahoori’s Method

Diagnose a single fault in interconnect or logic

Application Dependent

Basic Idea

Slide18

Fault Tolerance

Single FPGA platform

Multi FPGA platform

Single Fault

Multiple Faults

Slide19

Fault Tolerance - Single FPGA

Dynamic Fault Tolerance via Partial Reconfiguration

online - handles faulty PLBs without stopping the system

uses spare logic cells
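The spare-cell idea can be pictured as a remapping step: any logical function placed on a faulty PLB is moved to a healthy spare while everything else stays in place. The sketch below is a generic illustration under assumptions (coordinates, names, and first-fit policy are made up) and is not the algorithm from the paper.

```python
def remap(placement, faulty, spares):
    """Return a new placement that avoids faulty cells.

    placement: dict mapping logical block name -> physical cell (row, col)
    faulty:    set of physical cells known to be faulty
    spares:    list of reserved spare cells, tried in order
    """
    free = [cell for cell in spares if cell not in faulty]
    new_placement = {}
    for logical, cell in placement.items():
        if cell in faulty:
            if not free:
                raise RuntimeError("no healthy spare cells left")
            new_placement[logical] = free.pop(0)  # first-fit choice (assumed policy)
        else:
            new_placement[logical] = cell
    return new_placement


placement = {"adder": (0, 0), "mux": (0, 1)}
print(remap(placement, faulty={(0, 1)}, spares=[(1, 0), (1, 1)]))
# {'adder': (0, 0), 'mux': (1, 0)}
```

In a real device the new placement would be applied through partial reconfiguration so that unaffected logic keeps running.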

Stroud et al.

Slide20

Fault Tolerance - Single FPGA

Online Fault Tolerance for FPGA Logic Blocks

reuse defective blocks to increase the number of spares and extend mission life

uses commercial CAD tools to implement

Stroud et al.

Slide21

Fault Tolerance - Single FPGA

Using Relocatable Bitstreams for Fault Tolerance

combines passive and active techniques

standardized relocatable modules, which are copied and stored

Montminy et al.

Slide22

Fault Tolerance - Multi FPGA

A Reliable Reconfiguration Controller for Fault-Tolerant Embedded Systems on Multi-FPGA platforms

multiple FPGAs in a mesh topology

hardening achieved by TMR

distributed solution

Bolchini et al.

Slide23

Fault Tolerance - Single Fault

Designing Fault Tolerant Systems into SRAM-based FPGAs

for use in space

Duplication with Comparison and Concurrent Error Detection
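Duplication with comparison runs two copies of a module on the same input and raises an error flag when their outputs differ. It detects a fault but cannot tell which copy is wrong. A minimal sketch, with a made-up 8-bit module standing in for the duplicated logic:

```python
def dwc(primary, duplicate, x):
    """Run both copies and return (primary output, error flag)."""
    a = primary(x)
    b = duplicate(x)
    return a, a != b


def good(x):
    """Hypothetical fault-free module: an 8-bit function of x."""
    return (3 * x + 1) & 0xFF


def upset(x):
    """Same module with one output bit flipped, modeling an SEU."""
    return good(x) ^ 0x10


print(dwc(good, good, 7))   # (22, False)
print(dwc(good, upset, 7))  # (22, True)
```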

Lima et al.

Slide24

Fault Tolerance - Single Fault

TMR and Partial Dynamic Reconfiguration to Mitigate SEU Faults in FPGAs

passive Triple Modular Redundancy
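Passive TMR runs three copies of a module and masks a single faulty copy with a bitwise majority vote. The sketch below shows only the voter and the masking effect; the 8-bit example module and the names are assumptions for illustration.

```python
def majority(a, b, c):
    """Bitwise 2-of-3 majority vote over integer outputs."""
    return (a & b) | (a & c) | (b & c)


def tmr(copies, x):
    """Evaluate three module copies on x and return the voted output."""
    r0, r1, r2 = (module(x) for module in copies)
    return majority(r0, r1, r2)


def module(x):
    """Hypothetical fault-free module: an 8-bit function of x."""
    return (5 * x + 3) & 0xFF


def module_with_seu(x):
    """Same module with one output bit flipped."""
    return module(x) ^ 0x04


print(tmr([module, module, module], 9) == module(9))           # True
print(tmr([module, module_with_seu, module], 9) == module(9))  # True
```

The voter masks the fault but does not remove it, which is why TMR is paired here with partial dynamic reconfiguration to repair the upset copy.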

Bolchini et al.

Slide25

Fault Tolerance - Single Fault

IPR: In-Place Reconfiguration for FPGA Fault Tolerance

preserves function and topology of LUT-based logic network

algorithm applied post-layout

Zhe et al.

Slide26

Fault Tolerance - Single Fault

A Novel SRAM-Based FPGA Architecture for Efficient TMR Fault Tolerance Support

Architectural level

augments LUTs with TMR

minimizes the number of reconfigurations

Kyriakoulakos et al.

Slide27

Fault Tolerance - Multiple Faults

Placement of Repair Circuits for In-Field FPGA Repair

utilizes unused FPGA resources

repair circuits identified before faults occur

alternate repair circuits cached locally or remotely

Wirthlin et al.

Slide28

Fault Tolerance - Multiple Faults

Reconfigurable Fault Tolerance: A Comprehensive Framework for Reliable and Adaptive FPGA-Based Space Computing

dynamic self-adaptation

high reliability vs. high performance

Jacobs et al.

Slide29

Fault Tolerance - Multiple Faults

Exploiting Partially Defective LUTs: Why You Don’t Need Perfect Fabrication

because of shrinking feature size, transistor variability and failure rates are going up

identifies partially defective LUTs for reuse
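One way to picture reuse of a partially defective LUT: if some truth-table entries are stuck at fixed values, the LUT can still implement any function whose truth table already agrees with those values. The check below is a simplified sketch of that compatibility test under assumed names and a 2-input example; the paper's actual matching procedure is not described on the slide.

```python
def can_implement(truth_table, stuck_bits):
    """Return True if the function fits a LUT with the given stuck entries.

    truth_table: sequence of 0/1 values, one per LUT entry
    stuck_bits:  dict mapping entry index -> value that entry is stuck at
    """
    return all(truth_table[i] == value for i, value in stuck_bits.items())


AND2 = (0, 0, 0, 1)
XOR2 = (0, 1, 1, 0)
defect = {3: 1}  # entry 3 of this LUT is stuck at 1 (hypothetical defect)

print(can_implement(AND2, defect))  # True
print(can_implement(XOR2, defect))  # False
```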

DeHon et al.

Slide30

Conclusion

Importance of FPGAs

FPGA applications

Future of FPGA fault tolerance

Slide31

Questions?