
Combining Pre-Silicon Verification Brains with Post-Silicon - PowerPoint Presentation

Uploaded by yoshiko-marsland on 2017-04-19



Presentation Transcript

Slide1

Combining Pre-Silicon Verification Brains with Post-Silicon Platform Muscle

Reviewer: Shuo-Ren, Lin

2012/5/11

ALCom

Slide2

Abstract

Post-silicon functional validation challenges:
- Low overhead for the testing procedures
- High-quality validation

Post-silicon functional exerciser:
- An application: load it onto the system, and it generates test-cases, executes them, and checks the results.

Novel solution:
- Prepare partial data that can be incorporated into the exerciser
- Generate this data in advance, during pre-silicon verification

Slide3

Outline

Introduction

Solution scheme:
- Floating-point
- Address translation
- Memory management

Conclusions

Slide4

Introduction

Post-silicon validation:
- High-quality validation
- Minimal overhead

Post-silicon exerciser:
- Non-OS (runs without an operating system)
- Fast and light
- High validation quality

The two requirements conflict, so bridge the pre-silicon and post-silicon worlds:
- Based on a common verification plan and similar languages for test templates [4]

Slide5

Introduction

Contributions:
- Enable a post-silicon exerciser (Threadmill) to generate sophisticated stimuli while keeping the exerciser efficient and simple.
- Prepare the input data off the platform and integrate it into the exerciser image.

Three domains:
- Floating-point data operands: FPgen
- Address translation paths: DeepTrans
- Memory access management: CSP solver, Genesys-Pro, and X-Gen

Slide6

Solution Scheme

Off-platform data generation:
- Use well-established pre-silicon techniques and tools to ensure that the generated data is of high quality.

Static build:
- Generate large amounts of data with pre-silicon tools
- The data can be reused across many different exerciser images

Dynamic build:
- Efficient enough to construct a new exerciser image in a reasonable amount of time
- Three main roles:
  - Filter the data
  - Generate new data based on the available inputs (test template and configuration)
  - Create optimized data structures for efficient retrieval by the exerciser at execution time

Slide7

Floating-Point

Static build:
- FPgen achieves a high level of coverage for all supported inputs and outputs
- Generating a set of 1000 inputs for 390 floating-point instructions takes 24 hours on a single Pentium 6 machine

Dynamic build:
- For each instruction intended to appear in the test-cases, randomly and uniformly pick a subset of the entries and create a smaller table

Execution:
- Randomly select operands from the table prepared in the dynamic-build stage
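The dynamic-build and execution steps for floating-point operands can be sketched as below. This is a minimal illustration, not FPgen's or Threadmill's code: the static table is assumed to map an instruction name to a list of pre-generated operand tuples, and `build_operand_table`, `pick_operands`, and the subset size are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical representation: instruction name -> pre-generated input operands.
Operands = Tuple[float, ...]
OperandTable = Dict[str, List[Operands]]


def build_operand_table(static_table: OperandTable,
                        template_instrs: List[str],
                        subset_size: int,
                        rng: random.Random) -> OperandTable:
    """Dynamic build: for each instruction the test template uses, draw a
    uniform random subset of the static entries (without replacement)."""
    small: OperandTable = {}
    for instr in template_instrs:
        entries = static_table[instr]
        small[instr] = rng.sample(entries, min(subset_size, len(entries)))
    return small


def pick_operands(table: OperandTable, instr: str, rng: random.Random) -> Operands:
    """Execution: choose one prepared operand set uniformly at random."""
    return rng.choice(table[instr])


if __name__ == "__main__":
    rng = random.Random(2012)
    static = {  # stand-in for FPgen output; values are illustrative only
        "fadd": [(1.0, 2.0), (float("inf"), 1.0), (5e-324, 0.0), (-0.0, 0.0)],
        "fsqrt": [(4.0,), (float("inf"),)],
    }
    table = build_operand_table(static, ["fadd"], subset_size=2, rng=rng)
    print(table, pick_operands(table, "fadd", rng))
```

Sampling without replacement keeps each per-instruction table small while, on average, preserving the mix of operand classes in the static data.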

Slide8

Floating-Point Experiment I

- 10 FP instructions: fadd, fsub, fmul, fdiv, and fsqrt, each in single- and double-precision variants
- 12 types of FP number forms: normalized, infinity, …
- 3 operands: 2 inputs and 1 output
- 360 events (10 × 12 × 3) minus the unreachable events = 280 different events
- FPgen generates only legal, interesting operand values
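The event count above is a cross product: 10 instructions × 12 number forms × 3 operands = 360 events, of which 280 are reachable (so 80 are unreachable). A minimal sketch of such a coverage model follows. The forms hidden behind "…" are not known, so `form_2` to `form_11` are hypothetical placeholders, and the unreachable set shown (the second input of fsqrt, which takes a single input) is only an illustrative example, not the paper's actual list.

```python
from itertools import product
from typing import Set, Tuple

Event = Tuple[str, str, str]  # (instruction, operand, number form)

BASE_OPS = ["fadd", "fsub", "fmul", "fdiv", "fsqrt"]
INSTRUCTIONS = [f"{op}.{prec}" for op in BASE_OPS for prec in ("single", "double")]
OPERANDS = ["input1", "input2", "output"]
# Only "normalized" and "infinity" come from the slide; the rest are placeholders.
FORMS = ["normalized", "infinity"] + [f"form_{i}" for i in range(2, 12)]

ALL_EVENTS: Set[Event] = set(product(INSTRUCTIONS, OPERANDS, FORMS))


def coverage(hit: Set[Event], unreachable: Set[Event]) -> float:
    """Fraction of reachable events hit by the generated operand data."""
    reachable = ALL_EVENTS - unreachable
    return len(hit & reachable) / len(reachable)


# Illustrative unreachable subset only: fsqrt has no second input operand.
FSQRT_INPUT2 = {e for e in ALL_EVENTS
                if e[0].startswith("fsqrt") and e[1] == "input2"}
```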

Slide9

Floating-Point Experiment II and III

Trailing zeroes for divide and square root:
- Extending the output format to 52 bits (2 × 52 events)

Almost equal (32 events)

Mixed and FPgen achieve the same coverage because the defined coverage models include only legal events.

Slide10

Address Translation

Address translation supports:
- Virtual spaces
- Memory protection
- Caching mechanisms

Under Threadmill:
- Invoke the corresponding micro-architectural mechanisms
- Enable the generation of memory-management stimuli

Slide11

Address Translation

Coverage requirements:
- Produce a set of valid translation paths that cover a large physical memory region
- Generate translation paths that activate all possible inter-thread and inter-processor memory-sharing scenarios
- Generate all types of address translations possible for any given translation mode (e.g., different segment and page sizes)

Requirements on the representation of the translation data:
- Compact and easy to extract
- Contains translation paths for all the physical memory blocks
- Possesses the required properties (e.g., allows accesses to the same cache line from different threads)

Slide12

Address Translation

Static build stage:
- Use DeepTrans to construct address translation sets for the entire physical memory
- Divide the entire physical memory space into primary blocks, and randomly split each primary memory block into one or more secondary blocks
- DeepTrans generates the address translations that cover an entire primary block for each primary partition and all its sub-partitions
- Generating a single set of address translations takes 20 hours on a single Pentium 6 Linux machine
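The partitioning step can be sketched as follows. This is an illustration under assumptions, not DeepTrans: primary blocks are taken to be equal-sized, secondary-block boundaries are assumed to fall on a fixed granule, and `partition_memory` and its parameters are hypothetical.

```python
import random
from typing import List, Tuple

Block = Tuple[int, int]  # (start address, size in bytes)


def partition_memory(mem_size: int, primary_size: int, granule: int,
                     max_secondary: int,
                     rng: random.Random) -> List[Tuple[Block, List[Block]]]:
    """Split [0, mem_size) into equal primary blocks, then randomly split each
    primary block into 1..max_secondary contiguous secondary blocks whose
    boundaries are multiples of `granule` (an assumed alignment rule)."""
    if mem_size % primary_size or primary_size % granule:
        raise ValueError("sizes must divide evenly")
    n_granules = primary_size // granule
    result = []
    for base in range(0, mem_size, primary_size):
        k = rng.randint(1, min(max_secondary, n_granules))
        # k - 1 distinct interior cut points, measured in granules
        cuts = sorted(rng.sample(range(1, n_granules), k - 1))
        edges = [0] + cuts + [n_granules]
        secondary = [(base + a * granule, (b - a) * granule)
                     for a, b in zip(edges, edges[1:])]
        result.append(((base, primary_size), secondary))
    return result
```

Each secondary block is contiguous with its neighbors and the blocks exactly tile their primary block, so translations generated per primary block also cover every sub-partition.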

Slide13

Address Translation

Dynamic build stage:
- Needs additional information (test template and configuration)
- Filter out the partitions that do not possess any of the desired memory properties
- Reduce the remaining partitions
- Build the address translation data structure, containing the memory-mapping data for every secondary memory block
- Build look-up tables so that Threadmill can easily retrieve the desired memory blocks at execution time
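The filter, reduce, and look-up-table steps can be sketched on a simplified model in which each pre-generated partition carries a set of memory-property tags and a list of secondary blocks. The tag names (`cacheable`, `shared_line`), the `keep` limit, and `build_translation_lookup` are hypothetical; the real data structure also holds the mapping data itself.

```python
import random
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Block = Tuple[int, int]                          # (start address, size)
Partition = Tuple[FrozenSet[str], List[Block]]   # (memory properties, secondary blocks)


def build_translation_lookup(partitions: List[Partition],
                             desired: FrozenSet[str],
                             keep: int,
                             rng: random.Random) -> Dict[str, List[Block]]:
    """Filter, reduce, and index pre-generated partitions (simplified model)."""
    # 1. Filter: drop partitions that have none of the desired properties.
    candidates = [p for p in partitions if p[0] & desired]
    # 2. Reduce: keep a random subset so the exerciser image stays small.
    if len(candidates) > keep:
        candidates = rng.sample(candidates, keep)
    # 3. Look-up table: desired property -> secondary blocks that have it.
    table: Dict[str, List[Block]] = defaultdict(list)
    for props, blocks in candidates:
        for prop in props & desired:
            table[prop].extend(blocks)
    return dict(table)
```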

Slide14

Address Translation Experiment

- Threadmill prefers cacheable pages because they stress the hardware caches
- Threadmill prefers small pages because of their memory-management flexibility (small: 4k and 6k, medium: 1M, large: 8M, huge: 8G)

Collisions: a physical page covered by a translation path with a larger page size contains a physical page covered by another translation path with a smaller page size
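The collision definition above reduces to a containment test on physical address ranges. A quadratic-time sketch, assuming each translation path can be summarized by a virtual base, a physical base, and a page size (the `TranslationPath` type and the function names are hypothetical):

```python
from typing import List, NamedTuple, Tuple


class TranslationPath(NamedTuple):
    virt: int       # virtual page base address
    phys: int       # physical page base address
    page_size: int  # page size in bytes


def contains(big: TranslationPath, small: TranslationPath) -> bool:
    """True if small's physical page lies entirely inside big's physical page."""
    return (big.phys <= small.phys
            and small.phys + small.page_size <= big.phys + big.page_size)


def find_collisions(paths: List[TranslationPath]) -> List[Tuple[TranslationPath, TranslationPath]]:
    """All (larger-page, smaller-page) pairs that match the collision definition."""
    return [(big, small)
            for big in paths for small in paths
            if big.page_size > small.page_size and contains(big, small)]
```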

Figure: Pre-generated translation collisions (no bias)

Slide15

Memory Management

Targets multi-threaded systems:
- Collision: different threads accessing the same memory location
- Tends to generate test-cases that trigger memory-access collisions, which increases the bug-detection potential

Threadmill uses multi-pass consistency checking:
- Run a test-case multiple times with the same resources and ensure that the outputs are the same each time
- Write-write and write-read collisions are not checked by Threadmill, but they can still be generated
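Multi-pass consistency checking compares the final state of repeated runs instead of predicting expected results. The sketch below is a simplified model, not Threadmill's implementation: a test-case is assumed to be a callable that starts from the same initial memory and seed on every pass and returns the final memory image, and `multi_pass_check` and `make_test_case` are hypothetical. It also hints at why write-write and write-read collisions are left unchecked: if their outcome may legitimately differ between passes, a comparison of this kind would report a false failure.

```python
import random
from typing import Callable, Dict

Memory = Dict[int, int]  # address -> byte value


def multi_pass_check(run_test: Callable[[], Memory], passes: int = 3) -> bool:
    """Run the same test-case several times; pass only if every run
    produces the same final memory image as the first."""
    reference = run_test()
    return all(run_test() == reference for _ in range(passes - 1))


def make_test_case(seed: int, n_ops: int = 50) -> Callable[[], Memory]:
    """Hypothetical single-threaded test-case: random loads and stores over a
    small memory, rebuilt from the same seed and initial state on each pass."""
    def run() -> Memory:
        rng = random.Random(seed)
        mem = {addr: 0 for addr in range(0, 64, 8)}
        for _ in range(n_ops):
            addr = rng.randrange(0, 64, 8)
            if rng.random() < 0.5:
                mem[addr] = rng.randrange(256)  # store
            else:
                _ = mem[addr]                   # load
        return mem
    return run
```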

Slide16

Memory Management

To stress the design, place the intervals at interesting locations:
- Cache-line or page/segment crossings
- Memory with certain attributes, e.g., non-cacheable memory or memory obeying a different consistency rule
- Various memory affinities, e.g., memory located on a different chip
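Placing an interval across a cache-line or page boundary needs only integer arithmetic. A minimal sketch, assuming a 128-byte cache line and a 4 KiB page (both values, and the helper names, are hypothetical; real sizes depend on the design under test):

```python
import random

CACHE_LINE = 128  # assumed cache-line size in bytes
PAGE = 4096       # assumed page size in bytes


def crosses(start: int, length: int, boundary: int) -> bool:
    """True if [start, start + length) spans a multiple of `boundary`."""
    return start // boundary != (start + length - 1) // boundary


def place_crossing(length: int, boundary: int, mem_size: int,
                   rng: random.Random) -> int:
    """Pick a random start so the interval straddles an interior boundary,
    with at least one byte on each side of it."""
    if not 2 <= length <= boundary or mem_size < 2 * boundary:
        raise ValueError("unsupported sizes")
    b = boundary * rng.randrange(1, mem_size // boundary)
    return rng.randrange(b - length + 1, b)
```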

Slide17

Memory Management

Figure: Threadmill’s high-level architecture (labels: Dynamic build stage; Run time)

Slide18

Memory Management

CSP solver:
- Runs in the dynamic-build stage, so it must be efficient
- Input: requirements, test template, and system configuration
- Each interval is modeled by a pair of CSP variables: interval start and interval length

Generates:
- Memory for the code and data areas
- Memory for test-cases
- Memory accessed by the generated load/store instructions

Hard constraints:
- All intervals are disjoint
- All intervals reside in the available memory space
- Others (e.g., user-defined memory allocation requests)

Soft constraints, for high-quality test-case generation:
- Direct interval allocation to interesting areas
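The allocation problem can be illustrated with a tiny backtracking search. This is a stand-in, not the CSP solver used in the work: each interval's length is fixed in advance so only its start is searched, starts are restricted to an assumed alignment, and the soft constraint is approximated by trying "interesting" starts first (a value-ordering heuristic) rather than by weighted optimization. `solve_intervals` and the `interesting` predicate are hypothetical.

```python
import random
from typing import Callable, List, Optional, Tuple

Interval = Tuple[int, int]  # (start, length)


def solve_intervals(lengths: List[int], mem_size: int, align: int,
                    interesting: Callable[[int, int], bool],
                    rng: random.Random) -> Optional[List[Interval]]:
    """Backtracking search for interval starts.
    Hard constraints: every interval lies in [0, mem_size) and all are disjoint.
    Soft constraint (approximated): try starts where interesting() holds first."""
    chosen: List[Interval] = []

    def fits(start: int, length: int) -> bool:
        # Disjointness against every interval placed so far.
        return all(start + length <= s or s + l <= start for s, l in chosen)

    def assign(i: int) -> bool:
        if i == len(lengths):
            return True
        length = lengths[i]
        # The domain keeps the interval inside memory by construction.
        domain = list(range(0, mem_size - length + 1, align))
        rng.shuffle(domain)                                     # random among equals
        domain.sort(key=lambda s: not interesting(s, length))   # interesting first
        for start in domain:
            if fits(start, length):
                chosen.append((start, length))
                if assign(i + 1):
                    return True
                chosen.pop()                                    # backtrack
        return False

    return list(chosen) if assign(0) else None
```

A production solver would handle far larger domains (8G of memory) with constraint propagation rather than enumerating every aligned start; the sketch only shows how the hard and soft constraints divide the work.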

Slide19

Memory Management

Three ownership types:
- Owned interval: only the owner thread can write
- Read-only interval: all threads can read
- Unowned interval: all threads can access

Memory access:
- Cacheable
- Non-cacheable

Construct a primary look-up table for fast and simple random choice
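A primary look-up table keyed by ownership type and cacheability turns "pick an interval of this kind" into a single random index. The sketch below assumes that keying; the `Interval` fields, `build_lookup`, `may_write`, and `pick` are hypothetical names, not Threadmill's.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Ownership(Enum):
    OWNED = "owned"          # only the owner thread may write
    READ_ONLY = "read-only"  # all threads may read, none may write
    UNOWNED = "unowned"      # all threads may read and write


@dataclass(frozen=True)
class Interval:
    start: int
    length: int
    ownership: Ownership
    cacheable: bool
    owner: Optional[int] = None  # thread id; meaningful only for OWNED


Key = Tuple[Ownership, bool]


def build_lookup(intervals: List[Interval]) -> Dict[Key, List[Interval]]:
    """Primary look-up table: (ownership, cacheable) -> matching intervals."""
    table: Dict[Key, List[Interval]] = defaultdict(list)
    for iv in intervals:
        table[(iv.ownership, iv.cacheable)].append(iv)
    return dict(table)


def may_write(iv: Interval, thread: int) -> bool:
    """Apply the ownership rules to a store issued by `thread`."""
    if iv.ownership is Ownership.OWNED:
        return iv.owner == thread
    return iv.ownership is Ownership.UNOWNED


def pick(table: Dict[Key, List[Interval]], ownership: Ownership,
         cacheable: bool, rng: random.Random) -> Interval:
    """Constant-time random choice among intervals of the requested kind."""
    return rng.choice(table[(ownership, cacheable)])
```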

Slide20

Memory Management Experiment

Setup: 3 configurations, 8G memory, 10 intervals per ownership type, 50 load/store instructions

First experiment:
- Randomly allocated, uniformly distributed intervals

Second experiment:
- Apply the CSP solver
