Slide 1
Combining Pre-Silicon Verification Brains with Post-Silicon Platform Muscle
Reviewer: Shuo-Ren, Lin
2012/5/11
ALCom
Slide 2
Abstract
Post-silicon functional validation challenges:
- Light overhead associated with the testing procedures
- High-quality validation
Post-silicon functional exerciser:
- An application: load it onto the system, and it generates test-cases, executes them, and checks the results.
Novel solution:
- Prepare partial data that can be incorporated into the exerciser
- Generate this useful data in advance, during pre-silicon verification
Slide 3
Outline
- Introduction
- Solution scheme
  - Floating-point
  - Address translation
  - Memory management
Conclusions
Slide 4
Introduction
Post-silicon validation:
- High-quality validation
- Minimize overhead
Post-silicon exerciser:
- Runs without an OS; fast and light
- High validation quality
These two requirements conflict.
Bridge between the pre-silicon and post-silicon worlds:
- Based on a common verification plan and similar test-template languages [4]
Slide 5
Introduction
Contributions:
- Enable a post-silicon exerciser (Threadmill) to generate sophisticated stimuli while keeping the exerciser efficient and simple.
- Prepare the input data off the platform and integrate it into the exerciser image.
Three domains:
- Floating-point data operands: FPgen
- Address translation paths: DeepTrans
- Memory access management: CSP solver, Genesys-Pro, and X-Gen
Slide 6
Solution Scheme
Off-platform data generation:
- Use well-established pre-silicon techniques and tools to ensure that the generated data is of high quality.
Static build:
- Generate large amounts of data with pre-silicon tools
- Data can be reused across many different exerciser images
Dynamic build:
- Sufficiently efficient to construct a new exerciser image in a reasonable amount of time
- Three main roles:
  - Filter data
  - Generate new data based on available inputs (test template and configuration)
  - Create optimized data structures for efficient retrieval by the exerciser at execution time
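The three dynamic-build roles can be illustrated with a minimal sketch. All names here (`dynamic_build`, `template_filter`) are illustrative, not from the paper; the real tool works on pre-generated verification data rather than plain numbers.

```python
import random

def dynamic_build(pregen_entries, template_filter, n_extra, table_size):
    """Sketch of the three dynamic-build roles (illustrative names):
    1. filter pre-generated data against the test template,
    2. generate extra entries from the available inputs,
    3. pack the result into a structure the exerciser can index quickly."""
    # Role 1: keep only entries the test template asks for.
    kept = [e for e in pregen_entries if template_filter(e)]
    # Role 2: derive additional entries (here: simple random values).
    kept += [random.random() for _ in range(n_extra)]
    # Role 3: build a fixed-size table for O(1) retrieval at execution time.
    random.shuffle(kept)
    return kept[:table_size]

# keep even pre-generated entries, add 5 fresh ones, emit a 20-entry table
table = dynamic_build(list(range(100)), lambda e: e % 2 == 0, 5, 20)
```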
Slide 7
Floating-Point
Static build:
- FPgen achieves a high level of coverage for all supported inputs and outputs
- Takes 24 hours to generate a set of 1000 inputs for 390 floating-point instructions on a single Pentium 6 machine
Dynamic build:
- For each instruction intended to appear in the test-cases, randomly and uniformly pick a subset of the entries and create a smaller table.
Execution:
- Randomly select operands from the table prepared in the dynamic-build stage
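The dynamic-build subsampling and execution-time selection amount to two simple random draws. A minimal sketch, assuming each pre-generated entry is an (input1, input2, output) tuple; the function names are illustrative:

```python
import random

def build_operand_table(fpgen_entries, subset_size, seed=None):
    """Dynamic build: uniformly sample a smaller operand table from the
    pre-generated FPgen entries."""
    rng = random.Random(seed)
    return rng.sample(fpgen_entries, subset_size)

def pick_operands(table):
    """Execution time: draw one operand tuple at random from the table."""
    return random.choice(table)

# e.g. 1000 pre-generated (in1, in2, out) tuples for one instruction
entries = [(i, i + 1, 2 * i + 1) for i in range(1000)]
small = build_operand_table(entries, 100, seed=0)
op = pick_operands(small)
```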
Slide 8
Floating-Point Experiment I
10 FP instructions: fadd, fsub, fmul, fdiv, and fsqrt, each in single- and double-precision variations
12 types of FP number forms: normalized, infinity, …
3 operands: 2 inputs and 1 output
360 events − (unreachable events) = 280 different events
FPgen only generates legal interesting operand values
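The 360-event figure follows from the cross-product of instructions, number forms, and operand positions; the slide's numbers imply 80 architecturally unreachable events. The arithmetic:

```python
instructions = 10   # fadd, fsub, fmul, fdiv, fsqrt x {single, double}
fp_forms = 12       # normalized, infinity, ...
operands = 3        # 2 inputs + 1 output

total = instructions * fp_forms * operands        # 360 events
reachable = total - 80                            # 80 events are unreachable
```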
Slide 9
Floating-Point Experiment II and III
Trailing zeroes for divide and square root:
- Extending the output format to 52 bits (2 × 52 events)
Almost equal (32 events)
Mixed and FPgen have the same coverage, since the defined coverage models include only legal events.
Slide 10
Address Translation
Address translation:
- Supports virtual spaces
- Memory protection
- Caching mechanisms
Under Threadmill:
- Invoke the corresponding micro-architectural mechanisms
- Enable the generation of memory-management stimuli
Slide 11
Address Translation
Coverage requirements:
- Produce a set of valid translation paths that cover a large physical memory region
- Generate translation paths that activate all possible inter-thread and inter-processor memory-sharing scenarios
- Generate all the types of address translations possible for any given translation mode (e.g. different segment and page sizes)
Requirements related to the representation of the translation data:
- Compact and easy to extract
- Contains translation paths for all the physical memory blocks
- Possesses the required properties (e.g. allows accesses to the same cache line from different threads)
Slide 12
Address Translation
Static build stage:
- Use DeepTrans to construct address translation sets for the entire physical memory
- Divide the entire physical memory space into primary blocks, and randomly split each primary memory block into one or more secondary blocks
- DeepTrans generates the address translations that cover an entire primary block, for each primary partition and all its sub-partitions
- Generating a single set of address translations takes 20 hours on a single Pentium 6 Linux machine
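The primary/secondary partitioning step can be sketched as follows. The sizes and the splitting policy (1–4 random secondary blocks per primary block) are illustrative assumptions, not the paper's parameters:

```python
import random

def partition_memory(mem_size, primary_block_size, seed=0):
    """Divide physical memory into primary blocks, then randomly split each
    primary block into one or more secondary blocks (static-build sketch)."""
    rng = random.Random(seed)
    partitions = []
    for base in range(0, mem_size, primary_block_size):
        n_secondary = rng.randint(1, 4)
        # sorted random cut points inside the primary block
        cuts = sorted(rng.sample(range(1, primary_block_size), n_secondary - 1))
        bounds = [0] + cuts + [primary_block_size]
        secondaries = [(base + bounds[i], base + bounds[i + 1])
                       for i in range(len(bounds) - 1)]
        partitions.append(secondaries)
    return partitions

parts = partition_memory(1 << 12, 1 << 10)   # 4 KiB split into 1 KiB primaries
covered = sum(hi - lo for p in parts for lo, hi in p)
```

Because the secondary blocks tile each primary block exactly, every physical byte is covered by exactly one secondary block.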
Slide 13
Address Translation
Dynamic build stage:
- Needs additional information (test template and configuration)
- Filter out the partitions that do not possess any of the desired memory properties
- Reduce the remaining partitions
- Build the address translation data structure, containing the memory mapping data for every secondary memory block
- Build look-up tables for Threadmill to easily retrieve the desired memory blocks at execution time
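A minimal sketch of the look-up structure: index the secondary blocks by memory property so a block with a desired property can be retrieved quickly at execution time. The property names and block layout here are illustrative assumptions:

```python
def build_lookup_tables(secondary_blocks):
    """Index secondary memory blocks by property for fast retrieval
    (dynamic-build sketch; property names are illustrative)."""
    tables = {}
    for block in secondary_blocks:
        for prop in block["properties"]:
            tables.setdefault(prop, []).append(block)
    return tables

blocks = [
    {"start": 0x0000, "size": 0x1000, "properties": {"cacheable", "shared"}},
    {"start": 0x1000, "size": 0x1000, "properties": {"non-cacheable"}},
]
tables = build_lookup_tables(blocks)
```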
Slide 14
Address Translation Experiment
Threadmill prefers cacheable pages, since they let it stress the HW caches
Prefers small pages, for memory-management flexibility:
- small: 4K and 6K, medium: 1M, large: 8M, huge: 8G
Collisions: a physical page covered by a translation path with a larger page size contains a physical page covered by another translation path with a smaller page size
Pre-generated translation collisions (no bias)
Slide 15
Memory Management
Targets multi-threaded systems
Collision: different threads accessing the same memory location
Tends to generate test-cases that can trigger memory access collisions, increasing the bug-detection potential
Threadmill uses multi-pass consistency checking:
- Run a test-case multiple times with the same resources and ensure the outputs are the same each time
- Write-write and write-read collisions are not checked by Threadmill, but can still be generated
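Multi-pass consistency checking reduces to a simple loop: rerun the test-case with identical resources and flag any run whose outputs diverge from the first. A minimal sketch, where `test_case` stands in for executing a test-case and collecting its results:

```python
def multi_pass_check(test_case, runs=3):
    """Run the same test-case several times with identical resources and
    require identical outputs on every pass (consistency-checking sketch)."""
    reference = test_case()          # first pass defines the expected result
    for _ in range(runs - 1):
        if test_case() != reference:
            return False             # divergence => potential bug
    return True

# a deterministic test-case passes; one with divergent output fails
ok = multi_pass_check(lambda: [1, 2, 3])
outputs = iter([[1], [1], [2]])
bad = multi_pass_check(lambda: next(outputs))
```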
Slide 16
Memory Management
To stress the design, place the intervals at interesting locations:
- Cache-line or page/segment crossings
- Memory with certain attributes, e.g. non-cacheable memory or memory obeying different consistency rules
- Various memory affinities: memory located on a different chip
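Placing an interval across a cache-line boundary is a small address computation. A sketch with an assumed 64-byte cache line and a simple "center on the boundary" policy (both are illustrative choices):

```python
CACHE_LINE = 64  # bytes; illustrative cache-line size

def crossing_interval(line_index, length):
    """Place an interval so it straddles the boundary at the start of the
    given cache line, with half the bytes on each side."""
    boundary = line_index * CACHE_LINE
    start = boundary - length // 2
    return start, length

start, length = crossing_interval(line_index=3, length=16)
# interval [184, 200) covers bytes on both sides of the 192-byte boundary
```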
Slide 17
Memory Management
Threadmill's high-level architecture (figure, showing the dynamic build stage and run time)
Slide 18
Memory Management
CSP solver:
- Runs in the dynamic-build stage, so it should be efficient
- Input: requirements, test template, and system configuration
- Each interval is a pair of CSP variables: interval start and interval length
Generates:
- Memory for the code and data areas
- Memory for the test-cases
- Memory accessed by the generated load/store instructions
Hard constraints:
- All intervals are disjoint
- All intervals reside in the available memory space
- Others (e.g. user-defined memory allocation requests)
Soft constraints for high-quality test-case generation:
- Direct interval allocation to interesting areas
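The hard constraints above can be expressed as a simple validity check on a candidate allocation of (start, length) intervals. This is a checker sketch, not the paper's CSP solver; representing user requests as required (start, length) pairs is an illustrative simplification:

```python
def intervals_valid(intervals, mem_size, user_requests=()):
    """Check the hard constraints on a candidate interval allocation:
    all intervals disjoint, inside available memory, and honoring any
    user-defined allocation requests."""
    ivs = sorted(intervals)
    # all intervals reside in the available memory space
    if any(s < 0 or s + l > mem_size for s, l in ivs):
        return False
    # pairwise disjoint (sorted => only adjacent pairs can overlap)
    if any(ivs[i][0] + ivs[i][1] > ivs[i + 1][0] for i in range(len(ivs) - 1)):
        return False
    # user-defined allocation requests must appear as-is
    return all(req in intervals for req in user_requests)

good = intervals_valid([(0, 10), (10, 5), (100, 20)], mem_size=256)
overlap = intervals_valid([(0, 10), (5, 5)], mem_size=256)
```

A real solver would search for an assignment satisfying these constraints while biasing toward the soft constraints (interesting areas); the checker only decides whether an assignment is legal.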
Slide 19
Memory Management
Three ownerships:
- Owned interval: only the owner thread can write
- Read-only interval: all threads can read
- Unowned interval: all threads can access
Memory access:
- Cacheable
- Non-cacheable
Construct a primary look-up table for fast and simple random choice
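One way to realize such a look-up table is to flatten the intervals, grouped by ownership, into a single list so that a run-time random choice is one uniform index. A sketch under that assumption:

```python
import random

def build_ownership_table(intervals_by_ownership):
    """Flatten intervals grouped by ownership into one list, so a random
    pick at run time is a single uniform choice (look-up table sketch)."""
    table = []
    for ownership, intervals in intervals_by_ownership.items():
        table.extend((ownership, iv) for iv in intervals)
    return table

intervals = {
    "owned":     [(0x0000, 64)],                 # only owner thread writes
    "read-only": [(0x1000, 64)],                 # all threads may read
    "unowned":   [(0x2000, 64), (0x3000, 64)],   # all threads may access
}
table = build_ownership_table(intervals)
ownership, interval = random.choice(table)       # simple run-time pick
```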
Slide 20
Memory Management Experiment
3 configurations, 8G memory, 10 intervals per ownership, 50 load/store instructions
First experiment: randomly allocated and uniformly distributed intervals
Second experiment: apply the CSP solver