Slide1
April 9, 2012
Case Study: Complex Multi-Core Virtual Platform Enablement Using TLM 2.0 to co-simulate Diverse Simulation Models in a Multi-threaded Environment
Navaneet Kumar, Sandeep Jain, Rajesh Jain
Networking & Multimedia Solutions Group
Updated Jan 2012
Slide2
Outline
Virtual Platform – Challenges & Requirements
Significance of TLM 2.0
TLM LT Methodology
TLM LT Adapter
Moving to Multi-Threading
Challenges and Learnings of Multi-Threading
Challenges and Learnings of DMI
Performance Results
Conclusion
References
Slide3
Virtual Platform – Challenges & Requirements
SoCs are becoming more complex day by day
  Today's SoCs contain multiple heterogeneous cores, hardware accelerators, peripherals, and a complex memory hierarchy with hardware-supported coherency
High-fidelity models developed by different divisions and teams in a large organization follow diverse modeling frameworks
  Porting all the complex models to a common modeling framework is no easy task
Timely availability of the virtual platform is critical
  Useful for software driver development
A common methodology and infrastructure is required
  To enable interoperability of such diverse simulation models
  To demonstrate quick virtual platform integration and co-simulation
Slide4
Example of FSL B4860 Baseband Processor
Slide5
Significance of TLM 2.0
TLM 2.0 introduces an interoperability layer
  The generic payload class is suitable for carrying the most common payload information
  The core interfaces are suitable for various model-to-model communication scenarios
TLM 2.0 enables seamless integration of diverse simulation models
  Extensions can be used to carry any type of payload and phase information
  Models from different modeling frameworks can be made to comply with the TLM 2.0 APIs
Our virtual platform integration is a proof
  Here, we present our TLM LT methodology as a case study
Slide6
TLM LT Methodology
[Flow diagram. Labels: Specs; TLM definition; TLM Extension file; TLM Adapter Creation; TLM LT Adapters; TLM integration; C/C++ and SystemC Model Library]
Slide7
TLM LT Adapter – The Key Piece
Based on the blocking transport interface
  Provides fast functional model-to-model communication
  Has no dependency on the SystemC scheduler
Uses TLM multi-sockets to allow connectivity to multiple initiators/targets
Zero-delay model
  Doesn't buffer transactions at all
  Immediately maps a C/C++ API call to the TLM API and vice versa
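The adapter idea on this slide can be sketched in plain C++. This is a minimal illustration only, not the authors' adapter: `Payload`, `BlockingTarget`, `Memory` and `LtAdapter` are hypothetical stand-ins for `tlm_generic_payload`, a target socket, a target model and the TLM LT adapter, and the sketch deliberately avoids the SystemC/TLM headers so that it compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for tlm_generic_payload, reduced to a few fields.
enum class Command { Read, Write };
struct Payload {
    Command       command;
    std::uint64_t address;
    std::uint8_t* data;
    std::size_t   length;
    bool          ok;       // stand-in for the TLM response status
};

// Hypothetical stand-in for a target that offers a blocking transport call.
struct BlockingTarget {
    virtual void b_transport(Payload& trans) = 0;
    virtual ~BlockingTarget() = default;
};

// Simple memory target used to exercise the adapter.
class Memory : public BlockingTarget {
public:
    explicit Memory(std::size_t size) : bytes_(size, 0) {}
    void b_transport(Payload& trans) override {
        assert(trans.data != nullptr);
        if (trans.address + trans.length > bytes_.size()) { trans.ok = false; return; }
        std::uint8_t* cell = bytes_.data() + trans.address;
        if (trans.command == Command::Write) std::memcpy(cell, trans.data, trans.length);
        else                                 std::memcpy(trans.data, cell, trans.length);
        trans.ok = true;
    }
private:
    std::vector<std::uint8_t> bytes_;
};

// Adapter: each native C++ call becomes exactly one blocking transport call.
// Nothing is buffered, no delay is annotated, and no scheduler is involved.
class LtAdapter {
public:
    explicit LtAdapter(BlockingTarget& target) : target_(target) {}
    bool write32(std::uint64_t addr, std::uint32_t value) {
        Payload p{Command::Write, addr, reinterpret_cast<std::uint8_t*>(&value), sizeof value, false};
        target_.b_transport(p);
        return p.ok;
    }
    bool read32(std::uint64_t addr, std::uint32_t& value) {
        Payload p{Command::Read, addr, reinterpret_cast<std::uint8_t*>(&value), sizeof value, false};
        target_.b_transport(p);
        return p.ok;
    }
private:
    BlockingTarget& target_;
};
```

Because the call returns only after the target has filled in the payload, the caller sees the same behavior as a direct function call, which is what makes the zero-delay, unbuffered mapping possible.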
Slide8
TLM LT Adapters – An example
[Block diagram. Blocks: DSP side C++ Components; DSP Module; Power Arch side C++ Components; Central Interconnect Module; TLM LT DSPAdapter; TLM LT PowerAdapter; TLM 2.0. Interfaces: DSP Custom I/f; MemAccess I/f; Snoop I/f; tlm_generic_payload carried over b_transport; multi init/target socket pair on each side. Traffic: Mem Access; Mem/Reg Access; Snoop Request-Responses]
Slide9
TLM Extension – Key Attributes
bool snoop_enable
  Helps to distinguish between cacheable and non-cacheable memory regions
  Transactions with this attribute set are broadcast to all masters that implement caches, to maintain coherency
uint32_t decoration_attr
  Used with special decoration instructions, which can atomically set, clear, increment or decrement another region in memory along with a read/write to the specified region
  Set by the SC3900/e6500 cores on execution of a decoration instruction to tell the L3 cache to take the required actions
uint32_t addr_only_type
  Helps to model specific address-only transaction types, e.g. TLBIE causes broadcast of a TLB invalidate entry operation to all snoopers
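The three attributes could be grouped as sketched below. This is a plain C++ stand-in under stated assumptions: in a real platform the struct would derive from `tlm::tlm_extension<T>`, which requires `clone()` and `copy_from()`; the names `CoherencyExtension`, `Route` and `classify`, and the convention that a zero `addr_only_type` means "not an address-only transaction", are hypothetical and not taken from the slides.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical extension carrying the three attributes named on the slide.
struct CoherencyExtension {
    bool          snoop_enable    = false; // cacheable region: broadcast to caching masters
    std::uint32_t decoration_attr = 0;     // decoration opcode, forwarded for the L3 cache
    std::uint32_t addr_only_type  = 0;     // assumed encoding: 0 = none, non-zero = e.g. TLBIE

    // Same roles as the clone()/copy_from() pair required of a TLM extension.
    std::unique_ptr<CoherencyExtension> clone() const {
        return std::unique_ptr<CoherencyExtension>(new CoherencyExtension(*this));
    }
    void copy_from(const CoherencyExtension& other) { *this = other; }
};

// Hypothetical routing decision an interconnect model might derive.
enum class Route { DirectToMemory, SnoopBroadcast, AddressOnlyBroadcast };

inline Route classify(const CoherencyExtension* ext) {
    if (ext == nullptr)           return Route::DirectToMemory;       // no extension attached
    if (ext->addr_only_type != 0) return Route::AddressOnlyBroadcast; // no data phase
    if (ext->snoop_enable)        return Route::SnoopBroadcast;       // keep caches coherent
    return Route::DirectToMemory;
}
```

Models that do not understand the extension can ignore it and still handle the base payload, which is the interoperability property the deck relies on.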
Slide10
Moving to Multi-Threading (MT)
Full-system virtual platforms are becoming very complex
  Running a highly complex system in a single thread becomes a severe bottleneck to simulation performance
Multi-threading the simulator is the approach to speed up simulation
  Helps make effective use of concurrent host system resources, which are increasingly multi-core and multi-threaded
Logical partitioning of sub-system models to run in separate threads helps achieve multi-threading
  Most straightforward way to partition work among different threads
  Minimal changes needed in the single-threaded version of the code
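A rough sketch of the partitioning idea, assuming each partition owns its state so that its loop needs no locking. `Partition` and `run_partitions` are hypothetical names, and the loop body is only a placeholder for real model work.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical sub-system partition with private state.
struct Partition {
    std::uint64_t steps_done = 0;
    void run(std::uint64_t steps) {
        for (std::uint64_t i = 0; i < steps; ++i) ++steps_done; // placeholder for model work
    }
};

// Runs every partition on its own host thread and waits for all of them.
inline void run_partitions(std::vector<Partition>& parts, std::uint64_t steps) {
    std::vector<std::thread> workers;
    workers.reserve(parts.size());
    for (Partition& p : parts)
        workers.emplace_back([&p, steps] { p.run(steps); });
    for (std::thread& t : workers) t.join();
}
```

The single-threaded `run()` is reused unchanged; only the launch code differs, which matches the "minimal changes" point above.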
Slide11
Example of FSL B4860 Baseband Processor
Slide12
Challenges & Learnings in Multi-Threading
TLM Adapter with Multi-Initiator/Target socket became a severe bottleneck
  All threads on the DSP side contended for the shared resources in the TLM Adapter, even when they were accessing different resources on the Power Arch side
We chose to instantiate multiple TLM Adapters, one per DSP thread
  An adapter may still connect to multiple entities working on the same thread
  No contention due to TLM connectivity!
All DSP side threads still contend for the shared memory
  Need to protect memory, and this again impacts simulation performance
TLM 2.0 DMI helps minimize this impact
  Most of the accesses are satisfied via DMI
  Protection is needed only when acquiring DMI pointers
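The effect of DMI on locking can be illustrated as follows. This is a simplified sketch, not the platform's memory model: `SharedMemory`, `acquire_dmi` and `run_dmi_workers` are hypothetical, and each worker is assumed to own a disjoint region so that unlocked writes through the returned pointer cannot race.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared memory model with a locked transport-style path and a
// DMI-style path that takes the lock only when the pointer is handed out.
class SharedMemory {
public:
    explicit SharedMemory(std::size_t size) : locked_ops_(0), bytes_(size, 0) {}

    // Slow path: every access is serialized by the lock.
    void write8(std::size_t addr, std::uint8_t v) {
        std::lock_guard<std::mutex> g(lock_);
        ++locked_ops_;
        bytes_.at(addr) = v;
    }
    // DMI-style grant: one locked call returns a raw pointer to the region.
    std::uint8_t* acquire_dmi(std::size_t base, std::size_t len) {
        std::lock_guard<std::mutex> g(lock_);
        ++locked_ops_;
        assert(base + len <= bytes_.size());
        return bytes_.data() + base;
    }
    // Inspection helpers; call only after the worker threads have joined.
    std::uint8_t read8(std::size_t addr) const { return bytes_.at(addr); }
    std::size_t  locked_ops() const { return locked_ops_; }
private:
    std::mutex                lock_;
    std::size_t               locked_ops_;
    std::vector<std::uint8_t> bytes_;
};

// Each worker acquires its pointer once, then writes its region without locking.
inline void run_dmi_workers(SharedMemory& mem, std::size_t workers, std::size_t region) {
    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < workers; ++w)
        pool.emplace_back([&mem, w, region] {
            std::uint8_t* p = mem.acquire_dmi(w * region, region); // locked once
            for (std::size_t i = 0; i < region; ++i)
                p[i] = static_cast<std::uint8_t>(w + 1);            // no lock taken
        });
    for (std::thread& t : pool) t.join();
}
```

With four workers and 256-byte regions, the lock is taken four times in total instead of once per byte written, which is the saving the slide attributes to DMI.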
Slide13
Challenges and Learnings with DMI
DMI should be used cautiously!
  Causes any bus snooping capabilities provided by the interconnect model to be bypassed
Certain device models have internal caches, and these need to be kept coherent with the main memory
  If a master modifies a memory region via DMI and this region is present in one of the cache models, then there is no way to keep the system coherent!
We employed a mechanism to allow DMI while maintaining coherency
  DMI acquire succeeds only if a memory region is not already cached
  All requests to a cached region go via the interconnect model: no DMI, snoop broadcast if asked
  Whenever a new region is cached, a DMI invalidate is sent to all masters to prevent them from using DMI for such regions
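The three rules of the mechanism can be sketched as a small bookkeeping class. This is an illustration under assumptions, not the authors' implementation: `DmiCoherencyGuard` and its methods are hypothetical, regions are identified by a plain integer, and thread safety is left out for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <set>
#include <utility>
#include <vector>

// Hypothetical tracker that decides whether DMI may be granted for a region.
class DmiCoherencyGuard {
public:
    using InvalidateFn = std::function<void(std::uint64_t region)>;

    // Each master registers a callback that drops its DMI pointer for a region.
    void register_master(InvalidateFn fn) { masters_.push_back(std::move(fn)); }

    // Rule 1: DMI acquire succeeds only if the region is not already cached.
    // Rule 2 follows: a refused master must go through the interconnect model.
    bool try_acquire_dmi(std::uint64_t region) const {
        return cached_.count(region) == 0;
    }

    // Rule 3: when a region becomes cached for the first time, every master
    // is told to stop using DMI for it.
    void on_region_cached(std::uint64_t region) {
        if (!cached_.insert(region).second) return; // already cached, nothing new
        for (const InvalidateFn& fn : masters_) fn(region);
    }
private:
    std::set<std::uint64_t>   cached_;
    std::vector<InvalidateFn> masters_;
};
```

In TLM 2.0 terms, the callback plays the role of the DMI invalidation sent from target to initiator, and a refused acquire corresponds to a DMI request that returns false.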
Slide14
Multi-Thread setup
[Block diagram. Labels: TLM 2.0; DSP side; Power Arch side; Inter-connect; three TLM DSP Adapters; three TLM PA Adapters; DSP Partition 1 to 3; Power Partition 1 and 2; Thread-1 to Thread-5]
Slide15
Performance Chart
Slide16
Conclusion
A TLM 2.0 based methodology is a good technique to integrate and co-simulate diverse LT simulation models
TLM Adapters can be designed and deployed in a smart way that ensures coherency
Multi-threading is the way to go
  Great speed-up for virtual platform simulation
Slide17
References
SystemC IEEE 1666-2011 Language Reference Manual
http://www.freescale.com
Slide18
Thank You!