The PARSEC Benchmark Suite Tutorial: PARSEC 3.0. Yungang Bao, Christian Bienia, Kai Li


Uploaded On 2016-07-06



Presentation Transcript

The PARSEC Benchmark Suite Tutorial: PARSEC 3.0
Yungang Bao, Christian Bienia, Kai Li
Princeton University

Tutorial Contents
- Part 1: Understanding PARSEC
  - Overview
  - History, Impact, What's New
  - Workloads
  - Research on PARSEC
- Part 2: Working with PARSEC
  - The parsecmgmt tool
  - Building & Running workloads
  - Configuration files
- Part 3: Roadmap of PARSEC
  - Network Workloads
  - GPU version
- Part 4: Concluding Remarks

Part 1: Understanding PARSEC

What is PARSEC?
- Princeton Application Repository for Shared-Memory Computers
- Benchmark suite for chip multiprocessors
- Started as a cooperation between Intel and Princeton University; many more have contributed since then
- Freely available at http://parsec.cs.princeton.edu/
- You can use it for your research
- Other resources: http://wiki.cs.princeton.edu/index.php/PARSEC and parsecusers@lists.cs.princeton.edu
- Goal: an open-source parallel benchmark suite of emerging applications for evaluating multicore and multiprocessor systems
- Application domains: financial, computer vision, physical modeling, future media, content-based search, deduplication
- Current release: PARSEC 2.1 (13 applications)

Contributors
- The first version of PARSEC was created by Intel and Princeton University.
- We would like PARSEC to be a community project.
- Many people and institutions have already contributed.
Interest in PARSEC
- 6000+ downloads

Impact of PARSEC
- Google Scholar citations: 400+
- Citations in top conferences (~40%)
[Figure: Total PARSEC usage, as a share (0% to 100%) of papers at MICRO 2008, HPCA 2009, ISCA 2009, MICRO 2009, HPCA 2010, ISCA 2010, MICRO 2010 and HPCA 2011. Series: Mix (w/o PARSEC), Mix (with PARSEC), PARSEC (exclusive), SPLASH-2 (exclusive), STAMP, SPEC (multiprogrammed).]

History of PARSEC
- Jan 2008: PARSEC 1.0, 12 workloads
- Feb 2009: PARSEC 2.0, one new workload (raytrace)
- Aug 2009: PARSEC 2.1, bugfix release
- Summer 2011: PARSEC 3.0

PARSEC 3.0 is coming soon
- New framework
  - Supports network workloads
  - Supports citations to encourage contribution
  - Makes it more convenient to add new workloads
- Much improved workloads: blackscholes, bodytrack, canneal, dedup, facesim, ferret, fluidanimate, freqmine, vips
- SPLASH-2 and SPLASH-2x
  - Existing SPLASH-2 uses the same framework
  - Use parsecmgmt to manage, build, and run
  - SPLASH-2x (joint work with Prof. JP Singh): multiple input sets at different scales

Objectives of PARSEC
- Multithreaded applications: future programs must run on multiprocessors
- Emerging workloads: increasing CPU performance enables new applications
- Diverse: multiprocessors are being used for more and more tasks
- State-of-the-art techniques: algorithms and programming techniques evolve rapidly
- Support research: our goal is insight, not numbers

Workloads
- No two workloads share the same combination of characteristics

Blackscholes Overview
- Blackscholes is the simplest of all PARSEC workloads
- Prices a portfolio of options with the Black-Scholes PDE
- Computational finance application (Intel)
- Synthetic input based on replication of 1,000 real options
- Coarse-granular parallelism, static load balancing
- Small working sets, negligible communication

Blackscholes Rationale
- Computers have become a key technology for trading
- Derivatives are financial instruments with some of the highest analytical requirements
- The Black-Scholes formula is a fundamental description of option behavior
- High demand for performance: saving a few milliseconds can earn lots of money

Blackscholes Characteristics
- Small working sets, negligible communication
[Figure: Working sets (miss rate in % versus cache size in KB; marked working sets: options, portfolio) and traffic (bytes/instr.) for 1, 2, 4, 8 and 16 cores, split into cache hits, private reads/writes, shared reads/writes and true shared reads/writes.]

Bodytrack Overview
- Tracks a markerless human body
- Computer vision application (Intel)
- Input is a video feed from 4 cameras
- Medium-granular parallelism, dynamic load balancing
- Pipeline and asynchronous I/O
- Medium working sets, some communication
[Figure: Output of Bodytrack (Frame 1).]

Bodytrack Rationale
- Machines increasingly rely on computer vision to interact with their environment
- Often no aid is available (e.g. markers, constrained behavior)
- Must usually happen in real time
[Figure: Stanley, winner of the DARPA Challenge 2005. Autonomous vehicle navigation requires real-time computer vision.]

Bodytrack Characteristics
- Medium working sets, some communication
[Figure: Working sets (miss rate in % versus cache size in KB; marked working sets: edge maps, input frames) and traffic (bytes/instr.) for 1, 2, 4, 8 and 16 cores, split into cache hits, private reads/writes, shared reads/writes and true shared reads/writes.]

Canneal Overview
- Minimizes the routing cost of a chip design with cache-aware simulated annealing
- Electronic Design Automation (EDA) kernel (Princeton)
- Input is a synthetic netlist
- Fine-granular parallelism, no problem decomposition
- Uses atomic instructions to synchronize
- Synchronization strategy based on data race recovery rather than avoidance
- Huge working sets, communication intensity only constrained by cache capacity
- Workload with the most demanding memory behavior

Canneal Rationale
- Optimization is one of the most common types of problems.
- Place & Route is a difficult EDA challenge.
- Transistor counts continue to increase at an exponential rate.
- Simulated annealing makes it possible to scale optimization cost through incremental performance investments.
[Figure: Photo of AMD's Barcelona quad-core CPU. It consists of about 463 million transistors.]
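To make the idea behind canneal concrete, here is a minimal single-threaded sketch of swap-based simulated annealing on a toy placement problem. It is not canneal's implementation: the names `routing_cost` and `anneal`, the Manhattan-distance cost, the two-pin nets and the geometric cooling schedule are all assumptions chosen for illustration, and the lock-free, multithreaded swapping that the workload actually stresses is left out.

```python
import math
import random


def routing_cost(placement, nets):
    """Total Manhattan wire length over all two-pin nets (toy cost model)."""
    total = 0
    for a, b in nets:
        (ax, ay), (bx, by) = placement[a], placement[b]
        total += abs(ax - bx) + abs(ay - by)
    return total


def anneal(placement, nets, start_temp=2000.0, cooling=0.9,
           moves_per_step=100, min_temp=0.1, seed=0):
    """Swap-based simulated annealing; returns (best placement, best cost)."""
    rng = random.Random(seed)
    current = dict(placement)
    cost = routing_cost(current, nets)
    best, best_cost = dict(current), cost
    elements = sorted(current)
    temp = start_temp
    while temp > min_temp:
        for _ in range(moves_per_step):
            # Propose swapping the locations of two random elements.
            a, b = rng.sample(elements, 2)
            current[a], current[b] = current[b], current[a]
            new_cost = routing_cost(current, nets)
            delta = new_cost - cost
            # Always accept improvements; accept a worse placement with
            # probability exp(-delta / temp) to escape local minima.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = dict(current), cost
            else:
                # Undo the rejected swap.
                current[a], current[b] = current[b], current[a]
        temp *= cooling  # assumed geometric cooling schedule
    return best, best_cost
```

Calling `anneal({"a": (0, 0), "b": (3, 3), "c": (0, 1), "d": (3, 0)}, [("a", "b"), ("c", "d"), ("a", "d")])` returns a placement that reuses the same four locations and costs no more than the starting one. Lowering `cooling` or `moves_per_step` trades solution quality for run time, which is the incremental performance investment the rationale slide refers to.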
Canneal Characteristics
- Huge working sets, communication limited by capacity
[Figure: Working sets (miss rate in % versus cache size in KB; marked working sets: netlist elements, netlist) and traffic (bytes/instr.) by core count, split into cache hits, private reads/writes, shared reads/writes and true shared reads/writes.]

Dedup Overview
- Detects and eliminates redundancy in a data stream with a next-generation technique called 'deduplication'
- Enterprise storage kernel (Princeton)
- Input is an uncompressed archive containing various files
- Improved, more computationally intensive deduplication methods
- More cache-efficient serial version
- Pipeline parallelism with multiple thread pools
- Huge working sets, significant communication

Dedup Rationale
- Growth of world data keeps outpacing growth of processing power.
- This data has to be stored and transferred.
- Use cheap resources (processing power) to make more efficient use of scarce resources (storage & bandwidth).
- Already in use in commercial products.
[Figure caption: Next-generation storage and networking products already use data deduplication.]

Dedup Characteristics
- Huge working sets, some communication
[Figure: Working sets (miss rate in % versus cache size in KB; marked working sets: data chunks, hash table) and traffic (bytes/instr.) by core count, split into cache hits, private reads/writes, shared reads/writes and true shared reads/writes.]

Facesim Overview
- Simulates motions of a human face for visualization purposes
- Computer animation application (Intel + Stanford)
- Input is a face model and a series of muscle activations
- Coarse-grained parallelism, similarities to HPC programs
- Large working sets, some sharing
[Figure: Facesim creates visually realistic animations of a human face. Source: Eftychios Sifakis et al.]
Facesim Rationale
- Video games and other interactive animations require visualization of realistic faces in real time
- Challenging problem: humans evolved to perceive the finest details in a face
- Physical simulation gives excellent results, but is computationally very challenging
- Technology already in use for movie productions (e.g. Pirates of the Caribbean 3)
[Figure: Faces are an integral part of contemporary games. Screenshot of Codemasters' "Overlord: Raising Hell" (2008).]

Facesim Characteristics
- Large working sets, some sharing
[Figure: Working sets (miss rate in % versus cache size in KB; marked working sets: tetrahedra, face mesh) and traffic (bytes/instr.) by core count, split into cache hits, private reads/writes, shared reads/writes and true shared reads/writes.]

Ferret Overview
- Search engine which finds a set of images similar to a query image by analyzing their contents
- Server application for content-based similarity search of feature-rich data (Princeton)
- Input is an image database and a series of query images
- Pipeline parallelism with multiple thread pools
- Huge working sets, very communication intensive

Ferret Rationale
- Growth of world data requires methods to search and index it
- Noise and minor variations frequently make the same content appear slightly different
- Traditional approaches using keywords are inflexible and don't scale well
- Computationally expensive
[Figure: A web interface for image similarity search.]
Ferret Characteristics
- Huge working sets, very communication intensive
[Figure: Working sets (miss rate in % versus cache size in KB; marked working sets: images, database) and traffic (bytes/instr.) for 1, 2, 4, 8 and 16 cores, split into cache hits, private reads/writes, shared reads/writes and true shared reads/writes.]

Fluidanimate Overview
- Simulates the underlying physics of fluid motion for real-time animation purposes with the SPH algorithm
- Computer animation application (Intel)
- Input is a list of particles
- Coarse-granular parallelism, static load balancing
- Large working sets, some communication

Fluidanimate Rationale
- Physics simulations allow significantly more realistic animations
- Highly demanded feature for games
- Fluid animation is one of the most challenging effects
- Already beginning to be used in games
[Figure: Advanced physics effects are already starting to be used in games: Tom Clancy's Ghost Recon Advanced Warfighter (2006) with (left) and without (right) PhysX effects.]

Fluidanimate Characteristics
- Large working sets, some communication
[Figure: Working sets (miss rate in % versus cache size in KB; marked working sets: cells, particle data) and traffic (bytes/instr.) for 1, 2, 4, 8 and 16 cores, split into cache hits, private reads/writes, shared reads/writes and true shared reads/writes.]

Freqmine Overview
- Identifies frequently occurring patterns in a transaction database
- Data mining application (Intel + Concordia)
- Input is a list of transactions
- Medium-granular parallelism, parallelized with OpenMP
- Huge working sets, some sharing

Freqmine Rationale
[Figure: Frequent itemset mining is already used, e.g. for e-commerce (Screenshot: Amazon.com).]
- Increasing amounts of data need to be analyzed for patterns
- Applies to many different areas such as marketing, computer security or computational biology
- Requirements for computational processing power are virtually unlimited in practice

Freqmine Characteristics
- Huge working sets, some sharing
[Figure: Working sets (miss rate in % versus cache size in KB; marked working sets: transactions, tree) and traffic (bytes/instr.) by core count, split into cache hits, private reads/writes, shared reads/writes and true shared reads/writes.]

Raytrace Overview
- Uses physical simulation for visualization
- Computer animation application (Intel)
- Input is a complex object composed of many triangles
- Fine-granular parallelism, dynamic load balancing
- Large working sets, little communication, significant data sharing
[Figure: Native input for raytrace (10 million polygons). Source: Stanford University.]

Raytrace Rationale
- Physics simulations allow accurate visualizations with realistic 3D graphics
- Realistic effects are possible without tricks (shadows, reflections, refractions, etc.)
- Simpler development of games at the cost of more expensive computations
[Figure caption: Major companies have started to invest in ray tracing (Source: cnet, May 2008).]

Raytrace Characteristics
- Large working sets, little communication
- Huge working sets containing the whole scene
- Exact working set sizes are data-dependent
- Entire scene is shared among all threads
- Memory bandwidth is the main issue for good speedups

Streamcluster Overview
- Computes an approximation for the optimal clustering of a stream of data points
- Machine learning application (Princeton)
- Input is a stream of multidimensional points
- Coarse-granular parallelism, static load balancing
- Medium-sized working sets of user-determined size
- Working set size can be determined at the command line

Streamcluster Rationale
- Clustering is a common problem in many fields like network security or pattern recognition
- Often input data is only available as a data stream, not as a data set (e.g. a huge data set that has to be processed under real-time conditions, continuously produced data, etc.)
- Approximation algorithms have become a popular choice to handle problems which are otherwise intractable

Streamcluster Characteristics
- Medium-sized working sets of user-determined size
[Figure: Working sets (miss rate in % versus cache size in KB; marked working sets: data points, data block) and traffic (bytes/instr.) by core count, split into cache hits, private reads/writes, shared reads/writes and true shared reads/writes.]

Swaptions Overview
- Prices a portfolio of swaptions with the Heath-Jarrow-Morton framework
- Computational finance application (Intel)
- Input is a portfolio of derivatives
- Coarse-granular parallelism, static load balancing
- Medium-sized working sets, little communication
- Employs Monte Carlo simulation

Swaptions Rationale
- Computerized trading of derivatives has become widespread
- High demand for performance: saving a few milliseconds can earn lots of money
- Monte Carlo simulation is a common approach in many different fields

Swaptions Characteristics
- Medium-sized working sets, little communication
[Figure: Working sets (miss rate in % versus cache size in KB; marked working set: swaptions) and traffic (bytes/instr.) by core count, split into cache hits, private reads/writes, shared reads/writes and true shared reads/writes.]

Vips Overview
- Applies a series of transformations to an image
- Media application (Princeton + National Gallery of London)
- Input is an uncompressed image
- Medium-granular parallelism, dynamic load balancing
- Medium-sized working sets, some sharing
- http://www.vips.ecs.soton.ac.uk/

Vips Rationale
- Image processing is one of the most common operations for desktops and workstations
- The amount of digital photos grows exponentially
- Professional images can become huge but still need to be handled quickly
- Benchmark based on a real print-on-demand service at the National Gallery of London
[Figure: The native input set for vips is a picture of the Orion galaxy with 18,000 x 18,000 pixels.]
Vips Characteristics
- Medium-sized working sets, some sharing
[Figure: Working sets (miss rate in % versus cache size in KB; marked working sets: image data, image data) and traffic (bytes/instr.) by core count, split into cache hits, private reads/writes, shared reads/writes and true shared reads/writes.]

X264 Overview
- MPEG-4 AVC / H.264 video encoder
- Media application (Princeton + Open Source Community)
- Input is a sequence of uncompressed images
- Coarse-granular pipeline parallelism
- Medium-sized working sets, very communication intensive
- http://www.videolan.org/developers/x264.html

X264 Rationale
- Increasing storage and network capacity have made videos popular
- Shift towards digital TV
- MPEG-4 AVC / H.264 is the standard for next-generation video compression
- More processing power enables better compression quality
[Figure: The input frames for x264 were taken from the open source movie "Elephants Dream" (2006).]

X264 Characteristics
- Medium-sized working sets, very communication intensive
[Figure: Working sets (miss rate in % versus cache size in KB; marked working sets: macroblocks, reference frames) and traffic (bytes/instr.) by core count, split into cache hits, private reads/writes, shared reads/writes and true shared reads/writes.]

Workloads Summary
- No two workloads share the same combination of characteristics

Comparing Program Behavior
- Question: How to quantify and compare program behavior?
- A Principal Component Analysis (PCA) based benchmark analysis methodology
- PCA: a mathematical procedure (Wikipedia) that turns a set of possibly correlated characteristics into a set of uncorrelated principal components (PCs)
- Steps:
  - Collect characteristics by simulations or real executions
  - Run the PCA procedure, which yields several PC vectors in PCA space
  - Evaluate the similarity of programs by computing the Euclidean distance of the vectors in PCA space
  - Visualize similarity with scatter plots and dendrograms

Redundancy & Similarity
- The PARSEC workloads are unique and representative

PARSEC vs. SPLASH
- You should expect different results
- Instruction mix: statistical analysis shows significant differences
[Figure: SPLASH versus PARSEC, instruction mix and sharing.]

Systematic Differences
- Benchmark suites cluster in different areas, with little overlap
- PARSEC and SPLASH complement each other well
- Integrate SPLASH-2 into the PARSEC framework
- Reference: PARSEC vs. SPLASH-2: A Quantitative Comparison of Two Multithreaded Benchmark Suites on Chip-Multiprocessors. In Proceedings of the IEEE International Symposium on Workload Characterization, September 2008

Input Set Selection/Evaluation
- Question: How to choose input sets with multiple scales to meet various demands, e.g. simulation or real machines?
- Linear scaling
  - Linear impact on runtime / loops
  - Typically does not change working set sizes
- Complex scaling
  - Frequently affects multiple kernels at the same time
  - Often impacts working set sizes, can change the ratio of the kernel execution time
- Greedy heuristic rules:
  - Use linear scaling
  - Use a combination of linear and complex scaling

Input Set Evaluation
- Four reference input scales
- Both linear and complex impacts are included
- Reference: Fidelity and Scaling of the PARSEC Benchmark Inputs. In Proceedings of the IEEE International Symposium on Workload Characterization, December 2010

Input Set Similarity
- Most workloads form local clusters
[Figure label: linear.]

Pipelined Programming Model
- The pipelined programming model is the most common model used in products
- Clean interfaces and modules
- Parallel programming

Characteristics
- Significant systematic differences between the two types of programs
- Reference: Characteristics of Workloads Using the Pipeline Programming Model. In Proceedings of the 3rd Workshop on Emerging Applications and Manycore Architecture, June 2010

Research by PARSEC
- "Does Cache Sharing on Modern CMP Matter to the Performance of Contemporary Multithreaded Programs?", Eddy Z. Zhang, Yunlian Jiang, Xipeng Shen. PPoPP, Best Paper Award
- "Characterizing the TLB Behavior of Emerging Parallel Workloads on Chip Multiprocessors", Abhishek Bhattacharjee, Margaret Martonosi.
  PACT 2009, Best Paper Finalist

Part 2: Working with PARSEC

Framework Directory Structure
- PARSEC is composed of the framework and packages
[Figure: Directory tree of the framework. Labels: framework executable files; global configuration files; extended benchmark directory; PARSEC benchmark directory; each group directory contains one directory per package in that group.]

Package Directory Structure
Each package directory is structured as follows:
    inputs/   Input archives (optional)
    inst/     Build installations, with one subdirectory per installation
    obj/      Build directory for temporary files, one subdirectory per build
    parsec/   Local configuration files
    run/      Run directory for temporary files
    src/      Source code of package

Configuration Files
- Global configuration files (in directory [...] of framework):
  - PARSEC main configuration file: 3.0 package
  - System configurations: [...].sysconf
  - Global build configurations: [...].bldconf
  - Global run configurations: [...].runconf
- Local configuration files (in directory [...] of each package):
  - Local build configurations: [...].bldconf
  - Local run configurations: [...]

Hello World (1)
Run the following command:
    parsecmgmt -a status -p parsec

Hello World (2)
In this command, "parsec" is the suite name; a workload/package name can be given instead. You should see some information similar to the following:
    [PARSEC] Installation status of selected packages:
    [PARSEC] parsec.blackscholes
    [PARSEC]   amd64-linux.gcc
    [PARSEC]   amd64-linux.gcc
    [PARSEC]   amd64-linux.gcc
    [PARSEC] parsec.bodytrack
    [PARSEC]   amd64-linux.gcc
    [PARSEC]   amd64-linux.gcc-serial
    [PARSEC] parsec.canneal
    [PARSEC]   amd64-linux.gcc
    [PARSEC]   amd64-linux.gcc-pthreads
    [PARSEC]   amd64-linux.gcc-serial

Hello World (3)
Run the following command:
    parsecmgmt -a status

parsecmgmt
- A script to help you manage your PARSEC installation
- Can build and run PARSEC workloads for you
- Only there for convenience; you can also do the same tasks manually
- Uses information in configuration files to do its job
- Use the following command to get some help:
    parsecmgmt -h

Building Workloads
- You can build a PARSEC workload as follows:
    parsecmgmt -a build -p [suite].[PACKAGE]
- Flag '-a' specifies the desired action, flag '-p' gives one or more packages
- A package can be a workload, library or anything else that comes with PARSEC and can be compiled
- 'parsecmgmt -a info' gives you a list of all available packages
- parsecmgmt will automatically handle dependencies between packages correctly

Building Workloads Quiz
Q: How do you build workload canneal?
Q: How do you build workload raytrace in the parsec suite?
Q: How do you build workload raytrace in the splash2x suite?

Building Workloads Answer
Q: How do you build package canneal?
A: You can use the following command:
    parsecmgmt -a build -p canneal
    [PARSEC] Packages to build: canneal
    [PARSEC] [========== Building package canneal ==========]
    [PARSEC] [---------- Analyzing package canneal ----------]
    [PARSEC] canneal depends on: hooks
    [PARSEC] [---------- Analyzing package hooks ----------]
    [PARSEC] hooks does not depend on any other packages.
    [PARSEC] [---------- Building package hooks ----------]
    [PARSEC] Copying source code of package hooks.
    [PARSEC] Running 'make':
    /usr/bin/gcc -O3 -funroll-loops -fprefetch-loop-arrays -DPARSEC_VERSION=2.0 -Wall -std=c99 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -c hooks.c
    ar rcs libhooks.a hooks.o
    ranlib libhooks.a
    [PARSEC] Running 'make install':
    ...
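For scripted experiments, the build command can be assembled programmatically. The sketch below only composes the argument list from the '-a' and '-p' flags described above; the helper name `build_command` is hypothetical and not part of PARSEC, and it assumes `parsecmgmt` is already on the PATH.

```python
def build_command(package, suite=None):
    """Return the argv list for building one package with parsecmgmt.

    Hypothetical helper: it uses only the '-a build' and '-p' flags
    from the tutorial and does not check whether the package exists.
    """
    # A package may be qualified with its suite, e.g. "splash2x.raytrace".
    target = f"{suite}.{package}" if suite else package
    return ["parsecmgmt", "-a", "build", "-p", target]
```

The resulting list can be handed to `subprocess.run(build_command("raytrace", suite="splash2x"), check=True)` once the PARSEC environment has been set up.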
Building Workloads Answer
Q: How do you build package canneal?
A: You can also name the suite explicitly:
    parsecmgmt -a build -p parsec.canneal

Building Workloads Answer
Q: How do you build workload raytrace in the parsec suite?
Q: How do you build workload raytrace in the splash2x suite?
A: You can use the following commands:
    parsecmgmt -a build -p parsec.raytrace
    parsecmgmt -a build -p splash2x.raytrace

Suite, Groups & Aliases
- Each package belongs to exactly one group
- parsecmgmt also understands aliases
- You can use group names and aliases instead of package names
- Current suites are [...]
- Possible aliases are [...] and defined aliases
- Example [demo]:
    parsecmgmt -a build -p parsec
    parsecmgmt -a build -p all
    parsecmgmt -a build -p splash2x

Build Configurations
- Build configurations determine how parsecmgmt is to build a package
- They specify compiler, compiler flags, optimizations, etc.
- Use flag '-c' with parsecmgmt to select a build configuration
- You should create your own build configurations according to your needs
- Default build configurations are [...] and [...]
- PARSEC build configurations to enable specific parallelizations are [...] and [...]

Build Configurations Quiz
Q: How do you build workload canneal with build configuration gcc-serial?

Build Configurations Answer
A: You can use the following command:
    parsecmgmt -a build -p canneal -c gcc-serial
    [PARSEC] Packages to build: canneal
    [PARSEC] [========== Building package canneal ==========]
    [PARSEC] [---------- Analyzing package canneal ----------]
    [PARSEC] canneal depends on: hooks
    [PARSEC] [---------- Analyzing package hooks ----------]
    [PARSEC] hooks does not depend on any other packages.
    [PARSEC] [---------- Building package hooks ----------]
    [PARSEC] Copying source code of package hooks.
    [PARSEC] Running 'env make':
    /usr/bin/gcc -O3 -funroll-loops -fprefetch-loop-arrays -DPARSEC_VERSION=2.0 -Wall -std=c99 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -c hooks.c
    ar rcs libhooks.a hooks.o
    ranlib libhooks.a
    [PARSEC] Running 'env make install':

Multiple Builds
- You can have more than one build of every package installed
- parsecmgmt will create a platform description string to distinguish builds as follows:
    [ARCHITECTURE]-[OSNAME].[BUILDCONF]
- You can override this string by defining environment variable PARSECPLAT
- PARSEC 2.0 also allows you to append an extension to further distinguish builds

Show Available Installations
- You can see a list of all installed builds if you run:
    parsecmgmt -a status
- parsecmgmt will list the platform description strings of all installed builds for each workload:
    [PARSEC] Installation status of selected packages:
    [PARSEC] blackscholes:
    [PARSEC]   no installations
    [PARSEC] bodytrack:
    [PARSEC]   no installations
    [PARSEC] canneal:
    [PARSEC]   x86_64-linux-gnu.gcc
    [PARSEC]   x86_64-linux-gnu.gcc-serial

Cleanup
- Remove all temporary directories (used e.g. for building):
    parsecmgmt -a fullclean
- Uninstall a specific installation:
    parsecmgmt -a uninstall -p [PACKAGE] -c [BUILDCONF]
- Uninstall everything:
    parsecmgmt -a fulluninstall -p all

Running Benchmarks
- You can run a PARSEC benchmark as follows:
    parsecmgmt -a run -p [PACKAGE] -c [BUILDCONF] -i [INPUT] -n [THREADS]
- Like building workloads, but you can also specify an input and the number of threads
- Default inputs are [...] and [...]
- Flag '-n' specifies the minimum number of threads. The actual number can be higher. You must use other techniques to limit the number of CPUs.
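Because '-n' is only a lower bound on the thread count, one possible "other technique" on Linux is to pin the whole run to a fixed set of CPUs with the util-linux `taskset` tool. The sketch below composes such a command line; `run_command` is a hypothetical helper, and it assumes both `taskset` and `parsecmgmt` are on the PATH and that the CPU affinity set on the wrapper is inherited by the benchmark processes it starts.

```python
import shlex


def run_command(package, config, input_set, threads, cpus=None):
    """Return the argv list for a parsecmgmt run, optionally CPU-pinned.

    Hypothetical helper built from the flags shown in the tutorial
    (-a, -p, -c, -i, -n). `cpus` is an iterable of CPU ids for taskset.
    """
    argv = ["parsecmgmt", "-a", "run", "-p", package,
            "-c", config, "-i", input_set, "-n", str(threads)]
    if cpus is not None:
        # 'taskset -c LIST CMD...' restricts CMD to the listed CPUs.
        cpu_list = ",".join(str(c) for c in cpus)
        argv = ["taskset", "-c", cpu_list] + argv
    return argv


def as_shell(argv):
    """Render an argv list as a copy-pasteable shell command line."""
    return shlex.join(argv)
```

For example, `as_shell(run_command("canneal", "gcc", "simsmall", 4, cpus=[0, 1]))` produces a single command line that asks for at least four threads but confines them to CPUs 0 and 1.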
Input Sets
Six input sets, listed from smallest to largest:
- Execute program, as small as possible, best-effort execution path as real inputs
- Stresses all machine parts required by larger input sets, same execution path as real inputs
- Like real inputs, runtime ~1s
- Like real inputs, runtime ~5s
- Like real inputs, runtime ~15s
- Like real inputs, runtime ~15min

Running Benchmarks Quiz
Q: How do you run the serial version of workload canneal with input simsmall?

Running Benchmarks Answer
A: You can use the following command:
    parsecmgmt -a run -p canneal -c gcc-serial -i simsmall
    [PARSEC] Benchmarks to run: canneal
    [PARSEC] [========== Running benchmark canneal ==========]
    [PARSEC] Setting up run directory.
    [PARSEC] Unpacking benchmark input 'simsmall'.
    100000.nets
    [PARSEC] Running '...':
    [PARSEC] [---------- Beginning of output ----------]
    PARSEC Benchmark Suite Version 2.0
    Threadcount: 1
    10000 moves per thread
    Start temperature: 2000
    [PARSEC] [---------- End of output ----------]
    [PARSEC] Done.

Log Files
- parsecmgmt stores all output of builds and runs in log files
- All log files are kept in the [...] directory of the framework
- Naming convention:
    build_[DATE]_[TIMESTAMP].log
    run_[DATE]_[TIMESTAMP].log

Documentation
- Comprehensive documentation is shipped with PARSEC
- Full set of man pages available in the [...] directory
- Add it to the environment variable MANPATH to access it (example assumes bash shell):
    MANPATH=${MANPATH}:${PARSECDIR}/man
- We provide a script which does that for you (see next slide)
- Then you can start browsing the documentation as follows:
    man parsec

Environment Setup
- You can modify your environment to make the PARSEC tools and their man pages available at the command line (without full path)
- The script env.sh in the PARSEC root directory will do that for you
- Source it as follows (example assumes bash shell):
    source env.sh
- If you use PARSEC a lot, you can add that to your login scripts to have it always available

Managing Build Configurations
- Create a new build configuration:
    bldconfadd -[...]
- In most cases you will want to create a copy of an existing build configuration
- Use flag '[...]' for a hard copy and flag '[...]' for a soft copy
- Delete a build configuration:
    bldconfdel -[...]
- Use flag '[...]' with both tools to get more detailed usage information

Modifying Build Configurations
- You should adapt build configurations to your needs
- Each build configuration has to define:
  - Default environment variables for makefiles (CC, CXX, CFLAGS, ...)
  - Build tool version numbers (CC_ver, CXX_ver, ...)
- It should define macro [...]
- The global configuration files define all parameters; the local ones adapt them and add additional variables as needed by each package

Build Configuration Quiz
Q: Create a new build configuration based on [...] that compiles all packages without optimization but with debugging support. Test it on workload [...].

Build Configuration Answer
A: First, create a copy of build configuration [...]. Next, edit [...] in directory [...] to use the new flags.

Build Information
- parsecmgmt creates a special file '[...]' with information about the build in each build installation directory
- The file contains details about build configuration and environment at the time of compilation:
  - Exact location and version of all compilers
  - Compiler flags specified by build configuration
  - Modifications of environment variables
- Makes it a lot easier to figure out what was going on if build configurations were modified

Build Information Quiz
Q: How did parsecmgmt modify the environment to build the serial version of workload canneal?

Build Information Answer
A: It's in [...] for the [...] configuration:
    PARSEC Compile Information
    ==========================
    Package 'canneal'
    Built on Wed May 7 20:24:59 EDT 2008
    Configure arguments: --prefix=/home/cbienia/parsec/parsec-2.0/pkgs/kernels/canneal/inst/x86_64-linux-linux.gcc-serial
    Environment modifications: version=serial
    CC: /usr/bin/gcc
    Version: gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-14)
    Copyright (C) 2006 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    CFLAGS: -O3 -funroll-loops -fprefetch-loop-arrays -DPARSEC_VERSION=2.0 ...

How to add a workload
Add a "hello" workload:
- cd to the "ext" directory
- Create your suite "ext/user"
- Copy the template workload to your suite
- Change the config file

Part 3: Roadmap of PARSEC

Network Workloads
- Network workloads are ubiquitous
- The TCP/IP stack is CPU intensive
- Rule of thumb: 1 Gbit/s ~ 1 GHz Pentium CPU
- Gbits are here and CPUs are multicores
- Need a parallelized TCP/IP stack
- No TCP/IP stack in existing benchmarks

Network Workloads Framework
- Goal: a framework to easily run network workloads on real machines and simulators
  - A user-level, parallelized TCP/IP stack
  - Easy to run on a simulator
- Environment: run client and server workloads together
- Approach
  - User-level TCP/IP stack (uTCP/IP)
    - Extract the TCP/IP stack from the FreeBSD kernel
    - Keep uTCP/IP's behavior similar
  - Parallelized uTCP/IP
    - Use multiple methods to parallelize TCP/IP
    - Pipelined model
    - Data parallel
- Two modes: inter-node and intra-node

GPGPU Workloads
- Many emails asking if we provide GPGPU workloads
- Needs huge efforts
- Our plan
  - Encourage people to port PARSEC to GPGPU
  - Submit your GPU-version PARSEC
  - Credits given by the new framework

Part 4: Concluding Remarks
- PARSEC 3.0: release planned for summer 2011
- We need your contribution; we are looking for contributions in:
  - Network workloads
  - Porting PARSEC to GPGPU

References
[1] Christian Bienia. Benchmarking Modern Multiprocessors. Ph.D. Thesis. Princeton University, January 2011.
[2] Christian Bienia, Sanjeev Kumar, Jaswinder Pal Singh and Kai Li. The PARSEC Benchmark Suite: Characterization and Architectural Implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, October 2008.
[3] Christian Bienia and Kai Li. Fidelity and Scaling of the PARSEC Benchmark Inputs.
    Proceedings of the IEEE International Symposium on Workload Characterization, December 2010.
[4] Christian Bienia and Kai Li. Characteristics of Workloads Using the Pipeline Programming Model. In Proceedings of the 3rd Workshop on Emerging Applications and Manycore Architecture, June 2010.
[5] Christian Bienia, Sanjeev Kumar and Kai Li. PARSEC vs. SPLASH-2: A Quantitative Comparison of Two Multithreaded Benchmark Suites on Chip-Multiprocessors. In Proceedings of the IEEE International Symposium on Workload Characterization, September 2008.
[6] Yungang Bao, Christian Bienia and Kai Li. A Framework for Benchmarking Network Workloads. TR_110415, 2011.

Open Discussion
- Where do you think PARSEC should go?
- What has to change?
- Questions?

PARSEC Hooks
- Write code once, automatically insert it into all workloads simply by rebuilding them
- The hooks API functions are called at specific, predefined locations by all workloads
- Implemented as a library
- Comes with several useful features already implemented (see [...] in the hooks package)
- Read the man pages for detailed explanations

Enabling PARSEC Hooks
- Define macro ENABLE_PARSEC_HOOKS (and tell the compiler and linker to use the hooks header files and library)
- The following flags work with gcc:
  - For CFLAGS: -I${PARSECDIR}/pkgs/hooks/inst/${PARSECPLAT}/
  - For LDFLAGS: -L${PARSECDIR}/pkgs/libs/hooks/inst/${PARSECPLAT}/lib
  - For LIBS: [...]
- The build configuration gcc-hooks does this already by default

PARSEC Hooks API
The hooks are called in this order over the lifetime of a workload:
- Application start (serial code): call to void __parsec_bench_begin(enum __parsec_benchmark __bench)
- Initialization (serial code)
- Call to void __parsec_roi_begin()
- Parallel phase (parallel code): the Region of Interest (ROI)
- Call to void __parsec_roi_end()
- Cleanup (serial code)
- Application end: call to void __parsec_bench_end()

PARSEC Hooks Features
- Measure execution time of ROI: define [...] in [...] (enabled by default)
- Control thread affinity via environment variables: define [...] in [...] (enabled by default, Linux only)
- Execute Simics "Magic Instruction" before and after ROI: define [...] in [...] (disabled by default, Simics simulations only)

Assisting Simulations with PARSEC Hooks
You can use PARSEC Hooks to eliminate unnecessary simulation time:
- At the call to void __parsec_roi_begin(), before the parallel phase, possible actions are:
  - Create checkpoint
  - Switch from fast-forward to detailed simulation
- At the call to void __parsec_roi_end(), after the parallel phase, possible actions are:
  - Terminate simulation
  - Switch to fast-forward
  - Analyze simulation results

PARSEC Hooks Quiz
Q: Use PARSEC hooks to print out "I like PARSEC" if build configuration [...] is used. Test it with [...].

PARSEC Hooks Answer (1)
A: Add a print statement to [...]. Define macro [...] for build configuration [...]:
    printf(HOOKS_PREFIX " I like PARSEC ...

PARSEC Hooks Answer (2)
A: Remove any existing installations of [...]. Build and run [...].
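The ROI hook pattern described above can be mimicked in a few lines to show why only the parallel phase is timed. The following is a Python analogue, not the PARSEC hooks library: the class name `RoiTimer` and its methods are hypothetical stand-ins for the `__parsec_roi_begin()` and `__parsec_roi_end()` calls, and the injectable clock exists only to make the sketch testable.

```python
import time


class RoiTimer:
    """Times only the region of interest, in the spirit of the ROI hooks."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock      # injectable time source (assumption)
        self._start = None
        self.elapsed = None      # seconds spent in the last ROI

    def roi_begin(self):
        """Analogue of __parsec_roi_begin(): start the measurement."""
        self._start = self._clock()

    def roi_end(self):
        """Analogue of __parsec_roi_end(): stop and return elapsed time."""
        if self._start is None:
            raise RuntimeError("roi_end() called before roi_begin()")
        self.elapsed = self._clock() - self._start
        self._start = None
        return self.elapsed
```

A workload would run its initialization, call `roi_begin()`, execute the parallel phase, call `roi_end()`, and then clean up. A simulator-facing variant could replace the clock reads with a checkpoint or a switch between fast-forward and detailed simulation, as listed on the slide about assisting simulations.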