Parallel programs: Inf-2202 Concurrent and Data-intensive Programming

Presentation Transcript

1. Parallel programs
Inf-2202 Concurrent and Data-intensive Programming
Fall 2015
Lars Ailo Bongo (larsab@cs.uit.no)

2. Course topics
- Parallel programming
- The parallelization process
- Optimization of parallel programs
- Performance analysis
- Data-intensive computing

3. Parallel programs
- Supercomputing
  - Scientific applications
  - Parallel programming was hard
  - Parallel architectures were expensive
  - Still important!
- Data-intensive computing
  - Will return to this topic
- Server applications
  - Databases, web servers, app servers, etc.
- Desktop applications
  - Games, image processing, etc.
- Mobile phone applications
  - Multimedia, sensor-based, etc.
- GPU and hardware accelerator applications

4. Outline
- Parallel architectures
- Fundamental design issues
- Case studies
- Parallelization process
- Examples

5. Parallel architectures
- A parallel computer is "a collection of processing elements that communicate and cooperate to solve large problems fast" (Almasi and Gottlieb, 1989)
- Conventional computer architecture
  + communication among processes
  + coordination among processes

6. Communication architecture
- Hardware/software boundary?
- User/system boundary?
- Defines:
  - Basic communication operations
  - Organizational structures to realize these operations

7. Parallel architectures
- Shared address space
- Message passing (contrasted with the above in the sketch below)
- Data parallel processing
- Bulk synchronous processing (Valiant, 1990)
  - Google's Pregel (Malewicz et al., 2010)
- MapReduce (Dean & Ghemawat, 2010) and Spark (Zaharia et al., 2012)
- Dataflow architectures (wikipedia1, wikipedia2)
  - VHDL, Verilog, Linda, Yahoo Pipes (?), Galaxy (?)
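The first two models can be contrasted directly in Go. This is a minimal sketch (illustrative, not from the slides; the worker count and data are arbitrary): the same array sum is computed first with a shared variable protected by a mutex (shared address space), then with partial sums exchanged over a channel (message passing).

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	data := make([]int, 1000)
	for i := range data {
		data[i] = i
	}

	// Shared address space: workers update one shared variable under a mutex.
	var mu sync.Mutex
	var wg sync.WaitGroup
	shared := 0
	for w := 0; w < 4; w++ {
		wg.Add(1)
		go func(part []int) {
			defer wg.Done()
			s := 0
			for _, v := range part {
				s += v
			}
			mu.Lock()
			shared += s
			mu.Unlock()
		}(data[w*250 : (w+1)*250])
	}
	wg.Wait()

	// Message passing: workers communicate partial sums over a channel.
	results := make(chan int, 4)
	for w := 0; w < 4; w++ {
		go func(part []int) {
			s := 0
			for _, v := range part {
				s += v
			}
			results <- s
		}(data[w*250 : (w+1)*250])
	}
	total := 0
	for w := 0; w < 4; w++ {
		total += <-results
	}
	fmt.Println(shared, total) // both print 499500
}
```

In the first version the workers communicate implicitly by writing to the same memory; in the second, all communication is explicit in the channel operations.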

8. Outline
- Parallel architectures
- Fundamental design issues
- Case studies
- Parallelization process
- Examples

9. Fundamental design issues
- Communication abstraction
- Programming model requirements:
  - Naming
  - Ordering
  - Communication and replication
  - Performance

10. Communication abstractions
- Well-defined operations
- Suitable for optimization
- Communication abstractions in Pthreads? Go?

11. Programming model
- One or more threads of control operating on data:
  - What data can be named by which threads
  - What operations can be performed on the named data
  - What ordering exists among those operations
- Programming model for a uniprocessor?
- Pthreads programming model?
- Go programming model?
- Why the need for explicit synchronization primitives? (see the sketch below)
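The last question deserves a concrete answer: in a shared-memory model, nothing orders two threads' accesses to the same variable unless the program says so. A minimal sketch in Go (assumed example, not from the slides): two goroutines increment a shared counter, and without the mutex, increments are lost and the result is unpredictable.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var mu sync.Mutex
	var wg sync.WaitGroup
	counter := 0

	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				mu.Lock() // remove Lock/Unlock and `go run -race` flags a data race
				counter++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println(counter) // always 2000 with the mutex; usually less without it
}
```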

12. Naming
- Critical at each level of the architecture

13. Operations
- Operations that can be performed on the data
- Pthreads? Go?
- More exotic? (see the sketch below)
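As one example of a "more exotic" operation, most architectures and languages expose atomic read-modify-write instructions. A small sketch using Go's sync/atomic package (values are illustrative): an atomic add and a compare-and-swap.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	var counter int64
	var wg sync.WaitGroup

	// Atomic add: a read-modify-write performed as one indivisible operation,
	// so no mutex is needed around the increment.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				atomic.AddInt64(&counter, 1)
			}
		}()
	}
	wg.Wait()
	fmt.Println(atomic.LoadInt64(&counter)) // 4000

	// Compare-and-swap: update only if the value is still what we expect.
	var flag int32
	if atomic.CompareAndSwapInt32(&flag, 0, 1) {
		fmt.Println("won the race to set the flag")
	}
}
```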

14. Ordering
- Important at all layers in the architecture
- Performance tricks
- If implicit ordering is not enough, synchronization is needed:
  - Mutual exclusion
  - Events / condition variables
    - Point-to-point
    - Global
  - Channels? (see the sketch below)
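A sketch of two of these primitives in Go (assumed example, not from the slides): a condition variable implements an event that one goroutine signals and another waits on, and a channel gives point-to-point ordering because a receive cannot complete before the matching send.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	// Event via a condition variable: the waiter blocks until ready is set.
	var mu sync.Mutex
	ready := false
	cond := sync.NewCond(&mu)

	go func() {
		mu.Lock()
		ready = true
		mu.Unlock()
		cond.Signal()
	}()

	mu.Lock()
	for !ready { // always re-check the predicate after waking
		cond.Wait()
	}
	mu.Unlock()
	fmt.Println("event observed")

	// Point-to-point ordering via a channel: the receive below cannot
	// complete before the producer closes the channel.
	done := make(chan struct{})
	go func() {
		fmt.Println("producer finished")
		close(done)
	}()
	<-done
	fmt.Println("consumer continues")
}
```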

15. Communication and replication
- Related to each other
  - Caching
  - IPC
- Binding of data:
  - Write
  - Read
  - Data transfer
  - Data copy
  - IPC

16. Performance
- Data types, addressing modes, and communication abstractions specify naming, ordering, and synchronization for shared objects
- Performance characteristics specify how they are actually used
- Metrics (measured in the sketch below):
  - Latency: the time for an operation
  - Bandwidth: the rate at which operations are performed
  - Cost: impact on execution time
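These metrics can be estimated with a microbenchmark. A rough sketch in Go (numbers are machine-dependent, and the operation measured, a channel round trip, is an arbitrary choice): latency as time per round trip, bandwidth as round trips per second.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const ops = 100000
	ping := make(chan struct{})
	pong := make(chan struct{})

	// Echo goroutine: reply to every message.
	go func() {
		for range ping {
			pong <- struct{}{}
		}
	}()

	start := time.Now()
	for i := 0; i < ops; i++ {
		ping <- struct{}{} // one send ...
		<-pong             // ... and one reply per round trip
	}
	elapsed := time.Since(start)
	close(ping)

	fmt.Printf("latency:   %v per round trip\n", elapsed/ops)
	fmt.Printf("bandwidth: %.0f round trips/s\n", float64(ops)/elapsed.Seconds())
}
```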

17. Outline
- Parallel architectures
- Fundamental design issues
- Case studies
- Parallelization process
- Examples

18. The Basic Local Alignment Search Tool (BLAST)
- BLAST finds regions of local similarity between sequences
- The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches
- Popular to use
- Popular to parallelize

19. Nearest neighbor equation solver
- Example from chapter 2.3 in Parallel Computer Architecture: A Hardware/Software Approach. David Culler, J. P. Singh, and Anoop Gupta. Morgan Kaufmann, 1998.
- Common matrix-based computation
- Well-known parallel benchmark (SOR); a sequential sketch of the kernel follows
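The kernel from chapter 2.3 sweeps a grid, replacing each interior point with a weighted average of itself and its four nearest neighbors, and iterates until the average change per point falls below a tolerance. A sequential Go sketch of that sweep (the grid size, boundary values, and tolerance are arbitrary choices here):

```go
package main

import (
	"fmt"
	"math"
)

const (
	n   = 64   // interior points per dimension
	tol = 1e-3 // convergence threshold on the average change per point
)

func main() {
	// (n+2) x (n+2) grid; the border rows/columns hold fixed boundary values.
	a := make([][]float64, n+2)
	for i := range a {
		a[i] = make([]float64, n+2)
		a[i][0], a[i][n+1] = 1.0, 1.0 // arbitrary boundary condition
	}

	for sweep := 0; ; sweep++ {
		diff := 0.0
		for i := 1; i <= n; i++ {
			for j := 1; j <= n; j++ {
				old := a[i][j]
				// Update: weighted average of the point and its four
				// nearest neighbors, using freshly updated values.
				a[i][j] = 0.2 * (a[i][j] + a[i][j-1] + a[i-1][j] +
					a[i][j+1] + a[i+1][j])
				diff += math.Abs(a[i][j] - old)
			}
		}
		if diff/(n*n) < tol {
			fmt.Println("converged after", sweep+1, "sweeps")
			return
		}
	}
}
```

This sequential kernel is exactly what the parallelization process in the next section decomposes, assigns, orchestrates, and maps.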

20. Deduplication
- Mandatory assignment 2

21. Outline
- Parallel architectures
- Fundamental design issues
- Case studies
- Parallelization process
- Examples

22. Parallelization process
- Goals:
  - Good performance
  - Efficient resource utilization
  - Low developer effort
- May be done at any layer

23. Parallelization process (2)
- Task: a piece of work
- Process/thread: the entity that performs the work
- Processor/core: a physical processor core

24. Parallelization process (3)
- Decomposition of the computation into tasks
- Assignment of tasks to processes
- Orchestration of necessary data access, communication, and synchronization among processes
- Mapping of processes to cores

25. Steps in the parallelization process

26. Decomposition
- Split the computation into a collection of tasks
- Algorithmic
- Task granularity limits parallelism
- Amdahl's law (see the formula below)
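Amdahl's law, in its standard form: if a fraction p of the execution can be parallelized across n processors, the serial remainder bounds the achievable speedup.

```latex
S(n) = \frac{1}{(1 - p) + \frac{p}{n}}, \qquad \lim_{n \to \infty} S(n) = \frac{1}{1 - p}
```

For example, with p = 0.95 and n = 8 the speedup is roughly 5.9, and no processor count can push it past 1/(1 - p) = 20.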

27. Assignment
- Algorithmic
- Goal: load balancing
  - All processes should do an equal amount of work
  - Important for performance
- Goal: reduce communication volume
  - Send the minimum amount of data
- Two types (sketched below):
  - Static
  - Dynamic
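A sketch of both assignment types in Go (a hypothetical example; work() is a stand-in task whose cost grows with its argument): static assignment fixes each worker's tasks up front, while dynamic assignment lets workers pull tasks from a shared queue, which balances load when task costs are uneven.

```go
package main

import (
	"fmt"
	"sync"
)

// work is a stand-in for one task; its cost varies with n.
func work(n int) int {
	s := 0
	for i := 0; i < n; i++ {
		s += i % 7
	}
	return s
}

func main() {
	const tasks, workers = 64, 4

	// Static assignment: worker w takes every workers-th task, fixed up front.
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for t := w; t < tasks; t += workers {
				work(t * 1000)
			}
		}(w)
	}
	wg.Wait()

	// Dynamic assignment: workers pull the next task from a shared channel,
	// so a fast worker automatically takes more tasks.
	queue := make(chan int, tasks)
	for t := 0; t < tasks; t++ {
		queue <- t
	}
	close(queue)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := range queue {
				work(t * 1000)
			}
		}()
	}
	wg.Wait()
	fmt.Println("done")
}
```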

28. Orchestration
- Specific to the computer architecture, programming model, and programming language
- Goals:
  - Reduce communication cost
  - Reduce synchronization cost
  - Locality of data (see the sketch below)
  - Efficient scheduling
  - Reduce overhead
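Locality of data is the easiest of these goals to demonstrate. A small sketch (illustrative, and single-threaded on purpose): summing a matrix in row-major versus column-major order; the second traversal typically runs several times slower because each access lands on a different cache line.

```go
package main

import (
	"fmt"
	"time"
)

const n = 2048

func main() {
	a := make([][]int32, n)
	for i := range a {
		a[i] = make([]int32, n)
	}

	// Row-major traversal: consecutive accesses hit the same cache lines.
	start := time.Now()
	var s int32
	for i := 0; i < n; i++ {
		for j := 0; j < n; j++ {
			s += a[i][j]
		}
	}
	fmt.Println("row-major:   ", time.Since(start))

	// Column-major traversal: each access jumps to a different row, so
	// cache lines are typically evicted before they are reused.
	start = time.Now()
	for j := 0; j < n; j++ {
		for i := 0; i < n; i++ {
			s += a[i][j]
		}
	}
	fmt.Println("column-major:", time.Since(start), s)
}
```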

29. Mapping
- Specific to the system or programming environment
- Parallel system resource allocator
- Queuing systems
- OS scheduler

30. Goals of the parallelization process

Step          | Architecture dependent? | Major performance goals
------------- | ----------------------- | ------------------------
Decomposition | Mostly no               | Expose enough concurrency, but not too much
Assignment    | Mostly no               | Balance workload; reduce communication volume
Orchestration | Yes                     | Reduce noninherent communication via data locality; reduce communication and synchronization cost as seen by the processor; reduce serialization of access to shared resources; schedule tasks to satisfy dependencies early
Mapping       | Yes                     | Put related threads on the same core if necessary; exploit locality in chip and network topology

31. Summary
- Fundamental design issues for parallel systems
- How to write a parallel program
- Examples