Advanced Architectures

yoshiko-marsland

Uploaded on 2017-12-10

Presentation Transcript

Slide1

Advanced Architectures

Slide2

Performance

The speed at which a computer executes a program is affected by:

the design of its hardware – processor speed, clock rate, memory access time, etc.

the machine language (ML) instructions – the instruction format, instruction set, etc.

the compiler that translates HLL programs into ML programs – how efficient the generated ML code is.

Slide3

Performance – Memory Access Time

Techniques used to reduce memory access time are:

Use Cache Memory

Prefetch instructions and place them in the Instruction Queue in the processor

These techniques reduce the instruction fetch time to close to one processor clock cycle.
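The payoff of a cache can be sketched with a simple average-access-time calculation; this is a minimal sketch, and the hit rate and latencies below are illustrative assumptions, not figures from the slides:

```python
def effective_access_time(hit_rate, cache_ns, memory_ns):
    """Average time per access: hits are served by the cache alone,
    misses pay the cache probe plus the main-memory access."""
    return hit_rate * cache_ns + (1 - hit_rate) * (cache_ns + memory_ns)

# Assumed figures: 95% hit rate, 1 ns cache, 100 ns main memory.
print(effective_access_time(0.95, 1.0, 100.0))  # roughly 6 ns on average
```

Even a modest miss rate dominates the average, which is why the cache and instruction queue together aim to keep fetches within a clock cycle.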

[Diagram: Memory → Cache → Instruction Queue → Instruction Register]

Slide4

Performance Equation

To execute a machine instruction, the processor divides the actions to be taken into a sequence of basic steps

Each basic step can be executed in one clock cycle

For a clock cycle period P, the clock rate is R = 1/P

Let T be the processor time required to execute a program written in a high-level language (HLL)

N is the number of machine language instructions generated for the program

S is the average number of basic steps needed to execute a single machine instruction

The program execution time T is given as T = (N*S)/R

This is called the basic performance equation

Slide5
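The basic performance equation can be checked numerically; this is a minimal sketch, and the instruction count, step count, and clock rate below are hypothetical values chosen for illustration:

```python
def execution_time(n, s, r):
    """Basic performance equation: T = (N * S) / R."""
    return (n * s) / r

# Hypothetical program: N = 1e9 instructions, S = 4 basic steps on average,
# R = 2 GHz clock rate.
print(execution_time(1_000_000_000, 4, 2_000_000_000))  # 2.0 seconds
```
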

Performance Equation

To enhance the performance of a computer the performance parameter T must be reduced

To reduce T: reduce N and S and increase R

N, S and R are interdependent

Reduction in N usually comes at the cost of a greater number of basic steps per instruction

Reduction in S usually comes at the cost of a greater number of instructions

Increasing the value of R reduces the length of the clock cycle, thereby giving less time for the execution of a basic step

Hence, the need is to introduce features that collectively bring down T

Slide6

Pipelining

Normally it is assumed that instructions are executed one after another, therefore

The total number of basic steps for a program

S1 + S2 + … + SN = N*S

where S1, S2, …, SN are the numbers of basic steps for each of the N instructions of a given program and S is their average

N*S is the total number of clock cycles required to execute a program if all the steps of an instruction are executed by the same module in the processor and the N*S steps are executed sequentially

There can be multiple functional modules to execute the different steps of an instruction, operating as a pipeline that executes the identical steps of consecutive instructions in successive clock cycles.

This leads to overlapping execution of successive instructions in the program and reduces the program execution time

Slide7

Pipelining

Pipelining takes advantage of the fact that the execution of any instruction can be broken down into a sequence of basic steps which are handled by different hardware units inside the processor

For example

Fetch (F) is performed by the system bus

Decode (D) is performed by the decode unit in the control unit

Execute (E) is performed by the ALU

Write (W) is performed by the internal processor bus

[Diagram: Instruction → Fetch Unit → Decode Unit → Execute Unit → Write Unit]

Slide8

Pipelining

In pipelining, different instructions at different stages of their execution are executed at the same time

It is assumed that each of the basic steps require the same amount of time – 1 clock cycle

The pipeline completes the processing of one instruction in each cycle

Slide9
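The overlap can be pictured with a small diagram generator; this is a sketch assuming a classic four-stage F/D/E/W pipeline, one instruction entering per cycle, and no hazards:

```python
def pipeline_diagram(n_instructions, stages=("F", "D", "E", "W")):
    """One row per instruction, one column per clock cycle; instruction i
    enters the pipeline at cycle i, so each row shifts right by one column."""
    rows = []
    for i in range(n_instructions):
        cells = [".."] * i + list(stages) + [".."] * (n_instructions - 1 - i)
        rows.append(" ".join(f"{c:>2}" for c in cells))
    return rows

for row in pipeline_diagram(4):
    print(row)
```

After the first instruction fills the pipeline, one instruction completes per cycle, which is exactly the overlap the slide describes.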

Pipelining

If there are K stages in the pipeline, steps of K different instructions will be executing in parallel.

After the initial filling of the pipeline, execution of one instruction will be completed in each clock cycle.

The time required to execute a program involving N*S steps will be (N*S)/K + (K-1) clock cycles

Speed-up achieved = (time to execute the program without the pipeline) / (time to execute the program with the pipeline)

= (N*S) / ((N*S)/K + (K-1)) = 1 / ((1/K) + (K-1)/(N*S))

≈ K for K << N*S

Slide10
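The speed-up formula can be checked numerically; this is a minimal sketch, and the step count and stage count below are assumed values for illustration:

```python
def pipelined_cycles(total_steps, k):
    """Cycles on a k-stage pipeline: total_steps/k completions
    plus k-1 cycles to fill the pipeline initially."""
    return total_steps / k + (k - 1)

def speedup(total_steps, k):
    """Sequential cycles over pipelined cycles; tends to k when k << total_steps."""
    return total_steps / pipelined_cycles(total_steps, k)

# With N*S = 1,000,000 steps and a 5-stage pipeline, the speed-up is close to 5.
print(speedup(1_000_000, 5))
```
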

Dependencies – Pipeline Hazards

There are dependency conditions that prevent normal scheduling of the instruction steps in the pipeline. These lead to pipeline hazards.

Three types of dependencies:

Structural Dependencies – when an instruction in the pipeline needs a hardware resource being used by another instruction

Data Dependencies – when an instruction depends on a data value produced by an instruction still in the pipeline

Control Dependencies – when whether an instruction will be executed or not is determined by a control instruction still in the pipeline

These hazards need resolution for the instructions to execute correctly.

Slide11

Structural Hazard

Usually occurs when instructions have different sequences of basic steps.

To deal with such situations:

Programmer explicitly avoids such sequence of instructions

Stalling – postponing a step in the latter instruction to avoid collision

Add more h/w resources to allow instructions to use independent resources at the same time

[Figure notes: M2 is an operand fetch step; F5 is a stalled instruction fetch step; W3 and W4 are stalled write steps]

Slide12

Data Hazard

To deal with such hazards:

Programmer explicitly avoids such sequencing

Stalling – freeze the latter stages until the results of the preceding instruction are written

Bypassing/Operand forwarding – data available at the output of the ALU is directly forwarded to the next instruction instead of waiting for the results to be written

Using software – during compilation, the compiler detects such hazards and introduces NOP (no-operation) instructions in between
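The compiler-based remedy can be sketched as a toy pass; the (opcode, dest, src1, src2) tuple format and the one-cycle hazard window are assumptions made for illustration, not part of the slides:

```python
NOP = ("nop", None, None, None)

def insert_nops(program):
    """Toy compiler pass: emit a NOP whenever an instruction reads a register
    written by the immediately preceding instruction (one-cycle hazard window)."""
    out = []
    for op, dest, src1, src2 in program:
        if out:
            prev_dest = out[-1][1]  # destination of the last emitted instruction
            if prev_dest is not None and prev_dest in (src1, src2):
                out.append(NOP)
        out.append((op, dest, src1, src2))
    return out

prog = [("add", "r1", "r2", "r3"),   # writes r1
        ("sub", "r4", "r1", "r5")]   # reads r1 immediately -> data hazard
print(insert_nops(prog))             # a NOP now separates the two instructions
```

A real compiler would also know the pipeline depth and try to fill such slots with useful independent instructions rather than NOPs.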

[Figure note: D2A is stalled because it uses the output of I1 before the results are written back (W1)]

Slide13

Control Hazard

Usually as a result of a branch instruction

To deal with such hazards –

Branching delay/stalling

Introduce delay slots after every branch instruction to avoid execution of unnecessary instructions

Reorder the sequence of instructions to avoid wasting processor cycles on the introduced delay slots

This is useful only in the case of unconditional branch instructions

[Figure note: I3 and I4 are executed needlessly, as the branch instruction I2 jumps to Ik]

Slide14

Control Hazard

To deal with such hazards …

2. Static Branch Prediction – in the case of conditional branching, predict whether or not the branch will be taken and fetch/execute the next instruction accordingly. The predicted instruction is executed, but its results are not written back.

Predict based on some heuristic, such as:

The branch is always taken

The branch is taken 50% of the time

The branch is taken if the branch instruction is at the beginning of a loop

3. Dynamic Branch Prediction – in static branch prediction the taken/not-taken decision is the same in all cases, so at some point the static prediction will lead to a wrong decision. In dynamic prediction, the decision is made by looking at the instruction execution history

The probability of a branch being taken or not depends on the branch decisions taken so far.

Slide15
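One common history-based scheme is the 2-bit saturating counter; it is a standard textbook mechanism sketched here as an example, not a scheme named on the slide:

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not-taken, 2-3 predict taken.
    Two wrong outcomes in a row are needed to flip the prediction, so a single
    loop-exit mispredict does not disturb a strongly-taken history for long."""
    def __init__(self):
        self.state = 2  # start weakly taken

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

p = TwoBitPredictor()
for outcome in [True] * 8 + [False]:   # a loop branch taken 8 times, then exits
    p.predict()
    p.update(outcome)
print(p.predict())  # still True: one not-taken outcome did not flip the state
```

This illustrates the slide's point that the taken/not-taken decision adapts to the branch decisions taken so far.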

Control Hazards

Reordering Instructions

Slide16
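Reordering to fill a branch delay slot can be sketched as a toy compiler pass; the (opcode, dest, src1, src2) tuple format, the single delay slot, and the check against only the immediately preceding instruction are simplifying assumptions for illustration:

```python
def fill_delay_slot(program, branch_index):
    """If the instruction just before the branch does not compute a register
    the branch tests, hoist it into the delay slot after the branch;
    otherwise fall back to inserting a NOP."""
    _, _, b_src1, b_src2 = program[branch_index]
    if branch_index > 0:
        prev = program[branch_index - 1]
        if prev[1] not in (b_src1, b_src2):
            program.pop(branch_index - 1)       # the branch shifts up by one
            program.insert(branch_index, prev)  # prev now sits after the branch
            return program
    program.insert(branch_index + 1, ("nop", None, None, None))
    return program

prog = [("add", "r1", "r2", "r3"),
        ("beq", None, "r4", "r5")]  # branch does not depend on r1
print(fill_delay_slot(prog, 1))     # the add moves into the delay slot
```

When no independent instruction is available, the pass degenerates to the NOP insertion described earlier, wasting the slot.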

CISC/RISC

CISC – Complex Instruction Set Computer

Allows different instructions to have different number and sequence of basic steps

Allows instructions of different length

RISC – Reduced Instruction Set Computer

All instructions have a fixed length of 1 word

All instructions require equal execution time

Slide17

RISC

Disadvantages of CISC

Complex compilers – complex machine instructions are often hard to exploit because the compiler needs to find the exact machine instructions that fit the HLL construct. Having a simple (reduced) instruction set means that there are fewer instructions to choose from

Smaller programs are not necessarily fast – ML programs with fewer instructions take up less space, but the number of basic steps needed to execute each instruction is greater

Slide18

RISC

Characteristics of RISC processors:

One instruction per cycle – with simple, one-cycle instructions there is little or no need for microcode; the machine instructions can be hardwired. Such instructions should execute faster

Register-to-register operations – most operations are register to register, with only simple LOAD and STORE operations accessing memory. This design feature simplifies the instruction set and therefore the control unit

Simple addressing modes – Almost all RISC instructions use simple register addressing; simplifies the instruction set and the control unit

Simple instruction formats – only one or a few formats are used. Instruction length is fixed; field locations (especially the opcode) are fixed. Benefits are:

Opcode decoding and register operand accessing can occur simultaneously.

Simplified formats simplify the control unit. Instruction fetching is optimized because single word-length units are fetched. Pipelining is easily optimized due to these RISC features

Slide19

Superscalar and VLIW Processors

Superscalar and Very Long Instruction Word (VLIW) processors maintain multiple instruction execution pipelines

Instructions are scheduled in parallel and executed simultaneously in these pipelines

Slide20

Superscalar/ VLIW Processors

The superscalar/VLIW approach depends on the ability to execute multiple instructions in parallel.

Instruction-level parallelism refers to the degree to which, on average, the instructions of a program can be executed in parallel

Data and control hazards are even more difficult to deal with

Slide21

Superscalar Processors

In a superscalar processor, a normal machine code program is made available to the processor.

The processor schedules these instructions onto its pipelines after resolving the hazards.

[Diagram: an Instruction Queue feeding several instruction execution pipelines, each with stages Stage1 → Stage2 → Stage3 → Stage4]

Slide22

VLIW Processors

In a VLIW processor, a specially designed compiler resolves the hazards at compile time to prepare a machine code program of very long instruction words.

Each of these VLIWs contains several instructions that can be executed in parallel in the available pipelines.

[Diagram: a VLIW feeding several instruction execution pipelines, each with stages Stage1 → Stage2 → Stage3 → Stage4]

Slide23

Parallel Processors

A taxonomy of different types of parallel processors was put forward by Flynn, as follows:

Single Instruction Single Data (SISD) stream - a single processor executes a single instruction stream to operate on data stored in a single memory. E.g., uniprocessors

Single Instruction Multiple Data (SIMD) stream - A single machine instruction controls the simultaneous execution of a number of processing elements on a lockstep basis. Each processing element has an associated data memory, so that instructions are executed on different sets of data by different processors.

Multiple Instruction Single Data (MISD) stream - a sequence of data is transmitted to a set of processors, each of which executes a different instruction sequence. Not available commercially

Multiple Instruction Multiple Data (MIMD) stream - a set of processors simultaneously executes different instruction sequences on different data sets

Slide24

Parallel Processors

Slide25

Parallel Processors

MIMDs can be further subdivided according to the means by which the processors communicate

Multiprocessors: processors share a common memory; each processor accesses programs and data stored in the shared memory, and processors communicate with each other via that memory.

Multicomputers: processors have individual memory areas; these are basically collections of independent uniprocessors/multiprocessors, and communication among the computers is via either fixed paths or some network facility. Also called clusters.