CS6303 COMPUTER ARCHITECTURE



Presentation Transcript

Slide1

CS6303 COMPUTER ARCHITECTURE

Slide2

8 Great Ideas in Computer Architecture

The following are eight great ideas that computer architects have invented in the last 60 years of computer design.

1. Design for Moore’s Law
2. Use Abstraction to Simplify Design
3. Make the common case fast
4. Performance via parallelism
5. Performance via pipelining
6. Performance via prediction
7. Hierarchy of memories
8. Dependability via redundancy

Slide3

Design for Moore’s Law.

The number of transistors in an integrated circuit doubles approximately every two years (Gordon Moore, one of the founders of Intel).

As computer designs can take years, the resources available per chip can easily double or quadruple between the start and finish of the project.

Computer architects must anticipate where the technology will be when the design finishes rather than design for where it starts.

Slide4

8 Great Ideas in Computer Architecture.

2. Use Abstraction to Simplify Design

In computer architecture, a computer system is usually represented as consisting of five abstraction levels: hardware, firmware, assembler, operating system, and processes.

In computer science, an abstraction level is a generalization of a model or algorithm. The simplification provided by a good abstraction layer facilitates easy reuse.

Both computer architects and programmers had to invent techniques to make themselves more productive, for otherwise design time would lengthen as dramatically as resources grew by Moore's Law.

--A major productivity technique for hardware and software is to use abstractions to represent the design at different levels of representation; lower-level details are hidden to offer a simpler model at higher levels.

Slide5

3. Make the common case fast

--Making the common case fast will tend to enhance performance better than optimizing the rare case.
--It implies that you know what the common case is, which is only possible with careful experimentation and measurement.

Slide6

4. Performance via parallelism

--Computer architects have offered designs that get more performance by performing operations in parallel.
--Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones which are then solved concurrently ("Parallel").
--Parallelism has been employed for many years, mainly in high-performance computing.

Slide7

5. Performance via pipelining

--Pipelining is a technique used in the design of computers to increase the instruction throughput (the number of instructions that can be executed in a unit of time).
--The basic instruction cycle is broken up into a series of pipeline stages.
--Rather than processing each instruction sequentially (one at a time, finishing one instruction before starting the next), each instruction is split up into a sequence of steps so that different steps can be executed concurrently (by different circuitry) and in parallel (at the same time).
--Pipelining increases instruction throughput by performing multiple operations at the same time (in parallel), but does not reduce instruction latency (the time to complete a single instruction from start to finish), as each instruction still must go through all steps.

Slide8
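The throughput-versus-latency trade-off of pipelining can be quantified with a toy timing model. The 4-stage depth and 1 ns stage time below are illustrative assumptions, not figures from the slides:

```python
# Toy pipeline timing model (illustrative stage count and stage time).
def total_time_sequential(n_instructions, n_stages, stage_time):
    # Each instruction runs start to finish before the next one begins.
    return n_instructions * n_stages * stage_time

def total_time_pipelined(n_instructions, n_stages, stage_time):
    # The first instruction fills the pipeline; after that, one
    # instruction completes every stage time.
    return (n_stages + n_instructions - 1) * stage_time

n, stages, t = 1000, 4, 1.0  # 1000 instructions, 4 stages, 1 ns per stage
print(total_time_sequential(n, stages, t))  # 4000.0 ns
print(total_time_pipelined(n, stages, t))   # 1003.0 ns (~4x throughput)
# The latency of one instruction is 4 ns either way: pipelining does not
# shorten the time any single instruction takes.
```

Note how for a single instruction both models give the same time, matching the point that pipelining improves throughput, not latency.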

6. Performance via prediction

--To improve the flow and throughput in an instruction pipeline, branch predictors play a critical role in achieving high effective performance in many modern pipelined microprocessor architectures.
--Without branch prediction, the processor would have to wait until the conditional jump instruction has passed the execute stage before the next instruction can enter the fetch stage in the pipeline.
--The branch predictor attempts to avoid this waste of time by trying to guess whether the conditional jump is most likely to be taken or not taken.

Slide9
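One common way to make such guesses, not detailed on the slides, is a 2-bit saturating counter per branch, which must be wrong twice in a row before it changes its prediction. A minimal sketch:

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0,1 predict not-taken; 2,3 predict taken."""
    def __init__(self):
        self.state = 0  # start in strongly not-taken

    def predict(self):
        return self.state >= 2  # True means "predict taken"

    def update(self, taken):
        # Move toward 3 (taken) or 0 (not taken), saturating at the ends.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

# A loop branch taken 9 times and then not taken once: the predictor is
# wrong twice while warming up and once at loop exit, right 7 of 10 times.
p = TwoBitPredictor()
history = [True] * 9 + [False]
correct = 0
for actual in history:
    correct += (p.predict() == actual)
    p.update(actual)
print(correct, "of", len(history))  # 7 of 10
```

Each correct guess lets the fetch stage keep running instead of stalling until the branch resolves.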

7. Hierarchy of memories

--Programmers want memory to be fast, large, and cheap, as memory speed often shapes performance, capacity limits the size of problems that can be solved, and the cost of memory today is often the majority of computer cost.
--Architects have found that they can address these conflicting demands with a hierarchy of memories, with the fastest, smallest, and most expensive memory per bit at the top of the hierarchy and the slowest, largest, and cheapest per bit at the bottom.
--Caches give the programmer the illusion that main memory is nearly as fast as the top of the hierarchy and nearly as big and cheap as the bottom of the hierarchy.

Slide10
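The payoff of this illusion can be quantified with the standard average memory access time (AMAT) formula; the hit time, miss rate, and miss penalty below are made-up illustrative numbers:

```python
def amat(hit_time, miss_rate, miss_penalty):
    # Average memory access time: every access pays the hit time, and a
    # fraction miss_rate of accesses additionally pays the miss penalty.
    return hit_time + miss_rate * miss_penalty

# Hypothetical: 1 ns cache hit, 5% miss rate, 100 ns penalty to reach DRAM.
print(amat(1.0, 0.05, 100.0))  # 6.0 ns: far closer to cache speed than DRAM speed
```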

Memory Hierarchy.

Slide11

Levels of Memory Hierarchy

Level          Capacity     Access Time             Cost
CPU Registers  100s Bytes   ~1 ns                   --
Cache          K Bytes      ~4 ns                   1-0.1 cents/bit
Main Memory    M Bytes      100 ns - 300 ns         $.0001-.00001 cents/bit
Disk           G Bytes      10 ms (10,000,000 ns)   10^-5 - 10^-6 cents/bit
Tape           infinite     sec-min                 10^-8 cents/bit

Data moves between adjacent levels in different units: instruction operands between registers and cache, blocks between cache and memory, pages between memory and disk, and files between disk and tape. The upper levels are smaller and faster; the lower levels are larger and slower.

Slide12

8. Dependability via redundancy

--Computers not only need to be fast; they need to be dependable.
--Since any physical device can fail, we make systems dependable by including redundant components that can take over when a failure occurs and help detect failures.
--Examples: System designers usually provide failover capability in servers, systems, or networks requiring continuous availability (the term used is high availability) and a high degree of reliability.

Slide13

System software: Software that provides services that are commonly useful, including operating systems, compilers, loaders, and assemblers.

Operating system: Supervising program that manages the resources of a computer for the benefit of the programs that run on that computer.

Compiler: A program that translates high-level language statements into assembly language statements.

Binary digit: Also called a bit. One of the two numbers in base 2 (0 or 1) that are the components of information.

Instruction: A command that computer hardware understands and obeys.

Assembler: A program that translates a symbolic version of instructions (an assembly language program) into the binary version.

Assembly language: A symbolic representation of machine instructions. Instructions are written using mnemonics.

Machine language: A binary representation of machine instructions.

Slide14

High-level programming language: A portable language such as C, C++, Java, or Visual Basic that is composed of words and algebraic notation that can be translated by a compiler into assembly language.

Input device: A mechanism through which the computer is fed information, such as a keyboard.

Output device: A mechanism that conveys the result of a computation to a user, such as a display, or to another computer.

Liquid crystal display: A display technology using a thin layer of liquid polymers that can be used to transmit or block light according to whether a charge is applied.

Active matrix display: A liquid crystal display using a transistor to control the transmission of light at each individual pixel.

Pixel: The smallest individual picture element. Screens are composed of hundreds of thousands to millions of pixels, organized in a matrix.

Slide15

Integrated circuit: Also called a chip. A device combining dozens to millions of transistors.

Central processing unit (CPU): Also called processor. The active part of the computer, which contains the datapath and control and which adds numbers, tests numbers, signals I/O devices to activate, and so on.

Datapath: The component of the processor that performs arithmetic operations.

Control: The component of the processor that commands the datapath, memory, and I/O devices according to the instructions of the program.

Memory: The storage area in which programs are kept when they are running and that contains the data needed by the running programs.

Dynamic random access memory (DRAM): Memory built as an integrated circuit; it provides random access to any location. Access times are 50 nanoseconds, and cost per gigabyte in 2012 was $5 to $10.

Slide16

Cache memory: A small, fast memory that acts as a buffer for a slower, larger memory.

Static random access memory (SRAM): Also memory built as an integrated circuit, but faster and less dense than DRAM.

Main memory: Also called primary memory. Memory used to hold programs while they are running; typically consists of DRAM in today’s computers.

Secondary memory: Nonvolatile memory used to store programs and data between runs; typically consists of flash memory in PMDs and magnetic disks in servers.

Magnetic disk: Also called hard disk. A form of nonvolatile secondary memory composed of rotating platters coated with a magnetic recording material. Because they are rotating mechanical devices, access times are about 5 to 20 milliseconds, and cost per gigabyte in 2012 was $0.05 to $0.10.

Slide17

Flash memory: A nonvolatile semiconductor memory. It is cheaper and slower than DRAM but more expensive per bit and faster than magnetic disks. Access times are about 5 to 50 microseconds, and cost per gigabyte in 2012 was $0.75 to $1.00.

Local area network (LAN): A network designed to carry data within a geographically confined area, typically within a single building.

Wide area network (WAN): A network extended over hundreds of kilometers that can span a continent.

Slide18

Components of a Computer System.

The five classic components are:
1. Input 2. Output 3. Memory 4. Datapath 5. Control.

The processor gets instructions and data from memory. Input writes data to memory, and output reads data from memory. Control sends the signals that determine the operations of the datapath, memory, input, and output.

LCD: The most fascinating I/O device is probably the graphics display. Most personal mobile devices use liquid crystal displays (LCDs) to get a thin, low-power display.

Touch screen: While PCs also use LCD displays, the tablets and smartphones of the post-PC era have replaced the keyboard and mouse with touch-sensitive displays, which have the wonderful user-interface advantage of users pointing directly at what they are interested in rather than indirectly with a mouse.

Slide19

Components cont…

--While there are a variety of ways to implement a touch screen, many tablets today use capacitive sensing. Since people are electrical conductors, if an insulator like glass is covered with a transparent conductor, touching distorts the electrostatic field of the screen, which results in a change in capacitance. This technology can allow multiple touches simultaneously, which allows gestures that can lead to attractive user interfaces.

--The list of I/O devices includes a capacitive multitouch LCD display, front-facing camera, rear-facing camera, microphone, headphone jack, speakers, accelerometer, gyroscope, Wi-Fi network, and Bluetooth network.

Integrated Circuits (ICs): nicknamed chips.

Slide20

Components cont…

--The processor is the active part of the computer, following the instructions of a program. It adds numbers, tests numbers, signals I/O devices to activate, and so on. Occasionally, people call the processor the CPU, for central processing unit.

--The datapath performs the arithmetic operations, and control tells the datapath, memory, and I/O devices what to do according to the wishes of the instructions of the program.

--The memory is where the programs are kept when they are running; it also contains the data needed by the running programs. The memory is built from DRAM chips. DRAM stands for dynamic random access memory.

--Cache memory consists of a small, fast memory that acts as a buffer for the DRAM memory. Cache is built using a different memory technology, static random access memory (SRAM). SRAM is faster but less dense, and hence more expensive, than DRAM. SRAM and DRAM are two layers of the memory hierarchy.

Slide21

Components cont…

ISA: Instruction Set Architecture.

--It is the interface between the hardware and the lowest-level software.
--The ISA includes anything programmers need to know to make a binary language program work correctly, including instructions, I/O devices, and so on.
--The operating system will encapsulate the details of doing I/O, allocating memory, and other low-level system functions so that application programmers do not need to worry about such details.
--The combination of the basic instruction set and the OS interface provided for application programmers is called the application binary interface (ABI).

Slide22

A Safe Place to Save Data.

--Volatile and nonvolatile memory.
--Main memory or primary memory.
--Secondary memory -- flash memory.

Communicating with Other Computers.

--Communication: Information is exchanged between computers at high speeds.
--Resource sharing: Rather than each computer having its own I/O devices, computers on the network can share I/O devices.
--Nonlocal access: By connecting computers over long distances, users need not be near the computer they are using.
--LAN, WAN.
--Bandwidth.

Slide23

Technologies for Building Processors and Memory.

Slide24

Technology Cont…

--The IC manufacturing process starts with a silicon crystal ingot.
--The ingots are 8-12 inches in diameter and about 12 to 24 inches long.
--An ingot is finely sliced into wafers no more than 0.1 inches thick.
--These wafers then go through a series of processing steps, during which patterns of chemicals are placed on each wafer, creating the transistors, conductors, and insulators.
--In the figure, one wafer produced 20 dies, of which 17 passed testing (an X means the die is bad). The yield of good dies in this case was 17/20, or 85%.
--These good dies are then bonded into packages (connected to the input/output pins of a package) and tested one more time before shipping the packaged parts to customers.
--As in the figure, one bad packaged part was found in this final test.

Slide25

Die: The individual rectangular sections that are cut from a wafer, more informally known as chips.

Yield: The percentage of good dies from the total number of dies on the wafer.

Transistor: An on/off switch controlled by an electric signal.

VLSI: A device (IC) containing millions of transistors.

Silicon: A natural element that is a semiconductor.

Semiconductor: A substance that does not conduct electricity well.

Slide26
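The wafer example from the slides (20 dies, 17 good) gives the yield directly:

```python
def die_yield(good_dies, total_dies):
    # Yield is the fraction of dies on the wafer that pass testing.
    return good_dies / total_dies

# The slides' example wafer: 20 dies produced, 17 passed testing.
y = die_yield(17, 20)
print(f"{y:.0%}")  # 85%
```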

Performance:

Response time or execution time: The total time required for the computer to complete a task, including disk accesses, memory accesses, I/O activities, operating system overhead, CPU execution time, and so on.

Throughput: The total amount of work done in a given time.

Bandwidth: The amount of data that can be carried from one point to another in a given time period (usually a second). This kind of bandwidth is usually expressed in bits (of data) per second (bps). Occasionally, it's expressed as bytes per second (Bps).

Clock cycles per instruction (CPI): Average number of clock cycles per instruction for a program or program fragment.

Slide27

Performance:

We can relate performance and execution time for a computer X:

Performance_X = 1 / Execution time_X

Slide28
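Because performance is the reciprocal of execution time, "X is n times faster than Y" means Performance_X / Performance_Y = Execution time_Y / Execution time_X = n. A numerical check (the task times here are made up):

```python
def performance(execution_time):
    # Performance is defined as the reciprocal of execution time.
    return 1.0 / execution_time

# Hypothetical: X runs a program in 10 s; Y runs the same program in 15 s.
time_x, time_y = 10.0, 15.0
n = performance(time_x) / performance(time_y)  # equals time_y / time_x
print(round(n, 10))  # 1.5: X is 1.5 times faster than Y
```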
Slide29
Slide30

Measuring Performance:

Clock cycle: Also called tick, clock tick, clock period, clock, or cycle. The time for one clock period, usually of the processor clock, which runs at a constant rate.

Clock period: The length of each clock cycle.

Slide31

Example:

Slide32
Slide33

Instruction Performance:

Clock cycles per instruction (CPI): Average number of clock cycles per instruction for a program or program fragment.

Example: Suppose we have two implementations of the same instruction set architecture. Computer A has a clock cycle time of 250 ps and a CPI of 2.0 for some program, and computer B has a clock cycle time of 500 ps and a CPI of 1.2 for the same program. Which computer is faster for this program and by how much?

Slide34
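This example can be worked by noting that CPU time = instruction count × CPI × clock cycle time; the unknown instruction count cancels in the ratio:

```python
# Time per instruction (the instruction count I multiplies both sides
# equally, so it cancels when comparing the two machines).
def time_per_instruction_ps(cpi, cycle_time_ps):
    return cpi * cycle_time_ps

a = time_per_instruction_ps(2.0, 250.0)  # computer A: 500 ps per instruction
b = time_per_instruction_ps(1.2, 500.0)  # computer B: 600 ps per instruction
print(b / a)  # 1.2: computer A is 1.2 times faster than B for this program
```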
Slide35

The Classic CPU Performance Equation:

CPU time = Instruction count × CPI × Clock cycle time = (Instruction count × CPI) / Clock rate

Slide36

Example:

Slide37
Slide38

The basic Components of Performance and how each is Measured.

Slide39
Slide40
Slide41
Slide42

Addressing Modes:

The method used to identify the location of an operand.

The following are the MIPS addressing modes:

1. Immediate addressing
2. Register addressing
3. Base or displacement addressing
4. PC-relative addressing
5. Pseudodirect addressing

Slide43

Addressing Modes: Cont…..

1. Immediate addressing: The operand is a constant within the instruction itself, i.e., the operand is specified in the instruction itself.

2. Register addressing: The operand is in a CPU register. The register is specified in the instruction.

Slide44

Addressing Modes: Cont…..

3. Base or displacement addressing: The operand is at the memory location whose address is the sum of a register and a constant in the instruction.

4. PC-relative addressing: The branch address is the sum of the PC and a constant in the instruction.

Slide45
Slide46

5. Pseudo-direct addressing: The jump address is the 26-bit field of the instruction, shifted left by 2 bits, concatenated with the upper four bits of the PC.

Slide47

Pseudo-direct addressing Cont……

Slide48

Addressing Modes Summary

The method used to identify the location of an operand. The following are the MIPS addressing modes:

1. Immediate addressing, where the operand is a constant within the instruction itself.

2. Register addressing, where the operand is a register.

3. Base or displacement addressing, where the operand is at the memory location whose address is the sum of a register and a constant in the instruction.

4. PC-relative addressing, where the branch address is the sum of the PC and a constant in the instruction.

5. Pseudodirect addressing, where the jump address is the 26-bit field of the instruction, shifted left by 2 bits, concatenated with the upper four bits of the PC.

Slide49
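The five address calculations can be written out directly. The register contents, field values, and PC below are made up for illustration; the field widths follow MIPS conventions:

```python
# Illustrative MIPS-style address calculations (all values hypothetical).
regs = {"$s1": 0x10000000, "$t0": 0x00000008}
pc = 0x00400020

# 1. Immediate addressing: the operand is the constant itself.
operand_imm = 42

# 2. Register addressing: the operand is the contents of a register.
operand_reg = regs["$t0"]

# 3. Base or displacement: address = base register + constant offset,
#    as in lw $t1, 32($s1).
addr = regs["$s1"] + 32

# 4. PC-relative: branch target = PC + (word offset shifted left 2).
offset = 5
branch_target = pc + (offset << 2)

# 5. Pseudodirect: 26-bit field shifted left 2, concatenated with PC[31:28].
field26 = 0x0100000
jump_target = (pc & 0xF0000000) | (field26 << 2)

print(hex(addr), hex(branch_target), hex(jump_target))
```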

Uniprocessors To Multiprocessors:

1. Increasing the clock speed of a uniprocessor has reached saturation and cannot be increased beyond a certain limit because of power consumption and heat dissipation issues.

2. As the physical size of the chip decreased while the number of transistors per chip increased, clock speed increased, which boosted the heat dissipation across the chip to a dangerous level; cooling and heat-sink requirements became an issue.

3. There were limitations in the use of silicon surface area.

4. There were limitations in reducing the size of individual gates further.

5. To gain performance within a single core, many techniques such as pipelining, superpipelined, and superscalar architectures are used.

6. Most of the early dual-core processors ran at lower clock speeds; the rationale is that a dual-core processor with each core running at 1 GHz should be equivalent to a single-core processor running at 2 GHz.

7. The problem is that this does not work in practice when the applications are not written to take advantage of the multiple processors. Until the software is written this way, unthreaded applications will run faster on a single processor than on a dual-core CPU.

Slide50

Uniprocessors To Multiprocessors Cont……..

8. In multi-core processors, the benefit is more on throughput than on response time.

9. In the past, programmers could rely on innovations in hardware, architecture, and compilers to double the performance of their programs every 18 months without having to change a line of code.

10. Today, for programmers to get significant improvement in response time, they need to rewrite their programs to take advantage of multiple processors, and they must continue to improve the performance of their code as the number of cores increases.

The need of the hour is:

11. The ability to write parallel programs.

12. Care must be taken to reduce communication and synchronization overhead; challenges in scheduling and load balancing have to be addressed.

Slide51

Power Wall: Power & Energy in Integrated Circuits.

1. Power is the biggest challenge facing the computer designer for every class of computer.

2. First, power must be brought in and distributed around the chip, which requires hundreds of pins and multiple interconnection layers just for power and ground.

3. Second, power is dissipated as heat and must be removed.

How should a system architect think about performance, power, and energy? There are three primary concerns.

What is the maximum power a processor ever requires?

--If it attempts to draw more power than the power supply can provide, by drawing more current, the voltage will eventually drop, which can cause the device to malfunction.

--Modern processors can vary widely in power consumption with high peak currents. Hence, they provide voltage indexing methods that allow the processor to slow down and regulate voltage within a wider margin. Obviously, doing so decreases the performance.

Slide52

Power & Energy Cont……

What is the sustained power consumption?

--This metric is called thermal design power (TDP), since it determines the cooling requirement.
--The power supply is usually designed to match or exceed TDP.
--Failure to provide adequate cooling will allow the temperature to exceed the maximum value, resulting in device failure.
--Modern processors provide two features to manage heat:
1. Reduce the clock rate, thereby reducing the power.
2. A thermal overload trip is activated to power down the chip.

Which metric is the right one for comparing processors: energy or power?

--Power is energy per unit time: 1 watt = 1 joule per second.
--Energy is a better metric because it is tied to a specific task and the time required for that task.
--Power consumption is a useful measure only if the workload is fixed.

Slide53

--The dominant technology for integrated circuits is called CMOS (complementary metal oxide semiconductor). For CMOS, the primary source of energy consumption is so-called dynamic energy, that is, energy consumed when transistors switch states from 0 to 1 and vice versa.

--The dynamic energy depends on the capacitive loading of each transistor and the voltage applied:

Energy ∝ Capacitive load × Voltage^2

Power ∝ 1/2 × Capacitive load × Voltage^2 × Frequency switched

Power & Energy Cont…….

Slide54

--The frequency switched is a function of the clock rate. The capacitive load per transistor is a function of both the number of transistors connected to an output (called the fan-out) and the technology, which determines the capacitance of both wires and transistors.

--Although dynamic energy is the primary source of energy consumption in CMOS, static energy consumption occurs because of leakage current that flows even when a transistor is off. In servers, leakage is typically responsible for 40% of the energy consumption. Thus, increasing the number of transistors increases power dissipation, even if the transistors are always off. A variety of design techniques and technology innovations are being deployed to control leakage, but it's hard to lower voltage further.

Power & Energy Cont…….

Slide55
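Because dynamic power scales with capacitive load, the square of the voltage, and the switching frequency, even a modest voltage reduction saves substantial power. A sketch with made-up illustrative values:

```python
def dynamic_power(cap_load, voltage, freq_switched):
    # P_dynamic = 1/2 x capacitive load x voltage^2 x switching frequency
    return 0.5 * cap_load * voltage ** 2 * freq_switched

# Hypothetical chip: same capacitance and clock, 1.0 V versus 0.85 V
# (a 15% supply-voltage reduction).
p_old = dynamic_power(1e-9, 1.0, 2e9)
p_new = dynamic_power(1e-9, 0.85, 2e9)
print(round(p_new / p_old, 4))  # 0.7225: about 28% of dynamic power saved
```

The quadratic voltage term is why voltage scaling was historically the main lever for power reduction, and why the difficulty of lowering voltage further contributes to the power wall.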

Representing Instructions in the Computer.

--There is a difference between the way humans instruct computers and the way computers see instructions.
--Instructions are kept in the computer as a series of high and low electronic signals and may be represented as numbers.
--Each piece of an instruction can be considered as an individual number, and placing these numbers side by side forms the instruction.

Slide56

Slide57

Representing Instructions in the Computer Cont…..

Instruction format: A form of representation of an instruction composed of fields of binary numbers.

Slide58
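To make the "numbers placed side by side" idea concrete, a MIPS R-format word packs six fields (op/rs/rt/rd/shamt/funct, of widths 6/5/5/5/5/6 bits) into one 32-bit number by shifting and OR-ing. The example encodes add $t0, $s1, $s2 using the standard MIPS register numbers:

```python
def encode_r_format(op, rs, rt, rd, shamt, funct):
    # Pack the six R-format fields (6/5/5/5/5/6 bits) into a 32-bit word.
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

# add $t0, $s1, $s2 -> op=0, rs=17 ($s1), rt=18 ($s2), rd=8 ($t0),
# shamt=0, funct=32 (the "add" function code).
word = encode_r_format(0, 17, 18, 8, 0, 32)
print(hex(word))  # 0x2324020
```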
Slide59
Slide60
Slide61
Slide62
Slide63

Memory Operands:

Slide64
Slide65
Slide66
Slide67
Slide68

Logical Operations:

Slide69

Instructions for Making Decisions:

Slide70
Slide71
Slide72

ALU: Arithmetic and Logic Unit.

Hardware that performs addition, subtraction, and other logical operations such as AND and OR.

Exception: Also called interrupt. An unscheduled event that disrupts program execution; used to detect overflow.

Slide73

Addition and Subtraction:

Slide74

Subtraction:

Slide75

Four-Bit Adder-Subtractor:

Slide76

Checking Overflow

Note that, in the previous slide, if the numbers are considered to be signed, V detects overflow: V=0 means no overflow, and V=1 means the result is wrong because of overflow.

Overflow can happen when adding two numbers of the same sign (both negative or both positive) whose result cannot be represented with the available bits. It can be detected by observing the carry into the sign bit and the carry out of the sign bit position: if these two carries are not equal, an overflow occurred. That is why these two carries are applied to an exclusive-OR gate to generate V.

Slide77
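This carry-in-XOR-carry-out rule can be checked with a bit-level sketch of the four-bit adder-subtractor (subtraction done, as in the hardware, by complementing B and setting the initial carry to 1):

```python
def add_4bit(a, b, sub=False):
    """4-bit adder-subtractor. Returns (result, v): result is the raw 4-bit
    output; v=1 signals signed overflow, computed as the XOR of the carry
    into the sign bit and the carry out of the sign bit."""
    if sub:
        b = (~b) & 0xF  # one's complement of b...
        carry = 1       # ...plus an initial carry gives two's-complement subtraction
    else:
        carry = 0
    result = 0
    for i in range(4):
        s = ((a >> i) & 1) + ((b >> i) & 1) + carry
        result |= (s & 1) << i
        carry_in = carry   # the carry entering this bit position
        carry = s >> 1     # the carry leaving it
        if i == 3:
            v = carry_in ^ carry  # XOR of carries around the sign bit
    return result, v

# 5 + 6 = 11 does not fit in the 4-bit signed range [-8, 7] -> overflow.
print(add_4bit(5, 6))            # (11, 1): raw bits 1011, V flags a wrong signed result
print(add_4bit(3, 2))            # (5, 0): no overflow
print(add_4bit(5, 6, sub=True))  # (15, 0): 1111 is -1 in two's complement, correct
```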
Slide78

Multiplication:

Slide79
Slide80
Slide81

An Improved Version of the Multiplication Hardware:

Slide82

MIPS Multiplication Instructions:

Slide83

Division:

Slide84
Slide85
Slide86
Slide87
Slide88

An Improved Version of the Division Hardware:

Slide89

MIPS Division Instructions:

Slide90

MIPS Multiply & Division Instructions: