I/O Prof. Hakim Weatherspoon


Presentation Transcript

Slide1

I/O

Prof. Hakim Weatherspoon

CS 3410, Spring 2015

Computer Science

Cornell University

See: Online P&H Chapter 6.5-6

Slide2

Announcements

Prelim2 Topics

Lecture: Lectures 10 to 24

Data and Control Hazards (Chapters 4.7-4.8)

RISC/CISC (Chapters 2.16-2.18, 2.21)

Calling conventions and linkers (Chapters 2.8, 2.12, Appendix A.1-6)

Caching and Virtual Memory (Chapter 5)

Multicore/parallelism (Chapter 6)

Synchronization (Chapter 2.11)

Traps, Exceptions, OS (Chapter 4.9, Appendix A.7, pp 445-452)

HW2, Labs 3/4, C-Labs 2/3, PA2/3

Topics from Prelim1 (not the focus, but some possible questions)

Slide3

Announcements

Project3: submit “souped up” bot to CMS

Project3 Cache Race Games night: Monday, May 4th, 5pm

Come, eat, drink, have fun and be merry!

Location: B17 Upson Hall

Prelim2: Thursday, April 30th in evening

Time and Location: 7:30pm sharp in Statler Auditorium

Old prelims are online in CMS

Prelim Review Session: TODAY, Tuesday, April 28, 7-9pm in B14 Hollister Hall

Project4:

Design Doc due May 5th, bring design doc to mtg May 4-6

Demos: May 12 and 13

Will not be able to use slip days

Slide4

iClicker Check

Slide5

Goals for Today

Computer System Organization

How does a processor interact with its environment?

I/O Overview

How to talk to device? Programmed I/O or Memory-Mapped I/O

How to get events? Polling or Interrupts

How to transfer lots of data? Direct Memory Access (DMA)

Slide6

Next Goal

How does a processor interact with its environment?

Slide7

Big Picture: Input/Output (I/O)

How does a processor interact with its environment?

Slide8

Big Picture: Input/Output (I/O)

How does a processor interact with its environment?

Computer System Organization = Memory + Datapath + Control + Input + Output

Slide9

I/O Devices

Enables Interacting with Environment

Device                Behavior      Partner  Data Rate (b/sec)
Keyboard              Input         Human    100
Mouse                 Input         Human    3.8k
Sound Input           Input         Machine  3M
Voice Output          Output        Human    264k
Sound Output          Output        Human    8M
Laser Printer         Output        Human    3.2M
Graphics Display      Output        Human    800M – 8G
Network/LAN           Input/Output  Machine  100M – 10G
Network/Wireless LAN  Input/Output  Machine  11 – 54M
Optical Disk          Storage       Machine  5 – 120M
Flash memory          Storage       Machine  32 – 200M
Magnetic Disk         Storage       Machine  800M – 3G

Slide10

Attempt#1: All devices on one interconnect

Replace all devices as the interconnect changes

e.g. keyboard speed == main memory speed ?!

[Diagram: Memory, Display, Disk, Keyboard, and Network all attached to a Unified Memory and I/O Interconnect]

Slide11

Attempt#2: I/O Controllers

Decouple I/O devices from Interconnect

Enable smarter I/O interfaces

[Diagram: Core0 and Core1 (each with a Cache), a Memory Controller with Memory, and I/O Controllers for Display, Disk, Keyboard, and Network, all attached to a Unified Memory and I/O Interconnect]

Slide12

Attempt#3: I/O Controllers + Bridge

Separate high-performance processor, memory, display interconnect from lower-performance interconnect

[Diagram: Core0 and Core1 (each with a Cache), a Memory Controller with Memory, and the Display I/O Controller on the High Performance Interconnect; I/O Controllers for Disk, Keyboard, and Network on the Lower Performance Legacy Interconnect; a Bridge joins the two interconnects]

Slide13

Bus Parameters

Width = number of wires

Transfer size = data words per bus transaction

Synchronous (with a bus clock) or asynchronous (no bus clock / “self clocking”)

Slide14

Bus Types

Processor – Memory (“Front Side Bus”. Also QPI)

Short, fast, & wide

Mostly fixed topology, designed as a “chipset”

CPU + Caches + Interconnect + Memory Controller

I/O and Peripheral busses (PCI, SCSI, USB, LPC, …)

Longer, slower, & narrower

Flexible topology, multiple/varied connections

Interoperability standards for devices

Connect to processor-memory bus through a bridge

Slide15

Attempt#3: I/O Controllers + Bridge

Separate high-performance processor, memory, display interconnect from lower-performance interconnect

Slide16

Example Interconnects

Name                 Use       Devices per channel  Channel Width  Data Rate (B/sec)
Firewire 800         External  63                   4              100M
USB 2.0              External  127                  2              60M
USB 3.0              External  127                  2              625M
Parallel ATA         Internal  1                    16             133M
Serial ATA (SATA)    Internal  1                    4              300M
PCI 66MHz            Internal  1                    32-64          533M
PCI Express v2.x     Internal  1                    2-64           16G/dir
Hypertransport v2.x  Internal  1                    2-64           25G/dir
QuickPath (QPI)      Internal  1                    40             12G/dir

Slide17

Interconnecting Components

Interconnects are (were?) busses

parallel set of wires for data and control

shared channel

multiple senders/receivers

everyone can see all bus transactions

bus protocol: rules for using the bus wires

Alternative (and increasingly common): dedicated point-to-point channels

e.g. Intel Xeon

e.g. Intel Nehalem

Slide18

Attempt#4: I/O Controllers + Bridge + NUMA

Remove bridge as bottleneck with point-to-point interconnects

E.g. Non-Uniform Memory Access (NUMA)

Slide19

Takeaways

Diverse I/O devices require hierarchical interconnect which is more recently transitioning to point-to-point topologies.

Slide20

Next Goal

How does the processor interact with I/O devices?

Slide21

I/O Device Driver Software Interface

Set of methods to write/read data to/from device and control device

Example: Linux Character Devices

// Open a toy "echo" character device
int fd = open("/dev/echo", O_RDWR);

// Write to the device
char write_buf[] = "Hello World!";
write(fd, write_buf, sizeof(write_buf));

// Read from the device
char read_buf[32];
read(fd, read_buf, sizeof(read_buf));

// Close the device
close(fd);

// Verify the result
assert(strcmp(write_buf, read_buf) == 0);

Slide22

I/O Device API

Typical I/O Device API: a set of read-only or read/write registers

Command registers

writing causes device to do something

Status registers

reading indicates what device is doing, error codes, …

Data registers

Write: transfer data to a device

Read: transfer data from a device

Every device uses this API

Slide23

I/O Device API

Simple (old) example: AT Keyboard Device

8-bit Status:

8-bit Command:

0xAA = “self test”

0xAE = “enable kbd”

0xED = “set LEDs”

8-bit Data:

scancode (when reading)

LED state (when writing) or …
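As a rough sketch of how driver code might name and test these registers, the fragment below defines the three command bytes listed above and decodes a status byte. It assumes the status bits run from PE in bit 7 down to OBS in bit 0, following the order of this slide's status-register labels; the enum and function names are illustrative and do not come from any real driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Command bytes listed on the slide. */
enum kbd_command {
    KBD_CMD_SELF_TEST = 0xAA,
    KBD_CMD_ENABLE    = 0xAE,
    KBD_CMD_SET_LEDS  = 0xED
};

/* Status bits, ASSUMING the slide lists them from bit 7 (PE) to bit 0 (OBS). */
enum kbd_status_bit {
    KBD_ST_OBS  = 1u << 0,  /* a byte is waiting to be read from the data register */
    KBD_ST_IBS  = 1u << 1,
    KBD_ST_SYSF = 1u << 2,
    KBD_ST_AL2  = 1u << 3,
    KBD_ST_LOCK = 1u << 4,
    KBD_ST_AUXB = 1u << 5,
    KBD_ST_TO   = 1u << 6,  /* timeout */
    KBD_ST_PE   = 1u << 7   /* parity error */
};

/* True when the status byte says a data byte is ready to read. */
static inline bool kbd_data_ready(uint8_t status) {
    return (status & KBD_ST_OBS) != 0;
}

/* True when the status byte reports a timeout or a parity error. */
static inline bool kbd_has_error(uint8_t status) {
    return (status & (KBD_ST_TO | KBD_ST_PE)) != 0;
}
```

With these names, the `status & 1` test used in the polling loops later in the lecture is the same check as `kbd_data_ready(status)`.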

[Diagram: 8-bit status register, bit 7 to bit 0: PE, TO, AUXB, LOCK, AL2, SYSF, IBS (Input Buffer Status), OBS (Output Buffer Status)]

Slide24

Communication Interface

Q: How does program/OS code talk to device?

A: special instructions to talk over special busses

Programmed I/O

inb $a, 0x64 (kbd status register)

outb $a, 0x60 (kbd data register)

Specifies: device, data, direction

Protection: only allowed in kernel mode

*x86: $a implicit; also inw, outw, inh, outh, …

Interact with cmd, status, and data device registers directly

Kernel boundary crossing is expensive

Slide25

Communication Interface

Q: How does program/OS code talk to device?

A: Map registers into virtual address space

Memory-mapped I/O

Accesses to certain addresses redirected to I/O devices

Data goes over the memory bus

Protection: via bits in pagetable entries

OS+MMU+devices configure mappings

Faster. Less boundary crossing

Slide26

Memory-Mapped I/O

[Diagram: Virtual Address Space and Physical Address Space, with address labels 0xFFFF FFFF, 0x00FF FFFF, 0x0000 0000, 0x0000 0000; I/O Controllers for Display, Disk, Keyboard, and Network]

Slide27

Device Drivers

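Programmed I/O

char read_kbd()
{
  char status;
  do {
    sleep();              // back off between polls
    status = inb(0x64);   // read kbd status register
  } while (!(status & 1));
  return inb(0x60);       // read kbd data register
}

Memory Mapped I/O

struct kbd {
  char status, pad0[3];
  char data, pad1[3];
};
volatile struct kbd *k = mmap(...);

char read_kbd()
{
  char status;
  do {
    sleep();              // back off between polls
    status = k->status;   // ordinary load from the mapped register
  } while (!(status & 1));
  return k->data;
}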
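The memory-mapped variant can be tried in user space with a simulated device. In the sketch below an ordinary struct in normal memory stands in for the mapped registers, and `fake_kbd_press` is a hypothetical helper that plays the role of the hardware; the names, the poll limit, and the clearing of the ready bit by the reader are assumptions made so the example runs without real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Register layout from the slide, with distinct pad names so it compiles. */
struct kbd_regs {
    volatile uint8_t status;   /* bit 0 set: a scancode is waiting */
    uint8_t pad0[3];
    volatile uint8_t data;     /* the scancode itself */
    uint8_t pad1[3];
};

/* Hypothetical stand-in for the device: ordinary memory, not a real mapping. */
static struct kbd_regs fake_kbd;

/* Plays the role of the hardware: latch a scancode and raise the ready bit. */
void fake_kbd_press(uint8_t scancode) {
    fake_kbd.data = scancode;
    fake_kbd.status |= 1u;
}

/* Poll up to max_polls times; return the scancode, or -1 if nothing arrived. */
int read_kbd_polled(struct kbd_regs *k, unsigned max_polls) {
    assert(k != 0);
    for (unsigned i = 0; i < max_polls; i++) {
        if (k->status & 1u) {           /* plain load, no special instruction */
            int code = k->data;
            k->status &= (uint8_t)~1u;  /* simulation only: clear the ready bit */
            return code;
        }
    }
    return -1;
}
```

The `volatile` qualifiers matter in this style of code: without them a compiler may hoist the status load out of the loop and never observe the device's update.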

Callouts: syscall, syscall (Programmed I/O); NO syscall (Memory Mapped I/O)

Polling examples, but mmap I/O more efficient

Slide28

Comparing Programmed I/O vs Memory Mapped I/O

Programmed I/O

Requires special instructions

Can require dedicated hardware interface to devices

Protection enforced via kernel mode access to instructions

Virtualization can be difficult

Memory-Mapped I/O

Re-uses standard load/store instructions

Re-uses standard memory hardware interface

Protection enforced with normal memory protection scheme

Virtualization enabled with normal memory virtualization scheme

Slide29

Takeaways

Diverse I/O devices require hierarchical interconnect which is more recently transitioning to point-to-point topologies.

Memory-mapped I/O is an elegant technique to read/write device registers with standard load/stores.

Slide30

Next Goal

How does the processor know device is ready/done?

Slide31

Communication Method

Q: How does program learn device is ready/done?

A: Polling: Periodically check I/O status register

If device ready, do operation

If device done, …

If error, take action

Pro? Con?

Predictable timing & inexpensive

But: wastes CPU cycles if nothing to do

Efficient if there is always work to do (e.g. 10Gbps NIC)

Common in small, cheap, or real-time embedded systems

Sometimes for very active devices too…

char read_kbd()
{
  char status;
  do {
    sleep();
    status = inb(0x64);
  } while (!(status & 1));
  return inb(0x60);
}

Slide32

Communication Method

Q: How does program learn device is ready/done?

A: Interrupts: Device sends interrupt to CPU

Cause register identifies the interrupting device

Interrupt handler examines device, decides what to do

Priority interrupts

Urgent events can interrupt lower-priority interrupt handling

OS can disable/defer interrupts

Pro? Con?

More efficient: only interrupt when device ready/done

Less efficient: each interrupt is more expensive, since the CPU context must be saved

CPU context: PC, SP, registers, etc.

Con: unpredictable b/c event arrival depends on other devices’ activity

Slide33

Takeaways

Diverse I/O devices require hierarchical interconnect which is more recently transitioning to point-to-point topologies.

Memory-mapped I/O is an elegant technique to read/write device registers with standard load/stores.

Interrupt-based I/O avoids the wasted work in polling-based I/O and is usually more efficient.

Slide34

Next Goal

How do we transfer a lot of data efficiently?

Slide35

I/O Data Transfer

How to talk to device? Programmed I/O or Memory-Mapped I/O

How to get events? Polling or Interrupts

How to transfer lots of data?

disk->cmd = READ_4K_SECTOR;
disk->data = 12;
while (!(disk->status & 1)) { }   // spin until the device is ready
for (i = 0; i < 4096; i++)        // one device read per iteration
  buf[i] = disk->data;

Very, Very, Expensive

Slide36

I/O Data Transfer

Programmed I/O xfer: Device → CPU → RAM

for (i = 1 .. n)
  CPU issues read request
  Device puts data on bus & CPU reads into registers
  CPU writes data to memory

Not efficient

[Diagram: CPU, RAM, DISK; arrows labeled "Read from Disk" and "Write to Memory"]

Everything interrupts CPU

Wastes CPU

Slide37

I/O Data Transfer

Q: How to transfer lots of data efficiently?

A: Have device access memory directly

Direct memory access (DMA)

1) OS provides starting address, length

2) controller (or device) transfers data autonomously

3) Interrupt on completion / error

Slide38

DMA: Direct Memory Access

Programmed I/O xfer: Device → CPU → RAM

for (i = 1 .. n)
  CPU issues read request
  Device puts data on bus & CPU reads into registers
  CPU writes data to memory
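To make the cost of this loop concrete, here is a rough user-space simulation that counts how often the CPU is involved in a programmed I/O transfer versus a DMA transfer that follows the three steps listed earlier (setup, autonomous transfer, interrupt on completion). The `xfer_stats` struct, the function names, and the choice of three setup writes (base address, count, command) are assumptions for illustration; a `memcpy` stands in for the device moving data on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Counts how many times the CPU has to get involved in a transfer. */
struct xfer_stats {
    size_t cpu_ops;     /* loads, stores, and setup writes issued by the CPU */
    size_t interrupts;  /* completion interrupts taken by the CPU */
};

/* Programmed I/O: the CPU reads each word from the device, then stores it. */
struct xfer_stats pio_transfer(uint32_t *ram, const uint32_t *device, size_t n) {
    struct xfer_stats s = {0, 0};
    assert(ram != NULL && device != NULL);
    for (size_t i = 0; i < n; i++) {
        uint32_t word = device[i];  /* CPU issues read, device puts data on bus */
        s.cpu_ops++;
        ram[i] = word;              /* CPU writes data to memory */
        s.cpu_ops++;
    }
    return s;
}

/* DMA: the CPU only programs the transfer; the device moves the data. */
struct xfer_stats dma_transfer(uint32_t *ram, const uint32_t *device, size_t n) {
    struct xfer_stats s = {0, 0};
    assert(ram != NULL && device != NULL);
    s.cpu_ops += 3;                        /* 1) setup: base address, count, command */
    memcpy(ram, device, n * sizeof *ram);  /* 2) simulated device-to-RAM copy */
    s.interrupts += 1;                     /* 3) one interrupt on completion */
    return s;
}
```

Under these assumptions, moving 1024 words costs 2048 CPU operations with programmed I/O, but only 3 setup writes and 1 interrupt with DMA, regardless of the transfer size.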

[Diagram: CPU, RAM, DISK]

Slide39

DMA: Direct Memory Access

Programmed I/O xfer: Device → CPU → RAM

for (i = 1 .. n)
  CPU issues read request
  Device puts data on bus & CPU reads into registers
  CPU writes data to memory

DMA xfer: Device → RAM

CPU sets up DMA request
for (i = 1 ... n)
  Device puts data on bus & RAM accepts it
Device interrupts CPU after done

[Diagram: two CPU / RAM / DISK pictures, one per transfer style; DMA steps labeled 1) Setup, 2) Transfer, 3) Interrupt after done]

Slide40

DMA Example

DMA example: reading from audio (mic) input

DMA engine on audio device… or I/O controller … or …

int dma_size = 4*PAGE_SIZE;
int *buf = alloc_dma(dma_size);
...
dev->mic_dma_baseaddr = (int)buf;
dev->mic_dma_count = dma_size;
dev->cmd = DEV_MIC_INPUT | DEV_INTERRUPT_ENABLE | DEV_DMA_ENABLE;

Slide41

DMA Issues (1): Addressing

Issue #1: DMA meets Virtual Memory

RAM: physical addresses

Programs: virtual addresses

Solution: DMA uses physical addresses

OS uses physical address when setting up DMA

OS allocates contiguous physical pages for DMA

Or: OS splits xfer into page-sized chunks (many devices support DMA “chains” for this reason)
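The page-splitting option can be sketched as follows: walk a virtually contiguous buffer, cut it at page boundaries, and translate each piece separately so the device receives a list of physically contiguous chunks. The descriptor layout, the toy page table (an array indexed by virtual page number), and all names here are hypothetical; real devices define their own chain formats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/* One entry of a hypothetical DMA chain: a physically contiguous piece. */
struct dma_desc {
    uint64_t phys_addr;
    uint32_t length;
};

/* Toy translation: page_table[vpn] holds the physical frame number. */
uint64_t toy_virt_to_phys(const uint32_t *page_table, uint64_t vaddr) {
    return (uint64_t)page_table[vaddr / TOY_PAGE_SIZE] * TOY_PAGE_SIZE
           + vaddr % TOY_PAGE_SIZE;
}

/* Split [vaddr, vaddr + len) at page boundaries. Returns the number of
   descriptors written, or 0 if more than max_descs would be needed. */
size_t build_dma_chain(const uint32_t *page_table, uint64_t vaddr, uint32_t len,
                       struct dma_desc *out, size_t max_descs) {
    size_t n = 0;
    assert(page_table != NULL && out != NULL);
    while (len > 0) {
        /* Bytes remaining in the current page. */
        uint32_t room = TOY_PAGE_SIZE - (uint32_t)(vaddr % TOY_PAGE_SIZE);
        uint32_t chunk = len < room ? len : room;
        if (n == max_descs)
            return 0;
        out[n].phys_addr = toy_virt_to_phys(page_table, vaddr);
        out[n].length = chunk;
        n++;
        vaddr += chunk;
        len -= chunk;
    }
    return n;
}
```

For example, with virtual pages 0, 1, 2 mapped to frames 7, 3, 9, a 5000-byte buffer starting at virtual address 4000 becomes three descriptors of 96, 4096, and 808 bytes, none of which crosses a page boundary.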

[Diagram: CPU, MMU, RAM, DISK]

Slide42

DMA Example

DMA example: reading from audio (mic) input

DMA engine on audio device… or I/O controller … or …

int dma_size = 4*PAGE_SIZE;
void *buf = alloc_dma(dma_size);
...
dev->mic_dma_baseaddr = virt_to_phys(buf);
dev->mic_dma_count = dma_size;
dev->cmd = DEV_MIC_INPUT | DEV_INTERRUPT_ENABLE | DEV_DMA_ENABLE;

Slide43

DMA Issues (1): Addressing

Issue #1: DMA meets Virtual Memory

RAM: physical addresses

Programs: virtual addresses

Solution 2: DMA uses virtual addresses

OS sets up mappings on a mini-TLB

[Diagram: CPU, MMU, RAM, DISK, uTLB]

Slide44

DMA Issues (2): Virtual Mem

Issue #2: DMA meets Paged Virtual Memory

DMA destination page may get swapped out

Solution: Pin the page before initiating DMA

Alternate solution: Bounce Buffer

DMA to a pinned kernel page, then memcpy elsewhere
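A bounce buffer can be sketched in user space as below. A static array stands in for the pinned kernel page, and `fake_device_dma_write` is a hypothetical simulated device that fills whatever buffer it is pointed at; in a real kernel the bounce page would be locked in memory and the device would write it via DMA. All names and the 4096-byte size are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOUNCE_SIZE 4096u

/* Stand-in for a pinned kernel page that can never be swapped out. */
static uint8_t bounce_page[BOUNCE_SIZE];

/* Simulated device: writes len bytes of a counting pattern into dst. */
void fake_device_dma_write(uint8_t *dst, size_t len, uint8_t seed) {
    for (size_t i = 0; i < len; i++)
        dst[i] = (uint8_t)(seed + i);
}

/* Read len bytes "from the device" into dest, which may be pageable memory.
   The device only ever touches the bounce page; the CPU copies afterwards. */
size_t bounce_read(uint8_t *dest, size_t len, uint8_t seed) {
    size_t done = 0;
    assert(dest != NULL);
    while (done < len) {
        size_t chunk = len - done < BOUNCE_SIZE ? len - done : BOUNCE_SIZE;
        /* DMA to the pinned page ... */
        fake_device_dma_write(bounce_page, chunk, (uint8_t)(seed + done));
        /* ... then memcpy elsewhere. */
        memcpy(dest + done, bounce_page, chunk);
        done += chunk;
    }
    return done;
}
```

The trade-off is the extra CPU copy per chunk: the destination never has to be pinned, but part of the CPU time that DMA was meant to save is spent in `memcpy`.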

[Diagram: CPU, RAM, DISK]

Slide45

DMA Issues (4): Caches

Issue #4: DMA meets Caching

DMA-related data could be cached in L1/L2

DMA to Mem: cache is now stale

DMA from Mem: dev gets stale data

Solution: (software enforced coherence)

OS flushes some/all cache before DMA begins

Or: don't touch pages during DMA

Or: mark pages as uncacheable in page table entries

(needed for Memory Mapped I/O too!)

[Diagram: CPU, L2, RAM, DISK]

Slide46

DMA Issues (4): Caches

Issue #4: DMA meets Caching

DMA-related data could be cached in L1/L2

DMA to Mem: cache is now stale

DMA from Mem: dev gets stale data

Solution 2: (hardware coherence aka snooping)

cache listens on bus, and conspires with RAM

DMA to Mem: invalidate/update data seen on bus

DMA from Mem: cache services request if possible, otherwise RAM services
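The "DMA to Mem" half of snooping can be illustrated with a toy model: a tiny RAM, a one-entry cache, and a flag that turns snooping on or off. This is a deliberately simplified sketch (one cache line, word-sized entries, invalidate only), and every name in it is hypothetical; it covers neither the update variant nor the "DMA from Mem" case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RAM_WORDS 16u

/* Toy system: a small RAM and a one-entry cache that can snoop DMA writes. */
struct toy_system {
    uint32_t ram[RAM_WORDS];
    bool     line_valid;
    size_t   line_addr;
    uint32_t line_data;
    bool     snooping;   /* true: the cache watches DMA traffic on the bus */
};

/* CPU load: served from the cache on a hit, otherwise from RAM (and cached). */
uint32_t cpu_load(struct toy_system *s, size_t addr) {
    assert(s != NULL && addr < RAM_WORDS);
    if (s->line_valid && s->line_addr == addr)
        return s->line_data;
    s->line_valid = true;
    s->line_addr = addr;
    s->line_data = s->ram[addr];
    return s->line_data;
}

/* DMA to memory: the device writes RAM directly, bypassing the CPU. */
void dma_write(struct toy_system *s, size_t addr, uint32_t value) {
    assert(s != NULL && addr < RAM_WORDS);
    s->ram[addr] = value;
    if (s->snooping && s->line_valid && s->line_addr == addr)
        s->line_valid = false;   /* invalidate the copy seen on the bus */
}
```

With `snooping` off, a load that follows a DMA write to a cached address still returns the old cached value; with it on, the line is invalidated and the next load fetches the new value from RAM.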

[Diagram: CPU, L2, RAM, DISK]

Slide47

Takeaways

Diverse I/O devices require hierarchical interconnect which is more recently transitioning to point-to-point topologies.

Memory-mapped I/O is an elegant technique to read/write device registers with standard load/stores.

Interrupt-based I/O avoids the wasted work in polling-based I/O and is usually more efficient.

Modern systems combine memory-mapped I/O, interrupt-based I/O, and direct-memory access to create sophisticated I/O device subsystems.

Slide48

I/O Summary

How to talk to device? Programmed I/O or Memory-Mapped I/O

How to get events? Polling or Interrupts

How to transfer lots of data? DMA

Slide49

Announcements

Project3: submit “souped up” bot to CMS

Project3 Cache Race Games night: Monday, May 4th, 5pm

Come, eat, drink, have fun and be merry!

Location: B17 Upson Hall

Prelim2: Thursday, April 30th in evening

Time and Location: 7:30pm sharp in Statler Auditorium

Old prelims are online in CMS

Prelim Review Session: TODAY, Tuesday, April 28, 7-9pm in B14 Hollister Hall

Project4:

Design Doc due May 5th, bring design doc to mtg May 4-6

Demos: May 12 and 13

Will not be able to use slip days