Slide1
I/O
Prof. Hakim Weatherspoon
CS 3410, Spring 2015
Computer Science
Cornell University
See: Online P&H Chapter 6.5-6
Slide2
Announcements
Prelim2 Topics
Lecture: Lectures 10 to 24
Data and Control Hazards (Chapters 4.7-4.8)
RISC/CISC (Chapters 2.16-2.18, 2.21)
Calling conventions and linkers (Chapters 2.8, 2.12, Appendix A.1-6)
Caching and Virtual Memory (Chapter 5)
Multicore/parallelism (Chapter 6)
Synchronization (Chapter 2.11)
Traps, Exceptions, OS (Chapter 4.9, Appendix A.7, pp 445-452)
HW2, Labs 3/4, C-Labs 2/3, PA2/3
Topics from Prelim1 (not the focus, but some possible questions)
Slide3
Announcements
Project3: submit “souped up” bot to CMS
Project3 Cache Race Games night: Monday, May 4th, 5pm
Come, eat, drink, have fun and be merry!
Location: B17 Upson Hall
Prelim2: Thursday, April 30th in evening
Time and Location: 7:30pm sharp in Statler Auditorium
Old prelims are online in CMS
Prelim Review Session: TODAY, Tuesday, April 28, 7-9pm in B14 Hollister Hall
Project4: Design Doc due May 5th, bring design doc to mtg May 4-6
Demos: May 12 and 13
Will not be able to use slip days
Slide4
iClicker Check
Slide5
Goals for Today
Computer System Organization
How does a processor interact with its environment?
I/O Overview
How to talk to device?
Programmed I/O or Memory-Mapped I/O
How to get events?
Polling or Interrupts
How to transfer lots of data?
Direct Memory Access (DMA)
Slide6
Next Goal
How does a processor interact with its environment?
Slide7
Big Picture: Input/Output (I/O)
How does a processor interact with its environment?
Slide8
Big Picture: Input/Output (I/O)
How does a processor interact with its environment?
Computer System Organization = Memory + Datapath + Control + Input + Output
Slide9
I/O Devices
Enables Interacting with Environment
Device                 Behavior      Partner  Data Rate (b/sec)
Keyboard               Input         Human    100
Mouse                  Input         Human    3.8k
Sound Input            Input         Machine  3M
Voice Output           Output        Human    264k
Sound Output           Output        Human    8M
Laser Printer          Output        Human    3.2M
Graphics Display       Output        Human    800M – 8G
Network/LAN            Input/Output  Machine  100M – 10G
Network/Wireless LAN   Input/Output  Machine  11 – 54M
Optical Disk           Storage       Machine  5 – 120M
Flash memory           Storage       Machine  32 – 200M
Magnetic Disk          Storage       Machine  800M – 3G
Slide10
Attempt#1: All devices on one interconnect
Replace all devices as the interconnect changes
e.g. keyboard speed == main memory speed ?!
[Diagram: Memory, Display, Disk, Keyboard, and Network all attached to a Unified Memory and I/O Interconnect]
Slide11
Attempt#2: I/O Controllers
Decouple I/O devices from Interconnect
Enable smarter I/O interfaces
[Diagram: Core0 and Core1 (each with a Cache), a Memory Controller with Memory, and I/O Controllers for Display, Disk, Keyboard, and Network, all attached to a Unified Memory and I/O Interconnect]
Slide12
Attempt#3: I/O Controllers + Bridge
Separate high-performance processor, memory, display interconnect from lower-performance interconnect
[Diagram: Core0 and Core1 (each with a Cache), the Memory Controller with Memory, and the Display I/O Controller on a High Performance Interconnect; a Bridge joins it to a Lower Performance Legacy Interconnect with the Disk, Keyboard, and Network I/O Controllers]
Slide13
Bus Parameters
Width = number of wires
Transfer size = data words per bus transaction
Synchronous (with a bus clock) or asynchronous (no bus clock / “self clocking”)
Slide14
Bus Types
Processor – Memory (“Front Side Bus”. Also QPI)
Short, fast, & wide
Mostly fixed topology, designed as a “chipset”
CPU + Caches + Interconnect + Memory Controller
I/O and Peripheral busses (PCI, SCSI, USB, LPC, …)
Longer, slower, & narrower
Flexible topology, multiple/varied connections
Interoperability standards for devices
Connect to processor-memory bus through a bridge
Slide15
Attempt#3: I/O Controllers + Bridge
Separate high-performance processor, memory, display interconnect from lower-performance interconnect
Slide16
Example Interconnects
Name                  Use       Devices per channel  Channel Width  Data Rate (B/sec)
Firewire 800          External  63                   4              100M
USB 2.0               External  127                  2              60M
USB 3.0               External  127                  2              625M
Parallel ATA          Internal  1                    16             133M
Serial ATA (SATA)     Internal  1                    4              300M
PCI 66MHz             Internal  1                    32-64          533M
PCI Express v2.x      Internal  1                    2-64           16G/dir
Hypertransport v2.x   Internal  1                    2-64           25G/dir
QuickPath (QPI)       Internal  1                    40             12G/dir
Slide17
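As a rough check on the Example Interconnects table, the peak data rate of a clocked parallel bus can be estimated as bytes per transfer times transfers per second. The sketch below is illustrative only: the function name and the `transfers_per_clock` parameter are assumptions, and the figure used for "66MHz" (66,666,667 Hz) is an assumed exact clock. With a 64-bit width it gives about 533M B/sec, which matches the PCI 66MHz row.

```c
#include <assert.h>
#include <stdint.h>

/* Peak bytes per second for a clocked parallel bus:
   (width in bytes) x (clock in Hz) x (transfers per clock).
   Real buses lose some of this to protocol overhead. */
uint64_t bus_peak_bytes_per_sec(unsigned width_bits, uint64_t clock_hz,
                                unsigned transfers_per_clock)
{
    assert(width_bits % 8 == 0);  /* whole bytes only in this sketch */
    return (uint64_t)(width_bits / 8) * clock_hz * transfers_per_clock;
}
```

For example, `bus_peak_bytes_per_sec(32, 66666667ULL, 1)` gives roughly half the 64-bit figure, which is why the table lists a width range of 32-64 for PCI.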
Interconnecting Components
Interconnects are (were?) busses
parallel set of wires for data and control
shared channel
multiple senders/receivers
everyone can see all bus transactions
bus protocol: rules for using the bus wires
Alternative (and increasingly common):
dedicated point-to-point channels
e.g. Intel Xeon
e.g. Intel Nehalem
Slide18
Attempt#4: I/O Controllers + Bridge + NUMA
Remove bridge as bottleneck with point-to-point interconnects
E.g. Non-Uniform Memory Access (NUMA)
Slide19
Takeaways
Diverse I/O devices require hierarchical interconnect which is more recently transitioning to point-to-point topologies.
Slide20
Next Goal
How does the processor interact with I/O devices?
Slide21
I/O Device Driver Software Interface
Set of methods to write/read data to/from device and control device
Example: Linux Character Devices
    // Needs <fcntl.h>, <unistd.h>, <string.h>, <assert.h>
    // Open a toy "echo" character device
    int fd = open("/dev/echo", O_RDWR);
    // Write to the device
    char write_buf[] = "Hello World!";
    write(fd, write_buf, sizeof(write_buf));
    // Read from the device
    char read_buf[32];
    read(fd, read_buf, sizeof(read_buf));
    // Close the device
    close(fd);
    // Verify the result
    assert(strcmp(write_buf, read_buf) == 0);
Slide22
I/O Device API
Typical I/O Device API
a set of read-only or read/write registers
Command registers
writing causes device to do something
Status registers
reading indicates what device is doing, error codes, …
Data registers
Write: transfer data to a device
Read: transfer data from a device
Every device uses this API
Slide23
I/O Device API
Simple (old) example: AT Keyboard Device
8-bit Status: PE, TO, AUXB, LOCK, AL2, SYSF, IBS, OBS (bit 7 down to bit 0)
IBS = Input Buffer Status, OBS = Output Buffer Status
8-bit Command:
0xAA = “self test”
0xAE = “enable kbd”
0xED = “set LEDs”
…
8-bit Data:
scancode (when reading)
LED state (when writing) or …
Slide24
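A minimal sketch of how software might decode the status byte from the AT keyboard slide. The bit positions are an assumption (bit 7 = PE down to bit 0 = OBS, following the order in which the slide lists the fields), and the `KBD_STAT_*` names and helper functions are hypothetical, not part of any real driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: the slide's field order, bit 7 (PE) down to bit 0 (OBS). */
enum {
    KBD_STAT_OBS  = 1u << 0,  /* output buffer status: a byte is waiting to be read */
    KBD_STAT_IBS  = 1u << 1,  /* input buffer status: device has not consumed our write */
    KBD_STAT_SYSF = 1u << 2,
    KBD_STAT_AL2  = 1u << 3,
    KBD_STAT_LOCK = 1u << 4,
    KBD_STAT_AUXB = 1u << 5,
    KBD_STAT_TO   = 1u << 6,  /* timeout */
    KBD_STAT_PE   = 1u << 7   /* parity error */
};

/* Returns 1 if bit number `bit` (0..7) is set in the status byte. */
int kbd_status_bit(uint8_t status, unsigned bit)
{
    assert(bit < 8);
    return (status >> bit) & 1;
}

/* Nonzero when a byte can be read from the data register. */
int kbd_data_ready(uint8_t status)
{
    return (status & KBD_STAT_OBS) != 0;
}

/* Nonzero when the status reports a parity or timeout error. */
int kbd_has_error(uint8_t status)
{
    return (status & (KBD_STAT_PE | KBD_STAT_TO)) != 0;
}
```

The `status & 1` test in the polling loops elsewhere in these slides corresponds to `kbd_data_ready` under this assumed layout.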
Communication Interface
Q: How does program / OS code talk to device?
A: special instructions to talk over special busses
Programmed I/O
    inb $a, 0x64     (kbd status register)
    outb $a, 0x60    (kbd data register)
Specifies: device, data, direction
Protection: only allowed in kernel mode
*x86: $a implicit; also inw, outw, inh, outh, …
Interact with cmd, status, and data device registers directly
Kernel boundary crossing is expensive
Slide25
Communication Interface
Q: How does program / OS code talk to device?
A: Map registers into virtual address space
Memory-mapped I/O
Accesses to certain addresses redirected to I/O devices
Data goes over the memory bus
Protection: via bits in pagetable entries
OS+MMU+devices configure mappings
Faster. Less boundary crossing
Slide26
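To make the memory-mapped idea concrete, here is a small simulation in which device registers are just fields of a struct reached with ordinary loads and stores. Everything here is hypothetical: `struct kbd_regs`, `fake_kbd`, and `fake_key_press` stand in for real hardware, and on a real system the pointer would come from an OS-provided mapping rather than a global variable. The `volatile` qualifier is there so the compiler re-reads the register on every poll.

```c
#include <stdint.h>

/* Hypothetical register block: a status byte and a data byte,
   each padded out to a 4-byte slot. */
struct kbd_regs {
    volatile uint8_t status;
    uint8_t pad0[3];
    volatile uint8_t data;
    uint8_t pad1[3];
};

/* Stand-in for the device; real code would map the device's registers instead. */
struct kbd_regs fake_kbd;

/* Simulated hardware event: a key arrives and the ready bit is set. */
void fake_key_press(struct kbd_regs *k, uint8_t scancode)
{
    k->data = scancode;
    k->status |= 1;
}

/* Poll the status register with plain loads, giving up after max_polls tries.
   Returns the scancode, or -1 if nothing arrived. */
int read_kbd(struct kbd_regs *k, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (k->status & 1) {
            int c = k->data;
            k->status &= (uint8_t)~1u;  /* the simulation clears "ready" itself */
            return c;
        }
    }
    return -1;
}
```

No special instruction appears anywhere in `read_kbd`; the same loads and stores used for ordinary memory reach the registers, which is the point of the technique.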
Memory-Mapped I/O
[Diagram: Virtual Address Space and Physical Address Space, with address labels 0xFFFF FFFF, 0x00FF FFFF, 0x0000 0000, 0x0000 0000; some regions map to the I/O Controllers for Display, Disk, Keyboard, and Network]
Slide27
Device Drivers
Programmed I/O
    char read_kbd()
    {
        char status;
        do {
            sleep();
            status = inb(0x64);    // read the kbd status register
        } while (!(status & 1));
        return inb(0x60);          // read the kbd data register
    }
Memory Mapped I/O
    struct kbd {
        char status, pad0[3];
        char data, pad1[3];
    };
    // volatile: every access must really reach the device
    volatile struct kbd *k = mmap(...);
    char read_kbd()
    {
        char status;
        do {
            sleep();
            status = k->status;
        } while (!(status & 1));
        return k->data;
    }
[Annotations on slide: syscall, syscall, NO syscall]
Polling examples, but mmap I/O is more efficient
Slide28
Comparing Programmed I/O vs Memory Mapped I/O
Programmed I/O
Requires special instructions
Can require dedicated hardware interface to devices
Protection enforced via kernel mode access to instructions
Virtualization can be difficult
Memory-Mapped I/O
Re-uses standard load/store instructions
Re-uses standard memory hardware interface
Protection enforced with normal memory protection scheme
Virtualization enabled with normal memory virtualization scheme
Slide29
Takeaways
Diverse I/O devices require hierarchical interconnect which is more recently transitioning to point-to-point topologies.
Memory-mapped I/O is an elegant technique to read/write device registers with standard load/stores.
Slide30
Next Goal
How does the processor know device is ready/done?
Slide31
Communication Method
Q: How does program learn device is ready/done?
A: Polling: Periodically check I/O status register
If device ready, do operation
If device done, …
If error, take action
Pro? Con?
Predictable timing & inexpensive
But: wastes CPU cycles if nothing to do
Efficient if there is always work to do (e.g. 10Gbps NIC)
Common in small, cheap, or real-time embedded systems
Sometimes for very active devices too…
    char read_kbd()
    {
        char status;
        do {
            sleep();
            status = inb(0x64);
        } while (!(status & 1));
        return inb(0x60);
    }
Slide32
Communication Method
Q: How does program learn device is ready/done?
A: Interrupts: Device sends interrupt to CPU
Cause register identifies the interrupting device
interrupt handler examines device, decides what to do
Priority interrupts
Urgent events can interrupt lower-priority interrupt handling
OS can disable or defer interrupts
Pro? Con?
More efficient: only interrupt when device ready/done
Less efficient: more expensive since save CPU context
CPU context: PC, SP, registers, etc.
Con: unpredictable b/c event arrival depends on other devices’ activity
Slide33
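The interrupt flow described on the interrupts slide (device raises an interrupt, the cause register identifies it, a handler runs) can be modeled in plain C. This is a toy simulation under stated assumptions: `cause_reg`, `irq_table`, the IRQ numbers, and all function names are hypothetical, and no real context save or hardware is involved.

```c
#include <stddef.h>
#include <stdint.h>

enum { IRQ_KBD = 0, IRQ_DISK = 1, IRQ_COUNT = 2 };

typedef void (*irq_handler_t)(void);

irq_handler_t irq_table[IRQ_COUNT];  /* one handler per device */
uint32_t cause_reg;                  /* simulated cause register: bit n set = device n pending */
int kbd_events, disk_events;         /* work done by the handlers */

void kbd_handler(void)  { kbd_events++; }
void disk_handler(void) { disk_events++; }

/* Device side: raise an interrupt by setting the device's bit. */
void device_raise(int irq)
{
    cause_reg |= (uint32_t)1 << irq;
}

/* CPU side: examine the cause register and run the handler of every
   pending device, lowest IRQ number first. Returns how many were serviced. */
int dispatch_interrupts(void)
{
    int serviced = 0;
    for (int irq = 0; irq < IRQ_COUNT; irq++) {
        uint32_t bit = (uint32_t)1 << irq;
        if (cause_reg & bit) {
            cause_reg &= ~bit;           /* acknowledge the interrupt */
            if (irq_table[irq] != NULL) {
                irq_table[irq]();
                serviced++;
            }
        }
    }
    return serviced;
}
```

When nothing is pending, `dispatch_interrupts` does no handler work at all, which is the efficiency argument for interrupts over polling; servicing lower IRQ numbers first is one simple way to express priority.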
Takeaways
Diverse I/O devices require hierarchical interconnect which is more recently transitioning to point-to-point topologies.
Memory-mapped I/O is an elegant technique to read/write device registers with standard load/stores.
Interrupt-based I/O avoids the wasted work in polling-based I/O and is usually more efficient.
Slide34
Next Goal
How do we transfer a lot of data efficiently?
Slide35
I/O Data Transfer
How to talk to device? Programmed I/O or Memory-Mapped I/O
How to get events? Polling or Interrupts
How to transfer lots of data?
    disk->cmd = READ_4K_SECTOR;
    disk->data = 12;
    while (!(disk->status & 1)) { }
    for (i = 0; i < 4096; i++)
        buf[i] = disk->data;
Very, Very, Expensive
Slide36
I/O Data Transfer
Programmed I/O xfer: Device → CPU → RAM
for (i = 1 .. n)
CPU issues read request
Device puts data on bus & CPU reads into registers
CPU writes data to memory
Not efficient
[Diagram: CPU, RAM, DISK; Read from Disk, Write to Memory]
Everything interrupts CPU
Wastes CPU
Slide37
I/O Data Transfer
Q: How to transfer lots of data efficiently?
A: Have device access memory directly
Direct memory access (DMA)
1) OS provides starting address, length
2) controller (or device) transfers data autonomously
3) Interrupt on completion / error
Slide38
DMA: Direct Memory Access
Programmed I/O xfer: Device → CPU → RAM
for (i = 1 .. n)
CPU issues read request
Device puts data on bus & CPU reads into registers
CPU writes data to memory
[Diagram: CPU, RAM, DISK]
Slide39
DMA: Direct Memory Access
Programmed I/O xfer: Device → CPU → RAM
for (i = 1 .. n)
CPU issues read request
Device puts data on bus & CPU reads into registers
CPU writes data to memory
DMA xfer: Device → RAM
CPU sets up DMA request
for (i = 1 ... n)
Device puts data on bus & RAM accepts it
Device interrupts CPU after done
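The difference between the two transfer styles can be made visible by counting the operations the CPU itself performs. The sketch below is a simulation, not a driver: the device is an ordinary array, `memcpy` stands in for the DMA engine, and the cost model (two CPU operations per word for programmed I/O, three setup writes plus one interrupt for DMA) is an assumption chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

unsigned long cpu_ops;  /* counts operations the CPU itself performs */

/* Programmed I/O: the CPU loads each word from the device and stores it to RAM. */
void pio_transfer(uint32_t *ram, const uint32_t *device, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t w = device[i];  /* CPU reads the word into a register */
        cpu_ops++;
        ram[i] = w;              /* CPU writes the word to memory */
        cpu_ops++;
    }
}

/* DMA: the CPU programs the transfer and later takes a single interrupt. */
void dma_transfer(uint32_t *ram, const uint32_t *device, size_t n)
{
    cpu_ops += 3;                          /* write base address, count, command */
    memcpy(ram, device, n * sizeof *ram);  /* done by the DMA engine, not the CPU */
    cpu_ops += 1;                          /* completion interrupt */
}
```

Under this model the CPU cost of programmed I/O grows with the transfer size, while the CPU cost of DMA stays constant no matter how many words move.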
[Diagram: CPU, RAM, DISK shown for each transfer type; DMA steps: 1) Setup, 2) Transfer, 3) Interrupt after done]
Slide40
DMA Example
DMA example: reading from audio (mic) input
DMA engine on audio device… or I/O controller … or …
    int dma_size = 4*PAGE_SIZE;
    int *buf = alloc_dma(dma_size);
    ...
    dev->mic_dma_baseaddr = (int)buf;
    dev->mic_dma_count = dma_len;
    dev->cmd = DEV_MIC_INPUT |
        DEV_INTERRUPT_ENABLE | DEV_DMA_ENABLE;
Slide41
DMA Issues (1): Addressing
Issue #1: DMA meets Virtual Memory
RAM: physical addresses
Programs: virtual addresses
Solution: DMA uses physical addresses
OS uses physical address when setting up DMA
OS allocates contiguous physical pages for DMA
Or: OS splits xfer into page-sized chunks
(many devices support DMA “chains” for this reason)
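Splitting a transfer into page-sized chunks, as mentioned above, might look like the following sketch. The `struct dma_desc` layout, the `dma_split` name, and the 4096-byte page size are all assumptions for illustration; a real OS would also translate each chunk's virtual address to a physical one, which is left out here.

```c
#include <stddef.h>
#include <stdint.h>

#define DMA_PAGE_SIZE 4096u  /* assumed page size */

/* Hypothetical descriptor: one piece of a transfer that stays inside a page. */
struct dma_desc {
    uintptr_t addr;  /* start address of this piece */
    size_t    len;   /* bytes in this piece; never crosses a page boundary */
};

/* Splits [addr, addr + len) at page boundaries. Fills at most `max`
   descriptors and returns the number needed, which may exceed `max`. */
size_t dma_split(uintptr_t addr, size_t len, struct dma_desc *out, size_t max)
{
    size_t count = 0;
    while (len > 0) {
        size_t room = DMA_PAGE_SIZE - (size_t)(addr % DMA_PAGE_SIZE);  /* bytes left in this page */
        size_t piece = len < room ? len : room;
        if (count < max) {
            out[count].addr = addr;
            out[count].len = piece;
        }
        count++;
        addr += piece;
        len -= piece;
    }
    return count;
}
```

A 5000-byte transfer starting at offset 4000, for instance, becomes three pieces: the tail of the first page, one full page, and the start of a third. A device that supports descriptor chains could be handed the whole array at once.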
[Diagram: CPU, MMU, RAM, DISK]
Slide42
DMA Example
DMA example: reading from audio (mic) input
DMA engine on audio device… or I/O controller … or …
    int dma_size = 4*PAGE_SIZE;
    void *buf = alloc_dma(dma_size);
    ...
    dev->mic_dma_baseaddr = virt_to_phys(buf);
    dev->mic_dma_count = dma_len;
    dev->cmd = DEV_MIC_INPUT |
        DEV_INTERRUPT_ENABLE | DEV_DMA_ENABLE;
Slide43
DMA Issues (1): Addressing
Issue #1: DMA meets Virtual Memory
RAM: physical addresses
Programs: virtual addresses
Solution 2: DMA uses virtual addresses
OS sets up mappings on a mini-TLB
[Diagram: CPU, MMU, RAM, DISK, uTLB]
Slide44
DMA Issues (2): Virtual Mem
Issue #2: DMA meets Paged Virtual Memory
DMA destination page may get swapped out
Solution: Pin the page before initiating DMA
Alternate solution: Bounce Buffer
DMA to a pinned kernel page, then memcpy elsewhere
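The bounce-buffer alternative can be sketched as a two-step copy. This is a simulation under assumptions: `bounce_buf` stands in for a pinned kernel page, `fake_device_dma` stands in for the device's DMA engine (it just writes a repeating byte pattern), and all names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define BOUNCE_SIZE 4096u

/* Stand-in for a pinned kernel page that is never swapped out. */
unsigned char bounce_buf[BOUNCE_SIZE];

/* Stand-in for the device's DMA engine: fills the target with n pattern bytes. */
void fake_device_dma(unsigned char *target, size_t n)
{
    for (size_t i = 0; i < n; i++)
        target[i] = (unsigned char)(i & 0xFF);
}

/* Reads n bytes into dest, one bounce buffer at a time.
   The "device" only ever writes to the pinned buffer. Returns bytes copied. */
size_t read_via_bounce(unsigned char *dest, size_t n)
{
    size_t done = 0;
    while (done < n) {
        size_t left = n - done;
        size_t piece = left < BOUNCE_SIZE ? left : BOUNCE_SIZE;
        fake_device_dma(bounce_buf, piece);      /* DMA targets the pinned page */
        memcpy(dest + done, bounce_buf, piece);  /* CPU copies to the pageable destination */
        done += piece;
    }
    return done;
}
```

The extra `memcpy` is the price of this approach: the destination may be paged freely, but every byte is moved twice.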
[Diagram: CPU, RAM, DISK]
Slide45
DMA Issues (4): Caches
Issue #4: DMA meets Caching
DMA-related data could be cached in L1/L2
DMA to Mem: cache is now stale
DMA from Mem: dev gets stale data
Solution: (software enforced coherence)
OS flushes some/all cache before DMA begins
Or: don't touch pages during DMA
Or: mark pages as uncacheable in page table entries
(needed for Memory Mapped I/O too!)
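A toy model shows why a DMA write to memory can leave the cache stale and how a software invalidate fixes it. The single-word "cache" and all names below are hypothetical simplifications; real caches work on lines and real kernels use architecture-specific flush and invalidate operations.

```c
#include <stdint.h>

uint32_t ram_word;    /* one word of simulated RAM */
uint32_t cache_word;  /* the cached copy of that word */
int cache_valid;      /* nonzero while the cached copy may be used */

/* CPU load: hit in the cache if valid, otherwise fetch from RAM and fill. */
uint32_t cpu_load(void)
{
    if (!cache_valid) {
        cache_word = ram_word;
        cache_valid = 1;
    }
    return cache_word;
}

/* The DMA engine writes RAM directly; in this model it never touches the cache. */
void dma_write(uint32_t value)
{
    ram_word = value;
}

/* Software-enforced coherence: the OS discards the cached copy. */
void cache_invalidate(void)
{
    cache_valid = 0;
}
```

If the CPU loads the word, the device then writes a new value by DMA, and the CPU loads again without `cache_invalidate`, the second load still returns the old value. That is the stale-cache case from the "DMA to Mem" bullet.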
[Diagram: CPU, L2, RAM, DISK]
Slide46
DMA Issues (4): Caches
Issue #4: DMA meets Caching
DMA-related data could be cached in L1/L2
DMA to Mem: cache is now stale
DMA from Mem: dev gets stale data
Solution 2: (hardware coherence aka snooping)
cache listens on bus, and conspires with RAM
DMA to Mem: invalidate/update data seen on bus
DMA from Mem: cache services request if possible, otherwise RAM services
[Diagram: CPU, L2, RAM, DISK]
Slide47
Takeaways
Diverse I/O devices require hierarchical interconnect which is more recently transitioning to point-to-point topologies.
Memory-mapped I/O is an elegant technique to read/write device registers with standard load/stores.
Interrupt-based I/O avoids the wasted work in polling-based I/O and is usually more efficient.
Modern systems combine memory-mapped I/O, interrupt-based I/O, and direct-memory access to create sophisticated I/O device subsystems.
Slide48
I/O Summary
How to talk to device?
Programmed I/O or Memory-Mapped I/O
How to get events?
Polling or Interrupts
How to transfer lots of data?
DMA