
The Machine and the Kernel - PowerPoint Presentation

Uploaded by jane-oiler on 2016-07-16


Presentation Transcript

Slide1

The Machine and the Kernel

Mode, space, and context: the basics

Jeff Chase

Duke University

Slide2

64 bytes: 3 ways

[Figure: the same 64-byte region at p, with offsets labeled from p + 0x0 (0x0 to 0x1f), drawn three ways: as char p[] or char *p, as int p[] or int* p, and as char* p[] or char** p.]

Pointers (addresses) are 8 bytes on a 64-bit machine.

Memory is “fungible”.

Slide3

Endianness

A silly difference among machine architectures creates a need for byte swapping when unlike machines exchange data over a network.

Lilliput and Blefuscu are at war over which end of a soft-boiled egg to crack.

Gulliver’s Travels, 1726

Slide4

x86 is little-endian

    chase$ cc -o heap heap.c
    chase$ ./heap
    hi!
    0x216968
    chase$

[Figure: four bytes in memory, h=0x68, i=0x69, !=0x21, 0, labeled cb and ip.]

Little-endian: the lowest-numbered byte of a word (or longword or quadword) is the least significant.

Slide5

Network messages

https://developers.google.com/protocol-buffers/docs/overview

Slide6

Slide7

Byte swapping: example

    struct sockaddr_in socket_addr;
    sock = socket(PF_INET, SOCK_STREAM, 0);
    memset(&socket_addr, 0, sizeof socket_addr);
    socket_addr.sin_family = PF_INET;
    /* htons: host to network byte order, 16-bit port number */
    socket_addr.sin_port = htons(port);
    /* htonl: host to network byte order, 32-bit address */
    socket_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    if (bind(sock, (struct sockaddr *) &socket_addr, sizeof socket_addr) < 0) {
        perror("couldn't bind");
        exit(1);
    }
    listen(sock, 10);

buggyserver.c

Slide8

Heap: dynamic memory

Allocated heap blocks for structs or objects. Align!

A contiguous chunk of memory obtained from OS kernel. E.g., with Unix sbrk() system call.

A runtime library obtains the block and manages it as a “heap” for use by the programming language environment, to store dynamic objects. E.g., with Unix malloc and free library calls.

Slide9

Heap manager policy

The heap manager must find a suitable free block to return for each call to malloc().

No byte can be part of two simultaneously allocated heap blocks! If any byte of memory is doubly allocated, programs will fail. We test for this!

A heap manager has a policy algorithm to identify a suitable free block within the heap.

Last fit, first fit, best fit, worst fit: choose your favorite!

Goals: be quick, and use memory efficiently.

Behavior depends on workload: the pattern of malloc/free requests.

This is an old problem in computer science, and it occurs in many settings: variable partitioning.

Slide10

Variable Partitioning

Variable partitioning is the strategy of parking differently sized cars along a street with no marked parking space dividers.

Wasted space: external fragmentation.

Slide11

Fixed Partitioning

Wasted space: internal fragmentation.

Slide12

Time sharing vs. space sharing

[Figure: axes labeled time and space.]

Two common modes of resource allocation. What kinds of resources do these work for?

Slide13

Operating Systems: The Classical View

[Figure: diagram with two regions labeled “data”.]

Programs run as independent processes.

Protected system calls, and upcalls (e.g., signals).

Protected OS kernel mediates access to shared resources.

Threads enter the kernel for OS services.

Each process has a private virtual address space and one or more threads.

The kernel code and data are protected from untrusted processes.

Slide14

[Figure: two address spaces, each spanning 0x0 to 0x7fffffff, with regions labeled Text (code), Static data, Dynamic data (heap/BSS), Stack, and Reserved.]

Slide15

“Classic Linux Address Space”

http://duartes.org/gustavo/blog/category/linux

Slide16

Windows/IA32

Slide17

Windows IA-32 (Kernel)

Slide18

Processes: A Closer Look

[Figure: a process drawn as a virtual address space, plus a thread with its stack, plus a process descriptor (PCB) holding user ID, process ID, parent PID, sibling links, children, and resources.]

Each process has a thread bound to the VAS.

The thread has a stack addressable through the VAS.

The kernel can suspend/restart the thread wherever and whenever it wants.

The OS maintains some state for each process in the kernel’s internal data structures: a file descriptor table, links to maintain the process tree, and a place to store the exit status.

The address space is a private name space for a set of memory segments used by the process.

The kernel must initialize the process memory for the program to run.

Slide19

A process can have multiple threads

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    volatile int counter = 0;   /* shared by both threads */
    int loops;

    void *worker(void *arg) {
        int i;
        for (i = 0; i < loops; i++) {
            counter++;          /* unsynchronized update of shared data */
        }
        pthread_exit(NULL);
    }

    int main(int argc, char *argv[]) {
        if (argc != 2) {
            fprintf(stderr, "usage: threads <loops>\n");
            exit(1);
        }
        loops = atoi(argv[1]);
        pthread_t p1, p2;
        printf("Initial value : %d\n", counter);
        pthread_create(&p1, NULL, worker, NULL);
        pthread_create(&p2, NULL, worker, NULL);
        pthread_join(p1, NULL);
        pthread_join(p2, NULL);
        printf("Final value : %d\n", counter);
        return 0;
    }

Much more on this later!

Slide20

Key Concepts for Classical OS

kernel: The software component that controls the hardware directly, and implements the core privileged OS functions. Modern hardware has features that allow the OS kernel to protect itself from untrusted user code.

thread: An executing instruction path and its CPU register state.

virtual address space: An execution context for thread(s) defining a name space for executing instructions to address data and code.

process: An execution of a program, consisting of a virtual address space, one or more threads, and some OS kernel state.

Slide21

The theater analogy

[Figure labels: Threads; Program (script); Address space: virtual memory (stage). Credit: lpcox]

Running a program is like performing a play.

Slide22

The sheep analogy

[Figure labels: Thread; Code and data; Address space]

Slide23

CPU cores

[Figure labels: Core #1, Core #2]

The machine has a bank of CPU cores for threads to run on.

The OS allocates cores to threads.

Cores are hardware. They go where the driver tells them.

Switch drivers any time.

Slide24

Threads drive cores

Slide25

What was the point of that whole thing with the electric sheep actors?

A process is a running program.

A running program (a process) has at least one thread (“main”), but it may (optionally) create other threads.

The threads execute the program (“perform the script”).

The threads execute on the “stage” of the process virtual memory, with access to a private instance of the program’s code and data.

A thread can access any virtual memory in its process, but is contained by the “fence” of the process virtual address space.

Threads run on cores: a thread’s core executes instructions for it.

Sometimes threads idle to wait for a free core, or for some event. Sometimes cores idle to wait for a ready thread to run.

The operating system kernel shares/multiplexes the computer’s memory and cores among the virtual memories and threads.

Slide26

Processes and threads

[Figure labels: virtual address space; main thread; stack]

Each process has a thread bound to the VAS, with stacks (user and kernel).

If we say a process does something, we really mean its thread does it.

The kernel can suspend/restart the thread wherever and whenever it wants.

Each process has a virtual address space (VAS): a private name space for the virtual memory it uses.

The VAS is both a “sandbox” and a “lockbox”: it limits what the process can see/do, and protects its data from others.

From now on, we suppose that a process could have multiple threads. We presume that they can all make system calls and block independently.

[Figure labels: other threads (optional); STOP; wait]

Slide27

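A thread running in a process VAS

[Figure: a CPU with registers R0 to Rn, PC (pointing at x), and SP (pointing at y), beside a memory address space running from 0 to high. The address space (virtual or physical, e.g., a virtual memory for a process) holds code (your program), library (common runtime), your data, heap, and stack.]

Slide28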

Thread context

Each thread has a context (exactly one).

Context == values in the thread’s registers, including a (protected) identifier naming its VAS, and a pointer to thread’s stack in VAS/memory.

Each core has a context (at least one).

Context == a register set that can hold values. The register set is baked into the hardware.

A core can change “drivers”: context switch.

Save running thread’s register values into memory. Load new thread’s register values from memory. (Think of driver settings for the seat, mirrors, audio…)

Enables time slicing or time sharing of machine.

[Figure: CPU core registers R0 to Rn, PC (x), SP (y).]

Slide29

Programs gone wild

    int
    main()
    {
        while(1);
    }

Can you hear the fans blow?

How does the OS regain control of the core from this program?

How to “make” the process save its context and give some other process a chance to run?

How to “make” processes share machine resources fairly?

Slide30

Timer interrupts, faults, etc.

When processor core is running a user program, the user program/thread controls (“drives”) the core.

The hardware has a timer device that interrupts the core after a given interval of time.

Interrupt transfers control back to the OS kernel, which may switch the core to another thread, or resume.

Other events also return control to the kernel: wild pointers, divide by zero, other program actions, page faults.

Slide31

Entry to the kernel

syscall trap/return

fault/return

interrupt/return

The handler accesses the core register context to read the details of the exception (trap, fault, or interrupt). It may call other kernel routines.

Every entry to the kernel is the result of a trap, fault, or interrupt. The core switches to kernel mode and transfers control to a handler routine.

OS kernel code and data for system calls (files, process fork/exit/wait, pipes, binder IPC, low-level thread support, etc.) and virtual memory management (page faults, etc.)

I/O completions

timer ticks

Slide32

CPU mode: User and Kernel

[Figure: CPU core registers R0 to Rn and PC (x), plus a mode field (U/K).]

CPU mode (a field in some status register) indicates whether a machine CPU (core) is running in a user program or in the protected kernel (protected mode).

Some instructions or register accesses are legal only when the CPU (core) is executing in kernel mode.

CPU mode transitions to kernel mode only on machine exception events (trap, fault, interrupt), which transfer control to a trusted handler routine registered with the machine at kernel boot time.

So only the kernel program chooses what code ever runs in the kernel mode (or so we hope and intend).

A kernel handler can read the user register values at the time of the event, and modify them arbitrarily before (optionally) returning to user mode.

Slide33

Exceptions: trap, fault, interrupt

                                         intentional              unintentional
                                         (happens every time)     (contributing factors)
    synchronous
    (caused by an instruction)           trap: system call        fault
    asynchronous
    (caused by some other event)         software interrupt       interrupt

trap (system call): open, close, read, write, fork, exec, exit, wait, kill, etc.

fault: invalid or protected address or opcode, page fault, overflow, etc.

interrupt: caused by an external event: I/O op completed, clock tick, power fail, etc.

software interrupt: software requests an interrupt to be delivered at a later time.

Slide34

Kernel Stacks and Trap/Fault Handling

Threads execute user code on a user stack in the user virtual memory in the process virtual address space.

Each thread has a second kernel stack in kernel space (VM accessible only in kernel mode).

System calls and faults run in kernel mode on a kernel stack.

Kernel code running in P’s process context has access to P’s virtual memory.

The syscall handler makes an indirect call through the system call dispatch table to the handler registered for the specific system call.

[Figure labels: data; stack (four instances); syscall dispatch table]

Slide35

Virtual resource sharing

[Figure: axes labeled time and space.]

Understand that the OS kernel implements resource allocation (memory, CPU,…) by manipulating name spaces and contexts visible to user code.

The kernel retains control of user contexts and address spaces via the machine’s limited direct execution model, based on protected mode and exceptions.

Slide36

“Limited direct execution”

[Figure: control moving between user mode and kernel mode over time. The kernel side is split into a “top half” and a “bottom half” (interrupt handlers). Labels: boot, syscall trap, u-start, u-return, u-start, fault, u-return, fault, clock interrupt, interrupt return.]

Kernel handler manipulates CPU register context to return to selected user context.

Any kind of machine exception transfers control to a registered (trusted) kernel handler running in a protected CPU mode.

Slide37

Example: Syscall traps

Programs in C, C++, etc. invoke system calls by linking to a standard library written in assembly.

The library defines a stub or wrapper routine for each syscall.

Stub executes a special trap instruction (e.g., chmk or callsys or syscall instruction) to change mode to kernel.

Syscall arguments/results are passed in registers (or user stack).

OS defines Application Binary Interface (ABI).

read() in Unix libc.a, Alpha library (executes in user mode):

    #define SYSCALL_READ 27        # op ID for a read system call

    move arg0…argn, a0…an          # syscall args in registers A0..AN
    move SYSCALL_READ, v0          # syscall dispatch index in V0
    callsys                        # kernel trap
    move r1, _errno                # errno = return status
    return

Alpha CPU ISA (defunct)

Slide38

Linux x64 syscall conventions

Slide39

MacOS x86-64 syscall example

    section .data
    hello_world     db      "Hello World!", 0x0a

    section .text
    global start
    start:
    mov rax, 0x2000004      ; System call write = 4
    mov rdi, 1              ; Write to standard out = 1
    mov rsi, hello_world    ; The address of hello_world string
    mov rdx, 13             ; The size to write: 12 characters plus newline
    syscall                 ; Invoke the kernel
    mov rax, 0x2000001      ; System call number for exit = 1
    mov rdi, 0              ; Exit success = 0
    syscall                 ; Invoke the kernel

http://thexploit.com/secdev/mac-os-x-64-bit-assembly-system-calls/

Illustration only: this program writes “Hello World!” to standard output.

Slide40

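A thread running in a process VAS

[Figure: a CPU with registers R0 to Rn, PC (pointing at x), and SP (pointing at y), beside a memory address space running from 0 to high. The address space (virtual or physical, e.g., a virtual memory for a process) holds code (your program), library (common runtime), your data, heap, and stack.]

Slide41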

Messing with the context

    #include <stdio.h>
    #include <unistd.h>
    #include <ucontext.h>

    int count = 0;
    ucontext_t context;

    int main()
    {
        int i = 0;
        getcontext(&context);
        count += 1;
        i += 1;
        sleep(2);
        printf("…", count, i);
        setcontext(&context);
    }

ucontext

Standard C library routines to:

Save current register context to a block of memory (getcontext from core)

Load/restore current register context from a block of memory (setcontext)

Also: makecontext, swapcontext

Details of the saved context (ucontext_t structure) are machine-dependent.

Slide42

Messing with the context (2)

    #include <ucontext.h>

    int count = 0;
    ucontext_t context;

    int main()
    {
        int i = 0;
        getcontext(&context);    /* save core context to memory */
        count += 1;
        i += 1;
        sleep(1);
        printf("…", count, i);
        setcontext(&context);    /* load core context from memory */
    }

Loading the saved context transfers control to this block of code. (Why?)

What about the stack?

Slide43

Messing with the context (3)

    #include <ucontext.h>

    int count = 0;
    ucontext_t context;

    int main()
    {
        int i = 0;
        getcontext(&context);
        count += 1;
        i += 1;
        sleep(1);
        printf("…", count, i);
        setcontext(&context);
    }

    chase$ cc -o context0 context0.c
    < warnings: ucontext deprecated on MacOS >
    chase$ ./context0
    1 1
    2 2
    3 3
    4 4
    5 5
    6 6
    7 7
    …

Slide44

Reading behind the C

    count += 1;
    i += 1;

Disassembled code:

    movl 0x0000017a(%rip),%ecx
    addl $0x00000001,%ecx
    movl %ecx,0x0000016e(%rip)
    movl 0xfc(%rbp),%ecx
    addl $0x00000001,%ecx
    movl %ecx,0xfc(%rbp)

If %rip and %rbp are set “right”, then these references “work”.

On MacOS:

    chase$ man otool
    chase$ otool -vt context0

On this machine, with this cc:

Static global _count is addressed relative to the location of the code itself, as given by the PC register. [%rip is instruction pointer register]

Local variable i is addressed as an offset from stack frame. [%rbp is stack frame base pointer]

Slide45

Messing with the context (4)

    #include <ucontext.h>

    int count = 0;
    ucontext_t context;

    int main()
    {
        int i = 0;
        getcontext(&context);
        count += 1;
        i += 1;
        sleep(1);
        printf("…", count, i);
        setcontext(&context);
    }

    chase$ cc -O2 -o context0 context0.c
    < warnings: ucontext deprecated on MacOS >
    chase$ ./context0
    1 1
    2 1
    3 1
    4 1
    5 1
    6 1
    7 1

What happened?

Slide46

The point of ucontext

The system can use ucontext routines to:

“Freeze” at a point in time of the execution

Restart execution from a frozen moment in time

Execution continues where it left off…if the memory state is right.

The system can implement multiple independent threads of execution within the same address space.

Create a context for a new thread with makecontext.

Modify saved contexts at will.

Context switch with swapcontext: transfer a core from one thread to another (“change drivers”)

Much more to this picture: need per-thread stacks, kernel support, suspend/sleep, controlled ordering, etc.

Slide47

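Two threads: closer look

[Figure: a CPU (core) with registers R0 to Rn, PC (x), and SP (y), beside an address space running from 0 to high that holds code (program), library (common runtime), data, and two stacks. One stack belongs to the running thread; the other belongs to a thread that is on deck and ready to run.]

Slide48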

Thread context switch

[Figure: the same CPU (core) and address space with two stacks. Step 1: save registers (switch out). Step 2: load registers (switch in).]

Slide49

A metaphor: context/switching

Page links and back button navigate a “stack” of pages in each tab.

Each tab has its own stack.

One tab is active at any given time.

You create/destroy tabs as needed.

You switch between tabs at your whim.

Similarly, each thread has a separate stack.

The OS switches between threads at its whim.

One thread is active per CPU core at any given time.

[Figure: contexts 1, 2, and 3 plotted against time.]

Slide50

Messing with the context (5)

    #include <ucontext.h>

    int count = 0;
    ucontext_t context;

    int main()
    {
        int i = 0;
        getcontext(&context);
        count += 1;
        i += 1;
        sleep(1);
        printf("…", count, i);
        setcontext(&context);
    }

What does this do?

Slide51

Thread/process states and transitions

[Figure: state diagram with three states: running (“driving a car”), ready (“requesting a car”), and blocked (“waiting for someplace to go”). Transitions: dispatch, yield, sleep (wait, STOP, read, write, listen, receive, etc.), and wakeup.]

Scheduler governs these transitions.

Sleep and wakeup are internal primitives. Wakeup adds a thread to the scheduler’s ready pool: a set of threads in the ready state.

Slide52

Block maps and page tables

Slide53

Blocks are contiguous

The storage in a heap block is contiguous in the Virtual Address Space.

The term block always refers to a contiguous sequence of bytes suitable for base+offset addressing.

C and other PL environments require this. E.g., C compiler determines the offsets for named fields in a struct and “bakes” them into the code.

This requirement complicates the heap manager because the heap blocks may be different sizes.

Slide54

Block maps

Large data objects may be mapped so they don’t have to be stored contiguously in machine memory (e.g., files, segments).

Idea: use a level of indirection through a map to assemble a storage object from “scraps” of storage in different locations.

The “scraps” can be fixed-size slots: that makes allocation easy because the slots are interchangeable (fixed partitioning).

Example: page tables that implement a VAS.

Slide55
Slide56

x64, x86-64, AMD64: VM Layout

Source: System V Application Binary Interface, AMD64 Architecture Processor Supplement, 2005

VM page map

Slide57

Indirection

Slide58

Fixed Partitioning

Wasted space: internal fragmentation.

Slide59

Names and maps

Block maps and other indexed maps are a common structure to implement “machine” name spaces: sequences of logical blocks (e.g., virtual address spaces, files), process IDs, etc.

For sparse block spaces we may use a tree hierarchy of block maps (e.g., inode maps or 2-level page tables, later).

Storage system software is full of these maps.

Symbolic name spaces use different kinds of maps. They are sparse and require matching, so they are more expensive. Examples: property list, key/value hash table.

Trees of maps create nested namespaces, e.g., the file tree.

Slide60

Extra slides

I hope we get to here

Slide61

The Kernel

Today, all “real” operating systems have protected kernels.

The kernel resides in a well-known file: the “machine” automatically loads it into memory (boots) on power-on/reset.

Our “kernel” is called the executive in some systems (e.g., Windows).

The kernel is (mostly) a library of service procedures shared by all user programs, but the kernel is protected:

User code cannot access internal kernel data structures directly.

User code can invoke the kernel only at well-defined entry points (system calls).

Kernel code is “just like” user code, but the kernel is privileged:

The kernel has direct access to all hardware functions, and defines the handler entry points for interrupts and exceptions.

Slide62

Protecting Entry to the Kernel

Protected events and kernel mode are the architectural foundations of kernel-based OS (Unix, Windows, etc).

The machine defines a small set of exceptional event types.

The machine defines what conditions raise each event.

The kernel installs handlers for each event at boot time, e.g., a table in kernel memory read by the machine.

The machine transitions to kernel mode only on an exceptional event.

The kernel defines the event handlers.

Therefore the kernel chooses what code will execute in kernel mode, and when.

[Figure: the boundary between user and kernel, crossed by trap/return and by interrupt or fault.]

Slide63

The Role of Events

A CPU event (an interrupt or exception, i.e., a trap or fault) is an “unnatural” change in control flow.

Like a procedure call, an event changes the PC register.

Also changes mode or context (current stack), or both.

Events do not change the current space!

On boot, the kernel defines a handler routine for each event type.

The machine defines the event types.

Event handlers execute in kernel mode.

Every kernel entry results from an event.

Enter at the handler for the event.

[Figure: control flow diverted to an event handler (e.g., ISR: Interrupt Service Routine); file label exception.cc.]

In some sense, the whole kernel is a “big event handler.”

Slide64

Examples

Illegal operation: reserved opcode, divide-by-zero, illegal access. That’s a fault! Kernel generates a signal, e.g., to kill process or invoke PL exception handlers.

Page fault: fetch and install page, maybe block process. Nothing illegal about it: “transparent” to faulting process.

I/O completion, arriving input, clock ticks: these external events are interrupts. Include power fail etc. Kernel services interrupt in handler. May wakeup blocked processes, but no blocking.

Slide65

Faults

Faults are similar to system calls in some respects:

Faults occur as a result of a process executing an instruction.

Fault handlers execute on the process kernel stack; the fault handler may block (sleep) in the kernel.

The completed fault handler may return to the faulted context.

But faults are different from syscall traps in other respects:

Syscalls are deliberate, but faults are “accidents”: divide-by-zero, dereference invalid pointer, memory page fault.

Not every execution of the faulting instruction results in a fault: it may depend on memory state or register contents.

Slide66

Note: Something Wild

The “Something Wild” example that follows was an earlier version of “Messing with the context”. It was not discussed in class.

“Messing with the context” simplifies the example, but keeps all the essential info.

“Something Wild” brings it just a little closer to coroutines: a context switch from one thread to another.

Slide67

Something wild (1)

    #include <ucontext.h>

    int count = 0;
    int set = 0;
    ucontext_t contexts[2];

    void proc() {
        int i = 0;
        if (!set) {
            getcontext(&contexts[count]);
        }
        printf(…, count, i);
        count += 1;
        i += 1;
        if (set) {
            setcontext(&contexts[count&0x1]);
        }
    }

    int main() {
        set = 0;
        proc();
        proc();
        set = 1;
        proc();
    }

[Figure label: time]

Slide68

Something wild (2)

    #include <ucontext.h>

    ucontext_t contexts[2];

    void proc()
    {
        int i = 0;
        getcontext(&contexts[count]);
        printf("…", count, i);
        count += 1;
        i += 1;
    }

    int main() {
        set=0;
        proc();
        proc();
    }

[Figure label: time]

Slide69

Something wild (3)

    #include <ucontext.h>

    ucontext_t contexts[2];

    void proc() {
        int i = 0;
        printf("…", count, i);
        count += 1;
        i += 1;
        sleep(1);
        setcontext(&contexts[count&0x1]);
    }

    int main()
    {
        set=1;
        proc();
    }

[Figure label: time]

Slide70

Something wild (4)

    void proc() {
        int i = 0;
        printf("…", count, i);
        count += 1;
        i += 1;
        sleep(1);
        setcontext(…);
    }

[Figure label: time]

Switch to the other saved register context. Alternate “even” and “odd” contexts.

We have a pair of register contexts that were saved at this point in the code.

If we load either of the saved contexts, it will transfer control to this block of code. (Why?) What about the stack?

Lather, rinse, repeat.

What will it print? The count is a global variable…but what about i?

Slide71

Something wild (5)

    void proc() {
        int i = 0;
        printf("%4d %4d\n", count, i);
        count += 1;
        i += 1;
        sleep(1);
        setcontext(…);
    }

[Figure label: time]

What does this do?