Slide 1
CS33: Introduction to Computer Organization
Week 8 – Discussion Section
Atefeh Sohrabizadeh
atefehsz@cs.ucla.edu
11/22/19
Slide 2: Agenda
Virtual Memory
Threading and Basic Synchronization
Slide 3: Virtual Memory
As demand on the CPU increases, processes slow down
But, if processes need too much memory, some of them may not be able to run
When a program is out of space, it can’t run
Solution: Virtual Memory (VM)
There is only one DRAM unit in machine
Provide each process with a large, uniform, and private address space, so each process behaves as if it has the entire DRAM to itself
Main memory (DRAM) will be treated as a cache for an address space on disk
Memory management is simpler
Since each process has a uniform address space
Isolates address spaces
The address space of each process is protected from corruption by other processes
Slide 4: Virtual Memory (cont.)
Partition memory into fixed-size chunks called pages
Both VM and DRAM
Physical address space
Set of M = 2^m physical, linear addresses (DRAM)
Virtual address space
Set of N = 2^n virtual, linear addresses (disk)
VM is an array of N contiguous bytes stored on disk
The contents of disk are cached in physical/main memory (DRAM)
Each page is either: unallocated, cached, or uncached
Slide 5: DRAM Cache Organization
DRAM is about 10x slower than SRAM
Disk is 10,000x slower than DRAM
So, DRAM has huge miss penalty
This results in:
Large page size
Fully associative
Any VP can be placed in any PP
But, with highly sophisticated replacement algorithms
Write-back (rather than write-through)
Slide 6: Page Table
An array of page table entries (PTEs) that maps virtual pages to physical pages
Resides in DRAM
Page hit: reference to a VM word that is in DRAM
Page fault: reference to a VM word that is not in DRAM
Causes an exception
The page fault handler selects a victim page to evict and loads the needed page
Re-executes the faulting instruction
Slide 7: Using the Page Table
Split virtual address into VPN and VPO
VPN is used as an index to page table
If not valid: page fault
Split physical address into PPN and PPO
Slide 8: Memory Management Unit (MMU)
A unit in the CPU that performs address translation
Virtual address to physical address
Translation Lookaside Buffer (TLB)
Cache for MMU
Caches VPN translation to PPN
Typically has high associativity (e.g. 4-way set associative)
Slide 9: TLB Hit and Miss
Slide 10: Example
[Figure: virtual address divided into TLBT, TLBI, and VPO fields]
Slide 11: Address Translation Example #1
Virtual address: 0x03D4 = 0b00 1111 0101 0100 (14 bits; bits 13–6 are the VPN, bits 5–0 the VPO; the low 2 VPN bits are the TLBI, the high 6 the TLBT)
VPN: 0x0F   TLBI: 0x3   TLBT: 0x03   TLB Hit? Y   Page Fault? N   PPN: 0x0D
Physical address: 0x354 = 0b0011 0101 0100 (12 bits; bits 11–6 are the PPN; for the cache, bits 1–0 are the CO, bits 5–2 the CI, bits 11–6 the CT)
CO: 0x0   CI: 0x5   CT: 0x0D   Cache Hit? Y   Byte: 0x36
Slide 12: Address Translation Example #2
Virtual address: 0x0020 = 0b00 0000 0010 0000 (14 bits)
VPN: 0x00   TLBI: 0x0   TLBT: 0x00   TLB Hit? N   Page Fault? N   PPN: 0x28
Physical address: 0xA20 = 0b1010 0010 0000 (12 bits)
CO: 0x0   CI: 0x8   CT: 0x28   Cache Hit? N   Byte: fetched from memory (Mem)
Slide 13: Multi-Level Page Table
As the size of the address space increases, the page table size increases
For a 48-bit address space and 4 KB pages, a flat table needs 2^(48-12) = 2^36 PTEs
But programs do not use the whole address space
The address space that a program accesses is sparse
So, use multi-level page tables
Only the level-1 page table needs to stay in DRAM
The rest can be paged in/out when needed
Slide 14: Sharing Revisited: Shared Objects
[Figure: physical memory, with process 1 and process 2 virtual memories mapping the same shared object]
Processes 1 and 2 map the shared object.
Notice how the virtual addresses can be different.
Slide 15: Sharing Revisited: Private Copy-on-Write (COW) Objects
Imagine these are forked processes
Two processes mapping a private copy-on-write (COW) object
The area is flagged as private copy-on-write
PTEs in COW areas are flagged as read-only
[Figure: physical memory shared by process 1 and process 2 virtual memories, with the private copy-on-write area marked]
Slide 16: Sharing Revisited: Private Copy-on-Write (COW) Objects
An instruction writing to a COW page triggers a protection fault; the handler creates a new R/W page
The instruction restarts upon handler return
Copying is deferred as long as possible!
[Figure: the same layout after the write, with a private copy of the written page in physical memory]
Slide 17: Concurrency
Motivation: increase performance by running multiple flows concurrently
Approach 1: using processes
Pro:
Address spaces are not shared
Processes can’t accidentally overwrite one another’s VM
Con:
Address spaces are not shared
Sharing information is hard
Tends to be slow because of the large overhead of process control
Approach 2: using threads
Slide 18: Thread-Based Concurrency
Thread: “a logical flow that runs in the context of a process”
Like processes, they are scheduled automatically by the kernel
A single process can have multiple threads running concurrently
All threads running in a process share the entire virtual address space
Easier communication
Each thread has its own thread context
TID, stack, stack pointer, program counter, general-purpose registers, and condition codes
A thread context is much smaller than a process context
Faster context switch: less overhead
Must access shared variables carefully
Slide 19: Thread Creation
Create a thread using pthread_create
Use NULL for the attr argument
When it returns, tid holds the ID of the newly created thread
A thread can determine its own ID using pthread_self
A thread routine takes a single void * argument and returns a void * as well
Slide 20: Thread Cleanup
Threads can be terminated by:
Calling one of the pthread_exit, exit, or pthread_cancel functions
Or when their top-level thread routine returns
Check the book for more details
Threads wait for other threads to terminate by calling pthread_join
The terminated threads are then reaped
Joinable threads can be reaped and killed by other threads
To avoid memory leaks, each joinable thread should be either explicitly reaped by another thread or detached
If you don’t plan on joining (reaping), make your threads detached by calling pthread_detach
Detached threads cannot be reaped or killed by other threads
Slide 21: Progress Graph
Models the execution of n concurrent threads as a trajectory through n-dimensional Cartesian space
Critical section: instructions that manipulate a shared variable
Unsafe region: intersection of the two critical sections
Safe trajectory: a trajectory that skirts the unsafe region
Ensures mutually exclusive access to the shared variables
Must synchronize the threads so that they always have a safe trajectory
Slide 22: Semaphore
All of the process context excluding the thread context is shared
E.g. global and static local variables
Shared variables may cause synchronization errors
(One) solution: use semaphores
A semaphore is a global variable with a nonnegative integer value
A counter that can take on nonnegative values
A mutex is a binary semaphore (its value is always 0 or 1)
Associate a semaphore with each shared variable
Slide 23: Semaphore Functions
int sem_init(sem_t *sem, int pshared, unsigned int value);
Initializes semaphore sem to value (use 0 for pshared)
void P(sem_t *s); — “I want access”
Waits until s becomes nonzero
If s is nonzero, decrements s by one and returns immediately
Grabs the lock (locking the mutex)
void V(sem_t *s); — “I’m done”
Increments s by one
Unlocks the mutex
If there are any threads waiting at a P operation, wakes up one of them
Slide 24: Progress Graph with Semaphores
Each state is labeled with the value of semaphore s
Forbidden region: the collection of states, induced by the P and V operations, in which s < 0
Encloses the unsafe region
No feasible trajectory can include any of the states in the forbidden region
Since semaphore values must be nonnegative
So every feasible trajectory is safe
Slide 25: Example
What’s the problem with this code?
Deadlock: foo(n-1) can’t acquire the lock (mutex)
Since foo(n) still holds it and is not done, the program gets stuck
Slide 26: Concurrency Issues
Thread safety
A function is thread-safe iff it always functions correctly when called repeatedly from multiple concurrent threads
Carefully handle shared variables and use proper synchronization to make a function thread-safe
Reentrant functions
A special form of thread-safe function that does not use any shared data
Deadlock
A situation where a collection of threads are blocked, waiting for a condition that will never be true
Slide 27: Deadlock with Semaphores
When the forbidden regions of two semaphores overlap, there is a deadlock region
Trajectories can enter the deadlock region, but can never leave
Here, each thread is waiting for the other to do a V operation that will never occur
To avoid deadlock when mutexes are used: given a total ordering of all mutexes, have every thread acquire the mutexes in that order
Slide 28: Class Attendance
Use this link: https://onlinepoll.ucla.edu/polls/3734
Open till 3:50 PM
Slide 29: Acknowledgment
The contents and figures are taken from the lecture slides for “Computer Systems: A Programmer’s Perspective”, 3rd Edition, Bryant and O’Hallaron.